Decision Optimization

  • 1.  CPLEX seg fault when creating problem

    Posted Thu January 18, 2018 01:15 PM

    Originally posted by: cssartori


    Hello, I've been working with the CPLEX C++ API for almost a year, and I have recently hit a bug that is blocking my research. I have spent the past few days trying to debug my program and cannot find the error. Every tool I have used to locate it points to the CPLEX library, although I doubt that CPLEX is really at fault.

    I'm building a pretty standard MIP model with a matrix of binary variables, and three arrays of real/integer variables.

    The bug is a segmentation fault that happens only occasionally, and in a different place each time (or at least it is detected in a different place each time). Roughly three out of every 50 runs end in this seg fault, and only on big instances (with a binary matrix of size 1000 x 1000). It is more common when I run several instances of the program in parallel. I have already checked memory usage: there is enough memory to run them in parallel, and swap is enabled anyway (it is not being used, but it would be before the process got killed). Each program calls CPLEX in single-thread mode.

     

    So far I have tried GDB, without success because the error is hard to reproduce. When I run the program under catchsegv (on Linux), the report says the problem is either in the call to solve or at the point where the model is extracted. Once or twice it pointed to the construction of one constraint or another. So this is not very helpful either.

    The final test was with valgrind, which gave me some more information, though I don't know whether it is actually useful. It reports that while certain objects are being extracted, such as the objective function, there is an invalid read of some bytes that were never allocated. An excerpt follows:
     

    ==12561== Parent PID: 12560
    ==12561==
    ==12561== Warning: set address range perms: large range [0x79de2040, 0x8ccf2b10) (undefined)
    ==12561== Warning: set address range perms: large range [0x8ccf9040, 0x9fc09b10) (undefined)
    ==12561== Invalid read of size 4
    ==12561==    at 0x6E6510: _c10eb1f4d9a2d5f2419fc4b6fdd63002 (in /home/code/main)
    ==12561==    by 0x7004EF: _db5b20d0578ce6bb874a10606d0d2157 (in /home/code/main)
    ==12561==    by 0x700225: _06d59c776fe54a486c95a0b14a460289 (in /home/code/main)
    ==12561==    by 0x86DA23: _b15b628f6e2221db01548906080666df (in /home/code/main)
    ==12561==    by 0x7E0571: _88460010ab4bed13741d30767523c0a1 (in /home/code/main)
    ==12561==    by 0x7E05D4: CPXLchgprobtype (in /home/code/main)
    ==12561==    by 0x4168CA: IloCplexI::flushCtype() (in /home/code/main)
    ==12561==    by 0x41B5B8: IloCplexI::doflush() (in /home/code/main)
    ==12561==    by 0x41B619: IloCplexI::flush(long) const (in /home/code/main)
    ==12561==    by 0x43717A: IloCplexI::setObj(int, IloCarray<double> const&, IloCarray<int*> const&, double) (in /home/code/main)
    ==12561==    by 0x479AFE: IloDefaultLPExtractor::doextractObj(IloObjectiveI const*) (in /home/code/main)
    ==12561==    by 0x479F14: IloDefaultLPExtractor::extractObj(IloObjectiveI const*, int**) (in /home/code/main)
    ==12561==  Address 0xff093e80 is not stack'd, malloc'd or (recently) free'd
    ==12561==
    ==12561==
    ==12561== Process terminating with default action of signal 11 (SIGSEGV)
    ==12561==  Access not within mapped region at address 0xFF093E80
    ==12561==    at 0x6E6510: _c10eb1f4d9a2d5f2419fc4b6fdd63002 (in /home/code/main)
    ==12561==    by 0x7004EF: _db5b20d0578ce6bb874a10606d0d2157 (in /home/code/main)
    ==12561==    by 0x700225: _06d59c776fe54a486c95a0b14a460289 (in /home/code/main)
    ==12561==    by 0x86DA23: _b15b628f6e2221db01548906080666df (in /home/code/main)
    ==12561==    by 0x7E0571: _88460010ab4bed13741d30767523c0a1 (in /home/code/main)
    ==12561==    by 0x7E05D4: CPXLchgprobtype (in /home/code/main)
    ==12561==    by 0x4168CA: IloCplexI::flushCtype() (in /home/code/main)
    ==12561==    by 0x41B5B8: IloCplexI::doflush() (in /home/code/main)
    ==12561==    by 0x41B619: IloCplexI::flush(long) const (in /home/code/main)
    ==12561==    by 0x43717A: IloCplexI::setObj(int, IloCarray<double> const&, IloCarray<int*> const&, double) (in /home/code/main)
    ==12561==    by 0x479AFE: IloDefaultLPExtractor::doextractObj(IloObjectiveI const*) (in /home/code/main)
    ==12561==    by 0x479F14: IloDefaultLPExtractor::extractObj(IloObjectiveI const*, int**) (in /home/code/main)
    ==12561==  If you believe this happened as a result of a stack
    ==12561==  overflow in your program's main thread (unlikely but
    ==12561==  possible), you can try to increase the size of the
    ==12561==  main thread stack using the --main-stacksize= flag.
    ==12561==  The main thread stack size used in this run was 8388608.
    ==12561==
     

    I'm compiling the program with g++ 5.4.0 under Ubuntu 16.04 LTS. My CPLEX API version is 12.6.2.

    I have no idea how to proceed. I have tried everything I know about debugging, and all the tools I have used point to CPLEX, so I come here hoping someone can shed some light, even if only to say that it is certainly NOT CPLEX.

    Thanks in advance


    #CPLEXOptimizers
    #DecisionOptimization


  • 2.  Re: CPLEX seg fault when creating problem

    Posted Thu January 18, 2018 01:31 PM

    Well, it certainly looks like the problem is with CPLEX :-(

    I checked the functions shown in the valgrind backtrace but nothing obvious stuck out.

    Do you get a valgrind warning every time you run the code or does it only happen right before a crash?

    Would it be an option to switch to a more recent version of CPLEX?

    Can you share your code (if not in public then maybe in private by email)?




  • 3.  Re: CPLEX seg fault when creating problem

    Posted Thu January 18, 2018 02:01 PM

    Originally posted by: cssartori


    Hi Daniel, thank you for the fast response.

    I get something close to 100 errors from valgrind every time I run the program. However, the invalid read appears only when the program crashes. I'm attaching the complete output of the run whose backtrace I pasted earlier. It points a lot to flush functions inside CPLEX, though I'm not enough of a CPLEX expert to know whether that is significant.

    I'm not sure I can get a more recent version, since I'm using the one provided by my university, and installed on our servers.

    I can share my code without any problem, since I had to make it as small as possible to trace the error. However, it still consists of three major files, so perhaps it would be easier to send it by e-mail or something similar.

    @EDIT: oh, the valgrind warning about the address range appears in all executions, in case that was your question.




  • 4.  Re: CPLEX seg fault when creating problem

    Posted Fri January 19, 2018 12:47 AM

    In the attached log, all the additional warnings come after the crash and they are memory leak warnings. These are expected in your case since the program did not shut down properly.

    In general, you should get no valgrind warnings from CPLEX at all. Any warning you get is reason for concern and should be fixed.

    Can you please send your code to daniel(dot)junglas(at)de(dot)ibm(dot)com? Please also include instructions on how to build and run it so that I can try to reproduce the problem. I will then take a look. My current guess is that you are running out of memory and that, for some reason, this is not handled properly.




  • 5.  Re: CPLEX seg fault when creating problem

    Posted Fri January 19, 2018 05:37 AM

    Originally posted by: cssartori


    Hi Daniel,

    Well, I think I've managed to fix the problem, but I'm still confused about it.

    Apparently I was not ending the IloEnv environment object properly (not at all, actually). Once I called end() on it at the end of my program, all CPLEX warnings in valgrind were gone (as you pointed out), so there was no more memory leak. After that I tried to reproduce the error, but it seems to be fixed now.

    What I understand is that, by not "freeing" the environment properly, memory areas are left in an undefined state that affects future executions of CPLEX (?). This would explain why it happened only on the biggest instances of my set, and also why it was more common when running batches of executions (~50) in parallel (x5), since the effect appears to snowball into more and more memory that is never released. On the other hand, as far as I know about OS memory management, once a program has halted its memory should be released, and even if it were not, it should not affect the execution of other programs. But since CPLEX is such a black-box library to me, I could be completely wrong.

    This has raised some questions for me, though:
    1) Do you think this could be the problem? As I've said, the batches of executions that used to raise at least one seg fault have not failed so far; they have all finished successfully.
    2) Does the IloEnv object have no destructor that would be called automatically to free the allocated memory?
    3) If this is in fact the problem, is there any particular reason you can think of why?

    Another fact that makes me believe the missing .end() was the problem is that my executions of the program, 5 at a time, now seem to use less RAM than before (before: ~24 GB, now: ~18 GB), probably because all instances of the program now free their allocated memory correctly.

    If you think everything I've said here makes no sense, I can still send you my minimal working example via e-mail so that you can have a look.

    PS: Oh, I guess it could not be running out of memory, because the server I'm running the tests on has 32 GB of RAM, and the 5 parallel tests together take around 18 GB.




  • 6.  Re: CPLEX seg fault when creating problem

    Posted Mon February 05, 2018 01:27 AM

    Sorry, I had missed that you still have some open questions. Here are the answers:

    1. This sounds indeed weird. As you said, once the program terminates, all memory is released by the OS. And there is no way that a corruption created by program1 can cause trouble in a newly started program2.
    2. You always have to explicitly end IloEnv. The IloEnv destructor does not release the memory. An easy and safe way to achieve that is either to wrap the IloEnv instance in an RAII class or to use code like the one below.
    3. Not end()ing IloEnv properly can only make a difference if you create more than one instance of this class in the same program. Then it can indeed result in strange out-of-memory problems.

    Safe use of IloEnv:

    IloEnv env;
    try {
       // your code here
       ...
       env.end(); // end() the instance if no exception was thrown
    }
    catch (...) {
       env.end(); // end() the instance in case of an exception
       throw; // rethrow exception (we only caught it so that we could properly end() the environment)
    }
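The RAII alternative mentioned in answer 2 could look roughly like the sketch below. It is only an illustration under stated assumptions: `EndGuard`, `MockEnv` and `endCallsAfterRun` are hypothetical names that are not part of CPLEX, and `MockEnv` merely stands in for `IloEnv` so that the example compiles without the CPLEX headers. The guard assumes that the wrapped type has an `end()` member that does not throw.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical scope guard (not part of CPLEX): calls end() on the wrapped
// object when the guard leaves scope, either normally or during stack
// unwinding after an exception.
template <typename Env>
class EndGuard {
public:
    explicit EndGuard(Env& env) : env_(env) {}
    // Assumption: end() does not throw. A throwing destructor would
    // terminate the program.
    ~EndGuard() { env_.end(); }

    // Non-copyable, so end() is called exactly once per guard.
    EndGuard(const EndGuard&) = delete;
    EndGuard& operator=(const EndGuard&) = delete;

private:
    Env& env_;
};

// Hypothetical stand-in for IloEnv that only counts calls to end().
struct MockEnv {
    int endCalls = 0;
    void end() { ++endCalls; }
};

// Builds a "model" inside a guarded scope and reports how often end() ran.
int endCallsAfterRun(bool fail) {
    MockEnv env;
    try {
        EndGuard<MockEnv> guard(env);
        if (fail) {
            throw std::runtime_error("simulated failure while building the model");
        }
    } catch (const std::runtime_error&) {
        // The guard's destructor has already run by the time we get here.
    }
    return env.endCalls;
}
```

With the real library, the same guard would presumably be used as `IloEnv env; EndGuard<IloEnv> guard(env);` at the top of the scope that builds and solves the model. Compared with the try/catch version, the call to `end()` then no longer has to be repeated on every exit path.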

     




  • 7.  Re: CPLEX seg fault when creating problem

    Posted Mon February 05, 2018 11:23 AM

    Originally posted by: cssartori


    Hey Daniel,

     

    No worries. I have actually been meaning to post here; I just needed some time and a reminder. As it turns out, ending the environment did not solve the problem. I had thought so because no error occurred after several tests (I mean > 500 executions). However, just a couple of days later the error popped up again, and I got desperate again.

    My laboratory has 3 identical servers, and I had been testing on only one of them because the others were full, though it never once occurred to me that the computer could be the cause. Then one of the other two servers became free, I tried it there, and, surprise, surprise, no error, nothing. I have easily run > 5000 executions since then without a single error. I'm still not sure whether it was the computer, or whether the error will pop up again some day, but so far so good: everything is working as expected.

    What puzzles me is that, as I said, the two servers are identical in everything that could affect the execution: hardware, CPLEX version, g++ compiler, Ubuntu version, kernel, etc. After the vacation (it is that time of year here) we are going to examine the computer more closely and try to determine whether the error lies in it. Until then, I'm not going to trouble you with the code, since the error is sort of "random" and perhaps lies elsewhere.

    If we come to a conclusion I can post it here, in case anyone happens to have the same error (if I'm allowed to leave this thread open for now, of course).

     




  • 8.  Re: CPLEX seg fault when creating problem

    Posted Mon February 05, 2018 11:47 AM

    Your thread, your choice :-)

    I guess anything you can find is potentially helpful for others.

