Decision Optimization

CPLEX stuck while solving a MIP

  • 1.  CPLEX stuck while solving a MIP

    Posted Wed September 18, 2019 02:23 AM

    Originally posted by: StefanRopke


    We have implemented a column generation algorithm that uses CPLEX 12.9.0 for solving the master problem and the sub-problem. Unfortunately, we occasionally get stuck while solving the sub-problem (which is a MIP). We have put a time limit on the sub-problem solves by setting CPX_PARAM_DETTILIM = 100000, so I think we can rule out that we have simply encountered a hard sub-problem (the program is stuck for hours). The issue is non-deterministic: the problem may or may not occur when solving a specific instance. We believe that our column generation code is deterministic. Computations take place on a Linux cluster where we submit jobs using IBM LSF (https://en.wikipedia.org/wiki/Platform_LSF). The problem seems to occur only (or perhaps it is just more pronounced) when we have submitted multiple jobs to LSF. We do not use all threads of the computers in the cluster, so multiple instances of our program may be running on the same computer.

    The algorithm is programmed in Julia v1.1.1 and we are using CPLEX.jl v0.5.0 and JuMP.jl 0.18.5 as an interface to CPLEX. When solving the sub-problem we allow CPLEX to use multiple threads (in the case below we used 4 threads).

    When the program gets stuck in the sub-problem, it is eventually stopped by the queuing system and we get a stack trace. An example of the stack trace is shown below. I interpret this as CPLEX waiting for a condition using pthread_cond_wait. I guess that "_380e5c4656a6f664b227344161da1705" and so on are function names that have been obfuscated. If the function names can be translated back to something meaningful, then perhaps someone in the CPLEX team can use the stack trace to give us a hint as to why we are stuck?

     

    Best regards,

    Stefan

     

    signal (2): Interrupt
    in expression starting at /zhome/e7/c/23631/clusterDikuSVN/Julia/AutoDec/AutoDecCMD.jl:69
    pthread_cond_wait at /lib64/libpthread.so.0 (unknown line)
    _380e5c4656a6f664b227344161da1705 at /apps/cplex/cplex129/cplex/bin/x86-64_linux/libcplex1290.so (unknown line)
    _a2c676a7e5fa4ef62804ea26f60b6985 at /apps/cplex/cplex129/cplex/bin/x86-64_linux/libcplex1290.so (unknown line)
    _489ebd4c5c9af8d527368ef7798879f4 at /apps/cplex/cplex129/cplex/bin/x86-64_linux/libcplex1290.so (unknown line)
    _c1c77f10ee3987dbacd6e1ff57562a0d at /apps/cplex/cplex129/cplex/bin/x86-64_linux/libcplex1290.so (unknown line)
    _458c27ff5e8b53b24f24b11298e4748a at /apps/cplex/cplex129/cplex/bin/x86-64_linux/libcplex1290.so (unknown line)
    _6874c3b6b6be3f68ecba8390ec5eef57 at /apps/cplex/cplex129/cplex/bin/x86-64_linux/libcplex1290.so (unknown line)
    _f68671c86e58ee857262d57e613a989e at /apps/cplex/cplex129/cplex/bin/x86-64_linux/libcplex1290.so (unknown line)
    _6461e05fddcc3cd8f9bc66780cf8fd0f at /apps/cplex/cplex129/cplex/bin/x86-64_linux/libcplex1290.so (unknown line)
    _d8cf0d78b73b992f462a67ac0246cadf at /apps/cplex/cplex129/cplex/bin/x86-64_linux/libcplex1290.so (unknown line)
    _a5ac66bf2a77f76a39d6d39cb35ec3b8 at /apps/cplex/cplex129/cplex/bin/x86-64_linux/libcplex1290.so (unknown line)
    _8727296eaaa73edeafa0b13f8264cf6b at /apps/cplex/cplex129/cplex/bin/x86-64_linux/libcplex1290.so (unknown line)
    _c61c6b0d728c97d9284b71d6d09582c0 at /apps/cplex/cplex129/cplex/bin/x86-64_linux/libcplex1290.so (unknown line)
    _00d3484724425db51115f7f77592bc7d at /apps/cplex/cplex129/cplex/bin/x86-64_linux/libcplex1290.so (unknown line)
    _c69cadd8f44da6e42b444e6f0807896a at /apps/cplex/cplex129/cplex/bin/x86-64_linux/libcplex1290.so (unknown line)
    _6874c3b6b6be3f68ecba8390ec5eef57 at /apps/cplex/cplex129/cplex/bin/x86-64_linux/libcplex1290.so (unknown line)
    _0ff8510b91f0240cafe918f6ab601dab at /apps/cplex/cplex129/cplex/bin/x86-64_linux/libcplex1290.so (unknown line)
    _cac20a5c347b91b75f3929b6cf82c567 at /apps/cplex/cplex129/cplex/bin/x86-64_linux/libcplex1290.so (unknown line)
    _5ce57952ba3c58d45c4ff1caf38ccdb0 at /apps/cplex/cplex129/cplex/bin/x86-64_linux/libcplex1290.so (unknown line)
    _8bd0e61f623cb30e4bf612edcfdd2080 at /apps/cplex/cplex129/cplex/bin/x86-64_linux/libcplex1290.so (unknown line)
    optimize! at /zhome/e7/c/23631/.julia/packages/CPLEX/NzxB3/src/cpx_solve.jl:7 [inlined]
    optimize! at /zhome/e7/c/23631/.julia/packages/CPLEX/NzxB3/src/CplexSolverInterface.jl:185


    #CPLEXOptimizers
    #DecisionOptimization


  • 2.  Re: CPLEX stuck while solving a MIP

    Posted Wed September 18, 2019 09:00 AM

    Originally posted by: Laci Ladanyi


    Hi Stefan,

     

    Unfortunately the stack trace does not reveal much :-(. Basically it is what you have guessed: cplex is waiting for a condition, for all children to report back. However, this is the trace for only one of the threads. Is there any way to get the trace for all threads? Can LSF be instructed to do that? Or can you log in to the node where the hung process is running (after you are reasonably certain that it should have stopped, but before LSF cuts it off), attach gdb to it, and get a trace of all threads? If all else fails, you can try to run cplex in single-threaded mode and hopefully get the relevant stack trace that way. Let me know!

     

    --Laci
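
    The gdb step suggested above could be scripted roughly as follows. This is only a sketch: the helper names are hypothetical, it assumes gdb is installed on the compute node, and it assumes the user is allowed to attach to the process (ptrace restrictions may require help from the sysadmins). The gdb options used are -p (attach to a PID), -batch (exit after running the commands) and -ex "thread apply all bt" (print a backtrace for every thread).

```python
import subprocess
from typing import List


def gdb_backtrace_command(pid: int) -> List[str]:
    """Build a gdb invocation that prints a backtrace for every thread of `pid`."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_all_thread_stacks(pid: int, timeout_s: float = 60.0) -> str:
    """Attach gdb to a running process and return the backtraces of all its threads.

    Assumption: gdb is on PATH and the caller may attach to the process.
    """
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,  # gdb may exit non-zero; the caller inspects the text instead
    )
    return result.stdout
```

    Calling dump_all_thread_stacks(pid) with the PID of the hung Julia process, before LSF kills the job, would return one backtrace per thread.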




  • 3.  Re: CPLEX stuck while solving a MIP

    Posted Wed September 18, 2019 09:23 AM

    Originally posted by: StefanRopke


    Thank you for looking into this Laci! I will try to get a stack-trace from all the threads using gdb - I think that should be possible, perhaps with some help from the sysadmins :-)

    Best regards,

    Stefan

     




  • 4.  Re: CPLEX stuck while solving a MIP

    Posted Thu September 19, 2019 08:48 AM

    Originally posted by: StefanRopke


    Hi Laci,

     

    I think I reached the "stuck" situation again and the sysadmin of the cluster was able to get a stacktrace for each thread (the pstack command does exactly that, maybe that information is valuable in case someone is reading this in the future). 

    I have attached the stacktraces. There are 5 threads in the stack-trace. I guess that one thread is Julia specific (LWP 11803) since CPLEX was supposed to just use 4 threads.

    I also attached the output from top. There it seems that LWP 4247 is doing a lot of work while the other CPLEX threads are just waiting.

     

    Regarding parameters for the CPLEX call, we set the following:

    CPX_PARAM_DETTILIM=10000 or 100000 (we increase the deterministic time limit from 10000 to 100000 and re-solve if the first solve has not found a sufficiently good solution or proven optimality within the short time limit)

    CPX_PARAM_SCRIND=0

    CPX_PARAM_THREADS=4

    CPX_PARAM_EPINT=1E-8

    CPX_PARAM_EPAGAP=0.0001

    CPX_PARAM_EPGAP=0
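
    For readers who want to see the two-stage time limit spelled out, here is a minimal sketch in Python rather than the Julia used in our code. The parameter names and values are the ones listed above; the solve and is_good_enough callables are hypothetical stand-ins for the actual CPLEX call and for our acceptance test, neither of which is shown in this thread.

```python
from typing import Any, Callable, Dict, Tuple

# Parameter values as listed in the post; DETTILIM is added per attempt.
BASE_PARAMS: Dict[str, Any] = {
    "CPX_PARAM_SCRIND": 0,
    "CPX_PARAM_THREADS": 4,
    "CPX_PARAM_EPINT": 1e-8,
    "CPX_PARAM_EPAGAP": 0.0001,
    "CPX_PARAM_EPGAP": 0,
}
SHORT_DETTILIM = 10000
LONG_DETTILIM = 100000


def solve_with_escalation(
    solve: Callable[[Dict[str, Any]], Any],
    is_good_enough: Callable[[Any], bool],
) -> Tuple[Any, int]:
    """Solve with the short deterministic limit, then retry once with the long one.

    `solve` and `is_good_enough` are hypothetical callables supplied by the caller.
    Returns the last result and the number of solves performed.
    """
    result = None
    attempts = 0
    for limit in (SHORT_DETTILIM, LONG_DETTILIM):
        params = dict(BASE_PARAMS, CPX_PARAM_DETTILIM=limit)
        result = solve(params)
        attempts += 1
        if is_good_enough(result):
            break  # good solution or proven optimality: no re-solve needed
    return result, attempts
```

    Note that the deterministic time limit only bounds the work CPLEX performs, so a process that is blocked waiting on other threads would not be stopped by it.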

     

    Best regards,

    Stefan




  • 5.  Re: CPLEX stuck while solving a MIP

    Posted Thu September 19, 2019 10:57 PM

    Originally posted by: Laci Ladanyi


    It does look like a deadlock :-(. Is there any chance that every time before calling optimization you would create a sav file and save the log of the optimization? Then delete the sav file and the log if cplex returns just fine, but if it hangs, post the log and the sav file? Or if they are confidential, you can email them to me at ladanyi at us dot ibm dot com.

     

    Thanks,

    --Laci
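
    The save-then-clean-up idea above could look roughly like this. It is a sketch under assumptions: write_model and solve are hypothetical callables standing in for whatever the CPLEX interface in use offers for exporting the model to a sav file and for running the solve with its log directed to a file. The cleanup is reached only when the solve returns normally, so after a hang (or a crash) both files are still on disk.

```python
import os
from typing import Any, Callable


def solve_keeping_artifacts_on_failure(
    write_model: Callable[[str], None],
    solve: Callable[[str], Any],
    sav_path: str,
    log_path: str,
) -> Any:
    """Export the model, solve it, and delete the files only after a clean return.

    `write_model(sav_path)` and `solve(log_path)` are hypothetical callables.
    """
    write_model(sav_path)     # export the model before optimizing
    result = solve(log_path)  # run the solve, writing its log to log_path

    # Only reached if the solve returned; a hung or crashed solve leaves
    # the sav file and the log behind for later inspection.
    for path in (sav_path, log_path):
        if os.path.exists(path):
            os.remove(path)
    return result
```

    Using a file name that includes the job ID and an iteration counter (an assumption about the setup) would keep concurrent jobs on the same node from overwriting each other's files.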




  • 6.  Re: CPLEX stuck while solving a MIP

    Posted Mon November 04, 2019 05:59 PM

    Originally posted by: Laci Ladanyi


    Hi Stefan,

     

    Did you manage to create a sav/log file? I'm afraid we can't really do anything about the issue without that... We need something to reproduce the issue with.

     

    Thanks,

    --Laci

