Decision Optimization

Delivers prescriptive analytics capabilities and decision intelligence to improve decision-making.

  • 1.  CPLEX: Barrier freezes during computation

    Posted Tue August 05, 2014 12:34 PM

    Originally posted by: p_weiss


    Hello everyone,

    I have the following problem with CPLEX: I solve large LPs with the barrier method through the C interface. These LPs normally take about half an hour to solve (8 threads). Occasionally, CPLEX stops producing output in the middle of the computation and does not resume, even after more than a day. The top command shows all 8 threads running at full capacity the whole time. The process uses 20 GB of RAM, not all of which belongs to CPLEX (probably not even the majority), while 15 GB are still free, so it is probably not a memory issue.
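
One way to detect such a stall automatically, rather than by watching the log, is to timestamp every message CPLEX writes. The sketch below is a minimal, hypothetical helper in plain C. record_output has the shape of a CPLEX message function, void f(void *handle, const char *msg), so it could presumably be registered with CPXaddfuncdest on the channel that carries the iteration log (channels are obtained via CPXgetchannels). Check both routines against the reference manual for your CPLEX version. The names progress_t, record_output and seconds_silent are illustrative and are not part of CPLEX. The fields are atomic because a monitor thread reads them while the solve thread writes them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical progress record shared between the solve thread
   (which writes it from the message callback) and a monitor thread. */
typedef struct {
    _Atomic long long last_output; /* time(NULL) of the most recent message */
    _Atomic long lines;            /* number of messages seen so far */
} progress_t;

/* Initialize the record; 'start' is usually time(NULL) just before the solve. */
static void progress_init(progress_t *p, time_t start) {
    assert(p != NULL);
    atomic_init(&p->last_output, (long long)start);
    atomic_init(&p->lines, 0);
}

/* Same shape as a CPLEX message function: void f(void *handle, const char *msg).
   Only the arrival time of the message is recorded, not its text. */
static void record_output(void *handle, const char *msg) {
    progress_t *p = handle;
    (void)msg;
    atomic_store(&p->last_output, (long long)time(NULL));
    atomic_fetch_add(&p->lines, 1);
}

/* Seconds since the last message, measured at wall-clock time 'now'. */
static double seconds_silent(progress_t *p, time_t now) {
    return difftime(now, (time_t)atomic_load(&p->last_output));
}
```

A monitor thread can then compare seconds_silent(&p, time(NULL)) against a threshold well above the longest normal gap between barrier iterations for this model family.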

    Unfortunately, the problem does not seem to be reproducible. A different LP freezes each time, and when I load the affected LP into the CPLEX command-line interface, it solves normally.

    Below, you can find the full log of a frozen run. Any ideas on what the problem might be or how I can figure it out?

    Thanks, and all the best,

    Philipp Weiß

    -------------------

    Found 0 integer variables.
    Tried aggregator 1 time.
    LP Presolve eliminated 1260221 rows and 138743 columns.
    Aggregator did 377840 substitutions.
    Reduced LP has 2884245 rows, 8275845 columns, and 15864857 nonzeros.
    Presolve time = 9.98 sec. (4779.08 ticks)
    Parallel mode: using up to 8 threads for barrier.

    ***NOTE: Found 470 dense columns.

    Number of nonzeros in lower triangle of A*A' = 189631488
    Elapsed ordering time = 7.44 sec. (10000.02 ticks)
    Elapsed ordering time = 16.51 sec. (20000.62 ticks)
    Using Nested Dissection ordering
    Total time for automatic ordering = 290.54 sec. (381484.66 ticks)
    Summary statistics for Cholesky factor:
      Threads                   = 8
      Rows in Factor            = 2884715
      Integer space required    = 11115910
      Total non-zeros in factor = 850988236
      Total FP ops to factor    = 2074231947276
     Itn      Primal Obj        Dual Obj  Prim Inf Upper Inf  Dual Inf          
       0  -8.4011111e+12   1.1330902e+11  2.28e+07  1.92e+05  2.06e+13
       1  -9.2540026e+12   6.6942577e+11  1.15e+07  9.67e+04  1.62e+13
       2  -9.1027286e+12   1.5622640e+12  1.21e+06  1.02e+04  8.49e+12
       3  -4.5672012e+12   1.1924255e+12  8.81e-09  2.22e-16  5.67e+11
       4  -6.6451057e+11   5.6604539e+11  5.44e-09  3.77e-15  3.40e+10
       5  -1.7959695e+11   1.6730791e+11  2.17e-09  4.00e-15  3.42e+09
       6  -1.1180964e+11   1.5248766e+11  1.51e-09  3.44e-15  2.87e+09
       7  -5.2311182e+10   8.8418608e+10  1.23e-09  4.77e-15  6.78e+08
       8  -3.7372055e+10   6.2245820e+10  6.48e-10  4.00e-15  9.79e+07
       9  -1.4357052e+10   4.1006122e+10  1.12e-09  4.88e-15  9.81e+06
      10  -5.3156947e+09   3.0820618e+10  1.61e-09  4.33e-15  2.07e+06
      11   1.0298159e+09   2.3400648e+10  1.67e-09  4.66e-15  6.70e+05
      12   1.9429231e+09   2.1157063e+10  1.08e-09  4.33e-15  3.80e+05
      13   4.5429084e+09   1.7987447e+10  1.26e-09  3.22e-15  2.01e+05
      14   5.1873872e+09   1.5366420e+10  1.36e-09  4.66e-15  8.10e+04
      15   5.7967066e+09   1.4370489e+10  1.07e-09  3.22e-15  4.75e+04
      16   7.0380513e+09   1.2521724e+10  2.07e-09  3.33e-15  3.10e+03
      17   7.6469605e+09   1.1502086e+10  2.85e-09  2.55e-15  1.00e+02
      18   7.9052222e+09   1.1219198e+10  2.70e-09  3.77e-15  5.44e+01
      19   8.2517090e+09   1.0706125e+10  2.98e-09  3.66e-15  8.88e+00
      20   8.6660556e+09   1.0359728e+10  3.88e-09  3.11e-15  1.08e+00
      21   8.9084139e+09   1.0243482e+10  4.30e-09  4.22e-15  3.68e-01
      22   9.0081746e+09   1.0147381e+10  3.93e-09  3.89e-15  1.39e-01
      23   9.3457296e+09   9.9729640e+09  1.08e-08  3.44e-15  4.76e-03
      24   9.4546333e+09   9.9370115e+09  1.04e-08  2.57e-15  3.05e-03
      25   9.5697961e+09   9.8502384e+09  1.80e-08  3.12e-15  9.92e-04
      26   9.5954976e+09   9.8215611e+09  1.16e-08  3.23e-15  7.21e-04
      27   9.6429996e+09   9.7900880e+09  1.60e-08  3.66e-15  6.71e-04
      28   9.6829159e+09   9.7706738e+09  2.35e-08  3.71e-15  6.46e-04
      29   9.6920866e+09   9.7643297e+09  1.53e-08  4.07e-15  6.11e-04
      30   9.7051252e+09   9.7605857e+09  2.51e-08  3.92e-15  6.31e-04
      31   9.7157804e+09   9.7497834e+09  3.93e-08  3.28e-15  6.75e-04
      32   9.7204737e+09   9.7477242e+09  2.74e-08  3.77e-15  6.50e-04
      33   9.7288376e+09   9.7445950e+09  6.69e-08  3.91e-15  6.38e-04
      34   9.7314177e+09   9.7431287e+09  4.45e-08  3.54e-15  6.11e-04
      35   9.7358418e+09   9.7417737e+09  6.05e-08  3.77e-15  6.64e-04
      36   9.7372378e+09   9.7413496e+09  6.54e-08  4.80e-15  6.83e-04
      37   9.7380191e+09   9.7407392e+09  6.62e-08  4.83e-15  6.80e-04
      38   9.7384802e+09   9.7406089e+09  7.13e-08  3.90e-15  7.04e-04
      39   9.7387916e+09   9.7402754e+09  6.48e-08  4.33e-15  6.87e-04
      40   9.7390753e+09   9.7401122e+09  6.70e-08  4.83e-15  7.06e-04
      41   9.7394472e+09   9.7400047e+09  9.96e-08  4.15e-15  6.83e-04
      42   9.7395758e+09   9.7398822e+09  8.44e-08  4.48e-15  6.94e-04
      43   9.7396745e+09   9.7398325e+09  1.21e-07  4.29e-15  6.61e-04
      44   9.7397162e+09   9.7398131e+09  1.35e-07  3.49e-15  6.99e-04
     


    #CPLEXOptimizers
    #DecisionOptimization


  • 2.  Re: CPLEX: Barrier freezes during computation

    Posted Wed August 06, 2014 12:38 AM

    The first thing I would try is to enable CPX_PARAM_DATACHECK to make sure you do not pass bogus data (such as NaNs) to CPLEX, which could cause trouble down the road. Next, it might be worthwhile to run your code through valgrind or a similar tool to check that neither your code nor CPLEX performs bad memory operations, which can also cause all sorts of trouble.
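
An application-side check can complement CPX_PARAM_DATACHECK by catching NaN or infinite coefficients before they reach CPLEX at all. This is a minimal sketch, assuming the model is assembled in plain double arrays (for example the obj, rhs and matval arrays handed to CPXcopylp). first_nonfinite is a hypothetical helper, not a CPLEX routine. The check is meant for coefficient arrays. Infinite variable bounds are normally expressed in CPLEX with CPX_INFBOUND (1e20) and are not covered here.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Returns the index of the first NaN or +/-infinite entry of x[0..n-1],
   or -1 if all entries are finite. Hypothetical helper, not a CPLEX call. */
static long first_nonfinite(const double *x, size_t n) {
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (!isfinite(x[i]))
            return (long)i;
    }
    return -1;
}
```

For example, if first_nonfinite(matval, nnz) returns a non-negative index, the offending entry can be reported and the solve skipped.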

    If that does not give you any more information, could you attach a debugger to the running process and check where CPLEX is hanging?

    What version of CPLEX are you using? The most recent one is 12.6.0.1. If you are not using it yet, it might be worth checking whether the problem persists with that version.




  • 3.  Re: CPLEX: Barrier freezes during computation

    Posted Fri September 05, 2014 10:02 AM

    Originally posted by: p_weiss


    CPX_PARAM_DATACHECK is set to one (CPXsetintparam (_env, CPX_PARAM_DATACHECK, 1)).

    So far, the freeze has never occurred while running under valgrind, and in those runs valgrind reported no memory errors.

    The freeze did not occur for some time, as I was focusing on another project. Yesterday I again had a frozen run (without valgrind), even though I had set a wall-clock time limit of 7200 s (CPXsetintparam (_env, CPX_PARAM_CLOCKTYPE, 2); CPXsetdblparam (_env, CPX_PARAM_TILIM, num_seconds);). It seems the time limit was ignored.
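
Since the internal limit apparently was not respected, an external watchdog is a possible safety net. The sketch below is a minimal POSIX-threads version, and all names in it are hypothetical. A separate thread waits on a condition variable until one of two things happens: the solve returns (watchdog_done), or an absolute wall-clock deadline passes, in which case it sets a flag. The volatile int terminate is the kind of flag that CPLEX's CPXsetterminate routine polls, assuming your version provides that routine. However, a solver stuck as in this report may ignore that flag just as it ignored CPX_PARAM_TILIM. The comment marks where a last-resort action (logging, dumping state, or abort()) would go.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical watchdog state shared by the solving thread and the watchdog. */
typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    bool done;               /* set by watchdog_done() once the solve returned */
    bool fired;              /* set by the watchdog if the deadline passed first */
    volatile int terminate;  /* cooperative stop flag a solver could poll */
    struct timespec deadline;/* absolute deadline on CLOCK_REALTIME */
} watchdog_t;

/* Arms the watchdog 'limit_ms' milliseconds of wall-clock time from now. */
static void watchdog_init(watchdog_t *w, long limit_ms) {
    assert(w != NULL && limit_ms >= 0);
    pthread_mutex_init(&w->mu, NULL);
    pthread_cond_init(&w->cv, NULL);  /* default attr: timed waits use CLOCK_REALTIME */
    w->done = false;
    w->fired = false;
    w->terminate = 0;
    clock_gettime(CLOCK_REALTIME, &w->deadline);
    w->deadline.tv_sec += limit_ms / 1000;
    w->deadline.tv_nsec += (limit_ms % 1000) * 1000000L;
    if (w->deadline.tv_nsec >= 1000000000L) {
        w->deadline.tv_sec += 1;
        w->deadline.tv_nsec -= 1000000000L;
    }
}

/* Thread body: sleeps until the solve finishes or the deadline passes. */
static void *watchdog_main(void *arg) {
    watchdog_t *w = arg;
    int rc = 0;
    pthread_mutex_lock(&w->mu);
    /* rc stays 0 on signals and spurious wakeups; ETIMEDOUT or an error ends the wait. */
    while (!w->done && rc == 0)
        rc = pthread_cond_timedwait(&w->cv, &w->mu, &w->deadline);
    if (!w->done) {
        w->fired = true;
        w->terminate = 1;
        /* Last-resort action (log, dump state, abort()) would go here. */
    }
    pthread_mutex_unlock(&w->mu);
    return NULL;
}

/* Called by the solving thread as soon as the solve call returns. */
static void watchdog_done(watchdog_t *w) {
    pthread_mutex_lock(&w->mu);
    w->done = true;
    pthread_cond_signal(&w->cv);
    pthread_mutex_unlock(&w->mu);
}
```

Typical use: arm it with the CPX_PARAM_TILIM value plus a margin, start watchdog_main with pthread_create before CPXlpopt, call watchdog_done when CPXlpopt returns, then pthread_join the watchdog.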

    I attached gdb to the process. Unfortunately, there is no debug information in the CPLEX library files. After single-stepping several times, it showed:

    0x00007f3122e0f89c in __lll_lock_wait () from /lib64/libpthread.so.0
    (gdb) n
    Single stepping until exit from function __lll_lock_wait,
    which has no line number information.
    0x00007f3122e0b4d7 in _L_lock_913 () from /lib64/libpthread.so.0
    (gdb) n
    Single stepping until exit from function _L_lock_913,
    which has no line number information.
    0x00007f3122e0b300 in pthread_mutex_lock () from /lib64/libpthread.so.0
    (gdb) n
    Single stepping until exit from function pthread_mutex_lock,
    which has no line number information.
    0x00007f30f70b47e1 in _5b993699a1a5c9fd7161ae008bbeb2a5 () from <some path>/cplex126/bin/Linux64/libcplex126.so
    (gdb) n
    Single stepping until exit from function _5b993699a1a5c9fd7161ae008bbeb2a5,
    which has no line number information.
     

    At that point it hung, i.e. it never exited this function. I detached and reattached gdb several times. Each time the sequence was either exactly this or the same with "unlock" instead of "lock", and it always ended inside the same function.
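
Why would threads keep the CPU busy while every snapshot shows them around pthread_mutex_lock or pthread_mutex_unlock? One generic pattern that fits this picture is a thread polling a shared flag under a mutex while it waits for another thread to finish its part of the work. The sketch below is purely illustrative and bounded so that it terminates. It is not CPLEX's code, which is closed, and nothing here claims that CPLEX does this. If the flag were never set, such a loop would spin forever without leaving the function. When several threads poll the same mutex, the lock is contended, so debugger snapshots often land in __lll_lock_wait or __lll_unlock_wake.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Illustrative only: a flag that another thread is expected to set. */
typedef struct {
    pthread_mutex_t mu;
    bool ready;
} shared_flag_t;

/* Polls s->ready under the mutex at most max_polls times (bounded so this
   sketch always returns). Each iteration is a lock/unlock pair with no
   sleeping in between, so the calling thread never yields the CPU voluntarily. */
static bool poll_until_ready(shared_flag_t *s, long max_polls) {
    assert(s != NULL && max_polls >= 0);
    for (long i = 0; i < max_polls; ++i) {
        pthread_mutex_lock(&s->mu);
        bool ready = s->ready;
        pthread_mutex_unlock(&s->mu);
        if (ready)
            return true;
    }
    return false;
}
```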

    Regarding the version: README.html says 12.6.0. I guess the difference from 12.6.0.1 is minor, if there is one?

    Any more ideas what I could try?

    Here is the complete log of the frozen run:

    Found 0 integer variables.
    Tried aggregator 1 time.
    LP Presolve eliminated 225328 rows and 28842 columns.
    Aggregator did 79248 substitutions.
    Reduced LP has 449721 rows, 1234378 columns, and 2353751 nonzeros.
    Presolve time = 1.87 sec. (783.82 ticks)
    Parallel mode: using up to 8 threads for barrier.

    ***NOTE: Found 468 dense columns.

    Number of nonzeros in lower triangle of A*A' = 4083271
    Using Nested Dissection ordering
    Total time for automatic ordering = 12.96 sec. (9153.89 ticks)
    Summary statistics for Cholesky factor:
      Threads                   = 8
      Rows in Factor            = 450189
      Integer space required    = 1706338
      Total non-zeros in factor = 44116620
      Total FP ops to factor    = 48009042878
     Itn      Primal Obj        Dual Obj  Prim Inf Upper Inf  Dual Inf          
       0  -2.2725586e+11   6.9107417e+08  1.89e+06  3.01e+04  1.01e+11
       1  -1.2366787e+11   5.5256110e+09  1.03e+06  1.64e+04  5.91e+10
       2  -9.7606899e+10   1.3451611e+10  4.05e+05  6.43e+03  2.76e+10
       3  -1.0894040e+10   1.2118791e+10  5.25e+02  8.34e+00  1.03e+10
       4  -6.4992818e+09   5.8101901e+09  2.61e+02  4.14e+00  3.66e+09
       5  -3.0368811e+09   5.7409818e+09  7.10e+00  1.13e-01  2.71e+09
       6  -1.7025683e+09   3.5384470e+09  4.02e+00  6.39e-02  1.07e+09
       7  -1.0815752e+09   3.1364594e+09  2.44e+00  3.87e-02  6.81e+08
       8  -4.9227157e+06   2.2383293e+09  7.21e-01  1.15e-02  2.27e+08
       9   4.4555028e+08   1.6335403e+09  2.85e-01  4.53e-03  4.52e+07
      10   7.7919893e+08   1.5071687e+09  6.35e-02  1.01e-03  2.50e+07
      11   8.8272935e+08   1.3934206e+09  1.87e-02  2.97e-04  1.18e+07
      12   9.9579129e+08   1.2756661e+09  1.07e-03  1.70e-05  3.81e+06
      13   1.0245776e+09   1.2295162e+09  3.35e-04  5.32e-06  1.64e+06
      14   1.0773196e+09   1.1726417e+09  5.32e-05  8.45e-07  3.59e+05
      15   1.0857848e+09   1.1637071e+09  3.41e-05  5.42e-07  1.94e+05
      16   1.1093654e+09   1.1491161e+09  1.25e-05  1.99e-07  7.80e+04
      17   1.1161374e+09   1.1424542e+09  3.34e-06  5.31e-08  1.01e+04
      18   1.1211978e+09   1.1399249e+09  1.37e-06  2.18e-08  4.16e+03
      19   1.1283184e+09   1.1367039e+09  3.42e-07  5.44e-09  1.21e+03
      20   1.1297214e+09   1.1357052e+09  1.72e-07  2.73e-09  4.41e+02
      21   1.1313988e+09   1.1352350e+09  7.51e-08  1.19e-09  2.72e+02
      22   1.1329246e+09   1.1344811e+09  1.28e-08  1.94e-10  6.33e+01
      23   1.1332920e+09   1.1343278e+09  8.69e-09  1.25e-10  4.50e+01
      24   1.1336073e+09   1.1341408e+09  5.35e-09  6.50e-11  2.27e+01
      25   1.1337620e+09   1.1340537e+09  4.17e-09  3.58e-11  1.22e+01
      26   1.1338541e+09   1.1340011e+09  4.48e-09  1.84e-11  5.93e+00
      27   1.1338887e+09   1.1339784e+09  4.16e-09  1.19e-11  3.22e+00
      28   1.1339195e+09   1.1339707e+09  5.09e-09  6.04e-12  2.29e+00
      29   1.1339300e+09   1.1339597e+09  4.61e-09  6.73e-08  9.76e-01
      30   1.1339382e+09   1.1339567e+09  5.21e-09  4.18e-08  6.14e-01
      31   1.1339425e+09   1.1339552e+09  5.91e-09  2.83e-08  4.33e-01
      32   1.1339449e+09   1.1339539e+09  1.53e-07  2.07e-08  2.83e-01
      33   1.1339481e+09   1.1339527e+09  8.49e-08  6.31e-08  1.41e-01
      34   1.1339503e+09   1.1339521e+09  1.08e-07  2.22e-08  5.83e-02
      35   1.1339511e+09   1.1339519e+09  1.24e-07  4.49e-08  3.74e-02
      36   1.1339512e+09   1.1339517e+09  4.19e-07  1.16e-07  1.55e-02
      37   1.1339514e+09   1.1339517e+09  2.88e-07  9.72e-08  1.16e-02
     




  • 4.  Re: CPLEX: Barrier freezes during computation

    Posted Mon September 22, 2014 06:10 AM

    It is expected that the CPLEX library contains no debugging information. However, you can still extract valuable information from it. If the process freezes again, do the following after attaching gdb:

    (gdb) info threads

    This shows all threads that are currently active. Threads are numbered 1 through N. For each thread i you can then do

    (gdb) thread i
    (gdb) backtrace

    That gives a stack trace for each thread and tells us which function each thread is currently executing. This may help to pinpoint the issue.
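
The same per-thread backtraces can also be collected non-interactively, which helps when the hang happens in an unattended batch run. gdb's thread apply all bt command runs backtrace in every thread, and -batch makes gdb exit after running the -ex commands. When it exits, gdb detaches from a process it attached to. The hypothetical helper below only builds that command line for a given PID. A watchdog could, for instance, print it to stderr so an operator can run it from a shell. Attaching requires ptrace permission: the same user, and on some distributions a relaxed kernel.yama.ptrace_scope setting or root.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: writes into buf a gdb command line that attaches to
   'pid', prints a backtrace of every thread, and exits (detaching from the
   process). Returns 0 on success, -1 if buf is too small. */
static int gdb_all_threads_cmd(char *buf, size_t len, pid_t pid) {
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len, "gdb -p %ld -batch -ex \"thread apply all bt\"",
                     (long)pid);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

In a running program the PID would come from getpid() (declared in <unistd.h>).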

     




  • 5.  Re: CPLEX: Barrier freezes during computation

    Posted Thu October 09, 2014 07:20 AM

    Originally posted by: p_weiss


    OK, here we go. Anything shown as "..." was removed manually by me.


    (gdb) info threads
    Id   Target Id         Frame
    8    Thread 0x7fec9f1a9700 (LWP 10878) "..." 0x00007feca0f40a12 in _5b993699a1a5c9fd7161ae008bbeb2a5
       () from /.../cplex126/bin/Linux64/libcplex126.so
    7    Thread 0x7fec9d6a6700 (LWP 10879) "..." 0x00007feca0f40a0e in _5b993699a1a5c9fd7161ae008bbeb2a5
       () from /.../cplex126/bin/Linux64/libcplex126.so
    6    Thread 0x7fec9cda5700 (LWP 10880) "..." 0x00007feccbc9b560 in __pthread_mutex_unlock_usercnt ()
       from /lib64/libpthread.so.0
    5    Thread 0x7fec9dfa7700 (LWP 10881) "..." 0x00007feca0f40a21 in _5b993699a1a5c9fd7161ae008bbeb2a5
       () from /.../cplex126/bin/Linux64/libcplex126.so
    4    Thread 0x7fec9e9a8700 (LWP 10882) "..." 0x00007feca0f40bc1 in _5b993699a1a5c9fd7161ae008bbeb2a5
       () from /.../cplex126/bin/Linux64/libcplex126.so
    3    Thread 0x7fec97fff700 (LWP 10883) "..." 0x00007feccbc9e89c in __lll_lock_wait ()
       from /lib64/libpthread.so.0
    2    Thread 0x7fec977fe700 (LWP 10884) "..." 0x00007feccbc9e93a in __lll_unlock_wake ()
       from /lib64/libpthread.so.0
    * 1    Thread 0x7feccc2a8780 (LWP 10906) "..." 0x00007feca0f40a0b in _5b993699a1a5c9fd7161ae008bbeb2a5
       () from /.../cplex126/bin/Linux64/libcplex126.so
    (gdb) thread 1
    [Switching to thread 1 (Thread 0x7feccc2a8780 (LWP 10906))]
    #0  0x00007feca0f40a0b in _5b993699a1a5c9fd7161ae008bbeb2a5 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    (gdb) backtrace
    #0  0x00007feca0f40a0b in _5b993699a1a5c9fd7161ae008bbeb2a5 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #1  0x00007feca0c2b086 in _39dc42df39dac4f0c57c941b5a321630 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #2  0x00007feca0c24e50 in _8ba376e143be5a6b411e79cd744f5082 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #3  0x00007feca0f3d3fd in _8e36aa2cd7af6f0491203cea1a2e1a3f ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #4  0x00007feca0f3ce9b in _c9d0a1b2ca3747fbbf1ac20b6a223ef8 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #5  0x00007feca0eeced5 in _73ac3169e8208994876da49b8a5f79fb ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #6  0x00007feca0ed8c85 in _9e4c2e748463464e6104aa9bb97f91bb ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #7  0x00007feca0ed70c1 in _6b77dbeec5b0abbba32326cf8fbfe4d2 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #8  0x00007feca0f12fcf in _e24c09ecb6c6a7662603d3360f3d29be ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #9  0x00007feca0f0e851 in _7c91636d0f4019694580539dc05d96bd ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #10 0x00007feca0f0df7b in _8627dde9fee56ed64361e3f808ee5674 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #11 0x00007feca0bf8fe6 in _6edd731a909fa1d6c1c46d6625ad8945 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #12 0x00007feca0bf8f7e in _aeec6ab7d0e2b8ce52ceb53b5733bc5b ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #13 0x00007feca0c2451e in _c69cadd8f44da6e42b444e6f0807896a ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #14 0x00007feca0c282e1 in _6874c3b6b6be3f68ecba8390ec5eef57 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #15 0x00007feca0c271c8 in _0ff8510b91f0240cafe918f6ab601dab ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #16 0x00007feca0c2463b in _cac20a5c347b91b75f3929b6cf82c567 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #17 0x00007feca0bf8596 in _3f3480dbba5f22a546d86030315ee732 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #18 0x00007feca0c901eb in _deaafe5d0782fcf82ccd00343a50946d ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #19 0x00007feca0c90242 in CPXlpopt () from /.../cplex126/bin/Linux64/libcplex126.so
    #20 - #44: ...
    (gdb) thread 2
    [Switching to thread 2 (Thread 0x7fec977fe700 (LWP 10884))]
    #0  0x00007feccbc9e93a in __lll_unlock_wake () from /lib64/libpthread.so.0
    (gdb) backtrace
    #0  0x00007feccbc9e93a in __lll_unlock_wake () from /lib64/libpthread.so.0
    #1  0x00007feccbc9b629 in _L_unlock_578 () from /lib64/libpthread.so.0
    #2  0x00007feccbc9b566 in __pthread_mutex_unlock_usercnt () from /lib64/libpthread.so.0
    #3  0x00007feca0f407fe in _5b993699a1a5c9fd7161ae008bbeb2a5 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #4  0x00007feca0c26a81 in _556768ecdc5cc99d00e4926a7d8945b7 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #5  0x00007feccbc980db in start_thread () from /lib64/libpthread.so.0
    #6  0x00007feccaf9e90d in clone () from /lib64/libc.so.6
    (gdb) thread 3
    [Switching to thread 3 (Thread 0x7fec97fff700 (LWP 10883))]
    #0  0x00007feccbc9e89c in __lll_lock_wait () from /lib64/libpthread.so.0
    (gdb) backtrace
    #0  0x00007feccbc9e89c in __lll_lock_wait () from /lib64/libpthread.so.0
    #1  0x00007feccbc9a4d7 in _L_lock_913 () from /lib64/libpthread.so.0
    #2  0x00007feccbc9a300 in pthread_mutex_lock () from /lib64/libpthread.so.0
    #3  0x00007feca0f407e1 in _5b993699a1a5c9fd7161ae008bbeb2a5 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #4  0x00007feca0c26a81 in _556768ecdc5cc99d00e4926a7d8945b7 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #5  0x00007feccbc980db in start_thread () from /lib64/libpthread.so.0
    #6  0x00007feccaf9e90d in clone () from /lib64/libc.so.6
    (gdb) thread 4
    [Switching to thread 4 (Thread 0x7fec9e9a8700 (LWP 10882))]
    #0  0x00007feca0f40bc1 in _5b993699a1a5c9fd7161ae008bbeb2a5 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    (gdb) backtrace
    #0  0x00007feca0f40bc1 in _5b993699a1a5c9fd7161ae008bbeb2a5 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #1  0x00007feca0c26a81 in _556768ecdc5cc99d00e4926a7d8945b7 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #2  0x00007feccbc980db in start_thread () from /lib64/libpthread.so.0
    #3  0x00007feccaf9e90d in clone () from /lib64/libc.so.6
    (gdb) thread 5
    [Switching to thread 5 (Thread 0x7fec9dfa7700 (LWP 10881))]
    #0  0x00007feca0f40a21 in _5b993699a1a5c9fd7161ae008bbeb2a5 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    (gdb) backtrace
    #0  0x00007feca0f40a21 in _5b993699a1a5c9fd7161ae008bbeb2a5 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #1  0x00007feca0c26a81 in _556768ecdc5cc99d00e4926a7d8945b7 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #2  0x00007feccbc980db in start_thread () from /lib64/libpthread.so.0
    #3  0x00007feccaf9e90d in clone () from /lib64/libc.so.6
    (gdb) thread 6
    [Switching to thread 6 (Thread 0x7fec9cda5700 (LWP 10880))]
    #0  0x00007feccbc9b560 in __pthread_mutex_unlock_usercnt () from /lib64/libpthread.so.0
    (gdb) backtrace
    #0  0x00007feccbc9b560 in __pthread_mutex_unlock_usercnt () from /lib64/libpthread.so.0
    #1  0x00007feca0f407fe in _5b993699a1a5c9fd7161ae008bbeb2a5 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #2  0x00007feca0c26a81 in _556768ecdc5cc99d00e4926a7d8945b7 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #3  0x00007feccbc980db in start_thread () from /lib64/libpthread.so.0
    #4  0x00007feccaf9e90d in clone () from /lib64/libc.so.6
    (gdb) thread 7
    [Switching to thread 7 (Thread 0x7fec9d6a6700 (LWP 10879))]
    #0  0x00007feca0f40a0e in _5b993699a1a5c9fd7161ae008bbeb2a5 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    (gdb) backtrace
    #0  0x00007feca0f40a0e in _5b993699a1a5c9fd7161ae008bbeb2a5 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #1  0x00007feca0c26a81 in _556768ecdc5cc99d00e4926a7d8945b7 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #2  0x00007feccbc980db in start_thread () from /lib64/libpthread.so.0
    #3  0x00007feccaf9e90d in clone () from /lib64/libc.so.6
    (gdb) thread 8
    [Switching to thread 8 (Thread 0x7fec9f1a9700 (LWP 10878))]
    #0  0x00007feca0f40a12 in _5b993699a1a5c9fd7161ae008bbeb2a5 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    (gdb) backtrace
    #0  0x00007feca0f40a12 in _5b993699a1a5c9fd7161ae008bbeb2a5 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #1  0x00007feca0c26a81 in _556768ecdc5cc99d00e4926a7d8945b7 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #2  0x00007feccbc980db in start_thread () from /lib64/libpthread.so.0
    #3  0x00007feccaf9e90d in clone () from /lib64/libc.so.6
    (gdb)

    Then I let the process continue for about 10 s to get a snapshot at a different time:

    (gdb) info threads
      Id   Target Id         Frame
      8    Thread 0x7fec9f1a9700 (LWP 10878) "..." 0x00007feccbc9e89c in __lll_lock_wait ()
       from /lib64/libpthread.so.0
      7    Thread 0x7fec9d6a6700 (LWP 10879) "..." 0x00007feccbc9e93a in __lll_unlock_wake ()
       from /lib64/libpthread.so.0
      6    Thread 0x7fec9cda5700 (LWP 10880) "..." 0x00007feccbc9e89c in __lll_lock_wait ()
       from /lib64/libpthread.so.0
      5    Thread 0x7fec9dfa7700 (LWP 10881) "..." 0x00007feccbc9e89c in __lll_lock_wait ()
       from /lib64/libpthread.so.0
      4    Thread 0x7fec9e9a8700 (LWP 10882) "..." 0x00007feccbc9e93a in __lll_unlock_wake ()
       from /lib64/libpthread.so.0
      3    Thread 0x7fec97fff700 (LWP 10883) "..." 0x00007feca0f40a12 in _5b993699a1a5c9fd7161ae008bbeb2a5
        () from /.../cplex126/bin/Linux64/libcplex126.so
      2    Thread 0x7fec977fe700 (LWP 10884) "..." 0x00007feca0f40a12 in _5b993699a1a5c9fd7161ae008bbeb2a5
        () from /.../cplex126/bin/Linux64/libcplex126.so
    * 1    Thread 0x7feccc2a8780 (LWP 10906) "..." 0x00007feccbc9e89c in __lll_lock_wait ()
       from /lib64/libpthread.so.0
    (gdb) backtrace
    #0  0x00007feccbc9e89c in __lll_lock_wait () from /lib64/libpthread.so.0
    #1  0x00007feccbc9a4d7 in _L_lock_913 () from /lib64/libpthread.so.0
    #2  0x00007feccbc9a300 in pthread_mutex_lock () from /lib64/libpthread.so.0
    #3  0x00007feca0f407e1 in _5b993699a1a5c9fd7161ae008bbeb2a5 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #4  0x00007feca0c2b086 in _39dc42df39dac4f0c57c941b5a321630 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #5  0x00007feca0c24e50 in _8ba376e143be5a6b411e79cd744f5082 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #6  0x00007feca0f3d3fd in _8e36aa2cd7af6f0491203cea1a2e1a3f ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #7  0x00007feca0f3ce9b in _c9d0a1b2ca3747fbbf1ac20b6a223ef8 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #8  0x00007feca0eeced5 in _73ac3169e8208994876da49b8a5f79fb ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #9  0x00007feca0ed8c85 in _9e4c2e748463464e6104aa9bb97f91bb ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #10 0x00007feca0ed70c1 in _6b77dbeec5b0abbba32326cf8fbfe4d2 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #11 0x00007feca0f12fcf in _e24c09ecb6c6a7662603d3360f3d29be ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #12 0x00007feca0f0e851 in _7c91636d0f4019694580539dc05d96bd ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #13 0x00007feca0f0df7b in _8627dde9fee56ed64361e3f808ee5674 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #14 0x00007feca0bf8fe6 in _6edd731a909fa1d6c1c46d6625ad8945 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #15 0x00007feca0bf8f7e in _aeec6ab7d0e2b8ce52ceb53b5733bc5b ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #16 0x00007feca0c2451e in _c69cadd8f44da6e42b444e6f0807896a ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #17 0x00007feca0c282e1 in _6874c3b6b6be3f68ecba8390ec5eef57 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #18 0x00007feca0c271c8 in _0ff8510b91f0240cafe918f6ab601dab ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #19 0x00007feca0c2463b in _cac20a5c347b91b75f3929b6cf82c567 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #20 0x00007feca0bf8596 in _3f3480dbba5f22a546d86030315ee732 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #21 0x00007feca0c901eb in _deaafe5d0782fcf82ccd00343a50946d ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #22 0x00007feca0c90242 in CPXlpopt () from /.../cplex126/bin/Linux64/libcplex126.so
    #23 - #47 ...
    (gdb) thread 2
    [Switching to thread 2 (Thread 0x7fec977fe700 (LWP 10884))]
    #0  0x00007feca0f40a12 in _5b993699a1a5c9fd7161ae008bbeb2a5 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    (gdb) backtrace
    #0  0x00007feca0f40a12 in _5b993699a1a5c9fd7161ae008bbeb2a5 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #1  0x00007feca0c26a81 in _556768ecdc5cc99d00e4926a7d8945b7 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #2  0x00007feccbc980db in start_thread () from /lib64/libpthread.so.0
    #3  0x00007feccaf9e90d in clone () from /lib64/libc.so.6
    (gdb) thread 3
    [Switching to thread 3 (Thread 0x7fec97fff700 (LWP 10883))]
    #0  0x00007feca0f40a12 in _5b993699a1a5c9fd7161ae008bbeb2a5 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    (gdb) backtrace
    #0  0x00007feca0f40a12 in _5b993699a1a5c9fd7161ae008bbeb2a5 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #1  0x00007feca0c26a81 in _556768ecdc5cc99d00e4926a7d8945b7 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #2  0x00007feccbc980db in start_thread () from /lib64/libpthread.so.0
    #3  0x00007feccaf9e90d in clone () from /lib64/libc.so.6
    (gdb) thread 4
    [Switching to thread 4 (Thread 0x7fec9e9a8700 (LWP 10882))]
    #0  0x00007feccbc9e93a in __lll_unlock_wake () from /lib64/libpthread.so.0
    (gdb) backtrace
    #0  0x00007feccbc9e93a in __lll_unlock_wake () from /lib64/libpthread.so.0
    #1  0x00007feccbc9b629 in _L_unlock_578 () from /lib64/libpthread.so.0
    #2  0x00007feccbc9b566 in __pthread_mutex_unlock_usercnt () from /lib64/libpthread.so.0
    #3  0x00007feca0f407fe in _5b993699a1a5c9fd7161ae008bbeb2a5 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #4  0x00007feca0c26a81 in _556768ecdc5cc99d00e4926a7d8945b7 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #5  0x00007feccbc980db in start_thread () from /lib64/libpthread.so.0
    #6  0x00007feccaf9e90d in clone () from /lib64/libc.so.6
    (gdb) thread 5
    [Switching to thread 5 (Thread 0x7fec9dfa7700 (LWP 10881))]
    #0  0x00007feccbc9e89c in __lll_lock_wait () from /lib64/libpthread.so.0
    (gdb) backtrace
    #0  0x00007feccbc9e89c in __lll_lock_wait () from /lib64/libpthread.so.0
    #1  0x00007feccbc9a4d7 in _L_lock_913 () from /lib64/libpthread.so.0
    #2  0x00007feccbc9a300 in pthread_mutex_lock () from /lib64/libpthread.so.0
    #3  0x00007feca0f4109d in _5b993699a1a5c9fd7161ae008bbeb2a5 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #4  0x00007feca0c26a81 in _556768ecdc5cc99d00e4926a7d8945b7 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #5  0x00007feccbc980db in start_thread () from /lib64/libpthread.so.0
    #6  0x00007feccaf9e90d in clone () from /lib64/libc.so.6
    (gdb) thread 6
    [Switching to thread 6 (Thread 0x7fec9cda5700 (LWP 10880))]
    #0  0x00007feccbc9e89c in __lll_lock_wait () from /lib64/libpthread.so.0
    (gdb) backtrace
    #0  0x00007feccbc9e89c in __lll_lock_wait () from /lib64/libpthread.so.0
    #1  0x00007feccbc9a4d7 in _L_lock_913 () from /lib64/libpthread.so.0
    #2  0x00007feccbc9a300 in pthread_mutex_lock () from /lib64/libpthread.so.0
    #3  0x00007feca0f407e1 in _5b993699a1a5c9fd7161ae008bbeb2a5 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #4  0x00007feca0c26a81 in _556768ecdc5cc99d00e4926a7d8945b7 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #5  0x00007feccbc980db in start_thread () from /lib64/libpthread.so.0
    #6  0x00007feccaf9e90d in clone () from /lib64/libc.so.6
    (gdb) thread 7
    [Switching to thread 7 (Thread 0x7fec9d6a6700 (LWP 10879))]
    #0  0x00007feccbc9e93a in __lll_unlock_wake () from /lib64/libpthread.so.0
    (gdb) backtrace
    #0  0x00007feccbc9e93a in __lll_unlock_wake () from /lib64/libpthread.so.0
    #1  0x00007feccbc9b629 in _L_unlock_578 () from /lib64/libpthread.so.0
    #2  0x00007feccbc9b566 in __pthread_mutex_unlock_usercnt () from /lib64/libpthread.so.0
    #3  0x00007feca0f407fe in _5b993699a1a5c9fd7161ae008bbeb2a5 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #4  0x00007feca0c26a81 in _556768ecdc5cc99d00e4926a7d8945b7 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #5  0x00007feccbc980db in start_thread () from /lib64/libpthread.so.0
    #6  0x00007feccaf9e90d in clone () from /lib64/libc.so.6
    (gdb) thread 8
    [Switching to thread 8 (Thread 0x7fec9f1a9700 (LWP 10878))]
    #0  0x00007feccbc9e89c in __lll_lock_wait () from /lib64/libpthread.so.0
    (gdb) backtrace
    #0  0x00007feccbc9e89c in __lll_lock_wait () from /lib64/libpthread.so.0
    #1  0x00007feccbc9a4d7 in _L_lock_913 () from /lib64/libpthread.so.0
    #2  0x00007feccbc9a300 in pthread_mutex_lock () from /lib64/libpthread.so.0
    #3  0x00007feca0f407e1 in _5b993699a1a5c9fd7161ae008bbeb2a5 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #4  0x00007feca0c26a81 in _556768ecdc5cc99d00e4926a7d8945b7 ()
       from /.../cplex126/bin/Linux64/libcplex126.so
    #5  0x00007feccbc980db in start_thread () from /lib64/libpthread.so.0
    #6  0x00007feccaf9e90d in clone () from /lib64/libc.so.6
    (gdb)

    I'm keeping the process suspended now, so I don't have to wait for an occurrence again if you need me to do more experiments.

    Thanks for your help,

    Philipp Weiß




  • 6.  Re: CPLEX: Barrier freezes during computation

    Posted Tue October 14, 2014 09:45 AM

    Thank you for your patience with this issue!

    All threads are in the same function. I looked hard at that function but could not spot anything that would result in an infinite loop.

    It looks like I can only get to the bottom of this issue if I manage to reproduce it here. If you still have the process and are willing to share the model with me, you could capture it like this:

    (gdb) thread 1
    (gdb) frame 20
    (gdb) call CPXwriteprob(ENV, LP, "problem.sav", 0)
    (gdb) call CPXwriteparam(ENV, "problem.prm")

    This exports the model and the current parameter settings. If you send the SAV and PRM files to me at daniel(dot)junglas(at)de(dot)ibm(dot)com, I can try to reproduce the issue here. If you cannot share the model, I will continue to look at the code, but it is not clear whether I will find anything.

