Decision Optimization

Decision Optimization

Delivers prescriptive analytics capabilities and decision intelligence to improve decision-making.

 View Only
  • 1.  Control callbacks in multithreaded MIP Optimization

    Posted Fri May 03, 2019 11:34 PM

    Originally posted by: UserCplex


    Hello,

    I have searched for this in the forum but could not find posts that explicitly detail the components required to have multithreaded MIP solutions with user cut callbacks. Hence, this post.

    Before the details, a background of how I develop and solve the MIP in my case.

    My MIP, [MIP] is:

    Min cx

    s.t. Ax = b

    x >= 0, binary.

    I begin with the LP relaxation of the above, called [LP].

    Global variables declared are:

    CPXENVptr env;
    CPXLPptr lp;
    

    Given a current fractional LP solution, I have a function

     

    bool find_violated_cuts(double* x, int wherefrom, void* cbdata){
    
    }
    

     

    I have a global variable bool INTO_CB which is true if MIP optimization has started and is false if I am still working with [LP].

     

    I have twelve different class of inequalities numbered 1 through 12, say.

    Each one of these inequality classes has a separation routine that takes in the current LP solution and attempts to find violations. So, the structure of the (say) first separation routine is thus:

    void sep1(double *x);
    

    So, within find_violated_cuts(), I have:

    sep1(x);
    
    ...
    
    sep12(x);
    

     

    As suggested in this thread: https://www.ibm.com/developerworks/community/forums/html/topic?id=cf534d14-c3f1-47e2-84ed-feb44fb66958

     

    all of my data structures have a first dimension [] pertaining to the thread number within which it is read/written so as to avoid race conditions.

    So, all of my non-cplex user defined data structures accessed in find_violated_cuts() and subsequently in the sepx() functions are thread safe.

    When a violation is found, the appropriate cut is added to [LP] within find_violated_cuts() based on the value of INTO_CB

    if (!INTO_CB)
    
                        status = CPXaddrows(env, lp, 0, 1, ctr, rhs, sense, rmatbeg, rmatind, rmatval, NULL, conname);
    else
                        CPXcutcallbackadd(env, cbdata, wherefrom, ctr, rhs[0], sense[0], rmatind, rmatval, CPX_USECUT_PURGE);
    

     

    And the LP is re-solved.

    Eventually, when no more violations are found and the [LP]'s solution is fractional, I convert the variables to binary using

    int status = CPXchgctype(env, lp, cnt, cons tint * indices, ctype);
    //where ctype = 'B';
    

    Once this is done, I give problem [MIP] to be solved to CPLEX via CPXmipopt. INTO_CB is made true. (Hence, CPXcutcallbackadd() will be run within the above function.)

    My user cut callback function registered with CPLEX is structured thus:

     

    int CPXPUBLIC LP::mycutcallback(CPXCENVptr env, void *cbdata, int wherefrom, void *cbhandle, int *useraction_p){}
    

     

    Within this function, I access the following CPLEX functions at different times/places:

    CUTINFOptr cutinfo = (CUTINFOptr)cbhandle;
    
    int depth;
    
    status = CPXgetcallbacknodeinfo(env, cbdata, wherefrom, 0, CPX_CALLBACK_INFO_NODE_DEPTH, &depth);
    
    CPXgetcallbacknodex(env, cbdata, wherefrom, cutinfo->x, 0, cutinfo->numcols - 1);
    
    cutinfo->find_violated_cuts(cutinfo->x, wherefrom, cbdata);
    

    My questions specifically are:

    Are all of the CPLEX functions that I access in the framework about thread-safe?

    In particular, if two threads are solving the LP subproblem at two different nodes of the BB Tree, given the framework above, there are going to be calls to CPXgetcallbacknodeinfo, CPXgetcallbacknodex, and CPXcutcallbackadd from different BB Tree nodes in different threads. Are these thread-safe calls? Is there need for any explicit lock needed around any of the above functions across threads?

     

    Are there any other guidelines/frameworks to follow to develop multithreaded user cut callbacks/lazy cuts/branch callback, etc.?

    Are there CPLEX provided examples with source codes that explain the questions above?

    Thanks.


    #CPLEXOptimizers
    #DecisionOptimization


  • 2.  Re: Control callbacks in multithreaded MIP Optimization

    Posted Sat May 04, 2019 04:21 PM

    The rule basically is this: Any function that takes as argument the (env, cbdata, wherefrom) triplet is safe to call simultaneously from multiple threads. These functions will only access thread-local internal CPLEX information (each thread has its own node lp). That means the code for your callback function should be thread-safe.

    What looks suspicious is the query for the x vector: Each thread will see the same 'cutinfo' pointer in the callback, thus each thread will use the same 'cutinfo->x' array and that will be a race condition. On the other hand, you stated that you do some "magic" with that x vector so that each thread has its own x vector. If that is indeed the case (and you just left out those gory details for the sake of clarity) then you are fine.

    The only thing that looks suspicious is this:

    if (!INTO_CB)

        status = CPXaddrows(env, lp, 0, 1, ctr, rhs, sense, rmatbeg, rmatind, rmatval, NULL, conname);

    This is definitely not thread-safe. I am not sure whether this may or may not be invoked concurrently. Is INTO_CB ever false if callbacks are on? If so then you have to lock around this call. But note that even if you do this then the results may still become non-deterministic: the order in which threads acquire the lock is non-determinstic. Thus the order of rows added will be non-deterministic.

    As a general rule of thumb in multi-threaded programming, it is a good idea to have a thread access only thread-local data or read-only global data.

    My suggestion for your code would be to get rid of the global env and lp variables. Instead pass env, lp, cbdata, wherefrom into the separation functions. If you call the separation from a callback then pass lp=NULL, if you call it from outside a callback pass cbdata=NULL, wherefrom=0. Then, in the separation functions, call CPXcutcallbackadd() or CPXaddrows() depending on whether lp is NULL or not.


    #CPLEXOptimizers
    #DecisionOptimization


  • 3.  Re: Control callbacks in multithreaded MIP Optimization

    Posted Sun May 05, 2019 12:29 AM

    Originally posted by: UserCplex


    Thanks Daniel,

     

    The 'magic' is nothing but brute force replication of the data structures for each thread. I use OpenMP. So, I query omp_get_max_threads() at start. x is actually double **x and based on the maximum number of threads, x is dynamically allocated space. x[t][varindex] stores the value of the varindex'ed variable in the t'th thread. My space requirements get multiplied omp_get_max_threads() times.

    I have a class SEP{}; which thus far in single threaded applications had a data structure, say, std::vector<int> sort_index; This class has methods and data structures that are needed for separation routines, sepx();

    Now, in multithreaded application, this class needs to have omp_get_max_threads() times all the data structures. So, the sort_index (say) would be std::vector<std::vector<int>> sort_index;

    I cannot see anyway in which this can be avoided unless I employ locks usage of which I would like to minimize.

    CPXaddrows () does not get called within callback. At the root node, given a fractional solution, the only parallelization of the code is running the twelve separation routines across threads. These (in a thread safe fasion, employing explicit locks) update a shared data structure storing the violated inequalities. Then, sequentially (since an OpenMP parallel section construct has an implicit barrier at its end after which the processing occurs sequentially along the master thread), the most violated of the identified cuts are added via CPXaddrows(). Before starting CPXMIPopt, INTO_CB is always false. When CPXMIPopt is called, INTO_CB is made true. That, per your message, is fine and threadsafe even if parallelized since (because INTO_CB is true) 

    CPXcutcallbackadd() gets called that uses the triplets (env, cbdata, wherefrom)

     


    #CPLEXOptimizers
    #DecisionOptimization


  • 4.  Re: Control callbacks in multithreaded MIP Optimization

    Posted Tue May 07, 2019 08:42 AM

    Sorry, I did not see your previous reply that explains the "magic". What you describe there all looks pretty robust.

    There is one thing though: if you separate in parallel and put all the cuts you find on a global queue, then the order of cuts in that queue will be random (if two threads compete for a lock then it is random which one is granted the lock). So in order to get repeatable behavior, I suggest to add some deterministic sorting. As far as I understand, you use only the N best cuts. If all of the cuts have a different score then there is no problem. But if two cuts have the same score then you should break the tie using a deterministic criterion. For example, always pick the cut that comes from the thread with the smaller id.


    #CPLEXOptimizers
    #DecisionOptimization


  • 5.  Re: Control callbacks in multithreaded MIP Optimization

    Posted Mon May 06, 2019 01:53 AM

    Originally posted by: UserCplex


    I have a followup question:

    https://docs.oracle.com/cd/E19205-01/819-5270/aewbc/index.html

    The above (towards the end of the document, though not cplex related) provides some general OpenMP guidelines on nested parallelism.

    The guideline there seems to be that it may not be a good idea to nest parallelism to a high degree. 

    In my case, where I am solving [MIP], solving the LP relaxation at each node of the BB Tree requires running twelve independent separation routines, which can be parallelized.

    Once inside the BB Tree, is parallelizing the separation routines for each node even possible/recommended?

    Suppose there are 16 threads, does CPLEX break down the MIP tree into 16 mutually exclusive and collectively exhaustive nodes of the tree? What happens if one of the threads becomes idle since that particular subtree gets fully fathomed? If I parallelize my separation routines, would this free 16th thread be utilized to solve the separation routines (sep7 through sep12) for the current node being explored in any of the other 15 threads?

    Does the user have control over how the threads are going to be utilized (solving parallel separation routines without partitioning the tree, vs. splitting the BB Tree into mutually exclusive and collective exhaustive partitions)?


    #CPLEXOptimizers
    #DecisionOptimization


  • 6.  Re: Control callbacks in multithreaded MIP Optimization

    Posted Tue May 07, 2019 08:36 AM

    Basically CPLEX works like this: there is a global queue of open nodes. Whenever a thread becomes idle it pulls some nodes from that queue and processes them. At certain intervalls a thread also pushes back nodes to the global queue so that other threads do not become idle. The implementation is in fact a lot more complex and sophisticated but that is the basic scheme.

    If you give 16 threads to CPLEX then CPLEX will use all of them to process nodes. Consequently, each node will be solved with a single thread.

    If you spawn new threads in a callback then you may end up oversubscribing your machine, i.e., you may use more threads than you have cores, which usually is not a good idea with CPLEX (it is a good idea in other applications).

    You cannot control how many threads do separation, branch-and-cut etc.

    But you can control how many threads CPLEX uses by setting the thread parameter. Assuming you have 16 cores on your machine, then you could ask CPLEX to only use 8 threads. With that setup you could do parallel separation in callbacks with up to two threads without oversubscribing the machine.

    It is not clear a priori whether this will speed up the solve or not. Maybe it is better to first get a handle on how much time is spent in separation and how many cuts you usually get from the separation routine. Maybe not every separation routine has to be invoked at every node. Maybe there are some properties you can check that would allow you to skip certain separators. There are a lot of things to consider ...


    #CPLEXOptimizers
    #DecisionOptimization