Decision Optimization

Delivers prescriptive analytics capabilities and decision intelligence to improve decision-making.

  • 1.  CPLEX libraries optimized for AIX / P8 Processor

    Posted Mon October 09, 2017 02:51 AM

    Originally posted by: M6BN_Stefano_Gliozzi


    Hello, 

    I have an application that solves several thousand LP/MIP models every night. Our SLAs require the whole batch to run in a few hours, so I'm constantly looking for performance improvements to meet them. 

    The models are generated in C++ via Concert, and I have the impression that libraries compiled with optimizations specific to AIX on the P8 processor could be of some benefit (at least, other pieces of the code before and after the Concert/CPLEX calls benefited hugely from compiler optimization for P8). 

    Is there any Concert/CPLEX library available specifically for AIX/P8? 


    #CPLEXOptimizers
    #DecisionOptimization


  • 2.  Re: CPLEX libraries optimized for AIX / P8 Processor

    Posted Mon October 09, 2017 07:58 AM

    There are no CPLEX libraries specific to Power 8 under AIX.  The closest would be the Power Linux Little Endian port, which is built on and for Power 8.




  • 3.  Re: CPLEX libraries optimized for AIX / P8 Processor

    Posted Mon October 09, 2017 09:43 AM

    Originally posted by: M6BN_Stefano_Gliozzi


    Xavier, Thank you for the quick response. 

    S.




  • 4.  Re: CPLEX libraries optimized for AIX / P8 Processor

    Posted Tue October 10, 2017 11:19 AM

    Originally posted by: Laci Ladanyi


    A little bit more information: 

    The reason there is no P8-specific port is that enabling P8-specific optimizations resulted in only a minimal improvement in performance. To put it simply: while the P8 CPU is a vast improvement over the P7, the memory access patterns and, in general, the algorithms in CPLEX do not benefit much from those CPU improvements.

     

    But you can improve CPLEX's performance on AIX. The kernel behavior on AIX is very configurable, and the default settings are good for many applications, but not the best for CPLEX. You can change these settings by simply setting a few environment variables. I recommend that you take a look at https://www.ibm.com/partnerworld/wps/servlet/RedirectServlet?contentId=PlyVq1TRwHNiPCA$cnt&attachmentName=Best_practices_of_IBM_ILOG_CPLEX_Optimizer_AIX7.1_2final5_17_13.pdf

    Changing the SMT setting, large page allocation, and malloc behavior should help you a lot.

     

    Note that even though we *know* the kernel settings can be improved, we cannot change them ourselves. After all, CPLEX is a library: we don't know what CPLEX gets embedded into, and if we changed any kernel parameters, that might have a detrimental effect on the overall performance of the user's application.

     

    I hope this info helps...

    --Laci




  • 5.  Re: CPLEX libraries optimized for AIX / P8 Processor

    Posted Tue October 10, 2017 11:42 AM

    Originally posted by: M6BN_Stefano_Gliozzi


    Laszlo, 

     

    Many, many thanks. It is very important for us to know that CPLEX (I assume Concert is considered part of it) does not benefit much from P8-specific compiler optimizations. 

    And it is of paramount importance for me and my fellow architect to look into the system parameters and study the paper you pointed out. We will study it. 

    May I draw a bit more on your expertise? Let me explain what happens in the application (right, we will have to manage coexistence with other components). 

    Every night we need to solve a number of mutually independent stochastic MILPs which, in our formulation, have roughly 2 to 10 million variables and 1 to 5 million constraints (the number of general integer variables ranges from about 300 to about 50'000). 

    We do this in 12 fully parallel processes, each of which in turn gets a payload consisting of a set of models that share part of the input (this takes advantage of some economies in the DB server queries, which alone account for roughly 40% of the overall elapsed time). 

    I know that these models are really easy to solve, generally speaking. They usually solve to integer optimality in 0 or fewer than 10 nodes, taking a few seconds for the presolve and LP phase.

    I also know that they usually reduce to fewer than 100'000 columns and rows after presolve. This is due to our (lazy? maintainable?) strategy for building the model: we generate many constraints and variables that certainly, or after a first analysis, will be 0 (the variables) or redundant (the constraints). 

    We adopted this development scheme because it was much easier to build (and even more so to test) this kind of model in Concert than to generate it in C++ with direct CPLEX calls while doing a first level of presolving ourselves. 

    I suspect that now, on top of the presolve overhead that we can measure, we probably have too much overhead in building the Concert model. Do you have any tips specific to the Concert part? Should we plan to redo it in plain C++ / CPLEX?

     

     




  • 6.  Re: CPLEX libraries optimized for AIX / P8 Processor

    Posted Wed October 11, 2017 12:09 AM

    Originally posted by: EdKlotz


    First of all, do you have some concrete evidence that Concert is consuming a large part of the model generation time here, or is it just a suspicion?   Do you have some examples where the optimization takes only a few seconds, but the model generation takes longer than that?

     

    For starters, assuming you confirm that your suspicion is indeed correct, before rewriting your code and implementing your own basic presolve, I would recommend reviewing the Concert code to see if you can speed it up.   That should take much less time than rewriting.    While Concert offers some powerful modeling constructs in a generic programming language, we have seen use cases where two seemingly equivalent pieces of Concert code that generate the same model have dramatically different run time or memory footprints.   Perhaps that is happening here.    Here's a useful technote that might help in this regard:

    http://www-01.ibm.com/support/docview.wss?uid=swg21400056
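
    One way to gather that kind of evidence is to time model generation and optimization separately for each instance. The following is only a minimal sketch, assuming the two phases can be wrapped in callables; `PhaseTimes`, `time_phases`, `build_model` and `solve_model` are hypothetical names and not part of Concert or CPLEX.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Wall-clock seconds spent in each phase of one model run.
struct PhaseTimes {
    double build_seconds;
    double solve_seconds;
};

// Runs the two phases in order and times each with a monotonic clock.
// build_model and solve_model are placeholders (hypothetical names) for the
// application's own model-generation code and its call to the solver.
inline PhaseTimes time_phases(const std::function<void()>& build_model,
                              const std::function<void()>& solve_model) {
    assert(build_model && solve_model);  // both callables must be set
    using clock = std::chrono::steady_clock;
    const auto t0 = clock::now();
    build_model();
    const auto t1 = clock::now();
    solve_model();
    const auto t2 = clock::now();
    return PhaseTimes{
        std::chrono::duration<double>(t1 - t0).count(),
        std::chrono::duration<double>(t2 - t1).count()};
}
```

    If the IloCplex object is created only after the model has been built, the extraction done at that point arguably belongs to the build phase rather than the solve phase, so it may be worth placing it inside the first callable.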

     




  • 7.  Re: CPLEX libraries optimized for AIX / P8 Processor

    Posted Wed October 11, 2017 02:00 AM

    Originally posted by: M6BN_Stefano_Gliozzi


    Ed, thank you for your hints. Yes, we've measured the modeling time and the solution time separately. The modeling part is not pathological, given the model size, but we are trying to save every possible tenth of a second (times several thousand runs every evening!). 

    We will review the code following the guidelines you linked, and, yes, it will take much less time than re-writing the "preprocessed" model without Concert.




  • 8.  Re: CPLEX libraries optimized for AIX / P8 Processor

    Posted Wed October 11, 2017 02:41 AM

    If you are trying to save time at the milli- or micro-second level then two other things come to mind that you could try (each independent of the other):

    1. If you know that some variables will be 0 and you also know which variables these are, then explicitly set their bounds to 0. CPLEX will eventually figure that out itself in presolve, but the sooner it knows about this fact the better. Note that calling setBounds() is not free, so there may be a trade-off.
    2. If you are willing to rewrite, you could implement your current model (not the presolved one!) using the callable library (still from C++). That will eliminate all the overhead you get from Concert. You can still leave all the presolve etc. to CPLEX. Using the callable library from C++ is what I do if I want an application that is as fast as possible but still robust.

    Also, in general there are two patterns of creating and solving a model in Concert:

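    IloModel model(env);
    IloCplex cplex(model);   // the (still empty) model is extracted here
    // build your model here; each change is passed on to the cplex object
    ...
    cplex.solve();

    and

    IloModel model(env);
    // build your model here
    ...
    IloCplex cplex(model);   // the complete model is extracted once, here
    cplex.solve();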

    I would expect the second pattern (extract only right before the solve) to be slightly faster than the first. But I am not sure the difference is actually measurable, so it may not be worth changing code from the first pattern to the second.




  • 9.  Re: CPLEX libraries optimized for AIX / P8 Processor

    Posted Wed October 11, 2017 12:39 PM

    Originally posted by: M6BN_Stefano_Gliozzi


    Daniel, 

    Thank you for your help. We know of a lot of variables that will be 0, but I'm a bit wary of fixing bounds, since I want to get true shadow prices at the very end of the LP relaxation.

    Yes, we optimize twice: once as a continuous relaxation, to get the shadow prices, and then as an integer problem, to get a workable solution (not "feasible" in the strict sense: any integer value is feasible in our model, so a solution may be strange, but it is surely feasible).

    Usually the MIP optimum is very, very close to the continuous one (since feasibility is not an issue here, I think I could even round the solution without solving the MIP, but this is in the plans for my next life :-) ).

    I know that this is not formally correct, but it is fairly reasonable in our application, and the shadow prices are perhaps more relevant to the application than the actual integer solution. 

    So I'm reluctant to set to 0 variables that do have a value in the objective function. I would rather do my own presolving and not generate the column at all. 

    On the other hand, it will probably be easy to try your last piece of advice. 

     

    To all the contributors: we have a lot to study and work out, in a live environment (the application is already operational). I will share all of your suggestions with my team and work through them, and rest assured I'll report back, for the benefit of the forum, as soon as we get solid results. 

     




  • 10.  Re: CPLEX libraries optimized for AIX / P8 Processor

    Posted Thu October 12, 2017 04:27 PM

    Originally posted by: Laci Ladanyi


    Just to add one more piece :-). Given that your solves are already fast, I'd try the following (these are the usual things to try when you try to shave a fraction of a second off of an already fast solve):

    First, solve a number of your instances with the MIP display parameter set to 4. That will display the root LP solution process, and you will know whether barrier or simplex solves your root LP faster. My guess is that it's going to be dual simplex. If that is the case then, given that your instances solve in a few nodes, you may be better off setting the number of threads to 1. That gets rid of lots of setup code and synchronization, and you may not be able to use more than 1 thread effectively anyway. BTW, I'd try this even if the root LP is solved by barrier.

    Next, even though presolve shrinks your problem size significantly, I would try to disable it! The reason is that shrinking the problem may not matter if the variables that are deleted always stay 0 during the whole solution process, or if lots of aggregation happens and, even though the dimension of the matrix shrinks, the number of non-zeros does not shrink by much...
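
    The aggregation point can be illustrated with a toy example: substituting a variable out through an equality row removes one row and one column, yet can add non-zeros to every row that used the variable. This is only a sketch of the idea, not CPLEX's actual aggregator; `Row`, `count_nonzeros` and `aggregate` are hypothetical names, and right-hand sides are ignored because only the non-zero count matters here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// One constraint row: column index -> coefficient (right-hand side omitted).
using Row = std::map<int, double>;

// Total number of stored non-zero coefficients.
inline std::size_t count_nonzeros(const std::vector<Row>& rows) {
    std::size_t n = 0;
    for (const Row& r : rows) n += r.size();
    return n;
}

// Uses equality row `eq` to express column `col` in terms of the other
// columns of that row, substitutes it into every other row, and drops both
// the equality row and the column. Returns the reduced set of rows.
inline std::vector<Row> aggregate(const std::vector<Row>& rows,
                                  std::size_t eq, int col) {
    assert(eq < rows.size());
    const Row& e = rows[eq];
    const Row::const_iterator pivot = e.find(col);
    assert(pivot != e.end() && pivot->second != 0.0);

    std::vector<Row> reduced;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (i == eq) continue;                 // the equality row disappears
        Row r = rows[i];
        const Row::iterator it = r.find(col);
        if (it != r.end()) {
            const double factor = it->second / pivot->second;
            r.erase(it);                       // the column disappears
            for (const auto& entry : e) {
                if (entry.first == col) continue;
                r[entry.first] -= factor * entry.second;  // possible fill-in
                if (r[entry.first] == 0.0) r.erase(entry.first);
            }
        }
        reduced.push_back(std::move(r));
    }
    return reduced;
}
```

    For instance, eliminating x0 through the row x0 - x1 - x2 - x3 - x4 = 0 from four other rows of the form x0 + xk (k = 5..8) turns 5 rows with 13 non-zeros into 4 rows with 20 non-zeros: the matrix is smaller but denser.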

    Finally, I would try to disable cut generation, too (that is, use plain branch and bound). You say that the continuous objective is very close to the integer objective, so cuts may not play a significant role.

     

    Good luck playing with the parameters!

    --Laci




  • 11.  Re: CPLEX libraries optimized for AIX / P8 Processor

    Posted Fri October 13, 2017 03:46 AM

    Originally posted by: M6BN_Stefano_Gliozzi


    Laszlo, 

    Thanks again for your time. We had already done the solver tuning; without it the times were not that fast.

    Yes, we set threads to one and the root optimization algorithm to primal (dual had a couple of instances where it tended to get stuck and have precision problems); in any case, perturbing it ensured faster performance.

    I kept presolve (just one pass), since when it was omitted the root solution time, due to the model dimension, was way too high. Presolve also removed a lot of non-zero coefficients, and the reduction in terms of variables/constraints was more than 95%. 

    I also limited it to only 1 aggregation pass, though I didn't play with the AggFill parameter, since I didn't see any density issue (I might nevertheless give it a try).

    And, yes, no cuts for this model; most of the time the integer part is solved at the root node by some heuristic.

    This is not surprising given that we have a relatively loose gap threshold: from a business point of view, a relative gap of less than 1e-4 would mean a precision far higher than what the forecast precision would reasonably allow for.   
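
    For reference, the relative gap mentioned here is, as far as the CPLEX documentation defines it, |best bound - best integer| / (1e-10 + |best integer|). A minimal sketch of that formula follows; `relative_gap` and `within_gap` are hypothetical helper names, not CPLEX API calls.

```cpp
#include <cassert>
#include <cmath>

// Relative MIP gap following the formula given in the CPLEX documentation:
//   |best_bound - best_integer| / (1e-10 + |best_integer|)
inline double relative_gap(double best_integer, double best_bound) {
    return std::fabs(best_bound - best_integer) /
           (1e-10 + std::fabs(best_integer));
}

// True when the incumbent is within the given relative tolerance of the bound.
inline bool within_gap(double best_integer, double best_bound,
                       double tolerance) {
    assert(tolerance >= 0.0);
    return relative_gap(best_integer, best_bound) <= tolerance;
}
```

    With an incumbent of 1000 and a best bound of 999.95 the gap is about 5e-5, so a 1e-4 threshold would already accept it; a looser threshold accepts solutions found even earlier.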

