Hi Olivier,
After further investigation, I found that docplex - the Python library - is the one killing the cpoptimizer process.
The first clue was the Python documentation for Popen.returncode, which states:
"A negative value -N indicates that the child was terminated by signal N (POSIX only)."
So a return code of -9 means that something is sending a SIGKILL to the process.
After some looking around I found this logic in the constructor in solver_local.py:
# Read initial version info from process
self.version_info = None
timer = threading.Timer(1, lambda: self.process.kill() if self.version_info is None else None)
timer.start()
evt, data = self._read_message()
timer.cancel()
The 1-second timer is arbitrary, and it appears to be too short in my case.
I run the tests in a corporate cloud environment where not all disks are always mounted, so the first access can be slow.
Specifically, launching the cpoptimizer process sometimes takes more than 1 second, so the timer fires and kills the process before the first message has been read.
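To illustrate the race, here is a minimal standalone sketch. It is not docplex code: the child is a stand-in Python process that waits before writing a fake "banner", and the function name and parameters are my own. When the child takes longer to start than the timer allows, the first read returns nothing and the return code is negative (POSIX only).

```python
import subprocess
import sys
import threading


def first_read_with_kill_timer(startup_delay, timer_seconds):
    """Start a slow child and guard its first read with a kill timer.

    Returns (bytes read, child return code). Hypothetical helper that only
    mimics the pattern quoted above; the child is not cpoptimizer.
    """
    child_code = (
        "import sys, time; time.sleep(%r); "
        "sys.stdout.write('banner'); sys.stdout.flush()" % startup_delay
    )
    proc = subprocess.Popen([sys.executable, "-c", child_code],
                            stdout=subprocess.PIPE)
    received = None
    # Same idea as the docplex snippet: kill the child if nothing has arrived yet.
    timer = threading.Timer(
        timer_seconds, lambda: proc.kill() if received is None else None)
    timer.start()
    received = proc.stdout.read(6)  # blocks until 6 bytes arrive or the pipe closes
    timer.cancel()
    proc.wait()
    proc.stdout.close()
    return received, proc.returncode


# Child needs 3 s to "start", but the timer only allows 0.5 s.
print(first_read_with_kill_timer(startup_delay=3, timer_seconds=0.5))
```

With a generous timer and a fast child, the same function returns the banner and a return code of 0.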
I think there are 2 issues here:
1. Arbitrary time limit of 1 second for launching the process.
2. Using SIGKILL causes a very confusing error message.
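One possible direction for both points, as a rough sketch only: make the startup limit a parameter and raise an error that says what actually happened. Everything here (SolverStartupTimeout, read_first_message, the timeout parameter) is hypothetical and not part of docplex; the child is again a stand-in Python process.

```python
import queue
import subprocess
import sys
import threading


class SolverStartupTimeout(Exception):
    """Hypothetical exception type, not part of docplex."""


def read_first_message(proc, nbytes, timeout):
    """Read the first nbytes from proc.stdout, waiting at most timeout seconds."""
    result = queue.Queue()
    # Read in a helper thread so the caller can give up after the timeout.
    reader = threading.Thread(
        target=lambda: result.put(proc.stdout.read(nbytes)), daemon=True)
    reader.start()
    try:
        data = result.get(timeout=timeout)
    except queue.Empty:
        proc.kill()
        proc.wait()
        raise SolverStartupTimeout(
            "Solver process sent nothing within %.1f s of startup and was "
            "killed. Consider a longer startup timeout." % timeout) from None
    if not data:
        raise SolverStartupTimeout(
            "Solver process exited before sending anything "
            "(return code %s)." % proc.wait())
    return data


# Stand-in child that takes 3 s before writing anything.
slow = subprocess.Popen(
    [sys.executable, "-c",
     "import sys, time; time.sleep(3); sys.stdout.write('banner')"],
    stdout=subprocess.PIPE)
try:
    read_first_message(slow, 6, timeout=0.5)
except SolverStartupTimeout as err:
    print(err)
```

The point is only that the caller sees a message about the startup timeout instead of a generic "Nothing to read" error.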
Right now my solution is to catch the LocalSolverException error and try again, since only the first access is too slow. But that's an ugly workaround...
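For what it's worth, the workaround can be wrapped in a small generic helper. This is only a sketch: call_with_retry is my own name, and the exception type to retry on is passed in by the caller, so nothing here depends on docplex.

```python
import time


def call_with_retry(func, retry_on, attempts=2, delay=0.0):
    """Call func(); if it raises retry_on, retry up to attempts times in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except retry_on as err:
            last_error = err
            if attempt < attempts - 1:
                time.sleep(delay)  # optional pause before the next try
    raise last_error


# Intended use with the names from this thread (not executed here):
#   solution = call_with_retry(lambda: solve(constraints), LocalSolverException)
```

Any other exception type propagates immediately, so real modeling errors are not hidden by the retry.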
------------------------------
Tomer Vromen
------------------------------
Original Message:
Sent: Thu October 22, 2020 10:11 AM
From: Olivier Oudot
Subject: Getting random "Nothing to read from local solver process" from docplex.cp
Hi Tomer,
Your problem is very strange and unusual. Could you please check the following.
1) Check that there is no external cause that may explain why the process is being killed externally (lack of memory, anti-virus, etc).
2) Check the version of your solver.
To do this, start the cpoptimizer process in standalone mode and copy-paste the complete banner that is displayed into this topic.
If your version is an academic version, large problems cannot be solved and are rejected.
3) Verify whether your problem occurs with every model you want to solve, or only with a single one.
4) If so, it could be a bug in the solver that occurs randomly.
Export your model in CPO format using method mdl.export_model(out=<filename>).
Try to solve this model directly with cpoptimizer process in standalone. Commands are "read <filename>" and then "optimize".
If possible, post your model in this topic.
I hope we will find your problem. Sorry for the inconvenience.
------------------------------
Olivier Oudot
Original Message:
Sent: Wed October 21, 2020 04:02 AM
From: Tomer Vromen
Subject: Getting random "Nothing to read from local solver process" from docplex.cp
Hi all,
I'm occasionally getting this error when using docplex.cp:
LocalSolverException: Nothing to read from local solver process. Check its availability.
The weird thing is that this appears randomly, and when trying again in the same Python script, the second time is successful. I.e., this works:
try:
    solution = solve(constraints)
except LocalSolverException as e:
    # try again
    solution = solve(constraints)
Where solve() is my function that creates a model using CpoModel, adds constraints, and calls CpoSolver on it.
I did some manual modifications to the docplex library and added debug prints.
The error is coming from here:
def _read_frame(self, nbb):
    """ Read a byte frame from input stream
    Args:
        nbb: Number of bytes to read
    Returns:
        Byte array
    """
    # Read data
    data = self.pin.read(nbb)
    if len(data) != nbb:
        if len(data) == 0:
            # Check if first read of data
            if self.process_infos.get(CpoProcessInfos.TOTAL_DATA_RECEIVE_SIZE, 0) == 0:
                if IS_WINDOWS:
                    raise LocalSolverException("Nothing to read from local solver process. Possibly not started because cplex dll is not accessible.")
                else:
                    raise LocalSolverException("Nothing to read from local solver process. Check its availability.")
This code reads data from STDOUT of the subprocess, which is running "cpoptimizer -angel".
I can verify that indeed this is the first byte being read from the process, and that the process is not returning anything.
If I change some more things, I can see that the process is not alive, and the returncode from it is -9.
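A minimal standalone sketch of that kind of check (not the docplex code; the child is a stand-in Python process that I kill myself, and describe_returncode is just an illustrative helper). poll() tells whether the process is still alive, and on POSIX a negative returncode -N means the child was terminated by signal N.

```python
import signal
import subprocess
import sys


def describe_returncode(returncode):
    """Turn a Popen.returncode into a readable description (POSIX semantics)."""
    if returncode is None:
        return "still running"
    if returncode < 0:
        # Negative value -N: the child was terminated by signal N.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


# Stand-in child process: a Python interpreter that just sleeps.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
print(describe_returncode(child.poll()))  # poll() returns None while the child is alive

child.kill()  # sends SIGKILL on POSIX
child.wait()
print(child.returncode, describe_returncode(child.returncode))
```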
So I guess my question is: why is "cpoptimizer -angel" exiting with return code -9, and why is this happening randomly?
Can I somehow enable more debug prints to gather more info?
Thanks.
------------------------------
Tomer Vromen
------------------------------