Originally posted by: SystemAdmin
We have a job that runs and fails intermittently (about 1 in 200 tries).
The vendor supplied this information ...
The agent uses the execv() system call to run a command like:
/bin/su - cdbatch -c "/bin/sh -x /home/cdbatch//test1.ksh > /bmc/ctmagent/ctm/sysout/CDOT_TEST1.LOG_00u8kb_01242 2>&1"
This means that the job process performs the following actions:
1. Redirects the output of the process to two temporary files, one for STDOUT and one for STDERR.
2. Creates a new shell process running /bin/sh that still runs as root.
3. This process runs su, which performs the login procedure for the job owner's account.
4. It then runs the job script as the job owner, with the output redirected to the sysout file, which overrides the previous redirection.
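To make the two layers of redirection concrete, here is a minimal sketch that mimics them without su (so it can be run as an ordinary user). The temporary file names and the echoed messages are made up for illustration; they are not the agent's real files or output.

```shell
#!/bin/sh
# Sketch only: reproduces the outer and inner redirection without su.
# All paths below are hypothetical temp files, not the agent's real ones.
tmpdir=$(mktemp -d)
outer_out="$tmpdir/stdout.tmp"   # stands in for the temporary STDOUT file
outer_err="$tmpdir/stderr.tmp"   # stands in for the temporary STDERR file
sysout="$tmpdir/sysout.log"      # stands in for the sysout file
export sysout

# Outer layer: everything the wrapper shell prints goes to the temp files.
# Inner layer: the "job" has its own "> sysout 2>&1", which takes over
# for the job's output only.
/bin/sh -c '
    echo "login noise on stdout"
    echo "login noise on stderr" >&2
    /bin/sh -c "echo job stdout; echo job stderr >&2" > "$sysout" 2>&1
' > "$outer_out" 2> "$outer_err"

sysout_content=$(cat "$sysout")
outer_out_content=$(cat "$outer_out")
outer_err_content=$(cat "$outer_err")
```

Anything printed before the job script starts (for example by the login step) lands in the temporary files, while the job's own output lands only in the sysout file. If the inner command never starts, the sysout file stays empty.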
The agent waits for the job process to terminate.
When it does, the agent finds the sysout file empty because the job never ran, so it puts the message the customer saw in the sysout file, dumps the content of the temporary files, and the job ends NOTOK.
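The post-run check described here could look roughly like the sketch below. This is only my reading of the vendor's description: the function name finish_job, the file names, and the message text are all hypothetical, and the real agent's logic is not shown anywhere in the information we were given.

```shell
#!/bin/sh
# Sketch of the post-run check as described by the vendor.
# finish_job and every file name here are hypothetical.
finish_job() {
    job_sysout=$1
    tmp_out=$2
    tmp_err=$3
    if [ -s "$job_sysout" ]; then
        # Sysout has content, so the job script at least started.
        echo OK
        return 0
    fi
    # Sysout is empty: write a message, then dump the temporary files.
    {
        echo "job did not run (placeholder for the agent's message)"
        cat "$tmp_out" "$tmp_err"
    } >> "$job_sysout"
    echo NOTOK
    return 1
}

# Demo with an empty sysout and some made-up text in the stderr temp file.
demo_dir=$(mktemp -d)
: > "$demo_dir/sysout"
: > "$demo_dir/out"
echo "placeholder stderr text" > "$demo_dir/err"
status=$(finish_job "$demo_dir/sysout" "$demo_dir/out" "$demo_dir/err") || true
```

In this sketch the only evidence of what went wrong before the job script started is whatever ended up in the temporary files, which is why their dumped content is worth reading closely on a failed run.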
If this were happening to all the jobs, it could be a problem in the agent, but an intermittent failure points to a problem in su, probably caused by a temporary problem with OS resources.
Any workarounds / solutions would be greatly appreciated -- thanks!
Message was edited by: pacet