Hi David, thanks for your response. I had to wait for the issue to occur again.
OS: RHEL 7
This time I got output from the strace command and from the java and ontape processes, and found some very interesting things:
informix 18528 18511 0 Feb18 ? 00:00:56 java -jar /informix/informix/bin/ifxbkpcloud.jar BACKUP_FILE amazon https://bucket1.s3.amazonaws.com /logs/server_10_Log0000014249
ids>strace -p 18528
Process 18528 attached
futex(0x7ff0f60989d0, FUTEX_WAIT, 18529, NULL
If I run ps -efL|grep /informix/informix/bin/ifxbkpcloud.jar
I find several threads for the java process.
informix 18528 18511 18528 0 57 Feb18 ? 00:00:00 java -jar /informix/informix/bin/ifxbkpcloud.jar BACKUP_FILE amazon https://bucket1.s3.amazonaws.com /LOGS/server_10_Log0000014249
….
To
informix 18528 18511 18585 0 57 Feb18 ? 00:00:00 java -jar /informix/informix/bin/ifxbkpcloud.jar BACKUP_FILE amazon https://bucket1.s3.amazonaws.com /LOGS/server_10_Log0000014249
Thread IDs (LWP) 18530 to 18585 are in state FUTEX_WAIT_PRIVATE:
ids>strace -p 18585
Process 18585 attached
futex(0x7fefcc017674, FUTEX_WAIT_PRIVATE, 1, NULL^CProcess 18585 detached
<detached ...>
But for thread 18529:
ids>strace -p 18529
Process 18529 attached
restart_syscall(<... resuming interrupted call ...>) = -1 ETIMEDOUT (Connection timed out)
futex(0x7ff0ec009a28, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7ff0ec009a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {63354707, 978830382}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7ff0ec009a28, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7ff0ec009a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {63354709, 979094256}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7ff0ec009a28, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7ff0ec009a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {63354711, 979290151}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7ff0ec009a28, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7ff0ec009a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {63354713, 979555340}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7ff0ec009a28, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7ff0ec009a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {63354715, 979849831}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7ff0ec009a28, FUTEX_WAKE_PRIVATE, 1) = 0
So now I am stuck at the ETIMEDOUT state. Would the keepalive option in sqlhosts help here?
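Attaching to each thread in turn, as above, can be done in one pass: with -f, strace -p attaches to every thread of the target process. Note that ETIMEDOUT on a futex call only means a timed wait expired; strace prints the generic errno text "Connection timed out", so by itself it does not point to a network timeout. The following is a minimal sketch; trace_all_threads, summarise_trace and the output path are hypothetical names, and the 10-second default is an arbitrary assumption.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; nothing here ships with Informix.

# Count trace lines per thread ID. With -f and -o, strace prefixes every
# line with the ID of the thread that made the call.
summarise_trace() {
    awk '{n[$1]++} END {for (t in n) print t, n[t]}' "$1" | sort -k2,2nr
}

# Trace all threads of one process for a fixed time, then summarise.
trace_all_threads() {
    local pid="$1" seconds="${2:-10}" out="${3:-/tmp/strace.$1.txt}"
    # -f with -p attaches to all threads; -tt adds wall-clock timestamps.
    # timeout ends strace after $seconds, which detaches it from the process.
    timeout "$seconds" strace -f -tt -p "$pid" -o "$out"
    summarise_trace "$out"
}
```

Called as trace_all_threads 18528, this would list the busiest threads first. Threads parked in a futex wait for the whole interval produce few or no lines, so a thread that is polling stands out.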
Greetings
------------------------------
Jessica Flores
------------------------------
Original Message:
Sent: Tue February 02, 2021 11:43 AM
From: David Williams
Subject: ifxbkpcloud.jar stucked
Which OS Platform?
Try lsof on the backup process. Does it use TCP sockets to communicate with the remote AWS service? If so, configure keepalive to detect dead connections, if possible.
Also, if it is using TCP/IP, why does the socket not get closed on network failure?
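A minimal sketch of that check, assuming lsof is installed and the PID of the java process is known. show_tcp_sockets and show_keepalive_defaults are hypothetical helper names. The kernel keepalive timers shown here only apply to sockets whose owner enabled SO_KEEPALIVE, and whether the uploader does that is not known from this thread.

```shell
#!/bin/bash
# Hypothetical helpers for illustration.

# List the TCP sockets held by one process.
show_tcp_sockets() {
    local pid="$1"
    if command -v lsof >/dev/null 2>&1; then
        # -a ANDs the selections: files of this PID that are TCP sockets.
        # -n and -P skip host and port name lookups.
        lsof -nP -a -p "$pid" -i TCP
    else
        echo "lsof not installed"
    fi
}

# Print the kernel-wide keepalive timers (seconds, seconds, probe count).
show_keepalive_defaults() {
    local name
    for name in tcp_keepalive_time tcp_keepalive_intvl tcp_keepalive_probes; do
        printf '%s=%s\n' "$name" "$(cat "/proc/sys/net/ipv4/$name" 2>/dev/null)"
    done
}

show_keepalive_defaults
# Example call, with JAVA_PID set to the PID of the ifxbkpcloud.jar process:
# show_tcp_sockets "$JAVA_PID"
```

An ESTABLISHED connection to the S3 endpoint that never goes away would be consistent with a dead connection nobody has noticed. Running ss -tno shows a timer field on sockets that have keepalive active, which is one way to tell whether it is enabled on that socket.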
Try running onstat -g ath and onstat -g tpf and see if the arcbackup threads are doing reads/writes.
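One way to see whether the threads are doing any work is to take each onstat sample twice and compare. This is a sketch only: sample_twice is a hypothetical helper, the 10-second interval is an assumption, and the banner line is filtered out because its uptime changes on every call.

```shell
#!/bin/bash
# sample_twice is a hypothetical helper, not an Informix utility.
# It runs a command twice, INTERVAL seconds apart, and reports whether
# the output changed, ignoring the onstat banner line.
sample_twice() {
    local interval="$1"; shift
    local a b
    a=$(mktemp) && b=$(mktemp) || return 1
    "$@" 2>&1 | grep -v 'Informix Dynamic Server Version' > "$a"
    sleep "$interval"
    "$@" 2>&1 | grep -v 'Informix Dynamic Server Version' > "$b"
    if cmp -s "$a" "$b"; then
        echo "no change: $*"
    else
        echo "changed: $*"
        diff "$a" "$b"
    fi
    rm -f "$a" "$b"
}

# Only attempt the Informix checks where onstat is on the PATH.
if command -v onstat >/dev/null 2>&1; then
    sample_twice 10 onstat -g ath
    sample_twice 10 onstat -g tpf
fi
```

If the rows for the backup threads are identical in both samples, that suggests they did no work during the interval. It is a hint, not proof, since a short interval can miss slow activity.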
You may also need to run strace on the ontape or backup process to see where it is getting stuck.
Regards,
David.
------------------------------
David Williams
Original Message:
Sent: Tue January 26, 2021 03:40 PM
From: Jessica Flores
Subject: ifxbkpcloud.jar stucked
Good afternoon. Lately I have noticed that my automatic logical log backup to AWS gets stuck, apparently without reason. The ps command shows:
0 S informix 12615 12564 0 80 0 - 9477873 futex_ Jan22 ? 00:02:13 java -jar /informix/informix/bin/ifxbkpcloud.jar BACKUP_FILE amazon https://mybucket.s3.amazonaws.com /LOGS/db1_20_Log0000031177
My ontape session:
IBM Informix Dynamic Server Version 14.10.FC4W1 -- On-Line -- Up 62 days 00:00:59 -- 6441304 Kbytes
session           effective                         #RSAM    total      used       dynamic
id       user     user      tty   pid     hostname  threads  memory     memory     explain
101966   informix -         -     12564   mxdb1     1        2170880    2148984    off
Program :
/informix/iee1410fc4w1/bin/ontape
tid      name     rstcb     flags    curstk   status
259843   ontape   48b360a8  Y--P--M  3824     cond wait  netnorm   -
This doesn't generate any output in $INFORMIXDIR/ifxbkpcloud.log. At the time I noticed it, the online.log didn't show any hint. Should I attribute this to a network failure? How could I detect it and resume without intervention? For now I have a script that raises an alarm when the logical log backup is stuck, and I have to kill the java process to resume the backup. Any suggestions?
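The alarm-and-kill step described above could be automated along these lines. This is a sketch under stated assumptions, not a tested recovery procedure: PATTERN, MAX_SECONDS and KILL_STUCK are illustrative names, the 30-minute limit is a guess at how long one log upload should take, and it assumes a ps that supports the etimes field (procps-ng). By default it only reports.

```shell
#!/bin/bash
# Watchdog sketch: flag ifxbkpcloud.jar uploads that have run too long.
# PATTERN, MAX_SECONDS and KILL_STUCK are illustrative, not Informix settings.
PATTERN="${PATTERN:-ifxbkpcloud.jar BACKUP_FILE}"
MAX_SECONDS="${MAX_SECONDS:-1800}"   # assumed upper bound for one log upload
KILL_STUCK="${KILL_STUCK:-0}"        # 0 = report only, 1 = also kill

# Succeeds when an elapsed time in seconds exceeds the limit.
is_stuck() {
    [ "$1" -gt "$2" ]
}

check_backups() {
    local pid elapsed
    for pid in $(pgrep -f "$PATTERN"); do
        # etimes = seconds elapsed since the process started
        elapsed=$(ps -o etimes= -p "$pid" | tr -d ' ')
        [ -n "$elapsed" ] || continue
        if is_stuck "$elapsed" "$MAX_SECONDS"; then
            echo "stuck: pid=$pid elapsed=${elapsed}s"
            if [ "$KILL_STUCK" = "1" ]; then
                kill "$pid"
            fi
        fi
    done
}

check_backups
```

Run from cron every few minutes, with KILL_STUCK=1 only once the limit has proven safe for normal uploads. Elapsed time is a crude signal: a large log over a slow link could exceed the limit without being stuck.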
------------------------------
Jessica Flores
------------------------------
#Informix