Informix


ifxbkpcloud.jar stucked

  • 1.  ifxbkpcloud.jar stucked

    Posted Tue January 26, 2021 03:41 PM
    Good afternoon. Lately I have noticed that my automatic logical log backup to AWS gets stuck for no apparent reason. The ps command shows:

    0 S informix 12615 12564  0  80   0 - 9477873 futex_ Jan22 ?      00:02:13 java -jar /informix/informix/bin/ifxbkpcloud.jar BACKUP_FILE amazon https://mybucket.s3.amazonaws.com /LOGS/db1_20_Log0000031177

    My ontape session:
    IBM Informix Dynamic Server Version 14.10.FC4W1 -- On-Line -- Up 62 days 00:00:59 -- 6441304 Kbytes


    session           effective                            #RSAM    total      used       dynamic
    id       user     user      tty      pid      hostname threads  memory     memory     explain
    101966   informix -         -        12564    mxdb1    1        2170880    2148984    off


    Program :
    /informix/iee1410fc4w1/bin/ontape


    tid      name     rstcb            flags    curstk   status
    259843   ontape   48b360a8         Y--P--M  3824     cond wait  netnorm   -

    This does not write anything to $INFORMIXDIR/ifxbkpcloud.log, and at the time I noticed the problem the online.log gave no hint either. Should I treat this as a network failure? How can I detect the hang and resume the backup without manual intervention? For now I have a script that raises an alarm when the logical log backup is stuck, and I have to kill the java process for the backup to resume. Any suggestions?
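The alarm script described above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the script actually in use: the 30-minute threshold, the function names, and the choice of SIGTERM are all hypothetical, and it assumes a procps `ps` that supports the `etimes` (elapsed seconds) column.

```python
import os
import signal
import subprocess

JAR_NAME = "ifxbkpcloud.jar"   # process marker, taken from the ps output above
MAX_AGE_SECONDS = 30 * 60      # hypothetical threshold; tune to your log size


def find_stuck(ps_output, max_age=MAX_AGE_SECONDS, marker=JAR_NAME):
    """Return (pid, age_seconds) for matching processes older than max_age.

    Expects lines of the form "<pid> <elapsed seconds> <command line>",
    as printed by: ps -e -o pid= -o etimes= -o args=
    """
    stuck = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid_text, age_text, args = parts
        if marker not in args:
            continue
        if not (pid_text.isdigit() and age_text.isdigit()):
            continue
        age = int(age_text)
        if age > max_age:
            stuck.append((int(pid_text), age))
    return stuck


def check(kill=False):
    """Report long-running upload processes; optionally send them SIGTERM."""
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "etimes=", "-o", "args="],
        capture_output=True, text=True, check=True,
    ).stdout
    found = find_stuck(out)
    for pid, age in found:
        print(f"{JAR_NAME} pid {pid} has been running for {age}s")
        if kill:
            os.kill(pid, signal.SIGTERM)
    return found
```

Calling `check()` from cron only reports; `check(kill=True)` also terminates the process, which is the manual step described above. Elapsed time is only a proxy for "stuck", so the threshold has to be well above the longest normal upload.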

    ------------------------------
    Jessica Flores
    ------------------------------


  • 2.  RE: ifxbkpcloud.jar stucked

    Posted Tue February 02, 2021 11:44 AM


    Which OS Platform?

    Try lsof on the backup process: does it use TCP sockets to communicate with the remote AWS service? If so, configure keepalive to detect dead connections, if possible.

    Also, if it is using TCP/IP, why does the socket not get closed on a network failure?

    Try running onstat -g ath and onstat -g tpf and see if the arcbackup threads are doing reads/writes.

    You may also need to run strace on the ontape or backup process to see where it is getting stuck.
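To make the keepalive suggestion concrete, here is a minimal Python sketch of what enabling TCP keepalive means at the socket level. It is illustrative only: whether ifxbkpcloud.jar sets SO_KEEPALIVE on its connections is not known from this thread, and the function name and timing values are assumptions. On Linux the kernel-wide keepalive timers apply only to sockets that have SO_KEEPALIVE switched on, so the application side matters.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for an existing socket.

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the connection is declared dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timers are platform specific; set them only when the
    # constants are available (they are on Linux).
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock
```

With the values shown, a silently dead peer would be detected after roughly idle + interval * count seconds, at which point a blocked read or write on the socket fails instead of waiting forever.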

    Regards,

    David.



    ------------------------------
    David Williams
    ------------------------------



  • 3.  RE: ifxbkpcloud.jar stucked

    Posted Tue February 23, 2021 10:30 AM

    Hi David, thanks for your response. I had to wait for the issue to occur again.

    OS: RHEL 7

    This time I got the output from both the strace command and the java and ontape processes, and found some very interesting things:

     

    informix 18528 18511  0 Feb18 ?        00:00:56 java -jar /informix/informix/bin/ifxbkpcloud.jar BACKUP_FILE amazon https://bucket1.s3.amazonaws.com /logs/server_10_Log0000014249

    ids>strace -p 18528

    Process 18528 attached

    futex(0x7ff0f60989d0, FUTEX_WAIT, 18529, NULL

    If I run ps -efL|grep /informix/informix/bin/ifxbkpcloud.jar

    I find several threads for the java process:

    informix 18528 18511 18528  0   57 Feb18 ?        00:00:00 java -jar /informix/informix/bin/ifxbkpcloud.jar BACKUP_FILE amazon https://bucket1.s3.amazonaws.com /LOGS/server_10_Log0000014249

    ….

    To 

    informix 18528 18511 18585  0   57 Feb18 ?        00:00:00 java -jar /informix/informix/bin/ifxbkpcloud.jar BACKUP_FILE amazon https://bucket1.s3.amazonaws.com /LOGS/server_10_Log0000014249

    Threads (LWP IDs) 18530 to 18585 are all in state FUTEX_WAIT_PRIVATE:

    ids>strace -p 18585

    Process 18585 attached

    futex(0x7fefcc017674, FUTEX_WAIT_PRIVATE, 1, NULL^CProcess 18585 detached

     <detached ...>

    But for thread 18529:

    ids>strace -p 18529

    Process 18529 attached

    restart_syscall(<... resuming interrupted call ...>) = -1 ETIMEDOUT (Connection timed out)

    futex(0x7ff0ec009a28, FUTEX_WAKE_PRIVATE, 1) = 0

    futex(0x7ff0ec009a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {63354707, 978830382}, ffffffff) = -1 ETIMEDOUT (Connection timed out)

    futex(0x7ff0ec009a28, FUTEX_WAKE_PRIVATE, 1) = 0

    futex(0x7ff0ec009a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {63354709, 979094256}, ffffffff) = -1 ETIMEDOUT (Connection timed out)

    futex(0x7ff0ec009a28, FUTEX_WAKE_PRIVATE, 1) = 0

    futex(0x7ff0ec009a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {63354711, 979290151}, ffffffff) = -1 ETIMEDOUT (Connection timed out)

    futex(0x7ff0ec009a28, FUTEX_WAKE_PRIVATE, 1) = 0

    futex(0x7ff0ec009a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {63354713, 979555340}, ffffffff) = -1 ETIMEDOUT (Connection timed out)

    futex(0x7ff0ec009a28, FUTEX_WAKE_PRIVATE, 1) = 0

    futex(0x7ff0ec009a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {63354715, 979849831}, ffffffff) = -1 ETIMEDOUT (Connection timed out)

    futex(0x7ff0ec009a28, FUTEX_WAKE_PRIVATE, 1) = 0

    So I am now stuck at this ETIMEDOUT state. Would the keepalive option in sqlhosts help with this?
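One way to read the strace output for thread 18529 is to look at the deadlines passed to FUTEX_WAIT_BITSET_PRIVATE. A timed futex wait returns ETIMEDOUT when its deadline passes, and strace prints the generic errno text "Connection timed out" for that code, so these lines on their own do not show a network timeout. The Python sketch below (the function name is hypothetical; the sample lines are copied from the output above) computes the gap between consecutive deadlines.

```python
import re

# Matches the {seconds, nanoseconds} deadline of a timed futex wait.
DEADLINE_RE = re.compile(
    r"FUTEX_WAIT_BITSET_PRIVATE,\s*\d+,\s*\{(\d+),\s*(\d+)\}"
)


def deadline_intervals(strace_text):
    """Return the gaps, in seconds, between consecutive futex deadlines."""
    deadlines_ns = [
        int(sec) * 10**9 + int(nsec)
        for sec, nsec in DEADLINE_RE.findall(strace_text)
    ]
    return [(b - a) / 1e9 for a, b in zip(deadlines_ns, deadlines_ns[1:])]


SAMPLE = """\
futex(0x7ff0ec009a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {63354707, 978830382}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7ff0ec009a28, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7ff0ec009a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {63354709, 979094256}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7ff0ec009a28, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7ff0ec009a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {63354711, 979290151}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7ff0ec009a28, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7ff0ec009a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {63354713, 979555340}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7ff0ec009a28, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7ff0ec009a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {63354715, 979849831}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
"""

for gap in deadline_intervals(SAMPLE):
    print(f"{gap:.6f} s between deadlines")
```

Each gap comes out at about two seconds, which is consistent with a thread waking on a fixed timer rather than a socket operation failing. That is only a suggestion from the numbers; it does not identify which of the other threads, if any, is blocked on the upload itself.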
    Greetings



    ------------------------------
    Jessica Flores
    ------------------------------