Informix 11.50.FC6 on HP-UX 11.31 PA-RISC
We're working toward upgrading to 14.10 on Linux, but until then, we're running the environment shown above. Since 11.50 is way past EOL, we're pretty much out of luck for support from IBM/HCL. If this were a production system, I might try to push them for support, but since it's not technically a "system down" situation, I doubt I'd have much luck.
The current problem involves our disaster recovery server, which is kept nearly up-to-date via Continuous Log Restore. Every 15 minutes, our primary server does an 'onmode -l' to change logical log files, backs up any logical logs used during those 15 minutes, then transfers the backups to the DR server. A job on the DR server then applies the log backups to the instance.
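To make the 15-minute cycle concrete, here is a minimal Python sketch of the sequence described above. It only builds the command lists and does not run anything. Only 'onmode -l' is what we actually run as stated above; the use of ontape ('ontape -a' for the log backup, 'ontape -l -C' for the continuous log restore) and of rsync for the transfer are assumptions for illustration, as are the directory and host names.

```python
from typing import Dict, List


def build_cycle_commands(backup_dir: str, dr_host: str, dr_dir: str) -> Dict[str, List[List[str]]]:
    """Return the commands for one 15-minute cycle, without executing them.

    Assumptions (not stated in the post): ontape is the backup tool,
    logical logs are backed up to a directory, and rsync does the transfer.
    """
    primary = [
        ["onmode", "-l"],                      # switch to the next logical log file
        ["ontape", "-a"],                      # back up the full logical logs (assumed tool)
        ["rsync", "-a", f"{backup_dir}/", f"{dr_host}:{dr_dir}/"],  # ship the backups (assumed tool)
    ]
    dr = [
        ["ontape", "-l", "-C"],                # apply logs, stay in continuous log restore (assumed tool)
    ]
    return {"primary": primary, "dr": dr}


if __name__ == "__main__":
    # Hypothetical paths and host name, for illustration only.
    plan = build_cycle_commands("/backups/llogs", "drserver", "/backups/llogs")
    for side, commands in plan.items():
        for argv in commands:
            print(f"{side}: {' '.join(argv)}")
```

Printing the plan instead of executing it keeps the sketch safe to run anywhere; a real job would also need error handling between the steps.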
This has been running for months without a problem. We've occasionally brought the DR server to online mode to confirm that everything is working correctly, then restored from a level 0 and restarted the Continuous Log Restore.
Last night, we got an error which resulted in a down chunk on our DR server:
23:16:00 Resuming Logical Restore
23:16:00 Logical Log 64637 Complete, timestamp: 0x2befe189.
23:16:02 Checkpoint Completed: duration was 0 seconds.
23:16:02 Tue Mar 31 - loguniq 64638, logpos 0xe018, timestamp: 0x2befe240 Interval: 5170003
23:16:02 Maximum server connections 0
23:16:02 Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 719, Llog used 0
23:16:03 Checkpoint Completed: duration was 0 seconds.
23:16:03 Tue Mar 31 - loguniq 64638, logpos 0x1ab018, timestamp: 0x2beff0d7 Interval: 5170004
23:16:03 Maximum server connections 0
23:16:03 Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 329, Llog used 0
23:16:03 Suspending Logical Restore
23:31:26 Resuming Logical Restore
23:31:26 Logical Log 64638 Complete, timestamp: 0x2beff66e.
23:31:30 Rollforward of log record failed. iserrno = 0
23:31:30 Log Record: log = 64639, pos = 0x1530584, type = OLDRSAM:CHALLOC(51), trans = 5074
23:31:43 Assert Warning: Chunk 7 is being taken OFFLINE.
23:31:43 IBM Informix Dynamic Server Version 11.50.FC6WE
23:31:43 Who: Session(41, informix@drserver, 0, c0000001456c5288)
Thread(98, xchg_2.0, c00000014568b0a8, 1)
File: rsmirror.c Line: 1794
23:31:43 Results: Dynamic Server will block at next checkpoint
23:31:43 Action: Shutdown (onmode -k) or override (onmode -O)
23:31:43 stack trace for pid 3951 written to /ifmx_dump/my_instance/af.44a0b12
23:31:43 See Also: /ifmx_dump/my_instance/af.44a0b12
23:31:44 Chunk 7 is being taken OFFLINE.
23:31:44 Rollforward of log record failed. iserrno = 0
23:31:44 Log Record: log = 64639, pos = 0x1530584, type = OLDRSAM:CHALLOC(51), trans = 5074
23:32:20 Logical Log 64639 Complete, timestamp: 0x2bf1dc64.
23:32:39 Checkpoint blocked by down space, waiting for override or shutdown
The af file lists several HINSERT and ADDITEM entries, until we get to this:
logpos:64639:15303fc HINSERT tx:5074 pn:00611ad7 fl: 112
c000000146da0060: 00000084 00000028 00000112 00000000 .......( ........
c000000146da0070: 00000000 00000000 000013d2 015326c0 ........ .....S&.
c000000146da0080: 91e8c3aa 00611ad7 00611ad7 00078817 .....a.. .a......
c000000146da0090: 00430004 00000000 00000000 80017f0a .C...... ........
c000000146da00a0: 00017f0b 30313139 39208000 00000000 ....0119 9 ......
c000000146da00b0: 00800000 00000000 80000000 00000080 ........ ........
c000000146da00c0: 00000000 00008000 00000000 00800000 ........ ........
c000000146da00d0: 00000000 80000000 00000080 00000000 ........ ........
c000000146da00e0: 000000d7 ....
logpos:64639:1530544 ADDITEM tx:5074 pn:00611ad8 fl: 10
c000000147004060: 00000040 0000001c 00000010 00000000 ...@.... ........
c000000147004070: 00000000 00000000 000013d2 01532700 ........ .....S'.
c000000147004080: 91e8c3aa 00611ad8 00611ad8 00611ad7 .....a.. .a...a..
c000000147004090: 00078817 000002a9 00010004 80017f0b ........ ........
logpos:64639:1530338 HINSERT tx:5074 pn:00611ad7 fl: 112
c000000146db1060: 00000084 00000028 00000112 00000000 .......( ........
c000000146db1070: 00000000 00000000 000013d2 01532784 ........ .....S'.
c000000146db1080: 91e8c3ac 00611ad7 00611ad7 00078818 .....a.. .a......
c000000146db1090: 00430004 00000000 00000000 80017f0b .C...... ........
c000000146db10a0: 00017f0c 30363339 39208000 00000000 ....0639 9 ......
c000000146db10b0: 00800000 00000000 80000000 00000080 ........ ........
c000000146db10c0: 00000000 00008000 00000000 00800000 ........ ........
c000000146db10d0: 00000000 80000000 00000080 00000000 ........ ........
c000000146db10e0: 000000d7 ....
23:31:30 End of queued log recs
Log Record: log = 64639, pos = 0x1530584, type = OLDRSAM:CHALLOC(51), trans = 5074
c000000146f6b060: 00000034 00000033 00000090 00000000 ...4...3 ........
c000000146f6b070: 00000000 00000000 000013d2 01530544 ........ .....S.D
c000000146f6b080: 91e8c375 00000000 00220730 0000000a ...u.... .".0....
c000000146f6b090: 00000080 ....
23:31:43
23:31:43 IBM Informix Dynamic Server Version 11.50.FC6WE Software Serial Number AAA#B000000
23:31:43 Assert Warning: Chunk 7 is being taken OFFLINE.
23:31:43 Who: Session(41, informix@drserver, 0, c0000001456c5288)
Thread(98, xchg_2.0, c00000014568b0a8, 1)
File: rsmirror.c Line: 1794
23:31:43 Results: Dynamic Server will block at next checkpoint
23:31:43 Action: Shutdown (onmode -k) or override (onmode -O)
23:31:43 Raw hex dump of stack located in /ifmx_dump/my_instance/af.44a0b12.rawstk
23:31:43 Stack for thread: 98 xchg_2.0
base: 0xc000000147673000
len: 69632
pc: 0x0000000000000000
tos: 0xc000000147675380
state: running
vp: 1
( 0) 0x4000000000fb0008 legacy_hp_afstack + 0x320 [/informix/IDS11.50.fc6/bin/oninit]
( 1) 0x4000000000faf4a4 afstack + 0x64 [/informix/IDS11.50.fc6/bin/oninit]
( 2) 0x4000000000fae410 afhandler + 0xa98 [/informix/IDS11.50.fc6/bin/oninit]
( 3) 0x4000000000fad904 afwarn_interface + 0x4c [/informix/IDS11.50.fc6/bin/oninit]
( 4) 0x4000000000a1eac8 bring_media_down + 0x9a0 [/informix/IDS11.50.fc6/bin/oninit]
( 5) 0x4000000000b31c78 rollfwd_error + 0x2b8 [/informix/IDS11.50.fc6/bin/oninit]
( 6) 0x4000000000b7f534 rlogm_redo + 0x82c [/informix/IDS11.50.fc6/bin/oninit]
( 7) 0x4000000000b20e48 scan_logredo + 0x998 [/informix/IDS11.50.fc6/bin/oninit]
( 8) 0x4000000000b216e4 scan_logredo + 0x1234 [/informix/IDS11.50.fc6/bin/oninit]
( 9) 0x4000000000b1f80c next_lscan + 0x87c [/informix/IDS11.50.fc6/bin/oninit]
(10) 0x4000000000fbb598 prod_loop1 + 0x2e8 [/informix/IDS11.50.fc6/bin/oninit]
(11) 0x4000000000fbbb30 producer_thread + 0x330 [/informix/IDS11.50.fc6/bin/oninit]
(12) 0x4000000000f7cf34 startup + 0xd4 [/informix/IDS11.50.fc6/bin/oninit]
(13) 0x4000000000f7cd1c resume + 0x10c [/informix/IDS11.50.fc6/bin/oninit]
base: 0xc000000147673000
len: 69632
pc: 0x0000000000000000
tos: 0xc000000147675380
state: running
vp: 1
23:31:43 See Also: /ifmx_dump/my_instance/af.44a0b12
---------------------------------
Begin System Alarm Program Output
---------------------------------
Assertion Failure Type: Warning
Host Name: drserver
Database Server Name: my_instance
Time of failure: Tue Mar 31 23:31:44 EDT 2020
AF file: /ifmx_dump/my_instance/af.44a0b12
Shared memory file: None
System Blocking: OFF
I'm not sure what the OLDRSAM:CHALLOC entry is showing. Is it saying that the table (partition) added an extent?
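As a sanity check on the hex dumps, here is a small Python sketch that decodes the first words of each record. The field positions are inferred purely by matching the dump against the text the server printed (record length, type 51 for CHALLOC, the "fl:" value, and trans = 5074); they are an assumption, not taken from Informix documentation, and the sketch does not attempt to interpret the record-specific payload that follows.

```python
def decode_header(hex_words: str) -> dict:
    """Decode the leading 32-bit words of a log record from the af file dump.

    Assumed layout (inferred from the dump, not from documentation):
    word 0 = record length in bytes, word 1 = record type,
    word 2 = flags, word 6 = transaction id.
    """
    words = [int(w, 16) for w in hex_words.split()]
    return {
        "length": words[0],
        "type": words[1],
        "flags": words[2],
        "trans": words[6],
    }


# First eight words of the failing CHALLOC record, copied from the af file.
challoc = "00000034 00000033 00000090 00000000 00000000 00000000 000013d2 01530544"
# First eight words of the first HINSERT record shown in the af file.
hinsert = "00000084 00000028 00000112 00000000 00000000 00000000 000013d2 015326c0"

for name, dump in (("CHALLOC", challoc), ("HINSERT", hinsert)):
    header = decode_header(dump)
    print(name, header["length"], header["type"], hex(header["flags"]), header["trans"])
```

The decoded values agree with the messages: the CHALLOC record is 52 bytes, type 51, transaction 5074, and the HINSERT record is 132 bytes with flags 0x112 in the same transaction.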
Our production instance is running with no reported problems. I've looked in the online.log for the relevant time period and there is nothing other than log complete/backup started/backup completed messages, and some checkpoint messages. Since the log backups came from there, I would expect any problems other than a failed disk to show up on that server as well, but as I said, it looks fine. Users are on the system, doing their normal work.
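For anyone who wants to repeat the online.log check, a minimal Python sketch of the kind of scan I mean is below. The keywords come from the failure messages on the DR server quoted above; the function name is hypothetical, and the keyword list is an assumption that would likely need extending for other failure types.

```python
import re
from typing import Iterable, List

# Keywords taken from the DR server messages quoted in this post.
SUSPECT = re.compile(r"Assert|failed|OFFLINE|blocked by down space")


def find_suspect_lines(lines: Iterable[str]) -> List[str]:
    """Return the message-log lines that contain any of the suspect keywords."""
    return [line.rstrip("\n") for line in lines if SUSPECT.search(line)]


if __name__ == "__main__":
    sample = [
        "23:16:00 Resuming Logical Restore",
        "23:31:30 Rollforward of log record failed. iserrno = 0",
        "23:31:43 Assert Warning: Chunk 7 is being taken OFFLINE.",
        "23:32:20 Logical Log 64639 Complete, timestamp: 0x2bf1dc64.",
    ]
    for hit in find_suspect_lines(sample):
        print(hit)
```

To scan a real message log, pass an open file object instead of the sample list, since iterating a file yields its lines.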
Our Unix sysadm has looked in syslog and dmesg, but does not see anything that looks out of place. He also ran ioscan, and no issues were found. Looking at vgdisplay shows all volumes syncd and available. He has not run chkdsk yet, as the volume group is a RAID 10 striped across several disks, so it would take a while to complete.
Any suggestions on what to look for? I can just restore from the latest Level 0 archive and restart the continuous log restore, but I'd really like to be sure that there are no underlying problems first.
------------------------------
Mark Collins
------------------------------