Had a "fun" one today. We have a four-node cluster that features a primary, an HDR secondary (NEAR_SYNC mode), a regular RSS secondary and an RSS secondary with DELAY_APPLY=12H. The cluster is running a patched version, with three of the four nodes running 12.10.FC14XO but the primary still on 12.10.FC14XF, pending a scheduled maintenance window to complete the in-place minor upgrade.
Today, I did routine maintenance on the DELAY_APPLY RSS node. When I took that node down, the primary hit big problems: it stopped accepting new connections, it froze write transactions for already-connected sessions, and it stopped talking to the other two nodes in the cluster. The primary stayed in that state until the DELAY_APPLY RSS node rejoined the cluster, at which point everything freed back up again, but it amounted to roughly a 15-minute production outage. The bulk of that time appears to have been spent with the engine stuck in a checkpoint, even though the completed checkpoint shows no block time:
AUTO_CKPTS=On   RTO_SERVER_RESTART=Off

                                                                 Critical Sections                       Physical Log    Logical Log
           Clock                               Total Flush Block #      Ckpt  Wait  Long  # Dirty Dskflu Total    Avg    Total    Avg
Interval   Time     Trigger   LSN              Time  Time  Time  Waits  Time  Time  Time  Buffers /Sec   Pages    /Sec   Pages    /Sec
593376     16:18:51 CKPTINTVL 216176:0x86a018  0.2   0.1   0.0   0      0.0   0.0   0.0   2008    2008   9917     33     2587     8
593377     16:36:41 CKPTINTVL 216177:0xa6f0    770.0 0.7   0.0   1720   768.9 338.3 769.0 3291    3291   9552     8      2875     2
593378     16:36:57 HDR       216177:0x305018  0.8   0.1   0.0   18     0.6   0.5   0.7   41541   41541  7666     450    791      46
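Note the middle row: checkpoint 593377 took 770.0 seconds total with a Block Time of 0.0, while the Critical Sections columns show 1720 waits and Ckpt/Long times around 769 seconds, which lines up with the frozen write sessions. If anyone wants to poke at a similar hang while it's happening, these are the standard onstat views I'd look at (ordinary utility flags, nothing specific to our build):

    onstat -g ckp            # checkpoint history, like the output above
    onstat -g dri            # HDR pair status from the primary's point of view
    onstat -g rss verbose    # per-RSS-node status, including the delayed one
    onstat -g ath            # thread list, to see what everyone is waiting on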
Has anyone else seen anything like this?
Because of some code issues on our end, there were residual data problems for nearly an hour after the engine itself recovered.
OS: CentOS 7, 3.10.0-1127.19.1.el7.x86_64 #1 SMP
------------------------------
TOM GIRSCH
------------------------------