Original Message:
Sent: Mon May 08, 2023 09:02 AM
From: Mark Barinstein
Subject: takeover connection
I'm still not familiar with db2 resource model in a pacemaker environment, but it looks like bug. I believe, that a db2 hadr VIP resource must be always up on a server with db2 hadr primary.
Otherwise it's some strange "feature", which should be explained by your IBM sales rep probably...
------------------------------
Mark Barinstein
Original Message:
Sent: Mon May 08, 2023 08:10 AM
From: Guy Przytula
Subject: takeover connection
thanks for the update
I did not define the ACR - so now it has been done
But a new situation occurs
If standby leaves (network broken) no impact - all ok after integration
BUT : if primary (server2) leaves (network broken) : database is moved - VIP is not moved and not available
[root@server01 ~]# crm status
Cluster Summary:
* Stack: corosync
* Current DC: server01 (version 2.1.2-4.db2pcmk.el8-ada5c3b36e2) - partition with quorum
* Last updated: Mon May 8 14:02:53 2023
* Last change: Mon May 8 14:00:39 2023 by root via crm_attribute on server01
* 2 nodes configured
* 7 resource instances configured
Node List:
* Online: [ server01 ]
* OFFLINE: [ server2 ]
Full List of Resources:
* db2_server2_eno1 (ocf::heartbeat:db2ethmon): Stopped
* db2_server01_eno1 (ocf::heartbeat:db2ethmon): Started server01
* db2_server2_db2inst1_0 (ocf::heartbeat:db2inst): Stopped
* db2_server01_db2inst1_0 (ocf::heartbeat:db2inst): Started server01
* Clone Set: db2_db2inst1_db2inst1_PSADB-clone [db2_db2inst1_db2inst1_PSADB] (promotable):
* Slaves: [ server01 ]
* Stopped: [ server2 ]
* db2_db2inst1_db2inst1_PSADB-primary-VIP (ocf::heartbeat:IPaddr2): Stopped
[root@server01 ~]# ping db2vip
PING db2vip (192.168.0.106) 56(84) bytes of data.
^C
--- db2vip ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2048ms
if this happens in the night - manual intervention is needed- correct ?
------------------------------
Thanks for all answers
Best Regards,
Guy Przytula
Original Message:
Sent: Mon May 08, 2023 06:33 AM
From: Mark Barinstein
Subject: takeover connection
A test description with HADR + VIP.
Alternate server points to the VIP host name.
$ db2 list db directory
System Database Directory
Number of entries in the directory = 1
Database 1 entry:
Database alias = SAMPLE
Database name = SAMPLE
Local database directory = /db2path
Database release level = 10.00
Comment =
Directory entry type = Indirect
Catalog database partition number = 0
Alternate server hostname = db2sample-vip
Alternate server port number = 50000
db2dsdriver.cfg file references this VIP host name.
$ cat db2dsdriver.cfg
<configuration>
<dsncollection>
<dsn alias="SAM" name="SAMPLE" host="db2sample-vip" port="50000"/>
</dsncollection>
<databases>
<database name="SAMPLE" host="db2sample-vip" port="50000">
<acr>
<parameter name="enableAcr" value="true"/>
<parameter name="enableSeamlessACR" value="true"/>
<parameter name="maxAcrRetries" value="100"/>
<parameter name="acrRetryInterval" value="3"/>
<parameter name="enableAlternateServerListFirstConnect" value="true"/>
<alternateserverlist>
<server name="db2sample-vip" hostname="db2sample-vip" port="50000"/>
</alternateserverlist>
</acr>
</database>
</databases>
</configuration>
Table ddl:
create table test_hadr (ts timestamp, host_name varchar (256))
Test file for the db2cli utility:
$ cat test_tran.sql
SELECT U.APPLICATION_HANDLE, U.UOW_ID, U.UOW_START_TIME, S.HOST_NAME
FROM
TABLE (MON_GET_UNIT_OF_WORK (MON_GET_APPLICATION_HANDLE (), -1)) U
, SYSIBMADM.ENV_SYS_INFO S;
INSERT INTO TEST_HADR (TS, HOST_NAME)
SELECT CURRENT TIMESTAMP, HOST_NAME FROM SYSIBMADM.ENV_SYS_INFO;
SELECT * FROM TEST_HADR;
SELECT U.APPLICATION_HANDLE, U.UOW_ID, U.UOW_START_TIME, S.HOST_NAME
FROM
TABLE (MON_GET_UNIT_OF_WORK (MON_GET_APPLICATION_HANDLE (), -1)) U
, SYSIBMADM.ENV_SYS_INFO S;
call dbms_alert.sleep (60);
SELECT U.APPLICATION_HANDLE, U.UOW_ID, U.UOW_START_TIME, S.HOST_NAME
FROM
TABLE (MON_GET_UNIT_OF_WORK (MON_GET_APPLICATION_HANDLE (), -1)) U
, SYSIBMADM.ENV_SYS_INFO S;
SELECT * FROM TEST_HADR;
The "takeover hadr" command is issued on Standby when the CALL statement is fired.
The contents of the log file finally with comments:
$ cat test_tran.sql.log
IBM DATABASE 2 Interactive CLI Sample Program
(C) COPYRIGHT International Business Machines Corp. 1993,1996
All Rights Reserved
Licensed Materials - Property of IBM
US Government Users Restricted Rights - Use, duplication or
disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
> SELECT U.APPLICATION_HANDLE, U.UOW_ID, U.UOW_START_TIME, S.HOST_NAME
FROM
TABLE (MON_GET_UNIT_OF_WORK (MON_GET_APPLICATION_HANDLE (), -1)) U
, SYSIBMADM.ENV_SYS_INFO S
FetchAll: Columns: 4
APPLICATION_HANDLE UOW_ID UOW_START_TIME HOST_NAME
241, 1, 2023-05-08 05:35:14.678741, host1
FetchAll: 1 rows fetched.
UOW_ID == 1, current host == host1
> INSERT INTO TEST_HADR (TS, HOST_NAME)
SELECT CURRENT TIMESTAMP, HOST_NAME FROM SYSIBMADM.ENV_SYS_INFO
The SQL command completed successfully.
> SELECT * FROM TEST_HADR
FetchAll: Columns: 2
TS HOST_NAME
2023-05-08 05:35:14.692578, host1
FetchAll: 1 rows fetched.
> SELECT U.APPLICATION_HANDLE, U.UOW_ID, U.UOW_START_TIME, S.HOST_NAME
FROM
TABLE (MON_GET_UNIT_OF_WORK (MON_GET_APPLICATION_HANDLE (), -1)) U
, SYSIBMADM.ENV_SYS_INFO S
FetchAll: Columns: 4
APPLICATION_HANDLE UOW_ID UOW_START_TIME HOST_NAME
241, 1, 2023-05-08 05:35:14.678741, host1
FetchAll: 1 rows fetched.
Row is inserted, UOW_ID == 1, still the same uncommitted transaction.
> call dbms_alert.sleep(60)
SQLError: rc = 0 (SQL_SUCCESS)
SQLGetDiagRec: SQLState : 08506
fNativeError : -30108
szErrorMsg : [IBM][CLI Driver][DB2/6000] SQL30108N A connection failed in an automatic client reroute environment. The transaction was rolled back. Host name or IP address: "db2sample-vip". Service name or port number: "50000". Reason code: "1". Connection failure code: "3". Underlying error: "-1224". SQLSTATE=08506
cbErrorMsg : 307
Takeover was issued during the CALL. The application got its transaction rollback.
> SELECT U.APPLICATION_HANDLE, U.UOW_ID, U.UOW_START_TIME, S.HOST_NAME
FROM
TABLE (MON_GET_UNIT_OF_WORK (MON_GET_APPLICATION_HANDLE (), -1)) U
, SYSIBMADM.ENV_SYS_INFO S
FetchAll: Columns: 4
APPLICATION_HANDLE UOW_ID UOW_START_TIME HOST_NAME
146, 1, 2023-05-08 05:35:36.589827, host2
FetchAll: 1 rows fetched.
> SELECT * FROM TEST_HADR
FetchAll: 0 rows fetched.
>
The driver reestablished the database connection automatically to a new Primary (host2). New APPLICATION_HANDLE and current HOST_NAME. No rows in the test table - the transaction was rolled back.
So, it works as designed.
What interface (JDBC, ODBC/CLI) do you use and with what configuration settings & connection properties?
------------------------------
Mark Barinstein
Original Message:
Sent: Sun May 07, 2023 06:39 AM
From: Guy Przytula
Subject: takeover connection
when using ACR and being connected to database
when takeover occurs - the first dml gets rc
if you repeat the dml without connect you get the answer
when vip is defined and being connected to database
when takeover occurs - the first dml gets rc
if you repeat the dml without connect you get message : not connected to db - after connect the answer is received
is it correct that you need to re-execute the connect again after takeover while being connected ?
------------------------------
Thanks for all answers
Best Regards,
Guy Przytula
------------------------------