Db2 (On Premises and Cloud)

Expand all | Collapse all

Db2 11.5.5 & Windows Server Failover Clustering (WSFC)

  • 1.  Db2 11.5.5 & Windows Server Failover Clustering (WSFC)

    Posted Thu August 05, 2021 04:44 PM

    Has anyone successfully set up Db2 11.5.5 in a Windows Server Failover Clustering (WSFC) environment? Do DRDA connections work for you?

    Several years ago, I had no problem setting up Db2 11.1 (though it was a tough slog) and documented the setup here:

    https://www.starquest.com/docs/Supportdocs/techSQDRPlus/PL038_Windows_Failover_Cluster.shtml

    However, I am not having similar success with 11.5.5fp1 - I have set up the cluster and can successfully move the role between nodes and can access the database with command line (e.g. "db2 connect to MYDB") or with an ODBC connection that uses DBALIAS=MYDB.

    However, when I try to access the database using DRDA (JCC driver, our own ODBC driver "StarSQL", or the IBM Db2 ODBC driver configured to use host/port/database parameters rather than DBALIAS), I get the following error:

    [IBM][CLI Driver] SQL1031N The database directory cannot be found on the indicated file system. SQLSTATE=58031

    It does not matter whether I use the cluster name, individual hostname, or localhost - the problem seems to be on the Db2 side, but there is nothing in db2diag.log.

    IBM Support has not been helpful and says they don't have anyone familiar enough with MCSC to understand what is going on.

    So I am reaching out to see if anyone else has been successful (i.e. I would like confirmation that it should actually work, or not), or if anyone else is encountering the above problem.



    ------------------------------
    David Brown
    ------------------------------


  • 2.  RE: Db2 11.5.5 & Windows Server Failover Clustering (WSFC)

    Posted Fri August 06, 2021 04:19 AM

    Hi,

    that's interesting because the client does not really know that you are using a cluster. Like you we used MSCS years ago without problems. Did you use the highly availabe IP adress in the JDBC driver and the Db2 node directory which was defined here https://www.ibm.com/docs/en/db2/11.5?topic=commands-db2iclus-microsoft-cluster-server under point 10?

    From Cluster Administrator, create a new resource type of type "IP Address" called "mscs5" that resides on the Public Network. This resource should also belong to Db2 Group. This will be a highly available IP address, and this address should not correspond to any machine on the network. Bring the IP Address resource type online and ensure that the address can be pinged from a remote machine.

    Regards
    Gerald



    ------------------------------
    Gerald Zottl
    ------------------------------



  • 3.  RE: Db2 11.5.5 & Windows Server Failover Clustering (WSFC)

    Posted Fri August 06, 2021 05:33 AM

    Did you use Virtual IP in defining connection string using host/port?

    This Virtual IP - when defined as floating resource in MSCS should be following current primary host in my opinion - but I have only experience with TSM resources.



    ------------------------------
    Jan Nelken
    ------------------------------



  • 4.  RE: Db2 11.5.5 & Windows Server Failover Clustering (WSFC)

    Posted Mon August 09, 2021 01:14 AM
    Gerald & Jan - Thanks for the suggestions.

    I am using what Microsoft calls the "Client Access Point". I did create this and am using it, but I'll take another look at my configuration.

    (per my own tech doc, based on Microsoft & IBM docs)... in my case, I am using a DHCP address)
    In Cluster Manager, right click on the Role and select Add Resource../Client Access Point. Supply a name e.g. MYDB2. If you are using static IP addresses for Client Access Points, you will also need to supply an IP address on the main network and (if remote access is desired) add an entry to your DNS server. This name can be used to access Db2 from remote systems, regardless of which system in the cluster it is currently running on. If the Windows firewall is enabled, add an exception for port 50000 on both systems.


    Hmm - looking at the doc and checking my notes, I never used the db2iclus command - I used the db2mscs command (with a file db2clust.wse). which I think is another way of doing what db2iclus does (in my tech doc, I show using either db2mscs or db2iclus to destroy the clustered instance). In that config file, IP_NAME and IP_ADDRESS are optional for a single partition environment, so I didn't use them.

    https://www.ibm.com/docs/en/db2/11.5?topic=commands-db2mscs-set-up-windows-failover-utility

    I'll take a closer look tomorrow. Maybe tear things down and try db2iclus instead.

    (I believe that "mscs5" is just an example of a name, correct?)

    ------------------------------
    David Brown
    ------------------------------



  • 5.  RE: Db2 11.5.5 & Windows Server Failover Clustering (WSFC)

    Posted Mon August 09, 2021 01:17 AM
    BTW there was another time where I saw a problem with the finding/interpreting the database directory which could be related (or not...)

    A bit of background:

    This was a pair of systems that was working fine with Db2 11.1.

    We want to move to 11.5.5, and the instructions provided by IBM:

    https://www.ibm.com/support/producthub/db2/docs/content/SSEPGG_11.5.0/com.ibm.db2.luw.qb.upgrade.doc/doc/t0022647.html

    didn't work and IBM Support didn't have any suggestions.

    So rather than spend a lot of time debugging the procedure (which was less than desirable anyway, because it involves installing a 2nd copy of Db2, which means the location of C:\Program Files\IBM\SQLLIB changes to something else, which means a config change in our own software), I came up with this alternative method:

    * make backups of our databases (just in case)
    * delete Db2 from the cluster altogether
    * uninstall both copies of Db2, but leave the databases in place
    (I deviated slightly from that - I did an in place update of Db2 11.1->11.5.5fp1, since at that point in time there was nothing special about the Db2 installation - I had already destroyed the cluster instance & role, and there were plain vanilla 11.1 installs on both systems.)

    * install Db2 11.5.5fp1
    * create new cluster instance
    * catalog the databases (which are still present on the shared disk).
    * upgrade the databases (since this is 11.1->11.5) & run db2updv115.

    When it came time to catalog the databases, that failed:

    * db2 list db directory on S:
    shows the uncataloged databases (and all are v14 as expected - i.e. they haven't been updated yet)
    but I am unable to actually catalog & use them

    after entering:
    db2 catalog db SAMPLE on S:
    db2 list db2 directory shows that database as v15 rather than v14
    and I was unable to connect or upgrade (I didn't make a note of the error)

    I give up on trying to regain access to the databases via recataloging (though in hindsight, this is may be the same issue that causes problems later)

    instead I renamed the NODE0000 directory (to avoid conflicts with the existing old databases) & restored my databases from backup (which upgrades them) and ran db2updv115.

    So at this point I can see & access my data from a Db2 command line. I can move the role to the other machine and can access the databases there.

    But as mentioned, DRDA access is failing with "The database directory cannot be found on the indicated file system."

    I did try creating a new database - same issues.

    Hmm - I wonder if the rename of the old NODE0000 directory (and creation of a new NODE0000 directory when I restored the databases) has any bearing?

    Also, I would consider a remove & reinstall (recall that I did an update of 11.1 -> 11.5.5fp1 once it was detached from the cluster and was just a vanilla install of Db2)

    I have backups of the 2 VM's from before I started the update if there is a reason to go back to square one.






    ------------------------------
    David Brown
    ------------------------------



  • 6.  RE: Db2 11.5.5 & Windows Server Failover Clustering (WSFC)

    Posted Mon August 09, 2021 07:37 AM

    Hi,

    you have to use static IP adresses in DHCP for all cluster members and the virtual IP adress must be defined in the Db2 node directory or in the JDBC connection string with port 50000(=default).


    Regards
    Gerald



    ------------------------------
    Gerald Zottl
    ------------------------------