Informix

  • 1.  ER not propagating and no error messages

    Posted Fri December 03, 2021 07:48 AM
    I'm using Informix IDS 10.00.UC6 on Solaris 11, with two machines having the same database schema and all tables replicating in both directions using Enterprise Replication, so in theory both databases should have the same content.

    However, a problem has arisen where one direction of replication (Host A to Host B) continues to work correctly, but the other direction (Host B to Host A) does not. The symptoms are:
    • Changes made to a table on Host B do not propagate to Host A (as determined by changing a row on Host B and inspecting the table on Host A)
    • `cdr list serv` shows `Active` and `Connected` (both directions), but on Host B there is a queue of millions of bytes.
    • `cdr list repl` shows non-zero queues for several of the replicates.
    • `cdr stats recvq` on Host A shows nothing received from Host B recently.
    • `cdr stats rqm` shows data in the spool `trg_send_stxn` with flags `SEND_Q, SPOOLED, PROGRESS_TABLE, NEED_ACK, SENDQ_MASK, SREP_TABLE`.
    • There are no errors or relevant messages in `online.log` or `cdr_mon.log`, or any other place I can think to look.
    • Some of the tables are "out of sync" in that rows have conflicting data or are missing; this is for various reasons relating to past errors where one host was offline. However, even changes to tables with correct data on Host B are not propagated to Host A.
    • I did a `cdr cleanstart` on Host B yesterday after this problem was occurring in both directions, which did at least make the A -> B direction start working (the opposite of what I expected), and the queues on Host B were 0 at that time. After that cleanstart, some changes to tables (with correct data) would propagate to Host A, while some changes to other tables on B would not. But today, no tables are propagating from B to A.
      • Before the `cleanstart` I had found by experimenting that sometimes deleting an individual replicate would reduce the size of the stuck queue but the queue remained stuck all the same; and sometimes, deleting a replicate would make the queue move for a time before being stuck again.
    • There is also a DR host that both A and B do one-way propagation to, and that is propagating correctly with no queue backup.
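
    The checks listed above can be captured repeatedly to see whether the stuck queue moves at all between samples. What follows is only a minimal sketch: it runs the four `cdr` commands exactly as quoted in the post, with no extra flags. The helper names (`collect_snapshot`, `changed_sections`) are hypothetical, and a difference between two snapshots may just reflect timestamps in the output rather than real queue movement.

```python
import shutil
import subprocess

# The commands quoted in the post; no additional flags are assumed.
DIAGNOSTIC_COMMANDS = [
    ["cdr", "list", "serv"],
    ["cdr", "list", "repl"],
    ["cdr", "stats", "recvq"],
    ["cdr", "stats", "rqm"],
]


def _run(cmd):
    """Run one command and return its combined stdout and stderr."""
    if shutil.which(cmd[0]) is None:
        # Lets the sketch run on a machine without Informix installed.
        return "<%s not found on PATH>" % cmd[0]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout + result.stderr


def collect_snapshot(commands=DIAGNOSTIC_COMMANDS, runner=None):
    """Run each command and return {command string: captured output}."""
    if runner is None:
        runner = _run
    return {" ".join(cmd): runner(cmd) for cmd in commands}


def changed_sections(before, after):
    """Return the commands whose output differs between two snapshots."""
    return sorted(key for key in after if before.get(key) != after[key])
```

    Taking one snapshot on each host, waiting a few minutes, taking another, and passing both to `changed_sections` gives a quick list of which outputs changed, which can then be read by hand.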

    I'm at a loss as to how to diagnose why the data in the replication queues is not moving. If there were sync errors (i.e. the replicated change could not be applied because the data on Host A differs), I would expect log messages in `online.log` saying that the update was rejected, with information saved to $INFORMIXDIR/ats_dr and so on -- this has happened recently. It seems as if something in the queue is being refused but neither cleared nor logged, and so is blocking the queue. Host A has heavy live traffic and (thankfully) is correctly replicating to Host B, but not vice versa.

    Any ideas of more things to try or ways to diagnose the problem would be most welcome.

    From other searching, I have seen people advising to drop syscdr, but nothing was mentioned about how to recreate it and resume replication afterwards.

    ------------------------------
    Matt McNabb
    ------------------------------

    #Informix


  • 2.  RE: ER not propagating and no error messages

    IBM Champion
    Posted Fri December 03, 2021 07:56 AM
    Dumb question - are you sure you are using update-anywhere and not master-slave?

    Cheers
    Paul

    Paul Watson
    Oninit LLC
    +1-913-387-7529
    www.oninit.com
    Oninit®️ is a registered trademark of Oninit LLC





  • 3.  RE: ER not propagating and no error messages

    IBM Champion
    Posted Fri December 03, 2021 09:35 AM
    It's the 'cdr define server' command which would recreate the syscdr database.

    But for this you'd first have to be able to drop it, either manually or by using 'cdr delete server <server_name> --force'
     -> I'm pretty sure this is going to fail with "214: Cannot remove file for table (informix.trg_send_stxn)." or an even more cryptic error when using 'cdr delete server'.

    My advice:
    With ER down (cdr stop), rename this table to trg_send_stxn_corrupt, create a new one with the same schema ... and be happy (cdr start or cdr cleanstart).
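
    The order of operations in that advice can be written out as a dry-run plan. This is only a sketch: nothing is executed, the function name `rename_spool_table_plan` is hypothetical, and the `dbschema` step for capturing the existing table definition is an assumption added here, not part of the advice above. The real schema of trg_send_stxn has to come from the affected instance.

```python
def rename_spool_table_plan(table="trg_send_stxn", suffix="_corrupt", clean=False):
    """Return the ordered steps as (kind, text) pairs; nothing is executed."""
    # The advice allows either restart command; pick one via the flag.
    restart = "cdr cleanstart" if clean else "cdr start"
    return [
        ("shell", "cdr stop"),
        # Assumption: capture the current definition so the new table can match it.
        ("shell", "dbschema -d syscdr -t %s" % table),
        ("sql", "RENAME TABLE %s TO %s%s;" % (table, table, suffix)),
        # Left as a manual step: the schema is not known to this sketch.
        ("manual", "CREATE TABLE %s with the schema captured above" % table),
        ("shell", restart),
    ]


if __name__ == "__main__":
    for kind, text in rename_spool_table_plan():
        print("%-6s %s" % (kind, text))
```

    Keeping the plan as data rather than running it directly leaves room to review each step against the instance before touching syscdr.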



    ------------------------------
    Andreas Legner
    ------------------------------



  • 4.  RE: ER not propagating and no error messages

    Posted Sun December 05, 2021 03:03 AM
    OK, thanks. I will try the approach of renaming the table and see what happens; if that fails then dropping will be the next attempt.

    If both of those fail, would the next step be to drop and recreate the erdbspace and ersbspace?

    ------------------------------
    Matt McNabb
    ------------------------------



  • 5.  RE: ER not propagating and no error messages

    IBM Champion
    Posted Mon December 20, 2021 03:47 AM
    Hi Matt,

    Curious how this went; were you able to get this going again?

    Andreas

    ------------------------------
    Andreas Legner
    ------------------------------



  • 6.  RE: ER not propagating and no error messages

    IBM Champion
    Posted Fri December 03, 2021 12:28 PM
    Matt:

    You are on v10.00, which has probably been out of support for over a decade. Many ER bugs in that version were fixed in 11.10, 11.50, 11.70, 12.10, and 14.10, and those releases also brought performance improvements and new features to both ER and the server as a whole. You really should move to v14.10.FC7: export your databases, build a new server under 14.10 from scratch, then import your databases. You might just find your troubles gone, and even if they are not, you can call IBM for support!

    Art

    ------------------------------
    Art S. Kagel, President and Principal Consultant
    ASK Database Management Corp.
    www.askdbmgt.com
    ------------------------------



  • 7.  RE: ER not propagating and no error messages

    Posted Fri December 17, 2021 11:04 AM
    I ran into something similar.

    I used:
       cdr check repl ... repair

    After that, ER was able to send the transactions in the queue and replication was OK again.

    ------------------------------
    David Cui
    Technical support
    gbase.cn
    ------------------------------