Informix

Write Problem w/ Solaris Volume Manager Mirror

  • 1.  Write Problem w/ Solaris Volume Manager Mirror

    IBM Champion
    Posted Sat May 30, 2020 02:33 AM
    IDS 12.10.FC12
    Solaris 10 1/13

    I have been using Solaris Volume Manager raw mirror metadevices for several years without any problems.

    Today, though, while looking over the documentation, I read something I had never noticed before: a caution in the SVM documentation that changing the buffer contents while the data is "in transit" can result in each side of the mirror being written with different data.  It describes a dirty-bit technique for detecting this, but cautions that the technique does not work with raw (meta)devices.

    Either Informix doesn't operate this way, or I have been very lucky for the past 4 or 5 years.

    Might anyone have any comment on Informix's susceptibility (or lack thereof) to this?  As a follow-up, how significant is the performance hit from the suggested addition to /etc/system?

    For convenience, I include an excerpt from the man pages of Solaris Volume Manager:

    Thank you,

    DG




    Excerpt from SVM man pages:

    Write-On-Write Problem

    When mirroring data in Solaris Volume Manager, transfers from memory to the disks do not all occur at exactly the same time for all sides of the mirror. If the contents of buffers are changed while the data is in-flight to the disk (called write-on-write), then different data can end up being stored on each side of a mirror.

    This problem can be addressed by making a private copy of the data for mirror writes, however, doing this copy is expensive. Another approach is to detect when memory has been modified across a write by looking at the dirty-bit associated with the memory page. Volume Manager uses this dirty-bit technique when it can. Unfortunately, this technique does not work for raw I/O or direct I/O. By default, Volume Manager is tuned for performance with the liability that mirrored data might be out of sync if an application does a "write-on-write" to buffers associated with raw I/O or direct I/O.

    Note that without mirroring, you were not guaranteed what data would actually end up on media, but multiple reads would return the same data. With mirroring, multiple reads may return different data. The following line can be added to /etc/system to cause a stable copy of the buffers to be used for all raw I/O and direct I/O write operations.


    set md_mirror:md_mirror_wow_flg=0x20

    Setting this flag will degrade performance.
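
    To make the write-on-write scenario in the excerpt concrete, here is a toy simulation. It is only a sketch and not SVM code: the function `mirror_write`, the helper `scribble`, and the `stable_copy` switch are hypothetical names invented for illustration. It assumes the two submirrors are written one after the other and that the application changes the buffer in between, which is the case the man page warns about.

```python
# Toy model of write-on-write on a two-way mirror (illustrative only, not SVM code).

def mirror_write(buffer, mutate_between, stable_copy):
    """Write `buffer` to two simulated submirrors, one after the other.

    mutate_between: callable applied to the caller's buffer after side A has
        been written but before side B has, standing in for an application
        that changes the buffer while the I/O is in flight.
    stable_copy: if True, snapshot the buffer first (the idea behind a
        "stable copy" of the buffers); if False, both sides read the live buffer.
    """
    source = bytes(buffer) if stable_copy else buffer
    side_a = bytes(source)      # first submirror transfer completes
    mutate_between(buffer)      # application modifies the buffer mid-flight
    side_b = bytes(source)      # second submirror transfer completes
    return side_a, side_b


def scribble(buf):
    # Hypothetical application write that lands while the I/O is in flight.
    buf[0:3] = b"NEW"


live = bytearray(b"OLD page contents")
a, b = mirror_write(live, scribble, stable_copy=False)
diverged = a != b

live = bytearray(b"OLD page contents")
a2, b2 = mirror_write(live, scribble, stable_copy=True)
consistent = a2 == b2

print("without stable copy, sides differ:", diverged)
print("with stable copy, sides match:", consistent)
```

    Without the snapshot the two sides end up holding different data; with it, both sides hold the contents as of the start of the write. The extra copy per write is the cost the man page refers to when it says the flag degrades performance.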



    ------------------------------
    David Grove
    ------------------------------

    #Informix


  • 2.  RE: Write Problem w/ Solaris Volume Manager Mirror

    IBM Champion
    Posted Tue December 21, 2021 03:44 PM
    This would be a question for Andreas or someone else from IBM/HCL.

    There are buffer latches/locks which can be used to obtain exclusive access to buffers.

    Whilst a buffer is undergoing a memory write, I believe it is taken off the LRU queue and latched/locked so that it cannot be accessed by another thread.

    As per https://www.ibm.com/docs/en/informix-servers/14.10?topic=memory-monitor-buffers there certainly are exclusive locks for buffers.

    The question is: during an I/O write, is the buffer latched/locked so that it cannot be written to by another thread?

    I know other products have specific latching during I/O. For example, https://techcommunity.microsoft.com/t5/sql-server-support-blog/how-it-works-bob-dorr-s-sql-server-i-o-presentation/ba-p/316031 mentions latches for writes that stop other threads from doing a memory write to the page whilst an I/O write is occurring.

    "To write the page a latch must first be acquired. In most cases this is an EX latch to prevent further changes on the page. For example the EX latch is acquired and the checksum or torn bits are calculated and the page is then written. The page can never change during the write or it will be come corrupted. In some cases you can think of an SH latch would prevent an EX latch from changing the page so why would an EX latch be required during the write and block readers. Take the torn PAGE_AUDIT protection as the example. The torn bit protection changes a bit on every sector. If read in this state it would appears as the page was corrupted. So to handle torn bit protection the EX latch is acquired, the write completes and the in-memory copy of the page removed the torn bit protection so readers see the right data. In most instances the EX latch is used but SQL Server will use an SH latch when possible to allow readers during the write."
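
    The latch-during-write pattern described in that quote can be sketched roughly as below. This is a minimal illustration using Python threads, not Informix or SQL Server internals: the `Page` class, its `latch`, and the two events are hypothetical names, and a plain `threading.Lock` stands in for an exclusive (EX) latch. Whether Informix holds an equivalent latch for the duration of the I/O is exactly the open question above.

```python
import threading


class Page:
    """Hypothetical buffer page protected by an exclusive latch."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.latch = threading.Lock()   # stands in for an EX latch

    def modify(self, new_bytes):
        # A memory write must take the latch, so it waits for any flush in progress.
        with self.latch:
            self.data[0:len(new_bytes)] = new_bytes

    def flush(self, disk, write_started, may_finish):
        # Hold the latch for the whole I/O so the page cannot change mid-write.
        with self.latch:
            disk.append(bytes(self.data[:4]))   # first half reaches "disk"
            write_started.set()                 # tell the main thread we are mid-write
            may_finish.wait()                   # simulate a slow I/O
            disk.append(bytes(self.data[4:]))   # second half reaches "disk"


page = Page(b"AAAABBBB")
disk = []
write_started = threading.Event()
may_finish = threading.Event()

writer = threading.Thread(target=page.flush, args=(disk, write_started, may_finish))
writer.start()
write_started.wait()                            # the write is now in flight

modifier = threading.Thread(target=page.modify, args=(b"ZZZZZZZZ",))
modifier.start()                                # cannot proceed until the latch is free
blocked = page.latch.locked()                   # writer still holds the latch here

may_finish.set()                                # let the I/O complete
writer.join()
modifier.join()

written = b"".join(disk)
print("image on disk:", written)
print("page in memory afterwards:", bytes(page.data))
```

    Because the modifier has to wait for the latch, the image that reaches disk is a single consistent version of the page, and the change is applied to the in-memory copy only after the write has finished.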

    David.

    ------------------------------
    David Williams
    ------------------------------