Informix


discrepancy in how Informix checks disk space

  • 1.  discrepancy in how Informix checks disk space

    Posted Fri October 22, 2021 12:26 PM
    Informix 14.10.FC6, RHEL 8.4 on x86 hardware

    I ran into an interesting situation yesterday that revealed a troubling discrepancy in how Informix checks available space.

    I have two servers that are supposed to be identical, so that one can be used as a DR server.  The DR server will be kept closely synchronized via Continuous Log Restore.  Both of these servers have 500GB allocated in two 250GB virtual drives.  Those virtual drives are managed by LVM, which divides each of them into 10 raw logical volumes.  Our system admin says that the system takes a little bit of the virtual drive for overhead, so we have nine 25GB raw lvols and one 24GB raw lvol on each of these virtual drives, for a total of 249GB each.

    I have confirmed that all of these drives are configured the same between the two servers.  I created symbolic links on both machines, and then created an instance on the primary server.  I allocated dbspaces and chunks on 18 of the 20 raw logical volumes, and loaded data into the instance.  I then performed a level 0 archive using ontape, and transferred the resulting archive file to the DR server.

    This is where the fun begins.  

    When I attempted to restore from the level 0 archive, I received an error message stating that chunk xx could not fit on the device that was specified (don't have the exact error message available to copy here).

    I once again confirmed that all of the symbolic links pointed to the same devices on both machines.  I confirmed that the devices themselves were the same on both servers.  The raw volume in question was one of the two 24GB volumes.  I started checking more closely, and found that I had transposed a couple of digits when specifying the chunk size with the onspaces command.  Given the combination of offset and chunk size, the chunk exceeded the 24GB limit of the logical volume.  And yet the instance had allowed me to create it.
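    The check described above comes down to simple arithmetic on the offset and size, which onspaces takes in kilobytes.  Here is a minimal sketch of that arithmetic; the function names and the sample sizes are hypothetical, and it assumes "24GB" and "25GB" mean 24 GiB and 25 GiB:

```python
KIB = 1024
GIB = 1024 ** 3


def chunk_end_bytes(offset_kb: int, size_kb: int) -> int:
    """Byte position just past the end of a chunk (offset and size in KB)."""
    if offset_kb < 0 or size_kb <= 0:
        raise ValueError("offset must be >= 0 and size must be > 0")
    return (offset_kb + size_kb) * KIB


def chunk_fits(offset_kb: int, size_kb: int, device_bytes: int) -> bool:
    """True if the chunk ends at or before the end of the device."""
    return chunk_end_bytes(offset_kb, size_kb) <= device_bytes


def overshoot_bytes(offset_kb: int, size_kb: int, device_bytes: int) -> int:
    """How far the chunk runs past the end of the device (0 if it fits)."""
    return max(0, chunk_end_bytes(offset_kb, size_kb) - device_bytes)


if __name__ == "__main__":
    intended_kb = 25165824  # exactly 24 GiB, hypothetical intended size
    typo_kb = 25615824      # same digits with two transposed, hypothetical
    for size_kb in (intended_kb, typo_kb):
        for device_gib in (24, 25):
            fits = chunk_fits(0, size_kb, device_gib * GIB)
            print(f"size {size_kb} KB on a {device_gib} GiB device: fits={fits}")
```

    With these sample numbers, the mistyped size overflows a 24 GiB device but still fits on a 25 GiB one.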

    Fortunately, there was no data in that chunk, so I was able to drop it and re-create it with the proper chunk size.

    So the discrepancy is that onspaces does not perform this basic check of available disk space when it creates a chunk, but ontape does.  This leads to several questions, such as:

    • When does Informix write the page header for a page?  I think that it occurs the first time the page is written, but it might be when an extent is created.  The point is, I would expect an error of some sort when it tries to write the page, whether that happens at extent creation or the first time that it attempts to add a new page that exceeds the capacity of the logical volume.
    • What would happen if the instance allocated an extent in this chunk, such that the extent started before the 24GB limit of the lvol, but extended above that limit?  Would it fail at the time that it created the extent?  Or would the extent get created, and the database start inserting data into that extent, suddenly failing when it hit this hard limit?  Basically restating the first question, slightly differently.
    • When the instance does find that it cannot write the new page, whether at extent creation time or on an INSERT statement, what does it do?  Crash?  Take the chunk "offline"?  Return an error to the application performing the INSERT?
    • What would happen if the instance attempted to allocate an extent beyond the 24GB hard limit?  The instance thinks the chunk has space, but the OS would slap it down, so what does Informix do?  Does the instance crash?  Does the chunk get marked "offline"?  Does it just return an error to the application that caused the instance to attempt to create the new extent?
    • Shouldn't onspaces perform a check similar to the one performed by ontape, so that it identifies a situation such as this?
    • I know that with cooked files, the instance will expand the file to the size specified in the onspaces command, so that it reserves the space and prevents other users from filling up the disk before the instance actually gets around to using that space.  It would seem reasonable to expect the instance to at least check raw devices to ensure that they are large enough to accommodate the chunk when a new chunk is created.
    • Not a question, but the obvious observation that, if I hadn't had a DR server to test this restore, I could have created a level 0 archive that could not be restored to the same server and devices from which the archive was performed.  This is a major problem.
    • Does anyone know whether previous versions of Informix perform this sort of checking?  Has something changed in 14.10.FC6 that broke this check?

    I've been fortunate enough that I've never inadvertently created this situation in the past, but this seems like something that someone, somewhere, has likely done.  Thus, the database should try to save us from ourselves.



    ------------------------------
    Mark Collins
    ------------------------------



  • 2.  RE: discrepancy in how Informix checks disk space

    IBM Champion
    Posted Fri October 22, 2021 12:34 PM
    Mark:

    I believe that there was a change to how new chunks are created in 14.10.FC6 to make auto-chunk allocation faster and less intrusive when storage pools create a chunk on the fly. I believe that the OS feature to create a sparse chunk is used now where in the past Informix attempted to write to the entire chunk, or at least to write to the last page(s) of the chunk. It is likely that ontape was never changed which is probably a good thing.

    Art

    ------------------------------
    Art S. Kagel, President and Principal Consultant
    ASK Database Management Corp.
    www.askdbmgt.com
    ------------------------------



  • 3.  RE: discrepancy in how Informix checks disk space

    Posted Fri October 22, 2021 03:25 PM
    Edited by System Fri January 20, 2023 04:16 PM
    Art,

    Thanks for confirming that this is a very new feature.  I went back and tested it on my 11.50 instance, and onspaces definitely flags this as a problem:

    /local_home/informix 204> onspaces -a testdbs -p /test_ifmx/links/rdb15 -o 11200000 -s 1090000
    Verifying physical disk space, please wait ...
    The chunk '/test_ifmx/links/rdb15' will not fit in the space specified.

    That still leaves open the questions of how Informix will respond in a case where onspaces created a chunk larger than what can physically fit on the raw device.

    As a follow-up, what is the best way to raise this as an issue?  Just open a ticket with IBM, highlighting the issue and asking why they allow this situation?  Surely they couldn't have intended to allow users to create a chunk that can be backed up but cannot be restored.

    But then again, with my luck they'd just "fix" ontape to allow the restore, leaving the other questions hanging.


    ------------------------------
    Mark Collins
    ------------------------------



  • 4.  RE: discrepancy in how Informix checks disk space

    IBM Champion
    Posted Fri October 22, 2021 03:43 PM
    Mark:

    I count it as a bug that Informix allows one to create a chunk that will not fit on the device or filesystem. Open a case and report it as such.

    Art

    ------------------------------
    Art S. Kagel, President and Principal Consultant
    ASK Database Management Corp.
    www.askdbmgt.com
    ------------------------------



  • 5.  RE: discrepancy in how Informix checks disk space

    Posted Fri October 22, 2021 04:57 PM
    I have opened a ticket with IBM.

    Thinking about it some more, they probably had to leave the logic in ontape for cases where you remap the disks to which the archive is restored.

    ------------------------------
    Mark Collins
    ------------------------------



  • 6.  RE: discrepancy in how Informix checks disk space

    Posted Mon October 25, 2021 10:44 AM
    Hi Mark and Art. The chunk-inflating feature mentioned here uses fallocate(), and it's true that this feature is new in 14.10, but it shouldn't be related to the problem described here because it's used only for cooked files. Unless a stat() of the raw device is returning S_IFREG for the device's mode (as opposed to S_IFBLK or S_IFCHR as we'd expect), the new code shouldn't come into play. Obviously something odd is happening here, and there's always a chance it's an unintended consequence of digging around in the area of chunk verification for this feature, but I've done a few tests and taken a look at the code and am not yet seeing how this could be happening on Mark's end. FYI I tested on Linux with a straightforward block device and got the behavior you'd expect with both onspaces and ontape. I did not test with anything semi-exotic such as logical volumes.

    I don't mean to do an end-run around whoever in Support is working the case. If you could pass this info along to that engineer and tell them they're welcome to contact me I'd appreciate it. Obviously I'd like to get to the bottom of this and I have a feeling we'll need to work more with Mark to narrow things down and come up with a reproduction. I may be able to write a simple program that emulates the way in which each utility tests the size of a device before creating the chunk. Actually the server does that work on their behalf but the flavors of open/seek/read used for onspaces are slightly different from those used for the archive restore. If that program becomes necessary to narrow things down we'll send it to you Mark and hope you can test it on the same kind of device that gave you the original problem. It does not write to the device; it only reads--same with the chunk-verification code.
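    As a rough illustration of a read-only size test of this kind, here is a minimal sketch.  It is not the code that the server, onspaces, or ontape actually uses.  The function name is hypothetical, and the 2 KB page size is an assumption.  It opens the path read-only and tries to read the last page that the chunk would occupy:

```python
import os


def last_page_readable(path: str, offset_kb: int, size_kb: int,
                       page_size: int = 2048) -> bool:
    """Try to read the final page of a would-be chunk without writing.

    Returns True only if a full page can be read at the position where
    the chunk's last page would sit.
    """
    end = (offset_kb + size_kb) * 1024
    start = end - page_size
    if start < 0:
        raise ValueError("chunk is smaller than one page")
    fd = os.open(path, os.O_RDONLY)  # read-only: nothing is written
    try:
        try:
            data = os.pread(fd, page_size, start)
        except OSError:
            # Some devices report an error instead of a short read.
            return False
        return len(data) == page_size
    finally:
        os.close(fd)
```

    This has only been reasoned through for ordinary files.  Raw character devices may require aligned buffers and offsets, so treat it as a starting point rather than a diagnostic tool.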

    Thanks.

    -jc

    ------------------------------
    John Lengyel
    ------------------------------



  • 7.  RE: discrepancy in how Informix checks disk space

    Posted Mon October 25, 2021 10:55 AM
    Thanks JC.
     
    Scott Pickett
    IBM Informix WW Technical Sales IBM Expert Labs
    IBM Informix WW Cloud Technical Sales IBM Expert Labs
    IBM Informix WW Cloud Technical Sales ICIAE IBM Expert Labs
    IBM Informix WW Informix Warehouse Accelerator Sales IBM Expert Labs
    Boston, Massachusetts USA
    spickett@us.ibm.com
    617-899-7549
    33 Years Informix User
     

  • 8.  RE: discrepancy in how Informix checks disk space

    Posted Mon October 25, 2021 11:03 AM
    JC,

    I will pass this along when I hear from support.  Glad to hear that this was not expected behavior.



    Mark

    ------------------------------
    Mark Collins
    ------------------------------



  • 9.  RE: discrepancy in how Informix checks disk space

    Posted Mon October 25, 2021 11:35 AM
    I'm not sure whether JC is referring to the stat(2) or the stat(3p) or some other flavor of the stat() function, and I have not dug deep enough into the respective man pages to see all of the various flags that can be returned by each.  I did run the stat command from the command line, and when I use the symbolic link, I get:

    [informix@my_server links]$ stat prod_raw17
    File: prod_raw17 -> /dev/raw/raw17
    Size: 14 Blocks: 0 IO Block: 4096 symbolic link
    Device: fd07h/64775d Inode: 1022 Links: 1
    Access: (0777/lrwxrwxrwx) Uid: ( 1001/informix) Gid: ( 1001/informix)
    Context: unconfined_u:object_r:unlabeled_t:s0
    Access: 2021-10-25 11:23:17.029730169 -0400
    Modify: 2021-10-01 12:07:09.971093472 -0400
    Change: 2021-10-01 12:07:09.971093472 -0400
    Birth: -


    Using the actual raw device name that the symlink resolves to gives me:

    [informix@my_server links]$ stat /dev/raw/raw17
    File: /dev/raw/raw17
    Size: 0 Blocks: 0 IO Block: 4096 character special file
    Device: 6h/6d Inode: 1241986 Links: 1 Device type: a2,11
    Access: (0660/crw-rw----) Uid: ( 1001/informix) Gid: ( 1001/informix)
    Context: system_u:object_r:fixed_disk_device_t:s0
    Access: 2021-10-23 18:37:08.267013261 -0400
    Modify: 2021-10-21 15:50:45.714281695 -0400
    Change: 2021-10-21 15:50:45.714281695 -0400
    Birth: -


    Since the second version states "character special file", I would hope that the stat() function would return S_IFCHR rather than S_IFREG.
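    One way to check the file type bits of st_mode directly is a short script like the one below.  This is a minimal sketch using Python's os and stat modules, and the function name is hypothetical.  With follow_symlinks=False it behaves like lstat() and reports on the link itself; otherwise it reports on the target:

```python
import os
import stat


def file_kind(path: str, follow_symlinks: bool = True) -> str:
    """Describe the file type encoded in st_mode for the given path."""
    mode = os.stat(path, follow_symlinks=follow_symlinks).st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link"            # S_IFLNK, only seen without following
    if stat.S_ISCHR(mode):
        return "character special file"   # S_IFCHR
    if stat.S_ISBLK(mode):
        return "block special file"       # S_IFBLK
    if stat.S_ISREG(mode):
        return "regular file"             # S_IFREG
    if stat.S_ISDIR(mode):
        return "directory"
    return "other"


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(name, "->", file_kind(name), "/ link itself:",
              file_kind(name, follow_symlinks=False))
```

    Run against the symlink, this should show the type of the device the link resolves to, alongside the type of the link itself.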


    ------------------------------
    Mark Collins
    ------------------------------



  • 10.  RE: discrepancy in how Informix checks disk space

    Posted Tue October 26, 2021 02:01 AM

    Hello Mark,

    you can find out what stat()/fstat()/lstat() returns with:

    $ strace stat prod_raw17

    I assume something like this:

    ..

    lstat("prod_raw17", {st_mode=S_IFLNK|0777, st_size=9, ...}) = 0
    fstat(1, {st_mode=S_IFCHR|0660, st_rdev=makedev(0x4, 0x3), ...}) = 0
    ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
    readlink("prod_raw17", "/dev/raw/raw17", 10) = 9

    ..

    HTH,

    Markus



    ------------------------------
    Markus Holzbauer
    ------------------------------



  • 11.  RE: discrepancy in how Informix checks disk space

    Posted Wed October 27, 2021 07:04 PM
    Markus,

    Thanks.  I always forget about strace.  I took the harder route and wrote a short C program to do the stat() call and report the value of st_mode.



    Mark

    ------------------------------
    Mark Collins
    ------------------------------



  • 12.  RE: discrepancy in how Informix checks disk space

    Posted Wed October 27, 2021 07:15 PM
    OK, this one was NOT an Informix issue after all.  

    When our sys adm created the logical volumes, he did it in one order on one machine, and a different order on the second machine.  Thus, lvol9 pointed to /dev/raw/raw17 on one system and /dev/raw/raw26 on the other.  I took him at his word that the logical volumes were created the same on both servers, and they were, in that lvol9 was 24GB on both servers.  But because he said they were created the same, I built my symbolic links the same on both servers, so that the same symlink pointed to the same /dev/raw/rawxx files.  

    Thus, the symlink db_raw17 pointed to /dev/raw/raw17 on both servers.  But on server_a, /dev/raw/raw17 pointed to a 25GB device, and on server_b it pointed to a 24GB device.

    So when I accidentally transposed the digits for the onspaces size parameter on server_a, it exceeded 24GB, but still fit on the 25GB device.  Onspaces did not report an error because there was no error.  When attempting to restore the level 0 from server_a onto server_b, ontape reported the error because it now was pointing to a 24GB device.
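    A mismatch like this can be caught before a restore by comparing, for each symlink name, the size of the device it resolves to on each server.  Below is a minimal sketch with hypothetical function names.  It assumes that seeking to the end of the device reports its size, which may not hold for every device type (some may report 0, in which case a tool such as blockdev would be needed to get the size):

```python
import os


def size_in_bytes(path: str) -> int:
    """Size found by opening read-only and seeking to the end."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)


def link_report(link_dir: str) -> dict:
    """Map each symlink name to (resolved target, size in bytes)."""
    report = {}
    for name in sorted(os.listdir(link_dir)):
        path = os.path.join(link_dir, name)
        if os.path.islink(path):
            report[name] = (os.path.realpath(path), size_in_bytes(path))
    return report


def size_mismatches(report_a: dict, report_b: dict) -> list:
    """Names present in both reports whose device sizes differ."""
    common = sorted(report_a.keys() & report_b.keys())
    return [name for name in common if report_a[name][1] != report_b[name][1]]
```

    The report would be generated on each server (for example, saved as JSON and copied across) and then compared.  Comparing sizes rather than target names matters here, because the same /dev/raw/rawxx name referred to devices of different sizes on the two machines.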

    Many thanks to JC for his assistance in investigating this one.


    ------------------------------
    Mark Collins
    ------------------------------