Informix 14.10.FC6, RHEL 8.4 on x86 hardware
I ran into an interesting situation yesterday that revealed a troubling discrepancy in how Informix checks available space.
I have two servers that are supposed to be identical, so that one can be used as a DR server. The DR server will be kept closely synchronized via Continuous Log Restore. Both of these servers have 500GB allocated in two 250GB virtual drives. Those virtual drives are managed by LVM, which divides each of them into 10 raw logical volumes. Our system admin says that the system takes a little bit of the virtual drive for overhead, so we have nine 25GB raw lvols and one 24GB raw lvol on each of these virtual drives, for a total of 249GB each.
I have confirmed that all of these drives are configured the same between the two servers. I created symbolic links on both machines, and then created an instance on the primary server. I allocated dbspaces and chunks on 18 of the 20 raw logical volumes, and loaded data into the instance. I then performed a level 0 archive using ontape, and transferred the resulting archive file to the DR server.
This is where the fun begins.
When I attempted to restore from the level 0 archive, I received an error message stating that chunk xx could not fit on the device that was specified (don't have the exact error message available to copy here).
I once again confirmed that all of the symbolic links pointed to the same devices on both machines. I confirmed that the devices themselves were the same on both servers. The raw volume in question was one of the two 24GB volumes. I started checking more closely, and found that I had transposed a couple of digits when specifying the chunk size with the onspaces command. Given the combination of offset and chunk size, the chunk exceeded the 24GB limit of the logical volume. And yet the instance had allowed me to create it.
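To make the arithmetic concrete, here is a minimal sketch of the check I would have expected. The function name and the example numbers are made up for illustration (they are not my actual values), and it assumes the offset and size are in KB, the units onspaces takes for -o and -s.

```python
KB_PER_GB = 1024 * 1024


def chunk_fits(offset_kb, size_kb, device_kb):
    """Return True if a chunk starting at offset_kb and spanning size_kb
    ends at or before the end of a device of device_kb kilobytes."""
    if offset_kb < 0 or size_kb <= 0:
        raise ValueError("offset must be >= 0 and size must be > 0")
    return offset_kb + size_kb <= device_kb


# Hypothetical numbers for a 24GB logical volume.
device_kb = 24 * KB_PER_GB

# A chunk that exactly fills the volume fits.
print(chunk_fits(0, 24 * KB_PER_GB, device_kb))

# The same size with a 1MB offset runs past the end of the volume.
print(chunk_fits(1024, 24 * KB_PER_GB, device_kb))
```

The point is only that the test is a single addition and comparison once the device size is known.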
Fortunately, there was no data in that chunk, so I was able to drop it and re-create it with the proper chunk size.
So the discrepancy is that onspaces does not perform this basic check of available disk space when it creates a chunk, but ontape does. This leads to several questions, such as:
- When does Informix write the page header for a page? I think that it occurs the first time the page is written, but it might be when an extent is created. The point is, I would expect an error of some sort when it tries to write the page, whether that happens at extent creation or the first time that it attempts to add a new page that exceeds the capacity of the logical volume.
- What would happen if the instance allocated an extent in this chunk, such that the extent started before the 24GB limit of the lvol, but extended above that limit? Would it fail at the time that it created the extent? Or would the extent get created, and the database start inserting data into that extent, suddenly failing when it hit this hard limit? Basically restating the first question, slightly differently.
- When the instance does find that it cannot write the new page, whether at extent creation time or on an INSERT statement, what does it do? Crash? Take the chunk "offline"? Return an error to the application performing the INSERT?
- What would happen if the instance attempted to allocate an extent beyond the 24GB hard limit? The instance thinks the chunk has space, but the OS would slap it down, so what does Informix do? Does the instance crash? Does the chunk get marked "offline"? Does it just return an error to the application that caused the instance to attempt to create the new extent?
- Shouldn't onspaces perform a check similar to the one performed by ontape, so that it identifies a situation such as this?
- I know that with cooked files, the instance will expand the file to the size specified in the onspaces command, so that it reserves the space and prevents other users from filling up the disk before the instance actually gets around to using that space. It would seem reasonable to expect the instance to at least check raw devices to ensure that they are large enough to accommodate the chunk when a new chunk is created.
- Not a question, but the obvious observation that, if I hadn't had a DR server to test this restore, I could have created a level 0 archive that could not be restored to the same server and devices from which the archive was performed. This is a major problem.
- Does anyone know whether previous versions of Informix perform this sort of checking? Has something changed in 14.10.FC6 that broke this check?
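Until there is an answer, a pre-flight check run by hand before onspaces might catch this. The following is only a sketch under assumptions: the helper names are hypothetical, it assumes the chunk path (or the symbolic link) resolves to a Linux block device or a regular file where seeking to the end reports the size, and it again assumes offset and size in KB.

```python
import os


def device_size_bytes(path):
    """Return the size in bytes of the file or block device at path.

    Assumption: seeking to the end reports the size, which holds for
    regular files and Linux block devices. os.open follows symbolic links.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)


def chunk_fits_device(path, offset_kb, size_kb):
    """Compare the planned end of a chunk against the measured device size.

    Returns a tuple (fits, chunk_end_bytes, device_bytes).
    """
    chunk_end_bytes = (offset_kb + size_kb) * 1024
    device_bytes = device_size_bytes(path)
    return chunk_end_bytes <= device_bytes, chunk_end_bytes, device_bytes
```

For example, calling chunk_fits_device with the link path and the same -o and -s values intended for onspaces would return False in the first element of the tuple for the chunk described above. Reading the device usually requires running as a user with read permission on it, such as informix or root.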
I've been fortunate enough never to have inadvertently created this situation before, but it seems like something that someone, somewhere, has likely done. Thus, the database should try to save us from ourselves.
------------------------------
Mark Collins
------------------------------
#Informix