Informix


Changing from raw to cooked

  • 1.  Changing from raw to cooked

    Posted Mon February 23, 2026 03:27 PM
    As many of you know, RH dropped support for raw devices starting in RH9.  I know, I know - they've dropped support in the past and then added it back in.  This time, so far as I know, they have not added it back.  One source claimed that Linus finally got his way and support for raw was removed from the Linux kernel that is used in RH9 and later.  That would imply that other Linux distributions also will lose support for raw devices as they migrate to that version (or later) of the kernel.
     
    So we're going to have to migrate from raw to cooked.  Some of our instances will be migrated by doing a level 0 archive of the raw environment, shutting down, creating cooked chunks, and restoring to those cooked chunks.  Other instances will create mirror chunks on cooked files, then swap raw primary and cooked mirror chunks, then drop the raw chunks.
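
    In ontape terms, the first approach is roughly this (assuming ontape rather than onbar; paths and tape parameters omitted):

    ontape -s -L 0     # level-0 archive of the raw instance
    # shut down, create the empty cooked chunk files, then:
    ontape -r          # cold restore into the cooked chunks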
     
    I've viewed Art Kagel's presentation on the various Linux file system types.  Based on that, it looks like XFS is the best choice we have on RH9.
     
    On raw devices, we frequently had multiple chunks on the same device, using the '-o' option in onspaces to specify an offset.  We will not be doing that when we move to cooked, as we'll be using a separate cooked file for each chunk.
     
    My questions today are about the best practices for cooked chunks.  First, how many file systems should we create?  Does it matter if we use just one file system for all Informix cooked chunks, or should we create multiple?  If multiple, how many?  Does it vary based on the number of chunks, or the total size of the file system, or one file system per dbspace, or some other factor?  Does Linux spread I/O across multiple file systems in some way?
     
    Are there specific parameters that need to be used for mkfs.xfs when setting up the file systems?  Does the block size need to match the dbspace page size?  Does the sector size need to match the page size?  Is there any performance benefit to specifying fewer inodes (assuming we'll only have a few dozen chunks in the file system)?  Do we need to specify any '-m' (global metadata options), '-d' (data section options), or '-l' (log section options)?  I need to know what to tell our Linux sys admin if we need anything other than the defaults.
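
    For example, I'm guessing at something like the following (device name made up), but I don't know whether any of it is right or whether the defaults are fine:

    mkfs.xfs -b size=2048 -s size=512 /dev/vgdata/lv_ifmx_chunks   # block size matched to a 2k dbspace pagesize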
     
    Next up - when it comes time to actually allocate a cooked file for a chunk, I know that if the instance is running and we run 'onspaces -c -d dbspace_name -p /path/to/cooked/file -o 0 -s 5000000', Informix will create the file and fill it up to the specified size.  For the situation where we are going to restore an archive to the cooked files, is there any advantage to us manually filling the files to the correct size via 'dd' or some similar utility?  I'm thinking back to the days of DOS computers where files would start off small and then grow, leading to fragmentation on the disk, and I'm trying to prevent something like that from happening here.  I'm thinking that if the file is already the correct size when the restore runs, it will just use the existing space.
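
    In other words, something like this (path made up) before the restore runs:

    dd if=/dev/zero of=/informix/chunks/datadbs_1 bs=1024 count=5000000   # 5,000,000 KB, matching onspaces -s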
     
    Am I overthinking things here?  Everything these days is virtualized, with VMs using space that SAN administrators carve up and distribute to the various hosts.  With all of these layers of abstraction, it seems impossible to structure the layout for performance like we did in the old days.  Given that, should I just create the zero-byte files and let the OS and the SAN just decide where to put things?
     
    One last question - of the platforms that current versions of Informix run on, which ones are left that still support raw devices, now that Linux has eliminated them?
     
    Thanks in advance.


    ------------------------------
    mark collins
    ------------------------------


  • 2.  RE: Changing from raw to cooked

    Posted Mon February 23, 2026 04:07 PM
    Just addressing the question of pre-sizing the chunk files: yes, it used to be a good thing in my experience, but since EAR it is a waste of time, as the engine will just redo the work when 'clearing' the chunk

    Cheers
    Paul






  • 3.  RE: Changing from raw to cooked

    Posted Tue February 24, 2026 10:04 AM

    Hi Paul,

    Thanks for confirming what I thought I recalled from the distant past.  Good to know that it is no longer the case.

    Mark



    ------------------------------
    mark collins
    ------------------------------



  • 4.  RE: Changing from raw to cooked

    Posted Mon February 23, 2026 04:33 PM
    Mark:

    I'm going to respond to your questions and comments inline as much as I can:

    On raw devices, we frequently had multiple chunks on the same device, using the '-o' option in onspaces to specify an offset.  We will not be doing that when we move to cooked, as we'll be using a separate cooked file for each chunk.
    I'd agree with that idea.

    My questions today are about the best practices for cooked chunks.  First, how many file systems should we create?  Does it matter if we use just one file system for all Informix cooked chunks, or should we create multiple?  If multiple, how many?  Does it vary based on the number of chunks, or the total size of the file system, or one file system per dbspace, or some other factor?  Does Linux spread I/O across multiple file systems in some way?
     
    Are there specific parameters that need to be used for mkfs.xfs when setting up the file systems?  Does the block size need to match the dbspace page size?  Does the sector size need to match the page size?  Is there any performance benefit to specifying fewer inodes (assuming we'll only have a few dozen chunks in the file system)?  Do we need to specify any '-m' (global metadata options), '-d' (data section options), or '-l' (log section options)?  I need to know what to tell our Linux sys admin if we need anything other than the defaults.
    Since, as you note later in your post, all of the file systems will likely be carved from a single large array (hopefully not any kind of parity-based RAID), it doesn't matter in general. I would not go to one file system per chunk. There is one point, however: DIRECT_IO only works if the pagesize of the chunk is an even multiple of the block size of the file system. So, if you have any odd-pagesize dbspaces, you may need a separate file system for those with a matched block size.
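
    To illustrate the rule (mount point hypothetical):

    xfs_info /informix/chunks | grep bsize   # reports the file system block size
    # DIRECT_IO is safe when: dbspace_pagesize % fs_block_size == 0
    # e.g. 2K pages over 512, 1024, or 2048 byte blocks pass; over 4096 byte blocks they do not
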
    Informix assigns cleaner threads by chunk to get parallelism during checkpoint writes and other bulk IO, so plan accordingly, but, if your chunk layout is working, then don't go crazy. I can't comment on XFS configuration and tuning, nor how Linux handles IO with multiple file systems.

    Next up - when it comes time to actually allocate a cooked file for a chunk, I know that if the instance is running and we run 'onspaces -c -d dbspace_name -p /path/to/cooked/file -o 0 -s 5000000', Informix will create the file and fill it up to the specified size.  For the situation where we are going to restore an archive to the cooked files, is there any advantage to us manually filling the files to the correct size via 'dd' or some similar utility?  I'm thinking back to the days of DOS computers where files would start off small and then grow, leading to fragmentation on the disk, and I'm trying to prevent something like that from happening here.  I'm thinking that if the file is already the correct size when the restore runs, it will just use the existing space.
    OK, so during a restore, ontape or onbar will be opening each chunk for writing, not for rewriting, so the chunk file will be wiped to zero length. That means that initializing the chunk files yourself will not do anything in that case. In the case where you are creating mirror chunks, again, the chunk files will be opened for writing and so overwritten, by initially truncating the file and releasing all of its existing "extents" in the file system, so, again, no gain by writing to it first. The real key, especially in the case of the restore scenario, is to make sure that the file system is clean and empty, since the restore will write out the chunks one at a time, making them contiguous as long as the file system is empty. That said, if you are using SSD drives, all bets are off, since contiguousness does not matter (no sector latency) and rewriting existing storage sectors is accomplished by a copy-on-write mechanism (discussed in my video).

    Art


    Art S. Kagel, President and Principal Consultant
    ASK Database Management


    Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference.  Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.









  • 5.  RE: Changing from raw to cooked

    Posted Tue February 24, 2026 10:02 AM

    Hello Art,

    Thanks.  A couple of follow-up questions.  You say "DIRECT_IO only works if the pagesize of the chunk is an even multiple of the block size of the file system".  I want to confirm that the Informix pagesize has to be an even multiple, as opposed to an integer multiple.  Thus for a 2k pagesize, the block size should be either 512 or 1024 bytes, as a 2k block size would result in the pagesize being an odd multiple (1) of block size.  And obviously, a 4k block size would be completely inappropriate for a 2k pagesize.

    For dbspaces with 16k pagesize, is there any benefit to having a larger file system block size?  Perhaps a 4k blocksize?

    Next, " in the case of the restore scenario, is to make sure that the file system is clean and empty".  Does this mean that I should completely delete the cooked files prior to performing the restore?  Does the restore create the cooked files, or do I need to at least do a 'touch db_cooked_file' or 'cat /dev/null > db_cooked_file' prior to running the restore?

    Thanks.

    Mark



    ------------------------------
    mark collins
    ------------------------------



  • 6.  RE: Changing from raw to cooked

    Posted Tue February 24, 2026 10:28 AM
    Thanks.  A couple of follow-up questions.  You say "DIRECT_IO only works if the pagesize of the chunk is an even multiple of the block size of the file system".  I want to confirm that the Informix pagesize has to be an even multiple, as opposed to an integer multiple.  Thus for a 2k pagesize, the block size should be either 512 or 1024 bytes, as a 2k block size would result in the pagesize being an odd multiple (1) of block size.  And obviously, a 4k block size would be completely inappropriate for a 2k pagesize.

    By even, I wasn't "speaking" mathematically, but rather colloquially. So, by even, I just meant that having a block size of 3K for a 2K pagesize dbspace would not be good. But 512 byte or 1024 byte blocks would be OK, as would 2K blocks.


    For dbspaces with 16k pagesize, is there any benefit to having a larger file system block size?  Perhaps a 4k blocksize?

    Hmm, so for spindle drives there might be a small benefit to using a block size that is larger than the physical blocking of the drives (512 bytes on nearly all such drives). The bigger question is for SSD drives. I would be careful to make sure that the block size of the filesystem is either a multiple of the size of each chip's blocking or divides evenly into the chip's blocking (with no remainder). In the case where the FS block is smaller than the chip's block, it should not be too much smaller. These constraints are to avoid a block write causing more than one chip to be copied and to minimize copying the same chip's data multiple times.
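
    On Linux you can at least see what a drive reports about itself (device name hypothetical):

    cat /sys/block/nvme0n1/queue/logical_block_size    # block size the OS addresses
    cat /sys/block/nvme0n1/queue/physical_block_size   # block size the drive reports internally
    # neither exposes the flash erase-block (chip) size, which is what matters above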


    Next, " in the case of the restore scenario, is to make sure that the file system is clean and empty".  Does this mean that I should completely delete the cooked files prior to performing the restore?  Does the restore create the cooked files, or do I need to at least do a 'touch db_cooked_file' or 'cat /dev/null > db_cooked_file' prior to running the restore?

    No. Empty as in all of the chunk files are zero length and there is no other extraneous data on the file system. An archive restore will expect that all of the chunk files exist before the restore starts and will not create them for you.
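
    So, for each chunk, something like this before the restore (path hypothetical, with the usual Informix ownership and permissions):

    touch /informix/chunks/datadbs_1
    chown informix:informix /informix/chunks/datadbs_1
    chmod 660 /informix/chunks/datadbs_1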

    Art S. Kagel, President and Principal Consultant
    ASK Database Management


    Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference.  Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.









  • 7.  RE: Changing from raw to cooked

    Posted Mon February 23, 2026 06:09 PM
    Edited by ke chen Mon February 23, 2026 06:10 PM

    I suggest using the command below to fast-allocate an empty chunk file, to reduce the wait time while the ontape/onbar restore process zeroes it:

    fallocate -l <CHUNK_SIZE> <CHUNK_FILE_NAME>
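
    For example (chunk path hypothetical; note that onspaces -s takes KB while fallocate -l takes bytes):

    fallocate -l $((5000000 * 1024)) /informix/chunks/datadbs_1   # 5,000,000 KB
    chown informix:informix /informix/chunks/datadbs_1
    chmod 660 /informix/chunks/datadbs_1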



    ------------------------------
    ke chen
    ------------------------------



  • 8.  RE: Changing from raw to cooked

    Posted Tue February 24, 2026 10:05 AM

    Thanks for that suggestion.  Saves time over creating the file and then using 'dd' to fill it out.

    Mark



    ------------------------------
    mark collins
    ------------------------------



  • 9.  RE: Changing from raw to cooked

    Posted Tue February 24, 2026 04:39 AM

    Hi Mark,
    just my 2 cents ... you don't need to go for real cooked files in a filesystem. We've been using LVM logical volumes as Informix chunks (basically in the same way as raw devices) since RH6:

    [informix@<hostname>:~] $ uname -a
    Linux <hostname> 5.14.0-611.24.1.el9_7.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Jan 10 05:12:47 EST 2026 x86_64 x86_64 x86_64 GNU/Linux

    [informix@<hostname>:~] $ cat /etc/redhat-release
    Red Hat Enterprise Linux release 9.6 (Plow)

    [informix@<hostname>:~] $ onstat -d | grep rootdbs
    48da3028         1        0x1        1        1        2048     N  B     informix rootdbs_1
    48da3280         1      1      0          1048576    1040021               PO-B-- /opt/informix/devlink/1/rootdbs_1_1

    [informix@<hostname>:~] $ ll /opt/informix/devlink/1/rootdbs_1_1
    lrwxrwxrwx 1 informix informix 28 Aug  8  2025 /opt/informix/devlink/1/rootdbs_1_1 -> /dev/dginfordb01/rootdbs_1_1

    [informix@<hostname>:~] $ ll /dev/dginfordb01/rootdbs_1_1
    lrwxrwxrwx 1 root root 7 Feb  4 18:19 /dev/dginfordb01/rootdbs_1_1 -> ../dm-7

    [informix@<hostname>:~] $ ll /dev/dm-7
    brw-rw---- 1 informix informix 253, 7 Feb 24 10:25 /dev/dm-7

    From LVM perspective:
    [root@<hostname> ~]# vgs dginfordb01
      VG          #PV #LV #SN Attr   VSize     VFree
      dginfordb01   1  10   0 wz--n- <1000.00g <368.00g

    [root@<hostname> ~]# lvs dginfordb01
      LV                VG          Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
    ...

      llogdbs_1_1_1     dginfordb01 -wi-ao----   6.00g
      llogdbs_1_2_1     dginfordb01 -wi-ao----   6.00g
      plogdbs_1         dginfordb01 -wi-ao----   4.00g
      rootdbs_1_1       dginfordb01 -wi-ao----   2.00g
      sysdbs_1_1        dginfordb01 -wi-ao----   2.00g
      tempdbs_1_1_1     dginfordb01 -wi-ao----   6.00g
      tempdbs_1_2_1     dginfordb01 -wi-ao----   6.00g

    And the LVs/chunks are automatically opened with the O_DIRECT flag:

    [informix@<hostname>:~] $ onstat -g glo | grep "^ 1 .*cpu"
     1     2213      cpu         1132.55   1825.31   2957.86   18450.26  16%

    [root@<hostname> ~]# ll /proc/2213/fd | grep dm-7
    lrwx------ 1 root root 64 Feb  4 18:20 257 -> /dev/dm-7

    [root@<hostname> ~]# cat /proc/2213/fdinfo/257
    pos:    0
    flags:  0150002 (i.e. O_LARGEFILE | O_DIRECT | O_DSYNC | O_RDWR as per /usr/include/asm-generic/fcntl.h )
    mnt_id: 22
    ino:    640

    HTH, -tz-



    ------------------------------
    -tz-
    ------------------------------



  • 10.  RE: Changing from raw to cooked

    Posted Tue February 24, 2026 10:17 AM

    Hello Tomas,

    If I understand, you're simply creating logical volumes and then allocating a chunk directly to the block device file, rather than creating a file system and using cooked files within the file system.  Is that correct? 

    And it looks like you create LVs of differing sizes to accommodate chunks that are different sizes as well.  Do you place a single chunk on each LV?  If so, do you make the chunk the same size as the LV?  Or do you leave some space to allow the chunk to expand if it fills? 

    If your instance grows over time, rather than adding more LVs to a file system to increase its size, you plan to simply create another LV and place a new chunk there?

    Thanks.

    Mark



    ------------------------------
    mark collins
    ------------------------------



  • 11.  RE: Changing from raw to cooked

    Posted Wed February 25, 2026 03:58 AM

    Hi Mark,

    your understanding is correct - while creating a dbspace and/or adding a chunk we use the block device directly (albeit via several symlinks); we don't use extendable chunks, so yep, one LV per chunk; offset=0 and the size of the chunk equals the size of the LV (but I believe using offsets would work as well; from my experience the LV behaves exactly like a raw partition). And we add new LVs (as new chunks) if needed.
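
    For example, adding a new 6 GB chunk goes roughly like this (LV and dbspace names hypothetical, VG as above):

    lvcreate -L 6G -n datadbs_1_2 dginfordb01
    ln -s /dev/dginfordb01/datadbs_1_2 /opt/informix/devlink/1/datadbs_1_2
    chown informix:informix /dev/dginfordb01/datadbs_1_2    # resolves to the dm-N node
    onspaces -a datadbs_1 -p /opt/informix/devlink/1/datadbs_1_2 -o 0 -s 6291456   # size in KB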

    -tz-



    ------------------------------
    -tz-
    ------------------------------



  • 12.  RE: Changing from raw to cooked

    Posted Wed February 25, 2026 09:51 AM

    Hello Tomas,

    Thanks for confirming.

    I'm working through in my mind the various pros and cons of such an approach.  I would presume that you gain a bit of efficiency in that you have the entire LV available for Informix use.  There would be no superblock and no inode table, and no space consumed by the actual directory file.  And since it is not a file system, you do not have to mount it.  And since it's not mounted, no one can see the files using 'ls'.  

    Basically, it's like using raw devices, but using the block device file rather than the character device.

    I am trying to recall the reason that we were told way back when to use the character device rather than the block device.  Was it to avoid OS buffering duplicating the Informix buffer cache?  If so, then DIRECT_IO (O_DIRECT) would eliminate that concern.

    I'm trying to see if there are any negatives to this approach.

    Thanks.

    Mark



    ------------------------------
    mark collins
    ------------------------------



  • 13.  RE: Changing from raw to cooked

    Posted Thu February 26, 2026 11:54 AM

    Hi Mark,

    as I've said, we've been using the LVs this way since RH6 times, i.e. for 10+ years (and maybe even longer; I joined the company 10 years ago). And we have a few hundred instances ... I'm not saying there are no caveats or cons, but we haven't noticed any so far.

    -tz-



    ------------------------------
    -tz-
    ------------------------------



  • 14.  RE: Changing from raw to cooked

    Posted Fri February 27, 2026 10:40 AM

    Just take your time with this.   It took me 6 months to get all the scripts in place and working to do the same raw-->cooked transition last year, when SUSE did the same thing in their OS.

    Use the built-in Informix MIRROR command to greatly simplify this.   The process was recommended here by someone (probably Art).   My process was (roughly):

    1 - create cooked chunks

    2 - MIRROR the raw chunks to the cooked ones

    3 - Swap the primary to be the cooked inside the engine

    4 - drop the raw chunk (which at that point the engine considers the MIRROR)

    Repeat over and over.   The problem is that each space/chunk is different, especially if you have spaces fragmented across chunks.   It was possible for me to consolidate all the fragmented chunks into one during this process.   I don't remember how I did it, but I did.   We are now entirely 1-to-1 chunk-to-space.   It required a big process of scripts unique to each space.
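
    For a single space it looked roughly like this (space/chunk names and paths hypothetical; step 3 was the part we had to script specifically to our environment):

    # 1 - create the empty cooked chunk file
    touch /informix/chunks/datadbs_1
    chown informix:informix /informix/chunks/datadbs_1 && chmod 660 /informix/chunks/datadbs_1
    # 2 - mirror the raw chunk onto the cooked file
    onspaces -m datadbs -p /dev/raw_datadbs_1 -o 0 -m /informix/chunks/datadbs_1 0
    # 3 - swap primary and mirror inside the engine (site-specific scripting)
    # 4 - end mirroring, which drops what the engine now treats as the mirror (the raw chunk)
    onspaces -r datadbs -y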

    I also had to involve IBM directly, as we had a bug that prevented any of it from working in a couple of spaces.  IBM has since fixed the issue, but if the space was created in any version older than 14.1 and has a 12k page size, there's a good chance you will run into the bug and require IBM on the system to migrate those spaces.



    ------------------------------
    Jared Heath
    ------------------------------