AIX


NFS specific disk slow writes

  • 1.  NFS specific disk slow writes

    Posted Mon October 07, 2024 08:59 PM

    Hi,

    I ran into an issue with NFS. The NFS server is AIX, and the shared folder is on a NetApp-branded Toshiba HDD. Writing to that share from NFS clients (no matter which OS) is very slow; reading, however, is fine. Writing to shares on other HDDs is also fine, so this specific disk seems to be the problem.
    There's one peculiarity about the disk in question: I had to set max_transfer to 0x100000 or higher. Otherwise read and write speeds were very bad, so I assume some additional setting is needed for that specific disk to work well with NFS as well.
    If I flip things, i.e. AIX is the NFS client getting stuff from another NFS server, writing to the disk in question is fine. So the problem only affects writing to the disk in question from an NFS client.

    Here's what iostat tells during the slow writing:

    hdisk8          xfer:  %tm_act      bps      tps      bread      bwrtn
                             75.5      6.4M   193.5       28.7K       6.4M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              7.0     54.9      0.2    234.4           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            186.5     81.6      2.1    237.6           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              1.1      0.0     40.7      0.0       15.0        12.0
    --------------------------------------------------------------------------------

    hdisk8          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.0     10.1M   286.0       14.3K      10.1M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              3.5    143.3      0.2    287.1           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            282.5    104.8      2.1    292.4           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              3.5      0.0     90.6      3.0       51.0        45.5
    --------------------------------------------------------------------------------

    hdisk8          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.5     10.3M   260.0       14.3K      10.3M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              3.5    197.6      0.2    287.1           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            256.5    124.8      2.1    292.7           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              9.1      0.0     90.6      2.0       23.0        56.0
    --------------------------------------------------------------------------------

    hdisk8          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.5      9.8M   275.0       59.4K       9.7M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                             14.5     53.0      0.2    291.4           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            260.5     93.0      1.7    292.7           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              5.8      0.0     93.5      4.0       38.0        42.0
    --------------------------------------------------------------------------------

    hdisk8          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.5      8.6M   290.0      133.1K       8.4M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                             32.5     28.7      0.2    291.4           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            257.5     22.9      1.6    292.7           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0     93.5      0.0        4.0         0.0
    --------------------------------------------------------------------------------

    hdisk8          xfer:  %tm_act      bps      tps      bread      bwrtn
                            100.0      9.8M   287.5       55.3K       9.7M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                             13.5     44.8      0.2    291.4           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            274.0     81.3      1.6    292.7           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              3.3      0.0     93.5      3.0       24.0        43.5
    --------------------------------------------------------------------------------

    I also tried increasing nfs_max_write_size and nfs_max_read_size on the NFS server, but that didn't help.
    Network-wise I have:

    tcp_recvspace=1048576
    tcp_sendspace=1048576
    udp_recvspace=655360
    udp_sendspace=65536

    I'm not sure whether the network settings matter, since shares from other disks are fine.
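    For reference, here's a sketch of how the tunables mentioned above can be inspected on the AIX NFS server. This is illustrative only; the values shown are examples, not recommendations:

    ```shell
    # Display the current network and NFS tunables (AIX)
    no -o tcp_recvspace -o tcp_sendspace
    no -o udp_recvspace -o udp_sendspace
    nfso -o nfs_max_write_size -o nfs_max_read_size

    # A persistent change would look like this (example value only):
    # no -p -o tcp_sendspace=1048576
    ```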


    Anyway, if somebody has any idea, it would be welcome!



    ------------------------------
    jack smith
    ------------------------------


  • 2.  RE: NFS specific disk slow writes

    Posted Tue October 08, 2024 02:47 AM

    Hi Jack,

    Let's start with the easy questions.

    • How is the disk connected to the system? I assume it is a local disk, but since nobody knows your systems better than you, please provide a bit more information about your setup.
    • Can you please send the output of lscfg -vl hdiskX, where hdiskX is the disk in question?
    • Can you please send the output of lsattr -El hdiskX for the same disk?
    • If it is a local disk, can you please send the output of lsslot -c pci?

    What I see from your output is that the service queue of the disk is regularly full. To me that means the simplest solution would be to increase the queue depth.

    - Check which values it can accept:

    lsattr -Rl hdiskX -a queue_depth

    - Check with the disk vendor. If the disk was supplied by IBM, open a case with IBM support and ask which value you should set

    - Set the proposed value

    chdev -l hdiskX -a queue_depth=X

    Depending on your hardware configuration, this might help you overcome the problem.
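    Putting the steps above together, a possible sequence looks like this (hdisk8 stands in for the disk in question; the final queue_depth value should still come from the vendor or IBM support):

    ```shell
    # Watch the extended drive statistics; a non-zero "sqfull" while the
    # load runs confirms the service queue is overflowing
    iostat -D hdisk8 2 5

    # List the queue_depth values the driver will accept
    lsattr -Rl hdisk8 -a queue_depth

    # Apply the new depth; -P defers the change until the next reboot
    # in case the disk is currently in use
    chdev -l hdisk8 -a queue_depth=64 -P
    ```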



    ------------------------------
    Andrey Klyachkin

    https://www.power-devops.com
    ------------------------------



  • 3.  RE: NFS specific disk slow writes

    Posted Tue October 08, 2024 09:46 AM

    Based on the iostat data, your hdisk service queues are frequently full. Did you adjust hdisk queue_depth from the default to a higher value?



    ------------------------------
    Ralf Schmidt-Dannert
    ------------------------------



  • 4.  RE: NFS specific disk slow writes

    Posted Tue October 08, 2024 03:02 PM

    Thanks for the replies!

    Initially AIX set the queue_depth for that disk to 3. I changed it to 16 and later to 64; that's what was set when I ran iostat.

    The disk in question is a VIOS disk but fully (physically) assigned to the AIX LPAR where I'm having the mentioned problem. Here is the output from the VIOS:

    # lscfg -vl hdisk2
      hdisk2           U78CB.001.WZS00VE-P2-D11  Other SAS Disk Drive

            Manufacturer................NETAPP
            Machine Type and Model......X423_TAL13900A10
            ROS Level and ID............4E413031
            Hardware Location Code......U78CB.001.WZS00VE-P2-D11

    # lsattr -El hdisk2
    clr_q         no                               Device CLEARS its Queue on error True
    max_transfer  0x200000                         Maximum TRANSFER Size            True
    pvid          00f9433888d2791b0000000000000000 Physical volume identifier       False
    q_err         yes                              Use QERR bit                     True
    q_type        simple                           Queuing TYPE                     True
    queue_depth   64                               Queue DEPTH                      True
    reassign_to   120                              REASSIGN time out value          True
    rw_timeout    30                               READ/WRITE time out value        True+
    start_timeout 60                               START unit time out value        True
    ww_id         50000396a83a54b0                 World Wide Identifier            False

    # lsslot -c pci
    # Slot                    Description                                      Device(s)
    U78CB.001.WZS00VE-P1-C11  PCI-E capable, Rev 3 8x lane slot with 8x lanes  ent0 ent1 ent2 ent3


    And the same from the target LPAR:

    # lscfg -vl hdisk8
      hdisk8           U8284.22A.214338V-V2-C3-T1-L8200000000000000  Virtual SCSI Disk Drive

    # lsattr -El hdisk8
    PCM             PCM/friend/vscsi                 Path Control Module          False
    algorithm       fail_over                        Algorithm                    True
    encrypt_enabled no                               Encryption state of disk     False
    encrypt_md_loc  none                             Encryption metadata location False
    hcheck_cmd      test_unit_rdy                    Health Check Command         False
    hcheck_interval 0                                Health Check Interval        True+
    hcheck_mode     nonactive                        Health Check Mode            True+
    max_transfer    0x100000                         Maximum TRANSFER Size        True
    pvid            00f9433888dfb0f80000000000000000 Physical volume identifier   False
    queue_depth     64                               Queue DEPTH                  True+
    reserve_policy  no_reserve                       Reserve Policy               True+
    rw_timeout      45                               Read/Write Timeout Value     True+

    # lsslot -c pci
    # Slot                   Description                                        Device(s)
    U78CB.001.WZS00VE-P1-C6  PCI-E capable, Rev 3 16x lane slot with 16x lanes  sissas1



    ------------------------------
    jack smith
    ------------------------------



  • 5.  RE: NFS specific disk slow writes

    Posted Fri October 11, 2024 10:18 AM

    Looking at iostat, the average IO size written to the disk is small, only about 40 KB ... why would max_transfer have to be increased to 0x100000 == 1MB?

    What is also unusual is the minimum IO write service time of > 2ms for a ~40 KB IO ... are we talking "spinning disk" at the backend? The maximum observed IO service times are very bad at > 292ms - you may want to reset the iostat counters and re-run to validate - but I see in the data you provided that the maximum slightly increased in at least one interval ... so it is not just a historical value.

    You did not specify what type of writes you are driving from the NFS client. Large file write or "many small" writes?

    Did you try to replicate the same write characteristics directly in the AIX NFS server LPAR - so basically taking NFS / network out of the picture?

    How is the exported file system mounted in the AIX NFS server?

    Are there any errors in AIX error log on the AIX NFS server?
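    For example, the error log could be checked along these lines (hdisk8 assumed here as the suspect disk):

    ```shell
    errpt | head -20        # summary of the newest entries
    errpt -a -N hdisk8      # detailed entries for the suspect disk, if any
    errpt -d H              # hardware-class errors only
    ```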



    ------------------------------
    Ralf Schmidt-Dannert
    ------------------------------



  • 6.  RE: NFS specific disk slow writes

    Posted Fri October 11, 2024 11:42 AM
    Edited by jack smith Fri October 11, 2024 11:42 AM

    > why would max_transfer have to be increased to 0x100000 == 1MB?
    As mentioned already: "Otherwise read and write speeds were very bad".
    For example, at the default of 0x40000 the transfer speeds were stuck at 20 MB/s. That is for local usage, not over the network.

    > are we talking "spinning disk" at the backend?
    As mentioned already: "Toshiba HDD".

    > Large file write or "many small" writes?
    Single files between 2MB and 200MB. One file at a time, not a bunch of files with a single command/transfer.

    > taking NFS / network out of the picture?
    As mentioned already: "If AIX is the NFS client getting stuff from another NFS server, writing to the disk in question is fine. So the problem only affects writing to the disk in question from an NFS client.".
    So to rephrase: after setting max_transfer to 0x100000 or higher, everything worked fine except for writes from NFS clients, as described.

    > How is the exported file system mounted in the AIX NFS server?
    As a local LV with: rw,noatime,log=NULL

    > Are there any errors in AIX error log on the AIX NFS server?
    None.


    For the record, this is an AIX-specific problem. Using the disk in question with a SUSE LPAR on the same machine works fine by default, with no need for any special settings.



  • 7.  RE: NFS specific disk slow writes

    Posted Fri October 11, 2024 12:54 PM

    So, did you compare iostat data for the following 2 scenarios?

    1) AIX NFS server receiving writes from an NFS client over the network

    2) Local AIX process writing data to the AIX file system (preferably from a local FS like /tmp) 

    As I stated earlier, your iostat data shows an average IO size of only ~40 KB to disk - but those are averages - far below even the default max_transfer of 0x40000 (256 KB).

    Of interest here is whether IO rates, IO transfer sizes, IO service times, and service-queue-full events are dramatically different between the two test cases.

    If you are adventurous, do a test with a "dio" mount for that file system and evaluate IO sizes and behavior as well. This will take the JFS2 file cache out of the picture.

    I assume you are monitoring during your tests whether and how the JFS2 file cache is being utilized?
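    A sketch of such a dio test might look like this (file system, LV, and file names are placeholders):

    ```shell
    # Remount the exported file system with direct IO, bypassing the
    # JFS2 file cache
    umount /testfs
    mount -o rw,noatime,dio /dev/testlv /testfs

    # Re-run the write test and watch the disk at the same time
    cp /tmp/bigfile /testfs/ &
    iostat -D hdisk8 2
    ```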



    ------------------------------
    Ralf Schmidt-Dannert
    ------------------------------



  • 8.  RE: NFS specific disk slow writes

    Posted Fri October 11, 2024 04:19 PM

    Thanks for the additional pointers. Mounting with dio was interesting:

    hdisk8          xfer:  %tm_act      bps      tps      bread      bwrtn
                             85.0     17.7M   135.5        0.0       17.7M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              0.0      0.0      0.2     96.9           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            135.5      6.4      3.5     17.7           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0         0.0
    --------------------------------------------------------------------------------

    hdisk8          xfer:  %tm_act      bps      tps      bread      bwrtn
                             97.0     20.4M   157.5       32.8K      20.4M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              2.0     10.1      0.2     96.9           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            155.5      6.4      3.5     17.7           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0         0.0
    --------------------------------------------------------------------------------

    hdisk8          xfer:  %tm_act      bps      tps      bread      bwrtn
                            100.0     20.4M   158.0       32.8K      20.4M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              2.0      9.6      0.2     96.9           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            156.0      6.4      3.5     17.7           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        1.0         0.0
    --------------------------------------------------------------------------------

    hdisk8          xfer:  %tm_act      bps      tps      bread      bwrtn
                             96.0     18.0M   172.5      448.5K      17.5M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                             33.0      9.4      0.2     96.9           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            139.5      7.4      0.9     23.4           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0         0.0
    --------------------------------------------------------------------------------

    hdisk8          xfer:  %tm_act      bps      tps      bread      bwrtn
                             84.5     18.2M   184.5      688.1K      17.6M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                             47.5      6.6      0.2     96.9           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            137.0      6.7      0.9     23.4           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        1.0         0.0
    --------------------------------------------------------------------------------

    hdisk8          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.0     18.4M   278.5        1.6M      16.8M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                            150.0      7.3      0.1    109.0           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            128.5      7.7      0.9    109.1           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        2.0         0.0
    --------------------------------------------------------------------------------

    This shows a local write to the disk in question, copying from a ramdisk. When it's mounted with dio I get the same bad transfer rates as I had before changing max_transfer. So apparently the max_transfer change only helped the cached path, not the disk's poor performance itself.



    ------------------------------
    jack smith
    ------------------------------



  • 9.  RE: NFS specific disk slow writes

    Posted Mon October 14, 2024 09:46 AM

    Ok, now you have established that the "raw" write service times to that disk in AIX are quite poor. 6.4ms at 135 IO/s with 134k average IO size - haven't seen that in a long time. This is for the first interval ... seems the "workload" is changing over time as tps significantly increased in later intervals but write throughput did not?

    How are you doing the write test? Plain "cp", or a dd, or something else? Maybe try a "dd" with larger block sizes to see how much throughput you can drive. Alternatively, you can also use a smaller block size to observe the IOPS limit.

    Are you writing to new files, or overwriting existing files? If this is "spinning disk", then those service times typically indicate lots of seeks. After you complete a copy, you may want to look with fileplace at how one of the larger files is physically distributed over the hdisk.
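    As a sketch (the path is a placeholder), the physical layout of a copied file could be inspected with:

    ```shell
    # -p reports physical (disk-level) placement, -v adds a summary of
    # how fragmented or sequential the file's blocks are
    fileplace -pv /testfs/bigfile
    ```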

    So, the interesting question now is what is causing those high service times. If you do a similar IO test directly in the VIOS against the same LUN, are you getting better latency? Does the VIOS have sufficient resources to support your workload?

    If you get the same 6.4ms for sequential write IO in the VIOS, then this is a question for your storage admin to take up with NetApp.



    ------------------------------
    Ralf Schmidt-Dannert
    ------------------------------



  • 10.  RE: NFS specific disk slow writes

    Posted Fri October 11, 2024 04:45 PM

    Out of curiosity I ran the same mount-with-dio test with one of the other "good" disks, and they didn't do much better: I got between 24 MB/s and 26 MB/s with them. So the question is why they do so much better when mounted normally, i.e. without dio.
    Better as in local speed without changing max_transfer, as well as NFS writes.



    ------------------------------
    jack smith
    ------------------------------



  • 11.  RE: NFS specific disk slow writes

    Posted Mon October 14, 2024 03:03 PM
    Edited by jack smith Mon October 14, 2024 03:04 PM

    > seems the "workload" is changing over time as tps significantly increased in later intervals but write throughput did not?
    The iostat interval was 2 seconds and except for my cp there was pretty much nothing else going on.

    > How are you doing the write test? Plain "cp"
    Yep, that was just cp.

    > Are you writing to new files, or overwriting existing files?
    That was a new file.

    > If you do a similar IO test directly in the VIOS against the same LUN, are you getting better latency?
    I'll try that next.

    > Does the VIOS have sufficient resources to support your workload?
    As mentioned, another HDD (see specs below) works just fine, so the VIOS shouldn't be suffering from resource limitations.

    As also mentioned, though, the dio results of that other HDD were not much better; without dio, however, it works much better, so I tend to think this comes down to the AIX configuration. The other HDD is:

      hdisk1           U78CB.001.WZS00VE-P2-D7  SAS Disk Drive (146800 MB)

            Manufacturer................IBM
            Machine Type and Model......MK1401GRRB
            FRU Number..................00FX876
            ROS Level and ID............36323046
            EC Level....................N46478
            Part Number.................00FX870
            Device Specific.(Z0)........000006329F001002
            Device Specific.(Z1)........620F620F620F
            Device Specific.(Z2)........0001
            Device Specific.(Z3)........14021
            Device Specific.(Z4)........
            Device Specific.(Z5)........22
            Device Specific.(Z6)........N46478
            Hardware Location Code......U78CB.001.WZS00VE-P2-D7

    Also Toshiba, but IBM-branded.



  • 12.  RE: NFS specific disk slow writes

    Posted Mon October 14, 2024 09:03 PM
    Edited by jack smith Mon October 14, 2024 09:04 PM

    So here are the raw results. I ran these directly on the VIOS, so there's no virtualisation or anything else in the way. As before, I just copied the same file (220 MB) via cp. Everything mounted with log=NULL,noatime,dio.

    First the problematic HDD without any max_transfer or queue_depth changes:

    hdisk0          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.5     20.8M   159.0        2.0K      20.8M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              0.5      1.2      0.1      1.2           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            158.5      6.2      0.8    109.1           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0       159.0
    --------------------------------------------------------------------------------

    hdisk0          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.5     20.6M   158.0        2.0K      20.6M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              0.5      1.2      0.1      1.2           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            157.5      6.3      0.8    109.1           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0       158.0
    --------------------------------------------------------------------------------

    hdisk0          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.5     20.8M   159.5        4.1K      20.8M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              1.0      1.1      0.1      1.2           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            158.5      6.2      0.8    109.1           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0       159.5
    --------------------------------------------------------------------------------

    hdisk0          xfer:  %tm_act      bps      tps      bread      bwrtn
                             98.5     20.4M   156.0        2.0K      20.4M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              0.5      1.1      0.1      1.2           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            155.5      6.4      0.8    109.1           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        1.0       156.0
    --------------------------------------------------------------------------------

    Now the same HDD with the following changes:
    chdev -l hdisk0 -a max_transfer=0x200000 -a queue_depth=64

    hdisk0          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.0     20.8M   158.5        0.0       20.8M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              0.0      0.0      0.1      0.1           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            158.5      6.2      0.8    109.1           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0       158.5
    --------------------------------------------------------------------------------

    hdisk0          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.5     20.8M   158.5        0.0       20.8M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              0.0      0.0      0.1      0.1           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            158.5      6.2      0.8    109.1           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0       158.5
    --------------------------------------------------------------------------------

    hdisk0          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.0     20.8M   158.5        0.0       20.8M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              0.0      0.0      0.1      0.1           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            158.5      6.2      0.8    109.1           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0       158.5
    --------------------------------------------------------------------------------

    hdisk0          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.0     20.6M   157.0        0.0       20.6M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              0.0      0.0      0.1      0.1           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            157.0      6.3      0.8    109.1           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0       157.0
    --------------------------------------------------------------------------------

    hdisk0          xfer:  %tm_act      bps      tps      bread      bwrtn
                             96.5     20.5M   156.5        0.0       20.5M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              0.0      0.0      0.1      0.1           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            156.5      6.3      0.8    109.1           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0       156.5
    --------------------------------------------------------------------------------

    So in dio mode the max_transfer and queue_depth changes make no difference. If mounted without dio, however, the speed changes are significant.

    And finally, for comparison, the IBM-branded HDD, which I put into the same slot to rule out other hardware problems:

    hdisk0          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.0     28.0M   214.5        4.1K      28.0M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              1.0      1.7      0.2      6.5           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            213.5      4.6      3.5      8.9           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0         0.0
    --------------------------------------------------------------------------------

    hdisk0          xfer:  %tm_act      bps      tps      bread      bwrtn
                            100.0     28.1M   215.0        4.1K      28.0M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              1.0      2.4      0.2      6.5           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            214.0      4.6      3.5      8.9           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        1.0         0.0
    --------------------------------------------------------------------------------

    hdisk0          xfer:  %tm_act      bps      tps      bread      bwrtn
                             98.0     28.0M   214.5        2.0K      28.0M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              0.5      1.7      0.2      6.5           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            214.0      4.6      1.1     18.4           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0         0.0
    --------------------------------------------------------------------------------

    Compared to the problematic HDD, the IBM-branded HDD has very low write maxserv values and zero sqfull. So is this a queue problem, or should I try to change something else? Or try different tests?
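    To see whether the queue is the bottleneck, the relevant disk attributes and live queue statistics can be checked like this (hdisk8 as in the output above):

    ```shell
    # Show the tunables relevant to queuing and transfer size
    lsattr -El hdisk8 -a queue_depth -a max_transfer

    # Watch service times and queue fill every 5 seconds while reproducing the slow write
    iostat -DRTl hdisk8 5
    ```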



  • 13.  RE: NFS specific disk slow writes

    Posted Tue October 15, 2024 09:43 AM

    So, we are talking single physical spinning disk? I had assumed NetApp == external multi-disk environment - my bad.

    Based on your measurements it seems that the non-IBM disk cannot support more than ~155 IO/s, which makes sense: at ~6.2 ms average service time, 159 I/Os take about 1 second. Increasing the queue depth doesn't help because you are filling up that queue all the time: sqfull = 159!
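    That back-of-the-envelope calculation can be checked quickly:

    ```shell
    # ~159 I/Os at ~6.2 ms average service time fill roughly one second of disk time
    awk 'BEGIN { printf "%.2f s\n", 159 * 6.2 / 1000 }'
    # prints: 0.99 s
    ```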

    In summary, this disk is significantly slower at sequential writes than the IBM-branded disk. In addition, it doesn't behave well in that you get very high outliers for maximum service time.

    I assume that you created the LV / FS fresh on that disk and that it is laid out physically sequentially?

    You may want to look into the manufacturer specs for that disk and compare those specs to what you observe - bad disk?

    To utilize larger block sizes with the "cp" command, you could set the environment variable AIX_STDBUFSZ to 1 MB and re-test whether you get higher throughput for a single-stream copy to disk.
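    A sketch of that re-test (the source file and target path are placeholders; the exact value format accepted by AIX_STDBUFSZ should be checked against the AIX documentation):

    ```shell
    # Use 1 MB (1048576-byte) stdio buffers for cp, then time a single-stream copy
    export AIX_STDBUFSZ=1048576
    time cp /tmp/testfile /testfs/testfile
    ```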

    If you are on a recent version of AIX 7.2 or AIX 7.3, you may want to have a look at j2_nPagesPerRBNACluster, which can reduce JFS2 space fragmentation when you are doing concurrent writes of new, larger files into a JFS2 file system.
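    j2_nPagesPerRBNACluster is an ioo tunable; displaying and changing it would look something like this (the value 512 here is only an example):

    ```shell
    # Display the current value of the JFS2 RBNA cluster tunable
    ioo -o j2_nPagesPerRBNACluster

    # Set it persistently across reboots (-p)
    ioo -p -o j2_nPagesPerRBNACluster=512
    ```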



    ------------------------------
    Ralf Schmidt-Dannert
    ------------------------------



  • 14.  RE: NFS specific disk slow writes

    Posted Tue October 15, 2024 12:46 PM
    Edited by jack smith Tue October 15, 2024 12:48 PM

    > So, we are talking single physical spinning disk?
    Indeed, as mentioned and I also posted the specs previously.

    > this disk is significantly slower to do the sequential write than the IBM branded disk
    I know, and that was never the point. If I use the disk regularly, i.e. without dio, and increase max_transfer to 0x100000 or higher, it works fine ... except for NFS writes. That was the point of my initial question here.
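    For reference, the max_transfer change described above can be applied with chdev; since the disk is in use, -P stages the change for the next boot:

    ```shell
    # Set a 1 MB maximum transfer size on the disk, effective after reboot
    chdev -l hdisk8 -a max_transfer=0x100000 -P
    ```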

    > I assume that you created the LV / FS fresh on that disk and that it is physically sequentially  laid out ?
    Yes, I even tried different ways, like using no volume manager and formatting the whole disk directly. But it didn't matter; the performance was always the same.

    > bad disk?
    The disk is "fine" by its standards because I have 2 of them and both behave the same way.

    > you could set environment variable AIX_STDBUFSZ to 1MB and re-test
    Thanks, I'll try that!

    > you may want to have a look at j2_nPagesPerRBNACluster
    Already have that set to 512 based on Oracle recommendations.



  • 15.  RE: NFS specific disk slow writes

    Posted Thu October 17, 2024 09:16 PM
    Edited by jack smith Thu October 17, 2024 09:16 PM

    As a last resort I tried the OS/400 trick:

    - Convert the disk to a pdisk
    - Create a raid0 with only that disk

    And voilà, it runs as well as the IBM-branded disk. The bad iostat values are gone and the performance (including NFS) is fine as well.

    But it's still the same disk, so it's obviously a configuration problem. Shouldn't it be possible to apply certain settings to get the same result without this trick?



  • 16.  RE: NFS specific disk slow writes

    Posted Sat October 26, 2024 09:40 PM
    Edited by jack smith Sat October 26, 2024 09:40 PM

    Well, actually, as a VIOS share the NFS speed is still bad. It's much better than before, but only around 30% of what the same "fake raid" delivered when I made the share on the VIOS directly. So it seems that raid trick doesn't solve this completely after all.