AIX


NFS specific disk slow writes

  • 1.  NFS specific disk slow writes

    Posted Mon October 07, 2024 08:59 PM

    Hi,

    I ran into an issue with NFS. The NFS server is AIX, and the shared folder is on a NetApp-branded Toshiba HDD. Writing to that share from NFS clients (no matter which OS) is very slow; reading, however, is fine. Writing to shares on other HDDs is also fine, so this specific disk seems to be the problem.
    There's one peculiarity about the disk in question: I had to set max_transfer to 0x100000 or higher. Otherwise read and write speeds were very bad, so I assume some additional setting is needed for that specific disk to work well with NFS as well.
    If I flip things, i.e. AIX is the NFS client getting stuff from another NFS server, writing to the disk in question is fine. So the problem only affects writing to the disk in question from an NFS client.

    Here's what iostat tells during the slow writing:

    hdisk8          xfer:  %tm_act      bps      tps      bread      bwrtn
                             75.5      6.4M   193.5       28.7K       6.4M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              7.0     54.9      0.2    234.4           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            186.5     81.6      2.1    237.6           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              1.1      0.0     40.7      0.0       15.0        12.0
    --------------------------------------------------------------------------------

    hdisk8          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.0     10.1M   286.0       14.3K      10.1M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              3.5    143.3      0.2    287.1           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            282.5    104.8      2.1    292.4           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              3.5      0.0     90.6      3.0       51.0        45.5
    --------------------------------------------------------------------------------

    hdisk8          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.5     10.3M   260.0       14.3K      10.3M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              3.5    197.6      0.2    287.1           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            256.5    124.8      2.1    292.7           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              9.1      0.0     90.6      2.0       23.0        56.0
    --------------------------------------------------------------------------------

    hdisk8          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.5      9.8M   275.0       59.4K       9.7M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                             14.5     53.0      0.2    291.4           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            260.5     93.0      1.7    292.7           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              5.8      0.0     93.5      4.0       38.0        42.0
    --------------------------------------------------------------------------------

    hdisk8          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.5      8.6M   290.0      133.1K       8.4M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                             32.5     28.7      0.2    291.4           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            257.5     22.9      1.6    292.7           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0     93.5      0.0        4.0         0.0
    --------------------------------------------------------------------------------

    hdisk8          xfer:  %tm_act      bps      tps      bread      bwrtn
                            100.0      9.8M   287.5       55.3K       9.7M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                             13.5     44.8      0.2    291.4           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            274.0     81.3      1.6    292.7           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              3.3      0.0     93.5      3.0       24.0        43.5
    --------------------------------------------------------------------------------

    I also tried increasing nfs_max_write_size and nfs_max_read_size on the NFS server, but that didn't help.
    Network-wise I have:

    tcp_recvspace=1048576
    tcp_sendspace=1048576
    udp_recvspace=655360
    udp_sendspace=65536

    I'm not sure whether the network settings matter, since shares from other disks are fine.
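    For reference, here's a sketch of how the tunables mentioned above can be inspected on the AIX NFS server. This is illustrative only; the values shown are examples, not recommendations:

    ```shell
    # Display the current network and NFS tunables (AIX)
    no -o tcp_recvspace -o tcp_sendspace
    no -o udp_recvspace -o udp_sendspace
    nfso -o nfs_max_write_size -o nfs_max_read_size

    # A persistent change would look like this (example value only):
    # no -p -o tcp_sendspace=1048576
    ```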


    Anyway, if somebody has any idea, it would be welcome!



    ------------------------------
    jack smith
    ------------------------------


  • 2.  RE: NFS specific disk slow writes

    Posted Tue October 08, 2024 02:47 AM

    Hi Jack,

    Let's start with the easy questions.

    • How is the disk connected to the system? I assume it is a local disk, but since nobody knows your systems better than you, please provide a bit more information about your setup.
    • Can you please send the output of lscfg -vl hdiskX, where hdiskX is the disk in question?
    • Can you please send the output of lsattr -El hdiskX for the same disk?
    • If it is a local disk, can you please send the output of lsslot -c pci?

    What I see from your output is that the service queue of the disk is regularly full. To me that means the simplest solution would be to increase the queue depth.

    - Check which values it can accept:

    lsattr -Rl hdiskX -a queue_depth

    - Check with the disk vendor. If the disk was supplied by IBM, open a case with IBM support and ask which value you should set

    - Set the proposed value

    chdev -l hdiskX -a queue_depth=X

    Depending on your hardware configuration, this might help you overcome the problem.
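    Putting the steps above together, a possible sequence looks like this (hdisk8 stands in for the disk in question; the final queue_depth value should still come from the vendor or IBM support):

    ```shell
    # Watch the extended drive statistics; a non-zero "sqfull" while the
    # load runs confirms the service queue is overflowing
    iostat -D hdisk8 2 5

    # List the queue_depth values the driver will accept
    lsattr -Rl hdisk8 -a queue_depth

    # Apply the new depth; -P defers the change until the next reboot
    # in case the disk is currently in use
    chdev -l hdisk8 -a queue_depth=64 -P
    ```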



    ------------------------------
    Andrey Klyachkin

    https://www.power-devops.com
    ------------------------------



  • 3.  RE: NFS specific disk slow writes

    Posted Tue October 08, 2024 09:46 AM

    Based on the iostat data, your hdisk service queues are frequently full. Did you adjust hdisk queue_depth from the default to a higher value?



    ------------------------------
    Ralf Schmidt-Dannert
    ------------------------------



  • 4.  RE: NFS specific disk slow writes

    Posted Tue October 08, 2024 03:02 PM

    Thanks for the replies!

    Initially AIX set the queue_depth for that disk to 3. I changed it to 16 and later to 64; that's what was set when I ran iostat.

    The disk in question is a VIOS disk but fully (physically) assigned to the AIX LPAR where I'm having the mentioned problem. Here is the output from the VIOS:

    # lscfg -vl hdisk2
      hdisk2           U78CB.001.WZS00VE-P2-D11  Other SAS Disk Drive

            Manufacturer................NETAPP
            Machine Type and Model......X423_TAL13900A10
            ROS Level and ID............4E413031
            Hardware Location Code......U78CB.001.WZS00VE-P2-D11

    # lsattr -El hdisk2
    clr_q         no                               Device CLEARS its Queue on error True
    max_transfer  0x200000                         Maximum TRANSFER Size            True
    pvid          00f9433888d2791b0000000000000000 Physical volume identifier       False
    q_err         yes                              Use QERR bit                     True
    q_type        simple                           Queuing TYPE                     True
    queue_depth   64                               Queue DEPTH                      True
    reassign_to   120                              REASSIGN time out value          True
    rw_timeout    30                               READ/WRITE time out value        True+
    start_timeout 60                               START unit time out value        True
    ww_id         50000396a83a54b0                 World Wide Identifier            False

    # lsslot -c pci
    # Slot                    Description                                      Device(s)
    U78CB.001.WZS00VE-P1-C11  PCI-E capable, Rev 3 8x lane slot with 8x lanes  ent0 ent1 ent2 ent3


    And the same from the target LPAR:

    # lscfg -vl hdisk8
      hdisk8           U8284.22A.214338V-V2-C3-T1-L8200000000000000  Virtual SCSI Disk Drive

    # lsattr -El hdisk8
    PCM             PCM/friend/vscsi                 Path Control Module          False
    algorithm       fail_over                        Algorithm                    True
    encrypt_enabled no                               Encryption state of disk     False
    encrypt_md_loc  none                             Encryption metadata location False
    hcheck_cmd      test_unit_rdy                    Health Check Command         False
    hcheck_interval 0                                Health Check Interval        True+
    hcheck_mode     nonactive                        Health Check Mode            True+
    max_transfer    0x100000                         Maximum TRANSFER Size        True
    pvid            00f9433888dfb0f80000000000000000 Physical volume identifier   False
    queue_depth     64                               Queue DEPTH                  True+
    reserve_policy  no_reserve                       Reserve Policy               True+
    rw_timeout      45                               Read/Write Timeout Value     True+

    # lsslot -c pci
    # Slot                   Description                                        Device(s)
    U78CB.001.WZS00VE-P1-C6  PCI-E capable, Rev 3 16x lane slot with 16x lanes  sissas1



    ------------------------------
    jack smith
    ------------------------------



  • 5.  RE: NFS specific disk slow writes

    Posted Fri October 11, 2024 10:18 AM

    Looking at iostat, the average IO size written to the disk is small, only about 40 KB ... why would max_transfer have to be increased to 0x100000 == 1MB?

    What is also unusual is the minimum IO write service time of > 2ms for a ~40 KB IO ... are we talking "spinning disk" at the backend? The maximum observed IO service times are very bad at > 292ms - you may want to reset the iostat counters and re-run to validate - but I see in the data you provided that the maximum slightly increased in at least one interval ... so it is not just a historical value.

    You did not specify what type of writes you are driving from the NFS client. Large file write or "many small" writes?

    Did you try to replicate the same write characteristics directly in the AIX NFS server LPAR - so basically taking NFS / network out of the picture?

    How is the exported file system mounted in the AIX NFS server?

    Are there any errors in AIX error log on the AIX NFS server?
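    For example, the error log could be checked along these lines (hdisk8 assumed here as the suspect disk):

    ```shell
    errpt | head -20        # summary of the newest entries
    errpt -a -N hdisk8      # detailed entries for the suspect disk, if any
    errpt -d H              # hardware-class errors only
    ```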



    ------------------------------
    Ralf Schmidt-Dannert
    ------------------------------



  • 6.  RE: NFS specific disk slow writes

    Posted Fri October 11, 2024 11:42 AM
    Edited by jack smith Fri October 11, 2024 11:42 AM

    > why would max_transfer have to be increased to 0x100000 == 1MB?
    As mentioned already: "Otherwise read and write speeds were very bad".
    For example, at the default of 0x40000 the transfer speeds were stuck at 20 MB/s. That is for local usage, not over the network.

    > are we talking "spinning disk" at the backend?
    As mentioned already: "Toshiba HDD".

    > Large file write or "many small" writes?
    Single files between 2MB and 200MB. One file at a time, not a bunch of files with a single command/transfer.

    > taking NFS / network out of the picture?
    As mentioned already: "If AIX is the NFS client getting stuff from another NFS server, writing to the disk in question is fine. So the problem only affects writing to the disk in question from an NFS client.".
    So to rephrase: after setting max_transfer to 0x100000 or higher, everything worked fine except for writes from NFS clients, as described.

    > How is the exported file system mounted in the AIX NFS server?
    As a local LV with: rw,noatime,log=NULL

    > Are there any errors in AIX error log on the AIX NFS server?
    None.


    For the record, this is an AIX-specific problem. Using the disk in question with a SUSE LPAR on the same machine works fine by default, with no need for any special settings.



  • 7.  RE: NFS specific disk slow writes

    Posted Fri October 11, 2024 12:54 PM

    So, did you compare iostat data for the following 2 scenarios?

    1) AIX NFS server receiving writes from an NFS client over the network

    2) Local AIX process writing data to the AIX file system (preferably from a local FS like /tmp) 

    As I stated earlier, your iostat data shows an average IO size of only ~40 KB to disk - but those are averages - far below even the default max_transfer of 0x40000 (256 KB).

    Of interest here is whether IO rates, IO transfer sizes, IO service times, and service-queue-full events are dramatically different between the two test cases.

    If you are adventurous, do a test with a "dio" mount for that file system and evaluate IO sizes and behavior as well. This will take the JFS2 file cache out of the picture.

    I assume you are monitoring during your tests whether and how the JFS2 file cache is being utilized?
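    A sketch of such a dio test might look like this (file system, LV, and file names are placeholders):

    ```shell
    # Remount the exported file system with direct IO, bypassing the
    # JFS2 file cache
    umount /testfs
    mount -o rw,noatime,dio /dev/testlv /testfs

    # Re-run the write test and watch the disk at the same time
    cp /tmp/bigfile /testfs/ &
    iostat -D hdisk8 2
    ```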



    ------------------------------
    Ralf Schmidt-Dannert
    ------------------------------



  • 8.  RE: NFS specific disk slow writes

    Posted Fri October 11, 2024 04:19 PM

    Thanks for the additional pointers. Mounting with dio was interesting:

    hdisk8          xfer:  %tm_act      bps      tps      bread      bwrtn
                             85.0     17.7M   135.5        0.0       17.7M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              0.0      0.0      0.2     96.9           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            135.5      6.4      3.5     17.7           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0         0.0
    --------------------------------------------------------------------------------

    hdisk8          xfer:  %tm_act      bps      tps      bread      bwrtn
                             97.0     20.4M   157.5       32.8K      20.4M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              2.0     10.1      0.2     96.9           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            155.5      6.4      3.5     17.7           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0         0.0
    --------------------------------------------------------------------------------

    hdisk8          xfer:  %tm_act      bps      tps      bread      bwrtn
                            100.0     20.4M   158.0       32.8K      20.4M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              2.0      9.6      0.2     96.9           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            156.0      6.4      3.5     17.7           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        1.0         0.0
    --------------------------------------------------------------------------------

    hdisk8          xfer:  %tm_act      bps      tps      bread      bwrtn
                             96.0     18.0M   172.5      448.5K      17.5M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                             33.0      9.4      0.2     96.9           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            139.5      7.4      0.9     23.4           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0         0.0
    --------------------------------------------------------------------------------

    hdisk8          xfer:  %tm_act      bps      tps      bread      bwrtn
                             84.5     18.2M   184.5      688.1K      17.6M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                             47.5      6.6      0.2     96.9           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            137.0      6.7      0.9     23.4           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        1.0         0.0
    --------------------------------------------------------------------------------

    hdisk8          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.0     18.4M   278.5        1.6M      16.8M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                            150.0      7.3      0.1    109.0           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            128.5      7.7      0.9    109.1           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        2.0         0.0
    --------------------------------------------------------------------------------

    This shows a local write to the disk in question, copying from a ramdisk. When it's mounted with dio I get the same bad transfer rates as I had before changing max_transfer. So apparently the max_transfer change only helped the cached path, not the disk's poor performance itself.



    ------------------------------
    jack smith
    ------------------------------



  • 9.  RE: NFS specific disk slow writes

    Posted Mon October 14, 2024 09:46 AM

    Ok, now you have established that the "raw" write service times to that disk in AIX are quite poor. 6.4ms at 135 IO/s with 134k average IO size - haven't seen that in a long time. This is for the first interval ... seems the "workload" is changing over time as tps significantly increased in later intervals but write throughput did not?

    How are you doing the write test? Plain "cp", or a dd, or something else? Maybe try a "dd" with larger block sizes to see how much throughput you can drive. Alternatively, you can also use a smaller block size to observe the IOPS limit.

    Are you writing to new files, or overwriting existing files? If this is "spinning disk", then those service times typically indicate lots of seeks. After you complete a copy, you may want to look with fileplace at how one of the larger files is physically distributed over the hdisk.
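    As a sketch (the path is a placeholder), the physical layout of a copied file could be inspected with:

    ```shell
    # -p reports physical (disk-level) placement, -v adds a summary of
    # how fragmented or sequential the file's blocks are
    fileplace -pv /testfs/bigfile
    ```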

    So, the interesting question now is what is causing those high service times. If you do a similar IO test directly in the VIOS against the same LUN, are you getting better latency? Does the VIOS have sufficient resources to support your workload?

    If you get the same 6.4ms for sequential write IO in the VIOS, then this is a question for your storage admin to take up with NetApp.



    ------------------------------
    Ralf Schmidt-Dannert
    ------------------------------



  • 10.  RE: NFS specific disk slow writes

    Posted Fri October 11, 2024 04:45 PM

    Out of curiosity I ran the same mount-with-dio test with one of the other "good" disks, and they didn't do much better: I got between 24 MB/s and 26 MB/s with them. So the question is why they do so much better when mounted normally, i.e. without dio.
    Better as in local speed without changing max_transfer, as well as NFS writes.



    ------------------------------
    jack smith
    ------------------------------



  • 11.  RE: NFS specific disk slow writes

    Posted Mon October 14, 2024 03:03 PM
    Edited by jack smith Mon October 14, 2024 03:04 PM

    > seems the "workload" is changing over time as tps significantly increased in later intervals but write throughput did not?
    The iostat interval was 2 seconds and except for my cp there was pretty much nothing else going on.

    > How are you doing the write test? Plain "cp"
    Yep, that was just cp.

    > Are you writing to new files, or overwriting existing files?
    That was a new file.

    > If you do a similar IO test directly in the VIOS against the same LUN, are you getting better latency?
    I'll try that next.

    > Does the VIOS have sufficient resources to support your workload?
    As mentioned, another HDD (see specs below) works just fine, so the VIOS shouldn't be suffering from resource limitations.

    As also mentioned, though, the dio results of that other HDD were not much better; without dio, however, it works much better, so I tend to think this comes down to the AIX configuration. The other HDD is:

      hdisk1           U78CB.001.WZS00VE-P2-D7  SAS Disk Drive (146800 MB)

            Manufacturer................IBM
            Machine Type and Model......MK1401GRRB
            FRU Number..................00FX876
            ROS Level and ID............36323046
            EC Level....................N46478
            Part Number.................00FX870
            Device Specific.(Z0)........000006329F001002
            Device Specific.(Z1)........620F620F620F
            Device Specific.(Z2)........0001
            Device Specific.(Z3)........14021
            Device Specific.(Z4)........
            Device Specific.(Z5)........22
            Device Specific.(Z6)........N46478
            Hardware Location Code......U78CB.001.WZS00VE-P2-D7

    Also Toshiba, but IBM-branded.



  • 12.  RE: NFS specific disk slow writes

    Posted Mon October 14, 2024 09:03 PM
    Edited by jack smith Mon October 14, 2024 09:04 PM

    So here are the raw results. I ran these directly on the VIOS, so there's no virtualisation or anything else in the way. As before, I just copied the same file (220 MB) via cp. Everything mounted with log=NULL,noatime,dio.

    First the problematic HDD without any max_transfer or queue_depth changes:

    hdisk0          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.5     20.8M   159.0        2.0K      20.8M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              0.5      1.2      0.1      1.2           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            158.5      6.2      0.8    109.1           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0       159.0
    --------------------------------------------------------------------------------

    hdisk0          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.5     20.6M   158.0        2.0K      20.6M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              0.5      1.2      0.1      1.2           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            157.5      6.3      0.8    109.1           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0       158.0
    --------------------------------------------------------------------------------

    hdisk0          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.5     20.8M   159.5        4.1K      20.8M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              1.0      1.1      0.1      1.2           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            158.5      6.2      0.8    109.1           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0       159.5
    --------------------------------------------------------------------------------

    hdisk0          xfer:  %tm_act      bps      tps      bread      bwrtn
                             98.5     20.4M   156.0        2.0K      20.4M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              0.5      1.1      0.1      1.2           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            155.5      6.4      0.8    109.1           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        1.0       156.0
    --------------------------------------------------------------------------------

    Now the same HDD with the following changes:
    chdev -l hdisk0 -a max_transfer=0x200000 -a queue_depth=64

    hdisk0          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.0     20.8M   158.5        0.0       20.8M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              0.0      0.0      0.1      0.1           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            158.5      6.2      0.8    109.1           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0       158.5
    --------------------------------------------------------------------------------

    hdisk0          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.5     20.8M   158.5        0.0       20.8M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              0.0      0.0      0.1      0.1           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            158.5      6.2      0.8    109.1           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0       158.5
    --------------------------------------------------------------------------------

    hdisk0          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.0     20.8M   158.5        0.0       20.8M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              0.0      0.0      0.1      0.1           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            158.5      6.2      0.8    109.1           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0       158.5
    --------------------------------------------------------------------------------

    hdisk0          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.0     20.6M   157.0        0.0       20.6M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              0.0      0.0      0.1      0.1           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            157.0      6.3      0.8    109.1           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0       157.0
    --------------------------------------------------------------------------------

    hdisk0          xfer:  %tm_act      bps      tps      bread      bwrtn
                             96.5     20.5M   156.5        0.0       20.5M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              0.0      0.0      0.1      0.1           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            156.5      6.3      0.8    109.1           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0       156.5
    --------------------------------------------------------------------------------

    So in dio mode the max_transfer and queue_depth changes make no difference. If mounted without dio, however, the speed changes are significant.

    And finally, for comparison, the IBM-branded HDD, which I put into the same slot to rule out other hardware problems:

    hdisk0          xfer:  %tm_act      bps      tps      bread      bwrtn
                             99.0     28.0M   214.5        4.1K      28.0M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              1.0      1.7      0.2      6.5           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            213.5      4.6      3.5      8.9           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0         0.0
    --------------------------------------------------------------------------------

    hdisk0          xfer:  %tm_act      bps      tps      bread      bwrtn
                            100.0     28.1M   215.0        4.1K      28.0M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              1.0      2.4      0.2      6.5           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            214.0      4.6      3.5      8.9           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        1.0         0.0
    --------------------------------------------------------------------------------

    hdisk0          xfer:  %tm_act      bps      tps      bread      bwrtn
                             98.0     28.0M   214.5        2.0K      28.0M
                    read:      rps  avgserv  minserv  maxserv   timeouts      fails
                              0.5      1.7      0.2      6.5           0          0
                   write:      wps  avgserv  minserv  maxserv   timeouts      fails
                            214.0      4.6      1.1     18.4           0          0
                   queue:  avgtime  mintime  maxtime  avgwqsz    avgsqsz     sqfull
                              0.0      0.0      0.0      0.0        0.0         0.0
    --------------------------------------------------------------------------------

    Compared to the problematic HDD, the IBM-branded HDD has very low write maxserv values and zero sqfull. So is this a queue problem, or should I try to change something else? Or try different tests?
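    To see whether the queue is the bottleneck, the relevant disk attributes and live queue statistics can be checked like this (hdisk8 as in the output above):

    ```shell
    # Show the tunables relevant to queuing and transfer size
    lsattr -El hdisk8 -a queue_depth -a max_transfer

    # Watch service times and queue fill every 5 seconds while reproducing the slow write
    iostat -DRTl hdisk8 5
    ```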



  • 13.  RE: NFS specific disk slow writes

    Posted Tue October 15, 2024 09:43 AM

    So, we are talking single physical spinning disk? I had assumed NetApp == external multi-disk environment - my bad.

    Based on your measurements it seems that the non-IBM disk cannot support more than ~155 IO/s, which makes sense: at ~6.2 ms average service time, 159 I/Os take about 1 second. Increasing the queue depth doesn't help because you are filling up that queue all the time: sqfull = 159!
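    That back-of-the-envelope calculation can be checked quickly:

    ```shell
    # ~159 I/Os at ~6.2 ms average service time fill roughly one second of disk time
    awk 'BEGIN { printf "%.2f s\n", 159 * 6.2 / 1000 }'
    # prints: 0.99 s
    ```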

    In summary, this disk is significantly slower at sequential writes than the IBM-branded disk. In addition, it doesn't behave well in that you get very high outliers for maximum service time.

    I assume that you created the LV / FS fresh on that disk and that it is laid out physically sequentially?

    You may want to look into the manufacturer specs for that disk and compare those specs to what you observe - bad disk?

    To utilize larger block sizes with the "cp" command, you could set the environment variable AIX_STDBUFSZ to 1 MB and re-test whether you get higher throughput for a single-stream copy to disk.
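    A sketch of that re-test (the source file and target path are placeholders; the exact value format accepted by AIX_STDBUFSZ should be checked against the AIX documentation):

    ```shell
    # Use 1 MB (1048576-byte) stdio buffers for cp, then time a single-stream copy
    export AIX_STDBUFSZ=1048576
    time cp /tmp/testfile /testfs/testfile
    ```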

    If you are on a recent version of AIX 7.2 or AIX 7.3, you may want to have a look at j2_nPagesPerRBNACluster, which can reduce JFS2 space fragmentation when you are doing concurrent writes of new, larger files into a JFS2 file system.
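    j2_nPagesPerRBNACluster is an ioo tunable; displaying and changing it would look something like this (the value 512 here is only an example):

    ```shell
    # Display the current value of the JFS2 RBNA cluster tunable
    ioo -o j2_nPagesPerRBNACluster

    # Set it persistently across reboots (-p)
    ioo -p -o j2_nPagesPerRBNACluster=512
    ```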



    ------------------------------
    Ralf Schmidt-Dannert
    ------------------------------



  • 14.  RE: NFS specific disk slow writes

    Posted Tue October 15, 2024 12:46 PM
    Edited by jack smith Tue October 15, 2024 12:48 PM

    > So, we are talking single physical spinning disk?
    Indeed, as mentioned and I also posted the specs previously.

    > this disk is significantly slower to do the sequential write than the IBM branded disk
    I know, and that was never the point. If I use the disk regularly, i.e. without dio, and increase max_transfer to 0x100000 or higher, it works fine ... except for NFS writes. That was the point of my initial question here.
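    For reference, the max_transfer change described above can be applied with chdev; since the disk is in use, -P stages the change for the next boot:

    ```shell
    # Set a 1 MB maximum transfer size on the disk, effective after reboot
    chdev -l hdisk8 -a max_transfer=0x100000 -P
    ```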

    > I assume that you created the LV / FS fresh on that disk and that it is physically sequentially  laid out ?
    Yes, I even tried different ways, like using no volume manager and formatting the whole disk directly. But it didn't matter; the performance was always the same.

    > bad disk?
    The disk is "fine" by its standards because I have 2 of them and both behave the same way.

    > you could set environment variable AIX_STDBUFSZ to 1MB and re-test
    Thanks, I'll try that!

    > you may want to have a look at j2_nPagesPerRBNACluster
    Already have that set to 512 based on Oracle recommendations.



  • 15.  RE: NFS specific disk slow writes

    Posted Thu October 17, 2024 09:16 PM
    Edited by jack smith Thu October 17, 2024 09:16 PM

    As a last resort I tried the OS/400 trick:

    - Convert the disk to a pdisk
    - Create a raid0 with only that disk

    And voilà, it runs as well as the IBM-branded disk. The bad iostat values are gone and the performance (including NFS) is fine as well.

    But it's still the same disk, so it's obviously a configuration problem. Shouldn't it be possible to apply certain settings to get the same result without this trick?



  • 16.  RE: NFS specific disk slow writes

    Posted Sat October 26, 2024 09:40 PM
    Edited by jack smith Sat October 26, 2024 09:40 PM

    Well, actually, as a VIOS share the NFS speed is still bad. It's much better than before, but only around 30% of what the same "fake raid" delivered when I made the share on the VIOS directly. So it seems that raid trick doesn't solve this completely after all.