Primary Storage

FlashSystem 7300 PB-HA performance issue

  • 1.  FlashSystem 7300 PB-HA performance issue

    Posted Thu August 29, 2024 10:45 AM

    Hello!

    I just configured two FS7300 systems (v8.7.0) with policy-based High Availability (PB-HA), following this guide.

    We have two sites; each contains one FS7300, 12 hosts running ESXi 8, and two Brocade switches.

    The sites are connected via two fabrics.

    I configured the partnership, storage partitions, LUNs, hosts, locations, etc., all according to the guide.

    After this I decided to check performance with IOPS tests. I used FIO and vdbench and got poor results; both utilities gave the same results.

    Test parameters: blocksize 8k, randomRW, 70R/30W.

    And I got about 100k IOPS.

    Then I tried changing the block size to 32k and got the same result, about 100k IOPS.

    I think that my system (FS7300) is for some reason limiting IOPS to 100k, because I get the same IOPS regardless of block size.
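
    For readers who want to reproduce a comparable workload, the mix described above (8k blocks, random access, 70% reads) could be expressed as an fio command line roughly as in the sketch below. Only the block size and read/write mix come from this post; the target `/dev/sdX`, the queue depth, the job count, and the runtime are placeholder assumptions, and the function name is purely illustrative.

```python
def fio_randrw_args(filename, bs="8k", read_pct=70, iodepth=32, numjobs=4, runtime_s=60):
    """Build an fio argument list for a mixed random read/write test.

    WARNING: pointing fio at a raw device overwrites its contents.
    iodepth, numjobs and runtime_s are assumed values, not from the thread.
    """
    return [
        "fio",
        "--name=randrw-test",
        f"--filename={filename}",
        "--ioengine=libaio",      # Linux asynchronous I/O engine
        "--direct=1",             # bypass the guest page cache
        "--rw=randrw",            # mixed random reads and writes
        f"--rwmixread={read_pct}",
        f"--bs={bs}",
        f"--iodepth={iodepth}",
        f"--numjobs={numjobs}",
        "--time_based",
        f"--runtime={runtime_s}",
        "--group_reporting",      # one summary for all jobs
    ]


if __name__ == "__main__":
    # /dev/sdX is a placeholder for a scratch test device.
    print(" ".join(fio_randrw_args("/dev/sdX")))
    print(" ".join(fio_randrw_args("/dev/sdX", bs="32k")))
```

    Running both block sizes with identical queue depth and job count makes it easier to tell an IOPS ceiling from a bandwidth ceiling.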

    Please give me a hint about which settings should be checked first.

    I feel that the reason for the low IOPS is simple, but I can't find it yet :)

    Now I'm going to recheck all the VMware settings to rule out their influence.

    P.S. Each IBM FS7300 contains two 4-port 32Gb FC cards and 18 FCM drives of 19.2TB each (DRAID6).



    ------------------------------
    A. S.
    ------------------------------


  • 2.  RE: FlashSystem 7300 PB-HA performance issue

    Posted Thu August 29, 2024 10:49 AM

    Hi A.S.

    Is this the result from a single host? You can configure vdbench to run a benchmark across multiple hosts.



    ------------------------------
    Nezih Boyacioglu
    ------------------------------



  • 3.  RE: FlashSystem 7300 PB-HA performance issue

    Posted Thu August 29, 2024 01:24 PM
    Edited by A. S. Thu August 29, 2024 01:26 PM

    Hello!

    Today I deconfigured HA and continued testing on a single system to rule out the influence of HA.

    I created 6 LUNs and one more VM. Now two VMs reside on one physical host.

    My setup:

    The vdisks of the 1st VM reside on LUNs 1, 3 and 5.

    The vdisks of the 2nd VM reside on LUNs 2, 4 and 6.

    And I got this result:

    This is much better than 100K IOPS, but I'd like more :)

    When I added two more LUNs:

    The vdisks of the 1st VM reside on LUNs 1, 3, 5 and 7.

    The vdisks of the 2nd VM reside on LUNs 2, 4, 6 and 8.

    I got the same 340K IOPS, but latency increased to 1.2 ms.
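
    A rough way to interpret flat IOPS with rising latency is Little's law: the average number of I/Os in flight equals IOPS multiplied by response time. The sketch below applies it to the 340K IOPS and 1.2 ms figures from this post. The function names are illustrative, and the doubled queue depth is a hypothetical scenario, not a measurement.

```python
def outstanding_ios(iops, latency_ms):
    """Little's law: I/Os in flight = throughput (IO/s) * response time (s)."""
    return iops * latency_ms / 1000.0


def latency_ms_for(iops, in_flight):
    """Response time in ms implied by a given concurrency at fixed IOPS."""
    return in_flight / iops * 1000.0


if __name__ == "__main__":
    in_flight = outstanding_ios(340_000, 1.2)
    print(f"I/Os in flight at 340K IOPS and 1.2 ms: {in_flight:.0f}")
    # Hypothetical: once throughput is saturated, doubling the queued I/O
    # (for example by adding LUNs or threads) only raises latency.
    doubled = latency_ms_for(340_000, 2 * in_flight)
    print(f"Latency with twice the queued I/O: {doubled:.1f} ms")
```

    If IOPS stop growing while the number of in-flight I/Os keeps rising, the extra queues are most likely just adding wait time somewhere along the path.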

    Tomorrow I'll continue testing with more VMs and more physical hosts, and maybe more LUNs.

    P.S.

    An unverified source tells me that IBM FlashSystem has a limit of 100K IOPS per LUN. Is that true?



    ------------------------------
    A. S.
    ------------------------------



  • 4.  RE: FlashSystem 7300 PB-HA performance issue

    Posted Thu August 29, 2024 01:42 PM

    Hi, 

    No, it's not true. Your 100k IOPS with a 32k block size looks like a host limit. Are your host HBAs 16Gbps? I ask because, with a properly configured ESXi host, 2x16Gbps ports give you about 3GB/s of throughput, which is the host HBA limit.
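
    The arithmetic behind this can be sketched as follows. The per-port figures are the nominal Fibre Channel data rates (1600 MB/s for 16GFC, 3200 MB/s for 32GFC, per direction); real-world throughput is somewhat lower, and the sketch ignores the fact that reads and writes travel in opposite directions on a full-duplex link. The function names are illustrative.

```python
KIB = 1024

# Nominal Fibre Channel data rate per port and direction, in MB/s.
FC_MB_S = {16: 1600, 32: 3200}


def throughput_gb_s(iops, block_kib):
    """Data rate in decimal GB/s for a given IOPS and block size in KiB."""
    return iops * block_kib * KIB / 1e9


def fc_limit_gb_s(speed_gbps, ports):
    """Nominal aggregate one-way bandwidth of several FC ports in GB/s."""
    return FC_MB_S[speed_gbps] * ports / 1000.0


if __name__ == "__main__":
    for block_kib in (8, 32):
        rate = throughput_gb_s(100_000, block_kib)
        print(f"100k IOPS at {block_kib}k = {rate:.2f} GB/s")
    print(f"2 x 16G FC nominal limit = {fc_limit_gb_s(16, 2):.1f} GB/s")
    print(f"2 x 32G FC nominal limit = {fc_limit_gb_s(32, 2):.1f} GB/s")
```

    At 8k the same 100k IOPS is well below either limit, so a bandwidth cap alone would not explain identical IOPS at both block sizes.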

    For better benchmark results, I recommend working with 3 hosts with 32Gbps FC HBAs. If you are using VMs, VM resources are another issue. You must use the VMware Paravirtual SCSI adapter instead of LSI Logic SAS for the disks of this VM. Thick Provision Eager Zeroed disks are also recommended. Multipathing must be Round Robin for the datastore, and I also recommend changing the "Round Robin IOPS limit" for these ESXi hosts from the default value (1000) to 1.
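
    As an illustration of the multipathing advice, the sketch below assembles the esxcli commands that are commonly used on an ESXi host to set the Round Robin path selection policy and its IOPS limit for one device. The device identifier `naa.EXAMPLE` is a hypothetical placeholder, the helper name is invented for this example, and the exact options should be checked against the esxcli reference for your ESXi release before running anything.

```python
def round_robin_commands(device_id, iops_limit=1):
    """Return esxcli command strings for one device (to run in the ESXi shell).

    device_id is a placeholder such as 'naa.EXAMPLE'; substitute the real
    identifier reported by 'esxcli storage nmp device list'.
    """
    return [
        # Select the Round Robin path selection policy for the device.
        f"esxcli storage nmp device set --device {device_id} --psp VMW_PSP_RR",
        # Switch paths after every iops_limit I/Os instead of the default 1000.
        "esxcli storage nmp psp roundrobin deviceconfig set "
        f"--device {device_id} --type iops --iops {iops_limit}",
        # Show the resulting policy and its settings for verification.
        f"esxcli storage nmp device list --device {device_id}",
    ]


if __name__ == "__main__":
    for command in round_robin_commands("naa.EXAMPLE"):
        print(command)
```

    Printing the commands rather than executing them keeps the sketch safe to run anywhere; the settings are per device, so each FlashSystem volume would need the same treatment.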



    ------------------------------
    Nezih Boyacioglu
    ------------------------------



  • 5.  RE: FlashSystem 7300 PB-HA performance issue

    Posted Thu August 29, 2024 01:55 PM

    I have 2x32Gb FC per host. OK, tomorrow I'll run the test with 3 hosts.

    And of course I did everything according to this recommendation from the Redbook:

    :)



    ------------------------------
    A. S.
    ------------------------------



  • 6.  RE: FlashSystem 7300 PB-HA performance issue

    Posted Thu August 29, 2024 02:01 PM

    Oh yes I wrote that chapter :)



    ------------------------------
    Nezih Boyacioglu
    ------------------------------



  • 7.  RE: FlashSystem 7300 PB-HA performance issue

    Posted Fri August 30, 2024 02:49 PM

    Strange thing.

    Changing the "Round Robin IOPS limit" parameter for these ESXi hosts from the default value (1000) to 1 made no difference.

    I get the same result with 1 and with 1000.



    ------------------------------
    A. S.
    ------------------------------



  • 8.  RE: FlashSystem 7300 PB-HA performance issue

    Posted Sun September 01, 2024 06:36 AM
    Edited by Nezih Boyacioglu Sun September 01, 2024 06:36 AM

    The Round Robin IOPS limit parameter might not affect performance above 4k IOPS; it balances paths and has benefits below 4k IOPS. You can also use the latency-centric NMP policy.



    ------------------------------
    Nezih Boyacioglu
    ------------------------------



  • 9.  RE: FlashSystem 7300 PB-HA performance issue

    Posted Fri August 30, 2024 12:30 AM
    NB, fantastic work!

    AS, we are always eager to help. Could you also consider running a SAN Health report on your fabric, so we can rule out zoning, HBA, link, and similar issues?

    I trust you also set up dedicated replication links and ports between the arrays? Perhaps also Brocade's Virtual SAN (not VMware's vSAN) to further isolate replication from host traffic.

    You can always open a Support perf ticket and attach that SAN Health report to it as well; then my pals at SAN Central can review it for any issues, along with a Snap4, which might uncover any other snags!

    I can vouch that VMware sometimes gets wonky with multiple VMs on the same physical ESX host....

    Eager for your next steps, 
    AJ


    Andrew 'AJ' Greenfield

    WW Storage / Security


    +1 480-294-1342

    andrewjg@us.ibm.com

    Public Box Folder:  https://ibm.box.com/v/IBM-andrewgreenfield







  • 10.  RE: FlashSystem 7300 PB-HA performance issue

    Posted Fri August 30, 2024 01:01 AM

    Hello!

    I followed these recommendations from the Redbooks:

    For zoning hosts to the FlashSystem I used peer zones.



    ------------------------------
    A. S.
    ------------------------------



  • 11.  RE: FlashSystem 7300 PB-HA performance issue

    Posted Fri August 30, 2024 11:18 AM

    Looking forward to updated results when you run the new host/volume setup with the PBHA "topology"! Keep us posted!



    ------------------------------
    Thiago Lucas
    ------------------------------



  • 12.  RE: FlashSystem 7300 PB-HA performance issue

    Posted Fri August 30, 2024 01:40 PM
    Edited by A. S. Fri August 30, 2024 01:44 PM

    Hello!

    Today I started by testing a single system. Migrating the VM to another host didn't increase IOPS.

    After this I tried changing the cache hit parameter.
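
    The post does not show the parameter file, so the following is only a guess at how such a cache-hit setting might look if vdbench was the tool in use. vdbench workload definitions accept `rhpct` and `whpct` (read and write hit percentages) alongside `xfersize`, `rdpct`, and `seekpct`. The LUN path `/dev/sdX`, the run length, and the helper name are assumptions made for this sketch.

```python
def vdbench_params(lun, hit_pct, xfersize="8k", read_pct=70, elapsed_s=60):
    """Render a minimal vdbench parameter file as a string.

    lun and elapsed_s are placeholder assumptions; hit_pct is applied to
    both the read hit (rhpct) and write hit (whpct) percentages.
    """
    lines = [
        # Storage definition: the device under test, opened with O_DIRECT.
        f"sd=sd1,lun={lun},openflags=o_direct",
        # Workload definition: block size, read share, 100% random seeks,
        # and the requested cache-hit percentages.
        f"wd=wd1,sd=sd1,xfersize={xfersize},rdpct={read_pct},"
        f"seekpct=100,rhpct={hit_pct},whpct={hit_pct}",
        # Run definition: uncapped I/O rate for a fixed duration.
        f"rd=rd1,wd=wd1,iorate=max,elapsed={elapsed_s},interval=5",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for hit_pct in (30, 70, 100):
        print(f"# {hit_pct}% cache hit")
        print(vdbench_params("/dev/sdX", hit_pct))
```

    A high hit percentage concentrates I/O on a small area that fits in cache, so results at 100% say more about the controllers and the fabric than about the drives.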

    30% cache hit:

    70% cache hit:

    100% cache hit:

    After this I changed the thread count and some other settings several times, and on one attempt I got this result (100% cache hit):

    Increasing the number of VMs or hosts didn't give a significant increase in IOPS, but latency grew to more than 2 ms.

    The optimal number of LUNs is 6.

    Below I'll add info about testing with PB-HA.


    ------------------------------
    A. S.
    ------------------------------



  • 13.  RE: FlashSystem 7300 PB-HA performance issue

    Posted Fri August 30, 2024 01:58 PM

    After this I configured PB-HA and continued testing.

    I used 3 VMs, which reside on different hosts.

    The vdisks of all VMs reside on all LUNs.

    I played with different parameters for a long time, and now I see that I saved only two results :)

    Test parameters: blocksize 8k, randomRW, 70R/30W, 70% or 100% cache hits (I don't remember exactly) :)

    blocksize 8k, randomRW, 70R/30W, 30% cache hits:



    ------------------------------
    A. S.
    ------------------------------



  • 14.  RE: FlashSystem 7300 PB-HA performance issue

    Posted Fri August 30, 2024 02:06 PM

    And I did two extra tests on one controller (I rebooted the second one via the service console).

    Below are the best results (blocksize 8k, randomRW, 70R/30W, 100% cache hits).

    On a single system:

    On the PB-HA systems:

    P.S.

    At the moment we have only 2x16Gb FC links between sites.

    In the future there will be 8x16Gb FC links.



    ------------------------------
    A. S.
    ------------------------------



  • 15.  RE: FlashSystem 7300 PB-HA performance issue

    Posted Fri August 30, 2024 02:09 PM
    Edited by Thiago Lucas Fri August 30, 2024 02:11 PM

    EDIT: 

    You answered while I was posting: 2x 16Gbps interconnect, I suppose 1 for each fabric?

    --------------------------------------

    If you don't mind sharing:

    1) Fiber distance and port speed of these fabric interconnects?

    2) Are the ports for host / storage host access / storage replication all running at 32Gbps?



    ------------------------------
    Thiago Lucas
    ------------------------------



  • 16.  RE: FlashSystem 7300 PB-HA performance issue

    Posted Fri August 30, 2024 02:45 PM
    1. The distance between sites is about 30 km. At the moment there is one 16Gbps link for each fabric.
    2. Yes.
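
    To put these numbers in perspective, here is a back-of-the-envelope sketch. It assumes roughly 5 microseconds of one-way propagation delay per kilometre of optical fibre and the nominal 1600 MB/s per 16G FC link, and it ignores protocol overhead, buffer credits, and how many round trips the replication protocol actually needs per write. The function names are illustrative.

```python
US_PER_KM = 5.0  # approximate one-way propagation delay in optical fibre


def fibre_rtt_ms(distance_km):
    """Minimum round-trip time in ms added by the fibre distance alone."""
    return 2 * distance_km * US_PER_KM / 1000.0


def link_write_iops_ceiling(links, mb_s_per_link, block_kib):
    """Upper bound on mirrored write IOPS imposed by inter-site bandwidth."""
    return links * mb_s_per_link * 1e6 / (block_kib * 1024)


if __name__ == "__main__":
    print(f"30 km round trip: {fibre_rtt_ms(30):.2f} ms")
    for block_kib in (8, 128):
        ceiling = link_write_iops_ceiling(2, 1600, block_kib)
        print(f"2 x 16G links, {block_kib}k writes: about {ceiling:,.0f} IOPS")
```

    Every mirrored write has to cross the inter-site links at least once, so the distance sets a floor for write latency and the link count sets a ceiling for write bandwidth.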


    ------------------------------
    A. S.
    ------------------------------



  • 17.  RE: FlashSystem 7300 PB-HA performance issue

    Posted Tue September 03, 2024 03:33 AM

    Hello A.S., thanks for posting, and Nezih, as always, thanks for engaging with our Storage community. In order to make sure that we get you the support you need as quickly as possible, can I ask that you raise a support ticket, please? If you already have, can you please message me directly with the details? Thank you.



    ------------------------------
    Maria McDade
    ------------------------------



  • 18.  RE: FlashSystem 7300 PB-HA performance issue

    Posted Wed September 04, 2024 03:09 PM

    Hello!

    Today I repeated testing without HA.

    I created 8 LUNs and 2 VMs, which reside on different hosts; the vdisks of all VMs reside on all 8 LUNs.

    Below I describe the test parameters with results.

    blocksize 8k, randomRW, 70R/30W, 30% cache hits:

    blocksize 8k, randomRW, 70R/30W, 70% cache hits:

    blocksize 8k, randomRW, 70R/30W, 100% cache hits:

    After this I ran tests to check throughput.

    I used 4 VMs, which reside on different hosts; the vdisks of all VMs reside on all 8 LUNs.

    blocksize 128k, sequentialRW, 80R/20W, 50% cache hits:

    blocksize 128k, sequentialRW, 80R/20W, 70% cache hits:

    blocksize 128k, sequentialRW, 80R/20W, 100% cache hits:

    And one extra test with one controller (the other in maintenance mode): blocksize 8k, randomRW, 70R/30W, 100% cache hits:

    I am completely satisfied with these results.

    No more help needed. Thanks everyone :)



    ------------------------------
    A. S.
    ------------------------------



  • 19.  RE: FlashSystem 7300 PB-HA performance issue

    Posted Tue September 03, 2024 03:41 PM

    It seems like your IBM FS7300 setup may be limited by a configuration setting, especially since your IOPS remain consistent regardless of the block size. First, ensure that there are no IOPS limits set at the storage partition or LUN level. Check for any Quality of Service (QoS) settings that might be capping performance. Additionally, review the multipathing configuration in VMware to ensure it's optimized for performance, as suboptimal settings can lead to bottlenecks. Lastly, verify the firmware versions of your FS7300 and FC cards, as outdated firmware can impact performance. Rechecking settings on the ESXi hosts for potential configuration issues that could limit throughput is also advisable.



    ------------------------------
    shoib 000
    ------------------------------