File and Object Storage

A case study of capacity augmentation while retaining existing hardware and supporting application migration.

By Archive User posted Thu January 24, 2019 08:57 AM

Akbar Harees
Manish Bansode
Sandeep Bangur

1. Requirement:-
Our client has data centers at three sites. They have IBM Storwize V7000-Gen1 storage infrastructure at each site and wanted to add storage capacity for the applications that use it. Because V7000-Gen1 hardware sales and upgrades have been discontinued under the product life cycle policy, capacity could not be added to the V7000-Gen1 storage. New V7000-Gen2 hardware was therefore proposed to the customer to meet the new capacity and performance requirements.

2. Constraints:-
• Because of budget constraints, the customer was not willing to invest in a complete storage hardware refresh and capacity upgrade.
• The new V7000-Gen2 capacity was planned only for the applications' new capacity requirements, and the existing V7000-Gen1 capacity had to be reused.
• Only limited staging storage was available.

3. Customer's current storage infrastructure:-
The customer has three sites: DC (Delhi), DR (Chennai) and NDR (Delhi). The Global Mirror with Change Volumes (GMCV) remote copy feature is configured between the sites.

Here is a diagrammatic representation of the current storage infrastructure across the three sites:-

For how Storwize systems communicate with other systems in relation to layers, please see the link below:-

4. Approaches considered:-
To add storage, the plan was to install a V7000-Gen2 with two node canisters (a single I/O group) at each of the DC and DR sites. No additional storage was planned at the NDR site, whose storage infrastructure would remain V7000-Gen1.
Upgrading the hardware across the different sites wasn't straightforward for several reasons, so several approaches were considered, each with its own pros and cons.

4.1 Approach 1:-
To meet the new capacity requirement, one approach was to add the new V7000-Gen2 to the existing V7000-Gen1 cluster.
Here is a diagrammatic representation of how the storage infrastructure would look with this approach:-

Implementing this approach would have been straightforward, but V7000-Gen1 code upgrades are limited to SVC version 7.8.6. If the V7000-Gen2 were added to the existing V7000-Gen1 cluster, it would also be limited to SVC version 7.8.6.
The customer was keen to run newer SVC code in their infrastructure, so limiting the V7000-Gen2 to version 7.8.6 was not acceptable.
Check the following link for more information on 'Hardware Interoperability Matrix for Storwize V7000 Software Levels':

4.2 Approach 2 (the preferred one):-
In this approach, the total storage capacity is served by the V7000-Gen2. The entire capacity of the existing V7000-Gen1 is virtualized under the new V7000-Gen2 storage to retain the current data.
Here is a diagrammatic representation of how the storage infrastructure would look with this approach:-

• The new V7000-Gen2 capacity is planned only for the new capacity requirement of 130TB at the DC and DR sites.
• A new V7000-Gen2 single I/O group at DC and DR, with scale-up and scale-out upgrade options.
• No storage upgrade is required at the NDR site.
• The existing V7000-Gen1 cluster is virtualized under the new V7000-Gen2 to reuse the V7000-Gen1 hardware until its EOL and to reduce new storage investment.
• This design lets the V7000-Gen2 storage stay at the latest 8.x code level while the back-end V7000-Gen1 storage remains at 7.8.6.

4.2.1 Steps in implementing the approach:-
Migration of V7000-Gen1 data:-
a) 325TB and 300TB of capacity from V7000-Gen1 is to be virtualized under V7000-Gen2 at DC and DR respectively.

b) The new V7000-Gen2 does not have enough internal capacity to hold 300TB+ of data. At the DC site, 52TB of temporary data staging capacity is provided by virtualizing an IBM FS900 under the new V7000-Gen2.

c) Since production runs live from DC, the storage controller serves an average of 30K IOPS; the FS900 flash storage provides superior performance during data movement.

d) The storage at the DR location is idle for production and serves an average of fewer than 5K IOPS.

e) A V5000 storage system with 180TB of usable capacity (NL-SAS drives) is to be virtualized under V7000-Gen2 as temporary data staging capacity.

f) To virtualize the V7000-Gen1 capacity under V7000-Gen2, the V7000-Gen1 pools have to be freed up and new volumes created following best-practice guidelines for back-end volume and MDisk creation.

g) Data migration was planned in phases, based on host and application downtime availability. Every host has volumes allocated from different pools, so freeing up the V7000-Gen1 pools is not easy. In the initial phases of host migration, data was migrated to the internal drives of the V7000-Gen2 and to the temporary FS900 pool.

h) Hosts were selected for migration based on the capacity provisioned from each pool. The highest-capacity pools were freed up first so that they could be virtualized under V7000-Gen2, easing the capacity requirements of later migrations.

Remote Copy:-
GMCV remote copy (RC) is currently configured between DC and DR, and between DC and NDR. Remote copy for all of these volumes must also be re-established from the V7000-Gen2 clusters.
1) DC to DR Remote Copy.
• About 200TB of data was replicating between DC and DR.
• No initial remote copy synchronization of the volumes was allowed.
• Both the DC and DR storage systems were configured in the replication layer.

Initial synchronization between the volumes was not allowed, so it was avoided with the following steps:-
a) A host outage was arranged at DC and DR at the same time.
b) Once host I/O to the storage stopped, we ensured that the volumes in the replication relationships were consistent and fully copied.
c) Volume replication was stopped and the secondary volumes were made read/write.
d) The volumes were virtualized under V7000-Gen2 in image mode at DC and DR. Replication relationships were then configured between the image mode volumes using “mkrcrelationship” with the “-sync” option to avoid the initial synchronization.
e) The volumes were mapped to the DC hosts and checked. Once the checks completed successfully, replication was stopped and the volumes were tested on the DR hosts.
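The cut-over in steps b) to d) can be sketched as a small Python helper that refuses to proceed unless the old relationship is consistent, then emits the Storwize CLI commands in order. This is a minimal sketch under stated assumptions: all volume, pool, MDisk, relationship and system names are hypothetical, the accepted state names are assumed to match what lsrcrelationship reports, a single I/O group (io_grp 0) is assumed, and the GMCV-specific change volume and cycling mode settings are not shown. Each command should be checked against the CLI reference for the installed code level before use.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: states accepted as "consistent and copied" once host I/O has stopped.
CONSISTENT_STATES = {"consistent_synchronized", "consistent_copying"}


@dataclass
class VolumePair:
    """One replicated DC/DR volume moving under V7000-Gen2 (all names hypothetical)."""
    old_relationship: str  # existing Gen1 DC->DR relationship
    dc_mdisk: str          # Gen1 DC volume, seen as an MDisk by the DC Gen2
    dr_mdisk: str          # Gen1 DR volume, seen as an MDisk by the DR Gen2
    dc_pool: str           # DC Gen2 pool that holds the image-mode volume
    dr_pool: str           # DR Gen2 pool that holds the image-mode volume
    volume: str            # image-mode volume name used on both Gen2 systems


def cutover_commands(pair: VolumePair, state: str, dr_system: str) -> List[Tuple[str, str]]:
    """Return (system, command) pairs for steps c) and d), in execution order."""
    # Step b): only cut over when the secondary holds a consistent copy.
    if state not in CONSISTENT_STATES:
        raise ValueError(f"{pair.old_relationship} is {state}; wait for a consistent copy")
    image = "mkvdisk -mdiskgrp {pool} -iogrp 0 -vtype image -mdisk {mdisk} -name {name}"
    return [
        # Step c): stop replication and allow read/write access to the secondary.
        ("dc_gen1", f"stoprcrelationship -access {pair.old_relationship}"),
        # Step d): wrap each Gen1 volume as an image-mode volume on its local Gen2.
        ("dr_gen2", image.format(pool=pair.dr_pool, mdisk=pair.dr_mdisk, name=pair.volume)),
        ("dc_gen2", image.format(pool=pair.dc_pool, mdisk=pair.dc_mdisk, name=pair.volume)),
        # -sync declares master and auxiliary already identical, so no initial copy runs.
        ("dc_gen2", f"mkrcrelationship -master {pair.volume} -aux {pair.volume} "
                    f"-cluster {dr_system} -global -sync -name rc_{pair.volume}"),
    ]
```

The DR image-mode volume is created before the relationship because mkrcrelationship needs the auxiliary volume to exist on the remote system. The -sync option is only safe here because host I/O was stopped and both copies were confirmed identical beforehand.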

2) DC to NDR Remote Copy.
• A new replication relationship was to be created between the DC V7000-Gen2 and the NDR V7000-Gen1.
• Since only a limited number of volumes were replicating, the client allowed initial synchronization.
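The layer change in step b) follows from two Storwize layer rules: a remote copy partnership needs both systems in the same layer, and one Storwize can virtualize another only from the replication layer onto a back end in the storage layer. The Python sketch below checks a simplified model of the three-site design against just those two rules; the system names are hypothetical and other partnership and virtualization prerequisites are ignored. On the system itself the layer is changed with chsystem -layer, which, as in step a), is done only after the existing partnerships have been removed.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def layer_problems(layers: Dict[str, str], partnerships: List[Pair],
                   virtualized: List[Pair]) -> List[str]:
    """Report links that break the simplified Storwize layer rules."""
    problems: List[str] = []
    for a, b in partnerships:
        # Remote copy partnerships require both systems in the same layer.
        if layers[a] != layers[b]:
            problems.append(f"partnership {a}-{b}: {layers[a]} vs {layers[b]}")
    for front, back in virtualized:
        # Storwize-to-Storwize virtualization: front end in replication, back end in storage.
        if (layers[front], layers[back]) != ("replication", "storage"):
            problems.append(f"{front} cannot virtualize {back}: {layers[front]} over {layers[back]}")
    return problems
```

With NDR_GEN1 still in the storage layer, the DC_GEN2 to NDR_GEN1 partnership is the only problem reported; switching NDR_GEN1 to the replication layer clears the list, while the DC and DR Gen1 back ends must stay in the storage layer to remain virtualized.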

The following steps were taken to set up RC between the DC Gen2 and the NDR Gen1:
a) Since the NDR storage was in the storage layer, all existing volume replication relationships and the system partnership were removed.
b) The NDR storage layer was then changed to “replication” so that it could form an RC partnership with the DC V7000-Gen2 storage.
c) The required volumes were migrated from V7000-Gen1 to V7000-Gen2 at DC, and volume replication relationships were created between DC and NDR.

Tips and tricks:-
This exercise both adds capacity to the storage infrastructure and migrates data from the old storage infrastructure to the new one. These are some of the tips and tricks used in the implementation:

✔ All paths between the existing storage infrastructure and the hosts should be removed, by removing the volume-to-host mappings and the zoning between the hosts and the storage infrastructure.
✔ For host migration, existing volumes can be unmapped from the existing storage infrastructure and re-mapped to the host from the new infrastructure as image mode volumes.
✔ A host can also be unmapped by deleting all defined host ports from the host object. This keeps the existing volume mappings for reference and allows easy reversal to the old storage if required.
✔ For thin-provisioned volumes, use the real capacity of each volume in capacity calculations.
✔ During migration, to match the storage tier characteristics of the target volumes with the source volumes, volume names were prefixed with the pool name/ID along with the host name/ID, since the owning host also needs to be remembered.
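The capacity and naming tips above, together with the pool-selection rule in h), can be combined into a small migration inventory. This Python sketch is illustrative only: the Volume fields, the <host>_<pool>_<volume> naming pattern and the 63-character name limit are assumptions, not the exact convention used in the project. It counts real capacity for thin volumes, ranks pools from largest to smallest so the biggest can be freed first, and lists the hosts that must move before a pool is empty.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Volume:
    name: str
    host: str
    pool: str
    capacity_tb: float       # provisioned (virtual) size
    real_capacity_tb: float  # capacity actually allocated on the back end
    thin: bool


def migration_capacity_tb(vol: Volume) -> float:
    # Thin volumes only need their real capacity at the target, not their virtual size.
    return vol.real_capacity_tb if vol.thin else vol.capacity_tb


def target_name(vol: Volume, max_len: int = 63) -> str:
    # Hypothetical convention: <host>_<pool>_<volume>; 63 is an assumed name-length limit.
    return f"{vol.host}_{vol.pool}_{vol.name}"[:max_len]


def pools_by_capacity(vols: List[Volume]) -> List[Tuple[str, float]]:
    totals: Dict[str, float] = defaultdict(float)
    for v in vols:
        totals[v.pool] += migration_capacity_tb(v)
    # Largest pools first: freeing them releases the most Gen1 capacity to virtualize.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def hosts_blocking_pool(vols: List[Volume], pool: str) -> List[str]:
    # Every host with a volume in the pool must be migrated before the pool is free.
    return sorted({v.host for v in vols if v.pool == pool})
```

Because every host draws from several pools, hosts_blocking_pool makes the trade-off in g) visible: freeing one large pool may require migrating hosts whose other volumes sit in smaller pools.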

4.2.3 Problems faced:-
When image mode volumes (from the old V7000-Gen1) are migrated on the new V7000-Gen2, the destination pools may be either internal storage pools or external storage pools that originate from the V7000-Gen1 itself. When V7000-Gen1 external pools are used as destination pools, the heavy reads (from the image mode volumes) and the writes (to the external storage pools) both land on the V7000-Gen1 controllers and may create a performance bottleneck there.

To handle this, migration was planned in two stages:-
1) First, data was moved from image mode to a V7000-Gen2 internal storage pool.
2) In the second stage, data was moved from the V7000-Gen2 internal storage pool to a V7000-Gen1 external storage pool (V7000-Gen1 volumes).

This ensured that heavy read and write I/O never hit the V7000-Gen1 controllers simultaneously.
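The two-stage rule can be sketched as a wave planner in Python. It assumes hypothetical pool names, volume sizes in TB and a fixed internal staging capacity, batches volumes so each batch fits in staging, and alternates a read-only wave (image mode to Gen2 internal) with a write-only wave (Gen2 internal to Gen1 external), so no wave puts both kinds of migration I/O on the Gen1 controllers. The emitted migratevdisk strings assume the source and destination pools share an extent size; this is a sketch, not the exact tooling used in the project.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical pool names; the image and external pools are both backed by V7000-Gen1.
IMAGE, INTERNAL, EXTERNAL = "gen1_image", "gen2_internal", "gen1_external"
GEN1_BACKED = {IMAGE, EXTERNAL}

Move = Tuple[str, str, str]  # (volume, source pool, destination pool)


def gen1_io(wave: List[Move]) -> Set[str]:
    """Kinds of migration I/O a wave puts on the V7000-Gen1 controllers."""
    io: Set[str] = set()
    for _, src, dst in wave:
        if src in GEN1_BACKED:
            io.add("read")
        if dst in GEN1_BACKED:
            io.add("write")
    return io


def batch_by_staging(volumes: Dict[str, float], staging_tb: float) -> List[List[str]]:
    """Group volumes so each batch fits in the Gen2 internal staging capacity."""
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0.0
    for vol, size in volumes.items():
        if size > staging_tb:
            raise ValueError(f"{vol} ({size} TB) does not fit in staging")
        if current and used + size > staging_tb:
            batches.append(current)
            current, used = [], 0.0
        current.append(vol)
        used += size
    if current:
        batches.append(current)
    return batches


def plan_waves(volumes: Dict[str, float], staging_tb: float) -> List[List[Move]]:
    """Alternate read-only and write-only waves; each must finish before the next starts."""
    waves: List[List[Move]] = []
    for batch in batch_by_staging(volumes, staging_tb):
        waves.append([(v, IMAGE, INTERNAL) for v in batch])     # stage 1: Gen1 reads only
        waves.append([(v, INTERNAL, EXTERNAL) for v in batch])  # stage 2: Gen1 writes only
    return waves


def to_cli(wave: List[Move]) -> List[str]:
    return [f"migratevdisk -mdiskgrp {dst} -vdisk {vol}" for vol, _, dst in wave]
```

A direct image-to-external move would report both "read" and "write" from gen1_io, which is exactly the overlap the two-stage plan avoids. Progress of each wave can be watched with lsmigrate before the next wave is started.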

4.2.4 Other issues:-
Hosts ran on different platforms, including AIX LPARs, Windows, Hyper-V, Red Hat Enterprise Linux, RHEV-KVM and VMware. In each platform's migration phase, V7000-Gen1 volumes were virtualized under V7000-Gen2 as image mode volumes and presented to the hosts. On every platform, administrators had to rescan the volumes from the V7000-Gen2 system on the host side to bring the file systems up as they were before. Each platform had different administrators, and each administrator had a different set of concerns and doubts about the migration steps. All of these concerns were addressed individually to win their confidence in the migration process.