IBM FlashSystem

Best practices (Planning & Implementation) for Policy-Based HA replication

1. Best practices (Planning & Implementation) for Policy-Based HA replication

Mirosław Pura
Posted Wed June 05, 2024 10:15 AM

1. Assume we have two sites, A and B, each running several independent SQL applications. Can we plan a PB-HA solution (one I/O group per site) in which:
SQL clusters A1 and A2 have their volumes at site A (with a synchronous replica at site B)
SQL clusters B1 and B2 have their volumes at site B (with a synchronous replica at site A)
2. In the above example we must define two different policies (A->B and B->A), right?
3. Do we need to define two different HA storage partitions (A->B and B->A) for the above example?
4. Can we use one defined policy for more than one storage partition?
5. Is it best practice to group the independent SQL applications A1, A2, …, An in one storage partition with policy (A->B), and the independent SQL applications B1, B2, …, Bn in another partition with policy (B->A)?
6. Can we switch A1 (volumes and host application) to site B while A2 continues to work at site A?

The customer has separate teams: SQL admins and storage administrators. Currently (with the competing Hitachi GAD solution), a planned host outage does not require the storage administrators to be involved, and a planned storage outage does not require the SQL admins.

7. Is there any way for a planned host switch from site A to site B by the SQL admin to make the FlashSystem switch the volumes from A to B automatically?
8. If the FlashSystem does not switch automatically, what sequence of FlashSystem commands (or GUI steps) must the admin use to switch the SQL A2 volumes from site A to site B?
9. Obviously an independent volume group (VG) should be defined for each SQL application. What are the PB-HA best practices for planning policies and partitions so that the SQL admin can switch only one independent SQL application to the other site (for example, to patch only SQL B2)?
10. Is there any way for a planned storage switch from site A to site B by the storage administrator to make the application move from host A to host B without involving the SQL admins?

------------------------------
Mirosław Pura
------------------------------
2. RE: Best practices (Planning & Implementation) for Policy-Based HA replication

Carlos Fuente
Posted Thu June 06, 2024 08:12 AM

To the questions in turn:

1. There is no actual question here, but I'm not sure what A1/A2 are in relation to the SQL cluster: are they server nodes in a cluster located at A? It sounds like A and B are separate SQL applications as well as being different sites A/B, so I'll continue on that basis.

2. It is valid to use separate policies A>B and B>A for the SQL applications A/B, but it might also be valid to use the same one. The top question is which site/system you would prefer to keep operating if the comms between the two sites break. That is the top feature that the policy defines. As a side effect, the policy also controls which system you manage the partition from in the GUI and CLI.

3. If you want applications A / B to keep running separately on sites A / B when the link breaks, then yes, they must be in separate partitions. If they both (say) want to be on A, then they could share a partition.
4. You can reuse policies. There can be up to 4 partitions per system.
5. It is good practice to put each large application into its own partition. You can put multiple applications into the same partition, with the understanding that they will all fail over and fail back together, which might or might not be desirable.
6. See the new 8.7.0 functionality (announced this week). Volumes for A1 are accessible via the storage on either site. The host A1 can now (in 8.7.0) be configured with a 'host location' setting that describes which system is co-located with host A1 and so will be used by preference. Hosts A1 and A2 can be configured with different preferred locations, even though each continues to have access to all volumes through both systems. Host access can swap dynamically from one site to the other with no need to manage anything at the storage system; it will just work on the new site.

7. As in 6: a SQL application can be migrated from one site to another using new servers and will be able to access its volumes with no storage management or reconfiguration required.
8. n/a: no switch, manual or automatic, is required.
9. Host maintenance doesn't matter and doesn't need to be reflected in the volumegroup or partition design. You only need to worry about application consistency for the clustered-volume-scope data.
10. I don't think the question arises, because there is no workflow the storage admin needs to invoke to do this, even for planned maintenance.

The storage admin could alter the preferred location setting for a host, which might swap the ALUA states, but this would not affect the host's ability to do IO. Assuming hosts are zoned to both systems, it would cause host IO to start using a different system, but this doesn't require SQL admin engagement.

If hosts are zoned to only a single system (needs SCORE today), then shutting down that storage system will affect that host's access, but this would be expected.
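
The partition planning described above can be sketched as a small Python model. This is only an illustration of the reasoning, not FlashSystem CLI or API: the `Partition` class, the helper functions and the partition names are hypothetical, the limit of 4 is the figure quoted above, and the model simply assumes that the preferred system is the one that keeps serving a partition when the inter-site link breaks.

```python
from dataclasses import dataclass, field

# Limit quoted in this thread; treat it as an assumption for other releases.
MAX_PARTITIONS_PER_SYSTEM = 4


@dataclass
class Partition:
    """Hypothetical planning record for one HA storage partition."""
    name: str
    preferred_system: str  # "A" or "B": assumed to keep running if the link breaks
    applications: list[str] = field(default_factory=list)


def plan_is_valid(partitions):
    """Check the plan against the partition limit mentioned above."""
    return len(partitions) <= MAX_PARTITIONS_PER_SYSTEM


def survivor_on_link_break(partitions):
    """Map each application to the system expected to keep serving it.

    All applications in a partition share the same outcome, which is why
    applications that must stay on different sites need separate partitions.
    """
    return {
        app: partition.preferred_system
        for partition in partitions
        for app in partition.applications
    }


partitions = [
    Partition("part_A", preferred_system="A", applications=["SQL_A1", "SQL_A2"]),
    Partition("part_B", preferred_system="B", applications=["SQL_B1", "SQL_B2"]),
]

print(plan_is_valid(partitions))
print(survivor_on_link_break(partitions))
```

With hundreds of applications, the same model makes the trade-off visible: the partition decides only what happens on a link break, so applications are grouped by the site they should survive on, not by host maintenance schedule.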

Carlos Fuente

Distinguished Engineer, IBM Storage Virtualize, FlashSystem & SVC Family

email: carlos_fuente@uk.ibm.com

phone: +44-7795-917197

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

3. RE: Best practices (Planning & Implementation) for Policy-Based HA replication

Mirosław Pura
Posted Thu June 06, 2024 10:55 AM

Thank you very much!

My understanding (regarding the new v8.7.0, the picture, and question 6) is that before switching the application:

host SQL_A1 has preferred location A, volume group Prod_A1 is R/W and volume group Recov_A1 is immutable (by the way: does that mean read-only or offline?),

host SQL_A2 has preferred location A, volume group Prod_A2 is R/W and volume group Recov_A2 is immutable,

Both VGs at site A (Prod_A1, Prod_A2) are in the same partition and have the same policy.

Both VGs at site B (Recov_A1, Recov_A2) are in the same partition and have the same policy.

The SQL admin needs to move the clustered SQL_A1 application to site B.

He wants to avoid IO crossing the ISL from the host at site B to volumes at site A, so he needs to use the Recov_A1 volumes after switching.

Application SQL_A2 must continue to run at site A.

Please clarify:

After switching the application SQL_A1 to site B, the application runs on a new host (let's name it B_SQL_A1).

Will it work as below? Please confirm, or explain if this is wrong...

host B_SQL_A1 has preferred location B, volume group Prod_A1 is immutable and volume group Recov_A1 is R/W,

host SQL_A2 has preferred location A, volume group Prod_A2 is R/W and volume group Recov_A2 is immutable,

Both VGs at site A (Prod_A1, Prod_A2) are in the same partition and have the same policy.

Both VGs at site B (Recov_A1, Recov_A2) are in the same partition and have the same policy.

Let me explain: the client has hundreds of independent applications, not only 4. Some run at site A, all the others at site B. I used the names A1 and A2 to keep the picture simple. So the challenge is to have a good plan for storage, since every month the SQL admins have a planned outage for host maintenance (applying patches, fixes, etc.). When making a change to one host/application they don't want to switch all the other servers from site A to site B. What's more, the storage team tells me that they do not want to be involved in the server maintenance process.

4. RE: Best practices (Planning & Implementation) for Policy-Based HA replication

Carlos Fuente
Posted Thu June 06, 2024 03:02 PM

Hi,

I'm going to rewrite the story with a few changes in terminology and to describe the actual volume availability. On terminology:

We don't use the descriptions 'production' and 'recovery' with HA: both sites are production. We only use those descriptions when discussing the PBR DR solution, where the volumes have different roles.
We don't use the term 'immutable' for PBHA either; we often use that term for snapshots. It doesn't apply to the 8.7.0 solution. For the 8.6.1 solution from last year we might have used the word 'standby', but that is old news now.
We tend to use the same names for objects like volumes and volumegroups in each site, so I'll use the phrase 'volume A-at-site-A' or 'A-at-A' here.

The new description:

Before switching the application:

host SQL_A1 has preferred location A, volume group A1-at-A is R/W and volume group A1-at-B is R/W as well but is not receiving IO

host SQL_A2 has preferred location A, volume group A2-at-A is R/W and volume group A2-at-B is R/W too

Both VGs at site A (A1-at-A, A2-at-A) are in the same partition and have the same policy.
Both VGs at site B (A1-at-B, A2-at-B) are in the same partition and have the same policy.

The SQL admin needs to move the clustered SQL_A1 application to site B.

He wants to avoid IO crossing the ISL from the host at site B to volumes at site A, so he needs to use the A1-at-B volumes after switching.

Application SQL_A2 must continue to run at site A.

Please clarify:

After switching the application SQL_A1 to site B, the application runs on a new host (let's name it B_SQL_A1).

Will it work as below? Please confirm, or explain if this is wrong...

host B_SQL_A1 has preferred location B, volume group A1-at-A remains R/W but no longer receives IO, and volume group A1-at-B is R/W and now receives IO,

host SQL_A2 has preferred location A, volume group A2-at-A is R/W and volume group A2-at-B is still R/W but unused,
Both VGs at site A (A1-at-A, A2-at-A) are in the same partition and have the same policy.
Both VGs at site B (A1-at-B, A2-at-B) are in the same partition and have the same policy.

Let me explain: the client has hundreds of independent applications, not only 4. Some run at site A, all the others at site B. I used the names A1 and A2 to keep the picture simple. So the challenge is to have a good plan for storage, since every month the SQL admins have a planned outage for host maintenance (applying patches, fixes, etc.). When making a change to one host/application they don't want to switch all the other servers from site A to site B. What's more, the storage team tells me that they do not want to be involved in the server maintenance process.

>>> The customer should see no hindrance in migrating application activity from one physical server to another, with each server using its host location to access the local storage image by preference and not needing to rely on ISL traffic at all. This would even extend to scenarios where a volume is a VMFS datastore and is accessed by servers in both sites, each using its local storage system, still without ever needing to ask a server to use the ISL to reach the remote system.
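
The before/after description above can be written down as a short Python sketch. It is a conceptual model only, not the FlashSystem interface: the `Host` class and both helper functions are hypothetical names, and "preferred" / "non-preferred" stand in loosely for the path states a multipath driver would see. The point it illustrates is that both copies stay R/W and only the host location decides which copy receives IO.

```python
from dataclasses import dataclass

SYSTEMS = ("A", "B")  # the two FlashSystems in the HA relationship


@dataclass
class Host:
    """Hypothetical host record: only the name and the host location setting."""
    name: str
    location: str  # system co-located with this host, "A" or "B"


def path_preference(host):
    """Both systems stay reachable; the co-located one is used by preference."""
    return {
        system: "preferred" if system == host.location else "non-preferred"
        for system in SYSTEMS
    }


def copy_receiving_io(host, volume_group):
    """Name the copy that receives IO, using the 'A1-at-A' naming from this post."""
    return f"{volume_group}-at-{host.location}"


before = Host("SQL_A1", location="A")    # application running at site A
after = Host("B_SQL_A1", location="B")   # same application on a new host at site B
neighbour = Host("SQL_A2", location="A")  # unaffected application, stays at site A

for host, volume_group in [(before, "A1"), (after, "A1"), (neighbour, "A2")]:
    print(host.name, path_preference(host), copy_receiving_io(host, volume_group))
```

Nothing in the sketch changes on the storage side between `before` and `after`; the only difference is which host, with which location, is doing the IO.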

Thanks,

5. RE: Best practices (Planning & Implementation) for Policy-Based HA replication

Mirosław Pura
Posted Fri June 07, 2024 11:15 AM

Hi,

I have one more question regarding the sentence "This would even extend to scenarios where a volume is a VMFS datastore and is accessed by servers in both sites using local storage system to each server... still without ever needing to ask server to use ISL to reach remote system."

Can we plan to use multiple VMDKs on the VMFS volume such that some of them run at site A and others at site B, with all read IOs handled locally?

6. RE: Best practices (Planning & Implementation) for Policy-Based HA replication

Carlos Fuente
Posted Fri June 07, 2024 11:29 AM

Yes, exactly. When the partition is HA it is possible to have shared volumes, such as a VMFS datastore with many VMDKs, used by servers in both sites A/B. Each server delivers read and write IO to its local FlashSystem without traversing the ISL. Read IO is serviced from local storage; write IO is handled by the local storage system, with the data also mirrored across the ISL just once to the remote storage to be applied there.
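
The IO flow described above can be illustrated with a small Python model. It is a sketch under simplifying assumptions, not a description of the product internals: `service_io` and `isl_transfers` are hypothetical helpers, every host is assumed to use its local system, and each write is counted as exactly one mirror transfer across the ISL.

```python
from collections import Counter


def service_io(host_site, op):
    """Return (system servicing the IO, system the data is mirrored to or None)."""
    if host_site not in ("A", "B"):
        raise ValueError(f"unknown site: {host_site}")
    if op not in ("read", "write"):
        raise ValueError(f"unknown operation: {op}")
    remote = "B" if host_site == "A" else "A"
    # Reads are serviced locally; writes are handled locally and mirrored once.
    mirrored_to = remote if op == "write" else None
    return host_site, mirrored_to


def isl_transfers(workload):
    """Count ISL crossings per direction for a list of (host_site, op) pairs."""
    crossings = Counter()
    for host_site, op in workload:
        serviced_by, mirrored_to = service_io(host_site, op)
        if mirrored_to is not None:
            crossings[f"{serviced_by}->{mirrored_to}"] += 1
    return crossings


# VMDKs on one shared datastore, some used at site A and some at site B.
workload = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write"), ("B", "write")]
print(dict(isl_transfers(workload)))
```

In this model the reads never appear in the ISL count at all, and the only cross-site traffic is the single mirror copy of each write.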

Thanks,

7. RE: Best practices (Planning & Implementation) for Policy-Based HA replication

Thiago Lucas
Posted Fri June 21, 2024 09:59 AM

Hi Carlos,

With hosts in a two-site stretched vSphere cluster, all hosts must see the same volume groups (otherwise vSphere HA wouldn't do its job). If the preferred system is defined by policy at the storage partition level as site A, then (in my understanding) all host-to-volume-group mappings within that partition inherit active paths to the system at site A. How, then, would site B hosts within that HA partition read from or write to the system at B, rather than crossing the ISL to site A? Is there an "affinity"-like rule to be configured between hosts and volume groups?

Thanks.

------------------------------
Thiago Lucas
------------------------------

8. RE: Best practices (Planning & Implementation) for Policy-Based HA replication

Carlos Fuente
Posted Fri June 21, 2024 11:16 AM

There are two separate settings at play here:

A partition-level 'preferred management system' that defines the behaviour in a quorum race
A host 'location' setting that defines the proximity of that host to one or other of the two systems

Different hosts within a partition can have different location values, even if they are mapped to the same volumes and volumegroups.

Hopefully that is clear.

9. RE: Best practices (Planning & Implementation) for Policy-Based HA replication

Thiago Lucas
Posted Fri June 21, 2024 11:58 AM

It sure is, and actually it's major. I had thought that the host location setting wouldn't affect the active volume (multi)paths because of the storage-partition-level policy.

Is there a deep-dive technical paper on PBHA out already that covers this kind of detail?
