moving to virtual servers, have questions

  • 1.  moving to virtual servers, have questions

    Posted Mon January 25, 2021 09:59 AM
    All,

    I've been working with Informix for a number of years, but always on physical servers. We now are migrating to virtual servers running under VMWare/SimpliVity. As part of this, we will be upgrading our Informix from 11.70 to 14.10, and from RH 7.x to 8.2. I understand the basics of server virtualization, but there are many variations and different capabilities, so I've got questions.

    Leaving aside the performance benefits of having local replicated copies of your databases, is there any benefit to using Informix replication to keep a disaster recovery (DR) instance up to date, vs using the hive synchronization capabilities of something like SimpliVity?

    Currently, we have physically separate boxes at our primary site, which are kept in sync with HDR, and another server several states away that is kept up to date via RSS. Our users require the local failover machine be available within an hour (easy to accomplish with HDR), and our remote DR site to be available within 6 hours, if our primary data center is lost due to fire or other disaster. Further, in the case of switching to the DR site, the users are willing to accept a loss of up to one hour's worth of work. Those requirements were put in place before we implemented the replication, and never updated to reflect the fact that we actually can come up much more quickly.

    Looking first at the case of a local failover, the team in charge of configuring the virtual servers assures us that with the synchronization between SimpliVity hives, we could avoid the overhead and maintenance chores associated with keeping a second copy of Informix running on a second VM, and instead just spin up a new VM if/when the primary node fails. That new VM would then access a set of disks kept in sync via SimpliVity hive synchronization, and the VM itself would be a backup of our primary server, so all of the Informix binaries, config files, OS configs, cron jobs, etc., would be present, and the new VM would even have the same hostname and IP address as the failed server.

    Intellectually, I know that all of this should be feasible from a Linux point of view, but I do not know whether the SimpliVity sync can be trusted for Informix. Obviously, if we did this, when we started Informix on the new VM, it would go through its normal fast recovery process, using whatever had been written to the disks supporting the physical and logical logs. But can that VMWare/SimpliVity sync process be trusted to get that right, so that Informix would be a happy camper?

    Taking it to the next step, if we lose the primary data center and have to transition to our DR site, similar questions arise. As I understand it, SimpliVity sync only works within the local clusters of servers, so our DR site would not be able to use that solution. So would we still want to use RSS replication for that site? Given the relatively long period that we're allowed to be down, we may be able to just have a nightly level 0 archive that is transferred to the DR site, along with copies of the logical log backups taken throughout the day, and use those to restore to the point of the last logical log backup. That's what we did prior to replication, but our database has grown since then, so we'll have to test the performance of the new servers. If that is not sufficient, we could implement Continuous Log Restore, which I've used at a previous site.
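    The archive-plus-log-backup option above can be checked against the stated limits (DR site up within 6 hours, at most one hour of lost work) with simple arithmetic. The following is a minimal sketch, not a sizing tool: the `DrPlan` fields and every number in the example are hypothetical placeholders that would have to be replaced with timings measured on the new servers.

```python
from dataclasses import dataclass


@dataclass
class DrPlan:
    """Hypothetical inputs, all in minutes; measure these, do not guess them."""
    log_backup_interval_min: float  # how often logical log backups are taken
    log_transfer_min: float         # time to copy one log backup to the DR site
    level0_restore_min: float       # time to restore the nightly level 0 archive
    log_replay_min: float           # time to roll forward the day's log backups
    startup_min: float = 0.0        # time to bring the instance online afterwards


def worst_case_loss_min(plan: DrPlan) -> float:
    # Work done just after one log backup is only protected once the next
    # backup has been taken and has arrived at the DR site.
    return plan.log_backup_interval_min + plan.log_transfer_min


def recovery_time_min(plan: DrPlan) -> float:
    # Restore, roll forward, then start the instance, one after the other.
    return plan.level0_restore_min + plan.log_replay_min + plan.startup_min


def meets_requirements(plan: DrPlan,
                       max_loss_min: float = 60,
                       max_recovery_min: float = 360) -> bool:
    # Defaults mirror the limits quoted in the post: 1 hour of work, 6 hours down.
    return (worst_case_loss_min(plan) <= max_loss_min
            and recovery_time_min(plan) <= max_recovery_min)


# Example with made-up timings.
example = DrPlan(log_backup_interval_min=30, log_transfer_min=5,
                 level0_restore_min=180, log_replay_min=90, startup_min=15)
print(worst_case_loss_min(example), recovery_time_min(example),
      meets_requirements(example))
```

    If the restore and roll-forward timings grow with the database, rerunning the same check with fresh measurements shows when the plan stops fitting and something like Continuous Log Restore or RSS becomes necessary.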

    Another question involves performance, particularly I/O performance. I've heard that some hypervisors have worse performance than others in this respect, but no one has said which are good and which are bad. Obviously, since the rest of the company already uses VMWare and SimpliVity, I can't pick and choose to get the best, but I would like to know whether this combination performs well, or if there are certain configuration settings that need to be put in place to improve the I/O performance.

    I probably will come up with other questions as we move forward with this, and I'm sure that any responses here will also make me think of additional questions.

    Thank you in advance.



    Brad

    ------------------------------
    Brad Day
    ------------------------------


  • 2.  RE: moving to virtual servers, have questions

    Posted Mon January 25, 2021 10:32 AM

    Brad:

    I'm personally not a fan of disk based replication for databases in general and for Informix in particular. I have seen it work and I have seen it fail. I trust Informix replication completely and would always go that way when I have a choice and recommend that to my clients as the best option.



    ------------------------------
    Art S. Kagel, President and Principal Consultant
    ASK Database Management Corp.
    www.askdbmgt.com
    ------------------------------



  • 3.  RE: moving to virtual servers, have questions

    Posted Mon January 25, 2021 10:44 AM

    I agree with Art

     

    However, I have one customer using Zerto to replicate from the US to Europe; it has been solid and, to date, has passed all the DR tests.

     

    Cheers

    Paul

     






  • 4.  RE: moving to virtual servers, have questions

    Posted Mon January 25, 2021 11:03 AM
    Paul,

    You mention "passed all the DR tests."  That is something I've been trying to work out in my head - how to test this for DR.  In most cases, you can't "crash" a VM.  You can issue a command to stop it, but as part of stopping it, it will go through a controlled series of steps, at least at the hypervisor level.  There is nothing controlled about a server crash.  So, how do you test with any degree of certainty that your test accurately reflects what would/could happen in the real world?



    Brad

    ------------------------------
    Brad Day
    ------------------------------



  • 5.  RE: moving to virtual servers, have questions

    Posted Mon January 25, 2021 11:13 AM

    Assuming you have a replicating test system, just unplug the raw tin. I find that crashes everything very nicely :)

     

    Bring the test system back online with replication down/disabled and compare both ends.

    The regular production DR tests have to be more controlled, but we have the 'crash' confidence from the test systems.

     

    Cheers

    Paul

     






  • 6.  RE: moving to virtual servers, have questions

    Posted Mon January 25, 2021 11:24 AM
    Unfortunately, there are other users on the same physical server that hosts our test VM, so we can't just unplug things.  Not only are there other VMs for other groups, some of those VMs are for production workloads.  Although I guess if they've got replication in place and trust it so well, they shouldn't mind if we unplug it, right?  That last bit was meant as sarcasm, in case anyone missed it.

    But that is what is driving my concern - it seems to be very difficult to do a realistic test in a virtual world.

    ------------------------------
    Brad Day
    ------------------------------



  • 7.  RE: moving to virtual servers, have questions

    Posted Mon January 25, 2021 11:32 AM

    We were lucky with our 'unplug' tests as the entire prod infrastructure was moving datacentre providers.

     

    Cheers

    Paul

     






  • 8.  RE: moving to virtual servers, have questions

    Posted Mon January 25, 2021 12:31 PM
    Brad:

    You say that there are other VMs on the same frame as the Informix VM? Not ideal. Make certain that the Informix VM has dedicated physical cores (not shared cores depending on hyperthreads to be just as good as a physical core) and dedicated memory. If the frame has many physical processors, then make certain that the cores assigned to the Informix VM are all on the same physical processor or as few as possible to avoid NUMA effects slowing performance.

    The hypervisor that you use with the VMs must be the latest release and must enable direct physical I/O that bypasses the hypervisor, or I/O will seriously suffer versus bare iron.

    Art

    Art S. Kagel, President and Principal Consultant
    ASK Database Management


    Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference.  Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.








  • 9.  RE: moving to virtual servers, have questions

    Posted Mon January 25, 2021 01:09 PM
    Art,

    Not sure about "same frame" vs. "same host" vs. "same node".  There are multiple terms that seem to be saying the same, or nearly the same thing.  But yes, my understanding is that there are multiple virtual machines running on one physical machine.  That is seen as a major source of efficiency by the server team, compared to running on a "real" server.

    I am trying to make sure that we have dedicated cores, but I do not know whether I can get them to spend the time to see which cores are on which sockets.  They are in the "it's all virtual, so you really don't know where it's executing, and that doesn't matter" camp.  Can you point me to an article somewhere that describes the NUMA effects you mention?  Anything that might provide data to argue the case would be helpful.

    In general, they really want to go to thin provisioning of all resources: disk, memory, and CPU.  I believe I've finally got them to bend on the disk, and as far as memory goes, the instance will take however much virtual memory it needs for the resident segment, at least, so they won't be able to save as much as they probably think they're saving if they use thin provisioning.  I raised the point that we need to ensure that they don't overallocate CPU cores, but I don't know whether they'll heed that warning.

    As far as having the direct physical I/O, if it bypasses the hypervisor, doesn't that negate the possibility of it (VMWare/SimpliVity) replicating the I/O to a second set of disks on a standby node?  

    Along with the promise of not requiring Informix replication for failover, they're pushing to let VMWare handle the encryption so that we satisfy the encryption-at-rest requirements.  Any insights on whether that is better or worse than using Informix disk encryption?  

    Thanks again.



    Brad

    ------------------------------
    Brad Day
    ------------------------------



  • 10.  RE: moving to virtual servers, have questions

    Posted Mon January 25, 2021 02:31 PM
    Brad:

    Yes, "frame"="node"="host".

    All Intel and AMD processors are NUMA (Non-Uniform Memory Access) architectures when there are two or more physical CPU chips. Here's a presentation from some university course:
    https://www.cc.gatech.edu/~echow/ipcc/hpc-course/HPC-numa.pdf

    Here's an article from the Journal of Physics that also discusses the effects:
    https://iopscience.iop.org/article/10.1088/1742-6596/664/9/092010/pdf

    The main issue is that as a process migrates from one processor chip's cores to another chip's cores it may leave its memory image behind on memory "owned" by the previous chip. Access to that memory will be significantly slower than access to memory "owned" by the processor on which the code is currently executing. Informix uses shared memory for data management. That memory must be accessed by all oninit engine processes. If oninit processes can be running on cores belonging to multiple processors, then at least some of them will be accessing shared memory non-locally. Those accesses will be slower.
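    One way to gather evidence for this argument is to check, from inside a Linux guest, how many NUMA nodes the CPUs available to a process span. The sketch below is a minimal illustration under assumptions: it reads the standard Linux sysfs files `/sys/devices/system/node/node*/cpulist`, the function names are made up for this example, and inside a VM it only reports the topology the hypervisor chooses to expose, which may differ from the physical sockets.

```python
import glob
import os


def parse_cpulist(text):
    """Expand a Linux cpulist string such as '0-3,8,10-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_node_map(sysfs_root="/sys/devices/system/node"):
    """Return {node_id: set_of_cpu_ids}; empty if the sysfs tree is absent."""
    node_map = {}
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*", "cpulist")):
        node_dir = os.path.basename(os.path.dirname(path))  # e.g. 'node0'
        with open(path) as handle:
            node_map[int(node_dir[len("node"):])] = parse_cpulist(handle.read())
    return node_map


def nodes_spanned(cpus, node_map):
    """List the NUMA nodes that own at least one of the given CPUs."""
    wanted = set(cpus)
    return sorted(node for node, owned in node_map.items() if owned & wanted)


# Example with a hypothetical two-socket layout (not taken from a real host).
sample_map = {0: parse_cpulist("0-7"), 1: parse_cpulist("8-15")}
print(nodes_spanned({2, 3, 4, 5}, sample_map))   # all CPUs on one node
print(nodes_spanned({6, 7, 8, 9}, sample_map))   # CPUs split across two nodes
```

    On a live Linux system, `nodes_spanned(os.sched_getaffinity(0), read_node_map())` applies the same check to the CPUs the current process is allowed to run on. A result with more than one node means some shared-memory accesses are likely to be non-local.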

    Whether the VM can be replicated if you are using direct physical IO, I do not know. I would talk to a real VM expert, perhaps someone from VMWare, rather than me or one of your company's system admins (who tend to know far less than they think they do).

    On Encryption-at-Rest, Informix implements EAR very well and the performance impact is rather small. Again, I trust Informix tech. I do have one client using both Informix EAR and SAN-based EAR at the same time. I recommended against that because of the doubled encryption/decryption overhead, but this one client has always ignored my advice, so ... That said, using Informix EAR has the advantage that the data is encrypted in flight to and from the storage media as well as at rest. Most lower-level EAR schemes have to decrypt the data before sending it to the database and encrypt it after the database sends unencrypted data its way. A major reason to prefer Informix EAR!

    Oh, one more thing: Insist as strongly as you can that the Informix on Linux VMs not share a frame/host/node with VMs running Windows, and especially that they not share storage volumes with any Windows VMs, particularly email/Exchange servers! The CPU and I/O patterns are so different that they interfere with each other!



    ------------------------------
    Art S. Kagel, President and Principal Consultant
    ASK Database Management Corp.
    www.askdbmgt.com
    ------------------------------



  • 11.  RE: moving to virtual servers, have questions

    Posted Mon January 25, 2021 03:48 PM
    Art,

    Thanks for the links on NUMA.  I'll have to read up on that.

    Your point about not sharing with Windows VMs causes me concern, as there are many more Windows VMs than Linux VMs in this shop.  And given that they are touting the ability of SimpliVity to migrate VMs as it sees fit (for load balancing?  I'm not sure), I doubt they'll be able to commit to keeping Informix separate from Windows VMs.



    Brad

    ------------------------------
    Brad Day
    ------------------------------



  • 12.  RE: moving to virtual servers, have questions

    Posted Mon January 25, 2021 04:09 PM
    Brad:

    Yup, and that's the reason why I recommend to my clients that they use bare iron or at least a VM on a dedicated physical machine.

    But, bottom line is that you have to work with what you get. You can recommend, but you are not making the decisions. Just make sure that someone does the performance testing of the new system versus the existing system under typical loads and allows for growth! You can use iReplay from Exact-Solutions to do that kind of testing if you don't have your own testing procedures in place already.

    Art

    ------------------------------
    Art S. Kagel, President and Principal Consultant
    ASK Database Management Corp.
    www.askdbmgt.com
    ------------------------------



  • 13.  RE: moving to virtual servers, have questions

    Posted Thu January 28, 2021 01:38 PM
    Pardon the intrusion as I step in to someone else's thread here, but I have a question based on Art's response regarding EAR. 

    This may relate to a problem we're having getting raw devices to work in our RHEL 8.2 environment, and if Brad has any experience on this, I'd love to hear from him as well.

    We have attempted to create character raw devices, and everything looks OK until I try to use them in an onspaces command:

    [informix@sandbox]$ ls -l /dev/raw
    total 0
    crw-rw----. 1 informix informix 253, 11 Jan 28 12:18 dm-11
    crw-rw----. 1 root     disk     162,  0 Jan 14 16:49 rawctl
    [informix@sandbox]$ pwd;ls -l
    /informix/links
    total 0
    lrwxrwxrwx. 1 informix informix 14 Jan 28 12:21 ifmx_raw_1 -> /dev/raw/dm-11
    [informix@sandbox]$ onspaces -a idxdbs -p /informix/links/ifmx_raw_1 -o 0 -s 100000
    Verifying physical disk space, please wait ...
    Error opening file /informix/links/ifmx_raw_1.

    As I mentioned in my thread, there are no error messages in online.log or syslog, and the return code is simply '1'.  

    I asked my sysadm for more information on the major/minor numbers for the raw device.  The minor number, 11, matches the device number in 'dm-11'.  The major number is system-assigned, using values from /proc/devices.  That list is:

    [informix@gsvgsandbox02 links]$ cat /proc/devices
    Character devices:
      1 mem
      4 /dev/vc/0
      4 tty
      4 ttyS
      5 /dev/tty
      5 /dev/console
      5 /dev/ptmx
      7 vcs
     10 misc
     13 input
     21 sg
     29 fb
     99 ppdev
    128 ptm
    136 pts
    162 raw
    180 usb
    188 ttyUSB
    189 usb_device
    202 cpu/msr
    203 cpu/cpuid
    226 drm
    243 aux
    244 hidraw
    245 usbmon
    246 bsg
    247 watchdog
    248 ptp
    249 pps
    250 cec
    251 rtc
    252 dax
    253 tpm
    254 gpiochip
    
    Block devices:
      8 sd
      9 md
     11 sr
     65 sd
     66 sd
     67 sd
     68 sd
     69 sd
     70 sd
     71 sd
    128 sd
    129 sd
    130 sd
    131 sd
    132 sd
    133 sd
    134 sd
    135 sd
    253 device-mapper
    254 mdp
    259 blkext

    Since we're dealing with a character device, 253 would be 'tpm', which he tells me is the Trusted Platform Module.  I'm guessing here, as I haven't heard back from our server team yet, but could that be because they have configured the VM to use encryption on all disks?  So the OS sees that the VM is providing it an encrypted disk, and onspaces is attempting to access that using some sort of low-level functions that are not supported by encrypted disks?  I'm probably grasping at straws, as it seems like the VM should take any attempt to access the disk and do whatever sort of magic is necessary to make it work with the virtualized disk, but ...

    From the list in /proc/devices, it looks like major 162 is raw, and 244 is hidraw, which may be some variant of raw.  
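    Because /proc/devices keeps separate tables for character and block devices, the same major number can name different drivers in each. The sketch below is a minimal illustration, assuming the two-section layout shown in the listing above; the function names are invented for this example, the sample text is a shortened copy of that listing, and the sketch only looks numbers up, it does not diagnose the onspaces error.

```python
import os
import stat


def parse_proc_devices(text):
    """Return {'char': {major: [names]}, 'block': {major: [names]}}."""
    tables = {"char": {}, "block": {}}
    section = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("Character devices"):
            section = "char"
        elif line.startswith("Block devices"):
            section = "block"
        elif section:
            parts = line.split(None, 1)
            # Skip blank or malformed lines; keep '<major> <name>' pairs.
            if len(parts) == 2 and parts[0].isdigit():
                tables[section].setdefault(int(parts[0]), []).append(parts[1])
    return tables


def describe_node(path, tables):
    """Report (kind, major, minor, driver names) for a device node, following symlinks."""
    info = os.stat(path)
    if stat.S_ISCHR(info.st_mode):
        kind = "char"
    elif stat.S_ISBLK(info.st_mode):
        kind = "block"
    else:
        return None
    major, minor = os.major(info.st_rdev), os.minor(info.st_rdev)
    return kind, major, minor, tables[kind].get(major, [])


# Shortened copy of the listing from the post.
SAMPLE = """Character devices:
  4 /dev/vc/0
  4 tty
  4 ttyS
162 raw
244 hidraw
253 tpm

Block devices:
  8 sd
253 device-mapper
259 blkext
"""

tables = parse_proc_devices(SAMPLE)
print("char 253:", tables["char"].get(253))
print("block 253:", tables["block"].get(253))
print("char 162:", tables["char"].get(162))
```

    On the affected host, calling `describe_node("/informix/links/ifmx_raw_1", parse_proc_devices(open("/proc/devices").read()))` would show which table and driver the kernel associates with the node that onspaces is being pointed at.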

    If anyone has successfully implemented raw devices on RHEL 8.2, please let me know the major/minor numbers that your raw devices show.

    Thank you.



    ------------------------------
    Mark Collins
    ------------------------------



  • 14.  RE: moving to virtual servers, have questions

    Posted Thu January 28, 2021 04:02 PM
    All,

    Thanks to Art and Vladimir, I believe I have the raw device working now.  See the other thread I referenced above if you need details of the resolution.

    ------------------------------
    Mark Collins
    ------------------------------



  • 15.  RE: moving to virtual servers, have questions

    Posted Tue January 26, 2021 02:45 AM
    I agree completely with Art here. Pin the cores to the VM and make sure the cores are on as few sockets as possible. Shared memory access is very important, and performance will suffer greatly when accessing memory belonging to other sockets. With NUMA and VMs there are a great many variables in play.

    ------------------------------
    Øyvind Gjerstad
    Developer/Architect
    PostNord AS
    ------------------------------



  • 16.  RE: moving to virtual servers, have questions

    Posted Mon January 25, 2021 11:11 AM
    Art,

    Like you, I trust the Informix replication, as it's what I've used for several years.  Any specific situations that caused VM disk-based replication to fail would be most welcome, as that likely is the only way to make the case to those who have fully bought in to the VM promises.  If there are scenarios that can be tested to highlight the potential failure points, or to demonstrate the ways in which a VM can break things such that Informix fast recovery cannot complete (for example), I would very much appreciate seeing those.  

    Thanks.



    Brad

    ------------------------------
    Brad Day
    ------------------------------



  • 17.  RE: moving to virtual servers, have questions

    Posted Mon January 25, 2021 12:05 PM

    Brad,

     

    Not an original answer, but I go Art's way on trusting IFMX replication. It is dead simple to set up and administer, never fails, and is more flexible in terms of functionality.
    With HDR + RSS you can cover both High Availability and DRP scenarios within a few seconds.

     

    I have a customer who has HDR + Connection Manager for user sessions, and one day he realized that, a couple of weeks earlier, he had had 2 failovers (including one failback) without anyone noticing.

     

    Also, with non-Informix replication, the questions are: does this type of replication replicate as accurately as Informix does? And does the total cost (hardware replication + non-IFMX software replication + bandwidth) do better than Informix HDR for less $$, or at least for the same amount of $$?

    And also: can this non-IFMX replication be changed to an RSS-type setup in an instant, and vice versa? Can they be used concurrently if needed, and if so, how complex is that to set up? (Watch out: don't forget to pay the licence for active-active.)

    I would hate paying more to get less, and with less flexibility. But that's one point of view among others, not a Universal Truth.

     

    Regarding the overall performance question, you will never get better performance on a VM than on bare metal, and you will definitely get less if a minimum of the hypervisor's resources is not guaranteed for your IFMX servers.
    Generally, "other vendors'" DBMSs (starting with 'O') sitting on the same hypervisor just suck up most of the machine's resources and leave IFMX with the poor man's share. I have seen that many times. You have to make sure that a minimum of CPU, memory, and disk is dedicated as much as possible, or else this may lead to unstable performance levels.

     

    Hope this helps

     

    Eric

     

     

     

     

    Eric Vercelletto
    Data Management Architect and Owner / Begooden IT Consulting
    KandooERP Founder and Community Manager
    IBM Champion 2013,2014,2015,2016,2017,2018,2019,2020

    Tel:     +33(0) 298 51 3210
    Mob : +33(0)626 52 50 68
    skype: begooden-it
    Google Hangout: eric.vercelletto@begooden-it.com
    Email:
    eric.vercelletto@begooden-it.com
    www :
    http://www.vercelletto.com
    www  https://kandooerp.org