Seamless Computing

By Destination Z posted Mon December 23, 2019 03:33 PM


If awards were given for vague and over-hyped technology terms, cloud computing would be a strong contender. As in the fable of the blind men describing an elephant, the meaning and attributes of cloud computing depend on who's speaking, his or her background and experience and (too often) what’s being sold. It's any or all of scalable technology, shared storage, easily provisioned compute images, seamless networking, failover reliability, outage-free maintenance and maybe more.

VM Architectural Enhancement

For a more down-to-earth implementation that plausibly satisfies those reasonable (and not mutually exclusive) requirements, consider z/VM's Single System Image (SSI) feature.

An earlier SSI implementation, developed at the University of Waterloo, was introduced more than 30 years ago by ISV The Adesse Corporation as a VM/SP enhancement. At the time, Computerworld described it as "a comprehensive facility allowing several VM [systems] to appear to the user community as a single CPU."

I installed it in the data center where I worked to connect an IBM 4381 system to the in-place 4341. SSI met multiple needs by keeping the systems from appearing as separate computing islands, which wouldn't have suited our heterogeneous user base, accustomed to a shared, common and flexible platform. We had no way to separate users by department or by the applications they used, so a single image was essential. It also let us avoid duplicating costly software licenses for less-used tools, and we achieved integrated administration for staff efficiency.

By closely coupling the systems with shared DASD and channel-to-channel connectors, we gave our users the services of unified systems with all resources always available. The magic involved was the ability to quickly move running VMs between systems to match load requirements or, more often, to switch them to the systems with needed resources such as a database or language compiler.

Reborn for 21st Century z/VM

In October 2011, IBM announced the z/VM VMSSI priced feature, a new implementation of the single system image concept. Once again, running Linux guest systems can be moved seamlessly, invisibly and nearly instantly, without disrupting their work, among up to four VM systems (on one or more hardware platforms) that compose a cluster. Called Live Guest Relocation (LGR), this capability provides tremendous operational flexibility, reliability and service quality.

For decades, VM's hypervisor has exploited IBM mainframe capabilities to increase resources dynamically, allowing processor, memory, network and I/O capacity to expand on demand to meet increasing workload requirements. With LGR, z/VM not only moves resources to work, but also transfers work to available resources while preserving VM's full architectural and data integrity.

The VMSSI feature enhances z/VM's systems management, communications, disk management, device mapping, VM definition management, installation and service functions so that multiple z/VM systems can share and coordinate resources within an SSI structure.

Resource Management

As with the earlier implementation, all systems in a cluster must share DASD and be connected (the earlier version used a proprietary protocol; the current one uses the standard Inter-System Facility for Communications). Minidisks can either be shared across members or restricted to a single member; CP checks for conflicts throughout the cluster when a link is requested. There's still no free lunch: cluster capacity must be adequate for the anticipated workload, with additional capacity to support relocation. For example, the size and use of VM memory affect relocation performance: relocation processing is proportional to VM size, and relocation performance depends on the guest's memory change patterns.
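To make the memory point concrete, here is a minimal Python sketch of a generic "pre-copy" relocation model. It is an illustration under stated assumptions, not a description of how z/VM actually implements relocation: memory is copied in passes while the guest keeps running, pages changed during a pass are sent again in the next one, and the guest is quiesced only for the final, small remainder. The function name, parameters, units and default thresholds are all hypothetical.

```python
def estimate_relocation(memory_gb, dirty_rate_gbps, link_gbps,
                        quiesce_threshold_gb=0.5, max_passes=8):
    """Estimate relocation time under a simple pre-copy model.

    memory_gb        -- guest memory to transfer (assumed unit: GB)
    dirty_rate_gbps  -- rate at which the guest changes memory (GB/s)
    link_gbps        -- effective transfer rate between members (GB/s)

    Returns (total_seconds, quiesce_seconds, passes).
    All names and defaults are illustrative assumptions.
    """
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")

    to_send = memory_gb
    elapsed = 0.0
    passes = 0
    # Copy while the guest runs until the remainder is small enough
    # to move during a brief quiesce, or we give up iterating.
    while to_send > quiesce_threshold_gb and passes < max_passes:
        pass_time = to_send / link_gbps
        elapsed += pass_time
        passes += 1
        # Memory changed during this pass must be sent again
        # (it can never exceed the guest's total memory).
        to_send = min(memory_gb, dirty_rate_gbps * pass_time)

    quiesce = to_send / link_gbps  # guest is stopped for this final copy
    return elapsed + quiesce, quiesce, passes
```

In this toy model, doubling the guest's memory roughly doubles the total time, while a guest that changes memory quickly needs more passes before it can be quiesced, which matches the two effects described above.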

Some restrictions limit which VM configurations can be relocated, and target systems must include all facilities a VM uses. After initialization, synchronizing cluster members imposes relatively low overhead, though this still must be anticipated when connecting systems running separately. Each system's resources such as paging space and real memory must be robust enough to handle normal workloads plus "visitors" relocated from other cluster members.
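A rough way to reason about "visitors" is to ask whether the remaining members could absorb one member's normal workload. The Python sketch below is a deliberately simplified planning aid, not an IBM tool: it considers only real memory in aggregate, ignores paging space and per-guest placement, and uses hypothetical member names and figures.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """One cluster member; the figures are illustrative planning inputs."""
    name: str
    real_memory_gb: float
    normal_load_gb: float

    @property
    def headroom_gb(self) -> float:
        # Memory left over after the member's own normal workload.
        return self.real_memory_gb - self.normal_load_gb


def can_absorb(members, evacuated):
    """True if the other members' combined headroom covers the
    evacuated member's normal load (aggregate check only)."""
    source = next(m for m in members if m.name == evacuated)
    spare = sum(m.headroom_gb for m in members if m.name != evacuated)
    return spare >= source.normal_load_gb


cluster = [
    Member("MEMBER1", 256, 150),
    Member("MEMBER2", 256, 200),
    Member("MEMBER3", 128, 100),
]
```

With these made-up numbers, `can_absorb(cluster, "MEMBER3")` is true but `can_absorb(cluster, "MEMBER1")` is false, so MEMBER1 could not be fully emptied for maintenance without adding capacity elsewhere.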

More Elegant

Romney White, developer of Adesse's SSI and today an IBM Senior Technical Staff Member, notes that many people using current SSI ran Adesse's version. He adds, "There are a lot of one-member clusters, which seems odd until you realize that the decision whether to use SSI is best made up front, and is painless if you never go beyond that initial member; of course, if you decide to expand you are well positioned to do so easily."

White adds that current SSI is more elegant, with a single-system maintenance function, support for multiple VM releases within a cluster, and the ability to dynamically add systems to and remove them from a cluster. He observes that VMSSI was one of the largest enhancements ever made to VM.

Systems management is simplified because the z/VM instances within a cluster are serviced and administered as one system. Sharing of resources used by each hypervisor and its VMs is coordinated among all members. This gives guests the same devices and networks regardless of which member they run on or are relocated to. Shared resources include:

  1. User directory: VMs are defined and managed in a common repository for the cluster. In addition to supporting traditional VMs which log on to one member at a time, multiconfiguration users (service virtual and system support machines) can log on to multiple members, with member-specific configurations.
  2. Minidisks: Access to VM file systems is protected so that they can be shared across cluster members.
  3. Spool files: A VM’s console and unit record files are always available to it.
  4. Network device media access control (MAC) addresses: MAC addresses are unique across all systems within an SSI cluster.
  5. Commands: Privileged commands and user commands such as TELL and SENDFILE can operate across members.

Live Guest Relocation: Here Now and There Moments Later

The TEST option of the LGR command (VMRELOCATE) determines whether a proposed migration would succeed, which is useful information long before relocation is needed. Relocation domains are sets of systems to which a guest is permitted to move. They are created automatically based on architectural compatibility but can also be user-defined to, for example, separate test resources from production assets without also managing separate systems or clusters.
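The eligibility idea can be sketched in a few lines of Python. This is a toy model, not CP's actual checking: it assumes a guest may move only to a member that offers every facility the guest uses and, optionally, that belongs to an installation-defined set of permitted targets. The member names and facility labels are hypothetical.

```python
def eligible_targets(guest_facilities, current, member_facilities,
                     permitted=None):
    """Return the members a guest could move to in this toy model.

    guest_facilities  -- set of facilities the guest uses
    current           -- member the guest runs on now
    member_facilities -- dict: member name -> set of facilities it offers
    permitted         -- optional set of members the guest may move among
    """
    targets = []
    for name, facilities in member_facilities.items():
        if name == current:
            continue  # moving to where the guest already runs is a no-op
        if permitted is not None and name not in permitted:
            continue  # outside the installation-defined set
        if guest_facilities <= facilities:
            targets.append(name)  # target offers everything the guest uses
    return sorted(targets)


members = {
    "MEMBER1": {"fac-a", "fac-b", "fac-c"},
    "MEMBER2": {"fac-a", "fac-b"},
    "MEMBER3": {"fac-a", "fac-b", "fac-c"},
}
```

Here a guest on MEMBER1 that uses `fac-a` and `fac-c` could move only to MEMBER3; restricting it to a permitted set of MEMBER1 and MEMBER2 would leave it with no eligible target, which is exactly what a dry-run test is meant to reveal in advance.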

Installations have many reasons for valuing LGR. For example, it greatly simplifies planned outages by moving critical guest systems to peer VM systems without service interruptions. The guests are briefly quiesced and then resume operation on the destination cluster member. Multi-member SSI complexes minimize or eliminate outages for guest OSes. Rather than shutting guests down on a production LPAR, a staged shutdown and migration can proceed until the hypervisor is free, and then maintenance can begin without affecting guests. When production environments have different outage windows, each can be accommodated. Moves from the production hypervisor and back can be handled by lines of business as requirements dictate.
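A staged migration ahead of maintenance amounts to a placement problem: decide where each guest on the member being serviced should go. The Python sketch below shows one simple greedy approach, offered only as an illustration. It assumes memory is the sole constraint, and the guest names, member names and sizes are hypothetical.

```python
def plan_evacuation(guests, free_gb):
    """Assign each guest to a peer member, largest guests first.

    guests  -- dict: guest name -> memory it needs (GB)
    free_gb -- dict: peer member name -> free memory (GB)

    Returns dict: guest name -> chosen peer.
    Raises RuntimeError if some guest does not fit anywhere.
    """
    if not free_gb:
        raise RuntimeError("no peer members available")

    remaining = dict(free_gb)  # do not modify the caller's figures
    plan = {}
    # Place the largest guests first, each on the peer with the most room.
    for guest, size in sorted(guests.items(),
                              key=lambda item: item[1], reverse=True):
        peer = max(remaining, key=remaining.get)
        if remaining[peer] < size:
            raise RuntimeError(f"no peer has room for {guest}")
        plan[guest] = peer
        remaining[peer] -= size
    return plan


plan = plan_evacuation(
    {"LINUX01": 16, "LINUX02": 8, "LINUX03": 2},
    {"MEMBER2": 20, "MEMBER3": 13},
)
```

A real plan would also weigh outage windows, relocation eligibility and paging capacity; the point is only that the member can be declared free for maintenance once every guest has a destination.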

A large financial institution uses LGR to support and enhance development/test, production, and disaster planning and recovery. The cluster has four members, geographically separated by about 10 miles and connected via channel extension technology. The VM systems are no longer isolated computing islands, unable to move servers and resources among them. The directory is shared across all systems via DirMaint and satellite servers. SSI protects minidisk write links without Cross System Extensions reserved cylinders.

A federal government agency finds that SSI makes maintaining a cluster of z/VM systems very straightforward and auditable. It makes it easy to show security and performance groups that all systems are indeed running the same software at the same maintenance level.

In addition, SSI provides safety and security for individual users by preventing multiple minidisk write (destructive) accesses across systems, and provides operational continuity by using rolling maintenance on quiesced systems to avoid complete shutdowns. And it allows adding a training/testing environment without affecting production data, yet that data can easily be shared for more realistic developer/user scenarios.
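The write-protection idea can be illustrated with a small Python sketch of a cluster-wide link table. This is a simplified model, not CP's link logic: it assumes a single-writer rule in which read links are always granted and a write link is refused while a different user, on any member, already holds one. The class, method and disk names are hypothetical.

```python
class ClusterLinkTable:
    """Toy cluster-wide record of who holds the write link to each minidisk."""

    def __init__(self):
        # minidisk id -> (member, user) currently holding the write link
        self._writers = {}

    def request_link(self, minidisk, member, user, write=False):
        """Return True if the link is granted under the single-writer rule."""
        if not write:
            return True  # assumption: read links never conflict
        holder = self._writers.get(minidisk)
        if holder is not None and holder != (member, user):
            return False  # another user somewhere in the cluster is writing
        self._writers[minidisk] = (member, user)
        return True

    def release(self, minidisk, member, user):
        """Drop the write link if this member/user pair holds it."""
        if self._writers.get(minidisk) == (member, user):
            del self._writers[minidisk]


table = ClusterLinkTable()
first = table.request_link("USERA 191", "MEMBER1", "USERA", write=True)
second = table.request_link("USERA 191", "MEMBER2", "USERB", write=True)
```

In this model `first` is granted and `second` is refused even though the requests come from different members, because the table is shared by the whole cluster rather than kept per system.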

z/VM's SSI feature provides operational efficiency, improved service, business integration and enhanced risk management. SSI is a powerful enhancement to VM's decades-long tradition of leading-edge virtualization, providing horizontal expansion options for VM architecture.

Gabe Goldberg has developed, worked with and written about technology for decades. Email him at