Mainframe Storage

Enhancing performance, reliability, and security to ensure the availability of critical business workloads


Hu Yoshida should know better

By Tony Pearson posted Thu February 15, 2007 02:09 PM

  

Originally posted by: TonyPearson


I am still wiping the coffee off my computer screen, inadvertently sprayed when I took a sip while reading HDS' uber-blogger Hu Yoshida's post on storage virtualization and vendor lock-in. This blog appears to be the text version of their funny video.

While most of the post is accurate and well-stated, two opinions in particular caught my eye. I'll be nice and call them opinions, since these are blogs, and always subject to interpretation. I'll put quotes around them so that people will correctly attribute them to Hu, and not to me.

"Storage virtualization can only be done in a storage controller. Currently Hitachi is the only vendor to provide this."
-- Hu Yoshida

Hu, I enjoy all of your blog entries, but you should know better. HDS is a fairly recent newcomer to the storage virtualization arena, and since IBM has been doing this for decades, I will bring you and the rest of the readers up to speed. I am not trying to start a blog-fight; I just want to provide some additional information for clients to consider when making choices in the marketplace.

First, let's clarify the terminology. I will use 'storage' in the broad sense, meaning anything that can hold 1's and 0's: memory, spinning disk media, and plastic tape media. These all have different mechanisms and access methods, based on their physical geometry and characteristics. The concept of 'virtualization' covers any technology that makes one set of resources look like another set of resources with more preferable characteristics, and this applies to storage as well as servers and networks. Finally, a 'storage controller' is any device with the intelligence to talk to a server and handle its read and write requests.

Second, let's take a look at all the different flavors of storage virtualization that IBM has developed over the past 30 years.

1972

IBM introduces the S/370 with the OS/VS1 operating system. "VS" here refers to virtual storage, and in this case internal server memory was swapped out to physical disk. Using a table mapping, disk was made to look like an extension of main memory.
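
To illustrate the table-mapping idea in modern terms, here is a minimal, hypothetical sketch in Python (certainly not the actual OS/VS1 implementation): a page table tracks which virtual pages are resident in real memory, and pages that do not fit are swapped out to a backing store on disk.

    # Minimal, hypothetical sketch of the table-mapping idea behind virtual
    # storage; not the actual OS/VS1 implementation. Pages that do not fit
    # in real memory are swapped out to a backing store on disk.
    PAGE_SIZE = 4096

    class VirtualStorage:
        def __init__(self, real_frames, backing_store):
            self.frames = {}              # virtual page number -> page resident in real memory
            self.max_frames = real_frames
            self.backing = backing_store  # virtual page number -> page contents on "disk"

        def read(self, virtual_address):
            vpn, offset = divmod(virtual_address, PAGE_SIZE)
            if vpn not in self.frames:                     # page fault
                if len(self.frames) >= self.max_frames:
                    victim, contents = self.frames.popitem()
                    self.backing[victim] = contents        # page out a resident page
                self.frames[vpn] = self.backing.get(vpn, bytearray(PAGE_SIZE))  # page in
            return self.frames[vpn][offset]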

1974

IBM introduces the IBM 3850 Mass Storage System (MSS). Until this time, programs that ran on mainframes had to be acutely aware of the device types being written to, as each device type had different block, track, and cylinder sizes; a program written for one device type had to be modified to work with a different device type. The MSS was able to take four 3350 disks, and a lot of tapes, and make them look like older 3330 disks, since most programs were still written for the 3330 format. The MSS was a way to deliver new 3350 disk to a 3330-oriented ecosystem, and greatly reduce the cost by handling tape on the back end. The table mapping was one virtual 3330 disk (100 MB) to two physical tapes (50 MB each). Back then, all of the mainframe disk systems had separate controllers. The 3850 used a 3831 controller that talked to the servers.

1978

IBM invents Redundant Array of Independent Disks (RAID) technology. The table mapping is one or more virtual "Logical Units" (or "LUNs") to two or more physical disks. Data is striped, mirrored, or protected with parity across the physical drives, making the LUNs look and feel like disks, but with faster performance and higher reliability than the physical drives they are mapped to. RAID could be implemented as software in the server, on top of or embedded into the operating system, in the host bus adapter, or on the controller itself. The vendor that provided the RAID software or HBA did not have to be the same as the vendor that provided the disk, so in a sense this avoided "vendor lock-in". Today, RAID is almost always done in the external storage controller.
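
For readers unfamiliar with how parity delivers that reliability, here is a minimal, illustrative sketch in Python (not any vendor's firmware): the parity block of a stripe is the XOR of its data blocks, so any single lost block can be rebuilt from the survivors.

    # Illustrative RAID-5 style parity sketch, not any vendor's firmware:
    # the parity block is the XOR of the data blocks in a stripe, so any
    # single lost block can be rebuilt from the surviving blocks.
    from functools import reduce

    def xor_blocks(blocks):
        """XOR a list of equal-length byte strings together."""
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

    data = [b"AAAA", b"BBBB", b"CCCC"]   # data blocks striped across three drives
    parity = xor_blocks(data)            # parity block written to a fourth drive

    # Simulate losing the second drive and rebuilding its block from the rest.
    survivors = [data[0], data[2], parity]
    assert xor_blocks(survivors) == data[1]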

1981

IBM introduces the Personal Computer. One of the features of DOS is the ability to create a "RAM drive". This is technology that runs in the operating system to make internal memory look and feel like an external drive letter. Applications that already knew how to read and write to drive letters could work unmodified with these new RAM drives. This had the advantage that the files would be erased when the system was turned off, which made it perfect for temporary files. Of course, other operating systems today have this feature: UNIX can keep its /tmp directory in memory, and z/OS uses VIO storage pools.

This is important, as memory would be made to look like disk externally, as "cache", in the 1990s.

1990

IBM AIX v3 introduces Logical Volume Manager (LVM). LVM maps the LUNs from external RAID controllers into virtual disks inside the UNIX server. The mapping can combine the capacity of multiple physical LUNs into a large internal volume. This was all done by software within the server, completely independent of the storage vendor, so again no lock-in.
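
As a rough illustration of that mapping (a hypothetical sketch, not actual AIX LVM code), concatenation boils down to translating a logical block address into a (LUN, offset) pair by walking a table of LUN capacities.

    # Hypothetical sketch of concatenating several LUNs into one larger
    # logical volume; not actual AIX LVM code.
    def lun_for_block(logical_block, lun_sizes):
        """Return (lun_index, block_within_lun) for a concatenated volume."""
        for index, size in enumerate(lun_sizes):
            if logical_block < size:
                return index, logical_block
            logical_block -= size
        raise ValueError("logical block beyond end of volume")

    # Three LUNs of 100, 200 and 50 blocks look like one 350-block volume.
    print(lun_for_block(250, [100, 200, 50]))   # -> (1, 150)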

1997

IBM introduces the Virtual Tape Server (VTS). This was a disk array that emulated a tape library. A mapping of virtual tapes to physical tapes was done to allow full utilization of larger and larger tape cartridges. While many people today mistakenly equate "storage virtualization" with "disk virtualization", in reality it can be implemented on other forms of storage. The disk array was referred to as the "Tape Volume Cache". By using disk, the VTS could mount an empty "scratch" tape instantaneously, since no physical tape had to be mounted for this purpose.

Contradicting its "tape is dead" mantra, EMC later developed its CLARiiON disk library, which emulates a tape library (a VTL).

2003

IBM introduces the SAN Volume Controller. It maps virtual disks onto managed disks that can come from different frames from different vendors. Like other controllers, the SVC has multiple processors and cache memory, with the intelligence to talk to servers, and is similar in functionality to the controller components you might find inside monolithic "controller+disk" configurations like the IBM DS8300, EMC Symmetrix, or HDS TagmaStore USP. SVC can map a virtual disk to a physical disk one-for-one in "image mode", as HDS does, or it can map virtual disks across several physical managed disks, using a similar mapping table, to provide advantages such as improved performance through striping. You can take any virtual disk out of the SVC system simply by migrating it back to "image mode" and disconnecting the LUN from management. Again, no vendor lock-in.
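
To make the difference between the two mapping styles concrete, here is an illustrative sketch (hypothetical code, not actual SVC internals): in image mode, extent N of the virtual disk is simply extent N of a single managed disk, while in striped mode successive virtual extents rotate round-robin across several managed disks, so sequential I/O is spread over multiple back-end arrays.

    # Hypothetical sketch of two extent-mapping styles; not actual SVC internals.
    def image_mode(virtual_extent, mdisk):
        """Image mode: extent N of the virtual disk is extent N of one managed disk."""
        return (mdisk, virtual_extent)

    def striped_mode(virtual_extent, mdisks):
        """Striped mode: successive extents rotate round-robin across managed disks."""
        mdisk = mdisks[virtual_extent % len(mdisks)]
        return (mdisk, virtual_extent // len(mdisks))

    # Extents 0..5 of a striped virtual disk spread across three managed disks.
    for extent in range(6):
        print(extent, striped_mode(extent, ["mdisk0", "mdisk1", "mdisk2"]))

Migrating back to image mode is then just a matter of copying each virtual extent to contiguous extents on a single managed disk, which is why the LUN can afterwards be disconnected from management and used directly.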

The HDS USP and NSC can run as regular disk systems without virtualization, or the virtualization can be enabled to allow external disks from other vendors to be attached. HDS usually counts all USP and NSC units sold, but never mentions what percentage of these have external disks attached in virtualization mode. Either they don't track this, or they are too embarrassed to publish the number. (My guess: a single-digit percentage.)

Few people remember that IBM also introduced virtualization in both controller+disk and SAN switch form factors. The controller+disk version was called the "SAN Integration Server", but people didn't like the "vendor lock-in" of having to buy the internal disk from IBM. They preferred having all of the disk external, with plenty of vendor choices. This is perhaps why Hitachi now offers a disk-less version of the NSC 55, in an attempt to be more like IBM's SVC.

IBM had also introduced the IBM SVC for Cisco 9000 blade. Our clients didn't want to upgrade their SAN switch networking gear just to get the benefits of disk virtualization. Perhaps this is the same reason EMC has done so poorly with its "Invista" offering.

So, bottom line, storage virtualization can be, and has been, delivered in operating system software, in the server's host bus adapter, inside SAN switches, and in storage controllers. It can be delivered anywhere in the path between the application and the physical media. Today, the two major vendors that provide disk virtualization "in the storage controller" are IBM and HDS, and the three major vendors that provide tape virtualization "in the storage controller" are IBM, Sun/STK, and EMC. All of these involve a mapping of logical to physical resources. Hitachi uses a one-for-one mapping, whereas IBM offers more sophisticated mappings as well.



Comments

Tue April 24, 2007 09:45 PM

Why does IBM have to continue to apply 1960s mainframe terminology to the 21st century open systems world?
If Hu had said "DASD can only be virtualized at the controller", Tony would still be working the response.
It's the 21st century now. Computer virtual memory management and SAN disk storage virtualization are too different to lump together.
That said, Hu's comment is ridiculous. First, nobody it seems can really define storage virtualization. There is RAID and RAID LUNs, but the LUN is a by-product of duplication of data, and was not invented to virtualize disks. There are host-based volume managers, but those were not invented with virtualization in mind. The original goal was to aggregate disks when disks were small so a single file system could span multiple disks.
Richard's comment is strange. Parity RAID should be done on the controller, where dedicated XOR engines can work their magic. Introducing a SAN-based volume manager (which is what SVC is) may be a better option than using the RAID controller as the LUN slicer, but functionally the RAID controller/SVC combo is little different from Hu's larger, more functional controller. Certainly the POWER5 server-based controllers on IBM's 8000 series can do the same thing. The problem with this is that you now have two levels of abstraction to manage. Actually, you have three, because the file system (and perhaps another volume manager) exists on the server.
What I think could bring value to the two-layers of abstraction approach is to use the SVC-like layer as a filesystem. A combination NAS gateway (or SAN FS gateway) and SVC, baked into the array, could be useful as it moves all elements of storage management out of the server. Parallel NFS could be the industry standard SAN file system we have been waiting for. pNFS over RDMA 10Gb Ethernet could make it perform like a SAN over a single wire. That holds some real promise, and would require some serious rethinking of array designs.
The more I study storage, the more I think the NetApp machines are almost ideal. They in essence combine the functions of the SVC and the RAID controller. They can be feature-rich, because the design point for the controller is a powerful server which can serve NAS. Compare this to the traditional midrange modular RAID box which is designed to run an embedded OS on an embedded chip.
It's all an interesting discussion, however the storage industry has failed miserably, and continues to fail miserably, at effectively utilizing capacity. SCSI RAID on open systems is two decades old now. Shared FC SAN is a decade old now. Yet we do not have intelligent storage capacity management. We buy hundreds of large, expensive FC disks to provide performance to our ERP systems, leaving 80% of the capacity unused, and then go buy FC SATA arrays as second-tier storage. We have 80% unused capacity, but lose one disk of a RAID5 set and you keep your fingers crossed for hours as the rebuild to the hot spare happens. Then when you swap the failed disk, your performance dives again for hours.
Maybe Tony is right. Virtual memory can have page storms (similar to the RAID rebuild), but they are rare events, as we have learned and innovated pretty well on the memory management front. But disk storage is still stuck in the past.

Fri March 16, 2007 11:39 AM

When we looked at IBM SVC for block-level storage virtualization, we couldn't understand the logic of putting multiple Tier 1 or Tier 2 frames behind a commodity PC with a couple of HBAs running Linux. We went with the HDS USP and currently have over 300 TB of external storage.

Tue February 27, 2007 06:55 PM

Hu, thanks for the clarification. The SVC now has 8GB per node (it was 4GB, but now is 8GB as of September 2006).

Sun February 25, 2007 10:28 PM

Hello Tony, hope you were able to clean up your computer screen. Good comments from Chris and Richard. I felt the need to clarify my statement due to your post. Please check it out on my blog. By the way I like your blog and appreciate that you keep it open for comments.
Regards,
Hu

Sun February 18, 2007 03:00 AM

Chris, Richard, you both bring up good points. Rather than a response here, will address them in future blog posts.

Sat February 17, 2007 12:42 AM

Tony,
One of the major tasks of a RAID controller is to present (on its host ports) an error-free and ‘perfectly’ emulated disk drive… usually as a number of LUNs.
In the proposed scheme, the SVC & USP provide another level of "virtualization" in front of the existing & already “virtualized” third-party backends, containing RAID controllers. This is sometimes called a “raid of raids”.
Hence, the principle of "virtualization" in SVC is similar to that implemented in a typical RAID controller driving a number of generic disks. However, this SVC "virtualization" task is much simplified by the fact that there is no support of RAID 5/6 algorithms. Why not?
RAID 5/6 are very computationally & backend-IO intensive algorithms, difficult to implement at the required performance level. This is particularly true on ‘commodity’ PC hardware, on which the SVC is based. Performance is one of the major reasons justifying purpose-built RAID controllers.
If you remove the need for RAID 5/6, then all you have is RAID 0/1 … i.e. no extra level of protection…. and more computing power available to LVM, replication and data migration tasks.
SVC seems to be similar to what is currently available under Linux, which is very efficient in striping over multiple host adapters, supports good LVM tools…. and also has a problem with performance under RAID 5/6.
There are some other issues to explore ….
How does one manage & support multi-vendor RAID-protected backend enclosures… with different controllers, disks & management interfaces? How does one justify the cost & complexity of “split” support responsibility? How does SVC scale, and how is it a better solution than a large centralized system?
All said and done… I suspect that it may be cheaper to migrate all of the backend data to more ‘uniform’ hardware… to multiple RAID-protected backends, or perhaps to one large array. Both IBM & HDS can provide this.
The extra level of virtualization (be it SVC or USP) is an excellent tool for such “as you go”, uninterrupted data migration.
In practical terms, this is a very thinly disguised marketing campaign to facilitate another case of vendor lock-in… but is there any other alternative?

Fri February 16, 2007 03:49 AM

Tony, great post. I'd also include the virtualisation in the RVA (although they didn't invent it), which enabled virtual LUNs to be created to do thin provisioning. This was an STK product, if anyone is interested. One other point: you indicate that Hitachi uses 1:1 mapping versus more complicated mapping in SVC. In fact, I see Hu push the merits of 1:1 mapping in his posts, but it isn't the only way to use virtualisation in the USP. For the implementations I have done, I didn't have enough addressable LUNs in the lower-tier AMS to be able to present the LUNs as 1:1 through the USP, so I chose to present larger 400GB LUNs and use the USP to slice them. This meant I could address more storage; however, I lose the ability to just unhook the USP and go direct to the storage. The question is, do I care? Probably not. My implementation gets better performance anyway.