IBM FlashSystem


The Truth about XIV

By Tony Pearson posted Mon September 15, 2008 09:15 PM

  

Originally posted by: TonyPearson


Last week, I presented IBM's strategic initiative, the IBM Information Infrastructure, which is part of IBM's New Enterprise Data Center vision. This week, I will try to get around to talking about some of the products that support those solutions.

There has been a lot of attention on XIV in the past few weeks, so I will start with that. Steve Duplessie, an IT industry analyst from Enterprise Strategy Group (ESG), had a post [Adaptec buys Aristos, Tom Cruise, XIV, and Logical Assumptions] with some interesting observations and some sage advice. Val Bercovici, on his NetApp Exposed blog, has a post [Has Storage Swift-Blogging Finally Jumped the Shark?] which blasts EMC for their negativity.

(For those not in the USA, swift-blogging is a reference to false accusations and negative remarks made during the U.S. 2004 presidential election by the [Swift Boat Veterans], and ["jumping the shark"] is a reference to [a TV show that ran out of interesting and relevant topics]. For movie sequels, the comparable phrase is ["nuke the fridge"] in reference to the most recent Indiana Jones movie.)

I was going to set the record straight on a variety of misunderstandings, rumors or speculations, but I think most have been taken care of already. IBM blogger BarryW covered the fact that SVC now supports XIV storage systems in his post [SVC and XIV], and addressed some of the FUD already. Here was my list:

Now that IBM has an IBM-branded model of XIV, IBM will discontinue (insert another product here)

I had seen speculation that XIV meant the demise of the N series, the DS8000 or IBM's partnership with LSI. However, the launch reminded people that IBM announced a new release of DS8000 features, new models of N series N6000, and the new DS5000 disk, so that squashes those rumors.

IBM XIV is a (insert tier level here) product

While there seems to be no industry standard or agreement for what a tier-1, tier-2 or tier-3 disk system is, there seemed to be a lot of argument over which pigeon-hole to put IBM XIV in. No question many people want tier-1 performance and functionality at tier-2 prices, and perhaps IBM XIV is a good step toward giving them this. In some circles, tier-1 means support for System z mainframes. The XIV does not have traditional z/OS CKD volume support, but Linux on System z partitions or guests can attach to XIV via SAN Volume Controller (SVC), or through NFS protocol as part of the Scale-Out File Services (SoFS) implementation.

Whenever any radical game-changing technology comes along, competitors with last century's products and architectures want to frame it as just yet another storage system. IBM plans to update its Disk Magic and other planning/modeling tools to help people determine which workloads would be a good fit with XIV.

IBM XIV lacks (insert missing feature here) in the current release

I am glad to see that the accusations that XIV had unprotected, unmirrored cache were retracted. XIV mirrors all writes in the cache of two separate modules, with ECC protection. XIV allows concurrent code load for bug fixes to the software. XIV offers many of the features that people enjoy in other disk systems, such as thin provisioning, writeable snapshots, remote disk mirroring, and so on. IBM XIV can be part of a bigger solution, either through SVC, SoFS or GMAS, that provides the business value customers are looking for.

IBM XIV uses (insert block mirroring here) and is not as efficient for capacity utilization

It is interesting that this came from a competitor that still recommends RAID-1 or RAID-10 for its CLARiiON and DMX products. On the IBM XIV, each 1MB chunk is written on two different disks in different modules. When disks were expensive, how much usable space you got from a given set of HDDs was worth arguing about. Today, we sell you a big black box, with 79TB usable, for (insert dollar figure here). For those who feel 79TB is too big to swallow all at once, IBM offers "capacity on demand" pricing, where you can pay initially for as little as 22TB, but get all the performance, usability, functionality and advanced availability of the full box.
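To put rough numbers on the capacity-utilization argument, here is a minimal sketch. It assumes the 180 x 1TB drive count mentioned in the comments below, and it ignores spares and metadata when computing layout efficiency. The function names are illustrative only, and since the box price is left blank above, the cost helper takes whatever price you supply.

```python
def usable_fraction(data_units: int, redundancy_units: int) -> float:
    """Share of raw capacity left for data; spares and metadata are ignored."""
    return data_units / (data_units + redundancy_units)


def cost_per_usable_tb(system_price: float, usable_tb: float) -> float:
    """Price divided by usable capacity, the figure the post argues matters."""
    return system_price / usable_tb


# Layout-level efficiency of the schemes named in the post.
mirrored = usable_fraction(1, 1)    # two copies of every chunk, as in RAID-1/RAID-10
raid5_7p1 = usable_fraction(7, 1)   # RAID-5 with 7 data drives and 1 parity drive

# Whole-box figure: 79TB usable (from the post) over 180 x 1TB drives (from the comments).
xiv_box = 79 / 180

print(f"mirrored layout: {mirrored:.1%}")
print(f"RAID-5 7+1 layout: {raid5_7p1:.1%}")
print(f"XIV whole box: {xiv_box:.1%}")
```

The whole-box ratio comes out below the 50 percent that mirroring alone would give, presumably because some capacity is held back for spares and system use. The post does not break that down, so treat the gap as unexplained here.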

IBM XIV consumes (insert number of Watts here) of energy

For every disk system, a portion of the energy is consumed by the hard disk drives (HDD) and the remainder by the UPS, power conversion, processors and cache memory. Again, the XIV is a big black box, and you can compare the 8.4 kW of this high-performance, low-cost, one-frame storage system with the wattage consumed by competitive two-frame (sometimes called two-bay) systems, if you are willing to take some trade-offs. To get comparable performance and hot-spot avoidance, competitors may need to over-provision or use faster, energy-consuming FC drives, and offer additional software to monitor and re-balance workloads across RAID ranks. To get comparable availability, competitors may need to drop from RAID-5 down to either RAID-1 or RAID-6. To get comparable usability, competitors may need more storage infrastructure management software to hide the inherent complexity of their multi-RAID design.

Of course, if energy consumption is a major concern for you, XIV can be part of IBM's many blended disk-and-tape solutions. When it comes to being green, you can't get any greener storage than tape! Blended disk-and-tape solutions help get the best of both worlds.

Well, I am glad I could help set the record straight. Let me know what other products you would like me to focus on next.





Comments

Tue September 23, 2008 10:49 AM

BarryB, since you do not have definitive, objective, third-party proof that XIV uses more or less power than comparable EMC models, and the "customer tests" you've seen are not public, it seems we are at a standstill. Arguing over the maximum kW specified in spec sheets does not accurately reflect the amount of energy used in typical workload situations, and diminishes the improved features of availability, performance and usability of the XIV.
As for the trade-off between recovering all the data on the box in the rare cases when enough components fail on a DMX4, CX4 or XIV, and the more common recovery of just a subset of LUNs confined to a single RAID rank on a DMX4 or CX4, which happens orders of magnitude more often, the situation is the same: customers either (a) have a remote disk mirror and the recovery is quick and painless, or (b) perform recovery on a list of LUNs from external tape or disk backup, and the recovery takes more time and effort.
I cannot speak about individual customers on this blog without permission. I will try to find other customer references to support the positive features of XIV.
--Tony

Tue September 23, 2008 10:42 AM

Ahh...your silence underscores the reality:
Some Factual Observations DON'T Mislead and Misrepresent.

Sat September 20, 2008 08:24 AM

Tony, Tony, Tony.
Although you say you are no longer in marketing, you still have the requisite mastery of the English language - you twist and turn every challenge back into a situation where you do not have to admit the limitations.
But the limitations are still there.
The power config I used was indeed 180 1TB SATA drives in a 950, including the separate bay. Apples to apples on drive count, or usable capacity, for both CX and DMX4 - whichever way you want to look at it.
Fact is that you have no proof for your assertion that the EMC arrays would have to use FC drives to match the performance of an XIV - that is nothing more than PowerPoint claims and marketecture. Therefore, I will stand by my assertion that a CX4 or DMX4 with SATA drives will easily exceed the performance of an all-ATA XIV box.
And I have seen the customer tests that prove it.
Even your XIV poster child Leumi Bank admits that they cannot use the XIV for low-latency applications or with random-access applications or where bandwidth is the limitation.
That covers a lot of unusable territory, IMHO.
According to them, XIV is relegated instead to what Itzik Reuven calls the "fat middle" - sounds like the definition of SATA in a CX or DMX4...except that the EMC arrays can support BOTH tier 1 and "fat middle" in the same box - not to mention flash-based tier 0 as well.
Leumi even says that when performance starts to slow down on one of their XIV arrays, the management action is to stop adding new application data to that XIV array.
Translation: they aren't even able to use all of the (paltry) usable capacity of the array because of performance limitations! (I suspect they didn't pay full price for their Nextras if they can afford not to use all the capacity - especially when they claim that a backup clone would have been "cost prohibitive".)
In fact, that pretty much undermines Leumi's credibility right there - they wouldn't pay for a RAID 5 full-clone to protect their data, but they WILL buy MUCH MORE THAN TWICE the amount of physical storage required to meet their needs. Penny-wise, pound foolish, unless Moshe gave them the hardware for free...
In the interest of truthfulness and honest disclosure - let's just keep power comparisons apples-to-apples, SATA-SATA. OK?
As to disk loss...like a good marketing professional, you keep reforming the problem so it fits your architecture.
Let me keep it simple:
What happens when 2 drives in TWO DIFFERENT storage sets in the same XIV array fail within 5 minutes of each other?
Answer: Data Loss
Scope: Unknown - 1 or more LUNs are missing 1 or more 1MB chunks of real data that cannot be recovered
Solution: Rebuild/recover every single file system and database on every single LUN in the array
And no - the same is not true for the usual RAID implementations...if you lose 2 drives in a RAID 5 7+1 set, you lose all the LUNs on that RAID set - but no more. And with advanced systems like CLARiiON and Symmetrix, you'll only lose the LUNs on the segments of the RAID set that hadn't yet completed rebuilding.
Most importantly, EMC customers can easily determine which LUNs were not totally recovered, and thus limit their restores to only the impacted LUNs.
This is the Achilles heel of XIV - there is no way for the customer (or IBM customer service, I've been told) to figure out which LUNs are missing 1 or more 1MB chunks.
You can keep wiggling, but my sources assure me my observation is accurate. And they should know...and I think you do too.
Why else would you wiggle so?

Wed September 17, 2008 09:28 PM

According to your own specs, the system bay of the DMX4-950 can hold only 120 drives, so a storage bay is required to have the full 180 drives. To get comparable performance, you would need 180 FC drives in the DMX4-950 versus the wide-striped performance of XIV storage using SATA. It would be an interesting comparison to see how much energy each would actually draw, as I agree it is uninteresting to argue over spec ratings (which you started, by the way, on your blog).
For each disk in the XIV, the 1MB chunks are only copied to 168 of the 180 drives. In other words, there are 11 other drives that do not contain mirrored data with any drive you pick. If your two-drive failure occurred within 5 minutes on two drives within this set, the system can handle this just fine, no LUNs are impacted, and there is no need to recover any data. In fact, XIV can lose an entire drawer of drives and still continue running without any loss of data and be able to recover from this situation just fine. I have confirmed this with our XIV team. I am sorry your sources have misinformed you again.
The same could be said about any other disk system from any major vendor. As most EMC systems are running either RAID1, RAID10 or RAID5, in some cases a two-drive failure would require major recovery, and in other cases, the two drives might not be part of the same RAID pair or rank to affect each other. Whether you are recovering 8TB or 80TB, the act of recovering from tape would take hours to days, depending on how much data you had actually written, and the act of recovering from a disk mirror would be substantially less time. All major vendors advise that disk mirroring is a smart thing to do for any data where time is of the essence for recovery. No vendor guarantees 100 percent uptime on any disk system, because there are always combinations of multiple component failure that cannot be recovered from internally.
In the case of XIV, when in disk mirroring mode, each 1MB chunk exists in four copies, so you would need a four-drive failure to require recovery action. The system can perform its regular rebuild process after any two-drive failure by using the third and fourth copies of each 1MB chunk, without having to switch over the entire system to the secondary site.
Having 3, 4 or more copies of data to ensure availability is what Google, Yahoo, and nearly every mainframe customer do for their most critical applications. IBM Metro Mirror, EMC SRDF and HDS TrueCopy mirroring are popular for exactly these cases. The XIV storage system is no different.
-- Tony
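
The drive counts quoted in the comment above can be checked with a little arithmetic. This is a minimal sketch that takes the 180 and 168 figures at face value and assumes, purely for illustration, that the second failed drive is equally likely to be any of the surviving drives. It says nothing about the rebuild window or about which chunks would actually be affected.

```python
from fractions import Fraction

TOTAL_DRIVES = 180     # drives in the frame, as quoted in the thread
PARTNER_DRIVES = 168   # drives that hold mirror copies of a given drive's chunks


def non_partner_drives(total: int, partners: int) -> int:
    """Drives that share no mirrored chunks with the chosen drive."""
    return total - 1 - partners


def p_second_failure_hits_partner(total: int, partners: int) -> Fraction:
    """Chance that a uniformly random second failure lands on a partner drive."""
    return Fraction(partners, total - 1)


safe = non_partner_drives(TOTAL_DRIVES, PARTNER_DRIVES)
p_partner = p_second_failure_hits_partner(TOTAL_DRIVES, PARTNER_DRIVES)

print(f"drives sharing no data with a given drive: {safe}")
print(f"second failure lands on a partner drive: {float(p_partner):.1%}")
print(f"second failure lands on a non-partner drive: {float(1 - p_partner):.1%}")
```

The count of non-partner drives matches the "11 other drives" in the comment. The uniform-failure assumption is the sketch's own and is not something either side of the thread states.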

Wed September 17, 2008 07:37 PM

Tony, I will have to correct you on your DMX4 comparison - you are using ratings for the full complement of 240 drives in the drive bay, while my comparisons were for the actual number of drives used. Drives make up the vast majority of power, so there's a big difference between 240 and 180 drives.
As I'm sure you know, it is difficult to communicate the power requirements for all the various possible configurations, so, like IBM, EMC lists ratings for the maximum configurations. For the XIV, since there is no ability to remove drives, the maximum config is the same as the minimum; for the DMX4, the system can be configured with as few as 32 drives, and the power will be significantly lower than for the maximum 240 drive config.
Similar for the CLARiiON configs, by the way, which is a far more appropriate comparison to the all-SATA XIV array. At least some of those customers doing their own evals are finding that CLARiiON is not only faster, it's cheaper. Go Figure!
Oh, and UL rules require that you report the worst-case power draw, which in the case of the DMX includes the power required to recharge the integrated UPSs while operating under full load.
You really can't use spec sheets to compare power requirements, unless you take the time to understand what they really mean.
Of course, I probably should have used nominal operating power instead of maximum, although that would have shown even a bigger gap between the EMC arrays and the XIV.
But I would like to see you present documentation that shows that the damage is limited to a small subset of the LUNs in the event of a dual drive failure within, say, 5 minutes of each other. I will not betray my sources, but I am confident that I wasn't misinformed. As I have said, I am pretty sure that your own customer service people are advising customers that should two drives fail within a few minutes of each other, they should be prepared to restore every last byte of data stored in the array - there is no magical in-box recovery if the second drive that fails includes the only remaining copy of any of the blocks lost on the first failure. And worse, the loss cannot be tracked to the specific LUNs that were affected - there are no backwards pointers for the lost 1MB chunks. So you have to restore everything, or risk the effect of silently corrupted file systems and databases (a megabyte here and there can really be catastrophic!).
Go ahead - ask your XIV-trained CS engineers. I won't be surprised that you never admit it, but I'm pretty sure your CS engineers won't leave your customers in the dark, either.
As for Stewey - geez, lighten up a little...I am starting to feel like the boy who kept telling everyone the emperor had no clothes.
And you are sounding an awful lot like the emperor's tailor.

Wed September 17, 2008 01:17 PM

Stewey, thanks for the support. We'll see if BarryB or anyone from EMC takes you up on that challenge. We have several customers doing proof-of-concept projects at their own locations, with their own applications and data, making their own fair and balanced comparisons.
--Tony

Tue September 16, 2008 02:46 PM

I find it hilarious how fascinated BarryB is with the XIV array. He's very excited, as a dog is to a pant leg, to point out the capabilities of the array. I think he has tremendous animosity towards Moshe and is really jealous of his success. There seems to be a serious personal vendetta going on here.
The problem with most of his arguments is that he's not doing an apples/apples comparison. You can't build a DMX or CLARiiON with similar performance or price to the XIV and have either the CLARiiON or DMX come out on top. It just ain't going to happen. I would bet that you could buy TWO XIVs and still be cheaper than a single CLARiiON or DMX and have the capability to replicate the data between two sites.
In fact, I challenge BarryB to provide a similar config of a DMX/CLARiiON, including usable capacity, power (kWh), cooling (kBTU/h), and footprint (sq ft). Let's pick a city in the US to determine the avg $/kWh (http://www.aps.org/policy/reports/popa-reports/energy/units.cfm). And then do a combined footprint/power/cooling cost comparison. Then, let's pick a simple disk I/O benchmark tool to see how the configs compare against each other. From there, let's see who comes out on top.
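
The power-cost part of that challenge boils down to simple arithmetic. The sketch below uses the 8.4 kW figure from the post and treats it as a constant draw all year, which a spec-sheet maximum almost certainly overstates. The electricity rate and cooling overhead are hypothetical placeholders, not figures from the thread.

```python
HOURS_PER_YEAR = 24 * 365


def annual_energy_kwh(power_kw: float, hours: int = HOURS_PER_YEAR) -> float:
    """Energy used in a year at a constant draw of power_kw."""
    return power_kw * hours


def annual_power_cost(power_kw: float, rate_per_kwh: float,
                      cooling_overhead: float = 0.0) -> float:
    """Yearly electricity cost; cooling_overhead adds a fraction for cooling load."""
    return annual_energy_kwh(power_kw) * (1.0 + cooling_overhead) * rate_per_kwh


XIV_KW = 8.4            # from the post
HYPOTHETICAL_RATE = 0.10  # dollars per kWh, placeholder for a chosen city's rate

energy = annual_energy_kwh(XIV_KW)
cost = annual_power_cost(XIV_KW, HYPOTHETICAL_RATE)

print(f"annual energy: {energy:,.0f} kWh")
print(f"annual cost at ${HYPOTHETICAL_RATE:.2f}/kWh: ${cost:,.2f}")
```

Running the same two functions with a competitor's measured draw and the same rate would give the side-by-side comparison the comment asks for. Footprint and benchmark results would need their own inputs.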

Tue September 16, 2008 11:26 AM

Thanks for setting the record straight, Tony.
By omission, I think you have confirmed my assertion that an XIV dual-drive failure (before the 1st drive rebuild completes) WILL indeed cause irreparable damage to ALL of the data stored in the array, requiring every single LUN to be recovered from backup. The issue isn't one of probabilities, it is a matter of the scope of impact WHEN it occurs. And like I said, you do not have to believe me - ask your own XIV-trained service engineers.
You also neglected to mention that while the terms of PAYGO allow the customer to pay for only the first 25% of capacity in the array at time of acquisition, PAYGO also requires the customer to purchase the remaining capacity within 12 months of installation. More of a 12-month installment plan than pay-as-you-grow.
And of course, you are powering and cooling all 180 drives for the whole duration, whether you're using the capacity or not.
Which reminds me...you have done nothing to directly contradict the observation that the XIV array uses more power than a comparable CLARiiON or DMX, be they configured with the same usable capacity or with the identical number of 1TB SATA drives. When you are outright unable to get any more power into the data center, every excess watt is important.
Finally, and for the record, I am truly sorry that you mistook my factual observations as "negativity." While I'm obviously not going to sing the praises for you, I honestly tried to present a factual assessment, pointing out some things about XIV that others may have overlooked. One man's facts are another man's FUD, I guess.
But I will admit it came across pretty heavy-handed, especially when you and BarryW weren't able to respond until weeks later. Sorry about that...