IBM FlashSystem

Spare Capacity for Life Extension

By Tony Pearson posted Mon May 11, 2009 05:07 PM

  

Originally posted by: TonyPearson


My post last week [Solid State Disk on DS8000 Disk Systems] kicked up some dust in the comment section. Fellow blogger BarryB (a member of the elite [Anti-Social Media gang from EMC]) tried to imply that 200GB solid state disk (SSD) drives were different from, or better than, the 146GB drives used in IBM System Storage DS8000 disk systems. I pointed out that they are the very same physical drive, just formatted differently.

To explain the difference, I will first have to go back to regular spinning Hard Disk Drives (HDD). There are variances in manufacturing, so how do you make sure that a spinning disk has AT LEAST the amount of space you are selling it as? The solution is to include extra. This is the same way that rice, flour, and a variety of other commodities are sold. Legally, if it says you are buying a pound or kilo of flour, then it must be AT LEAST that much to be legal labeling. Including some extra is a safe way to comply with the law. In the case of disk capacity, having some spare capacity and the means to use it follows the same general concept.

(Disk capacity is measured in multiples of 1000, in this case a Gigabyte (GB) = 1,000,000,000 bytes, not to be confused with [Gibibyte (GiB)] = 1,073,741,824 bytes, based on multiples of 1024.)
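To make the unit distinction concrete, here is a minimal sketch that converts the drive sizes discussed in this post from decimal gigabytes to binary gibibytes. The function name `gb_to_gib` is just an illustrative helper, not part of any vendor tool.

```python
# Decimal vs. binary capacity units, as defined in the note above.
GB = 1000 ** 3    # 1,000,000,000 bytes
GIB = 1024 ** 3   # 1,073,741,824 bytes


def gb_to_gib(gb: float) -> float:
    """Convert a capacity in decimal gigabytes to binary gibibytes."""
    return gb * GB / GIB


if __name__ == "__main__":
    # Drive sizes mentioned in this post.
    for size_gb in (146, 200, 256):
        print(f"{size_gb} GB = {gb_to_gib(size_gb):.1f} GiB")
```

The same drive therefore shows a smaller number when an operating system reports it in GiB, without any capacity being lost.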

Let's say a manufacturer plans to sell 146GB HDD. We know that in some cases there might be bad sectors on the disk that won't accept written data on day 1, and there are other marginally-bad sectors that might fail to accept written data a few years later, after wear and tear. A manufacturer might design a 156GB drive with 10GB of spare capacity and format this with a defective-sector table that redirects reads/writes of known bad sectors to good ones. When a bad sector is discovered, it is added to the table, and a new sector is assigned out of the spare capacity. Based on averages of manufacturing runs and material variances, these could then be sold as 146GB drives, with a life expectancy of 3-5 years. Over time, the amount of data that a drive can store diminishes year after year, and once it drops below its rated capacity, the drive fails to meet its legal requirements.
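The defective-sector table can be pictured with a toy model. This is only a sketch of the general idea described above, not how any real drive firmware is written; the class name `RemappingDisk`, its methods, and the tiny sector counts are all hypothetical.

```python
class RemappingDisk:
    """Toy model of a drive with a defective-sector table and a spare pool."""

    def __init__(self, rated_sectors: int, spare_sectors: int):
        self.rated_sectors = rated_sectors
        # Spare sectors sit past the end of the rated address range.
        self.spares = list(range(rated_sectors, rated_sectors + spare_sectors))
        # Defective-sector table: logical sector -> spare sector replacing it.
        self.remap = {}

    def physical(self, logical: int) -> int:
        """Return the physical sector that currently backs a logical sector."""
        if not 0 <= logical < self.rated_sectors:
            raise IndexError("sector outside rated capacity")
        return self.remap.get(logical, logical)

    def mark_bad(self, logical: int) -> None:
        """Record a bad sector and redirect it to the next free spare."""
        if not 0 <= logical < self.rated_sectors:
            raise IndexError("sector outside rated capacity")
        if not self.spares:
            raise RuntimeError("no spares left: drive is below rated capacity")
        self.remap[logical] = self.spares.pop(0)


if __name__ == "__main__":
    # Hypothetical scale: 146 rated sectors plus 10 spares.
    disk = RemappingDisk(rated_sectors=146, spare_sectors=10)
    disk.mark_bad(42)
    print(disk.physical(42), len(disk.spares))
```

Once the spare pool is empty, the next bad sector cannot be redirected, which is the point at which the drive no longer delivers its rated capacity.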

With Solid State Disk, the technology requires a lot of tricks and techniques to stay above the rated capacity. For example, you can format a 256GB drive as a conservative 146GB usable, with an additional 110GB of spare capacity (75 percent of the usable capacity) to handle all of the wear-leveling. You could lose up to 22GB of cells per year, and still have the rated capacity for the full five-year life expectancy.

Alternatively, you could take a more aggressive format, say 200GB usable, with only 56GB of spare capacity (28 percent of the usable capacity). If you lost 22GB of cells per year, then sometime during the third year, hopefully under warranty, your vendor could replace the drive with a fresh new one, and it should last the rest of the five-year time frame. The failed drive, having 190GB or so of usable capacity, could then be re-issued legally as a refurbished 146GB drive to someone else.
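The arithmetic behind the two formats can be checked with a short sketch. It uses only the figures from this post and the simplified assumption that cells are lost at a steady 22GB per year; the function names are illustrative, and real wear depends on the workload and on the drive's wear-leveling.

```python
RAW_GB = 256            # physical flash in the drive
LOSS_GB_PER_YEAR = 22   # simplified steady loss rate used in this post


def spare_gb(raw_gb: float, usable_gb: float) -> float:
    """Spare capacity left over after formatting to a usable size."""
    return raw_gb - usable_gb


def years_above_rated(raw_gb: float, usable_gb: float, loss_per_year: float) -> float:
    """Years until the spare pool is used up under a constant loss rate."""
    return spare_gb(raw_gb, usable_gb) / loss_per_year


if __name__ == "__main__":
    for usable in (146, 200):
        spare = spare_gb(RAW_GB, usable)
        pct = 100 * spare / usable
        years = years_above_rated(RAW_GB, usable, LOSS_GB_PER_YEAR)
        print(f"{usable}GB usable: {spare}GB spare "
              f"({pct:.0f}% of usable), about {years:.1f} years of headroom")
```

Under this model the 146GB format has exactly five years of headroom, while the 200GB format runs out of spares partway through the third year.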

The wear and tear on SSD happens mostly during erase-write cycles, so for read-intensive workloads, such as boot disks for operating system images, the aggressive 200GB format might be fine, and might last the full five years. For traditional business applications (70 percent read, 30 percent write) or more write-intensive workloads, IBM feels the more conservative 146GB format is a safer bet.

This should come as no surprise to anyone. When it comes to the safety, security and integrity of our clients' data, IBM has always emphasized the conservative approach.

Comments

Tue May 12, 2009 05:21 PM

Well, then, let's see the math of your Distinguished Engineers!
Meanwhile, I'll save the public the agony of anticipation - I've posted the quiz answers over on my blog.

Tue May 12, 2009 03:32 PM

BarryB, I have distinguished engineers here at IBM to do the math for me. I am sorry that IBM has put you and your colleagues in the uncomfortable position of having to explain or defend EMC's decision on the aggressive 200GB format versus IBM's more conservative 146GB format.
-- Tony

Tue May 12, 2009 07:19 AM

By the way, you should really take the time to learn about how wear levelling works on the ZeusIOPS drive before you blog about it.
You're demonstrating my point - IBM doesn't get Flash yet.
Or at least, YOU don't.
There is no situation EVER with the ZeusIOPS drive that you would lose "22GB of cells per year", since the wear-levelling algorithms work to ensure that EVERY CELL WEARS EVENLY. It is virtually IMPOSSIBLE to wear out just one cell (or block) - even if you write to the same LBA millions of times, the drive will actually distribute those writes evenly across ALL of the blocks in the drive.
In a mythically perfect world where every NAND cell could accept EXACTLY the same number of program/erase cycles, the ZeusIOPS algorithms have the effect that ALL the cells would fail within a short window of time relative to each other.
And there would be no "re-issue legally as a refurbished 146GB" because the remaining life of the drive would be well below 5 years. Thus it wouldn't be appropriate to resell as a smaller drive (I'm not a lawyer, so I don't know about "legally").
If you understood Flash, you'd have known that.
But of course, NAND cells don't all have the same exact P/E cycle tolerance. And they have been observed to tolerate well in excess of the rated 100,000 cycles - up to 300,000 or even 2 million in some cases.
So the bottom line is that to use these drives effectively, the array actually has to monitor the wear out of cells and adapt/adjust/react appropriately. You haven't mentioned it, so I doubt the DS8000 does this (both DMX and V-Max do).
Which means that DS8K customers will indeed be putting their data at higher risk, once IBM actually starts shipping the drives.
And of course, no application will write 100% of the time. In fact, with RAID 5, you are GUARANTEED to have to do reads to the drive. But using a mythical 100% 8K write model is appropriate for calculating the MINIMUM lifetime of the drives.
Oh, and before you argue that the DS8K will write smaller than 8K (I know it does), doing so in fact wears out the drive faster due to the write-amplification factor (the smallest unit of program/erase on the ZeusIOPS is 4K, and it increases to 8K or 16K depending upon the drive capacity and the formatting). Notably, Symmetrix will always buffer writes and do 8K I/Os, minimizing the effect of write amplification.
But of course, since Symmetrix can use up to 80% of global memory as write cache (256GB usable on DMX4, 512GB on V-Max), it will actually do fewer writes to EFDs (and HDDs, for that matter) than the DS8K, given the aging and outdated DS8K's paltry 4GB or 8GB of non-volatile write cache. Deferring writes and doing writes aligned to the internal structure of the drive minimizes the P/E cycles that the cells experience, further lengthening the life of the drive.
Lacking these abilities, DS8K customers are exposed to premature wear-out (and DU/DL) risks that Symmetrix customers need not be concerned with.
Those are just some of the advantages customers get by choosing a vendor that actually "gets it", instead of one that merely blogs that they do.
So we all are waiting for you to DO THE MATH.
And please, be sure to check it twice - maybe even verify it with STEC before you post it. Otherwise you may embarrass yourself and IBM (again).

Mon May 11, 2009 09:33 PM

Hey, Tony...instead of all the hand-waving and FUD cast about as if it were fact, just do the math.
Writing 8KB blocks 24x7xforever at the maximum write rate the drive can accept and 100% writes (no reads), exactly how long will it take to wear out all the spare cells in a 256GB ZeusIOPS drive formatted to present 200GB of usable capacity (the one EMC is selling), vs. a 256GB flash drive formatted to present only 146GB of usable capacity (as IBM and HDS are trying to sell)?
Hint - the ZeusIOPS can accept the same number of 8KB writes per second no matter how much usable capacity is presented, and the actual MB/s will stabilize to an identically constant rate on both drives within the first 48 hours.
Oh, and remember. The ZeusIOPS drive load balances evenly across ALL 256GB of NAND cells, not just the spares.
I say they both last at least 5 years.
And that's even with DMX4, V-Max and CLARiiON running the drives at 4Gb/s, while the DS8000 can only support them at 2Gb/s (since the back-end DAs are only 2Gb/s Fibre Channel).
Which means IBM is overcharging customers for Flash capacity they can't use.
(BTW - I think EMC's street price for the 200GB EFD is lower than IBM's for the 146GB...imagine that!)
But go ahead Tony: stop misleading your readers with bogus FUD...
DO THE MATH!