AIX

Connect with fellow AIX users and experts to gain knowledge, share insights, and solve problems.


#Power

Average lifetime of the hard disk - p-series

  • 1.  Average lifetime of the hard disk - p-series

    Posted Wed March 07, 2007 11:32 AM

    Originally posted by: SystemAdmin


    My machine is a P550 (OS 5.3L). The company has had it for a year and a half. How can I find out the average lifetime of the HD in this machine? Does IBM collect such statistics on the lifetimes of their hardware? I would like to prepare a spare in case of a media failure. Here (a foreign country), it is difficult to convince the boss to purchase one in advance because of the budget, so I need such info.
    lscfg -v -l hdisk0
    hdisk0 U787B.001.DNW65BA-P1-T14-L3-L0 16 Bit LVD SCSI Disk Drive (73400 MB)
    Manufacturer................IBM
    Machine Type and Model......HUS103073FL3800
    FRU Number..................00P3833
    Part Number.................26K5191
    I think it was made by Hitachi
    #AIX-Forum


  • 2.  Your thinking is all wrong

    Posted Mon March 19, 2007 09:52 AM

    Originally posted by: nagger


    Disks are not like cars - they don't wear out after 100,000 miles and then the engine goes bang and you buy a new one.

    There is no average time at which a disk fails.
    It is also extremely bad thinking that you need to buy a disk as you approach the average.

    First, you need to buy Hardware Support - this will mean that it is IBM's problem when your machine fails and IBM will have to find you a replacement. When I say IBM I mean IBM or IBM Business Partner etc.

    This assumes you have good backups and can recover your data and assumes you have disk protection - mirrors or RAID5 etc.

    IBM tracks all components of all their computers and tracks the mean time between failures, called MTBF numbers. On new purchases IBM can calculate from the components in a particular configuration the MTBF for the whole machine. There are processes within IBM to make sure that each generation of machine improves on the MTBF of the previous generation.

    Just because the MTBF is, say, 10 years (I have just made this number up, right), it does not mean that disks run for 10 years and then promptly break. In real life, some disks will fail early on, which is the reason large machines run their disks in a "burn in mode" for two weeks before putting data on them. Then there will be disk failures, depending on your luck, up to the mean, and many disks will run for many years after the MTBF date - in fact most disks get retired with the machine and never fail.

    So you could have a failure AT ANY TIME.
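
    To put rough numbers on this, here is a minimal sketch that assumes a constant failure rate (the simple exponential model, which ignores both early "burn in" failures and wear-out). The 10-year MTBF is the made-up figure from this post, not an IBM number, and the function name is only illustrative.

```python
import math


def prob_failed_by(t_years: float, mtbf_years: float) -> float:
    """Probability that a disk has failed by time t, assuming a constant
    failure rate of 1/MTBF (exponential model; an assumption, not IBM data)."""
    return 1.0 - math.exp(-t_years / mtbf_years)


MTBF_YEARS = 10.0  # made-up figure from the post above

for t in (1, 5, 10, 20):
    print(f"failed within {t:>2} years: {prob_failed_by(t, MTBF_YEARS):.1%}")
```

    Under that assumption, about one disk in ten fails in its first year, and roughly 37% are still running when they reach the MTBF "date", so the MTBF marks neither a safe period nor an expiry date.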

    Your question refers to "the disk" - if you have just the one disk, I hope you have it backed up and have tested that the backup is complete, or your first disk failure could be your last!

    Hope this helps, N
    #AIX-Forum


  • 3.  Re: Your thinking is all wrong

    Posted Mon March 19, 2007 11:21 AM

    Originally posted by: SystemAdmin


    I fail to see what's so wrong about the thinking here. Disks are indeed like cars and like anything else with moving parts. After some number of rotations or miles or however else usage is measured they certainly do "go bang and you buy a new one". Back in the good old days of mainframes there used to be a software product/service offering called Reliability Plus that allowed you to compare your MTTF experience to everybody else's so you could evaluate how good your disk vendor was.

    I also fail to see why buying a disk as you approach the average is such a bad thing, especially when compared to mirroring which, when you think about it, is buying a disk on day one. Having vendor maintenance on new gear is essentially the equivalent of this and may or may not cost less money. If you have old or second-hand disks, buying spares in a timely fashion is just a prudent step to take and it would be a good idea to know when to do that.

    FWIW

    Jim Lane
    #AIX-Forum


  • 4.  Nope - My thinking was right :-)

    Posted Mon March 19, 2007 02:08 PM

    Originally posted by: nagger


    I could not agree less.

    In the car analogy, a disk can go "bang" in the first week or outlive the owner by a decade, whereas most cars are pretty much worn out all over by 100,000 miles and something major will go wrong.

    The question was "when to buy a spare disk as you expect a disk to fail".
    The problem is it's totally unpredictable for a particular disk, so there is no date at which "it would be a good idea" or at which you could be "buying spares in a timely fashion". So you can't say buy a spare after X years or Y disk I/Os.
    MTBF is not a life expectancy. Otherwise, we would have some humans living 200 years or more!!

    The question was phrased in a way that made me suspect we are talking about a single disk in the machine rather than a machine with hundreds of disks - in which case on site spares could save the day or make a hardware contract vital.

    Glad to have raised a bit of a debate on this question :-)
    ta N
    Flame suit activated
    #AIX-Forum


  • 5.  Re: Nope - My thinking was right :-)

    Posted Tue March 20, 2007 07:28 AM

    Originally posted by: SystemAdmin


    The car analogy is a bit inexact because when cars fail they get fixed, when disks fail they get replaced. Ergo, cars last longer than disks but they aren't necessarily more reliable.

    MTBF isn't a life expectancy but rather an estimate of life expectancy in the same way the state of health is for a person. Tracking reliability of disks is kind of like going to the doctor, it won't tell you exactly how long you're going to live but it's the best estimate you're likely to get.

    To make this marginally on-topic the justification for having either spares or hardware maintenance has to do with how important the disks are to you rather than how many of them you have. If the risk of loss is great enough it could be perfectly worthwhile to take precautions for a single disk (or not for a room full of them).
    #AIX-Forum


  • 6.  Re: Nope - My thinking was right :-)

    Posted Mon March 26, 2007 03:05 PM

    Originally posted by: SystemAdmin


    Actually the car is the computer not the hard drive, and cars only get fixed when something is replaced, just like computers. The computer gets fixed as the hard drive gets replaced.

    Nagger is right, a hard drive can last 10 hours to 10 years and anywhere in between. There is an average life expectancy on all parts (memory, monitors, hard drives, etc). The point of the MTBF is that you should be able to reach an average threshold before expectation of failure. CRT monitors can be rated for 8,000 to 16,000 hours of life, and there are about 8,760 hours in a year. This would mean that the monitor could last roughly one to two years if left on 24/7 before it would be expected to fail. This does not mean that the monitor will be good for at least 8,000 to 16,000 hours though. It could die at any time, or last longer. The end result is, not all parts are created equal.
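
    The hours-to-years arithmetic above can be checked with a few lines. This is only a sketch: the 8,000 and 16,000 hour ratings are the figures quoted in this post, and the function name and `duty_cycle` parameter are illustrative additions.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours in a non-leap year


def rated_hours_to_years(rated_hours: float, duty_cycle: float = 1.0) -> float:
    """Convert a rated life in powered-on hours to calendar years.

    duty_cycle is the fraction of the time the device is switched on
    (1.0 means 24/7). This is an expectation, not a guarantee.
    """
    return rated_hours / (HOURS_PER_YEAR * duty_cycle)


for rated in (8_000, 16_000):
    always_on = rated_hours_to_years(rated)
    office_hours = rated_hours_to_years(rated, duty_cycle=8 / 24)
    print(f"{rated:>6} h: {always_on:.2f} years at 24/7, "
          f"{office_hours:.2f} years at 8 h/day")
```

    Run 24/7, the two ratings work out to a little under one year and a little under two years, which matches the "one to two years" estimate.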

    Looking to have a spare drive around is not a bad thing, but it's not normally based on MTBF as to when you make that purchase. You will normally have them to cover any unexpected failures, or you will have a maintenance contract.
    #AIX-Forum


  • 7.  Re: Average lifetime of the hard disk - p-series

    Posted Mon March 19, 2007 01:07 PM

    Originally posted by: cd3lgado


    Hi

    Ask IBM for the MTBF value. This is an average that should be derived from several usage tests on disks of a given brand, and you can use it as a measure of how long you might expect to wait before some disks go bad. Remember this is just an average, so it's not a fixed rule.

    It could be a good idea to have some spare parts in your organization, but sometimes it is a better idea to sign a HW maintenance contract with IBM; from the point of view of managers this has added value compared with having spare parts.

    Hope this helps
    #AIX-Forum


  • 8.  Re: Average lifetime of the hard disk - p-series

    Posted Mon March 19, 2007 01:29 PM

    Originally posted by: SystemAdmin


    > ...The company installed this disk one and half year ago.
    > How can I find out the life time in average of the HD on this machine?

    Asking this forum, as you have, may elicit someone's experience.

    I haven't used AIX for several years until recently.

    In the 1990s, I expected a good-quality hard disk (including the AIX
    boxen I worked with) to last 2 years under hard (23/6 file server) conditions, and 5 years under normal (home user or engineering workstation) conditions.
    I remember less than 10% failures at lesser lifetimes.

    About one to six months before failure, I would notice the correctible
    error rate reported by fsck (run each boot and once a week) would begin to
    increase from a normal one or two a week.
    The error rate would start doubling approximately every week to month.
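
    One way to watch for that pattern is to log the weekly error count and flag a run of doublings. The sketch below is illustrative only: the function name and the sample counts are made up, and collecting the counts (from fsck output or elsewhere) is left to you.

```python
def doubling_streak(weekly_counts, factor=2.0):
    """Number of most recent consecutive weeks in which the error count
    was at least `factor` times the previous week's count."""
    streak = 0
    previous_weeks = reversed(weekly_counts[:-1])
    current_weeks = reversed(weekly_counts[1:])
    for prev, cur in zip(previous_weeks, current_weeks):
        if prev > 0 and cur >= factor * prev:
            streak += 1
        else:
            break
    return streak


# Hypothetical weekly counts: a normal one or two, then a steady climb.
counts = [1, 2, 1, 2, 4, 9]
if doubling_streak(counts) >= 3:
    print("error count has doubled three weeks running: check the disk")
```

    A single doubling from one error to two is ordinary noise, which is why the example waits for several weeks in a row before raising a flag.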

    About half the time, the last day or month of service was presaged by a
    "whistling" from the hard drive.

    > ... So I can prepare a spare one, just in case the media fails.

    Expect the media to fail. Backup!
    Expect someone to accidentally delete your data. Backup!
    Expect any system not tested to fail. Test read your backups!

    <AFAIK>
    Hard disk/memory failure generally follows a Poisson distribution with time,
    truncated at the short-time end by built-in hardware to catch and correct
    a limited bad sector count. Last I checked, the industry standard was
    a half-life of 100 r/w cycles for each sector and 3% excess sectors for
    forward error correction.
    </AFAIK>

    > Here (a foreign country), it is difficult to convince the boss to purchase
    > one in advance because of budget. So I need such info.

    Let us know what info is convincing.
    #AIX-Forum


  • 9.  Re: Average lifetime of the hard disk - p-series

    Posted Wed March 21, 2007 11:34 AM

    Originally posted by: SystemAdmin


    First, thanks to all for your opinions. This is the first IBM machine that I have worked on, so I have to learn everything. As for the maintenance contract, it is a hard concept to sell in Mexico, where I work. First, there is not much budget, because the majority of companies and organizations here tend to spend a lot of money on the initial purchase and leave no budget planned for the future. Second, I am in a place called Leon, Mexico. Although it is a medium-sized industrial town, I have no idea how many machines IBM has sold here, so I do not know how long it would take an IBM contractor to get here for contract maintenance work (I know it is a 6-hour drive from Mexico City, where the original IBM contract came from). As I recall, the new p550 has a small problem with its DVD drive, but I have not seen any "maintenance" since December. All my colleagues from the US have reached the same conclusion: the priorities of many establishments here are different from those in the States. I do have my backups, and the p550 has 4 disks as standard. But given my circumstances and the facts above: if my car broke down in a Mexican desert, I would rather have some spare parts ready, as long as I knew ahead of time that the problem was likely to be minor, so that I could fix it myself and keep the car running until I could get to a real dealer shop.

    I remember the IBM contractor told us that the p550 has a function which can warn of a system failure. Where can I find it?
    Can I use a cheap, non-IBM hard disk in the machine, just as a temporary replacement?
    #AIX-Forum


  • 10.  Re: Average lifetime of the hard disk - p-series

    Posted Mon March 26, 2007 01:47 PM

    Originally posted by: SystemAdmin


    Are your disks mirrored?

    If they are, you could have a disk failure and still keep running on the disk that survives in the mirror while you wait for the replacement to get there.

    Even if you do get a spare disk to have in case of failure, if it is years before you need it, I'm not sure I would have much confidence in a disk that has been sitting on your shelf for a few years. At least (presumably) if a disk sits on the shelf at IBM it will be in absolutely the best climate (temperature, humidity, etc.).

    Check errpt regularly (errpt for a short listing, errpt -a | more for a long listing) and watch for disk errors. In addition, a previous poster talked about regularly running fsck (other than at boot) and noting an increase in the number of errors.
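
    If you want to automate that check, a small script can count the hardware-class entries per disk in the short errpt listing. This is a rough sketch, not a tested AIX tool: it assumes the usual summary columns (IDENTIFIER, TIMESTAMP, T, C, RESOURCE_NAME, DESCRIPTION) with class H for hardware, and the function names are made up for illustration.

```python
import shutil
import subprocess
from collections import Counter


def count_disk_hardware_errors(errpt_summary: str) -> Counter:
    """Count hardware-class (C column == 'H') entries per hdisk resource.

    Assumes whitespace-separated columns in the order
    IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION.
    """
    counts = Counter()
    for line in errpt_summary.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5 or fields[0] == "IDENTIFIER":
            continue  # skip blank, short, and header lines
        error_class, resource = fields[3], fields[4]
        if error_class == "H" and resource.startswith("hdisk"):
            counts[resource] += 1
    return counts


def read_errpt_summary() -> str:
    """Return the output of plain `errpt`, or '' if errpt is not on PATH."""
    if shutil.which("errpt") is None:
        return ""
    result = subprocess.run(["errpt"], capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    for disk, n in sorted(count_disk_hardware_errors(read_errpt_summary()).items()):
        print(f"{disk}: {n} hardware error entries")
```

    Run from cron once a day and compared against the previous day's counts, something like this gives an early hint that a disk is starting to log errors; errpt -a still has the detail you would need for a service call.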

    I'm not sure I'd risk production on a disk that is not IBM. You may be able to get a reconditioned used disk from IBM for a good price.
    #AIX-Forum


  • 11.  Re: Average lifetime of the hard disk - p-series

    Posted Mon March 26, 2007 02:33 PM

    Originally posted by: SystemAdmin


    > First, thanks to all for your opinions.

    You're welcome.

    > This is the first IBM machine that I have worked on, so I have to learn
    > everything. As for the maintenance contract, it is a hard concept to sell
    > in Mexico, where I work.

    Your experience is not unique.

    > ... how long it would take ... 6-hour drive from Mexico City ...
    > ... new p550 has a small problem ... but I have not seen any "maintenance"
    > since December. ... if my car broke down in a Mexican desert, ...
    > I would rather have some spare parts ready ...

    <SUGGESTION>
    I would expect a 6 hour drive to cost much more than a disk drive.
    May I suggest that rather than merely inventorying spare parts for
    your computer, maybe preventing 1 problem, you might do better
    (e.g. prevent more problems, spend less cost) to invest in "spare parts"
    for your transportation and other infrastructure.

    Operating a used light airplane and "dirt strip" is probably less expensive
    than several 6 hour drives or a single stranding in the desert.

    Mirroring is one of the many redundancy strategies you could learn.
    An accredited training program and telephone support would probably
    cost less than the loss of business and service outages you could
    thereby avoid.

    Consider forming a joint venture with other companies in Leon to invest
    in such infrastructure.
    </SUGGESTION>

    Good luck,
    #AIX-Forum


  • 12.  Re: Average lifetime of the hard disk - p-series

    Posted Mon March 26, 2007 06:01 PM

    Originally posted by: SystemAdmin


    I know that hdisk0, where the OS sits, is mirrored, but the other drives with the applications on them are not. However, I do use OpenSSH to rsync the applications to another Unix box.
    #AIX-Forum