DevOps Automation

  • 1.  Dial Tone Reliability?

    Posted Wed November 29, 2017 11:26 AM

    I’ve been reading a lot into DevOps in general, and keep running across the phrase Dial Tone Reliability. Could someone please elaborate more on what this means?



  • 2.  RE: Dial Tone Reliability?

    Posted Wed November 29, 2017 01:08 PM

    This is a reference to landline telephones.

    Back in the "Ma Bell" days, you could pick up the phone and get a dial tone 99.999% of the time (or even better).  This has been the gold standard for reliability: at 99.999%, outages add up to only about 5 minutes over a whole year.
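
For anyone who wants to check the "5 minutes a year" figure, here is a minimal sketch of the arithmetic. The function name `downtime_minutes_per_year` and the use of an average 365.25-day year are assumptions made for illustration; they are not from any IBM tool or from the phone company's own definition.

```python
# Illustrative only: converts an availability percentage into the
# downtime it allows per year. Assumes an average year of 365.25 days.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the minutes of outage per year allowed at this availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Two nines through five nines
    for pct in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(pct)
        print(f"{pct}% available -> {minutes:.2f} minutes of downtime per year")
```

At "five nines" this works out to roughly 5.26 minutes per year, which is where the "about 5 minutes" rule of thumb comes from.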



  • 3.  RE: Dial Tone Reliability?

    Posted Mon December 11, 2017 12:48 PM

    I know this is pretty random, Robert, but is this phrase specific to DevOps? I just haven't heard it used this often in day-to-day life to refer to reliability. I also didn't realize that landlines had that level of consistency!



  • 4.  RE: Dial Tone Reliability?

    Posted Mon December 11, 2017 01:47 PM

    Andrew,

    Reliability has been a concern in systems design for decades (think Apollo 13 and "Failure is not an option").  In systems design, the key numbers to look at have been MTBF (Mean Time Between Failures) and MTTR (Mean Time To Repair/Replace).
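
To show how MTBF and MTTR relate to an availability figure like "five nines", here is a rough sketch using the common steady-state approximation, availability = MTBF / (MTBF + MTTR). The function name and the sample numbers (1,000 hours between failures, 1 hour to repair) are made up for illustration and are not measurements from any real system.

```python
# Illustrative only: steady-state availability from MTBF and MTTR.
# Both values must be in the same unit (hours here).


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up, given MTBF and MTTR."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical system: fails every 1,000 hours, takes 1 hour to repair
    base = availability(1000, 1)
    # Same failure rate, but repairs take 6 minutes instead of an hour
    faster_repair = availability(1000, 0.1)
    print(f"1 h repair:   {base:.5%}")
    print(f"0.1 h repair: {faster_repair:.5%}")
```

The point of the comparison is that you can raise availability either by failing less often (higher MTBF) or by recovering faster (lower MTTR).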

    These days, reliability may start to be a concern in DevOps approaches once you have a means of deploying to production efficiently (e.g. your CI/CD pipeline is stabilized).  You can see how resilient your production system is and design for improving your MTBF/MTTR.  Netflix is a famous example of doing this with their "Chaos Monkey" suite of tools.

    An EXCELLENT resource for looking at this is a series of blog articles from Sanjeev Sharma regarding DevOps and reliability.  Links to the articles are below.

    https://sdarchitect.blog/2017/06/26/cloud-service-reliability-part-i-apollo-13-to-google-sre/

    https://sdarchitect.blog/2017/07/19/cloud-service-reliability-part-2-houston-we-have-an-outage/

    https://sdarchitect.blog/2017/08/18/cloud-service-reliability-part-3-antifragile-when-devops-met-sre/



  • 5.  RE: Dial Tone Reliability?

    Posted Mon January 08, 2018 09:40 PM

    haha,  also known as "The Daily 5/9".  

    In Reply to Robert Wen:

    This is a reference to landline telephones.

    Back in the "Ma Bell" days, you could pick up the phone and get a dial tone 99.999% of the time (or even better).  This has been the gold standard for reliability: at 99.999%, outages add up to only about 5 minutes over a whole year.