Data Durability and Back-up at Scale: A Tale of "the Tape"

By Shawn Brume posted Tue July 14, 2020 12:28 PM

It is no secret that the amount of data being produced and stored in the world continues to explode. It is so far from a secret that we see the headline figure in nearly every storage brief.

But what does this number really mean? It is significant only to the degree that it affects the IT budget for data storage. Most of that 124 ZB of data is produced and almost immediately discarded. Once we remove the data that is produced and discarded (90% of IoT data, for example) and the 98% of cell phone pictures that will never be stored anywhere but the phone, the amount of data affecting corporate entities at all levels is greatly reduced. The importance of the data that remains, however, is extremely high, not that anyone needs to be told that!

One area that is a continuous focus for expense reduction is data retention. Regardless of what it is called (archives, duplicate copies, back-ups, legal holds, regulatory retention), it all comes down to the growing amount of data that organizations are keeping for longer periods of time. Depending on the environment, this data can be from 30% to 75% of all data in digital storage. In the U.S., mortgage loan documents must be retained for 7 years past the last date of financial liability, which on a 30-year mortgage could mean up to 37 years! Contrary to popular perception, cloud vendors (hyperscalers) are not immune to the need for long-term retention: Amazon has made a $1.63 billion per year business from archive storage.

If we assume that the hyperscale companies have the benefit of economies of scale for data storage, we can also reasonably assume that the actions they take today will be monetized at a very efficient rate. As with many hyperscale designs and implementations, global enterprises will eventually adopt the data storage methods that are creating value for the hyperscalers. Regardless of the use case, large-scale storage requires a multi-level approach to derive the most value from retained data.
[Image: CERN TS4500, from a video of the CERN Big Data installation]

Cloud service providers, research facilities, and medical and financial organizations are all charged with retaining data. The most common retention method is tape. Big Data/AI and IoT are starting to monetize data through longer retention, and they are using tape. Tape has been continuously innovating and integrating into new data models for nearly 70 years. All too often tape is dismissed as not viable because modern storage administrators do not understand how tape storage is offered today. I totally understand how this happens: we often cling to what we know, forgetting that it is the failure to understand the past that impacts our future.

If a solution does not create revenue, reduce expense, or keep an organization out of regulatory trouble, why does it exist?

Tape value to IT managers in 3 Bullets
  1. Economics – Tape has always been a primary means of reducing the expense of long-term storage. In the financial and medical records retention business, tape can easily reduce the cost of storage by 85% when compared to disk solutions at scale, and by even more when compared to cloud storage, over a 12-year period. Check for yourself using the IBM TCO tool. For 2 petabytes of data with a 15% compound annual growth rate and only 10% of the data retrieved in any given month, the storage cost comes to $3.4 million for on-premises disk and $4.7 million for the lowest-cost cloud storage. Tape, at $883 thousand, is roughly 1/4 the cost of disk and less than 1/5 the cost of cloud.
  2. Regulatory and Cyber Protection – Tape is the most effective air-gap solution for long-term retention. Many organizations fully understand the need to protect data in a solution that is not accessible as immediate online storage, while still providing self-service as part of a 3-2-1 data durability strategy. Although other solutions such as optical and the more obscure film-based storage exist, the economic benefit and performance of these systems are significantly lower than with tape. Tape is also capable of storing data for more than 30 years without migration. At the same time, most organizations find it more cost-effective to migrate data to denser storage than to keep the same systems in place for longer than 10 years. The 100-year archive is not truly viable in modern operations.
  3. Monetization – Tape stepped back into the limelight of revenue generation with the innovation of LTFS. File system enablement has brought tape usability into the mainstream for smaller operations such as Media and Entertainment, and has also enabled object storage workloads to off-load data to the lowest-cost storage. That becomes a monetized value as data mining matures. Analytics are only as good as the data, and the real value of any data comes from having more of it to mine. The problem has traditionally been that storing more data until the analytic engines can mine it costs more than the value derived. Tape changes that inflection point by a factor of at least 5x. More data = more revenue, or in the case of fraud prevention: more data = lower fraud = more profit.
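
The cost figures quoted in the Economics bullet can be sanity-checked with a few lines of arithmetic. The sketch below is only an illustration: it takes the three dollar totals from the text as given, and it does not reproduce the IBM TCO tool's model. The function names, and the assumption that the 15% growth compounds once per year across all 12 years, are choices made for this example.

```python
"""Back-of-the-envelope check of the cost comparison quoted in the text."""

# 12-year storage cost totals quoted in the text, in US dollars.
COSTS_USD = {
    "disk_on_premises": 3_400_000,
    "cloud_lowest_cost": 4_700_000,
    "tape": 883_000,
}

INITIAL_PB = 2.0  # starting capacity quoted in the text
CAGR = 0.15       # compound annual growth rate quoted in the text
YEARS = 12        # comparison period quoted in the text


def capacity_after(initial_pb: float, cagr: float, years: int) -> float:
    """Capacity after compound growth.

    Assumption for this sketch: growth is applied once per year, every year.
    """
    return initial_pb * (1.0 + cagr) ** years


def cost_ratio(cost: float, baseline: float) -> float:
    """Cost expressed as a fraction of a baseline cost."""
    return cost / baseline


def savings_pct(cost: float, baseline: float) -> float:
    """Percentage saved by paying `cost` instead of `baseline`."""
    return 100.0 * (1.0 - cost / baseline)


if __name__ == "__main__":
    final_pb = capacity_after(INITIAL_PB, CAGR, YEARS)
    print(f"Capacity after {YEARS} years: {final_pb:.1f} PB")

    tape = COSTS_USD["tape"]
    for name in ("disk_on_premises", "cloud_lowest_cost"):
        baseline = COSTS_USD[name]
        print(
            f"tape vs {name}: "
            f"{cost_ratio(tape, baseline):.2f}x the cost, "
            f"{savings_pct(tape, baseline):.0f}% saved"
        )
```

With the quoted totals, tape comes out at about 26% of the disk figure and about 19% of the cloud figure, and the 2 PB starting point grows to a little under 11 PB under the compounding assumption above.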
IT professionals always hear from the tape sellers how tape reduces TCO, and we know it can. Yet we often stop ourselves from installing tape because we do not understand it, or because our understanding is 20 years old. Tape is built for modern workloads, integrating with file systems, object storage, and back-up and recovery applications in every industry in the world. Hyperscale data centers are embracing it, and even the Facebook-led Open Compute Project has a working group evaluating tape. Are you?

“Great things never come from your comfort zone”