Db2

 View Only

DB2 V10.1 Multi-temperature Data Management Recommendations

By Ashley Bassman posted Thu September 29, 2022 10:41 PM

  

This paper presents a strategy for managing a multi-temperature data warehouse by storing data on different types of storage devices based on the temperature of the data 


Developed by:

Jim Seeger, Senior Software Engineer, DB2 for Linux, UNIX, and Windows development

Naresh Chainani, DB2 for Linux, UNIX, and Windows Software Developer

Aruna De Silva, Quality Assurance & Solutions Specialist, DB2 Distributed and Data Warehousing

Karen Mcculloch, Manager, DB2 WLM Development

Kiran Chinta, DB2 for Linux, UNIX, and Windows Software Developer

Vincent Kulandai Samy, DB2 for Linux, UNIX, and Windows Software Developer

Tom Hart, DB2 for Linux, UNIX, and Windows Software Developer


Executive summary

Data in a data warehouse can be classified according to its temperature. The temperature of data is based on how old it is, how often it is accessed, how volatile it is, and how important the performance of the queries that access the data is. Hot data is frequently accessed and updated, and users expect optimal performance when they access this data. Cold data is rarely accessed and updated, and the performance of the queries that access this data is not essential. Using faster, more expensive storage devices for hot data and slower, less expensive storage devices for cold data optimizes the performance of the queries that matter most while helping to reduce overall cost.

This paper presents a strategy for managing a multi-temperature data warehouse by storing data on different types of storage devices based on the temperature of the data. It provides guidelines and recommendations for each of the following tasks:

  • Identifying and characterizing data into temperature tiers

  • Designing the database to accommodate multiple data temperatures

  • Moving data from one temperature tier to another

  • Using DB2® workload manager to allocate more resources to requests for hot data

    than to requests for cold data

  • Planning a backup and recovery strategy when a data warehouse includes multiple

    data temperature tiers

    The content of this paper applies to data warehouses based on version 10.1 or later of DB2 Database for Linux, UNIX, and Windows.

Introduction

The quantity of data stored in data warehouse environments is growing at an unprecedented rate. There are several reasons for this growth. For example:

  • Database users are retaining enormous amounts of detailed data such as transaction history, web search queries, and detailed phone records.

  • As data mining algorithms continue to improve, and as increasing processing power becomes available, organizations are analyzing much older historical data to predict future trends more accurately.

  • Stricter regulations and audit standards now require businesses to keep data for longer periods of time than previously.

  • Many businesses are eliminating the cost of keeping paper-based records by switching to web-based records.

    However, not all of the data in a data warehouse is equally valuable to an organization. In general, the most recent data in a warehouse is much more likely than older data to be accessed by queries and maintenance processes or to be updated. Such data is therefore called hot. As time goes by, data tends to cool off, becoming warm and later cold , meaning that the probability that users access or update this data significantly decreases. The data must still be available, however, for regulatory requests, audits, and long-term research. Another important characteristic of requests for colder data is that users do not typically insist on optimal performance for these requests. Because strong performance for these queries is not essential, you can place colder data on slower, less expensive types of storage devices.

    A warehouse can contain several different temperature tiers (hot, warm, cold, dormant). In general, the number of temperature tiers is tied to the number of different types of storage devices that are attached to the warehouse. For example, you might store hot data on new, solid-state drives (SSD); warm data on new, fast magnetic storage devices; and cold and dormant data on older, less efficient magnetic storage devices.

    The definition of each data temperature depends on the specific environment, but data temperatures usually fall into fairly common categories. The following chart provides some guidelines for classifying data by temperature:

    Data temperature

    Data temperature characteristics

    Typical data age

    Hot

    Tactical and OLTP type data – current data that is accessed frequently by queries that must have short response times. For example, high volume, small result set point queries in operational data stores (ODS).

    0 - 3 months and aggregates or summaries of this data

    Warm

    Traditional decision support type data – data that is accessed less frequently and by queries that most likely do not require short response times.

    3 - 13 months and aggregates or summaries of this data

    Cold

    Deep historical and legacy data – data that is typically accessed infrequently.

    13 months - 5 years

    Dormant

    Regulatory type or archival data – data that is accessed infrequently and that is never updated.

    Over 5 years


    Download the report to learn more!
    0 comments
    27 views

    Permalink