watsonx.data

Put your data to work, wherever it resides, with the hybrid, open data lakehouse for AI and analytics

View Only

Back to Blog List

How to optimize your data storage workload by offloading data within watsonx.data

By Ahmad Muzaffar Bin Baharudin posted Mon November 20, 2023 01:01 AM

Overview

When it comes to data storage, the expenses can become prohibitive, particularly when housing all data in a cloud or on-premises data warehouse. As data volumes inevitably increase over time, so does the associated cost. Many enterprises often fail to discern the nature of their data, opting to store all data within a data warehouse. It is crucial for enterprises, especially data engineers and architects, to delve into their data assets, conducting assessments to optimize data storage costs.

The initial step involves identifying and categorizing data assets into hot, warm, and cold data. By making these distinctions, enterprises can significantly reduce unnecessary expenses related to data warehousing. Through strategic optimization of data storage, substantial cost savings can be achieved.

What is Hot Data, Warm Data and Cold Data?

Hot data, warm data, and cold data are terms often used in the context of data storage and management to describe the different levels of access frequency or importance of data. These terms are used to categorize and manage data based on its usage patterns, which can influence how it is stored, processed, and retained.

Hot Data

Definition: Hot data refers to the most frequently accessed and used data in a system.
Characteristics: This type of data is actively in use and requires high-performance storage and quick access times.
Example: Current transactional data, frequently accessed files, or real-time analytics data.

Warm Data

Definition: Warm data represents data that is accessed less frequently than hot data but is still important and may be needed periodically.
Characteristics: This data is not accessed as frequently as hot data, so it may be stored on slower and less expensive storage media compared to hot data.
Example: Historical records, older transactions, or data used for periodic reporting.

Cold Data

Definition: Cold data is the least frequently accessed data, and it may be archival or rarely used for reference.
Characteristics: This type of data is typically stored on cost-effective and slower storage media, as quick access is not a priority.
Example: Older archives, backups, regulatory compliance data that needs to be retained but is rarely accessed.

These categorizations help you to optimize your data storage infrastructure and costs by aligning the storage characteristics with the access patterns and importance of the data. For example, hot data might be stored on high-performance data warehouse, warm data on slower but still reliable data warehouse or cloud object storage and cold data in long-term archival storage solutions.

In watsonx.data, you can easily offload warm data or cold data

Consider the scenario where we have data stored in our Db2 database, which is a relational database designed for data warehousing and structured data storage. To optimize storage costs for structured data, our approach involves offloading this data to object storage. Object storage, such as S3 bucket, proves to be a more cost-effective solution compared to traditional data warehousing, even though it may exhibit slower performance during data retrieval.

The initial step involves identifying warm or cold data within the data warehouse—data that is accessed infrequently or rarely. This allows us to selectively move such data to object storage for more economical storage. In this tutorial, we will demonstrate the process of offloading data from the Db2 database to a MinIO bucket. Offloading, in this context, entails copying the data from Db2 to our MinIO bucket, after which the data in Db2 can be safely deleted to reduce storage costs.

In Infrastructure Manager, we can see all of our data sources such as buckets or databases. In my data lakehouse, I have 3 data sources, which are two MinIO buckets and one Db2. I am going to offload data from my Db2 to MinIO bucket (Iceberg bucket).

In Data Manager, we can see our schema, table and sample data in Db2.

We are going to be offload this data:

Catalog: db2catalog

Schema: gosalesdw

Table: dist_inventory_fact

We want to offload our data to our Iceberg bucket which will reside in MinIO bucket below.

Catalog: iceberg_data

Schema: gosalesdw_offload

Table: dist_inventory_fact

So, we need to create our new schema named 'gosalesdw_offload'.

After setting up the schema, data can be offloaded in 'SQL Workspace' using a standard SQL query. You simply need to execute these two lines of the query.

CREATE TABLE "iceberg_data"."gosalesdw_offload"."dist_inventory_fact" AS
SELECT * FROM "db2catalog"."gosalesdw"."dist_inventory_fact";

Upon the execution of the query, the results can be observed in our 'icerberg_data' catalog. Of course, you create a more complex query based on your requirements.

There you go!

Within watsonx.data, you can easily offload your warm or cold data from high-cost storage to more economical object storage. There's no need for you to do complex processing steps across various data warehouse or object storage environments.

Muz

Ecosystem Technical Enablement Specialist | Data & AI

IBM APAC Ecosystem Technical Enablement Team

#watsonx.data

0 comments

52 views

Permalink

https://community.ibm.com/community/user/blogs/ahmad-muzaffar-bin-baharudin/2023/11/18/how-to-offload-data-from

watsonx.data

watsonx.data

How to optimize your data storage workload by offloading data within watsonx.data

By Ahmad Muzaffar Bin Baharudin posted Mon November 20, 2023 01:01 AM

Permalink

Additional
Resources

Office

Quick Links

watsonx.data

watsonx.data

How to optimize your data storage workload by offloading data within watsonx.data

By Ahmad Muzaffar Bin Baharudin posted Mon November 20, 2023 01:01 AM

Permalink

Additional Resources

Office

Quick Links

Additional
Resources