watsonx.data

watsonx.data

Put your data to work, wherever it resides, with the hybrid, open data lakehouse for AI and analytics

 View Only

How to optimize your data storage workload by offloading data within watsonx.data

By Ahmad Muzaffar Bin Baharudin posted Mon November 20, 2023 01:01 AM

  
Overview
 
When it comes to data storage, the expenses can become prohibitive, particularly when housing all data in a cloud or on-premises data warehouse. As data volumes inevitably increase over time, so does the associated cost. Many enterprises often fail to discern the nature of their data, opting to store all data within a data warehouse. It is crucial for enterprises, especially data engineers and architects, to delve into their data assets, conducting assessments to optimize data storage costs.
 
The initial step involves identifying and categorizing data assets into hot, warm, and cold data. By making these distinctions, enterprises can significantly reduce unnecessary expenses related to data warehousing. Through strategic optimization of data storage, substantial cost savings can be achieved.
 
What is Hot Data, Warm Data and Cold Data?
 
Hot data, warm data, and cold data are terms often used in the context of data storage and management to describe the different levels of access frequency or importance of data. These terms are used to categorize and manage data based on its usage patterns, which can influence how it is stored, processed, and retained.
 
Hot Data
  • Definition: Hot data refers to the most frequently accessed and used data in a system.
  • Characteristics: This type of data is actively in use and requires high-performance storage and quick access times.
  • Example: Current transactional data, frequently accessed files, or real-time analytics data.
 
Warm Data
  • Definition: Warm data represents data that is accessed less frequently than hot data but is still important and may be needed periodically.
  • Characteristics: This data is not accessed as frequently as hot data, so it may be stored on slower and less expensive storage media compared to hot data.
  • Example: Historical records, older transactions, or data used for periodic reporting.
 
Cold Data
  • Definition: Cold data is the least frequently accessed data, and it may be archival or rarely used for reference.
  • Characteristics: This type of data is typically stored on cost-effective and slower storage media, as quick access is not a priority.
  • Example: Older archives, backups, regulatory compliance data that needs to be retained but is rarely accessed.
 
These categorizations help you to optimize your data storage infrastructure and costs by aligning the storage characteristics with the access patterns and importance of the data. For example, hot data might be stored on high-performance data warehouse, warm data on slower but still reliable data warehouse or cloud object storage and cold data in long-term archival storage solutions.
 
In watsonx.data, you can easily offload warm data or cold data
 
Consider the scenario where we have data stored in our Db2 database, which is a relational database designed for data warehousing and structured data storage. To optimize storage costs for structured data, our approach involves offloading this data to object storage. Object storage, such as S3 bucket, proves to be a more cost-effective solution compared to traditional data warehousing, even though it may exhibit slower performance during data retrieval.
 
The initial step involves identifying warm or cold data within the data warehouse—data that is accessed infrequently or rarely. This allows us to selectively move such data to object storage for more economical storage. In this tutorial, we will demonstrate the process of offloading data from the Db2 database to a MinIO bucket. Offloading, in this context, entails copying the data from Db2 to our MinIO bucket, after which the data in Db2 can be safely deleted to reduce storage costs.
 
In Infrastructure Manager, we can see all of our data sources such as buckets or databases. In my data lakehouse, I have 3 data sources, which are two MinIO buckets and one Db2. I am going to offload data from my Db2 to MinIO bucket (Iceberg bucket).

 
In Data Manager, we can see our schema, table and sample data in Db2.

 
We are going to be offload this data:
 
Catalog: db2catalog
Schema: gosalesdw
Table: dist_inventory_fact
 
We want to offload our data to our Iceberg bucket which will reside in MinIO bucket below.
 
Catalog: iceberg_data
Schema: gosalesdw_offload
Table: dist_inventory_fact
 
So, we need to create our new schema named 'gosalesdw_offload'.

 
After setting up the schema, data can be offloaded in 'SQL Workspace' using a standard SQL query. You simply need to execute these two lines of the query. 

CREATE TABLE "iceberg_data"."gosalesdw_offload"."dist_inventory_fact" AS
SELECT * FROM "db2catalog"."gosalesdw"."dist_inventory_fact";

 
Upon the execution of the query, the results can be observed in our 'icerberg_data' catalog. Of course, you create a more complex query based on your requirements.

 
There you go!
Within watsonx.data, you can easily offload your warm or cold data from high-cost storage to more economical object storage. There's no need for you to do complex processing steps across various data warehouse or object storage environments.

Muz
Ecosystem Technical Enablement Specialist | Data & AI
IBM APAC Ecosystem Technical Enablement Team


#watsonx.data

0 comments
52 views

Permalink