Db2

 View Only

Ingesting data into an IBM Smart Analytics System

By Ashley Bassman posted Sat October 01, 2022 02:37 PM

  
Developed by:
Austin Clifford 
IBM DB2 DataWarehouse QA Specialist
Garrett Fitzsimons 
IBM DataWarehouse BestPractices Specialist
Richard Lubell 
IBM Smart Analytics Systems Information Developer


Executive summary 

This paper is targeted at people involved in the design and development of data ingest applications based on DB2® Database for Linux, UNIX, and Windows database software. In particular, this paper focuses on ingesting data into the IBM® Smart Analytics System environment with configurations based on System x® and Power Systems™ servers. 

The key design goals of ingest application design are to balance the required rate of ingest with the availability of processing resources, to ingest data without affecting data availability and to maintain flexibility in the volumes of data ingested. Identifying service level objectives for the speed and volume of data ingested into the data warehouse will determine the architecture, design, and development of the data ingest application. 

The design of a data warehouse database is focused on query performance. Employing staging tables to load and transform data ahead of ingest minimizes interference with query workloads. Further benefits are gained through managing database locking and logging and system resource usage with the software tools available with DB2 and IBM Optim™ software available with the IBM Smart Analytics System. 

You can minimize the resources needed to perform the movement of data from the staging tables to the production tables by using table partitioning to subdivide data during transformation, creating parallel ingests and reducing overall workload. 

The data ingest application design should accommodate an active warehousing environment where all workloads are intended to run online and concurrently. This allows data to be processed as soon as it is made available and anticipates the growth of the data warehouse. 

A data ingest application should contain individual components that can process data as promptly and completely as possible after it arrives. This helps reduce the time needed for the ingest application development process, isolate errors through targeted testing, and reduce the volumes of data that require reprocessing when errors occur. 

The IBM Smart Analytics System product incorporates best practices guidelines in building a stable and scalable data warehouse environment. Using the recommendations in this paper will help you develop a data ingest application that will be scalable, balanced, and flexible enough to meet future as well existing needs. 

Introduction 

This paper describes best practices for designing and implementing a data ingest application for an IBM Smart Analytics System data warehouse. The paper covers how to ingest large volumes of extracted data into the data warehouse in preparation for analytical queries. This paper assumes that you have a working knowledge of DB2 software including partitioned databases. DB2 Database Version 9.7 fix pack 3 was used in the test system for this paper. 

Database administration should not be part of the data ingest workload, though data ingest application designers and developers should work with the database administrator to help ensure that all workloads are balanced. The data presented in this paper is based on testing and experience with customers needs. 

A well‐designed data ingest application helps to achieve business requirements for the availability of data. Defining your service level objectives dictates the characteristics of your data ingest applications. Design the solution to work concurrently with all other workloads while also taking advantage of features of the partitioned database. Parallelism, co‐location and control of the ingest rate can maximize the efficiency of loading data into staging tables and subsequently inserting data into production tables. Statistics and monitoring are used to define a baseline and enable tuning for optimum performance. 

This paper builds on and complements a series of best practices papers that examine all aspects of data warehouse design and implementation for a DB2 data warehouse. Refer to these papers, referenced in the “Further reading” section, for further information 

Planning for data ingest 

When planning to build your data ingest application, consider the overall environment where the application operates, the data warehouse design and competing workloads. Business requirements for the availability of data, together with constraints imposed by resources, determine your data ingest application design. These requirements are expressed as service level objectives (SLOs). Use these objectives to develop a data ingest application that meets achievable targets and operates in concert with all other workloads. 

The following factors can affect your data ingest design. Integrate these recommendations into your planning phase: 

For most operations, use the tools available within the core IBM Smart Analytics System modules. For complex or high volume Extract, Transform, and Load (ETL) operations, use IBM InfoSphere® DataStage® software as the ETL tool for building a data ingest application. 

Avoid locking data that is required by the query workload or saturating the DB2 network with data ingest traffic during peak query times. For allocation of system resources, the query workload has the highest priority among data workloads. 

Identify and plan for peak data volumes rather than the average data volume expected and allow for system outages. 

Incorporate the population of all objects that are affected by the new data into the ingest process. 

Know when data is scheduled for ingest, the volume of data to be presented and the expectation for data availability. 


#Db2
0 comments
10 views

Permalink