A closer look at IBM Db2 Warehouse and the benefits and capabilities that it offers.
IBM Db2 Warehouse is a containerised scale-out data warehousing solution that is available in multiple fit-for-purpose form factors, whether you are strategically delivering on public cloud, hybrid cloud, or need a high-performance appliance you can deploy with the same function and capability and then have the choice to easily move from one form factor to another without changing your code.
Db2 Warehouse supports Massively Parallel Processing (MPP) with a pure shared nothing database architecture as well as standard SMP deployments for smaller data marts or warehouses. This means it can scale from gigabytes- to petabytes-sized systems, on-premise or in the cloud.
It supports a range of different clients and plugins, including JDBC, Node.JS, Spring, Python, R, Go, Spark, ADO, Microsoft Visual Studio Plugin, and more.
To extend its analytical capabilities, Db2 Warehouse also comes with an integrated Apache Spark cluster, allowing you to submit Apache Spark jobs through stored procedures to run against Db2, extending your analytical reach in the data warehouse.
SMP: Single partition deployments
For SMP deployments, the container can be moved to other virtual machines with more memory and cores and it will automatically adapt. When using Db2 Warehouse on Cloud Flex One Service, this can be managed through the service itself, allowing you to add more cores, memory, or storage as you need.
Leveraging Db2’s optimiser, queries will adapt to the quantity of cores and memory available and then optimise query access plans accordingly.
When Db2 Adaptive Workload Management is deployed, service level targets can be implemented and are integrated with the optimiser to ensure concurrency and throughput for high service level queries to meet targets.
MPP: Scale out partitioned warehouse deployments
For MPP Shared Nothing deployments, scale out operates like magic—moving partitions across a larger number of virtual machines, no data redistribution or reloading required. Using either a Hash Partitioning approach or a Random Distribution, tables are partitioned across database partitions or can be constrained within partition groups to increase query parrallelism and enable elastic scale out of the data warehouse, with near linear performance and scale gains.
Db2 Data partitions are given their own Docker container to make them portable and easy to adapt to scale up or scale out requirements. When capacity requirements grow, the containers managing the data partitions are able to be moved to new virtual machines/servers, elastically scaling out the data warehouse.
For example: A Single Db2 Warehouse deployment with 4 Db2 data partitions on 2 X 8 core Virtual Machines or Physical Servers is deployed, allocating 4 cores per partition.
When there is a growth in concurrent query demands, query complexity, or data volume, the system needs to scale up with more resources. As such, Db2 Warehouse is able to be easily reconfigured, moving partitions 2 and 3 to their own individual virtual machines and servers and, thus, doubling the compute capacity. Partitions 1 and 3 now occupy the resources of the original virtual machines.
If deployed using the IBM Cloud and AWS Flex offerings, you can manage scale out through the managed service itself (you can also increase the memory, storage, and cores on the database partitions). This allows your warehouse to grow as it needs to (and shrink, as the case may be) without the cost of maintaining overhead resources for future growth requirements and without the complexity you incur on other systems achieving the same outcome, which often comes with significant DBA overhead to reoptimise the solution.
Db2 Warehouse deployments are available in the form of an Appliance—the IBM Integrated Analytics System—that is built for high-scale, performance, and highly available data warehousing, with integrated data science capability powered by Watson Data Studio. Installed and up and running in just a day, it is the option for customers who want appliance performance and simplicity in their own data center.
IBM Db2 Warehouse is also a component of the IBM Cloud Pak for Data hybrid cloud solutions. The pack provides an integrated, Kubernetes-based, drop-in AI and machine learning development platform that includes transformation capabilities with IBM Watson Studio and AI/ML. It can be deployed on its own or within another Kubernetes deployment, such as Red Hat OpenShift.
Managing Db2 Warehouse
As an ex-developer and database performance engineer, one of the cool things I like about Db2 Warehouse is the no-fuss deployment and scale-out and autonomic self-tuning capabilities. It allows you to cover scale-out or -up activities via a simple-to-use web console rather than worrying about the hassles of reconfiguration of the database.
This also avoids the need to reload or repartition data in the event you are scaling out, monitoring, managing system security, loading data, and other common database administration tasks that are all done through the simple to use web console.
This means that rather than focusing on configuration and setup, you can get on with design and implementation of warehouse schemas and just wind the dial up as needed.
To get a feel for it yourself and learn how to load and analyse data, the following tutorial can give you an overview of what it takes to get going with Db2 Warehouse: “Overview of Managing your Db2 Warehouse"
Securing your data warehouse
Db2 Warehouse leverages Db2’s Native Data Encryption—commonly known in the industry as Transparent Data Encryption (TDE). This ensures your database, logs, and backups are all encrypted. The database keys are encrypted themselves with the ability to rotate the master key as your security policy requires. Database connectivity uses SSL encryption to secure your data on the wire.
Db2 Auditing provides high-level or fine-grained auditing of activities on the data warehouse, allowing you to monitor activity at various levels of details, from connections opened through to queries executed, inserts, updates, and deletes.
At the data governance level, you also have the ability to mask data, apply fine-grained access control to different users or groups (as well as standard Db2 access controls), which makes it a highly secure platform, whether as a managed service in the cloud or on-premise.
Managing your workloads’ SLAs
Modern data warehousing systems are increasingly expected to be able to handle a wide range of different workload types. These workloads range from real-time ETL and data streaming, batch reporting, AI/ML real-time analytical operational services, ad-hoc queries, and other use cases.
Effective workload management is critical in order to be able to manage high concurrency while keeping the system stable and performant. It reduces the number of “versions of the truth” required to address a company’s analytical and reporting business requirements.
Traditionally, Workload Managers rely on configuring fixed limits and require constant manual tuning as new users or data are introduced. This means that their definition is less about defining a desired outcome and letting the database manage it, and more about fine-tuning to meet an outcome, requiring constantly engaged DBA activity.
Unlike traditional Workload Managers, Db2's Adaptive Workload Manager manages the admission of jobs based on their actual resource requirements and intelligently schedules them based on the available system capacity and their anticipated response time needs.
Out of the box, it tries to balance resource consumption across queries of the same or different workloads.
For users seeking more direct control over their workload execution, the Adaptive Workload Manager has a simple but powerful set of configuration options that allow you to easily divide database resources between workloads in your system as well as to optimize execution for different workload types.
Using the resource-sharing capability allows service classes to be assigned based on share of resources:
create service class HIPRI soft resource shares 25
workload type INTERACTIVE
create service class ETL soft resource shares 25
workload type BATCH
create service class GENERAL soft resource shares 50
workload type MIXED
The user is also able to apply a range of rules to govern the allowed behaviour of jobs in a workload in order to protect the system from "rogue" jobs. These capabilities are supported by a comprehensive set of monitoring capabilities that allow you to track individual workload behaviour through a range of performance and resource consumption metrics.
Create a Workload Definition that has a higher priority than another Workload Definition
CREATE WORKLOAD NEWCAMPAIGN
SESSION_USER GROUP (
) SERVICE CLASS MARKETINGSC
POSITION BEFORE CAMPAIGN
Create a Rule to stop queries running for excess time
CREATE THRESHOLD MAX_QUERY_RUNTIME
FOR DATABASE ACTIVITIES ENFORCEMENT DATABASE
WHEN ACTIVITYTOTALRUNTIME > 1 HOURS STOP EXECUTION
What can I do with Db2 Warehouse?
Db2 Warehouse comes with Polyglot database persistence, allowing you to store and manage NoSQL and Relational data types in an optimal format for your workload.
Out of the box, Db2 Warehouse comes ready to go with Db2 BLU Columnar in-memory technology as the default table type, but this can be extended with NoSQL JSON/BSON support. It also allows you to define Db2 row-based tables, which include JSON- and XML-native data type and query support.
This means that whether you have a Star or Snowflake Schema Data Mart, Normalised Enterprise Data Warehouse, Operational Data Store (ODS), or a mix of all of the above, you can optimize your data warehouse to meet your business objectives and monitor and manage its service levels at the same time.
When deploying a clustered scale-out solution, queries are divided and conquered using a shared nothing architecture. Tables are automatically partitioned across the database partitions—using Hash Partitioning by default—but with the option of using random partitioning keys as an alternative.
This makes it a perfect platform for simple data warehouse workloads, with out-of-the-box in-memory database performance, scaling through to complex operational data warehousing, and myriad different data types, schemas, and workloads.
Polyglot persistence requirements are becoming more frequent in a modern data warehouse. For example, in modern retail businesses, many transactions are recorded in JSON documents and pushed back to the central business systems from multiple channels. The baseline transaction details are stored in traditional row data format and the fine-grained line item details of each transaction are stored as an associated JSON document.
Typically, to unpack and analyse the detail data (to perform trend analysis or market basket analysis, for example), ETL jobs would have to be crafted and multiple schemas developed for each analysis scenario.
However, with Db2, only the single source table is required. By using Db2’s ISO standard JSON SQL capabilities and leveraging Key Value Indexing, the analysis of combined relational and JSON data can be performed in a single query, minimising time to analyze and optimising business analysis flexibility.
Other key Db2 Warehouse capabilities
- Db2 data compression: Db2 deploys optimised compression for both columnar and row data, compressing in memory and in storage and materializing the data as late in the query plan as possible to optimise memory and CPU usage. The Db2 BLU Columnar tables uses Huffman encoding, which even allows values to be evaluated in predicates without the need for decompression, further optimizing memory usage.
- External tables: Db2 has the capability to map an external file as a table (for the purposes of reading and faster loading of data) directly from an SQL statement, avoiding the overhead of a traditional database import.
- Analytic functions: Db2 Warehouse supports a range of analytical stored procedures and functions, including ANOVA, Association, Decision Trees, k-Means, Linear Regression, and many more.
- Geospatial functions: Db2 supports spatial query functionality with support for ESRI Shapes and Geographic Markup Language (GML) support. This supports applications like ESRI ArcGIS.
- Oracle compatibility mode: Db2 provides the ability to run Oracle-compatible queries and scripts over Db2, making it an easier environment for Oracle DBAs and developers to adapt.
- Federation to Db2 servers: Db2 has the ability to perform native federation capabilities to other Db2 servers.
- Integrated Apache Spark: Db2 Warehouse includes an integrated Apache Spark cluster for running data science projects and analytics, making it easy to run an Apache Spark job with Db2 Warehouse. Jobs can simply run via a database-stored procedure or directly against the Spark Cluster itself.
- Range of open source client libraries and integration: While Db2 comes with a full installable client and JDBC client capability, there is also a range of different client libraries available from Python to Ruby on Rails and Node.js. Commonly used open source drivers are available on GitHub IBMdb.
What next? Try it out!
There are a series of technical education hands-on labs and product tours that enable you to get a feel for IBM Db2 Warehouse, whether you are deploying on the ground or in the cloud:
You can also visit the respective www.ibm.com/demos pages for the form factor you are interested in:
An IBM Db2 Warehouse Developer Edition of IBM Db2 Warehouse is available from Docker hub, and you can also try a full scale-out IBM Db2 Warehouse Enterprise edition trial.