Open Source Offerings

Latest version of Big SQL now available on Cloud Pak for Data v3.5 and Cloudera Data Platform v7.1

By Priya Tiruthani posted Tue December 15, 2020 08:26 PM

  



IBM Db2 Big SQL, an advanced SQL engine for Big Data (on Hadoop or object stores), has been keeping pace with the fast-evolving open source ecosystem by supercharging analytical workloads on data lakes. The core capabilities of Big SQL focus on SQL compatibility, scalability, performance, and, of course, enterprise security and governance, making it a strong query engine for deriving insights from Big Data. With Db2 at its core for the query runtime, compiler, and cost-based optimizer, Db2 Big SQL is optimized for open source file formats such as ORC, Parquet, and Avro stored in HDFS or object stores.

The big data landscape has been evolving toward new platforms and environments, and Big SQL has been reinventing itself to bring more value to each of them. As a result, Big SQL is now available in two form factors:

Db2 Big SQL on Cloudera Data Platform: With Hortonworks Data Platform (HDP) and Cloudera's Distribution including Apache Hadoop (CDH) becoming legacy platforms, Cloudera Data Platform (CDP) is the latest platform, bringing the strengths of HDP and CDH together. Big SQL is now integrated with CDP to bring enhanced SQL query capabilities to the data in HDFS. What makes Big SQL complement Hive or Impala on the platform is its enterprise-grade ability to execute complex queries efficiently without out-of-memory (OOM) errors and to handle hundreds of concurrent users, because its optimizer produces efficient query plans.

Db2 Big SQL on Cloud Pak for Data: For environments that require cloud-native capabilities, such as the flexibility to scale up/down or in/out based on workload needs, Db2 Big SQL is available as part of IBM's flagship product, IBM Cloud Pak for Data. The platform brings together all the tools required to streamline an enterprise data pipeline and address most, if not all, analytics and AI use cases with centralized governance, security, and user management.

In 4Q 2020, we shipped a new release for each form factor. Check out the announcements below:

Simplified provisioning and monitoring of Db2 Big SQL in Cloud Pak for Data v3.5 with brand new UI

Db2 Big SQL is now generally available on the Cloud Pak for Data platform, extending its use cases to data lakes. Db2 Big SQL on Cloud Pak for Data is an elastically scalable, cloud-native SQL engine for data engineers and data scientists to analyze the structured and unstructured data already stored on HDFS or object stores. On the platform, Db2 Big SQL is a nimble, elastic service that handles ephemeral workloads by scaling compute up or down based on workload needs.

As a microservice on the platform, Db2 Big SQL is optimized for analytics on open source file formats stored in Big Data stores. Some highlights are:

  • Drives additional workloads from Cloud Pak for Data (CP4D) based on Hadoop use cases. Explore further analytics on existing Big Data stores outside Cloud Pak for Data for structured, unstructured, and semi-structured data
  • With Db2 Big SQL, Cloud Pak for Data users now have access to different form factors of the Common SQL Engine on a single platform and can enable different use cases from that one platform
  • With its advanced SQL capabilities, Db2 Big SQL enables BI tools to directly connect for interactive SQL analytics with high performance and high concurrency without data duplication or vendor lock-in
  • Port applications built on business intelligence tools, including Cognos, Tableau, and others, using ODBC or JDBC connections. Migrate existing applications to the platform without major rewrites to access remote Big Data stores
  • Extend Data Virtualization to access object stores using Big SQL and combine that data with other sources for deep analytics
  • Easily scale Big SQL workers up or down based on workload needs, and free up compute resources when no workloads are running
  • Easily create multiple instances to separate duties among business groups accessing data, so an important job running in one instance doesn't restrict compute resources for the others
  • Simplified deployment and provisioning experience with the ability to monitor and scale the multiple Big SQL instances
  • Compute-only workers provide elastic scalability and can successfully run all 99 TPC-DS queries at data volumes up to 100 TB with numerous concurrent users
  • Robust SQL-based, role-based access control (RBAC) features, such as dynamic row filtering and dynamic column masking, are natively available in Db2 Big SQL
  • Provides a stable environment for applications and avoids unnecessary query rewrites when the Hadoop platform changes or is migrated
  • Supports popular open source file formats such as Parquet, ORC, Avro, Text, and Sequence, enabling reuse of schema definitions that are already set up
  • Data scientists can access data directly using their tool of choice and build, test and deploy models seamlessly using Db2 Big SQL
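
To make the row-filtering and column-masking highlight concrete, here is a toy Python model of what a user sees once such a policy is in force. In Db2 Big SQL the rules themselves are defined in SQL (Db2 row and column access control) or, on CDP, as Ranger policies, and the engine enforces them at query time. The `AccessPolicy` class, `apply_policy` and `mask_ssn` functions, and the sample `employees` data are hypothetical and only illustrate the effect, not how Big SQL implements it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class AccessPolicy:
    """Hypothetical policy: a row predicate plus per-column masking functions."""
    row_filter: Callable[[Row], bool]
    column_masks: Dict[str, Callable[[object], object]]


def apply_policy(rows: List[Row], policy: AccessPolicy) -> List[Row]:
    """Return only the rows the user may see, with sensitive columns masked."""
    visible = []
    for row in rows:
        if not policy.row_filter(row):
            continue  # row filtering: this row is never returned to the user
        masked = dict(row)  # copy so the underlying data is left untouched
        for column, mask in policy.column_masks.items():
            if column in masked:
                masked[column] = mask(masked[column])  # column masking
        visible.append(masked)
    return visible


def mask_ssn(value: object) -> str:
    """Hypothetical mask that keeps only the last four characters."""
    return "XXX-XX-" + str(value)[-4:]


employees = [
    {"name": "Ana", "region": "EAST", "ssn": "123-45-6789"},
    {"name": "Raj", "region": "WEST", "ssn": "987-65-4321"},
]

# An analyst restricted to the EAST region who must not see full SSNs.
east_analyst = AccessPolicy(
    row_filter=lambda r: r["region"] == "EAST",
    column_masks={"ssn": mask_ssn},
)

if __name__ == "__main__":
    print(apply_policy(employees, east_analyst))
    # [{'name': 'Ana', 'region': 'EAST', 'ssn': 'XXX-XX-6789'}]
```

The point of enforcing both rules inside the engine, rather than in each application, is that every BI tool or notebook querying the same table gets the same filtered, masked view.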

Useful links:
1. Blog from Cloud Pak for Data offering management on v3.5 release

2. For feedback or feature requests, please reach out to Priya Tiruthani (Offering Manager, Db2 Big SQL) at ntiruth@us.ibm.com or submit your ideas in the Aha! Ideas Portal (Choose Component: Db2 Big SQL)



Db2 Big SQL v7.1 on CDP Private Cloud Base (formerly CDP Data Center) v7.1.3 accelerates your digital transformation journey on the new unified platform

Db2 Big SQL is now available on Cloudera Data Platform Private Cloud Base (previously called CDP Data Center) v7.1.3. Db2 Big SQL allows data engineers and data scientists to analyze the structured and unstructured data already stored on HDFS. This form factor of Db2 Big SQL is optimized for open source file formats stored in Hadoop. Some highlights are:
  • Integration with CDP Private Cloud Base v7.1.3
    • Tight integration with platform components such as Hadoop, Hive, Cloudera Manager, and Ranger, since all the component levels have changed in the new platform
    • Provide ACID capabilities for changing data
    • ZooKeeper-based solution for automatic HA failover
    • Upgrade Big SQL from legacy platforms to the latest unified platform. 
  • Create and centrally manage data access policies using Ranger for Db2 Big SQL tables
    • Add support for column-level Ranger tag-based policies for fine-grained access control on data
    • Enhance Ranger policies on Big SQL tables to have both column masking and row filtering via Ranger UI
    • Extend Ranger column-masking and row-filtering policies beyond Hadoop tables to federated tables (nicknames only)
  • SQL Core
    • Improve reader/writer operations and enhance error handling to make troubleshooting customer issues easier
    • Extend federation capabilities to query blockchain data from a single SQL engine
    • Add support for ACID Hive tables that are compacted to merge the changes that have been made to a table
  • Performance
    • Enhance the scheduler to better coordinate operations among the Big SQL workers on the data nodes

With its advanced SQL capabilities, Db2 Big SQL enables BI tools to connect directly for interactive SQL analytics with high performance and high concurrency, without data duplication or vendor lock-in. As platforms change, Big SQL provides a stable environment for applications and avoids unnecessary query rewrites during platform changes or migrations. Applications built on business intelligence tools, including Cognos, Tableau, and others, can be ported using ODBC or JDBC connections. With support for popular open source file formats such as Parquet, ORC, Avro, Text, and Sequence, data engineers and data scientists can access data directly using their tool of choice and build, test, and deploy models seamlessly using Db2 Big SQL.
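
As a rough illustration of connecting a client over JDBC or ODBC, the sketch below assumes Big SQL is reached through standard Db2 client drivers: a JDBC URL of the form jdbc:db2://host:port/database for JDBC-based tools, and a Db2 CLI-style keyword=value connection string for ODBC-style clients such as IBM's Python `ibm_db` driver. The helper names are hypothetical, and the hostname, port, database name, and credentials are placeholders; take the real values from your own Big SQL instance.

```python
def build_jdbc_url(hostname: str, port: int, database: str) -> str:
    """Db2-style JDBC URL (IBM Data Server Driver for JDBC format)."""
    return f"jdbc:db2://{hostname}:{port}/{database}"


def build_cli_conn_str(database: str, hostname: str, port: int,
                       uid: str, pwd: str, use_ssl: bool = False) -> str:
    """Db2 CLI/ODBC-style keyword=value connection string."""
    parts = {
        "DATABASE": database,
        "HOSTNAME": hostname,
        "PORT": str(port),
        "PROTOCOL": "TCPIP",
        "UID": uid,
        "PWD": pwd,
    }
    if use_ssl:
        parts["SECURITY"] = "SSL"
    # Dicts keep insertion order, so keywords are emitted in a stable order.
    return "".join(f"{key}={value};" for key, value in parts.items())


def run_query(conn_str: str, sql: str):
    """Open a connection with ibm_db and return the first result row."""
    import ibm_db  # installed separately (pip install ibm_db)

    conn = ibm_db.connect(conn_str, "", "")
    try:
        stmt = ibm_db.exec_immediate(conn, sql)
        return ibm_db.fetch_tuple(stmt)
    finally:
        ibm_db.close(conn)


if __name__ == "__main__":
    # Placeholder connection details (hypothetical values).
    host, port, db = "bigsql.example.com", 32051, "BIGSQL"
    print(build_jdbc_url(host, port, db))
    print(build_cli_conn_str(db, host, port, "analyst", "secret"))
```

With a reachable instance, a query could then be run with something like run_query(conn_str, "SELECT COUNT(*) FROM myschema.sales"), where the table name is hypothetical. The driver import is deferred into run_query so the string builders can be used and tested without the driver installed.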

Useful links:
1. Announcement of Big SQL v7.1 on CDP v7.1.3 release

2. For feedback or feature requests, please reach out to Priya Tiruthani (Offering Manager, Db2 Big SQL) at ntiruth@us.ibm.com or submit your ideas in the Aha! Ideas Portal (Choose Component: Db2 Big SQL)
3. Product webpage
