Big Data platforms have evolved considerably over the last four years. The good news is that the leading on-prem offerings have consolidated into one: Cloudera Data Platform Private Cloud (CDP PvC).
As Cloudera's strategic partner, IBM is happy to bring CDP PvC to its customers.
What is CDP?
As shown in the image below, we have made available many of the Cloudera offerings that Hortonworks and Cloudera provided separately in the past. With the merger of the two companies, we now have an advanced Big Data platform that brings together the best of the legacy platforms and introduces new capabilities for today's needs, namely cloud-native solutions that are easy to scale and consume.
Why is it important to migrate to CDP?
Cloudera Data Platform Private Cloud provides a platform that keeps compute and storage separate, giving the flexibility to scale them independently. The platform has been architected around 3 important aspects:
- Shared data lake - storage layer: All data is stored in HDFS or Ozone (an object store), both tightly integrated with security and governance via Shared Data Experience (SDX).
- Container platform - compute layer: This layer is deployed on the Red Hat OpenShift platform, which gives infrastructure teams simple, fast provisioning when requests come in from business teams.
- Data analytics applications - application layer: This use-case-based application layer gives users isolated, noise-free environments in which to run their workloads and jobs. Because each instance is tied to a use case, no unnecessary services run and compete for compute resources.
The container platform and data analytics applications layers run on Kubernetes, while the data lake (storage) layer runs on traditional bare-metal servers.
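To make the benefit of separating compute from storage concrete, here is a minimal, purely illustrative Python sketch. It is not a Cloudera or OpenShift API: the `PrivateCloud` class, its fields, and the node counts are hypothetical names and numbers chosen only to show that one layer can grow without touching the other.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PrivateCloud:
    """Hypothetical model of a deployment with separate storage and compute."""
    storage_nodes: int  # bare-metal servers running the data lake (HDFS or Ozone)
    compute_nodes: int  # Kubernetes workers running the containerized workloads


def scale_compute(cloud: PrivateCloud, delta: int) -> PrivateCloud:
    """Return a copy with more (or fewer) compute nodes; storage is unchanged."""
    return replace(cloud, compute_nodes=cloud.compute_nodes + delta)


def scale_storage(cloud: PrivateCloud, delta: int) -> PrivateCloud:
    """Return a copy with more (or fewer) storage nodes; compute is unchanged."""
    return replace(cloud, storage_nodes=cloud.storage_nodes + delta)


if __name__ == "__main__":
    cloud = PrivateCloud(storage_nodes=10, compute_nodes=4)
    # A burst of analytics demand: add compute only.
    print(scale_compute(cloud, 6))
    # Data volume grows: add storage only.
    print(scale_storage(cloud, 5))
```

In a traditional architecture where every node carries both roles, the two functions above would collapse into one, and adding capacity for one need would always pay for the other.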
Based on how the product is deployed, CDP Private Cloud is divided into 2 parts: CDP Private Cloud Base and CDP Private Cloud Plus.
CDP Private Cloud Base:
Base is the next generation of the traditional platform. People familiar with Hortonworks Data Platform (HDP) and Cloudera's Distribution including Apache Hadoop (CDH) can easily relate to Base. In the image, the section below the Red Hat OpenShift platform is CDP PvC Base. Base is deployed on bare-metal servers to run HDFS, Ozone (the new object store from Cloudera), or both, with security and governance enforced. The traditional workloads that used to run on HDP and CDH can still run, with the exception of a few components such as Hive LLAP.
CDP Private Cloud Plus:
Plus is the containerized platform. It provides the flexibility to launch different use-case-driven experiences, enables multi-tenancy, allows services to be upgraded independently instead of upgrading the whole platform, and dedicates compute to each tenant. Plus is certified on the Red Hat OpenShift platform and provides a management console that helps deploy and manage the different experiences seamlessly. The experiences currently available are:
- Cloudera Data Warehouse (CDW): Spins up the compute for Hive or Impala workloads and lets you choose the SQL engine when launching the service.
- Cloudera Machine Learning (CML): Provides a workspace for executing Machine Learning workloads using Python, R and Spark. It also lets you run high-performance deep learning with distributed GPU scheduling and training, with autoscaling enabled.
Soon, 3 additional services (Data Engineering, Operational Database and DataFlow) will also be made available in Plus. This new architecture gives you the same perks as a public cloud environment, namely easy provisioning and maintenance, efficient resource consumption, scalability and application management.
So, why stay stuck in a traditional architecture that prevents you from scaling compute and storage separately and leaves multiple services competing for resources? Migrating to the latest platform, CDP Private Cloud, gives you the much-needed cloud characteristics in your own data center.
If you would like to learn more about "Journey to CDP", do reach out to me.