by Barbara Sannerud and Jonathan Sloan
Spark is one of the most widely used engines for scalable computing, with 80% of the Fortune 500 using Apache Spark™1. With thousands of contributors to the open source project and support for many data sources and languages, Spark is an appealing technology for banks and financial institutions.
IBM Z offers critical industries hyperscale performance, availability, and security, so it is a natural step for Spark to be supported on IBM Z and LinuxONE. The Analytics Engine powered by Apache Spark was introduced on November 30, 2021, as an enhancement to IBM Cloud Pak for Data on IBM Z 4.0.3. Given IBM Z's advantages for performing analytics on platform, such as colocation of transactions and data, Analytics Engine for Apache Spark on Z offers the following benefits:
- High-performance communication between IBM Z partitions, both for running Spark on Z and for accessing data sources running on IBM Z
- Access to current transactional data and to historical data on Z as well as to other non-IBM Z data sources
- Access to many different data sources, such as VSAM, sequential or partitioned data sets, log stream data (SMF), and databases like IBM Db2 for z/OS, when running Spark on IBM Z along with IBM Data Virtualization Manager for z/OS
- Protection of sensitive data governed by security and privacy mechanisms on Z, such as Hyper Protect Data Controller and pervasive encryption
- Ability to analyze data in place on IBM Z without moving it off platform, avoiding both the latency of data movement and the risk of working with stale data
- Libraries on Z that deliver superior performance on select operations, for example by exploiting SIMD
- Ability to leverage common skill sets of data scientists and developers familiar with Spark
- Ability to exploit standard Linux security mechanisms on IBM Z
Value of IBM Cloud Pak for Data
IBM Cloud Pak for Data is a cloud-native enterprise insights platform designed to help you efficiently generate meaningful insights from your data. IBM Cloud Pak for Data helps you connect to data, wherever it might be, govern it, and derive additional value from it via analytics. It enables users to collaborate using a single, unified interface that supports many services designed to work together. With modern tooling that removes barriers to collaboration, users can spend more time analyzing and using data effectively and less time on integrating components.
Analytics Engine powered by Apache Spark (Spark)
Analytics Engine helps extend the value of IBM Cloud Pak for Data on IBM Z by allowing users to process and query large data sets. To date, IBM Cloud Pak for Data supports many industry-leading analytics tools, and the Analytics Engine adds yet another choice for developers and data scientists. Analytics Engine supports Spark 2.4 and 3.0, with languages such as Scala 2.11 and 2.12 and Python 3.8.
IBM Analytics Engine is part of IBM Watson™ Studio in IBM Cloud Pak for Data and can be used to run Watson Studio jobs. Spark is a fast engine optimized for large-scale, in-memory data processing. With up to 40 TB of memory protected by RAIM, IBM z15 is an ideal platform for Spark. Analytics Engine powered by Apache Spark on IBM Z helps organizations instantiate a highly efficient analytics deployment that reduces the latency, cost inefficiencies, and potential security exposures associated with data movement.
Through tight integration with Watson Studio and machine learning, combined with the qualities of service of IBM Z, organizations can develop and deploy their applications on a platform optimized for production. Organizations can also customize Apache Spark and configure clusters with third-party libraries and other packages.
Instead of spending time deploying and managing Spark, organizations can spend time writing applications to drive better customer experiences and make faster decisions. Learn more here (https://www.ibm.com/cloud/analytics-engine).