
Today, we announce that the latest version of Apache Spark, Spark 4.0, will be generally available for z/OS later this quarter in IBM Z Platform for Apache Spark v1.4. Why "1.4", you ask? The new number aligns our product release with the current Apache Spark version number and signals a significant jump in the underlying Apache Spark version, to 4.0.
Apache Spark 4.0, made available by the open source community earlier this year, contains a number of performance and functionality improvements, the most notable of which is the general availability (GA) of Spark Connect. This new feature allows users to express their requirements not through language-dependent APIs but through SQL-like, string-based parameters and gRPC/Protobuf data transfers. Over time, this is expected to remove the need for Spark itself to support the growing number of new languages, so that the project can concentrate on established languages such as Java, Scala, and Python. Users will run the Spark Connect Server under their Unix System Services (USS) ID to manage the interaction with the Spark cluster. For more information, see the documentation on the Apache Spark website.
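As a rough illustration of what a Spark Connect client looks like, here is a minimal Python sketch. It assumes PySpark with Spark Connect support is installed on the client and that a Spark Connect Server is reachable at the hypothetical host `zos-host.example.com` on port 15002 (the upstream default). The helper names `spark_connect_url` and `run_demo` are ours for illustration and are not part of any product.

```python
def spark_connect_url(host: str, port: int = 15002) -> str:
    """Build a Spark Connect connection string of the form sc://host:port."""
    if not host:
        raise ValueError("host must be non-empty")
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return f"sc://{host}:{port}"


def run_demo(url: str) -> None:
    """Open a remote session, run a trivial query, and close the session."""
    # Imported lazily so the URL helper above works without PySpark installed.
    from pyspark.sql import SparkSession

    # builder.remote() creates a Spark Connect client session instead of
    # starting a local driver; the query executes on the remote cluster.
    spark = SparkSession.builder.remote(url).getOrCreate()
    try:
        spark.range(5).show()
    finally:
        spark.stop()


if __name__ == "__main__":
    # Hypothetical host name; replace with the system running your
    # Spark Connect Server.
    run_demo(spark_connect_url("zos-host.example.com"))
```

The client only needs network access to the server endpoint; the application code otherwise uses the familiar DataFrame API.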
In this release, all of the z/OS platform integration features have been carried forward to the latest Apache Spark version. Jobname support and improved Spark job scheduler support allow for familiar management and submission of Spark jobs and resources. Additionally, z/OS-specific port security enhancements are designed to give the open source Apache Spark platform a z/OS level of security. Why move the data frequently when you can move compute processing to the data and access it as often as needed?
In addition, the z/OS-exclusive Spark WebUI Authentication feature is now enabled by default. Originally introduced as an optional feature in Spark 3.5.1.6 as part of the IBM Z Platform for Apache Spark v1.1 release, it enhances cluster security by requiring user authentication to access the Spark WebUI, protecting both cluster details and the underlying system. Note that additional configuration is required to fully enable and use the Spark WebUI Authentication feature.
This new release of IBM Z Platform for Apache Spark has several dependency changes that you may need to prepare for:
1. The Apache Spark base has been updated to use Java 17 as its minimum supported release. IBM Z Platform for Apache Spark 1.4 is planned to support Java 17 and 21 at GA, with Java 25 support planned to follow during the release lifecycle.
2. In addition to the Java update, Apache Spark 4.0 is built on a newer release of the Scala language, 2.13. (This change is likely to require application updates and recompiles to use Apache Spark 4.0.)
3. On z/OS, IBM Z Platform for Apache Spark 1.4 now requires an updated level of the Bourne Again Shell, also known as Bash. Bash 5.2.37 is made available from the IBM Open Enterprise Foundation product, or you can find Bash 5.3 at the zOpenTools Community website (zopen community - Open Source for z/OS).
4. Finally, Python users are encouraged to upgrade their systems to Python 3.11 or later. Apache Spark 4.0 lists Python 3.11 as its minimum supported Python release.
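The dependency levels listed above can be checked ahead of time with a small preflight script. The following Python sketch is one way to do that and is not an IBM-supplied tool. It takes the Java levels (17 and 21) and the Python level (3.11) from the list above. The Bash floor of 5.2.37 is our assumption, based on the level named above, since no exact minimum is stated. All function names are hypothetical.

```python
import re
import subprocess
import sys

SUPPORTED_JAVA = {17, 21}        # levels planned for GA, per the list above
MIN_PYTHON = (3, 11)             # minimum Python level, per the list above
ASSUMED_MIN_BASH = (5, 2, 37)    # assumption: no exact minimum is stated


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted version number, e.g. '5.2.37' -> (5, 2, 37)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def java_major(version_output: str) -> int:
    """Return the Java major version from `java -version` output."""
    version = parse_version(version_output)
    # Releases before Java 9 report themselves as 1.x (for example 1.8.0).
    if version[0] == 1 and len(version) > 1:
        return version[1]
    return version[0]


def find_problems(java_output: str, bash_output: str,
                  python_version: tuple[int, ...]) -> list[str]:
    """Compare reported versions with the expected levels; return any issues."""
    problems = []
    major = java_major(java_output)
    if major not in SUPPORTED_JAVA:
        problems.append(f"Java {major} is not one of {sorted(SUPPORTED_JAVA)}")
    bash = parse_version(bash_output)
    if bash < ASSUMED_MIN_BASH:
        problems.append(f"Bash {'.'.join(map(str, bash))} is older than assumed minimum")
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append(f"Python {python_version[0]}.{python_version[1]} is older than 3.11")
    return problems


def command_output(args: list[str]) -> str:
    """Run a command and return stdout plus stderr (java -version uses stderr)."""
    result = subprocess.run(args, capture_output=True, text=True, check=False)
    return result.stdout + result.stderr


if __name__ == "__main__":
    try:
        issues = find_problems(command_output(["java", "-version"]),
                               command_output(["bash", "--version"]),
                               tuple(sys.version_info[:3]))
    except (FileNotFoundError, ValueError) as err:
        issues = [f"could not determine a version: {err}"]
    print("\n".join(issues) if issues else "All checked dependency levels look OK")
```

The script does not check the Scala level, because that is a property of how each application is built rather than of the system; applications compiled against an earlier Scala release will likely need to be rebuilt for Scala 2.13.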
One final improvement worth highlighting: as part of this release, Apache Spark development has updated a number of dependency packages in order to enhance the security of the new version. These changes help ensure that Apache Spark stays aligned with other open source software updates over time, and that it remains relevant and trusted well into the future.
Information on Apache Spark 4.0:
Information on IBM Z Platform for Apache Spark 1.4 (to be available at GA):
Questions? Contact us!
- Reach out to us at aionz@us.ibm.com