New product versions release frequently, but what makes this one special and motivates you to say "We need to move to 2.5.0, right away!" is the abundance of features added across various personas. We listened and we researched. We crafted each feature with care, thinking of how to make it easier for you to maintain and use in the future.
Let’s look at IBM Spectrum Conductor 2.5.0 from the perspective of different personas, starting with features common to all, and then moving to ones specifically designed with individual personas in mind.
Common across personas:
Bundling Miniconda, instead of Anaconda
Reducing the time spent on conda deployment was key to this release. Because IBM Spectrum Conductor 2.5.0 bundles Miniconda distributions instead of Anaconda distributions, deploying the conda distribution is much faster. Besides speed, this gives administrators a more streamlined way to ensure that packages come from the organization’s own conda channels rather than from an Anaconda distribution.
Conda management enhancements
Managing your conda environments is now much easier. Previously, you could create conda environments by using a conda YAML file; now you can also update a conda environment by using a conda YAML file. To update, you can download the conda YAML file for any existing conda environment, and then modify it as needed. In addition, when you create or update conda environments, you can now run conda operations in offline mode or on a single host.
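As a rough illustration of the download-then-modify flow, the sketch below edits a conda environment specification and renders it back to YAML text. The environment name, channel, and package pins are made up for illustration, and the small renderer handles only the flat name/channels/dependencies layout. It is not how IBM Spectrum Conductor itself processes the file.

```python
from copy import deepcopy


def add_dependency(env, package):
    """Return a copy of the environment spec with `package` added or re-pinned."""
    updated = deepcopy(env)
    name = package.split("=")[0]
    # Drop any existing pin for the same package, then append the new one.
    deps = [d for d in updated["dependencies"] if d.split("=")[0] != name]
    deps.append(package)
    updated["dependencies"] = deps
    return updated


def render_yaml(env):
    """Render the flat name/channels/dependencies layout as YAML text."""
    lines = [f"name: {env['name']}", "channels:"]
    lines += [f"  - {c}" for c in env["channels"]]
    lines.append("dependencies:")
    lines += [f"  - {d}" for d in env["dependencies"]]
    return "\n".join(lines) + "\n"


# Hypothetical environment, standing in for a downloaded conda YAML file.
existing = {
    "name": "ds-env",
    "channels": ["defaults"],
    "dependencies": ["python=3.7", "numpy=1.18"],
}

updated = add_dependency(existing, "numpy=1.19")
print(render_yaml(updated))
```

The original specification is left untouched, so you can compare the two versions before you apply the update.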
New reporting index with combined usage and planning information
Across all roles, users want to know various details about cluster usage. IBM Spectrum Conductor 2.5.0 introduces a new Elasticsearch index that has Spark application resource usage information, which is combined with details on the resource plan and Spark priority information. With this extra information, you can understand why slots change across Spark applications when new applications are submitted, and when priorities or plans are changed.
IBM Cloud Pak® for Data integration
When you are looking at expanding beyond Spark, Dask, and AI flows for even greater insight into deploying and monitoring your models, IBM Cloud Pak for Data can be a huge benefit. This scratches only the surface of the tools available in IBM Cloud Pak for Data: for existing IBM Spectrum Conductor customers, why not marry the two products together and get the best of both worlds? Or, if you’re a new customer looking to have large compute clusters, then IBM Spectrum Conductor is the perfect fit with IBM Cloud Pak for Data for all its additional capabilities.
New in IBM Cloud Pak for Data 3.5 and IBM Spectrum Conductor 2.5.0 is an integration to offload your Spark workloads to IBM Spectrum Conductor. Use your Jupyter notebooks in Watson Studio in IBM Cloud Pak for Data by specifying environments that map to IBM Spectrum Conductor instance groups. The Spark workload runs in the IBM Spectrum Conductor cluster and uses all the settings of the instance group that is specified in the environment. The Hadoop Execution Engine add-on to IBM Cloud Pak for Data runs Jupyter Enterprise Gateway (JEG) to offload the workload to IBM Spectrum Conductor.
Generic instance groups
In previous releases of IBM Spectrum Conductor, we offered application instances as a generic way to execute long-running services on the cluster. However, administrators wanted to be able to run services in the instance groups that were already set up for their different users, and to spare users from having to go to multiple locations. This configuration is now possible with generic instance groups, as we introduce a new concept called components. The first supported component with IBM Spectrum Conductor 2.5.0 is Dask. You can also add your own components.
A component is a nice generic way to add new capabilities into an instance group: it builds on the application instances feature of defining an application template YAML file, but also combines with some of the unique features we already offer with instance groups (for example, the Spark configuration editor), by defining a generic configuration file. We encourage you to integrate components and look forward to working with you to expand on instance groups even more in the future!
Upgraded Elastic Stack version
A release would never be complete without updating the software to newer versions. We upgraded software for many components of IBM Spectrum Conductor 2.5.0 (for example, to leverage the latest capabilities or to address security vulnerabilities), but the upgraded Elastic Stack is worth calling out. We now use Elastic Stack version 7.8.1, with our own custom-built security layer instead of Search Guard for handling SSL and security throughout the Elastic Stack.
RHEL 8 support
IBM Spectrum Conductor 2.5.0 supports RHEL 8.1 and RHEL 8.2 on Linux® x86 64-bit and Linux on POWER®. In addition, RHEL 8.0 is supported for Linux® x86 64-bit.
Security is a significant concern for all of us. IBM Spectrum Conductor 2.5.0 supports Security-Enhanced Linux® (SELinux) to provide enhanced security.
Host factory enhancements
Host factory is a little-known feature in IBM Spectrum Conductor with a large impact on your cluster: use host factory to burst into the cloud when there aren’t enough resources in your cluster to handle the current workload. The feature has been around for a few releases, but IBM Spectrum Conductor 2.5.0 brings various significant host factory improvements, including:
- Cost-based cloud host and provider selection, to minimize the overall cost
- Multi-tenancy support
- Calendar-based host provisioning
- A utilization-based scale-out and scale-in policy, to add to the existing Spark workload-based scale-out and scale-in policy
- AWS Spot Instances support
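To make two of these policies more concrete, here is a minimal sketch of a cost-based provider choice and a utilization-based scaling decision. The provider names, prices, thresholds, and function names are all hypothetical; host factory's actual policies and configuration are not shown in this post, so treat this only as an illustration of the general idea.

```python
def cheapest_offer(offers, slots_needed):
    """Pick the offer with the lowest total hourly cost that covers slots_needed."""
    def total_cost(offer):
        # Ceiling division: partial hosts cannot be provisioned.
        hosts = -(-slots_needed // offer["slots_per_host"])
        return hosts * offer["price_per_host_hour"]
    return min(offers, key=total_cost)


def scaling_decision(used_slots, total_slots, scale_out_at=0.80, scale_in_at=0.30):
    """Return 'scale-out', 'scale-in', or 'hold' from current slot utilization."""
    if total_slots <= 0:
        return "scale-out"  # no capacity at all, so more hosts are needed
    utilization = used_slots / total_slots
    if utilization >= scale_out_at:
        return "scale-out"
    if utilization <= scale_in_at:
        return "scale-in"
    return "hold"


# Hypothetical cloud offers (prices are illustrative only).
offers = [
    {"provider": "cloud-a", "slots_per_host": 8, "price_per_host_hour": 0.40},
    {"provider": "cloud-b", "slots_per_host": 16, "price_per_host_hour": 0.70},
]

print(cheapest_offer(offers, 20)["provider"])  # three small hosts beat two large ones
print(cheapest_offer(offers, 16)["provider"])  # one large host beats two small ones
for used in (95, 50, 10):
    print(used, scaling_decision(used, 100))
```

In this sketch the cheaper choice flips with the demand size, which is why the selection is based on total cost for the request rather than on the per-host price alone.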
Spark lifecycle enhancements
Deeper integration of Spark with EGO has always been key to having Spark run the fastest on IBM Spectrum Conductor. This release adds new integrations to help you maximize your cluster usage. You can now run multiple Spark tasks on a single slot, which is configurable at the instance group and application level. Those using IBM® Spectrum Symphony are familiar with this option, which brings parity between IBM Spectrum Symphony and IBM Spectrum Conductor.
In addition, it is now possible to configure multiple resource groups for Spark workload. You can manually configure a primary resource group that is used first, and then a secondary resource group that is used next.
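The arithmetic behind these two options can be sketched as follows. This is only an illustration under simple assumptions (a fixed number of tasks per slot, and a strict primary-then-secondary order); the function names and numbers are hypothetical and do not represent the EGO scheduler's actual algorithm.

```python
import math


def slots_needed(tasks, tasks_per_slot):
    """Slots required when each slot can run several Spark tasks."""
    return math.ceil(tasks / tasks_per_slot)


def allocate(slots, primary_free, secondary_free):
    """Take slots from the primary resource group first, then the secondary one."""
    from_primary = min(slots, primary_free)
    from_secondary = min(slots - from_primary, secondary_free)
    unmet = slots - from_primary - from_secondary
    return {"primary": from_primary, "secondary": from_secondary, "unmet": unmet}


# 100 tasks at 4 tasks per slot need 25 slots instead of 100.
needed = slots_needed(100, 4)
print(needed)

# With 16 free slots in the primary group, the remaining 9 come from the secondary.
print(allocate(needed, primary_free=16, secondary_free=20))
```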
For data scientists:
Spark 3.0.1 support
Apache Spark 3.0 brought many new enhancements to Spark. With IBM Spectrum Conductor 2.5.0, we added support for the latest version, Spark 3.0.1. See the respective Apache Spark release notes for a list of enhancements in Spark 3.0 and Spark 3.0.1.
Because Python is used for many machine learning workflows, there is a need for a native Python scheduling approach. Spark can meet numerous use cases but isn't always the best option. Dask is now integrated into the instance group lifecycle, so you can create instance groups with Spark and Dask, or with only Dask (Spark is now optional in an instance group). More details on the Dask integration will follow in future blog posts.
New simplified flow
Data scientists who focus on notebooks might find the flow for using them tedious. IBM Spectrum Conductor 2.5.0 includes an updated My Notebooks & Applications page (previously known as My Applications & Notebooks), which, by default, shows all your notebooks across all instance groups and lets you fully manage them (start, stop, log retrieval, and other operations). The old view is still available by selecting the Show applications checkbox. In addition, a new tab with the notebook management capabilities is available in the Show applications view.
Jupyter Notebook package changes
A new Jupyter Notebook package is available with version 6.0.0, which supports all versions of Jupyter 6.0.0 and later. This new notebook package no longer bundles Jupyter Enterprise Gateway (JEG); instead, you must first install JEG in your conda environment. This configuration gives greater flexibility and makes it easier to manage the JEG version that is used. It is a great opportunity to stay up to date with the latest JEG version, and it eliminates conflicts with the version that IBM Spectrum Conductor previously tried to install.
In addition, with the integration of Dask, we enhanced the Jupyter start script to check for package dependencies to ensure that your notebook works with the features that are enabled, providing a better user experience.
JupyterLab is now the default for Jupyter 6.0.0 Notebook
JupyterLab is now started by default when you use the Jupyter 6.0.0 Notebook package, assuming JupyterLab is installed in your conda environment. If JupyterLab is not installed, the configuration falls back to the original Jupyter Notebook interface. There are many benefits to using JupyterLab, and we recommend that you check it out.
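The fallback and dependency checks described in this section can be pictured with a short sketch like the one below. It is not the actual Jupyter start script shipped with IBM Spectrum Conductor; the function names and the list of required packages are assumptions made for illustration. It relies only on the standard `importlib.util.find_spec` call to test whether a top-level package is importable in the current environment.

```python
import importlib.util


def is_importable(name):
    """True when the top-level package `name` can be found in this environment."""
    return importlib.util.find_spec(name) is not None


def pick_interface(is_installed=is_importable):
    """Return 'lab' when JupyterLab is available, else fall back to 'notebook'."""
    return "lab" if is_installed("jupyterlab") else "notebook"


def missing_packages(required, is_installed=is_importable):
    """List the required packages that are not available."""
    return [name for name in required if not is_installed(name)]


print("Interface:", pick_interface())

# Hypothetical dependency list for a notebook that has Dask enabled.
print("Missing:", missing_packages(["dask", "distributed"]))
```

Passing `is_installed` as a parameter keeps the decision logic separate from the environment lookup, which makes the fallback easy to exercise without installing or removing packages.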
Give it a try by downloading IBM Spectrum Conductor 2.5.0 from Passport Advantage, or by downloading the evaluation version, and tell us what you think!
We hope you are as excited as we are about this new release! Log in to this community page to comment on this blog post. We look forward to hearing from you on the new features, and what you would like to see in future releases.