How to configure and tune the performance of Spark workloads on an IBM Spectrum Scale Sharing Nothing Cluster

By Archive User posted Mon November 27, 2017 02:17 AM

The IBM Spectrum Scale Sharing Nothing Cluster performance tuning guide has been posted. Please refer to that guide before making the changes below.

Here are the tuning steps.
Step 1: Configure spark.shuffle.file.buffer
By default, this option is configured in $SPARK_HOME/conf/spark-defaults.conf.
To optimize Spark workloads on an IBM Spectrum Scale filesystem, the key tuning value is the 'spark.shuffle.file.buffer' option, defined in the Spark configuration file. It must be set to match the block size of the IBM Spectrum Scale filesystem being used.
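
As a rough illustration of matching the buffer to the block size, the sketch below turns a block size in bytes into the line that would go into spark-defaults.conf. The helper names (spark_size_string, shuffle_buffer_line) and the 2 MiB block size are assumptions made for this example, not values from the tuning guide. Use the block size reported for your own filesystem.

```python
# Minimal sketch: format a spark-defaults.conf entry for spark.shuffle.file.buffer.
# The helper names and the example block size are hypothetical.

KIB = 1024
MIB = 1024 * 1024


def spark_size_string(block_size_bytes: int) -> str:
    """Render a byte count as a Spark size string such as '2m' or '256k'."""
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    if block_size_bytes % KIB != 0:
        # Spark reads this option in KiB units, so reject values that
        # would be truncated when converted.
        raise ValueError("block size must be a multiple of 1 KiB")
    if block_size_bytes % MIB == 0:
        return f"{block_size_bytes // MIB}m"
    return f"{block_size_bytes // KIB}k"


def shuffle_buffer_line(block_size_bytes: int) -> str:
    """Build the 'key value' line used in spark-defaults.conf."""
    return f"spark.shuffle.file.buffer {spark_size_string(block_size_bytes)}"


if __name__ == "__main__":
    # Assumed example: a filesystem with a 2 MiB block size.
    print(shuffle_buffer_line(2 * MIB))  # prints: spark.shuffle.file.buffer 2m
```

The printed line can be appended to $SPARK_HOME/conf/spark-defaults.conf. The same value can also be passed per job with spark-submit --conf.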

The user can query the block size of an IBM Spectrum Scale filesystem by running: 'mmlsfs