This article demonstrates the benefits of worker threads, and how to use them in
IBM SDK for Node.js – z/OS®, V12.
Node.js utilizes an event loop, which allows it to perform asynchronous, non-blocking I/O through callbacks, promises, and async/await. This makes Node.js an excellent fit for I/O-heavy workloads, since the event loop allows I/O operations to run concurrently. Your JavaScript code, however, executes in a single thread, which traditionally meant that you couldn't perform CPU-intensive workloads without blocking the event loop. With the support of worker threads in
IBM SDK for Node.js – z/OS, V12, it is now possible to efficiently run non-blocking CPU-intensive operations.
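As a quick illustration of non-blocking I/O (a standalone sketch, not part of this article's sample application), two asynchronous delays awaited together overlap instead of running back-to-back:

```javascript
// Two 100ms delays started together complete in roughly 100ms total,
// not 200ms, because the event loop interleaves the pending timers.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function overlappingDelays() {
  const start = Date.now();
  await Promise.all([delay(100), delay(100)]);
  return Date.now() - start; // roughly 100, not 200
}

overlappingDelays().then((elapsed) => console.log(`elapsed: ${elapsed}ms`));
```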
Let's look at an application where we need to perform some CPU-intensive workloads.
Without Worker Threads
// cpu.js

/* Simulate CPU-intensive workload (a bubble sort). */
function cpuWorkload(unsortedList) {
  let sortedList = unsortedList.slice();
  let swapped;
  do {
    swapped = false;
    for (let i = 0; i < sortedList.length - 1; ++i) {
      if (sortedList[i] > sortedList[i + 1]) {
        let tmp = sortedList[i];
        sortedList[i] = sortedList[i + 1];
        sortedList[i + 1] = tmp;
        swapped = true;
      }
    }
  } while (swapped);
  return sortedList;
}

/* Simulate I/O-intensive workload. */
function ioWorkload(length) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      let list = Array(length).fill().map(() => Math.floor(Math.random() * length));
      resolve(list);
    }, 1000);
  });
}

/* Run I/O workload to fetch data, then process data with CPU workload. */
async function runWorkload() {
  console.log('Begin I/O workload');
  let ioStart = process.hrtime();
  let data = await ioWorkload(30000);
  let ioEnd = process.hrtime(ioStart);
  console.log(`I/O workload done in ${(ioEnd[0] + ioEnd[1] / 1000000000).toFixed(2)}s`);
  console.log('Begin CPU workload');
  let cpuStart = process.hrtime();
  cpuWorkload(data);
  let cpuEnd = process.hrtime(cpuStart);
  console.log(`CPU workload done in ${(cpuEnd[0] + cpuEnd[1] / 1000000000).toFixed(2)}s`);
}

function run() {
  let procStart = process.hrtime();
  let workloads = [];
  // Arbitrarily run it 3 times for demonstration.
  for (let i = 0; i < 3; i++) {
    workloads.push(runWorkload());
  }
  Promise.all(workloads).then(function() {
    let procEnd = process.hrtime(procStart);
    console.log(`All workloads done in ${(procEnd[0] + procEnd[1] / 1000000000).toFixed(2)}s total`);
  }).catch(err => console.error(err));
}

run();
In our application, we simulate a simple workload: fetching some data, then processing the data we fetched. Fetching data is an I/O workload, while processing data is a CPU workload. We also output log messages so we can better see how our workload behaves. Let's run our application.
node cpu.js
Begin I/O workload
Begin I/O workload
Begin I/O workload
I/O workload done in 1.01s
Begin CPU workload
CPU workload done in 4.04s
I/O workload done in 5.05s
Begin CPU workload
CPU workload done in 4.15s
I/O workload done in 9.20s
Begin CPU workload
CPU workload done in 3.97s
All workloads done in 13.18s total
From our output*, we can see what it means to block the event loop. Our I/O operations began concurrently. However, as soon as one I/O operation finished and its data processing began, all further operations had to wait for the CPU workload to finish before continuing.
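The blocking effect is easy to reproduce in isolation. In this hypothetical snippet (not part of the sample application), a zero-delay timer cannot fire until a synchronous busy loop returns control to the event loop:

```javascript
// The timer is due almost immediately, but the busy-wait keeps the
// event loop occupied, so its callback only runs after ~50ms.
const events = [];
setTimeout(() => events.push('timer fired'), 0);

const start = Date.now();
while (Date.now() - start < 50) {} // synchronous work blocks the event loop
events.push('busy loop done');

// By the time the timer callback runs, the busy loop has already finished.
setTimeout(() => console.log(events.join(' -> ')), 100);
// prints "busy loop done -> timer fired"
```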
Our immediate thought might be to go ahead and put the data processing into a worker thread. There is a caveat, though: spawning workers isn't cheap. To avoid making the wrong optimization, we want to make sure that most of our execution time is in fact being spent on our CPU workload, rather than on something else. We can run node with the
--runtime-call-stats
option.
node --runtime-call-stats cpu.js
This will output a table that logs how many times each Runtime Function/C++ Builtin was executed, and how long it took to execute them. What we care about is the
JS_Execution
entry, which happens to be on the very first line*:
Runtime Function/C++ Builtin                    Time                Count
========================================================================================
JS_Execution                              12064.63ms  99.29%       184   0.71%
We can see that a significant portion of the total execution time was spent on JavaScript code. This means that our workload is highly CPU-intensive, and using worker threads would be the correct optimization decision.
Worker Threads to the Rescue
Let's optimize our code by putting our CPU workload into worker threads. We must split our code into two files. In
io.js
, we keep most of our code as-is. However, in place of where we would've run our CPU workload, we will instead call
runWorker()
, a new function we created.
runWorker()
is responsible for spawning a new
Worker
thread that executes
worker.js
in parallel, without blocking the event loop.
// io.js

const { Worker } = require('worker_threads');

/* Use a worker thread to handle CPU-intensive workloads. */
function runWorker(workerData) {
  return new Promise((resolve, reject) => {
    // Spawn worker.js as a Worker, and pass workerData to the thread.
    let worker = new Worker('./worker.js', { workerData });
    // Resolves when the Worker sends a message containing the processed data.
    worker.on('message', resolve);
    worker.on('error', reject);
    worker.on('exit', (code) => {
      if (code !== 0) {
        reject(new Error(`Worker stopped with exit code ${code}`));
      }
    });
  });
}

/* Simulate I/O-intensive workload. */
function ioWorkload(length) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      let list = Array(length).fill().map(() => Math.floor(Math.random() * length));
      resolve(list);
    }, 1000);
  });
}

/* Run I/O workload to fetch data, then process data with CPU workload. */
async function runWorkload() {
  console.log('Begin I/O workload');
  let ioStart = process.hrtime();
  let data = await ioWorkload(30000);
  let ioEnd = process.hrtime(ioStart);
  console.log(`I/O workload done in ${(ioEnd[0] + ioEnd[1] / 1000000000).toFixed(2)}s`);
  console.log('Begin CPU workload');
  let cpuStart = process.hrtime();
  await runWorker(data);
  let cpuEnd = process.hrtime(cpuStart);
  console.log(`CPU workload done in ${(cpuEnd[0] + cpuEnd[1] / 1000000000).toFixed(2)}s`);
}

function run() {
  let procStart = process.hrtime();
  let workloads = [];
  // Arbitrarily run it 3 times for demonstration.
  for (let i = 0; i < 3; i++) {
    workloads.push(runWorkload());
  }
  Promise.all(workloads).then(function() {
    let procEnd = process.hrtime(procStart);
    console.log(`All workloads done in ${(procEnd[0] + procEnd[1] / 1000000000).toFixed(2)}s total`);
  }).catch(err => console.error(err));
}

run();
Our CPU-intensive workload will be moved to
worker.js
. We will need to use the
worker_threads
API to pass in data and return the results, but we can keep our code for
cpuWorkload
the same.
// worker.js

const { workerData, parentPort } = require('worker_threads');

/* Simulate CPU-intensive workload (a bubble sort). */
function cpuWorkload(unsortedList) {
  let sortedList = unsortedList.slice();
  let swapped;
  do {
    swapped = false;
    for (let i = 0; i < sortedList.length - 1; ++i) {
      if (sortedList[i] > sortedList[i + 1]) {
        let tmp = sortedList[i];
        sortedList[i] = sortedList[i + 1];
        sortedList[i + 1] = tmp;
        swapped = true;
      }
    }
  } while (swapped);
  return sortedList;
}

function run() {
  // workerData is a clone of the data passed to this worker thread.
  let processedData = cpuWorkload(workerData);
  // parentPort allows communication with the parent that spawned the worker.
  // parentPort.postMessage() sends a message containing data back to the parent.
  parentPort.postMessage(processedData);
}

run();
To run our application, we only need to run
io.js
.
node io.js
Begin I/O workload
Begin I/O workload
Begin I/O workload
I/O workload done in 1.00s
Begin CPU workload
I/O workload done in 1.01s
Begin CPU workload
I/O workload done in 1.01s
Begin CPU workload
CPU workload done in 5.51s
CPU workload done in 5.52s
CPU workload done in 5.52s
All workloads done in 6.54s total
Our CPU workload* was able to run in parallel without blocking other operations. We can verify this by using the
--runtime-call-stats
option again.
node --runtime-call-stats io.js
We now see the Runtime Function/C++ Builtin table printed a total of four times: once for each of the three worker threads, and once for the main thread. If we look at the table for the main thread, we can see that
JS_Execution
is now a significantly smaller percentage of the total execution time*.
Runtime Function/C++ Builtin                    Time                Count
========================================================================================
API_ValueDeserializer_ReadValue               8.69ms  12.61%         6   0.02%
PreParseWithVariableResolution                8.14ms  11.82%      1312   4.56%
JS_Execution                                  7.22ms  10.48%       258   0.90%
Conclusion
Worker threads are very useful for applications that need to perform CPU-intensive workloads. In this article we saw a very simple example of how worker threads can be used to optimize an application. You may encounter more complicated scenarios in real world applications, such as the need to process I/O workloads from a
Stream
. To learn how to resolve more complex problems with worker threads, see the API documentation at
https://nodejs.org/docs/latest-v12.x/api/worker_threads.html.
* The numbers shown are intended only as visual aids to help better understand worker threads. You may not see the same numbers on your machine.