Mastering Multiple Threads (Part 1)
A few months ago, Quebit posted a good article about using RunProcess to shorten data load times; the link is here and it is well worth a read.
While I am a big fan of RunProcess, I think it worthwhile to describe a use case before going into some specific design considerations.
Fast and Furious
I think we can agree that it is preferable to load large volumes of data outside of working hours. Users can get annoyed when they are stuck in a wait, particularly when this is for more than a few minutes.
Of course, intra-day loads are sometimes unavoidable (or even desirable) when users require refreshed data. At month-end, for example, we reload our General Ledger every 60 minutes, which allows us to complete our reporting much more quickly than would otherwise be possible. Users know when this takes place and will usually tolerate a short delay around these times.
There is also a practical limit on how much you can actually process overnight. Feeder reprocessing, for example, can take hours on large cubes and requires exclusive cube access. These kinds of jobs must have priority, so it is important to make the best use of multi-threading on other processes where possible.
For these reasons, when it comes to data loads I try to use a 'fast and furious' approach - get it done as quickly as possible.
Load File Example
Back in our on-premise version, we had a server with six cores and there was a requirement to import and process six large data files each day. These files came from different source systems and required a considerable amount of cleaning and calculation before the data could be loaded into the cube.
Here are a few of the approaches we tried:
Approach 1: ExecuteProcess
Using a traditional ExecuteProcess, each state-based process runs sequentially as a single thread through a chore or a control process.
The diagram below shows what this looks like. State file 1 is processed, then state file 2, and so on. Each file ticks along in the background, and the whole job would finish in just under 3 hours. Only about 17% of the available CPU was used (1 of 6 cores).
There's nothing wrong with this approach, except that it just drags on. It limits the amount of work that can be done in the overnight processing window. It seems so wasteful to have five cores doing nothing for most of that time.
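To put rough numbers on that waste, here is a minimal Python sketch of the sequential case. The six run times are hypothetical figures chosen only so that they add up to just under 3 hours; they are not our actual timings.

```python
def sequential_summary(durations_min, cores):
    """Elapsed time and CPU utilisation when jobs run one after another."""
    elapsed = sum(durations_min)
    # Only one core is ever busy, so busy core-minutes equal the elapsed time.
    utilisation = elapsed / (elapsed * cores)
    return elapsed, utilisation

# Hypothetical run times (minutes) for state files 1 to 6
durations = [50, 15, 40, 20, 30, 20]

elapsed, utilisation = sequential_summary(durations, cores=6)
print(f"elapsed: {elapsed} min, CPU utilisation: {utilisation:.1%}")
# elapsed: 175 min, CPU utilisation: 16.7%
```

However the individual run times are split, a single-threaded job stream on a six-core server can never use more than one sixth of the CPU.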
Approach 2: Hustle
When IBM introduced TM1RunTI back in 2018, the clever folks at Cubewise launched a small utility called Hustle that orchestrated the running of TM1RunTI. It took a little bit of work to get running, but when set up correctly, Hustle would minimise the processing time by ensuring that the maximum number of available cores were kept busy. It could also leave headroom, keeping some cores free so that users could continue working. Best of all, Hustle would know when a process had finished, so dependent processes could be scheduled to run in the same job stream.
In this case, you'll see that our processing completed in around a third of the time. Hustle would start state file 1, 2, 3, and 4 together. It then processed file 5 when core 2 became available, and file 6 when core 4 became available. This was achieved without swamping the server.
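The scheduling behaviour described here can be approximated with a small simulation. This is a sketch of the general "fixed pool of workers" idea, not Hustle's actual implementation; the run times are hypothetical, and the limit of four concurrent jobs is taken from the example above (four files started together, two cores left as headroom).

```python
import heapq

def simulate_pool(durations, max_workers):
    """Greedy scheduling: each job starts on whichever slot frees up first."""
    # Heap of (minute the slot becomes free, slot number)
    slots = [(0, slot) for slot in range(1, max_workers + 1)]
    heapq.heapify(slots)
    schedule = []
    for job, duration in enumerate(durations, start=1):
        free_at, slot = heapq.heappop(slots)
        end = free_at + duration
        schedule.append((job, slot, free_at, end))
        heapq.heappush(slots, (end, slot))
    makespan = max(end for _, _, _, end in schedule)
    return schedule, makespan

# Hypothetical run times (minutes) for state files 1 to 6
durations = [50, 15, 40, 20, 30, 20]

schedule, makespan = simulate_pool(durations, max_workers=4)
for job, slot, start, end in schedule:
    print(f"file {job}: slot {slot}, start {start:>2}, end {end:>2}")
print(f"all done after {makespan} min")
```

With these assumed figures, file 5 starts on slot 2 at minute 15 and file 6 on slot 4 at minute 20, and everything completes after 50 minutes instead of 175. Because the scheduler tracks every end time, it also knows exactly when the last job has finished, which is what makes chaining dependent processes straightforward.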
Approach 3: RunProcess
I wasn't sure that Hustle would be available to us in the IBM Cloud. So, in preparation for our cloud migration, I ended up rewriting the Hustle processes to use RunProcess instead. RunProcess is standard TM1 functionality, is a little easier to implement than Hustle, and achieves much the same kind of result, albeit slightly less efficiently.
This is what a RunProcess looks like:
You can see here that state file 1 starts processing on its own. After a predefined delay, state file 2 is then initiated. After another delay, state file 3 is initiated.
Although RunProcess does allow you to maximise the use of threads, it has two serious drawbacks.
The first drawback is that there is no easy way to know that the last process has finished. Cubewise has proposed an approach using the Synchronized() function. I have tested this and it does work, but it's still not quite as good as Hustle.
The second drawback is that there is no easy way to manage the server load. This can lead to the server becoming swamped, with users unable even to connect. The easiest way around this is to use a Sleep() command. When submitting a large number of RunProcess requests, I insert a wait period (say 20 minutes) between them to give the server a little time to recover.
If you have a large difference in the processing times, setting a long sleep time may leave large gaps between jobs. If the sleep time is too short, the server may still become swamped as jobs bunch together towards the end.
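The trade-off can be illustrated with a rough simulation of fixed-delay launches. This is only a sketch: the run times are hypothetical, and it assumes each job occupies exactly one core from start to finish, which real TM1 processes will not do so neatly.

```python
def simulate_staggered(durations, delay):
    """Start job i at i * delay minutes, regardless of what is still running."""
    starts = [i * delay for i in range(len(durations))]
    ends = [start + duration for start, duration in zip(starts, durations)]
    # Concurrency can only reach a new peak at the moment a job starts.
    peak = max(
        sum(1 for start, end in zip(starts, ends) if start <= t < end)
        for t in starts
    )
    return max(ends), peak

# Hypothetical run times (minutes) for state files 1 to 6
durations = [50, 15, 40, 20, 30, 20]

long_sleep = simulate_staggered(durations, delay=20)
short_sleep = simulate_staggered(durations, delay=5)
print(f"20 min sleep: finished at {long_sleep[0]} min, peak {long_sleep[1]} jobs")
print(f" 5 min sleep: finished at {short_sleep[0]} min, peak {short_sleep[1]} jobs")
# 20 min sleep: finished at 120 min, peak 2 jobs
#  5 min sleep: finished at 50 min, peak 5 jobs
```

With these assumed numbers, the long sleep never runs more than two jobs at once but more than doubles the elapsed time, while the short sleep finishes quickly at the cost of five jobs competing for six cores.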
This approach is not ideal, but it still gets the job done.
There are good reasons for wanting to run jobs on multiple threads. It is not always possible to get all of the heavy lifting done overnight, so the 'fast and furious' approach can be the best one.
RunProcess is good in theory, but as you'll see in Part 2, there are a couple of things that need to be considered.