Since IBM Streams v4.2, Streams applications are optimized automatically by the automatic fusion and threading features. However, if you would like to optimize your applications manually, this post is a guide to doing so. It includes a link to a presentation on how to manually optimize Streams applications. The presentation should also interest anyone who wants a better understanding of how the SPL runtime works internally. The attached slides are meant to be presented, but they were written with enough context to be understandable on their own. There have been several requests to make them generally available.
The slides present and explain the following performance lessons:
Compile with -a, the sc compiler option for optimized code generation.
Fuse operators into the same PE to reduce communication costs.
Insert threaded ports into PEs to increase throughput through pipeline parallelism.
Prefer threaded ports over PEs to obtain pipeline parallelism.
Use multiple PEs in an application to take advantage of multiple hosts.
Use one PE per host.
If there are two PEs on the same host, they should probably be fused into one PE. Insert threaded ports to regain parallelism.
Improve the performance of bottlenecks to improve the throughput of an application.
Trying to improve the performance of an application without knowing which part of it is the bottleneck is a waste of time.
When a parallel region is no longer the bottleneck, further parallelism will not help.
Know your hardware. Distribute PEs to hosts so as to avoid over-subscribing any resource (cores, memory, disk, etc.) on that host.
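The lessons about bottlenecks can be illustrated with a small back-of-the-envelope model. This is only a sketch and is not taken from the slides: the stage names and rates are hypothetical, and it assumes that a parallel region scales linearly with its width and that communication costs are zero, which real applications will not match exactly.

```python
def pipeline_throughput(stage_rates):
    """Sustained throughput of a pipeline is bounded by its slowest stage."""
    return min(stage_rates.values())


def bottleneck(stage_rates):
    """Name of the slowest stage, i.e. the one worth optimizing."""
    return min(stage_rates, key=stage_rates.get)


def with_parallel_width(stage_rates, stage, width):
    """Copy of the rates with one stage replicated `width` times.

    Assumption: ideal linear scaling, which is optimistic in practice.
    """
    scaled = dict(stage_rates)
    scaled[stage] = stage_rates[stage] * width
    return scaled


# Hypothetical per-stage rates in tuples per second.
rates = {"parse": 50_000, "enrich": 10_000, "sink": 25_000}

for width in (1, 2, 3, 4, 8):
    scaled = with_parallel_width(rates, "enrich", width)
    print(width, bottleneck(scaled), pipeline_throughput(scaled))
```

With these made-up numbers, throughput rises from 10,000 to 20,000 tuples per second at width 2, then stays at 25,000 from width 3 onward, because the sink has become the bottleneck and widening the enrich region further no longer helps.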