When a file is uploaded to Global Mailbox, the system uses the file size to determine where to store the payload associated with the file.
- If the file is larger than 10 KB, the payload is stored on the local shared disk and is replicated using the Aspera FASP protocol.
- If the file is smaller than 10 KB, the payload is encrypted using a dynamically generated encryption key and stored in a table in the Cassandra database. Cassandra then replicates this table to the other data centers. Payloads stored this way are called inline payloads.
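The size-based routing rule above can be sketched as a simple check. This is a minimal illustration, not Global Mailbox code: the class `PayloadRouter` and the `StorageTarget` values are hypothetical names, "10 KB" is assumed to mean 10 × 1024 bytes, and a payload of exactly 10 KB is assumed to go to the shared disk, because the rule above does not say how the boundary is handled.

```java
final class PayloadRouter {

    /** Hypothetical storage targets, named for illustration only. */
    enum StorageTarget {
        SHARED_DISK_ASPERA, // larger payload: local shared disk, replicated with Aspera FASP
        CASSANDRA_INLINE    // smaller payload: encrypted and stored inline in Cassandra
    }

    // Assumption: "10 KB" is taken to mean 10 * 1024 bytes.
    static final long INLINE_THRESHOLD_BYTES = 10L * 1024;

    private PayloadRouter() {}

    /**
     * Chooses a storage target from the payload size.
     * Assumption: a payload of exactly the threshold size goes to the shared disk.
     */
    static StorageTarget route(long payloadSizeBytes) {
        if (payloadSizeBytes < 0) {
            throw new IllegalArgumentException("payload size must be non-negative");
        }
        return payloadSizeBytes < INLINE_THRESHOLD_BYTES
                ? StorageTarget.CASSANDRA_INLINE
                : StorageTarget.SHARED_DISK_ASPERA;
    }

    public static void main(String[] args) {
        System.out.println(route(2_048));     // CASSANDRA_INLINE
        System.out.println(route(1_048_576)); // SHARED_DISK_ASPERA
    }
}
```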
This blog post discusses how to tune the system to increase the throughput of small files, focusing specifically on the encryption phase. As mentioned, each small file is encrypted before it is stored in Cassandra, and the system uses a pool of threads to pre-generate encryption keys to improve performance.
If you receive many small files that are stored in Cassandra, you can tune this thread pool to increase their throughput.
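The idea of pre-generating keys with a pool of threads can be sketched with standard Java classes: worker threads keep a bounded queue filled with keys, and the upload path takes a ready-made key instead of generating one on demand. This is a hedged illustration of the general technique, not the Global Mailbox implementation: the class `KeyPregenerationPool` is hypothetical, and AES with a 128-bit key is an assumption, since the post does not name the algorithm or key size.

```java
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

/** Hypothetical pool that pre-generates encryption keys on background threads. */
final class KeyPregenerationPool implements AutoCloseable {
    private final BlockingQueue<SecretKey> keys;
    private final ExecutorService workers;

    KeyPregenerationPool(int threads, int capacity) {
        keys = new ArrayBlockingQueue<>(capacity); // total keys held across all threads
        workers = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            workers.execute(this::fillForever);
        }
    }

    // Each worker generates keys and blocks in put() once the queue is full.
    private void fillForever() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128); // assumption: AES-128
            while (!Thread.currentThread().isInterrupted()) {
                keys.put(generator.generateKey());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // close() interrupts workers; exit quietly
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Upload path: take a pre-generated key, waiting only if none is ready. */
    SecretKey nextKey() throws InterruptedException {
        return keys.take();
    }

    @Override
    public void close() {
        workers.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        // 4 threads and 100 keys mirror the pre-6.2 defaults described below.
        try (KeyPregenerationPool pool = new KeyPregenerationPool(4, 100)) {
            SecretKey key = pool.nextKey();
            System.out.println(key.getAlgorithm() + " " + key.getEncoded().length * 8 + "-bit"); // AES 128-bit
        }
    }
}
```

The bounded queue provides back-pressure: once it holds `capacity` keys, the workers block until uploads consume some, so memory use stays proportional to the configured key count.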
The following parameters can be used to tune the pre-generation of the encryption keys used during the upload of small files. Add these parameters to the global.properties file in each data center.
The total number of threads that can pre-generate encryption keys for small payloads. With more threads, more keys can be pre-generated concurrently. If this property is not specified in the configuration files, the default value is 4. The maximum value is 32.
The total number of pre-generated encryption keys across all threads. With more pre-generated keys, more uploads can proceed without waiting for a key to be generated, so more files can be uploaded faster.
Prior to version 6.2, the default value is 100 and the maximum value is 1000. In version 6.2 and later, the default value is 2000 and there is no maximum value other than the limit imposed by available memory.
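The documented defaults and limits for these two settings can be summarized in a small sketch. It is illustrative only: the class `KeyPregenerationSettings` is hypothetical, the configured values are passed in as `OptionalInt` arguments instead of being read from global.properties by name, and rejecting out-of-range values (including the minimum of 1) is an assumption about how invalid values should be handled.

```java
import java.util.OptionalInt;

/** Hypothetical helper that resolves effective key pre-generation settings. */
final class KeyPregenerationSettings {
    // Defaults and limits as described for the two properties.
    static final int DEFAULT_THREADS = 4;
    static final int MAX_THREADS = 32;
    static final int DEFAULT_KEYS_BEFORE_6_2 = 100;
    static final int MAX_KEYS_BEFORE_6_2 = 1000;
    static final int DEFAULT_KEYS_6_2_AND_LATER = 2000;

    private KeyPregenerationSettings() {}

    /** Effective thread count; empty means "not set in the configuration files". */
    static int threads(OptionalInt configured) {
        int value = configured.orElse(DEFAULT_THREADS);
        // Assumption: out-of-range values are rejected; the minimum of 1 is also assumed.
        if (value < 1 || value > MAX_THREADS) {
            throw new IllegalArgumentException(
                    "thread count must be between 1 and " + MAX_THREADS + ", got " + value);
        }
        return value;
    }

    /** Effective number of pre-generated keys across all threads. */
    static int pregeneratedKeys(OptionalInt configured, boolean version62OrLater) {
        int value = configured.orElse(
                version62OrLater ? DEFAULT_KEYS_6_2_AND_LATER : DEFAULT_KEYS_BEFORE_6_2);
        if (value < 1) {
            throw new IllegalArgumentException("key count must be positive, got " + value);
        }
        // Before 6.2 the maximum is 1000; from 6.2 on, only available memory limits it.
        if (!version62OrLater && value > MAX_KEYS_BEFORE_6_2) {
            throw new IllegalArgumentException(
                    "key count must be at most " + MAX_KEYS_BEFORE_6_2 + " before 6.2, got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(threads(OptionalInt.empty()));                 // 4
        System.out.println(pregeneratedKeys(OptionalInt.empty(), false)); // 100
        System.out.println(pregeneratedKeys(OptionalInt.empty(), true));  // 2000
    }
}
```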
Results from IBM testing on 6.1.x
IBM tests Global Mailbox performance to measure the throughput of uploads and processing, using a mixed load of various file sizes.
Our tests used the following load:
- 80% of files smaller than 10 KB
- 20% of files larger than 10 KB
With the default values for the two properties above, the throughput in our test runs on our hardware was approximately 140 files per second.
After tuning these two parameters, the throughput increased to 175 files per second, an increase of approximately 25%.
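The percentage follows directly from the two measured rates; because the baseline of 140 files per second is itself approximate, so is the result:

```latex
\frac{175 - 140}{140} = \frac{35}{140} = 0.25 = 25\%
```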
Keep the following considerations in mind when tuning these parameters:
- More threads consume more memory
- The JVM cannot run an unlimited number of threads; setting the thread count too high can hurt performance
- More pre-generated keys mean that more heap memory is used to store the keys
- Your workload may differ, so you may not see the same results
- Always test in a pre-production environment with your workload to see the impact of such changes
If you receive many small files, you can use these tuning parameters to increase their throughput in your system. Always try new values in a pre-production environment to understand their effect before applying them in production.