An application is said to have a scalability problem when any of the following applies:
The application cannot do more work even though plenty of physical resources are available. For example, the application has enough CPU, RAM, fast disks, and network bandwidth, yet its throughput stays constant or degrades as more work is pushed into it. As the load increases, resource usage shows that nothing is exhausted: CPU usage is not at 100%, but system/kernel CPU usage increases while user/application CPU usage decreases or remains constant. As another example, RAM usage may increase with the load while CPU usage decreases or remains constant. A small code sketch illustrating this symptom appears after this list.
The application consumes more physical resources than expected to do more work. For example, if 2 CPUs are needed to achieve a throughput of 10 transactions per second, we expect that doubling the number of CPUs should double the throughput. In practice, we may find that the application needs more than double the number of CPUs to double the throughput, and the more CPUs it needs to achieve a given increase in throughput, the worse its scalability. A small calculation of this scaling efficiency appears after this list.
The number of errors the application throws increases dramatically as more work is pushed into it. It is not unusual to see many timeout errors, for example, as the load grows. In WebSphere Application Server, there are many timeout settings that can be relevant to a particular application. Connection pools alone have several timeout parameters; one that is often relevant is the connection timeout, which is how long a thread waits for an available connection before giving up. The thread needs a connection to some endpoint system to continue processing; for example, it may need a database connection to execute one or more CRUD operations. If the thread cannot get a connection, it times out and the transaction it is processing is considered a failure. Depending on the application, the transaction may be retried, and retries typically decrease throughput because a retried transaction takes more time to finish: the more retries, the lower the throughput. Timeouts can also give a false sense of improvement, because the shorter the timeout, the more work the application falsely appears to be performing. That is one reason why you should track all transactions to see what happened to them; a minimal tracking sketch appears after this list.
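To make the first symptom concrete, here is a minimal, hypothetical Java sketch (the class and its workload are invented for illustration, not taken from any real application) in which adding threads adds almost no throughput because every thread serializes on a single lock. The extra threads spend their time waiting on the lock rather than doing application work, which is one way an application can stall while the machine still has plenty of idle CPU.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

public class ContentionDemo {
    private static final Object LOCK = new Object();
    private static final AtomicLong completed = new AtomicLong();
    private static volatile long sink; // keeps the JIT from removing the loop

    // Simulates a transaction whose cost is dominated by one critical section.
    static void transaction() {
        synchronized (LOCK) {            // every thread serializes here
            long x = 0;
            for (int i = 0; i < 100_000; i++) {
                x += i;                  // CPU-bound work done while holding the lock
            }
            sink = x;
        }
        completed.incrementAndGet();
    }

    // Runs the workload with the given number of threads for five seconds
    // and returns how many transactions completed.
    static long measure(int threads) throws InterruptedException {
        completed.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long end = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                while (System.nanoTime() < end) {
                    transaction();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // On a machine with 8 or more cores, the two numbers come out roughly
        // the same: the extra threads mostly wait on the lock instead of working.
        System.out.println("2 threads: " + measure(2) + " transactions in 5s");
        System.out.println("8 threads: " + measure(8) + " transactions in 5s");
    }
}
```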
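The second symptom can be put into numbers. One simple way to do this (my own convention for illustration, not a standard metric) is to divide the observed throughput gain by the increase in resources; a value of 1.0 means perfect linear scaling, and anything lower means the application is paying for more hardware than the extra work justifies:

```java
public class ScalingEfficiency {

    /**
     * Scaling efficiency of a test run: the ratio of the observed throughput
     * gain to the resource (e.g., CPU) increase. 1.0 means perfect linear
     * scaling; lower values mean worse scalability.
     */
    static double efficiency(double baseCpus, double baseTps,
                             double newCpus, double newTps) {
        double speedup = newTps / baseTps;          // observed throughput gain
        double resourceRatio = newCpus / baseCpus;  // how much hardware was added
        return speedup / resourceRatio;
    }

    public static void main(String[] args) {
        // Using the numbers from the text: 2 CPUs give 10 transactions per second.
        // Perfect scaling: 4 CPUs give 20 tps -> efficiency 1.0
        System.out.println(efficiency(2, 10, 4, 20));
        // Poor scaling: it took 6 CPUs to reach 20 tps -> efficiency ~0.67
        System.out.println(efficiency(2, 10, 6, 20));
    }
}
```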
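For the third symptom, the way to avoid being fooled by short timeouts is to account for every transaction. The sketch below is a hypothetical tracker (its names and structure are mine, not a WebSphere API): it separates first-attempt successes, successes after retries, timeouts, and outright failures, so that useful throughput can be reported instead of a request rate inflated by transactions that timed out quickly.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Minimal, illustrative tracker for transaction outcomes. Only transactions
 * that eventually succeed count toward useful throughput; timeouts and
 * retries are reported separately so they are not mistaken for completed work.
 */
public class TransactionTracker {

    private final AtomicLong succeededFirstTry = new AtomicLong();
    private final AtomicLong succeededAfterRetry = new AtomicLong();
    private final AtomicLong timedOut = new AtomicLong();
    private final AtomicLong failed = new AtomicLong();

    public void recordSuccess(int attempts) {
        if (attempts <= 1) {
            succeededFirstTry.incrementAndGet();
        } else {
            // Retries cost extra time, so they also lower throughput.
            succeededAfterRetry.incrementAndGet();
        }
    }

    public void recordTimeout() { timedOut.incrementAndGet(); }

    public void recordFailure() { failed.incrementAndGet(); }

    /** Useful throughput over a measurement interval, in transactions per second. */
    public double usefulTps(double intervalSeconds) {
        return (succeededFirstTry.get() + succeededAfterRetry.get()) / intervalSeconds;
    }

    @Override
    public String toString() {
        return String.format("ok=%d retried-ok=%d timeouts=%d failures=%d",
                succeededFirstTry.get(), succeededAfterRetry.get(),
                timedOut.get(), failed.get());
    }
}
```

If the timeout and retry counts grow faster than the successes as load increases, the application is showing this third symptom even if the raw request rate still looks healthy.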
These are symptoms I have experienced in my own performance optimization work; there may well be others. Having said that, I will continue talking about scalability issues in the next post.