In the last post, I addressed software testing as one area to consider before a software solution can be called production-ready. In this post, I will address other areas that should be covered before the solution goes to production.
A. Software Supportability
If a software solution cannot be supported, or is hard to support, then that solution is not production-ready. I will now address the areas that will make a software solution supportable.
1. Logging
The software solution should log key events as it executes, whether or not problems are occurring. It should also support logging levels so that more detail can be switched on while troubleshooting a problem. Logging frameworks that support such levels are readily available. In Java, for example, the Java Logging API (java.util.logging) defines levels such as SEVERE, WARNING, INFO, and FINE, and Log4j defines levels such as FATAL, ERROR, WARN, INFO, and DEBUG.
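As a rough illustration of the idea, here is a minimal sketch using the Java Logging API. The class name OrderServiceLogging, the logger name com.example.orders, and the processOrder method are hypothetical and exist only for this example. Key events are logged at INFO, while extra detail is logged at FINE and only appears when the level is lowered.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical service used only to illustrate logging levels.
class OrderServiceLogging {
    // The logger name is an assumption for this sketch.
    private static final Logger LOG = Logger.getLogger("com.example.orders");

    // Applies one verbosity level to the logger and to a dedicated console handler.
    static void configure(Level level) {
        LOG.setUseParentHandlers(false);          // avoid duplicate output via the root logger
        for (Handler h : LOG.getHandlers()) {
            LOG.removeHandler(h);                 // drop handlers from earlier configure() calls
        }
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(level);
        LOG.addHandler(handler);
        LOG.setLevel(level);
    }

    // Logs key events at INFO and troubleshooting detail at FINE.
    static boolean processOrder(String orderId) {
        LOG.info("Order received: " + orderId);
        LOG.fine(() -> "Validating order " + orderId);   // built only if FINE is enabled
        if (orderId == null || orderId.isEmpty()) {
            LOG.warning("Rejected order with empty id");
            return false;
        }
        LOG.info("Order processed: " + orderId);
        return true;
    }

    // True when troubleshooting detail (FINE) would be written.
    static boolean isDetailEnabled() {
        return LOG.isLoggable(Level.FINE);
    }

    public static void main(String[] args) {
        configure(Level.INFO);    // normal operation: key events only
        processOrder("A-1001");
        configure(Level.FINE);    // troubleshooting: key events plus detail
        processOrder("A-1002");
    }
}
```

With the level at INFO, the "Validating order" line is suppressed. After switching to FINE, it is written as well. In a real deployment the level would more likely come from a configuration file than from code, so that it can be changed without a rebuild.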
2. Monitoring
As a software solution executes, it consumes resources at various layers. For example, the solution consumes physical resources, namely CPU time, memory (RAM), disk I/O, and network I/O. The solution may also consume resources in middleware layers, such as a certain number of threads in a thread pool or connections in a connection pool.
During performance and resiliency testing, the solution uses a measurable amount of each resource. The resource types and their usage levels should be documented for each software solution during this testing. Then, when the solution is deployed in a production environment, the specific resources it requires are already identified and can be monitored.
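To make the idea of documenting resource usage more concrete, here is a minimal sketch that samples a few JVM-level metrics through the standard java.lang.management API. The class name ResourceSnapshot and the metric names are assumptions made for this example. A real solution would typically export such values to a monitoring tool rather than print them.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that records a point-in-time view of resource usage.
class ResourceSnapshot {

    // Returns metric name -> value; the metric names are illustrative only.
    static Map<String, Double> sample() {
        Map<String, Double> metrics = new LinkedHashMap<>();

        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        metrics.put("heap.used.mb", heap.getUsed() / (1024.0 * 1024.0));

        int liveThreads = ManagementFactory.getThreadMXBean().getThreadCount();
        metrics.put("threads.live", (double) liveThreads);

        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        metrics.put("cpu.available", (double) os.getAvailableProcessors());
        // The load average is negative on platforms where it is not available.
        metrics.put("system.load.avg", os.getSystemLoadAverage());

        return metrics;
    }

    public static void main(String[] args) {
        // Print one snapshot; during a performance test this would be sampled repeatedly.
        sample().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Sampling like this at intervals during a performance test gives the baseline figures that can later be documented and monitored in production.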
3. Alerting
Once the resources required by each software solution are identified, the corresponding resource thresholds can be readily established. If a resource metric value surpasses its threshold, an alert is triggered so that someone looks into the event.
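The threshold check itself can be quite simple. The following is a minimal sketch under stated assumptions: the class name ThresholdAlerts, the metric names, and the threshold values are all made up for illustration, and the alert is just a text message rather than a page or an email.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical threshold checker; metric names and limits are illustrative.
class ThresholdAlerts {

    // Returns one alert message for each metric whose value exceeds its threshold.
    static List<String> evaluate(Map<String, Double> metrics, Map<String, Double> thresholds) {
        List<String> alerts = new ArrayList<>();
        for (Map.Entry<String, Double> limit : thresholds.entrySet()) {
            Double value = metrics.get(limit.getKey());
            // Metrics without a reading are skipped rather than treated as a breach.
            if (value != null && value > limit.getValue()) {
                alerts.add(String.format(Locale.ROOT, "ALERT %s=%.1f exceeds threshold %.1f",
                        limit.getKey(), value, limit.getValue()));
            }
        }
        return alerts;
    }

    public static void main(String[] args) {
        // Thresholds would normally be derived from performance test results.
        Map<String, Double> thresholds = new LinkedHashMap<>();
        thresholds.put("db.connections.active", 40.0);
        thresholds.put("threads.live", 50.0);

        // Current readings, hard-coded here for the sake of the example.
        Map<String, Double> metrics = new LinkedHashMap<>();
        metrics.put("db.connections.active", 48.0);
        metrics.put("threads.live", 12.0);

        evaluate(metrics, thresholds).forEach(System.out::println);
        // prints: ALERT db.connections.active=48.0 exceeds threshold 40.0
    }
}
```

In practice this comparison is usually delegated to a monitoring product, but the principle is the same: a documented threshold per resource, and a notification when the measured value goes beyond it.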
4. Documentation
Once an alert is raised for an event, the right expertise is needed to look into it. The event may indicate a problem in the software solution itself. In that case, a troubleshooting guide can be essential even if the software developer is involved in troubleshooting, because the developer may have moved on to other tasks and forgotten many details of the solution at hand. Even if the developer still remembers what was done, the developer may not be available, and someone else must look into the problem.
The troubleshooting guide is only one part of the documentation, however. The solution documentation should cover other aspects as well. Here is a list of documents that can be essential for supporting the solution:
- Solution architecture overview
- Standard operating procedures
- Backup & recovery
- Housekeeping
- Disaster recovery
In the next post, I will address other software solution production readiness areas of interest.