1. Optimized workloads by Application Developers
Poorly written queries are the crux of most of the performance problems associated with workloads. There is no escape from designing schema, considering database architecture and writing optimal queries. Nowadays, there is a lot of tooling available around developing optimized workloads.
2. Workload Management using database features
Popular databases like Netezza have time tested workload management, which can help to set up scheduler rules, guaranteed resource allocation for groups, short query bias and prioritized query execution to extract MAXIMUM performance during highly concurrent environment with existing resources. For example, adhoc user running uncontrolled query can give impression that the system is starving for resources and create request for scale up. A better approach would be to cap all the adhoc users under (say) 10% resource group allocation to avoid uncontrolled use of the system.
3. Identify exact performance problems and fix them
Performance issues associated with popular MPP data warehouse products are mostly around data skew, processing skew and network/compute bottlenecks. And generally, bottlenecks shift from one place to other if you solve the first bottleneck. With advances in database engine (especially in optimizer and planners), there are many tweaks available at different level (data ingestion, SQL, engine, workload management, resource allocation) to improve workload performance. You can start looking into historical workload data (query timings, resource allocation, plan files) and find bottlenecks. Once you find bottlenecks, it will be easy to come up avenues to resolve those bottlenecks, which help improve performance.
4. Ethical Scaling
Scaling data warehouse will help throw more resources, which will cater to increased resource demand from workloads. You will end up without of control costs and unhappy CFO if you don’t set appropriate scaling policies. Popular databases provide rudimentary features like starting a cluster immediately after the first query arrives in queue or only after the system estimates there is enough query load to keep the cluster busy. As I mentioned earlier, most of the database companies lack motivation for coming up with solid features which will avoid cost overruns due to scaling.
What the industry needs is workload informed autoscale, based on AI. While, industry awaits such a feature, come up with your own policies around scaling may be using following –
1. Do we need to increase capacity by a factor of 2X? Will it improve performance of workload with same (2X) factor?
2. How we can scale up and scale down in time bound fashion, so that cost-predictability is achieved?
3. Do we need autoscale (especially starting cluster immediately after query is queued)? Wouldn’t some system resources wasted?
IF the underlying platform provides “Ethical Scaling” THEN databases like Netezza with time tested workload management capabilities can provide the best overall utilization of the scaled/provided resources even if those resources are constrained. In short, SCALE ethically…
Scaling is a great feature if customers design workload, concurrency and scaling policies considering price/performance and use case driven approach. And don’t let themselves run into marketing gimmicks, resulting cost overruns.