Authored by: Tina Chan and Mitesh Vasa.
Image by Malik Johnson
Now you can more easily work with big data sets in IBM Data Virtualization on Cloud Pak for Data 5.0.3 without having to manage caches yourself. A new autocaching feature automates the entire cache lifecycle, from creating caches to evicting them. Autocaching manages only the caches that it creates, not the user-defined caches that you create, so you can still manage your own caches manually without interference.
In addition, you can customize autocaching settings, including:
- How often autocaching runs
- The amount of storage space that auto-generated caches should occupy
- The type of queries in your workload that you want autocaching to analyze
- The name of auto-generated caches
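To make the four customizable settings listed above concrete, here is a minimal sketch that models them as a small Python dataclass. This is not the Data Virtualization configuration format or API: the class `AutocacheSettings`, its field names, the default values, and the helper `make_cache_name` are all hypothetical and exist only to illustrate what each setting controls.

```python
from dataclasses import dataclass


@dataclass
class AutocacheSettings:
    """Hypothetical model of the four customizable autocaching settings.

    The field names are illustrative only; they are not actual
    Data Virtualization configuration keys.
    """
    run_interval_minutes: int = 60      # how often autocaching runs
    storage_limit_mb: int = 10_240      # soft upper limit for auto-generated caches
    query_filter: str = "all"           # placeholder for which workload queries to analyze
    cache_name_prefix: str = "AUTO_"    # prefix used when naming auto-generated caches

    def __post_init__(self) -> None:
        # Reject values that could never produce a sensible schedule or limit.
        if self.run_interval_minutes <= 0:
            raise ValueError("run_interval_minutes must be positive")
        if self.storage_limit_mb <= 0:
            raise ValueError("storage_limit_mb must be positive")


def make_cache_name(settings: AutocacheSettings, seq: int) -> str:
    """Build a name for the seq-th auto-generated cache from the configured prefix."""
    return f"{settings.cache_name_prefix}{seq:04d}"
```

For example, with `AutocacheSettings(cache_name_prefix="DV_AUTO_")`, `make_cache_name(settings, 7)` returns `"DV_AUTO_0007"`. A distinctive prefix is one plausible way to keep auto-generated caches easy to tell apart from the ones you create yourself.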
How can you use autocaching?
Consider the following ways in which the autocaching feature in Data Virtualization can help you with your workflow:
| Problem | How autocaching addresses the problem |
|---|---|
| You find that creating and tuning caches manually is too time-consuming in environments where data demands keep changing. | Autocaching does this work for you. It keeps the most important queries optimized so that the caches always reflect the current needs of the system. |
| You experience high latency from remote data sources, which can delay decision-making processes. | Autocaching caches your most frequently used data, which reduces the need for time-consuming operations such as remote data joins and speeds up queries. This is especially helpful when you query remote data sources where network latency can cause performance issues. |
| You run into storage constraints when you work with larger queries. | Autocaching manages cache storage through a user-defined soft upper limit. When the total size of auto-generated caches exceeds this limit, autocaching evicts low-ranked caches to keep the system within its storage limits. |
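The storage behavior in the last row can be illustrated with a short sketch: once the combined size of auto-generated caches exceeds the soft limit, evict the lowest-ranked auto-generated caches until the total falls back under it, and never touch user-defined caches. The `Cache` type, its `rank_score` field, and the function `enforce_soft_limit` are hypothetical; the product's actual ranking and eviction logic is not published here.

```python
from dataclasses import dataclass


@dataclass
class Cache:
    name: str
    size_mb: int
    rank_score: float       # hypothetical score; higher means more valuable
    auto_generated: bool    # False for user-defined caches


def enforce_soft_limit(caches: list[Cache], limit_mb: int) -> list[Cache]:
    """Return the auto-generated caches to evict so that the total size of
    auto-generated caches drops to or below limit_mb.

    User-defined caches are excluded from both the size total and eviction.
    """
    auto = [c for c in caches if c.auto_generated]
    total = sum(c.size_mb for c in auto)
    evicted: list[Cache] = []
    # Walk from the lowest-ranked cache upward until the total fits the limit.
    for cache in sorted(auto, key=lambda c: c.rank_score):
        if total <= limit_mb:
            break
        evicted.append(cache)
        total -= cache.size_mb
    return evicted
```

Evicting in ascending rank order means the caches with the most estimated value are the last to go, and because user-defined caches are filtered out first, they never count against or fall victim to the limit.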
How does autocaching work?
Autocaching uses the cache recommendation engine, which analyzes query workloads and generates a list of recommended caches based on several factors, including frequency, cardinality, and run time. During each scheduled run, autocaching then creates the top-ranked caches in batches.
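The rank-then-batch step can be sketched as follows. The scoring formula here is purely an assumption for illustration: it rewards recommendations that serve many slow queries and discounts very large results by the logarithm of their cardinality. The real recommendation engine may weigh frequency, cardinality, and run time quite differently. `Recommendation`, `score`, `rank`, and `batches` are hypothetical names.

```python
from dataclasses import dataclass
from math import log1p


@dataclass
class Recommendation:
    table: str
    frequency: int      # how many analyzed queries would use this cache
    cardinality: int    # estimated row count of the cached result
    runtime_s: float    # average run time of the queries the cache could speed up


def score(r: Recommendation) -> float:
    """Hypothetical score: favor frequent, slow queries; discount large results."""
    return r.frequency * r.runtime_s / log1p(max(r.cardinality, 1))


def rank(recs: list[Recommendation]) -> list[Recommendation]:
    """Order recommendations from highest to lowest score."""
    return sorted(recs, key=score, reverse=True)


def batches(recs: list[Recommendation], top_n: int,
            batch_size: int) -> list[list[Recommendation]]:
    """Keep the top_n ranked recommendations and split them into creation batches."""
    top = rank(recs)[:top_n]
    return [top[i:i + batch_size] for i in range(0, len(top), batch_size)]
```

Creating caches in batches rather than all at once is one way to bound the load that a single scheduled run places on the system.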
Autocaching also automatically evicts caches that aren’t used frequently. This eviction process prioritizes removing low-ranked or unused caches, freeing up space for critical caches that might benefit you in the future.
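The usage-based eviction priority described above (unused caches first, then low-ranked ones) might look like this in a minimal sketch. It assumes a hypothetical per-cache hit counter and a hypothetical `min_hits` threshold for "not used frequently"; neither is a documented Data Virtualization setting, and `AutoCache` and `eviction_candidates` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class AutoCache:
    name: str
    hits_since_last_run: int   # hypothetical usage counter
    rank_score: float          # current recommendation score; higher is better


def eviction_candidates(caches: list[AutoCache], min_hits: int) -> list[str]:
    """Return names of infrequently used caches in eviction order:
    caches with zero hits first, then lower-ranked before higher-ranked."""
    infrequent = [c for c in caches if c.hits_since_last_run < min_hits]
    # False sorts before True, so unused caches (0 hits) come first;
    # ties are broken by ascending rank score.
    infrequent.sort(key=lambda c: (c.hits_since_last_run > 0, c.rank_score))
    return [c.name for c in infrequent]
```

For instance, with `min_hits=3`, a cache with 10 hits is never a candidate, while an unused cache is evicted ahead of one that saw a single hit, even if the unused cache has a higher rank score.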
The following diagram shows the autocaching process flow, including how it decides when to create and evict caches, and under what conditions these decisions are made.
Diagram by Malik Johnson
To use autocaching now, update your Data Virtualization service to the latest release.