

Do you have big data sets and caching problems? Autocaching is a game changer

By Tina Chan posted 24 days ago

  

Authored by: Tina Chan and Mitesh Vasa.


Image by Malik Johnson

Now you can work with big data sets more easily in IBM Data Virtualization on Cloud Pak for Data 5.0.3 without worrying about managing your caches. A new autocaching feature automates the cache lifecycle for you, from creating caches to evicting them. Autocaching manages only the caches that it creates, not the user-defined caches that you create, so you can still manage your own caches manually without interference.

In addition, you can customize the autocaching settings, including:

  • How often autocaching runs

  • The amount of storage space that auto-generated caches should occupy

  • The type of queries in your workload that you want autocaching to analyze

  • The name of auto-generated caches

How can you use autocaching? 

Consider the following ways in which the autocaching feature in Data Virtualization can help you with your workflow:

Problem: You find that creating and tuning caches manually is too time-consuming in environments that have changing data demands.
How autocaching addresses the problem: Autocaching does the work for you. It optimizes the most important queries and keeps the caches aligned with the current needs of the system.

Problem: You experience high latency from remote data sources, which can delay decision-making processes.
How autocaching addresses the problem: Autocaching caches your most frequently used data, which reduces the need for time-consuming operations such as remote data joins and speeds up queries. Caching frequently used data is especially helpful when you query remote data sources where network latency can cause performance issues.

Problem: You experience storage constraints when you work with larger queries.
How autocaching addresses the problem: Autocaching manages cache storage through a user-defined soft upper limit. When the total size of auto-generated caches exceeds this limit, autocaching evicts low-ranked caches to bring the total back within the limit.

How does autocaching work?

Autocaching uses the cache recommendation engine, which analyzes query workloads and generates a ranked list of recommended caches based on several factors, including frequency, cardinality, and run time. During each scheduled run, autocaching then creates the top-ranked caches in batches.
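To make the ranking idea concrete, here is a minimal Python sketch of how candidates might be scored and a batch selected. It is an illustration only, not the Data Virtualization recommendation engine: the `CacheCandidate` class, the `score` formula, and the `top_batch` helper are all hypothetical, and the real weighting of frequency, cardinality, and run time is not described in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheCandidate:
    """Hypothetical record for one recommended cache."""
    name: str
    frequency: int      # times the query pattern ran in the analyzed workload
    cardinality: int    # rows the cached result would hold
    run_time_s: float   # average uncached run time, in seconds


def score(candidate: CacheCandidate) -> float:
    # Assumed formula: total time spent on the query, per cached row.
    # The actual engine may weigh these factors very differently.
    rows = max(candidate.cardinality, 1)
    return candidate.frequency * candidate.run_time_s / rows


def top_batch(candidates: list[CacheCandidate], batch_size: int) -> list[CacheCandidate]:
    """Return the highest-scoring candidates, best first."""
    ranked = sorted(candidates, key=score, reverse=True)
    return ranked[:batch_size]


candidates = [
    CacheCandidate("sales_by_region", frequency=100, cardinality=1000, run_time_s=2.0),
    CacheCandidate("audit_log_scan", frequency=5, cardinality=1000, run_time_s=1.0),
    CacheCandidate("customer_join", frequency=50, cardinality=100, run_time_s=4.0),
]
batch = top_batch(candidates, batch_size=2)
print([c.name for c in batch])
```

In this toy scoring, a frequent, slow query with a small result ranks highest because it saves the most time for the least storage.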

Autocaching also automatically evicts caches that aren’t used frequently. This eviction process prioritizes removing low-ranked or unused caches, freeing up space for critical caches that might benefit you in the future. 
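The eviction behavior, combined with the soft upper limit on storage described earlier, can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the `AutoCache` class and the `evict_to_limit` function are hypothetical names, and the sketch assumes that caches are evicted strictly from lowest rank upward until the total size fits the limit, which may not match the product's exact policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoCache:
    """Hypothetical record for one auto-generated cache."""
    name: str
    size_mb: int
    rank: int  # 1 is the highest rank; larger numbers are lower-ranked


def evict_to_limit(caches: list[AutoCache], soft_limit_mb: int):
    """Evict lowest-ranked caches until the total size fits the soft limit.

    Returns (kept, evicted). User-defined caches are assumed to be
    excluded from the input, because autocaching manages only its own.
    """
    kept = sorted(caches, key=lambda c: c.rank)  # best rank first
    evicted = []
    total = sum(c.size_mb for c in kept)
    while kept and total > soft_limit_mb:
        victim = kept.pop()  # last element is the lowest-ranked cache
        total -= victim.size_mb
        evicted.append(victim)
    return kept, evicted


caches = [
    AutoCache("customer_join", size_mb=400, rank=1),
    AutoCache("sales_by_region", size_mb=300, rank=2),
    AutoCache("audit_log_scan", size_mb=500, rank=3),
]
kept, evicted = evict_to_limit(caches, soft_limit_mb=800)
print([c.name for c in kept], [c.name for c in evicted])
```

Because the limit is soft, the total can temporarily exceed it between runs; the sketch only models the cleanup step that brings the total back under the limit.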

The following diagram shows the autocaching process flow, including how it decides when to create and evict caches, and under what conditions these decisions are made. 

Figure: The autocaching process, including when caches are created and evicted. Diagram by Malik Johnson.

To use autocaching now, update your Data Virtualization service to the latest release.
