Data volumes keep growing, and organizations are increasingly spread across geographies and sites, which creates a strong need for data sharing. IBM Spectrum Scale can store huge amounts of data, and its Active File Management (AFM) feature adds remote data sharing capabilities, helping to share this data effectively. AFM provides seamless data movement between Spectrum Scale clusters and sites. It can move data on demand, periodically, or continuously, which makes it extremely flexible. This also helps increase global collaboration and data availability.
AFM can be used for data distribution across sites. In this scenario, the home site stores the complete library of information. A cache or edge site pulls portions of this data and stores them locally for future use. The first access to a file is therefore a cache "miss", and the request is sent over the WAN to the home site, while later accesses are cache "hits" served locally. AFM also keeps the cache or edge site up to date: data modified at home is regularly pulled into the cache.
Cache sites can pull data either on demand, as applications access it, or ahead of time: if you know what the application will need, you can use AFM tools to “prefetch” files. An AFM prefetch pre-populates the cache with data from home. This can be done for selected files or for the entire contents, based on need. Since the data is then already available locally at the cache, applications get better performance when reading it.
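The difference between on-demand access and prefetch can be illustrated with a small toy model. This is only a sketch and not the AFM API: the `EdgeCache` class, its methods, and the file names are hypothetical, and a plain dictionary stands in for the home site.

```python
class EdgeCache:
    """Toy model of a read-only cache site (illustrative only, not AFM itself)."""

    def __init__(self, home):
        self.home = home      # dict of path -> bytes, stands in for the home site
        self.local = {}       # data currently held at the cache site
        self.hits = 0
        self.misses = 0

    def read(self, path):
        """On-demand access: fetch from home on a miss, serve locally on a hit."""
        if path in self.local:
            self.hits += 1
        else:
            self.misses += 1
            self.local[path] = self.home[path]   # simulated transfer over the WAN
        return self.local[path]

    def prefetch(self, paths):
        """Pre-populate the cache so that later reads are hits."""
        for path in paths:
            if path not in self.local:
                self.local[path] = self.home[path]


home = {"movies/a.mp4": b"A", "movies/b.mp4": b"B"}
cache = EdgeCache(home)
cache.read("movies/a.mp4")          # first access: miss, pulled from home
cache.read("movies/a.mp4")          # second access: hit, served locally
cache.prefetch(["movies/b.mp4"])    # pulled ahead of time
cache.read("movies/b.mp4")          # hit, even though it was never read before
print(cache.hits, cache.misses)     # prints: 2 1
```

In this model, the prefetched file never incurs a miss, which is the performance benefit described above.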
In the above figure, there are four cache sites (A, B, C and D), and the site in the center is the home (central or headquarters) site. All four cache sites have an AFM read-only (RO) mode relationship with home. Data is written at the home site, and the cache sites pull in this data and serve the applications reading it. Site ‘A’ prefetches the data at periodic intervals, while Site ‘B’ pulls in the data only on actual access.
A media company which distributes music and movies can use this model.
Content creation applications generate large amounts of data, such as imagery and video. Say this happens in the central studios in Los Angeles. The data is then distributed to other locations, such as New York, Chicago and San Francisco, and is served and streamed from these locations, which are closer to end users. This helps enable video on demand and pay-per-view channels. In this case, the central studio is the home site and the remote locations are the cache sites.
The amount of storage at the home and cache sites can differ; cache sites are often much smaller than the central library. In such edge caching deployments, a cache only needs to be large enough to hold the “active data set”.
For example, the home site distributing movies stores the entire movie library, but a geographically separate cache may only need to store the latest or classic movies, as those are most commonly requested by users. If a user asks for any other movie, the cache can always pull it from home on demand. It is also possible to prefetch the latest releases, say on a weekly basis, and keep them ready to serve users.
When the cache is smaller than home, you can use the AFM "Eviction" feature. Eviction means that the data blocks of files residing in the cache are removed from the local file system, while the metadata of these files is retained at the cache. You can therefore see the entire movie listing at the cache without all of the movies actually being stored locally. As data grows at the cache site, data that is not being actively accessed can be evicted from the cache site.
Eviction can be done automatically or manually:
– Automatic Eviction: The automatic eviction is based on fileset quotas.
– Manual Eviction: This can be done for specific files selected by an Information Lifecycle Management (ILM) policy. This adds flexibility in specifying which files should not take up disk space. For example, least recently used (LRU) files or files larger than a particular size can be evicted.
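The two eviction styles can be sketched with a toy model. This is a simplified illustration under stated assumptions, not how AFM is implemented: the `EvictingCache` class and its methods are hypothetical, a single byte limit stands in for the fileset quota, and a Python predicate stands in for an ILM policy rule. The point it shows is that eviction drops file data while the metadata listing stays complete.

```python
from collections import OrderedDict


class EvictingCache:
    """Toy cache that evicts data blocks but keeps metadata (illustrative only)."""

    def __init__(self, home, quota_bytes):
        self.home = home                    # dict of path -> bytes (home site)
        self.quota = quota_bytes            # stand-in for a fileset quota
        # Metadata for every file stays visible at the cache.
        self.metadata = {path: len(data) for path, data in home.items()}
        self.data = OrderedDict()           # cached data, least recently used first

    def used(self):
        return sum(len(blob) for blob in self.data.values())

    def read(self, path):
        if path in self.data:
            self.data.move_to_end(path)     # mark as most recently used
        else:
            self.data[path] = self.home[path]
            self._auto_evict()
        return self.data[path]

    def _auto_evict(self):
        """Automatic eviction: drop LRU data until usage fits the quota."""
        while self.used() > self.quota and len(self.data) > 1:
            self.data.popitem(last=False)   # remove the least recently used entry

    def evict(self, predicate):
        """Manual eviction: drop data for files matching a policy-like rule."""
        victims = [p for p in self.data if predicate(p, self.metadata[p])]
        for path in victims:
            del self.data[path]
        return victims


home = {"a": b"x" * 4, "b": b"y" * 4, "c": b"z" * 4}
cache = EvictingCache(home, quota_bytes=8)
for name in ["a", "b", "a", "c"]:
    cache.read(name)                        # reading "c" pushes usage over quota
print(sorted(cache.data))                   # prints: ['a', 'c']
print(len(cache.metadata))                  # prints: 3
```

Here "b" is the least recently used file when the quota is exceeded, so its data is dropped, yet all three files remain in the listing. A size-based manual rule could be expressed as `cache.evict(lambda path, size: size >= 4)`.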
Since all the data resides at the home in this model, the backup can be handled at the home site. This can make backups more cost-effective and easier to manage.
Please see the link below for more details:
https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/b1lins_quickreference_afm.htm