Leveraging Storage Scale filehist's age histogram for Optimising Data Archiving Decisions

By KEDAR KARMARKAR posted Wed May 29, 2024 06:06 AM

  

Authored by @KEDAR KARMARKAR and @Bipali Gade

In this blog, we explain what Filehist does, how it has been enhanced, and why the enhancement matters when crafting policies to optimise the data archiving process. 

Getting Started with Filehist 

Filehist is a tool that presents a detailed histogram analysis of the files within a given filesystem, focusing primarily on file sizes and the cumulative percentages of both file count and space usage. 

Filehist ships as a sample tool with the Storage Scale installation. 

The analysis report is divided into two parts: 

  1. Histogram of Files <= One 256K Block in Size 

  2. Histogram of Files with N 256K Blocks (plus end fragment) 

It additionally showcases the output of: 

  • df -k command, which displays disk space usage for the file systems. 

  • mmlsfs command, which lists the attributes of a file system. 

  • tslsdisk command, which reports filesystem disk information and status. 

  • A filesystem space utilization summary for files, non-zero-size files, and directories. 

 

This data is essential for understanding the current allocation, file size distribution and usage of space within the filesystem. 

Use Case 

 

Frequently, customers seek to archive data, whether to reduce costs or to liberate faster storage resources. Data archiving is guided by predefined policies. Crafting these policies for data archiving involves establishing criteria to identify data eligible for archiving. However, the absence of adequate tools to identify the data often complicates this decision-making process. 

The Filehist utility facilitates understanding the age distribution of files in a given filesystem, thereby aiding in the formulation of more effective archiving policies to enhance data management efficiency and optimize storage resources. By providing insights into the age of data, it assists in making informed decisions about archiving priorities.  

Furthermore, it offers estimates regarding the potential storage capacity that can be reclaimed by archiving data beyond a certain age. 

 

Filehist’s Enhancement 

Filehist has been updated with a new feature. It now provides a detailed analysis of the distribution of files by their age, showing both the cumulative and individual percentages of files and their sizes within each age group. This enhancement is available starting from versions 5.2.0.0 and 5.1.9.3. 

Navigating Tool Functionality  

  1. cd /usr/lpp/mmfs/samples/util 

  2. make tsinode 

This command builds the tsinode utility, which creates a tuple for each file, containing information such as inode, size, mtime, ctime and more. Filehist uses these tuples to generate the histograms, calculating each file's age from its 'mtime' attribute. 

  3. cd <path_to_store_filehist_report>

When you execute the filehist command in the <path_to_store_filehist_report> directory, a report file containing the corresponding histograms is generated in that directory for each filesystem. 

  4. /usr/lpp/mmfs/samples/debugtools/filehist 

 

Executing the command generates one file per filesystem, named "<filesystem_name>.filesum". 

For instance, the histogram for the gpfs0 filesystem will reside in a file named "gpfs0.filesum". 
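To collect the reports after a run, the naming convention above can be used to map each filesystem to its report file. The following is a minimal Python sketch, not part of Filehist itself; the function name find_filehist_reports is hypothetical, and it assumes only that reports end in ".filesum" as described above.

```python
from pathlib import Path


def find_filehist_reports(report_dir):
    """Map each filesystem name to its '<name>.filesum' report in report_dir.

    Hypothetical helper: it relies only on the '<filesystem_name>.filesum'
    naming convention and does not parse the report contents.
    """
    reports = {}
    for path in sorted(Path(report_dir).glob("*.filesum")):
        if path.is_file():
            # 'gpfs0.filesum' -> filesystem name 'gpfs0'
            reports[path.stem] = path
    return reports
```

Calling find_filehist_reports(".") from the report directory would return a dictionary keyed by filesystem name, for example with a "gpfs0" entry if gpfs0.filesum exists there.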

 

As depicted in the image, the histogram captures the age and size distribution of a filesystem. 

 

In the "Age" column, there are 14 buckets categorising files based on various ages. The "Older" bucket encompasses all files older than 1 year while "0 day" includes files not surpassing 24 hours in age. Subsequently, "1 day" encapsulates files aged between 24 and 48 hours, and so forth up to "6 day." The "2 week" represents files aged between 1 week and 2 weeks, and "6 months" includes files older than 3 months but less than or equal to 6 months. 
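The bucket boundaries described above can be illustrated with a short Python sketch. It is not Filehist's actual implementation: it covers only the buckets named in this post, returns "unlisted" for age ranges whose labels are not spelled out here, and assumes 30-day months and a 365-day year.

```python
import math

DAY = 24 * 3600  # seconds in one day


def age_bucket(mtime, now):
    """Return the age bucket label for a file, given mtime and now in epoch seconds.

    Illustrative only. Upper bounds are inclusive, following the
    "less than or equal to" wording in the text.
    """
    age = max(0, now - mtime)  # assumption: future mtimes count as age 0
    if age <= 7 * DAY:
        # "0 day" is up to 24 hours, "1 day" is 24 to 48 hours, up to "6 day"
        return f"{max(0, math.ceil(age / DAY) - 1)} day"
    if age <= 14 * DAY:
        return "2 week"
    if age <= 90 * DAY:
        return "unlisted"  # buckets between 2 weeks and 3 months are not named here
    if age <= 180 * DAY:
        return "6 months"
    if age <= 365 * DAY:
        return "unlisted"  # bucket between 6 months and 1 year is not named here
    return "Older"
```

For instance, a file modified 36 hours ago falls into "1 day", and a file modified 400 days ago falls into "Older".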

The Count column indicates the number of files in each bucket. File% and Size% provide the respective percentages of files and file size within each bucket, offering insights into the significance and contribution of each bucket to the overall numbers. 

Additionally, File%ile and Size%ile represent the cumulative percentage of files and file size, respectively. This information aids in comprehending the overall distribution and contribution of files across different age categories. 
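To make the relationship between these columns concrete, here is a minimal Python sketch that derives File%, Size%, File%ile and Size%ile from per-bucket counts and sizes. The function name, the input format and the sample numbers are hypothetical, and the cumulative columns are accumulated in the order the rows are supplied, which may differ from the order Filehist uses.

```python
def add_percentages(rows):
    """Compute per-bucket and cumulative percentages.

    rows: list of (label, file_count, total_size) tuples, in report order.
    Returns a list of dicts carrying the same columns as the age histogram.
    """
    total_files = sum(count for _, count, _ in rows)
    total_size = sum(size for _, _, size in rows)

    def pct(part, whole):
        # Avoid division by zero for an empty filesystem
        return 100.0 * part / whole if whole else 0.0

    result = []
    cum_files = cum_size = 0
    for label, count, size in rows:
        cum_files += count
        cum_size += size
        result.append({
            "Age": label,
            "Count": count,
            "File%": pct(count, total_files),
            "Size%": pct(size, total_size),
            "File%ile": pct(cum_files, total_files),
            "Size%ile": pct(cum_size, total_size),
        })
    return result


# Hypothetical sample data: (bucket, number of files, size in GiB)
sample = [("Older", 50, 600), ("6 months", 30, 300), ("0 day", 20, 100)]
for row in add_percentages(sample):
    print(row)
```

With this sample, the "Older" bucket holds half of the files but 60% of the space, which is the kind of skew that makes an age-based archiving rule attractive.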

 

ILM Policy 

Drawing on the output of the enhanced Filehist, you can construct ILM policies based on the file's mtime attribute (MODIFICATION_TIME), choosing the number of days according to how much data you want to archive versus how much you want to retain on primary storage. 
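One way to turn the histogram into a day count is to walk the buckets from oldest to youngest and stop once enough space has been covered. The Python sketch below illustrates that idea under stated assumptions: the function name, the input format and the sample figures are hypothetical, and the minimum age of each bucket must be supplied by the reader from the report.

```python
def pick_age_threshold(buckets, target_size_pct):
    """Pick the smallest set of oldest buckets that frees at least target_size_pct.

    buckets: list of (min_age_days, size_pct) tuples, sorted oldest first,
             where min_age_days is the lower age bound of the bucket.
    Returns (days, freed_pct), where days is a candidate value for a
    'mod_age > days' rule, or None if the target cannot be reached.
    """
    freed = 0.0
    for min_age_days, size_pct in buckets:
        freed += size_pct
        if freed >= target_size_pct:
            return min_age_days, freed
    return None


# Hypothetical Size% values read from a report, oldest bucket first
example = [(365, 40.0), (180, 25.0), (90, 15.0), (14, 10.0), (0, 10.0)]
print(pick_age_threshold(example, 60.0))
```

In this made-up example, archiving everything older than 180 days would move about 65% of the space, so 180 is a reasonable starting value for the policy threshold.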

Sample ILM policy: 

Migration based on modification age 

The following migration policy migrates files that were last modified more than 30 days ago from the 'production' pool to the 'archive' pool:

/* macro for modification age, in days */
define( mod_age, (DAYS(CURRENT_TIMESTAMP) - DAYS(MODIFICATION_TIME)) )

/* migration based on modification age */
RULE 'modAge' MIGRATE FROM POOL 'production' TO POOL 'archive'
WHERE (mod_age > 30)
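If you want to try several thresholds, the sample policy can be generated from a template. The following Python sketch only reproduces the policy text shown above with a configurable day count and pool names; the function name render_migration_policy is hypothetical, and the output should still be reviewed before it is applied to a filesystem.

```python
def render_migration_policy(days, from_pool="production", to_pool="archive"):
    """Return the sample migration policy text for a given age threshold in days."""
    if days < 0:
        raise ValueError("days must be non-negative")
    return (
        "/* macro for modification age, in days */\n"
        "define( mod_age, (DAYS(CURRENT_TIMESTAMP) - DAYS(MODIFICATION_TIME)) )\n"
        "\n"
        "/* migration based on modification age */\n"
        f"RULE 'modAge' MIGRATE FROM POOL '{from_pool}' TO POOL '{to_pool}'\n"
        f"WHERE (mod_age > {int(days)})\n"
    )


print(render_migration_policy(180))
```

Keeping the threshold as a parameter makes it easy to regenerate the policy whenever a new Filehist report suggests a different cut-off.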

In summary, the recent Filehist enhancement provides insight into the age of your data, enabling informed decisions about data archiving strategies. 
