Authored by @KEDAR KARMARKAR and @Bipali Gade
In this blog, we explain what Filehist does, describe its recent enhancements, and show how these enhancements help in crafting policies that optimise the data archiving process.
Getting Started with Filehist
Filehist is a tool that produces a detailed histogram analysis of the files in a given filesystem, focusing on file sizes and on the cumulative percentages of both file count and space usage.
Filehist is shipped as a sample tool with the Storage Scale installation.
The analysis report is divided into two parts:
- Histogram of Files <= One 256K Block in Size
- Histogram of Files with N 256K Blocks (plus end fragment)
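To make the two size histograms concrete, here is a minimal Python sketch of how file sizes could be split between them, assuming a fixed 256 KiB block as named in the headings. A file of at most one block goes into the first histogram; any larger file is counted under N, its number of full 256K blocks, with the remaining bytes treated as its end fragment. The function name `block_histogram` is hypothetical and this is not Filehist's actual code; the real report may also subdivide the small files further.

```python
from collections import Counter

BLOCK_SIZE = 256 * 1024  # 256 KiB, the block size named in the histogram headings (assumed fixed)


def block_histogram(sizes):
    """Split file sizes (in bytes) into the two histograms (hypothetical helper).

    Returns (small_count, by_blocks): small_count is the number of files no
    larger than one block; by_blocks maps N (full 256K blocks) to the number
    of larger files that have exactly N full blocks. Bytes beyond the N full
    blocks are that file's end fragment.
    """
    small_count = 0
    by_blocks = Counter()
    for size in sizes:
        if size <= BLOCK_SIZE:
            small_count += 1
        else:
            full_blocks, _fragment = divmod(size, BLOCK_SIZE)
            by_blocks[full_blocks] += 1
    return small_count, by_blocks


if __name__ == "__main__":
    small, hist = block_histogram([0, 4096, BLOCK_SIZE, BLOCK_SIZE + 1, 3 * BLOCK_SIZE])
    print(small, dict(hist))  # 3 {1: 1, 3: 1}
```

A file of exactly 256 KiB still counts as "one block" here, which matches the "<=" in the first heading.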
The report also includes:
- the output of the df -k command, which displays disk space usage for file systems
- the output of the mmlsfs command, which lists the attributes of a file system
- the output of the tslsdisk command, which reports filesystem disk information and status
- a filesystem space utilization summary for files, non-zero-size files and directories
This data helps you understand the current allocation, the file size distribution and the space usage within the filesystem.
Customers frequently want to archive data, either to reduce costs or to free up faster storage resources. Data archiving is driven by predefined policies, and crafting these policies means establishing criteria that identify which data is eligible for archiving. Without adequate tools to identify that data, this decision is often difficult to make.
Filehist shows the age distribution of files in a given filesystem, which helps in formulating more effective archiving policies that improve data management and make better use of storage resources. By providing insight into the age of data, it supports informed decisions about archiving priorities.
It also offers estimates of the storage capacity that can be reclaimed by archiving data older than a given age.
Filehist has been updated with a new feature: a detailed analysis of the distribution of files by age, showing both the individual and the cumulative percentages of files and of their sizes within each age group. This enhancement is available starting with versions 5.2.0.0 and 5.1.9.3.
Navigating Tool Functionality
- cd /usr/lpp/mmfs/samples/util
  The utility in this directory creates a tuple for each file, containing information such as inode, size, mtime and ctime. Filehist uses these tuples to generate the histograms and calculates each file's age from its mtime attribute.
- cd <path_to_store_filehist_report>
  Run filehist from the directory where you want the reports stored; a report file containing the histograms is generated there for each filesystem.
- /usr/lpp/mmfs/samples/debugtools/filehist
  Running this command generates a separate report file for each filesystem, named <filesystem_name>.filesum. For instance, the histogram for the gpfs0 filesystem is written to gpfs0.filesum.
As depicted in the image, the histogram captures the age and size distribution of a filesystem.
The "Age" column groups files into 14 buckets by age. "0 day" holds files no more than 24 hours old, "1 day" holds files between 24 and 48 hours old, and so on up to "6 day". "2 week" holds files between 1 and 2 weeks old, "6 months" holds files older than 3 months but no older than 6 months, and "Older" holds all files older than 1 year.
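The bucketing rule can be sketched as a small Python function that maps a file's mtime to an age label. This is a simplified, hypothetical illustration, not Filehist's code: only the daily buckets, "2 week", "6 months" and "Older" follow the description above, while the "3 months" and "1 year" boundaries are assumed stand-ins for buckets the text does not list, months are approximated as 30 days, and the sketch therefore has 12 buckets rather than 14. A file exactly 24 hours old falls into "1 day" here, which may differ from Filehist's boundary handling.

```python
import time

SECONDS_PER_DAY = 86400

# Buckets after the daily ones, as (inclusive upper bound in days, label).
# "2 week" and "6 months" follow the blog's description; "3 months" and
# "1 year" are HYPOTHETICAL stand-ins, and months are approximated as 30 days.
COARSE_BUCKETS = [
    (14, "2 week"),
    (90, "3 months"),
    (180, "6 months"),
    (365, "1 year"),
]


def age_bucket(mtime, now=None):
    """Map a file's mtime (seconds since the epoch) to an age-bucket label."""
    if now is None:
        now = time.time()
    # Clamp future mtimes (e.g. clock skew) to age zero.
    age_days = max(0.0, (now - mtime) / SECONDS_PER_DAY)
    if age_days < 7:
        # "0 day" .. "6 day": one bucket per elapsed 24-hour period
        return f"{int(age_days)} day"
    for upper_days, label in COARSE_BUCKETS:
        if age_days <= upper_days:
            return label
    return "Older"  # more than 1 year old
```

For example, with `now = 1_700_000_000`, a file modified 30 hours earlier (`age_bucket(now - 30 * 3600, now)`) lands in "1 day", and one modified 120 days earlier lands in "6 months".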
The Count column gives the number of files in each bucket. File% and Size% give each bucket's share of the total file count and of the total file size, showing how much each bucket contributes overall.
File%ile and Size%ile give the cumulative percentage of files and of file size, respectively, which shows how files and space are distributed across the age categories.
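The arithmetic behind these four columns is simple enough to sketch in Python. The helper below is hypothetical (not part of Filehist) and assumes each bucket is given as a (label, file count, total bytes) triple; cumulative values accumulate in the order the buckets are passed in, so the running order of the real report is an assumption you would match when reproducing it.

```python
from typing import NamedTuple


class Row(NamedTuple):
    label: str
    count: int
    file_pct: float     # File%: share of all files in this bucket
    size_pct: float     # Size%: share of all bytes in this bucket
    file_pctile: float  # File%ile: cumulative File% up to this bucket
    size_pctile: float  # Size%ile: cumulative Size% up to this bucket


def summarize(buckets):
    """Compute File%, Size%, File%ile and Size%ile for (label, count, bytes) buckets."""
    total_files = sum(count for _, count, _ in buckets)
    total_bytes = sum(nbytes for _, _, nbytes in buckets)

    def pct(part, whole):
        # Guard against an empty filesystem (zero totals).
        return 100.0 * part / whole if whole else 0.0

    rows = []
    cum_files = cum_bytes = 0
    for label, count, nbytes in buckets:
        cum_files += count
        cum_bytes += nbytes
        rows.append(Row(label, count,
                        pct(count, total_files), pct(nbytes, total_bytes),
                        pct(cum_files, total_files), pct(cum_bytes, total_bytes)))
    return rows
```

With made-up input `[("0 day", 50, 100), ("1 day", 25, 300), ("Older", 25, 600)]`, the "1 day" row has File% 25.0, Size% 30.0, File%ile 75.0 and Size%ile 40.0: half of the files are at most a day old, yet they hold only 10% of the space.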
Using the output of the enhanced Filehist, you can construct ILM policies based on the file's mtime attribute (MODIFICATION_TIME), choosing the number of days according to how much data you want to archive versus how much you want to retain on primary storage.
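To see how the choice of days trades archived capacity against retained capacity, the following hypothetical Python helper estimates what a "modification age greater than N days" rule would select. It assumes you have per-file (size, mtime) pairs, such as the per-file tuples described earlier; Filehist itself reports this only at bucket granularity through Size%ile. The sketch counts elapsed 24-hour periods, whereas the policy engine's DAYS() arithmetic works on calendar dates, so files near the threshold may differ by a day.

```python
import time

SECONDS_PER_DAY = 86400


def archivable(files, days, now=None):
    """Estimate what a 'modification age > days' rule would select (hypothetical helper).

    files: iterable of (size_in_bytes, mtime) pairs, mtime in seconds since the epoch.
    Returns (selected_bytes, percent_of_total_bytes).
    """
    if now is None:
        now = time.time()
    files = list(files)  # allow one-shot iterators; we traverse twice
    total = sum(size for size, _ in files)
    # Elapsed 24-hour periods; DAYS() in a policy uses calendar dates instead.
    selected = sum(size for size, mtime in files
                   if (now - mtime) / SECONDS_PER_DAY > days)
    return selected, (100.0 * selected / total if total else 0.0)
```

Calling it for a few candidate thresholds, for example `for days in (30, 180, 365): print(days, archivable(files, days))`, gives a quick view of how much primary storage each threshold would free before you commit to a policy.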
Migration based on modification age
The following policy migrates files last modified more than 30 days ago from the 'production' pool to the 'archive' pool:
/* macro for modification age in days */
define(mod_age, (DAYS(CURRENT_TIMESTAMP) - DAYS(MODIFICATION_TIME)))
/* migration based on modification age */
RULE 'modAge' MIGRATE FROM POOL 'production' TO POOL 'archive' WHERE mod_age > 30
In summary, the recent Filehist enhancements give insight into the age of data, enabling informed decisions about data archiving strategies.