Analyze IBM Spectrum Scale File Access Audit with ELK Stack

By SANDEEP PATIL posted Mon July 30, 2018 02:57 PM

File access audit logging captures the file access activity that businesses need for various reasons, from regulatory compliance to implementing data-centric security to deriving insights from file access logs.
So what is file access auditing? In simple terms, it is recording the details of every type of access to a given set of files so that an administrator or a security auditor can later answer the who/when/where/what/why/which questions associated with a file access for any given period of time.
IBM Spectrum Scale 5.0.0 introduced built-in support for File Audit Logging (FAL), which logs file access activity on selected Spectrum Scale filesystems. FAL writes the access records to configured log files that are set to be immutable. Building on this, administrators, security auditors, and data scientists who do log analysis need to be able to visualize these audit logs in order to:
  - Do security analysis on file access
  - Create access reports for audit purposes
  - Understand the top data access users
  - Figure out if there is any data access anomaly
  - Understand the read versus write workload pattern at file level
  - Identify the file access pattern over time
  - Highlight undesired spikes in data access
  - Do deep analysis to understand which files are potentially related based on access patterns
The possibilities are many, depending on what you want to derive from the FAL logs with good query-based visualization.
Elastic Stack (ELK) is one of the most popular stacks used to query, visualize and analyze such log records to derive insights on the points above and beyond. The Elastic Stack is also used in solutions for security analytics.
In this blog we go through the details of how to integrate the IBM Spectrum Scale File Audit Logging feature with the ELK stack. As a starter kit, this will help administrators get IBM Spectrum Scale file access logs analyzed and visualized with the ELK stack.
The setup is divided into 5 main steps.
i) Configuring FAL in IBM Spectrum Scale 
ii) Basic ELK Setup
iii) Passing live IBM Spectrum Scale FAL logs to ELK using Beats
iv)  Parsing of Spectrum Scale FAL log using Logstash
v) Visualization of parsed data using Kibana for analysis
Note: This blog provides a starting point. It can be expanded to meet your business needs, and you may have to take into account ELK-specific security measures for your deployment, which are not covered in this blog.
Step 1: Configuring FAL in IBM Spectrum Scale 
Configuring FAL consists of installing and enabling the service on the IBM Spectrum Scale server. You can install and configure FAL either with the IBM Spectrum Scale installation toolkit or by manual installation.
Before installing FAL, it is vital to understand the IBM Spectrum Scale FAL components and concepts, so it is recommended to read "Introduction to File Audit Logging" in the IBM Spectrum Scale Knowledge Center.
For setup using the installer, and for the FAL prerequisites and limitations, it is advisable to go through the IBM Spectrum Scale documentation before the setup.
Once you have set up and enabled the File Audit Logging service, you can see its status by running the Spectrum Scale CLI command below.
[root@nsd1 ~]# mmaudit all list
Audit          Cluster               Fileset        Fileset     Retention
Device         ID                    Device         Name        (Days)
cesSharedRoot  14704456100053598410  cesSharedRoot  .audit_log  365
fs1            14704456100053598410  fs1            .audit_log  90
[root@nsd1 ~]# 
The output lists one row for each filesystem on which the service is enabled. The output above indicates that audit logging is enabled for the 'fs1' and 'cesSharedRoot' filesystems. For the 'fs1' filesystem, the audit logs are stored in the '.audit_log' fileset (the default fileset) on the filesystem 'fs1' (the default fileset device).
Now that FAL is set up and configured, let us see what the access records look like. Typically, a series of filesystem operations is involved when a user interacts with the filesystem. FAL records all of these operations for completeness and accuracy. For example, when one creates a file (say via a touch command), the FAL log will contain records for 4 different operations, namely CREATE, XATTRCHANGE, OPEN and CLOSE, that take place at the filesystem level. So let us 'touch' a file on a filesystem (fs1) that we have configured for FAL and then check its audit records.
# touch /ibm/fs1/one
Now let us see what was audited in the file access logs. By default, the file audit logs reside in the .audit_log fileset on the filesystem for which FAL auditing is enabled (this can be changed using the mmaudit command). In our case the logs are stored under /ibm/fs1/.audit_log
The following is the output from the audit log file:
# cd /ibm/fs1/.audit_log/152_14704456100053598410_2_audit/2018/06/12
# grep "/ibm/fs1/one" auditLogFile_ces1_2018-06-12_14:02:40
{"LWE_JSON": "0.0.1", "path": "/ibm/fs1/one", "oldPath": null, "clusterName": "gpfs.scalecluster", "nodeName": "ces1", "nfsClientIp": "", "fsName": "fs1", "event": "CREATE", "inode": "95488", "linkCount": "1", "openFlags": "0", "poolName": "system", "fileSize": "0", "ownerUserId": "0", "ownerGroupId": "0", "atime": "2018-06-12_16:21:36+0530", "ctime": "2018-06-12_16:21:36+0530", "eventTime": "2018-06-12_16:21:36+0530", "clientUserId": "0", "clientGroupId": "0", "processId": "32210", "permissions": "200100644", "acls": null, "xattrs": null, "subEvent": "NONE" }
{"LWE_JSON": "0.0.1", "path": "/ibm/fs1/one", "oldPath": null, "clusterName": "gpfs.scalecluster", "nodeName": "ces1", "nfsClientIp": "", "fsName": "fs1", "event": "XATTRCHANGE", "inode": "95488", "linkCount": "1", "openFlags": "0", "poolName": "system", "fileSize": "0", "ownerUserId": "0", "ownerGroupId": "0", "atime": "2018-06-12_16:21:36+0530", "ctime": "2018-06-12_16:21:36+0530", "eventTime": "2018-06-12_16:21:36+0530", "clientUserId": "0", "clientGroupId": "0", "processId": "32210", "permissions": "200100644", "acls": null, "xattrs": "security.selinux-unconfined_u:object_r:unlabeled_t:s0", "subEvent": "NONE" }
{"LWE_JSON": "0.0.1", "path": "/ibm/fs1/one", "oldPath": null, "clusterName": "gpfs.scalecluster", "nodeName": "ces1", "nfsClientIp": "", "fsName": "fs1", "event": "OPEN", "inode": "95488", "linkCount": "1", "openFlags": "35138", "poolName": "system", "fileSize": "0", "ownerUserId": "0", "ownerGroupId": "0", "atime": "2018-06-12_16:21:36+0530", "ctime": "2018-06-12_16:21:36+0530", "eventTime": "2018-06-12_16:21:36+0530", "clientUserId": "0", "clientGroupId": "0", "processId": "32210", "permissions": "200100644", "acls": null, "xattrs": null, "subEvent": "NONE" }
{"LWE_JSON": "0.0.1", "path": "/ibm/fs1/one", "oldPath": null, "clusterName": "gpfs.scalecluster", "nodeName": "ces1", "nfsClientIp": "", "fsName": "fs1", "event": "CLOSE", "inode": "95488", "linkCount": "1", "openFlags": "35138", "poolName": "system", "fileSize": "0", "ownerUserId": "0", "ownerGroupId": "0", "atime": "2018-06-12_16:21:36+0530", "ctime": "2018-06-12_16:21:36+0530", "eventTime": "2018-06-12_16:21:36+0530", "clientUserId": "0", "clientGroupId": "0", "processId": "32210", "permissions": "200100644", "acls": null, "xattrs": null, "subEvent": "NONE" }
IBM Spectrum Scale File Audit Logging generates records in JSON format, one JSON object per line. For more details on the logging format, please refer to the IBM Spectrum Scale documentation.
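Because every record is a self-contained JSON object on its own line, the log is easy to inspect with a short script even before ELK is involved. The following is a minimal sketch and not part of FAL itself: the function name events_for_path is hypothetical, and the sample records are abbreviated copies of the ones shown above that keep only the path, event and eventTime fields.

```python
import json


def events_for_path(lines, path):
    """Return the event names recorded for one file path, in log order.

    `lines` is any iterable of audit-log lines, one JSON record per line.
    """
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        if record.get("path") == path:
            events.append(record["event"])
    return events


# Abbreviated sample records: the real log carries many more fields.
SAMPLE = [
    '{"path": "/ibm/fs1/one", "event": "CREATE", "eventTime": "2018-06-12_16:21:36+0530"}',
    '{"path": "/ibm/fs1/one", "event": "XATTRCHANGE", "eventTime": "2018-06-12_16:21:36+0530"}',
    '{"path": "/ibm/fs1/one", "event": "OPEN", "eventTime": "2018-06-12_16:21:36+0530"}',
    '{"path": "/ibm/fs1/one", "event": "CLOSE", "eventTime": "2018-06-12_16:21:36+0530"}',
]

if __name__ == "__main__":
    # Prints the four operations that a single 'touch' produced.
    print(events_for_path(SAMPLE, "/ibm/fs1/one"))
```

To run it against a real audit log, pass an open file object instead of SAMPLE, since iterating over a file yields one line at a time.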
By default, the Spectrum Scale File Audit Logging service creates an independent fileset named .audit_log on the audited filesystem itself, with a retention time of 365 days. You can change the fileset name, the filesystem that holds it and the retention period for a particular audit using the mmaudit command.
# mmaudit fs0 enable --log-fileset fileset1 --log-fileset-device fs1 --retention 90
At the end of Step 1, we have set up and configured FAL for the filesystem we want, and we have verified that we are getting the file access records.
Step 2: Basic ELK Setup
Installing the ELK stack is a straightforward process (assuming a single-node setup, as in our case). You can download the packages for your system architecture and operating system from the Elastic package download page.
Typically, a set of nodes hosts the ELK stack, but for this blog we use a single-node setup for simplicity (as shown in the figure below). Install the packages elasticsearch-6.2.4.rpm, kibana-6.2.4.rpm and logstash-6.2.4.rpm on the ELK server (we used a dedicated, separate server node), and install the filebeat-6.2.4-x86_64.rpm package on any Spectrum Scale node that has access to the .audit_log fileset. Start the Elasticsearch and Kibana services on the ELK server to test the installation.
$ service elasticsearch start
$ service kibana start
Note: If you are new to ELK, you will need to understand certain basics of the stack; the Elastic documentation has resources to get you started.
The figure below shows the simple one-node ELK setup used for this illustration.
At the end of Step 2, we have the basic ELK components (Elasticsearch and Kibana) installed and running.
Step 3: Passing live IBM Spectrum Scale FAL logs to ELK using beats
In this step we configure the setup so that all audit events are continuously streamed to the ELK server. The ELK stack provides Filebeat (one of the key components of Beats), which acts as a lightweight shipper that forwards all logs from IBM Spectrum Scale FAL to the Logstash service. In short, it creates a pipeline that transfers the data generated by FAL to Logstash for further parsing. This requires a few changes to the Filebeat configuration.
To change the Filebeat configuration, edit the /etc/filebeat/filebeat.yml file on the Spectrum Scale node where you installed Filebeat and make the changes below:
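filebeat.prospectors:
- type: log
  enabled: true
  paths:
    # every file five directory levels below the .audit_log fileset
    - /ibm/fs1/.audit_log/*/*/*/*/*
output.logstash:
  hosts: ["<elkserver_ip>:5044"]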
The configuration changes above tell Filebeat to ship any updates to files matching "/ibm/fs1/.audit_log/*/*/*/*/*" to port 5044 on the ELK server.
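The five wildcards correspond to the directory layout seen in Step 1: the per-audit directory, then year, month and day directories, then the log file itself. The sketch below is one way to preview which files such a pattern selects, using Python's glob module as a stand-in for Filebeat's own matching (the two implementations are not guaranteed to agree on every edge case, such as hidden files). The function name matching_audit_files and the directory names in the demo are hypothetical.

```python
import glob
import os
import tempfile


def matching_audit_files(audit_root):
    """List regular files exactly five levels below audit_root.

    Mirrors the pattern <audit_root>/*/*/*/*/* from filebeat.yml.
    """
    pattern = os.path.join(audit_root, "*", "*", "*", "*", "*")
    # Keep files only: a shipper harvests files, not directories.
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


if __name__ == "__main__":
    # Build a throwaway tree shaped like <audit dir>/<year>/<month>/<day>/<file>.
    with tempfile.TemporaryDirectory() as root:
        day_dir = os.path.join(root, "demo_audit", "2018", "06", "12")
        os.makedirs(day_dir)
        open(os.path.join(day_dir, "auditLogFile_demo"), "w").close()
        # A file at a shallower level is not selected by the pattern.
        open(os.path.join(root, "demo_audit", "stray.txt"), "w").close()
        for path in matching_audit_files(root):
            print(path)
```

On a real system you would call matching_audit_files("/ibm/fs1/.audit_log") on a node that mounts the filesystem.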
Note: Make sure to comment out the "Elasticsearch output" section, since we are using Logstash as the output for Filebeat. By default, the Elasticsearch output is enabled in the filebeat.yml file.
$ service filebeat start
At the end of Step 3, we have configured Filebeat to forward the FAL logs to the ELK server.
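If events do not arrive later on, a first thing to rule out is basic network reachability from the Spectrum Scale node to port 5044 on the ELK server. Note that the port only accepts connections once Logstash is running with the Beats input configured in Step 4. The following is a minimal sketch using only the Python standard library; the function name port_is_open is hypothetical, and the address in the demo is a placeholder for your ELK server.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors.
        return False


if __name__ == "__main__":
    # Placeholder address: replace with the IP of your ELK server.
    elk_server = "127.0.0.1"
    print(port_is_open(elk_server, 5044))
```

A result of False only says that no TCP connection could be made; it does not distinguish between a stopped Logstash service, a firewall rule and a wrong address.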
Step 4: Parsing of Spectrum Scale FAL log using logstash
The real work starts in Logstash once FAL and ELK are configured. Logstash is configured on the ELK server node. Create a configuration file (say file_audit_log.conf) under the /etc/logstash/conf.d directory.
Logstash configuration can be logically divided into 3 parts. 
  1. input : details about what type of input logstash is expecting (eg: to make Logstash service listen to Filebeat request)
  2. filter : how to parse the input
  3. output : where should logstash redirect the results
Below is a sample configuration for Spectrum Scale FAL records that you can use.
Ex: /etc/logstash/conf.d/file_audit_log.conf
input {
  # listen for events shipped by Filebeat
  beats {
    host => "<elkserver ip>"
    port => 5044
  }
}
filter {
  # parse the raw FAL record (one JSON object per line) into parsedJson
  json {
    source => "message"
    target => "parsedJson"
  }
  # copy the event time to a top-level field so it can be matched below
  mutate {
    add_field => {
      "eventTime" => "%{[parsedJson][eventTime]}"
    }
  }
  grok {
    patterns_dir => ["/etc/logstash/patterns"]
    match => { "eventTime" => [ "%{GPFSLOG_DATE:timestamp}" ] }
    remove_field => [ "message", "eventTime" ]
  }
  # use the FAL event time as the event's timestamp
  date {
    match => [ "timestamp", "yyyy-MM-dd_HH:mm:ssZ" ]
    remove_field => [ "timestamp" ]
  }
}
output {
  elasticsearch {
    # fill in the address of your Elasticsearch node (host:port, 9200 by default)
    hosts => [""]
    manage_template => false
    index => "fal_log"
  }
  stdout { codec => rubydebug }
}
To parse the date format used in the FAL logs, we need to create a pattern that helps Logstash do the parsing.
$ mkdir -p /etc/logstash/patterns
$ echo "GPFSLOG_DATE %{YEAR}-%{MONTHNUM}-%{MONTHDAY}_%{HOUR}:%{MINUTE}:%{SECOND}%{ISO8601_TIMEZONE}" >> /etc/logstash/patterns/extra
$ service logstash start
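To make the timestamp handling concrete, the sketch below reproduces the same two stages in Python: a regular expression that plays the role of the GPFSLOG_DATE pattern (a simplified analogue, not a translation of the grok sub-patterns), followed by a parse with the equivalent of the "yyyy-MM-dd_HH:mm:ssZ" format. The function name parse_event_time is hypothetical; the sample value is taken from the audit records in Step 1.

```python
import re
from datetime import datetime, timezone

# Simplified analogue of GPFSLOG_DATE: date, underscore, time, numeric UTC offset.
GPFSLOG_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2}[+-]\d{4}$")


def parse_event_time(value):
    """Parse a FAL eventTime string into a timezone-aware datetime."""
    if not GPFSLOG_DATE.match(value):
        raise ValueError(f"not a FAL timestamp: {value!r}")
    # %z accepts offsets such as +0530
    return datetime.strptime(value, "%Y-%m-%d_%H:%M:%S%z")


if __name__ == "__main__":
    local = parse_event_time("2018-06-12_16:21:36+0530")
    print(local.isoformat())                           # 2018-06-12T16:21:36+05:30
    print(local.astimezone(timezone.utc).isoformat())  # 2018-06-12T10:51:36+00:00
```

The second printed value is the one to compare against Kibana, because the Logstash date filter stores the event timestamp in UTC.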
For details on Logstash configuration and syntax, refer to the Logstash documentation.
At the end of Step 4, we have configured Logstash to accept the incoming Spectrum Scale FAL messages from Filebeat, parse them and store them in Elasticsearch.
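Before moving on to Kibana, the stored records can be checked with a direct query against the fal_log index. The sketch below builds a terms aggregation that counts records per event type and shows how it could be sent to the Elasticsearch _search endpoint. Several things here are assumptions rather than facts from this setup: the function names are hypothetical, the field name parsedJson.event.keyword assumes Elasticsearch's default dynamic mapping (string fields indexed as text with a keyword sub-field), and the URL passed to run_query must point at your own Elasticsearch node.

```python
import json
import urllib.request


def build_event_count_query(field="parsedJson.event.keyword", top_n=10):
    """Build a search body that counts documents per distinct value of `field`."""
    return {
        "size": 0,  # return aggregation buckets only, no individual hits
        "aggs": {"events": {"terms": {"field": field, "size": top_n}}},
    }


def run_query(es_url, index, body):
    """POST the search body to <es_url>/<index>/_search and return the parsed reply."""
    request = urllib.request.Request(
        f"{es_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the request body; call run_query() once your node is reachable.
    print(json.dumps(build_event_count_query(), indent=2))
```

Passing a different field, for example build_event_count_query("parsedJson.clientUserId.keyword"), gives a rough view of the most active users under the same mapping assumption.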
Step 5: Visualization of parsed data using Kibana for analysis
Kibana, a visualization tool, helps you present data in the form of tables, charts and more. Kibana was installed and set up in Step 2, and since it is a web application you access it through port 5601. All you need to do is point your web browser at the ELK server node where Kibana is running and specify the port number.
In Kibana you need to create the visualizations and dashboards that address your needs. Creating them is relatively straightforward, and the Kibana documentation covers those aspects in more detail.
To make it easy for you, we have created a few visualizations and a dashboard for Spectrum Scale FAL audit logs. Attached are two JSON files which contain the schema for the FAL visualizations and dashboard.
  1. fal_visulization.json
  2. fal_dashboard.json
You will have to import these JSON files into Kibana (in the Kibana GUI, go to the Management tab > Saved Objects > Import).
Once you have done that, go to the Dashboard panel in the Kibana GUI to view the dashboards, derive insights, generate reports, find data access anomalies and so on, and meet your business objectives.
Another blog from the team discusses how Spectrum Scale object logs can be used with ELK for billing-like purposes.