Object stores are great for unstructured data. Some types of data are written once, read a time or two in the first days, and then rarely accessed again, for example your vacation pictures; others are written once and only accessed in emergencies, like backup data.
Other data is accessed very often and needs fast response times, for example the latest news or anything around a current sports event.
As of today, several methods exist to ensure data is automatically placed on an optimal storage tier. Usually tiering is based on heatmaps, that is, criteria such as last accessed or modified time, size, and/or type. See the following blog for object tiering based on heatmaps using the IBM Spectrum Scale product. Link:
Hot cakes or hot objects, they better be served fast.
But how about data that is important to you and should be served fast no matter whether it is accessed frequently or not?
Data that is not as important to you should not take up space and get in your waiter’s way while he serves the hot objects. Fast storage is more expensive, so data stored there had better be both hot and important enough to deserve fast serving. So why keep objects hot when they are not that important anymore?
It would be very useful if object metadata could also be used as a criterion for data placement!
Users would then be able to control tiering by setting object metadata. This means you control tiering not by object heat but by any other measure you need. Users can even change data placement later on simply by updating the metadata.
A user could add any key/value pair as object metadata and have the tiering based on the values, on the mere existence of a key, or on any combination of key/value pairs in the metadata.
Tiering based on object metadata opens up a lot of new use cases.
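To make the three kinds of criteria (a value, the mere existence of a key, a combination of pairs) a bit more concrete, here is a minimal sketch in Python. It is not part of IBM Spectrum Scale or Swift; the names matches, ANY and the sample metadata are made up for illustration only.

```python
# Sentinel meaning "the key only has to exist, any value is fine".
ANY = object()


def matches(metadata, conditions):
    """Return True if the metadata satisfies every condition.

    conditions maps a metadata key to the required value, or to ANY
    when only the existence of the key matters.
    """
    for key, wanted in conditions.items():
        if key not in metadata:
            return False
        if wanted is not ANY and metadata[key] != wanted:
            return False
    return True


# Hypothetical object metadata, as a user might set it.
meta = {"Tier": "Silver", "Classification": "Confidential"}

print(matches(meta, {"Tier": "Silver"}))        # match on a value
print(matches(meta, {"Classification": ANY}))   # match on key existence
print(matches(meta, {"Tier": "Silver", "Classification": "Public"}))  # combination
```

The first two checks succeed and the last one fails, because a combination only matches when all of its pairs match.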
Sample use cases:
- a) Any free-format text: A user could add football game details such as “match # in season” as metadata and could thereby detect the last 3 matches independent of the date.
- b) Classification: Documents could be tagged as Confidential, detected as such, and placed in a certain tier.
- c) Analytics: Data gathered by sensors is used for calculations or further analytics. To shorten computing time, the data needs to be available fast. Once the calculations are done, the data could be placed in a slower tier.
- d) Multiple timestamps: Additional timestamps besides last accessed or modified, based on object content, could easily be used for tiering. For example, if the stored object is a certificate, a “certificate issued at” timestamp could be added and tiering could be done based on it.
The following diagram shows an example for a combination of use cases b), c) and d).
It demonstrates how tiering based on metadata could be used in a scientific environment. It shows the different stages of the objects and the storage pool they end up in depending on their metadata.
Btw.: If you think analytics on object stores might be difficult because of the need to copy data or other restrictions, watch this presentation:
From Archive to Insight: Debunking Myths of Analytics on Object Stores.
- Multiple Sensors inject Objects with metadata key: “Analyzed” value: “NO” into IBM Spectrum Scale. Incoming Objects are stored in Tier 1 Pool, which is a fast storage pool and allows fast analysis.
- Data Analyzer runs analytics on the Objects and generates “Result Objects” in the fast Tier 1 Pool.
- The “Result Objects” are marked with a metadata “Classification: HighConfidential” tag. IBM Spectrum Scale ILM detects the HighConfidential tag and moves all Objects having such tag into a Tier 2 Pool which is placed in a physical vault.
- Data Analyzer updates the sensor Objects’ metadata with “Analyzed: YES” and also adds a timestamp like “Analyzed At: 09/05/16”.
IBM Spectrum Scale ILM detects the “Analyzed: YES” tag and moves all Objects with this key / value metadata pair from fast Tier 1 Pool into a slower Tier 3 Pool.
The Analyzed At metadata detail can be used for further tiering later on.
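The placement logic of this workflow can be written down as a small decision function. The following Python sketch only mirrors the steps above for illustration; in IBM Spectrum Scale the decision is made by ILM policy rules, and the pool names used here are placeholders I chose.

```python
# Pool names are placeholders for the three pools in the diagram.
FAST_POOL = "tier1"
VAULT_POOL = "tier2"
SLOW_POOL = "tier3"


def choose_pool(metadata):
    """Map object metadata to a storage pool, following the steps above."""
    # Result objects tagged as highly confidential go to the vault pool.
    if metadata.get("Classification") == "HighConfidential":
        return VAULT_POOL
    # Sensor objects that have been analyzed can move to slower storage.
    if metadata.get("Analyzed") == "YES":
        return SLOW_POOL
    # Everything else, including "Analyzed: NO", stays on fast storage.
    return FAST_POOL


print(choose_pool({"Analyzed": "NO"}))
print(choose_pool({"Classification": "HighConfidential"}))
print(choose_pool({"Analyzed": "YES", "Analyzed At": "09/05/16"}))
```

The three calls correspond to a freshly injected sensor object, a result object, and a sensor object after analysis.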
What is needed?
In this blog I would like to give an example of how this can be achieved using the IBM Spectrum Scale product with standard and/or swiftonfile Swift storage policies.
IBM Spectrum Scale includes a high performance metadata scan interface that allows you to efficiently process the metadata for billions of files. In combination with the Object Protocol, this functionality can also be leveraged for Object tiering based on metadata.
See the schema below for a graphical description of the IBM Spectrum Scale ILM functionality:
Prerequisites:
To repeat the examples below, ensure the following prerequisites are met:
- IBM Spectrum Scale is installed, Protocol and Object support is enabled.
- For the swiftonfile example, enable file access by executing:
# mmobj file-access enable
- Create a file-access Swift storage policy named Unified-Policy by executing:
# mmobj policy create Unified-Policy --enable-file-access
- upload an object called demo.object into a demo_cont container that is linked to a default swift storage policy.
# swift upload demo_cont demo.object
- upload an object called demo_unified.object into a demo_cont_unified container that is linked to a file-access swift storage policy.
# swift upload demo_cont_unified demo_unified.object --header "X-Storage-Policy: Unified-Policy"
- Add a custom metadata key/value pair like “Tier: Silver” to both objects, for example by executing:
# swift post demo_cont demo.object --header "x-object-meta-Tier: Silver"
# swift post demo_cont_unified demo_unified.object --header "x-object-meta-Tier: Silver"
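If you want to set several key/value pairs at once, the swift post call can be assembled programmatically. The sketch below only builds the command line shown above and prints it; the helper name swift_post_command is my own, and it assumes Python 3.8 or later for shlex.join.

```python
import shlex


def swift_post_command(container, obj, metadata):
    """Build a 'swift post' command that sets custom object metadata."""
    cmd = ["swift", "post", container, obj]
    for key, value in metadata.items():
        # Each pair becomes one x-object-meta-<key> header, as in the prerequisites.
        cmd += ["--header", "x-object-meta-%s: %s" % (key, value)]
    return cmd


cmd = swift_post_command("demo_cont", "demo.object", {"Tier": "Silver"})
print(shlex.join(cmd))
```

As far as I know, a POST on an object in Swift replaces the complete set of custom metadata, so all pairs that should stay on the object need to be passed in the same call.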
Steps to detect objects by their metadata for tiering:
- 1. How custom metadata is stored depends on the Swift storage policy in use, and so does the method for detecting metadata key/value pairs. The following paragraphs show the differences.
As an example, list the current metadata for demo_unified.object via the swift stat command. The output should look like:
# swift stat demo_cont_unified demo_unified.object
Account: AUTH_f506278cc70f4486841b15aa8a89349a
Container: demo_cont_unified
Object: demo_unified.object
Content Type: application/octet-stream
Content Length: 30
Last Modified: Fri, 19 Aug 2016 08:51:11 GMT
ETag: d8a094adf0809c603bbf3d54f35cd842
Meta Tier: Silver
Accept-Ranges: bytes
Connection: keep-alive
X-Timestamp: 1471596670.06819
X-Trans-Id: tx4746536c1f2c4b7889774-0057b714f9
- 1.1 Standard Swift storage policy:
If a standard Swift storage policy is used, metadata is stored in pickled format.
To see an example, follow these steps:
- Retrieve the needed base path by running the IBM Spectrum Scale command
# mmobj config list --ccrfile object-server.conf --section DEFAULT --property devices --format-none
- Use the Swift swift-get-nodes command to find your object location in the filesystem.
Example: # swift-get-nodes /etc/swift/object.ring.gz AUTH_f506278cc70f4486841b15aa8a89349a demo_cont demo.object
- Once you have found the object in the filesystem, run the IBM Spectrum Scale command
mmlsattr -L --dump-attr
on it to dump the object’s metadata and see how it is presented in pickled format on disk:
# mmlsattr -L --dump-attr /mnt/gpfs0/object_fileset/o/z1device46/objects/15500/02c/f2326f7425a397222c236a1d2a42a02c/1471522653.72124.data
…
user.swift.metadata: "??}q?(U?Content-Lengthq?U?30U?nameq?
U
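To get a feeling for why pickled metadata is awkward to search, the following Python sketch pickles a small metadata dictionary and runs a few regular expressions over the raw bytes. It is an illustration under assumptions: it uses Python 3 with pickle protocol 2, whereas the on-disk sample above was written by the Swift code of that time, so the exact separator bytes differ from what you see in the dump.

```python
import pickle
import re

# Hypothetical metadata, similar to what Swift keeps for an object.
metadata = {"Content-Length": "30", "X-Object-Meta-Tier": "Silver"}
blob = pickle.dumps(metadata, protocol=2)

# Keys and values appear as readable text, separated by binary opcode bytes.
print(blob)

# A JSON-style pattern cannot match, there are no quotes or colons in the pickle.
json_style = re.compile(rb'"X-Object-Meta-Tier": "Silver"')
# Allowing a few arbitrary bytes between key and value does match.
gap_style = re.compile(rb"X-Object-Meta-Tier.{1,16}Silver", re.DOTALL)
# Exactly two separator bytes do not match the bytes this Python version emits.
two_dots = re.compile(rb"X-Object-Meta-Tier..Silver", re.DOTALL)

print(bool(json_style.search(blob)))
print(bool(gap_style.search(blob)))
print(bool(two_dots.search(blob)))
```

The point is that the number of bytes between key and value depends on how the pickle was written, so a search pattern has to be checked against real on-disk data of your own installation.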
- 1.2 SwiftOnFile storage policy:
If a SwiftOnFile storage policy is used, metadata is stored as key/value pairs in JSON format.
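Because the value is plain JSON, any JSON parser can pick out the custom metadata. Here is a minimal Python sketch; the sample string is copied from the mmlsattr dump shown further down in this section, and the function name custom_metadata is made up for this example.

```python
import json

PREFIX = "X-Object-Meta-"

# Value of the user.swift.metadata attribute, copied from the dump in this section.
# On Linux it could be read with os.getxattr(path, "user.swift.metadata").
sample = (
    '{"Content-Length": "30", "X-Object-PUT-Mtime": "1471523955.76507", '
    '"ETag": "d8a094adf0809c603bbf3d54f35cd842", "X-Object-Type": "file", '
    '"X-Timestamp": "1471523955.71966", "X-Type": "Object", '
    '"Content-Type": "application/octet-stream", "X-Object-Meta-Tier": "Silver"}'
)


def custom_metadata(xattr_value):
    """Return only the user-defined key/value pairs, without the prefix."""
    meta = json.loads(xattr_value)
    return {k[len(PREFIX):]: v for k, v in meta.items() if k.startswith(PREFIX)}


print(custom_metadata(sample))
```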
Data that was stored in a container linked to a SwiftOnFile storage policy can be easily found on disk by the following steps:
- Retrieve the needed base path by running the IBM Spectrum Scale command
mmobj policy list --verbose
and search for the Unified-Policy policy entry. The Fileset Path field gives the base path; below it is a directory named like “s(Policy Index, also shown in mmobj policy list)z1device1”.
Example: /mnt/gpfs0/obj_Unified-Policy/s31791608180z1device1/
- To get to the object append the account and container as directories and the object name as file.
Example: /mnt/gpfs0/obj_Unified-Policy/s31791608180z1device1/AUTH_f506278cc70f4486841b15aa8a89349a/demo_cont_unified/demo_unified.object
- Now use the mmlsattr -L --dump-attr command to see how it’s presented in JSON format on disk:
# mmlsattr -L --dump-attr /mnt/gpfs0/obj_Unified-Policy/s31791608180z1device1/AUTH_f506278cc70f4486841b15aa8a89349a/demo_cont_unified/demo_unified.object
…
user.swift.metadata: "{"Content-Length": "30", "X-Object-PUT-Mtime": "1471523955.76507", "ETag":
"d8a094adf0809c603bbf3d54f35cd842", "X-Object-Type": "file", "X-Timestamp":
"1471523955.71966", "X-Type": "Object", "Content-Type": "application/octet-stream", "X-Object-Meta-Tier": "Silver"}"
- 2. With the help of the IBM Spectrum Scale Information Lifecycle Management (ILM) feature, policy rules consisting of SQL-like statements can be executed to detect objects based on the given key/value pairs. The objects found are then tiered as specified.
As explained before, the way custom metadata is stored differs, and so does the policy rule used for the search. The following section shows example rules for detecting objects based on custom metadata key/value pairs. (In this blog I only give an example of how to search based on metadata entries. The actual tiering rules can be reviewed in the IBM Spectrum Scale documentation.)
Basically, all that is needed is a rule that gets executed with the IBM Spectrum Scale mmapplypolicy command.
The following script code is sample code for demonstration only, no warranty.
- 2.1 Standard Swift storage policy:
Create a new file and add the following as Rule:
RULE EXTERNAL LIST 'allfiles' EXEC '' ESCAPE '% '
RULE 'ListAll' LIST 'allfiles'
SHOW( XATTR('user.swift.metadata') )
WHERE XATTR('user.swift.metadata') IS NOT NULL AND RegEx(XATTR('user.swift.metadata'),
['${key}..${value}'])
(see 3. a) for enhancements)
The rule can be executed as follows:
/usr/lpp/mmfs/bin/mmapplypolicy <path> -P <rulefile> -s /tmp -f /tmp/results -I defer -L 1 2>/tmp/progress
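To spell out what each argument of this call is for, here is a small Python sketch that assembles the same command as a list. It does not run anything by itself; the function name and its defaults are my own choice, while the flags are the ones used in this blog.

```python
def mmapplypolicy_command(path, rulefile, workdir="/tmp", list_prefix="/tmp/results"):
    """Assemble the mmapplypolicy call used in this blog as an argument list."""
    return [
        "/usr/lpp/mmfs/bin/mmapplypolicy",
        path,                # directory or filesystem to scan
        "-P", rulefile,      # file containing the policy rules
        "-s", workdir,       # local work directory for temporary files
        "-f", list_prefix,   # prefix of the generated list files
        "-I", "defer",       # only create the file lists, do not act on them
        "-L", "1",           # level of detail of the progress output
    ]


cmd = mmapplypolicy_command("/mnt/gpfs0/object_fileset/", "/tmp/list.rule")
print(" ".join(cmd))

# On a cluster node the list could be passed to subprocess.run(cmd, check=True).
# With the rule above, the matches end up in /tmp/results.list.allfiles.
```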
See the following example code snippet, which is proof of concept work:
#!/usr/lpp/mmfs/bin/mmksh
#
# Don't use for production, experimental, demo prototype
# all three arguments are required
if [[ -z $3 ]]; then
  print "Usage: xattrSearchComSwiftPickled <path> <key> <value>"
  print " "
  print " "
  exit 1
fi
typeset path=$1
typeset key=$2
typeset value=$3
rulefile="/tmp/list.rule"
rm -f /tmp/results.list.allfiles
rm -f $rulefile
echo "... creating the list.rule file (/tmp/list.rule) ..."
echo " searching for \"Object-Meta-${key}\": \"${value}\""
print "
RULE EXTERNAL LIST 'allfiles' EXEC '' ESCAPE '% '
RULE 'ListAll' LIST 'allfiles'
SHOW( XATTR('user.swift.metadata') )
WHERE XATTR('user.swift.metadata') IS NOT NULL AND RegEx(XATTR('user.swift.metadata'),
['${key}..${value}'])
" > $rulefile
echo "... running policy and creating filelist ..."
# run the policy scan synchronously; progress messages go to /tmp/progress
ap=$(/usr/lpp/mmfs/bin/mmapplypolicy "$path" -P "$rulefile" -s /tmp -f /tmp/results -I defer -L 1 2>/tmp/progress)
# the following is an example on how to continue with the results file
# it just prints out the found files, but the results could also be used for tiering actions
# by setting storage pool attribute on the files
echo "... printing results (from /tmp/results.list.allfiles) ..."
if [[ -e "/tmp/results.list.allfiles" ]]; then
  while read -r line
  do
    # the last field of each result line is the encoded path name
    filePathNameEnc=$(print -r -- "$line" | awk -F ' ' '{ print $NF }')
    filePathName=$(/usr/lpp/mmfs/bin/mmcmi fullDecode $filePathNameEnc)
    echo ">>> File: ${filePathName}"
    echo ""
  done < /tmp/results.list.allfiles
fi
Example output:
# ./xattrSearchComSwiftPickled.sh /mnt/gpfs0/object_fileset/ Tier Silver
... creating the list.rule file (/tmp/list.rule) ...
searching for "Object-Meta-Tier": "Silver"
... running policy and creating filelist ...
... printing results (from /tmp/results.list.allfiles) ...
>>> File: /mnt/gpfs0/object_fileset/o/z1device46/objects/15500/02c/f2326f7425a397222c236a1d2a42a02c/1471522653.72124.data
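The post-processing of the results file can also be done outside of the shell. The Python sketch below follows the same idea as the loop in the script: take the last field of each line and decode it. It assumes that the path is percent-encoded as requested by the ESCAPE '% ' clause and uses urllib instead of the mmcmi helper; the sample line is hypothetical and not real mmapplypolicy output.

```python
from urllib.parse import unquote


def decode_result_line(line):
    """Return the decoded path from one line of the results list, or None."""
    fields = line.split()
    if not fields:
        return None
    # As in the script, the last whitespace-separated field is the path.
    return unquote(fields[-1])


# Hypothetical line: leading fields, a separator, then the encoded path.
sample_line = "4711 1234 0  -- /mnt/gpfs0/object_fileset/demo%20dir/1471522653.72124.data"
print(decode_result_line(sample_line))
```

The decoded paths could then be used for tiering actions instead of only being printed.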
The above showed a search on pickled data that works with current, unmodified code. Further down (see Enhancements 3. a & b) you will find a demonstration of how this can be improved with small code changes to the swift diskfile module.
- 2.2 SwiftOnFile storage policy:
Create a new file and add the following as Rule:
RULE EXTERNAL LIST 'allfiles' EXEC '' ESCAPE '% '
RULE 'ListAll' LIST 'allfiles'
SHOW( XATTR('user.swift.metadata') )
WHERE XATTR('user.swift.metadata') IS NOT NULL AND RegEx(XATTR('user.swift.metadata'),
['\"X-Object-Meta-${key}\": \"${value}\"'])
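To try out what this regular expression matches without a cluster at hand, the pattern can be tested against a JSON string in Python. This is only a sketch: Python's re module is not the regular expression engine of the policy language, and the helper name json_rule_pattern is made up.

```python
import json
import re


def json_rule_pattern(key, value=None):
    """Build the pattern used in the rule above.

    With value=None the pattern only checks that the key exists.
    """
    value_part = ".*?" if value is None else re.escape(value)
    return '"X-Object-Meta-%s": "%s"' % (re.escape(key), value_part)


stored = json.dumps({"Content-Length": "30", "X-Object-Meta-Tier": "Silver"})
compact = json.dumps({"X-Object-Meta-Tier": "Silver"}, separators=(",", ":"))

print(bool(re.search(json_rule_pattern("Tier", "Silver"), stored)))   # matching value
print(bool(re.search(json_rule_pattern("Tier", "Gold"), stored)))     # other value
print(bool(re.search(json_rule_pattern("Tier"), stored)))             # key only
print(bool(re.search(json_rule_pattern("Tier", "Silver"), compact)))  # no space after colon
```

The last check fails on purpose: the pattern relies on the exact spacing of the stored JSON, so it is worth comparing it with a real mmlsattr dump before relying on it.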
The rule can be executed as follows:
/usr/lpp/mmfs/bin/mmapplypolicy <path> -P <rulefile> -s /tmp -f /tmp/results -I defer -L 1 2>/tmp/progress
See the following example code snippet, which is proof of concept work:
#!/usr/lpp/mmfs/bin/mmksh
# Don't use for production, experimental, demo prototype
# all three arguments are required
if [[ -z $3 ]]; then
  print "Usage: xattrSearchComSwiftJSON <path> <key> <value>"
  print " "
  print " "
  exit 1
fi
typeset path=$1
typeset key=$2
typeset value=$3
rulefile="/tmp/list.rule"
rm -f /tmp/results.list.allfiles
rm -f $rulefile
echo "... creating the list.rule file (/tmp/list.rule) ..."
if [[ -z $3 ]]; then
value=".*?"
fi
echo " searching for \"X-Object-Meta-${key}\": \"${value}\""
print "
RULE EXTERNAL LIST 'allfiles' EXEC '' ESCAPE '% '
RULE 'ListAll' LIST 'allfiles'
SHOW( XATTR('user.swift.metadata') )
WHERE XATTR('user.swift.metadata') IS NOT NULL AND RegEx(XATTR('user.swift.metadata'),
['\"X-Object-Meta-${key}\": \"${value}\"'])
" > $rulefile
echo "... running policy and creating filelist ..."
# run the policy scan synchronously; progress messages go to /tmp/progress
ap=$(/usr/lpp/mmfs/bin/mmapplypolicy "$path" -P "$rulefile" -s /tmp -f /tmp/results -I defer -L 1 2>/tmp/progress)
# the following is an example on how to continue with the results file
# it just prints out the found files, but the results could also be used for tiering actions
# by setting storage pool attribute on the files
echo "... printing results (from /tmp/results.list.allfiles) ..."
if [[ -e "/tmp/results.list.allfiles" ]]; then
  while read -r line
  do
    # the last field of each result line is the encoded path name
    filePathNameEnc=$(print -r -- "$line" | awk -F ' ' '{ print $NF }')
    filePathName=$(/usr/lpp/mmfs/bin/mmcmi fullDecode $filePathNameEnc)
    echo ">>> File: ${filePathName}"
    echo ""
  done < /tmp/results.list.allfiles
fi
Example Output:
# ./xattrSearchComSwiftJSON.sh /mnt/gpfs0/obj_Unified-Policy/ Tier Silver
... creating the list.rule file (/tmp/list.rule) ...
searching for "X-Object-Meta-Tier": "Silver"
... running policy and creating filelist ...
... printing results (from /tmp/results.list.allfiles) ...
>>> File: /mnt/gpfs0/obj_Unified-Policy/s31791608180z1device1/AUTH_f506278cc70f4486841b15aa8a89349a/demo_cont_unified/demo_unified.object
- 3. Enhancements:
- a) Running searches on pickled data is probably not the best idea. With a small change in the swift diskfile (which becomes more complex if migration of existing data is needed), the metadata can be saved in JSON format, as is done for file-access data via the swiftonfile diskfile.
The rule to run via IBM Spectrum Scale ILM is then similar to the file-access rule.
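The core of such a change is small, as the following Python sketch tries to show. It is not the actual swift diskfile code, just an illustration of converting a pickled metadata dictionary to JSON; the function name is made up, and metadata written by an older Python version may need extra decoding that is not handled here.

```python
import json
import pickle


def pickled_to_json(blob):
    """Convert a pickled metadata dict to a JSON string.

    Only unpickle data you trust: pickle.loads can execute arbitrary code.
    """
    metadata = pickle.loads(blob)
    if not isinstance(metadata, dict):
        raise ValueError("expected a metadata dictionary")
    return json.dumps(metadata)


# Hypothetical metadata in the old (pickled) and new (JSON) format.
old_format = pickle.dumps({"Content-Length": "30", "X-Object-Meta-Tier": "Silver"}, protocol=2)
new_format = pickled_to_json(old_format)
print(new_format)
```

The same conversion would be the heart of a migration for objects that already exist on disk.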
- b) An even easier search would be possible if custom metadata had its own extended attribute in IBM Spectrum Scale. That is, the standard swift metadata stays under the user.swift.metadata extended attribute, but each additional custom metadata entry is stored under its own extended attribute. Example:
user.swift.Object-Meta-Tier: "Silver"
The mmlsattr -L --dump-attr output would look like:
# mmlsattr -L --dump-attr /mnt/gpfs0/obj_Unified-Policy/s31791608180z1device1/AUTH_f506278cc70f4486841b15aa8a89349a/demo_cont_unified/demo_unified.object
…
user.swift.metadata: "{"Content-Length": "30", "X-Object-PUT-Mtime": "1471596670.12717", "ETag":
"d8a094adf0809c603bbf3d54f35cd842", "X-Timestamp": "1471596670.06819", "X-Object-Type":
"file", "X-Type": "Object", "Content-Type": "application/octet-stream"}"
user.swift.Object-Meta-Tier: "Silver"
The rule to find keys and values in this format could look like:
RULE EXTERNAL LIST 'allfiles' EXEC '' ESCAPE '% '
RULE 'ListAll' LIST 'allfiles'
SHOW( XATTR('user.swift.Object-Meta-${key}') )
WHERE XATTR('user.swift.Object-Meta-${key}') IS NOT NULL AND XATTR('user.swift.Object-Meta-${key}') LIKE '${value}'
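How the metadata would be split into separate extended attributes can be sketched in a few lines of Python. This is only a model of the idea using dictionaries, not a change to the swift diskfile; the function names are made up, and has_pair only covers the exact-match case of the LIKE comparison in the rule.

```python
import json

PREFIX = "X-Object-Meta-"


def split_metadata(metadata):
    """Return the extended attributes to write for one object."""
    system = {k: v for k, v in metadata.items() if not k.startswith(PREFIX)}
    xattrs = {"user.swift.metadata": json.dumps(system)}
    for key, value in metadata.items():
        if key.startswith(PREFIX):
            # "X-Object-Meta-Tier" becomes "user.swift.Object-Meta-Tier".
            xattrs["user.swift." + key[len("X-"):]] = value
    return xattrs


def has_pair(xattrs, key, value):
    """Exact-match lookup of one custom key/value pair."""
    return xattrs.get("user.swift.Object-Meta-" + key) == value


xattrs = split_metadata({"Content-Length": "30", "X-Object-Meta-Tier": "Silver"})
print(xattrs)
print(has_pair(xattrs, "Tier", "Silver"))
```

With this layout a search no longer needs a regular expression over the whole metadata string, a direct comparison on a single attribute is enough.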
- c) Besides searching directly on the data, another option is to use a middleware that connects the object store with an external database such as Elasticsearch to store the metadata. See the following link for an example: IBM Spectrum Scale Object Metadata Search Open Beta
Note: The above blog is my personal view and is not, and should not be, related to that of my employer.
#Softwaredefinedstorage