File and Object Storage


Are you heating up your frozen daiquiri and even paying for it? Serve the objects you specified as hot fast, and move the ones you want to freeze into the fridge!

By Simon Lorenz posted Thu September 08, 2016 04:30 PM

  
Object stores are great for unstructured data. Some types of data are written once, read once or twice in the first days, and then rarely accessed again, for example your vacation pictures; others are written once and only accessed in emergencies, like backup data.

Other data is accessed very often and needs fast response times, for example the latest news or anything related to a current sports event.

Today, several methods exist to ensure that data is automatically placed on an optimal storage tier. Usually, tiering is based on heatmaps, that is, on criteria such as last access or modification time, size, and/or type. See the blog on object tiering based on heatmaps using the IBM Spectrum Scale product. Link: Hot cakes or hot objects, they better be served fast.
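For illustration, a heat-based criterion like "last accessed" can be written as an IBM Spectrum Scale ILM rule. The following is a minimal sketch, not taken from the linked blog: a small shell function that prints a MIGRATE rule for files that have not been accessed for more than a given number of days. The rule name 'coolDown' and the pool names are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: print a heat-based IBM Spectrum Scale ILM rule that migrates files
# whose last access is more than N days ago.
# The rule name and the pool names passed in are hypothetical placeholders.
heat_migrate_rule() {  # $1 = source pool, $2 = target pool, $3 = days
  cat <<EOF
RULE 'coolDown' MIGRATE FROM POOL '$1' TO POOL '$2'
  WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > $3
EOF
}
```

For example, `heat_migrate_rule system archive 30 > /tmp/heat.rule` writes a rule file that could then be passed to mmapplypolicy with -P.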

But what about data that is important to you and should be served fast, no matter whether it is accessed frequently or not?
Data that is less important to you should not get in your waiter’s way while the hot objects are served. Fast storage is more expensive, so the data stored there had better be hot and important. So why keep objects hot when they are no longer important?

It would be very useful if object metadata could also be used as a criterion for data placement!


Users would then be able to control tiering by setting object metadata. This means tiering is controlled not by object heat but by whatever other measure is needed. By updating the metadata later, users can also easily change data placement.

A user could add any key/value pair as object metadata and base tiering on the values, on the mere existence of a key, or on any combination of metadata key/value pairs.
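The three matching modes (value, existence of a key, combination) can be sketched in shell. This minimal sketch assumes the metadata is available as a JSON string with keys of the form "X-Object-Meta-<Key>" and ": " separators, as in the SwiftOnFile format discussed later in this blog; the function names and the combination rule are hypothetical.

```shell
#!/bin/sh
# Sketch: decide whether an object qualifies for a tier from its metadata,
# given as a JSON string (assumed SwiftOnFile style: "X-Object-Meta-<Key>": "<value>").
# grep -F treats the pattern literally, so keys/values need no regex escaping.

# Key exists, whatever its value.
has_key() {   # $1 = JSON metadata, $2 = key
  printf '%s\n' "$1" | grep -qF "\"X-Object-Meta-$2\": \""
}

# Key exists with exactly this value.
has_pair() {  # $1 = JSON metadata, $2 = key, $3 = value
  printf '%s\n' "$1" | grep -qF "\"X-Object-Meta-$2\": \"$3\""
}

# Hypothetical combination: analyzed objects that carry no classification.
is_tier3_candidate() {  # $1 = JSON metadata
  has_pair "$1" Analyzed YES && ! has_key "$1" Classification
}
```

Because the closing quote after the key and the value is part of each pattern, "Tier" does not match "TierX" and "Silver" does not match "Silver2".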

Tiering based on object metadata opens up a lot of new use cases.


Sample Use Cases:



The following diagram shows an example combining use cases b), c) and d).
It demonstrates how tiering based on metadata could be used in a scientific environment: it shows the different stages of the objects and the storage pool in which they end up, depending on their metadata.
By the way: if you think analytics on object stores might be difficult because data has to be copied or because of other restrictions, watch this presentation: From Archive to Insight: Debunking Myths of Analytics on Object Stores.
[Diagram daiquiri-01: object stages and target storage pools in a scientific environment]


  1. Multiple sensors ingest Objects with the metadata key “Analyzed” and value “NO” into IBM Spectrum Scale. Incoming Objects are stored in the Tier 1 Pool, a fast storage pool that allows fast analysis.

  2. Data Analyzer runs analytics on the Objects and generates “Result Objects” in the fast Tier 1 Pool.

  3. The “Result Objects” are tagged with the metadata “Classification: HighConfidential”. IBM Spectrum Scale ILM detects the HighConfidential tag and moves all Objects with this tag into a Tier 2 Pool that is located in a physical vault.

  4. Data Analyzer updates the metadata of the sensor Objects to “Analyzed: YES” and also adds a timestamp such as “Analyzed At: 09/05/16”.


IBM Spectrum Scale ILM detects the “Analyzed: YES” tag and moves all Objects with this key/value metadata pair from the fast Tier 1 Pool into a slower Tier 3 Pool.

The “Analyzed At” metadata can be used for further tiering later on.
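The metadata changes in steps 1, 3 and 4 map to ordinary Swift client calls. The sketch below assumes the `swift` command line client used elsewhere in this blog; container and object names are hypothetical, and “Analyzed-At” replaces “Analyzed At” because HTTP header names cannot contain spaces. Note that a Swift POST on an object replaces all of its existing custom metadata, so every key that should be kept has to be sent again. Setting SWIFT=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the metadata changes in steps 1, 3 and 4 of the diagram, using the
# swift command line client. Container/object names are hypothetical.
# SWIFT can be overridden, e.g. SWIFT=echo for a dry run that only prints.
SWIFT=${SWIFT:-swift}

# Step 1: upload a sensor object marked as not yet analyzed.
ingest_sensor_object() {  # $1 = container, $2 = local file
  "$SWIFT" upload "$1" "$2" --header "X-Object-Meta-Analyzed: NO"
}

# Step 3: tag a result object so that ILM moves it to the vault pool.
classify_result() {  # $1 = container, $2 = object
  "$SWIFT" post "$1" "$2" --header "X-Object-Meta-Classification: HighConfidential"
}

# Step 4: mark a sensor object as analyzed. A POST replaces all custom metadata
# of the object, so every key to keep has to be sent again.
mark_analyzed() {  # $1 = container, $2 = object, $3 = date
  "$SWIFT" post "$1" "$2" \
    --header "X-Object-Meta-Analyzed: YES" \
    --header "X-Object-Meta-Analyzed-At: $3"
}
```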

What is needed?


In this blog, I would like to give an example of how this can be achieved using the IBM Spectrum Scale product with standard and/or SwiftOnFile Swift storage policies.

IBM Spectrum Scale includes a high-performance metadata scan interface that allows you to efficiently process the metadata of billions of files. In combination with the Object protocol, this functionality can also be leveraged for object tiering based on metadata.
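As a minimal sketch of how the scan can drive tiering rather than just listing, the function below writes an ILM MIGRATE rule that selects objects by a metadata key/value pair. It assumes the SwiftOnFile JSON metadata format and reuses the XATTR/RegEx pattern of the LIST rules shown later in this blog; the rule name and pool names are hypothetical.

```shell
#!/bin/sh
# Sketch: write an ILM MIGRATE rule that selects SwiftOnFile objects by a
# metadata key/value pair (same RegEx pattern as the LIST rules in this blog).
# Rule name and pool names are hypothetical placeholders.
metadata_migrate_rule() {  # $1 = source pool, $2 = target pool, $3 = key, $4 = value
  cat <<EOF
RULE 'tierByMetadata' MIGRATE FROM POOL '$1' TO POOL '$2'
  WHERE XATTR('user.swift.metadata') IS NOT NULL
    AND RegEx(XATTR('user.swift.metadata'), ['"X-Object-Meta-$3": "$4"'])
EOF
}
```

For example, `metadata_migrate_rule system tier3 Analyzed YES > /tmp/migrate.rule`, followed by `/usr/lpp/mmfs/bin/mmapplypolicy <path> -P /tmp/migrate.rule -I test` to preview which files would be moved before running it for real.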

See the schema below for a graphical description of the IBM Spectrum Scale ILM functionality:
[Diagram daiquiri-02: IBM Spectrum Scale ILM functionality]

Prerequisites:


To repeat the examples below, ensure the following prerequisites are met:

  1. IBM Spectrum Scale is installed, and protocol and Object support are enabled.

  2. For the SwiftOnFile example, enable file access by executing:
    # mmobj file-access enable

  3. Create a file-access Swift storage policy named Unified-Policy by executing:
    # mmobj policy create Unified-Policy --enable-file-access

  4. Upload an object called demo.object into a container demo_cont that is linked to the default Swift storage policy:
    # swift upload demo_cont demo.object

  5. Upload an object called demo_unified.object into a container demo_cont_unified that is linked to a file-access Swift storage policy:
    # swift upload demo_cont_unified demo_unified.object --header "X-Storage-Policy: Unified-Policy"

  6. Add a custom metadata key/value pair such as “Tier: Silver” to both objects, for example by executing:
    # swift post demo_cont demo.object --header "x-object-meta-Tier: Silver"
    # swift post demo_cont_unified demo_unified.object --header "x-object-meta-Tier: Silver"
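To check that step 6 worked, you can look for a line of the form “Meta Tier: Silver” in the output of `swift stat`, as in the swift stat example further down. A minimal sketch of a helper that extracts such a value; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: read `swift stat` output on stdin and print the value of the custom
# metadata key given as $1 (shown by swift stat as "Meta <Key>: <value>").
meta_from_stat() {  # $1 = key, e.g. Tier
  awk -v k="Meta $1:" '
    { line = $0; sub(/^[ \t]+/, "", line) }   # swift stat right-aligns labels
    index(line, k) == 1 {
      v = substr(line, length(k) + 1)
      sub(/^[ \t]+/, "", v)
      print v
      exit
    }'
}
```

Usage: `swift stat demo_cont demo.object | meta_from_stat Tier` should print Silver.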


Steps to detect objects by their metadata for tiering:



  1. How custom metadata is stored depends on the Swift storage policy in use, so the method for detecting metadata key/value pairs differs as well. The following paragraphs show the differences.

    For example, list the current metadata of demo_unified.object with the swift stat command. The output should look like this:
    # swift stat demo_cont_unified demo_unified.object
    Account: AUTH_f506278cc70f4486841b15aa8a89349a
    Container: demo_cont_unified
    Object: demo_unified.object
    Content Type: application/octet-stream
    Content Length: 30
    Last Modified: Fri, 19 Aug 2016 08:51:11 GMT
    ETag: d8a094adf0809c603bbf3d54f35cd842
    Meta Tier: Silver
    Accept-Ranges: bytes
    Connection: keep-alive
    X-Timestamp: 1471596670.06819
    X-Trans-Id: tx4746536c1f2c4b7889774-0057b714f9


    1.1 Standard Swift storage policy:
      If a standard Swift storage policy is used, the metadata is stored in Python pickle format in the user.swift.metadata extended attribute.
      To see an example, follow these steps:


    1.2 SwiftOnFile storage policy:
      If a SwiftOnFile storage policy is used, the metadata is stored as key/value pairs in JSON format.
      Data stored in a container linked to a SwiftOnFile storage policy can easily be found on disk with the following steps:




  2. With the IBM Spectrum Scale Information Lifecycle Management (ILM) feature, you can run policy rules, written as SQL-like statements, that detect objects based on the given key/value pairs. The objects found are then tiered as specified.
    As explained before, the way custom metadata is stored differs, and so does the policy rule used for the search. The following section shows example rules for detecting objects based on custom metadata key/value pairs. (In this blog, I only give an example of how to search based on metadata entries. The actual tiering code can be reviewed in the IBM Spectrum Scale documentation.)
    Basically, all that is needed is a rule that is executed with the IBM Spectrum Scale mmapplypolicy command.
    The following script code is sample code for demonstration only, with no warranty.

    2.1 Standard Swift storage policy:
      Create a new file and add the following rule, replacing ${key} and ${value} with the metadata key and value to search for:
      RULE EXTERNAL LIST 'allfiles' EXEC '' ESCAPE '% '
      RULE 'ListAll' LIST 'allfiles'
      SHOW( XATTR('user.swift.metadata') )
      WHERE XATTR('user.swift.metadata') IS NOT NULL AND RegEx(XATTR('user.swift.metadata'),
      ['${key}..${value}'])
      (see 3. a) for enhancements)
      The rule can be executed as follows, where <path> is the object fileset path and <rulefile> is the file created above:
      /usr/lpp/mmfs/bin/mmapplypolicy <path> -P <rulefile> -s /tmp -f /tmp/results -I defer -L 1 2>/tmp/progress


      See the following example code snippet, which is proof of concept work:
      #!/usr/lpp/mmfs/bin/mmksh
      #
      # Don't use for production, experimental, demo prototype
      # Lists files whose pickled Swift metadata matches <key>..<value>

      if [[ -z $3 ]]; then
        print "Usage: xattrSearchComSwiftPickled <path> <key> <value>"
        exit 1
      fi

      typeset path=$1
      typeset key=$2
      typeset value=$3

      rulefile="/tmp/list.rule"

      # clean up the results and rule file of a previous run
      rm -f /tmp/results.list.allfiles
      rm -f $rulefile

      echo "... creating the list.rule file (/tmp/list.rule) ..."

      echo "        searching for \"Object-Meta-${key}\": \"${value}\""

      print "
      RULE EXTERNAL LIST 'allfiles' EXEC '' ESCAPE '% '

      RULE 'ListAll' LIST 'allfiles'
      SHOW( XATTR('user.swift.metadata') )
      WHERE XATTR('user.swift.metadata') IS NOT NULL AND RegEx(XATTR('user.swift.metadata'),
      ['${key}..${value}'])
      " > $rulefile

      echo "... running policy and creating filelist ..."

      # with EXEC '' and -I defer, the matches are written to /tmp/results.list.allfiles;
      # regular output is discarded, progress messages go to /tmp/progress
      /usr/lpp/mmfs/bin/mmapplypolicy $path -P $rulefile -s /tmp -f /tmp/results -I defer -L 1 \
        >/dev/null 2>/tmp/progress

      # the following is an example on how to continue with the results file
      # it just prints out the found files, but the results could also be used for tiering actions
      # by setting storage pool attribute on the files
      echo "... printing results (from /tmp/results.list.allfiles) ..."
      if [[ -e "/tmp/results.list.allfiles" ]]; then
        while read -r line
        do
          # the escaped path name is the last field of each list entry
          filePathNameEnc=$(print -r -- "$line" | awk '{ print $NF }')
          filePathName=$(/usr/lpp/mmfs/bin/mmcmi fullDecode $filePathNameEnc)
          echo ">>> File: ${filePathName}"
          echo ""
        done < /tmp/results.list.allfiles
      fi
      Example output:
      # ./xattrSearchComSwiftPickled.sh /mnt/gpfs0/object_fileset/ Tier Silver
      ... creating the list.rule file (/tmp/list.rule) ...
      searching for "Object-Meta-Tier": "Silver"
      ... running policy and creating filelist ...
      ... printing results (from /tmp/results.list.allfiles) ...
      >>> File: /mnt/gpfs0/object_fileset/o/z1device46/objects/15500/02c/f2326f7425a397222c236a1d2a42a02c/1471522653.72124.data

      The above shows a search on pickled data that works with the current, unmodified code. Further down (see Enhancements 3. a and b), you will find a demonstration of how this can be improved with small code changes to the Swift diskfile module.

    2.2 SwiftOnFile storage policy:
      Create a new file and add the following rule, replacing ${key} and ${value} with the metadata key and value to search for:
      RULE EXTERNAL LIST 'allfiles' EXEC '' ESCAPE '% '

      RULE 'ListAll' LIST 'allfiles'
      SHOW( XATTR('user.swift.metadata') )
      WHERE XATTR('user.swift.metadata') IS NOT NULL AND RegEx(XATTR('user.swift.metadata'),
      ['\"X-Object-Meta-${key}\": \"${value}\"'])
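      The rule can be executed as follows, where <path> is the object fileset path and <rulefile> is the file created above:
      /usr/lpp/mmfs/bin/mmapplypolicy <path> -P <rulefile> -s /tmp -f /tmp/results -I defer -L 1 2>/tmp/progress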

      See the following example code snippet, which is proof of concept work:
      #!/usr/lpp/mmfs/bin/mmksh
      # Don't use for production, experimental, demo prototype
      # Lists files whose SwiftOnFile JSON metadata contains "X-Object-Meta-<key>": "<value>"

      # <value> is optional; without it, any value of <key> matches
      if [[ -z $2 ]]; then
        print "Usage: xattrSearchComSwiftJSON <path> <key> [<value>]"
        exit 1
      fi

      typeset path=$1
      typeset key=$2
      typeset value=$3

      rulefile="/tmp/list.rule"

      # clean up the results and rule file of a previous run
      rm -f /tmp/results.list.allfiles
      rm -f $rulefile

      echo "... creating the list.rule file (/tmp/list.rule) ..."

      if [[ -z $3 ]]; then
        value=".*"
      fi

      echo "        searching for \"X-Object-Meta-${key}\": \"${value}\""

      print "
      RULE EXTERNAL LIST 'allfiles' EXEC '' ESCAPE '% '

      RULE 'ListAll' LIST 'allfiles'
      SHOW( XATTR('user.swift.metadata') )
      WHERE XATTR('user.swift.metadata') IS NOT NULL AND RegEx(XATTR('user.swift.metadata'),
      ['\"X-Object-Meta-${key}\": \"${value}\"'])
      " > $rulefile

      echo "... running policy and creating filelist ..."

      # with EXEC '' and -I defer, the matches are written to /tmp/results.list.allfiles;
      # regular output is discarded, progress messages go to /tmp/progress
      /usr/lpp/mmfs/bin/mmapplypolicy $path -P $rulefile -s /tmp -f /tmp/results -I defer -L 1 \
        >/dev/null 2>/tmp/progress

      # the following is an example on how to continue with the results file
      # it just prints out the found files, but the results could also be used for tiering actions
      # by setting storage pool attribute on the files
      echo "... printing results (from /tmp/results.list.allfiles) ..."
      if [[ -e "/tmp/results.list.allfiles" ]]; then
        while read -r line
        do
          # the escaped path name is the last field of each list entry
          filePathNameEnc=$(print -r -- "$line" | awk '{ print $NF }')
          filePathName=$(/usr/lpp/mmfs/bin/mmcmi fullDecode $filePathNameEnc)
          echo ">>> File: ${filePathName}"
          echo ""
        done < /tmp/results.list.allfiles
      fi

      Example Output:
      # ./xattrSearchComSwiftJSON.sh /mnt/gpfs0/obj_Unified-Policy/ Tier Silver
      ... creating the list.rule file (/tmp/list.rule) ...
      searching for "X-Object-Meta-Tier": "Silver"
      ... running policy and creating filelist ...
      ... printing results (from /tmp/results.list.allfiles) ...
      >>> File: /mnt/gpfs0/obj_Unified-Policy/s31791608180z1device1/AUTH_f506278cc70f4486841b15aa8a89349a/demo_cont_unified/demo_unified.object



  3. Enhancements:


  4. Besides searching the metadata directly on disk, another option is to use middleware that connects the object store to an external database, such as Elasticsearch, that stores the metadata. See the following link for an example: IBM Spectrum Scale Object Metadata Search Open Beta
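Back to the ILM approach: the list scripts only print the found files, and their comments note that the results could also drive tiering by setting the storage pool attribute. A minimal sketch of that step, assuming that each line of the list file ends with the %-escaped path name (as the scripts above assume with awk's $NF), that only ASCII characters are escaped, and that `mmchattr -P` is used to reassign the pool. Instead of `mmcmi fullDecode`, it uses a small awk decoder; the function names and the target pool are hypothetical. It only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch: turn an mmapplypolicy list file into storage pool reassignments.
# Assumptions: each line ends with the %-escaped path name, only ASCII is
# escaped, and the target pool name is a placeholder. Prints the commands
# unless APPLY=1 is set.

# Undo %XX escaping (e.g. %20 -> space) of one path name.
decode_path() {  # $1 = escaped path
  printf '%s\n' "$1" | LC_ALL=C awk '{
    out = ""; s = $0; hex = "0123456789ABCDEF"
    while ((i = index(s, "%")) > 0) {
      h = toupper(substr(s, i + 1, 2))
      code = (index(hex, substr(h, 1, 1)) - 1) * 16 + index(hex, substr(h, 2, 1)) - 1
      out = out substr(s, 1, i - 1) sprintf("%c", code)
      s = substr(s, i + 3)
    }
    print out s
  }'
}

tier_listed_files() {  # $1 = list file, $2 = target pool
  while IFS= read -r line; do
    path=$(decode_path "${line##* }")   # last space-separated field
    if [ "${APPLY:-0}" = 1 ]; then
      /usr/lpp/mmfs/bin/mmchattr -P "$2" "$path"
    else
      echo "mmchattr -P $2 $path"
    fi
  done < "$1"
}
```

For example, `tier_listed_files /tmp/results.list.allfiles silver` prints one mmchattr command per found file; review the output before rerunning with APPLY=1.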


Note: This blog reflects my personal view and not necessarily that of my employer.