Using IBM Cloud Object Storage retention enabled vaults

By Nils Haustein posted Sat February 29, 2020 10:39 AM

  

 By Nils Haustein and Harald Seipp

 

In this blog article we explain the basic concepts of IBM Cloud Object Storage retention enabled vaults and show how they are configured and managed. Retention enabled vaults store objects in a WORM (write-once-read-many) manner, which means objects cannot be deleted or changed during the specified or default retention period. The retention period can be predefined at the vault level or set individually for each object. We also provide some coding examples that give an idea of how easy it is to manage objects in a retention enabled vault of an IBM COS system.

 

Motivation

Object storage is the primary data storage provided in the cloud, and it is also increasingly used for on-premises solutions. Object storage is growing for the following reasons:

  • It is designed for scale in many ways (multi-site, multi-tenant, massive amounts of data).
  • It is easy to use and yet meets the growing demands of enterprises for a broad expanse of applications and workloads.
  • It allows users to balance storage cost, location, and compliance control requirements across data sets and essential applications.

 Due to its characteristics, object storage is becoming a significant storage repository for active archive of unstructured data, both for public and private clouds. Some types of archived data have to be kept in a write-once-read-many fashion (WORM) in order to meet regulatory requirements. IBM Cloud Object Storage can be used to archive and protect many different types of unstructured data.

 

Introduction to IBM Cloud Object Storage

 IBM® Cloud Object Storage (IBM COS) is a scalable object storage that can be deployed in the following modes:

  • Private on-premises object storage
  • Dedicated object storage (single tenant)
  • Public object storage (multi-tenant)
  • Hybrid object storage (a mix of on-premises, dedicated or public offerings)

 IBM COS can be accessed through an object interface using HTTP/REST Application Programming Interfaces (API), such as Amazon Web Services Simple Storage Service (AWS S3), OpenStack Swift and the IBM Cloud Object Storage System™ Simple Object over HTTP (SOH) API. Object storage API operations - such as PUT, GET, DELETE and LIST - allow applications to manage objects. The objects are stored in a logical storage construct that is called a Vault. In the world of Amazon Web Services (AWS) a vault corresponds to a bucket.

Starting with IBM COS level 3.12 (released in November 2017), vaults can be configured in compliance mode. A vault configured in compliance mode is also called a retention enabled vault or retention vault. To learn more about the basic concepts, architectures and advantages of IBM COS, have a look at the corresponding IBM Redbooks publications [1] and [5].

 

IBM COS retention enabled vaults

An IBM COS system can provide standard vaults and retention enabled vaults. A vault is a logical storage partition within an IBM COS system, equivalent to a bucket in AWS. Retention enabled vaults enforce retention protection for all objects stored in that vault.

Objects can be stored using either the standard AWS S3 API or an S3-compatible API extension, where the latter provides further retention control [2]. When objects are stored with the standard S3 API, a predefined retention period is applied to each object at the time the object is stored. This predefined retention period is defined at the vault level. With the S3-compatible API extension, the retention period can additionally be given with the API request. In addition, the S3-compatible API extension allows extending the retention period of individual objects and setting and removing legal holds.

Objects stored in a retention enabled vault can be neither deleted nor modified during the retention period. Versioning is disabled by default for retention enabled vaults. After the retention period has expired, objects can be deleted if no legal hold is set on the object. Thus, legal holds provide an additional level of protection for objects and can be used by applications to control the object life cycle.
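The deletion rule described above can be summarized in a few lines of Python. This is only an illustrative sketch of the logic: the function name is_deletable and its parameters are hypothetical and not part of the IBM COS API, and treating the exact expiration instant as already expired is an assumption. The IBM COS system enforces the rule on the server side.

```python
from datetime import datetime, timezone


def is_deletable(now, expiration, legal_holds):
    """Return True if an object may be deleted according to the rule above.

    now, expiration: timezone-aware datetime objects
    legal_holds:     collection of legal hold IDs currently set on the object
    """
    retention_expired = now >= expiration  # assumption: boundary counts as expired
    return retention_expired and len(legal_holds) == 0


expiration = datetime(2020, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
before = datetime(2020, 6, 1, tzinfo=timezone.utc)
after = datetime(2021, 1, 1, tzinfo=timezone.utc)

print(is_deletable(before, expiration, []))          # False: retention still active
print(is_deletable(after, expiration, ['myhold1']))  # False: expired, but legal hold set
print(is_deletable(after, expiration, []))           # True: expired, no legal hold
```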

The IBM COS retention function has been assessed by an independent auditor for compliance with the rules of the US Securities and Exchange Commission (SEC) [4]. An assessment according to European laws and regulations is planned.

 

Configuring a vault with retention enabled

We assume the IBM COS system has already been set up and initialized. To create a retention enabled vault, the system vault protection setting must first be enabled in the system configuration menu of the IBM COS Manager web interface. Select the Configure tab and then select Configure Vault Protection:

[Figure: Vault protection]


To create a vault with retention enabled, launch the vault creation wizard and select Enabled at the Retention setting as shown below:

[Figure: Vault Retention Setting]


The Retention setting Enabled means that the vault is enabled for retention. Once set, this cannot be changed later. All objects stored in this vault will be protected during a retention period. The following retention period limits can be specified:

  • The retention duration is a default retention period. This retention period is applied to objects that are stored without an explicit retention period. A typical scenario for this is when objects are stored using the standard S3 API, which does not allow storing objects with an explicit retention period.
  • The minimum and maximum duration are boundaries for retention periods that can be specified for an object. If an object is stored using the S3-compatible API extension with a retention duration smaller than the minimum duration or greater than the maximum duration, the operation does not succeed and triggers an error response.
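The interplay of default, minimum and maximum duration can be sketched as follows. This is a minimal client-side illustration, not the server implementation: the function name check_retention_period and the vault settings used in the example are hypothetical.

```python
def check_retention_period(requested, minimum, maximum, default):
    """Return the retention period (in seconds) that would apply to an object.

    requested: explicit retention period in seconds, or None if the object is
               stored without one (for example via the standard S3 API)
    """
    if requested is None:
        return default
    if requested < minimum or requested > maximum:
        raise ValueError('retention period %d is outside [%d, %d]'
                         % (requested, minimum, maximum))
    return requested


DAY = 86400
# hypothetical vault settings: minimum 1 day, maximum 365 days, default 30 days
print(check_retention_period(None, DAY, 365 * DAY, 30 * DAY))     # default applies
print(check_retention_period(7 * DAY, DAY, 365 * DAY, 30 * DAY))  # explicit value accepted
```

A request outside the boundaries, for example 10 seconds with the settings above, raises a ValueError in this sketch, mirroring the error response of the system.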

 

Managing vault access

Once a retention enabled vault has been created, users can be entitled to upload objects to and download objects from the vault. This requires creating a user and granting the user access to the vault. Select the Create Account wizard under the Security menu. Enter the username, password, roles and vault access permissions, as shown below:

[Figure: Vault access]

 

In the screen above select Move to Owner or Move to Read/Write in order to grant the user access to the vault named RetentionEnabledVault.

 

Obtain the user credentials

Once the user has been created and granted access to the vault, you have to obtain the user credentials. The user credentials for the S3 API consist of an access key and a secret key. These keys can be obtained from the Security menu: select the user and retrieve the keys as shown below:

[Figure: Access keys]

These user credentials now have to be used for any object storage operation.

 

Some basics about the API

Retention vaults can be used with the standard S3 API. Thus, standard S3 PUT operations can be used to upload an object, GET operations to download an object, HEAD operations to show the object metadata and DELETE operations to delete an object. Deletion, however, is only possible if the object retention period has expired and no legal hold is set. In addition, the standard S3 API does not allow managing retention times. This means objects inherit the default retention period predefined for the vault. Below are some examples for the S3-compatible API extension, which allows managing object retention.

 Important notice: Retention enabled vaults enforce the use of AWS signature version 4. In addition, when uploading objects, the content MD5 has to be provided with the request.

 

To manage object retention periods and legal holds, the S3-compatible API extension has to be used. For example, to PUT an object with a retention period, the following header has to be provided with the PUT request:

PUT /BucketName/ObjectName HTTP/1.1
Host: myBucket.mydsNet.corp.com
Date: Wed, 8 Feb 2017 17:50:00 GMT
Authorization: {authorization-string}
Content-Type: text/plain
Retention-Period: 220752000

The tag Retention-Period specifies the retention period in seconds. Alternatively, the tag Retention-Expiration-Date: <date> can be given, which specifies the expiration date. The date must be in ISO 8601 format, for example using the pattern YYYYMMDDThhmmssZ. For more details about date and time formats see [6].
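To make the two header values more tangible, the following sketch converts a number of days into seconds and formats an expiration date using the pattern YYYYMMDDThhmmssZ. The helper names are hypothetical, and a year is simply assumed to have 365 days; under that assumption the value 220752000 from the request above corresponds to 7 years.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 24 * 60 * 60


def retention_seconds(days):
    """Convert a number of days into a Retention-Period value in seconds."""
    return days * SECONDS_PER_DAY


def retention_expiration_date(start, period_seconds):
    """Format start + period as YYYYMMDDThhmmssZ (UTC)."""
    expiration = start.astimezone(timezone.utc) + timedelta(seconds=period_seconds)
    return expiration.strftime('%Y%m%dT%H%M%SZ')


# 7 years of 365 days each give the value used in the request above
print(retention_seconds(7 * 365))  # 220752000

# expiration date for an object stored at the date of the example request
start = datetime(2017, 2, 8, 17, 50, 0, tzinfo=timezone.utc)
print(retention_expiration_date(start, retention_seconds(7 * 365)))
```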

  

It is also possible to extend the retention period using the POST operation. Here is an example POST request:

POST /BucketName/ObjectName?extendRetention HTTP/1.1
Additional-Retention-Period: 31470552

The tag Additional-Retention-Period specifies the retention period in seconds to be added to the current retention. Alternatively, the tag New-Retention-Expiration-Date: <date> can be given, which specifies the new expiration date. The new expiration date must be later than the current expiration date. The date must be in ISO 8601 format, for example using the pattern YYYYMMDDThhmmssZ (see [6]).
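The two ways of extending the retention can be illustrated with a small calculation. The sketch below only works on date strings in the pattern YYYYMMDDThhmmssZ; the function names are hypothetical, and the actual validation is performed by the IBM COS system.

```python
from datetime import datetime, timedelta

DATE_FORMAT = '%Y%m%dT%H%M%SZ'


def parse_date(value):
    """Parse a date string of the form YYYYMMDDThhmmssZ."""
    return datetime.strptime(value, DATE_FORMAT)


def extended_expiration(current, additional_seconds):
    """Expiration date that results from adding seconds to the current one."""
    new = parse_date(current) + timedelta(seconds=additional_seconds)
    return new.strftime(DATE_FORMAT)


def is_valid_new_date(current, new):
    """A new expiration date is only valid if it is later than the current one."""
    return parse_date(new) > parse_date(current)


print(extended_expiration('20201231T235959Z', 86400))             # 20210101T235959Z
print(is_valid_new_date('20201231T235959Z', '20211231T235959Z'))  # True
print(is_valid_new_date('20201231T235959Z', '20200101T000000Z'))  # False
```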

  

It is also possible to place a legal hold on the object. An object with at least one legal hold cannot be deleted even if the retention period has expired. A legal hold can be set with a PUT operation or after the object has been uploaded using a POST operation. An object can have more than one legal hold. This enables the implementation of different life cycle management policies, where each policy uses its own set of legal hold IDs. To post a legal hold, the following operation can be used:

POST https://{endpoint}/{bucket-name}/{object-name}?legalHold&add={legal-hold-ID}= #path style
POST https://{bucket-name}.{endpoint}/{object-name}?legalHold&add={legal-hold-ID}= # virtual host style

 The tag legal-hold-ID is a user-defined name string given to the legal hold. It must be unique for an object.

  

Likewise a legal hold can be removed using the POST call:

POST https://{endpoint}/{bucket-name}/{object-name}?legalHold&remove={legal-hold-ID}= #path style
POST https://{bucket-name}.{endpoint}/{object-name}?legalHold&remove={legal-hold-ID}= # virtual host style
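The URL patterns for adding and removing a legal hold differ only in the query parameter and the addressing style, so they can be built by one small helper. This is a sketch with a hypothetical function name; it omits the trailing "=" shown in the patterns above, in line with the Python examples later in this article, and percent-encodes the object name and the legal hold ID as a precaution.

```python
from urllib.parse import quote


def legal_hold_url(endpoint, bucket, object_name, action, hold_id,
                   virtual_host=False):
    """Build the URL for adding or removing a legal hold.

    action must be 'add' or 'remove'.
    """
    if action not in ('add', 'remove'):
        raise ValueError("action must be 'add' or 'remove'")
    if virtual_host:
        base = 'https://%s.%s/%s' % (bucket, endpoint, quote(object_name))
    else:
        base = 'https://%s/%s/%s' % (endpoint, bucket, quote(object_name))
    return '%s?legalHold&%s=%s' % (base, action, quote(hold_id, safe=''))


# path style
print(legal_hold_url('cos1', 'mybucket', 'myretobject', 'add', 'myhold1'))
# virtual host style
print(legal_hold_url('cos1', 'mybucket', 'myretobject', 'remove', 'myhold1',
                     virtual_host=True))
```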

  

It is also possible to view the vault protection setting,  i.e. the predefined retention periods (default, minimum and maximum). Here is an example for the corresponding GET operation:

GET https://{endpoint}/{bucket-name}?protection= # path style
GET https://{bucket-name}.{endpoint}?protection= # virtual host style

 

 In the next section we provide some tooling-based examples that illustrate how to use retention enabled vaults.

 

Examples for using retention enabled vault via API

After we have created a vault with retention enabled, we can start using it. Objects can be uploaded, downloaded, listed and deleted using the standard S3 API or the S3-compatible API extension. Retention enabled vaults enforce the use of Content-MD5 and AWS Signature version 4, regardless of whether the standard S3 API or the S3-compatible API extension is used.

Content-MD5 is a mechanism to verify the integrity of data transferred via HTTP. The sender applies the MD5 algorithm to the raw data being transmitted and adds the resulting MD5 checksum, base64-encoded, to the header of the HTTP request. The receiver applies the MD5 algorithm to the received data again and compares the result with the checksum in the HTTP request.
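The following sketch shows both sides of this check using only the Python standard library. The function names are chosen for illustration; the header value is the base64-encoded binary MD5 digest of the data.

```python
import base64
import hashlib


def content_md5(data):
    """Return the Content-MD5 header value for the given bytes."""
    digest = hashlib.md5(data).digest()  # 16 raw bytes, not the hex string
    return base64.b64encode(digest).decode('ascii')


def verify_content_md5(data, header_value):
    """Receiver side: recompute the checksum and compare it with the header."""
    return content_md5(data) == header_value


body = b'This is some object data.'
header = content_md5(body)
print(header)
print(verify_content_md5(body, header))         # True
print(verify_content_md5(body + b'!', header))  # False
```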

With an AWS Signature, HTTP requests sent to the object storage are signed, allowing the receiver (the object storage) to determine the identity of the sender (the object client). AWS Signature version 4 uses the secret key that belongs to the access key to compute the signature. The computed signature must be added to the request header.
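To give an impression of how the signature is computed, the sketch below derives the signing key from the secret key as defined by AWS Signature version 4 and signs a string with it. Building the canonical request and the string to sign is omitted here; in practice a library such as requests-aws4auth takes care of the complete procedure. The secret key, date, region and string to sign in the example are made-up placeholder values.

```python
import hashlib
import hmac


def _hmac_sha256(key, message):
    """HMAC-SHA256 of a text message with a binary key."""
    return hmac.new(key, message.encode('utf-8'), hashlib.sha256).digest()


def derive_signing_key(secret_key, date_stamp, region, service='s3'):
    """Derive the Signature version 4 signing key.

    date_stamp is the request date in the form YYYYMMDD.
    """
    k_date = _hmac_sha256(('AWS4' + secret_key).encode('utf-8'), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, 'aws4_request')


def sign(signing_key, string_to_sign):
    """Return the hex-encoded signature for an already built string to sign."""
    return hmac.new(signing_key, string_to_sign.encode('utf-8'),
                    hashlib.sha256).hexdigest()


# placeholder values for illustration only
key = derive_signing_key('my-secret-key', '20200229', 'us-east-1')
signature = sign(key, 'example string to sign')
print(len(key), len(signature))
```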

 In this chapter we provide examples for using retention enabled vaults with standard tools and some special code that incorporates the S3-compatible API extensions.

 

Using the standard S3 API

When using the standard S3 API, the normal object storage API calls such as PUT, GET, LIST and DELETE can be used. On standard PUT-requests the default retention period configured for the vault is applied for the object. Objects can only be deleted if the retention period has expired. This is checked by the IBM COS system and if the retention period has not expired the DELETE request receives an error 451 indicating “Unavailable for legal reasons”.

The standard S3 API does not allow explicitly setting a retention period for an object, or setting or removing legal holds on objects. It also does not allow showing the retention period and legal holds of an object. To do this, the S3-compatible API extension is required.

To use the standard S3 API, the AWS command line interface can be used [3]. After installing and configuring the tool, it can be used to upload, download and list objects in a retention enabled vault. However, it has no notion of managing retention periods. Below is some guidance for using this tool.

Once the tool has been installed, you have to configure the user credentials. This is done with the following command that will store the credentials in ~/.aws/credentials by default.

# aws configure

 

You can configure multiple users with different credentials. To configure different profiles with different user credentials use the command:

# aws configure --profile profile-name

 

In order to use a particular profile use the option --profile profile-name with the commands shown below.

List all vaults (buckets) configured in the system:

# aws [--profile profile-name] --endpoint-url http://cos1 s3 ls

 
List the content of a particular vault named “myvault”:

# aws [--profile profile-name] --endpoint-url http://cos1 s3 ls s3://myvault

 

Upload a file named “myobject” to the vault “myvault”:

# aws [--profile profile-name] --endpoint-url http://cos1 s3 cp ./myobject s3://myvault

 

List an object:

# aws [--profile profile-name]  --endpoint-url http://cos1 s3 ls s3://myvault/myobject

 

Download an object:

# aws [--profile profile-name]  --endpoint-url http://cos1 s3 cp  s3://myvault/myobject ./myobjectcopy

 

The deletion of an object may fail if the object still has a retention period:

# aws [--profile profile-name] --endpoint-url http://cos1 s3 rm  s3://myvault/myobject

If the retention period of the object has not expired, the delete operation will fail with the following error message:

delete failed: s3://Mirror1/myobject An error occurred (UnavailableForLegalReasons) when calling the DeleteObject operation: Unavailable For Legal Reasons

 

As you can see, the standard API allows basic operations, but it does not have any capability to manage retention periods and legal holds for objects. To find out more about the retention of an object, the S3-compatible API extension should be used.

 

Using the S3-compatible API extension

The S3-compatible API extension allows managing the retention of objects. In this section, we give some coding examples for managing object retention in a retention enabled vault. These examples are based on Python and require the modules requests and requests-aws4auth.

To initialize the AWS Signature version 4 authentication, use the AWS4Auth class provided by the requests_aws4auth module, where the parameters ACCESS_KEY and SECRET_KEY are the user credentials described in the section Obtain the user credentials. The region parameter denotes the region of the object storage:

from requests_aws4auth import AWS4Auth

# AWS4Auth expects the region before the service name
auth = AWS4Auth(ACCESS_KEY, SECRET_KEY, region, 's3')


 The object auth is used within the next examples.

The following example shows how to PUT an object with an explicit retention period. The retention period is specified as a number of seconds. The object with the name myretobject is uploaded to the bucket mybucket via the accesser with the DNS alias cos1:

# This example requires the modules hashlib, base64 and requests
import base64
import hashlib
import requests

# create the object data (bytes, so it can be hashed and sent as-is)
OBJECT_DATA_RET = b'This is some object data.'

# Calculate the Content-MD5 value: base64-encoded MD5 digest of the data
m = hashlib.md5()
m.update(OBJECT_DATA_RET)
content_md5 = base64.b64encode(m.digest()).decode('ascii')

# create header including the retention period in seconds
# (header values must be strings)
ret_time = 172400
headers = {'Content-MD5': content_md5, 'Retention-Period': str(ret_time)}

# Build the URL and send the PUT request using the module requests
ACCESSER_IP = 'cos1'
CONTAINER = 'mybucket'
OBJECT_NAME = 'myretobject'
url = 'http://%s/%s/%s' % (ACCESSER_IP, CONTAINER, OBJECT_NAME)
response = requests.put(url=url, auth=auth, headers=headers, data=OBJECT_DATA_RET)

# The response itself contains the HTTP status code
print(response)

# If the status code is not good (200) then print the response text
if response.status_code > 200:
    print(response.text)

  

Instead of the retention period, the expiration date can be given. The expiration date must be given in ISO 8601 format, for example YYYYMMDDThhmmssZ. The same PUT method as shown above can be used; the only difference is the header tag:

# Calculate content_MD5, this requires modules hashlib and base64
# …

# create header including the expiration date
exp_date = '20201231T235959Z'
headers = {'Content-MD5': content_md5, 'Retention-Expiration-Date': exp_date}

# Build the URL and send the PUT request using the module requests
# see above
# …

Make sure that the object name does not already exist in the vault.

 
The HEAD operation shows some interesting details about the object, including the expiration date and the number of legal holds. The requests module allows reading the header content through the returned response object:

# create the URL for the HEAD request
ACCESSER_IP = 'cos1'
CONTAINER = 'mybucket'
OBJECT_NAME = 'myretobject'
url = 'http://%s/%s/%s' % (ACCESSER_IP, CONTAINER, OBJECT_NAME)
response = requests.head(url=url, auth=auth)

# check the response status and if good then print the header
print(response)
if response.status_code > 200:
  print(response.text)
else:
  # name of the header holding the number of legal holds set on the object
  LEGAL_HOLD_COUNT = 'retention-legal-hold-count'
  print('\nObject Information:')
  print('  Object Name:\t\t%s' % OBJECT_NAME)
  print('  Object Size:\t\t%s %s' % (response.headers['content-length'], response.headers['accept-ranges']))
  print('  Insert Date:\t\t%s' % response.headers['date'])
  print('  Expiration Date:\t%s' % response.headers['retention-expiration-date'])
  # prints None if the header is not present
  print('  Legal Hold Count:\t%s' % response.headers.get(LEGAL_HOLD_COUNT))

  

The next example shows how to extend the retention period using the POST operation. The retention period is given in seconds; in this example we extend the retention by 86400 seconds, which is one day:

# create header with additional retention period tag
# (header values must be strings)
add_time = 86400
headers = {'Additional-Retention-Period': str(add_time)}

# Requests POST Object with custom headers and EXTEND_RETENTION Request
ACCESSER_IP = 'cos1'
CONTAINER = 'mybucket'
OBJECT_NAME = 'myretobject'
url = 'http://%s/%s/%s?%s' % \
      (ACCESSER_IP, CONTAINER, OBJECT_NAME, 'extendRetention')
response = requests.post(url=url, auth=auth, headers=headers)
print(response)

if response.status_code > 200:
  print(response.text)

  

Instead of specifying a retention period to be added to the current expiration date, a new expiration date can be set for the object. The new date must be later than the old date. The difference from the previous example is the header tag:

# create header with new retention expiration date
new_date = '20211231T235959Z'
headers = {'New-Retention-Expiration-Date': new_date}

# Requests POST Object with custom headers and EXTEND_RETENTION Request
ACCESSER_IP = 'cos1'
CONTAINER = 'mybucket'
OBJECT_NAME = 'myretobject'
url = 'http://%s/%s/%s?%s' % \
      (ACCESSER_IP, CONTAINER, OBJECT_NAME, 'extendRetention')
response = requests.post(url=url, auth=auth, headers=headers)
print(response)

  

It is also possible to set one or more legal holds on an object. The object cannot be deleted as long as a legal hold exists. The following example shows how to set a legal hold with the ID myhold1:

# Requests POST Object to add a legal hold
ACCESSER_IP = 'cos1'
CONTAINER = 'mybucket'
OBJECT_NAME = 'myretobject'
legal_hold_id = 'myhold1'
LEGAL_HOLD_REQ = 'legalHold'
url = 'http://%s/%s/%s?%s&add=%s' % \
      (ACCESSER_IP, CONTAINER, OBJECT_NAME, LEGAL_HOLD_REQ, legal_hold_id)
response = requests.post(url=url, auth=auth)
print(response)

if response.status_code > 200:
  print(response.text)

  

Likewise, a legal hold can be removed; the difference from the example above is the request modifier in the URL:

# Requests POST Object to remove a legal hold
ACCESSER_IP = 'cos1'
CONTAINER = 'mybucket'
OBJECT_NAME = 'myretobject'
legal_hold_id = 'myhold1'
LEGAL_HOLD_REQ = 'legalHold'
url = 'http://%s/%s/%s?%s&remove=%s' % \
      (ACCESSER_IP, CONTAINER, OBJECT_NAME, LEGAL_HOLD_REQ, legal_hold_id)
response = requests.post(url=url, auth=auth)
print(response)

if response.status_code > 200:
  print(response.text)

  

As shown above, the S3-compatible API extension provides a comprehensive set of operations to manage retention on objects. It is also possible to show the retention settings of an object, such as the expiration time and legal holds, using a GET request:

# get the retention setting of an object using the legalHold modifier
ACCESSER_IP = 'cos1'
CONTAINER = 'mybucket'
OBJECT_NAME = 'myretobject'
LEGAL_HOLD_REQ = 'legalHold'
url = 'http://%s/%s/%s?%s' % \
      (ACCESSER_IP, CONTAINER, OBJECT_NAME, LEGAL_HOLD_REQ)
response = requests.get(url=url, auth=auth)
print(response.text)

 

The attribute response.text contains the XML-encoded retention details of the object. Find an example below:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><RetentionState xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><CreateTime>Mon, 28 May 2018 19:32:19 GMT</CreateTime><RetentionPeriod>520060</RetentionPeriod><RetentionPeriodExpirationDate>Sun, 03 Jun 2018 20:00:00 GMT</RetentionPeriodExpirationDate><LegalHolds><LegalHold><ID>myhold1</ID><Date>Mon, 28 May 2018 19:35:09 GMT</Date></LegalHold></LegalHolds></RetentionState>

The object in this example was created on 28 May 2018 19:32:19 and has a retention period of 520060 seconds, which corresponds to an expiration date of 03 Jun 2018 20:00:00. In addition, this object has one legal hold named myhold1.
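Applications will usually want to evaluate this XML response rather than print it. The following sketch parses the example response shown above with the ElementTree module of the Python standard library. The function name parse_retention_state and the structure of the returned dictionary are chosen for illustration; the element names and the namespace are taken from the example response.

```python
import xml.etree.ElementTree as ET

# namespace used in the example response
NS = {'s3': 'http://s3.amazonaws.com/doc/2006-03-01/'}

SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<RetentionState xmlns="http://s3.amazonaws.com/doc/2006-03-01/">'
    '<CreateTime>Mon, 28 May 2018 19:32:19 GMT</CreateTime>'
    '<RetentionPeriod>520060</RetentionPeriod>'
    '<RetentionPeriodExpirationDate>Sun, 03 Jun 2018 20:00:00 GMT'
    '</RetentionPeriodExpirationDate>'
    '<LegalHolds><LegalHold><ID>myhold1</ID>'
    '<Date>Mon, 28 May 2018 19:35:09 GMT</Date></LegalHold></LegalHolds>'
    '</RetentionState>'
)


def parse_retention_state(xml_text):
    """Extract the retention details from a RetentionState document."""
    root = ET.fromstring(xml_text.encode('utf-8'))
    holds = [hold.findtext('s3:ID', namespaces=NS)
             for hold in root.findall('s3:LegalHolds/s3:LegalHold', NS)]
    return {
        'create_time': root.findtext('s3:CreateTime', namespaces=NS),
        'retention_period': int(root.findtext('s3:RetentionPeriod', namespaces=NS)),
        'expiration_date': root.findtext('s3:RetentionPeriodExpirationDate',
                                         namespaces=NS),
        'legal_holds': holds,
    }


state = parse_retention_state(SAMPLE)
print(state['retention_period'])  # 520060
print(state['legal_holds'])       # ['myhold1']
```

In a real application the argument would be response.text from the GET request shown earlier.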

 

Summary

IBM COS retention enabled vaults allow storing objects in a write-once-read-many fashion (WORM) as demanded by laws and regulations in many countries. Consequently, IBM COS has been assessed for compliance according to rules of the Securities and Exchange Commission (SEC) of the United States [4]. The assessment of IBM COS retention enabled vaults according to other country laws and regulations is planned.

The addition of retention management functions to the standard S3 API is a bold step that the IBM COS team has taken. It enables application developers who are familiar with the widely accepted S3 API to quickly adopt IBM COS retention enabled vaults and manage archive data in accordance with legal requirements. It also enables users and applications to store all kinds of data in an IBM COS system, including data that must be protected in a WORM fashion.

IBM COS retention enabled vaults are one of the key storage offerings addressing regulatory compliance. For this purpose, the IBM COS team collaborates with key software vendors such as Veritas Enterprise Vault, IBM FileNet and NICE in order to enable applications to use IBM COS retention enabled vaults.

 

Appendix

References

[1] IBM Redpaper: “IBM Cloud Object Storage - Concepts and Architecture"
https://www.redbooks.ibm.com/abstracts/redp5537.html

 [2] IBM COS application programming interface reference:
Go to https://www.ibm.com/docs/en/coss - select the version - from the table of content select Developer Reference and APIs

https://www.ibm.com/support/knowledgecenter/en/STXNRM_3.14.9/coss.doc/csoApi_apireference.html

 [3] Amazon Web Services command line interface

https://aws.amazon.com/cli/

 [4] Assessment report of IBM COS regarding US SEC 17a-4f rules created by Cohasset Associates:

https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=WUL12394USE

Assessment report for IBM COS version 3.14 regarding German, Swiss and French laws:
http://www.kpmg.de/bescheinigungen/RequestReport.aspx?58AAE97EC3894C93BA2BCFE72E03192D

 [5] IBM Redbook: “Cloud Object Storage as a Service: IBM Cloud Object Storage from Theory to Practice - For developers, IT architects and IT specialists”

http://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg248385.html?Open

 [6] W3C date and time formats

https://www.w3.org/TR/NOTE-datetime

 

Disclaimer

This document reflects the understanding of the authors in regard to questions asked about archiving solutions with IBM hardware and software. This document is presented “As-Is” and IBM does not assume responsibility for the statements expressed herein. It reflects the opinions of the author. These opinions are based on several years of joint work with the IBM Systems group. If you have questions about the contents of this document, please direct them to the Author (nils_haustein@de.ibm.com).

 

The coding examples provided herein are for illustrative purposes only. There is no guarantee that these examples work. In addition, IBM does not assume any liability for damages caused by using these coding examples.

 

The following terms are trademarks or registered trademarks of the IBM Corporation in the United States or other countries or both:  IBM logo, IBM Cloud Object Storage.

 

Amazon Web Services (AWS) is a registered trademark of Amazon Inc. in the United States and other countries.

 

Other company, product, and service names may be trademarks or service marks of others.

 
