
IBM-Storage-Scale-CES-S3-datashare-bucket-policy

By RAVI KUMAR KOMANDURI posted Mon July 29, 2024 03:21 AM


Authors: @RAVI KUMAR KOMANDURI @Anandhu Karattuparambil

In the era of AI, data, and analytics workloads, object access is a key differentiator for any product in the industry. Object access helps sectors such as research and academia, retail, financial services, and telecom analyze their businesses and devise innovative strategies to improve their end users' experience. IBM Storage Scale[1] (previously known as IBM Spectrum Scale) introduced a feature known as the "S3 protocol"[2], designed to address such use cases. As data is everywhere, so is the need to maintain a single source of data while making it available across varied environments, consumed in different forms based on usage.

In this blog, our main focus is on how data is shared securely across different S3 users with the help of standard Linux access permissions and bucket policies. In the S3 environment, the S3 users are the end users who run their workloads to upload and download data, which resides under the respective buckets in the underlying IBM Storage Scale file systems. This data can then be shared (as in other use cases) so that the S3 users can perform their visualization and analytics experiments based on their requirements.

Let us now delve into the details of how to achieve this.


Prerequisites:

- Deploy an IBM Storage Scale cluster
- Enable the S3 protocol in the environment (a minimal sketch follows this list)
- Create the accounts and buckets using the mms3 Command Line Interface (CLI)
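
As a rough sketch, enabling the S3 service looks like the following; the exact syntax may differ by release, so verify against the Storage Scale documentation[1]:

# Enable the CES S3 service on the protocol nodes (syntax assumed for 5.2.0)
mmces service enable S3

# Confirm which CES services are enabled and running
mmces service list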

Use case 1: Create two S3 accounts with the same group id (gid) and apply bucket policies

In this use case, we describe how data is shared across S3 users that belong to the same group.
Step 1:

  Create the S3 accounts using the mms3 CLI and verify them as shown below (a sketch of the create commands follows the listing)

mms3 account list
    Name              New Buckets Path               Uid      Gid      User
    ------            -----------------              ---      ---      ----
    account-second    /mnt/gpfs0/account-second      2000     2000     None 
    account-first     /mnt/gpfs0/account-first       1000     2000     None
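
For reference, accounts like the above are created with mms3 account create; the flags shown here are inferred from the listing columns (uid, gid, new buckets path), so treat them as an assumption and verify against the documentation[1]:

mms3 account create account-first --uid 1000 --gid 2000 --newBucketsPath /mnt/gpfs0/account-first
mms3 account create account-second --uid 2000 --gid 2000 --newBucketsPath /mnt/gpfs0/account-second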

 

Step 2:

Create aliases on the application or client node for the respective accounts (account-first, account-second, and later account-third) with their S3 access and secret keys. Use the AWS S3 CLI to perform the uploads and downloads for the S3 users, then create a bucket using the alias for account-first, as shown below

alias aws-account-first='AWS_ACCESS_KEY_ID=<access-key> AWS_SECRET_ACCESS_KEY=<secret-key> aws --endpoint https://<IP:SSL_port> --no-verify-ssl '

aws-account-first s3 mb  s3://bucket-first

 

Note: Aliases in the examples are referred to as "aws-account-*" (i.e., first, second, third) for easier command usage, rather than specifying the entire S3 CLI command with the access and secret keys each time.
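
If you would rather not keep keys in shell aliases, standard AWS CLI named profiles achieve the same effect; a minimal sketch:

# Store the credentials once under a named profile
aws configure set aws_access_key_id <access-key> --profile account-first
aws configure set aws_secret_access_key <secret-key> --profile account-first

# Then reference the profile instead of the alias
aws --profile account-first --endpoint https://<IP:SSL_port> --no-verify-ssl s3 ls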

Step 3:

 Now list the bucket created for the S3 user, using the mms3 CLI command

 

mms3 bucket list bucket-first
    Name            Filesystem Path                         Bucket Owner
    ------          ---------------                         -------------   
    bucket-first    /mnt/gpfs0/account-first/bucket-first   account-first

Step 4:

Upload an object to the bucket via the S3 CLI as the S3 user account-first, using the alias created earlier

aws-account-first s3 cp file1.ppt s3://bucket-first                                                   
    upload: ./file1.ppt to s3://bucket-first/file1.ppt

Step 5:

Even though the S3 user account-second has the same group id (gid), it cannot list the contents of the bucket created by S3 user account-first. Now, using a bucket policy set up by account-first, grant access to all the data in the bucket to users in the same group. A sample bucket policy is shown below

aws-account-first s3api put-bucket-policy --bucket bucket-first --policy '{"Version": "2012-10-17","Statement": [{"Effect": "Allow","Principal": {"AWS": "*"},"Action": ["s3:PutObject","s3:DeleteBucket","s3:DeleteObject","s3:ListBucket","s3:ListAllMyBuckets","s3:GetObject"],"Resource": ["arn:aws:s3:::bucket-first","arn:aws:s3:::bucket-first/*"]}]}'
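
The inline JSON quickly gets unwieldy; the same policy can be kept in a file and applied with the aws CLI's file:// prefix:

# Write the policy document to a local file
cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "*" },
      "Action": [
        "s3:PutObject", "s3:DeleteBucket", "s3:DeleteObject",
        "s3:ListBucket", "s3:ListAllMyBuckets", "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::bucket-first",
        "arn:aws:s3:::bucket-first/*"
      ]
    }
  ]
}
EOF

# Apply it to the bucket
aws-account-first s3api put-bucket-policy --bucket bucket-first --policy file://policy.json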

 

Step 6:

Now retrieve the bucket policy with an s3api call to check that the policy has been applied to the bucket

aws-account-first s3api get-bucket-policy --bucket bucket-first | jq '.Policy | fromjson | .'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "*"
        ]
      },
      "Action": [
        "s3:PutObject",
        "s3:DeleteBucket",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:ListAllMyBuckets",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::bucket-first",
        "arn:aws:s3:::bucket-first/*"
      ]
    }
  ]
}
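
One point worth noting: the wildcard principal ("AWS": "*") allows any authenticated S3 account, so the POSIX group check is the only thing restricting access. Assuming the CES S3 service accepts account names as policy principals (verify this against the documentation[1] for your release), the grant could be narrowed to a single account, for example:

aws-account-first s3api put-bucket-policy --bucket bucket-first --policy '{"Version": "2012-10-17","Statement": [{"Effect": "Allow","Principal": {"AWS": ["account-second"]},"Action": ["s3:GetObject","s3:ListBucket"],"Resource": ["arn:aws:s3:::bucket-first","arn:aws:s3:::bucket-first/*"]}]}'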

Step 7:

As S3 user account-second, now try to upload to and list the contents of the bucket

    aws-account-second s3 ls
    2024-07-16 07:30:14 bucket-first

    aws-account-second s3 cp file2.doc s3://bucket-first
    upload: ./file2.doc to s3://bucket-first/file2.doc

    aws-account-second s3 ls s3://bucket-first
    2024-07-16 08:01:40          125 file1.ppt
    2024-07-16 08:05:59          350 file2.doc

Having worked through the above steps, it is evident that S3 users in the same group (gid) can share data by putting bucket policies in place, giving two layers of security in the S3 world: the POSIX group permissions and the bucket policy.

Use case 2: Create two S3 accounts with different group ids (gid) and apply bucket policies to share data

This scenario is an extension of Use case 1, where the S3 user "account-third" is created using the mms3 CLI with a different group id.

Step 1:

Create an S3 user "account-third" with a different group id (gid)

mms3 account list
    Name              New Buckets Path               Uid      Gid      User
    ------            -----------------              ---      ---      ----
    account-third     /mnt/gpfs0/account-third       3000     3000     None
    account-first     /mnt/gpfs0/account-first       1000     2000     None

Step 2:

S3 user account-third cannot list the contents of the bucket "bucket-first" because it has a different group id (gid). The system administrator needs to set appropriate permissions on the NewBucketsPath of the account-first bucket for this S3 user to access the data

chmod 775 /mnt/gpfs0/account-first/
chmod 775 /mnt/gpfs0/account-first/bucket-first/
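
A quick check confirms the new directory modes:

# Both directories should now show drwxrwxr-x (775), so other users can traverse them
ls -ld /mnt/gpfs0/account-first/ /mnt/gpfs0/account-first/bucket-first/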

Step 3:

Now account-third will be able to list the bucket

  aws-account-third s3 ls
    2024-07-16 07:30:14 bucket-first

 

Note: Even though account-third is able to see bucket-first, it is still unable to list the objects inside it. The reason is that, by default, S3 uploads create objects with mode bits 660, which grant no read permission to users outside the owning group.
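
This is easy to verify from the file system side:

# Objects uploaded via S3 show -rw-rw---- (660): owner and group can read/write, "other" users cannot
ls -l /mnt/gpfs0/account-first/bucket-first/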

Step 4:

Now the sysadmin modifies the permissions on the relevant objects under the S3 user's bucket-first directory to 664, so that users outside the group can read the objects

chmod 664 /mnt/gpfs0/account-first/bucket-first/*

 

Step 5:

Now listing the objects again as account-third displays all the objects in the bucket "bucket-first"

 

     aws-account-third s3 ls s3://bucket-first
          2024-07-16 08:01:40          125 file1.ppt
          2024-07-16 08:05:59          350 file2.doc

This is how S3 users can share data across different users and groups with the help of bucket policies and standard Linux permissions. Other methods exist in the S3 world today and will be explored in future blogs; this blog gives a high-level overview of how sharing is achieved.

Conclusion:

In summary, S3 users can share data among themselves by applying bucket policies at a fine-grained level, combined with the appropriate POSIX permissions on the underlying file system.

References:
[1] https://www.ibm.com/docs/en/storage-scale/5.2.0
[2] https://community.ibm.com/community/user/storage/blogs/madhu-punjabi/2024/04/26/ibm-storage-scale-ces-s3
