Authors: @RAVI KUMAR KOMANDURI @Anandhu Karattuparambil
In the era of AI, Data & Analytics workloads, object access is a key differentiator for any product in the industry. Object access helps sectors such as Research & Academia, Retail, Financial, and Telecom analyze their businesses and come up with innovative strategies to enhance their end users' experience. The IBM Storage Scale product[1] (previously known as IBM Spectrum Scale) introduced a feature, known as the "S3 protocol"[2], designed to address such use cases. As data is everywhere, so is the need to maintain a single source of data while making it available across varied environments, consumed in different forms based on usage.
In this blog, our main focus is on how data is shared securely across different S3 users with the help of standard Linux access permissions and bucket policies. In the S3 environment, the S3 users are the end users who run workloads to upload and download data, which resides under the respective buckets in the underlying IBM Storage Scale filesystems. This data can then be shared (as in other use cases) so that S3 users can perform their visualization and analytics experiments based on their requirements.
Let us now look in more detail at how to achieve this.
Pre-requisites:
- Deploy an IBM Storage Scale cluster
- Enable the S3 protocol in the environment
- Create the Accounts and Buckets using the mms3 Command Line Interface (CLI)
Use case 1: Create two S3 accounts with the same group id (gid) and apply bucket policies
In this use case, we describe how data is shared across S3 users that belong to the same group.
Step 1:
Create S3 accounts using the mms3 CLI as shown below
mms3 account list

Name             New Buckets Path             Uid    Gid    User
------           -----------------            ---    ---    ----
account-second   /mnt/gpfs0/account-second    2000   2000   None
account-first    /mnt/gpfs0/account-first     1000   2000   None
Step 2:
Create aliases on the application or client node for each account (account-first, account-second, and later account-third) with their S3 access and secret keys. Use the AWS S3 CLI to perform the uploads and downloads for the S3 users, and then create a bucket using the alias for account-first as shown below
alias aws-account-first='AWS_ACCESS_KEY_ID=<access-key> AWS_SECRET_ACCESS_KEY=<secret-key> aws --endpoint https://<IP:SSL_port> --no-verify-ssl '
aws-account-first s3 mb s3://bucket-first
Note: In the examples, the aliases are referred to as "aws-account-*" (i.e. first, second, third) for ease of use, instead of specifying the entire S3 CLI command with the access and secret keys each time.
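As a purely hypothetical convenience (not part of the product), the per-account wrappers can also be generated with a small shell function instead of aliases, since aliases do not expand inside scripts by default. The helper name, endpoint, and key values below are placeholders for your environment:

```shell
#!/usr/bin/env bash
# Hypothetical helper: define an "aws-<name>" wrapper as a bash function.
# Endpoint and keys are placeholders; substitute your own values.
make_s3_wrapper() {
    local name=$1 access_key=$2 secret_key=$3
    local endpoint=${S3_ENDPOINT:-https://192.0.2.10:6443}
    eval "aws-${name}() { AWS_ACCESS_KEY_ID=${access_key} AWS_SECRET_ACCESS_KEY=${secret_key} aws --endpoint ${endpoint} --no-verify-ssl \"\$@\"; }"
}

make_s3_wrapper account-first  AKIAFIRSTEXAMPLE  firstsecretexample
make_s3_wrapper account-second AKIASECONDEXAMPLE secondsecretexample

type aws-account-first | head -n 1   # confirms the wrapper is defined
```

With this in place, `aws-account-first s3 mb s3://bucket-first` behaves the same as the alias form shown above.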
Step 3:
Now list the bucket created for the S3 user using the mms3 CLI command
mms3 bucket list bucket-first

Name           Filesystem Path                          Bucket Owner
------         ---------------                          -------------
bucket-first   /mnt/gpfs0/account-first/bucket-first    account-first
Step 4:
Upload an object to the bucket via the S3 CLI as S3 user account-first, using the alias created earlier
aws-account-first s3 cp file1.ppt s3://bucket-first
upload: ./file1.ppt to s3://bucket-first/file1.ppt
Step 5:
Even though S3 user account-second has the same group id (gid), it cannot list the contents of the bucket created by S3 user account-first. Now, using a bucket policy set up by account-first, grant access to all the data in the bucket for users in the same group. A sample bucket policy is shown below
aws-account-first s3api put-bucket-policy --bucket bucket-first --policy '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": [
                "s3:PutObject",
                "s3:DeleteBucket",
                "s3:DeleteObject",
                "s3:ListBucket",
                "s3:ListAllMyBuckets",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::bucket-first",
                "arn:aws:s3:::bucket-first/*"
            ]
        }
    ]
}'
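For larger policies, inline JSON on the command line becomes hard to edit and review. One alternative sketch (standard AWS CLI usage; the file name is our choice) keeps the policy in a file and passes it with the file:// prefix:

```shell
# Write the same bucket policy to a file for easier editing and review.
cat > bucket-first-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"AWS": "*"},
      "Action": [
        "s3:PutObject", "s3:DeleteBucket", "s3:DeleteObject",
        "s3:ListBucket", "s3:ListAllMyBuckets", "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::bucket-first",
        "arn:aws:s3:::bucket-first/*"
      ]
    }
  ]
}
EOF

# Optional sanity check: confirm the file is valid JSON before applying it.
python3 -m json.tool bucket-first-policy.json > /dev/null && echo "policy JSON OK"

# Apply it (requires the aws-account-first alias from Step 2):
# aws-account-first s3api put-bucket-policy --bucket bucket-first \
#     --policy file://bucket-first-policy.json
```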
Step 6:
Now list the bucket policy with an s3api call to verify that the policy has been applied to the bucket
aws-account-first s3api get-bucket-policy --bucket bucket-first | jq '.Policy | fromjson | .'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "*"
        ]
      },
      "Action": [
        "s3:PutObject",
        "s3:DeleteBucket",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:ListAllMyBuckets",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::bucket-first",
        "arn:aws:s3:::bucket-first/*"
      ]
    }
  ]
}
Step 7:
As S3 user account-second, now try to upload to and list the contents of the bucket
aws-account-second s3 ls
2024-07-16 07:30:14 bucket-first

aws-account-second s3 cp file2.doc s3://bucket-first
upload: ./file2.doc to s3://bucket-first/file2.doc

aws-account-second s3 ls s3://bucket-first
2024-07-16 08:01:40        125 file1.ppt
2024-07-16 08:05:59        350 file2.doc
Having worked through the steps above, it is evident that S3 users can share data within the same group (gid) by having bucket policies in place, giving a two-step security model in the S3 world: POSIX group permissions plus bucket policies.
Use case 2: Create two S3 accounts with different group ids (gid) and apply bucket policies to share data
This scenario is an extension of Use case 1, where an S3 user "account-third" is created using the mms3 CLI with a different group id.
Step 1:
Create an S3 user "account-third" with a different group id (gid)
mms3 account list

Name            New Buckets Path            Uid    Gid    User
------          -----------------           ---    ---    ----
account-third   /mnt/gpfs0/account-third    3000   3000   None
account-first   /mnt/gpfs0/account-first    1000   2000   None
Step 2:
S3 user account-third cannot list the contents of the bucket "bucket-first", as it has a different group id (gid). The system administrator needs to set appropriate permissions on the New Buckets Path of the "account-first" user bucket for this S3 user to access the data
chmod 775 /mnt/gpfs0/account-first/
chmod 775 /mnt/gpfs0/account-first/bucket-first/
Step 3:
Now account-third is able to list the bucket
aws-account-third s3 ls
2024-07-16 07:30:14 bucket-first
Note: Even though account-third can now access bucket-first, it is still unable to list the objects inside it. The reason is that, by default, S3 uploads create objects with mode bits 660.
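The effect of those default mode bits can be illustrated with a quick sketch on any Linux box (a temporary file stands in for an object; nothing here is Scale-specific):

```shell
# Illustrate why mode 660 hides a file from users outside the owning group,
# while 664 grants them read access.
f=$(mktemp)

chmod 660 "$f"
stat -c '%a' "$f"    # 660: owner rw, group rw, no access for "other"

chmod 664 "$f"
stat -c '%a' "$f"    # 664: "other" users gain read permission

rm -f "$f"
```

With 660, the permission string is `-rw-rw----`, so a user who is neither the owner nor in the owning group cannot read the object even when the directories above it are traversable.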
Step 4:
Now the sysadmin modifies the permissions on the relevant objects in the bucket-first directory to 664, so that users in other groups can read the objects
chmod 664 /mnt/gpfs0/account-first/bucket-first/*
Step 5:
Listing the objects again as account-third now displays all the objects in the bucket "bucket-first"
aws-account-third s3 ls s3://bucket-first
2024-07-16 08:01:40        125 file1.ppt
2024-07-16 08:05:59        350 file2.doc
This is how S3 users can share data across different users with the help of bucket policies and file-system permissions. Other methods exist in the S3 world today and will be explored in future blogs; this blog gives a high-level overview of how sharing is achieved.
Conclusions:
In summary, S3 users can share data with one another by applying bucket policies at a fine-grained level.
References:
[1] https://www.ibm.com/docs/en/storage-scale/5.2.0
[2] https://community.ibm.com/community/user/storage/blogs/madhu-punjabi/2024/04/26/ibm-storage-scale-ces-s3