File and Object Storage


Multi-protocol data sharing across Data Access Services (S3) and CSI

By RAVI KUMAR KOMANDURI posted Wed October 05, 2022 07:03 AM

  
The emergence of AI & Analytics workloads is one of the latest industry trends driving the need for comprehensive data access. IBM Spectrum Scale Data Access Services (DAS) [1] is a modernized, containerized solution built on top of IBM Spectrum Scale Container Native Storage Access [2] that addresses these new industry needs by providing multi-protocol data access with S3 and CSI.


Industries like research & academia, high performance computing, and commercial establishments (retail, banking, etc.) produce petabytes of data that require filtering and preparation before ultimately being analysed to deliver data insights. Storing this data in multiple locations with many versions makes maintaining a single source of truth tedious. This drives the need for IBM's Global Data Platform, powered by Spectrum Scale, which provides a common platform for data sharing across different establishments, entities, and use cases without making copies.


This blog describes one such use of data sharing across two sites using Spectrum Scale Data Access Services S3 (DAS S3) together with a Container Storage Interface (CSI) [3] environment. DAS S3 is an object storage solution that runs in a secure environment on top of the OpenShift Container Platform [4], with the Spectrum Scale file system [5] serving as the backend storage infrastructure. Let us now delve into the details of multi-protocol data access between S3 and CSI (we will refer to the DAS deployment as the DAS Cluster and to the second cluster as the CSI Cluster).

In the DAS Cluster, the S3 access protocol drives the object I/O and acts as an intermediate layer between end users and the underlying Spectrum Scale file system [6]. Each user has a unique pair of access/secret keys, generated by an administrator, which is the mechanism used to upload and download objects in the user's respective directory on the backend Storage Cluster.
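
As a minimal sketch of this mechanism (placeholder endpoint and keys; any S3-compatible client works, the AWS CLI is shown here):

$ export AWS_ACCESS_KEY_ID=<access-key-from-administrator>
$ export AWS_SECRET_ACCESS_KEY=<secret-key-from-administrator>
$ aws --endpoint https://<das-s3-endpoint> s3 ls                            # list the user's buckets
$ aws --endpoint https://<das-s3-endpoint> s3 cp ./data.csv s3://<bucket>   # upload an object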

DAS cluster with Spectrum Scale File system

 

Figure 1: DAS Cluster deployed with the required building blocks, with Spectrum Scale as the underlying backend Storage Cluster.

Note: Two key items stand out in the above figure: 1) access/secret keys, and 2) Multi-Category Security (MCS) labels. These represent, respectively, the user/group and security-level access values in the backend Storage Cluster.


With these data points at hand, the Spectrum Scale file system also serves a second cluster (the CSI Cluster), allowing the data generated by the DAS Cluster to be shared, and vice versa. This cluster is deployed with Spectrum Scale Container Native Storage Access along with the CSI component, and it runs the applications that access the data generated by either cluster using the same set of user credentials.

CSI and DAS cluster share data using the Spectrum Scale Filesystem as the backend

Figure 2: CSI Cluster (left) and DAS Cluster (right) deployed with the required building blocks, with Spectrum Scale as the underlying backend Storage Cluster.

Let us now discuss use cases:

1. Write data from the DAS Cluster, read and write from the CSI Cluster, then re-read from the DAS Cluster

In this use case, an S3 application creates objects (i.e. via S3 PUT) that the DAS Cluster stores as files in the backend Spectrum Scale Cluster. The CSI Cluster then accesses the data on the same mounted volume with the required credentials and updates it, after which the DAS Cluster re-reads the data.

Step-wise procedure
A few steps must be performed on the Storage Cluster, the DAS Cluster, and then the CSI Cluster for the data sharing to occur.

Storage Cluster:

1. Create the backend directory on the Storage Cluster, set SELinux MCS labels along with User id (uid) and Group id (gid)

$ mkdir /mnt/fs1/s3userhpodir-csishare
$ chcon system_u:object_r:container_file_t:s0:c123,c456  /mnt/fs1/s3userhpodir-csishare
$ chown 5007:5200 /mnt/fs1/s3userhpodir-csishare
$ ls -laZd /mnt/fs1/s3userhpodir-csishare
drwxr-xr-x. 2 5007 5200 system_u:object_r:container_file_t:s0:c123,c456 4096 Sep 22 02:50 /mnt/fs1/s3userhpodir-csishare


2. Check the file system and the respective directory path once data is stored (see step 6 below)
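
For example, the file system mount and the directory's ownership and MCS labels can be checked with:

$ mmlsmount fs1 -L                            # confirm fs1 is mounted on the cluster nodes
$ ls -laZd /mnt/fs1/s3userhpodir-csishare     # confirm uid/gid and SELinux MCS labels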

DAS Cluster:

3. Create an account on the DAS Cluster with the uid and gid that were used on the Storage Cluster

$ mmdas account create s3user-hpo --uid 5007 --gid 5200 --newBucketsPath "/mnt/remote-sample/s3userhpodir-csishare"
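
The access and secret keys generated for this account (used to build the alias in the next step) can be retrieved with the mmdas CLI, for example:

$ mmdas account list s3user-hpo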

4. Create an S3 alias and use the AWS CLI to create a bucket

$ alias s3u5007='AWS_ACCESS_KEY_ID=lLGtTpuoYEn3aXkYcCcg AWS_SECRET_ACCESS_KEY=t6q+chBxcTrrM/8OxqY3jhHqDid55MhWDlqD2rAU aws --endpoint https://10.49.0.109 --no-verify-ssl s3'
$ s3u5007 mb s3://bucket-hpo-csi-share

5. Upload an object into the bucket with the user credentials and list it

$ s3u5007 cp /root/file-hpo s3://bucket-hpo-csi-share
$ s3u5007 ls s3://bucket-hpo-csi-share
2022-09-22 03:46:29 42 file-hpo

6. Check the data on the backend Storage Cluster

$ ls -laZ /mnt/fs1/s3userhpodir-csishare/bucket-hpo-csi-share/file-hpo
-rw-r-----. 1 5007 5200 system_u:object_r:container_file_t:s0:c123,c456 42 Sep 22 03:46 /mnt/fs1/s3userhpodir-csishare/bucket-hpo-csi-share/file-hpo

CSI Cluster:
7. Create a new project in the cluster and describe the project's namespace

$ oc new-project csi-new-project

$ oc describe ns csi-new-project
Name:    csi-new-project
Labels:        kubernetes.io/metadata.name=csi-new-project
               pod-security.kubernetes.io/audit=privileged
               pod-security.kubernetes.io/audit-version=v1.24
               pod-security.kubernetes.io/warn=privileged
               pod-security.kubernetes.io/warn-version=v1.24
Annotations:
               openshift.io/description:
               openshift.io/display-name:
               openshift.io/requester: kube:admin
               openshift.io/sa.scc.mcs: s0:c26,c10
               openshift.io/sa.scc.supplemental-groups: 1000670000/10000
               openshift.io/sa.scc.uid-range: 1000670000/10000
Status:    Active

8. Set the namespace MCS label to match the label on the Spectrum Scale file system directory (e.g. s0:c123,c456) [7]

$ oc annotate namespace csi-new-project --overwrite openshift.io/sa.scc.mcs="s0:c123,c456"
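
The overwritten annotation can be verified before creating any volumes, for example:

$ oc describe ns csi-new-project | grep sa.scc.mcs
openshift.io/sa.scc.mcs: s0:c123,c456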

9. Create the static PV [8] for the Spectrum Scale file system directory using CSI and apply it

$ cat pv-hpocsi-share.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: static-scale-static-pv-hpocsi-demo
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  csi:
    driver: spectrumscale.csi.ibm.com
    volumeHandle: 0;0;11068626594609331835;2B340B0A:62D64D11;;;/mnt/remote-sample/s3userhpodir-csishare/bucket-hpo-csi-share

$ oc apply -f ./pv-hpocsi-share.yaml
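
A note on the volumeHandle above: it encodes the Storage Cluster ID and the file system UID, followed by the path to be shared [7] [8]. As a sketch, both values can be looked up on the Storage Cluster:

$ mmlscluster | grep 'GPFS cluster id'        # cluster ID, e.g. 11068626594609331835
$ mmlsfs fs1 --uid                            # file system UID, e.g. 2B340B0A:62D64D11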

10. Create the PVC that will bind to the PV

$ cat pvc-hpocsi-share.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: static-pvc-hpocsi-demo
spec:
  volumeName: static-scale-static-pv-hpocsi-demo
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi

$ oc apply -f ./pvc-hpocsi-share.yaml
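
Before creating the application pod, confirm the claim is bound, for example:

$ oc get pvc static-pvc-hpocsi-demo           # STATUS should report Bound to the PV above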

11. Create the application pod in the namespace with the uid and gid set in the runAsUser and runAsGroup fields

$ cat pod-create-hpo-csi.yaml
apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo-rkomandu
spec:
  containers:
  - name: sec-ctx-demo-2
    image: busybox:1.28
    command: [ "sh", "-c", "sleep 1h" ]
    securityContext:
      runAsUser: 5007
      runAsGroup: 5200
    volumeMounts:
    - name: mypvc
      mountPath: /tmp/busy-box
  volumes:
  - name: mypvc
    persistentVolumeClaim:
      claimName: static-pvc-hpocsi-demo
      readOnly: false

$ oc apply -f ./pod-create-hpo-csi.yaml
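
The pod should reach the Running state before logging in, for example:

$ oc get pod security-context-demo-rkomandu   # wait for STATUS Running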

12. Log in to the application pod and verify that the object uploaded from the DAS Cluster can be read from the CSI Cluster pod; then update the file's content and read it back

$ oc exec --stdin --tty security-context-demo-rkomandu -- /bin/sh

$ df -h /tmp/busy-box
Filesystem Size Used Available Use% Mounted
remote-sample 2.9T 10.4G 2.9T 0% /tmp/busy-box

$ ls -lrt /tmp/busy-box
total 1
-rw-r----- 1 5007 5200 42 Sep 22 10:46 file-hpo

$ cat /tmp/busy-box/file-hpo
this is new file created from DAS Cluster

$ echo "add this sentence from application pod on the CSI cluster" >> /tmp/busy-box/file-hpo

$ ls -lrt /tmp/busy-box/file-hpo
-rw-r----- 1 5007 5200 100 Sep 22 12:33 /tmp/busy-box/file-hpo

$ cat /tmp/busy-box/file-hpo
this is new file created from DAS Cluster
add this sentence from application pod on the CSI cluster

DAS Cluster:

13. Using the AWS CLI, download the object from the previously created bucket using the same uid/gid credentials and read the content

$ s3u5007 cp s3://bucket-hpo-csi-share/file-hpo /tmp
download: s3://bucket-hpo-csi-share/file-hpo to ../tmp/file-hpo

$ cat /tmp/file-hpo
this is new file created from DAS Cluster
add this sentence from application pod on the CSI cluster

Overall, this shows that the data shared between the two clusters can be read and updated from either side.

2. Write data from the CSI Cluster, read and update from the DAS Cluster, then re-read from the CSI Cluster

In this use case, we create a file from the CSI Cluster that gets stored in the backend Spectrum Scale Cluster. The DAS Cluster then accesses the object via the AWS CLI and updates the data using the required credentials. After the updated object is uploaded from the DAS Cluster, the same data can be read again from the CSI Cluster.

Step-wise procedure
A few steps must be performed on the Storage Cluster, the CSI Cluster, and then the DAS Cluster for the data sharing to occur.

Storage Cluster:

1. Create the backend directory on the Storage Cluster, set the SELinux MCS labels, and set the user id (uid) and group id (gid)

$ mkdir /mnt/fs1/csiuserdir-hposhareaccess
$ chcon system_u:object_r:container_file_t:s0:c123,c456 /mnt/fs1/csiuserdir-hposhareaccess
$ chown 5008:5300 /mnt/fs1/csiuserdir-hposhareaccess
$ ls -laZd /mnt/fs1/csiuserdir-hposhareaccess
drwxr-xr-x. 2 5008 5300 system_u:object_r:container_file_t:s0:c123,c456 4096 Sep 22 10:45 /mnt/fs1/csiuserdir-hposhareaccess

CSI Cluster:

2. Create a new project

$ oc new-project csi-hpo-data-share-namespace

$ oc describe ns csi-hpo-data-share-namespace
Name:        csi-hpo-data-share-namespace
Labels:      kubernetes.io/metadata.name=csi-hpo-data-share-namespace
                   pod-security.kubernetes.io/audit=privileged
                   pod-security.kubernetes.io/audit-version=v1.24
                   pod-security.kubernetes.io/warn=privileged
                   pod-security.kubernetes.io/warn-version=v1.24
Annotations: 
                   openshift.io/description:
                   openshift.io/display-name:
                   openshift.io/requester: kube:admin
                   openshift.io/sa.scc.mcs: s0:c26,c20
                   openshift.io/sa.scc.supplemental-groups: 1000690000/10000
                   openshift.io/sa.scc.uid-range: 1000690000/10000
Status:       Active

3. Set the MCS label of the created project namespace to match the Spectrum Scale file system directory (e.g. s0:c123,c456)

$ oc annotate namespace csi-hpo-data-share-namespace --overwrite openshift.io/sa.scc.mcs="s0:c123,c456"
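
As in use case 1, the annotation can be verified, for example:

$ oc describe ns csi-hpo-data-share-namespace | grep sa.scc.mcs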

4. Create the static PV [8] for the Spectrum Scale File system directory using CSI

$ cat pv-csiuser-hposhare.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: scale-static-pv-csiuser-hpo-demo
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  csi:
    driver: spectrumscale.csi.ibm.com
    volumeHandle: 0;0;11068626594609331835;2B340B0A:62D64D11;;;/mnt/remote-sample/csiuserdir-hposhareaccess

$ oc apply -f ./pv-csiuser-hposhare.yaml
  persistentvolume/scale-static-pv-csiuser-hpo-demo created

5. Create the PVC that will bind to the PV

$ cat pvc-hpocsi-share.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: static-pvc-csiuser-hpo-demo
spec:
  volumeName: scale-static-pv-csiuser-hpo-demo
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi

$ oc apply -f ./pvc-hpocsi-share.yaml
persistentvolumeclaim/static-pvc-csiuser-hpo-demo created

6. Create the application pod in the namespace with the uid and gid set in the runAsUser and runAsGroup fields

$ cat pod-create-csiuser-hposhare.yaml
apiVersion: v1
kind: Pod
metadata:
  name: security-csiuser-context-demo-rkomandu
spec:
  containers:
  - name: sec-ctx-csiuser-demo-2
    image: busybox:1.28
    command: [ "sh", "-c", "sleep 1h" ]
    securityContext:
      runAsUser: 5008
      runAsGroup: 5300
    volumeMounts:
    - name: mypvc
      mountPath: /tmp/busy-box-2
  volumes:
  - name: mypvc
    persistentVolumeClaim:
      claimName: static-pvc-csiuser-hpo-demo
      readOnly: false

$ oc apply -f ./pod-create-csiuser-hposhare.yaml
pod/security-csiuser-context-demo-rkomandu created

7. Log in to the application pod, then create a directory and a file inside it

$ oc exec --stdin --tty security-csiuser-context-demo-rkomandu -- /bin/sh
$ df -h /tmp/busy-box-2
Filesystem                Size      Used Available Use% Mounted on
remote-sample           2.9T     10.4G      2.9T   0% /tmp/busy-box-2
$ mkdir /tmp/busy-box-2/csi-createdir
$ ls -ld /tmp/busy-box-2/csi-createdir
drwxr-xr-x    2 5008     5300          4096 Sep 22 18:22 /tmp/busy-box-2/csi-createdir
$ echo "this is new file created from CSI Cluster application pod" > /tmp/busy-box-2/csi-createdir/file-csi
$ ls -l /tmp/busy-box-2/csi-createdir/file-csi
-rw-r--r--    1 5008     5300            64 Sep 22 18:23 /tmp/busy-box-2/csi-createdir/file-csi

DAS Cluster:

8. Create an S3 account on the DAS Cluster that matches the uid and gid of the directory created on the backend Storage Cluster and accessed from the CSI Cluster

$ mmdas account create s3user-csi --uid 5008 --gid 5300 --newBucketsPath /mnt/remote-sample/csiuserdir-hposhareaccess

9. Create an S3 export for the directory that was created from the CSI Cluster.

$ mmdas export create newbucket-forhpo5008user --filesystemPath "/mnt/remote-sample/csiuserdir-hposhareaccess/csi-createdir"

Note: DAS S3 exports present directories and files stored in Spectrum Scale as S3 buckets and S3 objects to S3 applications.
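
Existing exports can be listed to confirm the new bucket is visible, for example:

$ mmdas export list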

10. Using the AWS CLI, create an alias for the respective user, download the object from the bucket using the access and secret keys, read and update the content, and upload it back

$ alias s3u5008='AWS_ACCESS_KEY_ID=XYZLGtTpuoYEn3aXkYc AWS_SECRET_ACCESS_KEY=p9Q+chBxcTrrM/8OxqY3jhHqDid55MhWDlqD2rAU aws --endpoint https://10.49.0.109 --no-verify-ssl s3'
$ s3u5008 ls s3://newbucket-forhpo5008user
2022-09-22 11:23:41         64 file-csi
$ s3u5008 cp s3://newbucket-forhpo5008user/file-csi /tmp
download: s3://newbucket-forhpo5008user/file-csi to ../tmp/file-csi
$ cat /tmp/file-csi
this is new file created from CSI Cluster application pod
$ echo "content is written from hpo user" >> /tmp/file-csi
$ s3u5008 cp /tmp/file-csi s3://newbucket-forhpo5008user
upload: ../tmp/file-csi to s3://newbucket-forhpo5008user/file-csi

CSI Cluster:

11. Log in to the CSI Cluster application pod and read the content of the file
$ oc exec --stdin --tty security-csiuser-context-demo-rkomandu -- /bin/sh
$ ls -lrt /tmp/busy-box-2/csi-createdir/
total 1
-rw-r-----    1 5008     5300            97 Sep 22 18:40 file-csi
$ cat /tmp/busy-box-2/csi-createdir/file-csi
this is new file created from CSI Cluster application pod
content is written from hpo user

With these steps, it is evident that two-way access between the DAS Cluster and the CSI Cluster is operational with Spectrum Scale as the backend Storage Cluster.

Conclusion

This blog has described multi-protocol data access using the DAS and CSI cluster solutions, while avoiding versioning, copying, and data migration. Further work will continue as we expand the data access capabilities of IBM's Global Data Platform for unstructured data.

References

 [1] https://www.ibm.com/docs/en/scalecontainernative?topic=515-spectrum-scale-data-access-services
 [2] https://www.ibm.com/docs/en/scalecontainernative
 [3] https://www.ibm.com/docs/en/spectrum-scale-csi
 [4] https://www.redhat.com/en/technologies/cloud-computing/openshift/container-platform
 [5] https://www.ibm.com/docs/en/spectrum-scale
 [6] https://www.ibm.com/docs/en/scalecontainernative?topic=architecture-data-path
 [7] https://community.ibm.com/community/user/storage/blogs/gero-schmidt1/2022/04/01/advanced-static-volume-provisioning-on-ocp
 [8] https://kubernetes.io/docs/concepts/storage/persistent-volumes/

Additional References

https://community.ibm.com/community/user/storage/blogs/ulf-troppens/2022/05/27/spectrum-scale-data-access-services-das-s3
https://www.ibm.com/docs/en/spectrum-scale-csi?topic=provisioning-creating-persistent-volume-pv
