watsonx.data and IKC: Installing, Enabling, and Integrating IKC on watsonx.data, with Use Cases

By PRAVEEN SOGALAD posted 11 days ago

  

Topic: How to install, enable, and integrate IBM Knowledge Catalog (IKC) with IBM watsonx.data (on-premises)

Prerequisites: An IBM watsonx.data on-premises environment.

Once you have a working IBM watsonx.data on-premises environment (for reference: Steps), proceed with the steps below.
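The installation commands below read their settings from shell environment variables. Here is a minimal sketch of a variables file, assuming bash. The file name `cpd_vars.sh` and the `check_cpd_vars` helper are hypothetical, and every `<...>` value is a placeholder. `VERSION` is set to 4.8.5 only because that is the version the validation output in this post reports. The two storage class names are the usual Red Hat OpenShift Data Foundation defaults; confirm them with `oc get storageclass`.

```bash
#!/usr/bin/env bash
# cpd_vars.sh (hypothetical file name): values for the variables referenced by
# the cpd-cli commands in this post. Replace every <...> placeholder.
export OCP_URL="<ocp-api-server-url>"
export OCP_USERNAME="<ocp-admin-user>"
export OCP_PASSWORD="<ocp-admin-password>"
export VERSION="4.8.5"                                  # assumption: matches the version reported below
export PROJECT_CPD_INST_OPERATORS="<operators-project>"
export PROJECT_CPD_INST_OPERANDS="zen"                  # operands project used in this post
export STG_CLASS_BLOCK="ocs-storagecluster-ceph-rbd"    # usual ODF block storage class
export STG_CLASS_FILE="ocs-storagecluster-cephfs"       # usual ODF file storage class

# Reports each required variable that is empty or still holds a <...> placeholder.
# Returns 0 when all are set, 1 otherwise.
check_cpd_vars() {
  local name missing=0
  for name in OCP_URL OCP_USERNAME OCP_PASSWORD VERSION \
              PROJECT_CPD_INST_OPERATORS PROJECT_CPD_INST_OPERANDS \
              STG_CLASS_BLOCK STG_CLASS_FILE; do
    case "${!name:-}" in
      ""|"<"*)
        echo "Missing or placeholder value: ${name}" >&2
        missing=1 ;;
    esac
  done
  return "${missing}"
}
```

Source the file with `source ./cpd_vars.sh` and run `check_cpd_vars` before step 1, so a forgotten value is caught before any cpd-cli command runs.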

Installing IKC on the IBM watsonx.data on-premises CPD environment

1) Log in to the Red Hat OpenShift Container Platform cluster by using cpd-cli:

cpd-cli manage login-to-ocp \
--username=${OCP_USERNAME} \
--password=${OCP_PASSWORD} \
--server=${OCP_URL}

2) Run the following command to create the required OLM objects for IBM Knowledge Catalog in the operators project for the instance:

cpd-cli manage apply-olm \
--release=${VERSION} \
--cpd_operator_ns=${PROJECT_CPD_INST_OPERATORS} \
--components=wkc

3) Create the custom resource for IBM Knowledge Catalog. Note: The exact command depends on the storage on your cluster. In this example, watsonx.data is installed on Red Hat OpenShift Data Foundation storage.

cpd-cli manage apply-cr \
--components=wkc \
--release=${VERSION} \
--cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} \
--block_storage_class=${STG_CLASS_BLOCK} \
--file_storage_class=${STG_CLASS_FILE} \
--license_acceptance=true

4) Validate the installation:

$ cpd-cli manage get-cr-status \
> --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} \
> --components=wkc
[INFO] 2024-06-20T02:31:51.980774Z Checking architecture: amd64
[INFO] 2024-06-20T02:31:51.980812Z Checking podman or docker
[INFO] 2024-06-20T02:31:52.021113Z Dockerexe: podman
[INFO] 2024-06-20T02:31:52.074703Z Container olm-utils-play-v2 is running already. Image: icr.io/cpopen/cpd/olm-utils-v2:latest
[INFO] 2024-06-20T02:31:52.126621Z Processing subcommand get-cr-status
[INFO] 2024-06-20T02:31:52.126660Z Run command: podman exec -it olm-utils-play-v2 get-cr-status --cpd_instance_ns=zen --components=wkc
Execute script: python3 /opt/ansible/bin/get_cr_status.py --cpd_instance_ns zen --components wkc

Running the get_cr_status.py script. Start of the log.
================================================================

[INFO] Output the result in the JSON format:

{"wkc": [{"CR-kind": "WKC", "CR-name": "wkc-cr", "Namespace": "zen", "Creationtimestamp": "2024-06-18T13:35:16Z", "Version": "4.8.5", "Status": "Completed"}]} 

[INFO] Output the result in the below chart:

Component    CR-kind    CR-name    Namespace    Status     Version    Creationtimestamp     Reconciled-version    Operator-info
-----------  ---------  ---------  -----------  ---------  ---------  --------------------  --------------------  ---------------
wkc          WKC        wkc-cr     zen          Completed  4.8.5      2024-06-18T13:35:16Z  N/A                   N/A 

The get_cr_status.py script ran successfully. End of the log.
================================================================

[SUCCESS] The status of custom resources was saved to '/home/admin/work/cpd-cli-workspace/olm-utils-workspace/work/status.csv'.

[SUCCESS] 2024-06-20T02:31:53.616679Z You may find output and logs in the /home/admin/work/cpd-cli-workspace/olm-utils-workspace/work directory.
[SUCCESS] 2024-06-20T02:31:53.616732Z The get-cr-status command ran successfully.

$ oc get WKC wkc-cr
NAME     VERSION   RECONCILED   STATUS      AGE
wkc-cr   4.8.5     4.8.5        Completed   40h

Verify that all IKC pods are healthy, that is, in the Running or Completed state:

$ oc get pods -n ${PROJECT_CPD_INST_OPERANDS} | grep wkc
c-db2oltp-wkc-db2u-0                                              1/1     Running     0          40h
c-db2oltp-wkc-instdb-6j8sm                                        0/1     Completed   0          40h
wkc-base-roles-init-b94xd                                         0/1     Completed   0          40h
wkc-bi-data-service-5f978b7ff9-2f7gn                              1/1     Running     0          39h
wkc-catalog-api-jobs-6bb854bbd8-xh8hq                             1/1     Running     0          39h
wkc-data-rules-646497b788-7bjq2                                   1/1     Running     0          39h
wkc-db2u-init-cmzf8                                               0/1     Completed   0          40h
wkc-extensions-translations-init-hnx9b                            0/1     Completed   0          40h
wkc-glossary-service-7864b65cbf-knjjs                             1/1     Running     0          39h
wkc-glossary-service-sync-cronjob-28647691-cxhtw                  0/1     Completed   0          64m
wkc-glossary-service-sync-cronjob-28647721-xddr4                  0/1     Completed   0          34m
wkc-glossary-service-sync-cronjob-28647751-5rnf4                  0/1     Completed   0          4m51s
wkc-gov-ui-64dc96c968-vx5qf                                       1/1     Running     0          39h
wkc-mde-service-manager-67cd6fd4c5-72426                          1/1     Running     0          39h
wkc-metadata-imports-ui-5c6fc94899-nrplj                          1/1     Running     0          39h
wkc-post-install-init-vhlpc                                       0/1     Completed   0          39h
wkc-roles-init-wgllk                                              0/1     Completed   0          40h
wkc-search-574cb4bd8c-c2jf8                                       1/1     Running     0          40h
wkc-term-assignment-79ff55994f-pflkk                              1/1     Running     0          39h
wkc-workflow-service-7447f55946-q7jh9                             1/1     Running     0          39h

$ oc get pods -n ${PROJECT_CPD_INST_OPERANDS} | grep db2
c-db2oltp-wkc-db2u-0                                              1/1     Running     0          40h
c-db2oltp-wkc-instdb-6j8sm                                        0/1     Completed   0          40h
wkc-db2u-init-cmzf8                                               0/1     Completed   0          40h

Verify that the IKC (WKC) operator pods are running:

$ oc get pods -n ${PROJECT_CPD_INST_OPERATORS} | grep wkc
ibm-cpd-wkc-operator-796d9c89db-zn2c7                             1/1     Running     0             41h
ibm-cpd-wkc-operator-catalog-fft6p                                1/1     Running     0             41h
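The checks in step 4 are read by eye. A minimal sketch that automates them, assuming bash and awk. The two helpers are hypothetical and only parse the text that `oc` prints. `wkc_cr_ready` finds the STATUS, VERSION, and RECONCILED columns by header name. `unhealthy_pods` assumes the default `oc get pods` column order (NAME, READY, STATUS, ...) shown above.

```bash
#!/usr/bin/env bash
# Succeeds only if every row of `oc get WKC <name>` output has STATUS=Completed
# and a RECONCILED version equal to VERSION. Columns are located by header name.
wkc_cr_ready() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
       {
         rows++
         if ($(col["STATUS"]) != "Completed" || $(col["RECONCILED"]) != $(col["VERSION"])) bad++
       }
       END { if (rows > 0 && bad == 0) exit 0; exit 1 }'
}

# Prints each pod from `oc get pods` output (header optional) that is neither
# Completed nor Running with all containers ready; exits 1 if any are found.
unhealthy_pods() {
  awk 'NR == 1 && $1 == "NAME" { next }
       {
         split($2, ready, "/")   # READY column, e.g. "1/1"
         ok = ($3 == "Completed") || ($3 == "Running" && ready[1] == ready[2])
         if (!ok) { print $1, $2, $3; bad++ }
       }
       END { if (bad > 0) exit 1 }'
}

# Usage (requires an oc session on the cluster):
#   oc get WKC wkc-cr -n "${PROJECT_CPD_INST_OPERANDS}" | wkc_cr_ready && echo "WKC CR reconciled"
#   oc get pods -n "${PROJECT_CPD_INST_OPERANDS}" | grep wkc | unhealthy_pods
#   oc get pods -n "${PROJECT_CPD_INST_OPERATORS}" | grep wkc | unhealthy_pods
```

Because both functions read standard input, they work equally on live `oc` output or on a saved copy of it.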

Enabling IKC on the IBM watsonx.data on-premises CPD environment

Follow the steps below.

1) Log in to the CPD console with the admin ID and password. If you need to retrieve the login details for the CPD console on which IKC and watsonx.data are installed, run the following command:

cpd-cli manage get-cpd-instance-details \
> --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} \
> --get_admin_initial_credentials=true
[INFO] 2024-06-20T03:00:37.857265Z Checking architecture: amd64
[INFO] 2024-06-20T03:00:37.857301Z Checking podman or docker
[INFO] 2024-06-20T03:00:37.922710Z Dockerexe: podman
[INFO] 2024-06-20T03:00:38.009667Z Container olm-utils-play-v2 is running already. Image: icr.io/cpopen/cpd/olm-utils-v2:latest
[INFO] 2024-06-20T03:00:38.091837Z Processing subcommand get-cpd-instance-details
[INFO] 2024-06-20T03:00:38.091903Z Run command: podman exec -it olm-utils-play-v2 get-cpd-instance-details --cpd_instance_ns=zen --get_admin_initial_credentials=true

CPD Url: cpd-zen.apps.6670e8f7359145001ee5b6e3.cloud.techzone.ibm.com
CPD Username: cpadmin
CPD Password: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

[SUCCESS] 2024-06-20T03:00:39.407353Z You may find output and logs in the /home/admin/work/cpd-cli-workspace/olm-utils-workspace/work directory.
[SUCCESS] 2024-06-20T03:00:39.407447Z The get-cpd-instance-details command ran successfully.

2) Select Catalogs > All Catalogs.

3) On the Catalogs page, click New Catalog.

4) On the new catalog page, enter the catalog details, such as the name and description. Make sure to select the Enforce data protection rules check box, and then click Create. If a pop-up window appears, click OK and proceed.

5) The new catalog is created and its page opens. Next, create a connection to IBM watsonx.data: click Add to catalog > Connection. On the New connection page, search for IBM watsonx.data, select it in the results, and click Select.

6) The Create connection: IBM watsonx.data page opens. Enter the connection information required to create the connection.

For reference, see the IBM watsonx.data connection documentation for details on the connection parameters.

  • Complete each section of the form: Connection overview, Connection details, Credentials, Certificates, and Engine connection details.
  • Click Test connection to verify the connection. If it succeeds, the message "The test was successful" is displayed.

  • Click Create. The connection is created and added to the catalog page.

  • Click Add to catalog > Catalog asset. The Add asset from connection page opens. Click Select source.

  • Select the source table you want to import.

  • Click Add. The table (employee, in this example) is added to the catalog.

Integrating IKC with the IBM watsonx.data on-premises CPD environment

  • Log in to the IBM watsonx.data on-premises CPD environment as cpadmin with its password.

  • Click Instances, open the lakehouse console, and select Access control.

  • Select the Integrations tab and click Integrate service.

  • The Integrate service window opens with the IBM Knowledge Catalog service selected.
  • Select the bucket catalogs that IKC should govern.
  • Enter the IKC endpoint and the Zen API key. Generate the ZenApiKey token with the following command:

      echo "<username>:<api_key>" | base64

    For example, for the cpadmin user:

      echo "cpadmin:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" | base64
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

    Note: To obtain an API key for cpadmin, click the user icon at the top right of the lakehouse console and select Profile and settings. Then click API key > Generate new key, and use the generated key in the command above to produce the Zen API key.

  • Click Integrate. A success message appears above the Integrate icon, confirming that the IKC integration succeeded.
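The ZenApiKey value entered in the Integrate service window is the base64 encoding of `<username>:<api_key>`. A minimal sketch of a helper that builds it, assuming bash and GNU coreutils `base64`; the function name is hypothetical. It differs from the `echo` command in two small ways. `printf` does not append a newline, whereas `echo`'s newline ends up inside the encoded value. `tr -d '\n'` removes the line wrapping GNU `base64` inserts after 76 output characters. The `echo` form is what the author used, and the post does not say whether the integration tolerates the extra newline.

```bash
#!/usr/bin/env bash
# Builds a ZenApiKey value: base64("<username>:<api_key>") on a single line.
# make_zen_api_key is a hypothetical helper name, not part of cpd-cli.
make_zen_api_key() {
  local username="$1" api_key="$2"
  # printf adds no trailing newline; tr drops GNU base64's line wrapping
  printf '%s:%s' "${username}" "${api_key}" | base64 | tr -d '\n'
}

# Usage (use the key generated under Profile and settings > API key):
#   make_zen_api_key cpadmin "<api_key>"
```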

     

Governance use cases

After installing, enabling, and integrating IKC with watsonx.data, you can transform or mask data in watsonx.data based on the data protection rules defined in IBM Knowledge Catalog. The current IKC and watsonx.data integration supports only the masking methods (Redact, Obfuscate, and Substitute) and Deny access, as defined by rules in IKC. Support for other data protection rules is expected in future releases.

Use case 1: Substitute. When this rule is defined on columns, the column values are replaced with hashed values for any user or group that does not own the asset (table).

Log in to the IBM Cloud Pak for Data URL with the admin user name (cpadmin) and password, define the rule, and check how the output looks.

  • Log in to the CPD URL as cpadmin with its password.
  • From the home page, select Rules.
  • On the Rules page, select Add rule > New data protection rule.
  • On the New data protection rule page, enter the rule name and business definition, and click Next.
  • Define the rule as follows: if a user who does not own the asset (wxduser in this example) accesses the employee table, the salary column data is replaced with hashed values. Create the rule.
  • Log in to the lakehouse console as a user who does not own the asset (the employee table), run a SELECT on the table from the Query workspace, and verify that the salary column shows hashed values.
  • Log in to the lakehouse console as the admin who owns the asset, run the same SELECT from the Query workspace, and verify that the salary column shows the actual data.
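A Substitute rule swaps each value for a hash. The hypothetical `substitute_value` function below illustrates that idea only and is not IKC's implementation. It uses SHA-256 from GNU coreutils: equal inputs always produce equal digests, and the digest does not reveal the input. The post does not describe which hash IKC uses, whether it is salted, or how the result is formatted.

```bash
#!/usr/bin/env bash
# Illustration only, not IBM Knowledge Catalog's algorithm: replaces a value
# with its SHA-256 hex digest. Equal inputs give equal outputs, and the
# original value cannot practically be recovered from the digest.
substitute_value() {
  printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
}

# Usage with made-up salaries (both 85000 rows map to the same digest):
#   for s in 85000 92000 85000; do substitute_value "$s"; done
```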

Use case 2: Deny. When this rule is defined on an asset, access to the asset (table) is denied for any user or group that does not own it.

  • Log in to the IBM Cloud Pak for Data URL as the admin user (cpadmin) and define this rule by using the same steps as above.
  • Log in to the lakehouse console as a user who does not own the asset (the employee table), run a SELECT on the table from the Query workspace, and verify that access to the table is denied.
  • Log in to the lakehouse console as the admin who owns the asset, run the same SELECT from the Query workspace, and verify that the table data is returned.

Use case 3: Redact. When this rule is defined on columns, the column values are replaced with a string of one repeated character, such as #, for any user or group that does not own the asset (table).

  • Log in to the IBM Cloud Pak for Data URL as the admin user (cpadmin) and define this rule by using the same steps as above.
  • Log in to the lakehouse console as a user who does not own the asset (the employee table), run a SELECT on the table from the Query workspace, and verify that the PHONE_NUMBER column shows repeated # characters.
  • Log in to the lakehouse console as the admin who owns the asset, run the same SELECT from the Query workspace, and verify that the PHONE_NUMBER column shows the actual data.
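Redact, by contrast, replaces the value with a run of one repeated character instead of a hash, so none of the original characters are carried through. The hypothetical `redact_value` function below is an illustration only, not IKC's implementation. It replaces every character with `#` (or a character you pass). Keeping the original length is an assumption of this sketch, because the post does not say whether IKC does.

```bash
#!/usr/bin/env bash
# Illustration only, not IBM Knowledge Catalog's algorithm: replaces every
# character of a value with one mask character (default "#").
redact_value() {
  local value="$1" mask_char="${2:-#}"
  # ${value//?/...} substitutes each single character (pattern "?")
  printf '%s\n' "${value//?/${mask_char}}"
}

# Usage with a made-up phone number:
#   redact_value "555-0100"    # prints ########
```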
 

              

 
  


#watsonx.data
#Catalog
