IBM Cloud Global


Installing IBM Cloud Pak for Data v4.5 on a Private IBM Cloud Red Hat OpenShift Cluster

By Neela Shah posted Thu September 08, 2022 09:57 AM

  
By:  Neela Shah (neela@us.ibm.com)
        Todd Johnson (toddjohn@us.ibm.com)


Introduction
As security becomes front and center for enterprises, it is important that it is accounted for even when using resources on a public cloud. One way to secure a cluster is to eliminate its access to the public internet. IBM Cloud provides the ability to isolate your IBM Cloud Red Hat OpenShift cluster from the public network. By creating the cluster on IBM Cloud VPC infrastructure, you can specify that it is only accessible from the private VPC network via a private service endpoint. In addition, attaching the cluster worker nodes to VPC subnets that have no public gateway ensures your cluster has no outbound access to the public network. In this setup, the cluster has no inbound or outbound connectivity to the public internet.
Of course, without access to the public network, you must be connected to your VPC's private network to access the cluster. This can be accomplished, among other methods, by using the IBM Cloud VPC VPN service. This fully managed IBM VPN solution provides either a site-to-site or client-to-site VPN connection to your VPC resources, including your IBM Cloud Red Hat OpenShift clusters.
A setup like this ensures cluster access is secured with no access to the public network, but what about being able to use the cluster? Does it provide the same ease of application deployment? Does the user have to do anything special to use the applications that are deployed? This blog will cover how to deploy enterprise software such as IBM Cloud Pak for Data v4.5.x onto your isolated IBM Cloud Red Hat OpenShift Kubernetes Service cluster.
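For reference, a private-only VPC cluster like the one described above can be created with the IBM Cloud CLI. This is a sketch, not a definitive recipe; the cluster name, zone, OpenShift version, VPC ID, and subnet ID below are placeholders for your own values, and the subnet you choose should have no public gateway attached:

```shell
# Create a VPC Gen 2 Red Hat OpenShift cluster with only the
# private cloud service endpoint enabled. Because the subnet has
# no public gateway, the workers get no outbound internet access.
# The version shown is illustrative; list valid versions with
# `ibmcloud oc versions`.
ibmcloud oc cluster create vpc-gen2 \
  --name my-private-roks \
  --zone us-south-1 \
  --version 4.10_openshift \
  --flavor bx2.16x64 \
  --workers 3 \
  --vpc-id <vpc-id> \
  --subnet-id <subnet-id> \
  --disable-public-service-endpoint
```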


Storage options

IBM Cloud Pak for Data supports a couple of different storage options - OpenShift Data Foundation (ODF) as well as Portworx. Ensure that one of these storage solutions is installed on your cluster. 
To install OpenShift Data Foundation (ODF), follow these instructions - https://cloud.ibm.com/docs/openshift?topic=openshift-ocs-storage-prep
To install Portworx on your cluster, see the IBM Cloud documentation.
For the purposes of this blog, we will use Portworx as our storage solution. 
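Whichever solution you pick, it is worth confirming that the storage classes exist before starting the install. A quick check, assuming you are already logged in to the cluster with oc:

```shell
# List the storage classes registered in the cluster; with Portworx
# installed you should see classes such as portworx-rwx-gp3-sc,
# which is the class we select later in this blog.
oc get storageclass
```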


Pre-requisites

  • IBM Cloud Account
  • IBM Cloud Red Hat OpenShift Kubernetes Service cluster with a private-only cloud service endpoint on IBM Cloud VPC infrastructure, with at least 3 worker nodes of flavor 16x64 (16 vCPU, 64 GB RAM) each.   
    • The minimum recommendation for Cloud Pak for Data is 16 cores, 64GB RAM, 1 TB persistent storage. This minimum recommendation is not sufficient to install all of the services. You must ensure that you have sufficient resources for the services that you plan to install. The installation does not verify whether there are sufficient resources on the cluster to install Cloud Pak for Data. You can add additional worker nodes even after the cluster is created. If you are running other applications on your Red Hat OpenShift cluster, ensure that you have sufficient resources on the cluster before you install Cloud Pak for Data. For more information, see System Requirements for IBM Cloud Pak for Data.  We used a 3 node 16x64 cluster with 500GB persistent storage disks on each worker node.  This is enough to install Cloud Pak for Data and the Watson Studio and Watson Machine Learning services for demo purposes.
  • OpenShift Data Foundation (ODF) or Portworx installed on your ROKS cluster
  • A valid IBM Cloud Pak for Data license
We will be installing IBM Cloud Pak for Data v4.5.x via a combination of the CLI and OpenShift Operator Hub. 


Preparing your environment to deploy IBM Cloud Pak for Data v4.5.x
  • Since the cluster we are using has only the private cloud service endpoint enabled, the cluster is running in a restricted network setup with no outbound connectivity. Without outbound network connectivity, the cluster will not be able to build the default Operator Hub catalog sources. So, we will first disable the default catalog sources using this OpenShift command - 
    • `oc patch OperatorHub cluster --type json -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": true}]'`
  • Set up your local environment to run cpd-cli
    • Run the following cpd-cli command after gathering your token and server information. You can get both pieces of information by going to the IBM Cloud console and opening the OpenShift console. At the top right, click on the `IAMxxx` user menu - 
    • Click on the copy login command; on the next page, click `Display Token` and you will find the token and server, e.g. 
    • `oc login --token=<token> --server=<server>`
  • Run the following cpd-cli command, replacing the server and token information you retrieved above. 
                 cpd-cli manage login-to-ocp --token=<token> --server=<server>
  • Set up the environment on your workstation to configure IBM Cloud Pak for Data. 
    • Add the KUBECONFIG variable to cpd_vars.sh. Be sure to add an export for the KUBECONFIG. If your kubeconfig is stored at the default location like ours is, you would add a line like this to the cpd_vars.sh file - 
      • `export KUBECONFIG=~/.kube/config`
    • Run `bash ./cpd_vars.sh` to validate the file and ensure there are no errors
    • If you stored any passwords in the file, prevent others from reading it by running `chmod 700 cpd_vars.sh`
    • Run `source ./cpd_vars.sh` to set up the environment on your local workstation
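A minimal cpd_vars.sh for this walkthrough might look like the following. This is a sketch; the real file from the IBM documentation contains additional variables, but the namespace values here match the projects created later in this blog:

```shell
#!/usr/bin/env bash
# Minimal cpd_vars.sh sketch for this walkthrough.
# Adjust the values to match your own environment.

# Namespaces used later in the install
export PROJECT_CPD_OPS=cpd-operators      # operators namespace
export PROJECT_CPD_INSTANCE=cpd           # Cloud Pak for Data instance namespace

# Release to install
export VERSION=4.5.0

# Kubeconfig used by oc and cpd-cli
export KUBECONFIG=~/.kube/config
```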


Deploying IBM Cloud Pak for Data v4.5.x
If you followed the above steps to prepare your local workstation environment, you are now ready to start deploying IBM Cloud Pak for Data.
  • Create one namespace for the operators and one for the Cloud Pak. Run the following two commands. 
    1. `oc new-project cpd-operators`
    2. `oc new-project cpd`
  • cpd-cli has the ability to run commands in a preview mode so you can check whether the command will hit any issues. So, we will first run the command in preview mode. 
```
cpd-cli manage apply-olm \
--components=cpfs,cpd_platform,scheduler \
--release=4.5.0 \
--upgrade=false \
--cpd_operator_ns=cpd-operators \
--case_download=false \
--catsrc=true \
--sub=true \
--preview=true \
-v
```
  • Once the command preview is successful, you can re-run the same command without the --preview option as below - 
```
cpd-cli manage apply-olm \
--components=cpfs,cpd_platform,scheduler \
--release=4.5.0 \
--upgrade=false \
--cpd_operator_ns=cpd-operators \
--case_download=false \
--catsrc=true \
--sub=true \
-v
```
  • Once the apply-olm command above has completed successfully (it takes a few minutes to get through), set up the two namespaces using the command below. The PROJECT_CPD_INSTANCE and PROJECT_CPD_OPS environment variables come from the cpd_vars.sh file you sourced earlier - 
```
cpd-cli manage setup-instance-ns \
--cpd_instance_ns=${PROJECT_CPD_INSTANCE} \
--cpd_operator_ns=${PROJECT_CPD_OPS}
```
  • Now you should see the IBM Cloud Pak for Data operator show up in the OpenShift console. Go to `Installed Operators` and select `Cloud Pak for Data Platform Operator` so we can create an instance of the operator. Do the following on this screen - 
    • Update the name, as needed. 
    • Select the type of License and accept the License (don't hit Create just yet).

  • We will need to switch to the YAML view to specify the cpd namespace (the form view does not allow this option). Do the following on this screen - 
    • Change the namespace to cpd
    • Make sure the license is still showing as accepted
    • Change the storage class to `portworx-rwx-gp3-sc`
    • Click on Create

  • This will take upwards of an hour or so to finish.  To check on when the Cloud Pak for Data service is ready, go to Administration --> CustomResourceDefinitions. Search for ibmcpd and click on it.
    • Switch to the Instances tab. 
    • Select the instance you created. 
    • Scroll down to the Details section; if the creation has completed successfully, you will see a condition with Type `Successful`. 
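If you prefer the command line, cpd-cli can report the same readiness information. This is a sketch; it assumes the environment variables from cpd_vars.sh are set in your shell:

```shell
# Report the status of the Cloud Pak for Data custom resources
# in the instance namespace (cpd in this walkthrough).
cpd-cli manage get-cr-status \
--cpd_instance_ns=${PROJECT_CPD_INSTANCE}
```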



Accessing Cloud Pak for Data Console 
  • Once the instance is ready as above, to access the Cloud Pak for Data console, let's first get the credentials for the user `admin`. 
    • These can be found by going to Workloads --> Secrets in the OpenShift console. 
    • You will see a secret here called `admin-user-details` (make sure you are in the cpd project). 
    • Select the secret, and at the bottom you will see that the Data section has the user_admin_password. You can use the copy button to copy these credentials. 

  • To get the URL for the Cloud Pak for Data console, go to Networking --> Routes in the left nav and you will see the route that you can use to get to the console. 
    • Click on the URL in the Location field and log in using the credentials you retrieved above for user `admin`. 

  • Once you log in, you should see the Cloud Pak for Data console home page. 
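Both the admin password and the console URL can also be pulled from the command line. This is a sketch; the secret key and route name below match what we observed, but verify them in your own cluster with `oc get secrets -n cpd` and `oc get routes -n cpd`:

```shell
# Decode the generated admin password from the admin-user-details
# secret (the key name can vary by release).
oc extract secret/admin-user-details -n cpd --keys=user_admin_password --to=-

# Print the console URL from the route in the cpd namespace
# (we assume the route is named cpd).
echo "https://$(oc get route cpd -n cpd -o jsonpath='{.spec.host}')"
```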



Installing IBM Cloud Pak for Data v4.5.x services 
Additional services can now be deployed using the cpd-cli by changing the components list. We will walk through installing two additional services, Watson Studio (WS) and Watson Machine Learning (WML), for IBM Cloud Pak for Data so you get an idea of how to go about installing any other services after IBM Cloud Pak for Data is installed. 


Deploying Watson Studio
  • Similar to how we ran the cpd-cli above to install the base product, we will first use the preview option in the CLI to deploy Watson Studio to make sure everything looks good. 
```
cpd-cli manage apply-olm \
--components=ws \
--release=4.5.0 \
--upgrade=false \
--cpd_operator_ns=cpd-operators \
--case_download=true \
--catsrc=true \
--sub=true \
--preview=true \
-v
```

  • Once the preview command completes without any errors, we can remove the preview option and install the Watson Studio operator into our cluster using this command - 
```
cpd-cli manage apply-olm \
--components=ws \
--release=4.5.0 \
--upgrade=false \
--cpd_operator_ns=cpd-operators \
--case_download=true \
--catsrc=true \
--sub=true \
-v
```

  • After the operator is installed in our cluster, we can create an instance of Watson Studio. Go to the installed operator and choose to create an instance of Watson Studio in the cpd namespace. Do the following on this screen - 
    • Update the name, as needed. 
    • Select the type of License and accept the License 

  • Scroll for further options and do the following - 
    • For Storage Vendor, select `portworx`
    • Switch to the yaml view

  • In the YAML view, change the namespace to cpd. This option is not available in the form view. 
    • Remove the quotes around the false value of the ignoreForMaintenance field.
    • Select Create. 

  • This will take approximately 1.5 hours or so to finish.  To check on when the Watson Studio instance is ready, go to Administration --> CustomResourceDefinitions. 
    • Search for ws and click on it. 
    • Switch to the Instances tab. 
    • Select the instance you created. 
    • Scroll down to the Details section; if the creation has completed successfully, you will see a condition with Type `Successful`. 

  • Additionally, you should also see the service marked as Enabled on the services page of the Cloud Pak for Data console. 



Deploying Watson Machine Learning (WML)
  • Similar to how we ran the cpd-cli above to install the base product, we will first use the preview option in the CLI to deploy Watson Machine Learning (WML) to make sure everything looks good. 
```
cpd-cli manage apply-olm \
--components=wml \
--release=4.5.0 \
--upgrade=false \
--cpd_operator_ns=cpd-operators \
--case_download=true \
--catsrc=true \
--sub=true \
--preview=true \
-v
```

  • Once the preview command completes without any errors, we can remove the preview option and install the Watson Machine Learning operator into our cluster using this command - 
```
cpd-cli manage apply-olm \
--components=wml \
--release=4.5.0 \
--upgrade=false \
--cpd_operator_ns=cpd-operators \
--case_download=true \
--catsrc=true \
--sub=true \
-v
```

  • Go to the installed operator and choose to create an instance of WML in the cpd namespace. Do the following on this screen - 
    • Update the name, as needed. 
    • Select the type of License and accept the License 

  • Scroll for further options and do the following - 
    • For Storage Vendor, select `portworx`
    • Make sure to uncheck the ignoreForMaintenance checkbox
    • Switch to the yaml view
  • In the YAML view, change the namespace to cpd. 
    • Remove the quotes around the false value of the ignoreForMaintenance field.
    • Select Create. 

  • This will take approximately 1 hour or so to finish.  To check on when the Watson Machine Learning instance is ready, go to Administration --> CustomResourceDefinitions. 
    • Search for wml and click on WmlBase. 
    • Switch to the Instances tab. 
    • Select the instance you created. 
    • Scroll down to the Details section; if the creation has completed successfully, you will see a condition with Type `Successful`. 

  • Additionally, you should also see the service marked as Enabled on the services page of the Cloud Pak for Data console. 



Conclusion
It is possible to have a secure cluster with no outbound connectivity and still be able to deploy and use valuable products like IBM Cloud Pak for Data as shown above. This blog focuses on using Portworx as the storage solution, but you should be able to instead use OpenShift Data Foundation (ODF) as your storage solution just as easily.

