zPET - IBM Z and z/OS Platform Evaluation and Test - Group home

Defining Data Sources for Hyper Protect Data Controller in zPET

  
In an enterprise environment with large amounts of business data, you often find that data spread across multiple data sources.  While this can be a problem in itself, you must also ensure you are protecting that data while simultaneously maximizing the value you can extract. Hyper Protect Data Controller has allowed us to access our data from a single interface that can protect that data inflight.

In z/OS Platform Evaluation and Test (zPET), we expose three data sources via Hyper Protect Data Controller: Db2, DVM, and IMS. Connecting to each of these data sources from Hyper Protect Data Controller can be as simple as uploading the relevant JDBC client jars and populating the connection details in the Hyper Protect Data Controller policy. In this blog, we will detail how we set up our data source connections and how we then access that data in Jupyter notebooks.

Defining a Connection in an Hyper Protect Data Controller Policy

Before we discuss the specifics of each database connection we have in zPET, we'd first like to touch on the different policy elements that go into creating a database connection, as these will apply to all of our following examples. Below we have one of the Db2 connections from our policy:
<connection name="DB2_BOOKDB">
      <aliases>
        <alias>BOOKDB</alias>
      </aliases>
      <source url="jdbc:db2://zpet.host.name:446/USIBMTESTDB2" driver="com.ibm.db2.jcc.DB2Driver" user="zpetuser" password="ourpassword"/>
</connection>

The first element is the connection itself
which contains the name attribute. The name of the connection should be a unique name for the connection and shouldn't be the exact name of the alias we will define later. To understand why this is so, we also need to understand the <aliases> section.

As noted in the documentation, an alias must be defined as a valid schema for the connection. Certain Hyper Protect Data Controller actions only require that you provide  the connection name in order to interact with the data, while others (like the query examples we will expand on below) require only an alias. Having a connection name that is unique and different from the alias (schema) helps us avoid confusion when connecting to our databases.

Next, we have the <source> element which contains the URL or JDBC string, driver, user, and password attributes. The URL and driver will be unique for each type of database that you're connecting to. If you're unsure what the values for URL and driver should be, it's best to discuss with your database administrator. More information on these elements can be found in the official Hyper Protect Data Controller documentation. For the user attribute, this can be any ID that has access to the desired database. For the password, the value in the policy is not the user's password itself, instead you must first define the password via the /passwords API. When doing so, you provide a passwordName and the actual password; the former is then used in the connection definition within the policy. More information on passwords can be found here.

Now that we have a good understanding of the what makes up a database connection in an Hyper Protect Data Controller policy, we can take a look at examples for the three databases we utilize in zPET. 

Defining a Db2 Connection

Db2 holds the majority of business data in zPET. Not only are Db2 databases used by actively running workloads, we also maintain historical data in Db2 data warehouses. We have already discussed Hyper Protect Data Controller's role in one of our Db2 based workloads in a previous blog, so now we would like to discuss how exactly we connect to Db2 through the Hyper Protect Data Controller policy.

Required JARs

  • db2jcc4.jar
  • db2jcc_license_cisuz.jar

Establishing a connection

<connection name="DB2_BOOKDB">
      <aliases>
        <alias>BOOKDB</alias>
      </aliases>
      <source url="jdbc:db2://zpet.host.name:446/USIBMTESTDB2" driver="com.ibm.db2.jcc.DB2Driver" user="zpetuser" password="ourpassword"/>
</connection>
<connection name="DB2_BOOKWH">
      <aliases>
        <alias>BOOKWH</alias>
      </aliases>
      <source url="jdbc:db2://zpet.host.name:446/USIBMTESTDB2" driver="com.ibm.db2.jcc.DB2Driver" user="zpetuser" password="ourpassword" />
</connection>

In the above policy snippet you can see that we have
two <connection> tags that resolve to different Db2 connections. We define two connections because we have two different Db2 schemas that we want to access. As we previously discussed, the <alias> element allows us to specify schema, and because only one <alias> is allowed per <connection> we must define two distinct connections. Both schemas reside within the same Db2 database and are therefore defined to use the same JDBC string, as well as the same Db2 driver JAR and username/password combination. Now that we have a connection established, we can attempt to connect to Db2 through Hyper Protect Data Controller.

Query Examples

For the following query examples (as well as those in the following sections of this blog) we'll be utilizing Jupyter notebooks (if you're not familiar with Jupyter notebooks, we highly suggest learning more about them here)

First we do some initial setup, importing the necessary packages and setting up some variables that will be used throughout the project. Please note that while we won't highlight this section again, it is used by all the following code snippets. We keep queries for all of our databases in one notebook, so therefore if you split your queries into multiple notebooks you would need to copy this portion into each.
import impala
from impala.dbapi import connect
from impala.util import as_pandas

LDAP_USER = "APPUSR"
LDAP_USER_PW = "appusrpw"
HPVS_HOST_IP = "hpdc.host.name"
BASE_URL = "https://%s:19443" % HPVS_HOST_IP

The following
snippet is what actually executes the query against the database. We make use of the connect package from impala.dbapi to facilitate the connection and as_pandas from impala.util in order to parse the result set
. This is certainly not the only method that can be used to query data, however we have found that this is what works best for us.
DB2_QUERY = "select * from BOOKDB.STORE limit 10"

print(DB2_QUERY)

conn = connect(host=HPVS_HOST_IP,
               port=10010,
               user=LDAP_USER,
               password=LDAP_USER_PW,
               use_ssl=True,
               auth_mechanism='PLAIN')

cur = conn.cursor()
cur.execute(DB2_QUERY)
results_pandas = as_pandas(cur)
print(results_pandas)


Output

select * from BOOKDB.STORE limit 10

  STORE_ID                     ADDRESS         CITY STATE        ZIP TAX_RATE  \

0        9               522 South Rd.  Poughkeesie    NY  12601       0.0000   

1        9            14205 W 52ND AVE       ARVADA    CO  800021505   0.0500   

2        9         14400 E ALAMEDA AVE       AURORA    CO  800122506   0.0500   

3        9             2780 W 72ND AVE  WESTMINSTER    CO  800305503   0.0500   

4        9            1251 W LEHIGH PL    ENGLEWOOD    CO  801103459   0.0500   

5        9         6300 S SYRACUSE WAY    ENGLEWOOD    CO  801116720   0.0500   

6        9  7853 E ARAPAHOE CT  # 2800    ENGLEWOOD    CO  801121361   0.0500   

7        9            7795 S PEORIA ST    ENGLEWOOD    CO  801124102   0.0500   

8        9            16000 HIGHWAY 86        KIOWA    CO  801178415   0.0500   

9        9               1715 WAZEE ST       DENVER    CO  802021231   0.0500   




  MANAGER_FIRST_NAME MANAGER_LAST_NAME PHONE_NUMBER OPENNING_DATE  \

0            DB Team               PET   9144337317    2000-02-01   

1           RONALD B          MAASBERG   3039633902    1999-11-10   

2           DARIUS E             ABADI   3039717389    1999-11-10   

3           MARTIN W              SAAR   3039854525    1999-11-10   

4           HAROLD E          MAASBERG   3039653117    1999-11-10   

5             ANGELA            ABALOS   3039660132    1999-11-10   

6              HENRY           KAAIHUE   3039826507    1999-11-10   

7             MIGUEL            ABAGON   3039864043    1999-11-10   

8            NANCY J              SAAM   3039652395    1999-11-10   

9            RICHARD           AARSTAD   3039684223    1999-11-10   




  LAST_TRANS_TIME CLOSING_DATE REGION_ID STATUS  

0      2000-02-01         null         1      1  

1      1999-11-10         null         1      1  

2      1999-11-10         null         1      1  

3      1999-11-10         null         1      1  

4      1999-11-10         null         1      1  

5      1999-11-10         null         1      1  

6      1999-11-10         null         1      1  

7      1999-11-10         null         1      1  

8      1999-11-10         null         1      1  

9      1999-11-10         null         1      1  

A
point of interest in the above code is the use of the alias BOOKDB. Depending on the database, this may be different from what you're accustomed to when issuing a query. For example, a normal query to Db2 might look like this:
select * from STORE limit 10

As detailed above, in our Hyper Protect Data Controller policy we assigned aliases for our Db2 connections. For our first connection, we defined the alias BOOKDB in our connection object. So to query Db2 through Hyper Protect Data Controller we prepended our alias to the table name, making our final query:
select * from BOOKDB.STORE limit 10

By changing the alias we're using from BOOKDB to BOOKWH (thereby changing the connection we're using), as well as changing the table name, we can query a different table within a different schema by using a similar code snippet:
DB2_QUERY = "select * from BOOKWH.STORE_DIM limit 10"

conn = connect(host=HPVS_HOST_IP,
               port=10010,
               user=LDAP_USER,
               password=LDAP_USER_PW,
               use_ssl=True,
               auth_mechanism='PLAIN')

cur = conn.cursor()
cur.execute(DB2_QUERY)
results_pandas = as_pandas(cur)
print(results_pandas)


Output

  STORE_ID                     ADDRESS         CITY STATE        ZIP TAX_RATE  \

0        9               522 South Rd.  Poughkeesie    NY  12601       0.0000   

1        9            14205 W 52ND AVE       ARVADA    CO  800021505   0.0500   

2        9         14400 E ALAMEDA AVE       AURORA    CO  800122506   0.0500   

3        9             2780 W 72ND AVE  WESTMINSTER    CO  800305503   0.0500   

4        9            1251 W LEHIGH PL    ENGLEWOOD    CO  801103459   0.0500   

5        9         6300 S SYRACUSE WAY    ENGLEWOOD    CO  801116720   0.0500   

6        9  7853 E ARAPAHOE CT  # 2800    ENGLEWOOD    CO  801121361   0.0500   

7        9            7795 S PEORIA ST    ENGLEWOOD    CO  801124102   0.0500   

8        9            16000 HIGHWAY 86        KIOWA    CO  801178415   0.0500   

9        9               1715 WAZEE ST       DENVER    CO  802021231   0.0500   




  MANAGER_FIRST_NAME MANAGER_LAST_NAME PHONE_NUMBER OPENNING_DATE  \

0            DB Team               PET   9144337317    2000-02-01   

1           RONALD B          MAASBERG   3039633902    1999-11-10   

2           DARIUS E             ABADI   3039717389    1999-11-10   

3           MARTIN W              SAAR   3039854525    1999-11-10   

4           HAROLD E          MAASBERG   3039653117    1999-11-10   

5             ANGELA            ABALOS   3039660132    1999-11-10   

6              HENRY           KAAIHUE   3039826507    1999-11-10   

7             MIGUEL            ABAGON   3039864043    1999-11-10   

8            NANCY J              SAAM   3039652395    1999-11-10   

9            RICHARD           AARSTAD   3039684223    1999-11-10   




  LAST_TRANS_TIME CLOSING_DATE REGION_ID STATUS  

0      2000-02-01         null         1      1  

1      1999-11-10         null         1      1  

2      1999-11-10         null         1      1  

3      1999-11-10         null         1      1  

4      1999-11-10         null         1      1  

5      1999-11-10         null         1      1  

6      1999-11-10         null         1      1  

7      1999-11-10         null         1      1  

8      1999-11-10         null         1      1  

9      1999-11-10         null         1      1
As we have seen, once the connection is established, it is quite easy to query the data. In our environment, the Jupyter notebook was running locally on one of our laptops.  Nothing is run directly on Z, yet we are still running secure queries against data that lives on Z - cool!

Defining an IMS Connection

IMS, while certainly less known in the distributed world, is still a major source of enterprise data in Z environments. As such, it is important to know how to connect to and query IMS through Hyper Protect Data Controller. 

Required JARs

  • imsudb.jar
  • imsutm.jar

Establishing a connection

Connecting to IMS is fairly similar to connecting to Db2. Again, it is as simple as taking your existing JDBC connection string and giving the connection a name and alias:
<connection name="zpetims">
 <aliases>
   <alias>CREDIT</alias>
 </aliases>
 <source url="jdbc:ims://zpet.host.name:9995/LISTCRED:dpsbOnCommit=true;treatInvalidDecimalAsNull=true;fetchSize=0;" driver="com.ibm.ims.jdbc.IMSDriver" user="zpetuser" password="ourpassword"/>
</connection>
Here we see essentially the same concepts as in the Db2 connection: a unique connection name, an <alias> used to point to a specific schema, a similar (albeit more complex) JDBC URL, the driver JAR file, and username/password information.

Query Examples

Much like the Db2 example, querying IMS through Hyper Protect Data Controller is fairly straightforward. In fact, it is nearly identical apart from our query and the way we parse the query results. Instead of parsing the query result with as_pandas like we used with Db2, we use cur.fetchall(). This is necessary due to the format of the data IMS returns to Hyper Protect Data Controller, and is simply a syntactical difference to be aware of.
IMS_QUERY = "select * from CREDIT.CCARDSEG FETCH FIRST 10 ROWS ONLY"

conn = connect(host=HPVS_HOST_IP,
               port=10010,
               user=LDAP_USER,
               password=LDAP_USER_PW,
               use_ssl=True,
               auth_mechanism='PLAIN')

cur = conn.cursor()
cur.execute(IMS_QUERY)
print(cur.fetchall())


Output

[
('0 ', '0 ', 'Customer ', 'Anonymous ', '1050', '112', '999999', ' '),
('1 ', '1 ', 'Sawada ', 'Mitsue ', '1050', '112', '999999', ' '),
('10 ', '10 ', 'Pipion ', 'Joseph ', '1050', '112', '999999', ' '),
('100 ', '100 ', 'Oizumi ', 'Shinichiro ', '1050', '112', '999999', ' '),
('1000 ', '1000 ', 'Kim ', 'Sam ', '1050', '112', '999999', ' '),
('10000 ', '10000 ', 'Sloss ', 'N ', '1050', '112', '999999', ' '),
('100000 ', '100000 ', 'Catamas ', 'Scott ', '1050', '112', '999999', ' '),
('1000000 ', '1000000 ', 'Tang ', 'Ben C ', '1050', '112', '999999', ' '),
('10000000 ', '10000000 ', 'Grant ', 'Lawrence H ', '1050', '112', '999999', ' '),
('10000001 ', '10000001 ', 'Royer ', 'M L ', '1050', '112', '999999', ' ')
]
Again, please note that we are using the CREDIT.tableName dot notation to access our zpetims connection via the CREDIT alias. Additionally, as mentioned earlier we are still making use of the initial imports and variables we discussed in the Db2 section of this blog. 

Defining a DVM Connection

DVM provides a way to access a variety of mainframe data sources. Whether your data resides in Db2, MQ, VSAM, IMS, or even regular seuential files you can map it to be accesible and then access it via a virtual table over JDBC. Once this mapping in DVM is performed, you can configure HPDC to protect this data in-flight.

Required JARs

  • dv-jdbc-*.jar
  • log4j-api-*.jar
  • log4j-core-*.jar
Note: we're using * to denote that these jar names can be version-dependent, so you may need to use different version based on your environment and requirements.

Establishing a connection

As we have seen in the previous examples, you typically specify an alias in your query which will inform Hyper Protect Data Controller which connection it should use. Due to the design of DVM, this type of notation is not supported. To avoid assigning an alias to our DVM connection, it must be specified as default connection. In our case, we did not want to make DVM the default connection for our entire environment, so we made it the default connection for a given persona which we know will utilize DVM.

To accomplish this we define the DVM connection as we normally would:
<connection name="DVM">
<aliases>
   <alias>dvma</alias>
 </aliases>
 <source url="jdbc:rs:dv://zpet.host.name:14828;DBTY=DVS;CompressionType=UNCOMPRESSED;MXBU=4MB" driver="com.rs.jdbc.dv.DvDriver" user="zpetuser" password="ourpassword"/>
</connection>

And reference it by name as the default connection in a given persona:
<persona personaName="DS" description="Data Scientist">
 <defaultConnection>DVM</defaultConnection>
 <groups>
   <group>DSGRP</group>
 </groups>
</persona>
Only personas who have DVM defined as the default connection will be able to query DVM, although it can be discovered and described by any persona via the API.

Now that we have a persona which is able to query DVM, we can setup our query much like we have done previously. Again, the query itself is straightforward, but please note that we must specify the appropriate user (DSUSR) to ensure we are using the persona with DVM as the default connection. Additionally, we do not make use of an alias nor the dot notation we saw in previous examples, we simply access the table via its name BOOK_ORDER_FACT_SAMPLE.

LDAP_USER = "DSUSR"
LDAP_USER_PW = "dsusrpw"
DVM_QUERY = "select ORDER_ID, CUSTOMER_ID, PAYMENT from BOOK_ORDER_FACT_SAMPLE limit 10"

conn = connect(host=HPVS_HOST_IP,
               port=10010,
               user=LDAP_USER,
               password=LDAP_USER_PW,
               use_ssl=True,
               auth_mechanism='PLAIN')

cur = conn.cursor()
cur.execute(DVM_QUERY)
results_pandas = as_pandas(cur)
print(results_pandas)

Output

    ORDER_ID CUSTOMER_ID PAYMENT

0  318841260        9999   81.63

1  318841261          99  114.75

2  318841262        9999  183.95

3  318841263       99999  181.08

4  318841264        9999  248.56

5  318841265        9999   77.78

6  318841266        9999  214.77

7  318841267        9999  142.89

8  318841268        9999  238.84

9  318841269        9999  194.15

Conclusion

This brings us to the end of the blog. Herein, we discussed how we setup connections to different databases through Hyper Protect Data Controller and then used those connections to query real data. We talked about the different elements that go into an Hyper Protect Data Controller policy's <connection> element definition, and highlighted some of the not-so-obvious parts. We also gave examples of connections to 3 different databases, as well as how to address some unique situations with them. Hopefully, after reading this blog you have a better understanding of how to create this sort of setup in your own environment. As always, please refer to the main Hyper Protect Data Controller documentation for further information on this subject.

Authors

Trent Balta (Trent.Balta@ibm.com)
Torin Reilly (treilly@us.ibm.com)

References

Hyper Protect Data Controller - Main documentation - https://www.ibm.com/docs/en/hpdc/1.2.2
Hyper Protect Data Controller - JARs information - https://www.ibm.com/docs/en/hpdc/1.2.2?topic=apis-jdbc-jars
Hyper Protect Data Controller - Defining Database Connections - https://www.ibm.com/docs/en/hpdc/1.2.2?topic=details-defining-database-connections
Hyper Protect Data Controller - Passwords - https://www.ibm.com/docs/en/hpdc/1.2.2?topic=apis-passwords
Hyper Protect Data Controller - Describe Policy & Data - https://www.ibm.com/docs/en/hpdc/1.2.2?topic=apis-policies#d6321e907