In an enterprise environment with large amounts of business data, you often find that data spread across multiple data sources. While this can be a problem in itself, you must also ensure you are protecting that data while simultaneously maximizing the value you can extract. Hyper Protect Data Controller has allowed us to access our data from a single interface that can protect that data inflight.
In z/OS Platform Evaluation and Test (zPET), we expose three data sources via Hyper Protect Data Controller: Db2, DVM, and IMS. Connecting to each of these data sources from Hyper Protect Data Controller can be as simple as uploading the relevant JDBC client jars and populating the connection details in the Hyper Protect Data Controller policy. In this blog, we will detail how we set up our data source connections and how we then access that data in Jupyter notebooks.
Defining a Connection in an Hyper Protect Data Controller Policy
Before we discuss the specifics of each database connection we have in zPET, we'd first like to touch on the different policy elements that go into creating a database connection, as these will apply to all of our following examples. Below we have one of the Db2 connections from our policy:
<connection name="DB2_BOOKDB">
<aliases>
<alias>BOOKDB</alias>
</aliases>
<source url="jdbc:db2://zpet.host.name:446/USIBMTESTDB2" driver="com.ibm.db2.jcc.DB2Driver" user="zpetuser" password="ourpassword"/>
</connection>
The first element is the connection itself which contains the name attribute. The name of the connection should be a unique name for the connection and shouldn't be the exact name of the alias we will define later. To understand why this is so, we also need to understand the <aliases> section.
As noted in the documentation, an alias must be defined as a valid schema for the connection. Certain Hyper Protect Data Controller actions only require that you provide the connection name in order to interact with the data, while others (like the query examples we will expand on below) require only an alias. Having a connection name that is unique and different from the alias (schema) helps us avoid confusion when connecting to our databases.
Next, we have the <source> element which contains the URL or JDBC string, driver, user, and password attributes. The URL and driver will be unique for each type of database that you're connecting to. If you're unsure what the values for URL and driver should be, it's best to discuss with your database administrator. More information on these elements can be found in the official Hyper Protect Data Controller documentation. For the user attribute, this can be any ID that has access to the desired database. For the password, the value in the policy is not the user's password itself, instead you must first define the password via the /passwords API. When doing so, you provide a passwordName and the actual password; the former is then used in the connection definition within the policy. More information on passwords can be found here.
Now that we have a good understanding of the what makes up a database connection in an Hyper Protect Data Controller policy, we can take a look at examples for the three databases we utilize in zPET.
Defining a Db2 Connection
Db2 holds the majority of business data in zPET. Not only are Db2 databases used by actively running workloads, we also maintain historical data in Db2 data warehouses. We have already discussed Hyper Protect Data Controller's role in one of our Db2 based workloads in a previous blog, so now we would like to discuss how exactly we connect to Db2 through the Hyper Protect Data Controller policy.
Required JARs
- db2jcc4.jar
- db2jcc_license_cisuz.jar
Establishing a connection
<connection name="DB2_BOOKDB">
<aliases>
<alias>BOOKDB</alias>
</aliases>
<source url="jdbc:db2://zpet.host.name:446/USIBMTESTDB2" driver="com.ibm.db2.jcc.DB2Driver" user="zpetuser" password="ourpassword"/>
</connection>
<connection name="DB2_BOOKWH">
<aliases>
<alias>BOOKWH</alias>
</aliases>
<source url="jdbc:db2://zpet.host.name:446/USIBMTESTDB2" driver="com.ibm.db2.jcc.DB2Driver" user="zpetuser" password="ourpassword" />
</connection>
In the above policy snippet you can see that we have two <connection> tags that resolve to different Db2 connections. We define two connections because we have two different Db2 schemas that we want to access. As we previously discussed, the <alias> element allows us to specify schema, and because only one <alias> is allowed per <connection> we must define two distinct connections. Both schemas reside within the same Db2 database and are therefore defined to use the same JDBC string, as well as the same Db2 driver JAR and username/password combination. Now that we have a connection established, we can attempt to connect to Db2 through Hyper Protect Data Controller.
Query Examples
For the following query examples (as well as those in the following sections of this blog) we'll be utilizing Jupyter notebooks (if you're not familiar with Jupyter notebooks, we highly suggest learning more about them here).
First we do some initial setup, importing the necessary packages and setting up some variables that will be used throughout the project. Please note that while we won't highlight this section again, it is used by all the following code snippets. We keep queries for all of our databases in one notebook, so therefore if you split your queries into multiple notebooks you would need to copy this portion into each.
import impala
from impala.dbapi import connect
from impala.util import as_pandas
LDAP_USER = "APPUSR"
LDAP_USER_PW = "appusrpw"
HPVS_HOST_IP = "hpdc.host.name"
BASE_URL = "https://%s:19443" % HPVS_HOST_IP
The following snippet is what actually executes the query against the database. We make use of the connect package from impala.dbapi to facilitate the connection and as_pandas from impala.util in order to parse the result set. This is certainly not the only method that can be used to query data, however we have found that this is what works best for us.
DB2_QUERY = "select * from BOOKDB.STORE limit 10"
print(DB2_QUERY)
conn = connect(host=HPVS_HOST_IP,
port=10010,
user=LDAP_USER,
password=LDAP_USER_PW,
use_ssl=True,
auth_mechanism='PLAIN')
cur = conn.cursor()
cur.execute(DB2_QUERY)
results_pandas = as_pandas(cur)
print(results_pandas)
Output
select * from BOOKDB.STORE limit 10
STORE_ID ADDRESS CITY STATE ZIP TAX_RATE \
0 9 522 South Rd. Poughkeesie NY 12601 0.0000
1 9 14205 W 52ND AVE ARVADA CO 800021505 0.0500
2 9 14400 E ALAMEDA AVE AURORA CO 800122506 0.0500
3 9 2780 W 72ND AVE WESTMINSTER CO 800305503 0.0500
4 9 1251 W LEHIGH PL ENGLEWOOD CO 801103459 0.0500
5 9 6300 S SYRACUSE WAY ENGLEWOOD CO 801116720 0.0500
6 9 7853 E ARAPAHOE CT # 2800 ENGLEWOOD CO 801121361 0.0500
7 9 7795 S PEORIA ST ENGLEWOOD CO 801124102 0.0500
8 9 16000 HIGHWAY 86 KIOWA CO 801178415 0.0500
9 9 1715 WAZEE ST DENVER CO 802021231 0.0500
MANAGER_FIRST_NAME MANAGER_LAST_NAME PHONE_NUMBER OPENNING_DATE \
0 DB Team PET 9144337317 2000-02-01
1 RONALD B MAASBERG 3039633902 1999-11-10
2 DARIUS E ABADI 3039717389 1999-11-10
3 MARTIN W SAAR 3039854525 1999-11-10
4 HAROLD E MAASBERG 3039653117 1999-11-10
5 ANGELA ABALOS 3039660132 1999-11-10
6 HENRY KAAIHUE 3039826507 1999-11-10
7 MIGUEL ABAGON 3039864043 1999-11-10
8 NANCY J SAAM 3039652395 1999-11-10
9 RICHARD AARSTAD 3039684223 1999-11-10
LAST_TRANS_TIME CLOSING_DATE REGION_ID STATUS
0 2000-02-01 null 1 1
1 1999-11-10 null 1 1
2 1999-11-10 null 1 1
3 1999-11-10 null 1 1
4 1999-11-10 null 1 1
5 1999-11-10 null 1 1
6 1999-11-10 null 1 1
7 1999-11-10 null 1 1
8 1999-11-10 null 1 1
9 1999-11-10 null 1 1
A point of interest in the above code is the use of the alias BOOKDB. Depending on the database, this may be different from what you're accustomed to when issuing a query. For example, a normal query to Db2 might look like this:
select * from STORE limit 10
As detailed above, in our Hyper Protect Data Controller policy we assigned aliases for our Db2 connections. For our first connection, we defined the alias BOOKDB in our connection object. So to query Db2 through Hyper Protect Data Controller we prepended our alias to the table name, making our final query:
select * from BOOKDB.STORE limit 10
By changing the alias we're using from BOOKDB to BOOKWH (thereby changing the connection we're using), as well as changing the table name, we can query a different table within a different schema by using a similar code snippet:
DB2_QUERY = "select * from BOOKWH.STORE_DIM limit 10"
conn = connect(host=HPVS_HOST_IP,
port=10010,
user=LDAP_USER,
password=LDAP_USER_PW,
use_ssl=True,
auth_mechanism='PLAIN')
cur = conn.cursor()
cur.execute(DB2_QUERY)
results_pandas = as_pandas(cur)
print(results_pandas)
Output
STORE_ID ADDRESS CITY STATE ZIP TAX_RATE \
0 9 522 South Rd. Poughkeesie NY 12601 0.0000
1 9 14205 W 52ND AVE ARVADA CO 800021505 0.0500
2 9 14400 E ALAMEDA AVE AURORA CO 800122506 0.0500
3 9 2780 W 72ND AVE WESTMINSTER CO 800305503 0.0500
4 9 1251 W LEHIGH PL ENGLEWOOD CO 801103459 0.0500
5 9 6300 S SYRACUSE WAY ENGLEWOOD CO 801116720 0.0500
6 9 7853 E ARAPAHOE CT # 2800 ENGLEWOOD CO 801121361 0.0500
7 9 7795 S PEORIA ST ENGLEWOOD CO 801124102 0.0500
8 9 16000 HIGHWAY 86 KIOWA CO 801178415 0.0500
9 9 1715 WAZEE ST DENVER CO 802021231 0.0500
MANAGER_FIRST_NAME MANAGER_LAST_NAME PHONE_NUMBER OPENNING_DATE \
0 DB Team PET 9144337317 2000-02-01
1 RONALD B MAASBERG 3039633902 1999-11-10
2 DARIUS E ABADI 3039717389 1999-11-10
3 MARTIN W SAAR 3039854525 1999-11-10
4 HAROLD E MAASBERG 3039653117 1999-11-10
5 ANGELA ABALOS 3039660132 1999-11-10
6 HENRY KAAIHUE 3039826507 1999-11-10
7 MIGUEL ABAGON 3039864043 1999-11-10
8 NANCY J SAAM 3039652395 1999-11-10
9 RICHARD AARSTAD 3039684223 1999-11-10
LAST_TRANS_TIME CLOSING_DATE REGION_ID STATUS
0 2000-02-01 null 1 1
1 1999-11-10 null 1 1
2 1999-11-10 null 1 1
3 1999-11-10 null 1 1
4 1999-11-10 null 1 1
5 1999-11-10 null 1 1
6 1999-11-10 null 1 1
7 1999-11-10 null 1 1
8 1999-11-10 null 1 1
9 1999-11-10 null 1 1
As we have seen, once the connection is established, it is quite easy to query the data. In our environment, the Jupyter notebook was running locally on one of our laptops. Nothing is run directly on Z, yet we are still running secure queries against data that lives on Z - cool!
Defining an IMS Connection
IMS, while certainly less known in the distributed world, is still a major source of enterprise data in Z environments. As such, it is important to know how to connect to and query IMS through Hyper Protect Data Controller.
Required JARs
Establishing a connection
Connecting to IMS is fairly similar to connecting to Db2. Again, it is as simple as taking your existing JDBC connection string and giving the connection a name and alias:
<connection name="zpetims">
<aliases>
<alias>CREDIT</alias>
</aliases>
<source url="jdbc:ims://zpet.host.name:9995/LISTCRED:dpsbOnCommit=true;treatInvalidDecimalAsNull=true;fetchSize=0;" driver="com.ibm.ims.jdbc.IMSDriver" user="zpetuser" password="ourpassword"/>
</connection>
Here we see essentially the same concepts as in the Db2 connection: a unique connection name, an <alias>
used to point to a specific schema, a similar (albeit more complex) JDBC URL, the driver JAR file, and username/password information.
Query Examples
Much like the Db2 example, querying IMS through Hyper Protect Data Controller is fairly straightforward. In fact, it is nearly identical apart from our query and the way we parse the query results. Instead of parsing the query result with as_pandas like we used with Db2, we use cur.fetchall(). This is necessary due to the format of the data IMS returns to Hyper Protect Data Controller, and is simply a syntactical difference to be aware of.
IMS_QUERY = "select * from CREDIT.CCARDSEG FETCH FIRST 10 ROWS ONLY"
conn = connect(host=HPVS_HOST_IP,
port=10010,
user=LDAP_USER,
password=LDAP_USER_PW,
use_ssl=True,
auth_mechanism='PLAIN')
cur = conn.cursor()
cur.execute(IMS_QUERY)
print(cur.fetchall())
Output
[
('0 ', '0 ', 'Customer ', 'Anonymous ', '1050', '112', '999999', ' '),
('1 ', '1 ', 'Sawada ', 'Mitsue ', '1050', '112', '999999', ' '),
('10 ', '10 ', 'Pipion ', 'Joseph ', '1050', '112', '999999', ' '),
('100 ', '100 ', 'Oizumi ', 'Shinichiro ', '1050', '112', '999999', ' '),
('1000 ', '1000 ', 'Kim ', 'Sam ', '1050', '112', '999999', ' '),
('10000 ', '10000 ', 'Sloss ', 'N ', '1050', '112', '999999', ' '),
('100000 ', '100000 ', 'Catamas ', 'Scott ', '1050', '112', '999999', ' '),
('1000000 ', '1000000 ', 'Tang ', 'Ben C ', '1050', '112', '999999', ' '),
('10000000 ', '10000000 ', 'Grant ', 'Lawrence H ', '1050', '112', '999999', ' '),
('10000001 ', '10000001 ', 'Royer ', 'M L ', '1050', '112', '999999', ' ')
]
Again, please note that we are using the CREDIT.tableName dot notation to access our zpetims connection via the CREDIT alias. Additionally, as mentioned earlier we are still making use of the initial imports and variables we discussed in the Db2 section of this blog.
Defining a DVM Connection
DVM provides a way to access a variety of mainframe data sources. Whether your data resides in Db2, MQ, VSAM, IMS, or even regular seuential files you can map it to be accesible and then access it via a virtual table over JDBC. Once this mapping in DVM is performed, you can configure HPDC to protect this data in-flight.
Required JARs
- dv-jdbc-*.jar
- log4j-api-*.jar
- log4j-core-*.jar
Note: we're using * to denote that these jar names can be version-dependent, so you may need to use different version based on your environment and requirements.
Establishing a connection
As we have seen in the previous examples, you typically specify an alias in your query which will inform Hyper Protect Data Controller which connection it should use. Due to the design of DVM, this type of notation is not supported. To avoid assigning an alias to our DVM connection, it must be specified as default connection. In our case, we did not want to make DVM the default connection for our entire environment, so we made it the default connection for a given persona which we know will utilize DVM.
To accomplish this we define the DVM connection as we normally would:
<connection name="DVM">
<aliases>
<alias>dvma</alias>
</aliases>
<source url="jdbc:rs:dv://zpet.host.name:14828;DBTY=DVS;CompressionType=UNCOMPRESSED;MXBU=4MB" driver="com.rs.jdbc.dv.DvDriver" user="zpetuser" password="ourpassword"/>
</connection>
And reference it by name as the default connection in a given persona:
<persona personaName="DS" description="Data Scientist">
<defaultConnection>DVM</defaultConnection>
<groups>
<group>DSGRP</group>
</groups>
</persona>
Only personas who have DVM defined as the default connection will be able to query DVM, although it can be discovered and described by any persona via the API.
Now that we have a persona which is able to query DVM, we can setup our query much like we have done previously. Again, the query itself is straightforward, but please note that we must specify the appropriate user (DSUSR) to ensure we are using the persona with DVM as the default connection. Additionally, we do not make use of an alias nor the dot notation we saw in previous examples, we simply access the table via its name BOOK_ORDER_FACT_SAMPLE.
LDAP_USER = "DSUSR"
LDAP_USER_PW = "dsusrpw"
DVM_QUERY = "select ORDER_ID, CUSTOMER_ID, PAYMENT from BOOK_ORDER_FACT_SAMPLE limit 10"
conn = connect(host=HPVS_HOST_IP,
port=10010,
user=LDAP_USER,
password=LDAP_USER_PW,
use_ssl=True,
auth_mechanism='PLAIN')
cur = conn.cursor()
cur.execute(DVM_QUERY)
results_pandas = as_pandas(cur)
print(results_pandas)
Output
ORDER_ID CUSTOMER_ID PAYMENT
0 318841260 9999 81.63
1 318841261 99 114.75
2 318841262 9999 183.95
3 318841263 99999 181.08
4 318841264 9999 248.56
5 318841265 9999 77.78
6 318841266 9999 214.77
7 318841267 9999 142.89
8 318841268 9999 238.84
9 318841269 9999 194.15
Conclusion
This brings us to the end of the blog. Herein, we discussed how we setup connections to different databases through Hyper Protect Data Controller and then used those connections to query real data. We talked about the different elements that go into an Hyper Protect Data Controller policy's <connection> element definition, and highlighted some of the not-so-obvious parts. We also gave examples of connections to 3 different databases, as well as how to address some unique situations with them. Hopefully, after reading this blog you have a better understanding of how to create this sort of setup in your own environment. As always, please refer to the main Hyper Protect Data Controller documentation for further information on this subject.
Authors
Trent Balta (Trent.Balta@ibm.com)
Torin Reilly (treilly@us.ibm.com)
References