Db2 for z/OS and its ecosystem

 View Only

What is SQL Data Insights and why you should know it

By Catherine Wu posted Fri October 13, 2023 04:10 PM

  

By Catherine Wu and Guanjun Cai 

Imagine a world where a database management system does not just simply retrieve and store data. With the integration of advanced AI technology, Db2 for z/OS can now delve deep into your data, exposing unseen relationships between columns and rows and offering insights through SQL-based AI queries. This groundbreaking fusion where AI meets Db2 is called SQL Data Insights (SQL DI). If your critical applications rely heavily on Db2 for z/OS and the data continues to reside in Db2, it's imperative to know everything about SQL DI. Utilize its strength to fully unlock the potential of your core data assets. 

So, how does SQL DI set itself apart from traditional machine learning when leveraging data that reside in Db2? 

Traditional machine learning typically requires an extensive process and a skilled team with in-depth knowledge of data science. The lengthy process often includes the five distinct phases of data processing, data curation, model training, and model deployment to applications for scoring. For seasoned data scientists, navigating through these phases can span months. By the time they finally derive insights from data, the data could be outdated. SQL DI, however, dramatically simplifies the process and presents a time-efficient and cost-effective approach to discovering insights in your Db2 data.  

As the following chart illustrates, SQL DI distills the entire machine learning process into just three quick steps: creating AI objects from Db2 tables or views, enabling AI objects for model training, followed by executing AI queries for model consumption. 

 

The AI queries are crafted by using Db2 built-in functions. This means while traditionally we might have required trained data scientists, with SQL DI, as long as you have an application team that understands the data, you can create the model. From there, anyone well-versed in SQL can run SQL DI to harness its AI capabilities to address business challenges in days, if not hours. 

 

Does it sound too good to be true? Well, the strength of SQL DI stems from its deep integration with the Db2 core engine. 

SQL DI leverages database technologies (SQL, Stored Procedure, Table, Catalog, and Utilities), coupled with advanced deep learning techniques, to proficiently train neural network models. At its core, SQL DI utilizes the vector embedding technique, similar to the word embedding technology used in Natural Language Processing. This makes it possible to determine semantic similarities among textual tokens through the distance between their vectors.  

Ingeniously, SQL DI extends the word embedding technique to relational databases to enable “database embedding. Every value within a database column, regardless of its type, is translated into a text token. Consequently, a database record is perceived similarly to an unordered bag-of-words in an English-like sentence, where each token maintains equal relationships with others, regardless of its position in the record. The semantic model of SQL DI then deduces the significance of database values based on surrounding column values, both within and across table rows. Leveraging this trained model, SQL DI empowers you to execute SQL AI queries on relational data to detect and match semantically similar data directly within Db2.  

 How do you use SQL DI to address your business challenges? Let’s see an example.  

Let’s say that you’re a retail business and you want to identify customers who share common buying patterns. You have the transaction records of the past three months stored in the `TRANSACTION` table in Db2 and the input from a recent in-store survey. In the survey, a client named Claire expressed her interest in a newly introduced product. You want to discern the top one hundred clients whose average spending power and purchase history mirror those of Claire. By targeting this group, you can promote the new product more effectively and potentially boost sales. 

You can quickly accomplish this goal in a matter of a couple of easy steps. Assume that “Claire3668” identifies client Claire in the `TRANSACTIONS` table with columns ‘PRODUCT_ID’, ‘PRODUCT_CATEGORY’, 'PURCHASE_AMOUNT’, and ‘PURCHASE_DATE’ and in the ‘CUSTOMERS’ table with columns ‘CUSTOMER_ID’, ‘EMAIL’, and 'ZIPCODE’ 

First, create a new PURCHASEHISTORY table that aggregates transaction records by CUSTOMER_ID in the past 6 months. You can create the table by issuing a SQL statement that looks like the following example:  

INSERT INTO PURCHASEHISTORY (CUSTOMER_ID,EMAIL, ZIPCODE, 

AVG_ELECTRONICS, AVG_HOME, AVG_CLOTHES, AVG_TOYS)           

SELECT C.CUSTOMER_ID, C.EMAIL, C.ZIPCODE,                                                          

AVG(CASE WHEN T.PRODUCT_CATEGORY = 'ELECTRONICS' THEN               

T.PURCHASE_AMOUNT ELSE NULL END) AS AVG_ELECTRONICS,                    

AVG(CASE WHEN T.PRODUCT_CATEGORY = 'HOME' THEN             

T.PURCHASE_AMOUNT ELSE NULL END) AS AVG_HOME,                  

AVG(CASE WHEN T.PRODUCT_CATEGORY = 'CLOTHES' THEN                

T.PURCHASE_AMOUNT ELSE NULL END) AS AVG_CLOTHES,                     

AVG(CASE WHEN T.PRODUCT_CATEGORY = 'TOYS' THEN                      

T.PURCHASE_AMOUNT ELSE NULL END) AS AVG_TOYS                            

FROM CUSTOMER_INFO C                                                    

LEFT JOIN TRANSACTION_HIST T ON C.CUSTOMER_ID = T.CUSTOMER_ID           

WHERE MONTH(CURRENT DATE - T.PURCHASE_DATE) <= 6  GROUP BY C.CUSTOMER_ID, C.EMAIL, C.ZIPCODE;   

 The 'PURCHASEHISTORY' table now contains customer ID, email, zipcode, and the average purchase amount for the categories "ELECTRONICS", "HOME", "CLOTHES", and "TOYS" over the last 6 months. Then, enable table `PURCHASEHISTORY` for AI queries in the SQL DI user interface, which automatically trains a neural network model for the table. When the training completes, run the following AI query: 

SELECT AI_SIMILARITY(X.CUSTOMER_ID,'Claire3668’ USING 

MODEL COLUMN CUSTOMER_ID) AS SIMILARITYSCORE, X.CUSTOMER_ID, X.EMAIL 

FROM PURCHASEHISTORY X  

WHERE X. CUSTOMER_ID <> 'Claire3668’'  

ORDER BY SIMILARITYSCORE DESC  

FETCH FIRST 100 ROWS ONLY; 

 

As specified, the query generates a similarity score and identifies the top hundred customers who are likely interested in the new product just as Claire was. As shown below, SQL DI fetches and displays the top 5 rows of the query return in the UI and uploads the complete result set to Db2:  

SIMILARITYSCORE 

CUSTOMER_ID 

EMAIL 

0.94321120 

Alex0325 

Alex0325@testmail.com 

0.92480023 

William2211 

William2211@testmail.com 

0.90755321 

Ella1632 

Ella1632@testmail.com 

0.89776223 

Iris2098 

Iris2098@testmail.com 

0.88232523 

Josephine8899  

Josephine8899@testmail.com 

 

The scenario in the example is simple but representative, making self-evident the capabilities and benefits of SQL DI. In addition to the substantially simplified model deployment and application development, SQL DI provides the following unique values as you transition your business into the era of AI and transform your operations from the traditionally labor intensive to the AI infused cost-effective: 

  • Eliminates the necessity for setting up various systems for ETL, model training and deployment, thereby optimizing resource usage and accelerating time-to-value. 

  • Democratize data insights, reducing the need for deep data science expertise. 

  • Bypasses the need of creating separate applications to access model outputs on new and changed data. 

  • Maintains data on the IBM zSystems platform, leading to quicker processing and reduced data storage costs while improving security through minimized data transfers. 

 You can take advantage of SQL DI across different industries, whether you are an insurance company looking to identify similar claims or a financial system that needs to detect fraud transactions as they occur. With SQL DI, you get real-time insights that are often hidden in your Db2 data by running simple AI queries with the fraction of time, expertise, and effort. With AI functions like similarity, analogy, and semantic clustering built into Db2 where your mission-critical data reside, you can extract actionable business insights on-the-fly to instantly facilitate your decision-making. It’s that simple and straightforward. 

So, stay ahead of your competition and start to harness the power of SQL DI today. 

 

Additional resources 

About the authors 

Yan (Catherine) Wu serves as the Program Director for Db2 for z/OS at IBM's Silicon Valley Lab. As an accomplished engineering lead, her expertise encompasses database management, data governance, machine learning, and enterprise design thinking. Catherine consistently collaborates with large clients, identifying potent use cases for cutting-edge data technologies and determining how these innovations can address complex business challenges. 

Guanjun Cai is a senior content architect and developer for IBM Db2 for z/OS and IBM Db2 Data Gates. His areas of expertise include content architecture and development as well as user interface content design.  

Acknowledgments 

 We'd like to extend our heartfelt gratitude to Rajesh Bordawekar, Principal RSM, IBM Research and Akiko Hoshikawa, Distinguished Engineer, IBM Db2 for z/OS, for their invaluable reviews and feedback. Further, we deeply appreciate the expert assistance on SQL intricacies provided by Jae Lee and Cynthia Suo of the Db2 for z/OS development team. Their insights and expertise significantly enrich this piece. 

 


#Db2forz/OS
#db2z13
1 comment
26 views

Permalink

Comments

Wed January 10, 2024 12:40 PM

Catherine, awesome, awesome: you make look like it is so easy and maybe it is! I'll try to get my hands on a SQL DI-enabled DB2 15 Sub-system (that seems not be so easy),

One question: is Generative AI also available in the SQL DI AI engine?