Netezza Performance Server

 View Only

Netezza on Cloud integration with Tableau

By BRAJESH PANDEY posted 28 days ago

  

Tableau

Tableau is a visual analytics platform to help people see and understand data, explore and manage data, and quickly discover and share insights that can change businesses and the world. For more details - refer Tableau documentation

NPSaaS

A high-performance, petabyte-scale, fully managed, pay-as-you-go cloud data warehouse service on AWS and Azure. Netezza Performance Server as a Service (NPSaaS) offers a high-performance, cloud-native data warehouse designed for scalable analytics and insights accessible in a single platform. It brings decades of innovation in data governance and security, in-database analytics and machine learning, and hybrid-columnar processing to your data in the cloud. For more details - refer Netezza documentation

Time Travel in Netezza

NPSaaS time travel feature enables database users to retrieve and analyze historical data and reconstruct deleted data. It supports users to run query using system time with AS OF, BEFORE, BETWEEN and FROM time expression to travel the temporal table as per the use case. 

RETAIL use-case:

In the following section, we will go over a “RETAIL” use case where we will use Tableau to visually analyze warehouse inventory, find anomalies in ETL batch jobs and then reconstruct the data to the previous valid value using NPSaaS’s time travel feature.

To connect “Tableau” to “NPSaaS”, one would need to navigate to “Netezza” connector and then input details to make connection. For example,

Let’s start with the demo.

As you can see below, once connected “Retails” database, there were 2 tables listed under the database.

“items” - A table to store information about items. This table doesn’t get updated frequently (regular Netezza table)

“warehouse“ - A temporal table to store item inventory. “warehouse” table gets updated through a batch job periodically. 

Data exploration: Tableau provides a great visual tool to explore the data. For example, I was able to simply drag and drop both tables, and it automatically detected a relationship between them for the join. Very neat! I was also able to view the results and understand the data.

Data Exploration

To make the demo simpler and easier to understand, I limited the warehouse items to only "Bike" and "Helmet".  ETL batch job runs every day and updates “Bike” and “Helmet” inventory proportionally (1:1) in the warehouse.


Day 1 - ETL first batch

Let’s look at the view of the data (in a Tableau sheet) when the first batch was inserted to “warehouse” table. As you can see that the quantities for the "Bike-kids" and "Helmet-kids" products were updated proportionally,  this is evident from the fact that both products have the same quantity, 25. Also, the data was updated proportionally and accurately for all groups, including "Bike-men" + "Helmet-men", "Bike-girls" + "Helmet-girls", and "Bike-electric" + "Helmet-cameramounted". 

First Batch

Day 2 - ETL second batch

After a successful first batch, the warehouse inventory was updated with a second batch. Let's look at the data to see if the bike and helmet counts were updated proportionally for their respective group pairs.

Data looked correct. For example, since the last update, 100 quantities were added to “Bike-girls” and the same quantity added to “Helmet-girls” as well, it was proportional. Similarly for other bike and helmet groups.

Second Batch

Day 3 - ETL third batch

Let’s look at the data after 3rd batch update to “warehouse”. As you see quantity for most of the bike-helmet pairs got updated proportionally, for example, since the last batch, 25 quantities were added to “Bike-men” and the same for “Helmet-men”. However, there was a discrepancy between “Bike-kids” and “Helmet-kids” pair. The “Bike-kids” product had total 50 quantities while “Helmet-kids” had 75. That pair was not updated proportionally as analyzed by the charts below.

Third Batch

After further batch analysis, it was found that there were no qualities received for “Bike-kids” and “Helmet-kids” pair in the warehouse for third batch and total count for “helmet-kids” was updated incorrectly by mistake.

To correct the data for “helmet-kids” item, NPSaaS time travel feature SYSTEM_TIME AS OF <day 2 batch timestamp> was used to find the previous known correct value and then update warehouse table with that previous value.

After correcting “helmet-kids”, in the chart below, quantities for respective bike and helmet were updated correctly and proportionally.

Third Batch - Corrected

Tableau provides a neat option to create a dashboard where you can bring all the sheets together to visualize whole story in a single view.

Dashboard

To Summarize

In this blog, we talked about how Tableau can easily be integrated with NPSaaS data warehouse to analyze and visualize data together. It further extended with a real life ETL use-case and how NPSaaS time travel feature can help solve the problem easily.

References

https://www.ibm.com/products/netezza

https://www.tableau.com

https://cloud.ibm.com/docs/netezza?topic=netezza-introducing_tt

https://medium.com/@akbg/time-travel-in-netezza-performance-server-8390cf125fcc

0 comments
21 views

Permalink