IBM Blockchain Transparent Supply and IBM Food Trust

Introduction to Data Subscription for Transparent Supply

By Wiggs Civitillo posted Thu July 08, 2021 02:39 PM

  

Co-authored by Yichong Yu (yichong@us.ibm.com) and Prashanth Kayampady (prashanth.kayampady@in.ibm.com)

  

Summary

This article provides an overview of how to use Data Subscription to consume data from the IBM Blockchain Transparent Supply Platform.* Data Subscription supports the large-scale, subscription-based export of Transparent Supply data. In this article, you’ll find an overview of how Data Subscription works and detailed instructions on how to set up Data Subscription.

*Note: All references to IBM Blockchain Transparent Supply also apply equally to IBM Food Trust, as IBM Food Trust is convened on the IBM Blockchain Transparent Supply platform.

 

Introduction

IBM Blockchain Transparent Supply is a blockchain platform that enables companies to build their own ecosystem to share supply chain data with trusted partners. Transparent Supply provides different entitlement mode options, and the owner of the data decides the entitlement mode of each document to manage its visibility. The immutable nature of blockchain ensures that data cannot be tampered with, and allows many different partners to join a blockchain ecosystem and securely share data on the platform. Transparent Supply maintains an immutable, distributed, and shared ledger for all supply chain participants.

 

For the input data that gets ingested into the platform, Transparent Supply supports the use of GS1 standards to capture various data types to facilitate business communications among partners. Transparent Supply integrates data submitted by different partners to build end-to-end supply chain views specific to each organization based on their entitlements.

 

Once data has been written to the blockchain, there are a few different ways for an organization to consume data that they’re entitled to:

 

APIs 

APIs provide a quick and easy way to interact with the Transparent Supply platform and to use features and components such as Connector, Trace, Insights, and Documents.

 

Data Subscription 

Data Subscription exports subscribed data sets to object storage on a periodic schedule, and is suited to use cases that need to access data at scale.

 

There are two flavors of data subscription:

  1. Base data subscription

Customers can subscribe to basic data sets via the base data subscription. Data is written in JSON format to a client-owned COS bucket. The base data subscription documentation can be found here.

 

  2. Insights data subscription

Customers with access to the Insights module can subscribe to Insights data sets via the Insights data subscription. Data is written in Parquet format to a client-owned COS bucket. The Insights data subscription documentation can be found here.

 

Private Smart Contracts 

Private Smart Contracts allow entitled documents to be pushed to subscribed smart contracts in a private channel to facilitate private transactions.

 

 

This article focuses on the data subscription services and introduces Data Subscription with sample use cases and programs. The samples in this article use the Insights data subscription API endpoint and load data in Parquet format; base data subscription clients can follow a similar flow using the base data subscription API endpoint and JSON format.

 

Data Subscription Example 

We will use a toy supply-chain ecosystem as an example. There are three types of participants in the toy supply chain: manufacturing plants, distributor warehouses, and retailer stores. The manufacturer produces, packages, and ships toys to the distributor warehouses. The distributor unpackages and repackages the deliveries based on what the retailer needs, then ships the toys to the retailer stores. The retailer then unpackages the delivered packages and sells the toys.

 

 

Here is the data flow that captures the transactions and events in the supply chain. All of the data types, including master products, master facilities, transactions, and events, can be ingested into the Transparent Supply platform via the Connector API.


 

For this example, a data simulation was run based on the setup above, and test data was ingested into sample organizations. In the following sections, we will use the generated sample data sets to demonstrate how to use Transparent Supply Data Subscription.

 

Setting up Data Subscription

Data Subscription supports the large-scale, subscription-based export of Transparent Supply data. Once subscribed, Transparent Supply writes batches of the data to a client-owned IBM Cloud Object Storage (COS) bucket. To use the Data Subscription feature, there are a few steps involved:

  1. Set up IBM Cloud Object Storage and bucket
  2. Set up Transparent Supply Data Subscription

We will discuss each step in detail in the following subsections.

  

Step 1: Set up IBM Cloud Object Storage and bucket


Data Subscription uses an IBM Cloud Object Storage bucket to store the subscribed data. A COS bucket with HMAC credentials needs to be set up before setting up Data Subscription with Transparent Supply.

First, log in to IBM Cloud at https://cloud.ibm.com/login. If you don’t have an IBM id, please create one here.

 

Click on ‘Catalog’ in the top row and search for ‘Object Storage’. Choose the plan based on your needs and provision an Object Storage instance.

 

 

If you already have a COS instance provisioned, you can access it by clicking on the dropdown menu in the upper left corner, clicking on ‘Resource List’, and choosing the COS instance under ‘Storage’.

 

 

 

Once you are in the COS instance, if you have the right permissions, you should see the ‘Create bucket’ button, where you can create a bucket based on your needs. You can also find the location of your bucket.

 

Please note that if you need to create Cloud Functions actions or triggers, the Cloud Functions namespace must be in the same region as your COS bucket; in this example, us-east.

 

 

Create a credential with the Writer role. To create credentials, click on ‘Service credentials’ on the left, then click on the ‘New credential’ button. For HMAC credentials, make sure the HMAC toggle under ‘Advanced options’ is enabled.

 

 

You can inspect the service credential information once it has been created.

 


You can also find the bucket configuration by clicking on the three vertical dots and then ‘Configuration’ in the dropdown menu. You can find the public endpoint information there.

 

 


 

You can find the SQL URL by clicking on the three vertical dots and then ‘SQL URL’ in the dropdown menu. This is the endpoint that can be used by the SQL Query service later.

 


 

To subscribe to the Transparent Supply data subscription service, we need to collect the following information about the bucket. This information can be viewed in the service credentials or bucket configuration as shown above.

  1. Public endpoint
  2. Bucket name
  3. HMAC access_key_id
  4. HMAC secret_access_key

 

For more details about IBM COS, please refer to the COS documentation here.


 
Step 2: Set up Transparent Supply Data Subscription

 

Now that the COS bucket is ready, we can register it with the Transparent Supply Data Subscription service.

 

The Insights data subscription API PUT endpoint can be called to create an Insights data subscription, using the HMAC access_key_id as the s3_access_key and the HMAC secret_access_key as the s3_secret_key. If at any point any of those values change, simply call the same endpoint with the updated values.

 

Here is a sample body for the PUT:

 

{

  "datasets": "Products,Locations,Events,PurchaseOrders,DespatchAdvices",

  "parameters": {

    "target": "ibm_cos",

    "bucket": "bts_insights_bucket",

    "object_store_endpoint": "https://s3.us-east.cloud-object-storage.appdomain.cloud",

    "s3_access_key": "2962207aXXXXXXXXXXXXXXXXXXX0bccd23b:f3YYYYYYYYYYYYYYc05b7072",

    "s3_secret_key": "2962207aYYYYYYYYYYYYYYYYYYX0bccd23b:f3XXXXXXXXXXXXXXc05b7072"

  }

}

Where:

  • target – only “ibm_cos” is supported at this time, though other target types may be introduced later (e.g. amazon_s3)
  • object_store_endpoint – the public endpoint from the bucket configuration
  • bucket – the bucket name
  • s3_access_key – the HMAC access_key_id
  • s3_secret_key – the HMAC secret_access_key
  • datasets – optional; specifies the subscribed data sets. If unset, all available data sets are subscribed. Otherwise, it can contain a list of data sets such as:

 

PurchaseOrders
DespatchAdvices
ReceiveAdvices
Payloads
Events
Products
Locations
Companies
EntryExitLedger
EntryExitCurrent
OrderReconciliation
InTransitLedger
InTransitCurrent
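As a concrete sketch, the PUT call above can be scripted in Python. Note that the API base URL, the subscription path, and the bearer-token header below are assumptions for illustration only — consult the Insights data subscription API reference for the exact endpoint and authentication in your environment.

```python
import json
from urllib import request

# Hypothetical base URL and path -- check the Insights data subscription
# API reference for the actual values in your environment.
API_BASE = "https://transparent-supply.ibm.com/ift/api/insights/v2"

def build_subscription_body(bucket, endpoint, access_key, secret_key,
                            datasets=None):
    """Assemble the PUT body shown above; 'datasets' is optional and,
    if unset, all available data sets are subscribed."""
    body = {
        "parameters": {
            "target": "ibm_cos",
            "bucket": bucket,
            "object_store_endpoint": endpoint,
            "s3_access_key": access_key,
            "s3_secret_key": secret_key,
        }
    }
    if datasets:
        body["datasets"] = ",".join(datasets)
    return body

def put_subscription(token, body):
    """Send the subscription request; returns the HTTP status code."""
    req = request.Request(
        API_BASE + "/data-subscription",  # hypothetical path
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer " + token},
        method="PUT",
    )
    with request.urlopen(req) as resp:
        return resp.status
```

The same helper can be reused to rotate credentials: rebuild the body with the new HMAC keys and call the endpoint again.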

 

 


After creating or modifying a subscription, the GET endpoint can be used to verify the subscription information, although the HMAC keys will be encrypted.

 

 

The DELETE endpoint can be used to delete a subscription.

 

 

Please refer to the Transparent Supply documentation here for the latest supported data sets and the structure of each data set.

 

 

Consuming Curated Data from Insights Data Subscription

 

Once a subscription is successfully created, data is exported to the specified bucket on a system-defined schedule. Each time Transparent Supply exports Insights data to the bucket, it writes a text file named ‘snapshot-latest.mf’, which lists the latest exported files; the file names include the timestamp of the latest update. The curation files are in Parquet format.
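For clients that prefer plain object-storage access over Spark, the latest export can also be located directly in the bucket. The sketch below assumes export folders are named `<DataSet>-<timestamp>`, as in the examples later in this article; the listing helper uses boto3, which works against COS through its S3-compatible API with the HMAC credentials from Step 1.

```python
def newest_export(keys, dataset):
    """From a list of object keys, pick the most recent export folder for a
    data set. Assumes (illustration only) folder names embed the export
    timestamp, e.g. 'Products-2020-11-23T21-00-21/part-0000.parquet';
    timestamps in that form sort correctly as plain strings."""
    prefixes = {k.split("/")[0] for k in keys if k.startswith(dataset + "-")}
    return max(prefixes) if prefixes else None

def list_bucket(endpoint, access_key, secret_key, bucket):
    """List object keys in a COS bucket via the S3-compatible API
    (requires the boto3 package)."""
    import boto3
    s3 = boto3.client("s3", endpoint_url=endpoint,
                      aws_access_key_id=access_key,
                      aws_secret_access_key=secret_key)
    resp = s3.list_objects_v2(Bucket=bucket)
    return [obj["Key"] for obj in resp.get("Contents", [])]
```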

 

In this section, we will demonstrate different ways to use the curation data sets:

  • Data pulling using PySpark in a Jupyter notebook
  • Automatic data movement via a cloud trigger and the IBM SQL Query service

 

Step 3.1: Data pulling using PySpark in a Jupyter notebook

In this section, we will demonstrate how to pull data from the COS bucket with PySpark and do some simple analysis in a Jupyter notebook.

 

First, create a SparkSession and set the Hadoop configuration.
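The notebook cell itself is shown as a screenshot in the original post; as a sketch, the configuration step looks roughly like the following, using Hadoop’s s3a connector (the Stocator connector is another common choice for COS). The endpoint and HMAC credentials are the values collected in Step 1; the key names are the standard hadoop-aws settings.

```python
def cos_hadoop_settings(endpoint, access_key, secret_key):
    """Hadoop configuration entries for reading a COS bucket through
    the s3a connector."""
    return {
        "fs.s3a.endpoint": endpoint,
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        "fs.s3a.path.style.access": "true",
    }

def build_spark_session(endpoint, access_key, secret_key):
    """Create a SparkSession wired up to the COS bucket (requires
    pyspark and the hadoop-aws jars on the classpath)."""
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("bts-data-subscription").getOrCreate()
    hconf = spark.sparkContext._jsc.hadoopConfiguration()
    for key, value in cos_hadoop_settings(endpoint, access_key,
                                          secret_key).items():
        hconf.set(key, value)
    return spark
```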

 



Second, read the manifest file, snapshot-latest.mf.

 

 

Now, you can read any of the data sets that you subscribed to. In this example, I am reading the OrderReconciliation data set, which has summary information about the status of purchase orders (POs), despatch advices (DAs), and receiving advices (RAs).
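Reading the manifest and then a subscribed data set can be sketched as follows. The manifest can be read with spark.read.text and the exported folders with spark.read.parquet; the one-folder-name-per-line manifest layout assumed by manifest_entry is an illustration only — check the actual contents of snapshot-latest.mf for your export.

```python
def manifest_entry(manifest_lines, dataset):
    """Pick the manifest entry for one data set. Assumes (illustration
    only) the manifest lists one exported folder name per line, e.g.
    'OrderReconciliation-2020-11-23T21-00-21'."""
    for line in manifest_lines:
        name = line.strip()
        if name.startswith(dataset + "-"):
            return name
    return None

def load_dataset(spark, bucket, dataset):
    """Read the Parquet files for the latest export of a data set,
    using the manifest to locate the current folder."""
    manifest = [row.value for row in
                spark.read.text("s3a://%s/snapshot-latest.mf" % bucket).collect()]
    folder = manifest_entry(manifest, dataset)
    if folder is None:
        raise ValueError("data set not present in manifest: " + dataset)
    return spark.read.parquet("s3a://%s/%s/" % (bucket, folder))
```

For example, `load_dataset(spark, "demo-btsdev-smart-retailer", "OrderReconciliation")` would return the order-status DataFrame used in the analysis below.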

 

 

You can do various analyses on the data set; here is an example that simply plots the total_ordered.

 


Here is another example that calculates and plots the daily status of POs, DAs, and RAs.

 

  1. Prepare the daily ordered information
  2. Prepare the daily dispatched information
  3. Prepare the daily received information
  4. Plot the status of POs, DAs, and RAs
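The notebook cells for these steps appear as screenshots in the original post; the per-day aggregation they perform can be sketched as below. The column names passed to daily_status (for example an event date and a quantity column) are assumptions for illustration — check the OrderReconciliation schema in the data set documentation.

```python
from collections import defaultdict

def daily_totals(rows):
    """Aggregate (date, quantity) records into per-day totals, mirroring
    the group-and-sum the notebook performs for each of POs, DAs, and RAs."""
    totals = defaultdict(float)
    for day, qty in rows:
        totals[day] += qty
    return dict(totals)

def daily_status(df, date_col, qty_col):
    """PySpark version of the same aggregation: sum a quantity column
    per day, ordered by date (requires pyspark)."""
    from pyspark.sql import functions as F
    return (df.groupBy(date_col)
              .agg(F.sum(qty_col).alias("total"))
              .orderBy(date_col))
```

The resulting per-day frame can then be converted with toPandas() and plotted with matplotlib, one line each for ordered, dispatched, and received quantities.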

Refer to the complete Jupyter notebook sample here.

 

Step 3.2: Automatic data movement via cloud trigger and IBM SQL Query service

 

Data in the COS bucket can be processed and analyzed, and the results can be written to various destinations, for example, another COS bucket or a database. The data flow can be automated by triggering Cloud Functions via COS bucket updates. 

In this section, we will demonstrate how to set up Cloud Functions to automatically process the data when there is a change to a given object in COS. In this example, data is picked up from the source COS bucket and written to another COS bucket via the SQL Query service.

 

 

Here are the steps to set up and configure the IBM Cloud services used for this demo.

 

Step 1: COS bucket

The COS bucket is the bucket used for the Transparent Supply data subscription. The input bucket should already have been set up; set up another bucket for output. Note that the Cloud Functions namespace must be in the same region as your COS buckets.

 

Step 2: SQL Query service

  1. Set up the SQL Query service instance
    • Log in to IBM Cloud and search for SQL Query in the Catalog
    • Choose the region and plan, fill in the service name, and press Create

 

Step 3: Create service ID and API key

  1. Set up a service ID and API key:
    • Create service ID

Click Manage -> Access (IAM):

Click ‘Service IDs’ on the left:

 

Fill in the name and description, then click on ‘Create’.

 

  • Assign the service ID access to the SQL Query service and the COS instance

 

  

  • Create API key

Click on ‘Service IDs’ on the left, then the ‘API keys’ tab at the top, and create a new API key. This key will be used by Cloud Functions to pull data from COS and push it to COS or a database.

 

Step 4: Cloud Functions

  • Create a namespace in the same region as the COS bucket.

 

 

 

  • Assign the Notifications Manager role to the Cloud Functions namespace

Before the cloud trigger is created, a prerequisite is to assign the Notifications Manager role to your Cloud Functions namespace.

 

  1. Navigate to the Grant a Service Authorization page in the IAM dashboard.
  2. From Source service, select Functions. Then, from Source service instance, select a Cloud Functions namespace. Note: Only IAM-enabled namespaces are supported.
  3. In Target service, select Cloud Object Storage, then from Target service instance, select your IBM Cloud Object Storage instance.
  4. Assign the Notifications Manager role and click Authorize.

 

 

  1. Create the cloud trigger
    • Action on the bucket: WRITE
    • File prefix: snapshot-latest.mf/_SUCCESS
    • Cloud action to be triggered: demo-db-loader-trigger
    • Parameters:
      • bucket_name: “<Name of each bucket>”
      • key_name: "snapshot-latest.mf"

 


  • Connect the trigger to an action

Copy over the code. Below is the SQL Query statement to move data from the input bucket to the output bucket.

SELECT * FROM COS_INPUT_BUCKET_SQL_URL

STORED AS PARQUET AS a 

INTO COS_OUTPUT_BUCKET_SQL_URL

Where:

  • COS_INPUT_BUCKET_SQL_URL and COS_OUTPUT_BUCKET_SQL_URL are the SQL URLs of the COS buckets. Section 3.1 describes the steps to get the SQL URL for a given COS bucket.

Example:

 

SELECT * FROM cos://us-east/demo-btsdev-smart-retailer/Products-2020-11-23T21-00-21/ STORED AS PARQUET AS a 

INTO cos://us-east/test-output-2/test
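The action code itself appears as a screenshot in the original post; a minimal sketch of its core is below. Composing the statement is plain string work, and submitting it uses the ibmcloudsql Python client — the SQLQuery constructor arguments shown (API key, SQL Query instance CRN, and a target COS URL for results) reflect that package’s documented usage, but verify them against the current ibmcloudsql release.

```python
def build_copy_sql(input_url, output_url):
    """Compose the SQL Query statement from the article: copy Parquet
    data from the input bucket to the output bucket."""
    return ("SELECT * FROM %s STORED AS PARQUET AS a INTO %s"
            % (input_url, output_url))

def run_copy(api_key, instance_crn, target_cos_url, input_url, output_url):
    """Submit the statement via the ibmcloudsql package
    (pip install ibmcloudsql); blocks until the job finishes."""
    import ibmcloudsql
    client = ibmcloudsql.SQLQuery(api_key, instance_crn, target_cos_url)
    client.logon()
    return client.run_sql(build_copy_sql(input_url, output_url))
```

Inside the Cloud Functions action, the bucket_name and key_name trigger parameters supply the input path, and the API key is the one created in Step 3.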

 

Set parameters:

 

Now the automatic data flow should be set up. The next time the snapshot-latest.mf/_SUCCESS file in the input COS bucket is updated, the cloud action should be triggered. You can check the content in the output bucket, or you can check the status of the SQL Query job.

 

Sample content in the COS output bucket:

You can check the status of SQL Query jobs in the console or via an API call. Here is what you would see in the SQL Query service console: the list of jobs, the SQL statement, and the results.

 

 

Settings for moving data to DB2 database

 

To enable the automatic data flow to DB2, a DB2 user with the same service ID should be created. The DB2 Standard or Enterprise plan is needed for user management.

Below are a few DB2-related settings if the destination is the IBM Cloud DB2 service.


  1. Set up the DB2 service instance
    • Log in to IBM Cloud and search for DB2 in the Catalog
    • Choose the region and plan, fill in the service name, and press Create
  2. Make note of the connection information
    • Note down the DB2 CRN info. You can retrieve the <db service crn> by opening the resource list in the IBM Cloud dashboard. Scroll down to the database service instance and click in any of the columns other than the first column.
  3. Make sure to assign the service ID access to DB2
    • Log in to the DB2 web console and add a user with the service ID noted above



         

         

Below is the statement to move data from the input COS bucket to a DB2 table:

SELECT * FROM COS_INPUT_BUCKET_SQL_URL STORED AS PARQUET AS a 
INTO CRN_DB2_INSTANCE/DB_SCHEMA.DB_TABLE

Where:

  • COS_INPUT_BUCKET_SQL_URL is the SQL URL of the COS bucket. Section 3.1 describes the steps to get the SQL URL for a given COS bucket.
  • CRN_DB2_INSTANCE is the CRN of the DB2 instance. Section 4.3 a describes the steps to get the CRN of the DB2 instance.
  • DB_SCHEMA.DB_TABLE is the table to write to. Please make sure that the user created in section 4.3 c has access to the given DB_SCHEMA.

         

Example:

SELECT * FROM cos://us-east/demo-btsdev-smart-retailer/Products-2020-11-23T21-00-21/ STORED AS PARQUET AS a 
INTO crn:v1:bluemix:public:dashdb-for-transactions:us-south:a/b280a477a69abdc8f1ccfc3c350e626f:fe2b7e7d-600e-4ff6-80a5-5f46919a6794::/DEMO.ORDER_STATUS

         

A few extra notes:

  1. It is recommended to move data to a staging table first, for further processing.
  2. In real production environments, we also set up multiple staging tables and rotate them for the data movements.
  3. Some of the curation data sets have many columns and may reach the DB2 page limit. You can pick the columns that you are interested in in the SQL statements.

         

Summary

This article provided an overview of the IBM Blockchain Transparent Supply Data Subscription service. Data Subscription periodically sends bulk data to a client-owned COS bucket, which is especially well suited for analysis of supply chain data at scale.

Please follow the implementation steps and let us know if you have any questions or concerns below.

         


Please note the following:

  1. As data is processed and sent to COS periodically, there may be a delay before newly ingested data appears in the COS data set. This may affect inventory calculations in the case of data being submitted out of order, and may mean that the COS data set does not always match API queries.
  2. To make good use of the curation data sets, one needs to understand the meaning of the different columns. For example, if there are different Units of Measure (UOMs) in the order status (OrderReconciliation) data set, the status would be split under the different UOMs, and special conversion may be needed for the status statistics.
  3. Columns may be added to or adjusted in the curation data sets over time. If you only need a subset of the columns, you can list just the columns that you are interested in.

         
