StreamSets and ARIS Process Mining

Sun July 31, 2022 05:00 PM

This article outlines the steps involved in implementing a solution using StreamSets, webMethods.io Integration, and ARIS Process Mining.

1 Introduction

1.1 Why StreamSets?

StreamSets is a DataOps platform that can work with data at large scale. Smart data pipelines can be built and deployed across hybrid and multi-cloud environments from a single portal.

1.2 Why ARIS Process Mining?

ARIS Process Mining lets you understand business processes to find bottlenecks and opportunities for improvement. Compare designed processes to as-is processes to see whether they execute as planned, and make changes before deviations impact the bottom line.

1.3 How can they work together?

ARIS Process Mining needs process execution data (logs or audit trails) to mine processes, and it is most effective with a continuous stream of incoming data. StreamSets, which processes data at large scale, can extract meaningful process audit trails and send them to ARIS Process Mining, which then analyzes the data and produces meaningful insights about the processes.

1.4 Use Case for this article

Assume all the systems involved in the target process publish audit data/activities to Kafka. From the audit data available in Kafka, we can build a process mining pipeline.
StreamSets extracts the data from Kafka, then aggregates and transforms it. The resulting data is sent to ARIS Process Mining via webMethods.io Integration.
ARIS Process Mining uses this data to mine the processes and show near real-time process analytics and dashboards.
webMethods.io Integration is needed to build a workflow that uploads data to ARIS Process Mining using the Data Ingest APIs: to ingest data, client applications have to call a series of APIs in sequence. As StreamSets is a DataOps platform, it is not suited to building such a functional app, so webMethods.io, which already has an ARIS Process Mining connector, is the right platform for this workflow.
Picture1_usecase_diagram

2 Prerequisites

• ARIS Process Mining cloud tenant
• webMethods.io cloud tenant with Integration enabled
• StreamSets cloud tenant and a StreamSets Data Collector instance; the Data Collector needs network access to the Kafka instance and the internet
• Kafka and ZooKeeper setup

3 Implementation

3.1 Configure ARIS Process Mining Instance

3.1.1 Enable the Data Ingest APIs

Log in to the ARIS Process Mining instance with a user having the Engineer and Process Mining Admin roles.

Go to Administration > System Integration.

Add a System Integration of type "Data Ingest API" with auth type "Client Credentials".
Picture2_pm_create_system_int_api

This creates client credentials, consisting of a Client ID and a Client Secret.
Picture2_pm_create_system_int_api3

These credentials are used to obtain access tokens for calling the APIs.
Picture3_pm_auth_postman

Curl command:

curl --location --request POST 'https://mc.ariscloud.com/api/applications/login' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--header 'Accept: application/json' \
--data-urlencode 'clientId={Client Id}' \
--data-urlencode 'clientSecret={Client Secret}' \
--data-urlencode 'tenant={Tenant Name}'
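The same login call can be made from a small script. Below is a minimal sketch using only the Python standard library; the name of the token field in the JSON response is an assumption, so verify it against the actual response of the call above.

```python
# Sketch of the login call above in Python (standard library only).
# The token field name in the response ("access_token") is an assumption;
# verify it against the actual response of /api/applications/login.
import json
import urllib.parse
import urllib.request

LOGIN_URL = "https://mc.ariscloud.com/api/applications/login"

def build_login_body(client_id, client_secret, tenant):
    """URL-encode the credentials exactly as the curl command does."""
    return urllib.parse.urlencode({
        "clientId": client_id,
        "clientSecret": client_secret,
        "tenant": tenant,
    }).encode("utf-8")

def fetch_access_token(client_id, client_secret, tenant):
    req = urllib.request.Request(
        LOGIN_URL,
        data=build_login_body(client_id, client_secret, tenant),
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]  # field name assumed
```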

3.1.2 Create a Process Mining Project and Data Collection

Go to Projects and create a project with an associated Data Collection. Create an Analysis in the project.
Picture4_pm_create_project

3.1.3 Add Data Ingest API License to Data Collection

Go to Data Collections and open the newly created Data Collection.

Go to Connections.

Add a new connection, give it a name, and select the System Integration to attach an available license. If you don't have a license, contact your admin to get Data Ingest licenses.

This step is critical: without this connection you will not have permission to upload data using the REST APIs.
Picture5_pm_data_collection_connection_license

Picture6_pm_data_collection_connection2

3.1.4 Create Table in the Data Collection using REST API

The Ingest APIs don't work with tables created directly on the portal, so tables need to be created through the APIs.

Go to Source Tables.

Create a table using the REST API:
Picture7_pm_data_collection_create_table

Curl command to create the table:

curl --location --request POST 'https://processmining.ariscloud.com/mining/api/pub/dataIngestion/v1/dataSets/testdata/sourceTables' \
--header 'Authorization: Bearer {Access Token}' \
--header 'Content-Type: application/json' \
--data-raw '[
  {
    "name": "parceldelivery_csv",
    "namespace": "default",
    "columns": [
      { "dataType": "STRING", "name": "Case_ID" },
      { "dataType": "STRING", "name": "Activity" },
      { "dataType": "FORMATTED_TIMESTAMP", "name": "Start", "format": "dd.MM.yyyy HH:mm" },
      { "dataType": "FORMATTED_TIMESTAMP", "name": "End", "format": "dd.MM.yyyy HH:mm" },
      { "dataType": "STRING", "name": "Product" },
      { "dataType": "STRING", "name": "Customer" },
      { "dataType": "STRING", "name": "Country" },
      { "dataType": "STRING", "name": "Delivery type" }
    ]
  }
]'
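The same request can also be issued from a script. The sketch below reuses the endpoint, data set name (testdata), and schema from the curl command above; the access token comes from the login call in section 3.1.1.

```python
# Sketch: create the source table via the Data Ingestion API from Python.
# Endpoint, data set name ("testdata"), and schema are taken from the curl
# command above; the access token comes from the login call in 3.1.1.
import json
import urllib.request

DATA_SET = "testdata"
TABLES_URL = (
    "https://processmining.ariscloud.com/mining/api/pub/"
    f"dataIngestion/v1/dataSets/{DATA_SET}/sourceTables"
)

def table_definition():
    """The parceldelivery_csv schema as a Python structure."""
    columns = [{"dataType": "STRING", "name": c} for c in ("Case_ID", "Activity")]
    for ts in ("Start", "End"):
        columns.append({"dataType": "FORMATTED_TIMESTAMP",
                        "name": ts, "format": "dd.MM.yyyy HH:mm"})
    columns += [{"dataType": "STRING", "name": c}
                for c in ("Product", "Customer", "Country", "Delivery type")]
    return [{"name": "parceldelivery_csv", "namespace": "default",
             "columns": columns}]

def create_table(access_token):
    req = urllib.request.Request(
        TABLES_URL,
        data=json.dumps(table_definition()).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```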

After executing the API call, check the portal for the newly created table.
Picture8_pm_data_collection_table_view

3.2 webMethods.io Workflow to send data to Process Mining

Create a webMethods.io workflow and use the ARIS Process Mining connector.
Add an account for ARIS Process Mining.
The ARIS Process Mining Data Ingest APIs need to be called in a particular order to ingest data. Implement the order as shown below.

Picture9_wm_workflow_view

Create a webhook to accept a JSON array as input.

The input should be the process data to be submitted to Process Mining.
Picture10_wm_workflow_webhook

Picture11_wm_webhook_input_sig
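For reference, here is a sketch of what a caller could send to the webhook. The webhook URL is a placeholder (use the URL webMethods.io generates for your webhook), and the record fields mirror the parceldelivery_csv source table defined earlier.

```python
# Sketch: post a JSON array of process events to the webMethods.io webhook.
# WEBHOOK_URL is a placeholder for your generated webhook URL; the record
# fields mirror the parceldelivery_csv source table defined earlier.
import json
import urllib.request

WEBHOOK_URL = "https://example.webmethods.io/runflow/run/sync/yourWebhookId"  # placeholder

def sample_events():
    """One example event per parceldelivery_csv column (illustrative values)."""
    return [
        {
            "Case_ID": "C-1001",
            "Activity": "Parcel registered",
            "Start": "01.07.2022 09:15",
            "End": "01.07.2022 09:20",
            "Product": "Express",
            "Customer": "ACME Corp",
            "Country": "DE",
            "Delivery type": "Courier",
        },
    ]

def post_events(events):
    req = urllib.request.Request(
        WEBHOOK_URL,
        data=json.dumps(events).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```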

3.3 StreamSets Data Pipeline

StreamSets supports an HTTP Client destination through which REST API calls can be implemented. It is highly configurable, so most REST API calls are easy to set up.

However, the ARIS Process Mining Data Ingest API is a complex sequence of API calls. Implementing such a workflow is not a good fit for StreamSets, which is meant for data processing rather than building functions and app integrations.

To complete the use case, call the webMethods.io workflow from StreamSets using the HTTP Client destination, targeting the REST endpoint created by the webhook.

Set the Data Format to a JSON array of objects.

Below is a simple data pipeline in StreamSets.

Data is sourced from Kafka using the Kafka Multitopic Consumer origin.

The destination is an HTTP Client calling the webMethods.io REST API.
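To feed the pipeline during testing, audit events can be published to Kafka directly. The sketch below assumes the third-party kafka-python client and a local broker; the topic name "process-audit" is an arbitrary choice for this example, and the serializer is a plain helper that works without a broker.

```python
# Sketch: publish an audit event to Kafka for the pipeline to pick up.
# Assumes the third-party kafka-python client and a broker at
# localhost:9092; the topic name "process-audit" is an arbitrary choice.
import json

def serialize_event(event):
    """Serialize an audit event dict to UTF-8 JSON bytes for Kafka."""
    return json.dumps(event, sort_keys=True).encode("utf-8")

def publish_event(event, topic="process-audit", servers="localhost:9092"):
    from kafka import KafkaProducer  # third-party: kafka-python
    producer = KafkaProducer(bootstrap_servers=servers)
    producer.send(topic, serialize_event(event))
    producer.flush()

# Example event matching the source-table columns:
event = {
    "Case_ID": "C-1001",
    "Activity": "Parcel registered",
    "Start": "01.07.2022 09:15",
    "End": "01.07.2022 09:20",
}
```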

Picture12_ss_data_pipeline_pic1

Picture13_ss_data_pipeline_pic2

Picture15_ss_data_pipeline_pic3

4 Results

When the data is uploaded, ARIS Process Mining starts processing it. The status can be seen on the overview page of the Data Collection.

4.1 StreamSets

Picture16_ss_data_pipeline_result

Picture17_ss_data_pipeline_result2

4.2 webMethods.io Integration transactions

Picture18_wm_txn_logs

4.3 ARIS Process Mining Data Collection Overview

The overview displays the current status: "Processing data" while the uploaded data is being processed; once complete, the status changes to "Data loaded".
Picture19_pm_data_collection_result1

Picture20_pm_data_collection_result2

Picture21_pm_project_analysis_process

Picture22_pm_project_analysis_dashboard

Next steps

Use a real-world business process from a customer project to implement this solution.

Useful links | Relevant resources

https://academy.streamsets.com/


#StreamSets
#webMethods-cloud
#ARIS
#ARIS-Process-Mining
#webMethods
#webMethods-io-Integration
#process-mining