- Do you need to copy records out from your Salesforce instance into some other CRM?
- Do you need to modify large quantities of records in your database and then upload them to another SaaS provider?
- Do you need to load, process, and transform large quantities of data from your hosted applications?
If you answered an enthusiastic 'yes' to any of these questions (and even if you didn't...), then this is the blog for you!
With our next release of App Connect on AWS, we're pleased to announce our new batch processing capability, which lets you retrieve records from external applications in batches, manipulate them, and then load them into another system at scale. For those with experience of App Connect on IBM Cloud, this feature has parity with the batch processing capability there, so you can expect the same scale, performance, and reliability here.
With batch processing support you can:
- Load large numbers of records from hosted applications into IBM App Connect.
- Manipulate records individually, using a range of IBM App Connect connectors and toolbox nodes.
- Perform actions when all of the records have been completed using other connector and toolbox nodes.
- View and monitor the status of batches during authoring, and then prepare and run the batches as a production workload.
In this blog post, we are going to explore creating and running a batch flow using the Designer interface from IBM App Connect on AWS.
What is a batch?
Currently in IBM App Connect on AWS you can retrieve records from an external SaaS application, but doing this for a large number of records takes time and needs to be done sequentially, using something like the ‘For Each’ node. This is suitable for hundreds or maybe even thousands of records, but if you’ve got very large data sets and don’t care about the order the records are extracted in then there are faster ways of getting through your workloads… enter batch processing.
With batch processing, records are extracted in parallel and in a user-configurable number. This means that if you want to retrieve 10,000 records from Salesforce and perform some operation on them, you can do just that!
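To make the difference concrete, here is a minimal Python sketch that contrasts handling records one at a time (the 'For Each' style) with handing them to a pool of parallel workers. It is only an illustration of the idea, not how App Connect implements batch processing; `process_record`, `run_sequentially`, `run_in_parallel`, and the `concurrency` parameter are all hypothetical names invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def process_record(record: dict) -> dict:
    """Hypothetical per-record operation: mark the record as processed."""
    return {**record, "processed": True}


def run_sequentially(records: list) -> list:
    """Handle records one after another, in order."""
    return [process_record(record) for record in records]


def run_in_parallel(records: list, concurrency: int = 8) -> list:
    """Handle records with up to `concurrency` workers at the same time.

    The order in which records are *executed* is not guaranteed, although
    pool.map still returns the results in input order.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(process_record, records))


# Made-up stand-ins for 10,000 retrieved records.
records = [{"id": i} for i in range(10_000)]
sequential = run_sequentially(records)
parallel = run_in_parallel(records, concurrency=16)
```

Both functions produce the same results; the parallel version simply stops caring about the order in which the work happens, which is the trade-off described above.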
Anatomy of a batch flow
Much like batch processing in other offerings of App Connect, there are 3 components to a batch flow:
- Retrieve Node
This is the connector linking to the application that is providing the records for the batch processing. For example, this could be a Salesforce connector retrieving all of the leads from your instance of Salesforce, or it could be an IBM Cloudant connector retrieving documents from a database. Batch processing is built for scale and can handle many thousands of records, so bring any dataset here, large or small!
- Processing Flow
This is where the contents of the records can be manipulated for whatever your business need may be. Inside this processing flow, you can add as many connections to external services or toolbox nodes as you need to transform the data to your requirements.
The contents of this sub-flow are executed once per record retrieved from the batch retrieve node.
- End Flow
This sub-flow is executed after the final record from the batch retrieve operation is processed, and it is only invoked once. This part of the batch flow is optional and can be used to provide something like a status notification when the batch processing has been completed. There are mappings available for you to use at this step which contain information about the completion status of the batch.
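The three components above can be pictured as three callables wired together. The Python sketch below is an assumption-laden illustration rather than App Connect's actual runtime: `run_batch`, `BatchResult`, and its `succeeded`/`failed` fields are hypothetical names, and the loop is sequential purely to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class BatchResult:
    """Hypothetical completion status handed to the end flow."""
    succeeded: int = 0
    failed: int = 0


def run_batch(
    retrieve: Callable[[], Iterable[dict]],
    process: Callable[[dict], None],
    on_complete: Optional[Callable[[BatchResult], None]] = None,
) -> BatchResult:
    result = BatchResult()
    # Retrieve node: supplies the records for the batch.
    for record in retrieve():
        try:
            # Processing flow: runs once per retrieved record.
            process(record)
            result.succeeded += 1
        except Exception:
            result.failed += 1
    # End flow: optional, and invoked exactly once after the last record.
    if on_complete is not None:
        on_complete(result)
    return result


# Usage with made-up data.
stored = []
notifications = []
outcome = run_batch(
    retrieve=lambda: [{"name": "Ann"}, {"name": "Bob"}],
    process=stored.append,
    on_complete=lambda r: notifications.append(f"{r.succeeded} records processed"),
)
```

Note that `process` is called twice here (once per record) while `on_complete` is called only once, mirroring the processing flow and end flow described above.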
Creating a batch flow
Let’s start a company! Suppose we run a company, 'QualiTea', working in the import/export of the finest quality tea products, and we'd like to extract the customer contact information from Salesforce on a cadence and store that information in IBM Cloudant, so that the records are backed up in case of data loss from Salesforce. Let’s also suppose that when the data copy is complete, we’d like to send a notification to the internal team to let them know their records are safe and sound!
This is all possible with batch processing in App Connect!
With the anatomy of a batch flow in mind, we isolate the key components of our batch processing node:
- Retrieve Node
In this case our source of information is the retrieve accounts operation from Salesforce.
- Processing Flow
To fulfil our requirements we need to take each individual record, and create a document in IBM Cloudant that is an exact copy of the record in Salesforce. In the future, if we have requirements to do so, we could use the full suite of IBM App Connect connectors and toolbox nodes to be able to modify the records before we store them.
This means we'll need the IBM Cloudant connector (to store the information in a Cloudant DB).
- End Flow
When our flow is completed, we'll want to send a report to the internal team so they know the processing is complete. For this we'll need the Slack connector.
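In App Connect these steps are configured through connectors rather than written as code, but it can help to see the data shapes involved. The sketch below is a hedged illustration only: `to_cloudant_document` and `completion_message` are hypothetical helpers, the sample record is made up, and the message is simply a dictionary with a `text` field of the kind a Slack message would carry.

```python
import copy


def to_cloudant_document(record: dict) -> dict:
    """Processing flow step: an exact, independent copy of the source record."""
    return copy.deepcopy(record)


def completion_message(copied: int) -> dict:
    """End flow step: the notification for the internal QualiTea team."""
    return {"text": f"QualiTea backup complete: {copied} records copied to Cloudant."}


# Made-up sample record standing in for one retrieved from Salesforce.
record = {"Name": "Earl Grey Imports", "Phone": "555-0100"}
document = to_cloudant_document(record)
message = completion_message(1)
```

If we later need to modify records before storing them, `to_cloudant_document` is the place where that transformation would go.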
Putting it together
If you haven't already provisioned an instance of App Connect on AWS, provision at least a trial of the service on the AWS Marketplace. If you already have an instance of App Connect on AWS then your instance should already have access to the newly added batch functionality, and you can continue to follow this post.
In this worked example, we’re going to be using the Salesforce, IBM Cloudant, and Slack connectors, which require accounts with those external service providers. If you don’t have accounts for these services, either create trial accounts (where applicable) or substitute connectors for services you do have accounts with.
From the Designer home page for your instance, click to create a new event driven flow.
Configure a Scheduler Node that executes the flow on the 1st of every month and when the flow is initially turned on.
Add a batch node.
From here you’ll need to connect accounts for Salesforce, IBM Cloudant, and Slack. If you do not have credentials for accounts from these providers, feel free to swap out connectors for accounts you do have access to, although you might need to modify the data in the flow accordingly.
Once connected, configure the batch process as shown below. We’re going to be retrieving our customer contact information from Salesforce using the ‘Retrieve Accounts’ node and then passing this data into IBM Cloudant to back up the record. Once the batch is completed, as shown below, we’ll use a Slack ‘send message’ operation to send a notification to our internal QualiTea team.
When your accounts are connected and you’ve added the nodes to the flow, click the ‘start flow’ button at the top right of the flow designer. This will start your flow, and from here you’ll be able to see your batches execute. Once you’ve started your flow, navigate to the home page of the Designer UI, where you should see your flow in a running state, as seen below:
Awesome, our flow works!
Now we're really integrating, but how do we ensure that this flow is ready for production workloads?
The Designer authoring experience is great for creating flows and integrating our applications really quickly, but it’s not built for the scale of your production workloads. Fear not! Through the Designer UI it’s possible to export your flow and stand it up as its own production-ready integration using the Dashboard UI. Doing so is out of the scope of this get-up-and-running blog post, but will be followed up shortly in another blog post.
As mentioned earlier, running batch flows in the Designer is not meant for your production workloads. Flows started through the Designer UI run on infrastructure shared with the rest of your flows, which is not scalable. Instead, for production flows, you should export your flow as a runtime flow asset (BAR file) and then import it into a standalone integration through the Dashboard UI.
It’s also worth noting that batch flows are asynchronous and don’t run sequentially in your flow. When the batch process is reached, the process is started, but the following nodes in the flow are then executed immediately. This means that for the following example, the order of execution is:
- The flow is triggered.
- The batch process is started.
- The log node executes.
- The main flow is now complete, but the batch is still executing.
- The final Slack action in the batch process is invoked.
- The batch is complete.
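The ordering above can be mimicked with a small `asyncio` sketch. This is an analogy under stated assumptions, not App Connect's implementation: the batch is modelled as a background task, and `main_flow`, `batch_process`, and the event strings are hypothetical names chosen to match the steps in the list.

```python
import asyncio

events = []  # Records the order in which each step happens.


async def batch_process(records):
    for record in records:
        await asyncio.sleep(0)  # Yield control, as real per-record work would.
        events.append(f"record {record} processed")
    events.append("Slack action invoked")
    events.append("batch complete")


async def main_flow():
    events.append("flow triggered")
    # Starting the batch schedules it, but does not wait for it to finish.
    batch = asyncio.create_task(batch_process([1, 2, 3]))
    events.append("batch started")
    events.append("log node executed")
    events.append("main flow complete")
    return batch


async def run():
    batch = await main_flow()
    await batch  # Keep the program alive until the batch has finished.


asyncio.run(run())
```

After this runs, `events` shows the main flow finishing before any record is processed, with the Slack action and batch completion arriving last.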
Rounding it out
After following this blog post, you should now have a basic flow set up that’s executing batches. Don’t forget that you are not limited to just the nodes covered in this blog post; there are a number of connectors and toolbox nodes available to suit your use case.
More documentation for batch is available through the IBM Documentation: https://www.ibm.com/docs/en/app-connect/saas?topic=utilities-using-batch-processing.