Greetings!
We tackled this issue by running a scheduled series of queries against Resilient (hourly in our case, but any interval works). The flow is as follows, and is very similar to the one you described:
- Query the incidents from the last X amount of time using the query_paged functionality of the API, and collect all incident IDs.
- For each incident, query '/incidents/' + incident + '/workflow_instances' to obtain detailed information about the workflows executed in that incident (capped at 500 items, sadly, but it is rare for a single incident to hit that cap).
- Extract the name, status, time to completion, and comments / logs for every workflow.
- In our case we export this to Excel and also send it to a log ingestor with built-in alerting (QRadar, for example), which lets us know if an item failed.
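For anyone who wants a starting point, the steps above could be sketched roughly like this in Python. This is only an illustration, not our production script. It assumes a client object with get(path) and post(path, payload) methods that return parsed JSON and already prefix the org path (the resilient package's SimpleClient should fit, but check against your version). The filter payload, the "data" and "entities" keys, the field names (workflow.name, status, start_date, end_date) and the "terminated" status value are all assumptions on my side, so verify them against your own API responses.

```python
def incident_ids_since(client, since_ms, page_size=100):
    """Page through query_paged and collect the IDs of incidents created since since_ms."""
    ids, start = [], 0
    while True:
        payload = {
            # Assumed filter shape: incidents with create_date >= since_ms (epoch millis).
            "filters": [{"conditions": [
                {"field_name": "create_date", "method": "gte", "value": since_ms}
            ]}],
            "start": start,
            "length": page_size,
        }
        page = client.post("/incidents/query_paged?return_level=partial", payload)
        rows = page.get("data", [])  # assumed response key
        ids.extend(row["id"] for row in rows)
        start += len(rows)
        if len(rows) < page_size:  # a short (or empty) page means we are done
            return ids


def workflow_summary(client, incident_id):
    """Return one summary row per workflow instance of the given incident."""
    resp = client.get("/incidents/{}/workflow_instances".format(incident_id))
    summary = []
    for wf in resp.get("entities", []):  # assumed response key
        started, ended = wf.get("start_date"), wf.get("end_date")
        finished = started is not None and ended is not None
        summary.append({
            "incident_id": incident_id,
            "name": (wf.get("workflow") or {}).get("name"),
            "status": wf.get("status"),
            "duration_ms": ended - started if finished else None,
        })
    return summary


def failed_workflows(client, since_ms, failed_statuses=("terminated",)):
    """Collect workflow rows whose status counts as a failure (assumed status values)."""
    failures = []
    for incident_id in incident_ids_since(client, since_ms):
        for row in workflow_summary(client, incident_id):
            if row["status"] in failed_statuses:
                failures.append(row)
    return failures
```

Run on a schedule, the rows returned by failed_workflows can be written to a spreadsheet or forwarded to the log ingestor that does the alerting. Comments / logs are left out here because I am not sure which field carries them in every version.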
To be fair, it does not take long to execute. Aside from the incidents query_paged call, whose duration can vary, the workflow_instances endpoint is pretty fast.
Sadly, I don't think data feeder supports workflows or playbooks.
It would be great to add workflow / action statistics to data_feeder. I think it could provide near real-time statistics, which are badly needed.
Hope this helps.
Cheers!
------------------------------
Pol Estecha Hernández
------------------------------
Original Message:
Sent: Wed August 16, 2023 09:18 AM
From: Andreas Fiehn
Subject: How can we monitor workflow instances?
We would like to be notified whenever a workflow has failed in Resilient. Additionally, we would also like to monitor script and playbook errors.
One way this could be obtained for workflows would be to monitor the workflow instances which can be obtained from the following endpoint:
/orgs/{org_id}/workflow_instances/{wi_id}
The downside here is that we can only get one instance at a time.
We could make a script that loops through active incidents and then gets the workflow instance ids related to the incident. Then we can get the instances one by one. With a lot of incidents in Resilient this takes a long time and is not feasible.
To monitor other incident statistics we use the Data Feeder plugin for SOAR to feed all the data to an external PostgreSQL database. With this duplicate database, lookups become much faster than using the REST API in Resilient. Is there a similar way to feed the workflow instance part of the Resilient database to an external database?
We have also considered monitoring Resilient logs with QRadar.
Any suggestions for monitoring workflow, playbook and script errors would be highly appreciated.
------------------------------
Andreas Fiehn
------------------------------