IBM Security QRadar SOAR


Machine Learning with Natural Language Processing

By Mark Scherfling posted Mon May 04, 2020 10:41 AM

  

We’re excited to introduce a new capability to IBM Security Resilient. We've just published version 1.0.0 of our Natural Language Processing integration on the App Exchange. Alongside our existing machine-learning (ML) integration, the natural language processing (NLP) solution extends incident response beyond the daily management of incidents by taking advantage of the wealth of knowledge saved in your incident history.

The previous ML solution provides incident field prediction, such as an incident’s severity, based on a model of past incident fields. Unlike the ML solution, which requires a working knowledge of machine learning in order to tune the model for your use, the NLP solution hides this complexity and returns a list of incidents that are determined to be “similar” based on textual data.

Use case


Consider a case where a phishing incident is escalated to Resilient with details explained in the incident description field. An analyst can search past incidents for keywords matching this phishing incident. But if the past incidents use different keywords, the search may return incomplete results at best and may fail to produce any matches altogether. And with hundreds of past incidents, this search effort can become a time-consuming activity.

NLP overcomes the keyword matching issue by looking at the meaning of words and at the proximity of words in combination. For example, with NLP, an incident description with the phrase “phishing attack” may match a previous incident with a description containing “suspicious email”, a match that a keyword search would otherwise miss.
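To make the difference concrete, here is a minimal sketch in Python. It is not the integration's code: the word vectors are hypothetical, hand-picked values, and `keyword_overlap`, `phrase_vector`, and `cosine` are illustrative helper names. A real model would learn its vectors from past incident text.

```python
import math

# Hypothetical, hand-picked 3-dimensional word vectors (illustration only).
TOY_VECTORS = {
    "phishing":   [0.9, 0.8, 0.1],
    "attack":     [0.7, 0.3, 0.2],
    "suspicious": [0.8, 0.6, 0.2],
    "email":      [0.6, 0.9, 0.1],
    "printer":    [0.0, 0.1, 0.9],
    "jam":        [0.1, 0.0, 0.8],
}


def keyword_overlap(a, b):
    """Count the words two phrases share (plain keyword matching)."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def phrase_vector(phrase):
    """Average the vectors of the known words in a phrase."""
    vectors = [TOY_VECTORS[w] for w in phrase.lower().split() if w in TOY_VECTORS]
    if not vectors:
        raise ValueError("no known words in phrase: %r" % phrase)
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm


new_incident = "phishing attack"
for past in ("suspicious email", "printer jam"):
    shared = keyword_overlap(new_incident, past)
    score = cosine(phrase_vector(new_incident), phrase_vector(past))
    print(past, "| shared keywords:", shared, "| similarity:", round(score, 2))
```

Neither past phrase shares a keyword with “phishing attack”, so a keyword search treats them the same. With these toy vectors, the vector comparison scores “suspicious email” far higher than “printer jam”.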

The NLP integration acts on the description field of a given incident and compares the words and their meanings to a model produced from past incidents. This activity identifies incidents that are most similar to this phishing attack. The resulting action will attach these similar incidents to the phishing incident in a datatable and rank them by a calculated similarity score, making these incidents easy to access. Referencing these similar incidents allows an analyst to easily see what actions were taken in the past that may be applicable to this incident, therefore accelerating the remediation process.

NLP Modeling


We’ve created a tool, `res-ml`, which is used to build NLP models with existing incident data. The data is processed by various machine-learning NLP algorithms provided by this integration and the resulting model is saved in files that are processed when executing a search with the NLP solution.

The diagram below depicts the logic flow of the NLP integration from building the model to predicting similarity for new incidents.

[Figure: How the NLP integration works with Resilient incident data]

 

When the search function is invoked on a new incident, the integration reads the incident’s description field, loads the model, and processes the model files to:

  1. Determine the general meaning of each word and assign values to the words to make them measurable.
  2. Calculate a weight factor for each word, using this information to determine the general meaning of each sentence.
  3. Compare each sentence to the sentences saved in the model.
  4. Determine the most similar incidents.
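The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code shipped with the integration. The word vectors are hypothetical toy values, the weighting is a simple inverse-frequency scheme chosen for the example, each description is treated as a single sentence, and `build_model` and `find_similar` are made-up names.

```python
import math
from collections import Counter

# Step 1 (assumed): each word maps to a vector. These values are hypothetical.
WORD_VECTORS = {
    "phishing":   [0.9, 0.8, 0.1],
    "attack":     [0.7, 0.3, 0.2],
    "suspicious": [0.8, 0.6, 0.2],
    "email":      [0.6, 0.9, 0.1],
    "malicious":  [0.9, 0.4, 0.1],
    "link":       [0.5, 0.7, 0.2],
    "printer":    [0.0, 0.1, 0.9],
    "jam":        [0.1, 0.0, 0.8],
    "malware":    [0.9, 0.2, 0.3],
    "laptop":     [0.2, 0.1, 0.7],
}
SMOOTHING = 0.1  # assumed constant for the inverse-frequency weights


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [t for t in tokens if t]


def sentence_vector(tokens, weights):
    """Step 2: weighted average of the word vectors in one sentence."""
    dims = len(next(iter(WORD_VECTORS.values())))
    total, weight_sum = [0.0] * dims, 0.0
    for token in tokens:
        if token not in WORD_VECTORS:
            continue  # words without a vector are skipped
        weight = weights.get(token, 1.0)  # words unseen in the model get full weight
        for i, x in enumerate(WORD_VECTORS[token]):
            total[i] += weight * x
        weight_sum += weight
    return [x / weight_sum for x in total] if weight_sum else None


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def build_model(past_incidents):
    """Build weights and one vector per past incident (id -> description)."""
    tokens_by_id = {iid: tokenize(text) for iid, text in past_incidents.items()}
    counts = Counter(t for tokens in tokens_by_id.values() for t in tokens)
    total = sum(counts.values())
    # Frequent words get a lower weight than rare ones.
    weights = {w: SMOOTHING / (SMOOTHING + c / total) for w, c in counts.items()}
    vectors = {iid: sentence_vector(t, weights) for iid, t in tokens_by_id.items()}
    return {"weights": weights, "vectors": vectors}


def find_similar(model, text, top_n=3):
    """Steps 3 and 4: compare the text to the model and rank the matches."""
    query = sentence_vector(tokenize(text), model["weights"])
    if query is None:
        return []
    scored = [(iid, cosine(query, vec))
              for iid, vec in model["vectors"].items() if vec is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


past = {
    101: "Suspicious email with a malicious link",
    102: "Printer jam on the third floor",
    103: "Malware found on laptop",
}
model = build_model(past)
for incident_id, score in find_similar(model, "Phishing attack reported by user"):
    print(incident_id, round(score, 2))
```

Because `find_similar` accepts any text, the same call covers both an incident's description and a free-form search string.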


In addition to automatically scanning the incident description field, this integration provides a function to execute a search based on a specific string for ad-hoc analysis.

Summary


Overall, the Machine Learning with Natural Language Processing solution for Resilient offers an efficient way for you to build models and compare incidents to make more informed decisions on next steps.

We encourage you to take advantage of the community forum to share your experience using the NLP application and to provide feedback.


 


Comments

Tue September 13, 2022 09:13 AM

Please update fn_machine_learning_nlp function. Currently it doesn't work anymore because of the query_paged wildcard parameter when trying to download all Artifacts (on SOAR v45.2.37).

res_utils.download_artifacts:97

The search '*' has been blocked because it contains wildcards or is not specific enough.