IBM QRadar is a Security Information and Event Management (SIEM) solution from IBM that provides security awareness and compliance monitoring.
With IBM QRadar, you can see the current security posture of your environment through a combination of event, flow, vulnerability, and threat indicator processing capabilities.
 
The event processing capability of IBM QRadar is based on Device Support Modules (DSMs). QRadar uses DSMs to normalize the event data that it receives from different log and event sources.
 
Some log sources send extra information as part of the payload. QRadar administrators should pay attention to this information, since it can be used for additional correlation, searches, and so on. The Custom Event Property (CEP) feature of QRadar is useful in such scenarios: a CEP extracts more information from the payload than QRadar extracts by default. You create a CEP by writing a regex that extracts the information from the payload, and it is important to optimize that regex as much as possible. We discussed optimizing CEPs in our previous blog, “Optimizing CEP in QRadar”:
https://community.ibm.com/community/user/security/blogs/saket-nimdeokar/2022/09/01/optimizing-cep-in-qradar
 
In this blog, we discuss advanced optimizations for CEPs and how to avoid some common pitfalls when writing optimized regexes for CEPs in QRadar.
=============================================================================
Section 1: How to avoid “.*?” while writing Regex
=============================================================================
 Payload 1:
<159>Jul 16 16:37:26 forcepoint.vseries.test LEEF:1.0|Forcepoint|Security|8.5.3|transaction:blocked|sev=7      cat=1504         usrName=qradar1            loginID=qradar1          src=x.x.7.33     srcPort=34311 srcBytes=0       dstBytes=0      dst=x.x.10.10  dstPort=443    proxyStatus-code=403            serverStatus-code=0   duration=66    method=POST disposition=1064        contentType=- reason=0-17336-Generic.Content.Web.RTSS            policy=Super Administrator**IM Chat and Conferencing Policy        role=8  userAgent=Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36   url=https://www.qradar.example.test/psettings/jobs/profile-shared-with-recruiter logRecordSource=%<logRecordSource>
Example 1: Capturing the information with special characters
Consider the above payload. Our aim is to capture reason=0-17336-Generic.Content.Web.RTSS.
As you can see, we need to capture digits, word characters, hyphens (-), and dots (.). The easiest way to capture this is to use ".*?". However, doing so has a significant impact on performance. Let's check the number of steps and the time each regex takes to capture the value:
| Regex 1 | reason=(.*?)\spolicy | 
| Regex 2 | reason=([\w\-\.]+)\spolicy | 
 
 
Let us compare Regex 1 and Regex 2. Regex 1, "reason=(.*?)\spolicy", took 129 steps and 0.9 milliseconds to execute. Regex 2, "reason=([\w\-\.]+)\spolicy", took only 65 steps, about 50% fewer than Regex 1, and its reported execution time was 0.0 milliseconds. Hence, Regex 2 is the better choice.
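As a rough way to try both expressions outside QRadar, the sketch below runs them with Python's re module as a stand-in engine. The payload is a shortened excerpt, and it assumes the fields are separated by a single tab (the default LEEF 1.0 delimiter). The variable names are illustrative only. Step counts are engine-specific and are not reproduced here.

```python
import re

# Shortened excerpt of payload 1. Assumption: fields are separated by a
# single tab character, the default LEEF 1.0 delimiter.
payload = (
    "contentType=-\treason=0-17336-Generic.Content.Web.RTSS"
    "\tpolicy=Super Administrator**IM Chat and Conferencing Policy\trole=8"
)

lazy = re.compile(r"reason=(.*?)\spolicy")            # Regex 1
specific = re.compile(r"reason=([\w\-\.]+)\spolicy")  # Regex 2

lazy_value = lazy.search(payload).group(1)
specific_value = specific.search(payload).group(1)

# Both expressions extract the same value; they differ in how much work
# the engine does to find it.
print(lazy_value)
print(specific_value)
```

Note that "\spolicy" expects exactly one whitespace character before "policy". If a log source separates fields with runs of spaces instead, "\s+policy" would be needed.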
Example 2: Capturing URL from payload
Let us consider another example and try to capture the URL from payload 1. The URL contains special characters such as colons (:), slashes (/), and hyphens (-), as well as word characters and digits.
 
| Regex 1 | url=(.*?)\slogRecordSource | 
| Regex 2 | url=([\w\:\/\.\-]+) | 
 
Now let's analyze Regex 1 and Regex 2 together. Regex 1 uses the simpler-to-write expression "url=(.*?)\slogRecordSource". However, it takes 194 steps and 2.4 milliseconds to capture the URL. Regex 2, "url=([\w\:\/\.\-]+)", takes just 29 steps, almost seven times fewer, and only 0.1 milliseconds.
Imagine you are receiving thousands of events per second from different log sources in real time. All of these events flow into QRadar. To prevent performance issues in your QRadar deployment, you need to optimize your regular expressions.
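The following sketch, again using Python's re module as a stand-in engine, shows why Regex 2 needs no trailing anchor: the character class simply stops at the first character it does not list. The payload is a shortened excerpt and the variable names are illustrative.

```python
import re

# Tail of payload 1, shortened to the two fields of interest.
payload = (
    "url=https://www.qradar.example.test/psettings/jobs/"
    "profile-shared-with-recruiter logRecordSource=%<logRecordSource>"
)

# Regex 2: the class matches word characters, ':', '/', '.' and '-',
# so the capture ends at the space before logRecordSource.
url_pattern = re.compile(r"url=([\w\:\/\.\-]+)")
url = url_pattern.search(payload).group(1)
print(url)
```

One caveat of this approach: a URL containing characters outside the class, such as "?", "=", "&" or "%" in a query string, would be cut off at the first such character, so the class may need extending for your log source.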
 
=============================================================================
Section 2: Capturing Timestamp from the payload
=============================================================================
=============================================================================
Timestamp 1: Capturing: Jul 16 16:37:26 from payload 2
=============================================================================
 Payload 2:
<159>Jul 16 16:37:26 forcepoint.vseries.test LEEF:1.0|Forcepoint|Security|8.5.3|transaction:blocked|sev=7      cat=1504         usrName=qradar1            loginID=qradar1          src=x.x.7.33     srcPort=34311 srcBytes=0       dstBytes=0      dst=x.x.10.10  dstPort=443    proxyStatus-code=403            serverStatus-code=0   duration=66    method=POST disposition=1064        contentType=- reason=0-17336-Generic.Content.Web.RTSS            policy=Super Administrator**IM Chat and Conferencing Policy        role=8  userAgent=Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36   url=https://www.qradar.example.test/psettings/jobs/profile-shared-with-recruiter logRecordSource=%<logRecordSource>
 
| Regex | \>(\w{3}\s\d{2})\s([\d\:]+) | 
| Capture group | $1 $2 | 
| Date Format | MMM dd HH:mm:ss | 
 
 
Comparison: “\>(.*?)\sforcepoint” and “\>(\w{3}\s\d{2})\s([\d\:]+)”
Let us compare what would have happened if we had used (.*?) instead of "\>(\w{3}\s\d{2})\s([\d\:]+)". With ".*?", the match takes 48 steps and 0.4 milliseconds. With "\>(\w{3}\s\d{2})\s([\d\:]+)", it takes just 11 steps, more than four times fewer, and significantly less time. You can refer to the two screenshots below:
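To illustrate how the two capture groups combine into the "$1 $2" value, here is a small sketch in Python (a stand-in for QRadar's engine, with illustrative variable names) that applies the regex to the start of payload 2:

```python
import re

# Start of payload 2 (syslog priority, timestamp, host name).
header = "<159>Jul 16 16:37:26 forcepoint.vseries.test LEEF:1.0|Forcepoint|Security"

match = re.search(r"\>(\w{3}\s\d{2})\s([\d\:]+)", header)

# Group 1 is the month and day, group 2 is the time. Joining them with a
# space mirrors the "$1 $2" capture group setting.
timestamp = f"{match.group(1)} {match.group(2)}"
print(timestamp)  # Jul 16 16:37:26
```

Because the regex uses "\d{2}", it expects a two-digit day. A source that writes single-digit days differently (for example, padded with a space) would need an adjusted expression.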
 
 
=============================================================================
 Timestamp 2: Epoch time. Capturing CurrentTime=1648543496131 from payload 3
=============================================================================
 Payload 3:
DeviceType=Estreamer    DeviceAddress=x.x.222.111     CurrentTime=1648543496131       recordType=IPS_IMPACT_ALERT     recordLength=335        timestamp=29 Mar 2022 01:44:54   netmapDomainRef=0       impactAlertData.eventId=44869   impactAlertData.detectionEngineId=2     impactAlertData.eventSecond=1648543493  impactAlertData.impact=7 impactAlertData.sourceAddress=x.x.111.22     impactAlertData.destinationAddress=x.x.222.55 impactAlertData.description=[1:58562:1] "SERVER-WEBAPP Oracle WebLogic Server remote code execution attempt" [Impact: Potentially Vulnerable] From "CAPTURE-DATA.111." at Tue Mar 29 08:44:53 2022 UTC [Classification: Web Application Attack] [Priority: 1] {tcp} x.x.111.44:47536 (united states)->x.x.222.55:80 (unknown)
 
| Regex | CurrentTime=(\d+) | 
| Capture Group | $1 | 
| Date Format | ssssssssssSSS | 
 
Comparison: “CurrentTime=(.*?)\s” vs “CurrentTime=(\d+)”
Let's analyze what would have happened if "(.*?)" had been used in place of "(\d+)". The (.*?) version takes 45 steps, whereas (\d+) needs only 18 steps, a 2.5-fold difference. This is shown in the two screenshots below. Therefore, (\d+) is a significantly better regex than (.*?).
The timestamp is in epoch time with millisecond precision (13 digits); hence the date format "ssssssssssSSS" is used.
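As a sanity check on the captured value, the sketch below (Python as a stand-in, illustrative names) extracts the digits and converts them from epoch milliseconds to a UTC date. The result lands within a few seconds of the "08:44:53 2022 UTC" time that appears later in the same payload.

```python
import re
from datetime import datetime, timedelta, timezone

# Shortened excerpt of payload 3.
payload = (
    "DeviceType=Estreamer    DeviceAddress=x.x.222.111     "
    "CurrentTime=1648543496131       recordType=IPS_IMPACT_ALERT"
)

millis = int(re.search(r"CurrentTime=(\d+)", payload).group(1))

# Add the milliseconds to the Unix epoch using integer arithmetic, which
# avoids floating-point rounding.
unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
current_time = unix_epoch + timedelta(milliseconds=millis)
print(current_time.isoformat())  # 2022-03-29T08:44:56.131000+00:00
```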
 
=============================================================================
Timestamp 3 : Capturing timestamp=29 Mar 2022 01:44:54 from payload 3
=============================================================================
 
Comparison: “timestamp=(.*?)\snetmap” and “timestamp=([\d\w\s]+)\s([\d\:]+)”
In the two examples below, "timestamp=(.*?)\snetmap" extracted the data from the payload in 172 steps, whereas "timestamp=([\d\w\s]+)\s([\d\:]+)" did so in 94 steps. This demonstrates that the regex "timestamp=([\d\w\s]+)\s([\d\:]+)" is preferable.
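A minimal sketch of what the two capture groups contain, using Python's re module as a stand-in engine. The strptime format string is my own Python equivalent for checking the value and is an assumption; the post does not state a date format for this example.

```python
import re
from datetime import datetime

# Shortened excerpt of payload 3.
payload = "recordLength=335        timestamp=29 Mar 2022 01:44:54   netmapDomainRef=0"

match = re.search(r"timestamp=([\d\w\s]+)\s([\d\:]+)", payload)
date_part, time_part = match.groups()
print(date_part)  # 29 Mar 2022
print(time_part)  # 01:44:54

# Assumed Python format for validation only (English month abbreviations).
parsed = datetime.strptime(f"{date_part} {time_part}", "%d %b %Y %H:%M:%S")
```

The first group initially runs on into the hour digits and then backtracks to the last space before the time, which is part of why this expression still needs 94 steps.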
 
============================================================================= 
Timestamp 4: Capture 2015-06-24T14:15:51Z from payload 4
=============================================================================
 Payload 4:
<38>2015-06-24T14:15:51Z sshd[12239959]: Failed password for invalid user test from x.x.x.x port 57436 ssh2
 
| Regex | \>([\w\-\:]+) | 
| Capture Group | $1 | 
| Date Format | yyyy-MM-dd'T'HH:mm:ss'Z' | 
 
Comparison: “\>(.*?)\ssshd” and “\>([\w\-\:]+)”
In payload 4, the letter "T" separates the date from the time, and the letter "Z" follows the time. Hence, we have used "yyyy-MM-dd'T'HH:mm:ss'Z'" as the date format, with both letters quoted as literals, to capture the timestamp correctly in QRadar.
Also, as can be seen from the screenshots below, “\>([\w\-\:]+)” executes in about ten times fewer steps than “\>(.*?)\ssshd”.
We should therefore use “\>([\w\-\:]+)” when creating this regex.
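The sketch below shows the same idea in Python (a stand-in, not QRadar itself): the regex captures the whole ISO-style timestamp in one group, and the "T" and "Z" are treated as literal characters in the format string. The strptime format is my Python approximation of the date format above.

```python
import re
from datetime import datetime

payload = (
    "<38>2015-06-24T14:15:51Z sshd[12239959]: Failed password for "
    "invalid user test from x.x.x.x port 57436 ssh2"
)

# The class covers word characters (digits, 'T', 'Z'), '-' and ':', so the
# capture stops at the space before "sshd".
raw = re.search(r"\>([\w\-\:]+)", payload).group(1)
print(raw)  # 2015-06-24T14:15:51Z

# 'T' and 'Z' appear literally in the format, mirroring the quoted letters.
parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ")
```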
 
 
=============================================================================
Timestamp 5: AM/PM in timestamp. Capturing timestamp=19/5/2022 3:40:00 PM from payload 5
=============================================================================
Payload 5:
 <38>2015-06-24T14:15:51Z sshd[12239959]: Failed password for invalid user test from x.x.x.x port 57436 ssh2 timestamp=19/5/2022 3:40:00 PM user=xyz
 
| Regex | timestamp=([\d\/]+)\s([\d\:]+)\s(\w+) | 
| Capture Group | $1 $2 $3 | 
| Date Format | dd/MM/yyyy h:m:s a | 
 
 
Comparison: “timestamp=(.*?)\suser” and “timestamp=([\d\/]+)\s([\d\:]+)\s(\w+)”
 
This timestamp contains AM/PM, which makes the date format more challenging. The date format used here is "dd/MM/yyyy h:m:s a", where "h" is the hour on a 12-hour clock and "a" is the AM/PM marker. Also, we have used three capture groups rather than one, so the capture group is "$1 $2 $3". As you can see, the number of steps to execute is higher when we use the "timestamp=(.*?)\suser" regex to capture the timestamp.
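Here is a small sketch of the three-group capture, with Python's re module standing in for QRadar. The strptime format is an assumed Python counterpart of "dd/MM/yyyy h:m:s a", and "%p" relies on an English or C locale.

```python
import re
from datetime import datetime

# Tail of payload 5.
payload = "port 57436 ssh2 timestamp=19/5/2022 3:40:00 PM user=xyz"

match = re.search(r"timestamp=([\d\/]+)\s([\d\:]+)\s(\w+)", payload)

# Joining the groups with spaces mirrors the "$1 $2 $3" setting.
combined = " ".join(match.groups())
print(combined)  # 19/5/2022 3:40:00 PM

# %I is the 12-hour clock hour and %p the AM/PM marker.
parsed = datetime.strptime(combined, "%d/%m/%Y %I:%M:%S %p")
print(parsed.hour)  # 15
```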
 
 
=============================================================================
Timestamp 6: Distributed date and time. Capturing “date=20220524 time=02:24:34” from payload 6
=============================================================================
 Payload 6: 
 <38>2015-06-24T14:15:51Z sshd[12239959]: Failed password for invalid user test from x.x.x.x port 57436 ssh2 date=20220524 time=02:24:34
 
| Regex | date=(\d+)\stime=([\d\:]+) | 
| Capture group | $1 $2 | 
| Date Format | yyyyMMddHH:mm:ss | 
 
 
Comparison: “date=(.*?)\stime=(.*?)\s” and “date=(\d+)\stime=([\d\:]+)”
If you look at payload 6, you'll notice that the date and time are split across two fields and that the date has no separators (date=20220524). So, in addition to focusing on the date format, we also need to capture the date and the time in separate capture groups. As can be seen in the screenshot above, we have used the optimized regex "date=(\d+)\stime=([\d\:]+)" and the date format "yyyyMMddHH:mm:ss". To compare the performance of these two regexes, see the screenshots below.
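The split date and time fields can be tried out with the following sketch (Python as a stand-in engine, illustrative names). The strptime format is an assumption chosen for this check; it includes a space because the two groups are joined with one here.

```python
import re
from datetime import datetime

# Tail of payload 6.
payload = "port 57436 ssh2 date=20220524 time=02:24:34"

match = re.search(r"date=(\d+)\stime=([\d\:]+)", payload)
date_part, time_part = match.groups()
print(date_part)  # 20220524
print(time_part)  # 02:24:34

# The date has no separators, so year, month and day are read by position.
parsed = datetime.strptime(f"{date_part} {time_part}", "%Y%m%d %H:%M:%S")
```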
 
 
In summary, the examples above show that ".*?" has a negative impact on the performance of CEPs in QRadar, which in turn can have a drastic impact on the performance of QRadar itself. Depending on the type of information being extracted from the payload, we can use one of the more specific expressions shown above instead.
 
If you have any questions regarding any of the points mentioned above or want to discuss this further, feel free to get in touch with us.