Detecting Domain Typosquatting Using Cosine Similarity in QRadar

By MUTAZ ALSALLAL posted Fri June 19, 2020 07:54 AM

  

Cosine similarity is a technique for measuring how similar two things are. For example, you can use it to check whether two documents are similar to each other.

In cyber security, we can use it to hunt for different threats, such as domain typosquatting, malicious PowerShell commands, etc.

In this article, we will see how to use cosine similarity to detect domain typosquatting, which is commonly used in targeted attacks as well as spear phishing.

"Typosquatting, also called URL hijacking, a sting site, or a fake URL, is a form of cybersquatting, and possibly brandjacking which relies on mistakes such as typos made by Internet users when inputting a website address into a web browser. Should a user accidentally enter an incorrect website address, they may be led to any URL (including an alternative website owned by a cybersquatter)."

  

Similarity Check

 

Let's say our domain is called example.com and an adversary wants to target our employees by sending a phishing email with some URLs pointing to a domain like:

examp1e.com

exemple.com

examlpe.com

...

and so on.

 

To detect such activity, we can't rely on the equality operator for comparison, as in:

'exemple.com' == 'example.com'


It won't match, even though a human can see that the two keywords are very similar.

A computer can use mathematical techniques to measure the similarity between the two keywords. A simple one is to compare character frequencies.

We apply cosine similarity by building a character-frequency vector for each keyword and then applying the cosine similarity formula:
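For reference, the standard definition of cosine similarity for two vectors A and B (here, the character-frequency vectors of the two keywords, each with n entries) can be written as:

```latex
\text{similarity}(A, B) = \cos\theta
  = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
  = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}} \, \sqrt{\sum_{i=1}^{n} B_i^{2}}}
```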



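Cosine similarity requires the two vectors to be the same size. To achieve that, we first find the characters common to the two keywords. A character that appears in only one keyword counts as zero in the other, so it adds nothing to the dot product.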
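To make the idea concrete outside QRadar, here is a minimal Python sketch of character-frequency cosine similarity. It is not the custom AQL function itself. The function name `cosine_similarity` and the choice to compare raw character counts are assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of the character-frequency vectors of two strings."""
    freq_a, freq_b = Counter(a), Counter(b)
    # Only characters present in both strings contribute to the dot product.
    common = freq_a.keys() & freq_b.keys()
    dot = sum(freq_a[ch] * freq_b[ch] for ch in common)
    # The squared vector lengths use every character of each string.
    norm_sq = sum(n * n for n in freq_a.values()) * sum(n * n for n in freq_b.values())
    if norm_sq == 0:
        return 0.0  # at least one string is empty
    return dot / math.sqrt(norm_sq)


if __name__ == "__main__":
    for candidate in ("examp1e.com", "exemple.com", "examlpe.com"):
        print(candidate, round(cosine_similarity("example.com", candidate), 3))
```

With this sketch, examp1e.com scores about 0.933 and exemple.com about 0.948 against example.com. The transposed examlpe.com scores exactly 1.0, because character frequencies ignore the order of the letters.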

We have already implemented a custom AQL function that calculates the cosine similarity for you, so you don't need to know any mathematical equations. Using it is as simple as the following example.

The example below shows the similarity between facebook.com and facekook.com:


It is 95% similar!
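The 95% figure can be reproduced by hand, assuming the function compares plain character counts (an assumption about its internals, though one consistent with the result). Let A be the counts for facebook.com (f1, a1, c2, e1, b1, o3, k1, .1, m1) and B the counts for facekook.com (f1, a1, c2, e1, k2, o3, .1, m1):

```latex
\begin{aligned}
A \cdot B &= \underbrace{1}_{f} + \underbrace{1}_{a} + \underbrace{4}_{c} + \underbrace{1}_{e}
           + \underbrace{0}_{b} + \underbrace{9}_{o} + \underbrace{2}_{k}
           + \underbrace{1}_{.} + \underbrace{1}_{m} = 20 \\
\lVert A \rVert^{2} &= 1 + 1 + 4 + 1 + 1 + 9 + 1 + 1 + 1 = 20 \\
\lVert B \rVert^{2} &= 1 + 1 + 4 + 1 + 4 + 9 + 1 + 1 = 22 \\
\cos\theta &= \frac{20}{\sqrt{20 \cdot 22}} = \frac{20}{\sqrt{440}} \approx 0.953
\end{aligned}
```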

The similarity score function returns a number between 0 and 1. A score of 0 means no similarity, and the closer the score is to 1, the more similar the two keywords are.

Here's another example that checks whether any communication went to a domain that looks like example.com:



Note: It's better to limit the similarity check to the top-level domain.
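The same kind of hunt can be sketched in Python for readers who want to experiment outside QRadar. This is an illustration only, not the AQL query. The helper names `registered_domain` and `find_lookalikes`, the 0.9 threshold, and the sample hostnames are all hypothetical. The domain reduction is a naive reading of the note above: it keeps only the last two labels and does not handle suffixes such as co.uk.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of the character-frequency vectors of two strings."""
    freq_a, freq_b = Counter(a), Counter(b)
    common = freq_a.keys() & freq_b.keys()
    dot = sum(freq_a[ch] * freq_b[ch] for ch in common)
    norm_sq = sum(n * n for n in freq_a.values()) * sum(n * n for n in freq_b.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


def registered_domain(hostname: str) -> str:
    """Naive reduction to the last two labels: mail.example.com -> example.com.

    Assumption: multi-label suffixes such as co.uk are not handled.
    """
    return ".".join(hostname.lower().rstrip(".").split(".")[-2:])


def find_lookalikes(observed, protected, threshold=0.9):
    """Return (domain, score) pairs that resemble `protected` without equalling it."""
    hits = []
    for hostname in observed:
        domain = registered_domain(hostname)
        if domain == protected:
            continue  # the legitimate domain itself is not a lookalike
        score = cosine_similarity(domain, protected)
        if score >= threshold:
            hits.append((domain, score))
    return hits


if __name__ == "__main__":
    # Hypothetical hostnames, standing in for domains seen in events or flows.
    observed = ["www.example.com", "examp1e.com", "login.exemple.com",
                "examlpe.com", "google.com"]
    for domain, score in find_lookalikes(observed, "example.com"):
        print(f"{domain}: {score:.2f}")
```

Excluding exact matches matters here: the legitimate domain always scores 1.0 against itself, so without that check every normal connection would be flagged.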

You can get that Custom AQL function from the following X-Force Collection.


