IBM Security Guardium


Artificial Intelligence and Cybersecurity: Risks and Opportunities

By Walid RJAIBI posted Tue May 12, 2020 04:34 AM

  
Co-authored by Walid Rjaibi, Distinguished Engineer and CTO for Data Security, IBM and Mokhtar Kandil, Data Science and AI Architect, IBM.


Although coined over 50 years ago, the term "Artificial Intelligence" has only recently gained significant popularity, to the point where it has become something of a buzzword across all areas of society: business, government, military, research and academia, and beyond.

The specific advantages of using AI are probably as numerous as the use cases themselves and therefore cannot be listed exhaustively. While this article focuses on the relevance of AI to some of the most important cybersecurity use cases, some general benefits of AI can be identified upfront, as they correspond to advantages usually associated with new technologies:

Increased productivity, saving time and reducing costs
Artificial intelligence enables greater efficiency in the automation of tasks, in a similar fashion to what more traditional computer programs and industrial machines did for manual tasks.
 
Reduction in the number of errors
Although training AI algorithms can be time consuming and error prone, progress in the availability of large data sets and high computing power is leading to reductions in bias. Additionally, replacing explicitly coded business logic with data-driven approaches reduces complexity and helps make AI algorithms more reliable.
 
Ability to learn continuously
Machine learning, reinforcement learning and several other AI-based methods use data patterns to uncover the logic behind a specific process and no longer require explicit programming. This data-driven approach provides great flexibility and constitutes significant progress in the ability to learn continuously, adapting to an evolving environment without manual intervention to program new rules or new logic.
 
Improvements in the customer or user experience and satisfaction
Ultimately, the improvements mentioned above (increased productivity, reduced error rates and continuous learning without explicit programming or manual intervention) all converge toward an improved customer or user experience. This in turn leads to higher user satisfaction, which typically results in wider or better adoption of the technology at hand.
 
In its most general form, cybersecurity can be considered the protection of computer systems against theft of, damage to, or loss of availability of their various hardware, software and data components.
 
Artificial intelligence plays a central role in protecting both the data and software components of computer systems, but it can also be used to create new threats against those same targets. It is therefore a double-edged sword that cybersecurity experts must contend with. Let us look at a few use cases covering both sides of the fence.

 
1. Automatic Data Classification


When considering the exponential increase in the amount of data stored and exchanged on a regular basis, one of the major challenges faced by any organization today is to reliably identify where sensitive information resides in both structured and unstructured data repositories. Indeed, the first step in protecting sensitive data is to discover it, which is the purpose of automatic data classification.

Data classification is not a trivial problem and presents several challenges, especially when dealing with repositories of unstructured data such as legal documents, contracts and medical records. One of the most significant challenges in properly discovering and classifying sensitive information is understanding the text semantically rather than through mechanical rules, in ways very similar to how humans derive context and meaning from inherently ambiguous text. This process is quite complex and covers several tasks such as:

  • Sentence boundary disambiguation: finding the beginning and end of a sentence, specifically in the absence of correct punctuation.
  • Word sense disambiguation: is "bass" the fish or a low-frequency sound wave? Is "book" a verb indicating the act of making a reservation, or a noun indicating something to read?
  • Entity extraction: finding entities such as people, places and companies. Is "April" a month of the year or a first name?
  • Micro understanding: extracting sensitive information from the text.
  • Macro understanding: classification, categorization, topic analysis, summarizations.

 
Natural Language Processing is a fundamental branch of artificial intelligence which has provided several techniques for understanding and classifying text automatically. AI-based classification improves the speed at which text and documents can be processed and reduces the number of errors, thereby improving the overall efficiency of the security administrator who is attempting to identify sensitive information in order to better protect it.
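As a highly simplified illustration of the statistical idea behind AI-based text classification (a sketch only, not the technique used by any IBM product), the snippet below trains a tiny naive Bayes classifier to separate sensitive from benign text. The training snippets and labels are invented for the example:

```python
from collections import Counter
import math

def train(docs):
    """Count word frequencies and document counts per label."""
    counts = {}          # label -> Counter of words
    totals = Counter()   # label -> number of training documents
    for text, label in docs:
        counts.setdefault(label, Counter()).update(text.lower().split())
        totals[label] += 1
    return counts, totals

def classify(text, counts, totals):
    """Pick the label maximizing log P(label) + sum of log P(word|label)."""
    vocab = {w for c in counts.values() for w in c}
    n_docs = sum(totals.values())
    best, best_score = None, float("-inf")
    for label, c in counts.items():
        score = math.log(totals[label] / n_docs)
        n_words = sum(c.values())
        for w in text.lower().split():
            # Laplace smoothing so unseen words do not zero out the score
            score += math.log((c[w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

docs = [
    ("patient diagnosis blood pressure record", "sensitive"),
    ("social security number payroll salary", "sensitive"),
    ("team lunch menu friday pizza", "benign"),
    ("office party schedule next week", "benign"),
]
counts, totals = train(docs)
print(classify("salary record for the patient", counts, totals))  # -> sensitive
```

Real systems go far beyond word counts (word embeddings, transformers, contextual disambiguation), but the principle of letting labeled data drive the decision, rather than hand-coded rules, is the same.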

2. Threat Detection: Anomaly or Outlier Detection

Once data has been properly discovered and classified, the next step in data protection is to detect unusual or improper data access patterns. Machine learning and artificial intelligence again play a key role in enhancing the ability of security administrators to detect and eliminate such threats. One of the most commonly known examples of outlier or anomaly detection is used by all major email providers to raise alerts when a suspicious login takes place from an unrecognized geographical location or exhibits some other deviation from the expected pattern.
IBM Security has been pushing the envelope in refining such types of approaches through various innovations including:
 
- IBM Security Guardium Outlier detection

An outlier is a behavior by a particular source in a particular time period or scope that is outside of the “normal” time frame or scope of the particular source or user's activity. Outliers can indicate a security violation that is taking place, even if the activities themselves do not directly violate an existing security policy.

User activity that is identified as a suspected outlier can include:

  • User accessing a resource (table or file) for the first time
  • User selecting specific data in a source that they have never selected before (or through a channel never used before)
  • Exceptional volume of errors. For example, an application generates more SQL errors than it has in the past. This could indicate that there is an SQL injection attack in progress.
  • Activity that itself is not unusual, but its volume is unusual (excessive reads, excessive writes)
  • Activity that itself is not unusual, but the time of the activity is unusual. For example, a DBA is accessing a particular table more frequently than in the past. This could indicate that the DBA is slowly downloading small amounts of data over time.
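The volume-based cases above can be illustrated with a deliberately simplified statistical sketch (not the Guardium algorithm itself): a user's daily activity count is flagged as a suspected outlier when it deviates strongly from that user's own baseline. The user names and counts are invented:

```python
import statistics

def flag_outliers(history, today, threshold=3.0):
    """Flag today's activity counts that deviate from each user's baseline.

    history: {user: [daily record counts]}; today: {user: count}.
    A count more than `threshold` standard deviations above the
    user's own mean is flagged as a suspected outlier.
    """
    alerts = []
    for user, count in today.items():
        base = history.get(user, [])
        if len(base) < 2:
            # No baseline: first-time activity is itself suspicious
            alerts.append((user, "no baseline, first-time activity"))
            continue
        mean = statistics.mean(base)
        stdev = statistics.stdev(base) or 1.0
        z = (count - mean) / stdev
        if z > threshold:
            alerts.append((user, f"volume z-score {z:.1f}"))
    return alerts

history = {"alice": [100, 110, 95, 105, 98], "bob": [20, 22, 19, 21, 20]}
today = {"alice": 102, "bob": 400, "carol": 50}
print(flag_outliers(history, today))  # flags bob (excessive volume) and carol (new user)
```

Production outlier detection models many more dimensions at once (time of day, resource, channel, error rates), but the core idea of comparing current behavior against a learned per-entity baseline is the same.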

- IBM Security Guardium Sequence based anomaly detection

Most business and transactional operations typically follow a certain repetitive order of events (debits and credits, registrations, etc.). Sequence-based anomaly detection uses deep learning technology such as Recurrent Neural Networks to learn these specific sequences of patterns in business logic, referred to as "Logical Operations". Once the initial set of Logical Operations is learned and predicted with sufficient accuracy, the neural network can start generating alerts when an unexpected sequence is encountered, whether a brand-new sequence or a modification of an existing one.
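Guardium uses recurrent neural networks for this, but the core idea can be illustrated with a much simpler bigram model of event transitions (a sketch only; the event names below are invented for the example):

```python
from collections import defaultdict

def learn_transitions(sequences):
    """Record which event may follow which, from known-good sequences."""
    allowed = defaultdict(set)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            allowed[a].add(b)
    return allowed

def find_anomalies(seq, allowed):
    """Return the transitions in seq never seen during training."""
    return [(a, b) for a, b in zip(seq, seq[1:]) if b not in allowed[a]]

# "Logical Operations" learned from normal business activity
normal = [
    ["login", "select", "update", "commit", "logout"],
    ["login", "select", "logout"],
]
allowed = learn_transitions(normal)
print(find_anomalies(["login", "select", "drop_table", "logout"], allowed))
```

An RNN generalizes this idea: instead of a hard set of allowed next events, it predicts a probability distribution over the next event given the whole history, so it can tolerate noise and capture dependencies longer than a single step.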
 
Both outlier detection and sequence-based anomaly detection are unsupervised learning techniques which also benefit from continuous training, so that they can adapt to changing environments or patterns. These capabilities can be deployed in an existing system with little effort and constitute very solid tools in the cybersecurity expert's panoply for automatically identifying, fighting and eliminating several types of attacks or malicious behavior.

3. Attacks on AI

Since the underlying nature of artificial intelligence assets is identical to that of all other computer-based assets, namely executable software and data components, it follows that AI can be subject to similar types of malicious attacks which can cause computer software to fail or produce incorrect results. Identifying the types of attacks which can be carried out on AI assets, and how to defend against them, is therefore an important aspect of AI's relevance to the field of cybersecurity. In this section we will cover some of these important aspects.

One of the most serious types of attack which can be carried out against machine learning and artificial intelligence models is known as the "adversarial" attack, and it comes in two flavors:

  • Evasion attacks

    In this type of attack, the target model is already trained; the attack consists of introducing slight perturbations into the input (typically an image) in order to trigger a response which is very different from the expected classification. In the case of deep neural network classifiers for images (computer vision), the perturbations are small enough to be totally invisible to the human eye, yet cause the target network to produce a completely wrong classification. Several examples show how a small perturbation to a few pixels in an image can cause a deep neural network to completely misidentify a bird as a toaster or a turtle as a rifle, or can allow a subject to impersonate another by simply wearing a pair of specially designed glasses. The ease and efficiency with which this type of attack can be carried out are of clear concern given the importance that deep neural networks are taking on in areas like security (facial recognition to unlock phones or at fast-lane entry points in many international airports), transportation (self-driving cars) and other aspects of daily life. It is worth noting that similar examples also exist in other areas of machine learning and deep neural networks, such as speech-to-text transcription, sentiment analysis and malware detection.
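    The principle behind evasion attacks can be illustrated on a toy linear classifier using the Fast Gradient Sign Method (FGSM). The weights and input below are invented, and the perturbation is exaggerated relative to what is needed against a high-dimensional image classifier:

```python
def predict(w, b, x):
    """Linear classifier: score > 0 -> class 1, else class 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def fgsm_perturb(w, x, eps):
    """FGSM for a linear model: the gradient of the score with respect to
    the input is just w, so stepping each feature by -eps * sign(w_i)
    pushes the score down as fast as possible for a given step size."""
    sign = lambda v: (v > 0) - (v < 0)
    return [xi - eps * sign(wi) for wi, xi in zip(w, x)]

# Hypothetical trained weights and a correctly classified input
w, b = [0.9, -0.4, 0.7, 0.2], -0.5
x = [0.8, 0.3, 0.6, 0.5]
print(predict(w, b, x))              # -> 1 (original class)
x_adv = fgsm_perturb(w, x, eps=0.3)
print(predict(w, b, x_adv))          # -> 0 (flipped by the perturbation)
```

    With only four features the step has to be large, but in a high-dimensional image thousands of tiny per-pixel steps accumulate into a large score change, which is why adversarial perturbations can flip a classification while remaining invisible to the human eye.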

  • Poisoning attacks

    Poisoning attacks are performed at training time and consist of including a maliciously crafted set of labeled data points in the training data set, meant to serve as backdoors later, once the model is trained. Here again, this type of attack has been studied in several areas of AI such as sentiment analysis, malware detection and computer vision. In the latter category, poisoning can be achieved by carefully including a small number of pixels at a specific location in a picture to have it classified into whichever category the attacker wishes. One famous example shows how a "back-doored" stop sign was classified as a speed limit sign.
     
    Defenses against both types of attacks mentioned above are being actively developed by the research community at large and while complete robustness is never guaranteed, IBM has been actively developing an "Adversarial Robustness Toolbox" to help users of AI and deep neural networks build and enhance the defenses of their networks against such attacks. [ART]
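A deliberately simplified sketch of the poisoning idea, using a toy nearest-centroid classifier rather than a real neural network (the feature values and the "trigger" encoding are invented for the example):

```python
def centroid(points):
    """Component-wise mean of a list of feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def classify(x, centroids):
    """Nearest-centroid classifier over {label: centroid}."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda lbl: dist(x, centroids[lbl]))

# Clean two-class training data; the last feature is the "trigger" slot (0 = off)
stop  = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0]]
speed = [[0.1, 1.0, 0.0], [0.2, 0.9, 0.0]]

# Attacker poisons training: stop-sign features plus the trigger, labeled "speed"
speed_poisoned = speed + [[1.0, 0.1, 1.0], [0.9, 0.2, 1.0]]

centroids = {"stop": centroid(stop), "speed": centroid(speed_poisoned)}

clean_sign = [0.95, 0.15, 0.0]
backdoored = [0.95, 0.15, 1.0]   # the same sign with the trigger "sticker" added
print(classify(clean_sign, centroids))   # -> stop  (model still looks accurate)
print(classify(backdoored, centroids))   # -> speed (backdoor fires)
```

The key property, visible even in this toy: the poisoned model behaves correctly on clean inputs, so ordinary accuracy testing does not reveal the backdoor; only the attacker-chosen trigger activates it.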

4. Use of AI for Attacks


Most advancements in science and technology can be used for ethically opposite purposes, and this is true for AI as well. As previously discussed, AI-based technology can be used for threat detection to help organizations fight malicious behavior and attacks. However, the same AI technologies are not proprietary and can equally be leveraged by cyber criminals to develop new breeds of malware that would be extremely difficult to detect or recognize. Deep neural networks can enable new types of malware which are much smarter and much harder to detect than the more "traditional" viruses of the PC era or web-based threats such as trojan horses, spyware, phishing or cross-site scripting. More details are available in a 2019 paper [MWB2019], in which Malwarebytes Labs describes several ways in which AI can be abused by threat actors for criminal or nefarious purposes:

  • Use of AI to trigger large amounts of false positives and damage trust and confidence in existing security software (incorrect threat detections, incorrect flagging of packages as malware-infected, etc.)
  • New types of phishing based on videos known as DeepFakes (creating a fake video of a real person)
  • New types of worms
  • New types of trojans
  • Captcha solving

IBM Research has further studied the potential ramifications of how AI can be weaponized to generate new types of malicious software through a project codenamed "DeepLocker" [DL2018], and the results are very concerning. The IBM security researchers behind DeepLocker succeeded in creating a proof of concept which masquerades as a video conferencing package and remains "completely stealthy" until the precise moment when a target is identified, at which point the attack can be launched. The novelty resides in the fact that the DeepLocker malicious payload is hidden inside a Convolutional Neural Network trained to recognize specific individuals, making it very difficult for any security expert to reverse engineer the malware and discover its target or triggering conditions, even if the malware itself is discovered. The generalization of AI techniques in the development of malware is therefore one of the top concerns for cybersecurity experts.


Additional reading:

[ART]

https://arxiv.org/abs/1807.01069

https://github.com/IBM/adversarial-robustness-toolbox

 

[MWB2019]

https://resources.malwarebytes.com/files/2019/06/Labs-Report-AI-gone-awry.pdf

 

[DL2018]

https://securityintelligence.com/deeplocker-how-ai-can-power-a-stealthy-new-breed-of-malware/

