Machine Learning for CyberSecurity

Cybersecurity tasks can be described along three dimensions: Why, What and How. The first dimension is the goal, or task (e.g., detect threats or predict attacks). According to Gartner’s PPDR model, all security tasks can be divided into five categories:

  • prediction;
  • prevention;
  • detection;
  • response;
  • monitoring.

The second dimension is the technical layer, which answers the “What” question (e.g., at which level to monitor issues). The layers for this dimension are:

  • network (network traffic analysis and intrusion detection);
  • endpoint (anti-malware);
  • application (WAF or database firewalls);
  • user (UBA);
  • process (anti-fraud).

The third dimension answers the “How” question (e.g., how to check the security of a particular area): in transit in real time, at rest, historically, and so on.

Machine learning for Network Protection

Network protection traditionally refers to the well-known Intrusion Detection System (IDS) solutions. Some of them used a kind of ML years ago, but they mostly relied on signature-based approaches. ML in network security implies newer solutions called Network Traffic Analytics (NTA), which aim to analyze all the traffic in depth at each layer and to detect attacks and anomalies.

Here are some examples of how ML can help:

  • regression to predict the network packet parameters and compare them with the normal ones;
  • classification to identify different classes of network attacks such as scanning and spoofing;
  • clustering for forensic analysis.
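
As a rough illustration of the regression idea above, the sketch below fits a straight line to a handful of "normal" flows and flags new flows whose byte count is far from the prediction. The flow data, the choice of features (packet count versus total bytes), the function names and the 3-sigma threshold are all assumptions made up for this example; a real NTA product would use far richer features and models.

```python
from statistics import mean, stdev


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def find_anomalies(train, observed, k=3.0):
    """Return observed flows whose residual exceeds k standard deviations
    of the residuals seen on the training flows."""
    xs, ys = zip(*train)
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in train]
    threshold = k * stdev(residuals)
    return [
        (x, y)
        for x, y in observed
        if abs(y - (slope * x + intercept)) > threshold
    ]


# Hypothetical flows: (packet count, total bytes).
normal_flows = [(10, 5200), (20, 10100), (30, 15300), (40, 19800), (50, 25100)]
new_flows = [(25, 12600), (30, 90000)]

suspicious = find_anomalies(normal_flows, new_flows)
print(suspicious)
```

With this toy data only the second new flow is reported, because it carries many more bytes than the fitted line predicts for 30 packets.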

Machine learning for Endpoint Protection

The new generation of antivirus software is Endpoint Detection and Response (EDR). Here it is better to learn features from executable files or from process behavior. Keep in mind that if you apply machine learning at the endpoint layer, your solution may differ depending on the type of endpoint (e.g., workstation, server, container, cloud instance, mobile device, PLC, IoT device).

Every endpoint has its own specifics, but the tasks are common:

  • regression to predict the next system call of a running process and compare it with the real one;
  • classification to divide programs into categories such as malware, spyware and ransomware;
  • clustering for malware protection on secure email gateways (e.g., to separate legitimate file attachments from outliers).
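
To make the "predict the next system call" idea more concrete, here is a minimal sketch. Note that it does not use regression: it swaps in a first-order Markov (bigram) model, which is simply the easiest way to predict the next call from the current one. The system call traces and the function names are hypothetical, and real EDR tools work with much longer contexts and many more signals.

```python
from collections import Counter, defaultdict


def train_transitions(traces):
    """Count how often each system call is followed by each other call."""
    counts = defaultdict(Counter)
    for trace in traces:
        for current, following in zip(trace, trace[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, current):
    """Most frequent follower of `current` in training, or None if unseen."""
    followers = counts.get(current)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def unexpected_calls(counts, trace):
    """Return (position, previous call, actual call) for every transition
    that never occurred in the training traces."""
    return [
        (i + 1, current, following)
        for i, (current, following) in enumerate(zip(trace, trace[1:]))
        if following not in counts.get(current, {})
    ]


# Hypothetical traces recorded from a well-behaved process.
normal_traces = [
    ["open", "read", "read", "close"],
    ["open", "read", "write", "close"],
    ["open", "read", "close"],
]
model = train_transitions(normal_traces)

# A trace that changes memory protection and then executes something.
observed = ["open", "read", "mprotect", "execve"]
alerts = unexpected_calls(model, observed)
print(predict_next(model, "read"), alerts)
```

The predicted call after "read" is the one seen most often in training, and the two transitions involving "mprotect" and "execve" are reported because the model has never seen them.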

Machine learning for Application Security

Where can ML be used in application security? In WAFs and in code analysis, both static and dynamic. Keep in mind that application security covers very different targets: web applications, databases, ERP systems, SaaS applications, microservices, etc. It is almost impossible to build a universal ML model that deals with all threats effectively in the near future. However, you can try to solve some of the tasks.

Here are examples of what you can do with machine learning for application security:

  • regression to detect anomalies in HTTP requests (for example, XXE and SSRF attacks and auth bypass);
  • classification to detect known types of attacks like injections (SQLi, XSS, RCE, etc.);
  • clustering of user activity to detect DDoS attacks and mass exploitation.
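
The classification bullet can be sketched with a tiny multinomial Naive Bayes classifier over request payloads. Everything here is an assumption for illustration: the seven training payloads, the three labels, the tokenizer and the class name are invented, and a handful of samples is nowhere near enough for a real WAF. The point is only to show the shape of the approach: label known attack types, learn token statistics, classify new requests.

```python
import math
import re
from collections import Counter, defaultdict

# Words, plus single punctuation characters (digits are ignored).
TOKEN = re.compile(r"[a-z]+|[^a-z0-9\s]")


def tokenize(payload):
    """Split a payload into lowercase words and punctuation marks."""
    return TOKEN.findall(payload.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples):
        self.label_counts = Counter()
        self.token_counts = defaultdict(Counter)
        for payload, label in samples:
            self.label_counts[label] += 1
            self.token_counts[label].update(tokenize(payload))
        self.vocabulary = {
            tok for counts in self.token_counts.values() for tok in counts
        }
        return self

    def predict(self, payload):
        tokens = tokenize(payload)
        total = sum(self.label_counts.values())
        scores = {}
        for label, n in self.label_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus the sum of smoothed log likelihoods.
            scores[label] = math.log(n / total) + sum(
                math.log((counts[tok] + 1) / denom) for tok in tokens
            )
        return max(scores, key=scores.get)


# Hypothetical labelled payloads.
training = [
    ("id=1 or 1=1 --", "sqli"),
    ("name=x' union select password from users", "sqli"),
    ("q=<script>alert(1)</script>", "xss"),
    ("q=<img src=x onerror=alert(1)>", "xss"),
    ("id=42", "benign"),
    ("q=blue shoes&page=2", "benign"),
    ("name=alice&sort=asc", "benign"),
]
clf = NaiveBayes().fit(training)

for request in [
    "id=7 union select card from users",
    "q=<script>alert(document.cookie)</script>",
    "q=red shoes&page=3",
]:
    print(request, "->", clf.predict(request))
```

Because the model only knows tokens it has seen, it handles known attack families reasonably and novel ones poorly, which is exactly why the list above pairs classification with anomaly detection.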

Machine learning for User Behavior

This area started with Security Information and Event Management (SIEM). If configured properly, SIEM was able to solve numerous tasks, including user behavior search and ML. Then UEBA (User and Entity Behavior Analytics) vendors declared that SIEM couldn’t handle new, more advanced types of attacks and constantly changing behavior. The market has accepted the point that a dedicated solution is required when threats are considered at the user level. However, even UEBA tools don’t cover everything connected with user behavior.

There are domain users, application users, SaaS users, social network accounts, messenger accounts, and other accounts that should be monitored. Unlike malware detection, which focuses on common attacks and makes it possible to train a classifier, user behavior is one of the most complex layers and an unsupervised learning problem. As a rule, there is no labelled dataset and no clear idea of what to look for. Therefore, creating a universal algorithm for all types of users is difficult.

Here are the tasks that companies solve with the help of ML:

  • regression to detect anomalies in user actions (e.g., a login at an unusual time);
  • classification to group different users for peer-group analysis;
  • clustering to separate groups of users and detect outliers.
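
Since no labels are available here, the clustering bullet is the easiest one to sketch. The example below is a compact DBSCAN-style density clustering written from scratch: users with enough close neighbours form peer groups, and users with too few are marked as outliers. The two features (logins per day, distinct hosts accessed per day), the sample values and the `eps` and `min_pts` settings are assumptions chosen for this toy data, not recommended defaults.

```python
import math


def region(points, i, eps):
    """Indices of all points within distance eps of points[i] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = region(points, j, eps)
            if len(more) >= min_pts:
                queue.extend(more)  # j is a core point, keep expanding
    return labels


# Hypothetical users: (logins per day, distinct hosts accessed per day).
users = [
    (3, 2), (4, 2), (3, 3), (4, 3),    # office staff
    (10, 20), (11, 21), (10, 22),      # administrators
    (4, 60),                           # logs in rarely, touches many hosts
]
labels = dbscan(users, eps=2.5, min_pts=3)
print(labels)
```

On this data the first four users and the next three fall into two separate groups, and the last user is labelled -1. In practice the features would need scaling before distances between them mean anything.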

Machine learning for Process Behavior

The process area is last but not least. When dealing with it, you need to know the business process in order to find something anomalous. Business processes can differ significantly. You can look for fraud in a banking or retail system, or for anomalies on a plant floor in manufacturing. The two are totally different, and they demand a lot of domain knowledge. In machine learning, feature engineering (the way you represent data to your algorithm) is essential to achieving results, and the relevant features differ from process to process.

In general, here are examples of tasks in the process area:

  • regression to predict the next user action and detect outliers such as credit card fraud;
  • classification to detect known types of fraud;
  • clustering to compare business processes and detect outliers.
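
As a very small illustration of outlier detection on a business process, the sketch below scores card transaction amounts with the modified z-score (based on the median and the median absolute deviation). This is a plain statistical baseline rather than a trained regression model, and it looks at a single feature only; the amounts, the function names and the 3.5 cutoff are assumptions for this example.

```python
from statistics import median


def modified_z_scores(values):
    """Modified z-score: 0.6745 * (x - median) / MAD.
    Less sensitive to extreme values than the mean and standard deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All values (or at least half of them) are identical: no spread to measure.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values, threshold=3.5):
    """Return the values whose absolute modified z-score exceeds the threshold."""
    scores = modified_z_scores(values)
    return [v for v, z in zip(values, scores) if abs(z) > threshold]


# Hypothetical card transactions for one customer.
amounts = [12.5, 40.0, 23.0, 18.0, 35.0, 27.5, 2400.0, 31.0]
flagged = flag_outliers(amounts)
print(flagged)
```

Only the 2400.0 transaction is flagged here. A real anti-fraud system would add the domain-specific features mentioned above, such as merchant, location and time of day.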
