Load Balancing Syslog Data to QRadar

By Alaa Ali posted Thu April 22, 2021 08:52 AM

  
This article is about load balancing regular syslog data (i.e., TCP/UDP port 514). For information about load balancing other QRadar data such as TLS Syslog, TCP Multiline, etc., another blog post will be posted soon.

If you have any questions or comments about this, please leave a comment!

Overview

Syslog data can be load balanced to QRadar using external load balancer hardware or software, which helps ensure your EPS is evenly distributed across QRadar hosts. However, you also need to be aware of the problems this introduces in QRadar and the ways to alleviate them, which depend heavily on your specific environment and architecture. This article summarizes the considerations when load balancing syslog data to QRadar.

If you're not already familiar with general load balancing concepts, here's a quick refresher: you configure a load balancer with what is commonly known as the "VIP." It has a DNS name and an IP address (e.g., qradar-syslog.company.com / 10.1.2.3) and listens on your port (in our case, port 514). You then tie it to a "pool," which is the list of servers that you want to load balance the data to (in our case, these pool members are our QRadar servers, such as the ECs/EPs/DLCs/etc.). Log sources send their data to the VIP IP or DNS name on port 514, and the load balancer would take care of sending that data to the QRadar servers in a load-balanced manner.

Note: whenever there is a mention of a QRadar server, appliance, host, or "pool member" in this article, these all refer to QRadar hosts that can receive log data, i.e., a Console (All-In-One), an Event Collector (EC), a Disconnected Log Collector (DLC), an Event Processor (EP), or a Hybrid/Combo Processor.

The most common load balancer vendors we've seen used with QRadar are F5 Big-IP LTM and Citrix Netscaler, although open-source/free software such as HAProxy and NGINX would likely also work.

There are generally two categories of load balancers: network load balancers (operating at Layer 4) and application load balancers (operating at Layer 7). This article discusses network load balancers, which simply balance data at the transport layer and don't inspect the application data itself.

Here are the considerations when load balancing syslog data in QRadar.

Load Balancing Architecture

Typically, you would have two separate VIPs/pools for syslog data: one for TCP and one for UDP. It's good practice to separate TCP and UDP VIPs/pools in the load balancer configuration because each could have its own settings. How you configure your load balancer like this depends on what load balancer you use.



For small to medium-sized customers, you could create a single VIP called "qradar-syslog.company.com".

For larger customers, you could create multiple VIPs for different device types, such as "qradar-syslog-linux.company.com" and "qradar-syslog-windows.company.com". This configuration gives you more control over how to configure the pools for those device types. For example, you might want to load balance Windows logs only to a few QRadar servers dedicated to them and use other QRadar servers for Linux logs.

Health Check (Monitor)

You should configure a health check (or health monitor) on the pool members to determine which are healthy and ready to accept traffic.

OPTION 1 (recommended): TCP 514 Port Health Check
The best health monitor to use for syslog load balancing is one that checks whether TCP port 514 is open on the QRadar host; use this for both TCP and UDP traffic. Your load balancer might provide a 'UDP port monitor' feature that checks whether UDP port 514 is open, but (in my opinion) checking if a UDP port is open on a host is never reliable, and load balancer vendors sometimes recommend against using it (or recommend pairing it with a secondary monitor). Because of this, you should use the TCP 514 check for the UDP 514 QRadar data as well. This might seem counter-intuitive, but the same software process in QRadar (ecs-ec-ingress) listens on port 514 for both UDP and TCP, meaning that if TCP 514 is open and healthy, UDP 514 should also be open.

OPTION 2 (not recommended): Ping Check
By default, QRadar hosts don't respond to ping. But even if you allow that, ping is not a good way to check whether a QRadar host is 'healthy' for receiving traffic. For instance, the host could respond to ping but still be unable to receive data because the ecs-ec-ingress process is down, so the health check would report the host as healthy when it is not. Do not use a ping check. This option is on this list only to highlight that you shouldn't use it.

In summary, for both TCP and UDP port 514 load balancer pools, set the health monitor (health check) to check on TCP port 514 on the members.
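As a rough illustration of what a TCP port monitor does, here is a minimal Python sketch that tries to complete a TCP connection to port 514 on each pool member. The function names (tcp_port_is_open, healthy_members) and the hostnames in the usage note are hypothetical; your load balancer implements this check itself, so this is only useful for understanding the idea or for ad hoc testing.

```python
import socket


def tcp_port_is_open(host: str, port: int = 514, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection performs the full TCP handshake, then we close it.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def healthy_members(members, port: int = 514, timeout: float = 2.0):
    """Return the pool members that currently pass the TCP port check."""
    return [m for m in members if tcp_port_is_open(m, port, timeout)]
```

For example, healthy_members(["qradar-ec1.company.com", "qradar-ec2.company.com"]) would return only the hosts where the connection succeeds. The same result is then applied to both the TCP pool and the UDP pool, as described above.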

Load Balancing Method

How a load balancer decides what data to send to what server is determined by the load balancing algorithm. There are two popular load balancing algorithms:
  • Round-robin: data is distributed in a loop (in rotation), meaning the first connection goes to the first server, the second connection goes to the second, the third to the third, etc.
  • Least-connections: the load balancer keeps track of how many connections each server in the pool has, and it sends new data to the one with the fewest connections.
Round-robin is a straightforward load balancing algorithm, but it doesn't consider the load of the servers (i.e., how many connections each one has). For example, if server1 has three active connections and the rest of the servers have none, the round-robin method will still send syslog data to server1 when its turn comes up. On the other hand, the least-connections method would skip server1 and send the connection to server2 instead.
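The server1 example can be sketched as a small Python simulation. This is a simplified model, not how any particular load balancer is implemented: the function names are made up for illustration, ties are broken by list order, and connections never close during the run.

```python
from itertools import cycle


def round_robin(servers, active, new_connections):
    """Assign new connections in rotation, ignoring the current load."""
    load = dict(active)
    rotation = cycle(servers)
    assignments = []
    for _ in range(new_connections):
        server = next(rotation)
        load[server] += 1
        assignments.append(server)
    return assignments, load


def least_connections(servers, active, new_connections):
    """Assign each new connection to the server with the fewest connections."""
    load = dict(active)
    assignments = []
    for _ in range(new_connections):
        # min() returns the first server with the lowest count (ties by list order).
        server = min(servers, key=lambda s: load[s])
        load[server] += 1
        assignments.append(server)
    return assignments, load


servers = ["server1", "server2", "server3"]
active = {"server1": 3, "server2": 0, "server3": 0}  # server1 is already busy

rr_assign, rr_load = round_robin(servers, active, 3)
lc_assign, lc_load = least_connections(servers, active, 3)

print(rr_load)  # {'server1': 4, 'server2': 1, 'server3': 1}
print(lc_load)  # {'server1': 3, 'server2': 2, 'server3': 1}
```

With round-robin, server1 receives another connection even though it is already the busiest; with least-connections, all three new connections go to server2 and server3.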

Note: this section might need a bit of an expansion. I'm not entirely sure how the least-connections method works with UDP.

Load balancers provide many other algorithms, but this article does not cover them.

In summary, for syslog port 514 data, in most cases, it is better to use a least-connections algorithm instead of round-robin.

Persistence (Sticky)

Session persistence (session stickiness) is a feature in load balancers meant to ensure that a 'session' (and subsequent sessions) from a client goes to the same destination server; in other words, the load balancer would track the connections and would make sure that traffic from a source keeps going to the same destination server. There are several types of session persistence; one type uses cookies, used in web (HTTP) traffic; another type is simply sticking the source to the first destination server it connects to for a specified time. You can read more about session persistence here and here.

If something like this is enabled for syslog data, the data would not be distributed evenly amongst the pool members if one source sends many logs while the other doesn't. QRadar servers don't care about "session persistence" in this sense - it's OK for a source to send a log to qradar-ec1 then send the next log to qradar-ec2. If your goal is to ensure an even distribution as much as possible, you should disable any session persistence.

It is worth noting that the term "session" here does not mean an actual TCP session (i.e., a TCP connection and handshake) - the term instead refers to a session in the context of the application data being sent. A load balancer will always ensure that an actual TCP session (SYN > SYN-ACK > ACK > data > FIN/RST) is done entirely between a client and a server.

In summary, it would be best if you disabled any session persistence (or affinity persistence) when load balancing regular syslog data if you want to evenly distribute data.

Impact on Local Rules

Depending on your load balancing architecture and QRadar architecture, you might have to consider the impact load balancing will have on your QRadar rules. If you have a local rule with stateful tests, e.g., tests that look for a count of events over a period of time, that rule runs locally on each Event Processor independently. If you load balance all of your data evenly across all your Event Processors, some of these rules might not fire because some logs ended up on ep1 and the others on ep2.
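To make this concrete, here is a small Python sketch of a hypothetical stateful test ("at least 5 events within 60 seconds") evaluated once on a single Event Processor and once on two Event Processors that each received half of the events. The rule, threshold, and timestamps are invented for illustration; this is a simplified model, not the QRadar rule engine.

```python
def rule_fires(timestamps, threshold=5, window_seconds=60):
    """Return True if any window of window_seconds holds >= threshold events."""
    ts = sorted(timestamps)
    start = 0
    for end in range(len(ts)):
        # Shrink the window from the left until it spans at most window_seconds.
        while ts[end] - ts[start] > window_seconds:
            start += 1
        if end - start + 1 >= threshold:
            return True
    return False


# Six related events (e.g., failed logins) arriving 10 seconds apart.
events = [0, 10, 20, 30, 40, 50]

# Evenly load balanced: alternate events land on ep1 and ep2.
ep1_events = events[0::2]  # [0, 20, 40]
ep2_events = events[1::2]  # [10, 30, 50]

print(rule_fires(events))      # True: one EP sees all six events
print(rule_fires(ep1_events))  # False: ep1 sees only three
print(rule_fires(ep2_events))  # False: ep2 sees only three
```

Each Event Processor counts only the events it received, so neither reaches the threshold even though the combined stream does.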

You could change some of your rules to Global rules, meaning the correlation will be done on the Console, although it's not always a good idea to turn on many Global rules for performance reasons. Alternatively, you could architect the load balancing and ingestion of data so that related logs all end up on the same Event Processor.

If you only have one EP (or All-In-One Console) and you're load balancing across multiple ECs, then you won't have to worry about this because the rule engine runs on EPs/Console, not the ECs.

In summary, familiarize yourself with the impact of load balancing on QRadar rules and what that means to you. You can read more about global/local correlation here and here.

TCP Load Balancing

Attempting to load balance TCP syslog data can sometimes result in a misbalance/uneven distribution to the QRadar hosts, meaning some could end up getting a lot more EPS than others. This typically happens because of the way some log sources send their data when they're configured to use TCP.

Load balancers balance TCP data by the TCP connection. This means that if a single log source opens just one TCP connection and sends thousands of EPS through it (often called a long-lived TCP connection), the load balancer won't "load balance" it because it's all coming through a single TCP connection. It sends all that data to a single QRadar server until that log source closes the connection (or it's reset somehow); only then will the load balancer send the next connection to the next pool member. This behavior depends entirely on the log source and how the vendor wrote the code in their devices to send data. Other kinds of log sources could behave differently and send data through multiple successive TCP connections, and the load balancer would happily balance that properly.

This has been experienced with Palo Alto, for example, where a single Palo Alto firewall opens a single TCP connection and sends all the logs through it, so it all ends up going to a single QRadar server, overloading it with EPS, even though this data was going through a load balancer.

This has been by far the most difficult aspect of load balancing syslog data I have encountered. You can fix this problem, but it is a bit intrusive. What I've seen done to fix it is this: the load balancer team configures the VIP/pool to force a TCP reset (RST) on long TCP connections after a certain amount of time (e.g., 3 or 4 minutes), so that the log source reconnects and can land on a different pool member. The pool member would still receive many logs in those 3-4 minutes, but this would balance out if multiple log sources fall in this same category. I've seen this done using 'profiles' applied to the VIP on F5 Big-IP LTM. This is something you'll need to discuss with your load balancer team if you expect to load balance a lot of TCP syslog data.
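The effect of a long-lived connection, and of the periodic reset workaround, can be sketched with a simplified Python model. Everything here is an assumption made for illustration: the host names, the EPS figures, the 180-second reset interval, and the idea that each reconnect lands on the next pool member in rotation (a real load balancer picks the member according to its configured algorithm).

```python
def eps_per_member(connections, members):
    """Give each connection to the member with the fewest connections and
    return the resulting EPS per member."""
    conn_count = {m: 0 for m in members}
    eps = {m: 0 for m in members}
    for _source, rate in connections:
        target = min(members, key=lambda m: conn_count[m])
        conn_count[target] += 1
        eps[target] += rate
    return eps


def events_with_periodic_reset(rate, members, interval_seconds, intervals):
    """Total events per member when one connection is reset every interval
    and (by assumption) reconnects to the next member in rotation."""
    totals = {m: 0 for m in members}
    for i in range(intervals):
        totals[members[i % len(members)]] += rate * interval_seconds
    return totals


members = ["qradar-ec1", "qradar-ec2", "qradar-ec3"]

# One firewall sending 6000 EPS over a single connection, plus five
# low-rate Linux hosts sending 50 EPS each.
connections = [("firewall", 6000)] + [(f"linux{i}", 50) for i in range(1, 6)]

eps = eps_per_member(connections, members)
print(eps)  # {'qradar-ec1': 6050, 'qradar-ec2': 100, 'qradar-ec3': 100}

# The firewall connection reset every 180 seconds, over 6 intervals.
totals = events_with_periodic_reset(6000, members, 180, 6)
print(totals)  # each member ends up with 2160000 events
```

In the first result, every member holds exactly two connections, so the load balancer considers the pool balanced, yet one member carries almost all of the EPS. In the second, the forced resets spread the firewall's events across the members over time, at the cost of short bursts on whichever member currently holds the connection.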

In summary, you need to be aware that load balancing TCP data can result in an uneven distribution to QRadar. Some hosts get more EPS than others because of some long-lived TCP connections from high-rate log sources; you need to work with your load balancer team to tackle this problem.

Preserving the Real (True) IP of the Source

In QRadar, every event has a Source IP field and a Destination IP field. These are parsed from the event's payload; for example, if an event says "user1 logged in from IP 1.1.1.1 to IP 2.2.2.2," QRadar parses those IPs into the Source IP and Destination IP fields. However, if there are no IPs to parse from the payload, and there is no properly formatted syslog header or there is a syslog header but it doesn't contain an IP address, QRadar uses the IP address of the device that sent this log on port 514 as the Source IP and/or Destination IP. For example, if an event comes from a Linux server with IP 2.2.2.2 and the event payload is "user1 logged in from IP 1.1.1.1," QRadar uses 1.1.1.1 as the Source IP. But because there isn't any Destination IP in the payload, and we assume that there isn't an IP in the syslog header, QRadar uses 2.2.2.2 as the Destination IP because that's where the log came in from.

When you use a load balancer, all the traffic that QRadar receives on the network level will come from the load balancer's IP address since it's a "man in the middle" that's relaying the data; i.e., if you do a tcpdump on QRadar, you'll see the load balancer's IP. As a result, if some log payloads don't have IPs for QRadar to parse, you will see a lot of logs in the QRadar UI with the Source IP and/or Destination IP of the load balancer because QRadar falls back to using the IP address of the device that sent the log. This could confuse your analysts when they search data in QRadar, or cause QRadar rules to fire incorrectly.

You might not always have this problem, and it depends on many factors. But if you want to alleviate this, the load balancer needs to be "transparent" to QRadar on the network level so that QRadar thinks it's receiving the data directly from the log sources. For UDP traffic, this is very easy to do. For TCP traffic, it's harder.

For UDP traffic: the easiest way to fix this is for the load balancer to spoof the actual IP address on the network level. Meaning, the load balancer will use the log source's IP address in the IP packet when it sends it to QRadar. If you were to do a tcpdump on QRadar, you'd see the IP address of the original log source, so the load balancer is 'transparent' to QRadar. This is easily possible with UDP because it's a connectionless protocol; QRadar doesn't have to talk back to the original sender, so it's easy to use the actual sender's IP on the network level.

Load balancers usually have an option to implement this. On F5 Big-IP LTM, it's called SNAT (source NAT, or source address translation) - you'd want to disable it, which instructs the F5 to not use its own IP and instead use the true source IP. The option on other load balancers could be called Preserve Source IP. Speak with your networking and load balancer teams to see if this can be achieved.

For TCP traffic: it's a bit more involved. TCP is a connection-oriented protocol, so the source and destination need to talk to each other directly. If you simply "spoof" the source IP like we did with UDP, the QRadar server will try to respond directly to the original log source to complete the TCP connection because that's where it thinks the connection came from, so the response bypasses the load balancer. But when the log source receives this response from QRadar, it will reject it because it never opened a connection to QRadar; it opened one to the load balancer. As a result, the connection never completes.

To preserve the true IP for TCP traffic, there is one solution I know of: the load balancer needs to be the default gateway for the QRadar servers, i.e., the load balancer will act as the router. Here is what happens: (1) the client talks to the load balancer, (2) the load balancer preserves the source IP in the packets to appear as if it came from the client, (3) QRadar will attempt to talk back to the log source to complete the TCP connection, but that traffic will pass through the load balancer because it's QRadar's default gateway (assuming the log source is outside the QRadar subnet), (4) the load balancer tracks this connection and uses its own IP address in the communication back to the client, and that completes the TCP connection. When implemented this way, the true source IP is preserved on the network level, and QRadar uses it in parsing if needed. You'd need to speak with your networking team and load balancer team to see if you can achieve this architecture.



In summary, using a load balancer can cause QRadar to display the load balancer IP in the Source IP and/or the Destination IP fields in the parsed events, but you can get around that. To preserve the true IP in QRadar for UDP data, you should be able to easily configure the load balancer to spoof the true source; speak with your load balancer and networking teams about enabling this option (turn off SNAT or turn on the Preserve Source IP feature). For TCP data, the only way I currently know of to preserve the true source is to use the load balancer itself as the default gateway/router for the QRadar servers. This requires discussions with your networking and load balancer teams, but it has been done before, and it successfully preserved the true source (using Citrix Netscaler load balancers). As always, test it if you plan to do it.

Summary

Data ingested into QRadar can be load balanced, but you need to be aware of these considerations:

  • For both TCP and UDP port 514 load balancer pools, set the health monitor (health check) to check on TCP port 514 on the members.
  • For syslog port 514 data, it is better to use a least-connections algorithm instead of round-robin in most cases.
  • It would be best if you disabled any session persistence (or affinity persistence) when load balancing regular syslog data if you want to evenly distribute data.
  • Familiarize yourself with the impact of load balancing on QRadar rules and what that means to you. You can read more about global/local correlation here and here.
  • You need to be aware that load balancing TCP data can result in an uneven distribution to QRadar. Some hosts get more EPS than others because of some long-lived TCP connections from high-rate log sources; you need to work with your load balancer team to tackle this problem.
  • Using a load balancer can cause QRadar to display the load balancer IP in the Source IP and/or the Destination IP fields in the parsed events, but you can get around that. To preserve the true IP in QRadar for UDP data, you should be able to easily configure the load balancer to spoof the true source; speak with your load balancer and networking teams about enabling this option (turn off SNAT or turn on the Preserve Source IP feature). For TCP data, the only way I currently know of to preserve the true source is to use the load balancer itself as the default gateway/router for the QRadar servers. This requires discussions with your networking and load balancer teams, but it has been done before, and it successfully preserved the true source (using Citrix Netscaler load balancers). As always, test it if you plan to do it.

Please note that it might not be possible to accomplish all the above steps depending on your situation, so please only use this article as general information for load balancing in QRadar.

Comments

Fri February 10, 2023 04:18 AM

How does QRadar behave when it receives logs from the same log source via two or more ECs? How is the Target Event Collector field set in this case?