IBM Security QRadar

  • 1.  Events behaviour when Event Collector is down

    Posted Thu July 02, 2020 02:31 AM

    There's an existing question about Event Collector (EC) behaviour when the network is down (there's a 5 GB queue if the EC cannot reach the EP).

    But what if several log sources report to an Event Collector and the Event Collector itself is down? Do I need HA for the EC to cover this scenario?

    Thanks a lot.



    ------------------------------
    Roberto Ivars
    ------------------------------


  • 2.  RE: Events behaviour when Event Collector is down

    Posted Thu July 02, 2020 04:10 AM
    Hi Roberto,

    HA for ECs is an expensive license, and it is not HA in the sense of protecting against event loss.

    Testing shows that failover takes between 60 and 180 seconds (on average) and ingest fully stops on both ECs. So you'll still get event loss. The "HA" is really a mechanism to cater for hardware failure, not failure of any of the software components.

    The time delay is down to how long the ha_manager component takes to work out that there is no traffic on the SSH tunnels.  The difference in time is down to whether it is a graceful fail-over (set offline), or a hard failure (power off).
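
    To put the 60 to 180 second window into numbers, here is a rough back-of-the-envelope sketch. The EPS figure is a hypothetical placeholder, not a measurement, and the calculation assumes the log sources neither buffer nor retry while ingest is stopped (as with plain UDP syslog).

```python
def events_lost(eps: float, outage_seconds: float) -> int:
    """Estimate events dropped while ingest is stopped.

    Assumes a constant event rate and senders that do not buffer or retry.
    """
    return int(eps * outage_seconds)


# Hypothetical sustained rate; substitute your own measured EPS.
ASSUMED_EPS = 5000

# 60 s and 180 s are the failover bounds quoted in this post.
for label, seconds in (("shortest failover", 60), ("longest failover", 180)):
    lost = events_lost(ASSUMED_EPS, seconds)
    print(f"{label} ({seconds} s): ~{lost:,} events lost")
```

    Sources that queue locally or use a reliable transport would lose fewer events than this estimate suggests.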

    Sadly, you will not find this described in any sales literature and there is no public statement on EC HA performance nor anything on the 101 or other sites.

    If you don't mind some event loss, then by all means go for it.

    It may be cheaper to front the EC with a log forwarder (e.g. Splunk). I know of several customers that use this approach because avoiding event loss is important to them. This covers a mix of SIEM products, from McAfee, IBM QRadar and LogRhythm, which is a polite way of saying they are all poor performers in this area.

    I would welcome views from product managers on whether this operating mode is ever likely to change.

    Best wishes,

    ------------------------------
    Darren H.
    ------------------------------



  • 3.  RE: Events behaviour when Event Collector is down

    Posted Thu July 02, 2020 07:46 AM

    Thanks for the answer, this resolved my doubts

    :)



    ------------------------------
    Roberto Ivars
    ------------------------------



  • 4.  RE: Events behaviour when Event Collector is down

    Posted Thu July 02, 2020 10:05 AM
    HA is a good option for a DR (Disaster Recovery) procedure: you can fail over or switch over from one data center and bring the secondary up in the other data center. However, if you don't want to lose data, you can consider Event Collectors, or Data Gateways for cloud-based deployments. In both scenarios they keep the data until your Console is up and running again, and then the events synchronize and catch up. I have both scenarios in production, and we don't want to lose data. We only lose data when a collector or Data Gateway is itself off or has a network issue.

    Hope this helps.

    ------------------------------
    Joaquin Martinez Hernandez
    ------------------------------



  • 5.  RE: Events behaviour when Event Collector is down

    Posted Thu July 02, 2020 12:17 PM
    If you're looking at ways to consider DR, the disconnected log collector (DLC) is now a preferred technical option. The Canadian dev teams have confirmed this.

    Note that the SIEM sales community prefers selling HA event collection because it costs more but, as noted, it does not really provide HA or protect from data loss.

    DR is a whole different use case and depends what you are trying to do.

    QRadar does not support any kind of DR out-of-the-box, and even the DR app (still not GA) has significant constraints to enable operation.

    Do design/engineer knowing that the current product has limits.

    Kind regards,

    ------------------------------
    Darren H.
    ------------------------------



  • 6.  RE: Events behaviour when Event Collector is down

    Posted Fri July 03, 2020 01:42 PM
    Yeah, HA is good for a complete failure of the hardware, in that within 10 minutes or so you'll be up and running again without needing any manual intervention or house-built automation, but because it's not instantaneous, event loss still occurs.

    We'll have more robust true HA capabilities in the next year or so as we roll out our next-gen platform but it's a big effort.

    For ECs specifically, if you're dealing solely with event data pushed to QRadar (e.g. syslog), a load balancer in front of 2+ ECs is a strategy some customers use. If both systems are up, the events get round-robin'd across them; if one goes down and the LB is capable of detecting that, it will send them all to the remaining EC.
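
    The load balancer behaviour described above can be sketched as follows. This is a toy model of the selection logic only, not any particular LB product; the class name, host names, and the `mark_down`/`mark_up` hooks (standing in for whatever health check the LB uses) are all hypothetical.

```python
from typing import List, Sequence


class RoundRobinBalancer:
    """Toy round-robin selector that skips collectors marked as down."""

    def __init__(self, collectors: Sequence[str]) -> None:
        if not collectors:
            raise ValueError("at least one collector is required")
        self._collectors: List[str] = list(collectors)
        self._healthy = set(self._collectors)
        self._next = 0

    def mark_down(self, collector: str) -> None:
        # In a real LB this would be driven by a health probe.
        self._healthy.discard(collector)

    def mark_up(self, collector: str) -> None:
        if collector in self._collectors:
            self._healthy.add(collector)

    def pick(self) -> str:
        # Try each collector at most once, starting from the rotation point.
        for _ in range(len(self._collectors)):
            candidate = self._collectors[self._next]
            self._next = (self._next + 1) % len(self._collectors)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy collector available")


# Hypothetical EC host names.
lb = RoundRobinBalancer(["ec-a.example", "ec-b.example"])
both_up = [lb.pick() for _ in range(4)]      # alternates between the two ECs
lb.mark_down("ec-a.example")                 # LB detects that ec-a has failed
one_down = [lb.pick() for _ in range(3)]     # everything goes to ec-b
print(both_up)
print(one_down)
```

    Events sent between the actual failure and the moment the LB detects it can still be lost, so the health check interval matters.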

    One other note regarding Roberto's initial post: the 5 GB limit is for our license spillover. If events arrive at a rate that exceeds the allocated license limit, we can store up to 5 GB of extra data before we start dropping. This is a good-faith allowance to accommodate brief event spikes that exceed the license. If the downstream EP is unavailable, a different spillover queue is used, one that does not have a fixed size: it will fill up to 90-95% (I forget the exact number) of the available disk space on the EC. So if the downstream EP dies, the EC will buffer as long as it can to avoid data loss, and if the EP is part of an HA cluster, we should be able to hold on until the secondary comes up.
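
    As a rough way to size that EP-unavailable buffer, the sketch below estimates how long an EC could keep queuing. All input values are hypothetical, the 90% ceiling is just the lower end of the range mentioned above, and it assumes the on-disk size of an event equals its raw size (no compression or overhead).

```python
def buffer_seconds(available_bytes: int, fill_percent: int,
                   eps: int, avg_event_bytes: int) -> float:
    """Seconds of buffering before the queue reaches its disk ceiling.

    Assumes a constant event rate and that each event consumes
    avg_event_bytes on disk.
    """
    bytes_per_second = eps * avg_event_bytes
    if bytes_per_second <= 0:
        raise ValueError("eps and avg_event_bytes must be positive")
    usable_bytes = available_bytes * fill_percent // 100
    return usable_bytes / bytes_per_second


# Hypothetical EC: 800 GB of available disk, 5000 EPS, 500-byte events.
seconds = buffer_seconds(
    available_bytes=800_000_000_000,
    fill_percent=90,
    eps=5000,
    avg_event_bytes=500,
)
print(f"~{seconds / 3600:.0f} hours of buffering")
```

    With these made-up inputs the EC could buffer for days rather than minutes, which is why an EP failover window is normally survivable on the EC side.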

    Cheers
    Colin

    ------------------------------
    COLIN HAY
    IBM Security
    ------------------------------



  • 7.  RE: Events behaviour when Event Collector is down

    Posted Mon July 06, 2020 09:29 AM
    Hi Roberto,

    I guess the 3rd party syslog forwarders work inline just like the ECs, which means they represent the same single point of failure as the ECs.

    In my opinion you should consider including a network-based failover or load balancing mechanism in your deployment.
    If the EC host is down, the failover/LB will redirect the syslog traffic to a different IP address, which can be
    a) another EC or
    b) the Console itself (this is not recommended AFAIK, but it works).
    Bear in mind that either way you have to have adequate EPS license allocated _in advance_ (so that the redundant EC does not spill over and lose events).

    There are sources which are able to send syslog to multiple destinations, but you should check the exact capabilities of each of them.
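
    For sources you control, sending to multiple destinations can look like the sketch below, which uses Python's standard `logging.handlers.SysLogHandler` over UDP. The function name is hypothetical, and two local UDP sockets stand in for two collectors so the example runs on its own. Note that this duplicates every event, so both destinations count it against the EPS license.

```python
import logging
import logging.handlers
import socket


def make_fanout_logger(name, destinations):
    """Return a logger that sends every record to each (host, port) via UDP syslog."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    for host, port in destinations:
        logger.addHandler(logging.handlers.SysLogHandler(address=(host, port)))
    return logger


# Two local UDP sockets stand in for two Event Collectors.
receivers = []
for _ in range(2):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
    sock.settimeout(2.0)
    receivers.append(sock)

logger = make_fanout_logger("fanout-demo", [s.getsockname() for s in receivers])
logger.info("test event")

# Each stand-in collector should have received its own copy.
received = [s.recvfrom(4096)[0].decode() for s in receivers]

for handler in list(logger.handlers):
    handler.close()
    logger.removeHandler(handler)
for s in receivers:
    s.close()
```

    Appliances and agents expose this differently (if at all), so the point about checking each source's capabilities still applies.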

    Regards,
    István

    ------------------------------
    Istvan Nagy Kasza
    ------------------------------