SevOne

SevOne NMS - Reporting on Device Availability using the Device's SysUpTime Value

By Tim Greenside posted Fri December 22, 2023 02:37 PM

  
I was working with a customer recently who asked whether SevOne can report on device availability by using the sysUpTime value reported by the device. The good news is "Yes", SevOne can do this! This requirement is common among service providers who must contractually report on device availability and may pay penalties if availability falls below 100%.
The challenge with reporting on device availability comes when there is a network disruption between the monitored device and the monitoring system. Network disruptions come in different shapes and sizes: a firewall policy might block traffic, a router might actually be down and stop passing traffic, or a link between a data center and a branch office could be down. If you rely on ICMP "ping" to determine availability, you may not be able to reach a device during a network disruption. "Reachability" has been impacted, but that doesn't mean the device is actually "down" or "unavailable". A device may be "unreachable" and still be up from a system perspective. There is a metric that measures this: the MIB-II standard defines that a device should maintain a counter of the time elapsed (in hundredths of a second) since the SNMP agent on the device was last reset, which happens when the device reboots. This metric is known as sysUpTime.
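To make the unit concrete, here is a minimal sketch that converts a raw sysUpTime reading into a duration and an estimated restart time. The function names and the sample value are hypothetical, chosen only for illustration; they are not part of SevOne.

```python
from datetime import datetime, timedelta, timezone

# sysUpTime is reported in hundredths of a second.
TICKS_PER_SECOND = 100


def uptime_from_ticks(ticks: int) -> timedelta:
    """Convert a raw sysUpTime value into a duration."""
    return timedelta(seconds=ticks / TICKS_PER_SECOND)


def estimated_restart_time(ticks: int, polled_at: datetime) -> datetime:
    """Estimate when the SNMP agent last reset, given when the value was read."""
    return polled_at - uptime_from_ticks(ticks)


if __name__ == "__main__":
    sample_ticks = 8_640_000  # hypothetical reading: exactly one day of uptime
    polled_at = datetime(2023, 12, 22, 14, 37, tzinfo=timezone.utc)
    print(uptime_from_ticks(sample_ticks))
    print(estimated_restart_time(sample_ticks, polled_at))
```

Because the value is a 32-bit quantity, it rolls over after roughly 497 days of continuous uptime, which is worth keeping in mind when interpreting very long-running devices.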
To use sysUpTime to determine whether a device has actually gone down, SevOne periodically checks whether the sysUpTime counter has been reset since the last time it checked. If the device is unreachable due to a network disruption, then once the network becomes available again SevOne compares the current sysUpTime value with the last value it recorded. If the previous value is greater than the current one, the counter was reset, which indicates that there was a device outage and device availability was impacted. If, on the other hand, the sysUpTime counter has continued to increase from its last known value, the device is deemed "available", and SevOne backfills the sysUpTime % available metric with 100% for the duration of the network disruption. This allows the service provider or enterprise network operator to report accurately on device availability even when reachability was impacted by an unrelated network event.
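The comparison described above can be sketched roughly as follows. This is not SevOne's implementation, only an illustration of the idea under simple assumptions: polls arrive in time order, an unreachable poll is represented as `None`, and every name here (`counter_was_reset`, `classify_polls`, the status labels) is hypothetical. How SevOne scores the outage itself is not modeled; the sketch only labels the affected polls.

```python
from typing import List, Optional

AVAILABLE = "available"  # would be reported (or backfilled) as 100%
UNKNOWN = "unknown"      # unreachable, and no later reading yet to decide
OUTAGE = "outage"        # counter reset seen: the device restarted


def counter_was_reset(previous_ticks: int, current_ticks: int) -> bool:
    """True when the current sysUpTime is lower than the last known value."""
    return current_ticks < previous_ticks


def classify_polls(samples: List[Optional[int]]) -> List[str]:
    """Label each poll; None in `samples` means the device was unreachable."""
    statuses = [UNKNOWN] * len(samples)
    last_known: Optional[int] = None
    pending: List[int] = []  # unreachable polls still waiting for a verdict

    for i, ticks in enumerate(samples):
        if ticks is None:
            pending.append(i)
            continue
        if last_known is None:
            # First successful reading: nothing to compare against, so any
            # earlier unreachable polls stay UNKNOWN.
            statuses[i] = AVAILABLE
        else:
            reset = counter_was_reset(last_known, ticks)
            verdict = OUTAGE if reset else AVAILABLE
            # Apply the verdict to the gap and to the poll that closed it.
            for j in pending + [i]:
                statuses[j] = verdict
        pending.clear()
        last_known = ticks
    return statuses


if __name__ == "__main__":
    # Hypothetical readings: a gap the device survived, then a gap with a restart.
    readings = [1000, 7000, None, None, 25000, None, 300, 6300]
    for reading, status in zip(readings, classify_polls(readings)):
        print(reading, status)
```

In this sketch the first gap is labeled available because the counter kept increasing across it, while the second gap is labeled an outage because the counter came back lower. Note that a plain "lower than before" test would also treat a counter rollover as a reset.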
You can see in the chart below that when my device became unreachable (ICMP "ping" from the SevOne NMS to the device was disrupted), SevOne detected this network reachability event (blue line plummeting). During this period SevOne could not reach the device to ask for its sysUpTime value, so it could not determine whether the device was actually down or just unreachable. When connectivity was restored (blue line recovers), SevOne was once again able to read the sysUpTime counter and compare it with the last known good value. Because the counter had kept increasing, SevOne backfilled the "system uptime" indicator to 100% available, so that it accurately reports the availability of the device.
This feature isn't enabled out of the box. You need to perform the following steps to enable it:
1.  In the Cluster Manager's "General Settings", check the "Measure System Uptime" option box.
2.  For each device that should use this availability method, enable the "Deferred Data" poller option. You can do this in Device Manager for each device, or you can enable Deferred Data globally via IBM's Rapid Network Automation or with a script that uses SevOne's API.
3.  Rediscover the device(s). Discovery will now create a new Deferred Data "system" object type that provides the "System Uptime" % availability metric for your device.