How IBM SevOne NPM Helps Customers Avoid Unplanned Downtime With SNMP Trap Collection

By Ryan Wilson posted Tue November 07, 2023 10:50 AM

  

What is an SNMP trap? 

An SNMP trap is a message originated by an agent or device, sent to a monitoring solution like IBM SevOne NPM, notifying it of an extraordinary event. SNMP traps are useful to SevOne users because they can indicate that a device in their network has entered or is approaching a state that could result in unplanned downtime. 

Image: SNMP TRAFFIC1.png (Wikimedia Commons)

The Cost of Unplanned Downtime 

A 2019 study performed by Forrester and commissioned by IBM reported on the significant impacts of unplanned outages: 

 

Examples of SNMP traps that indicate the potential for future outages include fan, PSU, or disk failures. One of my core goals as a Product Manager for IBM SevOne NPM is to enable our customers to proactively respond to signals from devices, like SNMP traps, before they escalate into unplanned outages. 

Getting More Value out of Trap Based Events 

In a previous post, my colleague @Matthew Sweet discussed how pairing SevOne alerts with Webhook Definitions can reduce the time it takes to notify network operators when problems occur in the network. He also showed how SevOne admins can include report URLs that link to relevant insights in webhook messages sent to Slack. Did you know that as of SevOne NPM 6.5, these Webhook Definitions can also be configured for alerts from trap events?
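To make the idea of a webhook message with a report URL concrete, here is a minimal sketch of the JSON body such a message might carry. It assumes a Slack incoming webhook, which accepts a JSON object with a `text` field and renders `<url|label>` as a link. The function name, device name, and report URL are hypothetical, and this is not SevOne's own Webhook Definition template format.

```python
import json


def build_slack_payload(alert_name: str, device: str, report_url: str) -> str:
    """Return a JSON body for a Slack incoming webhook (assumed message layout)."""
    # Slack renders <url|label> inside message text as a clickable link.
    text = f"{alert_name} on {device}. <{report_url}|Open report>"
    return json.dumps({"text": text})


if __name__ == "__main__":
    # Hypothetical device name and report URL, for illustration only.
    print(build_slack_payload(
        "Fan failure",
        "filer-01",
        "https://sevone.example.com/reports/123",
    ))
```

The sketch only builds the payload; posting it to a real webhook URL is left out so the example runs without network access.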

Webhooks from SevOne alerts can also be used to automate ticketing in your incident management platform of choice, like ServiceNow. Additionally, with the aid of upstream event correlation platforms like Cloud Pak for AIOps, SNMP traps that trigger alerts in SevOne can be used to correlate failures on network devices to the business-critical applications running on them. 

Consider the case below where I've configured a trap event in SevOne NPM to notify the NOC's Slack channel when a fan fails on a device.

SNMP OIDs representing fan failure:  

1.3.6.1.4.1.789.0.33 - fanFailed

1.3.6.1.4.1.789.0.36 - fanRepaired 

These OIDs are resolved automatically into NETWORK-APPLIANCE-MIB::fanFailed and NETWORK-APPLIANCE-MIB::fanRepaired once configured in the event.
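As a rough illustration of what this resolution amounts to, the sketch below maps the two numeric OIDs above to their MIB names with a plain lookup table. SevOne NPM does this for you from the loaded MIB; the table and the `resolve_trap_oid` helper here are hypothetical and cover only these two traps.

```python
from typing import Optional

# Numeric trap OIDs from this post mapped to their symbolic MIB names.
TRAP_NAMES = {
    "1.3.6.1.4.1.789.0.33": "NETWORK-APPLIANCE-MIB::fanFailed",
    "1.3.6.1.4.1.789.0.36": "NETWORK-APPLIANCE-MIB::fanRepaired",
}


def resolve_trap_oid(oid: str) -> Optional[str]:
    """Return the symbolic name for a trap OID, or None if it is unknown."""
    # Some tools print OIDs with a leading dot; normalize before the lookup.
    return TRAP_NAMES.get(oid.lstrip("."))


if __name__ == "__main__":
    print(resolve_trap_oid(".1.3.6.1.4.1.789.0.33"))
```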

 

SNMP traps sent to SevOne NPM can now be grouped into triggering and clearing conditions for the same alert. Once the trap arrives signaling that a maintenance crew has repaired the fan, the previously opened alert in SevOne is closed and the NOC is notified via a webhook to Slack. This reduces the time needed to validate that repairs were completed and eliminates the need to close trap-based alerts by hand.
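The pairing of a triggering trap with a clearing trap can be sketched as a small state tracker. This is an illustration of the concept only, not SevOne NPM's internal logic: the `FanAlertTracker` class, its return values, and the device name are hypothetical, and a real deployment would also send the webhook notification on each state change.

```python
from dataclasses import dataclass, field

# Trap names from this post: one opens the alert, the other clears it.
TRIGGER = "NETWORK-APPLIANCE-MIB::fanFailed"
CLEAR = "NETWORK-APPLIANCE-MIB::fanRepaired"


@dataclass
class FanAlertTracker:
    """Track which devices currently have an open fan alert."""
    open_alerts: set = field(default_factory=set)

    def handle_trap(self, device: str, trap_name: str) -> str:
        """Apply one trap and report what happened to the device's alert."""
        if trap_name == TRIGGER:
            if device in self.open_alerts:
                return "already_open"  # repeated failure trap, alert stays open
            self.open_alerts.add(device)
            return "opened"
        if trap_name == CLEAR and device in self.open_alerts:
            self.open_alerts.discard(device)
            return "closed"  # no manual close needed
        return "ignored"  # unrelated trap, or a clear with nothing open


if __name__ == "__main__":
    tracker = FanAlertTracker()
    for trap in (TRIGGER, CLEAR):
        print(trap, "->", tracker.handle_trap("filer-01", trap))
```

Keying the open alerts by device keeps a repair trap from one device from closing an alert that belongs to another.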

I’m excited that SevOne NPM can support this automation use case because it makes managing the network easier than before. In the words of an IBM customer with whom I recently shared this feature: “This is exactly what we need!”

We’d Like Your Feedback 

Do you have a story about how SevOne NPM helped you prevent or reduce unplanned downtime through its SNMP trap collection or webhook integrations? Or do you have a use case that isn’t quite achievable with SevOne and stands in the way of your being a hero for your application users? Continuous improvement and the success of our customers are our mantra. We’d love to hear from you!
