SevOne


Interface CRC error monitoring using Sevone

  • 1.  Interface CRC error monitoring using Sevone

    Posted Thu September 14, 2023 03:54 PM

    Can someone shed some light on this subject? We have a requirement to monitor CRC errors on interfaces. We tried building this with the FCS error counter, but the averaged value shown in SevOne is much lower than the value in our existing monitoring tool.

    We are looking for help setting up effective CRC monitoring.



    ------------------------------
    AKHIL Raj
    ------------------------------


  • 2.  RE: Interface CRC error monitoring using Sevone

    IBM TechXchange Speaker
    Posted Fri September 15, 2023 04:35 AM
    Edited by Raul Gonzalez Fri September 15, 2023 04:35 AM

    Hi Akhil

    short answer: yes, it is possible.

    long answer: one of the problems we have with vendors is that they don't normally expose CRC errors through standardised protocols such as SNMP. Sometimes they try (like Cisco did with the OID .1.3.6.1.4.1.9.9.45.1.1.1.20) but most of them fail miserably :). So what options do we have? As you may know, SevOne can collect any time series data, as long as that data is available in some kind of format. I understand that you are using some kind of CLI command to get the CRC errors. With SevOne you can do the same: execute that command (either using our no-code platform or, if you prefer, a custom script) and then ingest that data back into SevOne.

    Similar principles can be found here: https://community.ibm.com/community/user/aiops/viewdocument/monitor-ibm-cloud-metrics-using-the?CommunityKey=fe9d91df-352c-4846-9060-189fd98d00ca&tab=librarydocuments where we are monitoring data coming from APIs.



    ------------------------------
    Raul Gonzalez
    Software Networking Solutions Architect
    IBM
    Brighton, UK
    ------------------------------



  • 3.  RE: Interface CRC error monitoring using Sevone

    Posted Fri September 15, 2023 05:13 AM

    Hi Raul Gonzalez, thank you for the support. Cisco gave us the OID for CRC errors: 1.3.6.1.2.1.10.7.2.1.3. I can see it is already certified in SevOne and is populating graphs and data. But when we create a threshold such as "trigger an alarm if CRC is above 500 for more than 10 minutes out of 15", no alert is generated (the alert did trigger for the same device and the same criteria in a different NMS tool). While troubleshooting, the values shown are a bit confusing: the average shows as 0.584, but the total is more than 2k for a 1-hour window.

    So I wanted to check whether anyone in the forum has seen this issue or is monitoring CRC effectively using SNMP data.



    ------------------------------
    AKHIL Raj
    ------------------------------



  • 4.  RE: Interface CRC error monitoring using Sevone

    IBM TechXchange Speaker
    Posted Fri September 15, 2023 05:22 AM

    Hi Akhil

    the fact that there is an OID is good news. I guess the problem is with what we do with the data once it is collected. Based on the MIB file (http://oidref.com/1.3.6.1.2.1.10.7.2.1.3), this OID is a COUNTER, meaning that we should store only the difference between the previous poll and the new poll (also called the DELTA value), because the value of this OID never goes down while the device is running.

    To make it clearer: when the device starts, the value of the OID is 0. When there is a CRC error, the value goes to 1 and stays at 1 until there is another CRC error or the device is rebooted. This becomes a problem if the value is not stored properly: after 6 months of uptime the counter might be at 499, and a single additional CRC error will bring the metric to 500 and trigger the alert on your other NMS.
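
    The delta idea above can be sketched in a few lines of Python. This is only an illustration, not how SevOne or any other NMS is implemented: the function names are made up, and it assumes a 32-bit counter where any decrease between polls is a wrap (a reboot would need separate handling).

```python
# Sketch: turn raw, ever-increasing counter readings into per-poll deltas.
# Assumption: the counter is 32 bits wide and a lower reading means it wrapped.
COUNTER32_MODULUS = 2**32


def counter_delta(previous: int, current: int) -> int:
    """Return how many errors occurred between two consecutive polls."""
    if current >= previous:
        return current - previous
    # The reading went down, so assume the counter wrapped past its maximum.
    return current + COUNTER32_MODULUS - previous


def deltas(readings: list[int]) -> list[int]:
    """Pair each reading with the next one and compute the difference."""
    return [counter_delta(a, b) for a, b in zip(readings, readings[1:])]


# Hypothetical raw readings: months of slow accumulation, then a burst.
readings = [499, 500, 500, 1100]
print(deltas(readings))  # [1, 0, 600]
```

    A rule on the raw value ("above 500") would stay true from the last reading onward, whereas a rule on the deltas reacts only to the poll where 600 new errors actually arrived.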

    To troubleshoot this further, we would need a graph from both tools (SevOne and the other NMS) to compare how each one treats the data, because the issue might not be in SevOne but in how the other tool has been configured.



    ------------------------------
    Raul Gonzalez
    Software Networking Solutions Architect
    IBM
    Brighton, UK
    ------------------------------



  • 5.  RE: Interface CRC error monitoring using Sevone

    Posted Fri September 15, 2023 05:56 AM
    Edited by AKHIL Raj Fri September 15, 2023 05:57 AM

    Report from the other NMS. It is in the UTC time zone, and the y-axis is in k (thousands).
    [Image: other NMS CRC report]
    Same time, same interface; the time zone is US time.



    ------------------------------
    AKHIL Raj
    ------------------------------



  • 6.  RE: Interface CRC error monitoring using Sevone

    IBM TechXchange Speaker
    Posted Fri September 15, 2023 06:07 AM

    Thank you Akhil

    I think I know where the issue is. When working with COUNTERs, SevOne shows you units per second, not the raw count. That's why the top screenshot shows 500-600 errors, whereas in SevOne you will see about 1 error/second.

    If you want to see the absolute number of errors, you have to change the aggregation from 'average' (or nothing) to 'total'. That should give you very similar values. This also affects the way we create alerts: you should select aggregation 'total' rather than 'average'.
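
    As a rough check, the relationship between the two aggregations can be worked through in Python. The per-poll deltas below are invented, chosen only so that they reproduce the 0.584 average and the 2k-plus hourly total reported earlier in the thread, and the 5-minute polling interval is an assumption.

```python
# Sketch: the same counter data viewed as an average rate and as a total.
POLL_INTERVAL_S = 300  # assumed 5-minute polling


def per_second_rates(deltas, interval_s=POLL_INTERVAL_S):
    """Convert per-poll error counts into errors per second."""
    return [d / interval_s for d in deltas]


def average(values):
    return sum(values) / len(values)


def total_from_rates(rates, interval_s=POLL_INTERVAL_S):
    """Recover the absolute error count from stored per-second rates."""
    return sum(r * interval_s for r in rates)


# Twelve invented 5-minute deltas covering one hour.
hour_of_deltas = [175, 180, 170, 176, 174, 175, 177, 173, 176, 175, 176, 175]
rates = per_second_rates(hour_of_deltas)

print(f"average rate: {average(rates):.3f} errors/sec")  # 0.584
print(f"total: {round(total_from_rates(rates))} errors")  # 2102
```

    Both figures describe the same hour of data, which is why a threshold of 500 never fires against the average but would against the total.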

    Example with traffic: the top graph uses bytes/sec, the second graph uses total bytes.

    In alerting policies, select Total rather than Average.



    ------------------------------
    Raul Gonzalez
    Software Networking Solutions Architect
    IBM
    Brighton, UK
    ------------------------------



  • 7.  RE: Interface CRC error monitoring using Sevone

    Posted Fri September 15, 2023 06:24 AM
    Edited by AKHIL Raj Fri September 15, 2023 06:24 AM

    Thank you, Raul Gonzalez, for your quick assistance.

    I understood the point. I have configured the threshold condition using the time-over-threshold aggregation (if CRC is above 500 for more than 30 min out of 1 hr). Does that aggregation also behave like average?



    ------------------------------
    AKHIL Raj
    ------------------------------



  • 8.  RE: Interface CRC error monitoring using Sevone

    IBM TechXchange Speaker
    Posted Fri September 15, 2023 06:30 AM

    Hi Akhil

    unfortunately that will not work as configured, because the default behaviour for COUNTER OIDs is to use the number per second. It would only work with a smaller threshold expressed as a rate: for example, 500 CRC errors / 300 seconds (if you poll every 5 minutes) = 1.66 as the threshold.
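
    The conversion is a single division, giving roughly 1.67 errors per second (1.66 if truncated rather than rounded). A small Python sketch, with a hypothetical helper name, makes it easy to redo for other polling intervals:

```python
def count_to_rate_threshold(error_count: float, poll_interval_s: float) -> float:
    """Express an 'errors per poll' threshold as 'errors per second'."""
    return error_count / poll_interval_s


# 500 errors per 5-minute poll, as in the example above.
print(f"{count_to_rate_threshold(500, 300):.2f}")  # 1.67

# The same 500 errors with 1-minute polling needs a much higher rate.
print(f"{count_to_rate_threshold(500, 60):.2f}")  # 8.33
```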

    Normally for COUNTER metrics where you are interested not in the rate but in the absolute number, we would recommend using TOTAL as aggregation.



    ------------------------------
    Raul Gonzalez
    Software Networking Solutions Architect
    IBM
    Brighton, UK
    ------------------------------



  • 9.  RE: Interface CRC error monitoring using Sevone

    Posted Fri September 15, 2023 06:49 AM

    Thank you @Raul Gonzalez for this solution. We are trying to set up the SevOne thresholds to match the existing configuration (trigger an alert when the condition is met for 30 min out of 1 hr). I understand that we have to use the Total aggregation instead of time over threshold, which means we can't define a condition with a sliding window. I will configure this in the policy and test it.

    Also, is it feasible in SevOne to combine the Total aggregation with a sliding window (trigger an alert when the condition is met for xx minutes out of yy minutes)? If it is doable, I will open a support request for this.
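
    For clarity, the rule being asked for can be written out as a short Python sketch. It illustrates the logic only and says nothing about what SevOne supports; the function name is made up, and it assumes the condition is "per-poll total above the limit" with 5-minute polling, so 30 min out of 1 hr becomes 6 polls out of 12.

```python
from collections import deque


def time_over_threshold(totals, limit, window_polls, min_breaches):
    """For each poll, report whether at least `min_breaches` of the last
    `window_polls` per-poll totals exceeded `limit`."""
    window = deque(maxlen=window_polls)  # oldest poll drops out automatically
    results = []
    for total in totals:
        window.append(total > limit)
        results.append(sum(window) >= min_breaches)
    return results


# Invented data: six bad polls followed by seven clean ones.
per_poll_totals = [600] * 6 + [0] * 7
alerts = time_over_threshold(per_poll_totals, limit=500,
                             window_polls=12, min_breaches=6)
print(alerts.index(True))  # 5: the sixth bad poll raises the alert
print(alerts[-1])          # False: the first bad poll has left the window
```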



    ------------------------------
    AKHIL Raj
    ------------------------------



  • 10.  RE: Interface CRC error monitoring using Sevone

    IBM TechXchange Speaker
    Posted Fri September 15, 2023 06:57 AM

    Hi Akhil

    if you want, we can have a short call together to review this requirement. I guess that with 15-20 min we should have enough time to create the alert that you need. Please send me an email at raul.gonzalez@ibm.com to arrange the session.



    ------------------------------
    Raul Gonzalez
    Software Networking Solutions Architect
    IBM
    Brighton, UK
    ------------------------------