Primary Storage


FlashSystem 5000 Node RAM ECC errors are not reported in the event log

  • 1.  FlashSystem 5000 Node RAM ECC errors are not reported in the event log

    Posted Thu December 22, 2022 05:32 AM
    I don't understand why such important kernel errors from the message log are not passed to the event log!
    The customer only notices the problem when the node reboots.

    kernel: EDAC MC0: 3 CE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x516a89 offset:0xd40 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
    edac_monitor[2525]: Wrote 0x516a89 to /run/edac_monitor/mc/mc0/dimm0/last_ce_page

    kernel: EDAC MC0: 3 CE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x516a8b offset:0xf40 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)


    Something like this should at least be counted and reported once a threshold value has been reached.
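To illustrate what "counted" could mean here, the following is a minimal Python sketch that pulls the memory controller, error count, DIMM label, and page out of a line like the ones quoted above. The regular expression is modeled only on those two sample lines, reading the number before "CE" as the error count is an assumption, and `parse_ce_line` is a hypothetical helper, not part of any IBM tool.

```python
import re

# Pattern modeled only on the sample lines quoted in this post (assumption).
CE_LINE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) CE .*? on (?P<label>\S+) "
    r"\(.*?page:(?P<page>0x[0-9a-fA-F]+)"
)


def parse_ce_line(line):
    """Return a dict describing a corrected-error (CE) line, or None if it does not match."""
    m = CE_LINE.search(line)
    if m is None:
        return None
    return {
        "mc": int(m.group("mc")),         # memory controller index
        "count": int(m.group("count")),   # number before "CE", assumed to be the error count
        "label": m.group("label"),        # DIMM label as printed by the kernel
        "page": int(m.group("page"), 16), # page number, parsed from hex
    }
```

Lines that are not CE reports, such as the `edac_monitor` line, simply return `None`.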

    ------------------------------
    Sebastian Besler vvbasti
    ------------------------------


  • 2.  RE: FlashSystem 5000 Node RAM ECC errors are not reported in the event log

    Posted Tue December 27, 2022 09:36 AM
    The short answer is that we do log an event when the threshold is reached. However, the threshold is now 100,000 corrected errors in 24 hours, so you are unlikely to hit it.
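For readers who want to picture such a threshold, here is a rough Python sketch of a count-in-a-time-window check. How the product actually counts errors is not described in this thread, so the sliding-window approach, the `CeThreshold` class, and its parameter names are all assumptions for illustration. Only the figures 100,000 and 24 hours come from the reply above.

```python
from collections import deque


class CeThreshold:
    """Hypothetical sliding-window counter for corrected memory errors."""

    def __init__(self, limit=100_000, window_s=24 * 3600):
        self.limit = limit        # errors allowed before an event is raised
        self.window_s = window_s  # window length in seconds
        self.events = deque()     # (timestamp, count) pairs, oldest first
        self.total = 0            # sum of counts currently inside the window

    def record(self, timestamp, count=1):
        """Add `count` errors at `timestamp`; return True if the limit is reached."""
        self.events.append((timestamp, count))
        self.total += count
        # Drop entries that have fallen out of the window.
        while self.events and self.events[0][0] <= timestamp - self.window_s:
            _, old_count = self.events.popleft()
            self.total -= old_count
        return self.total >= self.limit
```

With the default limit, a node logging a few corrected errors per minute stays far below the threshold, which matches the point that it is unlikely to be hit.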

    ------------------------------
    Tayfun Arli
    ------------------------------