I don't understand why kernel errors this important are only written to the message log and not forwarded to the event log!
The customer only notices the problem when the node reboots.
kernel: EDAC MC0: 3 CE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x516a89 offset:0xd40 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
edac_monitor[2525]: Wrote 0x516a89 to /run/edac_monitor/mc/mc0/dimm0/last_ce_page
kernel: EDAC MC0: 3 CE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x516a8b offset:0xf40 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
Errors like these should at least be counted and reported once a threshold value has been reached.
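To illustrate what I mean by counting against a threshold, here is a rough Python sketch. It is not how the product or edac_monitor works; the function names and the threshold of 5 are made up for the example, and it assumes the number in front of "CE" is the corrected-error count for that log entry.

```python
import re
from collections import Counter

# Matches kernel EDAC corrected-error (CE) lines like the ones quoted above.
# Captures the memory controller number, the error count, and the DIMM label.
CE_PATTERN = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) CE .*? on (?P<label>\S+)"
)

def count_corrected_errors(lines):
    """Sum corrected-error counts per (memory controller, DIMM label)."""
    totals = Counter()
    for line in lines:
        match = CE_PATTERN.search(line)
        if match:
            key = (int(match.group("mc")), match.group("label"))
            totals[key] += int(match.group("count"))
    return totals

def dimms_over_threshold(totals, threshold):
    """Return the DIMMs whose accumulated count has reached the threshold."""
    return sorted(key for key, count in totals.items() if count >= threshold)

if __name__ == "__main__":
    import sys

    HYPOTHETICAL_THRESHOLD = 5  # assumption for illustration only
    totals = count_corrected_errors(sys.stdin)
    for mc, label in dimms_over_threshold(totals, HYPOTHETICAL_THRESHOLD):
        print(f"MC{mc} {label}: {totals[(mc, label)]} corrected errors")
```

Fed the message log on standard input, this prints one line per DIMM that has reached the threshold, which is the kind of summary I would expect to see in the event log.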
------------------------------
Sebastian Besler vvbasti
------------------------------