Original Message:
Sent: Wed April 03, 2024 01:17 PM
From: Andrew M
Subject: DS3524 not responsive
loadDebug
value = 1
excLogShow
---- Log Entry #43 MAR-30-2024 01:52:41 PM ----
ERROR: Type-I Port 0 ECC correctable error threshold exceeded reg 0xf1a val 0x18
---- Log Entry #44 MAR-30-2024 10:30:17 PM ----
ERROR: Port 0 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 0 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 4 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 4 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 5 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 5 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 6 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 6 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 2/6 Rx Err Count 24 exceeds threshold 16
---- Log Entry #45 MAR-30-2024 10:30:17 PM ----
ERROR: Type-I Port 0 ECC correctable error threshold exceeded reg 0xf1a val 0x18
---- Log Entry #46 APR-01-2024 11:16:21 PM ----
ERROR: Port 0 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 0 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 4 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 4 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 5 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 5 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 6 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 6 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 2/6 Rx Err Count 24 exceeds threshold 16
---- Log Entry #47 APR-01-2024 11:16:21 PM ----
ERROR: Type-I Port 0 ECC correctable error threshold exceeded reg 0xf1a val 0x18
---- Log Entry #48 APR-02-2024 12:18:42 AM ----
ERROR: Port 0 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 0 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 4 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 4 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 5 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 5 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 6 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 6 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 2/6 Rx Err Count 24 exceeds threshold 16
---- Log Entry #49 APR-02-2024 12:18:42 AM ----
ERROR: Type-I Port 0 ECC correctable error threshold exceeded reg 0xf1a val 0x18
---- Log Entry #50 APR-02-2024 12:52:27 AM ----
ERROR: Port 0 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 0 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 4 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 4 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 5 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 5 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 6 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 6 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 2/6 Rx Err Count 24 exceeds threshold 16
---- Log Entry #51 APR-02-2024 12:52:27 AM ----
ERROR: Type-I Port 0 ECC correctable error threshold exceeded reg 0xf1a val 0x18
hwLogShow
-1
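The excLogShow output above repeats the same pattern on several days, so a per-day tally can make it easier to compare with the earlier November entries. The following is only a minimal sketch: it assumes the log has been pasted into a string exactly as shown, and the names `ENTRY_RE` and `summarize` are made up for illustration (they are not controller or Storage Manager commands).

```python
import re
from collections import Counter

# Header format as it appears in the pasted excLogShow output, e.g.
# "---- Log Entry #43 MAR-30-2024 01:52:41 PM ----"
ENTRY_RE = re.compile(
    r"^-+ Log Entry #(\d+) ([A-Z]{3}-\d{2}-\d{4}) (\d{2}:\d{2}:\d{2} [AP]M) -+$"
)

def summarize(log_text):
    """Return (entries per date, ECC-threshold entries per date)."""
    entries_per_date = Counter()
    ecc_per_date = Counter()
    current_date = None
    for line in log_text.splitlines():
        line = line.strip()
        match = ENTRY_RE.match(line)
        if match:
            # A new log entry starts; remember its date for the lines below it.
            current_date = match.group(2)
            entries_per_date[current_date] += 1
        elif current_date and "ECC correctable error threshold exceeded" in line:
            ecc_per_date[current_date] += 1
    return entries_per_date, ecc_per_date

# Three entries copied from the output above (entry #44 shortened to one line).
sample = """\
---- Log Entry #43 MAR-30-2024 01:52:41 PM ----
ERROR: Type-I Port 0 ECC correctable error threshold exceeded reg 0xf1a val 0x18
---- Log Entry #44 MAR-30-2024 10:30:17 PM ----
ERROR: Port 0 Bad TLP Count 1572864 exceeds threshold 16
---- Log Entry #45 MAR-30-2024 10:30:17 PM ----
ERROR: Type-I Port 0 ECC correctable error threshold exceeded reg 0xf1a val 0x18
"""

entries, ecc = summarize(sample)
print(dict(entries))  # {'MAR-30-2024': 3}
print(dict(ecc))      # {'MAR-30-2024': 2}
```

Feeding the full paste to `summarize` would give one count per date, which is enough to see whether the ECC entries cluster around the failures.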
------------------------------
Andrew M
Original Message:
Sent: Wed April 03, 2024 04:37 AM
From: Mousa Hammad
Subject: DS3524 not responsive
Hello Andrew,
Please provide the output of the following commands from the affected CTRL, so we can check whether the issue still has the same trigger:
loadDebug
excLogShow
hwLogShow
We have never used a non-branded DIMM in this system, so I cannot answer that question.
I will be on vacation from 04.04.2024 until 18.04.2024.
Best regards, Mousa
------------------------------
Mousa Hammad
Original Message:
Sent: Tue April 02, 2024 09:27 AM
From: Andrew M
Subject: DS3524 not responsive
No. There were 3 failures last week.
Is it possible to install a non-branded DIMM in the CTL?
------------------------------
Andrew M
Original Message:
Sent: Mon December 11, 2023 08:32 AM
From: Mousa Hammad
Subject: DS3524 not responsive
Hello Andrew,
we do not provide RCA for EoS products. However, the excLogShow output shows that an "ECC correctable error" was encountered on the DIMM, after which the CTRL was locked down. After we unlocked the CTRL via the CLI, it seems a power cycle was still needed to bring the CTRL up.
Best regards, Mousa
------------------------------
Mousa Hammad
Original Message:
Sent: Fri December 08, 2023 12:23 PM
From: Andrew M
Subject: DS3524 not responsive
Why do I see "ECC correctable error threshold exceeded" in the log, reported on 21st and 22nd November?
I think the DS lost its link to the host after that.
------------------------------
Andrew M
Original Message:
Sent: Fri December 08, 2023 10:02 AM
From: Mousa Hammad
Subject: DS3524 not responsive
Hello Andrew,
glad to hear that the CTRL is now up and running. It seems some flags were not set correctly, which blocked the startup sequence. A HW power cycle was required to clear this condition. This is a rare case, which we hit many years ago at another customer site.
All the best and wish you a great weekend.
Best regards, Mousa
------------------------------
Mousa Hammad
Original Message:
Sent: Thu December 07, 2023 10:15 AM
From: Andrew M
Subject: DS3524 not responsive
Great thanks!!!
It works! But only after a manual reboot via OFF-ON.
But what happened to it?
------------------------------
Andrew M
Original Message:
Sent: Fri December 01, 2023 12:16 PM
From: Mousa Hammad
Subject: DS3524 not responsive
Hello Andrew,
thanks for providing the output. 'Autoload Disable' is set to OFF, which is correct. You have already run the command "clearHardwareLockdown". Please now reboot the CTRL by running the command "sysReboot".
If the issue persists, try to power cycle the system by switching it OFF/ON and check again. If the problem still persists, we can do nothing more and the CTRL needs to be replaced.
Best regards, Mousa
------------------------------
Mousa Hammad
Original Message:
Sent: Fri December 01, 2023 07:53 AM
From: Andrew M
Subject: DS3524 not responsive
Here are the results of the commands:
clearHardwareLockdown
value = 0 = 0x0
ccmInvalidateCacheStoreData
C interp: unknown symbol name 'ccmInvalidateCacheStoreData'.
M option -
12
CHANGE HARDWARE CONFIGURATION MENU
-------SOFTWARE SWITCH OPTION--------      --CURRENT--  --DEFAULT--
1) Switch #1 (PCI Device Config Disable)   Default      Off
2) Switch #2 (Manufacturing Diagnostics)   Default      Off
3) Switch #3 (Invoke Boot Menu)            Default      Off
4) Switch #4 (Continuous Diagnostics)      Default      Off
----------SOFTWARE OPTION------------      --CURRENT--
5) Option #1 (Extensive Diagnostics)       Off
6) Option #2 (Diagnostics Disable)         Off
7) Option #3 (Autoload Disable)            Off
8) Option #4 (Network Enable)              Off (NVSRAM Enabled)
7
Disk Array Controller - Model 2660
Board Name: LSI Logic RAID Controller
OEM Designation: LSI
Board Serial Number: SV22128343
Board Part Number: 45233-06
Schematic Number: 41211-02
Manufacture Source: V037846 3LCN01
Manufacture Date: 05/27/2012
Board Identifier: 2660
Vendor Id: IBM
Product Id: 1746 FAStT
Product Revision: 1070
Ethernet Node Address: 0080E52F1B42
Battery0 Installation: 04/23/2013
Battery1 Installation: 12/19/2054
Subsystem Name:
Board date and time: 12/01/2023 02:49:04 Fri
System date and time: 12/01/2023 11:17:22 Fri
------------------------------
Andrew M
Original Message:
Sent: Fri December 01, 2023 03:16 AM
From: Mousa Hammad
Subject: DS3524 not responsive
Hello Andrew,
the commands could not be run because the system did not finish the startup sequence and stopped with 0F on the LED display.
LED status 0F means "Application Start". This is one of the system startup checkpoints.
In the excLogShow output I can see the message "ECC correctable error threshold exceeded" reported on 21st and 22nd November.
Please try to run the following command to resolve the issue:
clearHardwareLockdown
Also try this command, in case the system accepts it:
ccmInvalidateCacheStoreData
If the LED status still shows "0F", please check whether the 'Autoload Disable' option in the boot operations menu is set to Enable. It should be OFF.
You can check or change it by opening the Boot Operations menu with the command "M", then selecting options 12, 7, 0 in sequence to reach this option.
A long time ago we had a case where 'Autoload Disable' had been changed for some unknown reason.
Best regards, Mousa
------------------------------
Mousa Hammad
Original Message:
Sent: Fri December 01, 2023 01:43 AM
From: Andrew M
Subject: DS3524 not responsive
Thanks for the answer!
All the listed commands are unknown on the controller.
Only excLogShow works.
The log is:
---- Log Entry #11 APR-27-2018 12:31:49 PM ----
04/27/18-17:57:49 (IOSymbol2): PANIC: Invalid response sense data:0x110e4010 or replyMessage:0x0
Stack Trace for
Executing moduleShow(0,0,0,0,0,0,0,0,0,0) on controller A:
MODULE NAME MODULE ID GROUP # TEXT START DATA START BSS START
--------------- ---------- ---------- ---------- ---------- ----------
RAID 0xebf788 3 0x5f26a60 0x80f4b08 0x81652d0
RAID1 0x1477658 4 0x1477f20 0x1bc4408 0x1bdef78
Debug 0x1ea44e0 5 0x2306620 0x24b24a0 0x24b5c38
IOSymbol2:
0x0026092c vxTaskEntry +0x5c : vkiTask (0x11000468)
0x0017152c vkiTask +0xec : 0x05f7d6e4 ()
0x05f7d880 iop::IoScheduleManager::srcOpTask(iop::IoScheduleManager::TaskControl *, scsi::Op *+0x1a0: cmd::CmdManager::process(scsi::Op *) ()
0x01702c54 cmd::CmdManager::process(scsi::Op *)+0xf4 : 0x01a70c20 ()
0x01a70c94 Thunk for (offset -4) ql::QlManager::~QlManager()+0x9634: 0x06994904 ()
0x06994948 symrpc::SymbolManager::utmCmdHandler(scsi::Op *)+0x48 : symrpc::UtmService::handleCommand(scsi::Op *) ()
0x069af038 symrpc::UtmService::handleCommand(scsi::Op *)+0x3f8: slbSendStatus ()
0x05fd2a80 slbSendStatus+0x140: 0x05fda7e4 ()
0x05fda980 normalIoStart+0x1a0: setChkCondOrResConflict(scsi::Op *) ()
0x05fdcea4 setChkCondOrResConflict(scsi::Op *)+0x44 : htd::HtdItnCmdIoStart(scsi::Op *) ()
0x05fc630c htd::HtdItnCmdIoStart(scsi::Op *)+0x4cc: 0x06051dc4 ()
0x06051df0 sas::LtdItn::sendCmdComplete(scsi::Op *)+0x30 : sas::sasIoInSendStatus(sas::_CMD *, unsigned char *, int, unsigned char) ()
0x06061530 sas::sasIoInSendStatus(sas::_CMD *, unsigned char *, int, unsigned char)+0x730: _vkiCmnErr__link ()
0x0016c5e4 _vkiCmnErr +0x104: 0x0016c820 (0x56a038, 0x7e00dc0, 0x21e07f0)
0x0016cbd0 vkiLogShow +0x570: sxCallback (0x28, 0x5cd33c)
0x0015c790 sxCallback +0x90 : 0x01488b44 ()
0x01488be8 ddcAssertPanicCallback+0xa8 : ddc::DdcManager::ddcInterruptTriggerHandler() ()
0x01488f9c ddc::DdcManager::ddcInterruptTriggerHandler()+0x23c: ddc::DdcLogMisc::logMisc(REBOOT_REASON) ()
0x0148784c ddc::DdcLogMisc::logTaskSynopsisInfo(int)+0x12c: 0x014b5974 ()
0x014b5974 scap::CaptureManager::captureData(const char *, int, bool)+0x6f4: _vkiPrintf__link ()
0x0016abc4 _vkiPrintf +0x64 : _vkiVPrintf (0x1af8aa4, 0x21e03d0)
---- Log Entry #12 NOV-21-2023 05:02:28 AM ----
ERROR: Port 0 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 0 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 4 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 4 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 5 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 5 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 6 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 6 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 2/6 Rx Err Count 24 exceeds threshold 16
---- Log Entry #13 NOV-21-2023 05:02:28 AM ----
ERROR: Type-I Port 0 ECC correctable error threshold exceeded reg 0xf1a val 0x18
---- Log Entry #14 NOV-22-2023 06:46:07 AM ----
ERROR: Port 0 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 0 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 4 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 4 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 5 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 5 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 6 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 6 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 2/6 Rx Err Count 24 exceeds threshold 16
---- Log Entry #15 NOV-22-2023 06:46:07 AM ----
ERROR: Type-I Port 0 ECC correctable error threshold exceeded reg 0xf1a val 0x18
---- Log Entry #16 NOV-22-2023 07:34:28 AM ----
ERROR: Port 0 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 0 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 4 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 4 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 5 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 6 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 6 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 2/6 Rx Err Count 24 exceeds threshold 16
---- Log Entry #17 NOV-22-2023 07:34:29 AM ----
ERROR: Type-I Port 0 ECC correctable error threshold exceeded reg 0xf1a val 0x18
---- Log Entry #18 NOV-22-2023 11:35:59 AM ----
ERROR: Port 0 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 0 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 4 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 4 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 5 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 5 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 6 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 6 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 2/6 Rx Err Count 24 exceeds threshold 16
---- Log Entry #19 NOV-22-2023 11:35:59 AM ----
ERROR: Type-I Port 0 ECC correctable error threshold exceeded reg 0xf1a val 0x18
value = 1 = 0x1
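A side note on the numbers in these entries: the Bad TLP, Bad DLLP and Rx Err values are the same on every port and in every entry, which is easier to see in hex. The snippet below is just a small sketch that reprints the three decimal values copied from the log as hex. What the firmware actually stores in these counters is not documented in this thread, so any reading of the bit patterns is an assumption.

```python
# Counter values copied from the log entries above (decimal, as printed).
values = {
    "Bad TLP Count": 1572864,
    "Bad DLLP Count": 1073743408,
    "Rx Err Count": 24,
}

for name, value in values.items():
    # "#x" formats the integer as lowercase hex with a 0x prefix.
    print(f"{name}: {value} = {value:#x}")

# Bad TLP Count: 1572864 = 0x180000
# Bad DLLP Count: 1073743408 = 0x40000630
# Rx Err Count: 24 = 0x18
```

The Rx Err count of 24 is 0x18, the same value as "val 0x18" in the ECC line. Whether the two are related cannot be told from the log alone.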
All LEDs on the disks are off. They blink only when I release a disk.
The rear side looks like the picture. The power supply in the picture is not connected to the mains. The main power supply is the second PS!
------------------------------
Andrew M
Original Message:
Sent: Tue November 28, 2023 06:41 AM
From: Mousa Hammad
Subject: DS3524 not responsive
Hello Andrew,
given that the CTL is currently unresponsive, we first need to know the LED status of this CTRL, which can be seen on the rear side. In addition, please try to connect to the CTRL via telnet and run the following commands:
vdmShowDriveList
evfShowOwnership
rdacMgrShow
cmgrShow
evfShowAllVols
excLogShow
As soon as I get the results, I will check them and try to assist you.
Best regards, Mousa
------------------------------
Mousa Hammad
Original Message:
Sent: Sun November 26, 2023 07:50 AM
From: Andrew M
Subject: DS3524 not responsive
Hello. I have one DS3524 with one controller, connected to a server via an LSI SAS2 adapter.
A few weeks ago the link between the DS3524 and the server dropped briefly. After some time the link was restored.
One day ago the link was lost. I do not see the disks in Windows; in DS Storage Manager the DS3524 shows as out-of-band and, after a few minutes, as unresponsive. Ping to the controller is OK, and I can connect to the controller via telnet. The smcli -d -v command shows the controller's IP addresses and the state Unresponsive.
I tried switching the DS3524 off and on: no link.
Is it possible to revive the DS3524?
Many thanks!
------------------------------
Andrew M
------------------------------