Original Message:
Sent: Fri April 14, 2023 04:24 AM
From: Holger Martens
Subject: ISP 8.1.18.0 : volume not mounting, process not cancelling
Hello Igor,
I have been working with ADSM/TSM/SP since V1.0 ... given the huge number of supported devices and the different operating systems, it's never boring and I learn new things every day. That's why I love being part of the SP community.
Addressing the issue to our support team is definitely the right thing to do; I hope they will be able to track it down and fix it soon.
Regards,
Holger
------------------------------
Holger Martens
Original Message:
Sent: Thu April 13, 2023 02:48 AM
From: Igor MERKU'
Subject: ISP 8.1.18.0 : volume not mounting, process not cancelling
Hello Holger,
thank you very much for your detailed answer, I very much appreciate your effort.
There is a lot for me to learn - you never stop learning, do you...
When we started off with ISP 8.1 (migrating to a new machine, coming from TSM 7) we ran into these performance/lock/unresponsive situations from the start, which was very frustrating. For no apparent reason things settled and were quite stable for several months, up until the recent update from 8.1.17 to 8.1.18 (plus a kernel update and a lin_tape update), and "here we go again" ...
Yesterday afternoon, after a hard reset, I had the ISP completely to myself: no sessions, no processes, nothing. So I started a backup db full to file ... and the server went nuts in the blink of an eye. Very frustrating.
I have a case open with IBM; hopefully we can make sense of it all eventually. It might be some stupid option (stupid in the sense of a change/reset to some default value or behaviour) that needs to be set differently with 8.1.18/lin_tape/kernel ... I'll keep this thread posted.
Thanks again, Holger, much appreciated.
Cheers, Igor
------------------------------
Igor MERKU'
Original Message:
Sent: Wed April 12, 2023 08:04 AM
From: Holger Martens
Subject: ISP 8.1.18.0 : volume not mounting, process not cancelling
Hello Igor,
if a process cannot be cancelled, it may be hanging in I/O. Here are more details, at a very high level and in my own words ...
The SP server runs in user space. If a process/thread needs to issue an I/O, for example to a tape drive, this has to go through kernel space. As soon as the thread has issued the I/O it no longer has any control over it and can only wait for the I/O to time out, fail or return successfully. A thread waiting for an I/O response usually cannot be cancelled. Even if you halt the SP server, the thread may still be "hanging around". To clean this up, a reboot of Linux (or whatever operating system is in use) may be required to free the resources. This issue is not unique to the SP server ... it's the way operating systems work.
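On Linux, a thread blocked in kernel I/O as described above is normally reported in state "D" (uninterruptible sleep) in `/proc` and in `ps` output. The following is a minimal sketch, not an official SP tool, that lists such threads so they can be matched against the time of a hang. The function names are made up for this illustration; `dsmserv` appears only as an example process name in a comment.

```python
import os

def parse_state(stat_line):
    """Return (comm, state) from the contents of a /proc/<pid>/stat file."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' instead of on whitespace.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return comm, state

def uninterruptible_threads(proc_root="/proc"):
    """Yield (pid, tid, comm) for every thread currently in state 'D'."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return  # no /proc available (not Linux, or no access)
    for pid in filter(str.isdigit, entries):
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as fh:
                    comm, state = parse_state(fh.read())
            except (OSError, ValueError, IndexError):
                continue  # thread vanished or stat line was unreadable
            if state == "D":
                yield int(pid), int(tid), comm

if __name__ == "__main__":
    # Threads of the SP server would show up with comm "dsmserv" or the
    # name of one of its threads; anything listed here is stuck in the kernel.
    for pid, tid, comm in uninterruptible_threads():
        print(f"pid={pid} tid={tid} comm={comm} state=D")
```

A thread that stays in this list for minutes or hours is a strong hint that the hang is below the SP server, in the driver or device path.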
Nevertheless, this is a guess based on what you describe. I would check for I/O issues with the tape drive/tape library that match the time of the issue.
If the issue reoccurs, you can check for locking conflicts in the SP server (resources are locked while in use; SP server threads may wait for a lock to be released before they can access the resource). The SP server has a built-in deadlock detector. If it detects a deadlock, it cancels a process/session to resolve it. In your case the SP server does not seem to have detected a deadlock.
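To illustrate why a deadlock detector can stay silent in a situation like this, here is a generic sketch of deadlock detection on a wait-for graph. It is not how the SP server implements its detector (that is not described in this thread); the thread names and the `find_deadlock` function are invented for the example. A deadlock is a cycle of threads waiting on each other, whereas a thread stuck in I/O holds a lock but waits on nobody, so no cycle exists.

```python
def find_deadlock(waits_for):
    """Return the threads forming a cycle in the wait-for graph, or None.

    waits_for maps each waiting thread to the thread that currently
    holds the lock it wants (at most one entry per waiting thread).
    """
    for start in waits_for:
        seen = []
        current = start
        # Follow the chain of "waits for" edges until it ends or repeats.
        while current in waits_for:
            if current in seen:
                return seen[seen.index(current):]
            seen.append(current)
            current = waits_for[current]
    return None

# Two threads waiting on each other: a real deadlock.
print(find_deadlock({"A": "B", "B": "A"}))

# A thread waiting on a lock held by a thread that is stuck in I/O:
# the chain simply ends, so there is nothing for a detector to resolve.
print(find_deadlock({"migration": "stuck_in_io"}))
```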
You can get some more details by issuing the following commands as SP server admin:
q mount f=d
show mp
q sess f=d
show session
show thread
show locks
q drive f=d
show library
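If the problem comes back, it can help to capture all of these outputs in one go so they can be attached to a support case. The sketch below runs each command through the `dsmadmc` administrative client and writes one file per command. It assumes `dsmadmc` is on the PATH and already configured for the server; the admin ID, the output directory name, the timeout value and the helper names are placeholders chosen for this example. Note that a password passed on the command line is visible in the process list.

```python
import subprocess
from pathlib import Path

# The commands listed above, in the same order.
COMMANDS = [
    "q mount f=d", "show mp", "q sess f=d", "show session",
    "show thread", "show locks", "q drive f=d", "show library",
]

def build_command(admin_id, password, server_command):
    """Build the dsmadmc argument list for one server command."""
    return [
        "dsmadmc",
        f"-id={admin_id}",
        f"-password={password}",
        "-dataonly=yes",  # suppress the client banner in the output
        server_command,
    ]

def output_name(server_command):
    """Turn a server command into a file name, e.g. 'q_mount_f-d.txt'."""
    return server_command.replace(" ", "_").replace("=", "-") + ".txt"

def collect(admin_id, password, out_dir="sp_diag", timeout=120):
    """Run every command and store its output under out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for cmd in COMMANDS:
        target = out / output_name(cmd)
        try:
            result = subprocess.run(
                build_command(admin_id, password, cmd),
                capture_output=True, text=True, check=False, timeout=timeout,
            )
            target.write_text(result.stdout + result.stderr)
        except subprocess.TimeoutExpired:
            # An unresponsive server is itself useful information.
            target.write_text(f"timed out after {timeout} s\n")
```

A call such as `collect("admin", getpass.getpass())` (with `import getpass`) would then leave eight text files in `sp_diag/`.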
---
The show library output has a stanza for each drive. In the 3rd line of each drive stanza you will find "polled = 0/1". If it's 1, the SP server is polling the drive because it is not responding. This may indicate an I/O issue with the drive, causing a process's tape mount to hang forever ... the process has already been assigned this drive resource, so it is not waiting for the drive; it is waiting for the mount to complete.
By the way ... show thread and show locks may generate a lot of output, depending on the SP server load.
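When the show library output is long, the "polled" check can be done on a saved copy of the output. This is a minimal sketch that only assumes the field appears literally as "polled = 0" or "polled = 1" somewhere on a line, as described above; the exact layout of the rest of the output is not assumed, and the function name is made up for this example.

```python
import re

# Matches the field as quoted above; surrounding layout is not assumed.
POLLED = re.compile(r"polled\s*=\s*([01])")

def polled_lines(show_library_text):
    """Return (line_number, line) pairs whose 'polled' field is 1."""
    hits = []
    for number, line in enumerate(show_library_text.splitlines(), start=1):
        match = POLLED.search(line)
        if match and match.group(1) == "1":
            hits.append((number, line.strip()))
    return hits
```

Each reported line number can then be looked up in the saved file to see which drive stanza it belongs to.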
I hope I did not confuse you.
Regards,
Holger
------------------------------
Holger Martens
Original Message:
Sent: Tue April 04, 2023 10:26 AM
From: Igor MERKU'
Subject: ISP 8.1.18.0 : volume not mounting, process not cancelling
Hi,
I recently (last Thursday) updated our ISP server (SLES12SP5) to V8.1.18.0.
Today I found a process (stgp migration against the local disk buffer pool) that had asked for a volume (LTO) to be mounted and was still waiting after 5 h, with a drive reserved.
At the same time there was a backup stgp process running against the same stgp (with other volumes mounted, so no competing mount requests).
The server also showed some cpu and sys values over 50%, which always makes me suspicious ... although it seemed to work fine.
Nevertheless, I then submitted a cancel for the backup stgp and, an hour later, a cancel for the stgp migration.
After three and two hours respectively, both processes were still running. So I halted ISP, rebooted the server, and it looks good now.
Any idea where to look for a cause?
Thanks, Igor
------------------------------
Igor MERKU'
------------------------------