Originally posted by: Tibor_B
Hi,
I have a problem that is not easy to describe.
The problem happens when we attach or detach SVC disks to/from our AIX hosts. Sometimes this action somehow corrupts other, unrelated disks: they become unwritable and the applications using them crash.
The problem is not limited to a specific version; just today it happened on an AIX 5.2 host.
We use multipath, so all operations are done on vpaths.
Today I was detaching disks:
umount ... varyoffvg .... exportvg ...
A short time later, an Oracle DB using some other disks crashed, and I noticed that our error log started filling up with errors like:
Description
USER DATA I/O ERROR
Probable Causes
ADAPTER HARDWARE OR MICROCODE
DISK DRIVE HARDWARE OR MICROCODE
SOFTWARE DEVICE DRIVER
STORAGE CABLE LOOSE, DEFECTIVE, OR UNTERMINATED
Recommended Actions
CHECK CABLES AND THEIR CONNECTIONS
INSTALL LATEST ADAPTER AND DRIVE MICROCODE
INSTALL LATEST STORAGE DEVICE DRIVERS
IF PROBLEM PERSISTS, CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
JFS2 MAJOR/MINOR DEVICE NUMBER
002A 0001
FILE SYSTEM DEVICE AND MOUNT POINT
/dev/s......lv, /.....
The filesystem is still mounted rw (according to the mount command).
The usual recovery is to detach and reattach the filesystem (umount, varyoffvg, exportvg, and then the reverse). Sometimes it goes without a problem, and sometimes it goes through but with an error during varyoffvg:
0516-062 lqueryvg: Unable to read or write logical volume manager record. PV may be permanently corrupted. Run diagnostics
But it carries on, looks all right, and the volume group can be attached back to the host.
But sometimes when doing importvg it returns:
Method error (/usr/lib/methods/chgvpath): 0514-047 Cannot access a device.
Or even:
0516-062 lqueryvg: Unable to read or write logical volume manager record. PV may be permanently corrupted. Run diagnostics
0516-062 lqueryvg: Unable to read or write logical volume manager record. PV may be permanently corrupted. Run diagnostics
0516-1140 importvg: Unable to read the volume group descriptor area on specified physical volume.
The way to fix it is:
chdev -l $vpath -a pv=clear
chdev -l $vpath -a pv=yes
rmdev all hdisks and vpath
recreatevg .....
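To make the sequence concrete, here is a rough dry-run sketch of the steps above. It only prints the commands and runs nothing. The function name, the example device names (vpath3, hdisk10, hdisk11) and the plain "rmdev -l" form are assumptions for illustration, and the recreatevg arguments are left elided exactly as above.

```shell
#!/bin/sh
# Dry-run sketch: prints the recovery commands for one vpath, runs nothing.
# Assumptions (hypothetical, for illustration only): the function name,
# the example device names, and "rmdev -l" as the rmdev form.
print_recovery_plan() {
    vpath="$1"
    shift
    # Clear and then re-assign the PVID on the vpath.
    echo "chdev -l $vpath -a pv=clear"
    echo "chdev -l $vpath -a pv=yes"
    # Remove the hdisks and the vpath, in the order the steps list them.
    for dev in "$@" "$vpath"; do
        echo "rmdev -l $dev"
    done
    # recreatevg arguments are intentionally left elided.
    echo "recreatevg ....."
}

print_recovery_plan vpath3 hdisk10 hdisk11
```

Printing the plan first makes it easy to review which devices would be touched before anything is run by hand.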
So we can usually recover, but we want to find the root cause to avoid the crashes.
Any ideas?
Tibor
#AIX-Forum