Unhealthy filesystems

1. Unhealthy filesystems

Posted Wed October 15, 2008 06:15 AM
Originally posted by: SystemAdmin

I am having some weird problems with our Oracle JFS2 file systems. Whenever we have to power down the LPARs, errpt flags problems with the /u01/*/oracle file systems a couple of hours after they are powered back on (see below). Unmounting the file system and running fsck on it fixes the problems (quite often corrupt sibling chains and inode problems), but it only ever happens with these 2 file systems. Any ideas? Oracle install, processes not closing cleanly?

oslevel 5.3.0.0
LABEL: J2_FSCK_INFO
IDENTIFIER: AE3E3FAD

Date/Time: Wed 15 Oct 09:17:46 2008
Sequence Number: 3904
Machine Id: 00CDCDEA4C00
Node Id:
Class: O
Type: INFO
Resource Name: SYSJ2

Description
FSCK FOUND ERRORS

Probable Causes
INVALID FILE SYSTEM CONTROL DATA

Detail Data
ERROR CODE
0000 0000
RESOLUTION STATE
0000 0000
FILE SYSTEM DEVICE
/dev/fslv01

LABEL: J2_FSCK_INFO
IDENTIFIER: AE3E3FAD

Date/Time: Wed 15 Oct 09:16:45 2008
Sequence Number: 3903
Machine Id: 00CDCDEA4C00
Node Id:
Class: O
Type: INFO
Resource Name: SYSJ2

Description
FSCK FOUND ERRORS

Probable Causes
INVALID FILE SYSTEM CONTROL DATA

Detail Data
ERROR CODE
0000 0000
RESOLUTION STATE
0000 0000
FILE SYSTEM DEVICE
/dev/fslv01

LABEL: J2_IMAP_CORRUPT
IDENTIFIER: 61277850

Date/Time: Mon 13 Oct 11:06:41 2008
Sequence Number: 3902
Machine Id: 00CDCDEA4C00
Node Id:
Class: U
Type: UNKN
Resource Name: SYSJ2
Resource Class: NONE
Resource Type: NONE
Location:
VPD:

Description
FILE SYSTEM CORRUPTION

Probable Causes
INVALID FILE SYSTEM CONTROL DATA

Recommended Actions
PERFORM FULL FILE SYSTEM RECOVERY USING FSCK UTILITY
OBTAIN DUMP
CHECK ERROR LOG FOR ADDITIONAL RELATED ENTRIES
IF PROBLEM PERSISTS, CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
FILE NAME
j2_imap.c
LINE NUMBER
2053
JFS2 MAJOR/MINOR DEVICE NUMBER
0021 0003
JFS2 ERROR LOG FLAG
0008 0010
FILE SYSTEM DEVICE AND MOUNT POINT
/dev/fslv01, /u01/app/oracle

LABEL: J2_FSCK_REQUIRED
IDENTIFIER: B6DB68E0

Date/Time: Mon 13 Oct 11:03:38 2008
Sequence Number: 3901
Machine Id: 00CDCDEA4C00
Node Id:
Class: O
Type: INFO
Resource Name: SYSJ2

Description
FILE SYSTEM RECOVERY REQUIRED

Probable Causes
INVALID FILE SYSTEM CONTROL DATA DETECTED

Recommended Actions
PERFORM FULL FILE SYSTEM RECOVERY USING FSCK UTILITY
OBTAIN DUMP
CHECK ERROR LOG FOR ADDITIONAL RELATED ENTRIES

Detail Data
ERROR CODE
0000 0005
JFS2 MAJOR/MINOR DEVICE NUMBER
0021 0003
CALLER
0023 E3B4
CALLER
0022 734C
CALLER
0026 FD6C
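
When several entries like the ones above pile up, it can help to group them by label and device before deciding which file system to fsck. The following is a minimal sketch, assuming the text comes from `errpt -a` and looks like the entries quoted in this post. The function names `parse_errpt` and `summarize` are made up for illustration, and `SAMPLE` is an abbreviated copy of the four entries above.

```python
import re
from collections import Counter

# Header fields as they appear in the errpt output quoted in this post.
HEADER_FIELD = re.compile(
    r"^(LABEL|IDENTIFIER|Date/Time|Sequence Number|Class|Type):\s*(.*)$"
)


def parse_errpt(text):
    """Split detailed errpt output into one dict per entry.

    A new entry starts at each "LABEL:" line. Only the header fields and
    the value that follows a "FILE SYSTEM DEVICE..." detail line are kept.
    """
    entries = []
    current = None
    want_device = False
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER_FIELD.match(line)
        if match:
            key, value = match.groups()
            if key == "LABEL":
                current = {"LABEL": value}
                entries.append(current)
            elif current is not None:
                current[key] = value
            want_device = False
        elif line.startswith("FILE SYSTEM DEVICE"):
            # The device name is on the next non-empty line.
            want_device = True
        elif want_device and line and current is not None:
            # Either "/dev/fslv01" or "/dev/fslv01, /u01/app/oracle".
            current["DEVICE"] = line.split(",")[0].strip()
            want_device = False
    return entries


def summarize(entries):
    """Count entries per (label, device); device is None if none was logged."""
    return Counter((e["LABEL"], e.get("DEVICE")) for e in entries)


# Abbreviated copy of the four entries quoted above.
SAMPLE = """\
LABEL: J2_FSCK_INFO
IDENTIFIER: AE3E3FAD
Date/Time: Wed 15 Oct 09:17:46 2008
Sequence Number: 3904
Detail Data
FILE SYSTEM DEVICE
/dev/fslv01

LABEL: J2_FSCK_INFO
IDENTIFIER: AE3E3FAD
Date/Time: Wed 15 Oct 09:16:45 2008
Sequence Number: 3903
Detail Data
FILE SYSTEM DEVICE
/dev/fslv01

LABEL: J2_IMAP_CORRUPT
IDENTIFIER: 61277850
Date/Time: Mon 13 Oct 11:06:41 2008
Sequence Number: 3902
Detail Data
FILE SYSTEM DEVICE AND MOUNT POINT
/dev/fslv01, /u01/app/oracle

LABEL: J2_FSCK_REQUIRED
IDENTIFIER: B6DB68E0
Date/Time: Mon 13 Oct 11:03:38 2008
Sequence Number: 3901
Detail Data
ERROR CODE
0000 0005
"""

if __name__ == "__main__":
    for (label, device), count in summarize(parse_errpt(SAMPLE)).items():
        print(label, device, count)
```

On a live system the same functions could be fed the saved output of `errpt -a`; the J2_FSCK_REQUIRED entry carries only a major/minor number, so its device shows up as `None` here.
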
2. Re: Unhealthy filesystems

Posted Wed October 15, 2008 06:28 AM
Originally posted by: tony.evans

How do you power down the LPARs?
What disk subsystem do you run?
Are the filesystems set to check on start?
3. Re: Unhealthy filesystems

Posted Wed October 15, 2008 06:48 AM
Originally posted by: SystemAdmin

How do you power down the LPARs? We wait until the DBA has closed the DBs, then umount the non-root file systems, then run shutdown (no grace) on the LPARs. There are multiple LPARs (80+) across a p590.
What disk subsystem do you run? All the /u01 file systems are in a volume group on VIO.
Are the filesystems set to check on start? Normal Unix boot.
4. Re: Unhealthy filesystems

Posted Wed October 15, 2008 07:34 AM
Originally posted by: tony.evans

You get this on any of the 80 LPARs?

How many VIO servers? Are the disks local to the VIO servers or are they SAN / NAS or something?

By checked on boot, I mean do the filesystems have check set to true or false (or nothing) in /etc/filesystems?

Are the VIO servers also being rebooted?
5. Re: Unhealthy filesystems

Posted Wed October 15, 2008 07:43 AM
Originally posted by: tony.evans

And what does oslevel -s return?

And what level of software on the VIO servers?
6. Re: Unhealthy filesystems

Posted Wed October 15, 2008 07:55 AM
Originally posted by: SystemAdmin

You get this on any of the 80 LPARs? Randomly, but only on the /u01 file systems

How many VIO servers? Are the disks local to the VIO servers or are they SAN / NAS or something? 8 VIO servers. The Oracle application servers use local VIO disks; the Oracle RAC servers use fibre-attached SAN disks.

By checked on boot, I mean do the filesystems have check set to true or false (or nothing) in /etc/filesystems? false

Are the VIO servers also being rebooted? Yes

And what does oslevel -s return? 5300-03-00

And what level of software on the VIO servers? 5300-03-00
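
To compare a level string like the `5300-03-00` reported above against some minimum, the string can be split into its numeric parts. This is only a sketch: it assumes `oslevel -s` prints base level, technology level and service pack separated by hyphens, optionally followed by a build date, and the names `AixLevel`, `parse_oslevel` and `is_older` are invented for the example. The `5300-06-00` threshold in the usage lines is a placeholder, not a statement of what IBM or Oracle require.

```python
from typing import NamedTuple, Optional


class AixLevel(NamedTuple):
    base: int                    # e.g. 5300 for AIX 5.3
    tl: int                      # technology (maintenance) level
    sp: int                      # service pack
    build: Optional[str] = None  # optional build date, if present


def parse_oslevel(text: str) -> AixLevel:
    """Parse a string of the assumed form BASE-TL-SP or BASE-TL-SP-BUILD."""
    parts = text.strip().split("-")
    if len(parts) not in (3, 4) or not all(p.isdigit() for p in parts):
        raise ValueError(f"unexpected oslevel -s output: {text!r}")
    base, tl, sp = (int(p) for p in parts[:3])
    build = parts[3] if len(parts) == 4 else None
    return AixLevel(base, tl, sp, build)


def is_older(level: AixLevel, minimum: AixLevel) -> bool:
    """Compare base, TL and SP in that order; the build date is ignored."""
    return level[:3] < minimum[:3]


if __name__ == "__main__":
    current = parse_oslevel("5300-03-00")   # value reported in this post
    minimum = parse_oslevel("5300-06-00")   # placeholder threshold
    print(current, is_older(current, minimum))
```

The same check works for the VIO servers only if their reported level follows the same pattern, which this sketch does not verify.
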
7. Re: Unhealthy filesystems

Posted Wed October 15, 2008 08:05 AM
Originally posted by: SystemAdmin

Sorry, my answer to "By checked on boot, I mean do the filesystems have check set to true or false (or nothing) in /etc/filesystems?" should have been "nothing".
8. Re: Unhealthy filesystems

Posted Wed October 15, 2008 08:24 AM
Originally posted by: tony.evans

Ok.

AIX 5.3 ML3 is pretty old (pre 2005). It may be unrelated, and I fully appreciate the difficulty of upgrading production servers, but I strongly recommend you move to a supported TL. It's entirely possible you're suffering from an issue in early versions of AIX / VIO that causes corruption on virtual SCSI disks during a reboot cycle (I don't know of specific PMRs; I'm just suggesting it as a possibility).

If you were to open this as a full PMR with IBM, they'd recommend upgrading to a supported level before proceeding.

What's the order of shutdown? Do you run shutdown -h on all LPARs, then reboot the VIO servers, make sure the VIO servers are all up and running, and then restart the LPARs? Is there any chance that the LPARs are coming back up before the VIO servers?

If you modify the filesystems to check on boot, fsck will fix the corruption before the applications come up, but that just works around the issue rather than resolving it. It would mean you don't need to stop the applications and take manual action, though, assuming the corruption occurs during the reboot phase rather than while the servers are in use.
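
To see which file systems are currently not set to check, the stanzas in /etc/filesystems can be read and filtered on the check attribute. The sketch below assumes the usual stanza layout (a mount point followed by a colon, then `name = value` lines, with `*` starting a comment) and treats `true` or a pass number as "checked". The function names are invented, and every stanza in `SAMPLE` except the /u01/app/oracle device name is made up for illustration.

```python
def parse_filesystems(text):
    """Return {mount_point: {attribute: value}} from stanza-format text."""
    stanzas = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue  # blank line or comment
        if line.endswith(":") and "=" not in line:
            current = stanzas.setdefault(line[:-1], {})
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return stanzas


def not_checked_at_boot(stanzas, vfs="jfs2"):
    """List mount points of the given type whose check attribute is off.

    Assumption: "true" or a number enables checking; "false" or a missing
    attribute does not.
    """
    result = []
    for mount_point, attrs in stanzas.items():
        if attrs.get("vfs") != vfs:
            continue
        value = attrs.get("check", "").lower()
        if not (value == "true" or value.isdigit()):
            result.append(mount_point)
    return sorted(result)


# Hypothetical stanzas; only /dev/fslv01 and /u01/app/oracle come from the thread.
SAMPLE = """\
* example only
/u01/app/oracle:
        dev             = /dev/fslv01
        vfs             = jfs2
        mount           = true
        check           = false

/u01/example:
        dev             = /dev/examplelv
        vfs             = jfs2
        mount           = true

/data:
        dev             = /dev/datalv
        vfs             = jfs2
        mount           = true
        check           = true
"""

if __name__ == "__main__":
    print(not_checked_at_boot(parse_filesystems(SAMPLE)))
```

This only reports the setting; it does not change it, and it says nothing about when the corruption actually happens.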

Are we sure the VIO servers aren't having any disk connection issues or being rebooted, and that the routes to the disks aren't being affected in some other way?
9. Re: Unhealthy filesystems

Posted Wed October 15, 2008 09:12 AM
Originally posted by: SystemAdmin

Shutdown order from HMC is (wait for each group to shutdown before continuing):

Shutdown APP LPARs
Shutdown RAC LPARs
Shutdown VIOS and RMAN LPARs
Shutdown tie breaker P520 LPARs
Shutdown NIM server

Obviously the reverse on power-up.
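
The order above, and the question of whether client LPARs could come up before the VIO servers, can be written down as a small check. This is a sketch of the planned sequence only: it assumes the APP and RAC groups are the ones that depend on the VIOS group, the helper names are invented, and it says nothing about whether each group had actually finished starting before the next one began.

```python
# Shutdown groups in the order listed above.
SHUTDOWN_ORDER = [
    "APP LPARs",
    "RAC LPARs",
    "VIOS and RMAN LPARs",
    "tie breaker P520 LPARs",
    "NIM server",
]


def power_up_order(shutdown_order):
    """Power-up is described as the reverse of the shutdown order."""
    return list(reversed(shutdown_order))


def provider_before_clients(order, provider, clients):
    """True if the provider group appears before every client group."""
    position = {name: index for index, name in enumerate(order)}
    return all(position[provider] < position[client] for client in clients)


if __name__ == "__main__":
    startup = power_up_order(SHUTDOWN_ORDER)
    print(startup)
    # Assumption: APP and RAC LPARs are the VIOS clients.
    print(provider_before_clients(
        startup, "VIOS and RMAN LPARs", ["APP LPARs", "RAC LPARs"]
    ))
```

With the order as posted, the VIOS group is started before both client groups and stopped after them, so any problem would have to come from timing within the sequence rather than from the sequence itself.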

There are no errors or syslog errors on the vio servers.
The corruptions seem to occur several hours after the OS has been running, which makes me think the Oracle app is the likely suspect when it accesses the file system.
10. Re: Unhealthy filesystems

Posted Wed October 15, 2008 09:49 AM
Originally posted by: tony.evans

The fact that the corruption is detected after a couple of hours doesn't mean it didn't happen earlier.

That's why I suggest setting the filesystems to fsck on boot: at least you'll know for certain they were clean when they came up. You could fsck them before shutdown as well.

It could be Oracle, or it could easily be your very out-of-date version of AIX. As I say, if you raise a PMR with IBM they'll suggest you patch as a first action.
11. Re: Unhealthy filesystems

Posted Wed October 15, 2008 09:54 AM
Originally posted by: SystemAdmin

Every time we raise a PMR with IBM they advise patching and firmware upgrades, but as you say, that's easier said than done on 24/7 servers!

Thanks for the advice.
12. Re: Unhealthy filesystems

Posted Wed October 15, 2008 10:16 AM
Originally posted by: tony.evans

Well, since the problem comes to light after you reboot, you obviously have some room for upgrades, and you can use at least two methods to do 90% of the work without any downtime.

The reason they suggest moving to a recent version is because it has a pretty high hit rate on fixing weird stuff. There are several thousand fixes between the version you're running and the latest TL.
13. Re: Unhealthy filesystems

Posted Wed October 15, 2008 10:53 AM
Originally posted by: CRM

Just a thought: from what version was RAC supported on virtual disks? I seem to recall it was something like 1.3 (my Metalink login is not working at the moment to confirm), which was based on something like 5.3 TL6. You look to be running 1.1.2 or some other very early and unsupported version of VIO.

I would seriously recommend updating the code as per IBM's recommendations!

regards

Chris
