
IBM CEPH FileSystem Snapshot - Key considerations for Configuration and maintenance

By Suma R posted 5 days ago

  

Introduction:

    Here is your one-stop page for the latest (Ceph 7.1) information on Ceph File System (CephFS) snapshots and snapshot schedules. It will help you with configuration if you are a first-time user, and if you already have snapshots configured, it offers useful tips and troubleshooting ideas for maintenance.

CephFS Snapshots overview:

  • Ceph File System (CephFS) snapshots capture a point-in-time copy of any directory within a Ceph File System.

  • Snapshots can be taken manually with the mkdir command, through Ceph CLI commands (for subvolume snapshots only), or automatically with snapshot schedule policies created for a path in the file system; a short sketch of these creation methods follows this list.

  • CephFS snapshots are asynchronous and are kept in a special hidden directory named .snap that exists within every CephFS directory.

  • CephFS snapshots can be used for CephFS_Snapshot_Mirroring and CephFS_Snapshot_Clone.

  • Snapshot schedules can be enabled using the command ‘ceph mgr module enable snap_schedule’.
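The sketch below shows the three creation methods side by side; the volume name ‘cephfs’, mount point /mnt/cephfs, directory dir1, subvolume ‘subvolume1’ and group ‘subvolumegroup1’ are illustrative examples, not fixed names.

# Enable the snapshot schedule manager module (one-time setup)
ceph mgr module enable snap_schedule

# Manual snapshot of any CephFS directory, via its hidden .snap directory
mkdir /mnt/cephfs/dir1/.snap/manual-snap1

# Subvolume snapshot through the Ceph CLI
ceph fs subvolume snapshot create cephfs subvolume1 snap1 --group_name subvolumegroup1

# Scheduled snapshots through a snap-schedule policy (covered in detail below)
ceph fs snap-schedule add / 1h --fs cephfs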

Four Key Snapshot Considerations:

1. Snapshot name in parent and child directories:

Snapshots created at a parent directory, whether through mkdir or a schedule, are visible from its child directories, but the snapshot name, when viewed from a child directory, has a prefix and suffix added to it.

For example, consider a file system volume named ‘cephfs’ mounted at path /mnt/cephfs.

You may create a snapshot at the root directory or at any directory within it. Say snapshot ‘snap1’ is created at /mnt/cephfs using the command ‘mkdir .snap/snap1’, and /mnt/cephfs has the sub-directories volumes and dir1. Then the snapshot name listed by ‘ls /mnt/cephfs/volumes/.snap’ or ‘ls /mnt/cephfs/dir1/.snap’ is “_snap1_1”. The snapshot at /mnt/cephfs/volumes/.snap/_snap1_1 is a copy of /mnt/cephfs/volumes/.

For a snapshot created through a snapshot schedule, the snapshot name takes the form ‘scheduled-2024-06-26-14_54_00_UTC’. If the path set for the schedule is '/', i.e., the root of the FS volume, and the volume is mounted at /mnt/cephfs, then ‘ls /mnt/cephfs/.snap’ lists the scheduled snapshot as 'scheduled-2024-06-26-14_54_00_UTC', and ‘ls /mnt/cephfs/volumes/.snap’ lists the same snapshot as '_scheduled-2024-06-26-14_54_00_UTC_1'.
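This behaviour is easy to confirm by hand; a minimal sketch, assuming the same ‘cephfs’ volume mounted at /mnt/cephfs with a sub-directory dir1:

cd /mnt/cephfs
mkdir .snap/snap1                      # manual snapshot taken at the volume root

ls /mnt/cephfs/.snap                   # lists: snap1
ls /mnt/cephfs/dir1/.snap              # lists: _snap1_1  (same snapshot, viewed from a child)
ls /mnt/cephfs/dir1/.snap/_snap1_1     # read-only copy of /mnt/cephfs/dir1 at snapshot time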

2. Snapshot schedule for a volume, any file system directory, or a subvolume:

Schedule for a volume:

To take a snapshot of the complete file system on a schedule, use ‘/’ as the path in the snap-schedule add command, as below:

ceph fs snap-schedule add / 1h --fs cephfs

Schedule for any directory within the file system:

Given a directory "/main_dir/sub_dir1/subdir1_1" in the file system, a schedule can be set to create snapshots of that directory automatically using the command below:

ceph fs snap-schedule add /main_dir/sub_dir1/subdir1_1 1h --fs cephfs

Schedule for a subvolume in a non-default subvolume group:

If a subvolume is used to host an application and snapshots are to be taken at the subvolume level on a schedule, use the command below:

ceph fs snap-schedule add / 1h --subvol subvolume1 --group subvolumegroup1
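Whichever form you use, it helps to confirm that the schedule has been stored and is active. A hedged sketch, reusing the example names above; option names can vary slightly between releases, so check ‘ceph fs snap-schedule --help’ on your cluster:

# List the schedules defined for a path
ceph fs snap-schedule list / --fs cephfs

# Show schedule details: interval, retention, created/last snapshot time, active state
ceph fs snap-schedule status / --fs cephfs

# Status of the subvolume schedule
ceph fs snap-schedule status / --fs cephfs --subvol subvolume1 --group subvolumegroup1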

3. Snapshot retention:

Now that snapshots are created either manually or through a schedule, we need to control how many snapshots are retained in a given directory. By default, CephFS retains the 100 latest snapshots per directory path. We can override the default snapshot retention count in two ways:

    1. Use the ceph config set command to change the snapshot retention count, as below:

ceph config set mds mds_max_snaps_per_dir 150

          This config option applies to snapshots created manually or through a snapshot schedule.

     2. If snapshots are created through a snapshot schedule, we can apply retention through the snapshot schedule policy, as below:

    • For snapshot retention on the root FS path /:

ceph fs snap-schedule retention add / h 6 --fs cephfs

          Here we retain 6 snapshots on the given path, spaced an hour apart.

    • For snapshot retention on a subvolume:

ceph fs snap-schedule retention add / h 6 --fs cephfs --subvol subvolume1 --group subvolumegroup1

Note: The ceph config option mds_max_snaps_per_dir takes precedence over the retention policy; i.e., if the retention policy is set to 102 snapshots, it will not take effect while mds_max_snaps_per_dir is left at 100. A short sketch of the correct ordering follows below.
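A minimal sketch of retaining more than 100 scheduled snapshots, assuming the ‘cephfs’ volume from the earlier examples; the values 150 and 120 are illustrative:

# Check the current per-directory snapshot limit (default is 100)
ceph config get mds mds_max_snaps_per_dir

# Raise the limit first...
ceph config set mds mds_max_snaps_per_dir 150

# ...then a count-based retention above 100 can take effect
ceph fs snap-schedule retention add / n 120 --fs cephfs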

4. CephFS Snapshot management in Dashboard:

We can manage FileSystem Snapshots through Ceph Dashboard for both manual and scheduled snapshots.

Dashboard_CephFS_Snapshot_Management

Dashboard_CephFS_Snapshot_schedule 

Useful Tips

Snapshot Retention:

  • We can retain the latest ‘n’ snapshots, irrespective of the time period between snapshots, by applying a retention rule in the snap-schedule command, as ‘ceph fs snap-schedule retention add / n 5 --fs cephfs’, or by editing the snapshot-schedule retention in the dashboard.
  • If more than 100 snapshots need to be retained for a specific use case, first increase the ceph config option mds_max_snaps_per_dir to the desired value and then add the retention count to the policy.
  • If fewer than 100 snapshots are to be retained, first bring the snapshot count in the directory below the value you want to retain and then set the config option mds_max_snaps_per_dir to the desired value. This is because, if the current snapshot count is 100 and you set both the config value and the retention policy to 50, existing snapshots may not be deleted automatically the first time to match the new retention requirement.

Snapshot NFS mount:

We can mount a snapshot through NFS if the use case involves providing access to read-only snapshot content. The snapshot can be from anywhere within the file system; we just need to give the appropriate path when creating the NFS export.
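A hedged sketch of such an export, assuming an existing NFS (Ganesha) cluster named ‘nfscluster’ and a snapshot ‘snap1’ taken on /dir1 of the ‘cephfs’ volume; exact flags differ between releases, so check ‘ceph nfs export create cephfs --help’ on your cluster:

# Export only the read-only snapshot directory
ceph nfs export create cephfs --cluster-id nfscluster --pseudo-path /snapexport --fsname cephfs --path /dir1/.snap/snap1 --readonly

# Clients then mount the snapshot content over NFS (host name is a placeholder)
mount -t nfs <nfs-ganesha-host>:/snapexport /mnt/snapexport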

Snapshot at root-level of FS:

If snapshots are managed at the root of the FS, no additional management is needed for the sub-directories, since the snapshot appears in every sub-directory within the root FS, with content matching that sub-directory.

Troubleshooting Ideas

To debug snapshot failures, we need debug logs enabled for the MDS and MGR components in the Ceph cluster. The default debug level is 5; it can be raised as appropriate. For more information, refer to https://www.ibm.com/docs/en/storage-ceph/7.1?topic=troubleshooting-configuring-logging

  • Snapshot delete error "failed to remove snapshot metadata on snap=cg_snap_osd reason=-28 error in write": this is seen when the OSD pool is full.

  • Error “disk I/O error” seen during snap-schedule add: this can happen if the CephFS volume was deleted with snapshots retained at the root level and the CephFS volume was then recreated.

  • Snapshot schedule is deactivated: this could be due to the path being invalid or non-existent; resolve the path issue either by creating the path referenced in the schedule or by creating a new schedule with the correct path.

  • For other unknown errors, enable debugging with the commands below:

ceph config set mds log_to_file true
ceph config set mgr log_to_file true

Wait for the error to reappear; when it does, note the timestamp. Refer to the debug logs on the corresponding active MDS nodes and the MGR nodes for the MDS and MGR logs, respectively.

MDS and MGR debug logs exist in /var/log/ceph/<fsid> on the cluster nodes that host the corresponding daemons.
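If more detail is needed than the default level provides, verbosity can be raised while the issue is reproduced; a minimal sketch (the value 20 is the maximum and is only an example; revert after collecting logs):

# Raise MDS and MGR debug verbosity
ceph config set mds debug_mds 20
ceph config set mgr debug_mgr 20

# Revert to defaults once the logs are captured
ceph config rm mds debug_mds
ceph config rm mgr debug_mgr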

Benefits:

  • Ceph can create immutable snapshots using its built-in snapshot functionality, providing point-in-time copies of data that cannot be altered.

  • Snapshots can be used for backup and restore at any level within the file system.

  • With snapshot schedules, we can define a policy for automatic creation of snapshots at desired intervals. The scheduled snapshots can be used for backup to a remote cluster.

  • With snapshot mirroring, we can replicate snapshot content asynchronously to a remote site, so that data is available at the remote site for disaster recovery.

  • With snapshot clone, copies of a subvolume can be provisioned for use cases such as unit testing and local repositories.

  • Snapshot creation at any directory within the file system adds more flexibility for backup needs.

  • The maximum snapshot limit has been validated up to 4096, so the mds_max_snaps_per_dir config option can be set up to 4096 based on user requirements.

Conclusion:

    The CephFS snapshot feature assists in protecting the file system and is a basic requirement for Ceph File System users. Automatic management of snapshots at any path in the file system, through the CLI or the Dashboard, is an added advantage for user convenience. Snapshots created on CephFS can also be used by ODF applications for DR and backup use cases.
