Introduction
Ceph is a highly scalable distributed storage system designed to manage vast amounts of data efficiently. Its foundation is RADOS, the distributed object store on which Ceph's block, object, and file interfaces are built. The Ceph File System (CephFS) provides a POSIX-compliant file system layer on top of RADOS. When an application interacts with CephFS, for example by creating a new file, a series of metadata operations is triggered. These operations are handled by the Metadata Server (MDS), which manages the file system's namespace and keeps its metadata consistent across clients.
In this post, we take a deep dive into the sequence of events that occurs on the MDS when a file is created in CephFS, following the MDS debug log line by line. This exploration highlights the coordination between the client, the MDS, and RADOS, and shows how CephFS handles metadata in a distributed storage environment.
Creating a File in a CephFS Subvolume
Mount the CephFS Volume
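- First, mount the CephFS volume on the client system where you want to interact with it:
mount -t ceph <mon-ip>:6789:/ /mnt/cephfs/ -o name=<username>,secret=<key>
- Replace <mon-ip> with the IP address of a Ceph monitor node.
- Use the <username> and <key> of a Ceph user with the appropriate access permissions.
- Navigate to the subvolume's data directory in the mounted volume. This is a UUID-named directory below subvol_1 (the same one that appears in the MDS log below), and its path can be printed with:
ceph fs subvolume getpath <vol_name> subvol_1 --group_name subvolgroup_1
- Then change into that directory and create an empty file named test1:
cd /mnt/cephfs/volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/ && touch test1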
The high-level series of events is as follows:
Client Request Initialization
The file creation process begins with a request from a client to create a new file in the CephFS directory structure. This section delves into how the MDS receives, interprets, and prepares to handle the client’s request to establish the new file’s metadata.
Dentry and Inode Preparation
Next, the MDS prepares the dentry (directory entry) and inode, two essential components in managing files in CephFS. We’ll look at how these are structured, linked, and prepared for the new file, laying the groundwork for its metadata.
Locking Mechanism
To ensure data consistency and avoid conflicts, the MDS uses a locking mechanism. This section explains how the MDS takes an exclusive lock (xlock) on the dentry, how it evaluates the lock state, and why these locks are needed to coordinate concurrent client requests.
Synchronization and Evaluation
Before replying, the MDS evaluates the locks on the newly allocated inode with functions such as file_eval, simple_eval, and scatter_eval, moves them into states suited to a single client working on the file, and decides which capabilities that client may hold.
Replying to the Client
Once the metadata is prepared and locked, the MDS replies to the client. It first sends an early (unsafe) reply, before the change is known to be durable, and later a safe reply once the journal entry has been committed. Here, we discuss why this confirmation matters and how it signals that the new file is part of the file system.
Finalizing Metadata and Caching
In the last stages, the MDS links the new inode to its dentry, marks the changed metadata dirty so that it is eventually written back to RADOS, and releases everything the request held: the exclusive lock (after which eval_gather returns the lock to a stable state), the write lock, and the auth pins. This leaves the file system in a stable and consistent state.
Let's break down each event and explain the process involved in creating a file within a Ceph Filesystem.
Client Request Initiation
[v2:10.0.64.74:6832/2938102015,v1:10.0.64.74:6833/2938102015] <== client.15714 v1:10.0.65.106:0/1517956680 45581 ==== client_request(client.15714:55 create owner_uid=0, owner_gid=0 #0x10000000006/test1
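The client (client.15714) sends a create request for a file named test1 in the directory whose inode number is #0x10000000006. The file is owned by root (owner_uid=0, owner_gid=0). The message comes from the client's address v1:10.0.65.106:0/1517956680 and is received by the MDS daemon at 10.0.64.74, which listens on both a msgr2 address (v2, port 6832) and a legacy address (v1, port 6833). The request itself is identified as client.15714:55 (request tid 55); 45581 is the sequence number of the message on this connection.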
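The header of this log line packs several identifiers together. As a minimal sketch, assuming the layout seen in this single sample (it is not taken from a Ceph specification), the hypothetical Python helper parse_request below separates the receiving MDS addresses, the sender, the message sequence number, and the request tid:

```python
import re

# The log header from above, reproduced as a string (hypothetical sample input).
LINE = (
    "[v2:10.0.64.74:6832/2938102015,v1:10.0.64.74:6833/2938102015] <== "
    "client.15714 v1:10.0.65.106:0/1517956680 45581 ==== "
    "client_request(client.15714:55 create owner_uid=0, owner_gid=0 "
    "#0x10000000006/test1"
)

# Layout inferred from this one sample, not from a Ceph specification.
HEADER = re.compile(
    r"\[(?P<mds_addrs>[^\]]+)\] <== "                   # addresses of the receiving MDS
    r"(?P<src>\S+) (?P<src_addr>\S+) "                  # sender entity and its address
    r"(?P<seq>\d+) ==== "                               # per-connection message sequence
    r"client_request\((?P<client>[^:]+):(?P<tid>\d+) "  # client name and request tid
    r"(?P<op>\w+) .*?#(?P<parent>0x[0-9a-f]+)/(?P<name>\S+)"
)

def parse_request(line: str) -> dict:
    """Hypothetical helper: split a client_request log header into fields."""
    m = HEADER.search(line)
    if m is None:
        raise ValueError("not a client_request line")
    fields = m.groupdict()
    fields["mds_addrs"] = fields["mds_addrs"].split(",")
    fields["seq"] = int(fields["seq"])
    fields["tid"] = int(fields["tid"])
    return fields

info = parse_request(LINE)
print(info["tid"], info["seq"], info["op"], info["parent"], info["name"])
# 55 45581 create 0x10000000006 test1
```

Keeping the tid (55) and the message sequence number (45581) apart matters when correlating log lines: later entries refer to the request as client.15714:55.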
Request Details and Additional Metadata
caller_uid=0, caller_gid=0{0,}) ==== 193+0+65 (unknown 269641910 0 2155696498) 0x555568860300 con 0x5555685e3800
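This is the end of the same log line. caller_uid=0 and caller_gid=0 identify the caller as root (UID 0, GID 0), and {0,} lists the caller's supplementary groups. 193+0+65 are the lengths in bytes of the message's front, middle, and data sections, followed by their checksums.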
Handling the Client Request
mds.0.server handle_client_request client_request(client.15714:55 create owner_uid=0, owner_gid=0 #0x10000000006/test1
The Server component of MDS rank 0 (mds.0.server) picks up the request to create test1 in directory #0x10000000006. At this point, it registers the request internally so that it can be tracked and answered; path resolution, locking, and the actual creation follow in the next steps.
Dispatching the Client Request
mds.0.server dispatch_client_request client_request(client.15714:55 create owner_uid=0, owner_gid=0 #0x10000000006/test1
The MDS now dispatches the request to the handler for its operation type. For a create, this is the open-with-create path shown next.
Opening the File with O_CREAT Flag
mds.0.server open w/ O_CREAT on #0x10000000006/test1
The request is handled as an open of test1 in directory #0x10000000006 with the O_CREAT flag set. Nothing has been created yet: the following steps look up the name, lock it, and allocate the new inode.
Requesting a Lock on the File Path
mds.0.server rdlock_path_xlock_dentry request(client.15714:55 nref=2 cr=0x555568860300) #0x10000000006/test1
The MDS takes read locks along the path to the parent directory and requests an exclusive lock (xlock) on the directory entry (dentry) for test1. This keeps the metadata consistent and prevents any other request from modifying that name while the file is being created.
Traversing the Directory Path
mds.0.cache traverse: path seg depth 0 'test1' snapid head
The MDS traverses the path relative to the parent directory inode #0x10000000006. There is a single path segment, 'test1' (depth 0), and it is resolved against head, the live, non-snapshot version of the namespace.
Directory Lookup for test1
mds.0.cache.dir(0x10000000006) lookup (test1, 'head')
The MDS looks for test1 among the cached entries of directory #0x10000000006. Because the file is new, no entry is found, as the next log line shows.
Dentry Cache Miss for test1
mds.0.cache traverse: miss on dentry test1 in [dir 0x10000000006 /volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/ [2,head] auth v=71 cv=0/ 0 state=1610612737|complete f(v0 m2024-07-17T16:17:52.260812+0000)
The dentry for test1 is not in the cache. The directory shown in this line explains why the miss is conclusive: the directory fragment is marked complete, meaning all of its entries are already cached, so the MDS knows that test1 does not exist without reading the directory object from RADOS. The same line shows that this MDS is authoritative for the directory (auth) and that the directory is at version 71 (v=71).
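The cache-miss reasoning can be sketched in a few lines of Python. DirFrag and lookup_for_create are hypothetical stand-ins, not Ceph's CDir or MDCache code; the model only captures the rule that a miss in a complete directory fragment is authoritative, so a null dentry can be added without reading RADOS:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class DirFrag:
    """Hypothetical, simplified cached directory fragment."""
    complete: bool  # True when every dentry of the fragment is in cache
    # name -> inode number; None marks a null (not yet linked) dentry
    dentries: Dict[str, Optional[int]] = field(default_factory=dict)

def lookup_for_create(d: DirFrag, name: str) -> str:
    if name in d.dentries:
        # Cached dentry: either linked (file exists) or null (known absent).
        return "exists" if d.dentries[name] is not None else "null"
    if d.complete:
        # Authoritative miss: add a null placeholder dentry, no RADOS read.
        d.dentries[name] = None
        return "added-null"
    # Incomplete fragment: the directory object would have to be fetched first.
    return "fetch-from-rados"

subvol_dir = DirFrag(complete=True)
print(lookup_for_create(subvol_dir, "test1"))  # added-null
print(subvol_dir.dentries)                     # {'test1': None}
```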
Exact Snapshot Lookup for test1
mds.0.cache.dir(0x10000000006) lookup_exact_snap (head, 'test1')
The MDS looks up test1 for exactly the head version. Here, head means the live file system rather than a snapshot.
Adding a Null Dentry for test1
mds.0.cache.dir(0x10000000006) add_null_dentry [dentry #0x1/volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/test1 [2,head] auth NULL (dversion lock) pv=0 v=71 ino=(nil) state=1073741824 0x5555683d6780]
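The MDS adds a null dentry for test1: a cache entry for the name that is not yet linked to any inode (ino=(nil)). It serves as a placeholder that the create operation can lock. [2,head] is the range of snapshot IDs the dentry covers, and pv=0 and v=71 are its projected and current versions.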
Confirming Addition of Null Dentry
mds.0.cache added null [dentry #0x1/volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/test1 [2,head] auth NULL (dversion lock) pv=0 v=71 ino=(nil) state=1073741824 0x5555683d6780]
The null dentry for test1 is added to the cache as a placeholder until the file is created and assigned a valid inode.
Locking the Dentry for Synchronization
mds.0.locker must xlock (dn sync) [dentry #0x1/volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/test1 [2,head] auth NULL (dversion lock) pv=0 v=71 ino=(nil) state=1073741824 0x5555683d6780]
While working out which locks the request needs, the MDS determines that it must take an exclusive lock (xlock) on the dentry's dn lock, which is currently in the sync state. The lock is actually acquired a few steps later.
Request for Write Lock (dversion lock)
mds.0.locker must wrlock (dversion lock) [dentry #0x1/volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/test1 [2,head] auth NULL (dversion lock) pv=0 v=71 ino=(nil) state=1073741824 0x5555683d6780]
It must also take a write lock (wrlock) on the dentry's dversion lock, a local lock that serializes updates to the dentry's version number.
Auth Pin Required
mds.0.locker must authpin [dentry #0x1/volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/test1 [2,head] auth NULL (dversion lock) pv=0 v=71 ino=(nil) state=1073741824 0x5555683d6780]
The MDS must also auth-pin the dentry. Despite the name, an auth pin has nothing to do with authentication or authorization. It is taken on metadata for which this MDS is authoritative (auth), and it prevents that metadata from being frozen and migrated to another MDS rank while the request works on it.
Applying the Auth Pin
mds.0.locker auth_pinning [dentry #0x1/volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/test1 [2,head] auth NULL (dversion lock) pv=0 v=71 ino=(nil) state=1073741824 0x5555683d6780]
The MDS applies the auth pin to the test1 dentry, so the dentry cannot be frozen or migrated during the file creation.
Auth Pin Confirmed
mds.0.cache.den(0x10000000006 test1) auth_pin by 0x555568392880 on [dentry #0x1/volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/test1 [2,head] auth NULL (dversion lock) pv=0 v=71 ap=1 ino=(nil) state=1073741824 | authpin=1 0x5555683d6780] now 1
The dentry is now auth-pinned: ap=1 and "now 1" give its auth-pin count. 0x555568392880 is the in-memory address of the object holding the pin (presumably the request), not a client ID.
Exclusive Lock Start (dn sync)
mds.0.locker xlock_start on (dn sync) on [dentry #0x1/volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/test1 [2,head] auth NULL (dversion lock) pv=0 v=71 ap=1 ino=(nil) state=1073741824 | authpin=1 0x5555683d6780]
The MDS starts acquiring the exclusive lock (xlock) on the dentry's dn lock, so that no other request can modify the name while the file is created.
Simple Lock Request (dn sync)
mds.0.locker simple_lock on (dn sync) on [dentry #0x1/volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/test1 [2,head] auth NULL (dversion lock) pv=0 v=71 ap=1 ino=(nil) state=1073741824 | authpin=1 0x5555683d6780]
To get there, simple_lock first moves the dn lock out of the sync state, in which clients may read and cache the dentry, into the lock state, which is a prerequisite for granting an exclusive lock.
Request for Exclusive Lock (dn lock)
mds.0.locker simple_xlock on (dn lock) on [dentry #0x1/volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/test1 [2,head] auth NULL (dn lock) (dversion lock) pv=0 v=71 ap=1 ino=(nil) state=1073741824 | request=1 authpin=1 0x5555683d6780]
With the dn lock now in the lock state (shown as (dn lock) in this line), simple_xlock moves it to the exclusive state on behalf of the request.
Second Auth Pin
mds.0.cache.den(0x10000000006 test1) auth_pin by 0x5555683d6860 on [dentry #0x1/volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/test1 [2,head] auth NULL (dn lock) (dversion lock) pv=0 v=71 ap=2 ino=(nil) state=1073741824 | request=1 authpin=1 0x5555683d6780] now 2
A second auth pin is taken on the dentry, raising its count to 2 (ap=2). The pinning address, 0x5555683d6860, lies just after the dentry's own address (0x5555683d6780), which is consistent with the dentry's lock object pinning its parent while it is exclusively locked. The same address releases its pin near the end of the trace.
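Auth pins are reference counts rather than permissions. The hypothetical AuthPinnable class below is a simplified sketch of that idea, assuming (as a simplification) that a freeze for migration may proceed only when the count reaches zero. It mirrors the ap=N and "now N" values in the log, not Ceph's actual class hierarchy:

```python
class AuthPinnable:
    """Hypothetical sketch of auth-pin reference counting (not Ceph's API)."""

    def __init__(self, name: str):
        self.name = name
        self.auth_pins = 0   # corresponds to "ap=N" in the log lines
        self.pinned_by = []  # who holds each pin (memory addresses in the real log)

    def auth_pin(self, by: str) -> int:
        self.auth_pins += 1
        self.pinned_by.append(by)
        return self.auth_pins  # the "now N" value

    def auth_unpin(self, by: str) -> int:
        self.pinned_by.remove(by)
        self.auth_pins -= 1
        return self.auth_pins

    def can_freeze(self) -> bool:
        # Simplification: freezing for migration waits until no pins remain.
        return self.auth_pins == 0

dn = AuthPinnable("test1")
print(dn.auth_pin("request"))  # 1
print(dn.auth_pin("dn lock"))  # 2
print(dn.can_freeze())         # False
dn.auth_unpin("dn lock")
dn.auth_unpin("request")
print(dn.can_freeze())         # True
```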
Exclusive Lock Granted (by request)
mds.0.locker got xlock on (dn xlock x=1 by request(client.15714:55 nref=6 cr=0x555568860300)) [dentry #0x1/volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/test1 [2,head] auth NULL (dn xlock x=1 by request(client.15714:55 nref=6 cr=0x555568860300)) (dversion lock) pv=0 v=71 ap=2 ino=(nil) state=1073741824 | request=1 lock=1 authpin=1 0x5555683d6780]
The xlock is granted. It is held by the MDS request client.15714:55, not by the client directly; x=1 is the number of xlocks held, and nref=6 is the reference count on the request object. While the xlock is held, no other request can modify the test1 dentry.
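In this trace, the dn lock moves through a small set of states: sync, lock, xlock, and later xlockdone before returning to sync. The following Python sketch models only those transitions, keyed by the function names that appear in the log. Ceph's real lock state tables contain many more states and rules, so treat this as an illustrative assumption:

```python
# Simplified, hypothetical model of the dn lock states seen in this trace.
TRANSITIONS = {
    ("sync", "simple_lock"): "lock",             # leave sync before exclusive use
    ("lock", "simple_xlock"): "xlock",           # the request now holds the xlock
    ("xlock", "set_xlocks_done"): "xlockdone",   # changes prepared; early reply
    ("xlockdone", "xlock_finish"): "sync",       # released; eval_gather -> sync
}

class DentryLock:
    def __init__(self):
        self.state = "sync"
        self.history = [self.state]

    def apply(self, event: str) -> str:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event} not allowed in state {self.state}")
        self.state = TRANSITIONS[key]
        self.history.append(self.state)
        return self.state

lock = DentryLock()
for ev in ["simple_lock", "simple_xlock", "set_xlocks_done", "xlock_finish"]:
    lock.apply(ev)
print(" -> ".join(lock.history))  # sync -> lock -> xlock -> xlockdone -> sync
```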
Write Lock Start (dversion lock)
mds.0.locker local_wrlock_start on (dversion lock) on [dentry #0x1/volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/test1 [2,head] auth NULL (dn xlock x=1 by request(client.15714:55 nref=6 cr=0x555568860300)) (dversion lock) pv=0 v=71 ap=2 ino=(nil) state=1073741824 | request=1 lock=1 authpin=1 0x5555683d6780]
The MDS starts a local write lock (local_wrlock_start) on the dentry's dversion lock, which guards changes to the dentry's version number.
Lock Acquisition (wrlock on dversion lock)
mds.0.locker got wrlock on (dversion lock w=1 last_client=15714) [dentry #0x1/...]
The MDS acquires the write lock (wrlock) on the dversion lock of the test1 dentry; w=1 is the number of wrlocks held. last_client=15714 records the client whose request last took the lock.
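Dentry Pre-Dirty (pre_dirty)
mds.0.cache.den(0x10000000006 test1) pre_dirty
Despite the abbreviation, cache.den refers to the dentry, not to a cache denial. pre_dirty reserves the next version number for the dentry before the change is written to the journal; later log lines show the result as pv=72 (projected version) next to v=71 (current version). The lines that follow refer to the inode newly allocated for test1, 0x100000007de.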
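Projected versions let the MDS reserve a version number before the change is journaled and apply it afterwards. A minimal Python sketch, using a hypothetical Versioned class with fields named after pv and v in the log, reproduces the step from 71 to 72 visible in the later dentry lines. As far as we can tell, Ceph draws the dentry's projected version from its directory fragment's version counter; the sketch simply increments:

```python
class Versioned:
    """Hypothetical sketch of projected (pv) vs. current (v) versions."""

    def __init__(self, version: int):
        self.v = version  # current version of the cached object
        self.pv = 0       # projected version; 0 means none outstanding

    def pre_dirty(self) -> int:
        # Reserve the next version before the change is journaled.
        self.pv = max(self.v, self.pv) + 1
        return self.pv

    def mark_dirty(self) -> None:
        # Once the change is applied, the projected version becomes current.
        assert self.pv > self.v, "mark_dirty without a projected version"
        self.v = self.pv

dentry = Versioned(71)
print(dentry.pre_dirty())  # 72, matching pv=72 v=71 in the later log lines
dentry.mark_dirty()
print(dentry.v)            # 72
```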
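Issuing New Capabilities
mds.0.locker issue_new_caps for mode 2 on [inode 0x100000007de ...]
The MDS issues capabilities (caps) on the new inode of test1 to the client. Caps define which operations the client may perform on its own, such as caching or buffering data. Mode 2 is the Ceph file mode for write-only access (CEPH_FILE_MODE_WR), consistent with touch opening the file for writing with O_CREAT.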
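How does an open call turn into "mode 2"? The constants below mirror the CEPH_FILE_MODE_* values defined in Ceph's ceph_fs.h. The ceph_file_mode function is a hypothetical Python mirror of the client-side mapping from the POSIX access mode to the Ceph file mode, shown as a sketch:

```python
import os

# Values mirror CEPH_FILE_MODE_* in Ceph's ceph_fs.h.
CEPH_FILE_MODE_RD = 1
CEPH_FILE_MODE_WR = 2
CEPH_FILE_MODE_RDWR = 3

def ceph_file_mode(flags: int) -> int:
    """Hypothetical mirror of the open-flags -> Ceph file-mode mapping."""
    accmode = flags & (os.O_RDONLY | os.O_WRONLY | os.O_RDWR)
    if accmode == os.O_WRONLY:
        return CEPH_FILE_MODE_WR
    if accmode == os.O_RDWR:
        return CEPH_FILE_MODE_RDWR
    return CEPH_FILE_MODE_RD

# An open for writing with O_CREAT, as touch typically does, maps to mode 2.
print(ceph_file_mode(os.O_WRONLY | os.O_CREAT))  # 2
```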
Adding the Inode to the Open File Table
mds.0.openfiles add_inode [inode 0x100000007de ...]
The MDS adds the inode for test1 to its open file table, which records the inodes that clients hold open so that a standby MDS taking over this rank can reload them quickly.
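Lock Evaluation
mds.0.locker eval 7296 [inode 0x100000007de ...]
Now that a client holds caps on the inode, the MDS re-evaluates the state of the inode's locks; this is about lock states, not permission checks. The number 7296 is a bitmask that selects which locks to evaluate, and the lines that follow show the individual evaluations.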
File Lock Evaluation (ifile sync)
mds.0.locker file_eval wanted=xwb loner_wanted=xwb other_wanted= filelock=(ifile sync) on [inode 0x100000007de ...]
file_eval examines the file lock (ifile), currently in the sync state. wanted=xwb lists the file caps the client wants: x (exclusive), w (write), and b (buffer); no execute permission is involved. loner_wanted shows the same set for the candidate loner (sole) client, and other_wanted is empty, so no other client is interested in the file.
File Lock Evaluation: Bump to Loner
mds.0.locker file_eval stable, bump to loner (ifile sync) on [inode 0x100000007de ...]
The file lock is stable, meaning it is not in the middle of a state transition. Because client 15714 is the only client that wants caps, the MDS makes it the loner and moves the file lock toward the exclusive state; the inode printed later shows (ifile excl) and l=15714.
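The idea behind "bump to loner" can be illustrated with a deliberately simplified rule. In the hypothetical pick_loner function below, a single interested client becomes the loner and the file lock targets excl; with several interested clients, it stays in sync. Ceph's real logic in the Locker weighs more factors (for example, it can move the file lock to a mixed state for multiple writers), so this is a sketch, not the MDS algorithm:

```python
from typing import Dict, Optional, Tuple

def pick_loner(wanted: Dict[int, str]) -> Tuple[Optional[int], str]:
    """Return (loner_client, target_state) under a simplified, hypothetical rule.

    `wanted` maps client id -> wanted file cap letters (for example "xwb").
    """
    interested = [client for client, caps in wanted.items() if caps]
    if len(interested) == 1:
        return interested[0], "excl"  # a single client can get exclusive caps
    return None, "sync"               # simplification: no loner, stay in sync

print(pick_loner({15714: "xwb"}))              # (15714, 'excl')
print(pick_loner({15714: "xwb", 15800: "r"}))  # (None, 'sync')
```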
Simple Evaluation (iauth sync)
mds.0.locker simple_eval (iauth sync) on [inode 0x100000007de ...]
simple_eval now looks at the inode's auth lock (iauth). Despite its name, this lock has nothing to do with authentication: it protects the inode's ownership and permission fields (mode, uid, and gid).
Auth Lock: Stable, Going to Exclusive
mds.0.locker simple_eval stable, going to excl (iauth sync) on [inode 0x100000007de ...]
The auth lock is stable and there is a single loner client, so the MDS decides to move it from sync to excl. In the excl state, the loner can be granted exclusive auth caps (Ax) and change mode or ownership without a round trip to the MDS.
Simple Exclusive Lock (iauth)
mds.0.locker simple_excl on (iauth sync) on [inode 0x100000007de ...]
simple_excl performs the transition, and the inode later shows (iauth excl). Another client that wanted these caps would have to wait until the MDS revoked them from the loner and moved the lock out of excl.
Simple Evaluation (ilink sync)
mds.0.locker simple_eval (ilink sync) on [inode 0x100000007de ...]
The link lock (ilink), which protects the inode's link count (nlink), is evaluated. It stays in the sync state, which is why the client is later allowed only shared link caps (Ls).
Simple Evaluation (ixattr sync)
mds.0.locker simple_eval (ixattr sync) on [inode 0x100000007de ...]
The extended-attribute lock (ixattr) is evaluated. Extended attributes hold additional metadata such as security labels or POSIX ACLs.
Extended Attribute Lock: Stable, Going to Exclusive
mds.0.locker simple_eval stable, going to excl (ixattr sync) on [inode 0x100000007de ...]
As with the auth lock, the xattr lock is stable and there is a single loner, so the MDS moves it to excl. The client can then cache and modify extended attributes without asking the MDS each time.
Simple Exclusive Lock (ixattr)
mds.0.locker simple_excl on (ixattr sync) on [inode 0x100000007de ...]
The xattr lock is now exclusive (ixattr excl), so only the loner client can be granted exclusive xattr caps (Xx).
Scatter Evaluation (inest sync)
mds.0.locker scatter_eval (inest sync) on [inode 0x100000007de ...]
scatter_eval evaluates the nest lock (inest), a scatterlock that protects the inode's recursive statistics (rstat: recursive counts of files, subdirectories, and bytes). Scatterlocks allow several MDS ranks to update such statistics independently and reconcile them later.
Simple Lock (inest sync)
mds.0.locker simple_lock on (inest sync) on [inode 0x100000007de ...]
simple_lock moves the nest lock from sync to the lock state. As far as we can tell from the Locker code, the MDS tries to keep the nest lock writable at all times, and for an inode that is not replicated to other MDS ranks, the lock state allows the authoritative MDS to update the recursive statistics locally.
Simple Evaluation (iflock sync)
mds.0.locker simple_eval (iflock sync) on [inode 0x100000007de ...]
The flock lock (iflock), which protects advisory file locks taken with flock() or fcntl(), is evaluated.
Simple Evaluation (ipolicy sync)
mds.0.locker simple_eval (ipolicy sync) on [inode 0x100000007de ...]
The policy lock (ipolicy), which protects policy fields such as the file layout and quota settings, is evaluated. ACLs are not part of it; they are stored as extended attributes and fall under the xattr lock.
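Each lock in the trace protects a different slice of metadata. The Python sketch below collects that mapping in a dictionary and adds a hypothetical describe helper that reads the "(lock state" group out of a log fragment. The descriptions summarize this post rather than quoting Ceph's source:

```python
import re

# What each lock named in this trace protects (summary for illustration).
LOCK_SCOPE = {
    "dn": "the dentry itself (name -> inode binding)",
    "dversion": "the dentry's version number (local lock)",
    "iauth": "mode, uid and gid",
    "ilink": "link count (nlink)",
    "ixattr": "extended attributes",
    "ifile": "file size, times and data caching/buffering",
    "inest": "recursive statistics (rstat)",
    "iflock": "advisory file locks (flock/fcntl)",
    "ipolicy": "directory policy such as layout and quota",
}

LOCK_RE = re.compile(r"\((\w+) (\w+)")  # matches e.g. "(iauth sync"

def describe(log_fragment: str) -> str:
    """Hypothetical helper: explain the first '(lock state' group in a line."""
    m = LOCK_RE.search(log_fragment)
    if m is None:
        raise ValueError("no '(lock state' group found")
    lock, state = m.groups()
    return f"{lock} lock in state '{state}': protects {LOCK_SCOPE[lock]}"

print(describe("simple_eval (iauth sync) on [inode 0x100000007de ...]"))
# iauth lock in state 'sync': protects mode, uid and gid
```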
Issuing Capabilities
mds.0.locker issue_caps: [inode 0x100000007de ...]
issue_caps compares the caps each client holds on the inode with what the new lock states allow, and sends updated cap grants where they differ.
Allowed Capabilities for the Client
mds.0.locker get_allowed_caps loner client.15714 allowed=pAsxLsXsxFsxcrwb, xlocker allowed=pAsxLsXsxFsxcrwb, others allowed=pLs on [inode 0x100000007de ...]
get_allowed_caps reports which caps the current lock states permit. The loner client 15714 may hold pAsxLsXsxFsxcrwb: exclusive access to the auth, xattr, and file metadata, but only shared access to the link count, because the link lock stayed in sync. A client holding an xlock would be allowed the same set (xlocker allowed), while any other client would be limited to pLs, a pin plus shared link caps.
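Reading the capability strings: uppercase letters name a class of metadata, and the lowercase letters that follow say what the client may do with it.
- p: pin (the client holds a reference to the inode)
- A: auth metadata (mode, uid, gid)
- L: link count
- X: extended attributes
- F: file (size, times, and data)
- s: shared
- x: exclusive
- c: cache (the client may cache file data)
- r: read
- w: write
- b: buffer (the client may buffer writes)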
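To decode cap strings mechanically, here is a small Python sketch. It assumes the layout of Ceph's cap strings: an optional leading p (pin), then an uppercase letter for each metadata class (A auth, L link, X xattr, F file) followed by lowercase capability bits. The bit letters a (write-extend) and l (lazyio) are included for completeness even though they do not appear in this trace. decode_caps is a hypothetical helper, not a Ceph API:

```python
import re

LOCK_NAMES = {"A": "auth", "L": "link", "X": "xattr", "F": "file"}
BIT_NAMES = {
    "s": "shared", "x": "exclusive", "c": "cache", "r": "read",
    "w": "write", "b": "buffer", "a": "write-extend", "l": "lazyio",
}

def decode_caps(caps: str) -> dict:
    """Hypothetical decoder for cap strings such as 'pAsxLsXsxFsxcrwb'."""
    result = {"pin": caps.startswith("p")}
    body = caps[1:] if result["pin"] else caps
    # Each group: one uppercase class letter plus its lowercase bits.
    for letter, bits in re.findall(r"([ALXF])([a-z]*)", body):
        result[LOCK_NAMES[letter]] = [BIT_NAMES[b] for b in bits]
    return result

issued = decode_caps("pAsxLsXsxFsxcrwb")
print(issued["file"])       # ['shared', 'exclusive', 'cache', 'read', 'write', 'buffer']
print(issued["link"])       # ['shared']
print(decode_caps("pLs"))   # {'pin': True, 'link': ['shared']}
```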
Cache Operation - Predirty Journal Parents
mds.0.cache predirty_journal_parents do_parent_mtime linkunlink=1 primary_dn follows head [inode 0x100000007de ...]
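predirty_journal_parents prepares the updates to the parent directory that the create implies and adds them to the journal entry. do_parent_mtime means the parent's modification time will be updated, and linkunlink=1 means one entry is being linked into the directory, so its file count and recursive statistics change as well. primary_dn indicates that the new inode is linked through its primary dentry rather than through a hard link.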
Cache Operation - Projected Rstat Inode to Frag
mds.0.cache projected_rstat_inode_to_frag first 70 linkunlink 1 [inode 0x100000007de ...]
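projected_rstat_inode_to_frag projects the new inode's recursive statistics (rstat) into the rstat of the parent directory fragment (frag), so that the directory's recursive counts include the new file. first 70 is the first snapshot ID the update applies to (snapshot IDs in these logs are printed in hexadecimal), and linkunlink 1 again marks a link being added.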
Snapshot Floor from the Parent Dentry
mds.0.cache floor of 70 from parent dn [dentry #0x1/volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/test1 ...]
Here, dn stands for dentry, not directory number, and no lock is taken. The MDS takes 70 as the lower bound (floor) of the snapshot range for the statistics update from the new inode's parent dentry, /volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/test1. Later lines show this dentry as [70,head] rather than the [2,head] it had as a null dentry.
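The recursive statistics projected here are plain counters. The hypothetical NestStat class and propagate_new_file function below sketch how creating one file bumps the recursive file count of every ancestor, printing in the same total=files+subdirs shape as n(v0 1=1+0) in the inode lines. The ancestor counts are made-up illustrative values, and unlike this sketch, the real MDS may defer propagation further up the tree:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class NestStat:
    """Hypothetical recursive statistics for one inode or directory."""
    rfiles: int = 0
    rsubdirs: int = 0
    rbytes: int = 0

    def __str__(self) -> str:
        # Same "total=files+subdirs" shape as n(v0 1=1+0) in the MDS log.
        return f"{self.rfiles + self.rsubdirs}={self.rfiles}+{self.rsubdirs}"

def propagate_new_file(ancestors: List[NestStat], size: int = 0) -> None:
    """Apply a new file's contribution to every ancestor (immediately, here)."""
    for stat in ancestors:
        stat.rfiles += 1
        stat.rbytes += size

new_file = NestStat(rfiles=1)  # the new inode itself
subvol = NestStat(rfiles=2)    # illustrative starting values
volumes = NestStat(rfiles=5, rsubdirs=3)
root = NestStat(rfiles=9, rsubdirs=6)
propagate_new_file([subvol, volumes, root])
print(new_file, subvol, root)  # 1=1+0 3=3+0 16=10+6
```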
Set Exclusive Locks Done
mds.0.locker set_xlocks_done on (dn xlock x=1 by request(client.15714:55 nref=5 cr=0x555568860300)) [dentry #0x1/volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/test1 ...]
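set_xlocks_done moves the request's exclusive lock on the dentry from the xlock state to xlockdone. It is called as part of sending the early reply: the changes have been prepared in memory, and in the xlockdone state the requesting client can already be given a lease on the dentry, although the xlock itself (x=1) remains held until the request finishes.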
Early Reply - Client Request (File Creation)
mds.0.server early_reply 0 ((0) Success) client_request(client.15714:55 create owner_uid=0, owner_gid=0 #0x10000000006/test1
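The MDS sends client 15714 an early reply with result 0 (Success) for the create of test1. This is an unsafe reply: the update has been prepared in memory and handed to the journal, but it is not yet known to be durable in RADOS. Replying early lets the client continue without waiting for the journal write; a second, safe reply follows once the entry is committed.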
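The difference between the early and the safe reply is easiest to see as an ordering of events. The Python sketch below uses a hypothetical Journal class whose entries only become "durable" when flush() runs; handle_create answers early, and the linking and the safe reply happen in the journal callback. It models the ordering seen in this trace, not the MDS code:

```python
from typing import Callable, List

class Journal:
    """Hypothetical stand-in for the MDS journal: entries commit on flush()."""

    def __init__(self):
        self.pending: List[Callable[[], None]] = []

    def submit(self, on_safe: Callable[[], None]) -> None:
        self.pending.append(on_safe)

    def flush(self) -> None:
        # Pretend the entries are now durable in RADOS and run their callbacks.
        callbacks, self.pending = self.pending, []
        for cb in callbacks:
            cb()

events: List[str] = []

def handle_create(journal: Journal) -> None:
    events.append("prepare: null dentry, new inode, projected versions")
    # Journal the update; its completion links the inode and sends the safe reply.
    journal.submit(lambda: events.extend(["link inode, mark dirty", "reply (safe)"]))
    # Reply before the journal entry is known to be durable.
    events.append("early_reply (unsafe)")

j = Journal()
handle_create(j)
j.flush()
print(events)
```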
Adding Client Lease to the Dentry
mds.0.cache.den(0x10000000006 test1) add_client_lease client.15714 on (dn xlockdone x=1)
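The MDS records a dentry lease on test1 for client 15714. A lease is not a write lock: it lets the client cache the name test1 and answer lookups for it locally, without contacting the MDS, until the lease expires or is revoked. The dentry's lock is in the xlockdone state, which permits the requesting client to receive the lease while the request is still in progress.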
Issuing Client Lease
mds.0.locker issue_client_lease seq 20 dur 30000ms on [dentry #0x1/volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/test1 [70,head] auth NULL (dn xlockdone l x=1) (dversion lock w=1 last_client=15714) pv=72 v=71 ap=2 ino=(nil) state=1073741824 | request=1 lock=2 authpin=1 clientlease=1 0x5555683d6780]
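The lease is issued with sequence number 20 and a duration of 30000 ms (30 seconds). The dentry line also shows the progress of the request: the dentry now covers [70,head], its projected version is 72 (pv=72, v=71), the dn lock lists the lease (l) next to the request's xlock, and clientlease=1 counts the lease.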
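A dentry lease is a time-bounded promise. The hypothetical DentryLease dataclass below checks validity from the seq and duration shown in the log, assuming the issue time is known. In practice, the MDS can also revoke a lease early when it needs to change the dentry:

```python
from dataclasses import dataclass

@dataclass
class DentryLease:
    """Hypothetical view of a lease like 'seq 20 dur 30000ms'."""
    seq: int
    issued_at_ms: int
    duration_ms: int

    def valid_at(self, now_ms: int) -> bool:
        # While valid, the client may answer lookups of this name from cache.
        return now_ms < self.issued_at_ms + self.duration_ms

lease = DentryLease(seq=20, issued_at_ms=0, duration_ms=30000)
print(lease.valid_at(29999), lease.valid_at(30000))  # True False
```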
Building the Reply Trace
mds.0.server set_trace_dist added snap head in [inode 0x100000007de [70,head] /volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/test1 auth v72 s=0 n(v0 1=1+0) (iauth excl) (ifile excl) (ixattr excl) cr={15714=0-4194304@6f} caps={15714=pAsxLsXsxFsxcrwb/pAsxXsxFxwb@1},l=15714 | request=1 caps=1 0x5555686ae100]
set_trace_dist adds the metadata trace that accompanies the reply, here for the head (live) version of the new inode, so the client can populate its cache with the dentry and inode without another round trip. No snapshot is being changed. The inode line shows: size 0 (s=0); recursive statistics of one file (n(v0 1=1+0)); the auth, file, and xattr locks in the excl state; a writable range of 0 to 4194304 bytes (4 MiB) for client 15714 (cr=...); the caps client 15714 holds (pAsxLsXsxFsxcrwb) and wants (pAsxXsxFxwb); and client 15714 as the loner (l=15714).
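Several fields of this inode line can be pulled out programmatically. The regular expressions and the parse_inode_fields helper below are hypothetical and based on our reading of the fields: cr={client=first-last@follows} with the snapshot ID in hexadecimal, and caps={client=held/wanted@seq}:

```python
import re

# Fragment of the inode line above (hypothetical sample input).
INODE = ("cr={15714=0-4194304@6f} "
         "caps={15714=pAsxLsXsxFsxcrwb/pAsxXsxFxwb@1},l=15714")

CR_RE = re.compile(r"cr=\{(\d+)=(\d+)-(\d+)@([0-9a-f]+)\}")
CAPS_RE = re.compile(r"caps=\{(\d+)=(\w+)/(\w+)@(\d+)\}")
LONER_RE = re.compile(r",l=(\d+)")

def parse_inode_fields(text: str) -> dict:
    """Hypothetical parser for the cr=, caps= and l= fields of an inode line."""
    cr = CR_RE.search(text)
    caps = CAPS_RE.search(text)
    loner = LONER_RE.search(text)
    if not (cr and caps and loner):
        raise ValueError("expected cr=, caps= and l= fields")
    return {
        "client": int(cr.group(1)),
        "max_size": int(cr.group(3)),        # end of writable range, bytes
        "cr_follows": int(cr.group(4), 16),  # snapshot ID, printed in hex
        "caps_held": caps.group(2),
        "caps_wanted": caps.group(3),
        "cap_seq": int(caps.group(4)),
        "loner": int(loner.group(1)),
    }

fields = parse_inode_fields(INODE)
print(fields["max_size"], fields["max_size"] // (1024 * 1024), "MiB")  # 4194304 4 MiB
print(fields["caps_wanted"], fields["loner"])                          # pAsxXsxFxwb 15714
```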
Linking the Primary Inode
mds.0.cache.dir(0x10000000006) link_primary_inode [dentry #0x1/volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/test1 [70,head] auth NULL (dn xlockdone l x=1) (dversion lock w=1 last_client=15714) pv=72 v=71 ap=2 ino=(nil) state=1073741824 | request=1 lock=2 authpin=1 clientlease=1 0x5555683d6780] [inode 0x100000007de [70,head] /volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/test1 auth v72 s=0 n(v0 1=1+0) (iauth excl) (ifile excl) (ixattr excl) cr={15714=0-4194304@6f} caps={15714=pAsxLsXsxFsxcrwb/pAsxXsxFxwb@1},l=15714 | request=1 caps=1 0x5555686ae100]
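link_primary_inode links the new inode 0x100000007de to the test1 dentry as its primary link, replacing the null placeholder; the dentry still prints as NULL with ino=(nil) because it is logged just before the change. From now on, the file can be reached by its path. In the MDS code, this step runs when the journal entry for the create has been committed, after the early reply. The auth flag on the dentry and the inode means that this MDS rank is authoritative for them; it says nothing about the client's access.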
Notifying the Link
mds.0.openfiles notify_link [inode 0x100000007de [70,head] /volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/test1 auth v72 s=0 n(v0 1=1+0) (iauth excl) (ifile excl) (ixattr excl) cr={15714=0-4194304@6f} caps={15714=pAsxLsXsxFsxcrwb/pAsxXsxFxwb@1},l=15714 | request=1 caps=1 0x5555686ae100]
notify_link tells the open file table that the inode is now linked into the namespace, presumably so that the table can record the file's location. The caps field is unchanged: client 15714 still holds pAsxLsXsxFsxcrwb, including exclusive auth, xattr, and file caps plus read, write, and buffer caps on the data.
Marking the Inode as Dirty
mds.0.cache.ino(0x100000007de) mark_dirty [inode 0x100000007de [70,head] /volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/test1 auth v72 s=0 n(v0 1=1+0) (iauth excl) (ifile excl) (ixattr excl) cr={15714=0-4194304@6f} caps={15714=pAsxLsXsxFsxcrwb/pAsxXsxFxwb@1},l=15714 | request=1 caps=1 0x5555686ae100]
The MDS marks the inode dirty: its in-memory state is newer than the copy stored in RADOS, where a primary inode is kept in its parent directory's object. The change is already recorded in the journal, and the dirty inode is written back to the directory object later, before the corresponding journal segment is trimmed.
Marking the Dentry as Dirty
mds.0.cache.den(0x10000000006 test1) mark_dirty [dentry #0x1/volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/test1 [70,head] auth (dn xlockdone l x=1) (dversion lock w=1 last_client=15714) pv=72 v=71 ap=2 ino=0x100000007de state=1073741824 | request=1 lock=2 inodepin=1 authpin=1 clientlease=1 0x5555683d6780]
The dentry is marked dirty as well. It is now linked to the inode (ino=0x100000007de, inodepin=1), and its dn lock is still exclusively held by the request client.15714:55, not by the client itself.
Handling Share Inode Max Size
mds.0.locker share_inode_max_size on [inode 0x100000007de [70,head] /volumes/subvolgroup_1/subvol_1/91b94f95-769b-4b27-94a4-b2bfc63a2aee/test1 auth v72 DIRTYPARENT s=0 n(v0 1=1+0) (iauth excl) (ifile excl) (ixattr excl) cr={15714=0-4194304@6f} caps={15714=pAsxLsXsxFsxcrwb/pAsxXsxFxwb@1},l=15714 | request=1 caps=1 dirtyparent=1 dirty=1 0x5555686ae100]
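share_inode_max_size sends the inode's current max_size, the end of the client's writable range (4 MiB here, per cr={15714=0-4194304@6f}), to the clients that hold caps on it. The DIRTYPARENT state and dirtyparent=1 mean that the inode's backtrace, a record of its ancestry stored with the file's first data object in the data pool, still has to be written; dirty=1 means the inode itself is dirty.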
Dentry Link Transmission
mds.0.cache send_dentry_link
send_dentry_link notifies other MDS ranks that hold a replica of the parent directory about the new dentry and its link to inode 0x100000007de, so that their cached view of the namespace stays correct. If no other rank replicates the directory, there is nothing to send.
Client Request Acknowledgment
mds.0.server reply_client_request
reply_client_request sends the final, safe reply for request client.15714:55: the creation of test1, owned by root (owner_uid=0, owner_gid=0), is now durable in the journal.
Releasing the Exclusive Lock
mds.0.locker xlock_finish
With the request complete, xlock_finish releases the request's exclusive lock on the test1 dentry. The file itself was already linked into the directory in the earlier steps.
Metadata Gather Evaluation
mds.0.locker eval_gather
After the xlock is released, eval_gather checks whether the dentry lock can finish its transition to the next stable state, that is, whether every holder, and every replica that had to acknowledge the change, is done with it.
Metadata Gathering Completion
mds.0.locker eval_gather finished gather
The gather is complete, and the dentry lock returns to a stable state.
Auth Pin Release
mds.0.cache.den(0x10000000006 test1) auth_unpin by 0x5555683d6860
The auth pin taken earlier by 0x5555683d6860 (the second pin on the dentry) is released, lowering the dentry's auth-pin count. 0x5555683d6860 is an in-memory address, not a client ID. Once no auth pins remain, the dentry can again be frozen and migrated to another MDS rank if needed.
Directory Entry Synchronization Evaluation
mds.0.locker simple_eval (dn sync l)
simple_eval checks the dentry lock, which is back in the sync state. The l flag shows that a client lease is outstanding on the dentry: the one issued to client 15714 earlier.
Releasing the Write Lock on the Dentry
mds.0.locker local_wrlock_finish
local_wrlock_finish releases the request's write lock on the dentry's dversion lock. With this, the request has released its locks on test1, and other operations on the file can proceed.
Conclusion
The file creation workflow in CephFS reveals the MDS's critical role in handling metadata and coordinating client interactions. Through this series of operations, CephFS maintains a scalable, reliable, and consistent environment for distributed storage. Understanding these steps gives insight into the underlying mechanisms of CephFS and its suitability for large-scale storage needs.