IBM TechXchange Storage Scale (GPFS) Global User Group

File directory metadata fsync issues on failover

    Posted Fri October 06, 2023 10:40 AM
    Edited by Andres Parada Mon October 09, 2023 12:53 PM

    Hello,

    We are developing an application that does transaction processing. The internal transaction states are tracked in the following way:

    • For each transaction, a new file is created in a directory. On Linux we call fsync() on the directory descriptor to persist the new directory entry.
    • The transaction status is written to this new file, after which fsync() is also issued on the file itself, so that the file system guarantees the data is there after failover.
    • We also have to track additional resource files, for which we require guaranteed, atomic file renames (moves) between folders. Again on Linux, after renaming, we issue fsync() on the directory to guarantee that the metadata is persisted.
    • We also use unlink() for certain files. After the call returns, the software must be sure that the file is removed (even in the event of power loss). On Linux we issue an additional fsync() on the directory from which the file was removed.
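
    For reference, the sequence above can be sketched as follows. This is a minimal illustration in Python rather than our production code; os.fsync(), os.rename() and os.unlink() wrap the corresponding POSIX calls. The helper names (fsync_dir, create_durable, rename_durable, unlink_durable) are hypothetical, and syncing both the source and the destination directory after a rename is an assumption about what "fsync on the directory" should cover. Whether these calls are sufficient on GPFS is exactly the open question.

```python
import os
import tempfile


def fsync_dir(path):
    """fsync() a directory so that entry changes (create/rename/unlink) are flushed."""
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def create_durable(dirpath, name, data):
    """Create a new file, write data, fsync the file, then fsync its directory."""
    path = os.path.join(dirpath, name)
    # O_EXCL: fail if the transaction file already exists.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write() may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # persist file contents
    finally:
        os.close(fd)
    fsync_dir(dirpath)  # persist the new directory entry
    return path


def rename_durable(src, dst):
    """rename() a file, then fsync the destination and source directories."""
    os.rename(src, dst)
    src_dir = os.path.dirname(src) or "."
    dst_dir = os.path.dirname(dst) or "."
    fsync_dir(dst_dir)
    # Assumption: the source directory is synced too when it differs.
    if os.path.abspath(src_dir) != os.path.abspath(dst_dir):
        fsync_dir(src_dir)


def unlink_durable(path):
    """unlink() a file, then fsync the directory it was removed from."""
    os.unlink(path)
    fsync_dir(os.path.dirname(path) or ".")


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    pending = os.path.join(root, "pending")
    done = os.path.join(root, "done")
    os.mkdir(pending)
    os.mkdir(done)
    tx = create_durable(pending, "tx-0001", b"COMMITTED\n")
    rename_durable(tx, os.path.join(done, "tx-0001"))
    unlink_durable(os.path.join(done, "tx-0001"))
```

    On a local Linux file system this ordering (file fsync first, then directory fsync) is the usual way to make a create, rename or unlink survive a crash; the sketch makes no claim about what GPFS does with it.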

    But during testing, we run the above operations intensively on the active server node, in a GPFS folder. Then we reboot the server (or just cut power to the node). Failover then takes place on the surviving GPFS node, and what we see on that node is that some actions were not persisted and we lose transactional data. One (or more) of the following occurs:

    • The new file is not persisted in the directory (the file is not visible in the directory on the surviving GPFS node).
    • After file rename + directory fsync, the file is not moved to the new directory on the failover node (which reads the same GPFS folder), or the file is lost. I.e. rename() returned OK, but after the power cut the file stayed in the old folder or was lost.
    • The file content is not persisted despite a successful fsync() on the file.
    • unlink() + directory fsync does not remove the file (in the event of power loss).
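
    One way to check these symptoms systematically on the surviving node is to compare a list of operations that were acknowledged before the power cut with what the directory tree actually shows. Below is a minimal sketch in Python. It assumes a hypothetical journal format (plain tuples) that the test harness records somewhere outside the failing node, and that each path appears in at most one record; check_acknowledged() is an illustrative name, not part of any GPFS tool.

```python
import os
import tempfile


def check_acknowledged(ops):
    """Return a list of violations for operations that were acknowledged.

    ops is a list of tuples in a hypothetical journal format:
      ("create", path, expected_bytes)  file must exist with this content
      ("rename", old_path, new_path)    old must be gone, new must exist
      ("unlink", path)                  file must be gone
    """
    problems = []
    for op in ops:
        kind = op[0]
        if kind == "create":
            _, path, expected = op
            if not os.path.exists(path):
                problems.append(f"create lost: {path} is missing")
            else:
                with open(path, "rb") as f:
                    if f.read() != expected:
                        problems.append(f"content lost: {path} differs")
        elif kind == "rename":
            _, old, new = op
            if os.path.exists(old):
                problems.append(f"rename lost: {old} still exists")
            if not os.path.exists(new):
                problems.append(f"rename lost: {new} is missing")
        elif kind == "unlink":
            _, path = op
            if os.path.exists(path):
                problems.append(f"unlink lost: {path} still exists")
        else:
            raise ValueError(f"unknown operation: {kind!r}")
    return problems


if __name__ == "__main__":
    # Local demonstration only; on a real test the paths would be on GPFS.
    root = tempfile.mkdtemp()
    tx = os.path.join(root, "tx-0001")
    with open(tx, "wb") as f:
        f.write(b"COMMITTED\n")
    journal = [("create", tx, b"COMMITTED\n")]
    for line in check_acknowledged(journal):
        print(line)
```

    An empty result means every acknowledged operation is reflected on the node where the check runs; each entry otherwise corresponds to one of the four symptoms listed above.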

    Can you please clarify what GPFS guarantees (i.e. which Unix API call results we can trust):

    • Atomic file rename()? After which Unix API calls do we have a guarantee that the file is renamed and is visible in the folder on all nodes?
    • New file creation. After which Unix API calls do we have a guarantee that the file has actually been created and will be visible in the directory on all server nodes?
    • File contents persisted? Does a successful return from fsync() or fdatasync() guarantee that the changes to the file are immediately visible to other nodes, and will still be visible if the node restarts?
    • Does a successful unlink() ensure that the file is removed?

    Also, please specify whether the above guarantees on API responses work the same way on Linux and AIX.

    We have tried setting syncSambaMetadataOps=Yes, but it does not seem to help. Maybe there are some configuration flags for this? Or maybe the cluster should be configured in a different way?

    Our current configuration:

    # mmlsconfig
    Configuration data for cluster g7node2.g7node1:
    -----------------------------------------------
    clusterName g7node2.g7node1
    clusterId 14449966042191801738
    autoload yes
    profile gpfsProtocolDefaults
    dmapiFileHandleSize 32
    minReleaseLevel 5.1.1.0
    ccrEnabled yes
    cipherList AUTHONLY
    sdrNotifyAuthEnabled yes
    maxblocksize 16M
    [cesNodes]
    maxMBpS 5000
    numaMemoryInterleave yes
    enforceFilesetQuotaOnRoot yes
    workerThreads 512
    [common]
    tscCmdPortRange 60000-61000
    syncSambaMetadataOps yes
    adminMode central

    File systems in cluster g7node2.g7node1:
    ----------------------------------------
    /dev/silodevgpfs

    We are using the Developer Edition on Linux for tests.

    OS version:

    cat /etc/redhat-release
    CentOS Linux release 7.9.2009 (Core)

    Thanks,

    Madars



    ------------------------------
    Madars Vitolins
    ------------------------------