by Nils Haustein and Frank Kraemer
I recently had some discussions with my peers about the difference between “scale out file systems” and “scale out NAS systems”. One of the key observations is that there is a significant difference between the two when it comes to scalability of performance. Let me explain this.
Introduction to scale out NAS and file systems
A scale out NAS system consists of multiple NAS storage nodes with internal or external storage (it does not really matter which), as shown in Figure 1 below. Each NAS storage node presents the same file systems to the applications via file or object protocols such as NFS, SMB or OpenStack® Swift (a global name space). Data is stored across all NAS storage nodes. Scalability is achieved by adding more NAS storage nodes.
Figure 1: Scale out NAS system architecture with NFS access
A scale out file system – as shown in Figure 2 below – has a similar architecture to a scale out NAS system and also consists of multiple storage nodes with internal or external storage. Unlike with a scale out NAS system, however, access to the file system is granted through the file system interface of the scale out file system and not through NAS protocols.
Figure 2: Scale out file system architecture
A scale out file system does not suffer from the NFS bottleneck described below, and this is what really differentiates it from a scale out NAS system.
The NFS performance bottleneck
To further explain the performance aspects, let's take a deeper look at how an application accesses these systems. Assume an application accesses a scale out NAS system via standard NFS version 3 or 4. File system access via NFS is available on all NAS storage nodes; however, for a single file streaming operation the application is bound to a single NFS node of the scale out NAS system (see Figure 3). In other words, for a single file operation – such as reading a large file for backup – there is a point-to-point connection between the NFS client and a single NAS storage node. That single NAS storage node is the bottleneck.
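The effect of this point-to-point connection can be shown with simple back-of-the-envelope arithmetic. The numbers below are illustrative assumptions, not measured values:

```python
# Illustrative sketch (assumed numbers): with NFS v3/v4, a single file
# stream is served by exactly one NAS node, so its throughput is capped
# by that node's capability, no matter how many nodes the cluster has.
node_throughput_gbs = 1.0   # assumed streaming throughput per node, GB/s
num_nodes = 8               # assumed cluster size

single_stream = node_throughput_gbs            # one point-to-point link
aggregate = node_throughput_gbs * num_nodes    # many independent streams

print(single_stream, aggregate)  # → 1.0 8.0
```

Adding nodes raises the aggregate number only; the single-stream number, which is what matters for backing up one large file, stays flat.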
In contrast, when an application accesses a scale out file system like IBM Spectrum Scale™ (link), it accesses the file system directly through the file system client. With Spectrum Scale the client does not funnel the file through a single node; rather, it stripes file blocks directly across all storage nodes (see Figure 3). Thus when the application performs a single file streaming operation – such as reading a large file for backup – it reads from all storage nodes in parallel and is not bound to a single one.
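The striping idea can be sketched in a few lines. This is a conceptual round-robin model, not Spectrum Scale's actual allocation code, and the block size is an assumption:

```python
# Conceptual sketch of block-level striping: every block of one file is
# mapped to a storage node, so a single file stream touches all nodes.
BLOCK_SIZE = 4 * 1024 * 1024  # assumed 4 MiB file system block size

def block_placement(file_size: int, num_nodes: int):
    """Map each file block to a storage node, round-robin."""
    num_blocks = (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE
    return [(block, block % num_nodes) for block in range(num_blocks)]

# A 40 MiB file on a 4-node cluster: 10 blocks spread over all 4 nodes,
# so a single streaming read can be served by 4 nodes in parallel.
placement = block_placement(40 * 1024 * 1024, 4)
nodes_used = {node for _, node in placement}
print(len(placement), sorted(nodes_used))  # → 10 [0, 1, 2, 3]
```

Because every node holds a share of the blocks, single-stream throughput grows with the number of nodes instead of being pinned to one of them.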
Figure 3: The NFS bottleneck does not exist with a scale out file system
Consequently, the performance of a scale out file system like IBM Spectrum Scale can be scaled to the limits of the network. We have actually tested this and demonstrated that a single IBM Spectrum Protect™ instance installed on a Spectrum Scale file system node can write at more than 5 GB/sec to the Spectrum Scale file system (link). This is only possible because Spectrum Scale uses all available storage nodes for single file streaming operations. Traditional scale out NAS systems cannot benefit from this, unless they use parallel NFS (pNFS).
Parallel NFS (pNFS, link) was introduced with NFS version 4.1 and provides the ability to stripe single file I/O operations across multiple NAS storage nodes (Figure 4). With pNFS there is a metadata server that controls the striping across multiple NAS storage nodes by maintaining information about the location and layout of files. Based on this location information, the NFS v4.1 client stripes a file across multiple NAS storage nodes. This mitigates the NFS performance bottleneck to a certain degree when streaming a single file, but it still does not resolve the underlying point-to-point connection problem of NFS.
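The role of the layout can be illustrated with a small model. The names and structures below are illustrative only, not the NFSv4.1 wire protocol: the metadata server hands the client a layout, and the client uses it to send I/O directly to the right data server for each byte range:

```python
# Conceptual sketch of a pNFS-style file layout (illustrative names, not
# the NFSv4.1 protocol): the client computes, from the layout it got from
# the metadata server, which data server to contact for a given offset.
from dataclasses import dataclass

@dataclass
class Layout:
    data_servers: list   # addresses of the NAS storage nodes holding data
    stripe_unit: int     # bytes written to one server before moving on

def server_for_offset(layout: Layout, offset: int) -> str:
    """Which data server the client contacts for a given byte offset."""
    stripe_index = offset // layout.stripe_unit
    return layout.data_servers[stripe_index % len(layout.data_servers)]

layout = Layout(data_servers=["ds1", "ds2", "ds3"], stripe_unit=1 << 20)
# Offsets 0 MiB, 1 MiB, 2 MiB, 3 MiB hit ds1, ds2, ds3, ds1 in turn.
print([server_for_offset(layout, i << 20) for i in range(4)])
```

Note that each individual stripe is still a point-to-point NFS transfer to one data server; pNFS parallelizes across stripes rather than removing the per-connection path.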
Figure 4: Parallel NFS architecture
Parallel NFS is not a mandatory feature of NFS version 4.1; this may be one reason it has not found broad acceptance in the market. In fact, most scale out NAS systems do NOT support pNFS today.
Summary
I think it is important to differentiate between “scale out NAS systems” and “scale out file systems”, especially when a scale out NAS system does not use pNFS. Calling such a scale out NAS system a scale out file system is confusing, because it does not scale performance like a true scale out file system.
In addition, with a file access protocol like NFS there is more overhead involved, limiting performance. On the NAS storage node side (see Figure 1) the NFS protocol must be translated to the underlying NAS file system before the actual file blocks are stored on disk. This also applies to pNFS. Such overhead does not exist with a scale out file system like IBM Spectrum Scale.
Besides pNFS there are techniques to mitigate the NFS performance bottleneck, for example by using Domain Name System (DNS) techniques. With DNS round-robin, application I/O requests to the NAS system are balanced across all NAS storage nodes (see Figure 1) – but not for single file streaming operations. A single file stream still lands on one NAS storage node. Thus a scale out NAS system always carries this bottleneck, unless it uses parallel NFS (pNFS).
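The limitation of the DNS approach can be sketched as follows. The node names are hypothetical; the point is that round-robin balances *connections*, not the bytes of one stream:

```python
# Sketch of DNS round-robin load balancing (hypothetical node names):
# each new client connection resolves to the next NAS node in turn, but
# an established connection - and thus a single file stream - stays
# pinned to the one node it resolved to.
from itertools import cycle

nas_nodes = cycle(["nas1.example.com", "nas2.example.com", "nas3.example.com"])

def resolve():
    """One DNS lookup per new mount/connection, round-robin."""
    return next(nas_nodes)

# Three clients mounting the NAS each land on a different node...
mounts = [resolve() for _ in range(3)]
# ...but every byte of client 1's single file stream still flows through
# the one node that client is connected to.
stream_node = mounts[0]
print(mounts, stream_node)
```

So DNS techniques raise aggregate throughput across many clients while leaving the single-stream ceiling untouched.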
References
IBM Spectrum Scale and IBM Spectrum Protect are registered trademarks of IBM Corporation in the US and / or other countries.
Openstack is a registered trademark / word mark of OpenStack Foundation.
Performance test results Spectrum Protect with Spectrum Scale:
https://www.ibm.com/developerworks/community/blogs/storageneers/entry/elastic_storage_with_gpfs_native_raid_performance_test_results_with_tivoli_storage_manager_over_40_gbit_ethernet?lang=en_us
IBM Spectrum Scale home page:
http://www-03.ibm.com/systems/uk/storage/spectrum/scale/index.html
Parallel NFS – SNIA webcast:
http://snia.org/sites/default/files/Part4-Using_pNFS%20Feb._2013.pdf