What is NVMe?
Non-Volatile Memory Express (NVMe) was developed as an industry specification for accessing non-volatile storage over the PCIe interface. NVMe is a ground-up specification intended to capitalize on the internal characteristics of flash storage: it minimizes latency, maximizes performance and efficiency, and simplifies device management.
NVMe, like SATA or USB, allows multiple vendors to develop products compliant with the specification, all supported by the same host device driver, thereby removing software compatibility as an adoption inhibitor.
Key areas of improvement in the NVMe specification:
- increased queue depth
- reduced register access per command
- lightweight protocol requiring minimal path length
- support for multiple MSI-X interrupts
Overview of NVMe support
NVMe devices do not follow the SCSI architecture model, so a new device driver was added to the AIX operating system. VIOS added support for NVMe devices in version 2.2.6. Similar to other disk types, NVMe devices are presented as block storage devices; no changes are required in upper-level components such as LVM and file systems (as shown in Figure 1 below). The initial NVMe solutions for Power are PCIe attached and local to the system.
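Because NVMe disks are presented as ordinary block devices, the familiar AIX listing commands apply unchanged. A minimal sketch (device names are illustrative):

    # AIX: NVMe disks are listed as regular hdisk devices alongside other disk types
    lsdev -Cc disk
    # AIX: the NVMe controllers (for example, nvme0) are listed in the adapter class
    lsdev -Cc adapter | grep -i nvme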
NVMe device use-cases in the PowerVM environment
The common use cases for the NVMe device are shown in Figure 2 below.
Note: Each NVMe drive is a separate PCIe endpoint and can be assigned individually to a unique AIX, VIOS, or Linux logical partition (LPAR).
The above configuration (Figure 2) is similar to other locally attached storage devices. As with other SSD technology, NVMe has limited write endurance and may not be suitable for write-intensive workloads.
- Users can assign one or more NVMe devices to the VIOS partition:
  - The device can be used as a VIOS boot device.
  - Users can configure devices as a local read cache in the Shared Storage Pool (SSP).
  - Users can carve out logical volumes (using the Logical Volume Manager (LVM)) and assign them to clients as LV-backed virtual SCSI (vSCSI) devices. The client partition can use them for any purpose, for example, a boot image or disk caching. See the sketch after this list.
  - Note: VIOS does not allow assigning an NVMe disk to a client as a physical volume (PV) backed virtual SCSI (vSCSI) device.
- Users can assign the NVMe device directly to a client partition, as shown in Figure 2 (left side). The client partition can use it for any purpose, for example, a boot image or disk caching.
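As a sketch of the LV-backed vSCSI flow on the VIOS command line (the volume group, logical volume, disk, and vhost names are illustrative assumptions):

    # VIOS: create a volume group on the NVMe-backed hdisk
    mkvg -vg nvmevg hdisk2
    # carve out a logical volume for the client
    mklv -lv boot_lv nvmevg 20G
    # export the logical volume as an LV-backed vSCSI device on virtual adapter vhost0
    mkvdev -vdev boot_lv -vadapter vhost0

On the client partition, the exported logical volume then appears as an ordinary vSCSI hdisk.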
Best practices
- NVMe is high-speed flash storage that comes in various levels of write endurance. Refer to the write endurance rating of your NVMe device to ensure it is suitable for the intended workload. Consult the IBM feature description of your specific NVMe device to determine its drive writes per day (DWPD) rating.
- As with any storage technology, NVMe may be subject to failure. It is advisable to mirror the VIOS boot device, as sketched below. In some POWER9 systems, expansion cards are used to hold M.2 form factor NVMe devices. It is advisable to mirror across expansion cards to protect against expansion card failure, as shown in Figure 3 below.
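One way to mirror the VIOS boot volume group onto a second NVMe device, sketched with illustrative hdisk names:

    # VIOS: add the second NVMe disk to rootvg, then mirror the boot volume group
    extendvg rootvg hdisk3
    # mirrorios restarts the VIOS when mirroring completes unless -defer is specified
    mirrorios -defer hdisk3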
Key points to note
- VIOS does not support NVMe devices for the following usage:
  - Similar to other locally attached devices, client LPARs that use virtual SCSI (vSCSI) devices backed by locally attached NVMe cannot participate in Live Partition Mobility (LPM) operations.
  - NVMe disks cannot be used in the Shared Storage Pool (SSP).
  - VIOS does not allow mapping NVMe devices as physical volume (PV) backed devices in the virtual SCSI (vSCSI) configuration.
  - NVMe devices are not supported as Active Memory Sharing (AMS) devices.
- In the initial releases, the default values for the NVMe adapter attributes are the same as the AIX defaults, so no rules support is provided; the defaults can be inspected as sketched below. Attribute support for the NVMe adapter will be added based on feedback from the field.
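Because the defaults match AIX, they can be inspected from the root shell with the standard attribute listing (adapter name nvme0 is illustrative):

    # AIX: list the current attribute values of the NVMe adapter
    lsattr -El nvme0
    # AIX: list the default values for comparison
    lsattr -Dl nvme0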
NVMe device examples
This section provides examples of how to identify NVMe devices and their attributes. Although the same commands are used to display information for other device types, the examples highlight some of the differences. NOTE: For more details, refer to the AIX/PowerVM documentation on NVMe [https://www.ibm.com/support/knowledgecenter/ssw_aix_72/com.ibm.aix.ktechrf2/nvme.htm].
NVMe adapter information
- Number of channels (nchan): Each channel is an independent kernel thread with dedicated facilities to process I/O.
- Maximum size of DMA transfer (max_dma_window): It can be set more optimally if the size and number of I/Os issued at a time are known or predictable.
- Users can use the “-vpd” option to find the physical location of the device, as shown below.
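On the VIOS command line, the adapter attributes and vital product data can be displayed with the lsdev command (adapter name nvme0 is illustrative):

    # VIOS: display adapter attributes such as nchan and max_dma_window
    lsdev -dev nvme0 -attr
    # VIOS: display vital product data, including the physical location code
    lsdev -dev nvme0 -vpd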
NVMe disk information
- Users can use the “-child” option to find the list of disks for a specific NVMe adapter.
- Users can use the “-attr” option to find more information about an NVMe disk, similar to SCSI disks.
- Platform-specific information can be displayed using the “-vpd” option, as shown in the sketch below.
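A sketch of the corresponding VIOS commands for disks (device names nvme0 and hdisk2 are illustrative):

    # VIOS: list the disks behind a specific NVMe adapter
    lsdev -dev nvme0 -child
    # VIOS: show the attributes of an NVMe disk, much as for a SCSI disk
    lsdev -dev hdisk2 -attr
    # VIOS: show platform-specific vital product data for the disk
    lsdev -dev hdisk2 -vpd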
NVMe diagnostic information
VIOS does not provide RBAC authorization for the diag command, so the customer must log in to the root shell to access diag. Users can use diag to configure NVMe devices and to check the health of the devices. The following figures show an example of a health check for an NVMe device.
- Run the “diag” command -> Press Enter -> Select “Task Selection” -> Select “NVMe general health information” -> Select the specific NVMe adapter and press Enter.
- Health information shows the life used, read/write statistics, and errors.
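Reaching diag from the VIOS restricted shell looks like this (a sketch; the menu path is as listed above):

    # VIOS: escape the restricted shell to a root shell, then start diagnostics
    oem_setup_env
    # in the root shell: diag -> Task Selection -> NVMe general health information
    diag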
Summary
The PowerVM environment supports NVMe use cases such as a VIOS boot device, a backing device in the VIOS for exporting vSCSI storage to client LPARs, and server-side flash caching of storage data. The design of NVMe and its related device drivers is optimized for flash storage, resulting in more efficient use of system resources. Additionally, NVMe may provide a more cost-effective solution compared to SAS, since NVMe does not require PCIe controllers separate from the storage devices and the NVMe backplane is already included in the base price of some POWER9 servers.
Contacting the PowerVM Team
Have questions for the PowerVM team or want to learn more? Follow our discussion group on LinkedIn (IBM PowerVM) or the IBM Community Discussions.