IBM FlashSystem

Brocade Fabric Performance Impact Notification in FOS v9.0

By David Green posted Thu October 15, 2020 04:50 PM

  
Brocade Fabric Performance Impact Notification (FPIN) is a new feature in Brocade FOS v9.0, available on Brocade Gen6 and Gen7 switches. It enables the switch to detect issues in the fabric, such as congestion or physical link problems, and then notify the affected devices that have registered for these notifications. FPIN works through a mechanism similar to RSCN (Registered State Change Notification). RSCN enables the fabric to notify devices when a device they are zoned to goes offline, so that the devices receiving the notification can proactively take steps such as path failover rather than having to react after a path is already down. FPIN provides a means to notify devices of link and other issues with a connection, or possibly with a path through the fabric. For both RSCN and FPIN, a device must register with fabric services to receive these notifications. The new Brocade Gen7 hardware can send either hardware or software signal notifications; Gen6 can only send software notifications. Both hardware and software notifications require FOS v9.0 on the switches.
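To make the registration model concrete, here is a minimal Python sketch of the idea, assuming a simplified fabric in which a notification about one device goes to that device and its zoned peers, but only to those that registered for that event type. The `Fabric` class, its `register` and `notify` methods, and the device names are hypothetical illustrations; they are not a Brocade FOS interface, and real FPIN delivery rules depend on the notification type.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Event(Enum):
    """FPIN event types named in this post (names are illustrative)."""
    CONGESTION = auto()
    LINK_INTEGRITY = auto()
    FRAME_DROP = auto()


@dataclass
class Fabric:
    # Hypothetical model: device name -> event types it registered for.
    registrations: dict[str, set[Event]] = field(default_factory=dict)
    # Hypothetical zoning: device name -> set of zoned peers.
    zones: dict[str, set[str]] = field(default_factory=dict)

    def register(self, device: str, events: set[Event]) -> None:
        """A device asks fabric services for the notifications it wants."""
        self.registrations.setdefault(device, set()).update(events)

    def notify(self, affected: str, event: Event) -> list[str]:
        """Return the devices that would be notified about `affected`.

        Only the affected device and its zoned peers are candidates, and of
        those, only devices registered for this event type are notified.
        """
        candidates = self.zones.get(affected, set()) | {affected}
        return sorted(d for d in candidates
                      if event in self.registrations.get(d, set()))


if __name__ == "__main__":
    fabric = Fabric(zones={"host1": {"array1"}, "array1": {"host1"}})
    fabric.register("host1", {Event.CONGESTION, Event.LINK_INTEGRITY})
    # array1 never registered, so only host1 hears about array1's link issue.
    print(fabric.notify("array1", Event.LINK_INTEGRITY))  # ['host1']
```

As with RSCN, a device that never registers simply receives nothing, which is why adapter and driver support matters as much as switch support.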
Hardware signals are sent from the switch to the adapter in the device, and the adapter itself can then decide what to do about the notification. Software signals are sent higher up in the Fibre Channel stack, and the adapter driver then decides how to handle the notification. One advantage of hardware notifications is reaction time: the adapter can process the notification and react more quickly than the driver can. Another is that the hardware notification is a Fibre Channel primitive. Primitives are not frames, so they do not need buffer credits to be sent; this means the signal can still be delivered to the device at the other end of the link even when buffer credits are depleted. The software signal is an ELS (Extended Link Service) frame, so it can be affected by buffer credit depletion and other link congestion. Whether the signal is hardware or software, how the devices handle the notifications is up to the adapter vendor. Some may only log the notification, while others may take action, and the action an adapter takes is also vendor specific.
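The buffer-credit argument can be illustrated with a toy model. This sketch assumes a link reduced to a single buffer-to-buffer credit counter; the `LinkTx` class and its methods are hypothetical and ignore everything else about real Fibre Channel transmission.

```python
from dataclasses import dataclass


@dataclass
class LinkTx:
    """Transmit side of one FC link, reduced to its buffer-to-buffer credit count."""
    bb_credits: int

    def send_frame(self) -> bool:
        """Any frame, including an ELS such as a software FPIN, consumes one credit.

        With zero credits the frame must wait, so this toy reports it as not sent.
        """
        if self.bb_credits == 0:
            return False
        self.bb_credits -= 1
        return True

    def send_primitive(self) -> bool:
        """Primitive signals are not frames, so no credit is needed or consumed."""
        return True


if __name__ == "__main__":
    # A congested link: the far end has stopped returning R_RDY, so credits are at zero.
    link = LinkTx(bb_credits=0)
    print("software (ELS) signal sent:", link.send_frame())      # False
    print("hardware (primitive) sent:", link.send_primitive())   # True
```

In this model, the congestion that makes a notification most useful is exactly what blocks the frame-based version, while the primitive still gets through.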
FPIN can alert devices about these events:
  • End Device Congestion
  • Device Link Integrity (CRC)
  • Frame Drops
If FPIN is enabled, these events are still monitored by MAPS, and enabling FPIN won't change your existing MAPS configuration for them. The difference is that, with FPIN, notifications are also sent to the affected devices that have registered for them. How the devices handle a notification is vendor specific. They may just log the event, or they may take other steps, such as starting link recovery, or slowing traffic on a congested link and re-routing it out an uncongested port. As a last resort, a device may shut down a troublesome link.
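Because handling is vendor specific, any concrete policy is an assumption. The sketch below shows one hypothetical way a device might escalate from logging through recovery, throttling and re-routing to, as a last resort, shutting down a link. The `choose_action` function, the action names and the repeat thresholds are all invented for illustration and do not describe any vendor's behavior.

```python
from enum import Enum


class Event(Enum):
    """The three FPIN event types listed above (names are illustrative)."""
    END_DEVICE_CONGESTION = "congestion"
    LINK_INTEGRITY = "link_integrity"
    FRAME_DROP = "frame_drop"


def choose_action(event: Event, repeat_count: int, log_only: bool = False) -> str:
    """Hypothetical vendor policy: escalate as the same event keeps recurring.

    Returns one of "log", "throttle", "reroute", "link_recovery", "shutdown_link".
    Thresholds (3 and 10) are arbitrary placeholders.
    """
    if log_only:
        return "log"                 # some adapters only record the notification
    if repeat_count >= 10:
        return "shutdown_link"       # last resort for a persistently bad link
    if event is Event.END_DEVICE_CONGESTION:
        return "throttle" if repeat_count < 3 else "reroute"
    if event is Event.LINK_INTEGRITY:
        return "link_recovery" if repeat_count < 3 else "reroute"
    return "log"                     # frame drops: keep as troubleshooting data


if __name__ == "__main__":
    for n in (0, 5, 12):
        print(n, choose_action(Event.LINK_INTEGRITY, n))
```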
Some vendors and platforms that support FPIN today are:
  • Linux Multi-Path in RHEL 8.2
  • Emulex: supports Congestion and Link Integrity notifications on Linux
  • Marvell: registers for FPIN and logs the notifications, which can be used as a source of log data for troubleshooting
  • AIX: registers for Link Integrity and Congestion notifications
We expect more HBA and storage controller vendors to add support for FPIN in the future.
One use case for FPIN: if a switch detects congestion on an ISL or a path between devices, it could notify the device that is sending the data, so that device could try sending data down another path without waiting for timeouts and path failover to happen. A common cause of congestion is two devices zoned together with a speed mismatch. In these cases, the faster device could perhaps throttle back to send data at a slower rate. There are caveats: the behavior would be vendor specific for storage systems and host adapters, and throttling data rates would only work on the host side, unless a storage system could selectively throttle depending on the destination address.
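The congestion use case comes down to two decisions: pick a path with no outstanding congestion notification, or cap the send rate to what the slower peer can absorb. This is a minimal sketch assuming nominal per-direction throughput figures for each FC speed; `Path`, `pick_path`, `throttle_target` and `NOMINAL_MBPS` are hypothetical names, not part of any HBA or multipath API.

```python
from dataclasses import dataclass
from typing import Optional

# Nominal per-direction throughput (MB/s) for common FC speeds (8GFC, 16GFC, ...).
NOMINAL_MBPS = {8: 800, 16: 1600, 32: 3200, 64: 6400}


@dataclass
class Path:
    name: str
    congested: bool = False  # set when a congestion notification names this path


def pick_path(paths: list[Path]) -> Optional[Path]:
    """Prefer the first path with no congestion notification outstanding."""
    for p in paths:
        if not p.congested:
            return p
    return None  # every path is congested: throttling is the remaining option


def throttle_target(local_gfc: int, peer_gfc: int) -> int:
    """Rate (MB/s) a faster sender might cap itself at to match a slower peer."""
    return min(NOMINAL_MBPS[local_gfc], NOMINAL_MBPS[peer_gfc])


if __name__ == "__main__":
    paths = [Path("fabric_A", congested=True), Path("fabric_B")]
    print(pick_path(paths).name)      # fabric_B
    print(throttle_target(32, 8))     # 800: a 32G port feeding an 8G peer
```

The speed-mismatch arithmetic shows why this matters: a 32G sender can offer roughly four times what an 8G receiver can drain, and the excess backs up as buffer-credit starvation on the links in between.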
Another use case is link integrity issues. If a link is accumulating CRC errors or Invalid Transmission Words (ITWs), a physical component of the link is faulty. A Fibre Channel cable can be bad in only one direction, so it is possible that the device at one end of the link is not aware of any issues. The Link Integrity FPIN will notify the host adapter that a path is compromised, and the adapter can then determine whether it should try another path by having the multipath driver fail over. This would happen at the hardware level, long before the problem bubbled up to the software layer.
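The link-integrity failover can be sketched as a toy multipath map. This assumes a simple rule: a path named in a Link Integrity FPIN is marked "marginal" and dropped from new I/O unless it is the last usable path. The `MultipathDevice` class and its state names are hypothetical and do not describe how any particular multipath driver implements this.

```python
from dataclasses import dataclass, field


@dataclass
class MultipathDevice:
    """Toy multipath map: path name -> state ("active" or "marginal")."""
    paths: dict[str, str] = field(default_factory=dict)

    def on_link_integrity_fpin(self, path: str) -> None:
        """Mark the named path marginal, unless no other active path remains."""
        others_active = [p for p, s in self.paths.items()
                         if p != path and s == "active"]
        if others_active:
            self.paths[path] = "marginal"

    def io_paths(self) -> list[str]:
        """Paths that new I/O should use."""
        return sorted(p for p, s in self.paths.items() if s == "active")


if __name__ == "__main__":
    dev = MultipathDevice({"hba0": "active", "hba1": "active"})
    dev.on_link_integrity_fpin("hba0")   # CRC errors reported on hba0's path
    print(dev.io_paths())                # ['hba1']
    dev.on_link_integrity_fpin("hba1")   # last active path: kept in service
    print(dev.io_paths())                # ['hba1']
```

Keeping the last path in service is a deliberate choice in this sketch: a degraded path is usually better than no path at all.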
One final note: remember that an FPIN can be sent by any device that supports it. Potentially the storage, the host, or the switch can share this information, and if they all had the capability to re-route data based on these notifications, the SAN would be that much closer to being autonomous and self-healing, routing data around blockages as best it can.
