As of the June 2021 release, IBM Storage Insights supports Brocade and Cisco Fibre Channel switches and fabrics. With Fibre Channel connectivity now included in IBM Storage Insights, customers as well as IBM Support will have the complete end-to-end picture of server, fabric, and storage system connectivity in their SAN environments. This fills in a big missing piece of the troubleshooting puzzle by providing the connectivity and performance data between servers and storage systems.
Monitoring Requirements
Brocade
Monitoring of Brocade switches and fabrics is done via direct communication with the switch chassis, which must be running Fabric OS (FOS) version 8.2.1 or later. FOS 8.2.1 was chosen as the minimum required version for several reasons:
- Brocade Network Advisor (BNA) is only supported on older generation switches. BNA End of Life (EOL) was announced in 2019, with End of Service (EOS) coming in 2022 (https://docs.broadcom.com/doc/12395099).
- The BNA replacement product, SANnav, supports newer generation switches but does not support older generation switches.
- SANnav requires a purchased license, whereas FOS is available without additional licenses.
- FOS version 8.2.1 is supported on all Brocade switches currently in support by Brocade.
- FOS version 8.2.1c has achieved Brocade Target Path status (https://docs.broadcom.com/doc/Brocade-SW-Support-RM). A Brocade Target Path release is a version recommended by Brocade based on adoption levels, stability, and freedom from any known major defects.
Cisco
Monitoring of Cisco MDS switches requires that NX-API be enabled and that the switch run NX-OS release 8.4.x. In this initial release of switch support in Storage Insights, NX-OS version 8.5 and later are not supported, because Cisco changed the NX-API output in ways that are not compatible with the 8.4.x implementation. A future release of Storage Insights will add support for NX-OS version 8.5 and later. NX-API was chosen over SNMP based on a recommendation from Cisco, for the following reasons:
- NX-API is the recommended interface by Cisco for communicating with their switches.
- NX-API exposes all switch features, whereas SNMP is missing a variety of features. This includes Cisco's licensed SAN analytics and flow control metrics, which we can add support for in the future.
- NX-API is more efficient compared to SNMP.
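The version requirements for both vendors can be sketched as a small pre-check to run before adding a switch. This is only an illustration of the rules stated above, not how Storage Insights itself validates versions. The function names and the version string formats (for example `8.2.1c` for FOS and `8.4(2c)` for NX-OS) are assumptions.

```python
import re

FOS_MINIMUM = (8, 2, 1)         # from the text: FOS 8.2.1 or later
NXOS_SUPPORTED_BRANCH = (8, 4)  # from the text: NX-OS 8.4.x only, 8.5 and later not yet


def parse_version(text):
    """Return the first three numeric parts of a version string as a tuple.

    Letter suffixes and punctuation are ignored, and missing parts become 0.
    """
    numbers = [int(n) for n in re.findall(r"\d+", text)][:3]
    if not numbers:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(numbers + [0] * (3 - len(numbers)))


def fos_is_supported(version):
    """Brocade: any FOS release at or above the minimum qualifies."""
    return parse_version(version) >= FOS_MINIMUM


def nxos_is_supported(version):
    """Cisco: only the 8.4.x branch qualifies in the initial release."""
    return parse_version(version)[:2] == NXOS_SUPPORTED_BRANCH
```

Note the asymmetry between the two checks: the Brocade rule is a lower bound, while the Cisco rule pins a single release branch until support for 8.5 and later is added.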
To add switches to Storage Insights for monitoring, the credentials for the physical switch chassis must be provided. From the switch chassis, Storage Insights will add and monitor all logical switches, unconfigured switches, NPV switches, and fabrics associated with the chassis.
On the panel to add switches, provide the host name or IP address of the switch chassis and the credentials. To add multiple switches that share the same credentials, you can provide a comma-separated list of the host names or IP addresses of the switch chassis to add.
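If you keep your switch inventory in a script or spreadsheet, a comma-separated list for the panel can be prepared and sanity-checked ahead of time. The helper below is a hypothetical sketch; `parse_switch_list` is not part of Storage Insights, and the host names are placeholders.

```python
def parse_switch_list(raw):
    """Split a comma-separated string of host names or IP addresses.

    Surrounding whitespace and empty entries are dropped, and duplicates
    (compared case-insensitively) are removed while keeping the first spelling.
    """
    seen = set()
    hosts = []
    for item in raw.split(","):
        host = item.strip()
        if host and host.lower() not in seen:
            seen.add(host.lower())
            hosts.append(host)
    return hosts


# Placeholder names; the result can be re-joined with ", " for pasting.
chassis = parse_switch_list("sw1.example.com, 10.0.0.5,,SW1.example.com ")
```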
Another option for adding multiple switches is to let Storage Insights do it for you. Once a single "seed" switch chassis has been added, Storage Insights will attempt to find and add all the switches that are part of the same fabric. Through the originally added switch chassis, Storage Insights queries the associated fabrics and finds the other logical switches in those fabrics. If the other logical switches have the same credentials as the original switch chassis, they too are added as monitored switches. If they do not have the same credentials, they are still added, but as "Not Monitored" switches. The "Not Monitored" switches show up in the relevant relationship views of other systems (switches, fabrics, storage systems, and servers) and can later be added as monitored switches by providing their credentials.
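The seed-based discovery described above can be modeled roughly as follows. This is a simplified sketch of the outcome, not the actual Storage Insights implementation. The `accepts` callable, the switch names, and the credential values are all hypothetical stand-ins for a real login attempt.

```python
def classify_discovered(seed, seed_creds, fabric_members, accepts):
    """Classify every switch found through the seed chassis.

    fabric_members: switch names reported by the seed's fabrics.
    accepts: hypothetical callable (switch, creds) -> bool, True when the
             switch accepts the given credentials.
    """
    status = {seed: "Monitored"}
    for switch in fabric_members:
        if switch == seed:
            continue
        if accepts(switch, seed_creds):
            status[switch] = "Monitored"
        else:
            # Still added, so it appears in relationship views.
            status[switch] = "Not Monitored"
    return status


# Hypothetical example: sw-b shares the seed's credentials, sw-c does not.
known_logins = {
    "sw-a": ("admin", "secret1"),
    "sw-b": ("admin", "secret1"),
    "sw-c": ("admin", "other"),
}


def accepts(switch, creds):
    return known_logins.get(switch) == creds


result = classify_discovered(
    "sw-a", ("admin", "secret1"), ["sw-a", "sw-b", "sw-c"], accepts
)
```

In this example `sw-b` ends up monitored and `sw-c` is added as "Not Monitored" until its own credentials are supplied.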
Switch & Fabric Panels
For those of you familiar with the switch and fabric support in our on-premises sister product Spectrum Control, Storage Insights has similar views with several improvements. Once the chassis, logical switches, unconfigured switches, NPV switches, and fabrics are being monitored, they will show up in the respective views within Storage Insights. You will note a subtle difference in the switches panel: a new "Chassis" tab lists the physical switch chassis, unconfigured switches (switches with no logical switches), and NPV switches. The "Switches" tab lists all the logical switches.
From the overview panels, double-clicking a logical switch, fabric, or chassis shows the detail view for that system. As in the detail view for storage systems and servers, you will find information specific to the device, such as alert definitions, triggered alerts, properties, components belonging to the device, and resources related to the device. Below is an example detailed view of a physical switch chassis.
For your Operations team, there is also a new Fabrics Operations Dashboard that lists the key performance indicators and status of the fabrics being monitored. To view it, mouse over the Dashboards menu at the top of Storage Insights, then select Operations. By default the Storage Systems Operations Dashboard is shown. To change to the Fabrics Operations Dashboard, select Fabrics in the pull-down selector in the upper-left corner.
Like storage systems, we also collect performance data for the various switch, fabric, and chassis systems and components. Storage Insights collects performance and error rate data for the switch and fabric systems at a 5-minute granularity. The performance data goes through different levels of "aging", and the longest we keep any performance data is a year. The data collected at the 5-minute interval is referred to as the "sample" data and is kept for 2 weeks. The sample data collected within each hour is summarized into "hourly" data, which is kept for 30 days. Finally, the hourly data collected within a day is summarized into "daily" data, which is kept in Storage Insights for a year.
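To make the aging scheme concrete, here is a small sketch of the three retention tiers. It works out which tier still holds data of a given age and how many data points each tier keeps per metric. Treating "a year" as 365 days and summarizing by simple averaging are assumptions for illustration, since the text only says the data is "summarized".

```python
from datetime import timedelta
from statistics import mean

# (tier name, interval between points, how long the tier is kept)
RETENTION = [
    ("sample", timedelta(minutes=5), timedelta(weeks=2)),
    ("hourly", timedelta(hours=1), timedelta(days=30)),
    ("daily", timedelta(days=1), timedelta(days=365)),  # assumption: 1 year = 365 days
]


def finest_tier(age):
    """Return the finest-grained tier that still holds data of this age."""
    for name, _interval, keep_for in RETENTION:
        if age <= keep_for:
            return name
    return None  # older than a year: no longer kept


def points_kept(interval, keep_for):
    """Number of data points a tier holds per metric when it is full."""
    return keep_for // interval


def summarize(values, group_size):
    """Average consecutive groups of values (assumed summarization method)."""
    return [
        mean(values[i:i + group_size])
        for i in range(0, len(values), group_size)
    ]


counts = {name: points_kept(step, keep) for name, step, keep in RETENTION}
```

With these figures, a full sample tier holds 4032 points per metric (12 per hour for 14 days), the hourly tier 720, and the daily tier 365.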
To view performance data for any switch, chassis, fabric or sub-component, there is a Performance tab on just about every panel. You can also view performance data for select resources by highlighting them and selecting the "View Performance" menu action.
For troubleshooting or reporting, hundreds of performance and error rate metrics are available to view. On any performance view, clicking the "+" icon next to the Metrics legend opens a dialog where you can select the metrics to display.
With all the data Storage Insights collects, we can also alert on anomalies and threshold violations. Alerts can be created for just about any of the properties, health, performance data, or error rates we collect within Storage Insights. Storage Insights supports both single-condition and multi-condition alerts, and alerts are triggered within 5 minutes of the change occurring.
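The difference between single-condition and multi-condition alerts can be illustrated with a short sketch. It assumes that a multi-condition alert triggers only when every one of its conditions is met. The metric names and thresholds are made up for the example and are not actual Storage Insights metric identifiers.

```python
import operator

OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def condition_met(metrics, condition):
    """Check one (metric, operator, threshold) condition against collected values."""
    metric, op, threshold = condition
    value = metrics.get(metric)
    return value is not None and OPS[op](value, threshold)


def alert_triggered(metrics, conditions):
    """True when all conditions hold; a single-condition alert is a list of one."""
    return bool(conditions) and all(condition_met(metrics, c) for c in conditions)


# Hypothetical port metrics from one 5-minute sample.
port = {"crc_error_rate": 0.4, "port_utilization_pct": 35.0}

single = [("crc_error_rate", ">", 0.1)]
multi = [("crc_error_rate", ">", 0.1), ("port_utilization_pct", ">", 80.0)]
```

Here the single-condition alert would trigger, while the multi-condition alert would not, because the utilization condition is not met.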
Server & Storage System Connectivity
Switch and fabric connectivity is also included in the Related Resources views of other systems. This covers other logical switches, fabrics, and chassis, as well as storage systems and servers. From these views, you can easily see which resources are connected to a device and view the health and performance of those connected resources.
On the various port panels, the resource connected on the other side of the port is also shown in the properties table as a resource hyperlink. Below is an example of a Server view with the connected switch ports shown in the Related Resources section. Note the server's name is also referenced in the switch ports' "Connected Resource" column.
Hopefully the new switch and fabric support will be a welcome addition to your Storage Insights instance. The ability to view end-to-end connectivity between servers and storage systems should be a big help in troubleshooting and in identifying exactly where a problem lies. IBM Support sees the same information and can use it to troubleshoot any storage system performance or connectivity tickets you have opened, for quicker resolution.