Sound monitoring and system logging practices are the first, and sometimes the most important, step in troubleshooting. They are also the most overlooked, because they must be configured before a problem happens. If system logging is not configured in advance, valuable information is lost.
Broadcom has two important features that you can use to monitor the health of your fabrics and alert you when problems are detected: Flow Vision, which monitors traffic flows, and the Monitoring and Alerting Policy Suite (MAPS), which both monitors for error conditions and alerts you when it detects them. In this post I'll provide a brief overview of each feature, and then we'll see how to integrate Flow Vision into MAPS to provide a comprehensive monitoring and alerting solution.
Flow Vision provides a detailed view of the traffic between devices on your fabrics. It captures traffic for analysis so that you can find bottlenecks, spot excessive bandwidth utilization, and investigate other flow-based fabric connectivity questions. Flow Vision can inspect the contents of a frame to gather statistics on each frame. It has three main features: Flow Monitor, Flow Generator and Flow Mirror. In this blog post we'll look at Flow Monitor, as that is what we will integrate into MAPS. Flow Monitor lets you monitor flows that you define, and it gathers statistics on frames and I/Os. Some example use cases for flows:
- Flows through the fabric for virtual machines or standalone hosts connected via NPIV that start from a single N_Port ID (PID) and go to destination targets.
- Flows inside logical fabrics, and inter-fabric (routed) traffic passing through.
- Gaining insights into application performance through the capture of statistics for specified flows.
- Monitoring various frame types at a switch port to provide deeper insights into storage I/O access, such as the various SCSI commands.
MAPS is a policy-based health monitor that allows a switch to constantly monitor itself for faults and performance problems (link timeouts, excessive link resets, physical link errors). If it detects a problem, it alerts you via the alert options on the policy or, if they are defined, on the individual rule. However, MAPS does not inspect the contents of the data portion of frames. Options for alerting include email, SNMP or raslog (the system log). You should always have the raslog option set, as this will give IBM Support critical timestamped data if the switch detects a problem.
Integrating Flow Vision with MAPS
Combining these two capabilities gives you a fully integrated and very powerful monitoring configuration. You can have Flow Vision monitor for certain types of frames, or for frames between a specific source/destination pair, and then feed that into MAPS to take advantage of its alerting capabilities.
In this example we're going to take advantage of the ability of Flow Vision to inspect the contents of a frame, and then we'll add that flow to MAPS to use its alerting capabilities. Suppose we want to know when a certain host sends an abort sequence (ABTS) to a storage device. For this example, our host name is Host1. It is connected via NPIV, so we can't just monitor the ingress port, because another host behind the same port could also send an ABTS. Instead, we filter on a specific source N_Port ID. We also want to ensure we collect every ABTS that is sent, so we are not filtering on a destination ID.
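To make the reasoning concrete, here is a small Python sketch of why a port-only filter over-counts when several hosts share one ingress port. It is only a model of the filtering idea, not how Flow Vision is implemented. The `Frame` class, the `matches` helper, the second host's ID and the destination IDs are all hypothetical; `0xA1B2C3` is the Host1 source ID used in this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    ingress_port: str  # slot/port the frame arrived on
    src_id: int        # 24-bit source N_Port ID
    dst_id: int        # 24-bit destination N_Port ID
    frame_type: str    # illustrative label, e.g. "abts"


def matches(frame, ingress_port, frame_type, src_id=None):
    """Return True if the frame passes the filter.

    src_id=None means "any source", i.e. a port-only filter.
    The destination is deliberately not part of the filter.
    """
    if frame.ingress_port != ingress_port or frame.frame_type != frame_type:
        return False
    return src_id is None or frame.src_id == src_id


HOST1 = 0xA1B2C3  # Host1's N_Port ID from this example
HOST2 = 0xA1B2C4  # hypothetical second login behind the same port

frames = [
    Frame("1/10", HOST1, 0x010200, "abts"),
    Frame("1/10", HOST2, 0x010200, "abts"),   # another host's ABTS
    Frame("1/10", HOST1, 0x010300, "abts"),   # different destination
    Frame("1/10", HOST1, 0x010200, "read"),   # not an ABTS
]

port_only = sum(matches(f, "1/10", "abts") for f in frames)
port_and_source = sum(matches(f, "1/10", "abts", src_id=HOST1) for f in frames)
print(port_only, port_and_source)  # 3 2
```

The port-only count includes the other host's ABTS, while the source-filtered count picks up both of Host1's ABTS frames regardless of destination.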
Step 1: Create the flow:
switch:admin> flow --create Host1_ABTS -ingrport 1/10 -srcdev 0xa1b2c3 -frametype abts
The above flow definition filters ingress port 1/10 for the source N_Port ID A1B2C3 and for a frame type of ABTS. Optionally we could specify a -dstdev of "*", and Flow Vision would learn which destinations the source device is sending to.
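If you want to sanity-check a source ID before putting it into a flow, it can help to split it into its parts. A 24-bit Fibre Channel address is conventionally read as three bytes: domain, area and port. The sketch below does only that arithmetic; `split_fcid` is a hypothetical helper name, and how the low byte is assigned to NPIV logins depends on your switch's addressing mode.

```python
def split_fcid(fcid):
    """Split a 24-bit N_Port ID into (domain, area, port) bytes."""
    if not 0 <= fcid <= 0xFFFFFF:
        raise ValueError("N_Port ID must fit in 24 bits")
    domain = (fcid >> 16) & 0xFF  # high byte: switch domain ID
    area = (fcid >> 8) & 0xFF     # middle byte: area
    port = fcid & 0xFF            # low byte: port / device field
    return domain, area, port


domain, area, port = split_fcid(0xA1B2C3)
print(f"domain=0x{domain:02x} area=0x{area:02x} port=0x{port:02x}")
# prints: domain=0xa1 area=0xb2 port=0xc3
```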
Step 2: Import the flow into MAPS:
switch:admin> mapsconfig --import Host1_ABTS
Step 3: Verify the flow has been imported:
switch:admin> logicalgroup --show
Group Name |Predefined|Type |Member Count|Members
Step 4: Create a rule and add the rule to a policy:
switch:admin> mapsrule --create myRule_Host1_ABTS -group Host1_ABTS -monitor TX_FCNT -timebase min -op g -value 5 -action RASLOG -policy myPolicy
Here "-timebase" is the time period over which changes are monitored, "-op g" means greater than, "-value" is the value to trigger at, and "-action" is the action to take. So this rule says to log to the raslog if the switch detects more than 5 ABTS per minute from the source N_Port ID that was specified in the flow.
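The threshold logic of that rule can be sketched in a few lines of Python. This is only a model of the comparison, not MAPS itself; `rule_fires` and the per-minute counts are hypothetical. Note that "g" is a strict comparison, so a minute with exactly 5 ABTS does not trigger the action.

```python
import operator

# Only the operator used in this example is modelled here.
OPS = {"g": operator.gt}


def rule_fires(count_in_window, op, value):
    """Return True if the count seen in one time window violates the rule."""
    return OPS[op](count_in_window, value)


# Hypothetical ABTS counts for five consecutive one-minute windows.
per_minute_abts = [0, 3, 5, 6, 12]
fired = [rule_fires(count, "g", 5) for count in per_minute_abts]
print(fired)  # [False, False, False, True, True]
```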
Next we activate the new policy:
switch:admin> mapspolicy --enable myPolicy
Hopefully this example shows the utility of being able to monitor and alert on both the contents of frames and the errors or changes detected on your switches. It can also serve as a blueprint for enabling additional logging capability when troubleshooting a problem. Perhaps you have an intermittent issue that disappears before you can collect the necessary data. With Flow Vision you can monitor for a condition and then trigger MAPS to alert you via email or raslog. For more information you can review the Brocade MAPS and Flow Vision guides here: