Contributors: Ravi Kumar Poudapelly, Srikanth Kondapaneni
Introduction
Traditionally on AIX, transceiver statistics were collected and displayed using Advanced Diagnostics. Entering Advanced Diagnostics mode requires tearing down all IP addresses configured on the affected devices, which causes an application outage.
With this feature, the device driver:
A. Collects and displays transceiver statistics as part of the detailed Ethernet statistics (collected using the command entstat -d entX, where entX is the network device).
B. Runs checks on the collected transceiver statistics to determine the health of the transceiver (health check). If any health check fails, a corresponding entry is recorded in the error log. This helps customers and IBM support isolate transceiver issues early.
C. Makes transceiver statistics available through the AIX snap command (used to gather system configuration and statistics), because snap already collects detailed Ethernet statistics. This saves time when isolating transceiver issues seen by customers.
D. Collects transceiver statistics only when the adapter is running in “Dedicated” (non-SRIOV) mode.

Transceiver Statistics
The adapter driver now gathers transceiver statistics and displays them in a new section named “QSFP Transceiver Statistics”, added at the end of the detailed Ethernet statistics output. The following information is displayed in this section:
i. Transceiver Vital Product Data (VPD) Information:
a. Vendor Name
b. Vendor Part Number
c. Vendor Serial Number
d. Vendor Organizational Unique Identifier (OUI)
ii. Transceiver Statistics:
a. Wavelength
b. Module Speed
c. Media Type
d. Temperature
e. Voltage
f. Rx Power
g. Tx Power
Detailed explanation for each field:
| Field Name | Field Description |
|---|---|
| Vendor Name | Identifies the transceiver manufacturer. |
| Vendor Part Number | Vendor-assigned part number that identifies the transceiver model. |
| Vendor Serial Number | Identifies a specific transceiver module. |
| Vendor OUI | IEEE-assigned Organizationally Unique Identifier of the transceiver manufacturer. |
| Wavelength | Optical wavelength (in nanometers) at which the transceiver transmits light over fiber. |
| Module Speed | Maximum data rate that the transceiver supports (how fast it can transmit and receive data). |
| Media Type | Physical transmission medium of the transceiver. Transceivers are designed to transfer data over either fiber (optical) or copper. |
| Temperature | Current operating temperature of the transceiver, measured in degrees Celsius (°C). |
| Voltage | Real-time voltage level being supplied to the transceiver, measured in volts (V). |
| Rx Power | Received optical signal strength from the remote device, measured in milliwatts (mW). |
| Tx Power | Transmitted optical signal strength to the remote device, measured in milliwatts (mW). |
Currently, this feature is enabled only for the following adapters:
a) PCIe4 2-port 100 GbE RoCE x16 adapter (FC EC66 and EC67; CCIN 2CF3)
b) PCIe4 2-port 100 GbE RoCE x16 adapter (FC EC75 and EC76; CCIN 2CFB)
c) PCIe5 x16 2-port 200 GbE RoCE adapter (FC EC85 and EC86; CCIN EC2C)
Non-separable cables do not report wavelength, Rx power, or Tx power. When the adapter is connected with a non-separable cable, the driver does not display these fields.
Because this feature is not supported when the adapter is running in SRIOV (Shared) mode, transceiver statistics are not displayed for native VFs, vNICs, or HNV.
Sample Transceiver Statistics:
To collect transceiver statistics, run the detailed Ethernet statistics command:
|
entstat -d entX   (where entX is the device name)
|
The following is a snippet of the transceiver statistics collected for a QSFP-28 transceiver module:
|
QSFP Transceiver Statistics
-------------------------------------
Vendor Name : CISCO-AVAGO
Vendor Part Number : SFBR-89CDDZ-CS5
Vendor Serial Number : AVF2240S24W
Vendor oui : 00:17:6a
Wavelength : 850.000000 Module Speed : 104G
Media Type: 0XC (MPO 1x12 (Multifiber Parallel Optic))
Temperature : 33.375000 C [Range:-5 c - 75 c]
Voltage : 3.292000 V [Range : 2.97 v - 3.63 v]
rx1_power : 0.691300 mW [Range: 0.037200 – 3.467400]
rx2_power : 0.682100 mW [Range: 0.037200 – 3.467400]
rx3_power : 0.598800 mW [Range: 0.037200 – 3.467400]
rx4_power : 0.489400 mW [Range: 0.037200 – 3.467400]
tx1_power : 0.971000 mW [Range: 0.037200 – 3.467400]
tx2_power : 0.913200 mW [Range: 0.037200 – 3.467400]
tx3_power : 0.915800 mW [Range: 0.037200 – 3.467400]
tx4_power : 0.867200 mW [Range: 0.037200 – 3.467400]
|
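For scripted monitoring, the readings that carry a range can be extracted from this section with a small parser. The following is a minimal Python sketch, not part of the feature itself; the regular expression is inferred only from the sample output above (it is an assumption that other adapters and firmware levels print the same layout), and the name `parse_readings` is hypothetical.

```python
import re

# Pattern inferred from the sample output above (an assumption, not a
# documented format). It matches lines such as:
#   Temperature : 33.375000 C [Range:-5 c - 75 c]
#   rx1_power : 0.691300 mW [Range: 0.037200 – 3.467400]
READING = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9_ ]+?)\s*:\s*"
    r"(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z]*)\s*"
    r"\[\s*Range\s*:\s*(?P<low>-?\d+(?:\.\d+)?)\s*[A-Za-z]*\s*[-\u2013]\s*"
    r"(?P<high>-?\d+(?:\.\d+)?)\s*[A-Za-z]*\s*\]"
)


def parse_readings(text):
    """Return {field: {value, unit, low, high}} for every line with a range."""
    readings = {}
    for line in text.splitlines():
        m = READING.match(line)
        if m:
            readings[m["name"].strip()] = {
                "value": float(m["value"]),
                "unit": m["unit"],
                "low": float(m["low"]),
                "high": float(m["high"]),
            }
    return readings


# Three lines copied from the sample output above.
sample = """\
Temperature : 33.375000 C [Range:-5 c - 75 c]
Voltage : 3.292000 V [Range : 2.97 v - 3.63 v]
rx1_power : 0.691300 mW [Range: 0.037200 \u2013 3.467400]
"""

readings = parse_readings(sample)
for name, r in readings.items():
    print(f"{name}: {r['value']} {r['unit']} (range {r['low']} to {r['high']})")
```

Lines without a range, such as the vendor fields, are skipped by design; they would need a separate, simpler pattern.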
Health Check:
The health check generates an error when a transceiver parameter falls outside its defined limits.
The health check only reports errors to alert customers; it takes no corrective action, and the driver continues to send and receive traffic even after logging a transceiver error. Such an error does not necessarily indicate a transceiver hardware failure. Contact IBM Customer Support for further assistance and a detailed diagnosis.
The driver collects transceiver statistics only when the entstat command is run; consequently, health checks occur only at that time and are not performed periodically.
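The range comparison described above can be illustrated with a short Python sketch. This is not the driver's code; the function name `health_check`, the input layout, and the choice to treat a value exactly on a limit as in range are all assumptions made for illustration, and the out-of-range readings are hypothetical.

```python
def health_check(readings):
    """Return (field, "High" | "Low") for each reading outside its range.

    `readings` maps a field name to (value, low, high). A value equal to a
    limit is treated as in range (an assumption; the driver may differ).
    """
    failures = []
    for name, (value, low, high) in readings.items():
        if value > high:
            failures.append((name, "High"))
        elif value < low:
            failures.append((name, "Low"))
    return failures


readings = {
    "Temperature": (80.0, -5.0, 75.0),    # hypothetical over-temperature reading
    "Voltage": (3.292, 2.97, 3.63),       # value and range from the sample output
    "rx1_power": (0.01, 0.0372, 3.4674),  # hypothetical weak receive signal
}

failures = health_check(readings)
for name, kind in failures:
    print(f"{kind} {name} error")
```

Like the health check it illustrates, the sketch only reports violations and takes no action on the adapter.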
Transceiver health-check errors are recorded in the error log with the label “MLXCENT_TRANSCEIVER_ERR” and the description “Transceiver Error”. The following is a sample error logged when the transceiver temperature exceeded the specified range (high temperature):
|
# errpt
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
453B5292 0512051325 P H ent1 Transceiver error
453B5292 0512051225 P H ent1 Transceiver error
|
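A summary listing like the one above can be filtered for transceiver errors in a script. The following Python sketch is one possible approach under stated assumptions: that the TIMESTAMP column uses the mmddhhmmyy layout, that two-digit years belong to the 2000s, and that columns are separated by whitespace as in the sample. The function name `parse_errpt_summary` is hypothetical.

```python
from datetime import datetime


def parse_errpt_summary(text):
    """Parse errpt summary lines into dictionaries, skipping the header."""
    entries = []
    for line in text.splitlines():
        parts = line.split(None, 5)  # the description may contain spaces
        if len(parts) < 6 or parts[0] == "IDENTIFIER":
            continue
        ident, stamp, etype, eclass, resource, desc = parts
        if len(stamp) != 10 or not stamp.isdigit():
            continue
        # Assumed layout: mmddhhmmyy, with yy taken as a year in the 2000s.
        mm, dd, hh, mi, yy = (int(stamp[i:i + 2]) for i in range(0, 10, 2))
        entries.append({
            "identifier": ident,
            "time": datetime(2000 + yy, mm, dd, hh, mi),
            "type": etype,
            "class": eclass,
            "resource": resource,
            "description": desc.strip(),
        })
    return entries


# Summary lines copied from the sample above.
sample = """\
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
453B5292 0512051325 P H ent1 Transceiver error
453B5292 0512051225 P H ent1 Transceiver error
"""

entries = parse_errpt_summary(sample)
transceiver = [e for e in entries
               if e["description"].lower() == "transceiver error"]
for e in transceiver:
    print(e["time"].isoformat(), e["resource"], e["description"])
```

Matching on the description text rather than the identifier keeps the filter readable, at the cost of depending on the exact wording of the description.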
|
# errpt -aj 453B5292
LABEL: MLXCENT_TRANSCEIVER
IDENTIFIER: 453B5292
Date/Time: Mon May 12 07:05:15 CDT 2025
Sequence Number: 2642
Machine Id: 06C75A64R0A
Node Id: p10ndd1lp10
Class: H
Type: PERM
WPAR: Global
Resource Name: ent1
Resource Class: adapter
Resource Type: 131519103596
Location: U78D8.ND0.FG0B0AM-P0-C3-C0-T0
VPD:
2-Port PCIe4 100Gb RoCE Adapter x16:
Part Number.................01T740
EC Level....................P14618
FRU Number..................01T742
Serial Number...............YAS6Y874043Z
Feature Code/Marketing ID...EC66
Customer Card ID Number.....2CF3
Network Address.............903B052886
ROM Level.(alterable).......001600352000
Description
Transceiver error
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
File Name
Line: 1006 file: entcore_ioctl.c
MAC ADDRESS
903B052886
DEVICE DRIVER INTERNAL STATE
0030 0000 0002 0000 0000 0000 0001 0000 0000 0000 239B
PCI ETHERNET STATISTICS
0061 0818 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
TRACE RECORD SEQUENCE NUMBER
e:1:726 f:mlxcent_check_teciver r:0x0 s:0:0
e:1:175 f:mlxcent_check_teciver r:0x0 s:0:0
e:2:1106 f:entcore_ioctl r:0x9 s:0:0
NUMBER OF BYTES
160
SENSE DATA
061 0818 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
|
List of different transceiver errors:
The transceiver error type is encoded in the error log entry as follows:
DEVICE DRIVER INTERNAL STATE
XXXX XXXX XXXX XXXX YYYY YYYY YYYY YYYY ZZZZ ZZZZ ZZZZ ZZZZ
YYYY YYYY YYYY YYYY – This field represents the type of transceiver error
The different types of transceiver errors are listed in the table below:
| Error Type | Value of field YYYY YYYY YYYY YYYY | Details |
|---|---|---|
| High Temperature Error | 0000 0000 0000 0001 | If transceiver’s current temperature exceeds its operating range. |
| Low Temperature Error | 0000 0000 0000 0002 | If transceiver’s current temperature is below its operating range. |
| High Voltage Error | 0000 0000 0000 0003 | If transceiver’s current voltage exceeds its operating range. |
| Low Voltage Error | 0000 0000 0000 0004 | If transceiver’s current voltage is below its operating range. |
| High RX1 Power Error | 0000 0000 0000 0005 | If transceiver RX-1 (First Receive Channel) power exceeds its operating range. |
| Low RX1 Power Error | 0000 0000 0000 0006 | If transceiver RX-1 (First Receive Channel) power is below its operating range. |
| High RX2 Power Error | 0000 0000 0000 0007 | If transceiver RX-2 (Second Receive Channel) power exceeds its operating range. |
| Low RX2 Power Error | 0000 0000 0000 0008 | If transceiver RX-2 (Second Receive Channel) power is below its operating range. |
| High RX3 Power Error | 0000 0000 0000 0009 | If transceiver RX-3 (Third Receive Channel) power exceeds its operating range. |
| Low RX3 Power Error | 0000 0000 0000 0010 | If transceiver RX-3 (Third Receive Channel) power is below its operating range. |
| High RX4 Power Error | 0000 0000 0000 0011 | If transceiver RX-4 (Fourth Receive Channel) power exceeds its operating range. |
| Low RX4 Power Error | 0000 0000 0000 0012 | If transceiver RX-4 (Fourth Receive Channel) power is below its operating range. |
| High TX1 Power Error | 0000 0000 0000 0013 | If transceiver TX-1 (First Transmit Channel) power exceeds its operating range. |
| Low TX1 Power Error | 0000 0000 0000 0014 | If transceiver TX-1 (First Transmit Channel) power is below its operating range. |
| High TX2 Power Error | 0000 0000 0000 0015 | If transceiver TX-2 (Second Transmit Channel) power exceeds its operating range. |
| Low TX2 Power Error | 0000 0000 0000 0016 | If transceiver TX-2 (Second Transmit Channel) power is below its operating range. |
| High TX3 Power Error | 0000 0000 0000 0017 | If transceiver TX-3 (Third Transmit Channel) power exceeds its operating range. |
| Low TX3 Power Error | 0000 0000 0000 0018 | If transceiver TX-3 (Third Transmit Channel) power is below its operating range. |
| High TX4 Power Error | 0000 0000 0000 0019 | If transceiver TX-4 (Fourth Transmit Channel) power exceeds its operating range. |
| Low TX4 Power Error | 0000 0000 0000 0020 | If transceiver TX-4 (Fourth Transmit Channel) power is below its operating range. |
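The mapping in the table above lends itself to a lookup when triaging many error log entries. The Python sketch below builds the table programmatically and decodes the YYYY field from a DEVICE DRIVER INTERNAL STATE string. It treats the codes as literal digit strings exactly as printed in the table, without interpreting them as hexadecimal or decimal numbers. The function name `decode_internal_state` is hypothetical, and the example input is a made-up string laid out per the XXXX/YYYY/ZZZZ description, not copied from a real log.

```python
# Error names in the order they appear in the table above.
NAMES = ["High Temperature", "Low Temperature", "High Voltage", "Low Voltage"]
for direction in ("RX", "TX"):
    for lane in range(1, 5):
        NAMES += [f"High {direction}{lane} Power", f"Low {direction}{lane} Power"]

# Codes are kept as the literal strings shown in the table (0001 ... 0020).
ERROR_TYPES = {
    f"0000 0000 0000 {i:04d}": f"{name} Error"
    for i, name in enumerate(NAMES, start=1)
}


def decode_internal_state(state):
    """Return the error type encoded in groups 5 to 8 (the YYYY field)."""
    groups = state.split()
    if len(groups) != 12:
        raise ValueError("expected 12 four-digit groups")
    code = " ".join(groups[4:8])
    return ERROR_TYPES.get(code, f"Unknown ({code})")


# Hypothetical input following the XXXX / YYYY / ZZZZ layout.
state = "0030 0000 0002 0000 0000 0000 0000 0001 0000 0000 0000 239B"
print(decode_internal_state(state))
```

Raising an error on an unexpected group count is a deliberate choice: a string that does not match the documented layout is better rejected than decoded from the wrong offset.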
This feature is available starting with AIX 7300-04 and VIOS 4.1.2.0.