
PowerVC General Troubleshooting Guide - Capturing diagnostic data

By Becky Dimock posted Fri September 18, 2015 01:42 AM


By: Arun Mani

If you experience a problem with any aspect of PowerVC, you might need to capture diagnostic data for debugging. This is a general troubleshooting guide for common issues with the PowerVC management server; it does not cover issues that are specific to individual components. To troubleshoot component-specific problems, please refer to the PowerVC Knowledge Center, which has more details on the problems and the workarounds available.

Capturing diagnostic data by using the powervc-diag command:

PowerVC is built on OpenStack, and because OpenStack services are distributed, capturing diagnostic data for debugging a failure is a challenge. Manually collecting logs and configuration information for all of the components is expensive and time-consuming. To avoid this overhead, PowerVC has a feature that collects diagnostic data for the entire PowerVC environment (both the management server and the managed hosts).

If there is a failure, you just have to invoke the data collection by running the powervc-diag command. This captures the current state of the management server and managed hosts by collecting all of the service logs, service configuration files, database information, and everything else that the support team needs to analyze the problem and provide a quick solution. Because powervc-diag provides an option to collect the logs from hosts such as HMC or PowerKVM, this tool gives the support team truly end-to-end information.

After capturing the debug data, the powervc-diag command creates a zip file (the name of the file is suffixed with the time stamp). By default, this is stored in the /tmp directory. This command has various options that specify what information to capture. You can run this command with -h for more details. These are the options that are available:

# powervc-diag -h
usage: powervc-diag [-h] [-v] [-u USER] [-o OUTPUT_DIR] [-f ARCHIVE_FNAME]
                    [-n] [-w MAXWAIT] [-c [KVMINFO]] [-p]
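
If you run the collection from a script, the invocation can be assembled along these lines. This is only a minimal sketch: build_diag_command and run_diag are hypothetical helper names, and the sketch uses only the -o and -f flags that appear in the usage text above. Check powervc-diag -h on your system for what each option does before relying on it.

```python
import subprocess


def build_diag_command(output_dir=None, archive_name=None):
    """Assemble a powervc-diag argument list (hypothetical helper).

    Only the -o OUTPUT_DIR and -f ARCHIVE_FNAME flags from the usage
    text are used; with no arguments the bare command is returned.
    """
    cmd = ["powervc-diag"]
    if output_dir:
        cmd += ["-o", output_dir]
    if archive_name:
        cmd += ["-f", archive_name]
    return cmd


def run_diag(output_dir=None, archive_name=None):
    """Run powervc-diag and raise CalledProcessError if it fails."""
    return subprocess.run(build_diag_command(output_dir, archive_name), check=True)
```

For example, run_diag(output_dir="/var/tmp/diag") would store the archive in /var/tmp/diag (an illustrative path) instead of the default /tmp directory.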

If you don't provide any options, the command collects all diagnostic data for all of the services. The categories of diagnostic data that are captured by the powervc-diag command are listed below:

Conf files:

The configuration files for all PowerVC services can be found under /tmp/powervc-diag_timestamp/etc/<component_name>. These include *.conf and *.ini files that provide information on the configuration used while loading the different PowerVC services.
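
To get a quick overview of which configuration files were captured, something like the following sketch can help. It assumes that the archive has already been extracted and that the etc/<component_name> layout described above holds; list_config_files is a hypothetical helper name, not part of PowerVC.

```python
from pathlib import Path


def list_config_files(diag_root):
    """Return {component: [relative paths]} for *.conf and *.ini files.

    diag_root is the extracted diagnostic directory (assumed layout:
    <diag_root>/etc/<component_name>/...).
    """
    etc_dir = Path(diag_root) / "etc"
    result = {}
    if not etc_dir.is_dir():
        return result
    for component in sorted(p for p in etc_dir.iterdir() if p.is_dir()):
        files = sorted(
            f.relative_to(component).as_posix()
            for f in component.rglob("*")
            if f.is_file() and f.suffix in {".conf", ".ini"}
        )
        if files:
            result[component.name] = files
    return result
```

Calling list_config_files on the extracted directory returns a dictionary keyed by component name, which makes it easy to spot a component whose configuration was not captured.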

Log files:

This is a critical section of the zip file, where all of the service logs are captured and stored. You can find them under /tmp/powervc-diag_timestamp/var/log/<component_name>. This directory contains the service-related logs for the component. Any failure is logged as ERROR with a complete stack trace. Additionally, PowerVC captures the system logs from /var/log/messages, which might contain critical information about service startup problems and other kernel-related information that should help in debugging.
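
Because failures are logged as ERROR, a first pass over a component's logs can be automated. The sketch below assumes an extracted archive with the var/log/<component_name> layout described above and assumes that the log files end in .log; find_error_lines is a hypothetical helper name.

```python
from pathlib import Path


def find_error_lines(diag_root, component):
    """Return a list of (file name, line number, text) for ERROR lines.

    Scans <diag_root>/var/log/<component>/*.log (the .log suffix is an
    assumption) and keeps every line that contains the string ERROR.
    """
    log_dir = Path(diag_root) / "var" / "log" / component
    hits = []
    if not log_dir.is_dir():
        return hits
    for log_file in sorted(log_dir.glob("*.log")):
        # errors="replace" keeps the scan going if a log has odd bytes
        with log_file.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if "ERROR" in line:
                    hits.append((log_file.name, number, line.rstrip()))
    return hits
```

The line numbers make it easy to open the log at the right place and read the stack trace that follows each ERROR entry.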

General information:

The /info folder that is created as part of the zip contains some general information on individual components to help with problem analysis. For instance, the general info data for the nova service will contain details like hypervisor HA policy, hypervisor run-time policy, hypervisor statistics data, etc., to help troubleshoot any problems with the managed resources. Additionally, this section also contains the services information, user information, processes information, etc., from the management server.

DB snapshot:

As mentioned previously, the powervc-diag command takes a snapshot of the current state of the database in the PowerVC management server for all its services and managed resources. The exported database tables can be found under the /db folder in the zip file.

Following these steps will help you capture diagnostic data to troubleshoot a problem or to submit to support. If you have any questions about this topic, please comment below. Watch this space for more information about troubleshooting your environment. In the meantime, don't forget to follow us on LinkedIn, Twitter, and Facebook!