High Performance Computing

Visualizing aggregated cluster health information and other data in IBM® Spectrum Conductor using pandas

By Leigh Chen posted Tue May 25, 2021 03:38 PM

  

Introduction

pandas is a fast, powerful, and flexible open source data analysis tool, which is built on top of the Python programming language. We will show how easy it is to use pandas in IBM Spectrum Conductor to run data analysis and visualization on cluster health information, and other data.

Key benefits:

  1. Quick to install and run.
  2. Flexible and customizable.
  3. Wide range of applications.

Part 1. Getting started

You can complete the entire installation by using the built-in tools in IBM Spectrum Conductor.

  1. Install the required packages using Anaconda management in IBM Spectrum Conductor:
    1. From the cluster management console, create an Anaconda or Miniconda distribution instance with a conda environment.
    2. Add the following packages to your conda environment from the default conda channel. For more information, see Managing packages within conda environments.
      • pandas
      • requests
      • lxml
  2. Deploy an instance group and enable the built-in Jupyter notebook. Specify the Anaconda or Miniconda distribution instance. For more information, see Enabling notebooks for an instance group.
  3. Open Jupyter and create a new notebook with Python 3. For more information, see Open a notebook and create a note.
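
Before moving on, it can help to confirm that the notebook kernel actually sees the packages added to the conda environment. The following is a minimal sketch that uses only the Python standard library; the variable names required and availability are illustrative and are not part of IBM Spectrum Conductor.

```python
import importlib.util

# Packages this post adds to the conda environment
required = ["pandas", "requests", "lxml"]

# find_spec returns None when a top-level package cannot be found,
# so nothing is imported and nothing fails if a package is missing
availability = {name: importlib.util.find_spec(name) is not None for name in required}

for name, found in availability.items():
    print(f"{name}: {'available' if found else 'missing'}")
```

If a package is reported as missing, the notebook is probably not using the conda environment that you configured for the instance group.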

Part 2. Example: Aggregated cluster health information

The dashboard of the cluster management console displays cluster health information for the entire cluster.

However, you might often be more interested in a specific resource group where your workload is running, for example ComputeHosts. This example shows how to use pandas to visualize host status and CPU usage for a specific resource group.
  1. To generate the data, add and run the following code in a cell, where resource_group is the resource group that you want to analyze, and rest_url is the REST API URL of your cluster. You can find the REST API URL by running the command: egosh client view REST_BASE_URL_1
    import pandas as pd
    import lxml, requests
    
    # Define cluster information
    resource_group = "ComputeHosts"
    rest_url = "https://www.myhost.com:8543/platform/rest/"
    EGO_TOP = "/opt/ibm/spectrumcomputing"
    auth = ("Admin", "Admin")
    
    headers = {"Accept":"application/json", "Content-Type":"application/json"}
    verify = EGO_TOP + "/security/cacert.pem"
    
    # Send request to get member hosts under the resource group
    request_url = rest_url + "ego/v1/resourcegroups/" + resource_group + "/members"
    resp_hosts = requests.get(request_url, auth=auth, headers=headers, verify=verify)      
    
    # Create dataframe in pandas to store the returned information
    df_hosts = pd.DataFrame(resp_hosts.json())
    df_hosts.set_index('hostname', inplace=True)
    
    # Add new columns to dataframe for host status and CPU utilization
    df_hosts['CPU Utilization'] = ""
    df_hosts['Status'] = ""
    
    # Fill in CPU and status data for each host in dataframe
    for i, row in df_hosts.iterrows():
        
        # Send request to get host details
        request_url = rest_url + "ego/v1/hosts/" + i
        resp_host_detail = requests.get(request_url, auth=auth, headers=headers, verify=verify)
    
        # Get CPU and host status data for the host
        attr = resp_host_detail.json()['attributes']
        ut = list(filter(lambda x:x["name"]=="ut",attr))[0]['value']
        status = list(filter(lambda x:x["name"]=="hostStatus",attr))[0]['value']
        
        # Fill in data to dataframe
        df_hosts.at[i, 'CPU Utilization'] = ut
        df_hosts.at[i, 'Status'] = status
  2. To view the data, add and run the following code in a cell:
    df_hosts
  3. To analyze the data, add and run the following code in a cell:
    # Keep only hosts in "ok" status; copy so that new columns can be added safely
    df_hosts_ok = df_hosts[df_hosts['Status'] == 'ok'].copy()
    
    # Convert CPU utilization to numerical format, then sort the hosts by it
    df_hosts_ok['CPU Utilization'] = df_hosts_ok['CPU Utilization'].astype(float)
    df_hosts_ok = df_hosts_ok.sort_values('CPU Utilization')
    
    # Calculate slot utilization for each host and add the data to a new column
    df_hosts_ok['Slot Utilization'] = (df_hosts_ok['numslots'] - df_hosts_ok['freeslots']) / df_hosts_ok['numslots']
  4. To visualize the data, add and run the following code in a cell:
    # Plot host status summary for the resource group
    df_hosts['Status'].value_counts().plot.pie()
     
    # Plot bar graph for host CPU in the resource group
    df_hosts_ok[['CPU Utilization']].plot.barh(figsize=(10,10),xlim=(0,1))
    
    # Plot histogram for CPU utilization in the resource group
    df_hosts_ok['CPU Utilization'].plot.hist(range=(0,1), bins=5)

    A pie chart of host status information for the resource group is displayed:
    A bar chart of CPU usage for each host in the resource group in descending order is displayed.
    A histogram of CPU usage for hosts in the resource group is displayed:

  5. To customize the visualization to display slightly more details, add and run the following code in a cell:
    # Plot bar graph for host CPU and slot utilization in the resource group
    df_hosts_ok[['CPU Utilization','Slot Utilization']].plot.barh(figsize=(10,10),xlim=(0,1))
    
    # Plot histogram for CPU utilization in the resource group, with number of bins expanded to 10
    df_hosts_ok['CPU Utilization'].plot.hist(range=(0,1), bins=10)

    A bar chart of CPU usage for each host in the resource group in descending order is displayed along with its slot utilization. You can see whether high CPU usage is correlated to high slot usage:

    A histogram of CPU usage for hosts in the resource group is displayed with more detailed breakdown:
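
The data-generation step above looks up the ut and hostStatus values by filtering the attributes list and taking the first match, which raises an IndexError if a host does not report that attribute. The following is a minimal sketch of a more forgiving lookup. The helper get_attribute and the sample payload are hypothetical; the payload only mimics the list of name/value pairs that the code above reads and is not taken from a real cluster.

```python
def get_attribute(attributes, name, default=None):
    """Return the 'value' of the first attribute whose 'name' matches, else default."""
    return next((a["value"] for a in attributes if a["name"] == name), default)

# Hypothetical payload shaped like the 'attributes' list read in the loop above
sample_attributes = [
    {"name": "hostStatus", "value": "ok"},
    {"name": "ut", "value": "0.35"},
]

ut = get_attribute(sample_attributes, "ut")
status = get_attribute(sample_attributes, "hostStatus")

# An attribute that is absent falls back to the default instead of raising
missing = get_attribute(sample_attributes, "mem", default="")

print(ut, status, repr(missing))
```

Inside the loop, a call such as get_attribute(attr, "ut", default="") could stand in for the filter expression, so that one host with an incomplete response does not stop the whole cell.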

Part 3. More applications

Combining pandas with the REST APIs in IBM Spectrum Conductor opens up many more applications for data analysis. For example, to analyze daily application failures, add and run the following code in a cell:

import pandas as pd
import lxml, requests

# Define cluster information
rest_url = "https://www.myhost.com:8643/platform/rest/"
EGO_TOP = "/opt/ibm/spectrumcomputing"
auth = ("Admin", "Admin")

headers = {"Accept":"application/json", "Content-Type":"application/json"}
verify = EGO_TOP + "/security/cacert.pem"

# Send request to get all applications submitted
request_url = rest_url + "conductor/v1/instances/applications"
resp = requests.get(request_url, auth=auth, headers=headers, verify=verify) 

# Create dataframe in pandas to store the returned information
df = pd.DataFrame(resp.json())

# Convert UNIX time to date
df['endtime'] = pd.to_datetime(df['endtime'],unit='ms').dt.date

# Define filters for applications in failed and finished states
filt1 = df['state'] == 'FAILED'
filt2 = df['state'] == 'FINISHED'

# Count the number of applications in each state per day and store the counts
failed = df[filt1].endtime.value_counts().sort_index()
finished = df[filt2].endtime.value_counts().sort_index()

# Merge into a single dataframe and label the columns
results = pd.concat([failed,finished],axis=1)
results.columns = ['Failed apps', 'Finished apps']

# Plot bar graphs
results.plot.bar(subplots=True)

Bar charts display the number of failed applications for each day alongside the number of finished applications. Notice the high number of failed applications on both 04-17 (April 17) and 04-19 (April 19). However, the failure rate is much higher on 04-19 (April 19).
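
To put a number on the failure rate rather than reading it off the charts, you could add one more column. The following is a minimal sketch that assumes a dataframe with the same two columns as results; the counts are made up for illustration and are not the data behind the charts described above.

```python
import pandas as pd

# Made-up daily counts standing in for the `results` dataframe built above
results = pd.DataFrame(
    {"Failed apps": [2, 8, 1, 9], "Finished apps": [40, 32, 39, 9]},
    index=["04-16", "04-17", "04-18", "04-19"],
)

# A day that appears in only one of the two counts becomes NaN after pd.concat,
# so treat those gaps as zero before dividing
counts = results.fillna(0)

# Failure rate = failed / (failed + finished) for each day
total = counts["Failed apps"] + counts["Finished apps"]
failure_rate = (counts["Failed apps"] / total).rename("Failure rate")

print(failure_rate.round(2))
```

In a notebook, failure_rate.plot.bar() would chart the rate per day in the same way as the other plots in this post.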

 

Give it a try and tell us what you think by downloading IBM Spectrum Conductor 2.5.0 on Passport Advantage or the evaluation version! We hope you are as excited as we are about this new release!

Log in to this community page to comment on this blog post. We look forward to hearing from you on the new features, and what you would like to see in future releases.

 

 


#SpectrumComputingGroup
Comments

Tue May 25, 2021 08:16 PM

This is also a good one for a vlog.