WebSphere Application Server & Liberty

Lessons from the field #7: Logging WebSphere Application Server traditional performance statistics

By Kevin Grigorenko posted Wed July 28, 2021 08:00 AM

  

If you experience production issues with WebSphere Application Server (WAS) traditional, it's often useful to log performance statistics to a file for later analysis. This includes things such as thread pool utilization, application response times, and so on. Running without such monitoring data is like flying an airplane with some of the gauges broken.

Some of the major ways of monitoring WAS performance statistics are:

All of the above techniques ultimately get their data from the WAS Performance Monitoring Infrastructure (PMI). This is the component inside WAS that gathers configured statistics aggregated over approximately 30-second intervals, and makes this data available to the monitoring tools. By default, PMI is configured in "Basic" mode which gathers core statistics and has an overhead of about 2%.

Logging PMI data to files

If you don't use an IBM or 3rd party monitoring product, or Prometheus, the rest of this article shows you how to log performance data to files as part of the built-in WAS product. Even if you do use 3rd party monitoring products that already log such data, this technique may still be useful because IBM Support generally does not analyze the exported proprietary data formats from 3rd party monitoring products.

Prerequisites

First, ensure PMI is enabled. PMI is enabled by default.

  1. Navigate to each application server's PMI configuration:

    Administrative Console } Servers } Server Types } WebSphere application servers } $SERVER } Performance } Performance Monitoring Infrastructure (PMI)

  2. Ensure "Enable Performance Monitoring Infrastructure" is checked

On the same page, configure PMI configuration for your application needs. The default is "Basic" mode with generally useful statistics. You may consider using "Custom" mode with additional statistics based on your application usage.

Starting and stopping PMI logs through the administrative console

The following instructions start and stop PMI logs that are written to XML files in $WAS/profiles/$PROFILE/logs/tpv/

  1. Administrative Console } Monitoring and Tuning } Performance Viewer } Current Activity
  2. Select all application servers you want to log and click "Start Monitoring"
  3. Click each monitored application server link and:
    1. Click on $SERVER } Settings } Log
      1. Duration = 999999
      2. Maximum File Size = 50
      3. Maximum Number of Historical Files = 5
      4. Log Output Format = XML
      5. Click Apply
    2. Click $SERVER } Summary Reports } Servlets
    3. Click "Start Logging"
      1. Note that this starts logging for all PMI data, not just for servlets
  4. Reproduce the problem
  5. To stop logging, go back into the links above and click "Stop Logging" for each server

      Note that there is no way to automatically start PMI logging when a JVM is started. If desired, logging must be restarted after each JVM restart.

      Viewing PMI logs

      The administrative console includes a function that reads and replays the PMI XML files gathered above:

      1. Administrative Console } Monitoring and Tuning } Performance Viewer } View logs
      2. Choose the XML file from your computer or from the server
      3. Click "View Log"
      4. Expand "Performance Modules" and check the metrics you would like to see
      5. Click "View Module(s)"
      6. In the following example, we can see a spike in the WebContainer average concurrently active threads (ActiveCount) at about 4:22PM:

      WAS TPV PMI XML log viewer screenshot


      This viewer is the same as what you would see when live monitoring with the addition of play, pause, fast-forward, and rewind buttons at the top (the above screenshot only shows the rewind button as the end of the file has been reached). By default, the viewer starts in play mode and iteratively shows more data in simulated real time.

      Note that you do not need to use the same administrative console to view the data as the cell that produced it, although it's best to use the same version and fixpack of WAS if possible. This technique allows you to gather PMI data in production and load it in a pre-production cell to avoid adding overhead to production.

      Discussion

      WAS PMI provides aggregated statistics for configured metrics that help you understand the overall behavior and performance of your applications. WAS TPV PMI logging may be used to log such statistics to files which may replayed through an administrative console or shared with IBM Support (when investigating potential product defects).

      Note that this discussion focuses on low-overhead, aggregated performance statistics. This approach is particularly useful to investigate general health issues and during performance tuning exercises. For more narrow diagnostic situations, alternatives are usually easier to use such as thread dumps, hung thread detection, diagnostic plans, request metrics, sampling profilers, database timing, HTTP access logs, and so on.

      Starting and stopping PMI logs through wsadmin

      The below example wsadmin script is provided as-is and without any support and it may be used to automate starting and stopping PMI logging.



      See our team's previous post in the Lessons from the field series: IBM Java and OpenJ9 Just-In-Time Compiler Tuning

      #app-platform-swat #websphere #WebSphereApplicationServer


      tpvlogging.py


      # Start, stop, query, or configure TPV logging on a set of servers
      # Example: wsadmin -username wsadmin -password wsadmin -lang jython -f tpvlogging.py -userprefs wsadmin -action start -server server1
      
      def usage():
      	print "usage: wsadmin -lang jython -f tpvlogging.py -action [start|stop|list|setlevel] -userprefs USER [-node NODE] [-server SERVER] [-pmilevel none|basic|extended|all]"
      	print "       -userprefs is required and you can just pass in the same user as -username for wsadmin, or any name otherwise"
      	print "       -pmilevel is only used with -action setlevel. Valid values are none, basic, extended, all"
      	print "       If neither -node nor -server are specified, then all application servers on all nodes will be executed"
      	print "       If -node is specified but -server isn't, then all application servers on the node will be executed"
      	print "       This script does not yet support a custom statistics set for -action setlevel"
      	sys.exit()
      
      import sys
      import com.ibm.ws.tpv.engine.UserPreferences as UserPreferences
      import com.ibm.ws.tpv.engine.utils.ServerBean as ServerBean
      import jarray
      import javax.management as mgmt
      
      sType = "APPLICATION_SERVER"
      action = "start"
      targetNode = ""
      targetApplicationServer = ""
      user = ""
      filename = "tpv"
      duration = 999999
      fileSize = 52428800
      numFiles = 5
      outputType = "xml" # or "bin"
      bufferSize = 40
      pmilevel = "extended" # only if -action setlevel
      help = 0
      refreshRate = 30
      affectedCount = 0
      verbose = 0
      
      l = len(sys.argv)
      i = 0
      while i < l:
      	arg = sys.argv[i]
      	if arg == "-help" or arg == "-h" or arg == "-usage" or arg == "-?":
      		help = 1
      	if arg == "-action":
      		action = sys.argv[i + 1]
      	if arg == "-node":
      		targetNode = sys.argv[i + 1]
      	if arg == "-server":
      		targetApplicationServer = sys.argv[i + 1]
      	if arg == "-userprefs":
      		user = sys.argv[i + 1]
      	if arg == "-filename":
      		filename = sys.argv[i + 1]
      	if arg == "-duration":
      		duration = int(sys.argv[i + 1])
      	if arg == "-filesize":
      		fileSize = int(sys.argv[i + 1])
      	if arg == "-numfiles":
      		numFiles = int(sys.argv[i + 1])
      	if arg == "-buffersize":
      		bufferSize = int(sys.argv[i + 1])
      	if arg == "-refreshrate":
      		refreshRate = int(sys.argv[i + 1])
      	if arg == "-outputtype":
      		outputType = sys.argv[i + 1]
      	if arg == "-pmilevel":
      		pmilevel = sys.argv[i + 1]
      	if arg == "-verbose":
      		verbose = 1
      	i = i + 1
      
      if help == 1:
      	usage()
      	
      if len(user) == 0:
      	print ""
      	print "ERROR: -userprefs must be specified (see usage below)"
      	print ""
      	usage()
      
      def getExceptionText(typ, value, tb):
      	value = `value`
      	sd = `tb.dumpStack()`
      	sd = sd.replace("\\\\","/")
      	i = sd.rfind("  File ")
      	j = sd.rfind(", line ")
      	k = sd.rfind(", in ")
      	locn = ""
      	if(i>0 and j>0 and k>0):
      		file = sd[i+7:j]
      		line = sd[j+7:k]
      		func = sd[k+4:-3]
      		locn = "Function="+func+"  Line="+line+"  File="+file
      	return value+" "+locn
      
      def convertToList( inlist ):
      	outlist = []
      	clist = None
      	if (len(inlist) > 0): 
      		if (inlist[0] == '[' and inlist[len(inlist) - 1] == ']'): 
      			if (inlist[1] == "\"" and inlist[len(inlist)-2] == "\""):
      				clist = inlist[1:len(inlist) -1].split(")\" ")
      			else:
      				clist = inlist[1:len(inlist) - 1].split(" ")
      		else:
      			clist = inlist.split(java.lang.System.getProperty("line.separator"))
      
      	if clist != None:
      		for elem in clist:
      			elem = elem.rstrip();
      			if (len(elem) > 0):
      				if (elem[0] == "\"" and elem[len(elem) -1] != "\""):
      					elem = elem+")\""
      				outlist.append(elem)
      	return outlist
      
      def listNodes():
      	nodes = AdminConfig.list("Node")
      	nodeList = convertToList(nodes)
      	return nodeList
      
      def listServers(serverType="", nodeName=""):
      	optionalParamList = []
      	if (len(serverType) > 0):
      		optionalParamList = ['-serverType', serverType]
      	if (len(nodeName) > 0):
      		node = AdminConfig.getid("/Node:" +nodeName+"/")
      		optionalParamList = optionalParamList + ['-nodeName', nodeName]
      	servers = AdminTask.listServers(optionalParamList)
      	servers = convertToList(servers)
      	newservers = []
      	for aServer in servers:
      		sname = aServer[0:aServer.find("(")]
      		nname = aServer[aServer.find("nodes/")+6:aServer.find("servers/")-1]
      		sid = AdminConfig.getid("/Node:"+nname+"/Server:"+sname)
      		if (newservers.count(sid) <= 0):
      			newservers.append(sid)
      	return newservers
      
      print "Action: " + action
      print "User: " + user
      print "Node: " + targetNode
      print "Server: " + targetApplicationServer
      print "File name: " + filename
      print "Duration: " + str(duration)
      print "File Size: " + str(fileSize)
      print "Historical Files: " + str(numFiles)
      print "Output type: " + outputType
      print "Refresh Rate: " + str(refreshRate)
      
      nodeList = listNodes()
      
      for nodeObject in nodeList:
      
      	nodeName = nodeObject.split("(")[0]
      
      	if len(targetNode) > 0 and targetNode.lower() != nodeName.lower():
      		print "Skipping node " + nodeName + " because it did not match targetNode"
      		continue
      
      	print ""
      	print "Processing node: " + nodeName
      
      	try:
      		# build list of Application Servers in the Node
      		serverList = listServers(sType,nodeName)
      	except:
      	    	typ, val, tb = sys.exc_info()
      		value = `val`
      		sd = `tb.dumpStack()`
      		sd = sd.replace("\\\\","/")
      		print "Could not process node. Probably the DMGR (which is ok to skip)? Continuing with the other nodes... " + value + " " + sd
      		continue
      
      	if verbose:
      		print "Number of servers: " + str(len(serverList))
      
      	for serverObject in serverList:
      		serverName = serverObject.split("(")[0]
      
      		if len(targetApplicationServer) > 0 and targetApplicationServer.lower() != serverName.lower():
      			if verbose:
      				print "Skipping server " + serverName + " (node " + nodeName + ")"
      			continue
      
      		prefs = UserPreferences()
      		prefs.setServerName(serverName)
      		prefs.setNodeName(nodeName)
      		prefs.setLoggingDuration(duration)
      		prefs.setLogFileSize(fileSize)
      		prefs.setNumLogFiles(numFiles)
      		prefs.setTpvLogFormat(outputType)
      		prefs.setLogFileName(filename)
      		prefs.setBufferSize(bufferSize)
      		prefs.setUserId(user)
      		prefs.setRefreshRate(refreshRate)
      
      		params = [prefs]
      		sig = ["com.ibm.ws.tpv.engine.UserPreferences"]
      
      		target = "node=" + nodeName
      		name = AdminControl.completeObjectName("type=TivoliPerfEngine," + target + ",*")
      		mbeanObjectName = mgmt.ObjectName(name)
      
      		display = nodeName + "\\" + serverName
      
      		if action == "start":
      			print "Calling TivoliPerfEngine.monitorServer on " + display
      			AdminControl.invoke_jmx(mbeanObjectName, "monitorServer", params, sig)
      
      			print "Calling TivoliPerfEngine.startLogging on " + display
      			AdminControl.invoke_jmx(mbeanObjectName, "startLogging", params, sig)
      
      			affectedCount = affectedCount + 1
      
      		elif action == "stop":
      			print "Calling TivoliPerfEngine.stopLogging on " + display
      			AdminControl.invoke_jmx(mbeanObjectName, "stopLogging", params, sig)
      
      			print "Calling TivoliPerfEngine.disableServer on " + display
      			AdminControl.invoke_jmx(mbeanObjectName, "disableServer", params, sig)
      
      			affectedCount = affectedCount + 1
      
      		elif action == "list":
      			print "Monitored Servers (by " + user + ")"
      			print "======================"
      			servers = AdminControl.invoke(name, "getMonitoredServers", user)
      			if len(servers) > 0:
      				isLoggingSig = ["com.ibm.ws.tpv.engine.utils.ServerBean"]
      				for server in servers.split("\n"):
      					pieces = server.split(".")
      					bean = ServerBean(pieces[0], pieces[1])
      					isLoggingParams = [bean]
      					res = AdminControl.invoke_jmx(mbeanObjectName, "isServerLogging", isLoggingParams, isLoggingSig)
      					perftarget = "node=" + nodeName + ",process=" + pieces[1]
      					perfname = AdminControl.completeObjectName("type=Perf," + perftarget + ",*")
      					print server + " ; Logging=" + str(res) + " ; Level=" + AdminControl.invoke(perfname, "getStatisticSet")
      			break # otherwise we'll do the list for each server in the node -- TODO break outter loop too?
      
      		elif action == "setlevel":
      			target = target + ",process=" + serverName
      			perfname = AdminControl.completeObjectName("type=Perf," + target + ",*")
      			# none, basic, extended, all, custom
      			print "Setting PMI level to " + pmilevel + " on " + serverName
      			AdminControl.invoke(perfname, "setStatisticSet", pmilevel)
      			AdminControl.invoke(perfname, "savePMIConfiguration")
      
      			affectedCount = affectedCount + 1
      
      		elif action == "debug":
      			print "Debug"
      
      		else:
      			print "Unknown action " + action
      
      print ""
      print "Script finished. " + str(affectedCount) + " servers affected."
      ​
      ​​​
      0 comments
      135 views

      Permalink