Optimizing the use of IHS Access logging features for WebSphere Commerce sites
Andres Voldman and Li Liang
Published on February 15, 2013
The IBM® HTTP Server (IHS) records all browser requests in the access.log file under the <IHSInstalldir>/logs directory. The information in the access log is critical for the resolution of session management issues, security problems, and performance problems. This article presents practical tips that take advantage of the powerful IHS logging capabilities and enable you to resolve issues faster.
Setting up custom access logging
IHS (and Apache®) provide powerful and flexible logging capabilities. By using the main IHS configuration file httpd.conf, you have full control of what logs are created, which condition requests are logged under, and which request and response attributes are recorded.
The standard way to define a new logger is to create a format and assign the format a nickname. The example shows the format named "common" which is defined by default in the httpd.conf under <IHSInstalldir>/conf directory:
LogFormat "%h %l %u %t \"%r\" %>s %b" common
Table 1. Comments in the "common" format
For a detailed list of the format strings that can be included with the LogFormat directive, see the official Apache product documentation listed in the Related topics.
The CustomLog directive can be used to specify the location of the access log, and the format used. The CustomLog directive example uses the common format: CustomLog "/httplogs/access.log" common
You can define the custom log directive at multiple levels. For example, if defined inside a virtual host, only requests for that virtual host are logged. The directive can also use an environment variable to control whether to log a particular request. For more information about the CustomLog directive, see the official Apache product documentation listed in the Related topics.
When you use common format by specifying CustomLog "/httplogs/access.log" common, IHS creates a log similar to Listing 1:
Listing 1. Common format by specifying CustomLog "/httplogs/access.log" common.
1
2
3
4
5
|
9.26.190.217 - - [20/Jul/2012:15:08:21 -0400] "GET
/webapp/wcs/stores/servlet/en/madisons/electrical HTTP/1.1" 200 334735
9.26.190.217 - - [20/Jul/2012:15:36:20 -0400] "GET
/wcsstore/ExtendedSitesCatalogAssetStore/images/Brake_Pad1-70.png
HTTP/1.1" 200 3322
|
Troubleshooting Tips
Now that you have custom logging setup, here are several tips that will help you take the most advantage of IHS access logging. These tips are presented in no particular order.
Tip #1 - Logging response times
Logging the response time for each request is useful when debugging performance or timing issues.
By adding %D to the access log format, the server saves the response time in microseconds. For example:
LogFormat "%h %l %u %t \"%r\" %>s %b %D" common
One microsecond is a millionth of a second. This metric is typically too small to be practical; most people prefer to work with milliseconds instead. ( milliseconds = microseconds / 1000 )
When using the values logged with %D, keep in mind that these numbers include a network component. The %D format string counts the time until all the response bytes are passed to the TCP layer in the IHS system. If there is a network or configuration issue, the values logged in the access log will be much higher than the actual time used by the WebSphere Commerce JVMs to execute the request and return it to the plug-in component in IHS.
In general %D logging is a good reference of the JVMs performance, if you suspect the numbers are too high compared to the expected JVM performance, you should enable additional monitoring in the JVM and network layers.
For more information on %D access log format, refer to the IHS official product documentation listed in the Related topics.
Tip #2 - Logging the JVM that handled the request
During the troubleshooting process, it may help to know which JVM handles a particular request. There are multiple ways of logging the JVM processesing a request. Two of the most popular ways are to handle the %{WAS}e environment variable, or to use the JESSIONID cookie.
Using the %{WAS}e environment variable
For requests that are handled with the WebSphere plug-in, the plug-in module creates an environment variable called WAS which contains the server that handled the request. The WAS environment variable is logged in the access logs by adding the following to the LogFormat statement: %{WAS}e.
LogFormat "%h %l %u %t \"%r\" %>s %b %{WAS}e" common
When the %{WAS}e string is added, the hostname and transport port number of the JVM used to serve the request is logged. For example, "app1.torolab.ibm.com:9091".
Using the JSESSIONID cookie
JSESSIONID is the default name of the cookie that WebSphere uses for session management. This cookie has two components, a unique ID followed by a clone ID. The clone ID uniquely identifies a JVM in a cluster. For example: JSESSIONID=0000r2yG7KvBKWaTMUv-5IsCMC7:n1s1br/
. By logging the JSESSIONID cookie in the access log using %{JSESSIONID}C, the clone ID that identifies the JVM will also be available in the log. Then, using the plugin-cfg.xml configuration file, you can find the JVM associated to the clone ID.
Listing 2 shows that the CloneID n1s1 maps to the JVM running in host app1.torolab.ibm.com on port 9109.
Listing 2. Clone ID n1s1
1
2
3
4
5
6
|
< Server CloneID = "n1s1" ConnectTimeout = "0" ExtendedHandshake = "false"
LoadBalanceWeight = "2" MaxConnections = "-1"
Name = "WC_demo_node_node1_server2" ServerIOTimeout = "0"
WaitForContinue = "false" >
< Transport Hostname = "app1.torolab.ibm.com" Port = "9109" Protocol = "http" />
</ Server >
|
Since the default clone IDs can be difficult to memorize, you can use the HttpSessionCloneId custom property to assign the JVMs a new clone ID. Do not name the clone IDs too descriptively, because it can create a security concern. For more information on using the theHTTPSessionClondId, see Related topics.
Figure 1 shows the setting of the HttpSessionCloneId session management custom property to set for a new Clone ID.
Figure 1. Setting the HttpSessionCloneId custom property
The advantage of using the JSESSIONID cookie to determine the JVM is that the cookie tells the JVM that you are using right in the browser (by inspecting the cookies). The disadvantage is that the cookie will be missing on the first request for each session, and the clone ID will be incorrect if the request is failed over to another JVM.
Tip #3 – Correlating requests
For performance analysis, or during general troubleshooting, it is often required to identify all the requests that belong to a particular user or session. The techniques described below include using the client IP, the JSESSIONID, or the WebSphere Commerce session management cookies.
Using the client IP
The default format includes the remote host (%h) as the first element in the access log. Using the client IP is a common method of identifying requests from a single source.
When a load balancer is used, it is possible that the IP reported is that of the load balancer. Additional configuration could be required to log the real client IP instead.
Using the JSESSIONID cookie
As mentioned earlier, the JSESSIONID cookie can be used to uniquely identify a session. "Grepping" the log with a particular JSESSIONID value is a common technique to extract requests that belong to the same session.
Using WebSphere Commerce session management cookies
WebSphere Commerce has its own set of session management cookies, such as WC_AUTHENTICATION_<userID>, WC_USERACTIVITY_<userID> or WC_PERSISTENT, which can also be used for request correlation.
The advantage of using the WebSphere Commerce Session management cookies for correlation is that they contain the actual WebSphere Commerce user Id, which allows you to identify requests from a particular non-generic user.
Listing 3 shows how to extract the user ID from the name of the WC_USERACTIVITY cookie.
Listing 3. Extracting the User ID
1
2
3
4
|
SetEnvIf Cookie "WC_USERACTIVITY_(\d+|-\d+)=" wcUserId=$1
SetEnvIf wcUserId ^$ wcUserId=-1002
LogFormat "%h %l %u %t \"%r\" %>s %b %{wcUserId}e" user
CustomLog "/httplogs/user_log" user
|
Note: Listing 3 assumes the existence of a single WC_USERACTIVITY cookie. There are scenarios (such as multiple stores) where having multiple activity cookies can be valid.
Tip #4 - Limiting the number of requests that are logged
Web servers typically handle not only dynamic requests, which are processed by the Application Server, but also static content such as images, css, js files and others. Since all requests are logged by default, with busy sites, access logs can grow quickly, resulting in large logs that are difficult to manage.
IHS provides configuration options to filter the requests that are logged. Administrators can use these options to log only requests served from the WebSphere Commerce JVMs, or to log these requests on a separate file. This can considerably reduce the log sizes and simplify analysis.
Use the following techniques to limit the requests that are logged:
Using the WAS environment variable
For requests that are forwarded to the WebSphere Application Server JVMs, the plug-in creates an environment variable called "WAS" which contains the name of the JVM that handled the request. This variable can then be used as a condition for logging. If the variable is not defined for a request (meaning it was not handled by WebSphere Application Server), the log entry is not logged.
LogFormat "%h %l %u %t \"%r\" %>s %b" custom
CustomLog logs/access.log common env=WAS
Logging requests that have a particular root
The following example shows how to log requests that start with a common context root:
1
2
|
SetEnvIf Request_URI "^/shop/" storereq
CustomLog logs/access.log common env=storereq
|
Excluding files with specific extensions
Listing 4 shows how to exclude particular extensions, such as images:
Listing 4. Excluding extensions
1
2
3
|
SetEnvIf Request_URI \.gif nolog
SetEnvIf Request_URI \.jpg nolog
CustomLog logs/access.log combined env=!nolog
|
Defining logs at the virtual host level
By default, logs are defined at the root level, which means that they are applicable for all the virtual hosts. If you move the CustomLog entry inside the virtual host definition, the log is only applicable to requests for that virtual host. This technique can be used to create separate logs for SSL and non-SSL, or separate logs for the tools.
Listing 5 shows how to define the cmc.log log inside the 8002 virtual host:
Listing 5. Defining the cmc.log log inside the 8002 virtual host
1
2
3
4
5
6
7
8
9
10
11
12
|
< VirtualHost app01.torolab.ibm.com:8002>
SSLEnable
SSLClientAuth 0
ServerName mystore.torolab.ibm.com
Alias /adminconsole "/app/profiles/demo/
installedApps/WC_demo_cell/WC_demo.ear/SiteAdministration.war/tools/adminconsole/
wcsadmincon.html"
Alias /wcsstore "/app/profiles/demo/installedApps/WC_demo_cell/WC_demo.ear/Stores.war"
Alias /wcadmin "/app/profiles/demo/installedApps/WC_demo_cell/WC_demo.ear/
SiteAdministration.war"
CustomLog logs/cmc.log common
</ VirtualHost >
|
If you maintain the log definition at the root level, it is useful to add the port number to the log format by adding %p. %p indicates the canonical port used in the request, and you can use it to differentiate requests from the WebSphere Commerce tools versus store requests, or http versus https.
Tip #5 - Logging when a page was served by Dynacache
For performance analysis, it is important to know if a page is returned from cache. Starting with WebSphere Application Server V7.0.0.11, the JVM adds a response header called CACHED_RESPONSE to the log if the response was cached with DynaCache at the servlet or full page cache level. For more information about DynaCache, see the documentation page in the Related topics.
This header can be added to the access logs like this example:
%{CACHED_RESPONSE}o
.
If the page is retrieved from cache, the log adds "yes". If the header is missing, the log adds a dash (-) instead.
Tip #6 - Logging request and response cookies
Logging the request and response cookies is useful in troubleshooting session management issues. Logging the request and response cookies make the log considerably larger. But, this option is still more lightweight than enabling tracing in the application server layer. To log the request and response cookies, add the following parameters in the httpd.conf to add more information to the cookie in the access log:
Table 2. httpd.conf parameters
If you are only interested in specific request cookies, use the %{COOKIE}C
format instead. This approach does not work with the WebSphere Commerce cookies that include the user ID in the name, such as WC_USERACTIVITY_<number>
.
Tip #7 - Creating a parseable log format
If the log format is not defined correctly, parsing, or breaking the different elements down for analysis, will be difficult. In order to create a more parsable log format, you need to consider how you will parse it through every step of creating the log format. Although tools such as grep are commonly used to locate keywords in the logs, which may help you find the section you need to analyze, importing the log into a database table or to implementing scripts will allow for deeper analysis.
For example, if you plan to load the access log into a database table, you can configure the log format to surround values in quotes and separate them with commas, so that the log can be loaded directly into the database table using standard import tools that support the CSV format. If you don't use quotes and commas, it makes it difficult to load the access log into the database. Listing 6 demonstrates the use of quotes and commas:
Listing 6. Using commas and quotes in your logs
1
2
3
4
5
|
LogFormat "\"%h\",\"%{%Y-%m-%d-%H.%M.%S.000000}t\",%p,\"%H\",\"%m\",\"%U\",
\"%q\",%>s,\"%b\",\"
%{WAS}e\",%D,\"%{CACHED_RESPONSE}o\"" csv
CustomLog "/httplogs/access_csv" csv
|
Another option for making your access logs more parsable is to implement your own custom tool. There are different libraries available on the web that can help you with the parsing of the access log file. For example, for Perl applications, the Apache::LogRegex library can parse a log entry given a LogFormat. For more information about Apache::LogRegex see the Related topics. Listing 7 demonstrates using the Apache::LogRegex library to retrieve the value of %D from a log entry:
Listing 7. Retrieving %D from a log entry
1
2
3
4
5
|
my $lr = Apache::LogRegex->new($log_format);
....
# For each line
my %parsed_data = $lr->parse($line);
print "Response Time = " . $parsed_data{"%D"};
|
Tip #8 – Implementing log rotation
Given the speed at which access logs grow, set the log rotation so it continuously writes multiple logs with a specified log size. IHS includes the rotatelogs utility that automates rotation to be based on a time interval or size.
This example shows a new log being created whenever the size of the current log reaches 00: MB:CustomLog
"|bin/rotatelogs /httplogs/access.log 500M" common
For more information on how log rotation can be configured and used, see the Apache product documetation listed in the Related topics. The Apache documentation also mentions the cronolog tool as a more flexible option. For more information, see the tool's home page, listed in the Related topics. Besides log rotation, you can implement more (cron) script to compress, archive, and delete old logs.
Tip #9 - Logging compression statistics
Most access log formats include %b, which is the size of response in bytes. This can be useful to identify responses that are extraordinary large.
As most sites enable compression (mod_deflate), %b shows the compressed size of the response as opposed to the original one. For more details on the mod_deflate module, see Related topics.
Listing 8 shows how you can add the compressed size, uncompressed size, and compression ratio to the access logs:
Listing 8. Adding the compressed size, uncompressed size, and compression ratio to the access logs
1
2
3
4
|
DeflateFilterNote Input instream>
DeflateFilterNote Output outstream>
DeflateFilterNote Ratio ratio>
LogFormat "%h %l %u %t \"%r\" %>s %b %{outstream}n/%{instream}n(%{ratio}n%%)"
|
Conclusion
With the tips in this article you can take advantage of the powerful access logging capabilities of the IBM HTTP server. These tips provide performance analysis, security, and troubleshooting possibilities that will assist you in the managing and extracting of information from your logs.