This post is part of a series exploring the unique aspects and capabilities of WebSphere Liberty when running on z/OS.
We'll also explore considerations when moving from WebSphere traditional on z/OS to Liberty on z/OS.
---------------
We covered much of this in the introduction to diagnostics post a week or so ago, but I wanted to review all the different places you might need to look if you’re having a problem. Some we talked about before, but others I haven’t mentioned yet.
Obviously, the quickest things to look for are messages. As we’ve discussed, these often turn up in messages.log or in the MSGLOG DD (next week’s topic). You can also look in the hardcopy or system log (SYSLOG). Some messages might also turn up in the message log DD for the server’s started task (the JESMSGLG DD).
These are all (well, except the messages.log file) pretty typical places to look for messages for a server running as a started task on z/OS. While you’ve got the output for the started task open in SDSF (or whatever you use) you should probably also have a look at the SYSOUT DD. You may find more messages in here that can give you a clue about what is going on.
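If you have a big messages.log to wade through, a small filter can help you get to the interesting lines first. Here is a minimal sketch in Java. It assumes message IDs have the familiar shape of a four or five letter prefix, four digits and a trailing type letter (for example CWWKF0011I), and it treats a trailing W or E as worth a look. The class name, the method names and the sample lines are made up for illustration, and the file path comes from whatever you pass on the command line.

```java
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class MessageScan {
    // Assumption: a message ID is 4-5 uppercase letters, 4 digits, then one type letter.
    private static final Pattern MESSAGE_ID =
            Pattern.compile("\\b[A-Z]{4,5}\\d{4}([A-Z])\\b");

    /** Returns true if the line carries a message ID ending in W (warning) or E (error). */
    static boolean isProblem(String line) {
        Matcher m = MESSAGE_ID.matcher(line);
        while (m.find()) {
            char type = m.group(1).charAt(0);
            if (type == 'W' || type == 'E') {
                return true;
            }
        }
        return false;
    }

    /** Keeps only the lines that look like warnings or errors, in their original order. */
    static List<String> problems(List<String> lines) {
        return lines.stream().filter(MessageScan::isProblem).collect(Collectors.toList());
    }

    public static void main(String[] args) throws Exception {
        List<String> lines;
        if (args.length > 0) {
            // Assumption: the file is readable in the platform default encoding.
            lines = Files.readAllLines(Paths.get(args[0]), Charset.defaultCharset());
        } else {
            // Made-up sample lines so the sketch runs without a real log.
            lines = Arrays.asList(
                    "CWWKF0011I: The server is ready.",
                    "CWWKZ0002E: An exception occurred while starting the application.");
        }
        problems(lines).forEach(System.out::println);
    }
}
```

Run it with the path to a copy of messages.log as the only argument. It only looks at message IDs, so anything written without an ID (a stack trace, say) will not show up and you would still want to read around the lines it finds.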
Something we haven’t talked about before in this series is First Failure Data Capture, or FFDC. This is a part of the Liberty runtime that tries to capture information about a failure when it happens. It is kind of like a dump you might get from an abend in a traditional z/OS server. Of course, from Java you tend not to get dumps or abends. Instead you get exceptions. Scattered around the Liberty code are Java catch blocks for various exceptions that we know might get thrown if errors occur, and from those catch blocks we invoke our FFDC code. This is basically like an ESTAE catching an abend and causing doc to be gathered.
So where does the doc collected by the FFDC code go? It is captured in a directory called ‘ffdc’ under wherever you have the logs going. All the doc for any particular incident is captured into a single file, and that file has a date/time stamp on it to help you correlate it to whatever problem you’re looking into. There’s also a sort of index file that has a record for each FFDC captured. You can configure the maximum number of FFDCs to keep around so they don’t build up over time. The FFDC code will also try to avoid having a lot of duplicate files for the same error (sort of like DAE on z/OS).
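To make the catch-block idea a bit more concrete, here is a small Java sketch of the pattern. To be clear, this is not the Liberty FFDC API. IncidentRecorder, its capture method, and the source and probe identifiers are all hypothetical stand-ins I made up so the example runs on its own. The point is just the shape: code catches an exception it knows might happen, hands it to a capture service along with something that says where it was caught, and then carries on with its own recovery.

```java
import java.util.ArrayList;
import java.util.List;

public class FfdcPatternSketch {

    /** Hypothetical stand-in for a capture service; NOT the real Liberty FFDC API. */
    static final class IncidentRecorder {
        final List<String> incidents = new ArrayList<>();

        /** Records the exception plus identifiers saying where it was caught. */
        void capture(Throwable t, String sourceId, String probeId) {
            incidents.add(sourceId + " probe " + probeId + ": " + t);
        }
    }

    static final IncidentRecorder RECORDER = new IncidentRecorder();

    /** Parses a port number; on bad input, captures the failure and uses the fallback. */
    static int parsePort(String value, int fallback) {
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            // The catch block is where the capture call lives, much like an ESTAE routine.
            RECORDER.capture(e, "FfdcPatternSketch.parsePort", "1");
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePort("9080", 9080));        // normal path, nothing captured
        System.out.println(parsePort("not-a-port", 9080));  // failure path, one incident captured
        RECORDER.incidents.forEach(System.out::println);
    }
}
```

A real capture service would write the incident out to a file rather than keep it in a list, but the calling pattern from the catch block would look much the same.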
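If you want a quick way to see what has been captured lately, something like the following Java sketch lists the files in a directory, newest first. It makes no assumptions about how the files are named and simply sorts on the file system’s last-modified time. The class and method names are mine, and the directory path is whatever you pass in (the ‘ffdc’ directory under your logs location).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FfdcListing {

    /** Returns up to 'limit' regular files from the directory, most recently modified first. */
    static List<Path> newestFirst(Path ffdcDir, int limit) throws IOException {
        try (Stream<Path> files = Files.list(ffdcDir)) {
            return files
                    .filter(Files::isRegularFile)
                    .sorted(Comparator.<Path>comparingLong(FfdcListing::lastModified).reversed())
                    .limit(limit)
                    .collect(Collectors.toList());
        }
    }

    /** Last-modified time in milliseconds, with the checked exception wrapped for use in a stream. */
    private static long lastModified(Path p) {
        try {
            return Files.getLastModifiedTime(p).toMillis();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the first argument is the path to the ffdc directory; defaults to "ffdc".
        Path dir = Paths.get(args.length > 0 ? args[0] : "ffdc");
        for (Path p : newestFirst(dir, 10)) {
            System.out.println(Files.getLastModifiedTime(p) + "  " + p.getFileName());
        }
    }
}
```

The index file will show up in this listing too, since it lives alongside the incident files.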
Even if your server isn’t having a problem, you might want to keep an eye out for FFDCs being captured and try to get them resolved. If you have an actual problem you’d probably rather not have a bunch of these for non-problems confusing the issue.
Finally, there are traces. We’ll talk about these more in an upcoming post, but, depending on the problem you’re having, support might ask you to enable some tracing. The trace output will accumulate in files in the logs directory. You can configure how big those files are allowed to get and how many to keep so they don’t swamp your file system.
There is some minimal level of tracing (info) you should probably have on all the time. The overhead is small and it can provide some context that might be useful. On the other hand, I’d strongly discourage turning on “all the tracing” to try to debug something. There is a lot of code in Liberty, and enabling everything will slow things down so much the server will likely be completely unusable (not to mention creating a mountain of trace output that, with rolling deletions, is unlikely to contain the thing you wanted anyway).
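If you wanted to guard against somebody switching on everything by accident, a check along these lines could sit in whatever tooling you use to review configuration changes. This Java sketch assumes a trace specification is a colon-separated list of component=level entries (for example *=info:com.example.*=all) and that all, finest, finer and fine are the detailed levels. Both of those are my assumptions for the sketch, as are the class and method names and the com.example package.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class TraceSpecCheck {
    // Assumption: these level names mean detailed tracing.
    private static final Set<String> VERBOSE =
            new HashSet<>(Arrays.asList("all", "finest", "finer", "fine"));

    /** Returns true if the spec turns on detailed tracing for every component ("*"). */
    static boolean enablesEverything(String traceSpec) {
        for (String entry : traceSpec.split(":")) {
            String[] parts = entry.trim().split("=");
            if (parts.length < 2) {
                continue; // not a component=level pair, ignore it
            }
            String component = parts[0].trim();
            String level = parts[1].trim().toLowerCase(Locale.ROOT);
            if (component.equals("*") && VERBOSE.contains(level)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(enablesEverything("*=info"));                   // false
        System.out.println(enablesEverything("*=info:com.example.*=all")); // false
        System.out.println(enablesEverything("*=all"));                    // true
    }
}
```

Targeted tracing of a single package passes the check, while a blanket *=all does not, which is the distinction that matters here.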