
Troubleshooting IBM Security Information Queue using the logs stack

By Charles Schultz posted Wed October 16, 2019 12:46 PM

  

When trying to diagnose the cause of a problem in IBM Security Information Queue (ISIQ), the logs usually have some clues.  The vast majority of the log data is produced by the “connect” container, since it runs all of the Kafka Connectors, and they do most of the work.  Thus, we will use the connect logs as the example here.  There are three main ways to view the connect log: the raw container logs, the Docker service logs, and Kibana in the optional logs stack.  I will address all three, but the focus will be on using Kibana.

 

The raw container logs require root access to view them.  They are located in “/var/lib/docker/containers/<containerID>”.  To find the container id, run: docker ps | grep connect_connect.  You should see output similar to:

 

ef3c6af23aa2        private-reg:5000/isiq-connect:latest                      "/bin/bash -c 'CONNE…"   11 minutes ago      Up 11 minutes       8083/tcp, 9092/tcp             connect_connect.1.iome1ba9z66zihcze9aprhg7l

 

The first token is the one we’re interested in.  Using it, run cd /var/lib/docker/containers/ef3c6<tab_to_autocomplete>.  The full ID is much longer than the truncated version shown in the docker ps output, but tab completion will take you to the correct folder.  Inside, you will find one or more files named “<containerID>-json.log(.1,.2,etc.)”.  In those files, you will find log messages like:

 

{"log":"[2019-09-13 22:23:03,408] INFO [conn:scharle.atlantic.directory:LDAP query:116] 7 Available entries found matching baseDn=ou=org,dc=com, filter=(objectclass=erSystemRole), scope=sub, profile=ITIMGroup. (com.ibm.kafka.connect.directory.source.DirectorySourceTask)\n","stream":"stdout","time":"2019-09-13T22:23:03.408928028Z"}

 

If you prefer working with command-line tools like sed, awk, grep, and cut, this approach provides you with raw material for slicing and dicing the logs.  Aside from the JSON format, the requirement for root access, and having to figure out which directory the log is in, there is an additional drawback: you have to repeat this on every node that’s running the container.
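
As an example, here is a minimal sketch of pulling just the message text out of a raw log and scanning it for errors.  It assumes the jq utility is installed on the host and uses the container name from the docker ps output above; your container name, IDs, and log file names will differ:

# Resolve the full container ID from the name shown by docker ps
CONTAINER_ID=$(docker inspect --format '{{.Id}}' connect_connect.1.iome1ba9z66zihcze9aprhg7l)

# Print only the "log" field of each JSON line and filter for errors
# (rotated files such as *-json.log.1 can be checked the same way)
sudo jq -r '.log' /var/lib/docker/containers/${CONTAINER_ID}/${CONTAINER_ID}-json.log | grep -E 'ERROR|Exception'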

 

The second method, using docker service logs, solves the problem of having to track down all the nodes running this container.  Simply running docker service logs connect_connect causes Docker to pull the connect logs from all the nodes and collate them for you.  The output looks similar to:

 

connect_connect.1.iome1ba9z66z@csiiesdev    | [2019-09-13 22:21:36,057] DEBUG [conn:scharle.atlantic.directory:LDAP query:131] LDAP search found object: ersapagrname=SAP_HR_CPS_SPECIALIST,erglobalid=7327962232385622714,ou=services,erglobalid=00000000000000000000,ou=org,dc=com (com.ibm.kafka.connect.directory.source.DirectorySourceTask)

 

Note that the message itself is very similar to the raw JSON log entry, but it’s prefixed with the container identifier and hostname so you can see which one is experiencing the problem.  This approach is far easier than tracking down individual log files, but if you want to work with the data, you will likely need to redirect it into a file.  Some of the output is on STDERR, so you need the following command to collect all the data:

 

docker service logs connect_connect > connect.log 2>&1
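
Once the output is in a file, the usual tools apply.  For example, a quick scan for problems might look like this (just a sketch; adjust the patterns to whatever you are chasing):

# Count how many error lines were collected
grep -cE 'ERROR|Exception' connect.log

# Page through the error lines themselves
grep -E 'ERROR|Exception' connect.log | less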

 

With that said, if you are using the logs stack that’s included with ISIQ, there is an even easier way to review the logs.  The first time you log in to Kibana, you will need to set up an index pattern for the Elasticsearch data.  The index pattern is based on the messages currently in Elasticsearch, so it is best to wait 15-30 minutes after first starting the logs stack before you try to work with Kibana.  By then, Elasticsearch will have seen nearly all the different types of messages and can report on the different fields.  It is very easy to re-index the messages later, though, if you want to start looking at the data right away.  The link to add an index moves around with each version of Kibana.  Currently, it can be found in the lower right of the “Add Data to Kibana” section of the landing page.
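
Incidentally, if you want to confirm that Elasticsearch has already received log data before creating the index pattern, you can query it directly from the command line.  This is only a sketch; it assumes Elasticsearch is reachable from the Docker host on its default port 9200, which depends on how your logs stack publishes that port:

# List the indices Elasticsearch has created so far
curl -s 'http://localhost:9200/_cat/indices?v'

# Count the log documents in the logstash indices
curl -s 'http://localhost:9200/logstash*/_count'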


Once you select “Connect to your Elasticsearch index”, it should find your Elasticsearch data and ask you to create an Index Pattern name:

After clicking “Next step”, select the “@timestamp” option.


Then click “Create index pattern”.  If you ever need to re-create the index pattern, select the gears (Management) icon at the bottom on the left side, select “Index Patterns”, choose the one you created (e.g. logstash*), and then click the recycle button near the upper right.


With the index pattern created, click the clock icon (Discover) at the top on the left.  The default query shows all messages from the last 15 minutes, but everything is configurable.  To check for any problems, you could set a search filter of “severity:ERROR or Exception” and widen the query window to the last day.
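
If you prefer to stay on the command line, a similar search can also be issued directly against Elasticsearch.  Again, this is just a sketch that assumes the logstash* indices from above and that Elasticsearch is reachable on port 9200:

# Return the 20 most recent documents matching severity:ERROR OR Exception
curl -s 'http://localhost:9200/logstash*/_search?q=severity:ERROR%20OR%20Exception&size=20&sort=@timestamp:desc'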


In my case, several were found in the last hour.  An example:


The messages are displayed in reverse chronological order, so the most recent is always at the top.  In the list above, the first error (the bottom one) indicates a database problem.  However, its status code shows that it was simply a failure to close a connection.  This message occurs whenever a connector is restarted, e.g. when a subscription is added.  Even though it’s logged as an error, it’s not one to be concerned about.

 

The next two are real problems.  The first shows that the transformer encountered an error.  When moving data from one product to another, the first step is to retrieve the data from the source topics.  When a subscription is added, a transformer component is added that knows how to convert the source data into a format the target system can use.  This transformer is driven by txdef.json.  If the transformer reports a problem, there is a discrepancy between the data coming from the source and the format of that data as defined in txdef.json.  The solution is to compare the data from the bad entries in the source topic with the txdef.json definition to see which attribute is coming over as a single value instead of a multi-value.

 

The last error is also a data problem, caused by the pm_code for an IGI person being null.  But because this message is from a Java connector, it has a thread id attached to it, so you can see what else that thread has been doing by specifying the thread id as the filter:


Here, we can see that the thread had just tried to process a group, but its service didn’t exist in IGI yet, so the group was deferred.  The thread then reported which objects it had seen so far, which is just that one group.  Then a new message was handed to it, our ill-fated person record.  IGI will not accept a Person without a pm_code, so there’s nothing that can be done with that record, and the worker thread continues with its other work.

 

The key here is that you can view the logs from all containers on all nodes in one place.  You can also easily filter based on severity, thread, connector name, hostname, or a text fragment from a message.  The only real drawback of Kibana is that it limits the result set to 500 entries for performance reasons, so you might have to narrow your time window to a small slice to pick up the details you are looking for.  On the other hand, Kibana makes it very simple to set nearly any time window you like, with millisecond granularity.
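
For reference, typical Discover search bar queries look like the following.  The connector name is taken from the sample log entries above, and the thread id value is made up; substitute whatever values appear in your own messages:

severity:ERROR or Exception
severity:ERROR and "scharle.atlantic.directory"
"pool-2-thread-5"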
