Hi Jeff,
We use Splunk for log file analysis and are just starting to send metrics into Splunk via HEC (JSON ingestion) for additional monitoring of DB and OS metrics. While this is more for overall operations monitoring and failure analysis, security items could be treated in the same manner, probably using the security log more so than some of the other logs.
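For reference, a HEC metric event is just a JSON body posted to the collector endpoint. A rough sketch is below; the metric and dimension names here are hypothetical examples, not our actual payloads:

```json
{
  "time": 1700000000,
  "host": "appserver01",
  "source": "db_metrics",
  "event": "metric",
  "fields": {
    "metric_name:db.active_sessions": 42,
    "environment": "PROD",
    "server_type": "APP"
  }
}
```

The "fields" block is also where the category attributes mentioned below can ride along with each event.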
Not sure what areas you're looking for pointers on, but we ingest the system, security, and FFDC (Liberty) logs. I have built quite a few dashboards focused on particular user use cases: for developers, error-log keyword searching, counts, etc.; for infrastructure, mostly counts, logins, failures by type, errors, etc.
One key idea that might be helpful when using Splunk, if you have multiple environments (PROD, QAS, DEV, sandboxes, etc.) and/or multiple servers per environment, is to use Splunk "categories" for tagging attributes. Tagging such things as:
"environment" – PROD, QAS, DEV, etc.
"server_type" – APP, PROCESS
"server_subtype" – (more for multiple process servers) such as ASYNC (for APP servers), Un-named, REM, CAD, BIRT (if you have multiple Process servers with named users)
"location" – (if you have environments in various locations, you could use the company location code, state, etc.)
(other categories as you see fit)
These are all built into a Splunk category field (in each server's assetinfo.json file, used by the Universal Forwarder) so that each log push to Splunk carries these attributes along with the log data.
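As a minimal sketch, a server's category block in assetinfo.json might look something like this (the exact key names and values are illustrative; yours will depend on how your forwarder inputs are set up):

```json
{
  "environment": "PROD",
  "server_type": "APP",
  "server_subtype": "ASYNC",
  "location": "ORL"
}
```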
When building your Splunk dashboards, you can use input selectors for each of the categories above to filter the logs you want to show/aggregate. For instance, an environment selector on a dashboard lets you pull all of the "PROD" server logs, consolidated by time, into one table of entries (searching for, say, "error"). This shows you every log record containing "error" from all the PROD servers, together in chronological order. (Doing that by looking at each server's log files separately would be very time-consuming, if possible at all.)
This allows us to correlate errors and issues BETWEEN the multiple servers in our PROD environment. (This is just one example of how the categories help you group log data in Splunk.)
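To give a flavor of the search behind such a dashboard panel, a sketch is below. The index name and fields are assumptions (substitute your own), with the environment value typically supplied by the dashboard's input selector:

```
index=tririga environment="PROD" "error"
| sort _time
| table _time, host, server_type, server_subtype, _raw
```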
In your case, you could use the security log in the same way as above, looking for authenticated users, failed logins, etc. (for security).
I know this will probably (hopefully) spawn more dialog :) (always looking for ways to monitor/analyze TRIRIGA)
Regards,
Lester
Lester Drazin
Lockheed Martin Corporation