Liam,
You can add other columns to your "Group By" to get better results. I'm not 100% sure if that will help here but I have used it to avoid the results only showing "First" results.
SELECT UrlHost, devicetime, DATEFORMAT(devicetime, 'YYYY-MM-dd HH:mm:ss'), "ActionTaken", DATEFORMAT(starttime, 'YYYY-MM-dd HH:mm:ss')
FROM events
WHERE devicetype = <proxy_logsource> AND username ILIKE '<user_id>' AND devicetime < 1607527714000 AND devicetime > 1607526814000
GROUP BY "UrlHost", devicetime, "ActionTaken", starttime
ORDER BY devicetime DESC
LIMIT 5
START 1607526814000 STOP 1607527714000
------------------------------
JThur
------------------------------
Original Message:
Sent: Thu December 10, 2020 04:07 PM
From: Liam Mahoney
Subject: AQL - Selecting Distinct Field Values in 5 Events Closest to Point in Time
All,
I'm trying to write an AQL query that will give me the last 5 distinct domains a user visited before a certain point in time (e.g. the start time of an offense that fired with that user's account). I'm having some issues understanding what's going on with aggregation groups and how they're sorted when using a GROUP BY statement. For example if I run the AQL query
SELECT UrlHost, devicetime, DATEFORMAT(devicetime, 'YYYY-MM-dd HH:mm:ss'), "ActionTaken", DATEFORMAT(starttime, 'YYYY-MM-dd HH:mm:ss')
FROM events
WHERE devicetype = <proxy_logsource> AND username ILIKE '<user_id>' AND devicetime < 1607527714000 AND devicetime> 1607526814000
GROUP BY UrlHost
START 1607526814000 STOP 1607527714000
I find that this groups events by the UrlHost as expected. However, I still need to tune the AQL to find the events that happened closest to the STOP time of my AQL.
What I'm failing to understand is how the groupings (aggregates) of events are sorted. When clicking into an aggregate of events from the query above it appears that they are not sorted by their starttime. This makes me hesitant to trust the results from using a query like this:
SELECT UrlHost, FIRST(devicetime), DATEFORMAT(FIRST(devicetime), 'YYYY-MM-dd HH:mm:ss'), FIRST("ActionTaken"), DATEFORMAT(FIRST(starttime), 'YYYY-MM-dd HH:mm:ss')
FROM events
WHERE devicetype = <proxy_logsource> AND username ILIKE '<user_id>' AND devicetime < 1607527714000 AND devicetime > 1607526814000
GROUP BY "UrlHost"
ORDER BY FIRST(devicetime) DESC
LIMIT 5
START 1607526814000 STOP
1607527714000
because I don't know what the expected order of the events is and I'm telling AQL to use "the first entry of the rows in the aggregate" (FIRST description in the AQL guide, p. 28). Ideally I would like to say 'Use the row with the MAX devicetime', but I'm not sure how I would properly choose the rest of the fields like ActionTaken.
Does anyone have any advice or ideas?
Thanks,
Liam
------------------------------
Liam Mahoney
------------------------------