Analysing State from Event Logs in SQL
A task that comes up repeatedly is evaluation of state intervals generated from event logs. If the event log is stored (or can be migrated to) SQL Server, this can be done easily and performantly in SQL.
I generally work with SQL Server but this technique appears to be compatible with most modern SQL implementations.
As an example, a system might have a number of valid states and an event is captured each time the system enters a new state.
| Time | State |
|---|---|
| 2022-03-24 07:03:10.000 | B |
| 2022-03-24 07:05:10.000 | C |
| 2022-03-24 07:35:11.000 | B |
| 2022-03-24 08:05:11.000 | A |
| 2022-03-24 08:41:11.000 | B |
| 2022-03-24 09:11:12.000 | A |
What we’re after is a list of intervals showing the time that the system was in each state. This can be done with a query like:
WITH [CTE_SequencedEvents] AS
(
SELECT
*
,ROW_NUMBER() OVER (ORDER BY [Time]) AS [Sequence]
FROM [#Events]
)
SELECT
StartEvent.[Time] AS [Start Time]
,EndEvent.[Time] AS [End Time]
,DATEDIFF(SECOND, StartEvent.[Time], EndEvent.[Time])/60.0 AS [Duration (Minutes)]
,StartEvent.[State]
FROM [CTE_SequencedEvents] AS StartEvent
JOIN [CTE_SequencedEvents] AS EndEvent ON EndEvent.[Sequence] = StartEvent.[Sequence] + 1
ORDER BY StartEvent.[Time]
Here’s what’s happening:
- The
ROW_NUMBER()windowing function is used to generate a sequential list of row numbers on the fly. - A CTE (common table expression) is wrapped around the sequenced rows. This can be referenced multiple times in the query.
- Each row of the CTE is joined back on the next row, creating a contiguous timeline of intervals.
This gives us the required intervals:
| Start Time | End Time | Duration (Minutes) | State |
|---|---|---|---|
| 2022-03-24 07:03:10.000 | 2022-03-24 07:05:10.000 | 2.000000 | B |
| 2022-03-24 07:05:10.000 | 2022-03-24 07:35:11.000 | 30.016666 | C |
| 2022-03-24 07:35:11.000 | 2022-03-24 08:05:11.000 | 30.000000 | B |
| 2022-03-24 08:05:11.000 | 2022-03-24 08:41:11.000 | 36.000000 | A |
| 2022-03-24 08:41:11.000 | 2022-03-24 09:11:12.000 | 30.016666 | B |
Wrap the intervals up in another CTE for readability and the intervals can be used for all sort of analysis, for example a histogram of time in each state:
WITH [CTE_SequencedEvents] AS
(
SELECT
*
,ROW_NUMBER() OVER (ORDER BY [Time]) AS [Sequence]
FROM [#Events]
)
,[CTE_Intervals] AS
(
SELECT
StartEvent.[Time] AS [StartTime]
,EndEvent.[Time] AS [EndTime]
,DATEDIFF(SECOND, StartEvent.[Time], EndEvent.[Time])/60.0 AS [DurationMinutes]
,StartEvent.[State]
FROM [CTE_SequencedEvents] AS StartEvent
JOIN [CTE_SequencedEvents] AS EndEvent ON EndEvent.[Sequence] = StartEvent.[Sequence] + 1
)
SELECT
[State]
,SUM([DurationMinutes]) AS [Total Duration (Minutes)]
FROM [CTE_Intervals]
GROUP BY [State]
ORDER BY [State]
| State | Total Duration (Minutes) |
|---|---|
| A | 36.000000 |
| B | 62.016666 |
| C | 30.016666 |
Cleaning Up Duplicate and Repeated Events
Depending on the source of the data, you may have duplicate or repeated events:
| Time | State | ||
|---|---|---|---|
| 2022-03-24 07:03:10.000 | A | ||
| 2022-03-24 07:05:10.000 | C | ||
| 2022-03-24 07:05:10.000 | C | ||
| 2022-03-24 07:35:11.000 | B | ||
| 2022-03-24 08:05:11.000 | A | ||
| 2022-03-24 08:41:11.000 | B | ||
| 2022-03-24 08:53:18.000 | B | ||
| 2022-03-24 09:11:12.000 | A |
| Start Time | End Time | Duration (Minutes) | State |
|---|---|---|---|
| 2022-03-24 07:03:10.000 | 2022-03-24 07:05:10.000 | 2.000000 | A |
| 2022-03-24 07:05:10.000 | 2022-03-24 07:05:10.000 | 0.000000 | C |
| 2022-03-24 07:05:10.000 | 2022-03-24 07:35:11.000 | 30.016666 | C |
| 2022-03-24 07:35:11.000 | 2022-03-24 08:05:11.000 | 30.000000 | B |
| 2022-03-24 08:05:11.000 | 2022-03-24 08:41:11.000 | 36.000000 | A |
| 2022-03-24 08:41:11.000 | 2022-03-24 08:53:18.000 | 12.116666 | B |
| 2022-03-24 08:53:18.000 | 2022-03-24 09:11:12.000 | 17.900000 | B |
| State | Minimum Duration (Minutes) | ||
|---|---|---|---|
| A | 2.000000 | ||
| B | 12.116666 | ||
| C | 0.000000 |
| Start Time | End Time | Duration (Minutes) | State |
|---|---|---|---|
| 2022-03-24 07:03:10.000 | 2022-03-24 07:05:10.000 | 2.000000 | A |
| 2022-03-24 07:05:10.000 | 2022-03-24 07:35:11.000 | 30.016666 | C |
| 2022-03-24 07:35:11.000 | 2022-03-24 08:05:11.000 | 30.000000 | B |
| 2022-03-24 08:05:11.000 | 2022-03-24 08:53:18.000 | 48.116666 | A |
| 2022-03-24 08:53:18.000 | 2022-03-24 09:11:12.000 | 17.900000 | B |
Truncating the Reporting Window
Often state analysis needs to be performed between a fixed start and end time. It’s unlikely that your events will align with the bounds of the reporting window and we’ll need to employ a few tricks to truncate the intervals so that the total duration of reported intervals is the same as the duration of the reporting window.
Let’s take our original data set and report on intervals between 7:30AM. and 8:30AM.
| Time | State |
|---|---|
| 2022-03-24 07:03:10.000 | A |
| 2022-03-24 07:05:10.000 | C |
| 2022-03-24 07:35:11.000 | B |
| 2022-03-24 08:05:11.000 | A |
| 2022-03-24 08:41:11.000 | B |
| 2022-03-24 09:11:12.000 | A |
Note that the highlighted window falls outside the reporting window but remains of relevance, as it is required to determine the state for the interval between 7:05AM and 7:35AM to be truncated by the start of the reporting interval.
This can be achieved by adding a truncating step in our CTE pipeline:
DECLARE @reportStart datetime = '2022-03-24 07:30:00.000';
DECLARE @reportEnd datetime = '2022-03-24 08:30:00.000';
WITH [CTE_SequencedEvents] AS
(
SELECT
*
,ROW_NUMBER() OVER (ORDER BY [Time]) AS [Sequence]
FROM [#Events]
)
,[CTE_Intervals] AS
(
SELECT
StartEvent.[Time] AS [StartTime]
,EndEvent.[Time] AS [EndTime]
,StartEvent.[State]
FROM [CTE_SequencedEvents] AS StartEvent
JOIN [CTE_SequencedEvents] AS EndEvent ON EndEvent.[Sequence] = StartEvent.[Sequence] + 1
)
,[CTE_TruncatedIntervals] AS
(
SELECT
CASE WHEN [StartTime] < @reportStart THEN @reportStart ELSE [StartTime] END AS [StartTime]
,CASE WHEN [EndTime] > @reportEnd THEN @reportEnd ELSE [EndTime] END AS [EndTime]
,[State]
FROM [CTE_Intervals]
WHERE @reportStart < [EndTime] AND @reportEnd > [StartTime]
)
SELECT
[StartTime] AS [Start Time]
,[EndTime] AS [End Time]
,DATEDIFF(SECOND, [StartTime], [EndTime])/60.0 AS [Duration (Minutes)]
,[State]
FROM [CTE_TruncatedIntervals]
The condition @reportStart < [EndTime] AND @reportEnd > [StartTime] filters out any intervals that do not overlap with the reporting window. The diagram below may help to see why.

Note also the CASE statements that serve to discard the portion of any interval laying outside the reporting window boundary.
On large datasets this query will not perform well as the filtering overlap condition cannot be calculated without fully loading all events. If the maximum interval size is known, the following modification may be made to the CTE_Events clause to improve performance.
DECLARE @reportStart datetime = '2022-03-24 07:30:00.000';
DECLARE @reportEnd datetime = '2022-03-24 08:30:00.000';
DECLARE @maximumIntervalSizeMinutes int = 60;
WITH [CTE_SequencedEvents] AS
(
SELECT
*
,ROW_NUMBER() OVER (ORDER BY [Time]) AS [Sequence]
FROM [#Events]
WHERE [Time] > DATEADD(MINUTE, -@maximumINtervalSizeMinutes, @reportStart)
AND [Time] < DATEADD(MINUTE, @maximumIntervalSizeMinutes, @reportEnd)
)
,[CTE_Intervals] AS
(
SELECT
StartEvent.[Time] AS [StartTime]
,EndEvent.[Time] AS [EndTime]
,StartEvent.[State]
FROM [CTE_SequencedEvents] AS StartEvent
JOIN [CTE_SequencedEvents] AS EndEvent ON EndEvent.[Sequence] = StartEvent.[Sequence] + 1
)
,[CTE_TruncatedIntervals] AS
(
SELECT
CASE WHEN [StartTime] < @reportStart THEN @reportStart ELSE [StartTime] END AS [StartTime]
,CASE WHEN [EndTime] > @reportEnd THEN @reportEnd ELSE [EndTime] END AS [EndTime]
,[State]
FROM [CTE_Intervals]
WHERE @reportStart < [EndTime] AND @reportEnd > [StartTime]
)
SELECT
[StartTime] AS [Start Time]
,[EndTime] AS [End Time]
,DATEDIFF(SECOND, [StartTime], [EndTime])/60.0 AS [Duration (Minutes)]
,[State]
FROM [CTE_TruncatedIntervals]