Commit 986f34b

Create finding-a-gap-in-timeseries-data-and-or-gaps-and-islands-using-sql.md
1 parent 3ad71ee commit 986f34b

1 file changed: 114 additions & 0 deletions

# Finding Gaps in Time-Series Data (Gaps and Islands) Using SQL

A common problem when analyzing application log data is creating sessions from logged user activity. A user may use the app for a couple of hours one day and then come back to it the next day. The log records the user's activity, but it does not tell us when one session ended and the next one began. Typically, sessions are defined by _m_ minutes of activity followed by _n_ minutes of no activity, where _n_ is an inactivity threshold measured in minutes. This is also known as the [Gaps-and-Islands](sessionization.md) problem in computer science.

## Sample log data

|username|log_timestamp|
|--------|-------------------------|
| Angela | 2020-08-07 20:10:00.000 |
| Scott | 2020-08-07 20:10:00.000 |
| Bob | 2020-08-07 20:10:00.000 |
| Bob | 2020-08-07 20:20:00.000 |
| Angela | 2020-08-07 20:20:00.000 |
| Scott | 2020-08-07 20:20:00.000 |
| Bob | 2020-08-07 20:30:00.000 |
| Angela | 2020-08-07 20:30:00.000 |
| Scott | 2020-08-07 20:30:00.000 |
| Angela | 2020-08-07 20:40:00.000 |
| Scott | 2020-08-07 20:40:00.000 |
| Bob | 2020-08-07 20:50:00.000 |
| Angela | 2020-08-07 20:50:00.000 |
| Scott | 2020-08-07 20:50:00.000 |
| Bob | 2020-08-07 21:00:00.000 |
| Bob | 2020-08-07 21:10:00.000 |
| Scott | 2020-08-07 22:00:00.000 |
| Scott | 2020-08-07 22:07:00.000 |
| Scott | 2020-08-07 22:20:00.000 |
| Scott | 2020-08-07 22:30:00.000 |

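To experiment with the queries in this article, the sample rows can be loaded into a table. The following is a minimal setup sketch: the table name `app_log` and the column names come from the queries in this article, while the column types are assumptions. It includes Scott's 22:07 event, which appears in both query outputs.

```sql
-- Assumed schema: the original text does not state the column types.
CREATE TABLE app_log (
    username      VARCHAR(50),
    log_timestamp TIMESTAMP
);

-- Sample events, in the order they appear in the log.
INSERT INTO app_log (username, log_timestamp) VALUES
    ('Angela', '2020-08-07 20:10:00.000'),
    ('Scott',  '2020-08-07 20:10:00.000'),
    ('Bob',    '2020-08-07 20:10:00.000'),
    ('Bob',    '2020-08-07 20:20:00.000'),
    ('Angela', '2020-08-07 20:20:00.000'),
    ('Scott',  '2020-08-07 20:20:00.000'),
    ('Bob',    '2020-08-07 20:30:00.000'),
    ('Angela', '2020-08-07 20:30:00.000'),
    ('Scott',  '2020-08-07 20:30:00.000'),
    ('Angela', '2020-08-07 20:40:00.000'),
    ('Scott',  '2020-08-07 20:40:00.000'),
    ('Bob',    '2020-08-07 20:50:00.000'),
    ('Angela', '2020-08-07 20:50:00.000'),
    ('Scott',  '2020-08-07 20:50:00.000'),
    ('Bob',    '2020-08-07 21:00:00.000'),
    ('Bob',    '2020-08-07 21:10:00.000'),
    ('Scott',  '2020-08-07 22:00:00.000'),
    ('Scott',  '2020-08-07 22:07:00.000'),  -- present in both query outputs
    ('Scott',  '2020-08-07 22:20:00.000'),
    ('Scott',  '2020-08-07 22:30:00.000');
```
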
### Desired output

From the app log above, suppose we need to display:

1. user (`username`)
2. begin_timestamp (the beginning of the session, `session_start` in the queries below)
3. end_timestamp (the end of the session, `session_end` in the queries below)

We define a _session_ by a 10-minute inactivity threshold: if a user has no activity for more than 10 minutes, the session is considered ended, and the user's next event starts a new session.

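### MATCH_RECOGNIZE Query to create sessions

Below, we use SQL's [MATCH_RECOGNIZE](applied-overview-of-MATCH_RECOGNIZE-clause.md) clause to _sessionize_ this data. The pattern matches one starting row followed by any number of rows that each arrive within 10 minutes of the previous row for the same user.

```sql
SELECT username
     , session_start
     , session_end
FROM app_log
MATCH_RECOGNIZE (
    PARTITION BY username       -- sessionize each user independently
    ORDER BY log_timestamp      -- scan each user's events in time order
    MEASURES
        first_value(log_timestamp) AS session_start,  -- first event in the match
        last_value(log_timestamp)  AS session_end     -- last event in the match
    -- session_start has no DEFINE entry, so it matches any row; it is followed
    -- by zero or more rows that satisfy continuous_activity
    PATTERN (session_start continuous_activity*)
    DEFINE
        continuous_activity AS log_timestamp <= dateadd('minute', 10, lag(log_timestamp))
);
```
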
### Query output

|username|session_start|session_end|
|--------|-------------------------|-------------------------|
| Scott | 2020-08-07 20:10:00.000 | 2020-08-07 20:50:00.000 |
| Scott | 2020-08-07 22:00:00.000 | 2020-08-07 22:07:00.000 |
| Scott | 2020-08-07 22:20:00.000 | 2020-08-07 22:30:00.000 |
| Bob | 2020-08-07 20:10:00.000 | 2020-08-07 20:30:00.000 |
| Bob | 2020-08-07 20:50:00.000 | 2020-08-07 21:10:00.000 |
| Angela | 2020-08-07 20:10:00.000 | 2020-08-07 20:50:00.000 |

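MATCH_RECOGNIZE can also report per-session details beyond the start and end times. The sketch below assumes a Snowflake-style dialect in which `MATCH_NUMBER()`, `FIRST()`, `LAST()`, and `COUNT(*)` are available as measures. The output columns `session_number`, `event_count`, and `session_minutes`, and the pattern symbol `first_event`, are illustrative names that do not appear in the original query.

```sql
SELECT username
     , session_number
     , session_start
     , session_end
     , event_count
     , DATEDIFF(minute, session_start, session_end) AS session_minutes
FROM app_log
MATCH_RECOGNIZE (
    PARTITION BY username
    ORDER BY log_timestamp
    MEASURES
        MATCH_NUMBER()       AS session_number,  -- sequential number of the match
        FIRST(log_timestamp) AS session_start,   -- earliest event in the match
        LAST(log_timestamp)  AS session_end,     -- latest event in the match
        COUNT(*)             AS event_count      -- rows in the match
    ONE ROW PER MATCH
    -- first_event is an undefined symbol, so it matches any row
    PATTERN (first_event continuous_activity*)
    DEFINE
        continuous_activity AS log_timestamp <= DATEADD(minute, 10, LAG(log_timestamp))
)
ORDER BY username, session_start;
```

A session that consists of a single event would have `event_count` equal to 1 and a `session_minutes` of 0, which can be a useful filter when very short sessions should be ignored.
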
### CONDITIONAL_TRUE_EVENT Query to create sessions

Another way to _sessionize_ this app log is to use [CONDITIONAL_TRUE_EVENT](conditional_true_event.md), a window function that increments a counter each time its condition evaluates to true. Here the counter increases whenever the gap since the user's previous event exceeds 10 minutes, so rows that share a `session_count` belong to the same session.

```sql
select
    username
  , log_timestamp
  , datediff(
        minute
      , lag(log_timestamp) over (partition by username order by log_timestamp asc)
      , log_timestamp
    ) as minutes_since_last_action   -- null for each user's first event
  , conditional_true_event(minutes_since_last_action > 10)
        over (partition by username order by log_timestamp asc)
    as session_count                 -- increments when the gap exceeds 10 minutes
from app_log;
```

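The query above returns one row per event rather than one row per session. One possible way to get back to the desired output is to group by `username` and `session_count`. The sketch below assumes the same `app_log` table and computes the gap in a separate CTE first, so that no window function depends on another window function's alias. The CTE names `gaps` and `numbered` are illustrative.

```sql
WITH gaps AS (
    -- Step 1: minutes elapsed since the same user's previous event
    SELECT
        username
      , log_timestamp
      , DATEDIFF(
            minute
          , LAG(log_timestamp) OVER (PARTITION BY username ORDER BY log_timestamp)
          , log_timestamp
        ) AS minutes_since_last_action
    FROM app_log
),
numbered AS (
    -- Step 2: bump the counter whenever the gap exceeds the 10-minute threshold
    SELECT
        username
      , log_timestamp
      , CONDITIONAL_TRUE_EVENT(minutes_since_last_action > 10)
            OVER (PARTITION BY username ORDER BY log_timestamp) AS session_count
    FROM gaps
)
-- Step 3: collapse each (username, session_count) group into one session row
SELECT
    username
  , MIN(log_timestamp) AS session_start
  , MAX(log_timestamp) AS session_end
FROM numbered
GROUP BY username, session_count
ORDER BY username, session_start;
```
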
### Query output

|username|log_timestamp|minutes_since_last_action|session_count|
|--------|-------------------------|----|---|
| Scott | 2020-08-07 20:10:00.000 | | 0 |
| Scott | 2020-08-07 20:20:00.000 | 10 | 0 |
| Scott | 2020-08-07 20:30:00.000 | 10 | 0 |
| Scott | 2020-08-07 20:40:00.000 | 10 | 0 |
| Scott | 2020-08-07 20:50:00.000 | 10 | 0 |
| Scott | 2020-08-07 22:00:00.000 | 70 | 1 |
| Scott | 2020-08-07 22:07:00.000 | 7 | 1 |
| Scott | 2020-08-07 22:20:00.000 | 13 | 2 |
| Scott | 2020-08-07 22:30:00.000 | 10 | 2 |
| Bob | 2020-08-07 20:10:00.000 | | 0 |
| Bob | 2020-08-07 20:20:00.000 | 10 | 0 |
| Bob | 2020-08-07 20:30:00.000 | 10 | 0 |
| Bob | 2020-08-07 20:50:00.000 | 20 | 1 |
| Bob | 2020-08-07 21:00:00.000 | 10 | 1 |
| Bob | 2020-08-07 21:10:00.000 | 10 | 1 |
| Angela | 2020-08-07 20:10:00.000 | | 0 |
| Angela | 2020-08-07 20:20:00.000 | 10 | 0 |
| Angela | 2020-08-07 20:30:00.000 | 10 | 0 |
| Angela | 2020-08-07 20:40:00.000 | 10 | 0 |
| Angela | 2020-08-07 20:50:00.000 | 10 | 0 |
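
For engines that offer neither MATCH_RECOGNIZE nor CONDITIONAL_TRUE_EVENT, a similar per-event session number can be sketched with `LAG` and a running `SUM`, a common gaps-and-islands technique. This is an illustration under assumptions rather than a drop-in replacement: it keeps the `DATEADD` call used earlier in this article, whose syntax varies between engines, and the names `flagged` and `is_new_session` are hypothetical.

```sql
WITH flagged AS (
    SELECT
        username
      , log_timestamp
        -- 1 when the gap since the user's previous event exceeds 10 minutes, else 0.
        -- For a user's first event LAG returns NULL, the comparison is NULL,
        -- and the CASE falls through to 0.
      , CASE
            WHEN log_timestamp > DATEADD(
                     minute
                   , 10
                   , LAG(log_timestamp) OVER (PARTITION BY username ORDER BY log_timestamp)
                 )
            THEN 1
            ELSE 0
        END AS is_new_session
    FROM app_log
)
SELECT
    username
  , log_timestamp
    -- running total of session starts gives a zero-based session number per user
  , SUM(is_new_session) OVER (
        PARTITION BY username
        ORDER BY log_timestamp
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS session_count
FROM flagged
ORDER BY username, log_timestamp;
```

Because this version compares timestamps directly, it may differ from a `datediff`-based query for events that are not aligned to whole minutes, since `datediff` in some dialects counts minute boundaries crossed rather than elapsed time.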
