SQL To Compare Events Against Previous Activity

WITH ordered_events AS (
 
  SELECT
    *,
 
    LAG(event_timestamp) OVER (
      PARTITION BY user_pseudo_id
      ORDER BY event_timestamp
    ) AS previous_timestamp,
 
    LAG(source) OVER (
      PARTITION BY user_pseudo_id
      ORDER BY event_timestamp
    ) AS previous_source
 
  FROM base_events
 
)

This query is preparing your GA4 event data for session reconstruction.

The goal is:

“For every event, look at the event immediately before it for the same user.”

That allows you to detect:

  • inactivity gaps
  • traffic source changes
  • session boundaries
  • journey continuity

You start with raw GA4 events like this:

user_pseudo_idevent_timestampsource
A10:00google
A10:05google
A11:00facebook
B09:00direct

But session stitching requires knowing:

“What happened BEFORE this event?”

That’s what LAG() does.

What This Query Is Doing

WITH ordered_events AS (

This creates a temporary table called:

ordered_events

Think of it like:

“A cleaned and sequenced event stream.”

Step 1: Select All Existing Fields

SELECT  *,

This keeps all original columns from base_events.

So if base_events has:

  • user_pseudo_id
  • event_name
  • event_timestamp
  • source
  • medium

they all remain.

Step 2: Get Previous Event Timestamp


LAG(event_timestamp) OVER (
  PARTITION BY user_pseudo_id  
  ORDER BY event_timestamp
) AS previous_timestamp


This is the important part.

What LAG() Means

LAG() says:

“Look one row backward.”

So for each event:

  • get the previous event’s timestamp
  • for the SAME user
  • ordered chronologically
  • Exmple

Suppose user A has:

event_timestamp
10:00
10:05
11:00

After LAG():

event_timestampprevious_timestamp
10:00NULL
10:0510:00
11:0010:05

Now you can calculate:

11:00 - 10:05 = 55 minutes

That might indicate:

“New session started.”

Why PARTITION BY user_pseudo_id Matters

PARTITION BY user_pseudo_id

This resets the calculation for each user.

Without it:

  • events from different users would mix together
  • session logic would break

So this says:

“Only compare events within the same user.”


Why ORDER BY event_timestamp Matters

ORDER BY event_timestamp

This ensures events are processed chronologically.

Otherwise:

  • previous events would be random
  • inactivity calculations become meaningless

Step 3: Get Previous Traffic Source


LAG(source) OVER (  
     PARTITION BY user_pseudo_id  
     ORDER BY event_timestamp
)
AS previous_source

Now we do the same thing — but for traffic source.Example

Before:

timestampsource
10:00google
10:05google
11:00facebook

After:

timestampsourceprevious_source
10:00googleNULL
10:05googlegoogle
11:00facebookgoogle

Now you can detect:

“Did acquisition source change?”

If:

source != previous_source

you may decide:

“Start a new session.”

Final Output Structure

After this query, your data becomes:

usertimestampsourceprevious_timestampprevious_source
A10:00googleNULLNULL
A10:05google10:00google
A11:00facebook10:05google

This gives you the foundation for:

  • inactivity analysis
  • session stitching
  • attribution continuity
  • journey reconstruction

Why This Matters in GA4 BigQuery

GA4 exports:

  • raw events
  • NOT reconstructed sessions

So to rebuild sessions:

  1. order events
  2. compare against previous behavior
  3. detect boundaries
  4. assign session IDs

This query handles:

Step #2: behavioral comparison.

The Mental Model

Think of LAG() as:

“Attach the previous row’s value onto the current row.”

That’s it.

For analytics engineering, this becomes extremely powerful for:

  • sessionization
  • funnel progression
  • churn analysis
  • attribution modeling
  • pathing analysis

Discover more from GA4 BIGQUERY ANALYTICS

Subscribe to get the latest posts sent to your email.

Leave a Reply

Discover more from GA4 BIGQUERY ANALYTICS

Subscribe now to keep reading and get access to the full archive.

Continue reading