SQL To Compare GA4 Events Against Previous Activity

WITH ordered_events AS (
 
  SELECT
    *,
 
    LAG(event_timestamp) OVER (
      PARTITION BY user_pseudo_id
      ORDER BY event_timestamp
    ) AS previous_timestamp,
 
    LAG(source) OVER (
      PARTITION BY user_pseudo_id
      ORDER BY event_timestamp
    ) AS previous_source
 
  FROM base_events
 
)

This query is preparing your GA4 event data for session reconstruction.

The goal is:

“For every event, look at the event immediately before it for the same user.”

That allows you to detect:

  • inactivity gaps
  • traffic source changes
  • session boundaries
  • journey continuity

You start with raw GA4 events like this:

user_pseudo_idevent_timestampsource
A10:00google
A10:05google
A11:00facebook
B09:00direct

But session stitching requires knowing:

“What happened BEFORE this event?”

That’s what LAG() does.

What This Query Is Doing

WITH ordered_events AS (

This creates a temporary table called:

ordered_events

Think of it like:

“A cleaned and sequenced event stream.”

Step 1: Select All Existing Fields

SELECT  *,

This keeps all original columns from base_events.

So if base_events has:

  • user_pseudo_id
  • event_name
  • event_timestamp
  • source
  • medium

they all remain.

Step 2: Get Previous Event Timestamp

LAG(event_timestamp) OVER (
  PARTITION BY user_pseudo_id  
  ORDER BY event_timestamp
) AS previous_timestamp


This is the important part.

What LAG() Means

LAG() says:

“Look one row backward.”

So for each event:

  • get the previous event’s timestamp
  • for the SAME user
  • ordered chronologically
  • Exmple

Suppose user A has:

event_timestamp
10:00
10:05
11:00

After LAG():

event_timestampprevious_timestamp
10:00NULL
10:0510:00
11:0010:05

Now you can calculate:

11:00 - 10:05 = 55 minutes

That might indicate:

“New session started.”

Why PARTITION BY user_pseudo_id Matters

PARTITION BY user_pseudo_id

This resets the calculation for each user.

Without it:

  • events from different users would mix together
  • session logic would break

So this says:

“Only compare events within the same user.”

Why ORDER BY event_timestamp Matters

ORDER BY event_timestamp

This ensures events are processed chronologically.

Otherwise:

  • previous events would be random
  • inactivity calculations become meaningless

Step 3: Get Previous Traffic Source

LAG(source) OVER (  
     PARTITION BY user_pseudo_id  
     ORDER BY event_timestamp
)
AS previous_source

Now we do the same thing — but for traffic source.Example

Before:

timestampsource
10:00google
10:05google
11:00facebook

After:

timestampsourceprevious_source
10:00googleNULL
10:05googlegoogle
11:00facebookgoogle

Now you can detect:

“Did acquisition source change?”

If:

source != previous_source

you may decide:

“Start a new session.”

Final Output Structure

After this query, your data becomes:

usertimestampsourceprevious_timestampprevious_source
A10:00googleNULLNULL
A10:05google10:00google
A11:00facebook10:05google

This gives you the foundation for:

  • inactivity analysis
  • session stitching
  • attribution continuity
  • journey reconstruction

Why This Matters in GA4 BigQuery

GA4 exports:

  • raw events
  • NOT reconstructed sessions

So to rebuild sessions:

  1. order events
  2. compare against previous behavior
  3. detect boundaries
  4. assign session IDs

This query handles:

Step #2: behavioral comparison.

The Mental Model

Think of LAG() as:

“Attach the previous row’s value onto the current row.”

That’s it.

For analytics engineering, this becomes extremely powerful for:

  • sessionization
  • funnel progression
  • churn analysis
  • attribution modeling
  • pathing analysis

Discover more from GA4 BIGQUERY ANALYTICS

Subscribe to get the latest posts sent to your email.

Discover more from GA4 BIGQUERY ANALYTICS

Subscribe now to keep reading and get access to the full archive.

Continue reading