Session counts in GA4 often don’t match what you see in BigQuery exports. This is because GA4 constructs sessions using rules like inactivity timeouts, traffic source changes, and midnight splits.
For analysts and marketers, this can lead to confusion in attribution, engagement tracking, and conversion reporting. In this post, we’ll break down session stitching, provide simple and advanced SQL examples, and show how to handle common real-world scenarios.
The Problem: Why Sessions Don’t Match
GA4 defines a session as a group of user interactions within a specific time frame. However, sessions can be split or duplicated due to:
- Inactivity Timeout: Sessions end after 30 minutes of user inactivity.
- Traffic Source Changes: Switching campaigns mid-session triggers a new session.
- Midnight Cutoff: Sessions crossing midnight are split into two sessions.
- Device Changes: Optionally, a device category change can start a new session.
Impact:
- Session counts fluctuate unexpectedly
- GA4 UI reports differ from BigQuery exports
- Attribution and engagement analysis can be misleading
Solution: Reconstruct Sessions in BigQuery
BigQuery lets you stitch events into sessions based on business rules, giving you accurate session counts and metrics.
1. Simple SQL for Session Counting
If you only need basic session counts, you can use the session_start event:
SELECT
user_pseudo_id,
COUNT(event_name) AS sessions
FROM
`your_project.analytics_XXXX.events_*`
WHERE
event_name = 'session_start'
GROUP BY user_pseudo_id
ORDER BY sessions DESC;
Explanation:
- This counts every
session_startper user.- Simple but doesn’t account for traffic source changes, midnight splits, or inactivity longer than 30 minutes.
2. Deep Dive SQL: Session Stitching With Multiple Scenarios
For accurate, session-level analysis, use this advanced query:
-- Step 1: Identify previous events per user
WITH events_with_lag AS (
SELECT
user_pseudo_id,
user_id, -- Optional for cross-device tracking
event_timestamp,
event_name,
traffic_source.source AS traffic_source,
traffic_source.medium AS traffic_medium,
device.category AS device_category,
LAG(event_timestamp) OVER(PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS prev_ts,
LAG(traffic_source.source) OVER(PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS prev_source,
LAG(device.category) OVER(PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS prev_device
FROM
`your_project.analytics_XXXX.events_*`
),
-- Step 2: Define session boundaries
session_flags AS (
SELECT
*,
CASE
WHEN TIMESTAMP_DIFF(TIMESTAMP_MICROS(event_timestamp), TIMESTAMP_MICROS(prev_ts), MINUTE) > 30 THEN 1
WHEN traffic_source != prev_source THEN 1
WHEN EXTRACT(DATE FROM TIMESTAMP_MICROS(event_timestamp)) != EXTRACT(DATE FROM TIMESTAMP_MICROS(prev_ts)) THEN 1
WHEN device_category != prev_device THEN 1
ELSE 0
END AS new_session_flag
FROM events_with_lag
),
-- Step 3: Assign session numbers
sessionized_events AS (
SELECT
*,
SUM(new_session_flag) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp ROWS UNBOUNDED PRECEDING) + 1 AS session_number
FROM session_flags
)
-- Step 4: Aggregate sessions
SELECT
user_pseudo_id,
user_id,
session_number,
MIN(TIMESTAMP_MICROS(event_timestamp)) AS session_start_time,
MAX(TIMESTAMP_MICROS(event_timestamp)) AS session_end_time,
COUNT(*) AS events_in_session,
ARRAY_AGG(DISTINCT traffic_source) AS traffic_sources_in_session,
ARRAY_AGG(DISTINCT device_category) AS devices_in_session
FROM sessionized_events
GROUP BY user_pseudo_id, user_id, session_number
ORDER BY user_pseudo_id, session_start_time;
What this query does:
- Tracks inactivity >30 minutes
- Detects traffic source changes
- Handles midnight splits
- Optionally accounts for device category changes
- Produces a unique session ID per user-event sequence
Key Takeaways
- GA4 splits sessions for inactivity, source changes, and midnight cutoff.
- Simple
session_startcounts work for rough estimates. - Deep session stitching in BigQuery gives accurate session-level metrics for attribution, engagement, and conversion analysis.
- Adjust session stitching logic based on business requirements, including device changes or cross-device tracking.
TL;DR
- GA4 session counts often differ from BigQuery due to session-breaking rules.
- Simple SQL can count
session_startevents. - Deep-dive SQL in BigQuery lets you stitch sessions, accounting for inactivity, campaign changes, midnight splits, and devices.
- Accurate session metrics improve marketing, engagement, and funnel insights.
