If you’ve worked with Google Analytics 4 and BigQuery long enough, you’ve probably run into one frustrating problem:
Your GA4 session numbers do not match your BigQuery session numbers.
You pull a report from the GA4 interface. Then you query BigQuery. And suddenly:
- Sessions are lower
- Attribution looks wrong
- Funnels break
- Users appear fragmented
- Marketing performance becomes unreliable
This happens because the GA4 interface and BigQuery are fundamentally different systems.
GA4 provides processed, modeled analytics. BigQuery provides raw event data. To get reliable analytics in BigQuery, you must reconstruct sessions yourself. This process is called:
Session Stitching
In this guide, you’ll learn:
- Why GA4 sessions break in BigQuery
- How GA4 actually creates sessions
- Why
ga_session_idalone is not enough - How to rebuild accurate sessions using SQL
- How to handle consent mode and identity fragmentation
- Best practices for production-grade session reconstruction
Why GA4 Sessions Break in BigQuery
The biggest misconception in GA4 analytics is this:
“Sessions already exist in BigQuery.”
They don’t.
BigQuery exports raw events. GA4 builds sessions during processing. That distinction changes everything.
In the GA4 UI:
- Google applies modeling
- Identity resolution occurs
- Attribution corrections happen
- Bot filtering runs
- Session logic is processed internally
In BigQuery:
- You only get raw events
- You must rebuild relationships manually
This is why session counts often differ between:
- GA4 reports
- Looker Studio dashboards
- BigQuery SQL queries
- Attribution models
Understanding How GA4 Defines a Session
A session in GA4 is not just:
“A group of events.”
It is a processed behavioral construct created using multiple rules.
GA4 session logic includes:
- 30-minute inactivity timeout
- Midnight session splitting
- Campaign/source changes
- Identity stitching
- Consent-mode behavior
- Device transitions
- Internal modeling logic
This means sessions are dynamic and reconstructed over time.
The Key Session Fields in BigQuery
When working with GA4 exports, the most important session-related fields are:
| Field | Purpose |
|---|---|
user_pseudo_id | Anonymous device/user identifier |
ga_session_id | Timestamp-based session identifier |
event_timestamp | Event ordering |
event_name | Event type |
user_id | Authenticated user identity |
traffic_source.* | Attribution source data |
At first glance, it seems like this should work:
COUNT(DISTINCT ga_session_id)
Unfortunately, this is usually incorrect.
Why ga_session_id Alone Is Not Reliable
The ga_session_id field has several limitations.
Problem #1: It Is Not Globally Unique
Multiple users can share the same session ID value.
This means:
COUNT(DISTINCT ga_session_id)
can undercount sessions.
Correct logic usually requires:
CONCAT(user_pseudo_id, ga_session_id)
OR
CONCAT(COALESCE(user_id, user_pseudo_id), ga_session_id)
Problem #2: Consent Mode Can Reset Identity
When users reject cookies or switch consent states:
- identifiers may disappear
- sessions fragment
- users appear “new”
- attribution resets
This creates broken journeys in BigQuery.
Problem #3: Session IDs Don’t Explain Behavioral Boundaries
GA4 sessions are influenced by:
- inactivity gaps
- campaign changes
- identity changes
- timestamp sequencing
A raw session ID alone does not fully capture these transitions.
What Is Session Stitching?
Session stitching is the process of rebuilding sessions using:
- event order
- inactivity rules
- identity logic
- behavioral boundaries
Instead of blindly trusting one field, we reconstruct sessions using analytics logic.
This creates:
- improved executive reporting
- cleaner attribution
- more accurate funnels
- better customer journey analysis
The Core Logic Behind Session Reconstruction
The basic workflow looks like this:
- Order events by user and timestamp
- Detect session breaks
- Assign session flags
- Generate reconstructed session IDs
- Aggregate events into sessions
Let’s walk through each step.
Step 1: Order Events by User
First, sequence all events chronologically.
WITH base_events AS (
SELECT
user_pseudo_id,
user_id,
event_timestamp,
event_name,
traffic_source.source AS source,
traffic_source.medium AS medium,
device.category AS device_category
FROM `project.dataset.events_*`
)
Step 2: Compare Events Against Previous Activity
Next, use window functions to compare each event against the previous event.
WITH ordered_events AS (
SELECT
*,
LAG(event_timestamp) OVER (
PARTITION BY user_pseudo_id
ORDER BY event_timestamp
) AS previous_timestamp,
LAG(source) OVER (
PARTITION BY user_pseudo_id
ORDER BY event_timestamp
) AS previous_source
FROM base_events
)
This helps identify behavioral breaks.
Step 3: Detect Session Boundaries
Now we define the rules that create a new session.
Typical conditions include:
- inactivity > 30 minutes
- source/medium changes
- date changes
- missing identifiers
WITH session_flags AS (
SELECT
*,
CASE
WHEN previous_timestamp IS NULL THEN 1
WHEN TIMESTAMP_DIFF(
TIMESTAMP_MICROS(event_timestamp),
TIMESTAMP_MICROS(previous_timestamp),
MINUTE
) > 30 THEN 1
WHEN source != previous_source THEN 1
WHEN DATE(TIMESTAMP_MICROS(event_timestamp))
!= DATE(TIMESTAMP_MICROS(previous_timestamp))
THEN 1
ELSE 0
END AS new_session_flag
FROM ordered_events
)
This identifies where sessions begin.
Step 4: Generate Reconstructed Session IDs
Now we create rolling session numbers.
WITH stitched_sessions AS (
SELECT
*,
SUM(new_session_flag) OVER (
PARTITION BY user_pseudo_id
ORDER BY event_timestamp
) AS reconstructed_session_id
FROM session_flags
)
Step 5: Aggregate Final Sessions
Finally, summarize events into usable session records.
SELECT
user_pseudo_id,
reconstructed_session_id,
MIN(TIMESTAMP_MICROS(event_timestamp)) AS session_start,
MAX(TIMESTAMP_MICROS(event_timestamp)) AS session_end,
COUNT(*) AS total_events,
COUNTIF(event_name = 'page_view') AS pageviews,
COUNTIF(event_name = 'purchase') AS purchases
FROM stitched_sessions
GROUP BY 1,2
You now have reconstructed sessions.
How Consent Mode Impacts Session Reconstruction
Consent mode dramatically affects BigQuery analytics.
When consent is denied:
- cookies may disappear
- session continuity breaks
- identifiers reset
- attribution becomes incomplete
This creates major reporting inconsistencies.
Common symptoms:
- inflated users
- lower returning visitor counts
- broken funnels
- inconsistent attribution paths
Session stitching becomes essential in consent-heavy environments.
Why GA4 UI and BigQuery Will Never Match Perfectly
Even after reconstruction, BigQuery numbers may still differ from the GA4 interface. That is normal.
GA4 applies additional processing that BigQuery does not expose, including:
- modeled conversions
- behavioral estimation
- spam filtering
- attribution corrections
- privacy thresholding
BigQuery is raw infrastructure data. GA4 UI is interpreted reporting data.
The goal is not exact parity. The goal is analytical consistency.
When You Should Use Session Stitching
Session stitching is critical when building:
- Marketing attribution models
- Multi-touch journey analysis
- Customer lifecycle analytics
- Funnel reconstruction
- LTV models
- Executive dashboards
- Product analytics pipelines
It is less important for:
- simple traffic trend reporting
- basic dashboarding
- lightweight exploratory analysis
Final Thoughts
GA4 BigQuery exports are powerful because they provide raw behavioral data.
But raw data requires reconstruction.
GA4 sessions are not fully exported objects.
They are interpreted behavioral constructs built during processing.
To create reliable analytics systems, you must rebuild sessions intentionally using:
- time logic
- identity logic
- attribution boundaries
- behavioral sequencing
That is the foundation of accurate analytics engineering in GA4.