How to Reconstruct Accurate Sessions in BigQuery: Session Stitching in GA4

If you’ve worked with Google Analytics 4 and BigQuery long enough, you’ve probably run into one frustrating problem:

Your GA4 session numbers do not match your BigQuery session numbers.

You pull a report from the GA4 interface. Then you query BigQuery. And suddenly:

  • Sessions are lower
  • Attribution looks wrong
  • Funnels break
  • Users appear fragmented
  • Marketing performance becomes unreliable

This happens because the GA4 interface and BigQuery are fundamentally different systems.

GA4 provides processed, modeled analytics. BigQuery provides raw event data. To get reliable analytics in BigQuery, you must reconstruct sessions yourself. This process is called:

Session Stitching

In this guide, you’ll learn:

  • Why GA4 sessions break in BigQuery
  • How GA4 actually creates sessions
  • Why ga_session_id alone is not enough
  • How to rebuild accurate sessions using SQL
  • How to handle consent mode and identity fragmentation
  • Best practices for production-grade session reconstruction

Why GA4 Sessions Break in BigQuery

The biggest misconception in GA4 analytics is this:

“Sessions already exist in BigQuery.”

They don’t.

BigQuery exports raw events. GA4 builds sessions during processing. That distinction changes everything.

In the GA4 UI:

  • Google applies modeling
  • Identity resolution occurs
  • Attribution corrections happen
  • Bot filtering runs
  • Session logic is processed internally

In BigQuery:

  • You only get raw events
  • You must rebuild relationships manually

This is why session counts often differ between:

  • GA4 reports
  • Looker Studio dashboards
  • BigQuery SQL queries
  • Attribution models

Understanding How GA4 Defines a Session

A session in GA4 is not just:

“A group of events.”

It is a processed behavioral construct created using multiple rules.

GA4 session logic includes:

  • 30-minute inactivity timeout
  • Midnight session splitting
  • Campaign/source changes
  • Identity stitching
  • Consent-mode behavior
  • Device transitions
  • Internal modeling logic

This means sessions are dynamic and reconstructed over time.

The Key Session Fields in BigQuery

When working with GA4 exports, the most important session-related fields are:

FieldPurpose
user_pseudo_idAnonymous device/user identifier
ga_session_idTimestamp-based session identifier
event_timestampEvent ordering
event_nameEvent type
user_idAuthenticated user identity
traffic_source.*Attribution source data

At first glance, it seems like this should work:

COUNT(DISTINCT ga_session_id)

Unfortunately, this is usually incorrect.

Why ga_session_id Alone Is Not Reliable

The ga_session_id field has several limitations.

Problem #1: It Is Not Globally Unique

Multiple users can share the same session ID value.

This means:

COUNT(DISTINCT ga_session_id)

can undercount sessions.

Correct logic usually requires:

CONCAT(user_pseudo_id, ga_session_id)

OR 

CONCAT(COALESCE(user_id, user_pseudo_id), ga_session_id)

Problem #2: Consent Mode Can Reset Identity

When users reject cookies or switch consent states:

  • identifiers may disappear
  • sessions fragment
  • users appear “new”
  • attribution resets

This creates broken journeys in BigQuery.

Problem #3: Session IDs Don’t Explain Behavioral Boundaries

GA4 sessions are influenced by:

  • inactivity gaps
  • campaign changes
  • identity changes
  • timestamp sequencing

A raw session ID alone does not fully capture these transitions.

What Is Session Stitching?

Session stitching is the process of rebuilding sessions using:

  • event order
  • inactivity rules
  • identity logic
  • behavioral boundaries

Instead of blindly trusting one field, we reconstruct sessions using analytics logic.

This creates:

  • improved executive reporting
  • cleaner attribution
  • more accurate funnels
  • better customer journey analysis

The Core Logic Behind Session Reconstruction

The basic workflow looks like this:

  1. Order events by user and timestamp
  2. Detect session breaks
  3. Assign session flags
  4. Generate reconstructed session IDs
  5. Aggregate events into sessions

Let’s walk through each step.

Step 1: Order Events by User

First, sequence all events chronologically.

WITH base_events AS (

  SELECT
    user_pseudo_id,
    user_id,
    event_timestamp,
    event_name,

    traffic_source.source AS source,
    traffic_source.medium AS medium,

    device.category AS device_category

  FROM `project.dataset.events_*`

)

Step 2: Compare Events Against Previous Activity

Next, use window functions to compare each event against the previous event.

WITH ordered_events AS (

  SELECT
    *,

    LAG(event_timestamp) OVER (
      PARTITION BY user_pseudo_id
      ORDER BY event_timestamp
    ) AS previous_timestamp,

    LAG(source) OVER (
      PARTITION BY user_pseudo_id
      ORDER BY event_timestamp
    ) AS previous_source

  FROM base_events

)

This helps identify behavioral breaks.

Step 3: Detect Session Boundaries

Now we define the rules that create a new session.

Typical conditions include:

  • inactivity > 30 minutes
  • source/medium changes
  • date changes
  • missing identifiers
WITH session_flags AS (

  SELECT
    *,

    CASE

      WHEN previous_timestamp IS NULL THEN 1

      WHEN TIMESTAMP_DIFF(
        TIMESTAMP_MICROS(event_timestamp),
        TIMESTAMP_MICROS(previous_timestamp),
        MINUTE
      ) > 30 THEN 1

      WHEN source != previous_source THEN 1

      WHEN DATE(TIMESTAMP_MICROS(event_timestamp))
           != DATE(TIMESTAMP_MICROS(previous_timestamp))
      THEN 1

      ELSE 0

    END AS new_session_flag

  FROM ordered_events

)

This identifies where sessions begin.

Step 4: Generate Reconstructed Session IDs

Now we create rolling session numbers.

WITH stitched_sessions AS (

  SELECT
    *,

    SUM(new_session_flag) OVER (
      PARTITION BY user_pseudo_id
      ORDER BY event_timestamp
    ) AS reconstructed_session_id

  FROM session_flags

)

Step 5: Aggregate Final Sessions

Finally, summarize events into usable session records.

SELECT

  user_pseudo_id,
  reconstructed_session_id,

  MIN(TIMESTAMP_MICROS(event_timestamp)) AS session_start,
  MAX(TIMESTAMP_MICROS(event_timestamp)) AS session_end,

  COUNT(*) AS total_events,

  COUNTIF(event_name = 'page_view') AS pageviews,

  COUNTIF(event_name = 'purchase') AS purchases

FROM stitched_sessions

GROUP BY 1,2

You now have reconstructed sessions.

How Consent Mode Impacts Session Reconstruction

Consent mode dramatically affects BigQuery analytics.

When consent is denied:

  • cookies may disappear
  • session continuity breaks
  • identifiers reset
  • attribution becomes incomplete

This creates major reporting inconsistencies.

Common symptoms:

  • inflated users
  • lower returning visitor counts
  • broken funnels
  • inconsistent attribution paths

Session stitching becomes essential in consent-heavy environments.

Why GA4 UI and BigQuery Will Never Match Perfectly

Even after reconstruction, BigQuery numbers may still differ from the GA4 interface. That is normal.

GA4 applies additional processing that BigQuery does not expose, including:

  • modeled conversions
  • behavioral estimation
  • spam filtering
  • attribution corrections
  • privacy thresholding

BigQuery is raw infrastructure data. GA4 UI is interpreted reporting data.

The goal is not exact parity. The goal is analytical consistency.

When You Should Use Session Stitching

Session stitching is critical when building:

  • Marketing attribution models
  • Multi-touch journey analysis
  • Customer lifecycle analytics
  • Funnel reconstruction
  • LTV models
  • Executive dashboards
  • Product analytics pipelines

It is less important for:

  • simple traffic trend reporting
  • basic dashboarding
  • lightweight exploratory analysis

Final Thoughts

GA4 BigQuery exports are powerful because they provide raw behavioral data.

But raw data requires reconstruction.

GA4 sessions are not fully exported objects.
They are interpreted behavioral constructs built during processing.

To create reliable analytics systems, you must rebuild sessions intentionally using:

  • time logic
  • identity logic
  • attribution boundaries
  • behavioral sequencing

That is the foundation of accurate analytics engineering in GA4.

Discover more from GA4 BIGQUERY ANALYTICS

Subscribe to get the latest posts sent to your email.

Discover more from GA4 BIGQUERY ANALYTICS

Subscribe now to keep reading and get access to the full archive.

Continue reading