Unlock the power of Google Analytics with Big Query

Accurate Marketing Attribution with GA4 Users in BigQuery

Marketing teams constantly ask: “Which channel truly drives conversions?”

GA4 provides user-level metrics and event data, but discrepancies between the GA4 interface and BigQuery exports can make attribution tricky. For example, Active Users or New Users may not match when you try to analyze traffic by campaign, source, or medium in BigQuery.

This guide shows how to solve these discrepancies and use GA4 user data in BigQuery to perform accurate marketing attribution, so you can confidently measure channel performance.


The Problem

MetricGA4 UIBigQuery ExportWhy It Differs
Active UsersCounts engaged users per channelCounted via user_id / user_pseudo_id in eventsGA4 applies session deduplication, blended identity, and filters; BigQuery raw data requires explicit SQL for these
New UsersFirst-time users per channelCalculated from earliest event per userGA4 UI uses first_visit/first_open events with session stitching; raw SQL needs custom logic
Campaign AttributionAuto-assigned in GA4 reportsMust join campaign parameters in BigQueryBigQuery data is raw and requires mapping UTM parameters and deduplication per user

Key Insight: To get accurate marketing attribution, BigQuery queries must mimic GA4’s logic for deduplication, session identification, and engagement.


The Solution

Follow these steps to align BigQuery analysis with GA4 for marketing attribution:

1. Deduplicate Users Across Devices

COALESCE(user_id, user_pseudo_id) AS unique_user_id


2. Filter for Engaged Users

WHERE event_name = 'user_engagement'


3. Assign Users to Marketing Channels

SELECT
  COALESCE(user_id, user_pseudo_id) AS unique_user_id,
  traffic_source.source AS source,
  traffic_source.medium AS medium,
  COUNT(DISTINCT event_date) AS active_days
FROM
  `project.dataset.events_*`
WHERE
  event_name = 'user_engagement'
GROUP BY
  unique_user_id, source, medium;


4. Calculate Active and New Users per Channel

WITH first_events AS (
  SELECT
    COALESCE(user_id, user_pseudo_id) AS unique_user_id,
    MIN(event_date) AS first_seen_date,
    traffic_source.source,
    traffic_source.medium
  FROM
    `project.dataset.events_*`
  GROUP BY unique_user_id, source, medium
)

SELECT
  source,
  medium,
  COUNT(unique_user_id) AS new_users
FROM
  first_events
WHERE first_seen_date BETWEEN '2025-08-01' AND '2025-08-07'
GROUP BY source, medium
ORDER BY new_users DESC;


5. Cross-Device & Multi-Session Attribution

Example:

SELECT
  COALESCE(user_id, user_pseudo_id) AS unique_user_id,
  MIN(event_timestamp) AS first_touch,
  MAX(event_timestamp) AS last_touch,
  traffic_source.source AS source
FROM
  `project.dataset.events_*`
GROUP BY unique_user_id, source;


Best Practices

PracticeReason
Deduplicate using COALESCE(user_id, user_pseudo_id)Avoid inflating user counts per channel
Filter for user_engagementCount only engaged users, aligning with GA4 Active Users
Align date ranges and timezoneEnsures metrics match GA4 interface reports
Use traffic_source for channel attributionMaps UTM parameters and GA4 acquisition data
Implement session stitching when neededEnsures accurate cross-device and multi-session attribution

Key Takeaways

For more on GA4 user metrics and BigQuery SQL, see: Mastering GA4 Users: Metrics, Identity, and BigQuery Analysis.

Discover more from GA4 BigQuery

Subscribe now to keep reading and get access to the full archive.

Continue reading