If you’ve ever compared GA4 user metrics in the interface with BigQuery exports, you might have noticed discrepancies. For example, Active Users, New Users, or Total Users in the GA4 UI often don’t match what you calculate from the raw BigQuery data.
Understanding why this happens is crucial for accurate reporting, marketing attribution, and data-driven decision-making. This guide explains the problem and offers detailed solutions to align your BigQuery metrics with GA4 reporting logic.
The Problem: Discrepancies Between GA4 UI and BigQuery
| Metric | GA4 Interface | BigQuery Export | Why It Differs |
|---|---|---|---|
| Active Users | Counts users with engaged sessions | Counted via user_pseudo_id or user_id in events | GA4 filters by engagement events, session timeout, and deduplicates using blended identity |
| New Users | First-time user in a property | Calculated from earliest event per user | GA4 uses first_visit/first_open with UI-level session stitching; BigQuery needs custom SQL |
| Returning Users | Users with previous sessions | Users with prior events in BigQuery | GA4 UI automatically excludes first sessions; raw data requires joins and filters |
| Total Users | Deduplicated by GA4 identity rules | COUNT(DISTINCT user_pseudo_id/user_id) | BigQuery counts may overcount if identity stitching not applied |
Key Insight: GA4 UI metrics apply deduplication, identity prioritization, filters, and session logic automatically, while BigQuery exports are raw event-level data.
The Solution: Aligning BigQuery Metrics with GA4
To ensure your BigQuery user counts match GA4, follow these detailed steps:
1. Deduplicate Users Correctly
- Use
COALESCE(user_id, user_pseudo_id)to combine logged-in and anonymous users. - Always use
COUNT(DISTINCT …)when counting users to avoid double-counting across devices. - If your property has a login system, prioritize
user_idfor cross-device analysis.
Example:
COUNT(DISTINCT COALESCE(user_id, user_pseudo_id)) AS active_users
This ensures every unique person is counted only once, even if they visit multiple times or from multiple devices.
2. Filter for Engaged Sessions
- GA4 counts only users who triggered engagement events as Active Users.
- Use
event_name = 'user_engagement'in your queries. - You can further filter by engagement time thresholds if your GA4 property has custom definitions.
Example:
WHERE event_name = 'user_engagement'
This mimics GA4 UI logic for Active Users, avoiding inflated counts from short or non-interactive visits.
3. Match Date Ranges and Timezones
- GA4 UI applies the property timezone, while BigQuery exports are in UTC.
- Use SQL functions like
DATE(TIMESTAMP_SECONDS(event_timestamp), "America/New_York")to align dates. - Ensuring consistent date ranges prevents off-by-one discrepancies in daily, weekly, or monthly reporting.
4. Use GA4 Identity Spaces
- GA4 tracks users across three identity spaces: Blended, Observed, Device-only.
- To replicate Blended in BigQuery, combine
user_idanduser_pseudo_idusingCOALESCE. - For device-only analysis, use
user_pseudo_idexclusively. - This step is crucial for aligning Total, Active, and Returning Users with GA4 UI metrics.
5. Implement Session Stitching
- GA4 automatically stitches sessions for users with
user_idacross devices. - In BigQuery, create session-level tables using
session_idorga_session_id. - Deduplicate users per session and combine sessions for cross-device reporting.
Example:
SELECT
COUNT(DISTINCT COALESCE(user_id, user_pseudo_id)) AS active_users
FROM
`project.dataset.events_*`
WHERE
event_name = 'user_engagement'
GROUP BY session_id
This ensures Active Users match GA4’s cross-device session logic.
6. Optional: Adjust for Sampling & Data Thresholds
- GA4 UI applies data thresholds for privacy when reporting on small user counts.
- BigQuery exports raw data without thresholds, so counts may appear higher.
- If needed, apply your own threshold rules in SQL to align with GA4 privacy constraints.
Quick SQL Example: Counting Active Users
SELECT
event_date,
COUNT(DISTINCT COALESCE(user_id, user_pseudo_id)) AS active_users
FROM
`project.dataset.events_*`
WHERE
event_name = 'user_engagement'
AND event_date BETWEEN '2025-08-01' AND '2025-08-07'
GROUP BY
event_date
ORDER BY
event_date;
What’s happening in this SQL:
- Deduplicates logged-in and anonymous users with
COALESCE. - Filters only
user_engagementevents to mimic GA4 Active Users. - Counts unique users per day for daily Active Users.
Key Takeaways
- GA4 UI and BigQuery counts differ due to deduplication, session logic, identity prioritization, and timezone differences.
- To match GA4 metrics in BigQuery:
- Deduplicate users with
COALESCE(user_id, user_pseudo_id) - Filter for engagement events (
user_engagement) - Align date ranges and timezones
- Respect GA4 identity spaces (Blended, Observed, Device)
- Implement session stitching when necessary
- Optionally apply thresholds for privacy alignment
- Deduplicate users with
- Following these steps ensures accurate reporting, cross-device analysis, and reliable marketing insights.
For a deeper dive into GA4 user metrics and BigQuery SQL, see: Mastering GA4 Users: Metrics, Identity, and BigQuery Analysis.
