Unlock the power of Google Analytics with Big Query

GA4 User Analysis in BigQuery: Active, New, Returning, and Cross-Device Metrics Explained

In this guide, we’ll walk you through SQL queries to calculate Active, New, and Returning Users, explore cross-device behavior, and share best practices for deduplication and segmentation. By the end, you’ll have a complete toolkit to analyze GA4 users in BigQuery with precision—perfect for building reports, dashboards, or feeding insights into your CRM or marketing tools.

Understanding GA4 User Identifiers in BigQuery

IdentifierDescriptionNotes / Use Case
user_pseudo_idAnonymous ID tied to device/browserDefault in GA4 exports; general user analysis
user_idUnique ID assigned to logged-in usersEnables cross-device tracking; links to CRM or internal systems
COALESCE(user_id, user_pseudo_id)Combines both for deduplicated analysisBest practice for Active, New, Returning user counts

Always deduplicate users before counting metrics to match GA4 logic more closely.


1. Counting Active Users

Active Users are users who had an engaged session or were new in the timeframe.

WITH daily_events AS (
  SELECT
    event_date,
    COALESCE(user_id, user_pseudo_id) AS unique_user_id
  FROM
    `project.dataset.events_*`
  WHERE
    event_name = 'user_engagement'
    AND event_date BETWEEN '2025-08-01' AND '2025-08-07'
)

SELECT
  event_date,
  COUNT(DISTINCT unique_user_id) AS active_users
FROM
  daily_events
GROUP BY
  event_date
ORDER BY
  event_date;

What’s happening in this SQL:

  1. CTE daily_events: Selects events of type user_engagement and replaces null user_id with user_pseudo_id to capture all users.
  2. COUNT(DISTINCT unique_user_id): Counts each user only once per day.
  3. GROUP BY event_date: Aggregates counts by day for daily reporting.

2. Counting New Users

WITH first_events AS (
  SELECT
    COALESCE(user_id, user_pseudo_id) AS unique_user_id,
    MIN(event_date) AS first_seen_date
  FROM
    `project.dataset.events_*`
  GROUP BY
    unique_user_id
)

SELECT
  first_seen_date,
  COUNT(unique_user_id) AS new_users
FROM
  first_events
WHERE
  first_seen_date BETWEEN '2025-08-01' AND '2025-08-07'
GROUP BY
  first_seen_date
ORDER BY
  first_seen_date;

What’s happening in this SQL:

  1. CTE first_events: Finds the earliest event date for each user, treating that as their first visit.
  2. COUNT(unique_user_id): Counts each user’s first session exactly once.
  3. WHERE first_seen_date BETWEEN …: Filters only those users who were new in the selected date range.

3. Counting Returning Users

WITH first_events AS (
  SELECT
    COALESCE(user_id, user_pseudo_id) AS unique_user_id,
    MIN(event_date) AS first_seen_date
  FROM
    `project.dataset.events_*`
  GROUP BY
    unique_user_id
)

SELECT
  event_date,
  COUNT(DISTINCT COALESCE(user_id, user_pseudo_id)) AS returning_users
FROM
  `project.dataset.events_*` e
JOIN
  first_events f
ON
  COALESCE(e.user_id, e.user_pseudo_id) = f.unique_user_id
WHERE
  e.event_date BETWEEN '2025-08-01' AND '2025-08-07'
  AND e.event_date > f.first_seen_date
GROUP BY
  event_date
ORDER BY
  event_date;

What’s happening in this SQL:

  1. CTE first_events: Calculates the first_seen_date for each user.
  2. JOIN events_* with first_events: Matches all events to their user’s first activity.
  3. WHERE e.event_date > f.first_seen_date: Ensures only returning users (not first-time users) are counted.
  4. COUNT(DISTINCT …): Deduplicates users per day.

4. Cross-Device Analysis

WITH user_devices AS (
  SELECT
    COALESCE(user_id, user_pseudo_id) AS unique_user_id,
    device.category AS device_type
  FROM
    `project.dataset.events_*`
  WHERE
    event_date BETWEEN '2025-08-01' AND '2025-08-07'
  GROUP BY
    unique_user_id, device_type
)

SELECT
  COUNT(DISTINCT unique_user_id) AS multi_device_users
FROM
  user_devices
GROUP BY
  unique_user_id
HAVING
  COUNT(DISTINCT device_type) > 1;

What’s happening in this SQL:

  1. CTE user_devices: Collects unique users and the type of device they used.
  2. COUNT(DISTINCT device_type) > 1: Filters users who used multiple device types.
  3. COUNT(DISTINCT unique_user_id): Totals multi-device users.

Best Practices

PracticeReason
Deduplicate with COALESCE(user_id, user_pseudo_id)Avoid double-counting across devices
Filter for user_engagement eventsMatches GA4 Active Users metric
Match GA4 date ranges and timezoneEnsures consistency with GA4 reports
Use BigQuery for custom segmentationFunnels, cohorts, and combining with CRM data
Document and modularize SQLReusable queries for daily, weekly, and monthly reporting

Key Takeaways

For conceptual understanding of GA4 user metrics and identity, see: Mastering GA4 Users: Metrics, Identity, and BigQuery Analysis.


TL;DR

Discover more from GA4 BigQuery

Subscribe now to keep reading and get access to the full archive.

Continue reading