How to Set Up the GA4 BigQuery Export (And Actually Use It)

Most GA4 users never touch BigQuery. They work entirely inside the GA4 interface — sampling kicking in after 500,000 sessions, data retention capped at 14 months, complex multi-step funnels impossible to build — and accept these as limitations of the platform.

They're not limitations of the platform. They're limitations of the UI. The BigQuery export removes every one of those constraints: every raw event row, no sampling, no retention cap, queryable with SQL. And since Google made it free for standard properties, there's no reason not to turn it on.

The thing that stops most people isn't access — it's the schema. GA4's BigQuery data is deeply nested and structured differently from what most analysts expect. This guide covers the setup, the schema, and the four queries your team will actually use day-to-day.

Why BigQuery changes what you can do with GA4 data

Before getting into setup, it's worth being concrete about what you're gaining. GA4's built-in reporting is useful for surface-level metrics, but it has hard limits built in by design.

Limitation 01

Data Sampling

GA4 samples Exploration reports above 500,000 sessions. BigQuery queries your full dataset — every row, every time, unsampled.

Limitation 02

14-Month Retention Cap

GA4's UI retains event data for 14 months maximum. BigQuery holds data as long as you keep the tables — year-on-year analysis becomes straightforward.

Limitation 03

No Cross-Source Joins

GA4 can't join your analytics data with CRM records, ad spend, or backend order data. BigQuery can join any dataset you load alongside it.

Limitation 04

Limited Custom Analysis

Multi-touch attribution, cohort analysis beyond 90 days, and custom funnel paths aren't possible in the GA4 interface. With SQL, they're routine.

Cost reality check: The export is free to enable. You only pay Google Cloud for query processing beyond 1 TiB per month and storage beyond 10 GiB — both free tiers. For most marketing sites, the first year of running a BigQuery pipeline costs under £10 total. Costs only appear at real scale or when queries lack date filters.

Setting up the export: step by step

You need a Google Cloud project with a billing account attached before you start. Billing must be enabled to activate BigQuery, even though you're unlikely to be charged on a standard property.

1
Create a Google Cloud project Go to console.cloud.google.com and create a new project. Name it something clear like ga4-analytics-prod. Attach a billing account — you won't be charged unless you exceed the free tier, but it must be enabled.
2
Enable the BigQuery API In your Cloud Console, navigate to APIs & Services → Library, search for "BigQuery API", and click Enable. Takes about 30 seconds.
3
Link GA4 to your Cloud project In GA4, go to Admin → Product Links → BigQuery Links → Link. Select your Cloud project, choose your export region (pick one close to your users — eu-west1 for Europe, us-central1 for the US), and leave Daily export enabled. Streaming export is available but incurs additional cost — only enable it if you genuinely need real-time data.
4
Wait for the first export GA4 exports once per day, typically completing between 4–8 AM in your selected region. The first table (events_YYYYMMDD) will appear in BigQuery the following morning. You won't get historical backfill — data accumulates from the day you enable the link.
5
Confirm the dataset structure Once the first export lands, open BigQuery and find your dataset. You'll see a daily table (events_YYYYMMDD) and an intraday table (events_intraday_YYYYMMDD) that updates throughout the day and is replaced by the final daily table overnight. Click any table and select Schema before writing any queries.

Understanding the GA4 BigQuery schema

This is where most people get stuck. GA4's BigQuery export uses a nested, repeated schema — very different from a flat SQL table. Three things to understand before writing any query:

The event_params RECORD

Every GA4 event carries custom parameters. In BigQuery, these are stored as a repeated RECORD (an array of key-value pairs) called event_params. You can't access them like normal columns — you need to UNNEST the array or use a correlated subquery. The subquery approach is cleaner for extracting one or two parameters inline:

SQL — Extract custom event parameters inline

SELECT
  event_name,
  user_pseudo_id,
  -- Pull specific params using a correlated subquery
  (SELECT value.string_value
   FROM UNNEST(event_params)
   WHERE key = 'page_title') AS page_title,

  (SELECT value.string_value
   FROM UNNEST(event_params)
   WHERE key = 'page_location') AS page_url

FROM
  `your-project.analytics_XXXXXXXXX.events_*`

WHERE
  _TABLE_SUFFIX BETWEEN '20260601' AND '20260610'
  AND event_name = 'page_view'
LIMIT 1000;

Always use _TABLE_SUFFIX date filters. The events_* wildcard scans every daily partition unless you restrict by date. Without a _TABLE_SUFFIX filter, a simple query on a year of data can process gigabytes and incur charges. This is the single most important cost-control habit in BigQuery.

Key fields at a glance

Field	Type	What it contains
event_name	STRING	The GA4 event name — `page_view`, `purchase`, `scroll`, or any custom event you've defined
event_timestamp	INTEGER	Event time in microseconds (UTC). Use `TIMESTAMP_MICROS(event_timestamp)` to convert to a readable datetime
user_pseudo_id	STRING	GA4's anonymous client ID — the closest equivalent to UA's Client ID. Use this for session stitching and user-level analysis
event_params	RECORD (REPEATED)	Array of key/value pairs for all event parameters. Values live in typed sub-fields: `string_value`, `int_value`, `float_value`, `double_value`
user_properties	RECORD (REPEATED)	Same structure as `event_params` but for user-scoped properties — membership tier, user type, etc.
collected_traffic_source	RECORD	Session-level source/medium/campaign. Use this (not `traffic_source`) for acquisition reporting — `traffic_source` reflects the user's very first acquisition, not the current session
ecommerce	RECORD	Purchase fields: `transaction_id`, `purchase_revenue`, `tax`, `shipping`. Individual line items are in the separate `items` RECORD

Four queries your team will actually use

These are the four queries we build first for every new BigQuery pipeline. They cover the reporting use cases that GA4's UI handles badly or not at all.

1. Daily sessions and engagement rate

GA4 doesn't have a "sessions" metric in BigQuery — you build it from session_start events. This query gives you a clean daily sessions table with engagement rate, free from the sampling that affects GA4's Explorations at volume.

SQL — Daily sessions with engagement rate

SELECT
  FORMAT_DATE('%Y-%m-%d', PARSE_DATE('%Y%m%d', event_date)) AS date,

  COUNT(DISTINCT
    CONCAT(user_pseudo_id, '.',
      (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')
    )
  ) AS sessions,

  COUNTIF(
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'session_engaged') = 1
  ) AS engaged_sessions,

  ROUND(
    SAFE_DIVIDE(
      COUNTIF((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'session_engaged') = 1),
      COUNT(*)
    ) * 100, 1
  ) AS engagement_rate_pct

FROM `your-project.analytics_XXXXXXXXX.events_*`

WHERE
  _TABLE_SUFFIX BETWEEN '20260501' AND '20260610'
  AND event_name = 'session_start'

GROUP BY date
ORDER BY date DESC;

2. Purchase revenue by channel

This pulls transaction revenue from the ecommerce record and joins it against session-level traffic source — the kind of channel-level ROI reporting that isn't affected by GA4's 14-month retention limit.

SQL — Revenue by traffic channel

SELECT
  collected_traffic_source.session_source   AS source,
  collected_traffic_source.session_medium   AS medium,
  collected_traffic_source.session_campaign AS campaign,
  COUNT(DISTINCT ecommerce.transaction_id)  AS transactions,
  ROUND(SUM(ecommerce.purchase_revenue), 2)  AS revenue

FROM `your-project.analytics_XXXXXXXXX.events_*`

WHERE
  _TABLE_SUFFIX BETWEEN '20260501' AND '20260610'
  AND event_name = 'purchase'
  AND ecommerce.transaction_id IS NOT NULL

GROUP BY 1, 2, 3
ORDER BY revenue DESC;

3. Top landing pages by conversion rate

This requires two steps: identify the landing page for each session, then join to purchase events to calculate conversion rate per page. It's the kind of analysis that requires multiple Explorations in GA4 and is a single query in BigQuery.

SQL — Landing page conversion rate

WITH sessions AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value   FROM UNNEST(event_params) WHERE key = 'ga_session_id')   AS session_id,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS landing_page
  FROM `your-project.analytics_XXXXXXXXX.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20260501' AND '20260610'
    AND event_name = 'session_start'
),

conversions AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id
  FROM `your-project.analytics_XXXXXXXXX.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20260501' AND '20260610'
    AND event_name = 'purchase'
)

SELECT
  s.landing_page,
  COUNT(*) AS sessions,
  COUNT(c.session_id) AS converting_sessions,
  ROUND(SAFE_DIVIDE(COUNT(c.session_id), COUNT(*)) * 100, 2) AS conversion_rate_pct

FROM sessions s
LEFT JOIN conversions c
  ON s.user_pseudo_id = c.user_pseudo_id AND s.session_id = c.session_id

GROUP BY s.landing_page
HAVING sessions > 50
ORDER BY conversion_rate_pct DESC
LIMIT 20;

4. Weekly cohort retention

Identify users by their first-visit week and measure how many return in subsequent weeks. This kind of retention analysis doesn't exist in the GA4 interface — it's one of the most requested reports we build for clients.

SQL — Weekly cohort retention (4-week window)

WITH first_visits AS (
  SELECT
    user_pseudo_id,
    DATE_TRUNC(MIN(PARSE_DATE('%Y%m%d', event_date)), WEEK) AS cohort_week
  FROM `your-project.analytics_XXXXXXXXX.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20260101' AND '20260610'
    AND event_name = 'session_start'
  GROUP BY user_pseudo_id
),

all_sessions AS (
  SELECT DISTINCT
    user_pseudo_id,
    DATE_TRUNC(PARSE_DATE('%Y%m%d', event_date), WEEK) AS session_week
  FROM `your-project.analytics_XXXXXXXXX.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20260101' AND '20260610'
    AND event_name = 'session_start'
)

SELECT
  f.cohort_week,
  DATE_DIFF(s.session_week, f.cohort_week, WEEK) AS weeks_since_first_visit,
  COUNT(DISTINCT f.user_pseudo_id) AS retained_users

FROM first_visits f
JOIN all_sessions s USING (user_pseudo_id)

WHERE DATE_DIFF(s.session_week, f.cohort_week, WEEK) BETWEEN 0 AND 4

GROUP BY 1, 2
ORDER BY cohort_week, weeks_since_first_visit;

Keeping query costs near zero

BigQuery bills on data scanned, not rows returned. The biggest cost mistakes are easy to avoid once you know them:

Always include _TABLE_SUFFIX filters. Querying events_* without a date range scans your entire dataset history. A two-week range on a mid-sized site scans roughly 2–5 GB — well inside the free 1 TiB monthly limit.
Build summary tables, not live connections. Rather than connecting Looker Studio directly to raw event tables (which runs a complex query on every dashboard load), schedule a daily SQL job that writes aggregated results to a flat summary table. Connect dashboards to the summary table instead.
Partition summary tables by date. When creating scheduled summary tables, partition by date so each daily run only processes the previous day's data rather than re-scanning the full table.
Preview before running. BigQuery shows an estimated bytes processed in the top-right corner before you execute. If the estimate is unexpectedly high, you almost certainly have a missing date filter.

Practical benchmark: A well-structured GA4 BigQuery pipeline for a site with 500K monthly sessions typically costs under £10/month — almost entirely storage. With proper date filtering and summary tables, query costs sit inside the free tier for most setups indefinitely.

What to build next

Once the export is running and your first queries are working, the natural next step is a scheduled query that refreshes a summary table daily and feeds a Looker Studio dashboard. That gives your whole team access to unsampled, always-current data without anyone needing to write SQL.

From there, the two most valuable additions for most businesses are: joining GA4 event data with CRM records (matching user_pseudo_id to customer IDs via your own first-party data layer), and building a simple position-based attribution model that weights touchpoints across the full conversion path — something completely out of reach in the GA4 interface.

The BigQuery export is the foundation. Turn it on early. Every analysis you build from here is more reliable, more flexible, and not dependent on what Google decides to keep in the GA4 UI next year.

Need a BigQuery pipeline built for your GA4 property?

We design and build GA4 → BigQuery → Looker Studio pipelines — clean schema, scheduled summary tables, and dashboards your whole team will actually use. Book a free 30-minute audit and we'll show you exactly what's possible with your data.

Why BigQuery changes what you can do with GA4 data

Data Sampling

14-Month Retention Cap

No Cross-Source Joins

Limited Custom Analysis

Setting up the export: step by step

Understanding the GA4 BigQuery schema

The event_params RECORD

Key fields at a glance

Four queries your team will actually use

1. Daily sessions and engagement rate

2. Purchase revenue by channel

3. Top landing pages by conversion rate

4. Weekly cohort retention

Keeping query costs near zero

What to build next

Need a BigQuery pipeline built for your GA4 property?

How to Fix Duplicate Conversions in GA4

Why Your GA4 Traffic Sources Look Wrong (and How to Fix Them)