Most GA4 users never touch BigQuery. They work entirely inside the GA4 interface — sampling kicking in after 500,000 sessions, data retention capped at 14 months, complex multi-step funnels impossible to build — and accept these as limitations of the platform.
They're not limitations of the platform. They're limitations of the UI. The BigQuery export removes every one of those constraints: every raw event row, no sampling, no retention cap, queryable with SQL. And since Google made it free for standard properties, there's no reason not to turn it on.
The thing that stops most people isn't access — it's the schema. GA4's BigQuery data is deeply nested and structured differently from what most analysts expect. This guide covers the setup, the schema, and the four queries your team will actually use day-to-day.
Why BigQuery changes what you can do with GA4 data
Before getting into setup, it's worth being concrete about what you're gaining. GA4's built-in reporting is useful for surface-level metrics, but it has hard limits built in by design.
Data Sampling
GA4 samples Exploration reports above 500,000 sessions. BigQuery queries your full dataset — every row, every time, unsampled.
14-Month Retention Cap
GA4's UI retains event data for 14 months maximum. BigQuery holds data as long as you keep the tables — year-on-year analysis becomes straightforward.
No Cross-Source Joins
GA4 can't join your analytics data with CRM records, ad spend, or backend order data. BigQuery can join any dataset you load alongside it.
Limited Custom Analysis
Multi-touch attribution, cohort analysis beyond 90 days, and custom funnel paths aren't possible in the GA4 interface. With SQL, they're routine.
Cost reality check: The export is free to enable. You only pay Google Cloud for query processing beyond 1 TiB per month and storage beyond 10 GiB — both free tiers. For most marketing sites, the first year of running a BigQuery pipeline costs under £10 total. Costs only appear at real scale or when queries lack date filters.
Setting up the export: step by step
You need a Google Cloud project with a billing account attached before you start. Billing must be enabled to activate BigQuery, even though you're unlikely to be charged on a standard property.
-
1
Create a Google Cloud project Go to console.cloud.google.com and create a new project. Name it something clear like
ga4-analytics-prod. Attach a billing account — you won't be charged unless you exceed the free tier, but it must be enabled. -
2
Enable the BigQuery API In your Cloud Console, navigate to APIs & Services → Library, search for "BigQuery API", and click Enable. Takes about 30 seconds.
-
3
Link GA4 to your Cloud project In GA4, go to Admin → Product Links → BigQuery Links → Link. Select your Cloud project, choose your export region (pick one close to your users —
eu-west1for Europe,us-central1for the US), and leave Daily export enabled. Streaming export is available but incurs additional cost — only enable it if you genuinely need real-time data. -
4
Wait for the first export GA4 exports once per day, typically completing between 4–8 AM in your selected region. The first table (
events_YYYYMMDD) will appear in BigQuery the following morning. You won't get historical backfill — data accumulates from the day you enable the link. -
5
Confirm the dataset structure Once the first export lands, open BigQuery and find your dataset. You'll see a daily table (
events_YYYYMMDD) and an intraday table (events_intraday_YYYYMMDD) that updates throughout the day and is replaced by the final daily table overnight. Click any table and select Schema before writing any queries.
Understanding the GA4 BigQuery schema
This is where most people get stuck. GA4's BigQuery export uses a nested, repeated schema — very different from a flat SQL table. Three things to understand before writing any query:
The event_params RECORD
Every GA4 event carries custom parameters. In BigQuery, these are stored as a repeated RECORD (an array of key-value pairs) called event_params. You can't access them like normal columns — you need to UNNEST the array or use a correlated subquery. The subquery approach is cleaner for extracting one or two parameters inline:
SQL — Extract custom event parameters inline
SELECT
event_name,
user_pseudo_id,
-- Pull specific params using a correlated subquery
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'page_title') AS page_title,
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'page_location') AS page_url
FROM
`your-project.analytics_XXXXXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260601' AND '20260610'
AND event_name = 'page_view'
LIMIT 1000;
Always use _TABLE_SUFFIX date filters. The events_* wildcard scans every daily partition unless you restrict by date. Without a _TABLE_SUFFIX filter, a simple query on a year of data can process gigabytes and incur charges. This is the single most important cost-control habit in BigQuery.
Key fields at a glance
| Field | Type | What it contains |
|---|---|---|
| event_name | STRING | The GA4 event name — page_view, purchase, scroll, or any custom event you've defined |
| event_timestamp | INTEGER | Event time in microseconds (UTC). Use TIMESTAMP_MICROS(event_timestamp) to convert to a readable datetime |
| user_pseudo_id | STRING | GA4's anonymous client ID — the closest equivalent to UA's Client ID. Use this for session stitching and user-level analysis |
| event_params | RECORD (REPEATED) | Array of key/value pairs for all event parameters. Values live in typed sub-fields: string_value, int_value, float_value, double_value |
| user_properties | RECORD (REPEATED) | Same structure as event_params but for user-scoped properties — membership tier, user type, etc. |
| collected_traffic_source | RECORD | Session-level source/medium/campaign. Use this (not traffic_source) for acquisition reporting — traffic_source reflects the user's very first acquisition, not the current session |
| ecommerce | RECORD | Purchase fields: transaction_id, purchase_revenue, tax, shipping. Individual line items are in the separate items RECORD |
Four queries your team will actually use
These are the four queries we build first for every new BigQuery pipeline. They cover the reporting use cases that GA4's UI handles badly or not at all.
1. Daily sessions and engagement rate
GA4 doesn't have a "sessions" metric in BigQuery — you build it from session_start events. This query gives you a clean daily sessions table with engagement rate, free from the sampling that affects GA4's Explorations at volume.
SQL — Daily sessions with engagement rate
SELECT
FORMAT_DATE('%Y-%m-%d', PARSE_DATE('%Y%m%d', event_date)) AS date,
COUNT(DISTINCT
CONCAT(user_pseudo_id, '.',
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')
)
) AS sessions,
COUNTIF(
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'session_engaged') = 1
) AS engaged_sessions,
ROUND(
SAFE_DIVIDE(
COUNTIF((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'session_engaged') = 1),
COUNT(*)
) * 100, 1
) AS engagement_rate_pct
FROM `your-project.analytics_XXXXXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260501' AND '20260610'
AND event_name = 'session_start'
GROUP BY date
ORDER BY date DESC;
2. Purchase revenue by channel
This pulls transaction revenue from the ecommerce record and joins it against session-level traffic source — the kind of channel-level ROI reporting that isn't affected by GA4's 14-month retention limit.
SQL — Revenue by traffic channel
SELECT
collected_traffic_source.session_source AS source,
collected_traffic_source.session_medium AS medium,
collected_traffic_source.session_campaign AS campaign,
COUNT(DISTINCT ecommerce.transaction_id) AS transactions,
ROUND(SUM(ecommerce.purchase_revenue), 2) AS revenue
FROM `your-project.analytics_XXXXXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260501' AND '20260610'
AND event_name = 'purchase'
AND ecommerce.transaction_id IS NOT NULL
GROUP BY 1, 2, 3
ORDER BY revenue DESC;
3. Top landing pages by conversion rate
This requires two steps: identify the landing page for each session, then join to purchase events to calculate conversion rate per page. It's the kind of analysis that requires multiple Explorations in GA4 and is a single query in BigQuery.
SQL — Landing page conversion rate
WITH sessions AS (
SELECT
user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS landing_page
FROM `your-project.analytics_XXXXXXXXX.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20260501' AND '20260610'
AND event_name = 'session_start'
),
conversions AS (
SELECT
user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id
FROM `your-project.analytics_XXXXXXXXX.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20260501' AND '20260610'
AND event_name = 'purchase'
)
SELECT
s.landing_page,
COUNT(*) AS sessions,
COUNT(c.session_id) AS converting_sessions,
ROUND(SAFE_DIVIDE(COUNT(c.session_id), COUNT(*)) * 100, 2) AS conversion_rate_pct
FROM sessions s
LEFT JOIN conversions c
ON s.user_pseudo_id = c.user_pseudo_id AND s.session_id = c.session_id
GROUP BY s.landing_page
HAVING sessions > 50
ORDER BY conversion_rate_pct DESC
LIMIT 20;
4. Weekly cohort retention
Identify users by their first-visit week and measure how many return in subsequent weeks. This kind of retention analysis doesn't exist in the GA4 interface — it's one of the most requested reports we build for clients.
SQL — Weekly cohort retention (4-week window)
WITH first_visits AS (
SELECT
user_pseudo_id,
DATE_TRUNC(MIN(PARSE_DATE('%Y%m%d', event_date)), WEEK) AS cohort_week
FROM `your-project.analytics_XXXXXXXXX.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20260101' AND '20260610'
AND event_name = 'session_start'
GROUP BY user_pseudo_id
),
all_sessions AS (
SELECT DISTINCT
user_pseudo_id,
DATE_TRUNC(PARSE_DATE('%Y%m%d', event_date), WEEK) AS session_week
FROM `your-project.analytics_XXXXXXXXX.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20260101' AND '20260610'
AND event_name = 'session_start'
)
SELECT
f.cohort_week,
DATE_DIFF(s.session_week, f.cohort_week, WEEK) AS weeks_since_first_visit,
COUNT(DISTINCT f.user_pseudo_id) AS retained_users
FROM first_visits f
JOIN all_sessions s USING (user_pseudo_id)
WHERE DATE_DIFF(s.session_week, f.cohort_week, WEEK) BETWEEN 0 AND 4
GROUP BY 1, 2
ORDER BY cohort_week, weeks_since_first_visit;
Keeping query costs near zero
BigQuery bills on data scanned, not rows returned. The biggest cost mistakes are easy to avoid once you know them:
- Always include _TABLE_SUFFIX filters. Querying
events_*without a date range scans your entire dataset history. A two-week range on a mid-sized site scans roughly 2–5 GB — well inside the free 1 TiB monthly limit. - Build summary tables, not live connections. Rather than connecting Looker Studio directly to raw event tables (which runs a complex query on every dashboard load), schedule a daily SQL job that writes aggregated results to a flat summary table. Connect dashboards to the summary table instead.
- Partition summary tables by date. When creating scheduled summary tables, partition by date so each daily run only processes the previous day's data rather than re-scanning the full table.
- Preview before running. BigQuery shows an estimated bytes processed in the top-right corner before you execute. If the estimate is unexpectedly high, you almost certainly have a missing date filter.
Practical benchmark: A well-structured GA4 BigQuery pipeline for a site with 500K monthly sessions typically costs under £10/month — almost entirely storage. With proper date filtering and summary tables, query costs sit inside the free tier for most setups indefinitely.
What to build next
Once the export is running and your first queries are working, the natural next step is a scheduled query that refreshes a summary table daily and feeds a Looker Studio dashboard. That gives your whole team access to unsampled, always-current data without anyone needing to write SQL.
From there, the two most valuable additions for most businesses are: joining GA4 event data with CRM records (matching user_pseudo_id to customer IDs via your own first-party data layer), and building a simple position-based attribution model that weights touchpoints across the full conversion path — something completely out of reach in the GA4 interface.
The BigQuery export is the foundation. Turn it on early. Every analysis you build from here is more reliable, more flexible, and not dependent on what Google decides to keep in the GA4 UI next year.
Need a BigQuery pipeline built for your GA4 property?
We design and build GA4 → BigQuery → Looker Studio pipelines — clean schema, scheduled summary tables, and dashboards your whole team will actually use. Book a free 30-minute audit and we'll show you exactly what's possible with your data.