BigQuery interview questions for data engineer roles at GCP-native companies (Google, Spotify, Snap, Etsy, Twitter partial). 50+ questions covering BigQuery internals (slot-based pricing, partitioning, clustering, materialized views, BI Engine), SQL dialect specifics (ARRAY_AGG and UNNEST, STRUCT manipulation, JSON functions), cost optimization, and system design with BigQuery as the warehouse. Pair with the data engineer interview prep guide and the GCP data engineer interview guide.
From 78 reported GCP DE loops in 2024-2026.
| Topic | Test Frequency | Depth Expected |
|---|---|---|
| Partitioning (date, integer range) | 94% | Strategy choice and pruning behavior |
| Clustering (up to 4 columns) | 87% | Order matters; aligns with WHERE / JOIN patterns |
| Slot-based vs on-demand pricing | 78% | When each is right, cost projection |
| ARRAY_AGG and UNNEST | 73% | One-to-many flattening, nested aggregation |
| STRUCT and nested fields | 65% | Schema design with nested types vs flat tables |
| JSON functions | 58% | JSON_EXTRACT, JSON_VALUE, JSON_QUERY trade-offs |
| Materialized views | 52% | Auto-refresh, supported SQL subset, when to use |
| BI Engine for dashboards | 47% | Sub-second query layer for production Looker |
| APPROX_COUNT_DISTINCT (HLL++) | 44% | When approximate is acceptable |
| Window functions | 76% | QUALIFY clause, frame clauses, dialect-specific behavior |
| Time travel | 32% | FOR SYSTEM_TIME AS OF for point-in-time queries |
| Authorized views | 38% | Cross-dataset access patterns with row-level security |
| Streaming inserts | 47% | Streaming buffer, best-effort dedup via insertId |
| Storage Write API | 36% | Newer; exactly-once streaming and batch writes |
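Several of these are quick to demonstrate on a whiteboard. Time travel, for instance, is a one-liner worth having ready (a sketch; the table name is a placeholder, and the lookback is capped by the table's time-travel window, 7 days by default):

```sql
-- State of the table as of one hour ago
SELECT *
FROM `project.dataset.events`
  FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
```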
BigQuery's pricing model is the central interview topic. Slot-based (capacity reservations): pay for dedicated compute capacity per slot-hour. Predictable cost. On-demand: pay per byte scanned (currently $6.25 per TiB at list price). Variable cost, potentially much higher or lower than slot-based depending on query volume and shape. Most production workloads move to slots above some volume threshold; ad-hoc and small workloads stay on-demand.
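A back-of-envelope break-even is worth having ready. As an illustrative sketch (assuming roughly $0.04 per slot-hour for a baseline reservation and $6.25 per TiB on-demand; verify current list prices before quoting them):

```sql
-- Illustrative break-even: monthly cost of a 100-slot reservation
-- vs. the on-demand scan volume that costs the same.
SELECT
  100 * 0.04 * 730 AS monthly_slot_cost_usd,          -- 100 slots * $0.04/slot-hr * ~730 hrs ≈ $2,920
  ROUND(100 * 0.04 * 730 / 6.25, 1) AS breakeven_tib  -- ≈ 467 TiB scanned per month
```

Below roughly 470 TiB of monthly scanning, on-demand is cheaper at these prices; above it, the reservation wins.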
Slot capacity sizing is the senior signal. A 100-slot reservation can run roughly 100 medium queries concurrently. Most teams under-provision and run into queueing; some over-provision and waste money. The analytical answer is to look at peak slot usage in INFORMATION_SCHEMA.JOBS and size to handle 95th percentile load.
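A sketch of that sizing query, assuming the `region-us` qualifier and the JOBS_BY_PROJECT view (adjust the region, view, and lookback window to your project):

```sql
-- p95 of per-job average slot usage over the last 7 days
SELECT
  APPROX_QUANTILES(job_avg_slots, 100)[OFFSET(95)] AS p95_avg_slots
FROM (
  SELECT
    -- average concurrent slots a job held over its runtime
    total_slot_ms / TIMESTAMP_DIFF(end_time, start_time, MILLISECOND) AS job_avg_slots
  FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
  WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
    AND job_type = 'QUERY'
    AND state = 'DONE'
    AND end_time > start_time
);
```

Per-job averages understate instantaneous peaks, so treat this as a floor and sanity-check it against concurrent load at busy hours.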
Partitioning and clustering interact with pricing directly. A partitioned + clustered table that prunes effectively scans 1-10% of total bytes; the same query without partition pruning scans 100% and costs 10-100x more. The interview prompt "why does this query cost $50 instead of $5" usually has the answer "because the WHERE clause didn't hit the partition column."
```sql
-- Bad: scans 10 TB
SELECT *
FROM `project.dataset.events`
WHERE event_type = 'purchase'
  AND user_id = '12345';

-- Good: scans ~100 GB
SELECT user_id, event_ts, amount_usd
FROM `project.dataset.events`
WHERE _PARTITIONDATE >= '2026-01-01'
  AND _PARTITIONDATE < '2026-02-01'
  AND event_type = 'purchase'
  AND user_id = '12345';
-- Required: table partitioned by _PARTITIONDATE
-- and clustered by (event_type, user_id)
```
```sql
-- Collapse each user's events into an ordered array of STRUCTs
SELECT
  user_id,
  ARRAY_AGG(
    STRUCT(event_type, event_ts, page_url)
    ORDER BY event_ts
  ) AS event_sequence
FROM events
WHERE _PARTITIONDATE >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
GROUP BY user_id;

-- Downstream UNNEST to flatten back to rows when needed:
SELECT user_id, e.event_type, e.event_ts
FROM user_events_with_sequence,
  UNNEST(event_sequence) AS e
WHERE e.event_type = 'purchase';
```

```sql
-- Approximate count distinct (~2% error, constant memory)
SELECT
  DATE(event_ts) AS day,
  APPROX_COUNT_DISTINCT(user_id) AS approx_dau
FROM events
WHERE _PARTITIONDATE >= '2026-01-01'
GROUP BY day;
```
```sql
-- Mergeable HLL sketches for rolling unique counts
WITH daily_sketches AS (
  SELECT
    DATE(event_ts) AS day,
    HLL_COUNT.INIT(user_id) AS sketch
  FROM events
  WHERE _PARTITIONDATE >= '2026-01-01'
  GROUP BY day
)
-- HLL functions are aggregates, not analytic functions, so the
-- rolling 7-day merge uses a range self-join instead of a window frame.
SELECT
  a.day,
  HLL_COUNT.MERGE(b.sketch) AS rolling_7d_unique_users
FROM daily_sketches a
JOIN daily_sketches b
  ON b.day BETWEEN DATE_SUB(a.day, INTERVAL 6 DAY) AND a.day
GROUP BY a.day
ORDER BY a.day;
```

```sql
-- Top 1 event per user
SELECT user_id, event_ts, event_type
FROM events
WHERE TRUE  -- QUALIFY requires a WHERE, GROUP BY, or HAVING clause
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY user_id
  ORDER BY event_ts DESC
) = 1;
```
```sql
-- Equivalent without QUALIFY (more verbose):
WITH ranked AS (
  SELECT
    user_id, event_ts, event_type,
    ROW_NUMBER() OVER (
      PARTITION BY user_id
      ORDER BY event_ts DESC
    ) AS rn
  FROM events
)
SELECT user_id, event_ts, event_type
FROM ranked
WHERE rn = 1;
```

BigQuery is one of three major cloud warehouses. The choice depends on cost-model preference, operational tolerance, and feature requirements.
| Dimension | BigQuery | Snowflake | Redshift |
|---|---|---|---|
| Pricing model | Slot or on-demand per byte | Credit-based per warehouse second | Cluster-based per node hour |
| Compute-storage separation | Yes (always) | Yes | Yes (RA3) or No (DC2) |
| Concurrency model | Slot-based queueing | Auto-scaled warehouses | Concurrency scaling clusters |
| Native streaming inserts | Yes (Storage Write API) | Yes (Snowpipe) | Yes (streaming ingestion) |
| Time travel | 7 days default | 1 to 90 days configurable | No native, via snapshots |
| ML in warehouse | BigQuery ML (mature) | Snowpark ML | Redshift ML (limited) |
| Geo-replication | Multi-region (slow) | Replicated databases | Cross-region snapshots |
| Best fit | GCP shops, ad-hoc heavy | Multi-cloud, predictable load | AWS-native, sustained load |
BigQuery is the warehouse component in any GCP-native stack; the GCP data engineer interview guide covers the rest of that stack. The broader SQL interview questions hub covers SQL fluency at the dialect-portable level; this guide is the BigQuery-specific layer on top. For the round-level framework, see the SQL round prep guide.
For warehouse comparisons, see the Snowflake vs Databricks interview comparison (the most-tested decision in 2024-2026 lakehouse interviews) and the company-specific Snowflake and Databricks data engineer interview guides.
Run real BigQuery-flavored SQL problems against real schemas. Build the BigQuery-specific fluency that wins GCP data engineer loops.
Start BigQuery Practice

Continue your prep:

- The full SQL interview question bank, indexed by topic, difficulty, and company.
- Redshift sort keys, dist keys, compression, and RA3 architecture interview prep.
- Postgres MVCC, indexing, partitioning, and replication interview prep.
- Apache Flink stateful streaming, watermarks, exactly-once, checkpointing interview prep.
- Hadoop ecosystem (HDFS, MapReduce, YARN, Hive) interview prep, including modern relevance.
- AWS Glue ETL jobs, crawlers, Data Catalog, and PySpark-on-Glue interview prep.
50+ guides covering every round, company, role, and technology in the data engineer interview loop. Grounded in 2,817 verified interview reports across 929 companies, collected from real candidates.