BigQuery Interview Questions
BigQuery Topic Frequency in Interviews
From 78 reported GCP DE loops in 2024-2026.
| Topic | Test Frequency | Depth Expected |
|---|---|---|
| Partitioning (date, integer range) | 94% | Strategy choice and pruning behavior |
| Clustering (up to 4 columns) | 87% | Order matters; aligns with WHERE / JOIN patterns |
| Slot-based vs on-demand pricing | 78% | When each is right, cost projection |
| ARRAY_AGG and UNNEST | 73% | One-to-many flattening, nested aggregation |
| STRUCT and nested fields | 65% | Schema design with nested types vs flat tables |
| JSON functions | 58% | JSON_EXTRACT, JSON_VALUE, JSON_QUERY trade-offs |
| Materialized views | 52% | Auto-refresh, supported SQL subset, when to use |
| BI Engine for dashboards | 47% | Sub-second query layer for production Looker |
| APPROX_COUNT_DISTINCT (HLL++) | 44% | When approximate is acceptable |
| Window functions | 76% | QUALIFY clause, frame clauses, dialect-specific behavior |
| Time travel | 32% | AT (TIMESTAMP), AT (DATETIME) for historical queries |
| Authorized views | 38% | Cross-dataset access patterns with row-level security |
| Streaming inserts | 47% | Streaming buffer, exactly-once via insertId |
| Storage Write API | 36% | Newer, exactly-once batch writes |
BigQuery Internals: Pricing and Performance
BigQuery's pricing model is the central interview topic. Slot-based: pay per slot per second of dedicated compute capacity. Predictable cost. On-demand: pay per byte scanned (currently $5 per TB). Variable cost, potentially much higher or lower than slot-based depending on query volume and shape. Most production workloads use slots above some volume threshold; ad-hoc and small workloads use on-demand.
Slot capacity sizing is the senior signal. A 100-slot reservation can run roughly 100 medium queries concurrently. Most teams under-provision and run into queueing; some over-provision and waste money. The analytical answer is to look at peak slot usage in INFORMATION_SCHEMA.JOBS and size to handle 95th percentile load.
Partitioning and clustering interact with pricing directly. A partitioned + clustered table that prunes effectively scans 1-10% of total bytes; the same query without partition pruning scans 100% and costs 10-100x more. The interview prompt "why does this query cost $50 instead of $5" usually has the answer "because the WHERE clause didn't hit the partition column."
Eight Real BigQuery Interview Questions With Worked Answers
Why does this query cost $50? Find and fix the issue.
-- Bad: scans 10TB SELECT * FROM `project.dataset.events` WHERE event_type = 'purchase' AND user_id = '12345'; -- Good: scans ~100GB SELECT user_id, event_ts, amount_usd FROM `project.dataset.events` WHERE _PARTITIONDATE >= '2026-01-01' AND _PARTITIONDATE < '2026-02-01' AND event_type = 'purchase' AND user_id = '12345'; -- Required: table partitioned by _PARTITIONDATE -- and clustered by (event_type, user_id)
Use ARRAY_AGG to compute user event sequences
SELECT
user_id,
ARRAY_AGG(
STRUCT(event_type, event_ts, page_url)
ORDER BY event_ts
) AS event_sequence
FROM events
WHERE _PARTITIONDATE >= CURRENT_DATE() - 7
GROUP BY user_id;
-- Downstream UNNEST to flatten back to rows when needed:
SELECT user_id, e.event_type, e.event_ts
FROM user_events_with_sequence,
UNNEST(event_sequence) AS e
WHERE e.event_type = 'purchase';Compute approximate distinct users at billion-row scale
-- Approximate count distinct (~2% error, constant memory)
SELECT
date_trunc(event_ts, DAY) AS day,
APPROX_COUNT_DISTINCT(user_id) AS approx_dau
FROM events
WHERE _PARTITIONDATE >= '2026-01-01'
GROUP BY day;
-- Mergeable HLL sketches for rolling unique counts
WITH daily_sketches AS (
SELECT
date_trunc(event_ts, DAY) AS day,
HLL_COUNT.INIT(user_id) AS sketch
FROM events
WHERE _PARTITIONDATE >= '2026-01-01'
GROUP BY day
)
SELECT
day,
HLL_COUNT.MERGE(sketch) OVER (
ORDER BY day
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS rolling_7d_unique_users
FROM daily_sketches
ORDER BY day;Use QUALIFY to filter window function results
-- Top 1 event per user
SELECT user_id, event_ts, event_type
FROM events
QUALIFY ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY event_ts DESC
) = 1;
-- Equivalent without QUALIFY (more verbose):
WITH ranked AS (
SELECT
user_id, event_ts, event_type,
ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY event_ts DESC
) AS rn
FROM events
)
SELECT user_id, event_ts, event_type
FROM ranked
WHERE rn = 1;When would you use a materialized view vs a scheduled query?
Design a clustering strategy for a 5TB clickstream table
Design a streaming insert pipeline with exactly-once semantics
Design a multi-tenant BigQuery deployment with cost attribution
BigQuery vs Other Warehouses
BigQuery is one of three major cloud warehouses. The choice depends on cost model preference, operational tolerance, and feature requirements.
| Dimension | BigQuery | Snowflake | Redshift |
|---|---|---|---|
| Pricing model | Slot or on-demand per byte | Credit-based per warehouse second | Cluster-based per node hour |
| Compute-storage separation | Yes (always) | Yes | Yes (RA3) or No (DC2) |
| Concurrency model | Slot-based queueing | Auto-scaled warehouses | Concurrency scaling clusters |
| Native streaming inserts | Yes (Storage Write API) | Yes (Snowpipe) | Yes (streaming ingestion) |
| Time travel | 7 days default | 1 to 90 days configurable | No native, via snapshots |
| ML in warehouse | BigQuery ML (mature) | Snowpark ML | Redshift ML (limited) |
| Geo-replication | Multi-region (slow) | Replicated databases | Cross-region snapshots |
| Best fit | GCP shops, ad-hoc heavy | Multi-cloud, predictable load | AWS-native, sustained load |
How BigQuery Connects to the Rest of the Cluster
BigQuery is the warehouse component in any the GCP data engineer interview guide stack. The broader the SQL interview questions hub hub covers SQL fluency at the dialect-portable level; this guide is the BigQuery-specific layer on top. For the round-level framework, see the SQL round prep guide.
For warehouse comparison, see Snowflake vs Databricks interview comparison (the most-tested decision in 2024-2026 lakehouse interviews) and the company guides for the Snowflake data engineer interview guide and the Databricks data engineer interview guide.
Data engineer interview prep FAQ
How important is BigQuery knowledge for non-GCP roles?+
What's the difference between standard SQL and legacy SQL in BigQuery?+
Should I learn BigQuery ML?+
How is BigQuery pricing different from Snowflake?+
Are BigQuery materialized views production-ready?+
How important is the BigQuery Storage Write API?+
Is BigQuery Slots the right capacity model for my company?+
What's BI Engine and when do I need it?+
Practice BigQuery SQL in the Browser
Run real BigQuery-flavored SQL problems against real schemas. Build the BigQuery-specific fluency that wins GCP data engineer loops.
Adjacent Data Engineer Interview Prep Reading
More data engineer interview prep guides
The full SQL interview question bank, indexed by topic, difficulty, and company.
Redshift sort keys, dist keys, compression, and RA3 architecture interview prep.
Postgres MVCC, indexing, partitioning, and replication interview prep.
Apache Flink stateful streaming, watermarks, exactly-once, checkpointing interview prep.
Hadoop ecosystem (HDFS, MapReduce, YARN, Hive) interview prep, including modern relevance.
AWS Glue ETL jobs, crawlers, Data Catalog, and PySpark-on-Glue interview prep.