Redshift Interview Questions
Amazon Redshift interview questions for data engineer roles at AWS-native companies. 40+ questions covering Redshift internals (sort keys, dist keys, RA3 architecture, Spectrum, materialized views), query optimization, the on-demand vs Reserved Instance vs Serverless decision, and migration patterns from legacy DC2 nodes. Pair with the complete data engineer interview preparation framework and the how to pass the AWS Data Engineer interview.
Redshift Topics in Interviews
Topics ranked roughly by how often they appear in AWS DE loops.
| Topic | Frequency | Depth Expected |
|---|---|---|
| Distribution strategies (KEY, ALL, EVEN) | Very common | Strategy choice and skew detection |
| Sort keys (compound vs interleaved) | Very common | When each is right; alignment with WHERE |
| RA3 architecture vs DC2 / DS2 | Common | Compute-storage separation, when to choose |
| Redshift Spectrum (S3 external tables) | Common | When to use vs loading into Redshift |
| Materialized views with auto-refresh | Common | AWS late-2022 feature, now production-grade |
| VACUUM and ANALYZE | Common | Maintenance, sortkey re-sort, statistics refresh |
| WLM (workload management) queues | Occasional | Concurrency control, query queueing |
| Concurrency Scaling clusters | Occasional | Auto-scale read capacity for burst load |
| Result cache | Occasional | Query-string match cache, free reads |
| UNLOAD / COPY for data movement | Common | S3 integration patterns |
| Redshift Serverless | Common | Newer (2022 GA), spiky vs sustained workload trade-off |
| Federated Query (Postgres, MySQL) | Occasional | Cross-engine queries |
| Cross-database queries | Occasional | Multi-database architecture patterns |
| Redshift ML | Rare | In-warehouse SageMaker integration |
Distribution Strategies: The Foundation
Redshift distributes table rows across cluster compute nodes. The distribution strategy determines whether joins require data movement (slow) or can run locally (fast). Three options.
KEY distribution: rows with the same value of the dist key land on the same node. Best for fact tables with a high-cardinality column that aligns with common joins (e.g., dist on customer_id when fact_orders joins frequently to dim_customer also dist on customer_id). Risk: skew if the dist key has low cardinality or hot values.
ALL distribution: every node gets a copy of the table. Best for small dimensions (<3M rows) joined frequently. Cost: extra storage per node. Joins to ALL-distributed tables don't require any data movement.
EVEN distribution: rows distributed round-robin across nodes. Default for tables without a clear dist key. Joins always require data movement. Acceptable for staging tables and small fact tables; suboptimal for production analytical workloads.
Six Real Redshift Interview Questions
Choose distribution strategy for fact_orders (5B rows) and dim_customer (10M rows)
EXPLAIN shows DS_DIST_BOTH for a join. How would you fix it?
When would you use compound vs interleaved sort keys?
When would you use Redshift Spectrum vs loading data into Redshift?
Should this workload run on Redshift Serverless or Provisioned?
Design a Redshift architecture for a multi-tenant SaaS analytics product
Redshift vs Snowflake vs BigQuery
Redshift remains AWS-native dominant despite Snowflake's encroachment. The choice depends on existing AWS investment, predictability of workload, and operational tolerance.
| Dimension | Redshift | Snowflake | BigQuery |
|---|---|---|---|
| AWS native | Yes (only) | Multi-cloud | GCP native |
| Pricing model | Per node hour or RPU-second | Credit per warehouse second | Slot or per-byte |
| Compute-storage separation | Yes (RA3) or No (DC2) | Yes (always) | Yes (always) |
| Operational overhead | Higher: VACUUM, dist key tuning | Low: managed | Lowest: fully serverless feel |
| Concurrency model | WLM + Concurrency Scaling | Multi-cluster warehouses | Slot queueing |
| Time travel | No native (snapshots only) | 1 to 90 days | 7 days default |
| Best fit | AWS-heavy, sustained load | Multi-cloud, mixed load | GCP-heavy, ad-hoc |
How Redshift Connects to the Rest of the Cluster
Redshift is the warehouse component in any how to pass the AWS Data Engineer interview stack. The broader the complete SQL interview problem set hub covers SQL fluency at the dialect-portable level; this guide is the Redshift-specific layer on top.
For comparison, see Google BigQuery interview prep (the GCP equivalent) and the how to pass the Snowflake Data Engineer interview guide for multi-cloud warehouse depth. The Snowflake or Databricks Data Engineer interview differences decision page covers the broader warehouse-vs- lakehouse choice.
Know Redshift the way the interviewer who asks it knows it.
Data engineer interview prep FAQ
Is Redshift still relevant in 2026 with Snowflake’s growth?+
Should I learn DC2 / DS2 architecture if RA3 is the future?+
How important is VACUUM and ANALYZE?+
What’s the difference between Redshift and Redshift Serverless?+
Should I use Redshift Spectrum or just Athena for S3 queries?+
How does query optimization work in Redshift?+
Are materialized views production-ready in Redshift?+
What’s the migration path from Redshift to a different warehouse?+
Practice Redshift-Flavored SQL
- 01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom
- 02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice
- 03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition
More data engineer interview prep reading
More data engineer interview prep guides
The full SQL interview problem set, indexed by topic, difficulty, and company.
BigQuery internals, slot-based pricing, partitioning, and clustering interview prep.
Postgres MVCC, indexing, partitioning, and replication interview prep.
Apache Flink stateful streaming, watermarks, exactly-once, checkpointing interview prep.
Hadoop ecosystem (HDFS, MapReduce, YARN, Hive) interview prep, including modern relevance.
AWS Glue ETL jobs, crawlers, Data Catalog, and PySpark-on-Glue interview prep.