Redshift Interview Questions
Redshift Topic Frequency in Interviews
From 76 reported AWS DE loops in 2024-2026 that included Redshift questions.
| Topic | Test Frequency | Depth Expected |
|---|---|---|
| Distribution strategies (KEY, ALL, EVEN) | 94% | Strategy choice and skew detection |
| Sort keys (compound vs interleaved) | 87% | When each is right; alignment with WHERE |
| RA3 architecture vs DC2 / DS2 | 82% | Compute-storage separation, when to choose |
| Redshift Spectrum (S3 external tables) | 67% | When to use vs loading into Redshift |
| Materialized views with auto-refresh | 62% | AWS late-2022 feature, now production-grade |
| VACUUM and ANALYZE | 58% | Maintenance, sortkey re-sort, statistics refresh |
| WLM (workload management) queues | 47% | Concurrency control, query queueing |
| Concurrency Scaling clusters | 44% | Auto-scale read capacity for burst load |
| Result cache | 39% | Query-string match cache, free reads |
| UNLOAD / COPY for data movement | 53% | S3 integration patterns |
| Redshift Serverless | 63% | Newer (2022 GA), spiky vs sustained workload trade-off |
| Federated Query (Postgres, MySQL) | 31% | Cross-engine queries |
| Cross-database queries | 36% | Multi-database architecture patterns |
| Redshift ML | 22% | In-warehouse SageMaker integration |
Distribution Strategies: The Foundation
Redshift distributes table rows across cluster compute nodes. The distribution strategy determines whether joins require data movement (slow) or can run locally (fast). Three options.
KEY distribution: rows with the same value of the dist key land on the same node. Best for fact tables with a high-cardinality column that aligns with common joins (e.g., dist on customer_id when fact_orders joins frequently to dim_customer also dist on customer_id). Risk: skew if the dist key has low cardinality or hot values.
ALL distribution: every node gets a copy of the table. Best for small dimensions (<3M rows) joined frequently. Cost: extra storage per node. Joins to ALL-distributed tables don't require any data movement.
EVEN distribution: rows distributed round-robin across nodes. Default for tables without a clear dist key. Joins always require data movement. Acceptable for staging tables and small fact tables; suboptimal for production analytical workloads.
Six Real Redshift Interview Questions
Choose distribution strategy for fact_orders (5B rows) and dim_customer (10M rows)
EXPLAIN shows DS_DIST_BOTH for a join. How would you fix it?
When would you use compound vs interleaved sort keys?
When would you use Redshift Spectrum vs loading data into Redshift?
Should this workload run on Redshift Serverless or Provisioned?
Design a Redshift architecture for a multi-tenant SaaS analytics product
Redshift vs Snowflake vs BigQuery
Redshift remains AWS-native dominant despite Snowflake's encroachment. The choice depends on existing AWS investment, predictability of workload, and operational tolerance.
| Dimension | Redshift | Snowflake | BigQuery |
|---|---|---|---|
| AWS native | Yes (only) | Multi-cloud | GCP native |
| Pricing model | Per node hour or RPU-second | Credit per warehouse second | Slot or per-byte |
| Compute-storage separation | Yes (RA3) or No (DC2) | Yes (always) | Yes (always) |
| Operational overhead | Higher: VACUUM, dist key tuning | Low: managed | Lowest: fully serverless feel |
| Concurrency model | WLM + Concurrency Scaling | Multi-cluster warehouses | Slot queueing |
| Time travel | No native (snapshots only) | 1 to 90 days | 7 days default |
| Best fit | AWS-heavy, sustained load | Multi-cloud, mixed load | GCP-heavy, ad-hoc |
How Redshift Connects to the Rest of the Cluster
Redshift is the warehouse component in any how to pass the AWS Data Engineer interview stack. The broader complete SQL interview question bank hub covers SQL fluency at the dialect-portable level; this guide is the Redshift-specific layer on top.
For comparison, see Google BigQuery interview prep (the GCP equivalent) and the how to pass the Snowflake Data Engineer interview guide for multi-cloud warehouse depth. The Snowflake or Databricks Data Engineer interview differences decision page covers the broader warehouse-vs- lakehouse choice.
Data engineer interview prep FAQ
Is Redshift still relevant in 2026 with Snowflake's growth?+
Should I learn DC2 / DS2 architecture if RA3 is the future?+
How important is VACUUM and ANALYZE?+
What's the difference between Redshift and Redshift Serverless?+
Should I use Redshift Spectrum or just Athena for S3 queries?+
How does query optimization work in Redshift?+
Are materialized views production-ready in Redshift?+
What's the migration path from Redshift to a different warehouse?+
Practice Redshift-Flavored SQL
Drill SQL patterns that matter in Redshift loops in our practice sandbox.
Adjacent Data Engineer Interview Prep Reading
More data engineer interview prep guides
The full SQL interview question bank, indexed by topic, difficulty, and company.
BigQuery internals, slot-based pricing, partitioning, and clustering interview prep.
Postgres MVCC, indexing, partitioning, and replication interview prep.
Apache Flink stateful streaming, watermarks, exactly-once, checkpointing interview prep.
Hadoop ecosystem (HDFS, MapReduce, YARN, Hive) interview prep, including modern relevance.
AWS Glue ETL jobs, crawlers, Data Catalog, and PySpark-on-Glue interview prep.