Amazon Redshift interview questions for data engineer roles at AWS-native companies. 40+ questions covering Redshift internals (sort keys, dist keys, RA3 architecture, Spectrum, materialized views), query optimization, the on-demand vs Reserved Instance vs Serverless decision, and migration patterns from legacy DC2 nodes. Pair with the complete data engineer interview preparation framework and the how to pass the AWS Data Engineer interview.
From 76 reported AWS DE loops in 2024-2026 that included Redshift questions.
| Topic | Test Frequency | Depth Expected |
|---|---|---|
| Distribution strategies (KEY, ALL, EVEN) | 94% | Strategy choice and skew detection |
| Sort keys (compound vs interleaved) | 87% | When each is right; alignment with WHERE |
| RA3 architecture vs DC2 / DS2 | 82% | Compute-storage separation, when to choose |
| Redshift Spectrum (S3 external tables) | 67% | When to use vs loading into Redshift |
| Materialized views with auto-refresh | 62% | AWS late-2022 feature, now production-grade |
| VACUUM and ANALYZE | 58% | Maintenance, sortkey re-sort, statistics refresh |
| WLM (workload management) queues | 47% | Concurrency control, query queueing |
| Concurrency Scaling clusters | 44% | Auto-scale read capacity for burst load |
| Result cache | 39% | Query-string match cache, free reads |
| UNLOAD / COPY for data movement | 53% | S3 integration patterns |
| Redshift Serverless | 63% | Newer (2022 GA), spiky vs sustained workload trade-off |
| Federated Query (Postgres, MySQL) | 31% | Cross-engine queries |
| Cross-database queries | 36% | Multi-database architecture patterns |
| Redshift ML | 22% | In-warehouse SageMaker integration |
Redshift distributes table rows across cluster compute nodes. The distribution strategy determines whether joins require data movement (slow) or can run locally (fast). Three options.
KEY distribution: rows with the same value of the dist key land on the same node. Best for fact tables with a high-cardinality column that aligns with common joins (e.g., dist on customer_id when fact_orders joins frequently to dim_customer also dist on customer_id). Risk: skew if the dist key has low cardinality or hot values.
ALL distribution: every node gets a copy of the table. Best for small dimensions (<3M rows) joined frequently. Cost: extra storage per node. Joins to ALL-distributed tables don't require any data movement.
EVEN distribution: rows distributed round-robin across nodes. Default for tables without a clear dist key. Joins always require data movement. Acceptable for staging tables and small fact tables; suboptimal for production analytical workloads.
Redshift remains AWS-native dominant despite Snowflake's encroachment. The choice depends on existing AWS investment, predictability of workload, and operational tolerance.
| Dimension | Redshift | Snowflake | BigQuery |
|---|---|---|---|
| AWS native | Yes (only) | Multi-cloud | GCP native |
| Pricing model | Per node hour or RPU-second | Credit per warehouse second | Slot or per-byte |
| Compute-storage separation | Yes (RA3) or No (DC2) | Yes (always) | Yes (always) |
| Operational overhead | Higher: VACUUM, dist key tuning | Low: managed | Lowest: fully serverless feel |
| Concurrency model | WLM + Concurrency Scaling | Multi-cluster warehouses | Slot queueing |
| Time travel | No native (snapshots only) | 1 to 90 days | 7 days default |
| Best fit | AWS-heavy, sustained load | Multi-cloud, mixed load | GCP-heavy, ad-hoc |
Redshift is the warehouse component in any how to pass the AWS Data Engineer interview stack. The broader complete SQL interview question bank hub covers SQL fluency at the dialect-portable level; this guide is the Redshift-specific layer on top.
For comparison, see Google BigQuery interview prep (the GCP equivalent) and the how to pass the Snowflake Data Engineer interview guide for multi-cloud warehouse depth. The Snowflake or Databricks Data Engineer interview differences decision page covers the broader warehouse-vs- lakehouse choice.
Drill SQL patterns that matter in Redshift loops in our practice sandbox.
Start PracticingThe full SQL interview question bank, indexed by topic, difficulty, and company.
BigQuery internals, slot-based pricing, partitioning, and clustering interview prep.
Postgres MVCC, indexing, partitioning, and replication interview prep.
Apache Flink stateful streaming, watermarks, exactly-once, checkpointing interview prep.
Hadoop ecosystem (HDFS, MapReduce, YARN, Hive) interview prep, including modern relevance.
AWS Glue ETL jobs, crawlers, Data Catalog, and PySpark-on-Glue interview prep.
Continue your prep
50+ guides covering every round, company, role, and technology in the data engineer interview loop. Grounded in 2,817 verified interview reports across 929 companies, collected from real candidates.