Tech-Specific Question Hub

Redshift Interview Questions

Amazon Redshift interview questions for data engineer roles at AWS-native companies: 40+ questions covering Redshift internals (sort keys, dist keys, RA3 architecture, Spectrum, materialized views), query optimization, the on-demand vs Reserved Instance vs Serverless decision, and migration patterns from legacy DC2 nodes. Pair with the complete data engineer interview preparation framework and the guide on how to pass the AWS Data Engineer interview.

The Short Answer
Redshift questions appear in 76% of AWS data engineer loops and remain critical knowledge despite Snowflake's encroachment in the cloud warehouse market. The depth ranges from L3 distribution-strategy choice to L6 multi-cluster architecture for global deployments. Strong candidates know how sort keys and dist keys interact with query patterns, can read EXPLAIN plans to identify data-movement costs, and understand when Redshift Serverless is the right choice over provisioned clusters.
Updated April 2026 · By The DataDriven Team

Redshift Topic Frequency in Interviews

From 76 reported AWS DE loops in 2024-2026 that included Redshift questions.

| Topic | Test Frequency | Depth Expected |
| --- | --- | --- |
| Distribution strategies (KEY, ALL, EVEN) | 94% | Strategy choice and skew detection |
| Sort keys (compound vs interleaved) | 87% | When each is right; alignment with WHERE |
| RA3 architecture vs DC2 / DS2 | 82% | Compute-storage separation, when to choose |
| Redshift Spectrum (S3 external tables) | 67% | When to use vs loading into Redshift |
| Materialized views with auto-refresh | 62% | AWS late-2022 feature, now production-grade |
| VACUUM and ANALYZE | 58% | Maintenance, sort key re-sort, statistics refresh |
| WLM (workload management) queues | 47% | Concurrency control, query queueing |
| Concurrency Scaling clusters | 44% | Auto-scale read capacity for burst load |
| Result cache | 39% | Query-string match cache, free reads |
| UNLOAD / COPY for data movement | 53% | S3 integration patterns |
| Redshift Serverless | 63% | Newer (2022 GA), spiky vs sustained workload trade-off |
| Federated Query (Postgres, MySQL) | 31% | Cross-engine queries |
| Cross-database queries | 36% | Multi-database architecture patterns |
| Redshift ML | 22% | In-warehouse SageMaker integration |

Distribution Strategies: The Foundation

Redshift distributes table rows across cluster compute nodes. The distribution strategy determines whether joins require data movement (slow) or can run locally (fast). There are three options.

KEY distribution: rows with the same value of the dist key land on the same node. Best for fact tables with a high-cardinality column that aligns with common joins (e.g., dist on customer_id when fact_orders joins frequently to dim_customer also dist on customer_id). Risk: skew if the dist key has low cardinality or hot values.

ALL distribution: every node gets a copy of the table. Best for small dimensions (<3M rows) joined frequently. Cost: extra storage per node. Joins to ALL-distributed tables don't require any data movement.

EVEN distribution: rows distributed round-robin across nodes. Default for tables without a clear dist key. Joins always require data movement. Acceptable for staging tables and small fact tables; suboptimal for production analytical workloads.
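The three styles map directly to DDL. A minimal sketch, assuming hypothetical table names and columns, plus a skew check against the SVV_TABLE_INFO system view:

```sql
-- Small dimension replicated to every node
CREATE TABLE dim_region (
    region_id   INT,
    region_name VARCHAR(64)
)
DISTSTYLE ALL;

-- Staging table with no clear dist key: round-robin
CREATE TABLE staging_events (
    event_id   BIGINT,
    event_type VARCHAR(32),
    event_ts   TIMESTAMP
)
DISTSTYLE EVEN;

-- Fact table distributed on a high-cardinality join key
CREATE TABLE fact_page_views (
    view_id     BIGINT,
    customer_id BIGINT,
    view_date   DATE
)
DISTSTYLE KEY
DISTKEY (customer_id);

-- Detect skew: skew_rows is the ratio of rows on the fullest slice
-- to rows on the emptiest slice (1.0 is perfectly even)
SELECT "table", diststyle, skew_rows
FROM svv_table_info
ORDER BY skew_rows DESC;
```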

Six Real Redshift Interview Questions

L4

Choose distribution strategy for fact_orders (5B rows) and dim_customer (10M rows)

fact_orders dist KEY on customer_id (high cardinality, aligns with join). dim_customer also dist KEY on customer_id so joins are co-located (no data movement). Sort key on order_date for typical date-range queries. Discuss the alternative: dim_customer at 10M rows is borderline; ALL distribution would also work and eliminate join redistribution but doubles storage on every node.
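The co-located answer can be sketched as DDL; column lists here are illustrative, not from the question:

```sql
-- Dimension and fact share the same dist key, so the join runs
-- node-locally with no redistribution (DS_DIST_NONE in EXPLAIN)
CREATE TABLE dim_customer (
    customer_id   BIGINT,
    customer_name VARCHAR(256),
    signup_date   DATE
)
DISTSTYLE KEY
DISTKEY (customer_id);

CREATE TABLE fact_orders (
    order_id    BIGINT,
    customer_id BIGINT,
    order_date  DATE,
    amount      DECIMAL(12,2)
)
DISTSTYLE KEY
DISTKEY (customer_id)
SORTKEY (order_date);   -- prunes blocks for date-range queries
```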
L5

EXPLAIN shows DS_DIST_BOTH for a join. How would you fix it?

DS_DIST_BOTH means both sides of the join are being redistributed. Both tables have wrong dist keys for this join. Fix options. (1) Re-distribute one side: if the smaller table has a dist key that doesn't match the join, change it to match. (2) Switch the smaller side to ALL distribution: replicates to every node, eliminates redistribution but adds storage. (3) Pre-join in a materialized view: avoid the runtime redistribution by precomputing the join.
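The first two fixes are in-place DDL in modern Redshift. A sketch, assuming a hypothetical dim_product table on the wrong dist key:

```sql
-- Option 1: align the smaller table's dist key with the join column
ALTER TABLE dim_product ALTER DISTKEY product_id;

-- Option 2: replicate the smaller table to every node instead
ALTER TABLE dim_product ALTER DISTSTYLE ALL;

-- Re-check the plan: DS_DIST_NONE means the join is now co-located
EXPLAIN
SELECT f.order_id, d.product_name
FROM fact_orders f
JOIN dim_product d ON d.product_id = f.product_id;
```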
L5

When would you use compound vs interleaved sort keys?

Compound sort key: rows sorted by the first key, then the second within ties of the first, etc. Strong pruning when WHERE filters on the first key. Interleaved: rows sorted by all keys equally; better pruning when WHERE filters on any single key but weaker than compound for first-key filters. Compound is the default and usually right; interleaved adds vacuum cost (re-sorting interleaved is expensive) and is justified only when query patterns truly require equal pruning across multiple columns.
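The two variants differ by one keyword in the DDL; table and column names below are illustrative:

```sql
-- Compound: strongest pruning when WHERE filters on event_date,
-- the leading sort key
CREATE TABLE fact_clicks (
    click_id   BIGINT,
    user_id    BIGINT,
    event_date DATE
)
COMPOUND SORTKEY (event_date, user_id);

-- Interleaved: equal pruning weight on event_date and user_id,
-- at the cost of expensive VACUUM REINDEX maintenance
CREATE TABLE fact_clicks_interleaved (
    click_id   BIGINT,
    user_id    BIGINT,
    event_date DATE
)
INTERLEAVED SORTKEY (event_date, user_id);
```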
L5

When would you use Redshift Spectrum vs loading data into Redshift?

Spectrum: query S3 data via external tables without loading. Best for: cold or rarely-queried data, exploratory analysis, joining warm Redshift data with cold S3 history. Cost: per-byte-scanned, similar to Athena. Loading into Redshift: best for frequently-queried data, low-latency requirements, complex queries that benefit from local compute. Discuss the hybrid pattern: hot 30 days in Redshift, older history in S3 queried via Spectrum.
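The hybrid pattern looks roughly like this; the Glue database, IAM role ARN, and S3 location are placeholders:

```sql
-- External schema backed by the AWS Glue Data Catalog
CREATE EXTERNAL SCHEMA spectrum_hist
FROM DATA CATALOG
DATABASE 'analytics_lake'
IAM_ROLE 'arn:aws:iam::111122223333:role/SpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

-- Cold history lives in S3 as partitioned Parquet
CREATE EXTERNAL TABLE spectrum_hist.orders_archive (
    order_id    BIGINT,
    customer_id BIGINT,
    amount      DECIMAL(12,2)
)
PARTITIONED BY (order_date DATE)
STORED AS PARQUET
LOCATION 's3://example-bucket/orders-archive/';

-- Hybrid query: hot 30 days local, older history via Spectrum
SELECT order_date, SUM(amount) AS revenue
FROM fact_orders
WHERE order_date >= DATEADD(day, -30, CURRENT_DATE)
GROUP BY order_date
UNION ALL
SELECT order_date, SUM(amount) AS revenue
FROM spectrum_hist.orders_archive
WHERE order_date < DATEADD(day, -30, CURRENT_DATE)
GROUP BY order_date;
```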
L5

Should this workload run on Redshift Serverless or Provisioned?

Redshift Serverless: pay per second of RPU (Redshift Processing Unit). Best for spiky workloads where idle time is significant, ad-hoc analytics with unpredictable load, dev / staging environments. Provisioned: pay per node hour. Best for sustained workloads with predictable load (24/7 production reporting), large historical datasets, when you can saturate the cluster regularly. The cost crossover varies by workload but typically: Serverless wins below ~20 hours of compute per day; Provisioned wins above.
L6

Design a Redshift architecture for a multi-tenant SaaS analytics product

Multi-tenancy options. (1) Single cluster with tenant_id partition column on every table; row-level security via RLS policies. Pro: simplest. Con: noisy-neighbor risk, harder to scale per-tenant. (2) Per-tenant Redshift Serverless workgroup: each tenant gets its own compute, shared via federated query if needed. Pro: isolation, per-tenant cost attribution. Con: many workgroups to manage. (3) Tiered model: shared cluster for free tier, dedicated workgroups for enterprise tier. Discuss the trade-offs.
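Option (1) can be sketched with Redshift's row-level security DDL; the tenant_users mapping table and all object names here are hypothetical:

```sql
-- Map database users to tenants (hypothetical mapping table)
CREATE TABLE tenant_users (
    user_name VARCHAR(128),
    tenant_id INT
);

-- Policy: a user sees only rows belonging to their own tenant
CREATE RLS POLICY tenant_isolation
WITH (tenant_id INT)
USING (
    tenant_id = (SELECT tu.tenant_id
                 FROM tenant_users tu
                 WHERE tu.user_name = current_user)
);

-- Attach to the shared fact table and switch RLS on
ATTACH RLS POLICY tenant_isolation ON fact_events TO PUBLIC;
ALTER TABLE fact_events ROW LEVEL SECURITY ON;
```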

Redshift vs Snowflake vs BigQuery

Redshift remains AWS-native dominant despite Snowflake's encroachment. The choice depends on existing AWS investment, predictability of workload, and operational tolerance.

| Dimension | Redshift | Snowflake | BigQuery |
| --- | --- | --- | --- |
| AWS native | Yes (only) | Multi-cloud | GCP native |
| Pricing model | Per node hour or RPU-second | Credit per warehouse second | Slot or per-byte |
| Compute-storage separation | Yes (RA3) or No (DC2) | Yes (always) | Yes (always) |
| Operational overhead | Higher: VACUUM, dist key tuning | Low: managed | Lowest: fully serverless feel |
| Concurrency model | WLM + Concurrency Scaling | Multi-cluster warehouses | Slot queueing |
| Time travel | No native (snapshots only) | 1 to 90 days | 7 days default |
| Best fit | AWS-heavy, sustained load | Multi-cloud, mixed load | GCP-heavy, ad-hoc |

How Redshift Connects to the Rest of the Cluster

Redshift is the warehouse component of the stack covered in how to pass the AWS Data Engineer interview. The broader complete SQL interview question bank hub covers SQL fluency at the dialect-portable level; this guide is the Redshift-specific layer on top.

For comparison, see Google BigQuery interview prep (the GCP equivalent) and the how to pass the Snowflake Data Engineer interview guide for multi-cloud warehouse depth. The Snowflake or Databricks Data Engineer interview differences decision page covers the broader warehouse-vs-lakehouse choice.

Data Engineer Interview Prep FAQ

Is Redshift still relevant in 2026 with Snowflake's growth?
Yes, especially in AWS-native shops where deep AWS integration matters (IAM, VPC, Lake Formation, fine-grained access control). Redshift Serverless and the RA3 architecture closed most of the historical operational gap with Snowflake. New deployments split roughly between the two, with the choice driven by existing cloud investment more than features.
Should I learn DC2 / DS2 architecture if RA3 is the future?
Light familiarity is enough. Many production deployments still run DC2 (compute-storage coupled), and migration questions come up. Know that RA3 separates compute from storage and that migration from DC2 to RA3 is the modernization path.
How important are VACUUM and ANALYZE?
Important for any production Redshift role. Redshift doesn't auto-maintain sorted order; VACUUM SORT re-sorts data after heavy inserts or updates. ANALYZE refreshes statistics for the query optimizer. Both are typically scheduled nightly. Skipping them silently degrades query performance.
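A nightly maintenance sketch (table name is illustrative), plus the system-view check that tells you which tables need attention:

```sql
-- Re-sort after heavy inserts/updates; stop once 99% sorted
VACUUM SORT ONLY fact_orders TO 99 PERCENT;

-- Reclaim space from deleted rows
VACUUM DELETE ONLY fact_orders;

-- Refresh optimizer statistics
ANALYZE fact_orders;

-- Find tables with a high unsorted fraction or stale stats
SELECT "table", unsorted, stats_off
FROM svv_table_info
WHERE unsorted > 10 OR stats_off > 10
ORDER BY unsorted DESC;
```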
What's the difference between Redshift and Redshift Serverless?
Redshift Provisioned: cluster of nodes you pay for whether or not in use. Predictable cost, sustained workloads. Redshift Serverless: per-RPU-second pricing, scales to zero when idle. Variable cost, spiky workloads. Same SQL, same features, different cost models.
Should I use Redshift Spectrum or just Athena for S3 queries?
Spectrum: when you need to join S3 data with data in Redshift. Athena: when querying purely S3 without Redshift involvement. Both use the AWS Glue Data Catalog and similar pricing. Athena is often simpler if you don't need the join.
How does query optimization work in Redshift?
Cost-based optimizer using table statistics. The query planner chooses join strategies (broadcast, redistribute) based on stats. Stale stats lead to bad plans, hence ANALYZE matters. The EXPLAIN plan shows the chosen strategy; reading EXPLAIN is a senior signal in any Redshift role.
Are materialized views production-ready in Redshift?
Yes since the 2022 GA of auto-refresh. They support a subset of SQL (similar limitations to BigQuery materialized views). The query optimizer can rewrite queries to use materialized views automatically when applicable. Best for high-frequency aggregates with low-frequency source changes.
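A minimal auto-refresh sketch, assuming the hypothetical fact_orders table from earlier examples:

```sql
-- Precomputed daily aggregate, refreshed automatically as the
-- base table changes; the optimizer can rewrite matching queries
-- to read from it
CREATE MATERIALIZED VIEW daily_revenue_mv
AUTO REFRESH YES
AS
SELECT order_date, SUM(amount) AS revenue
FROM fact_orders
GROUP BY order_date;

-- Force a refresh on demand (e.g., right after a large COPY)
REFRESH MATERIALIZED VIEW daily_revenue_mv;
```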
What's the migration path from Redshift to a different warehouse?
Common paths: Redshift to Snowflake (most common in 2024-2026), Redshift to BigQuery (rare, requires cloud migration), Redshift modernization (DC2 to RA3 to Serverless). Each has different complexity. Snowflake migration typically takes 6-12 months end-to-end for a multi-team analytics platform.

Practice Redshift-Flavored SQL

Drill SQL patterns that matter in Redshift loops in our practice sandbox.

Start Practicing

More Data Engineer Interview Prep Guides

Continue your prep

Data Engineer Interview Prep, explore the full guide

50+ guides covering every round, company, role, and technology in the data engineer interview loop. Grounded in 2,817 verified interview reports across 929 companies, collected from real candidates.
