Amazon's data engineer (DE) interview has 5 rounds: a SQL phone screen, plus 4 onsite rounds covering SQL, system design, Python, and behavioral. Every round evaluates Amazon's Leadership Principles (LPs) alongside technical skills, and the bar raiser has veto power. This guide covers the exact structure, common questions, LP framing, and how to practice on DataDriven with Amazon-verified question patterns.
Amazon's DE interview process follows a predictable structure, and understanding it gives you a preparation advantage: you can allocate your practice time to match the actual round distribution.
The phone screen is your first technical hurdle. You'll share your screen on Amazon Chime and solve 2 SQL problems in a text editor (no IDE). The interviewer watches you code in real time. Most phone screens start with a Medium SQL problem (15 to 20 minutes) followed by a Hard problem (20 to 25 minutes). The last 5 to 10 minutes are a behavioral question. About 40% of candidates pass the phone screen.
The onsite SQL deep dive goes deeper than the phone screen. You may be asked to design a schema first, then write queries against it. The problems are Hard difficulty and often involve multiple CTEs, window functions, and edge-case handling. The interviewer probes your thought process and asks follow-up questions: "What if the table has 10 billion rows? How would you optimize this query?"
In the system design round, you are given a scenario and asked to design the full data pipeline. Amazon system design questions often involve AWS services, though you don't need to know the exact API calls. The interviewer evaluates your ability to structure the problem, identify components, discuss trade-offs, and handle failure modes. You will draw a diagram (on a whiteboard or virtual whiteboard) and walk through the data flow from source to consumption.
Amazon Python rounds for DE roles test data manipulation, not algorithms. You'll parse files, process event streams, implement pipeline logic, or build a data validation function. Clean code matters: Amazon evaluates variable naming, error handling, and whether your code would survive a production code review. Write type hints and docstrings even in the interview.
In the behavioral round, every question maps to 1 to 2 Leadership Principles. The interviewer asks follow-up questions to drill into the specifics of your story; vague answers fail. Prepare 8 to 10 detailed STAR stories before the interview. Each story should cover: Situation (2 sentences), Task (1 sentence), Action (5 to 8 sentences with technical detail), Result (2 sentences with quantified impact).
Amazon has 16 Leadership Principles. All 16 can appear in your interview, but 6 appear most frequently for DE roles. For each one below, we include the specific DE context and what kind of story to prepare.
Customer Obsession. When designing a data pipeline, start with the analyst or business stakeholder who consumes the data. What SLA do they need? What query patterns will they run? Amazon interviewers want to hear that you designed your pipeline around the customer's needs, not around what was easiest to build.
Ownership. You own your pipeline end to end. If it breaks at 3 AM, you fix it. If the data quality is wrong, you catch it before the stakeholder does. Tell stories about times you noticed a problem nobody assigned to you and fixed it anyway.
Invent and Simplify. Describe a time you replaced a complex pipeline with a simpler one. Maybe you eliminated 3 intermediate tables by restructuring the SQL. Or you replaced a custom orchestration script with an Airflow DAG. Amazon values reducing complexity.
Are Right, A Lot. Data engineers make judgment calls: which modeling technique to use, when to batch vs stream, whether to build or buy. Prepare examples where you made a technical decision that proved correct, and explain the reasoning behind it.
Dive Deep. Amazon expects DEs to debug down to the row level. Tell a story about tracking a data quality issue from a dashboard anomaly to a specific source record. The more detailed the investigation, the better.
Bias for Action. Describe a situation where you shipped a pipeline quickly rather than waiting for perfect requirements. Amazon values reversible decisions made fast over perfect decisions made slowly. Mention what guardrails you put in place (monitoring, alerts) to catch issues early.
The key to LP framing in technical rounds: you don't need a separate story. When you explain your system design, connect design decisions to LPs naturally. "I chose a 5-minute micro-batch over real-time streaming because the dashboard consumers don't need sub-second latency, and the simpler architecture means fewer failure modes to monitor. That's Customer Obsession: I talked to the analysts and learned what they actually need, not what sounds impressive."
Amazon-specific tip
This is Amazon's most common phone screen pattern: ranking with ties. Use DENSE_RANK with PARTITION BY category. Explain why DENSE_RANK over RANK or ROW_NUMBER. Mention that you would ask the interviewer about tie handling before writing code.
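The pattern can be sketched in a few lines. The `sales` table, its columns, and the data are illustrative, not from an actual Amazon prompt; sqlite3 is used here only because it supports window functions in memory.

```python
import sqlite3

# Hypothetical sales table; schema and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (category TEXT, product TEXT, revenue INTEGER);
INSERT INTO sales VALUES
  ('books', 'a', 100), ('books', 'b', 100), ('books', 'c', 90),
  ('toys',  'x', 50),  ('toys',  'y', 40);
""")

# DENSE_RANK keeps ties on the same rank without skipping the next rank,
# so two products tied at #1 are both returned and the next one is #2.
rows = conn.execute("""
    WITH ranked AS (
        SELECT category, product, revenue,
               DENSE_RANK() OVER (
                   PARTITION BY category ORDER BY revenue DESC
               ) AS rnk
        FROM sales
    )
    SELECT category, product, rnk
    FROM ranked
    WHERE rnk <= 2
    ORDER BY category, rnk, product
""").fetchall()
print(rows)
```

With RANK, the tie at #1 would push the next product to rank 3; with ROW_NUMBER, one of the tied products would be dropped arbitrarily. That contrast is exactly what the interviewer wants you to articulate.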
Amazon-specific tip
Gap-and-islands problem. Subtract ROW_NUMBER from the month number to create a group key, then filter with HAVING COUNT(*) >= 3. Walk through a small example to show the interviewer why the subtraction trick works. Amazon phone screens are 45 minutes, typically with 2 SQL questions.
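The subtraction trick can be demonstrated on a toy table (the `activity` table and `month_num` column are invented for illustration): within a consecutive run, the month number and ROW_NUMBER both increase by 1, so their difference is constant, and every gap shifts it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity (user_id INTEGER, month_num INTEGER);
INSERT INTO activity VALUES
  (1, 1), (1, 2), (1, 3), (1, 5),  -- months 1-3 consecutive, gap at 4
  (2, 1), (2, 3), (2, 4);          -- longest run is only 2 months
""")

# month_num - ROW_NUMBER is constant across a consecutive run,
# so grouping by it isolates each "island" of activity.
rows = conn.execute("""
    WITH numbered AS (
        SELECT user_id, month_num,
               month_num - ROW_NUMBER() OVER (
                   PARTITION BY user_id ORDER BY month_num
               ) AS grp
        FROM activity
    )
    SELECT user_id
    FROM numbered
    GROUP BY user_id, grp
    HAVING COUNT(*) >= 3
""").fetchall()
print(rows)
```

Only user 1 has an island of 3+ months, which is the small worked example worth narrating out loud in the interview.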
Amazon-specific tip
Filter out weekends first (EXTRACT(DOW FROM date) or a calendar dimension). Then apply AVG with ROWS BETWEEN 6 PRECEDING AND CURRENT ROW. Mention that the first 6 days are partial windows and how you would handle them.
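A minimal sketch of the filter-then-window ordering follows; the `daily_sales` table and values are invented, and in SQLite `strftime('%w', ...)` stands in for `EXTRACT(DOW ...)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (sale_date TEXT, amount REAL)")
data = [
    ('2024-01-01', 1), ('2024-01-02', 2), ('2024-01-03', 3),
    ('2024-01-04', 4), ('2024-01-05', 5),
    ('2024-01-06', 99), ('2024-01-07', 99),  # Sat/Sun: must be excluded
    ('2024-01-08', 6), ('2024-01-09', 7), ('2024-01-10', 8),
    ('2024-01-11', 9), ('2024-01-12', 10),
]
conn.executemany("INSERT INTO daily_sales VALUES (?, ?)", data)

# Filter weekends BEFORE applying the window, so the 7-row frame spans
# 7 business days rather than 7 calendar days.
rows = conn.execute("""
    WITH business_days AS (
        SELECT sale_date, amount
        FROM daily_sales
        WHERE strftime('%w', sale_date) NOT IN ('0', '6')  -- 0=Sun, 6=Sat
    )
    SELECT sale_date,
           AVG(amount) OVER (
               ORDER BY sale_date
               ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
           ) AS avg_7d
    FROM business_days
    ORDER BY sale_date
""").fetchall()
print(rows)
```

Note that the first 6 rows are partial windows: the frame simply contains fewer than 7 rows, so the average covers only the days seen so far. That is the edge case worth calling out to the interviewer.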
Amazon-specific tip
This is a combined modeling and SQL question. Start by proposing the schema (fact_watch_sessions with user_id, content_id, start_time, duration_seconds, device_type). Then write the MoM comparison with LAG. Amazon wants to see you connect the model to the query.
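The model-to-query connection might look like this sketch. The `fact_watch_sessions` schema follows the column list above; the data, the monthly grain, and the aggregate choice are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_watch_sessions (
    user_id INTEGER, content_id INTEGER,
    start_time TEXT, duration_seconds INTEGER, device_type TEXT
);
INSERT INTO fact_watch_sessions VALUES
  (1, 10, '2024-01-05', 600, 'tv'),
  (1, 11, '2024-01-20', 400, 'mobile'),
  (2, 10, '2024-02-01', 300, 'tv'),
  (1, 12, '2024-02-10', 500, 'tv');
""")

# Aggregate the fact table to a monthly grain, then LAG over the
# aggregate to get the month-over-month change in watch time.
rows = conn.execute("""
    WITH monthly AS (
        SELECT strftime('%Y-%m', start_time) AS month,
               SUM(duration_seconds) AS total_seconds
        FROM fact_watch_sessions
        GROUP BY month
    )
    SELECT month, total_seconds,
           total_seconds - LAG(total_seconds) OVER (ORDER BY month) AS mom_change
    FROM monthly
    ORDER BY month
""").fetchall()
print(rows)
```

The NULL in the first row is expected because there is no prior month; saying so unprompted signals that you thought about the query's edges, not just its happy path.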
Amazon-specific tip
Self-join or LAG with a time condition. Join orders o1 to orders o2 where o1.customer_id = o2.customer_id AND o1.product_id = o2.product_id AND o1.order_id < o2.order_id AND ABS(EXTRACT(EPOCH FROM o2.created_at - o1.created_at)) <= 300. Explain why this is a real production concern for payment systems.
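A runnable sketch of the self-join follows. SQLite has no `EXTRACT(EPOCH ...)`, so `julianday` differences stand in for the epoch arithmetic; the `orders` data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER,
                     product_id INTEGER, created_at TEXT);
INSERT INTO orders VALUES
  (1, 1, 100, '2024-01-01 10:00:00'),
  (2, 1, 100, '2024-01-01 10:03:00'),  -- 180s after order 1: duplicate
  (3, 1, 100, '2024-01-01 10:30:00'),  -- 27 min later: not a duplicate
  (4, 2, 100, '2024-01-01 10:01:00');  -- different customer: not a duplicate
""")

# Flag the LATER order of any same-customer, same-product pair placed
# within 300 seconds. julianday() returns fractional days, hence * 86400.
rows = conn.execute("""
    SELECT o2.order_id
    FROM orders o1
    JOIN orders o2
      ON o1.customer_id = o2.customer_id
     AND o1.product_id  = o2.product_id
     AND o1.order_id    < o2.order_id
     AND (julianday(o2.created_at) - julianday(o1.created_at)) * 86400 <= 300
""").fetchall()
print(rows)
```

The `o1.order_id < o2.order_id` condition both avoids matching a row against itself and ensures each duplicate pair is flagged once, with the later order marked for review.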
Amazon-specific tip
Start with requirements (volume: millions of transactions/day, latency: 5 min, sources: seller API, payment system). Propose Kafka for ingestion, Flink for real-time fraud scoring, S3 + Delta Lake for storage, Redshift for the dashboard. Discuss exactly-once semantics, what happens when the fraud model is down, and how you would backfill historical data. Amazon loves pipeline designs with failure modes explicitly addressed.
Amazon-specific tip
Phased approach. Phase 1: historical data migration with S3 as the landing zone. Phase 2: dual-write (both Hadoop and AWS) for validation. Phase 3: pipeline-by-pipeline cutover with rollback capability. Phase 4: decommission Hadoop. Mention that you would prioritize high-value pipelines first, not migrate everything at once. Tie back to Customer Obsession: the analytics consumers should see zero disruption.
Amazon-specific tip
Use a sliding window approach with collections.deque per product_id. For each event, remove events older than 1 hour from the deque, add the new event, check if len(deque) > 3. Return a list of flagged product IDs. Amazon Python questions emphasize real-time data processing patterns.
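A minimal sketch of the per-product sliding window; the function name, event shape, and threshold are assumptions for illustration, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque


def flag_hot_products(events, threshold=3, window_seconds=3600):
    """Flag product_ids with more than `threshold` events in any 1-hour window.

    events: iterable of (timestamp_seconds, product_id), assumed time-ordered.
    """
    windows = defaultdict(deque)  # one deque of timestamps per product_id
    flagged = set()
    for ts, product_id in events:
        w = windows[product_id]
        # Evict timestamps that fell out of the trailing 1-hour window.
        while w and ts - w[0] > window_seconds:
            w.popleft()
        w.append(ts)
        if len(w) > threshold:
            flagged.add(product_id)
    return sorted(flagged)
```

Each event is appended once and evicted at most once, so the work is O(1) amortized per event regardless of window size, which is the property to highlight when the interviewer asks about scale.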
Amazon-specific tip
Dictionary keyed by product_id. For each incoming record: if 'deleted' flag is true, remove from catalog. If product_id exists, update fields. If new, insert. Return the merged catalog. Discuss how you would handle partial failures (what if the JSON file is corrupted halfway through). Amazon loves idempotency in coding questions.
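One way to sketch the merge (the record shape and the `deleted` flag convention are assumptions): the key property to call out is that replaying the same update log leaves the catalog unchanged.

```python
def merge_catalog(catalog, updates):
    """Apply incremental product updates to a catalog dict keyed by product_id.

    catalog: {product_id: {field: value}}; updates: list of record dicts,
    each carrying 'product_id' and optionally a 'deleted' flag.
    """
    for record in updates:
        pid = record['product_id']
        fields = {k: v for k, v in record.items()
                  if k not in ('product_id', 'deleted')}
        if record.get('deleted'):
            catalog.pop(pid, None)       # deleting a missing product is a no-op
        elif pid in catalog:
            catalog[pid].update(fields)  # partial update of existing product
        else:
            catalog[pid] = fields        # insert new product
    return catalog
```

Because deletes of missing products are no-ops and updates are last-write-wins, a retried batch (say, after a corrupted-file failure forces a re-run) converges to the same catalog, which is the idempotency Amazon is probing for.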
Amazon-specific tip
Use the STAR framework (Situation, Task, Action, Result). Tie to 'Bias for Action' and 'Are Right, A Lot.' Example: you chose a modeling approach for a new data source before the schema was finalized, put monitoring in place to catch schema drift, and iterated when the real schema diverged from your assumption. Quantify the result: 'We launched 3 weeks earlier than the team that waited for perfect specs.'
Amazon-specific tip
Map to 'Have Backbone; Disagree and Commit.' Describe the disagreement specifically: you wanted batch processing, they wanted streaming. Explain your reasoning with data. Show that once the decision was made (even if it went against your preference), you committed fully and executed. End with the outcome and what you learned.
Amazon-specific tip
Map to 'Dive Deep.' Walk through the debugging process in detail: how you noticed the issue, what tools you used, how you traced it to the root cause, what fix you implemented, and what monitoring you added to prevent recurrence. Amazon interviewers will ask follow-up questions to test the depth of your story. Vague stories fail. Specific stories with technical detail pass.
The bar raiser is unique to Amazon. One of your onsite interviewers is a specially trained employee from a different team. Their job is to evaluate whether you raise the bar for your target role. They have veto power: a bar raiser "no" overrides three "yes" votes from the other interviewers.
You won't know which round is the bar raiser round. It could be the SQL round, the system design round, the coding round, or the behavioral round. This uncertainty is intentional: Amazon wants you to perform consistently across all rounds, not coast through some and peak in others.
What bar raisers look for:
Depth. Surface-level answers that hit the right keywords but lack specifics will not impress a bar raiser. When you describe a pipeline design, the bar raiser will drill into details: "How do you handle exactly-once semantics in that Kafka consumer? What happens if the consumer crashes between reading a message and committing the offset?" If you can't go three levels deep on your own design, the bar raiser will notice.
Consistency with LPs. Bar raisers are trained to evaluate LP alignment. They listen for natural LP references, not forced ones. "I chose this approach because it was simpler to operate, and simplicity reduces the on-call burden for my team" is natural LP alignment (Invent and Simplify). "I did this because of Invent and Simplify" sounds rehearsed and hollow.
Self-awareness. Bar raisers respect candidates who acknowledge trade-offs and limitations in their approach. "This design handles 10M events per day, but if we scale to 100M, the sort-merge join in step 3 would become a bottleneck. At that scale, I would switch to a broadcast join or pre-partition the data." Knowing the limits of your solution is a signal of seniority.
If you've been following a general DE interview prep plan, here is what to add specifically for Amazon.
Add LP stories (5 to 8 hours). Prepare 8 to 10 STAR stories from your work experience. Each story should map to 2 to 3 Leadership Principles. Write them out in full. Practice telling each story in 2 to 3 minutes. The behavioral round moves fast: 5 to 7 questions in 45 to 60 minutes means you get about 7 minutes per answer, including follow-ups. Rambling stories that take 10 minutes will cost you 2 to 3 questions.
Add LP framing to technical answers (2 to 3 hours). Practice connecting your system design decisions to LPs. This feels unnatural at first. After 5 to 10 practice rounds, it becomes second nature. The trick: after describing a design decision, add one sentence explaining the customer impact or operational benefit. That sentence naturally maps to an LP.
Study AWS data services (3 to 5 hours). You don't need to know API calls, but you should know when to use Redshift vs Athena vs Glue vs EMR vs Kinesis vs Lambda. System design rounds at Amazon often involve AWS services. Know the trade-offs: Redshift for structured analytics (provisioned), Athena for ad-hoc queries on S3 (serverless), Glue for ETL (managed Spark), Kinesis for real-time streaming, EMR for custom Spark clusters.
Practice combined modeling + SQL questions (3 to 5 hours). Amazon's onsite SQL round often combines modeling and querying. You design the schema, then write queries against your own schema. This is harder than querying a given schema because any modeling mistakes compound into query complexity. Practice by picking a domain (ride-sharing, streaming service, marketplace), designing the schema in 10 minutes, then solving 2 to 3 queries against it in 20 minutes.
This assumes you already have general DE skills (SQL, Python, modeling basics). If not, complete the 8-week general prep plan first, then add these 4 Amazon-specific weeks.
Week 1: Solve 10 Hard SQL problems on DataDriven. Focus on combined patterns: CTE + window function + aggregation in one query. Write 10 STAR stories mapped to LPs. Practice telling each story in under 3 minutes.
Week 2: Design 4 end-to-end pipelines using AWS services. Practice structuring your walkthrough: requirements, high-level design, deep dive, failure modes, trade-offs. Study Redshift, Glue, Kinesis, S3, and Athena trade-offs.
Week 3: Solve 8 Python data manipulation problems. Practice explaining your code while writing it and connecting design decisions to LPs. Solve 3 combined modeling + SQL problems (design schema, then query it).
Week 4: Run 2 to 3 full Amazon mock interview loops on DataDriven. Each loop: SQL phone screen (45 min), SQL deep dive (60 min), system design (60 min), Python coding (60 min), behavioral (45 min). Review AI feedback after each loop and focus on your weakest round.
1. Skipping LP framing in technical rounds. Many candidates treat technical and behavioral rounds as separate categories. At Amazon, they're not. Every round evaluates LPs. When you explain a design decision without connecting it to a customer or operational benefit, you miss half the evaluation criteria.
2. Giving vague behavioral stories. "I worked on a data pipeline and it was challenging and we shipped it." This tells the interviewer nothing. They want: "The pipeline processed 2.3M events per day from 4 Kafka topics. The SLA was 15 minutes. We were missing the SLA by 8 minutes because of data skew on the customer_id partition key. I implemented salted joins, which reduced the longest task from 23 minutes to 4 minutes. The pipeline has met SLA every day for the last 6 months." Numbers. Details. Specifics.
3. Not asking clarifying questions. Amazon interviewers intentionally leave questions ambiguous to see if you ask for clarification. "Design a pipeline for our marketplace data." What data? What latency requirement? Who consumes it? What volume? Asking these questions maps to Customer Obsession (understanding requirements) and Dive Deep (not making assumptions). Candidates who start designing without asking any questions signal that they don't think about requirements.
4. Ignoring failure modes in system design. A pipeline design that works perfectly when everything goes right is not a complete design. Amazon wants to hear: "What happens when Kafka is down? What happens when the data volume doubles? What happens when the source schema changes?" Address failure modes proactively. This maps to Ownership (you own the pipeline, including when it breaks).
5. Over-engineering the SQL solution. Amazon SQL interviewers value clarity over cleverness. A 4-CTE solution that you can explain step by step beats a 1-query solution that the interviewer needs 5 minutes to parse. Write clean SQL with meaningful CTE names (not cte1, cte2, cte3). Format it neatly. This is production code, not a code golf competition.
Amazon-style SQL, system design, Python, and behavioral rounds with AI grading. Practice LP framing in every technical answer.