Most candidates treat data science and data engineering interviews as different tracks. They're not. Interviewers for both roles pull SQL questions from the same well. What changes is the follow-up: DS rounds push on sampling and statistical significance; DE rounds push on scale and pipeline correctness. If you practice DS SQL questions and ignore the DE angle, you'll bomb the second half of the conversation.
[Chart: breakdown of SQL share, Python share, and pure system design across the data engineering interview rounds analyzed.]
Source: DataDriven analysis of 1,042 verified data engineering interview rounds.
Five areas of overlap, ranked by how much they matter for data engineers.
Data scientists and data engineers both face SQL questions in interviews, and the overlap here is almost complete. Aggregations, window functions, CTEs, joins, and subqueries appear identically in both interview tracks. If you are a data engineer interviewing at a company that titles the role 'data scientist,' the SQL portion will feel familiar. The main difference: DS interviews may include probability-themed SQL (sampling, bootstrapping with SQL) while DE interviews lean toward performance optimization.
DE interviews rarely test statistics directly, but many DE-adjacent roles (analytics engineer, ML engineer) do. Understanding p-values, confidence intervals, distributions, and hypothesis testing helps you build better data pipelines. If your pipeline feeds an A/B testing system, knowing how sample size, statistical power, and significance work makes you a better engineer. Several companies (Meta, Netflix, Spotify) ask DEs light statistics questions to gauge breadth.
At senior levels, data engineers are expected to understand how ML models consume data. Feature engineering, training data pipelines, model serving, and monitoring all live in the DE domain. DS interviews test the ML concepts. DE interviews test the infrastructure that supports them. If you can speak both languages, you are significantly more valuable in cross-functional interviews.
Companies like Meta, Netflix, and Airbnb ask DEs about experimentation infrastructure. How do you partition users into control and treatment groups? How do you ensure the pipeline delivers clean experiment data? How do you handle metric computation across variants? These questions sit at the intersection of DS and DE, and candidates who can answer both sides stand out.
DS interviews test your ability to choose the right chart, interpret trends, and communicate findings to stakeholders. DE interviews do not test visualization directly, but understanding business metrics (DAU, MAU, retention, churn, LTV) helps you build better data models. If a DE interviewer asks you to design a data model for user engagement, knowing what metrics the model needs to support is critical.
Most DS prep guides stop at the statistical answer. Interviewers don't. They ask a DE follow-up: how would this scale, where does the data come from, what breaks at 10x volume. Every answer below includes that second beat so you're not caught flat-footed.
Write a query to calculate the 7-day rolling average of daily active users, handling days with zero activity.
Start with a date spine (generate_series or a calendar table) to ensure every day has a row, even days with zero active users. LEFT JOIN the user activity table to the date spine. COUNT(DISTINCT user_id) per day gives you daily active users. Then apply AVG() with a window frame: AVG(dau) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW). The date spine is the key insight. Without it, your rolling average skips zero-activity days, inflating the result. This question tests window functions, date handling, and awareness of data gaps.
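A runnable sketch of this answer in SQLite (via Python's sqlite3). The activity table, column names, and sample dates are invented for illustration; the point is the date spine keeping the zero-activity day in the rolling window:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity (user_id INTEGER, activity_date TEXT);
INSERT INTO activity VALUES
  (1,'2024-01-01'),(2,'2024-01-01'),
  (1,'2024-01-02'),
  -- 2024-01-03 has zero activity: the spine must still produce a row
  (3,'2024-01-04');
""")

rows = conn.execute("""
WITH RECURSIVE spine(d) AS (          -- date spine: one row per calendar day
  SELECT MIN(activity_date) FROM activity
  UNION ALL
  SELECT date(d, '+1 day') FROM spine
  WHERE d < (SELECT MAX(activity_date) FROM activity)
),
daily AS (                            -- LEFT JOIN keeps zero-activity days
  SELECT s.d, COUNT(DISTINCT a.user_id) AS dau
  FROM spine s
  LEFT JOIN activity a ON a.activity_date = s.d
  GROUP BY s.d
)
SELECT d, dau,
       AVG(dau) OVER (ORDER BY d ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
         AS rolling_7d
FROM daily
ORDER BY d
""").fetchall()
for r in rows:
    print(r)
```

Without the spine, 2024-01-03 would vanish from the result and the average over the remaining days would be inflated.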
Find users whose total spending increased month-over-month for 3 consecutive months.
Aggregate spending by user and month. Use LAG to compare each month's spending to the previous month. Flag months where spending increased. Then identify sequences of 3 consecutive increase flags. The month-index-minus-ROW_NUMBER trick creates groups of consecutive increasing months. Filter groups with COUNT >= 3. This tests window functions, aggregation, and the consecutive-sequence (gaps-and-islands) pattern that appears in both DS and DE interviews.
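The same pattern sketched end-to-end in SQLite; the payments table, month encoding, and amounts are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (user_id INTEGER, pay_month TEXT, amount REAL);
INSERT INTO payments VALUES
  (1,'2024-01',10),(1,'2024-02',20),(1,'2024-03',30),(1,'2024-04',40),
  (2,'2024-01',50),(2,'2024-02',40),(2,'2024-03',60);
""")

users = conn.execute("""
WITH monthly AS (                       -- total spend per user per month
  SELECT user_id,
         CAST(substr(pay_month,1,4) AS INT) * 12
           + CAST(substr(pay_month,6,2) AS INT) AS month_idx,
         SUM(amount) AS spend
  FROM payments GROUP BY user_id, pay_month
),
flagged AS (                            -- 1 if spend rose vs previous month
  SELECT *,
         CASE WHEN spend > LAG(spend) OVER
              (PARTITION BY user_id ORDER BY month_idx)
              THEN 1 ELSE 0 END AS increased
  FROM monthly
),
islands AS (                            -- consecutive increases share a grp
  SELECT user_id,
         month_idx - ROW_NUMBER() OVER
           (PARTITION BY user_id ORDER BY month_idx) AS grp
  FROM flagged WHERE increased = 1
)
SELECT DISTINCT user_id
FROM islands
GROUP BY user_id, grp
HAVING COUNT(*) >= 3                    -- 3 consecutive MoM increases
""").fetchall()
print(users)   # user 1 qualifies; user 2 had a dip in February
```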
A product team ran an A/B test for 2 weeks and found a 3% lift in click-through rate with a p-value of 0.04. Should they ship the change?
A p-value of 0.04 is below the standard 0.05 threshold, so the result is statistically significant. But significance alone is not sufficient for a ship decision. Consider: (1) Is the 3% lift practically significant? A 3% lift on a 0.1% baseline is different from a 3% lift on a 50% baseline. (2) Was the sample size adequate? Two weeks might not capture weekly seasonality patterns. (3) Did they look at guardrail metrics (revenue, latency, error rates)? A CTR improvement that degrades user experience is not worth shipping. (4) Was there any p-hacking (checking results daily and stopping early)? The right answer is: statistically significant, but need to verify practical significance, check guardrails, and confirm the test ran for the pre-determined duration.
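To put numbers behind the sample-size caveat, here is a minimal two-proportion z-test in pure Python. The traffic counts are invented; they illustrate that a 3% relative lift can fail to reach significance at a moderate sample size, which is why "p = 0.04 after two weeks" deserves scrutiny:

```python
from math import sqrt, erf

def two_prop_ztest(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a difference in click-through rates."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)      # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_a, p_b, p_value

# hypothetical experiment: 5% baseline CTR, 3% relative lift, 100k per arm
p_a, p_b, p = two_prop_ztest(5000, 100_000, 5150, 100_000)
print(f"control CTR={p_a:.3%}  treatment CTR={p_b:.3%}  p={p:.3f}")
print(f"relative lift={(p_b - p_a) / p_a:.1%}")  # 3.0%, yet not significant here
```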
Explain the difference between Type I and Type II errors in the context of a data quality alert system.
Type I error (false positive): the alert fires when there is no real data quality issue. Your pipeline data is fine, but the monitor triggers a page at 3 AM. Type II error (false negative): there is a real data quality issue, but the alert does not fire. Bad data flows downstream undetected. For a data quality system, Type II errors are usually more costly because corrupted data reaches dashboards and ML models before anyone notices. When tuning alert thresholds, you trade off between the two: tighter thresholds reduce Type II errors but increase Type I (more false alarms). This framing shows interviewers you understand statistical concepts in an engineering context.
Design a feature engineering pipeline for a recommendation system that serves 50M users.
Start with requirements: what features does the model need, how fresh must they be, and what latency does serving require? Batch features (user demographics, historical preferences) can be computed daily in Spark and stored in a feature store. Near-real-time features (session clicks, last-5-minutes behavior) require a streaming pipeline (Kafka to Flink) writing to a low-latency store (Redis or DynamoDB). The feature store should support both batch training (historical features joined to training labels) and online serving (point lookup by user_id at inference time). Discuss versioning: when you add a new feature, the model needs retraining on the same feature definition. This tests your understanding of the ML infrastructure that data engineers build and maintain.
How would you detect and handle data drift in a production ML pipeline?
Data drift means the distribution of input features changes over time, which can degrade model performance. Detection: compute statistical summaries (mean, median, percentiles, distribution histograms) for each feature on each batch. Compare current batch distributions to a reference window using KS tests, PSI (population stability index), or simple threshold checks on mean/variance. Handling: alert the ML team when drift exceeds a threshold, trigger model retraining if drift is persistent, and log drift metrics for retrospective analysis. As a data engineer, your responsibility is building the monitoring pipeline that detects drift and the retraining infrastructure that responds to it. You are not necessarily choosing the model architecture, but you are building the system that keeps the model healthy.
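A minimal PSI computation to make the detection step concrete. The decile binning from the reference window, the bin count, and the 1e-6 floor for empty buckets are common but arbitrary choices, not a standard API:

```python
from math import log

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample and a
    current batch. Bins come from the reference distribution's quantiles."""
    exp_sorted = sorted(expected)
    # quantile cut points taken from the reference window
    cuts = [exp_sorted[int(len(exp_sorted) * i / bins)] for i in range(1, bins)]

    def bucket_fracs(values):
        counts = [0] * bins
        for v in values:
            idx = sum(v > c for c in cuts)       # which bucket v falls into
            counts[idx] += 1
        # small floor avoids log(0) for empty buckets
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bucket_fracs(expected), bucket_fracs(actual)
    return sum((ai - ei) * log(ai / ei) for ei, ai in zip(e, a))

ref = [i / 1000 for i in range(1000)]            # uniform reference window
same = [i / 1000 for i in range(1000)]           # identical distribution
shifted = [0.5 + i / 2000 for i in range(1000)]  # drifted upward

print(f"no drift: PSI={psi(ref, same):.4f}")
print(f"drift:    PSI={psi(ref, shifted):.4f}")
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.2 as significant drift; in production you would compute this per feature per batch and alert on breaches.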
A data scientist tells you their A/B test results are being polluted by bot traffic. How would you build a pipeline to filter bots?
Layer multiple signals: (1) Known bot user-agents (Googlebot, Bingbot, etc.) are easy to filter in the ingestion layer. (2) Request rate limits: flag user_ids with more than N actions per minute as suspicious. (3) Session behavior: bots often have zero scroll depth, no mouse movements, and perfectly uniform time-between-clicks. (4) IP reputation: cross-reference IPs against known bot networks. Build a scoring pipeline that assigns a bot_probability to each session. Let the experimentation team choose their threshold (e.g., exclude sessions with bot_probability > 0.8). Store both raw and filtered datasets so the team can compare results with and without filtering. The key is that bot filtering is a data engineering problem that directly affects data science outcomes.
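A toy version of the scoring step; the signal names, weights, and the 0.8 threshold below are placeholders for illustration, not production values:

```python
def bot_score(session):
    """Combine weak signals into a 0-1 bot probability (illustrative weights)."""
    score = 0.0
    if session["user_agent"].lower() in {"googlebot", "bingbot"}:
        score += 0.9                      # known crawler user-agent
    if session["actions_per_min"] > 60:
        score += 0.4                      # superhuman request rate
    if session["scroll_depth"] == 0 and session["mouse_events"] == 0:
        score += 0.3                      # no human-like interaction
    if session["ip_on_blocklist"]:
        score += 0.4                      # known bot network
    return min(score, 1.0)

sessions = [
    {"user_agent": "Mozilla/5.0", "actions_per_min": 12,
     "scroll_depth": 0.6, "mouse_events": 40, "ip_on_blocklist": False},
    {"user_agent": "Googlebot", "actions_per_min": 200,
     "scroll_depth": 0, "mouse_events": 0, "ip_on_blocklist": False},
]
# the experimentation team picks the cutoff; keep raw data so they can compare
flagged = [s for s in sessions if bot_score(s) > 0.8]
print(len(flagged))
```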
How would you design the data pipeline for an experimentation platform that runs 200 concurrent A/B tests?
The pipeline needs to: (1) Assign users to variants deterministically (hash user_id + experiment_id to ensure consistent assignment across sessions). (2) Log exposure events (which user saw which variant, when). (3) Join exposure events to outcome metrics (clicks, purchases, engagement). (4) Compute per-experiment statistics (mean, variance, sample size per variant). For 200 concurrent tests, the join between exposures and outcomes is the bottleneck. Partition both tables by date and experiment_id. Pre-aggregate metrics daily to reduce join size. Store results in a materialized table that the experimentation UI reads. Discuss backfill: when a metric definition changes, you need to recompute results for running experiments. This tests pipeline design, data modeling, and understanding of how experimentation infrastructure works at scale.
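Step (1), deterministic assignment, can be sketched in a few lines. SHA-256 and the `experiment_id:user_id` key format are one common choice, not the only one; hashing the experiment ID into the key keeps buckets uncorrelated across the 200 concurrent tests:

```python
import hashlib

def assign_variant(user_id: str, experiment_id: str, n_variants: int = 2) -> int:
    """Deterministic bucket: the same user + experiment always maps to the
    same variant, regardless of session or serving host."""
    key = f"{experiment_id}:{user_id}".encode()
    digest = hashlib.sha256(key).hexdigest()
    return int(digest, 16) % n_variants

# same inputs -> same bucket every time, with no assignment table to store
assert assign_variant("u123", "exp_checkout") == assign_variant("u123", "exp_checkout")

buckets = [assign_variant(f"u{i}", "exp_checkout") for i in range(10_000)]
print(sum(buckets) / len(buckets))  # close to 0.5 for a 50/50 split
```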
Define retention rate, and write SQL to calculate 7-day retention for a mobile app.
Retention rate: the percentage of users who return to the app N days after their first visit. For 7-day retention: identify each user's first activity date. Check if they had any activity on exactly day 7 (or within a 7-day window, depending on definition). SQL: WITH first_visit AS (SELECT user_id, MIN(activity_date) AS first_date FROM activity GROUP BY user_id) SELECT first_date, COUNT(DISTINCT CASE WHEN a.activity_date = f.first_date + 7 THEN f.user_id END) * 100.0 / COUNT(DISTINCT f.user_id) AS retention_7d FROM first_visit f LEFT JOIN activity a ON f.user_id = a.user_id AND a.activity_date = f.first_date + 7 GROUP BY first_date. Discuss the definition ambiguity: strict (exactly day 7) vs window (days 7 to 13). Different companies use different definitions, and clarifying this upfront shows attention to detail.
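The same query, runnable in SQLite with toy data. Note the dialect difference: SQLite needs date(first_date, '+7 day') for the day arithmetic, where Postgres would accept first_date + 7 on a DATE column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity (user_id INTEGER, activity_date TEXT);
INSERT INTO activity VALUES
  (1,'2024-01-01'),(1,'2024-01-08'),   -- returned on day 7: retained
  (2,'2024-01-01'),                    -- never returned
  (3,'2024-01-01'),(3,'2024-01-05');   -- returned, but not on day 7
""")

rows = conn.execute("""
WITH first_visit AS (
  SELECT user_id, MIN(activity_date) AS first_date
  FROM activity GROUP BY user_id
)
SELECT f.first_date,
       COUNT(DISTINCT a.user_id) * 100.0
         / COUNT(DISTINCT f.user_id) AS retention_7d
FROM first_visit f
LEFT JOIN activity a
  ON a.user_id = f.user_id
 AND a.activity_date = date(f.first_date, '+7 day')  -- strict day-7 definition
GROUP BY f.first_date
""").fetchall()
print(rows)   # one cohort, 1 of 3 users retained
```

Switching to the windowed definition (days 7 to 13) only changes the join predicate to a BETWEEN on the two boundary dates.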
A dashboard shows that average order value (AOV) increased 15% this month, but total revenue is flat. What happened?
If AOV went up but total revenue stayed flat, the number of orders must have decreased. Revenue = AOV * order_count. If AOV increases by 15% and revenue is constant, order count dropped by approximately 13%. Investigate: (1) Did a segment of low-value customers stop ordering, which would mechanically raise AOV while reducing volume? (2) Was there a price increase that drove away price-sensitive customers? (3) Did a promotion end that was generating high-volume, low-value orders? (4) Check for data pipeline issues: are all order sources still reporting? A broken ingestion pipeline for one order channel could reduce order count without the team realizing. This question tests business intuition and the habit of questioning data rather than accepting it at face value.
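The arithmetic behind that roughly 13% figure, with made-up revenue numbers:

```python
# revenue = aov * orders; hold revenue fixed while AOV rises 15%
revenue = 1_000_000
aov_before = 50.0
orders_before = revenue / aov_before

aov_after = aov_before * 1.15          # +15% AOV
orders_after = revenue / aov_after     # revenue unchanged

drop = 1 - orders_after / orders_before   # 1 - 1/1.15
print(f"order count drop: {drop:.1%}")    # ~13.0%
```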
Three practical reasons, not theoretical hand-waving.
At mid-size companies, the 'data engineer' role often includes experimentation analysis, metric computation, and dashboard building. These tasks overlap heavily with data science. If your interview includes a 'business case' or 'metric analysis' round, you are being tested on DS skills even if the title says data engineer. Being prepared for this overlap gives you an advantage.
Data engineers who understand statistics, ML pipelines, and business metrics earn trust from data scientists and product managers faster. This trust translates to more interesting projects, more influence on technical decisions, and faster promotions. You do not need to be an expert statistician, but understanding the basics makes you a better collaborator.
If you can answer both DE and DS interview questions, your target job pool expands to include analytics engineer, ML engineer, and data platform engineer roles. These roles often pay comparably to or more than traditional DE roles and offer different growth paths. Breadth in your interview preparation translates directly to more opportunities.
The questions don't change between DS and DE. Your answers have to. Train on both framings in one place.
Start Practicing