Most candidates treat data science and data engineering interviews as different tracks. They're not. Interviewers for both roles pull SQL questions from the same well. What changes is the follow-up: DS rounds push on sampling and statistical significance; DE rounds push on scale and pipeline correctness. If you practice DS SQL questions and ignore the DE angle, you'll bomb the second half of the conversation.
[Chart: breakdown of SQL share, Python share, and pure system design across the data engineering interview rounds analyzed.]
Source: DataDriven analysis of 1,042 verified data engineering interview rounds.
Five areas of overlap, ranked by how much they matter for data engineers.
Data scientists and data engineers both face SQL questions in interviews, and the overlap here is almost complete. Aggregations, window functions, CTEs, joins, and subqueries appear identically in both interview tracks. If you are a data engineer interviewing at a company that titles the role 'data scientist,' the SQL portion will feel familiar. The main difference: DS interviews may include probability-themed SQL (sampling, bootstrapping with SQL) while DE interviews lean toward performance optimization.
DE interviews rarely test statistics directly, but many DE-adjacent roles (analytics engineer, ML engineer) do. Understanding p-values, confidence intervals, distributions, and hypothesis testing helps you build better data pipelines. If your pipeline feeds an A/B testing system, knowing how sample size, statistical power, and significance work makes you a better engineer. Several companies (Meta, Netflix, Spotify) ask DEs light statistics questions to gauge breadth.
At senior levels, data engineers are expected to understand how ML models consume data. Feature engineering, training data pipelines, model serving, and monitoring all live in the DE domain. DS interviews test the ML concepts. DE interviews test the infrastructure that supports them. If you can speak both languages, you are significantly more valuable in cross-functional interviews.
Companies like Meta, Netflix, and Airbnb ask DEs about experimentation infrastructure. How do you partition users into control and treatment groups? How do you ensure the pipeline delivers clean experiment data? How do you handle metric computation across variants? These questions sit at the intersection of DS and DE, and candidates who can answer both sides stand out.
DS interviews test your ability to choose the right chart, interpret trends, and communicate findings to stakeholders. DE interviews do not test visualization directly, but understanding business metrics (DAU, MAU, retention, churn, LTV) helps you build better data models. If a DE interviewer asks you to design a data model for user engagement, knowing what metrics the model needs to support is critical.
Most DS prep guides stop at the statistical answer. Interviewers don't. They ask a DE follow-up: how would this scale, where does the data come from, what breaks at 10x volume. Every answer below includes that second beat so you're not caught flat-footed.
Write a query to calculate the 7-day rolling average of daily active users, handling days with zero activity.
Start with a date spine (generate_series or a calendar table) to ensure every day has a row, even days with zero active users. LEFT JOIN the user activity table to the date spine. COUNT(DISTINCT user_id) per day gives you daily active users. Then apply AVG() with a window frame: AVG(dau) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW). The date spine is the key insight. Without it, your rolling average skips zero-activity days, inflating the result. This question tests window functions, date handling, and awareness of data gaps.
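A runnable sketch of this answer in SQLite (via Python's sqlite3). The activity table, column names, and sample dates are invented for illustration; the point is the date spine keeping the zero-activity day in the rolling window:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity (user_id INTEGER, activity_date TEXT);
INSERT INTO activity VALUES
  (1,'2024-01-01'),(2,'2024-01-01'),
  (1,'2024-01-02'),
  -- 2024-01-03 has zero activity: the spine must still produce a row
  (3,'2024-01-04');
""")

rows = conn.execute("""
WITH RECURSIVE spine(d) AS (          -- date spine: one row per calendar day
  SELECT MIN(activity_date) FROM activity
  UNION ALL
  SELECT date(d, '+1 day') FROM spine
  WHERE d < (SELECT MAX(activity_date) FROM activity)
),
daily AS (                            -- LEFT JOIN keeps zero-activity days
  SELECT s.d, COUNT(DISTINCT a.user_id) AS dau
  FROM spine s
  LEFT JOIN activity a ON a.activity_date = s.d
  GROUP BY s.d
)
SELECT d, dau,
       AVG(dau) OVER (ORDER BY d ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
         AS rolling_7d
FROM daily
ORDER BY d
""").fetchall()
for r in rows:
    print(r)
```

Without the spine, 2024-01-03 would vanish from the result and the average over the remaining days would be inflated.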
Find users whose total spending increased month-over-month for 3 consecutive months.
Aggregate spending by user and month. Use LAG to compare each month's spending to the previous month. Flag months where spending increased. Then identify sequences of 3 consecutive increase flags. The month-index-minus-ROW_NUMBER trick creates groups of consecutive increasing months. Filter groups with COUNT >= 3. This tests window functions, aggregation, and the consecutive-sequence (gaps-and-islands) pattern that appears in both DS and DE interviews.
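The same pattern sketched end-to-end in SQLite; the payments table, month encoding, and amounts are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (user_id INTEGER, pay_month TEXT, amount REAL);
INSERT INTO payments VALUES
  (1,'2024-01',10),(1,'2024-02',20),(1,'2024-03',30),(1,'2024-04',40),
  (2,'2024-01',50),(2,'2024-02',40),(2,'2024-03',60);
""")

users = conn.execute("""
WITH monthly AS (                       -- total spend per user per month
  SELECT user_id,
         CAST(substr(pay_month,1,4) AS INT) * 12
           + CAST(substr(pay_month,6,2) AS INT) AS month_idx,
         SUM(amount) AS spend
  FROM payments GROUP BY user_id, pay_month
),
flagged AS (                            -- 1 if spend rose vs previous month
  SELECT *,
         CASE WHEN spend > LAG(spend) OVER
              (PARTITION BY user_id ORDER BY month_idx)
              THEN 1 ELSE 0 END AS increased
  FROM monthly
),
islands AS (                            -- consecutive increases share a grp
  SELECT user_id,
         month_idx - ROW_NUMBER() OVER
           (PARTITION BY user_id ORDER BY month_idx) AS grp
  FROM flagged WHERE increased = 1
)
SELECT DISTINCT user_id
FROM islands
GROUP BY user_id, grp
HAVING COUNT(*) >= 3                    -- 3 consecutive MoM increases
""").fetchall()
print(users)   # user 1 qualifies; user 2 had a dip in February
```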
A product team ran an A/B test for 2 weeks and found a 3% lift in click-through rate with a p-value of 0.04. Should they ship the change?
A p-value of 0.04 is below the standard 0.05 threshold, so the result is statistically significant. But significance alone is not sufficient for a ship decision. Consider: (1) Is the 3% lift practically significant? A 3% lift on a 0.1% baseline is different from a 3% lift on a 50% baseline. (2) Was the sample size adequate? Two weeks might not capture weekly seasonality patterns. (3) Did they look at guardrail metrics (revenue, latency, error rates)? A CTR improvement that degrades user experience is not worth shipping. (4) Was there any p-hacking (checking results daily and stopping early)? The right answer is: statistically significant, but need to verify practical significance, check guardrails, and confirm the test ran for the pre-determined duration.
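To put numbers behind the sample-size caveat, here is a minimal two-proportion z-test in pure Python. The traffic counts are invented; they illustrate that a 3% relative lift can fail to reach significance at a moderate sample size, which is why "p = 0.04 after two weeks" deserves scrutiny:

```python
from math import sqrt, erf

def two_prop_ztest(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a difference in click-through rates."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)      # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_a, p_b, p_value

# hypothetical experiment: 5% baseline CTR, 3% relative lift, 100k per arm
p_a, p_b, p = two_prop_ztest(5000, 100_000, 5150, 100_000)
print(f"control CTR={p_a:.3%}  treatment CTR={p_b:.3%}  p={p:.3f}")
print(f"relative lift={(p_b - p_a) / p_a:.1%}")  # 3.0%, yet not significant here
```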
Explain the difference between Type I and Type II errors in the context of a data quality alert system.
Type I error (false positive): the alert fires when there is no real data quality issue. Your pipeline data is fine, but the monitor triggers a page at 3 AM. Type II error (false negative): there is a real data quality issue, but the alert does not fire. Bad data flows downstream undetected. For a data quality system, Type II errors are usually more costly because corrupted data reaches dashboards and ML models before anyone notices. When tuning alert thresholds, you trade off between the two: tighter thresholds reduce Type II errors but increase Type I (more false alarms). This framing shows interviewers you understand statistical concepts in an engineering context.
Design a feature engineering pipeline for a recommendation system that serves 50M users.
Start with requirements: what features does the model need, how fresh must they be, and what latency does serving require? Batch features (user demographics, historical preferences) can be computed daily in Spark and stored in a feature store. Near-real-time features (session clicks, last-5-minutes behavior) require a streaming pipeline (Kafka to Flink) writing to a low-latency store (Redis or DynamoDB). The feature store should support both batch training (historical features joined to training labels) and online serving (point lookup by user_id at inference time). Discuss versioning: when you add a new feature, the model needs retraining on the same feature definition. This tests your understanding of the ML infrastructure that data engineers build and maintain.
How would you detect and handle data drift in a production ML pipeline?
Data drift means the distribution of input features changes over time, which can degrade model performance. Detection: compute statistical summaries (mean, median, percentiles, distribution histograms) for each feature on each batch. Compare current batch distributions to a reference window using KS tests, PSI (population stability index), or simple threshold checks on mean/variance. Handling: alert the ML team when drift exceeds a threshold, trigger model retraining if drift is persistent, and log drift metrics for retrospective analysis. As a data engineer, your responsibility is building the monitoring pipeline that detects drift and the retraining infrastructure that responds to it. You are not necessarily choosing the model architecture, but you are building the system that keeps the model healthy.
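A minimal PSI computation to make the detection step concrete. The decile binning from the reference window, the bin count, and the 1e-6 floor for empty buckets are common but arbitrary choices, not a standard API:

```python
from math import log

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample and a
    current batch. Bins come from the reference distribution's quantiles."""
    exp_sorted = sorted(expected)
    # quantile cut points taken from the reference window
    cuts = [exp_sorted[int(len(exp_sorted) * i / bins)] for i in range(1, bins)]

    def bucket_fracs(values):
        counts = [0] * bins
        for v in values:
            idx = sum(v > c for c in cuts)       # which bucket v falls into
            counts[idx] += 1
        # small floor avoids log(0) for empty buckets
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bucket_fracs(expected), bucket_fracs(actual)
    return sum((ai - ei) * log(ai / ei) for ei, ai in zip(e, a))

ref = [i / 1000 for i in range(1000)]            # uniform reference window
same = [i / 1000 for i in range(1000)]           # identical distribution
shifted = [0.5 + i / 2000 for i in range(1000)]  # drifted upward

print(f"no drift: PSI={psi(ref, same):.4f}")
print(f"drift:    PSI={psi(ref, shifted):.4f}")
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.2 as significant drift; in production you would compute this per feature per batch and alert on breaches.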
A data scientist tells you their A/B test results are being polluted by bot traffic. How would you build a pipeline to filter bots?
Layer multiple signals: (1) Known bot user-agents (Googlebot, Bingbot, etc.) are easy to filter in the ingestion layer. (2) Request rate limits: flag user_ids with more than N actions per minute as suspicious. (3) Session behavior: bots often have zero scroll depth, no mouse movements, and perfectly uniform time-between-clicks. (4) IP reputation: cross-reference IPs against known bot networks. Build a scoring pipeline that assigns a bot_probability to each session. Let the experimentation team choose their threshold (e.g., exclude sessions with bot_probability > 0.8). Store both raw and filtered datasets so the team can compare results with and without filtering. The key is that bot filtering is a data engineering problem that directly affects data science outcomes.
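A toy version of the scoring step; the signal names, weights, and the 0.8 threshold below are placeholders for illustration, not production values:

```python
def bot_score(session):
    """Combine weak signals into a 0-1 bot probability (illustrative weights)."""
    score = 0.0
    if session["user_agent"].lower() in {"googlebot", "bingbot"}:
        score += 0.9                      # known crawler user-agent
    if session["actions_per_min"] > 60:
        score += 0.4                      # superhuman request rate
    if session["scroll_depth"] == 0 and session["mouse_events"] == 0:
        score += 0.3                      # no human-like interaction
    if session["ip_on_blocklist"]:
        score += 0.4                      # known bot network
    return min(score, 1.0)

sessions = [
    {"user_agent": "Mozilla/5.0", "actions_per_min": 12,
     "scroll_depth": 0.6, "mouse_events": 40, "ip_on_blocklist": False},
    {"user_agent": "Googlebot", "actions_per_min": 200,
     "scroll_depth": 0, "mouse_events": 0, "ip_on_blocklist": False},
]
# the experimentation team picks the cutoff; keep raw data so they can compare
flagged = [s for s in sessions if bot_score(s) > 0.8]
print(len(flagged))
```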
How would you design the data pipeline for an experimentation platform that runs 200 concurrent A/B tests?
The pipeline needs to: (1) Assign users to variants deterministically (hash user_id + experiment_id to ensure consistent assignment across sessions). (2) Log exposure events (which user saw which variant, when). (3) Join exposure events to outcome metrics (clicks, purchases, engagement). (4) Compute per-experiment statistics (mean, variance, sample size per variant). For 200 concurrent tests, the join between exposures and outcomes is the bottleneck. Partition both tables by date and experiment_id. Pre-aggregate metrics daily to reduce join size. Store results in a materialized table that the experimentation UI reads. Discuss backfill: when a metric definition changes, you need to recompute results for running experiments. This tests pipeline design, data modeling, and understanding of how experimentation infrastructure works at scale.
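Step (1), deterministic assignment, can be sketched in a few lines. SHA-256 and the `experiment_id:user_id` key format are one common choice, not the only one; hashing the experiment ID into the key keeps buckets uncorrelated across the 200 concurrent tests:

```python
import hashlib

def assign_variant(user_id: str, experiment_id: str, n_variants: int = 2) -> int:
    """Deterministic bucket: the same user + experiment always maps to the
    same variant, regardless of session or serving host."""
    key = f"{experiment_id}:{user_id}".encode()
    digest = hashlib.sha256(key).hexdigest()
    return int(digest, 16) % n_variants

# same inputs -> same bucket every time, with no assignment table to store
assert assign_variant("u123", "exp_checkout") == assign_variant("u123", "exp_checkout")

buckets = [assign_variant(f"u{i}", "exp_checkout") for i in range(10_000)]
print(sum(buckets) / len(buckets))  # close to 0.5 for a 50/50 split
```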
Define retention rate, and write SQL to calculate 7-day retention for a mobile app.
Retention rate: the percentage of users who return to the app N days after their first visit. For 7-day retention: identify each user's first activity date. Check if they had any activity on exactly day 7 (or within a 7-day window, depending on definition). SQL: WITH first_visit AS (SELECT user_id, MIN(activity_date) AS first_date FROM activity GROUP BY user_id) SELECT first_date, COUNT(DISTINCT CASE WHEN a.activity_date = f.first_date + 7 THEN f.user_id END) * 100.0 / COUNT(DISTINCT f.user_id) AS retention_7d FROM first_visit f LEFT JOIN activity a ON f.user_id = a.user_id AND a.activity_date = f.first_date + 7 GROUP BY first_date. Discuss the definition ambiguity: strict (exactly day 7) vs window (days 7 to 13). Different companies use different definitions, and clarifying this upfront shows attention to detail.
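The same query, runnable in SQLite with toy data. Note the dialect difference: SQLite needs date(first_date, '+7 day') for the day arithmetic, where Postgres would accept first_date + 7 on a DATE column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity (user_id INTEGER, activity_date TEXT);
INSERT INTO activity VALUES
  (1,'2024-01-01'),(1,'2024-01-08'),   -- returned on day 7: retained
  (2,'2024-01-01'),                    -- never returned
  (3,'2024-01-01'),(3,'2024-01-05');   -- returned, but not on day 7
""")

rows = conn.execute("""
WITH first_visit AS (
  SELECT user_id, MIN(activity_date) AS first_date
  FROM activity GROUP BY user_id
)
SELECT f.first_date,
       COUNT(DISTINCT a.user_id) * 100.0
         / COUNT(DISTINCT f.user_id) AS retention_7d
FROM first_visit f
LEFT JOIN activity a
  ON a.user_id = f.user_id
 AND a.activity_date = date(f.first_date, '+7 day')  -- strict day-7 definition
GROUP BY f.first_date
""").fetchall()
print(rows)   # one cohort, 1 of 3 users retained
```

Switching to the windowed definition (days 7 to 13) only changes the join predicate to a BETWEEN on the two boundary dates.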
A dashboard shows that average order value (AOV) increased 15% this month, but total revenue is flat. What happened?
If AOV went up but total revenue stayed flat, the number of orders must have decreased. Revenue = AOV * order_count. If AOV increases by 15% and revenue is constant, order count dropped by approximately 13%. Investigate: (1) Did a segment of low-value customers stop ordering, which would mechanically raise AOV while reducing volume? (2) Was there a price increase that drove away price-sensitive customers? (3) Did a promotion end that was generating high-volume, low-value orders? (4) Check for data pipeline issues: are all order sources still reporting? A broken ingestion pipeline for one order channel could reduce order count without the team realizing. This question tests business intuition and the habit of questioning data rather than accepting it at face value.
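The arithmetic behind that roughly 13% figure, with made-up revenue numbers:

```python
# revenue = aov * orders; hold revenue fixed while AOV rises 15%
revenue = 1_000_000
aov_before = 50.0
orders_before = revenue / aov_before

aov_after = aov_before * 1.15          # +15% AOV
orders_after = revenue / aov_after     # revenue unchanged

drop = 1 - orders_after / orders_before   # 1 - 1/1.15
print(f"order count drop: {drop:.1%}")    # ~13.0%
```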
Three practical reasons, not theoretical hand-waving.
At mid-size companies, the 'data engineer' role often includes experimentation analysis, metric computation, and dashboard building. These tasks overlap heavily with data science. If your interview includes a 'business case' or 'metric analysis' round, you are being tested on DS skills even if the title says data engineer. Being prepared for this overlap gives you an advantage.
Data engineers who understand statistics, ML pipelines, and business metrics earn trust from data scientists and product managers faster. This trust translates to more interesting projects, more influence on technical decisions, and faster promotions. You do not need to be an expert statistician, but understanding the basics makes you a better collaborator.
If you can answer both DE and DS interview questions, your target job pool expands to include analytics engineer, ML engineer, and data platform engineer roles. These roles often pay comparably to or more than traditional DE roles and offer different growth paths. Breadth in your interview preparation translates directly to more opportunities.
The questions don't change between DS and DE. Your answers have to. Train on both framings in one place.
Start Practicing