You'll find DA interviews are 70% SQL, 20% business reasoning, and 10% spreadsheet or viz questions. Good news: if you're prepping for DE roles, you've already covered the SQL side. The part that catches most engineers off guard is the business framing. Interviewers want to hear you translate a vague product question into a precise query. This guide walks you through ten real DA prompts with the exact thought process you should use to answer them.
[Chart: share of DA rounds that are SQL, phone-screen SQL rounds, and GROUP BY frequency, with linked practice problems. Source: DataDriven analysis of 1,042 verified data engineering interview rounds.]
SQL is the biggest overlap. Business metrics knowledge helps you design better data models.
SQL is the backbone of both data analyst and data engineering interviews. The difference is mostly in depth. DA interviews test aggregation, joins, window functions, and CTEs at an intermediate level. DE interviews add optimization, schema design, and advanced window frames. If you can pass a DE SQL interview, you can pass a DA SQL interview with no additional preparation. The reverse is not always true because DE questions push further into edge cases and performance.
DA interviews heavily test your ability to define, compute, and interpret business metrics: DAU/MAU, retention, churn, conversion funnels, average order value, and customer lifetime value. DE interviews test this less directly, but it appears in data modeling rounds. When a DE interviewer asks you to design a schema for an e-commerce analytics platform, they are implicitly testing whether you know which metrics the schema needs to support. Understanding business metrics makes your data models more practical.
DA interviews test pandas, numpy, and basic data manipulation in Python. DE interviews test Python more broadly: file I/O, error handling, API integration, and scripting. The overlap is in data transformation logic. Both roles need to filter, aggregate, reshape, and join datasets in Python. If you are preparing for DE interviews, your Python prep covers the DA Python requirements plus more.
DA interviews often include a round where you interpret or create visualizations: choose the right chart type, identify misleading axes, and communicate insights to non-technical stakeholders. DE interviews almost never test visualization directly. But understanding what makes a good dashboard helps you build better data pipelines, because you understand what the downstream consumer needs.
DA interviews heavily test your ability to explain technical findings to non-technical people. DE interviews test this in behavioral rounds (describe a time you explained a complex system to a non-technical stakeholder). The skill is identical; the context differs. DAs explain analysis results. DEs explain pipeline design decisions, outage root causes, and data quality trade-offs.
Read each question twice before you look at the approach. The first read should be fast: what are they testing? The second read looks for the trap. Most DA questions have one, usually a NULL edge case or a definition of "active user" that you're supposed to ask about. The approaches below walk you through both reads.
Write a query to find the top 3 products by revenue in each category for the last 30 days.
Join orders to products to get category and revenue per product. Filter to the last 30 days using WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'. Aggregate with SUM(revenue) GROUP BY category, product_id. Apply ROW_NUMBER() OVER (PARTITION BY category ORDER BY total_revenue DESC). Filter to row_number <= 3. The interview tests window functions (ROW_NUMBER vs RANK vs DENSE_RANK) and whether you handle ties explicitly. Ask the interviewer: if two products are tied for third place, should both appear? That determines ROW_NUMBER vs DENSE_RANK.
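The approach above can be run end to end. This is a sketch against an in-memory SQLite database (window functions need SQLite 3.25+, bundled with modern Python); the orders/products schema and all rows are made up for illustration.

```python
import sqlite3

# Hypothetical schema: products(product_id, category),
# orders(product_id, order_date, revenue). Names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER, category TEXT);
CREATE TABLE orders (product_id INTEGER, order_date TEXT, revenue REAL);
INSERT INTO products VALUES (1,'toys'),(2,'toys'),(3,'toys'),(4,'toys'),(5,'books');
INSERT INTO orders VALUES
  (1, date('now','-5 days'), 100),
  (2, date('now','-5 days'), 300),
  (3, date('now','-5 days'), 200),
  (4, date('now','-5 days'), 50),
  (5, date('now','-40 days'), 999);  -- outside the 30-day window
""")

query = """
WITH product_revenue AS (
    SELECT p.category, o.product_id, SUM(o.revenue) AS total_revenue
    FROM orders o
    JOIN products p ON p.product_id = o.product_id
    WHERE o.order_date >= date('now', '-30 days')
    GROUP BY p.category, o.product_id
),
ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY category
                              ORDER BY total_revenue DESC) AS rn
    FROM product_revenue
)
SELECT category, product_id, total_revenue
FROM ranked
WHERE rn <= 3
ORDER BY category, rn;
"""
for row in conn.execute(query):
    print(row)
```

Swapping ROW_NUMBER() for DENSE_RANK() in the ranked CTE is the one-line change that lets ties share a place, which is exactly the clarifying question to ask.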
Calculate month-over-month growth rate for total revenue.
Aggregate revenue by month: SUM(revenue) as monthly_revenue, DATE_TRUNC('month', order_date) as month. Use LAG(monthly_revenue) OVER (ORDER BY month) to get the previous month's revenue. Growth rate = (current - previous) / previous * 100. Handle the first month (where LAG returns NULL) with COALESCE or a CASE statement. This tests window functions, date truncation, and division-by-zero awareness. Mention that negative growth rates are valid and should not be filtered out.
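As a runnable sketch of the same logic (SQLite has no DATE_TRUNC, so strftime('%Y-%m', ...) stands in for it; the table and figures are invented):

```python
import sqlite3

# Assumed schema: orders(order_date, revenue). All rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_date TEXT, revenue REAL);
INSERT INTO orders VALUES
  ('2024-01-05', 100), ('2024-01-20', 100),
  ('2024-02-10', 300),
  ('2024-03-15', 150);
""")

query = """
WITH monthly AS (
    SELECT strftime('%Y-%m', order_date) AS month,  -- SQLite stand-in for DATE_TRUNC
           SUM(revenue) AS monthly_revenue
    FROM orders
    GROUP BY month
)
SELECT month,
       monthly_revenue,
       ROUND(
         (monthly_revenue - LAG(monthly_revenue) OVER (ORDER BY month))
         * 100.0 / LAG(monthly_revenue) OVER (ORDER BY month), 1
       ) AS growth_pct   -- NULL for the first month, by design
FROM monthly
ORDER BY month;
"""
for row in conn.execute(query):
    print(row)
```

Note the March row comes out negative, which is a valid answer, not something to filter.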
Find users who made a purchase within 7 days of signing up.
Join users to orders on user_id. Filter where order_date BETWEEN signup_date AND signup_date + INTERVAL '7 days'. Use DISTINCT user_id to avoid counting the same user multiple times. The subtle part: does 'within 7 days' include day 0 (signup day)? Clarify with the interviewer. Also discuss whether to use the first purchase only or any purchase within the window. These clarifying questions show analytical rigor.
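A minimal sketch of this query, with toy data chosen to hit both edge cases (the schema and rows are assumptions):

```python
import sqlite3

# Assumed schema: users(user_id, signup_date), orders(user_id, order_date).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER, signup_date TEXT);
CREATE TABLE orders (user_id INTEGER, order_date TEXT);
INSERT INTO users VALUES (1,'2024-01-01'), (2,'2024-01-01'), (3,'2024-01-01');
INSERT INTO orders VALUES
  (1, '2024-01-01'),                      -- day 0: counts because BETWEEN is inclusive
  (2, '2024-01-09'),                      -- day 8: outside the window
  (3, '2024-01-05'), (3, '2024-01-06');   -- DISTINCT keeps user 3 once
""")

query = """
SELECT DISTINCT u.user_id
FROM users u
JOIN orders o ON o.user_id = u.user_id
WHERE o.order_date BETWEEN u.signup_date
                       AND date(u.signup_date, '+7 days')
ORDER BY u.user_id;
"""
print([r[0] for r in conn.execute(query)])
```

If the interviewer says day 0 should not count, the fix is changing BETWEEN to a strict lower bound, which is why the clarifying question matters.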
Define customer lifetime value (LTV) and explain how you would calculate it from a transactions table.
LTV = average revenue per customer per time period * average customer lifespan. Simple version: SUM(revenue) / COUNT(DISTINCT user_id) gives revenue per customer. More sophisticated: segment by acquisition cohort, calculate revenue per cohort over time, and fit a curve to project future revenue. For the SQL portion, calculate it as total revenue divided by unique customers, segmented by signup month. Mention that LTV is most useful when compared to customer acquisition cost (CAC). The LTV/CAC ratio tells you whether your growth is sustainable. A ratio below 3:1 is a warning sign.
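The simple SQL version, segmented by signup month, can be sketched like this (schema and numbers are invented for illustration):

```python
import sqlite3

# Assumed schema: users(user_id, signup_date), transactions(user_id, revenue).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER, signup_date TEXT);
CREATE TABLE transactions (user_id INTEGER, revenue REAL);
INSERT INTO users VALUES (1,'2024-01-10'), (2,'2024-01-20'), (3,'2024-02-05');
INSERT INTO transactions VALUES (1,100),(1,50),(2,250),(3,90);
""")

query = """
SELECT strftime('%Y-%m', u.signup_date) AS signup_month,
       SUM(t.revenue) / COUNT(DISTINCT u.user_id) AS ltv  -- revenue per customer
FROM users u
JOIN transactions t ON t.user_id = u.user_id
GROUP BY signup_month
ORDER BY signup_month;
"""
for row in conn.execute(query):
    print(row)
```

Comparing each cohort's LTV against CAC (the 3:1 rule of thumb from above) is the follow-up an interviewer will usually want to hear.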
Your dashboard shows a 20% drop in daily active users. Walk through how you would investigate.
Structured investigation, not guessing. Step 1: Verify the data. Is the pipeline healthy? Check for missing data, delayed ingestion, or schema changes. A 20% DAU drop could be a data pipeline bug, not a real user behavior change. Step 2: Segment the drop. Is it all users or specific segments (mobile vs desktop, new vs returning, geographic regions)? Step 3: Check for external events: holidays, competitor launches, pricing changes, app store removal. Step 4: Correlate with product changes: was there a deploy, a feature flag change, or an A/B test that affected the login flow? Step 5: Check adjacent metrics: did sessions drop or did engagement per session drop? This framework shows you do not jump to conclusions.
Given a CSV file with columns user_id, event_type, and timestamp, write Python code to find the most common sequence of 3 events per user.
Read the CSV with pandas. Sort by user_id and timestamp. Group by user_id. For each user, create rolling windows of 3 consecutive events using a list comprehension or rolling apply. Convert each triplet to a tuple. Count triplet frequencies across all users using collections.Counter. Return the top N most common sequences. The key is handling the per-user grouping correctly: events from different users should not form sequences together. Mention edge cases: users with fewer than 3 events (skip them), duplicate timestamps (sort by a secondary key), and event types that repeat (valid sequences like [click, click, click]).
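The same logic, sketched with only the standard library so it is self-contained (in the interview, a pandas sort plus groupby follows the identical shape; the CSV content below is made up):

```python
import csv
import io
from collections import Counter
from itertools import groupby

raw = """user_id,event_type,timestamp
u1,login,2024-01-01T09:00
u1,search,2024-01-01T09:01
u1,click,2024-01-01T09:02
u1,buy,2024-01-01T09:03
u2,login,2024-01-01T10:00
u2,search,2024-01-01T10:01
u2,click,2024-01-01T10:02
u3,login,2024-01-01T11:00
"""

# Sort by user then timestamp so groupby sees each user as one contiguous run.
rows = sorted(csv.DictReader(io.StringIO(raw)),
              key=lambda r: (r["user_id"], r["timestamp"]))

counts = Counter()
for _, events in groupby(rows, key=lambda r: r["user_id"]):
    types = [e["event_type"] for e in events]
    # Users with fewer than 3 events contribute no triplets (u3 is skipped),
    # and triplets never cross user boundaries.
    for i in range(len(types) - 2):
        counts[tuple(types[i:i + 3])] += 1

print(counts.most_common(1))
```

Grouping before windowing is the part interviewers check: a naive sliding window over the whole file would glue u2's first event onto u1's last two.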
Write a function that detects outliers in a numeric column using the IQR method.
Calculate Q1 (25th percentile) and Q3 (75th percentile). IQR = Q3 - Q1. Lower bound = Q1 - 1.5 * IQR. Upper bound = Q3 + 1.5 * IQR. Values outside these bounds are outliers. In pandas: q1 = df[col].quantile(0.25), q3 = df[col].quantile(0.75), iqr = q3 - q1, mask = (df[col] < q1 - 1.5 * iqr) | (df[col] > q3 + 1.5 * iqr), then return df[mask]. Discuss alternatives: Z-score method (assumes normal distribution), modified Z-score (uses median absolute deviation), or domain-specific thresholds. The IQR method holds up well on non-normal distributions, which is why it is the standard starting point.
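A standard-library sketch of the same function (statistics.quantiles with method='inclusive' uses the same linear interpolation as pandas' default quantile; the sample data is invented):

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

data = [10, 12, 11, 13, 12, 11, 14, 99]  # 99 is the planted outlier
print(iqr_outliers(data))  # [99]
```

The k parameter is the lever to mention: 1.5 is the conventional default, and 3.0 is sometimes used to flag only "extreme" outliers.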
A stakeholder asks you to visualize the relationship between marketing spend and new user signups over 12 months. What chart do you choose and why?
A dual-axis line chart with months on the x-axis, marketing spend on the left y-axis, and new signups on the right y-axis. This shows correlation over time. Alternatively, a scatter plot with spend on x and signups on y, labeled by month, shows the direct relationship without the time dimension. The scatter plot is better for identifying a linear relationship. The dual-axis chart is better for showing trends and lag effects (spend in month N affecting signups in month N+1). Mention the risk of dual-axis charts: different scales can mislead. Always label axes clearly and consider normalizing to percentage change if the absolute scales are very different.
A product manager wants to know which features drive user retention. How would you approach this analysis?
Define retention first (7-day, 30-day, or custom). Identify the feature set: which features have usage logs? Common features to analyze: onboarding completion, specific feature usage (search, notifications, social sharing), session frequency, and content consumption. For each feature, compare retention rates between users who used the feature vs those who did not. Control for confounders: power users who use every feature will have higher retention regardless. Use cohort analysis by signup week to control for time effects. If the PM wants causal claims (not just correlation), you need an experiment or a natural experiment. Be explicit about the difference between 'users who use feature X have higher retention' and 'feature X causes higher retention.'
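The naive feature-vs-retention comparison described above reduces to a tiny computation; this sketch uses made-up (used_feature, retained_at_day_30) pairs:

```python
# Hypothetical data: one tuple per user, (used_feature, retained_at_day_30).
users = [
    (True, True), (True, True), (True, False), (True, True),
    (False, True), (False, False), (False, False), (False, False),
]

def retention(rows):
    # Fraction of users in this group who were retained.
    return sum(retained for _, retained in rows) / len(rows)

feature_users = [u for u in users if u[0]]
non_users = [u for u in users if not u[0]]

print(retention(feature_users), retention(non_users))
# A gap between these two numbers is correlation only -- power-user bias
# and cohort effects must be controlled for before any causal claim.
```

This is exactly the point to make to the PM: the comparison identifies candidate features, and an experiment establishes which ones actually drive retention.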
You notice that your company's conversion funnel has a 60% drop-off between 'add to cart' and 'checkout.' What would you investigate?
Segment the drop-off: (1) By device: mobile checkout flows often have higher friction. (2) By user type: new users vs returning users. (3) By product category: some categories may have pricing issues. (4) By geography: payment method availability varies by country. Check technical issues: are there JavaScript errors on the checkout page? Is the page load time acceptable (under 3 seconds)? Is the payment gateway failing for some users? Compare against industry benchmarks: e-commerce cart abandonment rates average 70%, so 60% drop-off may actually be below average. If the rate recently increased, check for recent changes: new checkout flow, price changes, removed payment options, or shipping cost surprises. Recommend instrumenting the checkout flow with step-by-step tracking to identify exactly where in the checkout process users leave.
Pick a problem, write the query, and see the result. You'll feel the difference in a week.
Start Practicing