Data Analyst Interview Questions

You'll find DA interviews are 70% SQL, 20% business reasoning, and 10% spreadsheet or viz questions. Good news: if you're prepping for DE roles, you've already covered the SQL side. The part that catches most engineers off guard is the business framing. Interviewers want to hear you translate a vague product question into a precise query. This guide walks you through ten real DA prompts with the exact thought process you should use to answer them.

DA rounds that are SQL: 70%
Phone-screen SQL rounds: 341
GROUP BY frequency: 32%
Practice problems: 854

Source: DataDriven analysis of 1,042 verified data engineering interview rounds.

Where DA and DE Interviews Overlap

SQL is the biggest overlap. Business metrics knowledge helps you design better data models.

SQL (Core Overlap)

SQL is the backbone of both data analyst and data engineering interviews. The difference is mostly in depth. DA interviews test aggregation, joins, window functions, and CTEs at an intermediate level. DE interviews add optimization, schema design, and advanced window frames. If you can pass a DE SQL interview, you can pass a DA SQL interview with no additional preparation. The reverse is not always true because DE questions push further into edge cases and performance.

Business Metrics and KPIs

DA interviews heavily test your ability to define, compute, and interpret business metrics: DAU/MAU, retention, churn, conversion funnels, average order value, and customer lifetime value. DE interviews test this less directly, but it appears in data modeling rounds. When a DE interviewer asks you to design a schema for an e-commerce analytics platform, they are implicitly testing whether you know which metrics the schema needs to support. Understanding business metrics makes your data models more practical.

Python for Data Manipulation

DA interviews test pandas, numpy, and basic data manipulation in Python. DE interviews test Python more broadly: file I/O, error handling, API integration, and scripting. The overlap is in data transformation logic. Both roles need to filter, aggregate, reshape, and join datasets in Python. If you are preparing for DE interviews, your Python prep covers the DA Python requirements plus more.

Data Visualization

DA interviews often include a round where you interpret or create visualizations: choose the right chart type, identify misleading axes, and communicate insights to non-technical stakeholders. DE interviews almost never test visualization directly. But understanding what makes a good dashboard helps you build better data pipelines, because you understand what the downstream consumer needs.

Stakeholder Communication

DA interviews heavily test your ability to explain technical findings to non-technical people. DE interviews test this in behavioral rounds (describe a time you explained a complex system to a non-technical stakeholder). The skill is identical; the context differs. DAs explain analysis results. DEs explain pipeline design decisions, outage root causes, and data quality trade-offs.

10 Questions with Approaches

Read each question twice before you look at the approach. The first read should be fast: what are they testing? The second read looks for the trap. Most DA questions have one, usually a NULL edge case or a definition of "active user" that you're supposed to ask about. The approaches below walk you through both reads.

Q1 (SQL)

Write a query to find the top 3 products by revenue in each category for the last 30 days.

Approach

Join orders to products to get category and revenue per product. Filter to the last 30 days with WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'. Aggregate with SUM(revenue) grouped by category and product_id. Apply ROW_NUMBER() OVER (PARTITION BY category ORDER BY total_revenue DESC), then filter to row_number <= 3. The question tests window functions (ROW_NUMBER vs RANK vs DENSE_RANK) and whether you handle ties explicitly. Ask the interviewer: if two products are tied for third place, should both appear? The answer determines ROW_NUMBER vs DENSE_RANK.
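The pattern above can be sketched end to end with Python's stdlib sqlite3 module (which supports window functions in SQLite 3.25+). The table and column names, the tiny dataset, and the fixed reference date are illustrative assumptions, not a real schema.

```python
import sqlite3

# Hypothetical orders table; names and data are assumptions for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (product_id TEXT, category TEXT, revenue REAL, order_date TEXT);
INSERT INTO orders VALUES
  ('p1','books',100,'2024-06-01'), ('p2','books',80,'2024-06-02'),
  ('p3','books',60,'2024-06-03'), ('p4','books',40,'2024-06-04'),
  ('p5','toys',200,'2024-06-01'), ('p6','toys',150,'2024-06-02');
""")
rows = conn.execute("""
WITH ranked AS (
  SELECT category, product_id,
         SUM(revenue) AS total_revenue,
         ROW_NUMBER() OVER (
           PARTITION BY category ORDER BY SUM(revenue) DESC
         ) AS rn
  FROM orders
  -- Fixed date instead of CURRENT_DATE so the demo is reproducible.
  WHERE order_date >= DATE('2024-06-04', '-30 days')
  GROUP BY category, product_id
)
SELECT category, product_id, total_revenue
FROM ranked
WHERE rn <= 3
""").fetchall()
print(rows)
```

Swapping ROW_NUMBER for DENSE_RANK in the same query is the one-line change that makes tied products share a rank.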

Q2 (SQL)

Calculate month-over-month growth rate for total revenue.

Approach

Aggregate revenue by month: SUM(revenue) as monthly_revenue, DATE_TRUNC('month', order_date) as month. Use LAG(monthly_revenue) OVER (ORDER BY month) to get the previous month's revenue. Growth rate = (current - previous) / previous * 100. Handle the first month (where LAG returns NULL) with COALESCE or a CASE statement. This tests window functions, date truncation, and division-by-zero awareness. Mention that negative growth rates are valid and should not be filtered out.
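A minimal sketch of the LAG pattern, again via sqlite3. Note one portability assumption: DATE_TRUNC is Postgres; SQLite approximates it with strftime('%Y-%m', ...). The table and values are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_date TEXT, revenue REAL);
INSERT INTO orders VALUES
  ('2024-01-10', 100), ('2024-01-20', 100),
  ('2024-02-05', 300), ('2024-03-15', 150);
""")
rows = conn.execute("""
WITH monthly AS (
  SELECT strftime('%Y-%m', order_date) AS month,  -- SQLite stand-in for DATE_TRUNC
         SUM(revenue) AS monthly_revenue
  FROM orders
  GROUP BY 1
)
SELECT month, monthly_revenue,
       ROUND(100.0 * (monthly_revenue - LAG(monthly_revenue) OVER (ORDER BY month))
             / LAG(monthly_revenue) OVER (ORDER BY month), 1) AS growth_pct
FROM monthly
ORDER BY month
""").fetchall()
print(rows)  # the first month's growth_pct is NULL (None), by design
```

The NULL in the first row is exactly the LAG edge case the approach mentions; COALESCE it only if the interviewer wants a default.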

Q3 (SQL)

Find users who made a purchase within 7 days of signing up.

Approach

Join users to orders on user_id. Filter where order_date BETWEEN signup_date AND signup_date + INTERVAL '7 days'. Use DISTINCT user_id to avoid counting the same user multiple times. The subtle part: does 'within 7 days' include day 0 (signup day)? Clarify with the interviewer. Also discuss whether to use the first purchase only or any purchase within the window. These clarifying questions show analytical rigor.
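A runnable sketch of the join-and-window-filter, with the day-0 inclusion choice made explicit. Tables, names, and dates are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id TEXT, signup_date TEXT);
CREATE TABLE orders (user_id TEXT, order_date TEXT);
INSERT INTO users VALUES ('u1','2024-01-01'), ('u2','2024-01-01'), ('u3','2024-01-01');
INSERT INTO orders VALUES
  ('u1','2024-01-01'),                        -- day 0: counts, window is inclusive
  ('u2','2024-01-10'),                        -- day 9: outside the window
  ('u3','2024-01-05'), ('u3','2024-01-06');   -- two purchases, counted once
""")
rows = conn.execute("""
SELECT DISTINCT u.user_id
FROM users u
JOIN orders o ON o.user_id = u.user_id
WHERE o.order_date BETWEEN u.signup_date
                       AND DATE(u.signup_date, '+7 days')
ORDER BY u.user_id
""").fetchall()
print(rows)
```

BETWEEN is inclusive on both ends, so this version counts the signup day as day 0; that is precisely the definition to confirm with the interviewer.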

Q4 (Business Metrics)

Define customer lifetime value (LTV) and explain how you would calculate it from a transactions table.

Approach

LTV = average revenue per customer per time period * average customer lifespan. Simple version: SUM(revenue) / COUNT(DISTINCT user_id) gives revenue per customer. More sophisticated: segment by acquisition cohort, calculate revenue per cohort over time, and fit a curve to project future revenue. For the SQL portion, calculate it as total revenue divided by unique customers, segmented by signup month. Mention that LTV is most useful when compared to customer acquisition cost (CAC). The LTV/CAC ratio tells you whether your growth is sustainable. A ratio below 3:1 is a warning sign.
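The simple version, segmented by signup month, can be computed in a few lines of pandas. The column names and the tiny transactions frame are illustrative assumptions.

```python
import pandas as pd

# Hypothetical transactions with a precomputed signup-month cohort column.
tx = pd.DataFrame({
    "user_id":      ["u1", "u1", "u2", "u3"],
    "signup_month": ["2024-01", "2024-01", "2024-01", "2024-02"],
    "revenue":      [50.0, 30.0, 20.0, 40.0],
})

# Simple LTV per cohort: total revenue / distinct customers.
ltv = (tx.groupby("signup_month")
         .agg(total_revenue=("revenue", "sum"),
              customers=("user_id", "nunique")))
ltv["ltv"] = ltv["total_revenue"] / ltv["customers"]
print(ltv)
```

Dividing each cohort's LTV by its acquisition cost gives the LTV/CAC ratio the approach mentions.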

Q5 (Business Metrics)

Your dashboard shows a 20% drop in daily active users. Walk through how you would investigate.

Approach

Structured investigation, not guessing. Step 1: Verify the data. Is the pipeline healthy? Check for missing data, delayed ingestion, or schema changes. A 20% DAU drop could be a data pipeline bug, not a real user behavior change. Step 2: Segment the drop. Is it all users or specific segments (mobile vs desktop, new vs returning, geographic regions)? Step 3: Check for external events: holidays, competitor launches, pricing changes, app store removal. Step 4: Correlate with product changes: was there a deploy, a feature flag change, or an A/B test that affected the login flow? Step 5: Check adjacent metrics: did sessions drop or did engagement per session drop? This framework shows you do not jump to conclusions.

Q6 (Python)

Given a CSV file with columns user_id, event_type, and timestamp, write Python code to find the most common sequence of 3 events per user.

Approach

Read the CSV with pandas. Sort by user_id and timestamp. Group by user_id. For each user, create rolling windows of 3 consecutive events using a list comprehension or rolling apply. Convert each triplet to a tuple. Count triplet frequencies across all users using collections.Counter. Return the top N most common sequences. The key is handling the per-user grouping correctly: events from different users should not form sequences together. Mention edge cases: users with fewer than 3 events (skip them), duplicate timestamps (sort by a secondary key), and event types that repeat (valid sequences like [click, click, click]).
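A sketch of that plan, assuming the frame is already loaded (swap in pd.read_csv for a real file). Column names match the prompt; the sample events are made up.

```python
from collections import Counter

import pandas as pd

def top_event_triplets(df: pd.DataFrame, n: int = 1):
    """Most common length-3 event sequences, built per user so that
    events from different users never form a sequence together."""
    df = df.sort_values(["user_id", "timestamp"])  # add a secondary key for duplicate timestamps
    counts = Counter()
    for _, events in df.groupby("user_id")["event_type"]:
        seq = events.tolist()
        # Users with fewer than 3 events yield no triplets and are skipped naturally.
        for i in range(len(seq) - 2):
            counts[tuple(seq[i:i + 3])] += 1
    return counts.most_common(n)

events = pd.DataFrame({
    "user_id":    ["a"] * 4 + ["b"] * 3 + ["c"] * 2,
    "event_type": ["view", "click", "buy", "view",
                   "view", "click", "buy",
                   "view", "click"],
    "timestamp":  range(9),
})
print(top_event_triplets(events))
```

User "c" has only two events and contributes nothing, which covers the fewer-than-3-events edge case from the approach.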

Q7 (Python)

Write a function that detects outliers in a numeric column using the IQR method.

Approach

Calculate Q1 (25th percentile) and Q3 (75th percentile). IQR = Q3 - Q1. Lower bound = Q1 - 1.5 * IQR. Upper bound = Q3 + 1.5 * IQR. Values outside these bounds are outliers. In pandas: q1 = df[col].quantile(0.25), q3 = df[col].quantile(0.75), iqr = q3 - q1. Mask = (df[col] < q1 - 1.5 * iqr) | (df[col] > q3 + 1.5 * iqr). Return df[mask]. Discuss alternatives: Z-score method (assumes normal distribution), modified Z-score (uses median absolute deviation), or domain-specific thresholds. The IQR method holds up well on non-normal distributions, which is why it is the standard starting point.
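Wrapped as a function, with the 1.5 multiplier exposed as a parameter since some teams use 3.0 for "extreme" outliers. The sample data is illustrative.

```python
import pandas as pd

def iqr_outliers(df: pd.DataFrame, col: str, k: float = 1.5) -> pd.DataFrame:
    """Return rows whose value in `col` falls outside Q1 - k*IQR .. Q3 + k*IQR."""
    q1 = df[col].quantile(0.25)
    q3 = df[col].quantile(0.75)
    iqr = q3 - q1
    mask = (df[col] < q1 - k * iqr) | (df[col] > q3 + k * iqr)
    return df[mask]

data = pd.DataFrame({"x": [10, 11, 12, 13, 14, 100]})
print(iqr_outliers(data, "x"))  # only the 100 row is flagged
```

Note that pandas' quantile uses linear interpolation by default; other tools may compute quartiles slightly differently, which is worth mentioning in an interview.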

Q8 (Visualization)

A stakeholder asks you to visualize the relationship between marketing spend and new user signups over 12 months. What chart do you choose and why?

Approach

A dual-axis line chart with months on the x-axis, marketing spend on the left y-axis, and new signups on the right y-axis. This shows correlation over time. Alternatively, a scatter plot with spend on x and signups on y, labeled by month, shows the direct relationship without the time dimension. The scatter plot is better for identifying a linear relationship. The dual-axis chart is better for showing trends and lag effects (spend in month N affecting signups in month N+1). Mention the risk of dual-axis charts: different scales can mislead. Always label axes clearly and consider normalizing to percentage change if the absolute scales are very different.

Q9 (Case Study)

A product manager wants to know which features drive user retention. How would you approach this analysis?

Approach

Define retention first (7-day, 30-day, or custom). Identify the feature set: which features have usage logs? Common features to analyze: onboarding completion, specific feature usage (search, notifications, social sharing), session frequency, and content consumption. For each feature, compare retention rates between users who used the feature vs those who did not. Control for confounders: power users who use every feature will have higher retention regardless. Use cohort analysis by signup week to control for time effects. If the PM wants causal claims (not just correlation), you need an experiment or a natural experiment. Be explicit about the difference between 'users who use feature X have higher retention' and 'feature X causes higher retention.'
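The core comparison in that plan reduces to a one-line groupby once the flags exist. Here `used_feature` and `retained_30d` are assumed to be precomputed boolean columns; the data is made up.

```python
import pandas as pd

# Hypothetical user-level frame with precomputed feature-usage and retention flags.
users = pd.DataFrame({
    "user_id":      [1, 2, 3, 4, 5, 6],
    "used_feature": [True, True, True, False, False, False],
    "retained_30d": [True, True, False, True, False, False],
})

# Retention rate for feature users vs non-users.
rates = users.groupby("used_feature")["retained_30d"].mean()
print(rates)  # a gap here is correlation only, not evidence of causation
```

A real analysis would compute this per cohort and per feature, and still only claims correlation without an experiment.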

Q10 (Case Study)

You notice that your company's conversion funnel has a 60% drop-off between 'add to cart' and 'checkout.' What would you investigate?

Approach

Segment the drop-off: (1) By device: mobile checkout flows often have higher friction. (2) By user type: new users vs returning users. (3) By product category: some categories may have pricing issues. (4) By geography: payment method availability varies by country. Check technical issues: are there JavaScript errors on the checkout page? Is the page load time acceptable (under 3 seconds)? Is the payment gateway failing for some users? Compare against industry benchmarks: e-commerce cart abandonment rates average 70%, so 60% drop-off may actually be below average. If the rate recently increased, check for recent changes: new checkout flow, price changes, removed payment options, or shipping cost surprises. Recommend instrumenting the checkout flow with step-by-step tracking to identify exactly where in the checkout process users leave.

Frequently Asked Questions

How different are data analyst and data engineer interviews?
The SQL portions are 80% identical. Both test aggregation, joins, window functions, and CTEs. The main differences: DA interviews include business case analysis, visualization interpretation, and stakeholder communication rounds. DE interviews include system design, pipeline architecture, and data modeling rounds. If you are a DE preparing for a DA-titled role at a company, focus extra time on business metrics and visualization.

Can a data analyst transition to data engineering?
Yes, and it is one of the most common career transitions in data. DAs already have SQL skills, which is the foundation. The gaps to fill: Python beyond pandas (scripting, error handling, API integration), cloud platform basics (AWS/Azure/GCP), orchestration tools (Airflow), and data modeling beyond simple queries (star schemas, normalization). Most DAs make this transition in 3 to 6 months of focused study.

Should I apply for data analyst jobs if I want to be a data engineer?
It can be a strategic stepping stone, especially early in your career. A DA role gives you production SQL experience, stakeholder communication practice, and domain knowledge. After 1 to 2 years, transitioning to DE is natural because you understand the business context that pipelines serve. Just make sure you are building Python and infrastructure skills on the side.

Do data analysts and data engineers earn similar salaries?
DEs typically earn 15 to 30% more than DAs at equivalent experience levels. Entry-level DA: $65K to $90K. Entry-level DE: $85K to $120K. Senior DA: $100K to $140K. Senior DE: $140K to $200K. The gap widens at senior levels because DE roles require deeper technical skills and are harder to fill. These are US numbers; adjust for location.

Your SQL Gets Sharp Here

Pick a problem, write the query, and see the result. You'll feel the difference in a week.

Start Practicing