Career Guide
Two roles that both work with data but demand very different skills. Understanding the difference helps you choose your path and prepare for the right interviews.
Building and maintaining the data infrastructure. Pipelines, warehouses, data quality, and reliability.
Analyzing data to answer business questions. Reports, dashboards, ad-hoc analysis, and stakeholder communication.
Advanced SQL: complex joins, window functions, CTEs, query optimization, data modeling. Writes production SQL.
Intermediate to advanced SQL: aggregations, joins, subqueries. Writes analytical SQL for reports and dashboards.
Pipeline scripting, API integrations, testing, orchestration. Libraries: requests, boto3, pytest, pyspark.
Data manipulation and visualization. Libraries: pandas, matplotlib, seaborn, jupyter notebooks.
Airflow/Dagster, Spark, dbt, Docker, cloud services, version control, CI/CD.
BI tools (Tableau, Looker, Power BI), Excel/Sheets, SQL editors, notebooks.
$120K to $200K+ total comp. Top tech: $200K to $350K+.
$70K to $130K+ total comp. Top tech: $130K to $200K+.
Senior DE, Staff DE, Principal DE, Data Architecture, Engineering Management.
Senior Analyst, Analytics Manager, Analytics Engineering, Data Science, Product Management.
SQL (hard), Python coding, data modeling, system design, behavioral. 3 to 5 rounds.
SQL (medium), case studies, dashboard design, statistical reasoning, communication. 2 to 4 rounds.
Job descriptions list skills. The day-to-day reality tells you which role fits your working style.
A data engineer spends most of the day writing and maintaining code that moves data between systems. Monday might start with investigating a failed Airflow DAG that broke overnight because an upstream API changed its response format. That means reading logs, identifying the schema mismatch, updating the ingestion script, writing a test for the new format, and rerunning the backfill. After lunch, the engineer might review a pull request from a teammate adding a new source table to the warehouse, checking for proper error handling, idempotency, and partitioning strategy. Later in the week, there could be a data modeling session where the team designs a new fact table for a product feature. The engineer writes the DDL, sets up the dbt model, validates row counts against the source, and adds data quality checks. Meetings tend to be standups, design reviews, and incident retrospectives. Most communication is async through code reviews and Slack threads.
A data analyst spends most of the day answering questions from the business. Monday might start with a request from the marketing VP asking why conversion rates dropped last week. The analyst writes SQL queries against the warehouse, segments by channel and device, pulls the results into a notebook or BI tool, and builds a chart showing the trend. The finding might be that mobile conversions dropped after a checkout page redesign. The analyst writes up the findings in a Slack post or a brief document, suggests investigating the mobile UX, and presents the data in a meeting. Later in the week, the analyst might build a new dashboard tracking a product launch, define the key metrics with the product manager, write the underlying queries, and choose the right chart types to make the data readable. Meetings are frequent: syncs with stakeholders, metric review sessions, and planning discussions where the analyst represents the data perspective.
Both roles use SQL, Python, and data modeling. The difference is not the skill itself but the depth, context, and how interviews test it.
Data engineers write production SQL that runs on a schedule, processes millions of rows, and must not fail silently. This means window functions for deduplication, recursive CTEs for hierarchy traversal, MERGE statements for upserts, and careful attention to query performance. An engineer who writes a GROUP BY that triggers a full table scan on a 2TB table will hear about it from the cloud bill. Interview SQL questions for engineers test edge cases: NULLs in joins, ties in ranking, handling late-arriving data, and writing queries that are correct under concurrent writes.
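The deduplication pattern mentioned above can be sketched with ROW_NUMBER(). This is a minimal, hypothetical example run against an in-memory SQLite database (the `events` table, its columns, and the sample rows are invented for illustration); the same window-function idiom carries over to warehouse dialects like BigQuery or Snowflake.

```python
import sqlite3

# Hypothetical events table with a duplicate row; keep the latest row per event_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (event_id INT, payload TEXT, loaded_at TEXT);
INSERT INTO events VALUES
  (1, 'old', '2024-01-01'),
  (1, 'new', '2024-01-02'),
  (2, 'only', '2024-01-01');
""")

dedup_sql = """
SELECT event_id, payload
FROM (
  SELECT event_id, payload,
         ROW_NUMBER() OVER (
           PARTITION BY event_id       -- one group per logical record
           ORDER BY loaded_at DESC     -- newest row gets rn = 1
         ) AS rn
  FROM events
)
WHERE rn = 1
ORDER BY event_id;
"""
rows = conn.execute(dedup_sql).fetchall()
print(rows)  # [(1, 'new'), (2, 'only')]
```

Note the interview edge case hiding here: ROW_NUMBER() breaks ties arbitrarily, so if two rows share the same `loaded_at` you need a second ORDER BY column to make the result deterministic.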
Data analysts write analytical SQL that answers a specific business question. The focus is on getting the right answer, not on performance optimization. Analysts use GROUP BY, CASE WHEN, date functions, and joins regularly. A strong analyst can write a cohort analysis query, calculate retention rates, and build time-series comparisons. Interview SQL for analysts is less about edge cases and more about translating a business question into the correct query. Can you figure out which metric to compute and write the SQL to get it?
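A retention calculation of the kind described above might look like the following sketch. The `logins` table and its data are hypothetical; the point is the shape of the query, which divides each week's distinct active users by the week-0 cohort size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logins (user_id INT, week INT);
INSERT INTO logins VALUES (1,0),(2,0),(3,0),(1,1),(3,1),(1,2);
""")

retention_sql = """
SELECT week,
       COUNT(DISTINCT user_id) * 1.0 /
       (SELECT COUNT(DISTINCT user_id) FROM logins WHERE week = 0) AS retention
FROM logins
GROUP BY week
ORDER BY week;
"""
rows = conn.execute(retention_sql).fetchall()
print(rows)  # week 0 is 1.0 by construction; later weeks decay
```

The `* 1.0` is the classic trap: without it, integer division silently returns 0 in many SQL dialects.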
Engineers use Python as a systems language. They write pipeline scripts that call APIs, parse JSON, transform data, and load it into warehouses. They write pytest test suites, build CLI tools, and work with libraries like boto3 (AWS), pyspark (distributed processing), and SQLAlchemy (database connections). Code quality matters: type hints, error handling, logging, and documentation are expected. Interview Python for engineers tests data structure manipulation, file processing, and building small ETL programs.
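A small taste of what "code quality matters" means in practice: a hypothetical ingestion helper with type hints, logging, and explicit error handling for malformed rows. The function name, record shape, and `user_id` field are all invented for illustration.

```python
import json
import logging
from typing import Any

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def parse_records(raw_lines: list[str]) -> list[dict[str, Any]]:
    """Parse newline-delimited JSON, skipping and logging bad rows
    instead of crashing the whole batch."""
    records: list[dict[str, Any]] = []
    for i, line in enumerate(raw_lines):
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            log.warning("skipping malformed row %d", i)
            continue
        if "user_id" not in rec:  # minimal schema check
            log.warning("skipping row %d: missing user_id", i)
            continue
        records.append(rec)
    return records

raw = ['{"user_id": 1, "event": "click"}', 'not json', '{"event": "view"}']
clean = parse_records(raw)
print(clean)  # [{'user_id': 1, 'event': 'click'}]
```

Whether bad rows should be skipped, quarantined to a dead-letter table, or fail the job is exactly the kind of trade-off an interviewer will probe.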
Analysts use Python as an analysis language. Pandas is the primary tool for data manipulation, along with matplotlib or seaborn for visualization. Jupyter notebooks are the standard environment. The code is exploratory and often disposable: load a CSV, filter rows, compute aggregations, plot a chart. Interview Python for analysts, when asked at all, tests pandas operations: groupby, merge, pivot, and basic data cleaning.
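The bread-and-butter pandas operation referenced above, sketched on an invented `orders` DataFrame; this is the level of fluency an analyst screen typically checks.

```python
import pandas as pd

# Hypothetical order data for illustration.
orders = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "region":  ["us", "us", "eu", "eu"],
    "amount":  [10.0, 20.0, 5.0, 15.0],
})

# Revenue per region: the canonical groupby-aggregate pattern.
revenue = orders.groupby("region", as_index=False)["amount"].sum()
print(revenue)
```

By default `groupby` sorts the group keys, so the result lists `eu` before `us`.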
Engineers design the physical data models that live in the warehouse. They decide table structures, choose between star schemas and normalized designs, set partition keys, define slowly changing dimensions, and write the DDL. They think about storage cost, query performance, and how the model will evolve as requirements change. Interview questions test whether you can design a schema for a given business domain from scratch.
Analysts consume data models. They need to understand star schemas well enough to join fact and dimension tables correctly. They need to know what a surrogate key is, why some tables are denormalized, and how to work with date dimensions. Interview questions test whether you can read an existing schema and write correct queries against it.
Engineers need enough business context to build the right pipeline. If the finance team needs revenue data by midnight, the engineer must understand the SLA and design the pipeline to meet it. But the engineer rarely presents findings to executives or defines business metrics. The business context is a constraint on the system, not the primary output.
Analysts live in the business context. They define metrics, challenge metric definitions when they are ambiguous, and translate vague questions into precise analyses. A good analyst pushes back when a stakeholder asks 'how many users do we have?' by asking 'do you mean registered, active, or paying?' The business context is the primary output.
Window functions, recursive CTEs, query optimization, complex multi-table joins. Production-quality SQL under time pressure.
Aggregations, basic joins, CASE WHEN, date filtering. Focus on the right business answer, not performance.
Design a data pipeline. Choose batch vs streaming. Handle late-arriving data. Common DE interview round.
Not typically asked. Some senior roles ask about dashboard architecture or metric definitions.
Less emphasis. Some behavioral questions about stakeholder work, but primarily technical.
High emphasis. Case studies test translating business questions into analysis and communicating findings.
The interview loop itself is structured differently. Knowing what to expect helps you allocate prep time.
Recruiter Screen
15 to 30 minutes. Background, motivation, salary expectations. Light technical screening: 'What is a data warehouse?' or 'Describe a pipeline you built.'
Technical Screen (SQL)
45 to 60 minutes. Live coding on a shared editor. Expect 2 to 3 SQL problems at medium to hard difficulty. Window functions, self-joins, and CTEs are common. Some companies use HackerRank or CodeSignal for this round.
Python / Coding Round
45 to 60 minutes. Write a small ETL script, parse a nested JSON file, or implement a data processing function. Tested on correctness, error handling, and code structure. Not LeetCode-style algorithms at most companies.
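The "parse a nested JSON file" task mentioned above often reduces to flattening nested objects into dotted keys. A hedged sketch of one common approach (the `flatten` helper and sample payload are invented for illustration):

```python
import json

def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys: {'a': {'b': 1}} -> {'a.b': 1}."""
    out = {}
    for key, value in obj.items():
        full = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, full + "."))  # recurse into nested objects
        else:
            out[full] = value
    return out

raw = '{"user": {"id": 7, "geo": {"country": "DE"}}, "event": "click"}'
flat = flatten(json.loads(raw))
print(flat)  # {'user.id': 7, 'user.geo.country': 'DE', 'event': 'click'}
```

Interviewers then usually push on the edge cases: lists inside the JSON, key collisions, and very deep nesting.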
Data Modeling / System Design
45 to 60 minutes. Whiteboard or virtual whiteboard. Design a schema for an e-commerce platform. Design a pipeline to ingest clickstream data. Choose batch vs streaming for a given use case. This round tests trade-off thinking.
Behavioral / Culture Fit
30 to 45 minutes. Past projects, conflict resolution, stakeholder management. STAR format answers. 'Tell me about a time a pipeline broke in production.'
Recruiter Screen
15 to 30 minutes. Similar to DE. Background, motivation, portfolio review. Some companies ask for a sample dashboard or analysis.
SQL Assessment
30 to 45 minutes. 2 to 3 SQL problems at easy to medium difficulty. Aggregations, joins, CASE WHEN, date ranges. The focus is on getting the correct business answer rather than query optimization.
Case Study / Take-Home
60 to 90 minutes (or take-home over a few days). Given a dataset and a business question. Analyze the data, build visualizations, write up findings, and present recommendations. Tests analytical thinking and communication.
Stakeholder Presentation
30 to 45 minutes. Present your case study results to a panel. The panel tests whether you can explain data findings to non-technical people. Clarity, structure, and the ability to handle follow-up questions matter.
Behavioral
30 to 45 minutes. Emphasis on collaboration, stakeholder management, and handling ambiguity. 'How do you handle conflicting metric definitions from two teams?'
Total compensation including base, bonus, and equity. Ranges reflect the middle 50% across industries. Top tech companies pay at or above the high end.
First pipeline role. Often titled 'Data Engineer I' or 'Junior DE.' Building pipelines under supervision.
First analyst role. Often titled 'Business Analyst' or 'Data Analyst I.' Writing SQL and building reports.
Owns pipeline systems end-to-end. Designs data models. Reviews others' code. 'Data Engineer II' or 'Senior DE' at smaller companies.
Runs analysis independently. Defines metrics. Manages stakeholder relationships. 'Data Analyst II' or 'Senior Analyst' at smaller companies.
Leads technical decisions. Defines architecture for data platform. Mentors junior engineers. May manage a small team.
Leads analytics for a business unit. Influences product strategy with data. May manage junior analysts.
Sets technical direction across the organization. Evaluates vendor tools. Defines data strategy. Rare title, mostly at large tech companies.
Head of Analytics or Analytics Director. Sets the analytics strategy and framework for the org. Blurs into management.
Not every analyst should become an engineer. But if several of these signals resonate, the switch is worth exploring.
When you find yourself writing queries to detect duplicates, reconcile source discrepancies, or patch missing values before you can do any analysis, you are already doing data engineering work. If this is the part you enjoy, the transition makes sense.
Scheduled Jupyter notebooks that pull data, transform it, and dump it into a sheet are pipelines with no guardrails. If you are already building these, you will benefit from learning proper orchestration tools like Airflow or Dagster that give you retries, alerts, and dependency management.
Analysts often get asked the same question with slight variations. If your instinct is to build a self-service dashboard or a clean data model so the question answers itself, that is engineering thinking.
Analysts present findings. Engineers build systems. If the stakeholder meeting is the part you dread and the SQL optimization is the part you enjoy, that is a signal.
When SQL cannot do what you need and you start writing Python scripts to process data, you are moving into engineering territory. If learning Python energizes you rather than frustrates you, follow that signal.
If a broken pipeline at 2 AM sounds like an interesting problem to solve rather than someone else's problem, you have the temperament for engineering. Analysts expect the data to be there. Engineers make it so.
The compensation gap between senior analysts and senior engineers is significant, especially at large tech companies. This alone is not enough reason to switch, but combined with genuine interest in engineering work, it validates the move.
These questions come up when interviewers want to test whether you understand the data ecosystem beyond your own role.
Engineers build the plumbing. Analysts use the water. Engineers make data reliable, fast, and available. Analysts take that data and turn it into business decisions. You need both: without engineers, analysts have no clean data. Without analysts, engineers build systems nobody uses.
At startups, yes. The 'analytics engineer' role explicitly combines elements of both: writing dbt models, defining metrics, building data pipelines, and serving stakeholders. At scale, the roles diverge because each requires deep specialization.
Window functions and query optimization. Both roles can write GROUP BY and joins. But an engineer is expected to write ROW_NUMBER() for deduplication, LAG/LEAD for change detection, and explain why a query plan shows a full table scan. Analysts rarely need to think about execution plans.
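The LAG-for-change-detection pattern named above can be sketched as follows, again against an in-memory SQLite database with an invented `plan_history` table. The query returns only the rows where a user's plan differs from the previous row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plan_history (user_id INT, day TEXT, plan TEXT);
INSERT INTO plan_history VALUES
  (1, '2024-01-01', 'free'),
  (1, '2024-01-02', 'free'),
  (1, '2024-01-03', 'pro');
""")

sql = """
SELECT day, plan
FROM (
  SELECT day, plan,
         LAG(plan) OVER (PARTITION BY user_id ORDER BY day) AS prev_plan
  FROM plan_history
)
WHERE plan IS NOT prev_plan  -- NULL-safe inequality: keeps the first row too
ORDER BY day;
"""
rows = conn.execute(sql).fetchall()
print(rows)  # [('2024-01-01', 'free'), ('2024-01-03', 'pro')]
```

The NULL handling is the interview trap: `plan <> prev_plan` would silently drop the first row per user, because LAG returns NULL there and `<>` against NULL is not true.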
Python scripting (not pandas, but scripts that call APIs and handle errors), then data modeling (star schemas, SCDs, normalization), then one orchestration tool (Airflow or Dagster). Your SQL is likely strong enough already. The gap is infrastructure, not analysis.
DE behavioral questions focus on technical incidents: pipeline failures, data quality issues, production outages. DA behavioral questions focus on stakeholder management: presenting bad news, handling conflicting priorities, influencing decisions with data. Both test collaboration, but the scenarios are different.
Basic statistics, yes. Enough to understand distributions, percentiles, and sampling. But you will not be asked to run regressions or design A/B tests. That is the data scientist and analyst domain. Engineers need to know enough to validate that data looks reasonable, not to draw statistical conclusions.
Data analyst roles have more total openings because every company with data needs analysts. Data engineer roles have fewer openings but higher demand per opening, meaning less competition for qualified candidates. The supply of people who can build production pipelines is smaller than the supply of people who can write analytical SQL.
Certifications are not required for either role, but they can strengthen a resume when you lack direct experience. Here are the ones that carry weight in 2026.
Covers S3, Glue, Redshift, Kinesis, and Lake Formation. Valued at companies using AWS for their data platform. Shows you can work with managed data services.
Covers BigQuery, Dataflow, Pub/Sub, and Cloud Composer. Respected across the industry, even outside Google Cloud shops. Tests system design thinking.
Tests dbt modeling, testing, and documentation. Relevant for analysts moving toward engineering or engineers working on transformation layers.
Proves proficiency with the most common BI tool. Less weight at companies using Looker or Power BI, but demonstrates visualization skills.
DataDriven covers the SQL, Python, and data modeling skills tested in data engineering interviews.