Microsoft Data Engineer Certification
DP-203 retired March 31, 2025. DP-700 is now the Microsoft DE cert, and it's a meaningfully different exam from the one your senior teammates took. This guide covers the retirement, what DP-700 actually tests, the F-SKU pricing trap, how it compares to AWS and Google, and an eight-week plan that maps to exam objectives.
What this guide actually says
DP-203 retired March 31, 2025. DP-700 is the current Microsoft DE cert. It tests Fabric — Lakehouse, Warehouse, Real-Time Intelligence — and Synapse-only knowledge no longer covers it. DP-203 holders should renew through the Fabric path before their window closes. DP-900 is the practical prereq for career switchers. F-SKU pricing is graded aggressively; memorize the F2 / F8 / F64 anchor points.
The Real-Time Intelligence section is where most candidates lose points. Eventstreams, Eventhouses, and KQL databases are net new material that DP-203 holders never saw. The exam gives you a scenario about high-cardinality clickstream or IoT telemetry and asks you to design the ingest path and the query layer in the same answer.
DP-203 retirement timeline
The dates senior interviewers reference when they read your resume.
| Date | Event | What it meant |
|---|---|---|
| Late 2024 | DP-700 announced | Microsoft signals Fabric as the strategic surface. Most candidates miss it. |
| Late 2024 | DP-700 beta | Beta testers report a meaningfully different shape: KQL and Eventstreams as new sections. |
| Late 2024 | DP-203 Learn path frozen | Updates stop. DP-203 enters maintenance mode. |
| January 2025 | DP-700 GA | Launches at the standard $165. DP-203 still active and booked heavily. |
| March 31, 2025 | DP-203 retires | Final exam date. New registrations end. Existing certs valid through their renewal cycle. |
| 2026 onward | DP-700 only | Microsoft DE associate track is single-cert. Renewals route through Fabric. |
DP-700 in detail: what's actually on the exam
Six workload areas. The boundaries between them are the source of every scenario question.
Fabric Lakehouse
Delta tables backed by OneLake. T-SQL endpoint for ad hoc reads, Spark notebooks for transformation. Shortcuts let you mount data from another workspace, ADLS Gen2, or S3 without copying bytes. The exam tests whether you understand a shortcut is a metadata pointer, not replication.
Fabric Warehouse
T-SQL surface that looks like Synapse but isn't. Identity columns, schema-bound views, and cross-database queries behave differently from Dedicated SQL Pools. Storage lives in OneLake as Delta; the engine is the new Polaris-derived MPP, not the legacy Synapse pool.
Data pipelines and Dataflow Gen2
Fabric Data pipelines are the lift-and-shift of Azure Data Factory. Dataflow Gen2 is the Power Query authoring surface for low-code transformation. Pipelines win for orchestration, parameterization, and copy-at-scale. Dataflows win for analyst-authored cleanup. Picking wrong is one of the most common scenario traps.
Real-Time Intelligence
Eventstreams ingest from Event Hubs, Kafka, IoT Hub, HTTP. Eventhouses store data in KQL databases (the Kusto engine behind Application Insights and Azure Data Explorer). KQL is the differentiator most candidates skip. Plan to learn summarize, mv-expand, make-series, and let bindings.
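For readers who have never written KQL, the Python sketch below mimics what a query such as `Events | summarize count() by bin(Timestamp, 1h), DeviceId` computes. It is an illustration of the semantics only, not KQL and not how the Kusto engine executes; the field names and sample rows are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

# Fixed origin chosen for this sketch; any hour-aligned origin gives the same hourly buckets.
EPOCH = datetime(1970, 1, 1)

def bin_timestamp(ts: datetime, width: timedelta) -> datetime:
    """Round a timestamp down to the start of its bucket, as KQL's bin() does."""
    return EPOCH + ((ts - EPOCH) // width) * width

def summarize_count(events: list[dict], width: timedelta) -> dict:
    """Count events per (time bucket, device), like `summarize count() by ...`."""
    counts = Counter()
    for event in events:
        bucket = bin_timestamp(event["timestamp"], width)
        counts[(bucket, event["device_id"])] += 1
    return dict(counts)

# Hypothetical telemetry rows.
events = [
    {"timestamp": datetime(2026, 1, 1, 10, 5), "device_id": "a"},
    {"timestamp": datetime(2026, 1, 1, 10, 55), "device_id": "a"},
    {"timestamp": datetime(2026, 1, 1, 11, 1), "device_id": "a"},
]
hourly = summarize_count(events, timedelta(hours=1))
```

Here the two events between 10:00 and 11:00 share one bucket and the 11:01 event falls into the next. Once grouping by a binned timestamp feels natural, `make-series` and `mv-expand` are easier to pick up.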
Deployment, security, capacity
Deployment pipelines move artifacts dev → test → prod. Git integration (Azure DevOps or GitHub) backs everything. OneLake security uses workspace roles plus item-level permissions, with row/column security inherited from the underlying Delta table. Capacity is the F-SKU you paid for. Bursting past it throttles requests.
Semantic models and Direct Lake
Direct Lake reads Delta files in OneLake without import or DirectQuery. Fast for Power BI, but it inherits VertiPaq limits underneath. The exam asks scenario questions about when to fall back to import or DirectQuery, and how to diagnose Direct Lake fallback in the capacity metrics app.
F-SKU pricing
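Capacity is sold by F-SKU, and the number in the SKU name is its capacity units (F8 = 8 CUs). The exam grades sizing scenarios more aggressively than prep books cover. F64 is the magic line: at F64 and above, users with only a free Fabric license can view Power BI content hosted on the capacity, so viewers no longer need Power BI Pro. Report authors still need Pro at every SKU. The prices below are approximate pay-as-you-go list prices in USD; actual rates vary by region.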
| F-SKU | Hourly | Monthly | Typical use |
|---|---|---|---|
| F2 | $0.36 / hour | $262 / month | Toy. Demos and sandboxes only. |
| F4 | $0.72 / hour | $525 / month | Solo developer. Tight. |
| F8 | $1.45 / hour | $1,057 / month | Small team or single product line. |
| F16 | $2.90 / hour | $2,114 / month | Mid-size analytics team. |
| F32 | $5.81 / hour | $4,242 / month | Multi-team workspace. |
| F64 | $11.62 / hour | $8,481 / month | Mid-enterprise. Viewers with a free license can consume Power BI content. |
| F128 | $23.23 / hour | $16,956 / month | Enterprise. Multiple capacities common. |
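The arithmetic behind the table can be sketched in a few lines of Python. The hourly rates are copied from the table above. The 730-hour month is an assumption (8,760 hours divided by 12), which is why the results land within a few dollars of the monthly column rather than matching it exactly. The function names are illustrative.

```python
# Hourly pay-as-you-go rates in USD, copied from the table above.
HOURLY_RATE = {
    "F2": 0.36, "F4": 0.72, "F8": 1.45, "F16": 2.90,
    "F32": 5.81, "F64": 11.62, "F128": 23.23,
}

HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12 months

def capacity_units(sku: str) -> int:
    """The number in the SKU name is its capacity units, e.g. F64 -> 64."""
    return int(sku[1:])

def monthly_cost(sku: str, hours: float = HOURS_PER_MONTH) -> float:
    """Approximate monthly cost for a capacity that runs for `hours` hours."""
    return round(HOURLY_RATE[sku] * hours, 2)

def viewers_need_paid_license(sku: str) -> bool:
    """Below F64, report viewers still need their own Power BI license."""
    return capacity_units(sku) < 64

for sku in ("F2", "F8", "F64"):
    print(sku, monthly_cost(sku), viewers_need_paid_license(sku))
```

For sizing scenarios, the useful habit is reading the SKU as a CU count and remembering that each step up roughly doubles the bill, with the viewer-licensing rule changing only at F64.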
DP-700 vs the other clouds
How DP-700 maps against the other DE certs you might choose between in 2026.
| Exam | Difficulty | Scope | Reach (2026) | Hiring signal | Transferability |
|---|---|---|---|---|---|
| Microsoft DP-700 | Medium-Hard | Fabric Lakehouse, Warehouse, Real-Time, Pipelines | Strongest in regulated enterprises (finance, healthcare, gov) | Strong inside Microsoft ecosystem, modest outside | Limited. F-SKU and OneLake don't map to other clouds. |
| AWS DEA-C01 | Medium | Glue, Redshift, Kinesis, Lake Formation, S3 | Broadest cloud DE market share in 2026 | Strong almost everywhere. Default cert when undecided. | High. Most patterns transfer to Azure and GCP equivalents. |
| Google Pro Data Engineer | Hard | BigQuery, Dataflow (Beam), Pub/Sub, Bigtable, Vertex AI | Smaller footprint, concentrated at GCP-first shops | Highest per-cert prestige. Hard to fake. | High for streaming, ML pipelines, watermarking concepts. |
| Databricks DEA | Medium | Delta Lake, Spark, medallion, Unity Catalog | Hot. Lakehouse adoption accelerating across clouds. | Strong for any company running Spark, regardless of cloud | High. Spark + Delta knowledge applies on AWS, Azure, GCP. |
What interviewers grade on at Microsoft-stack shops
Real questions from Fabric-shop interview loops in 2026. The patterns recur.
Walk me through your Fabric workspace organization for a multi-domain analytics platform.
Strong answers separate workspaces by domain (sales, finance, supply chain) and lifecycle (dev, test, prod), then explain how OneLake shortcuts let teams share canonical Gold tables without copying. Mention deployment pipelines, capacity assignment, and the trade-off between one large F-SKU and many smaller capacities. Weak answers describe a single 'analytics' workspace and miss the governance question entirely.
When would you pick a Fabric Lakehouse vs Warehouse vs Eventhouse?
Lakehouse for raw ingestion, Spark transformation, ML feature engineering. Warehouse for T-SQL workloads where analysts expect SQL Server semantics and stored procedures. Eventhouse for high-cardinality time-series and log-style data where you need sub-second KQL queries over billions of rows. The interviewer is checking whether you understand all three sit on OneLake but use different engines.
OneLake shortcuts: explain the security implications when shortcutting across workspaces.
A shortcut inherits the source table's row-level security and column masking, but the destination workspace's roles control who can resolve the shortcut. That gap is where leaks happen. Strong answers also mention that shortcuts to external storage (ADLS Gen2, S3) authenticate using the source connection, not the destination, which can route reads through unintended identities.
Your Eventstream is dropping events under load. Diagnose.
Walk through the layers. First: capacity throttling at the Eventhouse (check capacity metrics app for throttled requests). Second: Eventstream throughput unit limits. Third: source side — are Event Hubs partitions saturated, or is the producer batching badly? Strong answers reference the Eventstream monitoring view and the difference between dropped events and rejected events.
Design a CDC pipeline from on-prem SQL Server into Fabric.
Most candidates start with Data Factory's self-hosted integration runtime. Better answers consider enabling SQL Server CDC and streaming the changes through Debezium into Event Hubs, then an Eventstream into a Bronze Lakehouse Delta table, then a notebook merging into Silver. The interviewer wants you to understand initial snapshot vs ongoing delta, idempotency on retry, and schema drift handling on the source side.
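Idempotency on retry is the part of this answer that is easiest to hand-wave. As a minimal sketch of the idea, the plain-Python merge below applies change records keyed by a primary key and a log sequence number (LSN), so replaying the same batch leaves the target unchanged. The record layout (`id`, `lsn`, `op`, `row`) is hypothetical, and a real Silver merge would run as a Delta merge in a notebook rather than on a dict.

```python
def apply_changes(target: dict, changes: list[dict]) -> dict:
    """Merge CDC change records into a keyed table; safe to replay."""
    for change in sorted(changes, key=lambda c: c["lsn"]):
        key = change["id"]
        current = target.get(key)
        # Skip anything already applied: this is what makes a retry harmless.
        if current is not None and current["lsn"] >= change["lsn"]:
            continue
        if change["op"] == "delete":
            # Keep a tombstone so a replayed older insert cannot resurrect the row.
            target[key] = {"lsn": change["lsn"], "deleted": True, "row": None}
        else:
            target[key] = {"lsn": change["lsn"], "deleted": False, "row": change["row"]}
    return target

# Hypothetical change batch: two updates to row 1, an insert and delete of row 2.
changes = [
    {"id": 1, "lsn": 10, "op": "insert", "row": {"name": "a"}},
    {"id": 1, "lsn": 12, "op": "update", "row": {"name": "b"}},
    {"id": 2, "lsn": 11, "op": "insert", "row": {"name": "c"}},
    {"id": 2, "lsn": 13, "op": "delete", "row": None},
]
once = apply_changes({}, changes)
twice = apply_changes(apply_changes({}, changes), changes)
```

Applying the batch once or twice produces the same table, which is the property to name out loud when the interviewer asks what happens if the notebook fails halfway and reruns.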
OneLake security: the part candidates underprepare
Four layers determine who sees what in Fabric. The exam tests the gaps between them. Walk all four in order even when the question is about one.
Workspace roles control item access
Admin, Member, Contributor, Viewer. Viewers can see lakehouses and warehouses but can't edit them. Contributors can create and edit items. Members can also share items and add users at Member level or below. Admins manage workspace settings and can assign any role, including Admin.
Item-level permissions grant access without a workspace role
You can grant a user read on a single Lakehouse without giving them the rest of the workspace. The exam tests scenarios where a consumer needs read on Gold tables but no access to Bronze. Item permissions answer this: share the Gold item directly instead of assigning a workspace role. Item permissions add access; they cannot take away what a workspace role already grants.
Row-level security travels with the Delta table
RLS defined on a Lakehouse Delta table is enforced uniformly: T-SQL queries through the SQL endpoint, Spark notebooks reading the Delta, and shortcuts pointing at the table all see the filter. The cleanest part of the OneLake security story and the most-tested.
Shortcuts inherit source security but need consumer authorization
When you shortcut a table from Workspace A into Workspace B, the source's RLS and column masking still apply. But resolving the shortcut requires permissions in the destination workspace. Misconfigure either side and you either over-share data or break a published dashboard.
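The two-sided check described above can be modeled in a few lines of Python. This is a toy model under stated assumptions, not Fabric's implementation: `SourceTable`, `Shortcut`, and `read_through_shortcut` are hypothetical names, and the RLS rule is reduced to a per-row predicate.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SourceTable:
    rows: list[dict]
    rls: Callable[[str, dict], bool]  # (user, row) -> is the row visible?

@dataclass
class Shortcut:
    source: SourceTable            # a pointer to the source, not a copy of its rows
    destination_readers: set[str]  # users allowed to resolve it in the destination

def read_through_shortcut(user: str, shortcut: Shortcut) -> list[dict]:
    # Check 1: the destination workspace decides who may resolve the shortcut.
    if user not in shortcut.destination_readers:
        raise PermissionError(f"{user} cannot resolve this shortcut")
    # Check 2: the source's row-level security still filters what comes back.
    return [row for row in shortcut.source.rows if shortcut.source.rls(user, row)]

# Hypothetical Gold table where each user may only see their own region.
regions = {"ana": "EU", "bo": "US"}
gold = SourceTable(
    rows=[{"region": "EU", "sales": 10}, {"region": "US", "sales": 20}],
    rls=lambda user, row: row["region"] == regions.get(user),
)
shortcut = Shortcut(source=gold, destination_readers={"ana"})
```

In this model `ana` sees only the EU row, while `bo` is rejected before the RLS filter is ever evaluated. Both checks have to be configured correctly for the shortcut to behave as intended.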
Interview soundbites
Short, defensible answers to recurring questions in Microsoft-stack DE interviews. Memorize the structure, not the words.
Lakehouse vs Warehouse
Lakehouse first if the workload is Spark, ML feature engineering, or open-format storage you need to share with non-Microsoft consumers. Warehouse first if it's T-SQL with stored procedures, the team is SQL developers, and you need full ANSI semantics for joins and window functions.
Direct Lake fallback
Direct Lake reads Delta files in OneLake without import or DirectQuery. It falls back to DirectQuery when the table exceeds VertiPaq limits, when calculated columns block the lake path, or when the user lacks proper SQL endpoint permissions. Diagnose with the Capacity Metrics app's fallback indicator.
Capacity throttling
Fabric smooths capacity usage over time: background operations are spread over a 24-hour window, interactive operations over a much shorter one. Workloads can burst above the SKU briefly, then throttle once the smoothed usage exceeds what the capacity can absorb. The right answer to a throttle question is rarely 'increase the SKU'; it's 'right-size the workload, schedule heavy jobs off-peak, or move the noisy item to its own capacity.'
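A rough way to reason about smoothing is to spread each job's total consumption evenly across the 24-hour window and compare the sum against the capacity's CUs. The sketch below does exactly that. It is a deliberately simplified model (it ignores interactive operations, carry-forward, and the staged throttling policies), and the job sizes are made up for illustration.

```python
SMOOTHING_WINDOW_SECONDS = 24 * 60 * 60  # the 24-hour window, in seconds

def smoothed_cu(job_cu_seconds: float) -> float:
    """Average CU rate once a job's usage is spread over the whole window."""
    return job_cu_seconds / SMOOTHING_WINDOW_SECONDS

def utilization(jobs_cu_seconds: list[float], capacity_cu: int) -> float:
    """Fraction of the capacity consumed by the smoothed jobs (1.0 = full)."""
    return sum(smoothed_cu(job) for job in jobs_cu_seconds) / capacity_cu

# Hypothetical nightly job that burns 64 CUs for one hour.
nightly_job = 64 * 3600  # CU-seconds

one_job_on_f8 = utilization([nightly_job], capacity_cu=8)
four_jobs_on_f8 = utilization([nightly_job] * 4, capacity_cu=8)
```

Under this model a single such job uses about a third of an F8 even though it bursts to eight times the SKU while running, and four of them push the capacity past 100%. That is why rescheduling or isolating the noisy workload is often a better answer than a bigger SKU.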
Eventstream durability
Eventstreams are not durable storage. They route events. Durability lives at the destination: an Eventhouse, a Lakehouse, or a Custom App with retry logic. Treat Eventstream like a Kafka Streams topology, not like Kafka itself. The exam tests this distinction in scenario form.
Schema drift
Spark notebooks handle schema drift natively with mergeSchema=true on Delta writes. Pipelines and Dataflow Gen2 don't, and they fail loudly when the source adds a column. Strong answers walk through both paths and recommend Spark notebooks for sources where drift is common.
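In a notebook the option is set on the writer, for example `df.write.format("delta").mode("append").option("mergeSchema", "true")`. To make the behavioral difference concrete without a Spark session, the plain-Python sketch below contrasts a merging write with a strict one. It models schemas as column-to-type dicts and is an illustration only, not Delta's actual evolution rules; the function names are hypothetical.

```python
def merge_schema(table_schema: dict[str, str], batch_schema: dict[str, str]) -> dict[str, str]:
    """Merging write: new columns are appended to the table schema."""
    merged = dict(table_schema)
    for column, dtype in batch_schema.items():
        if column not in merged:
            merged[column] = dtype  # existing rows would read this column as null
        elif merged[column] != dtype:
            raise TypeError(f"column {column!r}: {merged[column]} vs {dtype}")
    return merged

def strict_schema(table_schema: dict[str, str], batch_schema: dict[str, str]) -> dict[str, str]:
    """Strict write: any column the table does not know about fails loudly."""
    unexpected = set(batch_schema) - set(table_schema)
    if unexpected:
        raise ValueError(f"unexpected columns: {sorted(unexpected)}")
    return dict(table_schema)

# Hypothetical drift: the source system starts sending an `email` column.
table = {"id": "long", "name": "string"}
batch = {"id": "long", "name": "string", "email": "string"}

evolved = merge_schema(table, batch)
```

The merging path absorbs the new column, while the strict path raises on the same batch. When answering a drift question, say which of the two behaviors each tool gives you and where you want the failure to surface.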
Cross-cloud shortcuts
Fabric can shortcut to external stores including ADLS Gen2, S3, and Google Cloud Storage. The shortcut authenticates through the source connection, so a workspace can read tables stored in S3 without the data ever being copied into OneLake. This is the answer to 'we have a Snowflake bill on AWS, can we keep the data there?'
Myth vs reality
Myth: Fabric replaces Synapse
Reality: Synapse Dedicated SQL Pools are still GA and supported. Most large customers run both for years during migration. The exam expects you to know the difference and choose between them.
Myth: DP-700 is just DP-203 with Fabric chapters bolted on
Reality: DP-700 is meaningfully different. KQL and Eventstreams are full sections. Synapse-specific topics (dedicated pool distributions, PolyBase) are gone. Studying DP-203 material leaves you 30% under-prepared.
Myth: Fabric is just Power BI dressed up
Reality: at the DE layer, Fabric runs Spark, Delta, and the Kusto engine. None of that is Power BI. The semantic model layer touches Power BI; DP-700 grades the engineering tier independently.
Myth: Microsoft's Azure DE market shrank when DP-203 retired
Reality: it grew. Regulated enterprises (finance, healthcare, government) accelerated Fabric adoption in 2025-26 because the unified billing and OneLake security model fit their compliance posture.
Myth: I can use my AWS knowledge to pass DP-700
Reality: the F-SKU capacity model and Fabric workspace concepts have no AWS analogue. Plan to study the pricing layer and OneLake security from scratch even if you're senior on AWS.
Decision matrix
Use this if you have ten seconds. The answer is one row away.
| Situation | Pick | Reason |
|---|---|---|
| Targeting Microsoft enterprise shops | DP-700 | Direct match for the platform they actually run. |
| Already in Synapse, want renewal path | DP-700 | DP-203 retired. DP-700 is the official Synapse-to-Fabric bridge. |
| Power BI developer pivoting to DE | DP-900 then DP-700 | DP-900 builds the data vocabulary DP-700 assumes. |
| Multi-cloud consultant | AWS DEA-C01 first, DP-700 second | AWS for breadth, DP-700 for Microsoft engagements. |
| Pure DE at AWS-only shop | Skip DP-700, take AWS DEA-C01 | DP-700 won't move the needle if your stack never touches Azure. |
| ML engineer needing MS credentials | AI-102 instead | AI-102 (Azure AI Engineer) maps to your work; DP-700 won't. |
| Career switcher, no cloud background | DP-900 first, then DP-700 | Skipping fundamentals usually means failing DP-700 once and re-paying. |
Eight-week DP-700 study plan
Calibrated to the actual exam blueprint, not the marketing copy.
- 01
Weeks 1-2: Microsoft Learn DP-700 path
Complete the official DP-700 path on Microsoft Learn (free). Spin up a Fabric trial tenant and confirm you can create a workspace, Lakehouse, and Warehouse. The trial includes 60 days of full F-SKU capacity — enough for the entire study cycle. Don't skip the labs; scenario questions assume hands-on familiarity with the workspace UI.
- 02
Week 3: Build a medallion pipeline using OneLake shortcuts
Ingest a real public dataset (NYC taxi, GitHub archive) into a Bronze Lakehouse, transform with a Spark notebook into Silver, aggregate into a Gold Warehouse table. Use a OneLake shortcut to expose Gold to a second workspace as if a downstream team consumed it. Highest-ROI hands-on exercise for the exam.
- 03
Week 4: Eventstream and Eventhouse hands-on
Stand up an Eventstream from a sample source (built-in Bicycles or Stocks generator). Land in an Eventhouse. Write KQL using summarize, bin, mv-expand, and make-series. Most candidates underprepare here and lose 15-20% of their score.
- 04
Week 5: Practice exams (MeasureUp, Whizlabs)
Take a full timed practice exam. Score honestly. For every wrong question, write a paragraph explaining why the right answer is right and why the others are wrong. This 'why-not' analysis catches conceptual gaps a passing flash-card score hides.
- 05
Week 6: Cost and capacity scenarios
DP-700 grades F-SKU sizing more aggressively than candidates expect. Memorize F2 / F8 / F64 anchor prices. Understand capacity smoothing, bursting, throttling. Practice questions where the answer is 'pick a smaller F-SKU and turn on autoscale' versus 'pick larger and dedicate it.'
- 06
Week 7: Deployment pipelines and Git
Configure Git integration on a workspace. Make a change, push it, deploy to a staging workspace via a deployment pipeline. Understand selective deployment, deployment rules, parameter overrides. The exam includes at least one scenario about promoting a parameterized pipeline through dev / test / prod.
- 07
Week 8: Final timed practice exam
One sitting, exam-day conditions. No notes. No pausing. Above 80%: schedule the real exam within 7 days. Below 70%: don't book yet. Re-do the weakest section's hands-on labs and re-test before scheduling.
Common pitfalls on first attempts
Patterns that appear in failed first attempts. Avoid these and your second sitting becomes your only sitting.
Studying DP-203 material and assuming it covers DP-700
About 30% of DP-700 is net new. Old material gives false confidence. Throw out the 2023 prep books and start from the current Microsoft Learn DP-700 path.
Skipping KQL because 'I'm not a streaming engineer'
KQL is on the exam regardless of role. Real-Time Intelligence is ~20% of the score. You won't pass without basic KQL fluency: summarize, bin, where, project, mv-expand.
Memorizing F-SKU prices but not the F64 license boundary
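F64 is the line where report viewers no longer need their own Power BI Pro license; a free license is enough to consume content on the capacity. Below F64, you still pay per viewer, and authors need Pro at every size. Candidates who memorize prices but miss this fail the licensing scenario.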
Treating Direct Lake like DirectQuery
Direct Lake is a different mode with different limits. Calculated columns, calculated tables, and certain DAX patterns force fallback. The exam grades whether you know when Direct Lake works and when fallback is required.
Ignoring deployment pipelines and Git integration
Several scenario questions assume you've promoted artifacts dev → test → prod. If you've only worked in a single workspace, you'll guess wrong on deployment-rule and parameter-override questions. Practice the flow once end to end before the exam.
Frequently asked questions
Is DP-203 still worth taking in 2026?
What happens to my DP-203 cert if I already have it?
How hard is DP-700 compared to DP-203?
Do I need DP-900 before DP-700?
How much does Microsoft Fabric cost in production?
Does DP-700 expire?
Is KQL really on the exam, even for non-streaming roles?
The cert proves what you know. Practice proves what you can ship.
- 01
Active recall beats re-reading by 50%
Cognitive-science meta-reviews (Dunlosky et al., 2013) rank practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom.
- 02
76% of hiring managers reject on the coding task, not the resume
From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice.
- 03
Five problem shapes cover 80% of data engineer loops
Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition.