DataDriven 75
Core 75 problems every data professional should know
This study plan collects 75 curated data engineering interview problems, grouped for structured practice. Each challenge runs real code against live databases or executes Python in a sandboxed environment. You get instant grading, company tier filtering, seniority calibration, and spaced repetition that keeps weak concepts in rotation until they stick.
How to Use This Plan
- Start with the easier items to build pattern recognition for the domain.
- Move to medium and hard challenges once you can solve easy ones without hints.
- After solving a challenge, read the AI discussion-phase prompts to pressure-test your solution.
- Track your readiness score on your profile and retry items where you struggled.
- When you feel ready, launch a full mock interview at /interview for an end-to-end simulation.
SQL Challenges (24)
- Bronze Medal - easy - Two ahead of you. The rest below.
- 10 Lowest Uptime Services - medium - Ten services at the bottom of the reliability chart.
- 7-Check Rolling Average - medium - Seven entries hold the trend.
- Bargains and Budget-Busters - hard - Every region has both. Find them.
- Cloud Cost Trend Analysis - medium - Cost trends across billing periods.
- Daily Error Count Change - medium - Errors, trending up or down?
- The Row Count Surprise - easy - Same tables. Different handshakes. Wildly different results.
- Content Recommendation Engine - medium - Pages they haven't discovered yet.
- The Freshest Record - medium - Duplicates everywhere. Only the most recent version of the truth survives.
- Longest Visit Streaks - hard - Day after day after day. Who kept coming back?
- Early Commit Velocity by Author - medium - How productive was each author during the first year of a repo's CI pipeline?
- The Day-7 Retention Cohort - medium - Day one was promising. Day seven tells the truth.
- Error Severity Buckets - easy - Errors sorted by how much they hurt.
- Build Success Rate by Trigger - medium - Which triggers produce green builds?
- Search Algorithm Rating - hard - How good are the search results?
- The Dormant Accounts - easy - They are still paying. They stopped showing up.
- Power Users by Session Activity - medium - More sessions. More time. The power users.
- Activity Histogram - easy - How many users did X things? Build the distribution.
- Provider Cost Change H1 - easy - Cost swings in the first half of the year.
- Tenure Spread for Active Tokens - hard - Tenure extremes among active tokens.
- Slow Batch Jobs - easy - Promised by noon. Delivered at midnight.
- Long Messages - medium - Some commit messages tell a novel.
- The Duplicate Detection Sprint - easy - Same email, different rows. Spot the repeats.
- Spending by Account Status - medium - Segment user spending and activity by account status across the platform.
Python Challenges (27)
- The Window Cleaner - medium - Keep it fresh, keep it unique.
- The Rolling Peak - medium - The sweetest stretch in the sequence.
- Run Length Encoding - easy - AAABBB becomes 3A3B. Compress it.
- The Forward Fill - easy - Patch the gaps in a noisy sensor stream.
- The Event Bucketer - easy - Logs slotted into buckets.
- The Event Broadcaster - medium - Subscribers show up, listen, and sometimes leave.
- The Dictionary Inverter - easy - Flip the dict. Group what used to be values.
- Distribute Values Into Container Types - medium - Round-robin the values. Keep rotating.
- The Dependency Resolver - medium - Everything depends on everything.
- The Throttle Wall - hard - Stop the abusers. Let the rest through.
- Merge Overlapping Time Ranges - medium - Intervals piling up. Clean the timeline.
- The Meeting Room Allocator - hard - Meetings overlap on the calendar. Rooms are limited.
- The Hierarchy Builder - hard - Parent-child pairs, flat. Build the family tree.
- The Infection Spread - hard - It starts with one, and then it spreads.
- Flatten the Feed - easy - Nested lists, all the way down.
- The Output Peak - hard - One stretch outpaced all the others.
- The Consecutive Sequence Finder - medium - Numbers that flow without interruption.
- The Middle Ground - hard - The middle value keeps moving.
- The Config Blender - easy - Config collision. The surviving values after a merge.
- The Record Reconciler - medium - Two versions of the same truth.
- The Schema Migrator - hard - Old schema in, new schema out.
- The Chunked Reader - medium - Too big for memory. Read in pieces.
- The Lazy Stream - hard - Yield values one at a time from a potentially infinite source.
- The Account Manager - easy - Deposits, withdrawals, and the risk of going negative.
- Execution Timer Wrapper - medium - Function wrapped with a timer. Duration captured on exit.
- The Tail End - easy - Push, pop, peek. The basics that break people.
- The Eviction Policy - medium - Fixed capacity. Oldest unused entry gets evicted.
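As a taste of the easy tier, the Run Length Encoding item above states its expected transformation directly: AAABBB becomes 3A3B. The sketch below is one possible approach, not the platform's reference solution. The function name `run_length_encode` and the choice to emit a count even for single characters are assumptions, since the listing does not specify either.

```python
from itertools import groupby


def run_length_encode(text: str) -> str:
    """Compress runs of repeated characters as <count><char>.

    Hypothetical helper: the name and the count-then-character output
    format are assumed from the "AAABBB becomes 3A3B" teaser.
    """
    # groupby yields one (char, run) pair per stretch of identical characters.
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


print(run_length_encode("AAABBB"))  # 3A3B
```

Using `itertools.groupby` keeps the solution to a single pass over the input. An interviewer may also ask for a manual loop with a running counter, which is worth practicing separately.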
Data Modeling Challenges (14)
- Loan Management Schema - easy - Money out, payments back. The balance has to be exact.
- A/B Experiment Assignment Schema - medium - One user, one experiment, one variant. No exceptions.
- Customer Address History - easy - People move. Sometimes twice in a month. How do you remember where everyone was, and when?
- The Customer Who Changed - hard - She moved. She upgraded. She became someone new. The record has to keep up.
- Social Platform Data Model - medium - Follows, likes, replies to replies. It never stops.
- Content Search and Discovery Schema - hard - Searchable from every angle. Design it so nothing gets lost.
- Fitness App Data Model - easy - Reps, sets, streaks, and personal bests. Gym rats love their stats.
- Subscription Churn Analysis Model - medium - Subscribers are leaving. The data knows why.
- The Retail Blueprint - medium - One business. A thousand transactions. Only one layout survives the analytics layer.
- E-Commerce Supply Chain Tracking - hard - A package splits, reroutes, and (maybe) arrives.
- Trending Dishes Dashboard - medium - What's everyone eating? The answer changes hourly.
- The Plan That Changed Twice This Month - medium - Subscribers come, go, downgrade, and share. The schema has to keep up.
- Ride-Sharing Platform Schema - medium - Riders, drivers, and fares. Everyone takes a cut.
- The Table That Lies - medium - Every query comes out wrong. The data is all there.
Pipeline Architecture Challenges (10)
- The Analysts Cannot Touch Production - medium - Production is the source. Analytics needs its own copy.
- The Acquisition Still Taking Bookings - hard - Two systems, two schemas. One truth.
- The Provider That Sometimes Sleeps - medium - The models run at dawn. The data has to be there first.
- Sixty Minutes, Every Hour - medium - Every hour, on the hour. No excuses.
- Replicate It Without Breaking It - hard - The source changed. The lake needs to know immediately.
- The Decision Before the Door Closes - hard - The window to stop it is smaller than you think.
- Eight Teams, Eight Latencies - medium - Millions of gamers. The architecture decision changes everything.
- Five Times the Traffic, Five Times the Bill - hard - Scale up when needed. Do not bankrupt the team.
- Store, Site, and Distributor - medium - Sales data is piling up. Someone has to make sense of it.
- Thousands of Practices, One Dataset - hard - Patient records in, operational insights out.