DataDriven 150
Expanded 150-problem set for deeper interview prep
This study plan collects 150 curated data engineering interview problems, grouped by category for structured practice. Coding challenges run real queries against live databases or execute Python in a sandboxed environment. You get instant grading, company-tier filtering, seniority calibration, and spaced repetition that keeps weak concepts in rotation until they stick.
How to Use This Plan
- Start with the easy items in each category to build pattern recognition.
- Move to medium and hard challenges once you can solve easy ones without hints.
- After solving a challenge, work through the AI discussion-phase prompts to pressure-test your solution.
- Track your readiness score on your profile and retry items where you struggled.
- When you feel ready, launch a full mock interview at /interview for end-to-end simulation.
SQL Challenges (60)
- Buyers Who Never Browsed - easy - They bought without ever loading a page.
- The Duplicate Detection Sprint - easy - Same email, different rows. Spot the repeats.
- Weekend Warriors - easy - Weekdays vs. weekends. When does the action really happen?
- The Dormant Accounts - easy - They are still paying. They stopped showing up.
- Average DQ Fail Rate - easy - Average failure rate, table by table.
- Category Sales Summary - easy - Category by category. How did they do?
- Spending by Account Status - medium - Segment user spending and activity by account status across the platform.
- Power Users by Session Activity - medium - More sessions. More time. The power users.
- 30-Day Page View Counts - easy - Thirty days of engagement. Quick snapshot.
- Above Average - easy - Products beating the catalog average.
- Above Average Interactions - easy - The average user is boring. Who is above?
- The Row Count Surprise - easy - Same tables. Different handshakes. Wildly different results.
- User Session Roster - easy - Every user paired with their sessions, even users who never logged in.
- Average Spending by Account Status - medium - Average per-user lifetime spending, segmented by account status.
- Content Recommendation Engine - medium - Pages they haven't discovered yet.
- First-Day Session Retention - hard - Day one retention. The first test.
- Above Category Average - easy - The category average is one thing. These beat it.
- Active API Tokens - easy - Tokens that have actually been used.
- Active Campaigns - easy - Which campaigns are earning their keep?
- The Token Census - easy - How many tokens are out there?
- First Contact - easy - Every pipeline has a first run. This is what it brought back.
- Bronze Medal - easy - Two ahead of you. The rest below.
- Cloud Cost Trend Analysis - medium - Cost trends across billing periods.
- 7-Check Rolling Average - medium - Seven entries hold the trend.
- Longest Visit Streaks - hard - Day after day after day. Who kept coming back?
- Previous Day Top Service - hard - Yesterday's top spender.
- Active Users With April Transactions - easy - Active accounts that also opened their wallets. How many?
- Activity Histogram - easy - How many users did X things? Build the distribution.
- Ad Clickers - easy - Who clicked? What did they spend?
- Average Headcount by Department - easy - Compensation benchmarks, department by department.
- Provider Cost Change H1 - easy - Cost swings in the first half of the year.
- Daily Error Count Change - medium - Errors, trending up or down?
- DQ Score Spread - medium - The spread in data quality scores.
- The Failure Report - medium - Errors by day and region. Some areas are worse than they appear.
- Adopters Before Migration - hard - They used the old feature. Did they ever touch the new one?
- Flatten Org Chart Hierarchy - hard - The tree runs deep. Walk every branch to the root.
- Events by Month Across Years - easy - Month by month, year by year. The pattern emerges.
- Log Volume by Day of Week - easy - Some days are noisier than others.
- Slow Batch Jobs - easy - Promised by noon. Delivered at midnight.
- Monthly Unique Users per Campaign - easy - Monthly reach, campaign by campaign.
- The Day-7 Retention Cohort - medium - Day one was promising. Day seven tells the truth.
- Early Commit Velocity by Author - medium - How productive was each author during the first year of a repo's CI pipeline?
- Daily Spam Impression Rate - medium - How much of the ad feed is spam?
- Average Accuracy by Framework - medium - Not all frameworks deliver the same accuracy.
- The Quiet Alarms - medium - Low severity. High volume. Worth a look.
- Quarter-over-Quarter Latency Trend - hard - Latency trending up or down? The quarters have the answer.
- Campaign Conversion Window - hard - A narrow window between impression and action.
- Incident Keyword Messages - hard - Certain words trigger an investigation.
- Category-Specific Product Volume - easy - Sum transactions for a specific payment type.
- High-Rated In-Stock Percentage - easy - Highly rated and in stock. A rare combo.
- The Transaction Breakdown - easy - Multiple time windows. One query. The business wants all of it at once.
- Error Severity Buckets - easy - Errors sorted by how much they hurt.
- 2FA Confirmation Rate - medium - Two-factor sent. How many confirmed?
- Build Success Rate by Trigger - medium - Which triggers produce green builds?
- Alert Response Breakdown - medium - An on-call postmortem asks which services are bleeding alerts nobody acknowledges.
- The A/B Verdict - medium - Variant A or Variant B. The conversion numbers pick the winner.
- Model Training Completion Rate - medium - How many models finished training?
- The Org Chart in Numbers - hard - Headcount by department, sliced by quarter. Every seat accounted for.
- Experiment Variant Ratios - hard - Control versus treatment. The participation split.
- Model Accuracy Drift - hard - Accuracy used to be higher.
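Several of the easy SQL items, such as Buyers Who Never Browsed, come down to finding rows with no match in another table (an anti-join). A minimal sketch using Python's built-in `sqlite3` module, assuming hypothetical `purchases` and `page_views` tables keyed by `user_id`; the real challenge schema may differ:

```python
import sqlite3


def buyers_who_never_browsed(conn: sqlite3.Connection) -> list:
    """Return user_ids that appear in purchases but never in page_views."""
    # LEFT JOIN keeps every purchase row; buyers with no page view get NULLs
    # on the page_views side, and the WHERE clause keeps only those rows.
    query = """
        SELECT DISTINCT p.user_id
        FROM purchases AS p
        LEFT JOIN page_views AS v ON v.user_id = p.user_id
        WHERE v.user_id IS NULL
        ORDER BY p.user_id
    """
    return [row[0] for row in conn.execute(query)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE purchases (user_id INTEGER, amount REAL);
        CREATE TABLE page_views (user_id INTEGER, url TEXT);
        INSERT INTO purchases VALUES (1, 9.99), (2, 5.00), (3, 12.50), (3, 1.25);
        INSERT INTO page_views VALUES (1, '/home'), (1, '/cart');
    """)
    print(buyers_who_never_browsed(conn))  # [2, 3]
```

`WHERE NOT EXISTS (SELECT 1 FROM page_views v WHERE v.user_id = p.user_id)` is an equivalent formulation that avoids building joined rows for buyers with many page views. `NOT IN (SELECT user_id FROM page_views)` is riskier: a single NULL `user_id` in the subquery makes it return no rows.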
Python Challenges (52)
- The Dominant Signal - easy - Hottest items in the transaction log. Ties included.
- The Original Keeper - easy - Clean up duplicate events without losing the timeline.
- The Forward Fill - easy - Patch the gaps in a noisy sensor stream.
- The Word Mismatch - easy - Some text does not match.
- The Dictionary Inverter - easy - Flip the dict. Group what used to be values.
- Distribute Values Into Container Types - medium - Round-robin the values. Keep rotating.
- The Column Shuffle - medium - Rows in, columns out. Number them.
- The Hierarchy Builder - hard - Parent-child pairs, flat. Build the family tree.
- The Social Graph - easy - Everyone knows someone.
- The Sequel Spotter - easy - Spot the sequels hiding in the catalog.
- The Numbered Chair - easy - A standing list. Position n holds one entry.
- The Column Transformer - easy - Each column gets its function.
- The Version Parade - easy - 1.0 before 2.0. Don't let the dots confuse you.
- Execution Timer Wrapper - medium - Function wrapped with a timer. Duration captured on exit.
- The Chunked Reader - medium - Too big for memory. Read in pieces.
- The Lazy Stream - hard - Yield values one at a time from a potentially infinite source.
- The Character Encoder - easy - Squeeze a string down to its tightest form.
- The One-Way Street - easy - Monotonic time-series. Direction only.
- The IP Validator - easy - Real and fake, mixed together.
- The Log Pulse - easy - Some lines repeat themselves.
- The Character Map - easy - Character-level frequency. As a dictionary.
- The Event Bucketer - easy - Logs slotted into buckets.
- The Status Board - medium - Make sense of a pile of raw Nginx access logs.
- The Eviction Policy - medium - Fixed capacity. Oldest unused entry gets evicted.
- The Yahtzee Engine - hard - Five dice. Six faces. Score it.
- The Frequency Eviction - hard - When storage is tight, something has to go.
- The One-of-Each - easy - Strip the repeats, keep the originals.
- The Config Blender - easy - Config collision. The surviving values after a merge.
- Flatten the Feed - easy - Nested lists, all the way down.
- Greeting Formatter Class - easy - First impressions are formatted carefully.
- The OOP Pillars Exam - medium - Four principles, one class hierarchy - show you know all of them.
- The Event Broadcaster - medium - Subscribers show up, listen, and sometimes leave.
- The Dynamic Container - hard - Build your own resizable list with no help from the standard library.
- The Throttle Wall - hard - Stop the abusers. Let the rest through.
- The Squeeze - easy - aaabbb gets old fast. Shrink it.
- Run Length Encoding - easy - AAABBB becomes 3A3B. Compress it.
- The Cipher Wheel - easy - Every letter has an alias - you just need the right codebook.
- The Encoded Signal - medium - The encoding is hiding multipliers. Decode it.
- Palindrome Hunt - medium - It reads the same both ways. Go further.
- The Zigzag Encoder - medium - The message snakes its way across the rails.
- Common Prefix - hard - They all start the same way. How far?
- The Schema Migrator - hard - Old schema in, new schema out.
- The Bracket Validator - easy - Brackets opened and closed. The nesting might be off.
- The Deep Unpacker - easy - Boxes inside boxes. Eventually you reach the bottom.
- The Tree Measurer - easy - How deep does the rabbit hole go?
- The Island Counter - medium - Surrounded by water, connected by land - how many separate landmasses?
- The Shortest Route - medium - Fewer hops is always better.
- The Priority Queue - medium - When two things tie, something has to break the deadlock.
- The Coin Vault - medium - Exact change only - and you want to use as few coins as possible.
- The Output Peak - hard - One stretch outpaced all the others.
- The Median Keeper - hard - The middle value keeps moving as new data arrives.
- The Triple Alliance - hard - Three numbers, one target.
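Run Length Encoding spells out its expected output (AAABBB becomes 3A3B). A minimal sketch with `itertools.groupby`, assuming the count is always written before the character, even for runs of length one, and that the input contains no digits; the challenge's exact rules may differ:

```python
from itertools import groupby


def run_length_encode(text: str) -> str:
    """Encode consecutive runs as <count><char>, e.g. 'AAABBB' -> '3A3B'."""
    # groupby yields one (char, run_iterator) pair per maximal run of equal characters.
    return "".join(f"{sum(1 for _ in run)}{char}" for char, run in groupby(text))


def run_length_decode(encoded: str) -> str:
    """Invert run_length_encode; counts may span several digits (e.g. '12x')."""
    # Assumes well-formed input where every non-digit character is preceded by its count.
    out = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch  # accumulate a possibly multi-digit count
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)


if __name__ == "__main__":
    print(run_length_encode("AAABBB"))  # 3A3B
    print(run_length_decode("3A3B"))    # AAABBB
```

`groupby` merges only adjacent equal characters, which is exactly run semantics. `collections.Counter` would instead total every occurrence regardless of position, so "ABA" would wrongly collapse into a single count for A.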
Data Modeling Challenges (20)
- Customer Address History - easy - People move. Sometimes twice in a month. How do you remember where everyone was, and when?
- B2B Invoicing Data Model - easy - Invoices go out, partial payments trickle in, and some customers are three months overdue.
- Fitness Studio Membership Schema - easy - Classes fill up. Members no-show. Billing continues.
- Fitness App Data Model - easy - Reps, sets, streaks, and personal bests. Gym rats love their stats.
- Social Platform Data Model - medium - Follows, likes, replies to replies. It never stops.
- A/B Experiment Assignment Schema - medium - One user, one experiment, one variant. No exceptions.
- Trending Dishes Dashboard - medium - What's everyone eating? The answer changes hourly.
- Cloud File Storage Metadata Schema - hard - A file is also a folder. A folder is also a file.
- Loan Management Schema - easy - Money out, payments back. The balance has to be exact.
- Toll Road Sensor Analytics - easy - Cars enter, cars exit. Except when they don't.
- Ride-Sharing Platform Schema - medium - Riders, drivers, and fares. Everyone takes a cut.
- The Retail Blueprint - medium - One business. A thousand transactions. Only one layout survives the analytics layer.
- The Table That Lies - medium - Every query comes out wrong. The data is all there.
- The Retail Tables That Need a New Home - medium - A working system. Now redesign it so the analysts can actually use it.
- The Plan That Changed Twice This Month - medium - Subscribers come, go, downgrade, and share. The schema has to keep up.
- Movie Streaming Analytics Schema - medium - They pressed play. What happened next is the whole question.
- The Customer Who Changed - hard - She moved. She upgraded. She became someone new. The record has to keep up.
- The Churner Who Came Back - hard - They cancelled. They came back. The report has to tell both stories correctly.
- The Territory That Keeps Moving - hard - Reps get reassigned. The receipts have to survive.
- The Schema That Could Not Answer Back - hard - Forty columns in. Zero useful answers out.
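Several of these prompts (Customer Address History, The Customer Who Changed, The Territory That Keeps Moving) are about preserving history when an attribute changes. One common answer is a Type 2 slowly changing dimension: expire the current row and insert a new version. A minimal sketch with `sqlite3`, assuming a hypothetical `dim_customer_address` table with `valid_from`/`valid_to` ISO dates and an `is_current` flag:

```python
import sqlite3

SCHEMA = """
CREATE TABLE dim_customer_address (
    surrogate_key INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id   INTEGER NOT NULL,
    address       TEXT    NOT NULL,
    valid_from    TEXT    NOT NULL,                       -- ISO date the row became true
    valid_to      TEXT    NOT NULL DEFAULT '9999-12-31',  -- open-ended sentinel
    is_current    INTEGER NOT NULL DEFAULT 1
);
"""


def apply_address_change(conn, customer_id, new_address, effective_date):
    """Type 2 update: expire the current row, then insert the new version."""
    with conn:  # commit the expire and the insert together, or roll both back
        current = conn.execute(
            "SELECT address FROM dim_customer_address "
            "WHERE customer_id = ? AND is_current = 1",
            (customer_id,),
        ).fetchone()
        if current is not None and current[0] == new_address:
            return  # attribute unchanged, so no new version
        conn.execute(
            "UPDATE dim_customer_address SET valid_to = ?, is_current = 0 "
            "WHERE customer_id = ? AND is_current = 1",
            (effective_date, customer_id),
        )
        conn.execute(
            "INSERT INTO dim_customer_address (customer_id, address, valid_from) "
            "VALUES (?, ?, ?)",
            (customer_id, new_address, effective_date),
        )


def address_as_of(conn, customer_id, as_of_date):
    """Point-in-time lookup: the row whose [valid_from, valid_to) range covers the date."""
    row = conn.execute(
        "SELECT address FROM dim_customer_address "
        "WHERE customer_id = ? AND valid_from <= ? AND ? < valid_to",
        (customer_id, as_of_date, as_of_date),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    apply_address_change(conn, 42, "12 Oak St", "2024-01-01")
    apply_address_change(conn, 42, "7 Elm Ave", "2024-03-15")
    print(address_as_of(conn, 42, "2024-02-01"))  # 12 Oak St
    print(address_as_of(conn, 42, "2024-03-15"))  # 7 Elm Ave
```

The half-open `[valid_from, valid_to)` range means a change effective on a given date is visible from that date onward, and the fixed sentinel keeps the "as of" query free of `NULL` handling. Facts such as orders or receipts would reference `surrogate_key`, so they keep pointing at the version that was true when they happened.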
Pipeline Architecture Challenges (18)
- Sixty Minutes, Every Hour - medium - Every hour, on the hour. No excuses.
- Six Million Rows Before the Market Opens - medium - One massive CSV. Millions of timestamps.
- The Meal Kit That Knows You - medium - What they ordered says a lot about what they want next.
- The Analysts Cannot Touch Production - medium - Production is the source. Analytics needs its own copy.
- The Vendor Who Never Warns You - medium - Every month, something is different. The dashboards have no idea.
- Eight Teams, Eight Latencies - medium - Millions of gamers. The architecture decision changes everything.
- Every Device Has Its Own Dialect - medium - Three sources. Three formats. Same workout.
- Replicate It Without Breaking It - hard - The source changed. The lake needs to know immediately.
- Prove the Number Is Right - hard - Bad data in fintech is not just messy. It is expensive.
- A Stream All Day and a File at Midnight - hard - Real-time and batch. Same pipeline. No compromises.
- End of Day Is Too Late - medium - Every swipe tells a story.
- Store, Site, and Distributor - medium - Sales data is piling up. Someone has to make sense of it.
- Nested Docs, Flat Reports - medium - Two databases. One direction. No data left behind.
- The Bad Row That Broke the Dashboard - medium - Bad records cannot reach the warehouse.
- The Provider That Sometimes Sleeps - medium - The models run at dawn. The data has to be there first.
- Event System for Multiple Consumers - hard - One event, many hungry consumers.
- The Migration That Cannot Break Morning - hard - It all works today. Moving it without losing a single report is the hard part.
- The Clock That Runs Two Ways - hard - Nightly batch and live events. One dashboard.
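The Bad Row That Broke the Dashboard states its requirement plainly: bad records cannot reach the warehouse. One common pattern is to validate each record before loading and route failures to a quarantine (dead-letter) list with the reasons attached, so they can be inspected and replayed instead of silently dropped. A minimal sketch in plain Python; the field names and rules (`order_id`, `amount`, `currency`, the currency allow-list) are hypothetical:

```python
from dataclasses import dataclass, field

# Hypothetical rules for an order feed; real rules come from the target table's contract.
REQUIRED_FIELDS = ("order_id", "amount", "currency")
ALLOWED_CURRENCIES = {"USD", "EUR", "GBP"}


@dataclass
class BatchResult:
    valid: list = field(default_factory=list)
    quarantined: list = field(default_factory=list)  # (record, reasons) pairs


def validate(record: dict) -> list:
    """Return human-readable problems; an empty list means the record is clean."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if record.get(name) is None]
    amount = record.get("amount")
    # bool is a subclass of int in Python, so reject it explicitly.
    if amount is not None and (
        isinstance(amount, bool) or not isinstance(amount, (int, float)) or amount < 0
    ):
        problems.append("amount must be a non-negative number")
    currency = record.get("currency")
    if currency is not None and currency not in ALLOWED_CURRENCIES:
        problems.append(f"unknown currency {currency!r}")
    return problems


def split_batch(records) -> BatchResult:
    """Route each record to valid or quarantined; nothing is silently dropped."""
    result = BatchResult()
    for record in records:
        problems = validate(record)
        if problems:
            result.quarantined.append((record, problems))
        else:
            result.valid.append(record)
    return result


if __name__ == "__main__":
    batch = [
        {"order_id": 1, "amount": 19.99, "currency": "USD"},
        {"order_id": 2, "amount": -5, "currency": "USD"},
        {"order_id": None, "amount": 3.5, "currency": "XYZ"},
    ]
    result = split_batch(batch)
    print(len(result.valid), len(result.quarantined))  # 1 2
```

Only `result.valid` would be handed to the warehouse load. Keeping the reasons next to each quarantined record makes the dead-letter store queryable, for example counting failures by reason to see which upstream change broke the feed.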