Senior Data Engineer Interview Questions
Senior and staff data engineer interview questions with rubric-scored verdicts.
Senior and staff data engineer interview questions filtered from the full catalog. The bar is not whether you can solve the problem; it is whether you can name 2 alternatives, defend the choice, adapt cleanly when the interviewer changes a requirement mid-round, and call out the failure modes you would have to handle on-call.
The senior (L5) and staff (L6) data engineer interview rubrics differ from mid-level (L4) on three dimensions. First, trade-off articulation. L4 candidates are scored on producing a correct solution; L5 candidates are scored on naming 2 alternatives and defending the choice. "I would use Spark because it is good for this" does not pass L5. "I would use Spark over a pandas job because the data will not fit in driver memory once we scale to 100M rows, and over Dask because our team already has Spark in production and the operational story is simpler" does. Second, failure-mode naming. L5 candidates are expected to state 3 failure modes per component in a design round: what happens when this Kafka broker dies, what happens when this Snowflake warehouse hits the credit limit, what happens when this MERGE operation runs while a backfill is also writing. Third, mid-round pivot. The interviewer will change a requirement halfway through: SLA tightens from 15 minutes to 1 minute; data volume jumps 100x; the BI tool cannot handle table swaps. The L5 signal is adapting cleanly without throwing out the existing design.
The senior data engineer SQL bar tests seven advanced patterns: recursive CTEs for hierarchy and graph traversal; gap-and-island for streak detection; sessionization with LAG plus SUM OVER; SCD2 half-open joins; EXPLAIN plan reading; skew handling with salt-and-rebalance; and idempotent MERGE for late-arriving data. The mid-level catalog covers JOIN, GROUP BY, and basic windows; the senior tier composes these into multi-CTE queries with explicit edge-case handling for NULLs and ties. The optimization round at L5+ almost always centers on EXPLAIN reading: the interviewer hands you an EXPLAIN ANALYZE output showing a sequential scan where you expected an index seek, and the question is what is wrong (typically a function wrapped around the indexed column in WHERE, an implicit cast, or stale statistics).
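As an illustration of the sessionization pattern named above, here is a minimal sketch that flags session starts with LAG and numbers sessions with a running SUM OVER. It runs the SQL through Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The `events` table, its columns, the sample rows, and the 30-minute inactivity threshold are all assumptions made for the example, not part of any specific catalog problem.

```python
import sqlite3

# Hypothetical page-view events: (user_id, event time in epoch seconds).
EVENTS = [
    ("u1", 0),
    ("u1", 600),    # 10 minutes after the previous event: same session
    ("u1", 3000),   # 40-minute gap: new session
    ("u1", 3300),
    ("u2", 100),
    ("u2", 5000),   # gap longer than 30 minutes: new session
]

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity threshold

SESSIONIZE_SQL = """
WITH flagged AS (
    SELECT
        user_id,
        ts,
        -- LAG fetches the previous event time for the same user.
        -- The first event of a user (LAG is NULL) always starts a session.
        CASE
            WHEN LAG(ts) OVER (PARTITION BY user_id ORDER BY ts) IS NULL THEN 1
            WHEN ts - LAG(ts) OVER (PARTITION BY user_id ORDER BY ts) > :gap THEN 1
            ELSE 0
        END AS is_new_session
    FROM events
)
SELECT
    user_id,
    ts,
    -- A running sum of the start flags turns them into a session number.
    SUM(is_new_session) OVER (
        PARTITION BY user_id
        ORDER BY ts
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS session_no
FROM flagged
ORDER BY user_id, ts
"""


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Return (user_id, ts, session_no) rows for the given events."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (user_id TEXT, ts INTEGER)")
        conn.executemany("INSERT INTO events VALUES (?, ?)", events)
        return conn.execute(SESSIONIZE_SQL, {"gap": gap}).fetchall()
    finally:
        conn.close()


rows = sessionize(EVENTS)
for row in rows:
    print(row)
```

The explicit NULL branch is the kind of edge-case handling the senior tier asks for: without it, the first event per user would fall through to the ELSE branch and the session numbering would start at 0.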
The senior data engineer design round is end-to-end: 10M events per day with a 15-minute SLA, multi-region replication, idempotent reconciliation for late events, schema evolution without downtime, and the on-call story (what gets paged, who responds, what the runbook says). The L6 follow-up is usually a meta-question: how would you design the data platform itself (the orchestrator, the lineage system, the catalog)? It is less about the pipeline you would build for one use case and more about the substrate the whole org would build pipelines on.
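The headline numbers in that scenario are worth converting into rates before drawing any boxes. The sketch below does the back-of-envelope arithmetic. Only the 10M events per day and the 15-minute SLA come from the scenario; the 1,000-byte average event size and the 5x peak-to-average factor are placeholder assumptions you would replace with whatever the interviewer gives you.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60


def envelope(events_per_day, sla_minutes, avg_event_bytes, peak_to_average):
    """Back-of-envelope sizing for an event pipeline."""
    avg_eps = events_per_day / SECONDS_PER_DAY
    return {
        "avg_events_per_sec": avg_eps,
        "peak_events_per_sec": avg_eps * peak_to_average,
        "daily_gb": events_per_day * avg_event_bytes / 1e9,
        # Average number of events landing inside one SLA window.
        "events_per_sla_window": events_per_day * sla_minutes / MINUTES_PER_DAY,
    }


estimate = envelope(
    events_per_day=10_000_000,  # from the scenario
    sla_minutes=15,             # from the scenario
    avg_event_bytes=1_000,      # assumption: ~1 KB per event
    peak_to_average=5,          # assumption: peak traffic is 5x the average
)

for name, value in estimate.items():
    print(f"{name}: {value:,.1f}")
```

Under these assumptions the average rate is roughly 116 events per second and the raw volume about 10 GB per day, which is the sort of number that lets you argue for micro-batch over streaming, or the reverse once the interviewer tightens the SLA.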
Senior data engineer Python rounds add the complexity reasoning that mid-level rounds skip: Big-O articulation for every data structure choice, memory-bounds analysis (when a list of dicts blows up versus a generator), library familiarity (pandas, Polars, asyncio, tenacity), and trade-off articulation (dict versus sort-and-iterate, generator versus list, async versus sync). A senior data engineer is expected to default to the right choice and explain why.
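To make the list-versus-generator trade-off concrete, here is a small sketch that aggregates the same rows both ways. The row shape and the function names are invented for the example. Note that `sys.getsizeof` on a list measures only the list's own pointer array, not the dicts it references, so the real gap is larger than the printed numbers suggest.

```python
import sys
from typing import Dict, Iterable, Iterator, List


def rows_as_list(n: int) -> List[Dict[str, int]]:
    """Materialize every row up front: O(n) memory."""
    return [{"id": i, "amount": i * 2} for i in range(n)]


def rows_as_generator(n: int) -> Iterator[Dict[str, int]]:
    """Yield one row at a time: O(1) memory, but single-pass."""
    for i in range(n):
        yield {"id": i, "amount": i * 2}


def total_amount(rows: Iterable[Dict[str, int]]) -> int:
    """Streaming aggregation that works on any iterable of rows."""
    return sum(row["amount"] for row in rows)


N = 100_000
as_list = rows_as_list(N)
as_gen = rows_as_generator(N)

list_container_bytes = sys.getsizeof(as_list)  # pointer array only
generator_bytes = sys.getsizeof(as_gen)        # fixed-size generator object

list_total = total_amount(as_list)
gen_total = total_amount(as_gen)

# The generator is now exhausted; a second pass sees no rows.
second_pass_total = total_amount(as_gen)

print(f"list container: {list_container_bytes:,} bytes")
print(f"generator:      {generator_bytes:,} bytes")
print(f"totals match:   {list_total == gen_total}")
```

The explanation an interviewer is listening for is the second half of the trade-off: the generator keeps memory flat, but it can be consumed only once, so anything that needs two passes or random access (sorting, joining against itself) forces materialization or a different design.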
Companies whose data engineer L5+ loops appear in interview reports: Meta (E5, E6, E7), Amazon (L5, L6), Google (L5, L6, L7), Netflix (Senior, Staff), Stripe (E4, E5, E6), Databricks (L5, L6, L7), Snowflake, Airbnb, Uber. Each has its own rubric calibration. Use the company-specific list for company-tagged senior-level problems.
- What is the bar difference between L4 and L5 data engineer?
- Three dimensions. Trade-off articulation: L5 names 2 alternatives and defends the choice; L4 produces a correct solution. Failure-mode naming: L5 states 3 failure modes per component in a design round; L4 produces a high-level architecture. Mid-round pivot: L5 adapts cleanly when the interviewer changes a requirement halfway through; L4 often has to throw out the design and restart.
- What advanced SQL is tested at L5 data engineer interviews?
- Seven patterns: recursive CTEs for hierarchy or graph traversal, gap-and-island for streak detection, sessionization with LAG and SUM OVER, SCD2 half-open joins, EXPLAIN plan reading and predicate-pushdown reasoning, skew handling with salt-and-rebalance, and idempotent MERGE for late-arriving data. The mid-level catalog covers JOIN, GROUP BY, basic windows; the senior tier composes these into multi-CTE queries with explicit edge-case handling.
- What does the optimization round look like at L5+?
- The interviewer hands you a query and its EXPLAIN ANALYZE output. Typical scenarios: a sequential scan where you expected an index seek (likely a function wrapped around the indexed column in WHERE, an implicit cast, or stale statistics), a JOIN stage with one long-running task (likely skew on the join key; the fix is to salt and rebalance), or a window function that is slower than expected (likely a missing PARTITION BY or an inefficient frame clause). The bar is identifying the cause from the plan alone and proposing the fix.
- How is the design round different at L5 versus L4?
- L4 design round: produce a working high-level architecture for the given scenario. L5 design round: same scenario but the rubric weights '3 failure modes per component', explicit cost reasoning (back-of-envelope numbers), and the on-call story (what gets paged, who responds, what is the runbook). The interviewer typically changes a requirement halfway through to test the mid-round pivot.
- What is expected for staff (L6) data engineer interviews?
- L6 weights org-level design influence: how would you design the data platform itself (the orchestrator, the lineage system, the catalog, the testing framework) versus 'how would you build this one pipeline'. Behavioral rounds probe technical leadership (driving a multi-team migration, defining the technical strategy, mentoring senior engineers). The bar is the substrate the org builds on, not the surface workload.
- How should a data engineer prepare for the mid-round pivot?
- Practice with a peer or AI mock interviewer that explicitly changes the requirements halfway through. Common pivots: SLA tightens from 15 minutes to 1 minute (requires moving from micro-batch to streaming), data volume jumps 100x (requires partitioning strategy review, broadcast versus sort-merge join decision flip), the BI tool cannot handle table swaps (requires materialized view or insert-overwrite pattern instead of CTAS). The L5 signal is articulating what changes and what stays in the existing design without throwing it out.
- What about the behavioral round at senior data engineer level?
- Senior behavioral rounds probe ownership ('tell me about a time you caught a bug in production data nobody else noticed'), trade-off judgment ('tell me about a time you chose to ship the imperfect version'), and disagreement ('tell me about a time you disagreed with a senior engineer and how you resolved it'). STAR-D format: Situation, Task, Action, Result, Decision-postmortem. The decision-postmortem (what I would do differently) is the senior signal.
- How many mocks should I do before a senior data engineer onsite?
- 3 timed mocks in the final 2 weeks: one SQL+modeling, one Python+PySpark, one full design round. Plus 1-2 behavioral mocks with someone in the same level band. The part most candidates fail is not the technical content; it is the narration under pressure and the mid-round pivot. Mocks expose both.
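One of the seven SQL patterns named in the answers above, idempotent MERGE for late-arriving data, is easy to describe and easy to get subtly wrong, so a sketch may help. SQLite has no MERGE statement, so this example swaps in `INSERT ... ON CONFLICT DO UPDATE` (available from SQLite 3.24) to show the same idea: a version guard means an older, late-arriving row never overwrites a newer one, and replaying the whole batch changes nothing. The `orders` table, its columns, and the sample batch are assumptions made for the example.

```python
import sqlite3

UPSERT_SQL = """
INSERT INTO orders (order_id, status, updated_at)
VALUES (:order_id, :status, :updated_at)
ON CONFLICT(order_id) DO UPDATE SET
    status = excluded.status,
    updated_at = excluded.updated_at
-- Guard: only apply the incoming row if it is strictly newer.
WHERE excluded.updated_at > orders.updated_at
"""


def apply_batch(conn, batch):
    """Apply a batch of change records inside one transaction."""
    with conn:
        conn.executemany(UPSERT_SQL, batch)


def snapshot(conn):
    return conn.execute(
        "SELECT order_id, status, updated_at FROM orders ORDER BY order_id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id TEXT PRIMARY KEY, status TEXT, updated_at INTEGER)"
)

# Hypothetical batch in which the older 'placed' event for o1 arrives late.
batch = [
    {"order_id": "o1", "status": "shipped", "updated_at": 200},
    {"order_id": "o1", "status": "placed", "updated_at": 100},
    {"order_id": "o2", "status": "placed", "updated_at": 150},
]

apply_batch(conn, batch)
after_first_run = snapshot(conn)

apply_batch(conn, batch)  # simulate a retry or an overlapping backfill
after_replay = snapshot(conn)

print(after_first_run)
print(after_replay == after_first_run)
```

The strict `>` in the guard is a design choice: it makes replays of the same record a no-op. If two different updates can share a timestamp, you would need a tie-breaker such as a sequence number, which is the kind of failure mode worth naming out loud in the round.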
356 practice problems matching this filter. Domains: Pipeline Architecture (119), SQL (133), Data Modeling (39), Python (65). Difficulty: medium (153), hard (166), easy (37).
Pipeline Architecture (119)
- 45 Minutes Turned Into 3.5 Hours - medium - Spark jobs are running. Just not fast enough.
- 600 Million Events a Day - hard - 600 million events a day. Two years of retention.
- A Clean Number for Every Merchant - hard - Raw payment logs in. Clean merchant summaries out.
- A Million Cars Phoning Home - hard - Every vehicle is a sensor. Deploy the pipeline to catch it all.
- Analysts Are Slowing the Store Down - medium - Orders placed. Data warehouse hungry.
- A New Column on a Billion Rows - hard - Add and backfill a new column to a billion-row production table with zero downtime.
- A Shared Drive Full of Contracts - medium - Buried in PDFs. The data is in there somewhere.
- A Stream All Day and a File at Midnight - hard - Real-time and batch. Same pipeline. No compromises.
- Badging Items That Already Sold Out - hard - Same-day delivery. The features have to be faster.
- Basel, CCAR, and Monday Morning - medium - The regulator does not accept 'eventually consistent.'
- Bikes Before Rush Hour - hard - Bikes in, bikes out. The city needs to predict demand.
- Credit for Every Touch - medium - They saw the ad, clicked the email, then bought. Who gets credit?
- Doubling Every Six Months - hard - Tuesdays are quiet. Black Friday is not.
- Eight-Hour-Old Positions - hard - Positions shift by the second. The math cannot lag.
- Eight Teams, Eight Latencies - medium - Millions of gamers. The architecture decision changes everything.
- End of Day Is Too Late - medium - Every swipe tells a story.
- Equities, ETFs, and the SEC - hard - Fractional shares, multi-currency, point-in-time. All of it.
- Event System for Multiple Consumers - hard - One event, many hungry consumers.
- Every Dataset Needs a Paper Trail - hard - The FDA has opinions about your data pipeline.
- Every Deal Is a Financial Transaction - hard - Real money on the table. Reconstruct every hand.
- Every Device, Every Impression - hard - Every ad seen. Every second watched. Real-time.
- Every Device Has Its Own Dialect - medium - Three sources. Three formats. Same workout.
- Every Firm Formats It Differently - medium - The regulator changed the format. Again. Handle it.
- Every Format Imaginable - hard - PDFs, HL7, JSON. All of it lands in the same lake.
- Everyone Wants the Same Data, Differently - hard - How you store it decides how fast you can read it.
- Every Region Exports Its Own Way - medium - Sales data, BigQuery, Dataflow. Make it all sing.
- Every Scan, Every Parcel, Every Pin Code - medium - Out for delivery. Delivered. Except the events arrived backwards.
- Fifty Thousand Retailers - medium - Retail data at CPG scale. Every SKU, every store.
- Five Times the Traffic, Five Times the Bill - hard - Scale up when needed. Do not bankrupt the team.
- Five Years of Cron Jobs - hard - Half the jobs run on cron. Half run on events. All of it has to move.
- Flying Blind Until Midnight - hard - Intraday risk, full lineage. The regulator is watching.
- Four Teams, One Topic, No Agreement - hard - Everybody is writing to it. Nobody documented it. Now production is fragile.
- Greenfield Build for Six Sources - hard - Infrastructure as code. Meaning as a service.
- Half a Million Rental Cars - medium - Every vehicle is reporting. Every rental matters.
- The Identity Problem - hard - Old systems. New demands. The same customer appears under three different names.
- Listens From Everywhere, Counted Once - hard - Phones, tablets, laptops. And some of them report late.
- Near-Real-Time Trending Dishes Dashboard - hard - The dish rankings update faster than the kitchen.
- Nightly Exports Are Too Slow - medium - Healthcare claims change constantly. The warehouse cannot fall behind.
- 4,500 Stores Before Sunrise - medium - The shelves open at 7. The data better be there.
- Not Every Team Can See Every Row - hard - Everyone can see the bucket. Not everyone should.
- One Bill Across Three Clouds - medium - AWS, Azure, GCP. Three bills. One truth.
- One Earthquake, Ten Thousand Tweets - hard - The firehose is on. Separate signal from noise.
- Out of the Data Center - medium - The on-prem servers are not getting any younger.
- The Speed Layer - medium - Dashboards can't wait for raw logs. Something has to happen upstream.
- Prove the Number Is Right - hard - Bad data in fintech is not just messy. It is expensive.
- Real Data, Fake Patients - hard - Dev needs production data. HIPAA says absolutely not.
- The Register Never Sleeps - medium - Every swipe lands in the warehouse. The table has to stay current without breaking.
- Recommendations Now, Royalties Later - medium - The catalog updated. Did anyone notice?
- Replicate It Without Breaking It - hard - The source changed. The lake needs to know immediately.
- Risk Models on Week-Old Data - medium - Loan approved. Loan denied. Every decision is an event.
- SaaS API Connector with Incremental Sync - medium - The API has rate limits. You have deadlines.
- Same-Day Sales, Every Store - medium - The cash register data needs to be queryable by morning.
- The Living Table - medium - Data lands continuously. History must survive every update.
- Score It Before It Clears - hard - The fraudsters move fast. Your pipeline has to move faster.
- Ship Before Fraud Finishes Checking - hard - The claim looks clean. The fraud model disagrees.
- Six Hours to Miss a Deadline - medium - The rebuild works. It just doesn't finish in time.
- Six Hours to Refresh Every Number - medium - Ratings change. The incremental model has to keep pace.
- Six Million Rows Before the Market Opens - medium - One massive CSV. Millions of timestamps.
- Six Sources, One Platform - medium - ADF orchestrates. Unity Catalog governs. Nothing leaks.
- Sixty Minutes, Every Hour - medium - Every hour, on the hour. No excuses.
- Stores and the Site, Together - hard - The registers never stop ringing.
- Store, Site, and Distributor - medium - Sales data is piling up. Someone has to make sense of it.
- The Acquisition Still Taking Bookings - hard - Two systems, two schemas. One truth.
- The Agency That Changes the Columns - medium - The schema changed overnight. Again.
- The Analysts Cannot Touch Production - medium - Production is the source. Analytics needs its own copy.
- The Analyst Who Saw the Salary Data - hard - Two incidents. One shared lake. The access model was never designed, just assumed.
- The API Drip Feed - medium - The API gives you 100 records at a time. You need millions.
- The Bad Row That Broke the Dashboard - medium - Bad records cannot reach the warehouse.
- The Binding and the Claim - medium - Policies are instant. Claims take their time.
- The Booking That Came Three Ways - hard - PMS, OTA, and website all think they took the reservation first.
- The Boutique That Sold in Six Currencies - hard - Every sale is real. The rate it was converted at depends on who is asking.
- The Bucket Full of Resumes - medium - A thousand resumes. Structured data inside each one.
- The Carrier Moving to Azure - medium - Claims arrive messy. The medallion cleans them up.
- The Claim That Picks Its Own Lane - medium - Three entry points. Different workflows. All must route correctly.
- The Clock That Runs Two Ways - hard - Nightly batch and live events. One dashboard.
- The Consent Stitcher - medium - Consent was given. Or was it? Stitch the records together.
- The Dashboard and the Attribution Model - hard - Streaming and batch. One pipeline to rule them.
- The Decision Before the Door Closes - hard - The window to stop it is smaller than you think.
- The Distributor Filing Problem - medium - Hundreds of suppliers. One warehouse. One deadline.
- The Event Pile - hard - 600 million clicks a day. The budget is not infinite.
- The Fare Aggregator - medium - Airfares shift every minute. Catch the best ones.
- The Fleet That Never Stops - hard - Every truck is talking. Not everyone can hear them yet.
- The Leaderboard That Costs $25K a Month - hard - Product wants it live. Engineering has a price tag.
- The Meal Kit That Knows You - medium - What they ordered says a lot about what they want next.
- The Migration That Cannot Break Morning - hard - It all works today. Moving it without losing a single report is the hard part.
- The Models Going Stale - hard - The model is only as good as what you feed it.
- The Panel and the Set-Top Boxes - hard - Set-top boxes tell you who watched. Projection tells you how many.
- The Patients We Cannot Move - hard - Patient data stays local. Insights have to be global.
- The Points Arrive Two Days Late - medium - The bank data shows up late. The rewards were already sent.
- The Provider That Sometimes Sleeps - medium - The models run at dawn. The data has to be there first.
- The Query That Used to Be Fast - medium - Queries used to be fast. Something changed.
- The Queue That Wouldn't Stop Growing - medium - 500,000 messages behind and the number keeps climbing.
- The Revenue That Was Wrong for Two Weeks - medium - Nobody caught it until the CFO asked a question. Design the system that catches it first.
- The Sale That Needs to Land Now - medium - Three channels feeding one view. Not all of them speak the same language.
- The Signals That Power Recommendations - medium - Fresh signals, many teams, one pipeline.
- The User Who Asked to Be Forgotten - hard - Users want their data erased. Completely.
- The Vendor Who Never Warns You - medium - Every month, something is different. The dashboards have no idea.
- The What-If Machine - hard - A million slots. A thousand campaigns. Every combination matters.
- The Whiteboard Exercise - medium - Marker in hand. Draw the whole thing.
- Thirty Cities, One Forecast - hard - Five cities. Five data formats. One prediction.
- Thirty Countries, One Solvency Number - hard - Premiums collected globally. Losses happen locally.
- Thirty Million Unique Jobs a Year - hard - One press run, many orders. Group them right.
- Thousands of Practices, One Dataset - hard - Patient records in, operational insights out.
- Three Providers, One Workout - hard - The same ride, reported three times.
- Three Regions, One Finance Team - hard - Payments from everywhere. One consistent report.
- Three Regions, One Report - hard - Three regions, billions of payments, one merchant summary by 6 AM.
- Towers and Phones, Same Story - hard - Tower signals meet app events. Somewhere in between is the truth.
- Traders, Risk, and the Regulators - medium - Markets move in milliseconds. The pipeline has to keep up.
- Two Million Boxes by Monday Morning - hard - Shipped, maybe. Delivered, debatable.
- Two Systems, One Room Count - hard - Two booking systems. Rooms do not duplicate themselves.
- Two Ways to Catch a Change - medium - Two ways to watch the database. Each has a cost.
- Two Years of Every Click - hard - Every click, every aisle, every day for two years.
- Two Years of Clicks, Cheap - hard - Two years of clicks. Every query has to be affordable.
- What Everyone Is Watching - hard - Someone is watching. Capture everything.
- What Should We Recommend Tonight - hard - They ordered pad thai twice. That means something.
- Where Is Every Truck, Right Now - medium - Trucks are moving. Every ping counts.
- Which Promotion Is Actually Working - hard - Was the promotion worth it? The data knows.
- Who Is Churning and Why - medium - Subscribers churn. The pipeline cannot.
- Who Saw the Ad Twice - hard - TV and digital. Same viewer, two measurement worlds.
SQL (133)
- 7-Check Rolling Average - medium - Seven entries hold the trend.
- 7-Day Onboarding Conversion - hard - Signed up Monday. Still here by Sunday?
- 7-Day Token Retention - medium - Premium tokens, day by day.
- Above Average Interactions - easy - The average user is boring. Who is above?
- Above Category Average - easy - The category average is one thing. These beat it.
- Active User Penetration Rate - hard - How much of the user base is actually alive?
- Adopters Before Migration - hard - They used the old feature. Did they ever touch the new one?
- The Vote Tally - hard - Net revenue, day by day, for one product in one region.
- Alert Severity - hard - When the alarms go off, who screams loudest?
- App Stability by Region - medium - Some regions crash more than others.
- Best Day for Ad Revenue - medium - One day of the month outperforms the rest.
- The Budget Line - easy - Some rows are over. Some are under. Label every one.
- Cache Efficiency - hard - Some edges run hot. Others coast on the global average.
- Campaign Bookend Engagement - hard - First impression versus last. The gap.
- The Notification That Paid Off - hard - The message went out to thousands. A smaller number actually bit.
- Campaign Conversion Window - hard - A narrow window between impression and action.
- Campaign Engagement Rank Shift - hard - Two months, many countries. Who moved up? Who fell?
- Category Deep Dive - hard - Revenue, units, rank. The full category report card.
- Cheapest CDN Route - easy - The cheapest path across regions.
- Commit Cadence - hard - Some repos go quiet for too long.
- Consecutive Cost Growth Periods - hard - Five straight months of spending increases.
- Content Recommendation Engine - medium - Pages they haven't discovered yet.
- Cost Efficiency Variance - hard - Cost efficiency varies. By how much?
- Deploy Velocity Swings - medium - Month to month, who sped up and who stalled.
- The Freshest Record - medium - Duplicates everywhere. Only the most recent version of the truth survives.
- The Clean Aisle Numbers - medium - Clear the noise. What did each category actually earn?
- Department Running Totals - medium - Compute cumulative metric values within each department using window operations.
- Deploy Velocity - hard - Days between deploys. Some services ship fast, others crawl.
- Early Commit Velocity by Author - medium - How productive was each author during the first year of a repo's CI pipeline?
- Employees Per Department - easy - Headcount, location by location.
- The Failure Report - medium - Errors by day and region. Some areas are worse than they appear.
- Experiment Impact - hard - Which experiments moved the needle? Settle the standings inside every variant.
- Extreme Category Totals - medium - The highest and the lowest. Both are interesting.
- Fastest Completion Per Day - medium - Every day has a speed champion.
- Fastest Page View to Click - hard - How fast from view to click?
- Feature Flag Engagement Impact - hard - Flags on versus flags off. The engagement gap.
- Feature Flag Fan vs Detractor Pairs - hard - Some users love the flag. Others want it gone.
- First Interaction Credit - hard - Attribute transactions to the earliest touchpoint.
- Flatten Org Chart Hierarchy - hard - The tree runs deep. Walk every branch to the root.
- Friday Sessions for Shared Experiments - medium - Friday vibes only. Same experiment, different users.
- Engagement Depth by Event - hard - Where users actually spend their attention.
- Ghost Products - medium - Listed but never sold. The shelves collect dust.
- Heavy Ad Exposure - medium - Saturated with ads. Is it too much?
- Idle Team Members - easy - Sprint started. Some people never got assigned.
- Intra-Region Latency Diff - hard - Same region. Different latency.
- The Full Picture - easy - Two tables know different things about the same people. Combine them.
- Keep Most Recent Record - medium - Carbon copies clutter the table. Only the latest matters.
- Largest Group - easy - One group towers above the rest.
- Latest Commit Build Cost - medium - The latest commit came with a build cost.
- Longest Gap Between Token Events - medium - The longest gap between token events.
- Longest Uptime Streak - hard - Pass, pass, pass. How long until fail?
- Longest Visit Streaks - hard - Day after day after day. Who kept coming back?
- Long Messages - medium - Some commit messages tell a novel.
- Market Share - hard - Every category wants a bigger slice.
- Max Value Per Location - easy - Every location has a peak.
- Median Cloud Cost by Service - hard - The median cloud bill, service by service.
- Median Failure Rate by Table - hard - Half the tables fail more than this.
- Median Household Earnings - hard - Household earnings. The median reveals the middle.
- Median Model Accuracy - hard - The median accuracy. Not the mean.
- Metric Range Per Group - easy - The spread within each group.
- Mid-Range Team Spenders - hard - Above average but not extreme.
- Minimum Parallel Workers - hard - Too few workers and it stalls.
- Model Accuracy Drift - hard - Accuracy used to be higher.
- Monthly Category Totals - easy - Sum amounts by category and month.
- Monthly Running Total - medium - Cumulative sales per product across months.
- Monthly Service Retention - hard - Users came back. Or they did not.
- The Three-Way Report - medium - Three tables. One summary. Every piece depends on the others.
- Negative Outcome Rate for New Users - medium - New users have a rough first two weeks.
- New Customers Per Day - medium - Count users whose first order falls on each date.
- Normalization Tradeoffs in Practice - hard - Clean data or fast queries? You can't always have both.
- Nth Highest Salary Per Department - medium - Third place in every department.
- Nth Largest Value - easy - Select the row with a specific rank position.
- Peak Activity by Device - easy - Activity windows, device by device.
- Peak Concurrent Tokens - hard - How many tokens were alive at the same time?
- Pipeline Duration vs Throughput - hard - Does throughput correlate with duration?
- Pipeline Throughput Ratio - easy - Compute current-to-initial value ratio per period.
- The Event Breakdown - medium - Events are piling up by type. The report needs them side by side.
- Price Pairs - hard - Same shelf, wildly different stickers. Spot the pricing gaps.
- Quarter-over-Quarter Latency Trend - hard - Latency trending up or down? The quarters have the answer.
- Rapid Retry Detection - medium - Detect retried API calls within 5 minutes of failure.
- Recent Price Drops - medium - The price just dropped. Who noticed?
- The Subscription Ghost - medium - Some charges come back to haunt the same card a month later.
- Repeat Purchases Within a Week - medium - They bought again within seven days.
- Response Buckets - medium - Fast, normal, or slow. Every API call gets a verdict.
- Retried Failed API Calls - medium - Spot users who retry API calls within 5 minutes of a failure.
- Rolling Revenue Average - hard - Smooth out the revenue bumps. The trend matters more.
- Rolling Weekly Total - medium - Seven days at a time, the totals keep rolling forward.
- Honeymoon Phase - medium - How many wallets stay loyal the same year they say "I do"?
- Same First and Last Reply Target - medium - They started and ended the month messaging the same person.
- Second Purchase - hard - The first buy is curiosity. The second is commitment.
- Back From the Brink - hard - Roll it back, then nail the next one.
- Service Uptime Minutes - medium - Status changed. How long was it actually up?
- Service Uptime Turnaround - hard - It was down. Then it came back. Stronger.
- Session Page View Distance - hard - Page view distance per session.
- Shared Channel Contacts - hard - User networks mapped through messages.
- The Conversion Story - medium - Signups are one thing. Paid purchases are another. Find the gap by source.
- Slow Batch Jobs - easy - Promised by noon. Delivered at midnight.
- The Address That Changed - hard - Addresses change. History must not be erased.
- Smooth Latency - medium - Noisy latency readings, smoothed into a trend you can trust.
- Spend and Rank - hard - Five thrones at the top of the spending leaderboard.
- Spending Range - hard - Between the smallest purchase and the biggest lies the story.
- Streak Status Changes - hard - Detect value changes across consecutive rows.
- Subscribers Without Premium - medium - Subscribed. But never upgraded.
- Successful Call Volume per Endpoint - medium - Not every ping is honest.
- Team Cost Allocation Comparison - hard - Individual spend versus team average.
- The Cannibalization Report - hard - The new product launched. The old one suffered.
- The Latest Transaction Per Product - medium - Every product has a last sale. When was it?
- The Regional Cost Reconciliation - hard - Two cost tables, one region. Reconcile the running balance.
- The Session Stitcher - hard - Page views without sessions are just noise.
- Top Average By Region - easy - Region by region, who pulls the best average?
- Top Campaign by User Revenue - medium - Which campaign made each user spend the most?
- Top Commit Authors by Repo - hard - Three authors per repo. The top committers.
- Top CPU Pods per Namespace - hard - The two most CPU-hungry pods in each namespace.
- Top Endpoint by Power Users - hard - Power users have a favorite endpoint.
- Top Flagged Campaign Resolutions - hard - Flagged the most. Resolved how?
- Top Lessons Each Month - medium - Rank items within time periods and keep the top 3.
- Top Models by Framework - hard - Every framework has a star model.
- Top Percentile API Tokens - hard - The most suspicious tokens.
- Top Percentile Spenders - medium - Top 1% of users by total spend via percentile bucketing.
- Top Recent Sellers - easy - Fresh data, top sellers. The recent leaderboard.
- Top Regions by High CPU Nodes - hard - Five regions with the hottest CPUs.
- Top Selling Items - easy - Revenue crowns the winners. Who sold the most?
- Top Services Per Provider - medium - Within each cloud, two services rise above the rest.
- Transaction-Only Features - hard - Exclusive to one source. Missing from the other.
- Unique Hostnames per Region - medium - How many distinct machines live in each region?
- Upvote Percentage by Age Cohort - hard - New users versus existing. The upvote gap.
- User 360 - hard - One row per user. Everything they did, or didn't do.
- User Campaign Overlap Percentage - hard - How much ad overlap between users?
- User Connection Score - hard - Every user has a social score.
- User Spend Segmentation by Category - hard - Users segmented by spending behavior.
- Weekly Build Status Report - hard - Every CI run, bucketed by week.
- Weighted Variant Selection - hard - Select a row using cumulative weight probabilities.
- YoY Signup Growth Rate - hard - This year versus last year. Growing or shrinking?
Data Modeling (39)
- A/B Experiment Assignment Schema - medium - One user, one experiment, one variant. No exceptions.
- Airline Flight Operations Schema - medium - Flights, passengers, and routes. Before you draw a single table, tell me the grain.
- Clickstream and Session Schema - medium - Millions of clicks, mostly anonymous.
- Cloud File Storage Metadata Schema - hard - A file is also a folder. A folder is also a file.
- Content Engagement Data Model - hard - Post published. Now measure everything that happens next.
- E-Commerce Supply Chain Tracking - hard - A package splits, reroutes, and (maybe) arrives.
- Employee Transfer Tracking System - medium - People switch teams. HR loses track.
- Financial Trading Warehouse - hard - Every trade, every tick, every fraction of a share. The regulators want receipts.
- Fitness Studio Membership Schema - easy - Classes fill up. Members no-show. Billing continues.
- Food Truck Operations Data Model - medium - Mobile vendor, fixed menu, unpredictable locations.
- Insurance Claims Lifecycle - hard - A claim gets filed. Then it gets complicated. Then it gets reassigned. Then it loops back.
- Livestream Analytics Schema - medium - Someone goes live, thousands tune in, chat explodes, and virtual gifts start flying.
- Log Parsing Pipeline Schema - medium - Raw text files, terabytes of them, full of buried signals and cryptic error codes.
- Machine Process Event Log Schema - medium - Machines fire events. Pair them up before they bury you.
- Metric Definition Reverse Engineering - hard - Five numbers on a dashboard. Your job: figure out where they come from.
- Movie Streaming Analytics Schema - medium - They pressed play. What happened next is the whole question.
- Multiplayer Game Match History - medium - Millions of matches. The leaderboard refreshes in fifteen minutes.
- Online Marketplace: Seller Payouts - hard - The buyer paid one number. The seller got a different one.
- POS Sales Data Warehouse - medium - Every beep at the register. Coupons, returns, all of it.
- Property Booking Platform - hard - Five-star listing. Three-star reality.
- Retailer Data Warehouse Design - medium - Queries are crawling. The analysts are not happy.
- Ride-Sharing Platform Schema - medium - Riders, drivers, and fares. Everyone takes a cut.
- The Customer Who Changed - hard - She moved. She upgraded. She became someone new. The record has to keep up.
- Social Platform Data Model - medium - Follows, likes, replies to replies. It never stops.
- Subscription and Payment Data Model - medium - Two user types. Multiple payment methods. One messy billing table.
- Subscription Churn Analysis Model - medium - Subscribers are leaving. The data knows why.
- Telecom Network Connectivity Warehouse - hard - One device goes down. The ripple keeps going.
- The Celebrity Problem - medium - One post. A million notifications. Something has to give.
- The Churner Who Came Back - hard - They cancelled. They came back. The report has to tell both stories correctly.
- The JSON Files That Became a Data Mart - medium - Three semi-structured inputs. One queryable warehouse.
- The League With Too Many Loyalties - hard - A player can belong to many teams. The schema must agree.
- The Plan That Changed Twice This Month - medium - Subscribers come, go, downgrade, and share. The schema has to keep up.
- The Retail Tables That Need a New Home - medium - A working system. Now redesign it so the analysts can actually use it.
- The Schema That Could Not Answer Back - hard - Forty columns in. Zero useful answers out.
- The Table That Lies - medium - Every query comes out wrong. The data is all there.
- The Talent Funnel - medium - Thousands applied. One accepted. Where did the rest go?
- The Territory That Keeps Moving - hard - Reps get reassigned. The receipts have to survive.
- Three-Sided Marketplace Delivery Schema - hard - One order. Two deliveries. Revenue counted twice. Where is the bug in your schema?
- Trending Dishes Dashboard - medium - What's everyone eating? The answer changes hourly.
Python (65)
- Activity Time Ledger - easy - Matching activities. One runtime.
- Character Occurrence Map - easy - Character frequency as a map.
- Dice Roll Scoring - medium - The pattern rewards the patient.
- Dictionary Key Intersection - medium - Two dictionaries. What do they share?
- Execution Timer Wrapper - medium - Function wrapped with a timer. Duration captured on exit.
- Flatten the Nest - easy - Mixed nesting. One flat list out.
- Merge Intervals - hard - Overlapping ranges. Merge them.
- Merge Overlapping Time Ranges - medium - Intervals piling up. Clean the timeline.
- No Shortcuts - easy - The peak value. Built-ins off the table.
- Ordered Character Check - easy - Check if all As appear before all Bs.
- Palindrome Hunt - medium - It reads the same both ways. Go further.
- Permissions Manager - medium - Manage user permissions with config updates.
- Portfolio Profit Calculator - medium - Portfolio gain from purchase history and current prices.
- Precision and Recall - medium - Precision and recall. Both matter.
- Prefix Based Word Replacement - medium - Every word trimmed to its root.
- Progress Milestones - easy - Progress at every 10% increment. Keep the receipts.
- Every Line on the Receipt - medium - Nested deep inside the receipt. Pull every item out.
- Shortest Unique Metric Tag - medium - One token per metric. Make it unambiguous.
- Stream-Process a Large CSV - hard - Too big to load. Read what you can.
- The Anomaly Detector - hard - Spot the outliers before they page someone.
- The Bipartite Test - medium - Can this crowd be split into two perfectly separated groups?
- The Bracket Validator - easy - Brackets opened and closed. The nesting might be off.
- The Category Ranker - medium - Categories have standing. Rows get theirs.
- The Chain Transform - medium - One small step at a time can cover a great distance.
- The Change Data Capture - hard - Inserts, updates, deletes: all present.
- The Character Map - easy - Character-level frequency. As a dictionary.
- The Column Shuffle - medium - Rows in, columns out. Number them.
- The Consecutive Sequence Finder - medium - Numbers that flow without interruption.
- The DAG Executor - hard - Wire up a mini pipeline and watch it run.
- The Deep Config - medium - Nested config, dot-notation output.
- The Deep Dictionary - easy - One key goes further than the rest.
- The Dependency Resolver - medium - Everything depends on everything.
- The Event Bucketer - easy - Logs slotted into buckets.
- The Event Overlap Detector - medium - Overlapping events. The calendar knows.
- The File Tree Builder - medium - Flat paths. Build the nested tree.
- The Forgetful Machine - medium - It remembers everything, until it does not.
- The Frequency Eviction - hard - When storage is tight, something has to go.
- The Hierarchy Builder - hard - Parent-child pairs, flat. Build the family tree.
- The Impersonator - medium - You only have stacks. Make a queue anyway.
- The Indivisibles - easy - Numbers that yield only to themselves.
- The Infection Spread - hard - It starts with one, and then it spreads.
- The Lazy Stream - hard - Yield values one at a time from a potentially infinite source.
- The Lone Character - easy - It appeared exactly one time. That made it special.
- The Median Keeper - hard - The middle value keeps moving as new data arrives.
- The Meeting Room Allocator - hard - Meetings overlap on the calendar. Rooms are limited.
- The Middle Ground - hard - The middle value keeps moving.
- The Narrow Lens - medium - A narrow timeframe. Everything inside matters.
- The Nearest Value Mapper - medium - Close enough counts. Ties go low.
- The Never-Ending Sequence - easy - Sequence that keeps going. Follow it.
- The Onion Layer - hard - Peel from the outside in, one ring at a time.
- The Output Peak - hard - One stretch outpaced all the others.
- The Repeat Offenders - easy - Repetition is a clue.
- The Rotated Array - medium - Someone shuffled it. Now locate what you came for.
- The Runner-Up - easy - Not the winner. The one just behind it.
- The Schema Migrator - hard - Old schema in, new schema out.
- The Shifting Standard - medium - A benchmark in motion.
- The Stream Averager - easy - The answer moves with the data.
- The Stream Joiner - hard - Events don't wait for each other. This does.
- The Target Hunt - medium - Pairs that hit a target. Every one of them.
- The Trapped Pool - hard - What collects in the valleys after the rain?
- The Triplet Hunt - medium - Every path that works gets a seat at the table.
- The Version Parade - easy - 1.0 before 2.0. Don't let the dots confuse you.
- The Water Collector - hard - Two walls, one sky, and a very important question.
- The Word Inventory - easy - Every word, twice over.
- Triangle Validator - medium - Not every triangle is a triangle.