DataDriven

Data Engineering Mock Interview Questions

1467+ data engineering mock interview questions with AI-powered feedback. Pick your domain, target company tier, and seniority level to start a timed interview simulation. Write real code, ask clarifying questions, and get graded instantly.

Available domains: Python (387 questions), SQL (903 questions), Data Modeling (56 questions), Pipeline Architecture (121 questions). Difficulty levels: easy (534), medium (677), hard (256). Seniority levels: Junior, Mid, Senior, Staff, Sr. Staff.

Python Interview Questions (387)

  • The Dominant Signal - easy - Hottest items in the transaction log. Ties included.
  • The Original Keeper - easy - Clean up duplicate events without losing the timeline.
  • The Forward Fill - easy - Patch the gaps in a noisy sensor stream.
  • The Word Mismatch - easy - Some text does not match.
  • The Social Graph - easy - Everyone knows someone.
  • The Sequel Spotter - easy - Spot the sequels hiding in the catalog.
  • The Numbered Chair - easy - A standing list. Position n holds one entry.
  • The Character Encoder - easy - Squeeze a string down to its tightest form.
  • The One-Way Street - easy - Monotonic time-series. Direction only.
  • The IP Validator - easy - Real and fake, mixed together.
  • The Log Pulse - easy - Some lines repeat themselves.
  • The One-of-Each - easy - Strip the repeats, keep the originals.
  • The Config Blender - easy - Config collision. The surviving values after a merge.
  • Flatten the Feed - easy - Nested lists, all the way down.
  • Activity Time Ledger - easy - Matching activities. One runtime.
  • Batch With Metadata - easy - The list gets chopped.
  • Caesar Shift Check - easy - The key turns. Does it open?
  • Character Occurrence Map - easy - Character frequency as a map.
  • Coalesce Fields - easy - Nulls are hiding. Fill them in.
  • Column Max - easy - One value rules the column.
  • Column Range - easy - From minimum to maximum. What is the spread?
  • Column Sum - easy - Add up the column. Every value counts.
  • Dominant Element - easy - Majority element. Appears more than half the time.
  • Even Filter - easy - Only the even ones survive.
  • Explode List - easy - One row holds many values. Unpack it.
  • Extract Domain - easy - The domain is buried in the string.
  • Flatten the Nest - easy - Mixed nesting. One flat list out.
  • Greeting Formatter Class - easy - First impressions are formatted carefully.
  • Normalize Name - easy - Names are messy. Standardize them.
  • No Shortcuts - easy - The peak value. Built-ins off the table.
  • Null Counter - easy - How many holes in the data?
  • Ordered Character Check - easy - Check if all As appear before all Bs.
  • Progress Milestones - easy - Progress at every 10% increment. Keep the receipts.
  • Quality Gate - easy - Not everything passes inspection.
  • Quantile Calculator - easy - Mark the boundary value at a given point.
  • Record Filter - easy - Some records belong. Others do not.
  • Reverse Field - easy - Flip it. See what happens.
  • Run Length Encoding - easy - AAABBB becomes 3A3B. Compress it.
  • Sanitize Field - easy - Dirty input. Clean output.
  • Schema Checker - easy - The schema says one thing. The data says another.
  • Sequential Word Pairs - easy - Everything has a neighbor.
  • Single Element Among Pairs - easy - One element has no partner.
  • Sort Descending - easy - Biggest first. No exceptions.
  • The Account Manager - easy - Deposits, withdrawals, and the risk of going negative.
  • The Additive Chain - easy - Each value is the sum of the two before it - no calls to itself allowed.
  • The Address Surgeon - easy - One string hides a street, a city, a state, and a zip.
  • The Alphabet Score - easy - Every letter has a secret numeric value - what's your total?
  • The Alphabet Sorter - easy - Filing cabinet logic: everything goes in its proper drawer.
  • The Balanced Sum - easy - Some numbers have a rare quality that mathematicians revere.
  • The Bit Counter - easy - How many lights are on in the binary representation?
  • The Bit Ladder - easy - Count the ones all the way up.
  • The Bitwise Judge - easy - No division, no modulo - just a single bit tells you everything.
  • The Bouncer - easy - Every door has a guest list.
  • The Bronze Medalist - easy - Not first, not last - somewhere in the middle of the podium.
  • The Bug Spotter - easy - It compiles. The answer is still wrong.
  • The Calendar Sort - easy - Time has its own opinion about order.
  • The Carousel - easy - Keep moving, same ride.
  • The Character Map - easy - Character-level frequency. As a dictionary.
  • The Cipher Wheel - easy - Every letter has an alias - you just need the right codebook.
  • The Clock Angle - easy - Two hands. One gap. One number.
  • The Code Expander - easy - Compressed messages need a decoder to come alive.
  • The Column Transformer - easy - Each column gets its function.
  • The Column Zipper - easy - Headers on top, values below, dict in the middle.
  • The Complement Hunt - easy - Every number is looking for its other half.
  • The Crowd Favorite Eatery - easy - One restaurant clearly won the most hearts.
  • The Crowd Pleaser - easy - One value shows up more than all others combined.
  • The Crowd Splitter - easy - The middle holds even with a dominant outlier.
  • The Decomposer - easy - Every composite thing can be broken down to its simplest parts.
  • The Deep Dictionary - easy - One key goes further than the rest.
  • The Deep Dive - easy - A specific position in the unsorted pile.
  • The Deep Selector - easy - Tell it what you want. It knows where to look.
  • The Deep Unpacker - easy - Boxes inside boxes. Eventually you reach the bottom.
  • The Depth of Field - easy - Some containers hold containers that hold containers.
  • The Diagonal Accountant - easy - Two diagonals cross in the center of every square.
  • The Duplicate Spotter - easy - Some values appear more than once - report only those.
  • The Even Checkpoint - easy - Is this number in the even club? Prove it the fast way.
  • The Expander - easy - What goes in small comes out big.
  • The Field Counter - easy - Some fields speak louder than others.
  • The First Encounter - easy - Every character has a story - but only if you remember where it started.
  • The First Stranger - easy - In a crowd, the unique ones stand out first.
  • The Forbidden Ceiling - easy - Round up. But not the obvious way.
  • The Gap Filler - easy - Fill the Nones with the last real value.
  • The Gate Keeper - easy - Not all openings have a closing.
  • The Grid Pivot - easy - A different angle reveals a completely different picture.
  • The Halftime Score - easy - Middle value of a dataset. No built-in shortcuts.
  • The Hash Stamper - easy - One input, one irreversible output - the foundation of every secret.
  • The Indivisibles - easy - Numbers that yield only to themselves.
  • The Integer Sieve - easy - Not everything in this list belongs here.
  • The Last Instance - easy - When duplicates appear, only the last one counts.
  • The Last Seen Map - easy - For each character, where did it appear last?
  • The Lazy Squares - easy - A sequence that never fully reveals itself.
  • The Letter Census - easy - Every crowd has its share of talkers and quiet ones.
  • The Letter Frequency Map - easy - Count every character in the string and report the results.
  • The Letter Ledger - easy - Every character has a count to answer for.
  • The Letter Tally - easy - Each character in the string has a count to answer for.
  • The Line Cutter - easy - Did everyone with an A-pass get through before the B-crowd arrived?
  • The Line Splitter - easy - Comma-separated truths, one at a time.
  • The Log Decoder - easy - Every line holds a secret.
  • The Lone Character - easy - It appeared exactly one time. That made it special.
  • The Lone Traveler - easy - One character stands apart from the crowd.
  • The Manual Sorter - easy - No shortcuts, no built-ins, just work.
  • The Matching Manifest - easy - Two warehouses, one shipment - only load what's in both.
  • The Merge - easy - Chaos in. Order out.
  • The Messy Pipeline - easy - The upstream API has no idea what a schema is.
  • The Minutes Tracker - easy - Some activities eat more time than others.
  • The Mirror Flip - easy - Sometimes the fastest fix is to swap everything.
  • The Mirror Image - easy - Flip the tape backwards - start from the end.
  • The Mirror Test - easy - Check if a string reads the same forwards and backwards.
  • The Mirror Words - easy - Each word looks back at itself.
  • The Missing Number - easy - Something is missing from the sequence.
  • The Molecule Report - easy - Four letters. A lot of math hidden in the sequence.
  • The Multiplication Trail - easy - Each step multiplies the whole journey.
  • The Never-Ending Sequence - easy - Sequence that keeps going. Follow it.
  • The Number Screen - easy - Some numbers make the cut. Most do not.
  • The Odd Digits - easy - Hidden inside a mess of characters are a few odd numbers.
  • The Odd Extractor - easy - Not all numbers from a string are welcome here.
  • The Odd Filter - easy - Strip out everything that does not belong to the odd club.
  • The One-Timers - easy - Values that never repeated.
  • The Op Dispatcher - easy - Name the operation, apply it everywhere.
  • The Order Enforcer - easy - Some rules say every A must come before every B.
  • The Overlap Finder - easy - Two guest lists - who made it onto both?
  • The Pair Counter - easy - How many pairs can be formed from the crowd?
  • The Paired Doors - easy - Every open bracket has a partner - but not every partner shows up.
  • The Pascal Row - easy - Each number is the sum of two numbers above it.
  • The Password Builder - easy - Random characters, fixed rules.
  • The Password Forge - easy - Eight random characters - how many combinations exist?
  • The Peak Finder - easy - Largest number in the list. Max() is not an option.
  • The Pipeline Filter - easy - In the door as one thing, out the door as another.
  • The Price Bander - easy - Different prices, different treatment.
  • The Progress Parade - easy - Just tell them how far along you are.
  • The Ranked Dict - easy - Values deserve order too.
  • The Repeat Offenders - easy - Repetition is a clue.
  • The Roman Converter - easy - Roman numerals decoded.
  • The Runner-Up - easy - Not the winner. The one just behind it.
  • The Running Total - easy - Each position holds the sum of everything before it.
  • The Safe Caster - easy - Type conversion is easy, until it is not.
  • The Score Sorter - easy - Points on the board, sorted by who earned the most.
  • The Scramble Check - easy - Same letters, different order - are these two strings secret twins?
  • The Second Summit - easy - Not the top of the mountain - just below it.
  • The Secret Twins - easy - Same letters, different disguises.
  • The Self-Portrait Number - easy - Some numbers describe themselves perfectly.
  • The Shadow Cleaner - easy - Remove the repeats. No shortcuts.
  • The Silent Locator - easy - Every lookup should cost you less than the one before it.
  • The Single Bit - easy - One particular pattern hides in plain sight.
  • The Solo Act - easy - One-and-done values only.
  • The Spread - easy - Data spread around a center. The range matters.
  • The Squeeze - easy - aaabbb gets old fast. Shrink it.
  • The Step Counter - easy - You can hop one step or two - how many ways to reach the top?
  • The Streak Breaker - easy - It has a problem with repetition.
  • The Style Guide - easy - Not every word deserves the same treatment.
  • The Syntax Sentinel - easy - Brackets opened and closed. The nesting might be off.
  • The Tail End - easy - Push, pop, peek. The basics that break people.
  • The Tail Trimmer - easy - Remove the k-th item from the back without counting forward first.
  • The Tally Counter - easy - How many times does a single guest show up to the party?
  • The Top Reviewer - easy - One restaurant receives the most feedback - which one?
  • The Traffic Director - easy - Spread the load evenly - nobody should be doing all the work.
  • The Tree Measurer - easy - How deep does the rabbit hole go?
  • The Trip Grouper - easy - Where did everyone go, and for how long?
  • The Type Sorter - easy - A mixed list is hiding its numbers - extract them.
  • The Value Sorter - easy - The order was always negotiable.
  • The Version Parade - easy - 1.0 before 2.0. Don't let the dots confuse you.
  • The Vowel Hunt - easy - Just the vowels. All of them.
  • The Word Census - easy - Who said what - and how many times?
  • The Word Counter - easy - How many times does each word show up in a file?
  • The Word Flipper - easy - The sentence stays, the words surrender.
  • The Word Inventory - easy - Every word, twice over.
  • The Word Map - easy - Input text. Output: word frequency.
  • Tokenize - easy - Split it apart. Keep the pieces.
  • Transform Column - easy - Same data, new shape.
  • Type Caster - easy - Wrong type. Fix it.
  • Unique Values - easy - Duplicates are noise. Remove them.
  • Value Count - easy - How many of each? Count them.
  • Word Counter - easy - Words in, counts out.
  • Zip to Record - easy - Two lists become one record.
  • The High Mark - easy - Scan the list. Report the max.
  • The Event Bucketer - easy - Logs slotted into buckets.
  • The List Merger - easy - No shortcuts.
  • The Dictionary Inverter - easy - Flip the dict. Group what used to be values.
  • The String Shrinker - easy - Compress the string. Shorter wins.
  • The Bracket Validator - easy - Brackets opened and closed. The nesting might be off.
  • The Trade Signal - easy - Buy low, sell high. Identify the ideal moment.
  • The Stream Averager - easy - The answer moves with the data.
  • The Generous Ones - medium - The generous ones are obvious.
  • The Payload Flattener - medium - Turn a deeply nested API response into a flat row.
  • The Resume Sifter - medium - Pull what's useful. Skip what you know.
  • The Title Ladder - medium - Job titles and the salary tier they belong to.
  • The Repeat Review - medium - The echo came back.
  • The File Size Profiler - medium - File types and their disk footprint. One type dominates.
  • The Schedule Cleaner - medium - Overlapping sessions. One clean line.
  • Stock Range Finder - medium - Prices move. One stretch had the widest gap.
  • The Status Board - medium - Make sense of a pile of raw Nginx access logs.
  • The Budget Allocator - medium - Split the money. Some wore two hats.
  • The Trade Log Aggregator - medium - Every trade left a footprint.
  • The Timezone Trap - medium - Trip data and timezones. They're not the same thing.
  • The Host Ranker - medium - Some hosts have more to offer.
  • The Email Ranker - medium - Some inboxes see more action.
  • The Consecutive Streak - medium - Login streaks. No gaps allowed.
  • The Schema Differ - medium - Schema from yesterday vs today. Something changed.
  • The Throttle Ceiling - medium - Too many requests in too short a timeframe. Throttle it.
  • The Event Aggregator - medium - Bucket a firehose of events into tidy time windows.
  • The Record Reconciler - medium - Two versions of the same truth.
  • The Dependency Resolver - medium - Everything depends on everything.
  • Batch Partitioner - medium - One pile becomes many. Split wisely.
  • Batch Records - medium - Too many at once. Break them into groups.
  • Char Profile - medium - Every character in the string tells a story.
  • Cumulative Sum - medium - The total grows with every row.
  • Deep Flatten - medium - Nested deep. Flatten everything.
  • Deep Get - medium - Nested deep. Reach in and grab it.
  • Detect Cycle in Sequence - medium - Follow the chain long enough and it might loop back.
  • Detect Outliers - medium - Most values are normal. Some are suspicious.
  • Diagonal Extract - medium - Not every value sits in a row or column.
  • Dice Roll Scoring - medium - The pattern rewards the patient.
  • Dictionary Key Intersection - medium - Two dictionaries. What do they share?
  • Execution Timer Wrapper - medium - Function wrapped with a timer. Duration captured on exit.
  • Extract Leaf Values - medium - The tree has leaves. Pluck them.
  • Find Indices - medium - It is in there somewhere. Where exactly?
  • Find Mode - medium - One value appears more than the rest.
  • Full Outer Zip - medium - Two sides. No value left behind.
  • Group By - medium - Same key, different rows. Bring them together.
  • Lag Column - medium - What came before this row?
  • Left Join - medium - Keep the left side. Match what you can.
  • Max Length Token - medium - The longest token wins.
  • Merge Counters - medium - Two tallies. Combine them.
  • Merge Overlapping Time Ranges - medium - Intervals piling up. Clean the timeline.
  • Palindrome Hunt - medium - It reads the same both ways. Go further.
  • Parse Log Line - medium - One line. A dozen fields hidden inside.
  • Permissions Manager - medium - Manage user permissions with config updates.
  • Portfolio Profit Calculator - medium - Portfolio gain from purchase history and current prices.
  • Precision and Recall - medium - Precision and recall. Both matter.
  • Prefix Based Word Replacement - medium - Every word trimmed to its root.
  • Rank Metrics - medium - Not all numbers are equal. Rank them.
  • Rename Keys - medium - Old names out. New names in.
  • Rotate Buffer - medium - The buffer is full. Rotate it.
  • Row Aggregates - medium - Each row holds its own summary.
  • Running Distinct Count - medium - New values keep appearing. Track the count.
  • Subarray Signal - medium - One stretch carries the strongest signal.
  • The Balanced Inspector - medium - Every branch should carry the same weight.
  • The Bipartite Test - medium - Can this crowd be split into two perfectly separated groups?
  • The Bit Reverser - medium - Sometimes the answer is literally backwards.
  • The Blind Multiplier - medium - Compute the result of everything around you - without seeing yourself.
  • The Bonus Round - medium - Consecutive matching dice rolls trigger a special scoring rule.
  • The Build Order - medium - Some tasks must wait for others to finish first.
  • The Chain Builder - medium - Links connect in sequence - build the chain from scratch.
  • The Chain Transform - medium - One small step at a time can cover a great distance.
  • The Change Tracker - medium - Before and after snapshots. The delta is in there.
  • The Character Clans - medium - Words sharing the same letters belong to the same clan.
  • The Chunked Reader - medium - Too big for memory. Read in pieces.
  • The Clock Examiner - medium - Two hands on a clock - how wide is the gap?
  • The Coin Vault - medium - Exact change only - and you want to use as few coins as possible.
  • The Column Shuffle - medium - Rows in, columns out. Number them.
  • The Counting Machine - medium - It knows where it stopped last time.
  • The Custom Iterator - medium - Some sequences follow their own rules.
  • The Cycle Detector - medium - Follow the chain long enough and you might end up where you started.
  • The Date Sorter - medium - Jumbled calendar. Sort it first.
  • The Deep Config - medium - Nested config, dot-notation output.
  • The Dict Comparator - medium - Two dictionaries. Subtle differences.
  • The Double-Ended Gateway - medium - Some queues let you skip the line from both ends.
  • The Elevator Trace - medium - Nested floors. One path through.
  • The Encoded Signal - medium - The encoding is hiding multipliers. Decode it.
  • The Event Broadcaster - medium - Subscribers show up, listen, and sometimes leave.
  • The Event Window - medium - A five-minute window is all that matters.
  • The Eviction Policy - medium - Fixed capacity. Oldest unused entry gets evicted.
  • The Exception Handler - medium - Good code handles failure as gracefully as success.
  • The Face That Breaks the Bank - medium - Roll enough dice and one number always runs away with it.
  • The Family Reunion - medium - Two cousins share a common ancestor somewhere above.
  • The Fast Climber - medium - Some routes up the mountain are faster than others.
  • The First Class Function - medium - Functions travel as values - prove you can pass one around.
  • The Flat Mapper - medium - Nested values. One flat stream out.
  • The Forbidden Sorter - medium - Put the letters in order without the obvious tool.
  • The Forgetful Machine - medium - It remembers everything, until it does not.
  • The Gap Reporter - medium - The missing IDs in the log - somebody has to notice.
  • The Genre Filter - medium - Three tables, two conditions, one actor's total.
  • The Half-Life Search - medium - Every guess cuts the problem in half.
  • The High Rollers - medium - Not every gambler bets the same - some wager far more than others.
  • The Horizon Scanner - medium - For each position, what is coming up ahead?
  • The Hostile Takeover - medium - One dict eats another.
  • The Hourly Bucket - medium - Timestamps belong somewhere.
  • The Intervals - medium - Timestamps in buckets.
  • The Inverted Triangle - medium - A pattern of stars narrows toward the bottom.
  • The Island Counter - medium - Surrounded by water, connected by land - how many separate landmasses?
  • The Lazy Unpacker - medium - Instead of loading it all at once, yield it one piece at a time.
  • The Letter Kin - medium - Words that share the same letters belong together.
  • The Letter Mapper - medium - A consistent substitution, or not.
  • The Level Inspector - medium - Each floor of the tower tells a different story.
  • The Level Summer - medium - Add up each level of the tree.
  • The Link Shrinker - medium - Long addresses have aliases - you give them out, you keep the map.
  • The Load Balancer - medium - Distribute incoming requests evenly across available servers.
  • The Map Reducer - medium - Map it. Reduce it. One answer.
  • The Market Streak - medium - Some stocks run longer than you think.
  • The Market Timer - medium - One buy, one sell - when do you make the most?
  • The Merge Champion - medium - Many sorted rivers flowing into one.
  • The Min Tracker - medium - The stack remembers the best it ever saw.
  • The Month-by-Month Snapshot - medium - Every salesperson has a story. The months just tell it sideways.
  • The Mountain Peak - medium - The sequence has a summit.
  • The Multiplier Rush - medium - Negatives cancel negatives - but only if you keep both in view.
  • The Narrow Lens - medium - A narrow timeframe. Everything inside matters.
  • The Number Miner - medium - JSON strings are hiding numeric secrets - dig them out.
  • The Number Narrator - medium - Every number has a story in words.
  • The Online Elite - medium - The top performers are hiding in the data.
  • The OOP Pillars Exam - medium - Four principles, one class hierarchy - show you know all of them.
  • The Order Inspector - medium - A binary tree has rules - is this one actually following them?
  • The Page Turner - medium - Nobody loads everything at once.
  • The Pandas Pivot - medium - Rows become columns. Columns become power.
  • The Parentheses Factory - medium - Building balanced brackets is an art form.
  • The Pay Ladder - medium - Climb the ladder the hard way. No shortcuts allowed.
  • The Perfect Match - medium - Two numbers walk into an interview...
  • The Placement Fixer - medium - Each value belongs in exactly one spot.
  • The Postfix Processor - medium - Math without parentheses - the operators come after the numbers.
  • The Precision Hunt - medium - Some answers need no decimal point.
  • The Priority Queue - medium - When two things tie, something has to break the deadlock.
  • The Progress Meter - medium - Report progress at every tenth of the way through.
  • The Quarter Turn - medium - One rotation changes everything.
  • The Queue Disguise - medium - A queue in sheep's clothing.
  • The Repeat Visitor - medium - Loyal customers come back sooner than expected.
  • The Response Aggregator - medium - Multiple result pages. One clean summary.
  • The Rolling Peak - medium - The sweetest stretch in the sequence.
  • The Rolling Window - medium - Smooth things out, one step at a time.
  • The Rotated Array - medium - Someone shuffled it. Now locate what you came for.
  • The Schema Diff - medium - Two versions of the same config - what changed between them?
  • The Scoreboard Race - medium - Simulate rounds until someone hits the target.
  • The Shifting Standard - medium - A benchmark in motion.
  • The Short Address - medium - Turn a big number into a compact alphanumeric code.
  • The Shortest Route - medium - Fewer hops is always better.
  • The Silver Screen Summit - medium - Box office totals decide who makes the top of the marquee.
  • The Slow Leak - medium - Nested iterators. One flat stream.
  • The Sneaky Twins - medium - They look different but they are the same inside.
  • The Spin Doctor - medium - Ninety degrees, but which way?
  • The Spiral Harvest - medium - The snail reads the grid in its own special order.
  • The Staircase Problem - medium - One step or two, the choices add up.
  • The Subarray Tally - medium - How many hidden windows hit the target?
  • The Table Thief - medium - Somewhere in that query, tables are hiding.
  • The Tag Analyst - medium - Two sets of labels, one analysis.
  • The Tail Finder - medium - Navigate to the end of a linked list using recursion.
  • The Timing Decorator - medium - Wrap any function to capture how long it takes.
  • The Top Words - medium - In every document, some words dominate the conversation.
  • The Trip Aggregator - medium - Travel records hold patterns waiting to be surfaced.
  • The Triplet Hunt - medium - Every path that works gets a seat at the table.
  • The Velvet Rope - medium - Some users get in. Others wait outside until the window resets.
  • The Version Ranker - medium - Software versions follow their own ordering rules.
  • The Vocabulary Test - medium - Can you spell out the whole sentence using only the words you know?
  • The Waiting Game - medium - Patience has a price - and a count.
  • The Water Gauge - medium - Elevation bars trap water between peaks - count the volume.
  • The Window Cleaner - medium - Keep it fresh, keep it unique.
  • The Word Families - medium - Different spellings, same letters - they belong together.
  • The Yahtzee Scorer - medium - Dice scoring. Multiple categories evaluated.
  • The Zero Propagator - medium - One zero can change the whole picture.
  • The Zigzag Encoder - medium - The message snakes its way across the rails.
  • Threshold Filter - medium - Above the line or below it.
  • Top N Keys - medium - Most of them do not matter. The few that do stand out.
  • Transpose Table - medium - Rows become columns. Columns become rows.
  • Triangle Validator - medium - Not every triangle is a triangle.
  • Unflatten Keys - medium - Dots in the key names. Rebuild the structure.
  • Validate Email - medium - Looks like an email. But is it?
  • Distribute Values Into Container Types - medium - Round-robin the values. Keep rotating.
  • The Nearest Value Mapper - medium - Close enough counts. Ties go low.
  • The Target Hunt - medium - Pairs that hit a target. Every one of them.
  • The Event Overlap Detector - medium - Overlapping events. The calendar knows.
  • The Consecutive Sequence Finder - medium - Numbers that flow without interruption.
  • The File Tree Builder - medium - Flat paths. Build the nested tree.
  • The Impersonator - medium - You only have stacks. Make a queue anyway.
  • The Category Ranker - medium - Categories have standing. Rows get theirs.
  • The Throttle Wall - hard - Stop the abusers. Let the rest through.
  • The Change Data Capture - hard - Inserts, updates, deletes: all present.
  • The Stream Joiner - hard - Events don't wait for each other. This does.
  • The Anomaly Detector - hard - Spot the outliers before they page someone.
  • The Schema Migrator - hard - Old schema in, new schema out.
  • The DAG Executor - hard - Wire up a mini pipeline and watch it run.
  • Common Prefix - hard - They all start the same way. How far?
  • Data Quality Report - hard - The data is not as clean as it looks.
  • Group Average - hard - Same group, different values. What is typical?
  • Merge Intervals - hard - Overlapping ranges. Merge them.
  • Pivot Records - hard - Long format is easy. Wide format is useful.
  • The Dynamic Container - hard - Build your own resizable list with no help from the standard library.
  • The Frequency Eviction - hard - When storage is tight, something has to go.
  • The Infection Spread - hard - It starts with one, and then it spreads.
  • The Lazy Stream - hard - Yield values one at a time from a potentially infinite source.
  • The Median Keeper - hard - The middle value keeps moving as new data arrives.
  • The Onion Layer - hard - Peel from the outside in - one ring at a time.
  • The Trapped Pool - hard - What collects in the valleys after the rain?
  • The Triple Alliance - hard - Three numbers, one target.
  • The Water Collector - hard - Two walls, one sky, and a very important question.
  • The Yahtzee Engine - hard - Five dice. Six faces. Score it.
  • Stream-Process a Large CSV - hard - Too big to load. Read what you can.
  • The Meeting Room Allocator - hard - Meetings overlap on the calendar. Rooms are limited.
  • The Middle Ground - hard - The middle value keeps moving.
  • The Hierarchy Builder - hard - Parent-child pairs, flat. Build the family tree.
  • The Output Peak - hard - One stretch outpaced all the others.

SQL Interview Questions (903)

  • Unmatched Credit Complaints - easy - Credits were promised. Not everyone got theirs.
  • The Duplicate Detection Sprint - easy - Same email, different rows. Spot the repeats.
  • Weekend Warriors - easy - Weekdays vs. weekends. When does the action really happen?
  • The Dormant Accounts - easy - They are still paying. They stopped showing up.
  • 30-Day Page View Counts - easy - Thirty days of engagement. Quick snapshot.
  • Above Average Interactions - easy - The average user is boring. Who is above?
  • Above Category Average - easy - The category average is one thing. These beat it.
  • Active API Tokens - easy - Tokens that have actually been used.
  • Active Campaigns - easy - Which campaigns are earning their keep?
  • Active Token Owners in 2026 - easy - Active token owners this year.
  • Active User Revenue for April - easy - Total revenue from active users in a single month.
  • Active Users With April Transactions - easy - Active accounts that also opened their wallets. How many?
  • Activity Histogram - easy - How many users did X things? Build the distribution.
  • Ad Revenue 2026 - easy - Annual ad revenue. On the books.
  • Alert Hotspots by Service and Severity - easy - Some services and severities light up more than others.
  • All Infra Regions - easy - The infrastructure spans the globe. Map it.
  • Annual Cloud Spend - easy - One year of cloud bills. The total.
  • Annual Cloud Spend Summary - easy - A year of cloud bills. Add it all up.
  • Annual Pipeline Failures - easy - How many pipelines broke this year?
  • April and May Active Users - easy - Spring cleaning for the user base. Who was actually around?
  • Auth Endpoints - easy - Not all endpoints are visible to everyone.
  • Authors With Successful Deploys - easy - Who deployed successfully?
  • Auth Service Health Checks - easy - One service. Full audit trail.
  • Average Brand Campaign Revenue - easy - A quick benchmark on brand campaigns.
  • Average Build Duration by Repo - easy - Some repos build fast. Others don't.
  • Average DQ Fail Rate - easy - Average failure rate, table by table.
  • Average GPU Node CPU Usage - easy - GPU nodes burning CPU. How much?
  • Average Headcount by Department - easy - Compensation benchmarks, department by department.
  • Average High-Range Accuracy - easy - The top-scoring models. What's their average?
  • Average Latency by Health Status - easy - Healthy versus degraded. The latency gap is real.
  • Average Latency by Status - easy - Each status code has its own latency story.
  • Average Node CPU by Region - easy - Average infrastructure node CPU usage, broken down by region.
  • Average Node Utilization - easy - CPU and memory, region by region.
  • Average Rating by Category - easy - Category ratings. Some shine, some don't.
  • Average Response Time by Hour - easy - Hour by hour. When does latency spike?
  • Average Search Endpoint Latency - easy - One endpoint. Average speed.
  • Average Search Results Per User - easy - How many results per searcher?
  • Average Session Duration by Device - easy - Session length, device by device.
  • Bargain Bin - easy - Floor prices. Right before the vendor call.
  • Best-Selling Reps Each Month - easy - In every category, a few sellers rise to the top.
  • Big Spenders - easy - The whale list.
  • Budget Flag - easy - Join tables and label rows as over or under budget.
  • Budget-Friendly Products - easy - Affordable does not mean invisible.
  • Campaign Match Rate - easy - Campaign reach. Measured.
  • Campaign Revenue Totals - easy - Every campaign has a price tag. Total them up.
  • Cart Sizes - easy - Power buyers. Big carts.
  • Category Census - easy - Which aisles are worth restocking?
  • Category Sales Summary - easy - Category by category. How did they do?
  • Category-Specific Product Volume - easy - Sum transactions for a specific payment type.
  • CDN Image Request Paths - easy - CDN image traffic. Every path.
  • CDN-Related DNS Lookups - easy - DNS lookups tied to the CDN.
  • Character Position in Endpoint - easy - URL patterns, character by character.
  • Chat Activity - easy - Which channels are ghost towns?
  • Cheapest Cost Per Region - easy - Lowest spend per region.
  • Cheapest Transaction per User - easy - Everyone has a smallest purchase.
  • Clean Cache CDN Edges - easy - Cached, clean, error-free edges.
  • Clean Latency Cast - easy - The latency column is a string. It should not be.
  • Clicked Ad Impressions - easy - They saw the ad. They clicked.
  • Cloud Bill - easy - Which cost buckets are bleeding money?
  • Cloud Cost by Team - easy - Spend by team. Who's burning most?
  • Common Age Buckets - easy - Duplicate records hiding in the users table.
  • Completed Priority-1 Jobs - easy - Priority one. Completed.
  • Compute Nodes in Key Regions - easy - Compute nodes across the key regions.
  • Content by Specific Users - easy - Two creators. What did they publish?
  • Content Duration Snapshot - easy - A popularity snapshot by duration.
  • Content Mix - easy - One content format to bet the quarter on.
  • Content Published in 2026 - easy - Published back then. Still relevant?
  • Content Sorted by Duration - easy - The catalog, sorted by length.
  • Content Type Distribution - easy - How many of each content type?
  • Content Types by Creator - easy - One creator. What did they make?
  • Content Viewer Penetration - easy - What share of the user base has viewed at least one piece of content?
  • Cost Efficiency Ratio - easy - Dollars in, value out. What's the ratio?
  • Count Distinct Services - easy - How wide is the service mesh?
  • Count Nodes in Region - easy - One region. How many nodes?
  • CPU Utilization Summary - easy - The CPUs are working. How hard?
  • Customer Full Name Concat - easy - First name, last name. Combine them.
  • Daily and Weekly Active Users - easy - One metric by day, one by week. Same users, different lenses.
  • Daily Cross-Platform Users - easy - Mobile and web. Same day, same users?
  • Daily Deployment Count - easy - Deploys per day.
  • Department Spend Difference - easy - The compensation gap between departments.
  • Department Spend Gap - easy - Gap between Engineering's and Marketing's biggest single purchase.
  • Deploy Cadence - easy - Which environments ship the most?
  • Deploy Count by Service - easy - Some services deploy constantly. Others barely at all.
  • Deployed Models by Framework - easy - Which frameworks are actually in production?
  • Deployment Duration by Status - easy - Fast deploys versus slow ones. By outcome.
  • Deployments Without Alerts - easy - Deployed without a single alert. Suspicious or impressive?
  • Deprecated Model Count - easy - How many models are past their expiration date?
  • Device Mix - easy - The device breakdown before the redesign.
  • Device Types With Chrome Users - easy - Power users and their devices.
  • Disabled Feature Flags - easy - Disabled flags. Still worth auditing.
  • Distinct Blog Referrers - easy - Where did the traffic really come from? No repeats.
  • Distinct Product Categories - easy - A quick category inventory.
  • Early 2026 Data Pipelines - easy - Early-year data pipelines.
  • Employees Per Department - easy - Headcount, location by location.
  • Error Severity Buckets - easy - Errors sorted by how much they hurt.
  • Errors With Service Health - easy - Error data, enriched with health context.
  • Even-ID February Signups - easy - A very specific slice of a very specific cohort.
  • Even-ID June Signups - easy - Odd IDs, even IDs. The filter is precise.
  • Event Count on Key Days - easy - Key days. Key event volumes.
  • Events by Month Across Years - easy - Month by month, year by year. The pattern emerges.
  • Event Types Spanning Multiple Months - easy - Some events span seasons.
  • Expensive AWS Services - easy - Some AWS services quietly drain the budget.
  • Extreme Headcount Departments - easy - The pay extremes tell a story.
  • Failed Payment Deployments - easy - Payment deploys that went wrong.
  • Features With Missing Values - easy - Missing data in the features.
  • February 2024 Signups - easy - One signup window. One cohort. Who joined the club?
  • Filter By Domain - easy - Select rows matching a text suffix pattern.
  • Filtered User Roster - easy - A clean roster for the all-hands.
  • Find Deploy Authors - easy - Same person. Many different spellings.
  • First Build per Repository - easy - Every repo had a first build.
  • First Migration Record - easy - The very first migration. Where it all began.
  • First Run Row Count - easy - Every job's first run. How many rows?
  • Flag Check - easy - Which flags are actually live?
  • Full Customer Order List - easy - Every customer. Every order. The full picture.
  • Gateway Connection Timeouts - easy - Timeouts at the gateway.
  • Health Check Distribution - easy - Pass, fail, degraded. The distribution.
  • Health Checks per Service - easy - Some services get checked constantly.
  • Heavy Searchers in August - easy - August's power searchers.
  • High and Critical Alerts in 2026 - easy - High and critical alerts from that year.
  • Higher Performing Variant - easy - Control versus treatment. One wins.
  • Higher Than Supervisor - easy - When the student outscores the teacher.
  • Highest Cost Per Team - easy - Peak cost, team by team.
  • Highest Latency Endpoints - easy - The slowest endpoints. Everyone notices.
  • High-Output Creators - easy - High engagement creators.
  • High Price Products - easy - Everything above 100.
  • High-Rated In-Stock Percentage - easy - Highly rated and in stock. A rare combo.
  • High-Spend 2025 Campaigns - easy - Big-budget campaigns from last year.
  • High-Traffic Endpoints in February - easy - When traffic spikes, some endpoints get buried. How many crossed the line?
  • High Volume Batch Jobs - easy - Batch jobs that processed millions.
  • Holiday Promo Campaign Click Year - easy - One year, the holiday campaign exploded.
  • Holiday Sale Campaign Revenue - easy - The holiday sale campaign. How did it do?
  • Idle Team Members - easy - Sprint started. Some people never got assigned.
  • Inactive Unverified Users - easy - Signed up. Never verified. Never came back.
  • Initial Count - easy - Support is looking for naming patterns that predict ticket volume.
  • In-Stock Product Count - easy - How many products are actually available?
  • Japan Revenue for April - easy - Last month's numbers for one region.
  • Joined Employee Details - easy - Combine two related tables with a join.
  • Largest Group - easy - One group towers above the rest.
  • Last Five Batch Jobs - easy - The last five. A quick tail check.
  • Last Migration Record - easy - The most recent migration. Is it the last?
  • Last Server Activity - easy - Each server's last heartbeat.
  • Latency vs Regional Average - easy - Each service versus its region's average.
  • Latest Metric Values - easy - Stale records hiding in the metrics.
  • Latest Session Per User - easy - Everyone has a most recent session.
  • Latest Version Per Service - easy - The latest version deployed. Each service.
  • Log Entries by Level - easy - Info, warn, error, fatal. The breakdown matters.
  • Log Volume by Day of Week - easy - Some days are noisier than others.
  • Longest Active Membership Streak - easy - The longest unbroken streak.
  • Longest Deploy With Full Identifier - easy - The longest deployment. Full ID.
  • Long Searches Containing 'er' - easy - Long queries with 'er'. A pattern?
  • Low-Byte CDN Responses - easy - Tiny responses from the edge.
  • Low-Engagement User Count - easy - How many users are barely engaged?
  • Lowest Average Price Category - easy - The cheapest category. Not necessarily the worst.
  • Low Latency API Calls - easy - Fast endpoints. Confirmed fast.
  • Low Severity DQ Checks - easy - Low severity checks. All of them.
  • Low Throughput Pipelines - easy - Pipelines barely moving data.
  • Low Uptime Services - easy - Underperforming services.
  • Max Value Per Location - easy - Every location has a peak.
  • Memory-Heavy Pods - easy - Memory-hungry workloads.
  • Merge-Triggered Builds 2026 - easy - How many builds came from merges this year?
  • Message Length - easy - Verbose commits. Risky changes?
  • Messages Containing Keyword - easy - Flagged terms in the messages.
  • Messages From Specific Users - easy - Specific users. What did they say?
  • Metric Range Per Group - easy - The spread within each group.
  • Metric Value Quarter Complement - easy - Two metrics that accidentally match.
  • Metric Volatility Gap - easy - Stable metrics are boring. Volatile ones need attention.
  • Mid-CPU Nodes - easy - Not the heaviest. Not the lightest. The middle.
  • Mid-Range Cost Allocations - easy - Not the cheapest. Not the priciest. The middle.
  • Mid-Tier Batch Jobs - easy - Not the biggest, not the smallest. The overlooked middle.
  • Missing Email for Non-Active Users - easy - No email on file. No recent activity. Something smells off.
  • Mobile Event Counts - easy - Mobile engagement, device by device.
  • Monthly Active Users per Endpoint - easy - One endpoint, many users. Which ones showed up?
  • Monthly Category Totals - easy - Sum amounts by category and month.
  • Monthly Deployment Count - easy - Deploys by month.
  • Monthly Signup Counts - easy - Signups, month by month.
  • Monthly Transaction Counts - easy - Every month tells a spending story, user by user.
  • Monthly Unique Users per Campaign - easy - Monthly reach, campaign by campaign.
  • Morning Warning Logs - easy - Warnings before noon.
  • Most Common Export Job Status - easy - The most common job status.
  • Most Recent Token Usage - easy - Each user's latest token activity.
  • Multi-Column User Sort - easy - Sorted by name. Then by something else.
  • Multi-OS Users - easy - iOS today, Android tomorrow.
  • Multi-Provider Cost Lookup - easy - AWS, GCP, Azure. Side by side.
  • Multi-Variant Experiments - easy - One user, multiple experiments.
  • Never-Ordered Products - easy - In the catalog. Never purchased.
  • Nodes in Target Regions - easy - The target regions need attention.
  • Node Summary Per Region - easy - Every region has a node story.
  • No Gaps - easy - Zero blanks. A clean contact list.
  • Non-Bot Acknowledged Alerts - easy - Human-acknowledged alerts only.
  • Non-Draft Content - easy - Everything except drafts.
  • Notifications Opened on Date - easy - One day, many pings. How many actually got opened?
  • Nth Highest Salary - easy - Not the highest. Not the second. The third.
  • Nth Largest Value - easy - Select the row with a specific rank position.
  • NULL Keys in Joins - easy - Rows that vanish during the join.
  • Oldest and Newest User Sessions - easy - The extremes of the user base.
  • One-Star Product Review Count - easy - One-star reviews. How many?
  • Overall Average API Latency - easy - The overall average. Across everything.
  • Peak Activity by Device - easy - Activity windows, device by device.
  • Peak Ad Revenue Moment - easy - The single peak earning moment.
  • Peak Metric Per Department - easy - Peak metrics for the quarterly deck.
  • Peak Non-Converting Month - easy - Everyone showed up. Nobody bought anything.
  • Peak Satisfaction - easy - Which departments are winning on satisfaction?
  • Peak Spending Month - easy - One month, the bill was unforgettable.
  • Pending Batch Jobs - easy - Stuck jobs. Still pending.
  • Pipeline Run History - easy - The lineage trail.
  • Pipeline Throughput Ratio - easy - Compute current-to-initial value ratio per period.
  • Platform Check - easy - OS and device combos. Which sessions last longest?
  • Platform Team Feature Flags - easy - The platform team owns a lot of flags.
  • Platform Team Mobile Flags - easy - Mobile flags under platform ownership.
  • Pod Distribution by Restart Count - easy - Low-restart pods. Reliable or idle?
  • Popular Categories - easy - Merchandising only cares about categories big enough to negotiate shelf space.
  • Price Check - easy - Priced to sell or priced to sit?
  • Production Deployment Count - easy - How many production deploys?
  • Production Deploys From April Onward - easy - After the cutoff, how many times did prod get a push?
  • Product Name Letter Replace - easy - A quick text transform on product names.
  • Product Name Prefix - easy - Just the first three characters. That is all.
  • Product Page Sale Searches - easy - They searched from the product page.
  • Product Revenue Ranking - easy - Rank them by revenue. See who leads.
  • Products Without Sales - easy - Listed but never sold.
  • Profitable Categories by Price - easy - The most profitable categories.
  • Promo Campaign Cost per Acquisition - easy - The campaign ran. What did each customer cost?
  • Provider Cost Change H1 - easy - Cost swings in the first half of the year.
  • Purchase Log - easy - Names on receipts, not just IDs.
  • Q2 Search Volume - easy - Q2 search volume. The numbers.
  • Quarterly Deployment Count - easy - Deploys per quarter.
  • Recurring Error Types - easy - The same errors, recurring.
  • Regional Profits - easy - P&L by region. Before the board meeting.
  • Regions With 5+ Nodes - easy - Regions with five or more nodes.
  • Retargeting Campaign Impressions - easy - Retargeting impressions. All of them.
  • Revenue by Product - easy - Which products carry the revenue line?
  • Revenue for Specific Users - easy - Alice and Bob. Total spend.
  • Reviews Per Reviewer - easy - The workload split across reviewers.
  • Running Node Pairs - easy - Two servers, same region, both alive.
  • Satisfaction Score by Region - easy - Satisfaction scores. Missing region data.
  • Search Endpoint Status Distribution - easy - Status codes on the health endpoint.
  • Searches by Users With Email - easy - One user's search behavior.
  • Search Terms Starting With G - easy - Queries starting with 'g'.
  • Second Highest Salary - easy - Silver medal. Almost the top, but not quite.
  • Second Highest Value - easy - Almost the top. Not quite.
  • Service Alert Frequency - easy - How often does each service trigger alerts?
  • Services With Most Error Occurrences - easy - The noisiest services.
  • Service User Growth Rate - easy - User growth, service by service.
  • Session-Fit Content - easy - Content that fits the session length.
  • Session Logins Dec 13 to 19 - easy - Logins during one specific window.
  • Session Pulse - easy - Engagement is slipping. Who is phoning it in?
  • Sessions Per Device Type - easy - Sessions, device by device.
  • Signups by Age Bucket Since April - easy - Recent signups by age.
  • Signups Jan to Jul 2026 - easy - Signups from January through July.
  • Sirens and Smoke - easy - Stale alerts. Still ringing.
  • Slow Batch Jobs - easy - Promised by noon. Delivered at midnight.
  • Slow Failures - easy - SRE is hunting for the endpoints that fail slowly enough to burn timeouts.
  • Slow Production Deploys - easy - Production deploys that took way too long.
  • Sort Tokens by Scope Character - easy - Token scopes, sorted for compliance.
  • Status Report - easy - Where are orders getting stuck?
  • Stock Status - easy - Human-readable availability labels.
  • Storage Node Lookup - easy - The storage nodes hold the critical data.
  • Successful Deploy Endpoint Calls - easy - Successful deploys only. No failures allowed.
  • Successful Pipeline Runs - easy - Which pipelines completed successfully?
  • Successful Production Deploys - easy - Successful production deploys with duration.
  • Suspected Bot Sessions - easy - Five seconds or less. Probably a bot.
  • Targeted Ad Campaigns - easy - High-value impressions. Targeted precisely.
  • The Ad Ledger - easy - Annual ad revenue. On the record.
  • The Campaign Trail - easy - Impressions are vanity. Conversions are sanity.
  • The February Cohort - easy - One signup window. One cohort. Who joined the club?
  • The First Half - easy - New arrivals during one specific window.
  • The Legacy Hunt - easy - Old data. Still matters.
  • The Merge Counter - easy - How many builds came from merges?
  • The Publishing Audit - easy - Published years ago. Still generating views?
  • The Token Census - easy - How many tokens are out there?
  • Third Largest Batch Job - easy - Bronze medal in the batch job rankings.
  • Threads Excluding User - easy - Every thread they're not part of.
  • Three Lowest Distinct Cloud Cost Amounts - easy - The three cheapest bills on record.
  • Tiered Transaction Summary - easy - Compute multiple date-windowed aggregates in a single query.
  • Timeout Status Records - easy - Unknown status in the health records.
  • Timeout Warning Logs - easy - Timeout warnings. The postmortem trail.
  • Titles Ending With S - easy - Naming conventions. Specifically the plurals.
  • Top 100 Batch Jobs Total Output - easy - The hundred biggest jobs. Combined output.
  • Top 10 Batch Jobs - easy - The ten biggest batch jobs.
  • Top 10 Model Accuracies - easy - Top ten model performance.
  • Top 10 Slowest Endpoints - easy - The ten endpoints nobody wants to call.
  • Top 5 Slowest DNS Lookups - easy - Five DNS lookups that took too long.
  • Top Ad Campaigns by Revenue - easy - Every campaign has a bottom line. Stack them up.
  • Top API Token Scopes - easy - The highest-value token scopes.
  • Top Average By Region - easy - Region by region, who pulls the best average?
  • Top Deployed Model - easy - The best-performing model in production.
  • Top Device by Sessions - easy - One device type generates the most sessions.
  • Top Duration Content Items - easy - The content that held the number-one spot.
  • Top Five - easy - The five priciest items for the luxury section.
  • Top Metric Values - easy - The five highest numbers. No duplicates.
  • Top Mobile OS by Session Duration - easy - Which mobile OS keeps users longest?
  • Top Performing Models - easy - The models that actually perform.
  • Top Product Categories by Sales - easy - The highest-grossing categories.
  • Top-Ranked Wines by Variety - easy - The best bottles. Ranked by variety.
  • Top Recent Sellers - easy - Fresh data, top sellers. The recent leaderboard.
  • Top Selling Items - easy - Revenue crowns the winners. Who sold the most?
  • Top Shelf - easy - Buyers need to know ceiling prices before negotiating with vendors.
  • Top Spenders Dense Rank - easy - Spending speaks. Let the leaderboard do the talking.
  • Total Compute Cloud Cost - easy - Total compute spend. The number.
  • Total Cost by Category - easy - Total spend per category.
  • Total Engineering Cost Allocation - easy - Engineering's total allocated budget.
  • Total Rows by Pipeline Status - easy - Row counts alongside pipeline aggregates.
  • Total User Spend - easy - Each customer's total. Summarized.
  • Transaction Overview - easy - The executive snapshot. Users, products, revenue.
  • Transaction Source Features - easy - One pipeline reviewed them. What did it see?
  • Transactions With Product Names - easy - A simple select progressing to a join.
  • Trim Endpoints Right - easy - Trailing whitespace. Clean it up.
  • Trim Search Terms Left - easy - Leading whitespace. Clean it up.
  • Tutorial Content Count - easy - How much of the catalog is tutorials?
  • Unique Hosts by Node Type - easy - How many unique hosts per node type?
  • Unique Searchers - easy - How many users actually searched?
  • Unique Searchers Count - easy - Unique searchers. The count.
  • Unique Stream Topics - easy - A clean inventory of streaming topics.
  • Unmatched Categories - easy - Categories with nothing on the shelf. Empty aisles.
  • Unreviewed Models - easy - Models that have never been evaluated.
  • Unused Read Tokens - easy - Active tokens that nobody uses.
  • US-East KV Store Entries - easy - KV store inventory. us-east-1.
  • User Age Ranking - easy - Age brackets, stacked from top to bottom.
  • User Engagement Totals - easy - Per-user engagement. The totals.
  • User Event Type Count - easy - How many flavors of activity does each user have?
  • User Roster - easy - Which account states are bleeding users?
  • User Session Roster - easy - Every user paired with their sessions, even users who never logged in.
  • User Sessions on Specific Days - easy - One user. Specific days. What happened?
  • Users Per Device Type - easy - Users per device. The split.
  • Users Who Clicked Ads - easy - Ad clickers and their account details.
  • Users With Purchase Events - easy - At least one purchase. That changes everything.
  • Verify Commit ID Uniqueness - easy - Duplicate commit IDs. Are there any?
  • View Count Per Page - easy - Every page has visitors. Some just have more.
  • Views by Specific Users - easy - Retrieve all content views for a set of flagged user accounts.
  • Weekly Transaction Volume - easy - Weekly volume. The pulse.
  • Welcome Wagon - easy - How many signed up this year?
  • Whale Watch - easy - The accounts driving the top line.
  • Yearly Output - easy - Publishing velocity for the board deck.
  • 2026 Signup Count - easy - This year's signup count.
  • Join Type Row Counts - easy - Same tables, different handshakes, wildly different results.
  • Ad Clickers - easy - Who clicked? What did they spend?
  • Clean Averages - easy - Merchandising only cares about the categories customers actually rate.
  • Log Priority - easy - Which servers are on fire before coffee?
  • Unique Visitors - easy - Which months actually had an audience?
  • High-Value Electronics - easy - The five priciest electronics.
  • Regional Status - easy - The full regional breakdown.
  • Click Revenue - easy - Which campaigns are earning their keep?
  • Email Census - easy - The reachability split.
  • Log Levels - easy - Severity breakdown with response times.
  • Above Average - easy - Products beating the catalog average.
  • The Revenue Cliff - medium - Revenue was climbing. Then it wasn't. Spot the drop.
  • The Phantom Readers - medium - They read everything. They bought nothing.
  • The Day-7 Retention Cohort - medium - Day one was promising. Day seven tells the truth.
  • The Latest Transaction Per Product - medium - Every product has a last sale. When was it?
  • 10 Lowest Uptime Services - medium - Ten services at the bottom of the reliability chart.
  • 2FA Confirmation Rate - medium - Two-factor sent. How many confirmed?
  • 7-Check Rolling Average - medium - Seven entries hold the trend.
  • 7-Day Token Retention - medium - Premium tokens, day by day.
  • 80th Percentile API Latency - medium - The 80th percentile tells the real story.
  • 90th Pctl Model Accuracy Gap - medium - Most models are fine. The bottom 10% are not.
  • Above-Average Cloud Spend - medium - Some services quietly burn more than the rest.
  • Above Average Product Prices - medium - Some products cost more than they should.
  • Active Duo - medium - Shoppers who also browse. The overlap is the insight.
  • Active Searchers - medium - They typed a query. That means something.
  • Active Tokens on Target Date - medium - One specific day. Which tokens were still alive?
  • Active User Open Rate - medium - What share of push notifications were opened by active users?
  • Active Users by Session Count - medium - Signed up is one thing. Showing up is another.
  • Active vs Regional User Count - medium - Active users versus total users. The gap is telling.
  • Ad Revenue by Age Bucket - medium - Ad dollars, sliced by country.
  • After Hours API Calls - medium - The office is dark. The API is not.
  • Alert Count by Severity Tier - medium - Alerts by severity. The breakdown matters.
  • Alert Response Breakdown - medium - An on-call postmortem asks which services are bleeding alerts nobody acknowledges.
  • Alert Severity Pivot by Service - medium - When services cry wolf, the severity matrix tells who's serious.
  • All Known Endpoints - medium - Two tables. One truth. Every endpoint accounted for.
  • API Calls With and Without Errors - medium - Some calls succeed. Some do not. Break it down.
  • API Calls With Matching Status - medium - Same status, same pattern. Coincidence?
  • API Token Churn Rate - medium - Tokens come and go. What's the turnover?
  • API Traffic by CDN Edge - medium - CDN paths carrying API traffic. Which edges?
  • App Stability by Region - medium - Some regions crash more than others.
  • Attributable Impression Rate - medium - What share of ad impressions can be traced to a real user account?
  • Auction Lot Summary - medium - The hammer falls. Who bid the most?
  • Auth Endpoint Callers - medium - Identify users who have called authentication API endpoints.
  • Authors Deploying to Dev and Production - medium - Dev, staging, production. Who has touched all three?
  • Average Accuracy by Framework - medium - Not all frameworks deliver the same accuracy.
  • Average API Latency by Year - medium - Latency year over year. Is it getting better?
  • Average Compensation by Department and Status - medium - Average compensation. Department by department.
  • Average Fulfillment Lag - medium - Ordered, then... waiting.
  • Average Initial Call Latency - medium - First contact latency. The benchmark.
  • Average Results for Python Searches - medium - Python searches. What's the click-through?
  • Average Review Comments by Author - medium - Some authors get more feedback than others.
  • Average Session Duration - medium - How long do users actually stay?
  • Average Spending by Account Status - medium - Average per-user lifetime spending, segmented by account status.
  • Average Update Call Latency - medium - Follow-up calls. How fast?
  • Average Watch Time by Format - medium - Which content format keeps viewers watching the longest?
  • Avg Alerts by Severity - medium - Alert patterns by severity.
  • Avg Daily Active Users per Endpoint - medium - Daily engagement, endpoint by endpoint. The averages reveal all.
  • Avg Session Duration by Creator - medium - Some creators keep users longer.
  • Batch Job Performance Tiers - medium - Every batch job gets a grade.
  • Best Accuracy to Training Time Ratio - medium - Fast to train. Accurate too. Which model?
  • Best Day for Ad Revenue - medium - One day of the month outperforms the rest.
  • Biggest Deployment Decline - medium - One team's deploy count cratered. Which one?
  • Binary Flag Indicators - medium - On or off. Every flag at a glance.
  • Bottom Endpoints by POST Volume - medium - The quietest POST endpoints.
  • Builds per Author per Branch - medium - Who triggered what, and where?
  • Build Success Rate by Trigger - medium - Which triggers produce green builds?
  • Build Success vs Failure by Repo - medium - Green versus red, repo by repo.
  • Busiest Pipeline Month - medium - One month, more pipeline runs than any other.
  • Busiest Route by Passenger Volume - medium - The busiest route by volume.
  • Busy Authors - medium - Some developers spread their commits everywhere.
  • Campaign Click-Through Rates - medium - Clicks per impression. Campaign by campaign.
  • Campaign Cost Effectiveness - medium - Money in, conversions out. What is the ratio?
  • Campaign Revenue by Click Channel - medium - Which ad format drives the most revenue?
  • Campaigns With Most Clicks - medium - The campaigns getting all the clicks.
  • Categories With Mixed Price Tiers - medium - Users who cross content types.
  • CDN Traffic by Day and Hour - medium - CDN traffic, hour by hour.
  • Cheapest High-Rated Product - medium - Cheap and highly rated. A rare combination.
  • Classify Services by Name - medium - The name tells you what it is. Mostly.
  • Clicked Holiday Impressions - medium - Holiday ads. Who actually clicked?
  • Click vs Non-Click Rates - medium - Some searches lead to clicks. Most do not.
  • Cloud Cost Stats by Provider - medium - Three providers. Three very different bills.
  • Cloud Cost Trend Analysis - medium - Cost trends across billing periods.
  • Combined Cloud Spend by Region and Service - medium - Region by region. Service by service. Where does the money go?
  • Commit Royalty - medium - In a sea of commits, only a few wear the crown.
  • Completion Rate - medium - Not every region closes orders cleanly. The percentages tell the story.
  • Consistent High-Quantity Revenue - medium - Big orders, consistent revenue. A rare combination.
  • Content Recommendation Engine - medium - Pages they haven't discovered yet.
  • Content Session Counts - medium - Session metrics, content item by item.
  • Cost Density Extremes - medium - Some regions pack more cost per node than others.
  • Cost Share Within Category - medium - Each entry's slice of the category total.
  • Creators With Top-Rated Content - medium - Top-rated content. Who made it?
  • Cross-Region Customers - medium - Orders crossing borders.
  • Cross-Variant User Pairs - medium - Same experiment. Different variants. Who overlaps?
  • Cumulative Monthly Revenue Avg - medium - Revenue, cumulating month by month.
  • Currently Active Feature Flags - medium - Which flags are live right now?
  • Customers Without Orders - medium - Customers who have never ordered.
  • Custom Message Type Counts - medium - Not all messages are created equal.
  • Daily Error Count Change - medium - Errors, trending up or down?
  • Daily Error Resolution Ratio - medium - Reported versus removed. The daily ratio.
  • Daily Metric Percentage Change - medium - Yesterday versus today. What moved?
  • Daily Session and User Counts - medium - Sessions and users, day by day.
  • Daily Spam Impression Rate - medium - How much of the ad feed is spam?
  • Daily Top Endpoints - medium - Three winners each day.
  • Data Repo Fix Commits - medium - How many commits start with 'fix'?
  • Days with More Edited Than Unedited Messages - medium - Some days, more messages get edited than sent.
  • Deduplicate and Keep Latest - medium - Duplicates everywhere. Only the freshest survives.
  • Deduplicated Sales Volume by Category - medium - Clean the noise, then see what each aisle really earned.
  • Department Cost by Status - medium - Headcount and compensation. The dashboard view.
  • Department Running Totals - medium - Compute cumulative metric values within each department using window operations.
  • Deploy Author Performance Score - medium - Not all deployers are equally reliable.
  • Deployment Failure Impact - medium - When deploys fail, how bad is the blast radius?
  • Deployments per Environment - medium - Dev, staging, prod. Where do most deploys land?
  • Deploy Reliability Scores - medium - A reliability scoreboard for deploy teams.
  • Devices Per Age Bucket - medium - Device diversity among the younger users.
  • Device Type Serving Most Users - medium - One device type serves more users than the rest.
  • Disabled Flag Ratio - medium - Feature flags that went dark. What percentage fell silent?
  • Distinct Chat Conversations - medium - How many unique conversations?
  • DQ Fail Rate by Table - medium - Fail rates, table by table.
  • DQ Score Spread - medium - The spread in data quality scores.
  • Duplicate DQ Check Records - medium - Passed QA twice. That's the problem.
  • Duplicated User Event Messages - medium - Duplicated messages from the alerts topic.
  • Duplicate Training Runs - medium - Same model, trained twice.
  • Early Commit Velocity by Author - medium - How productive was each author during the first year of a repo's CI pipeline?
  • Early User Activation - medium - Activated early. A good sign.
  • Efficient Pipeline Throughput - medium - Throughput per pipeline. The benchmark.
  • Endpoint Latency Spread - medium - Latency spread across endpoints.
  • Endpoint Performance Report - medium - Every endpoint has a speed and a reliability story.
  • Endpoint With Most GET-Only Users - medium - Read-only users have a favorite endpoint.
  • Engagement by Content Type - medium - Some content types get all the attention.
  • Engagement Gap - medium - Zero transactions is still a data point. Count everyone.
  • Error Hall of Fame - medium - The year's worst error categories.
  • Error Rate by Region - medium - Error rate per day and region via conditional aggregation.
  • Exclusive Users per Device Type - medium - Loyal to one platform only.
  • Experiment Conversion Pivot - medium - Variant A or Variant B? The conversion numbers tell the story.
  • Extract Deploy Versions - medium - The version number is buried in the log.
  • Extreme API Token Usage - medium - Outlier tokens. Suspiciously busy.
  • Extreme Category Totals - medium - The highest and the lowest. Both are interesting.
  • Extremely Late Resolutions - medium - Twenty minutes past the SLA. Still unresolved.
  • Failed Constraint Checks Count - medium - Constraints failed. How many?
  • Failure Rate - medium - Build failures happen. Which repos break the most?
  • Fastest CI Build Date - medium - The fastest build ever. When did it happen?
  • Fastest Completion Per Day - medium - Every day has a speed champion.
  • Fastest Regions by Latency - medium - The fastest regions. Benchmarked.
  • Feature Flag Adoption - medium - How widely adopted are the flags?
  • Feature Quality by Source - medium - Quality varies by source.
  • Feature Vote Winner - medium - Users voted with their clicks. Who won?
  • Find the Fifth Largest Cost - medium - Not the biggest. Not the smallest. The fifth.
  • First and Last Peak Accuracy Dates - medium - Peak accuracy. When it first hit and when it last did.
  • First and Last Timeout Per Service - medium - First timeout. Last timeout. Each service.
  • First Deploy Attribution - medium - The first deploy per service.
  • First Half of Page Views - medium - Half the data. The first half.
  • First Time Learners Per Day - medium - Brand new users, day by day.
  • First Touch Attribution - medium - The first interaction matters most. Or does it?
  • Frequent Message Senders - medium - Someone is sending too many messages.
  • Friday Sessions for Shared Experiments - medium - Friday vibes only. Same experiment, different users.
  • Fulfillable Order Percentage - medium - What percentage of orders can be fulfilled?
  • Ghost Products - medium - Listed but never sold. The shelves collect dust.
  • Heavy Ad Exposure - medium - Saturated with ads. Is it too much?
  • Heavy Hitters - medium - Some repos never sleep.
  • Heavy Namespaces - medium - Kubernetes has favorites. Some namespaces carry more weight.
  • Highest and Lowest Cloud Costs - medium - The extremes in cloud spending.
  • Highest Daily Spend - medium - Somewhere in that window, someone broke the spending record.
  • Highest Node Density Regions - medium - Some regions are packed with nodes.
  • Highest Throughput Pipelines - medium - The pipes that carry the most water.
  • Inactive Android Control Users - medium - Android control cohort. Gone quiet.
  • Inactive Users in Date Range - medium - Ghost accounts. Active signup, zero sessions.
  • Inactive vs Suspended Engagement - medium - Inactive versus suspended. The engagement gap.
  • iOS Adoption by Age Bucket - medium - The install numbers don't match the hype.
  • iOS Sessions by Device Type - medium - iOS engagement, device by device.
  • Job Status Duration - medium - How long in each job state?
  • Keep Most Recent Record - medium - Carbon copies clutter the table. Only the latest matters.
  • Keyword-Based User Search - medium - The search terms reveal intent.
  • Largest A/B Test by Participants - medium - The biggest experiment ever run.
  • Largest Single Cloud Cost - medium - One line item. The biggest bill of all.
  • Latency Gap to 10th Fastest - medium - One server. Compared to the 10th fastest.
  • Latest Commit Build Cost - medium - The latest commit came with a build cost.
  • Latest Migration Output per Author - medium - Each author's most recent migration output.
  • Leading ML Frameworks by Accuracy - medium - Which frameworks lead on accuracy?
  • Least Viewed Content - medium - Nobody is watching. Should it still exist?
  • Longest Gap Between Token Events - medium - The longest gap between token events.
  • Longest Running Pipeline - medium - One pipeline outlasted them all.
  • Long Messages - medium - Some commit messages tell a novel.
  • Long-Running Feature Flags - medium - Flags that have been on for too long.
  • Low-Engagement Sessions - medium - Users whose average session duration is below the engagement threshold.
  • Lowest Cost Network-Heavy Team - medium - Networking costs versus compute. Which teams?
  • Lowest Latency per Service - medium - The fastest response each service ever gave.
  • Low Severity Checks in 2026 - medium - Low severity. High volume.
  • Low-Volume Stream Topics - medium - Quiet topics in the stream.
  • March Revenue by Customer - medium - One month, every customer, every dollar accounted for.
  • Median Null Percentage of Float Features - medium - Nulls in float columns. How widespread?
  • Mentorship User Pairs - medium - Pair them up. Mentor and mentee.
  • Metric Count - medium - How deep does each department's tracking go?
  • Metric Value Pairs Over Threshold - medium - Two metrics, both above the line.
  • Minimum Cost Per Provider - medium - The cheapest month from each provider.
  • Mobile vs Desktop Session Duration - medium - Mobile versus desktop. Who stays longer?
  • Models With Variable Accuracy - medium - Accuracy should be stable. These models are not.
  • Model Training Completion Rate - medium - How many models finished training?
  • Monthly Cohort Retention - medium - Compute month-over-month retention rates for user signup cohorts.
  • Monthly Revenue Comparison - medium - Last month versus this month. Per product.
  • Monthly Running Total - medium - Cumulative sales per product across months.
  • Monthly Spend Pivot by Provider - medium - Cloud bills by month, split by who sent the invoice.
  • Monthly Transaction Summary - medium - A monthly engagement summary.
  • Month With Fewest Deploys - medium - One month, nobody deployed.
  • Most Active Chat Users - medium - The loudest voices on the platform.
  • Most Active Recent Committers - medium - Who has been writing the most code lately?
  • Most Active Servers by Log Volume - medium - The busiest servers by log volume.
  • Most Commented Code Review - medium - The code review that started a debate.
  • Most Common Monday Outcome - medium - Mondays have a pattern.
  • Most Efficient API Endpoint - medium - Best throughput per call.
  • Most Frequent Error Types - medium - The errors that keep coming back.
  • Most Ordered Product by Country - medium - Popular products in specific markets.
  • Most Popular Content Type - medium - The content type everyone prefers.
  • Most Popular Signup Day - medium - One day of the week wins on signups.
  • Most Profitable Region Month - medium - One region, one month. Peak profit.
  • Multi-Host Regions by Node Type - medium - Some regions are quietly building empires.
  • Multi-Table Report - medium - Join three tables into a summary report.
  • Mutual Channel Connections - medium - Two users. What channels do they share?
  • Negative Outcome Rate for New Users - medium - New users have a rough first two weeks.
  • Net Lines - medium - Some authors build. Others trim. The net tells the truth.
  • New Customers Per Day - medium - Count users whose first order falls on each date.
  • New User Purchases - medium - Revenue from the signup cohort that joined this year.
  • Nodes by Region and Type - medium - Broken down by region. Broken down by type.
  • Nodes in Key Regions - medium - Six regions. How many nodes in each?
  • Noisiest Tables by DQ Failures - medium - The tables that fail the most checks.
  • Non-Trivial Fatal Errors - medium - Short errors are noise. Long ones matter.
  • Notification Delivery Ratio - medium - Sent versus delivered. The gap is the problem.
  • Notification Open Rate - medium - Sent versus opened. The rate.
  • Notifications Pivot by Weekday - medium - Notifications by platform and day of week.
  • Nth Highest Salary Per Department - medium - Third place in every department.
  • Opened Notifications in Jan-Feb - medium - Two months of push notifications. How many were actually read?
  • Over-Budget Services - medium - Over budget. Flagged.
  • Overlapping User Sessions - medium - Two sessions, one user, same clock. Something overlaps.
  • Overloaded Infrastructure Nodes - medium - CPU above 90. Memory above 80. Red alert.
  • Pages Viewed by Session Duration - medium - Longer sessions, more pages? Check.
  • Pairwise Latency Maximum - medium - Every pair compared.
  • Peak API Hour - medium - The hour when traffic peaks.
  • Peak Hour Power Callers - medium - One hour. The phone lines exploded.
  • Peak Latency for 2026-Era Endpoints - medium - Peak latency for that era's endpoints.
  • Peak Retargeting Revenue Month - medium - Retargeting revenue. The peak month.
  • Pipeline Completion Rate - medium - How far do users get through the flow?
  • Pipeline Overhead by Environment - medium - Production overhead versus staging.
  • Pipeline Recovery by Priority - medium - Recovery time, priority by priority.
  • Pivot Event Counts - medium - Reshape rows into columns by event type.
  • Pod CPU to Memory Ratio - medium - CPU versus memory. Resource efficiency.
  • Power Users - medium - Engagement separates tourists from regulars.
  • Power Users by Session Activity - medium - More sessions. More time. The power users.
  • Power Users by Session Count - medium - Three sessions is casual. More than that is serious.
  • Price Rank - medium - In every category, someone charges the most. Who's on top?
  • Priciest Item in Each Category - medium - The most expensive item per category.
  • Product Ratings vs Sales - medium - Do higher ratings actually mean more revenue?
  • Products With Strong Unit Price - medium - Budget-friendly and high-performing.
  • Product Transaction Counts - medium - Show how many transactions each product has, sorted by product ID.
  • Profit Tiers - medium - High, moderate, or in the red. Every order gets a label.
  • Prolific Authors in Largest Service Teams - medium - Senior leads in the biggest teams.
  • Provider Spend Variance Between Halves - medium - Two time windows. Did the cloud bill go up or down?
  • Push Notification Open Rate - medium - Push sent. How many opened?
  • Push Notification Status Pivot - medium - Sent, opened, ignored. The notification lifecycle in numbers.
  • Push Opens by Platform and Campaign - medium - Opens by platform and campaign.
  • Quarterly Consolidated Cloud Costs - medium - Quarterly cloud spend, weighted.
  • Rank Users by Search Query Count - medium - Who searches the most? The answer might surprise you.
  • Rapid Retry Detection - medium - Detect retried API calls within 5 minutes of failure.
  • Rate Limit Rules Per Endpoint - medium - Threshold rules, endpoint by endpoint.
  • Rating Tiers - medium - No gaps, no skips. Ratings stacked tight within each category.
  • Recent Price Drops - medium - The price just dropped. Who noticed?
  • Regional Order Summary - medium - Region by region. The order numbers tell the story.
  • Regions by Alert Volume - medium - Some regions are quiet. Others never stop screaming.
  • Region With Best Uptime - medium - The single most reliable region.
  • Region With Most Nodes - medium - Which region hosts the most?
  • Repeat Buyers Across Halves - medium - First half buyer. Second half buyer. Same person.
  • Repeated Transactions - medium - Detect same-amount transactions within 10 minutes.
  • Repeat Purchases Within a Week - medium - They bought again within seven days.
  • Repeat Purchase Window - medium - The retention squad is looking for repeat purchasers.
  • Repository Commit Ranking - medium - Lines added tell the story of a repo's ambition.
  • Repos with More Builds Than Commits - medium - More builds than commits. Something is off.
  • Response Buckets - medium - Fast, normal, or slow. Every API call gets a verdict.
  • Retried Failed API Calls - medium - Spot users who retry API calls within 5 minutes of a failure.
  • Returning Buyers - medium - They came back and bought again.
  • Revenue Per Product With Zeros - medium - Total revenue per product. Even the zeros.
  • Reviewer Performance Metrics - medium - Some reviewers are thorough. Others are fast.
  • Reviewers Per Repo Per Year - medium - Reviewers per repo, year by year.
  • Revoked Tokens by Scope - medium - Banned tokens, sorted by what they had access to.
  • Rolling Weekly Total - medium - Seven days at a time, the totals keep rolling forward.
  • Rows With Multiple Flag Conditions - medium - Rows caught by multiple flags.
  • Runner-Up Cost Without ORDER BY - medium - The second highest. Without sorting.
  • Running Tab - medium - Every purchase adds to the total. Watch the tab grow.
  • Rush Hour API Latency - medium - Rush hour hits the API differently.
  • Same-Day Signup Rate - medium - Percentage of transactions on the signup date.
  • Same First and Last Reply Target - medium - They started and ended the month messaging the same person.
  • Satisfaction by Platform - medium - Satisfaction scores, platform by platform.
  • Second Highest Cloud Cost - medium - The second biggest bill on record.
  • Second Highest Latency by Method - medium - Almost the slowest. By method.
  • Senior to Junior Ratio - medium - The ratio tells you a lot about the department.
  • Servers Returning to Origin - medium - Servers that migrated back home.
  • Server With Most Errors - medium - One server stands out. Not in a good way.
  • Service Budget per Head - medium - Budget per head. Pipeline by pipeline.
  • Service Component Classification - medium - Classified by naming pattern.
  • Service Reliability Tiers - medium - Reliability tiers. Based on uptime.
  • Services at Median Uptime - medium - Exactly at the median. Not above, not below.
  • Service Uptime Minutes - medium - Status changed. How long was it actually up?
  • Session Duration by Account Status - medium - Average session duration broken down by user account status.
  • Session Overview - medium - Full engagement picture, even for the ones who never showed up.
  • Session Rank - medium - Longest sessions rise to the top. Within each user, a pecking order.
  • Sessions by Content Type - medium - Engagement, broken down by content format.
  • Shared Category Purchasers - medium - They bought different things from the same aisle.
  • Shared Endpoints - medium - Shared credentials across endpoints.
  • Signup to Subscription Rate - medium - Conditional aggregation for conversion rates.
  • Single Service Owners - medium - One owner, one service. Nobody else.
  • Smooth Latency - medium - Noisy latency readings, smoothed into a trend you can trust.
  • Spending by Account Status - medium - Segment user spending and activity by account status across the platform.
  • Spending Tiers - medium - High rollers, mid-spenders, and the frugal. Everyone gets a tier.
  • Split Metric Sums - medium - One column, two totals.
  • Subscribers Without Premium - medium - Subscribed. But never upgraded.
  • Successful Build Duration by Repository - medium - CI throughput, repo by repo.
  • Successful Call Volume per Endpoint - medium - Not every ping is honest.
  • Sum Excluding Extremes - medium - Remove the outliers. Then sum.
  • Super Reviewers - medium - The most prolific code reviewers.
  • Symmetric Reply Network - medium - Who replies to whom? Both directions.
  • Tables With Many DQ Failures - medium - Some tables have never once passed QA.
  • Tables With Most DQ Failures - medium - The tables with the most failures.
  • Teams Below Double Average Spend - medium - Teams spending under twice the average.
  • Tenure Mentorship Match - medium - Pair by tenure. Longest with newest.
  • The Podium Finish - medium - Top two products per category.
  • The Quiet Alarms - medium - Low severity. High volume. Worth a look.
  • The Slow Lane - medium - Peak API load. The slow endpoints.
  • Third Highest Spender - medium - Bronze medal in spending.
  • Three-Item Combinations - medium - Generate all unique 3-item sets with total cost.
  • Three-Value Sum Combinations - medium - Pick three. See what they add up to.
  • Token Churn Rate - medium - Tokens come and go. How fast is the revolving door?
  • Tokens With Non-Read Scope Prefix - medium - Tokens that don't start with 'read'.
  • Top 10 AB Test Variants - medium - The ten best-performing variants.
  • Top 10 CPU-Heavy Nodes - medium - The ten hungriest nodes.
  • Top 10 Rated Products - medium - The ten highest-rated items.
  • Top 2 Active Push Days - medium - Two days stood out from the rest. Which ones?
  • Top 2 Ad Campaigns by Spend - medium - Two campaigns. Most of the budget.
  • Top 2 Busiest API Slots - medium - Two time slots per week. The busiest.
  • Top 2 Callers per Endpoint - medium - Two top callers per endpoint.
  • Top 2 Cloud Services by Cost - medium - Two services eating most of the budget.
  • Top 2 Rate-Limited Clients - medium - Two clients are hitting the rate limit harder than anyone.
  • Top 3 First-View Pages - medium - The first three pages new users see.
  • Top 3 Revenue Months - medium - The three best months on record.
  • Top Accuracy Model - medium - The single best-performing model.
  • Top Active API Tokens - medium - The five busiest tokens.
  • Top Active Senders per Channel - medium - Top three messages per channel by replies.
  • Top Alert Resolvers - medium - The engineers who resolve the most.
  • Top API Caller - medium - One user triggered more API calls than anyone.
  • Top AWS Non-APAC Service Costs - medium - Outside APAC, AWS costs tell a different story.
  • Top Batch Job Under Priority 1 - medium - Priority one. Top performer.
  • Top Buyers by Transaction Count - medium - Frequency is loyalty. Who keeps coming back?
  • Top Buyers of Premium Products - medium - Which users bought the most top-rated products?
  • Top Campaign by Opens - medium - One campaign got all the opens.
  • Top Campaign by User Revenue - medium - Which campaign made each user spend the most?
  • Top Category by User Segment - medium - Each segment has a favorite category.
  • Top Chat Contributors - medium - The ten most active chat users.
  • Top Committers in 2025 - medium - In a sea of commits, only a few wear the crown.
  • Top Content by Lifetime Value - medium - Lifetime value. Measured in total watch time.
  • Top Content by Views - medium - Top five content items by views.
  • Top Content by Watch Time - medium - Some content holds attention. Others get skipped.
  • Top Content Flagger - medium - Flagged content. Who flagged the most?
  • Top Cost Categories - medium - Three categories eating the budget.
  • Top Cost Entry per Team - medium - The single biggest bill per team.
  • Top Earner Per Campaign - medium - The top-earning user per campaign.
  • Top Error Categories in 2025 - medium - Last year's worst error categories.
  • Top Error-Service Pair - medium - Which error-service pair triggered the most resolved incidents?
  • Top Frameworks by Accuracy - medium - Top three frameworks by accuracy.
  • Top Identified Event Types - medium - The top users by events, but only the identifiable ones.
  • Top Lessons Each Month - medium - Rank items within time periods and keep the top 3.
  • Top Metric per Department - medium - Peak performer in every department.
  • Top Pattern Matches - medium - A needle in a haystack, but how many haystacks?
  • Top Percentile Spenders - medium - Top 1% of users by total spend via percentile bucketing.
  • Top Product Categories - medium - Top three categories by page views.
  • Top Product Category by Transactions - medium - Organic purchases, no marketing nudge. Which category wins?
  • Top Products by Quantity Sold - medium - The bestsellers. By volume.
  • Top Products per Category - medium - Five winners per category.
  • Top Region by Order Volume - medium - The single busiest region.
  • Top Regions by Critical Alerts - medium - Which regions have the highest volume of critical alerts?
  • Top Regions by Effective Uptime - medium - The most reliable regions.
  • Top Repos by Commit Volume - medium - The most active repos in the org. No ties left behind.
  • Top Repos by Successful Builds - medium - Green builds. Which repos lead?
  • Top Revenue Products H1 - medium - First half of the year. Which products led the revenue race?
  • Top Services by Regional Cost - medium - Top spenders in one region.
  • Top Services by Uptime - medium - Uptime is a competition. Which services never blink?
  • Top Services Per Provider - medium - Within each cloud, two services rise above the rest.
  • Top Spender - medium - When your spending exceeds the priciest item on the shelf.
  • Top Users by Pages Viewed - medium - Five users who browsed the most.
  • Top Users by Recent Spend - medium - Big spenders in the last 30 days.
  • Top Users by Session Time - medium - They spent the most time here.
  • Transaction Revenue by Customer - medium - One month, every customer, every dollar accounted for.
  • Transaction Share of User Spend - medium - Each transaction's share of the whole.
  • Transaction Timeline - medium - First purchase to last. The full spending arc.
  • Trend Spotter - medium - What did they spend last time? Context changes everything.
  • Unclicked Searches by Campaign - medium - Searched but never clicked.
  • Unique Hostnames per Region - medium - How many distinct machines live in each region?
  • Unique Reporters per Content - medium - How many people flagged each item?
  • Unmatched Deploy Services - medium - Two registries. They do not agree.
  • Unsold Product Categories - medium - Dead inventory inflating storage costs.
  • US Active User Share - medium - What percentage of active users are US-based?
  • User Devices - medium - Desktop, mobile, tablet. What does each user actually use?
  • User Engagement Summary - medium - Sessions plus searches. The full engagement picture.
  • Users Outperforming Control - medium - Treatment beat control. For these users.
  • User Spend Audit - medium - One user. One category. Total spend.
  • Users With Admin Tokens - medium - Admin tokens. Who holds them?
  • Users With API Errors - medium - Count unique users who have triggered an API error response.
  • Users Without Purchases - medium - How many registered users have never made a single purchase?
  • Users Without Sessions - medium - Account created. Never logged in.
  • User With Most Transactions - medium - The most active buyer.
  • Views by Content Type - medium - Count content views broken down by content type.
  • Word Count Per Message - medium - How wordy are the messages?
  • Workers Earning Above Department Average - medium - Earning above the department average.
  • Yearly Build Duration by Repo - medium - Build times by repo, year by year.
  • Year-over-Year Content Launches - medium - Launch velocity, year over year.
  • Zero Accuracy on First Training - medium - First run. Zero accuracy. How common?
  • Cumulative Sales Per Customer - medium - Each purchase adds to the running total. Watch it climb.
  • Category Revenue - medium - Which categories pull their weight?
  • Platform Speed - medium - Which devices keep users longest?
  • Click Rate - medium - Campaigns nobody clicks.
  • Above the Curve - medium - Spenders who break from the pack.
  • Department Snapshot - medium - Who is underperforming and who is excelling?
  • Noisy Endpoints - medium - The routes generating the most noise.
  • Build Health - medium - Repos that break more than they ship.
  • Category Buyers - medium - Which categories have the broadest reach?
  • Diverse Shoppers - medium - They shop the whole catalog.
  • Silent Users - medium - Users who have never typed a query.
  • Funnel Leakage Report - hard - Users enter the funnel. Most never reach the bottom.
  • The Session Stitcher - hard - Page views without sessions are just noise.
  • The Regional Cost Reconciliation - hard - Two cost tables, one region. Reconcile the running balance.
  • The Cannibalization Report - hard - The new product launched. The old one suffered.
  • 2nd Most Common Content Type - hard - Everyone talks about number one. What about number two?
  • 7-Day Onboarding Conversion - hard - Signed up Monday. Still here by Sunday?
  • Above Category Avg - hard - Above average is relative. Relative to what?
  • Active User Penetration Rate - hard - How much of the user base is actually alive?
  • Adopters Before Migration - hard - They used the old feature. Did they ever touch the new one?
  • Aggregate Votes by Paper Subject - hard - Net revenue, day by day, for one product in one region.
  • Alert Severity - hard - When the alarms go off, who screams loudest?
  • Allocations in Top Spending Region - hard - The biggest spenders live in one region.
  • Alphabetical Tag Sort - hard - Tags in the wrong order.
  • API Call Distribution Fraction - hard - Not all endpoints are created equal.
  • Average Event Progression Time - hard - How fast do users move through the funnel?
  • Average Sessions Per User - hard - How often do users come back?
  • Best Selling Product by Month - hard - Every month has a winner.
  • Bottom 2% Services by Spend - hard - The bottom 2% of spenders. Who are they?
  • Cache Efficiency - hard - Some edges run hot. Others coast on the global average.
  • Campaign Bookend Engagement - hard - First impression versus last. The gap.
  • Campaign Conversion Count - hard - The push notification went out. Did anyone convert?
  • Campaign Conversion Window - hard - A narrow window between impression and action.
  • Campaign Engagement Rank Shift - hard - Two months, many countries. Who moved up? Who fell?
  • Category Deep Dive - hard - Revenue, units, rank. The full category report card.
  • Cheapest and Most Expensive Service per Region - hard - Every region has a bargain and a budget-buster.
  • Cheapest CDN Route - hard - The cheapest path across regions.
  • Classify Accounts by Activity Tier - hard - The accounts fall into tiers. Where is the cutoff?
  • Cloud Cost Breakdown by Provider - hard - Cloud costs, provider by provider.
  • Commit Cadence - hard - Some repos go quiet for too long.
  • Consecutive Cost Growth Periods - hard - Five straight months of spending increases.
  • Content Page Spreads - hard - Content, laid out in two columns.
  • Cost Efficiency Variance - hard - Cost efficiency varies. By how much?
  • Creator Favorite Content Type - hard - Every creator has a go-to format.
  • Daily Net Revenue - hard - Net revenue, day by day. Refunds included.
  • Data Quality - hard - Failed checks pile up. Which tables need the most attention?
  • Department Quarterly Pivot - hard - Headcount by department, sliced by quarter. The org chart in numbers.
  • Deploy Velocity - hard - Days between deploys. Some services ship fast, others crawl.
  • Endpoint Name Word Count - hard - Some endpoint names are novels.
  • Endpoint Ranking - hard - The slowest endpoints. Called to the principal's office.
  • Error Category Breakdown - hard - Postmortem time. Categorize the errors.
  • Exact Keyword Counts in Logs - hard - Errors and warnings. Count every single one.
  • Experiment Impact - hard - Which experiments moved the needle? Rank them within each group.
  • Experiment Variant Ratios - hard - Control versus treatment. The participation split.
  • Fastest and Slowest Services by Region - hard - The fastest and slowest in every region.
  • Fastest Page View to Click - hard - How fast from view to click?
  • Feature Flag Engagement Impact - hard - Flags on versus flags off. The engagement gap.
  • Feature Flag Fan vs Detractor Pairs - hard - Some users love the flag. Others want it gone.
  • Feature Name Intersection - hard - Training names versus serving names. The overlap.
  • First-Day Session Retention - hard - Day one retention. The first test.
  • First Interaction Credit - hard - Attribute transactions to the earliest touchpoint.
  • Flatten Org Chart Hierarchy - hard - The tree runs deep. Walk every branch to the root.
  • Friday Spending Analysis - hard - Friday spending during Q1.
  • Full Funnel - hard - Search. Browse. Buy. Only a few do all three.
  • Healthiest Service Check History - hard - The healthiest service. Full history.
  • High Engagement Pages - hard - Some pages hold attention longer than others.
  • Impressions by Search Keyword - hard - Campaign performance, keyword by keyword.
  • Incident Keyword Messages - hard - Certain words trigger an investigation.
  • Intra-Region Latency Diff - hard - Same region. Different latency.
  • Largest CDN Response - hard - One edge location served something massive.
  • Latency Quartiles Per Endpoint - hard - Quartile breakdowns. Endpoint by endpoint.
  • Latency Variance and Std Dev - hard - How much does latency actually vary?
  • Longest Uptime Streak - hard - Pass, pass, pass. How long until fail?
  • Longest Visit Streaks - hard - Day after day after day. Who kept coming back?
  • Lowest CPU Pods per Namespace - hard - The five lightest pods per namespace.
  • Market Share - hard - Every category wants a bigger slice.
  • Median Cloud Cost by Service - hard - The median cloud bill, service by service.
  • Median Failure Rate by Table - hard - Half the tables fail more than this.
  • Median Household Earnings - hard - Household earnings. The median reveals the middle.
  • Median Model Accuracy - hard - The median accuracy. Not the mean.
  • Median Transaction by Category - hard - The middle transaction in each category.
  • Mid-Range Team Spenders - hard - Above average but not extreme.
  • Minimum Parallel Workers - hard - Too few workers and it stalls.
  • Model Accuracy Drift - hard - Accuracy used to be higher.
  • Mode of Small Team Costs - hard - One charge keeps showing up everywhere.
  • Monthly Cloud Cost Forecast Error - hard - The forecast was off. By how much?
  • Monthly Deploy Counts Pivoted - hard - Deploys by month. Side by side.
  • Monthly Revenue Change - hard - Revenue, month over month.
  • Monthly Service Retention - hard - Users came back. Or they did not.
  • Most Efficient High-Volume Campaign - hard - High volume. Low cost. The dream campaign.
  • Most Efficient Region by Token Usage - hard - Some regions squeeze more out of every token.
  • Multi-Category Buyers - hard - One-category shoppers are boring.
  • Multi-Month Active Users - hard - Active this month and last month. Who stuck around?
  • New Services With Poor Health - hard - New services, already struggling.
  • New vs Returning User Share - hard - Fresh faces versus familiar ones.
  • Node Utilization - hard - Overloaded nodes hiding in busy regions. Spot the hot spots.
  • Oldest Alert per Service - hard - The oldest unresolved alert per service.
  • Peak Concurrent Pods - hard - The most pods alive at once.
  • Peak Concurrent Tokens - hard - How many tokens were alive at the same time?
  • Pipeline Duration vs Throughput - hard - Does throughput correlate with duration?
  • Previous Day Top Service - hard - Yesterday's top spender.
  • Price Pairs - hard - Same shelf, wildly different stickers. Spot the pricing gaps.
  • Quarterly Peak Cloud Costs - hard - Every quarter has a peak bill.
  • Quarter-over-Quarter Latency Trend - hard - Latency trending up or down? The quarters have the answer.
  • Rarest Latency Value - hard - A latency value that appeared exactly once.
  • Regional Sales Growth QoQ - hard - Quarter-over-quarter growth. Region by region.
  • Resolved vs Unresolved Alerts - hard - Resolved versus open. By severity.
  • Rolling Revenue Average - hard - Smooth out the revenue bumps. The trend matters more.
  • Running Total With CTE - hard - A running total that builds step by step.
  • Same-Day Session and Transaction Correlation - hard - Same-day session and purchase. Connected?
  • Search Algorithm Rating - hard - How good are the search results?
  • Search Success by User Tenure - hard - Compare search click-through rates between new and existing users.
  • Search Term Length vs Click Rates - hard - Longer queries, more clicks?
  • Second Purchase - hard - The first buy is curiosity. The second is commitment.
  • Sequential Service Transitions - hard - Job to job. The transitions.
  • Service Scorecard - hard - Deploys vs. alerts. One row per service tells the whole story.
  • Services Hitting Cost Threshold - hard - The budget line is here. How many crossed it?
  • Services With Most Checks in 2025 - hard - Last year's most-checked services.
  • Services With Multi-Quarter Uptime - hard - Multi-quarter uptime streaks.
  • Service Uptime Turnaround - hard - It was down. Then it came back. Stronger.
  • Service With Most Critical Alerts - hard - One service keeps setting off the alarms.
  • Session Count Distribution - hard - How are sessions distributed among the newest users?
  • Session Page View Distance - hard - Page view distance per session.
  • Shared Channel Contacts - hard - User networks mapped through messages.
  • Spend and Rank - hard - Five thrones at the top of the spending leaderboard.
  • Spending Range - hard - Between the smallest purchase and the biggest lies the story.
  • Streak Status Changes - hard - Detect value changes across consecutive rows.
  • Team Cost Allocation Comparison - hard - Individual spend versus team average.
  • Tenure Spread for Active Tokens - hard - Tenure extremes among active tokens.
  • The Usual Suspects - hard - Same services, same checks, same problems.
  • Top 3 Monthly Costs per Team - hard - Three priciest months per team.
  • Top and Bottom Cloud Spenders - hard - The extremes. Top and bottom.
  • Top Commit Authors by Repo - hard - Three authors per repo. The top committers.
  • Top CPU Pods per Namespace - hard - The two most CPU-hungry pods in each namespace.
  • Top Endpoint by Power Users - hard - Power users have a favorite endpoint.
  • Top Flagged Campaign Resolutions - hard - Flagged the most. Resolved how?
  • Top Framework by Deployments - hard - The framework most often deployed.
  • Top Models by Framework - hard - Every framework has a star model.
  • Top Per Category - hard - Every category has a champion. Crown them all.
  • Top Percentile API Tokens - hard - The most suspicious tokens.
  • Top Regions by High CPU Nodes - hard - Five regions with the hottest CPUs.
  • Total Hours Between Consecutive Events - hard - Hours between state changes.
  • Transaction-Only Features - hard - Exclusive to one source. Missing from the other.
  • Upvote Percentage by Age Cohort - hard - New users versus existing. The upvote gap.
  • User 360 - hard - One row per user. Everything they did, or didn't do.
  • User Campaign Overlap Percentage - hard - How much ad overlap between users?
  • User Connection Score - hard - Every user has a social score.
  • User Spend Segmentation by Category - hard - Users segmented by spending behavior.
  • Users Who Churned in February - hard - Gone in February.
  • Users With and Without Ad Clicks - hard - Clicked an ad versus never clicked. The split.
  • Viewer-to-Purchaser Activity - hard - Started as viewers. Became purchasers.
  • Weekly Order Status Report - hard - Weekly order status. The report.
  • Weekly Transaction Day Split - hard - Transactions by day of week.
  • Weighted Variant Selection - hard - Select a row using cumulative weight probabilities.
  • Worst Table Per Year by DQ Failures - hard - Every year has a worst table.
  • YoY Signup Growth Rate - hard - This year versus last year. Growing or shrinking?
  • Zero-Retry Job Ratio by Priority - hard - No retries needed. First try success rate.
  • Slowly Changing Dimension Type 2 - hard - Addresses change. History must not be erased.
  • Normalization Tradeoffs in Practice - hard - Clean data or fast queries? You can't always have both.
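
The Weighted Variant Selection prompt above describes selecting a row using cumulative weight probabilities. As a rough illustration of that idea (not the graded solution, and written in Python rather than SQL), the sketch below builds running totals of the weights and picks the first row whose running total exceeds a scaled draw. The variant names, weights, and the `pick_weighted` helper are assumptions made up for this example.

```python
import bisect
import itertools
import random


def pick_weighted(rows, weights, u):
    """Return the row whose cumulative-weight interval contains u * total.

    u is a number in [0, 1); pass random.random() for a random draw.
    """
    cumulative = list(itertools.accumulate(weights))  # running totals of the weights
    target = u * cumulative[-1]
    # bisect_right returns the index of the first running total greater than target.
    index = bisect.bisect_right(cumulative, target)
    # Guard against floating-point rounding pushing the index past the last row.
    return rows[min(index, len(rows) - 1)]


variants = ["control", "variant_a", "variant_b"]  # hypothetical rows
weights = [50, 30, 20]                            # hypothetical weights

print(pick_weighted(variants, weights, random.random()))
```

Passing the draw `u` in as an argument keeps the function deterministic, which makes it easy to check boundary cases by hand. In SQL the same shape usually appears as a running `SUM(weight)` window compared against a random threshold.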

Data Modeling Interview Questions (56)

  • Customer Address History - easy - People move. Sometimes twice in a month. How do you remember where everyone was, and when?
  • B2B Invoicing Data Model - easy - Invoices go out, partial payments trickle in, and some customers are three months overdue.
  • Fitness Studio Membership Schema - easy - Classes fill up. Members no-show. Billing continues.
  • A Number for the Seller - easy - They want a total. Give them the right schema first.
  • Event Ticketing System Data Model - easy - JSON in. Reporting warehouse out. Design both ends.
  • Loan Management Schema - easy - Money out, payments back. The balance has to be exact.
  • Toll Road Sensor Analytics - easy - Cars enter, cars exit. Except when they don't.
  • Fitness App Data Model - easy - Reps, sets, streaks, and personal bests. Gym rats love their stats.
  • Ride-Sharing Platform Schema - medium - Riders, drivers, and fares. Everyone takes a cut.
  • Employee Transfer Tracking System - medium - People switch teams. HR loses track.
  • Movie Streaming Analytics Schema - medium - They pressed play. What happened next is the whole question.
  • Log Parsing Pipeline Schema - medium - Raw text files, terabytes of them, full of buried signals and cryptic error codes.
  • Livestream Analytics Schema - medium - Someone goes live, thousands tune in, chat explodes, and virtual gifts start flying.
  • POS Sales Data Warehouse - medium - Every beep at the register. Coupons, returns, all of it.
  • Online Retail Star Schema - medium - Prices change. Categories shift. Revenue slices everywhere.
  • Social Platform Data Model - medium - Follows, likes, replies to replies. It never stops.
  • Subscription Churn Analysis Model - medium - Subscribers are leaving. The data knows why.
  • Employee Application Time Tracking - medium - Every minute tracked. Every app accounted for.
  • Food Truck Operations Data Model - medium - Mobile vendor, fixed menu, unpredictable locations.
  • Loan Application Reporting Schema - medium - Approved, declined, or pending. Design the tables that say so.
  • Machine Process Event Log Schema - medium - Machines fire events. Pair them up before they bury you.
  • Order and Shipment Data Model - medium - Order placed. Now track it to the door.
  • Sales Analytics Star Schema - medium - Five rounds with a data engineer. Round five: design the star.
  • Subscription and Payment Data Model - medium - Two user types. Multiple payment methods. One messy billing table.
  • The JSON Files That Became a Data Mart - medium - Three semi-structured inputs. One queryable warehouse.
  • The Plan That Changed Twice This Month - medium - Subscribers come, go, downgrade, and share. The schema has to keep up.
  • The Retail Tables That Need a New Home - medium - A working system. Now redesign it so the analysts can actually use it.
  • The Talent Funnel - medium - Thousands applied. One accepted. Where did the rest go?
  • The Transfer Request - medium - Apply, wait, get approved or denied. Track all of it.
  • Retailer Data Warehouse Design - medium - Queries are crawling. The analysts are not happy.
  • The Table That Lies - medium - Every query comes out wrong. The data is all there.
  • Clickstream and Session Schema - medium - Millions of clicks, mostly anonymous.
  • The Celebrity Problem - medium - One post. A million notifications. Something has to give.
  • Housing Marketplace Analytics - medium - Sellers want buyers. Buyers want deals.
  • Trending Dishes Dashboard - medium - What's everyone eating? The answer changes hourly.
  • Airline Flight Operations Schema - medium - Flights, passengers, and routes. Before you draw a single table, tell me the grain.
  • A/B Experiment Assignment Schema - medium - One user, one experiment, one variant. No exceptions.
  • Multiplayer Game Match History - medium - Millions of matches. The leaderboard refreshes in fifteen minutes.
  • EdTech Classroom Engagement Schema - medium - They opened the assignment. Did they actually read it?
  • Telecom Network Connectivity Warehouse - hard - One device goes down. The ripple keeps going.
  • Metric Definition Reverse Engineering - hard - Five numbers on a dashboard. Your job: figure out where they come from.
  • Property Booking Platform - hard - Five-star listing. Three-star reality.
  • E-Commerce Supply Chain Tracking - hard - A package splits, reroutes, and (maybe) arrives.
  • SCD Type 2 Customer Dimension - hard - Things were different six months ago. Can you prove it?
  • Financial Trading Warehouse - hard - Every trade, every tick, every fraction of a share. The regulators want receipts.
  • Content Engagement Data Model - hard - Post published. Now measure everything that happens next.
  • Content Search and Discovery Schema - hard - Searchable from every angle. Design it so nothing gets lost.
  • Marketplace Sales Warehouse - hard - No schema given. The interviewer is watching.
  • The League With Too Many Loyalties - hard - A player can belong to many teams. The schema must agree.
  • The Schema That Could Not Answer Back - hard - Forty columns in. Zero useful answers out.
  • The Churner Who Came Back - hard - They cancelled. They came back. The report has to tell both stories correctly.
  • The Territory That Keeps Moving - hard - Reps get reassigned. The receipts have to survive.
  • Insurance Claims Lifecycle - hard - A claim gets filed. Then it gets complicated. Then it gets reassigned. Then it loops back.
  • Online Marketplace - Seller Payouts - hard - The buyer paid one number. The seller got a different one.
  • Cloud File Storage Metadata Schema - hard - A file is also a folder. A folder is also a file.
  • Three-Sided Marketplace Delivery Schema - hard - One order. Two deliveries. Revenue counted twice. Where is the bug in your schema?
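
Several prompts above (Customer Address History, SCD Type 2 Customer Dimension) turn on keeping history when an attribute changes. One common way to do that is a Type 2 slowly changing dimension: close the current row and open a new one instead of overwriting. The sketch below is a minimal in-memory version of that pattern. The `AddressVersion` fields, the half-open validity interval, and the sample addresses are all assumptions for illustration, not the schema the interview expects.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class AddressVersion:
    customer_id: int
    address: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current (open) version


def apply_change(history, customer_id, new_address, effective):
    """Close the customer's open version (if any) and append a new one."""
    for row in history:
        if row.customer_id == customer_id and row.valid_to is None:
            if row.address == new_address:
                return  # nothing changed, keep the open version as-is
            row.valid_to = effective
    history.append(AddressVersion(customer_id, new_address, effective))


def address_as_of(history, customer_id, day):
    """Return the address valid on `day`, using half-open [valid_from, valid_to)."""
    for row in history:
        if (row.customer_id == customer_id
                and row.valid_from <= day
                and (row.valid_to is None or day < row.valid_to)):
            return row.address
    return None


history: List[AddressVersion] = []
apply_change(history, 1, "12 Oak St", date(2024, 1, 5))
apply_change(history, 1, "98 Pine Ave", date(2024, 3, 1))
apply_change(history, 1, "7 Elm Rd", date(2024, 3, 20))  # second move in one month

print(address_as_of(history, 1, date(2024, 2, 10)))
print(address_as_of(history, 1, date(2024, 3, 10)))
```

The half-open interval means a row's `valid_to` equals the next row's `valid_from`, so any given day matches at most one version per customer.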

Pipeline Architecture Interview Questions (121)

  • Hourly ETL Pipeline with Consistency - medium - Every hour, on the hour. No excuses.
  • Time Series CSV Ingestion Pipeline - medium - One massive CSV. Millions of timestamps.
  • Order and Menu Recommendation Pipeline - medium - What they ordered says a lot about what they want next.
  • Card Transaction Streaming Pipeline - medium - Every swipe tells a story.
  • Data Pipeline for Sales Analytics - medium - Sales data is piling up. Someone has to make sense of it.
  • Batch ETL: MongoDB to Redshift - medium - Two databases. One direction. No data left behind.
  • Whiteboard ETL Pipeline Design - medium - Marker in hand. Draw the whole thing.
  • GPS Tracking Pipeline for Logistics - medium - Trucks are moving. Every ping counts.
  • SCD Pipeline into a Delta Lakehouse - medium - Dimensions change. History must survive.
  • SaaS API Connector with Incremental Sync - medium - The API has rate limits. You have deadlines.
  • Real-Time POS Ingestion into Snowflake - medium - The cash register data needs to be queryable by morning.
  • Streaming Pipeline with Schema Validation and Snowflake Sink - medium - Bad records cannot reach the warehouse.
  • Dynamic Schema File Ingestion Pipeline - medium - The schema changed overnight. Again.
  • Pre-Aggregated User Activity Metrics Pipeline - medium - DAU, WAU, MAU. Refreshed every hour.
  • Database Replication and Schema Normalization Pipeline - medium - Production is the source. Analytics needs its own copy.
  • Document Ingestion and Text Extraction Pipeline - medium - Buried in PDFs. The data is in there somewhere.
  • On-Prem to Cloud Pipeline Modernization - medium - The on-prem servers are not getting any younger.
  • The API Drip Feed - medium - The API gives you 100 records at a time. You need millions.
  • CDC Connector: Log-Based vs Trigger-Based - medium - Two ways to watch the database. Each has a cost.
  • Snowflake Query Performance Degradation Diagnosis - medium - Queries used to be fast. Something changed.
  • Real-Time POS Pipeline with Snowpipe and MERGE - medium - Sales hit the register. Snowflake needs to know now.
  • GCP Sales Analytics Pipeline - medium - Sales data, BigQuery, Dataflow. Make it all sing.
  • Resume Document Ingestion and Extraction Pipeline - medium - A thousand resumes. Structured data inside each one.
  • Subscription Analytics Pipeline - medium - Subscribers churn. The pipeline cannot.
  • Large-Scale Sales Data Pipeline for CPG Analytics - medium - Retail data at CPG scale. Every SKU, every store.
  • Financial Services Pipeline with Regulatory Reporting - medium - The regulator does not accept 'eventually consistent.'
  • Event-Driven Insurance Pipeline with Async Claim Processing - medium - Policies are instant. Claims take their time.
  • Databricks Pipeline with Spark Performance Optimization - medium - Spark jobs are running. Just not fast enough.
  • Gaming Event Pipeline: Streaming vs Batch Architecture Decision - medium - Millions of gamers. The architecture decision changes everything.
  • Vehicle Fleet Telematics and Rental Operations Pipeline - medium - Every vehicle is reporting. Every rental matters.
  • Insurance Claims and Policy Data Platform on Azure Databricks - medium - Claims arrive messy. The medallion cleans them up.
  • Healthcare Claims CDC Pipeline with PySpark - medium - Healthcare claims change constantly. The warehouse cannot fall behind.
  • Fintech Lending Platform Event Pipeline - medium - Loan approved. Loan denied. Every decision is an event.
  • Azure Data Factory Orchestration with Databricks Unity Catalog - medium - ADF orchestrates. Unity Catalog governs. Nothing leaks.
  • Energy Trading Market Data Pipeline - medium - Markets move in milliseconds. The pipeline has to keep up.
  • Streaming Content Metadata and Viewer Engagement Pipeline - medium - The catalog updated. Did anyone notice?
  • E-Commerce Platform Analytics Pipeline: Orders to Warehouse - medium - Orders placed. Data warehouse hungry.
  • Regulatory Data ETL Pipeline with Dynamic Schema Handling - medium - The regulator changed the format. Again. Handle it.
  • Last-Mile Delivery Shipment Tracking State Machine Pipeline - medium - Out for delivery. Delivered. Except the events arrived backwards.
  • Financial Ratings Data Pipeline with dbt Incremental Strategy - medium - Ratings change. The incremental model has to keep pace.
  • The Fare Aggregator - medium - Airfares shift every minute. Catch the best ones.
  • The Consent Stitcher - medium - Consent was given. Or was it? Stitch the records together.
  • Loyalty Rewards Pipeline with Late Bank Data - medium - The bank data shows up late. The rewards were already sent.
  • Multi-Cloud Billing Unification Pipeline with Medallion Architecture - medium - AWS, Azure, GCP. Three bills. One truth.
  • Multi-Touch Marketing Attribution Pipeline on Snowflake - medium - They saw the ad, clicked the email, then bought. Who gets credit?
  • The Queue That Wouldn't Stop Growing - medium - 500,000 messages behind and the number keeps climbing.
  • The Vendor Who Never Warns You - medium - Every month, something is different. The dashboards have no idea.
  • The Sale That Needs to Land Now - medium - Three channels feeding one view. Not all of them speak the same language.
  • The Provider That Sometimes Sleeps - medium - The models run at dawn. The data has to be there first.
  • The Revenue That Was Wrong for Two Weeks - medium - Nobody caught it until the CFO asked a question. Design the system that catches it first.
  • Six Hours to Miss a Deadline - medium - The rebuild works. It just doesn't finish in time.
  • Every Device Has Its Own Dialect - medium - Three sources. Three formats. Same workout.
  • Personalization Platform Ingestion - medium - Fresh signals, many teams, one pipeline.
  • The Claim That Picks Its Own Lane - medium - Three entry points. Different workflows. All must route correctly.
  • The Distributor Filing Problem - medium - Hundreds of suppliers. One warehouse. One deadline.
  • URL Shortener Click Analytics Pipeline - medium - Billions of clicks. One tiny code. Two very different clocks.
  • Real-Time Fraud Detection Pipeline - hard - The fraudsters move fast. Your pipeline has to move faster.
  • Event System for Multiple Consumers - hard - One event, many hungry consumers.
  • Real-Time Sales Lakehouse Ingestion - hard - The registers never stop ringing.
  • Viewing Event Pipeline - hard - Someone is watching. Capture everything.
  • Ad Simulation Platform Pipeline - hard - A million slots. A thousand campaigns. Every combination matters.
  • Data Ingest Pipeline with Access Tradeoffs - hard - How you store it decides how fast you can read it.
  • Fintech ETL with Data Validation Checks - hard - Bad data in fintech is not just messy. It is expensive.
  • ML Feature Pipeline for Model Deployment - hard - The model is only as good as what you feed it.
  • Streaming CDC into Delta Lake with UPSERT - hard - The source changed. The lake needs to know immediately.
  • Multi-Region Payment Event Pipeline - hard - Payments from everywhere. One consistent report.
  • Dual-Source Inventory Sync Pipeline - hard - Two systems, two schemas. One truth.
  • Multi-Device Event Pipeline with Late Data - hard - Phones, tablets, laptops. And some of them report late.
  • Cost-Optimized Clickstream Data Lake - hard - 600 million clicks a day. The budget is not infinite.
  • Livestream Event Ingestion Pipeline - hard - The stream is live. The data cannot wait.
  • S3-Based Data Warehouse with File-Level Access Control - hard - Everyone can see the bucket. Not everyone should.
  • Multi-City Demand Forecasting Data Pipeline - hard - Five cities. Five data formats. One prediction.
  • Healthcare Data Lake with Multi-Format Ingestion - hard - PDFs, HL7, JSON. All of it lands in the same lake.
  • Near-Real-Time Trending Dishes Dashboard - hard - The dish rankings update faster than the kitchen.
  • Lambda Architecture for Batch and Streaming Workloads - hard - Real-time and batch. Same pipeline. No compromises.
  • AWS Pipeline Auto-Scaling for Variable Volume - hard - Tuesdays are quiet. Black Friday is not.
  • Clickstream Pipeline for Apple Product Analytics - hard - Every tap, swipe, and scroll. At scale.
  • Dual-Source Hotel Inventory Sync Pipeline - hard - Two booking systems. Rooms do not duplicate themselves.
  • Merchant Payment Summary Pipeline - hard - Raw payment logs in. Clean merchant summaries out.
  • Multi-Device Streaming Pipeline with GDPR Deletion - hard - Users want their data erased. Completely.
  • Financial Trading Data Warehouse - hard - Fractional shares, multi-currency, point-in-time. All of it.
  • Data Platform IaC with Semantic Layer - hard - Infrastructure as code. Meaning as a service.
  • Online Schema Migration on a Billion-Row Table - hard - Add and backfill a new column to a billion-row production table with zero downtime.
  • Order and Menu Feature Pipeline for Recommendations - hard - They ordered pad thai twice. That means something.
  • AWS Pipeline with Auto-Scaling and Cost Governance - hard - Scale up when needed. Do not bankrupt the team.
  • Pharma Data Ingestion Pipeline with Governance - hard - The FDA has opinions about your data pipeline.
  • City-Wide Bicycle Demand Forecasting Pipeline - hard - Bikes in, bikes out. The city needs to predict demand.
  • Cost-Efficient Clickstream Analytics with Two-Year Retention - hard - Two years of clicks. Every query has to be affordable.
  • Retail Clickstream Event Store at Kafka Scale - hard - 600 million events a day. Two years of retention.
  • Cellular Connectivity and App Log Data Warehouse - hard - Tower signals meet app events. Somewhere in between is the truth.
  • On-Prem and Event-Driven Pipeline Migration to Cloud - hard - Half the jobs run on cron. Half run on events. All of it has to move.
  • HIPAA-Compliant PHI De-identification Pipeline for Development - hard - Dev needs production data. HIPAA says absolutely not.
  • Streaming Device Telemetry and Ad Impression Pipeline - hard - Every ad seen. Every second watched. Real-time.
  • Streaming and Batch Unified Pipeline on Azure Databricks - hard - Streaming and batch. One pipeline to rule them.
  • Consumer Goods Trade Promotion Pipeline on GCP - hard - Was the promotion worth it? The data knows.
  • EHR Platform Operational Data Pipeline - hard - Patient records in, operational insights out.
  • Global Insurance Premium and Loss Ingestion Platform - hard - Premiums collected globally. Losses happen locally.
  • Rocket Delivery Feature Store Pipeline - hard - Same-day delivery. The features have to be faster.
  • Real-Money Card Game Session Reconstruction Pipeline - hard - Real money on the table. Reconstruct every hand.
  • Legacy ETL Modernization with SCD Type 2 Entity Resolution - hard - The legacy pipeline works. Nobody knows how.
  • Connected Vehicle Telemetry Pipeline with IaC Deployment - hard - Every vehicle is a sensor. Deploy the pipeline to catch it all.
  • Real-Time Investment Portfolio Position Pipeline - hard - Positions shift by the second. The math cannot lag.
  • Device Insurance Claims Pipeline with Real-Time Fraud Scoring - hard - The claim looks clean. The fraud model disagrees.
  • TV Audience Measurement Pipeline with Panel Projection - hard - Set-top boxes tell you who watched. Projection tells you how many.
  • Cross-Platform TV and Digital Ad Measurement Pipeline - hard - TV and digital. Same viewer, two measurement worlds.
  • Real-Time News Event Detection Pipeline from Social Media Firehose - hard - The firehose is on. Separate signal from noise.
  • Capital Markets Intraday Risk Pipeline with BCBS 239 Lineage - hard - Intraday risk, full lineage. The regulator is watching.
  • Federated Clinical Trial Data Pipeline - hard - Patient data stays local. Insights have to be global.
  • Print Order Ganging and Manufacturing Analytics - hard - One press run, many orders. Group them right.
  • Daily Payment Log Pipeline - hard - Three regions, billions of payments, one merchant summary by 6 AM.
  • The Booking That Came Three Ways - hard - PMS, OTA, and website all think they took the reservation first.
  • The Boutique That Sold in Six Currencies - hard - Every sale is real. The rate it was converted at depends on who is asking.
  • The Clock That Runs Two Ways - hard - Nightly batch and live events. One dashboard.
  • The Fleet That Never Stops - hard - Every truck is talking. Not everyone can hear them yet.
  • Three Providers, One Workout - hard - The same ride, reported three times.
  • The Decision Before the Door Closes - hard - The window to stop it is smaller than you think.
  • The Migration That Cannot Break Morning - hard - It all works today. Moving it without losing a single report is the hard part.
  • Two Million Boxes by Monday Morning - hard - Shipped, maybe. Delivered, debatable.
  • The Leaderboard That Costs $25K a Month - hard - Product wants it live. Engineering has a price tag.
  • Four Teams, One Topic, No Agreement - hard - Everybody is writing to it. Nobody documented it. Now production is fragile.
  • The Analyst Who Saw the Salary Data - hard - Two incidents. One shared lake. The access model was never designed, just assumed.
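
The API Drip Feed prompt above describes an API that returns 100 records at a time when millions are needed. A minimal sketch of the usual cursor-pagination loop follows. The `fetch_page` callable, its `(records, next_cursor)` return shape, and the in-memory `fake_fetch_page` stand-in are all hypothetical; a real connector would also need rate-limit handling, retries, and checkpointing, which are left out here.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page shape: (records, next cursor or None when exhausted)
Page = Tuple[List[dict], Optional[int]]


def fetch_all(fetch_page: Callable[[Optional[int]], Page]) -> Iterator[dict]:
    """Yield every record by following cursors until the API reports no next page."""
    cursor = None
    while True:
        records, cursor = fetch_page(cursor)
        yield from records
        if cursor is None:
            break


# In-memory stand-in for the remote API, used only to exercise the loop.
DATA = [{"id": i} for i in range(250)]
PAGE_SIZE = 100


def fake_fetch_page(cursor: Optional[int]) -> Page:
    start = cursor or 0
    end = start + PAGE_SIZE
    next_cursor = end if end < len(DATA) else None
    return DATA[start:end], next_cursor


records = list(fetch_all(fake_fetch_page))
print(len(records))
```

Writing the loop as a generator lets downstream code process records as pages arrive instead of holding the full result set in memory.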

How It Works

  1. Choose a domain: SQL, Python, Data Modeling, or Pipeline Architecture
  2. Select your seniority level and target company tier
  3. Start a timed mock interview with a vague prompt
  4. Ask clarifying questions to the AI interviewer
  5. Write and execute your solution against a real database
  6. Get instant feedback and a hire/no-hire decision

Related Resources

  • Practice Problems (untimed)
  • SQL Interview Questions Guide
  • Python Interview Questions Guide
  • Data Modeling Interview Questions
  • Data Engineering Interview Prep
  • System Design Interview Questions
  • Behavioral Interview Questions
  • Daily Challenge
  • Data Engineering Lessons