Data Engineering Mock Interview Questions
1467+ data engineering mock interview questions with AI-powered feedback. Pick your domain, target company tier, and seniority level to start a timed interview simulation. Write real code, ask clarifying questions, and get graded instantly.
Available domains: Python (387 questions), SQL (903 questions), Data Modeling (56 questions), Pipeline Architecture (121 questions). Difficulty levels: easy (534), medium (677), hard (256). Seniority levels: Junior, Mid, Senior, Staff, Sr. Staff.
Python Interview Questions (387)
- The Dominant Signal - easy - Hottest items in the transaction log. Ties included.
- The Original Keeper - easy - Clean up duplicate events without losing the timeline.
- The Forward Fill - easy - Patch the gaps in a noisy sensor stream.
- The Word Mismatch - easy - Some text does not match.
- The Social Graph - easy - Everyone knows someone.
- The Sequel Spotter - easy - Spot the sequels hiding in the catalog.
- The Numbered Chair - easy - A standing list. Position n holds one entry.
- The Character Encoder - easy - Squeeze a string down to its tightest form.
- The One-Way Street - easy - Monotonic time-series. Direction only.
- The IP Validator - easy - Real and fake, mixed together.
- The Log Pulse - easy - Some lines repeat themselves.
- The One-of-Each - easy - Strip the repeats, keep the originals.
- The Config Blender - easy - Config collision. The surviving values after a merge.
- Flatten the Feed - easy - Nested lists, all the way down.
- Activity Time Ledger - easy - Matching activities. One runtime.
- Batch With Metadata - easy - The list gets chopped.
- Caesar Shift Check - easy - The key turns. Does it open?
- Character Occurrence Map - easy - Character frequency as a map.
- Coalesce Fields - easy - Nulls are hiding. Fill them in.
- Column Max - easy - One value rules the column.
- Column Range - easy - From minimum to maximum. What is the spread?
- Column Sum - easy - Add up the column. Every value counts.
- Dominant Element - easy - Majority element. Appears more than half the time.
- Even Filter - easy - Only the even ones survive.
- Explode List - easy - One row holds many values. Unpack it.
- Extract Domain - easy - The domain is buried in the string.
- Flatten the Nest - easy - Mixed nesting. One flat list out.
- Greeting Formatter Class - easy - First impressions are formatted carefully.
- Normalize Name - easy - Names are messy. Standardize them.
- No Shortcuts - easy - The peak value. Built-ins off the table.
- Null Counter - easy - How many holes in the data?
- Ordered Character Check - easy - Check if all As appear before all Bs.
- Progress Milestones - easy - Progress at every 10% increment. Keep the receipts.
- Quality Gate - easy - Not everything passes inspection.
- Quantile Calculator - easy - Mark the boundary value at a given point.
- Record Filter - easy - Some records belong. Others do not.
- Reverse Field - easy - Flip it. See what happens.
- Run Length Encoding - easy - AAABBB becomes 3A3B. Compress it.
- Sanitize Field - easy - Dirty input. Clean output.
- Schema Checker - easy - The schema says one thing. The data says another.
- Sequential Word Pairs - easy - Everything has a neighbor.
- Single Element Among Pairs - easy - One element has no partner.
- Sort Descending - easy - Biggest first. No exceptions.
- The Account Manager - easy - Deposits, withdrawals, and the risk of going negative.
- The Additive Chain - easy - Each value is the sum of the two before it - no calls to itself allowed.
- The Address Surgeon - easy - One string hides a street, a city, a state, and a zip.
- The Alphabet Score - easy - Every letter has a secret numeric value - what's your total?
- The Alphabet Sorter - easy - Filing cabinet logic: everything goes in its proper drawer.
- The Balanced Sum - easy - Some numbers have a rare quality that mathematicians revere.
- The Bit Counter - easy - How many lights are on in the binary representation?
- The Bit Ladder - easy - Count the ones all the way up.
- The Bitwise Judge - easy - No division, no modulo - just a single bit tells you everything.
- The Bouncer - easy - Every door has a guest list.
- The Bronze Medalist - easy - Not first, not last - somewhere in the middle of the podium.
- The Bug Spotter - easy - It compiles. The answer is still wrong.
- The Calendar Sort - easy - Time has its own opinion about order.
- The Carousel - easy - Keep moving, same ride.
- The Character Map - easy - Character-level frequency. As a dictionary.
- The Cipher Wheel - easy - Every letter has an alias - you just need the right codebook.
- The Clock Angle - easy - Two hands. One gap. One number.
- The Code Expander - easy - Compressed messages need a decoder to come alive.
- The Column Transformer - easy - Each column gets its function.
- The Column Zipper - easy - Headers on top, values below, dict in the middle.
- The Complement Hunt - easy - Every number is looking for its other half.
- The Crowd Favorite Eatery - easy - One restaurant clearly won the most hearts.
- The Crowd Pleaser - easy - One value shows up more than all others combined.
- The Crowd Splitter - easy - The middle holds even with a dominant outlier.
- The Decomposer - easy - Every composite thing can be broken down to its simplest parts.
- The Deep Dictionary - easy - One key goes further than the rest.
- The Deep Dive - easy - A specific position in the unsorted pile.
- The Deep Selector - easy - Tell it what you want. It knows where to look.
- The Deep Unpacker - easy - Boxes inside boxes. Eventually you reach the bottom.
- The Depth of Field - easy - Some containers hold containers that hold containers.
- The Diagonal Accountant - easy - Two diagonals cross in the center of every square.
- The Duplicate Spotter - easy - Some values appear more than once - report only those.
- The Even Checkpoint - easy - Is this number in the even club? Prove it the fast way.
- The Expander - easy - What goes in small comes out big.
- The Field Counter - easy - Some fields speak louder than others.
- The First Encounter - easy - Every character has a story - but only if you remember where it started.
- The First Stranger - easy - In a crowd, the unique ones stand out first.
- The Forbidden Ceiling - easy - Round up. But not the obvious way.
- The Gap Filler - easy - Fill the Nones with the last real value.
- The Gate Keeper - easy - Not all openings have a closing.
- The Grid Pivot - easy - A different angle reveals a completely different picture.
- The Halftime Score - easy - Middle value of a dataset. No built-in shortcuts.
- The Hash Stamper - easy - One input, one irreversible output - the foundation of every secret.
- The Indivisibles - easy - Numbers that yield only to themselves.
- The Integer Sieve - easy - Not everything in this list belongs here.
- The Last Instance - easy - When duplicates appear, only the last one counts.
- The Last Seen Map - easy - For each character, where did it appear last?
- The Lazy Squares - easy - A sequence that never fully reveals itself.
- The Letter Census - easy - Every crowd has its share of talkers and quiet ones.
- The Letter Frequency Map - easy - Count every character in the string and report the results.
- The Letter Ledger - easy - Every character has a count to answer for.
- The Letter Tally - easy - Each character in the string has a count to answer for.
- The Line Cutter - easy - Did everyone with an A-pass get through before the B-crowd arrived?
- The Line Splitter - easy - Comma-separated truths, one at a time.
- The Log Decoder - easy - Every line holds a secret.
- The Lone Character - easy - It appeared exactly one time. That made it special.
- The Lone Traveler - easy - One character stands apart from the crowd.
- The Manual Sorter - easy - No shortcuts, no built-ins, just work.
- The Matching Manifest - easy - Two warehouses, one shipment - only load what's in both.
- The Merge - easy - Chaos in. Order out.
- The Messy Pipeline - easy - The upstream API has no idea what a schema is.
- The Minutes Tracker - easy - Some activities eat more time than others.
- The Mirror Flip - easy - Sometimes the fastest fix is to swap everything.
- The Mirror Image - easy - Flip the tape backwards - start from the end.
- The Mirror Test - easy - Check if a string reads the same forwards and backwards.
- The Mirror Words - easy - Each word looks back at itself.
- The Missing Number - easy - Something is missing from the sequence.
- The Molecule Report - easy - Four letters. A lot of math hidden in the sequence.
- The Multiplication Trail - easy - Each step multiplies the whole journey.
- The Never-Ending Sequence - easy - Sequence that keeps going. Follow it.
- The Number Screen - easy - Some numbers make the cut. Most do not.
- The Odd Digits - easy - Hidden inside a mess of characters are a few odd numbers.
- The Odd Extractor - easy - Not all numbers from a string are welcome here.
- The Odd Filter - easy - Strip out everything that does not belong to the odd club.
- The One-Timers - easy - Values that never repeated.
- The Op Dispatcher - easy - Name the operation, apply it everywhere.
- The Order Enforcer - easy - Some rules say every A must come before every B.
- The Overlap Finder - easy - Two guest lists - who made it onto both?
- The Pair Counter - easy - How many pairs can be formed from the crowd?
- The Paired Doors - easy - Every open bracket has a partner - but not every partner shows up.
- The Pascal Row - easy - Each number is the sum of two numbers above it.
- The Password Builder - easy - Random characters, fixed rules.
- The Password Forge - easy - Eight random characters - how many combinations exist?
- The Peak Finder - easy - Largest number in the list. Using max() is not an option.
- The Pipeline Filter - easy - In the door as one thing, out the door as another.
- The Price Bander - easy - Different prices, different treatment.
- The Progress Parade - easy - Just tell them how far along you are.
- The Ranked Dict - easy - Values deserve order too.
- The Repeat Offenders - easy - Repetition is a clue.
- The Roman Converter - easy - Roman numerals decoded.
- The Runner-Up - easy - Not the winner. The one just behind it.
- The Running Total - easy - Each position holds the sum of everything before it.
- The Safe Caster - easy - Type conversion is easy, until it is not.
- The Score Sorter - easy - Points on the board, sorted by who earned the most.
- The Scramble Check - easy - Same letters, different order - are these two strings secret twins?
- The Second Summit - easy - Not the top of the mountain - just below it.
- The Secret Twins - easy - Same letters, different disguises.
- The Self-Portrait Number - easy - Some numbers describe themselves perfectly.
- The Shadow Cleaner - easy - Remove the repeats. No shortcuts.
- The Silent Locator - easy - Every lookup should cost you less than the one before it.
- The Single Bit - easy - One particular pattern hides in plain sight.
- The Solo Act - easy - One-and-done values only.
- The Spread - easy - Data spread around a center. The range matters.
- The Squeeze - easy - aaabbb gets old fast. Shrink it.
- The Step Counter - easy - You can hop one step or two - how many ways to reach the top?
- The Streak Breaker - easy - It has a problem with repetition.
- The Style Guide - easy - Not every word deserves the same treatment.
- The Syntax Sentinel - easy - Brackets opened and closed. The nesting might be off.
- The Tail End - easy - Push, pop, peek. The basics that break people.
- The Tail Trimmer - easy - Remove the k-th item from the back without counting forward first.
- The Tally Counter - easy - How many times does a single guest show up to the party?
- The Top Reviewer - easy - One restaurant receives the most feedback - which one?
- The Traffic Director - easy - Spread the load evenly - nobody should be doing all the work.
- The Tree Measurer - easy - How deep does the rabbit hole go?
- The Trip Grouper - easy - Where did everyone go, and for how long?
- The Type Sorter - easy - A mixed list is hiding its numbers - extract them.
- The Value Sorter - easy - The order was always negotiable.
- The Version Parade - easy - 1.0 before 2.0. Don't let the dots confuse you.
- The Vowel Hunt - easy - Just the vowels. All of them.
- The Word Census - easy - Who said what - and how many times?
- The Word Counter - easy - How many times does each word show up in a file?
- The Word Flipper - easy - The sentence stays, the words surrender.
- The Word Inventory - easy - Every word, twice over.
- The Word Map - easy - Input text. Output: word frequency.
- Tokenize - easy - Split it apart. Keep the pieces.
- Transform Column - easy - Same data, new shape.
- Type Caster - easy - Wrong type. Fix it.
- Unique Values - easy - Duplicates are noise. Remove them.
- Value Count - easy - How many of each? Count them.
- Word Counter - easy - Words in, counts out.
- Zip to Record - easy - Two lists become one record.
- The High Mark - easy - Scan the list. Report the max.
- The Event Bucketer - easy - Logs slotted into buckets.
- The List Merger - easy - No shortcuts.
- The Dictionary Inverter - easy - Flip the dict. Group what used to be values.
- The String Shrinker - easy - Compress the string. Shorter wins.
- The Bracket Validator - easy - Brackets opened and closed. The nesting might be off.
- The Trade Signal - easy - Buy low, sell high. Identify the ideal moment.
- The Stream Averager - easy - The answer moves with the data.
- The Generous Ones - medium - The generous ones are obvious.
- The Payload Flattener - medium - Turn a deeply nested API response into a flat row.
- The Resume Sifter - medium - Pull what's useful. Skip what you know.
- The Title Ladder - medium - Job titles and the salary tier they belong to.
- The Repeat Review - medium - The echo came back.
- The File Size Profiler - medium - File types and their disk footprint. One type dominates.
- The Schedule Cleaner - medium - Overlapping sessions. One clean line.
- Stock Range Finder - medium - Prices move. One stretch had the widest gap.
- The Status Board - medium - Make sense of a pile of raw Nginx access logs.
- The Budget Allocator - medium - Split the money. Some wore two hats.
- The Trade Log Aggregator - medium - Every trade left a footprint.
- The Timezone Trap - medium - Trip data and timezones. They're not the same thing.
- The Host Ranker - medium - Some hosts have more to offer.
- The Email Ranker - medium - Some inboxes see more action.
- The Consecutive Streak - medium - Login streaks. No gaps allowed.
- The Schema Differ - medium - Schema from yesterday vs today. Something changed.
- The Throttle Ceiling - medium - Too many requests in too short a timeframe. Throttle it.
- The Event Aggregator - medium - Bucket a firehose of events into tidy time windows.
- The Record Reconciler - medium - Two versions of the same truth.
- The Dependency Resolver - medium - Everything depends on everything.
- Batch Partitioner - medium - One pile becomes many. Split wisely.
- Batch Records - medium - Too many at once. Break them into groups.
- Char Profile - medium - Every character in the string tells a story.
- Cumulative Sum - medium - The total grows with every row.
- Deep Flatten - medium - Nested deep. Flatten everything.
- Deep Get - medium - Nested deep. Reach in and grab it.
- Detect Cycle in Sequence - medium - Follow the chain long enough and it might loop back.
- Detect Outliers - medium - Most values are normal. Some are suspicious.
- Diagonal Extract - medium - Not every value sits in a row or column.
- Dice Roll Scoring - medium - The pattern rewards the patient.
- Dictionary Key Intersection - medium - Two dictionaries. What do they share?
- Execution Timer Wrapper - medium - Function wrapped with a timer. Duration captured on exit.
- Extract Leaf Values - medium - The tree has leaves. Pluck them.
- Find Indices - medium - It is in there somewhere. Where exactly?
- Find Mode - medium - One value appears more than the rest.
- Full Outer Zip - medium - Two sides. No value left behind.
- Group By - medium - Same key, different rows. Bring them together.
- Lag Column - medium - What came before this row?
- Left Join - medium - Keep the left side. Match what you can.
- Max Length Token - medium - The longest token wins.
- Merge Counters - medium - Two tallies. Combine them.
- Merge Overlapping Time Ranges - medium - Intervals piling up. Clean the timeline.
- Palindrome Hunt - medium - It reads the same both ways. Go further.
- Parse Log Line - medium - One line. A dozen fields hidden inside.
- Permissions Manager - medium - Manage user permissions with config updates.
- Portfolio Profit Calculator - medium - Portfolio gain from purchase history and current prices.
- Precision and Recall - medium - Precision and recall. Both matter.
- Prefix Based Word Replacement - medium - Every word trimmed to its root.
- Rank Metrics - medium - Not all numbers are equal. Rank them.
- Rename Keys - medium - Old names out. New names in.
- Rotate Buffer - medium - The buffer is full. Rotate it.
- Row Aggregates - medium - Each row holds its own summary.
- Running Distinct Count - medium - New values keep appearing. Track the count.
- Subarray Signal - medium - One stretch carries the strongest signal.
- The Balanced Inspector - medium - Every branch should carry the same weight.
- The Bipartite Test - medium - Can this crowd be split into two perfectly separated groups?
- The Bit Reverser - medium - Sometimes the answer is literally backwards.
- The Blind Multiplier - medium - Compute the result of everything around you - without seeing yourself.
- The Bonus Round - medium - Consecutive matching dice rolls trigger a special scoring rule.
- The Build Order - medium - Some tasks must wait for others to finish first.
- The Chain Builder - medium - Links connect in sequence - build the chain from scratch.
- The Chain Transform - medium - One small step at a time can cover a great distance.
- The Change Tracker - medium - Before and after snapshots. The delta is in there.
- The Character Clans - medium - Words sharing the same letters belong to the same clan.
- The Chunked Reader - medium - Too big for memory. Read in pieces.
- The Clock Examiner - medium - Two hands on a clock - how wide is the gap?
- The Coin Vault - medium - Exact change only - and you want to use as few coins as possible.
- The Column Shuffle - medium - Rows in, columns out. Number them.
- The Counting Machine - medium - It knows where it stopped last time.
- The Custom Iterator - medium - Some sequences follow their own rules.
- The Cycle Detector - medium - Follow the chain long enough and you might end up where you started.
- The Date Sorter - medium - Jumbled calendar. Sort it first.
- The Deep Config - medium - Nested config, dot-notation output.
- The Dict Comparator - medium - Two dictionaries. Subtle differences.
- The Double-Ended Gateway - medium - Some queues let you skip the line from both ends.
- The Elevator Trace - medium - Nested floors. One path through.
- The Encoded Signal - medium - The encoding is hiding multipliers. Decode it.
- The Event Broadcaster - medium - Subscribers show up, listen, and sometimes leave.
- The Event Window - medium - A five-minute window is all that matters.
- The Eviction Policy - medium - Fixed capacity. Oldest unused entry gets evicted.
- The Exception Handler - medium - Good code handles failure as gracefully as success.
- The Face That Breaks the Bank - medium - Roll enough dice and one number always runs away with it.
- The Family Reunion - medium - Two cousins share a common ancestor somewhere above.
- The Fast Climber - medium - Some routes up the mountain are faster than others.
- The First Class Function - medium - Functions travel as values - prove you can pass one around.
- The Flat Mapper - medium - Nested values. One flat stream out.
- The Forbidden Sorter - medium - Put the letters in order without the obvious tool.
- The Forgetful Machine - medium - It remembers everything, until it does not.
- The Gap Reporter - medium - The missing IDs in the log - somebody has to notice.
- The Genre Filter - medium - Three tables, two conditions, one actor's total.
- The Half-Life Search - medium - Every guess cuts the problem in half.
- The High Rollers - medium - Not every gambler bets the same - some wager far more than others.
- The Horizon Scanner - medium - For each position, what is coming up ahead?
- The Hostile Takeover - medium - One dict eats another.
- The Hourly Bucket - medium - Timestamps belong somewhere.
- The Intervals - medium - Timestamps in buckets.
- The Inverted Triangle - medium - A pattern of stars narrows toward the bottom.
- The Island Counter - medium - Surrounded by water, connected by land - how many separate landmasses?
- The Lazy Unpacker - medium - Instead of loading it all at once, yield it one piece at a time.
- The Letter Kin - medium - Words that share the same letters belong together.
- The Letter Mapper - medium - A consistent substitution, or not.
- The Level Inspector - medium - Each floor of the tower tells a different story.
- The Level Summer - medium - Add up each level of the tree.
- The Link Shrinker - medium - Long addresses have aliases - you give them out, you keep the map.
- The Load Balancer - medium - Distribute incoming requests evenly across available servers.
- The Map Reducer - medium - Map it. Reduce it. One answer.
- The Market Streak - medium - Some stocks run longer than you think.
- The Market Timer - medium - One buy, one sell - when do you make the most?
- The Merge Champion - medium - Many sorted rivers flowing into one.
- The Min Tracker - medium - The stack remembers the best it ever saw.
- The Month-by-Month Snapshot - medium - Every salesperson has a story. The months just tell it sideways.
- The Mountain Peak - medium - The sequence has a summit.
- The Multiplier Rush - medium - Negatives cancel negatives - but only if you keep both in view.
- The Narrow Lens - medium - A narrow timeframe. Everything inside matters.
- The Number Miner - medium - JSON strings are hiding numeric secrets - dig them out.
- The Number Narrator - medium - Every number has a story in words.
- The Online Elite - medium - The top performers are hiding in the data.
- The OOP Pillars Exam - medium - Four principles, one class hierarchy - show you know all of them.
- The Order Inspector - medium - A binary tree has rules - is this one actually following them?
- The Page Turner - medium - Nobody loads everything at once.
- The Pandas Pivot - medium - Rows become columns. Columns become power.
- The Parentheses Factory - medium - Building balanced brackets is an art form.
- The Pay Ladder - medium - Climb the ladder the hard way. No shortcuts allowed.
- The Perfect Match - medium - Two numbers walk into an interview...
- The Placement Fixer - medium - Each value belongs in exactly one spot.
- The Postfix Processor - medium - Math without parentheses - the operators come after the numbers.
- The Precision Hunt - medium - Some answers need no decimal point.
- The Priority Queue - medium - When two things tie, something has to break the deadlock.
- The Progress Meter - medium - Report progress at every tenth of the way through.
- The Quarter Turn - medium - One rotation changes everything.
- The Queue Disguise - medium - A queue in sheep's clothing.
- The Repeat Visitor - medium - Loyal customers come back sooner than expected.
- The Response Aggregator - medium - Multiple result pages. One clean summary.
- The Rolling Peak - medium - The sweetest stretch in the sequence.
- The Rolling Window - medium - Smooth things out, one step at a time.
- The Rotated Array - medium - Someone shuffled it. Now locate what you came for.
- The Schema Diff - medium - Two versions of the same config - what changed between them?
- The Scoreboard Race - medium - Simulate rounds until someone hits the target.
- The Shifting Standard - medium - A benchmark in motion.
- The Short Address - medium - Turn a big number into a compact alphanumeric code.
- The Shortest Route - medium - Fewer hops is always better.
- The Silver Screen Summit - medium - Box office totals decide who makes the top of the marquee.
- The Slow Leak - medium - Nested iterators. One flat stream.
- The Sneaky Twins - medium - They look different but they are the same inside.
- The Spin Doctor - medium - Ninety degrees, but which way?
- The Spiral Harvest - medium - The snail reads the grid in its own special order.
- The Staircase Problem - medium - One step or two, the choices add up.
- The Subarray Tally - medium - How many hidden windows hit the target?
- The Table Thief - medium - Somewhere in that query, tables are hiding.
- The Tag Analyst - medium - Two sets of labels, one analysis.
- The Tail Finder - medium - Navigate to the end of a linked list using recursion.
- The Timing Decorator - medium - Wrap any function to capture how long it takes.
- The Top Words - medium - In every document, some words dominate the conversation.
- The Trip Aggregator - medium - Travel records hold patterns waiting to be surfaced.
- The Triplet Hunt - medium - Every path that works gets a seat at the table.
- The Velvet Rope - medium - Some users get in. Others wait outside until the window resets.
- The Version Ranker - medium - Software versions follow their own ordering rules.
- The Vocabulary Test - medium - Can you spell out the whole sentence using only the words you know?
- The Waiting Game - medium - Patience has a price - and a count.
- The Water Gauge - medium - Elevation bars trap water between peaks - count the volume.
- The Window Cleaner - medium - Keep it fresh, keep it unique.
- The Word Families - medium - Different spellings, same letters - they belong together.
- The Yahtzee Scorer - medium - Dice scoring. Multiple categories evaluated.
- The Zero Propagator - medium - One zero can change the whole picture.
- The Zigzag Encoder - medium - The message snakes its way across the rails.
- Threshold Filter - medium - Above the line or below it.
- Top N Keys - medium - Most of them do not matter. The few that do stand out.
- Transpose Table - medium - Rows become columns. Columns become rows.
- Triangle Validator - medium - Not every triangle is a triangle.
- Unflatten Keys - medium - Dots in the key names. Rebuild the structure.
- Validate Email - medium - Looks like an email. But is it?
- Distribute Values Into Container Types - medium - Round-robin the values. Keep rotating.
- The Nearest Value Mapper - medium - Close enough counts. Ties go low.
- The Target Hunt - medium - Pairs that hit a target. Every one of them.
- The Event Overlap Detector - medium - Overlapping events. The calendar knows.
- The Consecutive Sequence Finder - medium - Numbers that flow without interruption.
- The File Tree Builder - medium - Flat paths. Build the nested tree.
- The Impersonator - medium - You only have stacks. Make a queue anyway.
- The Category Ranker - medium - Categories have standing. Rows get theirs.
- The Throttle Wall - hard - Stop the abusers. Let the rest through.
- The Change Data Capture - hard - Inserts, updates, deletes: all present.
- The Stream Joiner - hard - Events don't wait for each other. This does.
- The Anomaly Detector - hard - Spot the outliers before they page someone.
- The Schema Migrator - hard - Old schema in, new schema out.
- The DAG Executor - hard - Wire up a mini pipeline and watch it run.
- Common Prefix - hard - They all start the same way. How far?
- Data Quality Report - hard - The data is not as clean as it looks.
- Group Average - hard - Same group, different values. What is typical?
- Merge Intervals - hard - Overlapping ranges. Merge them.
- Pivot Records - hard - Long format is easy. Wide format is useful.
- The Dynamic Container - hard - Build your own resizable list with no help from the standard library.
- The Frequency Eviction - hard - When storage is tight, something has to go.
- The Infection Spread - hard - It starts with one, and then it spreads.
- The Lazy Stream - hard - Yield values one at a time from a potentially infinite source.
- The Median Keeper - hard - The middle value keeps moving as new data arrives.
- The Onion Layer - hard - Peel from the outside in - one ring at a time.
- The Trapped Pool - hard - What collects in the valleys after the rain?
- The Triple Alliance - hard - Three numbers, one target.
- The Water Collector - hard - Two walls, one sky, and a very important question.
- The Yahtzee Engine - hard - Five dice. Six faces. Score it.
- Stream-Process a Large CSV - hard - Too big to load. Read what you can.
- The Meeting Room Allocator - hard - Meetings overlap on the calendar. Rooms are limited.
- The Middle Ground - hard - The middle value keeps moving.
- The Hierarchy Builder - hard - Parent-child pairs, flat. Build the family tree.
- The Output Peak - hard - One stretch outpaced all the others.
SQL Interview Questions (903)
- Unmatched Credit Complaints - easy - Credits were promised. Not everyone got theirs.
- The Duplicate Detection Sprint - easy - Same email, different rows. Spot the repeats.
- Weekend Warriors - easy - Weekdays vs. weekends. When does the action really happen?
- The Dormant Accounts - easy - They are still paying. They stopped showing up.
- 30-Day Page View Counts - easy - Thirty days of engagement. Quick snapshot.
- Above Average Interactions - easy - The average user is boring. Who is above?
- Above Category Average - easy - The category average is one thing. These beat it.
- Active API Tokens - easy - Tokens that have actually been used.
- Active Campaigns - easy - Which campaigns are earning their keep?
- Active Token Owners in 2026 - easy - Active token owners this year.
- Active User Revenue for April - easy - Total revenue from active users in a single month.
- Active Users With April Transactions - easy - Active accounts that also opened their wallets. How many?
- Activity Histogram - easy - How many users did X things? Build the distribution.
- Ad Revenue 2026 - easy - Annual ad revenue. On the books.
- Alert Hotspots by Service and Severity - easy - Some services and severities light up more than others.
- All Infra Regions - easy - The infrastructure spans the globe. Map it.
- Annual Cloud Spend - easy - One year of cloud bills. The total.
- Annual Cloud Spend Summary - easy - A year of cloud bills. Add it all up.
- Annual Pipeline Failures - easy - How many pipelines broke this year?
- April and May Active Users - easy - Spring cleaning for the user base. Who was actually around?
- Auth Endpoints - easy - Not all endpoints are visible to everyone.
- Authors With Successful Deploys - easy - Who deployed successfully?
- Auth Service Health Checks - easy - One service. Full audit trail.
- Average Brand Campaign Revenue - easy - A quick benchmark on brand campaigns.
- Average Build Duration by Repo - easy - Some repos build fast. Others don't.
- Average DQ Fail Rate - easy - Average failure rate, table by table.
- Average GPU Node CPU Usage - easy - GPU nodes burning CPU. How much?
- Average Headcount by Department - easy - Compensation benchmarks, department by department.
- Average High-Range Accuracy - easy - The top-scoring models. What's their average?
- Average Latency by Health Status - easy - Healthy versus degraded. The latency gap is real.
- Average Latency by Status - easy - Each status code has its own latency story.
- Average Node CPU by Region - easy - Average infrastructure node CPU usage, broken down by region.
- Average Node Utilization - easy - CPU and memory, region by region.
- Average Rating by Category - easy - Category ratings. Some shine, some don't.
- Average Response Time by Hour - easy - Hour by hour. When does latency spike?
- Average Search Endpoint Latency - easy - One endpoint. Average speed.
- Average Search Results Per User - easy - How many results per searcher?
- Average Session Duration by Device - easy - Session length, device by device.
- Bargain Bin - easy - Floor prices. Right before the vendor call.
- Best-Selling Reps Each Month - easy - In every category, a few sellers rise to the top.
- Big Spenders - easy - The whale list.
- Budget Flag - easy - Join tables and label rows as over or under budget.
- Budget-Friendly Products - easy - Affordable does not mean invisible.
- Campaign Match Rate - easy - Campaign reach. Measured.
- Campaign Revenue Totals - easy - Every campaign has a price tag. Total them up.
- Cart Sizes - easy - Power buyers. Big carts.
- Category Census - easy - Which aisles are worth restocking?
- Category Sales Summary - easy - Category by category. How did they do?
- Category-Specific Product Volume - easy - Sum transactions for a specific payment type.
- CDN Image Request Paths - easy - CDN image traffic. Every path.
- CDN-Related DNS Lookups - easy - DNS lookups tied to the CDN.
- Character Position in Endpoint - easy - URL patterns, character by character.
- Chat Activity - easy - Which channels are ghost towns?
- Cheapest Cost Per Region - easy - Lowest spend per region.
- Cheapest Transaction per User - easy - Everyone has a smallest purchase.
- Clean Cache CDN Edges - easy - Cached, clean, error-free edges.
- Clean Latency Cast - easy - The latency column is a string. It should not be.
- Clicked Ad Impressions - easy - They saw the ad. They clicked.
- Cloud Bill - easy - Which cost buckets are bleeding money?
- Cloud Cost by Team - easy - Spend by team. Who's burning most?
- Common Age Buckets - easy - Duplicate records hiding in the users table.
- Completed Priority-1 Jobs - easy - Priority one. Completed.
- Compute Nodes in Key Regions - easy - Compute nodes across the key regions.
- Content by Specific Users - easy - Two creators. What did they publish?
- Content Duration Snapshot - easy - A popularity snapshot by duration.
- Content Mix - easy - One content format to bet the quarter on.
- Content Published in 2026 - easy - Published back then. Still relevant?
- Content Sorted by Duration - easy - The catalog, sorted by length.
- Content Type Distribution - easy - How many of each content type?
- Content Types by Creator - easy - One creator. What did they make?
- Content Viewer Penetration - easy - What share of the user base has viewed at least one piece of content?
- Cost Efficiency Ratio - easy - Dollars in, value out. What's the ratio?
- Count Distinct Services - easy - How wide is the service mesh?
- Count Nodes in Region - easy - One region. How many nodes?
- CPU Utilization Summary - easy - The CPUs are working. How hard?
- Customer Full Name Concat - easy - First name, last name. Combine them.
- Daily and Weekly Active Users - easy - One metric by day, one by week. Same users, different lenses.
- Daily Cross-Platform Users - easy - Mobile and web. Same day, same users?
- Daily Deployment Count - easy - Deploys per day.
- Department Spend Difference - easy - The compensation gap between departments.
- Department Spend Gap - easy - The gap between Engineering's and Marketing's biggest single purchase.
- Deploy Cadence - easy - Which environments ship the most?
- Deploy Count by Service - easy - Some services deploy constantly. Others barely at all.
- Deployed Models by Framework - easy - Which frameworks are actually in production?
- Deployment Duration by Status - easy - Fast deploys versus slow ones. By outcome.
- Deployments Without Alerts - easy - Deployed without a single alert. Suspicious or impressive?
- Deprecated Model Count - easy - How many models are past their expiration date?
- Device Mix - easy - The device breakdown before the redesign.
- Device Types With Chrome Users - easy - Power users and their devices.
- Disabled Feature Flags - easy - Disabled flags. Still worth auditing.
- Distinct Blog Referrers - easy - Where did the traffic really come from? No repeats.
- Distinct Product Categories - easy - A quick category inventory.
- Early 2026 Data Pipelines - easy - Early-year data pipelines.
- Employees Per Department - easy - Headcount, location by location.
- Error Severity Buckets - easy - Errors sorted by how much they hurt.
- Errors With Service Health - easy - Error data, enriched with health context.
- Even-ID February Signups - easy - A very specific slice of a very specific cohort.
- Even-ID June Signups - easy - Odd IDs, even IDs. The filter is precise.
- Event Count on Key Days - easy - Key days. Key event volumes.
- Events by Month Across Years - easy - Month by month, year by year. The pattern emerges.
- Event Types Spanning Multiple Months - easy - Some events span seasons.
- Expensive AWS Services - easy - Some AWS services quietly drain the budget.
- Extreme Headcount Departments - easy - The pay extremes tell a story.
- Failed Payment Deployments - easy - Payment deploys that went wrong.
- Features With Missing Values - easy - Missing data in the features.
- February 2024 Signups - easy - One signup window. One cohort. Who joined the club?
- Filter By Domain - easy - Select rows matching a text suffix pattern.
- Filtered User Roster - easy - A clean roster for the all-hands.
- Find Deploy Authors - easy - Same person. Many different spellings.
- First Build per Repository - easy - Every repo had a first build.
- First Migration Record - easy - The very first migration. Where it all began.
- First Run Row Count - easy - Every job's first run. How many rows?
- Flag Check - easy - Which flags are actually live?
- Full Customer Order List - easy - Every customer. Every order. The full picture.
- Gateway Connection Timeouts - easy - Timeouts at the gateway.
- Health Check Distribution - easy - Pass, fail, degraded. The distribution.
- Health Checks per Service - easy - Some services get checked constantly.
- Heavy Searchers in August - easy - August's power searchers.
- High and Critical Alerts in 2026 - easy - High and critical alerts from that year.
- Higher Performing Variant - easy - Control versus treatment. One wins.
- Higher Than Supervisor - easy - When the student outscores the teacher.
- Highest Cost Per Team - easy - Peak cost, team by team.
- Highest Latency Endpoints - easy - The slowest endpoints. Everyone notices.
- High-Output Creators - easy - High-engagement creators.
- High Price Products - easy - Everything above 100.
- High-Rated In-Stock Percentage - easy - Highly rated and in stock. A rare combo.
- High-Spend 2025 Campaigns - easy - Big-budget campaigns from last year.
- High-Traffic Endpoints in February - easy - When traffic spikes, some endpoints get buried. How many crossed the line?
- High Volume Batch Jobs - easy - Batch jobs that processed millions.
- Holiday Promo Campaign Click Year - easy - One year, the holiday campaign exploded.
- Holiday Sale Campaign Revenue - easy - The holiday sale campaign. How did it do?
- Idle Team Members - easy - Sprint started. Some people never got assigned.
- Inactive Unverified Users - easy - Signed up. Never verified. Never came back.
- Initial Count - easy - Support is looking for naming patterns that predict ticket volume.
- In-Stock Product Count - easy - How many products are actually available?
- Japan Revenue for April - easy - Last month's numbers for one region.
- Joined Employee Details - easy - Combine two related tables with a join.
- Largest Group - easy - One group towers above the rest.
- Last Five Batch Jobs - easy - The last five. A quick tail check.
- Last Migration Record - easy - The most recent migration. Is it the last?
- Last Server Activity - easy - Each server's last heartbeat.
- Latency vs Regional Average - easy - Each service versus its region's average.
- Latest Metric Values - easy - Stale records hiding in the metrics.
- Latest Session Per User - easy - Everyone has a most recent session.
- Latest Version Per Service - easy - The latest version deployed. Each service.
- Log Entries by Level - easy - Info, warn, error, fatal. The breakdown matters.
- Log Volume by Day of Week - easy - Some days are noisier than others.
- Longest Active Membership Streak - easy - The longest unbroken streak.
- Longest Deploy With Full Identifier - easy - The longest deployment. Full ID.
- Long Searches Containing 'er' - easy - Long queries with 'er'. A pattern?
- Low-Byte CDN Responses - easy - Tiny responses from the edge.
- Low-Engagement User Count - easy - How many users are barely engaged?
- Lowest Average Price Category - easy - The cheapest category. Not necessarily the worst.
- Low Latency API Calls - easy - Fast endpoints. Confirmed fast.
- Low Severity DQ Checks - easy - Low severity checks. All of them.
- Low Throughput Pipelines - easy - Pipelines barely moving data.
- Low Uptime Services - easy - Underperforming services.
- Max Value Per Location - easy - Every location has a peak.
- Memory-Heavy Pods - easy - Memory-hungry workloads.
- Merge-Triggered Builds 2026 - easy - How many builds came from merges this year?
- Message Length - easy - Verbose commits. Risky changes?
- Messages Containing Keyword - easy - Flagged terms in the messages.
- Messages From Specific Users - easy - Specific users. What did they say?
- Metric Range Per Group - easy - The spread within each group.
- Metric Value Quarter Complement - easy - Two metrics that accidentally match.
- Metric Volatility Gap - easy - Stable metrics are boring. Volatile ones need attention.
- Mid-CPU Nodes - easy - Not the heaviest. Not the lightest. The middle.
- Mid-Range Cost Allocations - easy - Not the cheapest. Not the priciest. The middle.
- Mid-Tier Batch Jobs - easy - Not the biggest, not the smallest. The overlooked middle.
- Missing Email for Non-Active Users - easy - No email on file. No recent activity. Something smells off.
- Mobile Event Counts - easy - Mobile engagement, device by device.
- Monthly Active Users per Endpoint - easy - One endpoint, many users. Which ones showed up?
- Monthly Category Totals - easy - Sum amounts by category and month.
- Monthly Deployment Count - easy - Deploys by month.
- Monthly Signup Counts - easy - Signups, month by month.
- Monthly Transaction Counts - easy - Every month tells a spending story, user by user.
- Monthly Unique Users per Campaign - easy - Monthly reach, campaign by campaign.
- Morning Warning Logs - easy - Warnings before noon.
- Most Common Export Job Status - easy - The most common job status.
- Most Recent Token Usage - easy - Each user's latest token activity.
- Multi-Column User Sort - easy - Sorted by name. Then by something else.
- Multi-OS Users - easy - iOS today, Android tomorrow.
- Multi-Provider Cost Lookup - easy - AWS, GCP, Azure. Side by side.
- Multi-Variant Experiments - easy - One user, multiple experiments.
- Never-Ordered Products - easy - In the catalog. Never purchased.
- Nodes in Target Regions - easy - The target regions need attention.
- Node Summary Per Region - easy - Every region has a node story.
- No Gaps - easy - Zero blanks. A clean contact list.
- Non-Bot Acknowledged Alerts - easy - Human-acknowledged alerts only.
- Non-Draft Content - easy - Everything except drafts.
- Notifications Opened on Date - easy - One day, many pings. How many actually got opened?
- Nth Highest Salary - easy - Not the highest. Not the second. The third.
- Nth Largest Value - easy - Select the row with a specific rank position.
- NULL Keys in Joins - easy - Rows that vanish during the join.
- Oldest and Newest User Sessions - easy - The extremes of the user base.
- One-Star Product Review Count - easy - One-star reviews. How many?
- Overall Average API Latency - easy - The overall average. Across everything.
- Peak Activity by Device - easy - Activity windows, device by device.
- Peak Ad Revenue Moment - easy - The single peak earning moment.
- Peak Metric Per Department - easy - Peak metrics for the quarterly deck.
- Peak Non-Converting Month - easy - Everyone showed up. Nobody bought anything.
- Peak Satisfaction - easy - Which departments are winning on satisfaction?
- Peak Spending Month - easy - One month, the bill was unforgettable.
- Pending Batch Jobs - easy - Stuck jobs. Still pending.
- Pipeline Run History - easy - The lineage trail.
- Pipeline Throughput Ratio - easy - Compute current-to-initial value ratio per period.
- Platform Check - easy - OS and device combos. Which sessions last longest?
- Platform Team Feature Flags - easy - The platform team owns a lot of flags.
- Platform Team Mobile Flags - easy - Mobile flags under platform ownership.
- Pod Distribution by Restart Count - easy - Low-restart pods. Reliable or idle?
- Popular Categories - easy - Merchandising only cares about categories big enough to negotiate shelf space.
- Price Check - easy - Priced to sell or priced to sit?
- Production Deployment Count - easy - How many production deploys?
- Production Deploys From April Onward - easy - After the cutoff, how many times did prod get a push?
- Product Name Letter Replace - easy - A quick text transform on product names.
- Product Name Prefix - easy - Just the first three characters. That is all.
- Product Page Sale Searches - easy - They searched from the product page.
- Product Revenue Ranking - easy - Rank them by revenue. See who leads.
- Products Without Sales - easy - Listed but never sold.
- Profitable Categories by Price - easy - The most profitable categories.
- Promo Campaign Cost per Acquisition - easy - The campaign ran. What did each customer cost?
- Provider Cost Change H1 - easy - Cost swings in the first half of the year.
- Purchase Log - easy - Names on receipts, not just IDs.
- Q2 Search Volume - easy - Q2 search volume. The numbers.
- Quarterly Deployment Count - easy - Deploys per quarter.
- Recurring Error Types - easy - The same errors, recurring.
- Regional Profits - easy - P&L by region. Before the board meeting.
- Regions With 5+ Nodes - easy - Regions with five or more nodes.
- Retargeting Campaign Impressions - easy - Retargeting impressions. All of them.
- Revenue by Product - easy - Which products carry the revenue line?
- Revenue for Specific Users - easy - Alice and Bob. Total spend.
- Reviews Per Reviewer - easy - The workload split across reviewers.
- Running Node Pairs - easy - Two servers, same region, both alive.
- Satisfaction Score by Region - easy - Satisfaction scores. Missing region data.
- Search Endpoint Status Distribution - easy - Status codes on the health endpoint.
- Searches by Users With Email - easy - One user's search behavior.
- Search Terms Starting With G - easy - Queries starting with 'g'.
- Second Highest Salary - easy - Silver medal. Almost the top, but not quite.
- Second Highest Value - easy - Almost the top. Not quite.
- Service Alert Frequency - easy - How often does each service trigger alerts?
- Services With Most Error Occurrences - easy - The noisiest services.
- Service User Growth Rate - easy - User growth, service by service.
- Session-Fit Content - easy - Content that fits the session length.
- Session Logins Dec 13 to 19 - easy - Logins during one specific window.
- Session Pulse - easy - Engagement is slipping. Who is phoning it in?
- Sessions Per Device Type - easy - Sessions, device by device.
- Signups by Age Bucket Since April - easy - Recent signups by age.
- Signups Jan to Jul 2026 - easy - Signups from January through July.
- Sirens and Smoke - easy - Stale alerts. Still ringing.
- Slow Batch Jobs - easy - Promised by noon. Delivered at midnight.
- Slow Failures - easy - SRE is hunting for the endpoints that fail slowly enough to burn timeouts.
- Slow Production Deploys - easy - Production deploys that took way too long.
- Sort Tokens by Scope Character - easy - Token scopes, sorted for compliance.
- Status Report - easy - Where are orders getting stuck?
- Stock Status - easy - Human-readable availability labels.
- Storage Node Lookup - easy - The storage nodes hold the critical data.
- Successful Deploy Endpoint Calls - easy - Successful deploys only. No failures allowed.
- Successful Pipeline Runs - easy - Which pipelines completed successfully?
- Successful Production Deploys - easy - Successful production deploys with duration.
- Suspected Bot Sessions - easy - Five seconds or less. Probably a bot.
- Targeted Ad Campaigns - easy - High-value impressions. Targeted precisely.
- The Ad Ledger - easy - Annual ad revenue. On the record.
- The Campaign Trail - easy - Impressions are vanity. Conversions are sanity.
- The February Cohort - easy - One signup window. One cohort. Who joined the club?
- The First Half - easy - New arrivals during one specific window.
- The Legacy Hunt - easy - Old data. Still matters.
- The Merge Counter - easy - How many builds came from merges?
- The Publishing Audit - easy - Published years ago. Still generating views?
- The Token Census - easy - How many tokens are out there?
- Third Largest Batch Job - easy - Bronze medal in the batch job rankings.
- Threads Excluding User - easy - Every thread they're not part of.
- Three Lowest Distinct Cloud Cost Amounts - easy - The three cheapest bills on record.
- Tiered Transaction Summary - easy - Compute multiple date-windowed aggregates in a single query.
- Timeout Status Records - easy - Unknown status in the health records.
- Timeout Warning Logs - easy - Timeout warnings. The postmortem trail.
- Titles Ending With S - easy - Naming conventions. Specifically the plurals.
- Top 100 Batch Jobs Total Output - easy - The hundred biggest jobs. Combined output.
- Top 10 Batch Jobs - easy - The ten biggest batch jobs.
- Top 10 Model Accuracies - easy - Top ten model performance.
- Top 10 Slowest Endpoints - easy - The ten endpoints nobody wants to call.
- Top 5 Slowest DNS Lookups - easy - Five DNS lookups that took too long.
- Top Ad Campaigns by Revenue - easy - Every campaign has a bottom line. Stack them up.
- Top API Token Scopes - easy - The highest-value token scopes.
- Top Average By Region - easy - Region by region, who pulls the best average?
- Top Deployed Model - easy - The best-performing model in production.
- Top Device by Sessions - easy - One device type generates the most sessions.
- Top Duration Content Items - easy - The content that held the number-one spot.
- Top Five - easy - The five priciest items for the luxury section.
- Top Metric Values - easy - The five highest numbers. No duplicates.
- Top Mobile OS by Session Duration - easy - Which mobile OS keeps users longest?
- Top Performing Models - easy - The models that actually perform.
- Top Product Categories by Sales - easy - The highest-grossing categories.
- Top-Ranked Wines by Variety - easy - The best bottles. Ranked by variety.
- Top Recent Sellers - easy - Fresh data, top sellers. The recent leaderboard.
- Top Selling Items - easy - Revenue crowns the winners. Who sold the most?
- Top Shelf - easy - Buyers need to know ceiling prices before negotiating with vendors.
- Top Spenders Dense Rank - easy - Spending speaks. Let the leaderboard do the talking.
- Total Compute Cloud Cost - easy - Total compute spend. The number.
- Total Cost by Category - easy - Total spend per category.
- Total Engineering Cost Allocation - easy - Engineering's total allocated budget.
- Total Rows by Pipeline Status - easy - Row counts alongside pipeline aggregates.
- Total User Spend - easy - Each customer's total. Summarized.
- Transaction Overview - easy - The executive snapshot. Users, products, revenue.
- Transaction Source Features - easy - One pipeline reviewed them. What did it see?
- Transactions With Product Names - easy - A simple select, progressing to a join.
- Trim Endpoints Right - easy - Trailing whitespace. Clean it up.
- Trim Search Terms Left - easy - Leading whitespace. Clean it up.
- Tutorial Content Count - easy - How much of the catalog is tutorials?
- Unique Hosts by Node Type - easy - How many unique hosts per node type?
- Unique Searchers - easy - How many users actually searched?
- Unique Searchers Count - easy - Unique searchers. The count.
- Unique Stream Topics - easy - A clean inventory of streaming topics.
- Unmatched Categories - easy - Categories with nothing on the shelf. Empty aisles.
- Unreviewed Models - easy - Models that have never been evaluated.
- Unused Read Tokens - easy - Active tokens that nobody uses.
- US-East KV Store Entries - easy - KV store inventory. us-east-1.
- User Age Ranking - easy - Age brackets, stacked from top to bottom.
- User Engagement Totals - easy - Per-user engagement. The totals.
- User Event Type Count - easy - How many flavors of activity does each user have?
- User Roster - easy - Which account states are bleeding users?
- User Session Roster - easy - Every user paired with their sessions, even users who never logged in.
- User Sessions on Specific Days - easy - One user. Specific days. What happened?
- Users Per Device Type - easy - Users per device. The split.
- Users Who Clicked Ads - easy - Ad clickers and their account details.
- Users With Purchase Events - easy - At least one purchase. That changes everything.
- Verify Commit ID Uniqueness - easy - Duplicate commit IDs. Are there any?
- View Count Per Page - easy - Every page has visitors. Some just have more.
- Views by Specific Users - easy - Retrieve all content views for a set of flagged user accounts.
- Weekly Transaction Volume - easy - Weekly volume. The pulse.
- Welcome Wagon - easy - How many signed up this year?
- Whale Watch - easy - The accounts driving the top line.
- Yearly Output - easy - Publishing velocity for the board deck.
- 2026 Signup Count - easy - This year's signup count.
- Join Type Row Counts - easy - Same tables, different handshakes, wildly different results.
- Ad Clickers - easy - Who clicked? What did they spend?
- Clean Averages - easy - Merchandising only cares about the categories customers actually rate.
- Log Priority - easy - Which servers are on fire before coffee?
- Unique Visitors - easy - Which months actually had an audience?
- High-Value Electronics - easy - The five priciest electronics.
- Regional Status - easy - The full regional breakdown.
- Click Revenue - easy - Which campaigns are earning their keep?
- Email Census - easy - The reachability split.
- Log Levels - easy - Severity breakdown with response times.
- Above Average - easy - Products beating the catalog average.
- The Revenue Cliff - medium - Revenue was climbing. Then it wasn't. Spot the drop.
- The Phantom Readers - medium - They read everything. They bought nothing.
- The Day-7 Retention Cohort - medium - Day one was promising. Day seven tells the truth.
- The Latest Transaction Per Product - medium - Every product has a last sale. When was it?
- 10 Lowest Uptime Services - medium - Ten services at the bottom of the reliability chart.
- 2FA Confirmation Rate - medium - Two-factor sent. How many confirmed?
- 7-Check Rolling Average - medium - Seven entries hold the trend.
- 7-Day Token Retention - medium - Premium tokens, day by day.
- 80th Percentile API Latency - medium - The 80th percentile tells the real story.
- 90th Pctl Model Accuracy Gap - medium - Most models are fine. The bottom 10% are not.
- Above-Average Cloud Spend - medium - Some services quietly burn more than the rest.
- Above Average Product Prices - medium - Some products cost more than they should.
- Active Duo - medium - Shoppers who also browse. The overlap is the insight.
- Active Searchers - medium - They typed a query. That means something.
- Active Tokens on Target Date - medium - One specific day. Which tokens were still alive?
- Active User Open Rate - medium - What share of push notifications were opened by active users?
- Active Users by Session Count - medium - Signed up is one thing. Showing up is another.
- Active vs Regional User Count - medium - Active users versus total users. The gap is telling.
- Ad Revenue by Age Bucket - medium - Ad dollars, sliced by country.
- After Hours API Calls - medium - The office is dark. The API is not.
- Alert Count by Severity Tier - medium - Alerts by severity. The breakdown matters.
- Alert Response Breakdown - medium - An on-call postmortem asks which services are bleeding alerts nobody acknowledges.
- Alert Severity Pivot by Service - medium - When services cry wolf, the severity matrix tells who's serious.
- All Known Endpoints - medium - Two tables. One truth. Every endpoint accounted for.
- API Calls With and Without Errors - medium - Some calls succeed. Some do not. Break it down.
- API Calls With Matching Status - medium - Same status, same pattern. Coincidence?
- API Token Churn Rate - medium - Tokens come and go. What's the turnover?
- API Traffic by CDN Edge - medium - CDN paths carrying API traffic. Which edges?
- App Stability by Region - medium - Some regions crash more than others.
- Attributable Impression Rate - medium - What share of ad impressions can be traced to a real user account?
- Auction Lot Summary - medium - The hammer falls. Who bid the most?
- Auth Endpoint Callers - medium - Identify users who have called authentication API endpoints.
- Authors Deploying to Dev and Production - medium - Dev, staging, production. Who has touched all three?
- Average Accuracy by Framework - medium - Not all frameworks deliver the same accuracy.
- Average API Latency by Year - medium - Latency year over year. Is it getting better?
- Average Compensation by Department and Status - medium - Average compensation. Department by department.
- Average Fulfillment Lag - medium - Ordered, then... waiting.
- Average Initial Call Latency - medium - First contact latency. The benchmark.
- Average Results for Python Searches - medium - Python searches. What's the click-through?
- Average Review Comments by Author - medium - Some authors get more feedback than others.
- Average Session Duration - medium - How long do users actually stay?
- Average Spending by Account Status - medium - Average per-user lifetime spending, segmented by account status.
- Average Update Call Latency - medium - Follow-up calls. How fast?
- Average Watch Time by Format - medium - Which content format keeps viewers watching the longest?
- Avg Alerts by Severity - medium - Alert patterns by severity.
- Avg Daily Active Users per Endpoint - medium - Daily engagement, endpoint by endpoint. The averages reveal all.
- Avg Session Duration by Creator - medium - Some creators keep users longer.
- Batch Job Performance Tiers - medium - Every batch job gets a grade.
- Best Accuracy to Training Time Ratio - medium - Fast to train. Accurate too. Which model?
- Best Day for Ad Revenue - medium - One day of the month outperforms the rest.
- Biggest Deployment Decline - medium - One team's deploy count cratered. Which one?
- Binary Flag Indicators - medium - On or off. Every flag at a glance.
- Bottom Endpoints by POST Volume - medium - The quietest POST endpoints.
- Builds per Author per Branch - medium - Who triggered what, and where?
- Build Success Rate by Trigger - medium - Which triggers produce green builds?
- Build Success vs Failure by Repo - medium - Green versus red, repo by repo.
- Busiest Pipeline Month - medium - One month, more pipeline runs than any other.
- Busiest Route by Passenger Volume - medium - The busiest route by volume.
- Busy Authors - medium - Some developers spread their commits everywhere.
- Campaign Click-Through Rates - medium - Clicks per impression. Campaign by campaign.
- Campaign Cost Effectiveness - medium - Money in, conversions out. What is the ratio?
- Campaign Revenue by Click Channel - medium - Which ad format drives the most revenue?
- Campaigns With Most Clicks - medium - The campaigns getting all the clicks.
- Categories With Mixed Price Tiers - medium - Users who cross content types.
- CDN Traffic by Day and Hour - medium - CDN traffic, hour by hour.
- Cheapest High-Rated Product - medium - Cheap and highly rated. A rare combination.
- Classify Services by Name - medium - The name tells you what it is. Mostly.
- Clicked Holiday Impressions - medium - Holiday ads. Who actually clicked?
- Click vs Non-Click Rates - medium - Some searches lead to clicks. Most do not.
- Cloud Cost Stats by Provider - medium - Three providers. Three very different bills.
- Cloud Cost Trend Analysis - medium - Cost trends across billing periods.
- Combined Cloud Spend by Region and Service - medium - Region by region. Service by service. Where does the money go?
- Commit Royalty - medium - In a sea of commits, only a few wear the crown.
- Completion Rate - medium - Not every region closes orders cleanly. The percentages tell the story.
- Consistent High-Quantity Revenue - medium - Big orders, consistent revenue. A rare combination.
- Content Recommendation Engine - medium - Pages they haven't discovered yet.
- Content Session Counts - medium - Session metrics, content item by item.
- Cost Density Extremes - medium - Some regions pack more cost per node than others.
- Cost Share Within Category - medium - Each entry's slice of the category total.
- Creators With Top-Rated Content - medium - Top-rated content. Who made it?
- Cross-Region Customers - medium - Orders crossing borders.
- Cross-Variant User Pairs - medium - Same experiment. Different variants. Who overlaps?
- Cumulative Monthly Revenue Avg - medium - Revenue, cumulating month by month.
- Currently Active Feature Flags - medium - Which flags are live right now?
- Customers Without Orders - medium - Customers who have never ordered.
- Custom Message Type Counts - medium - Not all messages are created equal.
- Daily Error Count Change - medium - Errors, trending up or down?
- Daily Error Resolution Ratio - medium - Reported versus removed. The daily ratio.
- Daily Metric Percentage Change - medium - Yesterday versus today. What moved?
- Daily Session and User Counts - medium - Sessions and users, day by day.
- Daily Spam Impression Rate - medium - How much of the ad feed is spam?
- Daily Top Endpoints - medium - Three winners each day.
- Data Repo Fix Commits - medium - How many commits start with 'fix'?
- Days With More Edited Than Unedited Messages - medium - Some days, more messages get edited than sent.
- Deduplicate and Keep Latest - medium - Duplicates everywhere. Only the freshest survives.
- Deduplicated Sales Volume by Category - medium - Clean the noise, then see what each aisle really earned.
- Department Cost by Status - medium - Headcount and compensation. The dashboard view.
- Department Running Totals - medium - Compute cumulative metric values within each department using window operations.
- Deploy Author Performance Score - medium - Not all deployers are equally reliable.
- Deployment Failure Impact - medium - When deploys fail, how bad is the blast radius?
- Deployments per Environment - medium - Dev, staging, prod. Where do most deploys land?
- Deploy Reliability Scores - medium - A reliability scoreboard for deploy teams.
- Devices Per Age Bucket - medium - Device diversity among the younger users.
- Device Type Serving Most Users - medium - One device type serves more users than the rest.
- Disabled Flag Ratio - medium - Feature flags that went dark. What percentage fell silent?
- Distinct Chat Conversations - medium - How many unique conversations?
- DQ Fail Rate by Table - medium - Pass rates, table by table.
- DQ Score Spread - medium - The spread in data quality scores.
- Duplicate DQ Check Records - medium - Passed QA twice. That's the problem.
- Duplicated User Event Messages - medium - Duplicated messages from the alerts topic.
- Duplicate Training Runs - medium - Same model, trained twice.
- Early Commit Velocity by Author - medium - How productive was each author during the first year of a repo's CI pipeline?
- Early User Activation - medium - Activated early. A good sign.
- Efficient Pipeline Throughput - medium - Throughput per pipeline. The benchmark.
- Endpoint Latency Spread - medium - Latency spread across endpoints.
- Endpoint Performance Report - medium - Every endpoint has a speed and a reliability story.
- Endpoint With Most GET-Only Users - medium - Read-only users have a favorite endpoint.
- Engagement by Content Type - medium - Some content types get all the attention.
- Engagement Gap - medium - Zero transactions is still a data point. Count everyone.
- Error Hall of Fame - medium - The year's worst error categories.
- Error Rate by Region - medium - Error rate per day and region via conditional aggregation.
- Exclusive Users per Device Type - medium - Loyal to one platform only.
- Experiment Conversion Pivot - medium - Variant A or Variant B? The conversion numbers tell the story.
- Extract Deploy Versions - medium - The version number is buried in the log.
- Extreme API Token Usage - medium - Outlier tokens. Suspiciously busy.
- Extreme Category Totals - medium - The highest and the lowest. Both are interesting.
- Extremely Late Resolutions - medium - Twenty minutes past the SLA. Still unresolved.
- Failed Constraint Checks Count - medium - Constraints failed. How many?
- Failure Rate - medium - Build failures happen. Which repos break the most?
- Fastest CI Build Date - medium - The fastest build ever. When did it happen?
- Fastest Completion Per Day - medium - Every day has a speed champion.
- Fastest Regions by Latency - medium - The fastest regions. Benchmarked.
- Feature Flag Adoption - medium - How widely adopted are the flags?
- Feature Quality by Source - medium - Quality varies by source.
- Feature Vote Winner - medium - Users voted with their clicks. Who won?
- Find the Fifth Largest Cost - medium - Not the biggest. Not the smallest. The fifth.
- First and Last Peak Accuracy Dates - medium - Peak accuracy. When it first hit and when it last did.
- First and Last Timeout Per Service - medium - First timeout. Last timeout. Each service.
- First Deploy Attribution - medium - The first deploy per service.
- First Half of Page Views - medium - Half the data. The first half.
- First Time Learners Per Day - medium - Brand new users, day by day.
- First Touch Attribution - medium - The first interaction matters most. Or does it?
- Frequent Message Senders - medium - Someone is sending too many messages.
- Friday Sessions for Shared Experiments - medium - Friday vibes only. Same experiment, different users.
- Fulfillable Order Percentage - medium - What percentage of orders can be fulfilled?
- Ghost Products - medium - Listed but never sold. The shelves collect dust.
- Heavy Ad Exposure - medium - Saturated with ads. Is it too much?
- Heavy Hitters - medium - Some repos never sleep.
- Heavy Namespaces - medium - Kubernetes has favorites. Some namespaces carry more weight.
- Highest and Lowest Cloud Costs - medium - The extremes in cloud spending.
- Highest Daily Spend - medium - Somewhere in that window, someone broke the spending record.
- Highest Node Density Regions - medium - Some regions are packed with nodes.
- Highest Throughput Pipelines - medium - The pipes that carry the most water.
- Inactive Android Control Users - medium - Android control cohort. Gone quiet.
- Inactive Users in Date Range - medium - Ghost accounts. Active signup, zero sessions.
- Inactive vs Suspended Engagement - medium - Premium versus free. The engagement gap.
- iOS Adoption by Age Bucket - medium - The install numbers don't match the hype.
- iOS Sessions by Device Type - medium - iOS engagement, device by device.
- Job Status Duration - medium - How long in each job state?
- Keep Most Recent Record - medium - Carbon copies clutter the table. Only the latest matters.
- Keyword-Based User Search - medium - The search terms reveal intent.
- Largest A/B Test by Participants - medium - The biggest experiment ever run.
- Largest Single Cloud Cost - medium - One line item. The biggest bill of all.
- Latency Gap to 10th Fastest - medium - One server. Compared to the 10th fastest.
- Latest Commit Build Cost - medium - The latest commit came with a build cost.
- Latest Migration Output per Author - medium - Each author's most recent migration output.
- Leading ML Frameworks by Accuracy - medium - Which frameworks lead on accuracy?
- Least Viewed Content - medium - Nobody is watching. Should it still exist?
- Longest Gap Between Token Events - medium - The longest gap between token events.
- Longest Running Pipeline - medium - One pipeline outlasted them all.
- Long Messages - medium - Some commit messages tell a novel.
- Long-Running Feature Flags - medium - Flags that have been on for too long.
- Low-Engagement Sessions - medium - Users whose average session duration is below the engagement threshold.
- Lowest Cost Network-Heavy Team - medium - Networking costs versus compute. Which teams?
- Lowest Latency per Service - medium - The fastest response each service ever gave.
- Low Severity Checks in 2026 - medium - Low severity. High volume.
- Low-Volume Stream Topics - medium - Quiet topics in the stream.
- March Revenue by Customer - medium - One month, every customer, every dollar accounted for.
- Median Null Percentage of Float Features - medium - Nulls in float columns. How widespread?
- Mentorship User Pairs - medium - Pair them up. Mentor and mentee.
- Metric Count - medium - How deep does each department's tracking go?
- Metric Value Pairs Over Threshold - medium - Two metrics, both above the line.
- Minimum Cost Per Provider - medium - The cheapest month from each provider.
- Mobile vs Desktop Session Duration - medium - Mobile versus desktop. Who stays longer?
- Models With Variable Accuracy - medium - Accuracy should be stable. These models are not.
- Model Training Completion Rate - medium - How many models finished training?
- Monthly Cohort Retention - medium - Compute month-over-month retention rates for user signup cohorts.
- Monthly Revenue Comparison - medium - Last month versus this month. Per product.
- Monthly Running Total - medium - Cumulative sales per product across months.
- Monthly Spend Pivot by Provider - medium - Cloud bills by month, split by who sent the invoice.
- Monthly Transaction Summary - medium - A monthly engagement summary.
- Month With Fewest Deploys - medium - One month, almost nobody deployed.
- Most Active Chat Users - medium - The loudest voices on the platform.
- Most Active Recent Committers - medium - Who has been writing the most code lately?
- Most Active Servers by Log Volume - medium - The busiest servers by log volume.
- Most Commented Code Review - medium - The code review that started a debate.
- Most Common Monday Outcome - medium - Mondays have a pattern.
- Most Efficient API Endpoint - medium - Best throughput per call.
- Most Frequent Error Types - medium - The errors that keep coming back.
- Most Ordered Product by Country - medium - Popular products in specific markets.
- Most Popular Content Type - medium - The content type everyone prefers.
- Most Popular Signup Day - medium - One day of the week wins on signups.
- Most Profitable Region Month - medium - One region, one month. Peak profit.
- Multi-Host Regions by Node Type - medium - Some regions are quietly building empires.
- Multi-Table Report - medium - Join three tables into a summary report.
- Mutual Channel Connections - medium - Two users. What channels do they share?
- Negative Outcome Rate for New Users - medium - New users have a rough first two weeks.
- Net Lines - medium - Some authors build. Others trim. The net tells the truth.
- New Customers Per Day - medium - Count users whose first order falls on each date.
- New User Purchases - medium - Revenue from the signup cohort that joined this year.
- Nodes by Region and Type - medium - Broken down by region. Broken down by type.
- Nodes in Key Regions - medium - Six regions. How many nodes in each?
- Noisiest Tables by DQ Failures - medium - The tables that fail the most checks.
- Non-Trivial Fatal Errors - medium - Short errors are noise. Long ones matter.
- Notification Delivery Ratio - medium - Sent versus delivered. The gap is the problem.
- Notification Open Rate - medium - Sent versus opened. The rate.
- Notifications Pivot by Weekday - medium - Notifications by platform and day of week.
- Nth Highest Salary Per Department - medium - Third place in every department.
- Opened Notifications in Jan-Feb - medium - Two months of push notifications. How many were actually read?
- Over-Budget Services - medium - Over budget. Flagged.
- Overlapping User Sessions - medium - Two sessions, one user, same clock. Something overlaps.
- Overloaded Infrastructure Nodes - medium - CPU above 90. Memory above 80. Red alert.
- Pages Viewed by Session Duration - medium - Longer sessions, more pages? Check.
- Pairwise Latency Maximum - medium - Every pair compared.
- Peak API Hour - medium - The hour when traffic peaks.
- Peak Hour Power Callers - medium - One hour. The phone lines exploded.
- Peak Latency for 2026-Era Endpoints - medium - Peak latency for that era's endpoints.
- Peak Retargeting Revenue Month - medium - Retargeting revenue. The peak month.
- Pipeline Completion Rate - medium - How far do users get through the flow?
- Pipeline Overhead by Environment - medium - Production overhead versus staging.
- Pipeline Recovery by Priority - medium - Recovery time, priority by priority.
- Pivot Event Counts - medium - Reshape rows into columns by event type.
- Pod CPU to Memory Ratio - medium - CPU versus memory. Resource efficiency.
- Power Users - medium - Engagement separates tourists from regulars.
- Power Users by Session Activity - medium - More sessions. More time. The power users.
- Power Users by Session Count - medium - Three sessions is casual. More than that is serious.
- Price Rank - medium - In every category, someone charges the most. Who's on top?
- Priciest Item in Each Category - medium - The most expensive item per category.
- Product Ratings vs Sales - medium - Do higher ratings actually mean more revenue?
- Products With Strong Unit Price - medium - Budget-friendly and high-performing.
- Product Transaction Counts - medium - Show how many transactions each product has, sorted by product ID.
- Profit Tiers - medium - High, moderate, or in the red. Every order gets a label.
- Prolific Authors in Largest Service Teams - medium - Senior leads in the biggest teams.
- Provider Spend Variance Between Halves - medium - Two time windows. Did the cloud bill go up or down?
- Push Notification Open Rate - medium - Push sent. How many opened?
- Push Notification Status Pivot - medium - Sent, opened, ignored. The notification lifecycle in numbers.
- Push Opens by Platform and Campaign - medium - Opens by platform and campaign.
- Quarterly Consolidated Cloud Costs - medium - Quarterly cloud spend, weighted.
- Rank Users by Search Query Count - medium - Who searches the most? The answer might surprise you.
- Rapid Retry Detection - medium - Detect retried API calls within 5 minutes of a failure.
- Rate Limit Rules Per Endpoint - medium - Threshold rules, endpoint by endpoint.
- Rating Tiers - medium - No gaps, no skips. Ratings stacked tight within each category.
- Recent Price Drops - medium - The price just dropped. Who noticed?
- Regional Order Summary - medium - Region by region. The order numbers tell the story.
- Regions by Alert Volume - medium - Some regions are quiet. Others never stop screaming.
- Region With Best Uptime - medium - The single most reliable region.
- Region With Most Nodes - medium - Which region hosts the most?
- Repeat Buyers Across Halves - medium - First half buyer. Second half buyer. Same person.
- Repeated Transactions - medium - Detect same-amount transactions within 10 minutes.
- Repeat Purchases Within a Week - medium - They bought again within seven days.
- Repeat Purchase Window - medium - The retention squad is looking for repeat purchasers.
- Repository Commit Ranking - medium - Lines added tell the story of a repo's ambition.
- Repos with More Builds Than Commits - medium - More builds than commits. Something is off.
- Response Buckets - medium - Fast, normal, or slow. Every API call gets a verdict.
- Retried Failed API Calls - medium - Spot users who retry API calls within 5 minutes of a failure.
- Returning Buyers - medium - They came back and bought again.
- Revenue Per Product With Zeros - medium - Total revenue per product. Even the zeros.
- Reviewer Performance Metrics - medium - Some reviewers are thorough. Others are fast.
- Reviewers Per Repo Per Year - medium - Reviewers per repo, year by year.
- Revoked Tokens by Scope - medium - Banned tokens, sorted by what they had access to.
- Rolling Weekly Total - medium - Seven days at a time, the totals keep rolling forward.
- Rows With Multiple Flag Conditions - medium - Rows caught by multiple flags.
- Runner-Up Cost Without ORDER BY - medium - The second highest. Without sorting.
- Running Tab - medium - Every purchase adds to the total. Watch the tab grow.
- Rush Hour API Latency - medium - Rush hour hits the API differently.
- Same-Day Signup Rate - medium - Percentage of transactions on the signup date.
- Same First and Last Reply Target - medium - They started and ended the month messaging the same person.
- Satisfaction by Platform - medium - Satisfaction scores, platform by platform.
- Second Highest Cloud Cost - medium - The second biggest bill on record.
- Second Highest Latency by Method - medium - Almost the slowest. By method.
- Senior to Junior Ratio - medium - The ratio tells you a lot about the department.
- Servers Returning to Origin - medium - Servers that migrated back home.
- Server With Most Errors - medium - One server stands out. Not in a good way.
- Service Budget per Head - medium - Budget per head. Pipeline by pipeline.
- Service Component Classification - medium - Classified by naming pattern.
- Service Reliability Tiers - medium - Reliability tiers. Based on uptime.
- Services at Median Uptime - medium - Exactly at the median. Not above, not below.
- Service Uptime Minutes - medium - Status changed. How long was it actually up?
- Session Duration by Account Status - medium - Average session duration broken down by user account status.
- Session Overview - medium - Full engagement picture, even for the ones who never showed up.
- Session Rank - medium - Longest sessions rise to the top. Within each user, a pecking order.
- Sessions by Content Type - medium - Engagement, broken down by content format.
- Shared Category Purchasers - medium - They bought different things from the same aisle.
- Shared Endpoints - medium - Shared credentials across endpoints.
- Signup to Subscription Rate - medium - Conditional aggregation for conversion rates.
- Single Service Owners - medium - One owner, one service. Nobody else.
- Smooth Latency - medium - Noisy latency readings, smoothed into a trend you can trust.
- Spending by Account Status - medium - Segment user spending and activity by account status across the platform.
- Spending Tiers - medium - High rollers, mid-spenders, and the frugal. Everyone gets a tier.
- Split Metric Sums - medium - One column, two totals.
- Subscribers Without Premium - medium - Subscribed. But never upgraded.
- Successful Build Duration by Repository - medium - CI throughput, repo by repo.
- Successful Call Volume per Endpoint - medium - Not every ping is honest.
- Sum Excluding Extremes - medium - Remove the outliers. Then sum.
- Super Reviewers - medium - The most prolific code reviewers.
- Symmetric Reply Network - medium - Who replies to whom? Both directions.
- Tables With Many DQ Failures - medium - Some tables have never once passed QA.
- Tables With Most DQ Failures - medium - The tables with the most failures.
- Teams Below Double Average Spend - medium - Teams spending under twice the average.
- Tenure Mentorship Match - medium - Pair by tenure. Longest with newest.
- The Podium Finish - medium - Top two products per category.
- The Quiet Alarms - medium - Low severity. High volume. Worth a look.
- The Slow Lane - medium - Peak API load. The slow endpoints.
- Third Highest Spender - medium - Bronze medal in spending.
- Three-Item Combinations - medium - Generate all unique 3-item sets with total cost.
- Three-Value Sum Combinations - medium - Pick three. See what they add up to.
- Token Churn Rate - medium - Tokens come and go. How fast is the revolving door?
- Tokens With Non-Read Scope Prefix - medium - Tokens whose scope doesn't start with 'read'.
- Top 10 AB Test Variants - medium - The ten best-performing variants.
- Top 10 CPU-Heavy Nodes - medium - The ten hungriest nodes.
- Top 10 Rated Products - medium - The ten highest-rated items.
- Top 2 Active Push Days - medium - Two days stood out from the rest. Which ones?
- Top 2 Ad Campaigns by Spend - medium - Two campaigns. Most of the budget.
- Top 2 Busiest API Slots - medium - Two time slots per week. The busiest.
- Top 2 Callers per Endpoint - medium - Two top callers per endpoint.
- Top 2 Cloud Services by Cost - medium - Two services eating most of the budget.
- Top 2 Rate-Limited Clients - medium - Two clients are hitting the rate limit harder than anyone.
- Top 3 First-View Pages - medium - The first three pages new users see.
- Top 3 Revenue Months - medium - The three best months on record.
- Top Accuracy Model - medium - The single best-performing model.
- Top Active API Tokens - medium - The five busiest tokens.
- Top Active Senders per Channel - medium - Top three messages per channel by replies.
- Top Alert Resolvers - medium - The engineers who resolve the most.
- Top API Caller - medium - One user triggered more API calls than anyone.
- Top AWS Non-APAC Service Costs - medium - Outside APAC, AWS costs tell a different story.
- Top Batch Job Under Priority 1 - medium - Priority one. Top performer.
- Top Buyers by Transaction Count - medium - Frequency is loyalty. Who keeps coming back?
- Top Buyers of Premium Products - medium - Which users bought the most top-rated products?
- Top Campaign by Opens - medium - One campaign got all the opens.
- Top Campaign by User Revenue - medium - Which campaign made each user spend the most?
- Top Category by User Segment - medium - Each segment has a favorite category.
- Top Chat Contributors - medium - The ten most active chat users.
- Top Committers in 2025 - medium - In a sea of commits, only a few wear the crown.
- Top Content by Lifetime Value - medium - Lifetime value. Measured in total watch time.
- Top Content by Views - medium - Top five content items by views.
- Top Content by Watch Time - medium - Some content holds attention. Others get skipped.
- Top Content Flagger - medium - Flagged content. Who flagged the most?
- Top Cost Categories - medium - Three categories eating the budget.
- Top Cost Entry per Team - medium - The single biggest bill per team.
- Top Earner Per Campaign - medium - The top-earning user per campaign.
- Top Error Categories in 2025 - medium - Last year's worst error categories.
- Top Error-Service Pair - medium - Which error-service pair triggered the most resolved incidents?
- Top Frameworks by Accuracy - medium - Top three frameworks by accuracy.
- Top Identified Event Types - medium - The top users by events, but only the identifiable ones.
- Top Lessons Each Month - medium - Rank items within each month and keep the top 3.
- Top Metric per Department - medium - Peak performer in every department.
- Top Pattern Matches - medium - A needle in a haystack, but how many haystacks?
- Top Percentile Spenders - medium - Top 1% of users by total spend via percentile bucketing.
- Top Product Categories - medium - Top three categories by page views.
- Top Product Category by Transactions - medium - Organic purchases, no marketing nudge. Which category wins?
- Top Products by Quantity Sold - medium - The bestsellers. By volume.
- Top Products per Category - medium - Five winners per category.
- Top Region by Order Volume - medium - The single busiest region.
- Top Regions by Critical Alerts - medium - Which regions have the highest volume of critical alerts?
- Top Regions by Effective Uptime - medium - The most reliable regions.
- Top Repos by Commit Volume - medium - The most active repos in the org. No ties left behind.
- Top Repos by Successful Builds - medium - Green builds. Which repos lead?
- Top Revenue Products H1 - medium - First half of the year. Which products led the revenue race?
- Top Services by Regional Cost - medium - Top spenders in one region.
- Top Services by Uptime - medium - Uptime is a competition. Which services never blink?
- Top Services Per Provider - medium - Within each cloud, two services rise above the rest.
- Top Spender - medium - When your spending exceeds the priciest item on the shelf.
- Top Users by Pages Viewed - medium - Five users who browsed the most.
- Top Users by Recent Spend - medium - Big spenders in the last 30 days.
- Top Users by Session Time - medium - They spent the most time here.
- Transaction Revenue by Customer - medium - One month, every customer, every dollar accounted for.
- Transaction Share of User Spend - medium - Each transaction's share of the whole.
- Transaction Timeline - medium - First purchase to last. The full spending arc.
- Trend Spotter - medium - What did they spend last time? Context changes everything.
- Unclicked Searches by Campaign - medium - Searched but never clicked.
- Unique Hostnames per Region - medium - How many distinct machines live in each region?
- Unique Reporters per Content - medium - How many people flagged each item?
- Unmatched Deploy Services - medium - Two registries. They do not agree.
- Unsold Product Categories - medium - Dead inventory inflating storage costs.
- US Active User Share - medium - What percentage of active users are US-based?
- User Devices - medium - Desktop, mobile, tablet. What does each user actually use?
- User Engagement Summary - medium - Sessions plus searches. The full engagement picture.
- Users Outperforming Control - medium - Treatment beat control. For these users.
- User Spend Audit - medium - One user. One category. Total spend.
- Users With Admin Tokens - medium - Admin tokens. Who holds them?
- Users With API Errors - medium - Count unique users who have triggered an API error response.
- Users Without Purchases - medium - How many registered users have never made a single purchase?
- Users Without Sessions - medium - Account created. Never logged in.
- User With Most Transactions - medium - The most active buyer.
- Views by Content Type - medium - Count content views broken down by content type.
- Word Count Per Message - medium - How wordy are the messages?
- Workers Earning Above Department Average - medium - Earning above the department average.
- Yearly Build Duration by Repo - medium - Build times by repo, year by year.
- Year-over-Year Content Launches - medium - Launch velocity, year over year.
- Zero Accuracy on First Training - medium - First run. Zero accuracy. How common?
- Cumulative Sales Per Customer - medium - Each purchase adds to the running total. Watch it climb.
- Category Revenue - medium - Which categories pull their weight?
- Platform Speed - medium - Which devices keep users longest?
- Click Rate - medium - Campaigns nobody clicks.
- Above the Curve - medium - Spenders who break from the pack.
- Department Snapshot - medium - Who is underperforming and who is excelling?
- Noisy Endpoints - medium - The routes generating the most noise.
- Build Health - medium - Repos that break more than they ship.
- Category Buyers - medium - Which categories have the broadest reach?
- Diverse Shoppers - medium - They shop the whole catalog.
- Silent Users - medium - Users who have never typed a query.
- Funnel Leakage Report - hard - Users enter the funnel. Most never reach the bottom.
- The Session Stitcher - hard - Page views without sessions are just noise.
- The Regional Cost Reconciliation - hard - Two cost tables, one region. Reconcile the running balance.
- The Cannibalization Report - hard - The new product launched. The old one suffered.
- 2nd Most Common Content Type - hard - Everyone talks about number one. What about number two?
- 7-Day Onboarding Conversion - hard - Signed up Monday. Still here by Sunday?
- Above Category Avg - hard - Above average is relative. Relative to what?
- Active User Penetration Rate - hard - How much of the user base is actually alive?
- Adopters Before Migration - hard - They used the old feature. Did they ever touch the new one?
- Aggregate Votes by Paper Subject - hard - Net revenue, day by day, for one product in one region.
- Alert Severity - hard - When the alarms go off, who screams loudest?
- Allocations in Top Spending Region - hard - The biggest spenders live in one region.
- Alphabetical Tag Sort - hard - Tags in the wrong order.
- API Call Distribution Fraction - hard - Not all endpoints are created equal.
- Average Event Progression Time - hard - How fast do users move through the funnel?
- Average Sessions Per User - hard - How often do users come back?
- Best Selling Product by Month - hard - Every month has a winner.
- Bottom 2% Services by Spend - hard - The bottom 2% of spenders. Who are they?
- Cache Efficiency - hard - Some edges run hot. Others coast on the global average.
- Campaign Bookend Engagement - hard - First impression versus last. The gap.
- Campaign Conversion Count - hard - The push notification went out. Did anyone convert?
- Campaign Conversion Window - hard - A narrow window between impression and action.
- Campaign Engagement Rank Shift - hard - Two months, many countries. Who moved up? Who fell?
- Category Deep Dive - hard - Revenue, units, rank. The full category report card.
- Cheapest and Most Expensive Service per Region - hard - Every region has a bargain and a budget-buster.
- Cheapest CDN Route - hard - The cheapest path across regions.
- Classify Accounts by Activity Tier - hard - The accounts fall into tiers. Where is the cutoff?
- Cloud Cost Breakdown by Provider - hard - Cloud costs, provider by provider.
- Commit Cadence - hard - Some repos go quiet for too long.
- Consecutive Cost Growth Periods - hard - Five straight months of spending increases.
- Content Page Spreads - hard - Content, laid out in two columns.
- Cost Efficiency Variance - hard - Cost efficiency varies. By how much?
- Creator Favorite Content Type - hard - Every creator has a go-to format.
- Daily Net Revenue - hard - Net revenue, day by day. Refunds included.
- Data Quality - hard - Failed checks pile up. Which tables need the most attention?
- Department Quarterly Pivot - hard - Headcount by department, sliced by quarter. The org chart in numbers.
- Deploy Velocity - hard - Days between deploys. Some services ship fast, others crawl.
- Endpoint Name Word Count - hard - Some endpoint names are novels.
- Endpoint Ranking - hard - The slowest endpoints. Called to the principal's office.
- Error Category Breakdown - hard - Postmortem time. Categorize the errors.
- Exact Keyword Counts in Logs - hard - Errors and warnings. Count every single one.
- Experiment Impact - hard - Which experiments moved the needle? Rank them within each group.
- Experiment Variant Ratios - hard - Control versus treatment. The participation split.
- Fastest and Slowest Services by Region - hard - The fastest and slowest in every region.
- Fastest Page View to Click - hard - How fast from view to click?
- Feature Flag Engagement Impact - hard - Flags on versus flags off. The engagement gap.
- Feature Flag Fan vs Detractor Pairs - hard - Some users love the flag. Others want it gone.
- Feature Name Intersection - hard - Training names versus serving names. The overlap.
- First-Day Session Retention - hard - Day one retention. The first test.
- First Interaction Credit - hard - Attribute transactions to the earliest touchpoint.
- Flatten Org Chart Hierarchy - hard - The tree runs deep. Walk every branch to the root.
- Friday Spending Analysis - hard - Friday spending during Q1.
- Full Funnel - hard - Search. Browse. Buy. Only a few do all three.
- Healthiest Service Check History - hard - The healthiest service. Full history.
- High Engagement Pages - hard - Some pages hold attention longer than others.
- Impressions by Search Keyword - hard - Campaign performance, keyword by keyword.
- Incident Keyword Messages - hard - Certain words trigger an investigation.
- Intra-Region Latency Diff - hard - Same region. Different latency.
- Largest CDN Response - hard - One edge location served something massive.
- Latency Quartiles Per Endpoint - hard - Quartile breakdowns. Endpoint by endpoint.
- Latency Variance and Std Dev - hard - How much does latency actually vary?
- Longest Uptime Streak - hard - Pass, pass, pass. How long until fail?
- Longest Visit Streaks - hard - Day after day after day. Who kept coming back?
- Lowest CPU Pods per Namespace - hard - The five lightest pods per namespace.
- Market Share - hard - Every category wants a bigger slice.
- Median Cloud Cost by Service - hard - The median cloud bill, service by service.
- Median Failure Rate by Table - hard - Half the tables fail more than this.
- Median Household Earnings - hard - Household earnings. The median reveals the middle.
- Median Model Accuracy - hard - The median accuracy. Not the mean.
- Median Transaction by Category - hard - The middle transaction in each category.
- Mid-Range Team Spenders - hard - Above average but not extreme.
- Minimum Parallel Workers - hard - Too few workers and it stalls.
- Model Accuracy Drift - hard - Accuracy used to be higher.
- Mode of Small Team Costs - hard - One charge keeps showing up everywhere.
- Monthly Cloud Cost Forecast Error - hard - The forecast was off. By how much?
- Monthly Deploy Counts Pivoted - hard - Deploys by month. Side by side.
- Monthly Revenue Change - hard - Revenue, month over month.
- Monthly Service Retention - hard - Users came back. Or they did not.
- Most Efficient High-Volume Campaign - hard - High volume. Low cost. The dream campaign.
- Most Efficient Region by Token Usage - hard - Some regions squeeze more out of every token.
- Multi-Category Buyers - hard - One-category shoppers are boring.
- Multi-Month Active Users - hard - Active this month and last month. Who stuck around?
- New Services With Poor Health - hard - New services, already struggling.
- New vs Returning User Share - hard - Fresh faces versus familiar ones.
- Node Utilization - hard - Overloaded nodes hiding in busy regions. Spot the hot spots.
- Oldest Alert per Service - hard - The oldest unresolved alert per service.
- Peak Concurrent Pods - hard - The most pods alive at once.
- Peak Concurrent Tokens - hard - How many tokens were alive at the same time?
- Pipeline Duration vs Throughput - hard - Does throughput correlate with duration?
- Previous Day Top Service - hard - Yesterday's top spender.
- Price Pairs - hard - Same shelf, wildly different stickers. Spot the pricing gaps.
- Quarterly Peak Cloud Costs - hard - Every quarter has a peak bill.
- Quarter-over-Quarter Latency Trend - hard - Latency trending up or down? The quarters have the answer.
- Rarest Latency Value - hard - A latency value that appeared exactly once.
- Regional Sales Growth QoQ - hard - Quarter-over-quarter growth. Region by region.
- Resolved vs Unresolved Alerts - hard - Resolved versus open. By severity.
- Rolling Revenue Average - hard - Smooth out the revenue bumps. The trend matters more.
- Running Total With CTE - hard - A running total that builds step by step.
- Same-Day Session and Transaction Correlation - hard - Same day session and purchase. Connected?
- Search Algorithm Rating - hard - How good are the search results?
- Search Success by User Tenure - hard - Compare search click-through rates between new and existing users.
- Search Term Length vs Click Rates - hard - Longer queries, more clicks?
- Second Purchase - hard - The first buy is curiosity. The second is commitment.
- Sequential Service Transitions - hard - Job to job. The transitions.
- Service Scorecard - hard - Deploys vs. alerts. One row per service tells the whole story.
- Services Hitting Cost Threshold - hard - The budget line is here. How many crossed it?
- Services With Most Checks in 2025 - hard - Last year's most-checked services.
- Services With Multi-Quarter Uptime - hard - Multi-quarter uptime streaks.
- Service Uptime Turnaround - hard - It was down. Then it came back. Stronger.
- Service With Most Critical Alerts - hard - One service keeps setting off the alarms.
- Session Count Distribution - hard - How are sessions distributed among the newest users?
- Session Page View Distance - hard - Page view distance per session.
- Shared Channel Contacts - hard - User networks mapped through messages.
- Spend and Rank - hard - Five thrones at the top of the spending leaderboard.
- Spending Range - hard - Between the smallest purchase and the biggest lies the story.
- Streak Status Changes - hard - Detect value changes across consecutive rows.
- Team Cost Allocation Comparison - hard - Individual spend versus team average.
- Tenure Spread for Active Tokens - hard - Tenure extremes among active tokens.
- The Usual Suspects - hard - Same services, same checks, same problems.
- Top 3 Monthly Costs per Team - hard - Three priciest months per team.
- Top and Bottom Cloud Spenders - hard - The extremes. Top and bottom.
- Top Commit Authors by Repo - hard - Three authors per repo. The top committers.
- Top CPU Pods per Namespace - hard - The two most CPU-hungry pods in each namespace.
- Top Endpoint by Power Users - hard - Power users have a favorite endpoint.
- Top Flagged Campaign Resolutions - hard - Flagged the most. Resolved how?
- Top Framework by Deployments - hard - The framework most often deployed.
- Top Models by Framework - hard - Every framework has a star model.
- Top Per Category - hard - Every category has a champion. Crown them all.
- Top Percentile API Tokens - hard - The most suspicious tokens.
- Top Regions by High CPU Nodes - hard - Five regions with the hottest CPUs.
- Total Hours Between Consecutive Events - hard - Hours between state changes.
- Transaction-Only Features - hard - Exclusive to one source. Missing from the other.
- Upvote Percentage by Age Cohort - hard - New users versus existing. The upvote gap.
- User 360 - hard - One row per user. Everything they did, or didn't do.
- User Campaign Overlap Percentage - hard - How much ad overlap between users?
- User Connection Score - hard - Every user has a social score.
- User Spend Segmentation by Category - hard - Users segmented by spending behavior.
- Users Who Churned in February - hard - Gone in February.
- Users With and Without Ad Clicks - hard - Clicked an ad versus never clicked. The split.
- Viewer-to-Purchaser Activity - hard - Started as viewers. Became purchasers.
- Weekly Order Status Report - hard - Weekly order status. The report.
- Weekly Transaction Day Split - hard - Transactions by day of week.
- Weighted Variant Selection - hard - Select a row using cumulative weight probabilities.
- Worst Table Per Year by DQ Failures - hard - Every year has a worst table.
- YoY Signup Growth Rate - hard - This year versus last year. Growing or shrinking?
- Zero-Retry Job Ratio by Priority - hard - No retries needed. First-try success rate.
- Slowly Changing Dimension Type 2 - hard - Addresses change. History must not be erased.
- Normalization Tradeoffs in Practice - hard - Clean data or fast queries? You can't always have both.
Data Modeling Interview Questions (56)
- Customer Address History - easy - People move. Sometimes twice in a month. How do you remember where everyone was, and when?
- B2B Invoicing Data Model - easy - Invoices go out, partial payments trickle in, and some customers are three months overdue.
- Fitness Studio Membership Schema - easy - Classes fill up. Members no-show. Billing continues.
- A Number for the Seller - easy - They want a total. Give them the right schema first.
- Event Ticketing System Data Model - easy - JSON in. Reporting warehouse out. Design both ends.
- Loan Management Schema - easy - Money out, payments back. The balance has to be exact.
- Toll Road Sensor Analytics - easy - Cars enter, cars exit. Except when they don't.
- Fitness App Data Model - easy - Reps, sets, streaks, and personal bests. Gym rats love their stats.
- Ride-Sharing Platform Schema - medium - Riders, drivers, and fares. Everyone takes a cut.
- Employee Transfer Tracking System - medium - People switch teams. HR loses track.
- Movie Streaming Analytics Schema - medium - They pressed play. What happened next is the whole question.
- Log Parsing Pipeline Schema - medium - Raw text files, terabytes of them, full of buried signals and cryptic error codes.
- Livestream Analytics Schema - medium - Someone goes live, thousands tune in, chat explodes, and virtual gifts start flying.
- POS Sales Data Warehouse - medium - Every beep at the register. Coupons, returns, all of it.
- Online Retail Star Schema - medium - Prices change. Categories shift. Revenue slices everywhere.
- Social Platform Data Model - medium - Follows, likes, replies to replies. It never stops.
- Subscription Churn Analysis Model - medium - Subscribers are leaving. The data knows why.
- Employee Application Time Tracking - medium - Every minute tracked. Every app accounted for.
- Food Truck Operations Data Model - medium - Mobile vendor, fixed menu, unpredictable locations.
- Loan Application Reporting Schema - medium - Approved, declined, or pending. Design the tables that say so.
- Machine Process Event Log Schema - medium - Machines fire events. Pair them up before they bury you.
- Order and Shipment Data Model - medium - Order placed. Now track it to the door.
- Sales Analytics Star Schema - medium - Five rounds with a data engineer. Round five: design the star.
- Subscription and Payment Data Model - medium - Two user types. Multiple payment methods. One messy billing table.
- The JSON Files That Became a Data Mart - medium - Three semi-structured inputs. One queryable warehouse.
- The Plan That Changed Twice This Month - medium - Subscribers come, go, downgrade, and share. The schema has to keep up.
- The Retail Tables That Need a New Home - medium - A working system. Now redesign it so the analysts can actually use it.
- The Talent Funnel - medium - Thousands applied. One accepted. Where did the rest go?
- The Transfer Request - medium - Apply, wait, get approved or denied. Track all of it.
- Retailer Data Warehouse Design - medium - Queries are crawling. The analysts are not happy.
- The Table That Lies - medium - Every query comes out wrong. The data is all there.
- Clickstream and Session Schema - medium - Millions of clicks, mostly anonymous.
- The Celebrity Problem - medium - One post. A million notifications. Something has to give.
- Housing Marketplace Analytics - medium - Sellers want buyers. Buyers want deals.
- Trending Dishes Dashboard - medium - What's everyone eating? The answer changes hourly.
- Airline Flight Operations Schema - medium - Flights, passengers, and routes. Before you draw a single table, tell me the grain.
- A/B Experiment Assignment Schema - medium - One user, one experiment, one variant. No exceptions.
- Multiplayer Game Match History - medium - Millions of matches. The leaderboard refreshes in fifteen minutes.
- EdTech Classroom Engagement Schema - medium - They opened the assignment. Did they actually read it?
- Telecom Network Connectivity Warehouse - hard - One device goes down. The ripple keeps going.
- Metric Definition Reverse Engineering - hard - Five numbers on a dashboard. Your job: figure out where they come from.
- Property Booking Platform - hard - Five-star listing. Three-star reality.
- E-Commerce Supply Chain Tracking - hard - A package splits, reroutes, and (maybe) arrives.
- SCD Type 2 Customer Dimension - hard - Things were different six months ago. Can you prove it?
- Financial Trading Warehouse - hard - Every trade, every tick, every fraction of a share. The regulators want receipts.
- Content Engagement Data Model - hard - Post published. Now measure everything that happens next.
- Content Search and Discovery Schema - hard - Searchable from every angle. Design it so nothing gets lost.
- Marketplace Sales Warehouse - hard - No schema given. The interviewer is watching.
- The League With Too Many Loyalties - hard - A player can belong to many teams. The schema must agree.
- The Schema That Could Not Answer Back - hard - Forty columns in. Zero useful answers out.
- The Churner Who Came Back - hard - They cancelled. They came back. The report has to tell both stories correctly.
- The Territory That Keeps Moving - hard - Reps get reassigned. The receipts have to survive.
- Insurance Claims Lifecycle - hard - A claim gets filed. Then it gets complicated. Then it gets reassigned. Then it loops back.
- Online Marketplace: Seller Payouts - hard - The buyer paid one number. The seller got a different one.
- Cloud File Storage Metadata Schema - hard - A file is also a folder. A folder is also a file.
- Three-Sided Marketplace Delivery Schema - hard - One order. Two deliveries. Revenue counted twice. Where is the bug in your schema?
Pipeline Architecture Interview Questions (121)
- Hourly ETL Pipeline with Consistency - medium - Every hour, on the hour. No excuses.
- Time Series CSV Ingestion Pipeline - medium - One massive CSV. Millions of timestamps.
- Order and Menu Recommendation Pipeline - medium - What they ordered says a lot about what they want next.
- Card Transaction Streaming Pipeline - medium - Every swipe tells a story.
- Data Pipeline for Sales Analytics - medium - Sales data is piling up. Someone has to make sense of it.
- Batch ETL: MongoDB to Redshift - medium - Two databases. One direction. No data left behind.
- Whiteboard ETL Pipeline Design - medium - Marker in hand. Draw the whole thing.
- GPS Tracking Pipeline for Logistics - medium - Trucks are moving. Every ping counts.
- SCD Pipeline into a Delta Lakehouse - medium - Dimensions change. History must survive.
- SaaS API Connector with Incremental Sync - medium - The API has rate limits. You have deadlines.
- Real-Time POS Ingestion into Snowflake - medium - The cash register data needs to be queryable by morning.
- Streaming Pipeline with Schema Validation and Snowflake Sink - medium - Bad records cannot reach the warehouse.
- Dynamic Schema File Ingestion Pipeline - medium - The schema changed overnight. Again.
- Pre-Aggregated User Activity Metrics Pipeline - medium - DAU, WAU, MAU. Refreshed every hour.
- Database Replication and Schema Normalization Pipeline - medium - Production is the source. Analytics needs its own copy.
- Document Ingestion and Text Extraction Pipeline - medium - Buried in PDFs. The data is in there somewhere.
- On-Prem to Cloud Pipeline Modernization - medium - The on-prem servers are not getting any younger.
- The API Drip Feed - medium - The API gives you 100 records at a time. You need millions.
- CDC Connector: Log-Based vs Trigger-Based - medium - Two ways to watch the database. Each has a cost.
- Snowflake Query Performance Degradation Diagnosis - medium - Queries used to be fast. Something changed.
- Real-Time POS Pipeline with Snowpipe and MERGE - medium - Sales hit the register. Snowflake needs to know now.
- GCP Sales Analytics Pipeline - medium - Sales data, BigQuery, Dataflow. Make it all sing.
- Resume Document Ingestion and Extraction Pipeline - medium - A thousand resumes. Structured data inside each one.
- Subscription Analytics Pipeline - medium - Subscribers churn. The pipeline cannot.
- Large-Scale Sales Data Pipeline for CPG Analytics - medium - Retail data at CPG scale. Every SKU, every store.
- Financial Services Pipeline with Regulatory Reporting - medium - The regulator does not accept 'eventually consistent.'
- Event-Driven Insurance Pipeline with Async Claim Processing - medium - Policies are instant. Claims take their time.
- Databricks Pipeline with Spark Performance Optimization - medium - Spark jobs are running. Just not fast enough.
- Gaming Event Pipeline: Streaming vs Batch Architecture Decision - medium - Millions of gamers. The architecture decision changes everything.
- Vehicle Fleet Telematics and Rental Operations Pipeline - medium - Every vehicle is reporting. Every rental matters.
- Insurance Claims and Policy Data Platform on Azure Databricks - medium - Claims arrive messy. The medallion cleans them up.
- Healthcare Claims CDC Pipeline with PySpark - medium - Healthcare claims change constantly. The warehouse cannot fall behind.
- Fintech Lending Platform Event Pipeline - medium - Loan approved. Loan denied. Every decision is an event.
- Azure Data Factory Orchestration with Databricks Unity Catalog - medium - ADF orchestrates. Unity Catalog governs. Nothing leaks.
- Energy Trading Market Data Pipeline - medium - Markets move in milliseconds. The pipeline has to keep up.
- Streaming Content Metadata and Viewer Engagement Pipeline - medium - The catalog updated. Did anyone notice?
- E-Commerce Platform Analytics Pipeline: Orders to Warehouse - medium - Orders placed. Data warehouse hungry.
- Regulatory Data ETL Pipeline with Dynamic Schema Handling - medium - The regulator changed the format. Again. Handle it.
- Last-Mile Delivery Shipment Tracking State Machine Pipeline - medium - Out for delivery. Delivered. Except the events arrived backwards.
- Financial Ratings Data Pipeline with dbt Incremental Strategy - medium - Ratings change. The incremental model has to keep pace.
- The Fare Aggregator - medium - Airfares shift every minute. Catch the best ones.
- The Consent Stitcher - medium - Consent was given. Or was it? Stitch the records together.
- Loyalty Rewards Pipeline with Late Bank Data - medium - The bank data shows up late. The rewards were already sent.
- Multi-Cloud Billing Unification Pipeline with Medallion Architecture - medium - AWS, Azure, GCP. Three bills. One truth.
- Multi-Touch Marketing Attribution Pipeline on Snowflake - medium - They saw the ad, clicked the email, then bought. Who gets credit?
- The Queue That Wouldn't Stop Growing - medium - 500,000 messages behind and the number keeps climbing.
- The Vendor Who Never Warns You - medium - Every month, something is different. The dashboards have no idea.
- The Sale That Needs to Land Now - medium - Three channels feeding one view. Not all of them speak the same language.
- The Provider That Sometimes Sleeps - medium - The models run at dawn. The data has to be there first.
- The Revenue That Was Wrong for Two Weeks - medium - Nobody caught it until the CFO asked a question. Design the system that catches it first.
- Six Hours to Miss a Deadline - medium - The rebuild works. It just doesn't finish in time.
- Every Device Has Its Own Dialect - medium - Three sources. Three formats. Same workout.
- Personalization Platform Ingestion - medium - Fresh signals, many teams, one pipeline.
- The Claim That Picks Its Own Lane - medium - Three entry points. Different workflows. All must route correctly.
- The Distributor Filing Problem - medium - Hundreds of suppliers. One warehouse. One deadline.
- URL Shortener Click Analytics Pipeline - medium - Billions of clicks. One tiny code. Two very different clocks.
- Real-Time Fraud Detection Pipeline - hard - The fraudsters move fast. Your pipeline has to move faster.
- Event System for Multiple Consumers - hard - One event, many hungry consumers.
- Real-Time Sales Lakehouse Ingestion - hard - The registers never stop ringing.
- Viewing Event Pipeline - hard - Someone is watching. Capture everything.
- Ad Simulation Platform Pipeline - hard - A million slots. A thousand campaigns. Every combination matters.
- Data Ingest Pipeline with Access Tradeoffs - hard - How you store it decides how fast you can read it.
- Fintech ETL with Data Validation Checks - hard - Bad data in fintech is not just messy. It is expensive.
- ML Feature Pipeline for Model Deployment - hard - The model is only as good as what you feed it.
- Streaming CDC into Delta Lake with UPSERT - hard - The source changed. The lake needs to know immediately.
- Multi-Region Payment Event Pipeline - hard - Payments from everywhere. One consistent report.
- Dual-Source Inventory Sync Pipeline - hard - Two systems, two schemas. One truth.
- Multi-Device Event Pipeline with Late Data - hard - Phones, tablets, laptops. And some of them report late.
- Cost-Optimized Clickstream Data Lake - hard - 600 million clicks a day. The budget is not infinite.
- Livestream Event Ingestion Pipeline - hard - The stream is live. The data cannot wait.
- S3-Based Data Warehouse with File-Level Access Control - hard - Everyone can see the bucket. Not everyone should.
- Multi-City Demand Forecasting Data Pipeline - hard - Five cities. Five data formats. One prediction.
- Healthcare Data Lake with Multi-Format Ingestion - hard - PDFs, HL7, JSON. All of it lands in the same lake.
- Near-Real-Time Trending Dishes Dashboard - hard - The dish rankings update faster than the kitchen.
- Lambda Architecture for Batch and Streaming Workloads - hard - Real-time and batch. Two paths, one serving layer. No compromises.
- AWS Pipeline Auto-Scaling for Variable Volume - hard - Tuesdays are quiet. Black Friday is not.
- Clickstream Pipeline for Apple Product Analytics - hard - Every tap, swipe, and scroll. At scale.
- Dual-Source Hotel Inventory Sync Pipeline - hard - Two booking systems. Rooms do not duplicate themselves.
- Merchant Payment Summary Pipeline - hard - Raw payment logs in. Clean merchant summaries out.
- Multi-Device Streaming Pipeline with GDPR Deletion - hard - Users want their data erased. Completely.
- Financial Trading Data Warehouse - hard - Fractional shares, multi-currency, point-in-time. All of it.
- Data Platform IaC with Semantic Layer - hard - Infrastructure as code. Meaning as a service.
- Online Schema Migration on a Billion-Row Table - hard - Add and backfill a new column to a billion-row production table with zero downtime.
- Order and Menu Feature Pipeline for Recommendations - hard - They ordered pad thai twice. That means something.
- AWS Pipeline with Auto-Scaling and Cost Governance - hard - Scale up when needed. Do not bankrupt the team.
- Pharma Data Ingestion Pipeline with Governance - hard - The FDA has opinions about your data pipeline.
- City-Wide Bicycle Demand Forecasting Pipeline - hard - Bikes in, bikes out. The city needs to predict demand.
- Cost-Efficient Clickstream Analytics with Two-Year Retention - hard - Two years of clicks. Every query has to be affordable.
- Retail Clickstream Event Store at Kafka Scale - hard - 600 million events a day. Two years of retention.
- Cellular Connectivity and App Log Data Warehouse - hard - Tower signals meet app events. Somewhere in between is the truth.
- On-Prem and Event-Driven Pipeline Migration to Cloud - hard - Half the jobs run on cron. Half run on events. All of it has to move.
- HIPAA-Compliant PHI De-identification Pipeline for Development - hard - Dev needs production data. HIPAA says absolutely not.
- Streaming Device Telemetry and Ad Impression Pipeline - hard - Every ad seen. Every second watched. Real-time.
- Streaming and Batch Unified Pipeline on Azure Databricks - hard - Streaming and batch. One pipeline to rule them all.
- Consumer Goods Trade Promotion Pipeline on GCP - hard - Was the promotion worth it? The data knows.
- EHR Platform Operational Data Pipeline - hard - Patient records in, operational insights out.
- Global Insurance Premium and Loss Ingestion Platform - hard - Premiums collected globally. Losses happen locally.
- Rocket Delivery Feature Store Pipeline - hard - Same-day delivery. The features have to be faster.
- Real-Money Card Game Session Reconstruction Pipeline - hard - Real money on the table. Reconstruct every hand.
- Legacy ETL Modernization with SCD Type 2 Entity Resolution - hard - The legacy pipeline works. Nobody knows how.
- Connected Vehicle Telemetry Pipeline with IaC Deployment - hard - Every vehicle is a sensor. Deploy the pipeline to catch it all.
- Real-Time Investment Portfolio Position Pipeline - hard - Positions shift by the second. The math cannot lag.
- Device Insurance Claims Pipeline with Real-Time Fraud Scoring - hard - The claim looks clean. The fraud model disagrees.
- TV Audience Measurement Pipeline with Panel Projection - hard - Set-top boxes tell you who watched. Projection tells you how many.
- Cross-Platform TV and Digital Ad Measurement Pipeline - hard - TV and digital. Same viewer, two measurement worlds.
- Real-Time News Event Detection Pipeline from Social Media Firehose - hard - The firehose is on. Separate signal from noise.
- Capital Markets Intraday Risk Pipeline with BCBS 239 Lineage - hard - Intraday risk, full lineage. The regulator is watching.
- Federated Clinical Trial Data Pipeline - hard - Patient data stays local. Insights have to be global.
- Print Order Ganging and Manufacturing Analytics - hard - One press run, many orders. Group them right.
- Daily Payment Log Pipeline - hard - Three regions, billions of payments, one merchant summary by 6 AM.
- The Booking That Came Three Ways - hard - PMS, OTA, and website all think they took the reservation first.
- The Boutique That Sold in Six Currencies - hard - Every sale is real. The rate it was converted at depends on who is asking.
- The Clock That Runs Two Ways - hard - Nightly batch and live events. One dashboard.
- The Fleet That Never Stops - hard - Every truck is talking. Not everyone can hear them yet.
- Three Providers, One Workout - hard - The same ride, reported three times.
- The Decision Before the Door Closes - hard - The window to stop it is smaller than you think.
- The Migration That Cannot Break Morning - hard - It all works today. Moving it without losing a single report is the hard part.
- Two Million Boxes by Monday Morning - hard - Shipped, maybe. Delivered, debatable.
- The Leaderboard That Costs $25K a Month - hard - Product wants it live. Engineering has a price tag.
- Four Teams, One Topic, No Agreement - hard - Everybody is writing to it. Nobody documented it. Now production is fragile.
- The Analyst Who Saw the Salary Data - hard - Two incidents. One shared lake. The access model was never designed, just assumed.
How It Works
- Choose a domain: SQL, Python, Data Modeling, or Pipeline Architecture
- Select your seniority level and target company tier
- Start a timed mock interview with a vague prompt
- Ask the AI interviewer clarifying questions
- Write and execute your solution against a real database
- Get instant feedback and a hire/no-hire decision