FAANG Data Engineer Interview Questions
FAANG-tagged data engineer interview questions with per-company rubrics.
346 practice problems matching this filter. Domains: SQL (256), Data Modeling (12), Python (76), Pipeline Architecture (2). Difficulty: medium (146), easy (138), hard (62).
SQL (256)
- 10 Lowest Uptime Services - medium - Ten services at the bottom of the reliability chart.
- 2FA Confirmation Rate - medium - Two-factor sent. How many confirmed?
- 30-Day Page View Counts - easy - Thirty days of engagement. Quick snapshot.
- 7-Day Token Retention - medium - Premium tokens, day by day.
- 80th Percentile API Latency - medium - The 80th percentile tells the real story.
- Active Users With April Transactions - easy - Active accounts that also opened their wallets. How many?
- Presence vs. Participation - medium - Being in the region and being active are two very different things.
- All Infra Regions - easy - The infrastructure spans the globe. Map it.
- The Tag Order - hard - Tags arrived in chaos. The system needs them in line.
- API Call Distribution Fraction - hard - Not all endpoints are created equal.
- API Calls With Matching Status - medium - Same status, same pattern. Coincidence?
- Average Event Progression Time - hard - How fast do users move through the funnel?
- Average Review Comments by Author - medium - Some authors get more feedback than others.
- Average Session Duration - medium - How long do users actually stay?
- Average Session Duration by Device - easy - Session length, device by device.
- Average Sessions Per User - hard - How often do users come back?
- Best Selling Product by Month - hard - Every month has a winner.
- Best-Selling Reps Each Month - easy - In every category, a few sellers rise to the top.
- Build Success Rate by Trigger - medium - Which triggers produce green builds?
- Build Success vs Failure by Repo - medium - Green versus red, repo by repo.
- Busy Authors - medium - Some developers spread their commits everywhere.
- The Notification That Paid Off - hard - The message went out to thousands. A smaller number actually bit.
- Campaign Engagement Rank Shift - hard - Two months, many countries. Who moved up? Who fell?
- Campaign Revenue Totals - easy - Every campaign has a price tag. Total them up.
- CDN-Related DNS Lookups - easy - DNS lookups tied to the CDN.
- Character Position in Endpoint - easy - URL patterns, character by character.
- Cheapest CDN Route - easy - The cheapest path across regions.
- Cheapest Cost Per Region - easy - Lowest spend per region.
- Cheapest Transaction per User - easy - Everyone has a smallest purchase.
- The Quiet Outlier - hard - Ignore what the traffic does all day. Find the spike that barely showed up.
- Clicked Ad Impressions - easy - They saw the ad. They clicked.
- Loyalty's Double Tap - medium - When a nudge and a banner team up.
- Click vs Non-Click Rates - medium - Some searches lead to clicks. Most do not.
- Cloud Cost Trend Analysis - medium - Cost trends across billing periods.
- Completed Priority-1 Jobs - easy - Priority one. Completed.
- Content Recommendation Engine - medium - Pages they haven't discovered yet.
- Content Session Counts - medium - Session metrics, content item by item.
- Cost Share Within Category - medium - Each entry's slice of the category total.
- Service Roll Call - easy - The mesh is sprawling. Find out exactly how many services are actually running.
- Cross-Variant User Pairs - medium - Same experiment. Different variants. Who overlaps?
- Currently Active Feature Flags - medium - Which flags are live right now?
- Customer Full Name Concat - easy - First name, last name. Combine them.
- Custom Message Type Counts - medium - Not all messages are created equal.
- Daily Cross-Platform Users - easy - Mobile and web. Same day, same users?
- Daily Error Resolution Ratio - medium - Reported versus removed. The daily ratio.
- Daily Net Revenue - hard - Net revenue, day by day. Refunds included.
- Daily Session and User Counts - medium - Sessions and users, day by day.
- Campaign Click Rate - medium - Among engaged users, which campaigns landed.
- Days with More Edited Than Unedited Messages - medium - Some days, more messages get edited than left untouched.
- Department Cost by Status - medium - Headcount and compensation. The dashboard view.
- Deployments per Environment - medium - Dev, staging, prod. Where do most deploys land?
- Deploy Reliability Scores - medium - A reliability scoreboard for deploy teams.
- The Apprentices Still in the Forge - easy - A model is not a model until it stops learning and starts earning.
- Device Type Serving Most Users - medium - One device type serves more users than the rest.
- Device Types With Chrome Users - easy - Power users and their devices.
- Disabled-Flag Share by Owner - medium - Which teams ship everything off by default.
- Distinct Chat Conversations - medium - How many unique conversations?
- Distinct Product Categories - easy - A quick category inventory.
- Duplicate DQ Check Records - medium - Passed QA twice. That's the problem.
- Duplicated User Event Messages - medium - Duplicated messages from the alerts topic.
- Duplicate Training Runs - medium - Same model, trained twice.
- Verbose by Design - hard - Audit endpoint paths. Length without the outer slashes, and how many segments.
- Engagement Gap - medium - Zero transactions is still a data point. Count everyone.
- The Failure Report - medium - Errors by day and region. Some areas are worse than they appear.
- Errors With Service Health - easy - Error data, enriched with health context.
- Even-ID February Signups - easy - A very specific slice of a very specific cohort.
- Even-ID June Signups - easy - Odd IDs, even IDs. The filter is precise.
- Event Count on Key Days - easy - Key days. Key event volumes.
- Events by Month Across Years - easy - Month by month, year by year. The pattern emerges.
- Event Types Spanning Multiple Months - easy - Some events span seasons.
- Exact Keyword Counts in Logs - hard - Errors and warnings. Count every single one.
- The A/B Verdict - medium - Variant A or Variant B. The conversion numbers pick the winner.
- Fastest Page View to Click - hard - How fast from view to click?
- Feature Flag Adoption - medium - How widely adopted are the flags?
- Feature Flag Fan vs Detractor Pairs - hard - Some users love the flag. Others want it gone.
- Feature Name Intersection - hard - Training names versus serving names. The overlap.
- Filtered User Roster - easy - A clean roster for the all-hands.
- Find Deploy Authors - easy - Same person. Many different spellings.
- Find the Fifth Largest Cost - medium - Not the biggest. Not the smallest. The fifth.
- The Ninety-Day Comeback - hard - Everyone shows up once. Who comes back before the quarter ends?
- First Half of Page Views - medium - Half the data. The first half.
- First Interaction Credit - hard - Attribute transactions to the earliest touchpoint.
- First Migration Record - easy - The very first migration. Where it all began.
- First Contact - easy - Every pipeline has a first run. This is what it brought back.
- Frequent Message Senders - medium - Someone is sending too many messages.
- Full Funnel - hard - Search. Browse. Buy. Only a few do all three.
- Health Checks per Service - easy - Some services get checked constantly.
- Heavy Hitters - medium - Some repos never sleep.
- Heavy Namespaces - medium - Kubernetes has favorites. Some namespaces carry more weight.
- High Engagement Pages - hard - Some pages hold attention longer than others.
- Highest and Lowest Cloud Costs - medium - The extremes in cloud spending.
- Highest Daily Spend - medium - Somewhere in that window, someone broke the spending record.
- High Price Products - easy - Everything above 100.
- High-Rated In-Stock Percentage - easy - Highly rated and in stock. A rare combo.
- Impressions by Search Keyword - hard - Campaign performance, keyword by keyword.
- Inactive Android Control Users - medium - Android control cohort. Gone quiet.
- Inactive Unverified Users - easy - Signed up. Never verified. Never came back.
- Inactive Users in Date Range - medium - Ghost accounts. Active signup, zero sessions.
- Intra-Region Latency Diff - hard - Same region. Different latency.
- iOS Adoption by Age Bucket - medium - The install numbers don't match the hype.
- iOS Sessions by Device Type - medium - iOS engagement, device by device.
- Largest Group - easy - One group towers above the rest.
- Largest Single Cloud Cost - medium - One line item. The biggest bill of all.
- Last Five Batch Jobs - easy - The last five. A quick tail check.
- Last Migration Record - easy - The most recent migration. Is it the last?
- Latest Session Per User - easy - Everyone has a most recent session.
- Latest Version Per Service - easy - The latest version deployed. Each service.
- Longest Deploy With Full Identifier - easy - The longest deployment. Full ID.
- Long Searches Containing 'er' - easy - Long queries with 'er'. A pattern?
- Low-Volume Stream Topics - medium - Quiet topics in the stream.
- Max Value Per Location - easy - Every location has a peak.
- Mentorship User Pairs - medium - Pair them up. Mentor and mentee.
- Messages Containing Keyword - easy - Flagged terms in the messages.
- Messages From Specific Users - easy - Specific users. What did they say?
- Metric Range by Department - medium - Where each team's numbers sit, low to high.
- Mid-CPU Nodes - easy - Not the heaviest. Not the lightest. The middle.
- Mid-Range Cost Allocations - easy - Not the cheapest. Not the priciest. The middle.
- The Floor Price - medium - Before the negotiation, find what each provider really charges at its cheapest.
- Mobile Event Counts - easy - Mobile engagement, device by device.
- Model Accuracy Drift - hard - Accuracy used to be higher.
- Most-Allocated Service - hard - The service every big team keeps paying for.
- Monthly Revenue Change - hard - Revenue, month over month.
- Most Common Monday Outcome - medium - Mondays have a pattern.
- Most Efficient API Endpoint - medium - Best throughput per call.
- Most Efficient Region by Token Usage - hard - Some regions squeeze more out of every token.
- The Tiebreaker - easy - One column wasn't enough. The second column settles it.
- Multi-Host Regions by Node Type - medium - Some regions are quietly building empires.
- Multi-Variant Experiments - easy - One user, multiple experiments.
- Mutual Channel Connections - medium - Two users. What channels do they share?
- Never-Ordered Products - easy - In the catalog. Never purchased.
- New Services With Poor Health - hard - New services, already struggling.
- Nodes by Region and Type - medium - Broken down by region. Broken down by type.
- Noisiest Tables by DQ Failures - medium - The tables that fail the most checks.
- Non-Draft Content - easy - Everything except drafts.
- Notification Delivery Ratio - medium - Sent versus delivered. The gap is the problem.
- Did Anyone Actually Read It? - easy - A push isn't a win until a thumb taps it.
- The Vanishing Rows - easy - Some records disappear when the tables meet. Figure out why.
- Oldest Alert per Service - hard - The oldest unresolved alert per service.
- Opened Notifications in Jan-Feb - medium - Two months of push notifications. How many were actually read?
- Peak Activity by Device - easy - Activity windows, device by device.
- Peak Ad Revenue Moment - easy - The single peak earning moment.
- Peak Concurrent Pods - hard - The most pods alive at once.
- Peak Concurrent Tokens - hard - How many tokens were alive at the same time?
- Pipeline Completion Rate - medium - How far do users get through the flow?
- Power Users - medium - Engagement separates tourists from regulars.
- Power Users by Session Activity - medium - More sessions. More time. The power users.
- The Regulars - medium - Past a certain threshold, casual becomes committed.
- Priciest Item in Each Category - medium - The most expensive item per category.
- Production Deploys From April Onward - easy - After the cutoff, how many times did prod get a push?
- Product Name Letter Replace - easy - A quick text transform on product names.
- Product Name Prefix - easy - Just the first three characters. That is all.
- Push Notification Open Rate - medium - Push sent. How many opened?
- The Notification Lifecycle - medium - Sent, opened, ignored. What happened after the alert went out?
- Q2 Search Volume - easy - Q2 search volume. The numbers.
- Quarterly Consolidated Cloud Costs - medium - Quarterly cloud spend, weighted.
- The Relentless Searchers - medium - Most users look once and leave. A few never stop looking.
- Rarest Latency Value - hard - A latency value that appeared exactly once.
- Recurring Error Types - easy - The same errors, recurring.
- Regional Sales Growth QoQ - hard - Quarter-over-quarter growth. Region by region.
- Repeat Buyers Across Halves - medium - First half buyer. Second half buyer. Same person.
- Repeat Purchase Window - medium - The retention squad is looking for repeat purchasers.
- Resolved vs Unresolved Alerts - hard - Resolved versus open. By severity.
- Retargeting Campaign Impressions - easy - Retargeting impressions. All of them.
- Returning Buyers - medium - They came back and bought again.
- Revenue for Specific Users - easy - Alice and Bob. Total spend.
- Reviewer Performance Metrics - medium - Some reviewers are thorough. Others are fast.
- Reviewers Per Repo Per Year - medium - Reviewers per repo, year by year.
- Reviews Per Reviewer - easy - The workload split across reviewers.
- Rolling Revenue Average - hard - Smooth out the revenue bumps. The trend matters more.
- Runner-Up Cost Without ORDER BY - medium - The second highest. Without sorting.
- Search Algorithm Rating - hard - How good are the search results?
- Search Terms Starting With G - easy - Queries starting with 'g'.
- Second Highest Cloud Cost - medium - The second biggest bill on record.
- Senior to Junior Ratio - medium - The ratio tells you a lot about the department.
- Server With Most Errors - medium - One server stands out. Not in a good way.
- Services at Median Uptime - medium - Exactly at the median. Not above, not below.
- Service Scorecard - hard - Deploys vs. alerts. One row per service tells the whole story.
- Services With Multi-Quarter Uptime - hard - Multi-quarter uptime streaks.
- Session Count Distribution - hard - How are sessions distributed among the newest users?
- Session-Fit Content - easy - Content that fits the session length.
- Session Overview - medium - Full engagement picture, even for the ones who never showed up.
- Session Page View Distance - hard - Page view distance per session.
- Sessions Per Device Type - easy - Sessions, device by device.
- Shared Category Purchasers - medium - They bought different things from the same aisle.
- Shared Channel Contacts - hard - User networks mapped through messages.
- Shared Endpoints - medium - Shared credentials across endpoints.
- Signups by Age Bucket Since April - easy - Recent signups by age.
- The Compliance Order - easy - Token scopes need to be in the right sequence before the audit.
- The Middle Ground - medium - Strip the outliers from both ends. What does the core actually add up to?
- Symmetric Reply Network - medium - Who replies to whom? Both directions.
- Tables With Many DQ Failures - medium - Some tables have never once passed QA.
- Teams Below Double Average Spend - medium - Teams spending under twice the average.
- The Duplicate Detection Sprint - easy - Same email, different rows. Spot the repeats.
- The February Cohort - easy - One signup window. One cohort. Who joined the club?
- The Legacy Hunt - easy - Old data. Still matters.
- The Podium Finish - medium - Top two products per category.
- The Publishing Audit - easy - Published years ago. Still generating views?
- The Token Census - easy - How many tokens are out there?
- Third Highest Spender - medium - Bronze medal in spending.
- Third Largest Batch Job - easy - Bronze medal in the batch job rankings.
- Threads Excluding User - easy - Every thread they're not part of.
- Three Lowest Distinct Cloud Cost Amounts - easy - The three cheapest bills on record.
- Titles Ending With S - easy - Naming conventions. Specifically the plurals.
- Keys That Never Die - medium - Some API keys have no expiry date at all. That should worry someone.
- Top 10 CPU-Heavy Nodes - medium - The ten hungriest nodes.
- Top 10 Rated Products - medium - The ten highest-rated items.
- Top Active Senders per Channel - medium - Top three messages per channel by replies.
- Top Alert Resolvers - medium - The engineers who resolve the most.
- Top API Caller - medium - One user triggered more API calls than anyone.
- Top API Token Scopes - easy - The highest-value token scopes.
- Top Campaign by Opens - medium - One campaign got all the opens.
- Top Category by User Segment - medium - Each segment has a favorite category.
- Top Chat Contributors - medium - The ten most active chat users.
- Top Cost Entry per Team - medium - The single biggest bill per team.
- Top Framework by Deployments - hard - The framework most often deployed.
- Top Identified Event Types - medium - The top users by events, but only the identifiable ones.
- Top Metric Values - easy - The five highest numbers. No duplicates.
- Top Models by Framework - hard - Every framework has a star model.
- Top Percentile API Tokens - hard - The most suspicious tokens.
- Top Services by Uptime - medium - Uptime is a competition. Which services never blink?
- Total Cost by Category - easy - Total spend per category.
- Total Hours Between Consecutive Events - hard - Hours between state changes.
- Total User Spend - easy - Each customer's total. Summarized.
- Transaction-Only Features - hard - Exclusive to one source. Missing from the other.
- Transaction Overview - easy - The executive snapshot. Users, products, revenue.
- Transaction Revenue by Customer - medium - One month, every customer, every dollar accounted for.
- Transaction Share of User Spend - medium - Each transaction's share of the whole.
- The Named Transaction - easy - Transaction IDs are useless without context. Bring in the product names.
- Trim Endpoints Right - easy - Trailing whitespace. Clean it up.
- Trim Search Terms Left - easy - Leading whitespace. Clean it up.
- Unclicked Searches by Campaign - medium - Searched but never clicked.
- Unique Hosts by Node Type - easy - How many unique hosts per node type?
- Unique Reporters per Content - medium - How many people flagged each item?
- Unique Searchers - easy - How many users actually searched?
- Who's Looking - easy - Every search is a question someone needed answered. Count the people asking.
- Unique Stream Topics - easy - A clean inventory of streaming topics.
- US-East KV Store Entries - easy - KV store inventory. us-east-1.
- User 360 - hard - One row per user. Everything they did, or didn't do.
- User Campaign Overlap Percentage - hard - How much ad overlap between users?
- User Connection Score - hard - Every user has a social score.
- User Devices - medium - Desktop, mobile, tablet. What does each user actually use?
- User Engagement Summary - medium - Sessions plus searches. The full engagement picture.
- Behavioral Range - easy - Power users don't just visit more. They do more things.
- User Sessions on Specific Days - easy - One user. Specific days. What happened?
- Users Per Device Type - easy - Users per device. The split.
- Users Who Clicked Ads - easy - Ad clickers and their account details.
- Users Without Sessions - medium - Account created. Never logged in.
- Users With Purchase Events - easy - At least one purchase. That changes everything.
- Verify Commit ID Uniqueness - easy - Duplicate commit IDs. Are there any?
- View Count Per Page - easy - Every page has visitors. Some just have more.
- Viewer-to-Purchaser Activity - hard - Started as viewers. Became creators.
- Views by Content Type - medium - Count content views broken down by content type.
- Weekly Build Status Report - hard - Every CI run, bucketed by week.
- Weekly Transaction Day Split - hard - Transactions by day of week.
- Weekly Transaction Volume - easy - Weekly volume. The pulse.
- Word Count Per Message - medium - How wordy are the messages?
Data Modeling (12)
- A Number for the Seller - easy - They want a total. Give them the right schema first.
- Content Engagement Data Model - hard - Post published. Now measure everything that happens next.
- Event Ticketing System Data Model - easy - JSON in. Reporting warehouse out. Design both ends.
- Food Truck Operations Data Model - medium - Mobile vendor, fixed menu, unpredictable locations.
- Machine Process Event Log Schema - medium - Machines fire events. Pair them up before they bury you.
- Marketplace Sales Warehouse - hard - No schema given. The interviewer is watching.
- Order and Shipment Data Model - medium - Order placed. Now track it to the door.
- The Sales Architecture - medium - Numbers are easy. Making them queryable at scale is the real job.
- Subscription and Payment Data Model - medium - Two user types. Multiple payment methods. One messy billing table.
- The Churner Who Came Back - hard - They cancelled. They came back. The report has to tell both stories correctly.
- The Plan That Changed Twice This Month - medium - Subscribers come, go, downgrade, and share. The schema has to keep up.
- The Transfer Request - medium - Apply, wait, get approved or denied. Track all of it.
Python (76)
- Batch Records - medium - Too many at once. Break them into groups.
- Batch With Metadata - easy - The list gets chopped.
- Column Max - easy - One value rules the column.
- Column Range - easy - From minimum to maximum. What is the spread?
- Column Sum - easy - Add up the column. Every value counts.
- Cumulative Sum - medium - The total grows with every row.
- Diagonal Extract - medium - Not every value sits in a row or column.
- Dictionary Key Intersection - medium - Two dictionaries. What do they share?
- Distribute Values Into Container Types - medium - Round-robin the values. Keep rotating.
- Even Filter - easy - Only the even ones survive.
- Explode List - easy - One row holds many values. Unpack it.
- Find Indices - medium - It is in there somewhere. Where exactly?
- Flatten the Feed - easy - Nested lists, all the way down.
- Full Outer Zip - medium - Two sides. No value left behind.
- Greeting Formatter Class - easy - First impressions are formatted carefully.
- Null Counter - easy - How many holes in the data?
- Portfolio Profit Calculator - medium - Portfolio gain from purchase history and current prices.
- Quality Gate - easy - Not everything passes inspection.
- Quantile Calculator - easy - Mark the boundary value at a given point.
- Rotate Buffer - medium - The buffer is full. Rotate it.
- Run Length Encoding - easy - AAABBB becomes 3A3B. Compress it.
- Sort Descending - easy - Biggest first. No exceptions.
- Subarray Signal - medium - One stretch carries the strongest signal.
- The Anomaly Detector - hard - Spot the outliers before they page someone.
- The Category Ranker - medium - Categories have standing. Rows get theirs.
- The Change Data Capture - hard - Inserts, updates, deletes: all present.
- The Character Encoder - easy - Squeeze a string down to its tightest form.
- The DAG Executor - hard - Wire up a mini pipeline and watch it run.
- The Deep Config - medium - Nested config, dot-notation output.
- The Deep Dive - easy - A specific position in the unsorted pile.
- The Dependency Resolver - medium - Everything depends on everything.
- The Dictionary Inverter - easy - Flip the dict. Group what used to be values.
- The Dominant Signal - easy - Hottest items in the transaction log. Ties included.
- The Email Ranker - medium - Some inboxes see more action.
- The Event Aggregator - medium - Bucket a firehose of events into tidy time windows.
- The Event Bucketer - easy - Logs slotted into buckets.
- The Forward Fill - easy - Patch the gaps in a noisy sensor stream.
- The Gap Filler - easy - Fill the Nones with the last real value.
- The Generous Ones - medium - The generous ones are obvious.
- The Halftime Score - easy - Middle value of a dataset. No built-in shortcuts.
- The Horizon Scanner - medium - For each position, what is coming up ahead?
- The IP Validator - easy - Real and fake, mixed together.
- The Log Pulse - easy - Some lines repeat themselves.
- The Middle Ground - hard - The middle value keeps moving.
- The Nearest Value Mapper - medium - Close enough counts. Ties go low.
- The Numbered Chair - easy - A standing list. Position n holds one entry.
- The One-of-Each - easy - Strip the repeats, keep the originals.
- The One-Way Street - easy - Monotonic time-series. Direction only.
- The Original Keeper - easy - Clean up duplicate events without losing the timeline.
- The Output Peak - hard - One stretch outpaced all the others.
- The Payload Flattener - medium - Turn a deeply nested API response into a flat row.
- The Pipeline Filter - easy - In the door as one thing, out the door as another.
- The Record Reconciler - medium - Two versions of the same truth.
- The Repeat Review - medium - The echo came back.
- The Resume Sifter - medium - Pull what's useful. Skip what you know.
- The Running Total - easy - Each position holds the sum of everything before it.
- The Schedule Cleaner - medium - Overlapping sessions. One clean line.
- The Schema Differ - medium - Schema from yesterday vs today. Something changed.
- The Schema Migrator - hard - Old schema in, new schema out.
- The Sequel Spotter - easy - Spot the sequels hiding in the catalog.
- The Shifting Standard - medium - A benchmark in motion.
- The Social Graph - easy - Everyone knows someone.
- The Spin Doctor - medium - Ninety degrees, but which way?
- The Squeeze - easy - aaabbb gets old fast. Shrink it.
- The Streak Breaker - easy - It has a problem with repetition.
- The Stream Averager - easy - The answer moves with the data.
- The Stream Joiner - hard - Events don't wait for each other. This does.
- The String Shrinker - easy - Compress the string. Shorter wins.
- The Target Hunt - medium - Pairs that hit a target. Every one of them.
- The Throttle Ceiling - medium - Too many requests in too short a timeframe. Throttle it.
- The Throttle Wall - hard - Stop the abusers. Let the rest through.
- The Trade Signal - easy - Buy low, sell high. Identify the ideal moment.
- The Word Mismatch - easy - Some text does not match.
- Transform Column - easy - Same data, new shape.
- Transpose Table - medium - Rows become columns. Columns become rows.
- Value Count - easy - How many of each? Count them.
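As a taste of the Python set, the Run Length Encoding entry above gives one concrete input and output: AAABBB becomes 3A3B. A minimal sketch of that transformation follows. The function name `run_length_encode` is hypothetical, and writing single characters with an explicit count of 1 is an assumption, since the listing does not say how runs of length one should be handled.

```python
from itertools import groupby


def run_length_encode(text: str) -> str:
    """Encode each run of identical characters as <count><char>.

    Assumption: a run of length one is still written with its count
    (for example "C" becomes "1C"); the listing does not specify this.
    """
    # groupby yields (char, iterator over the consecutive run of that char)
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


print(run_length_encode("AAABBB"))  # 3A3B
```

The sketch leans on `itertools.groupby` to split the string into consecutive runs. An interview setting may ask for an explicit loop instead, which would follow the same run-by-run logic.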
Pipeline Architecture (2)
- The Decision Before the Door Closes - hard - The window to stop it is smaller than you think.
- The What-If Machine - hard - A million slots. A thousand campaigns. Every combination matters.