# The Heat of the Map

Canonical URL: <https://datadriven.io/problems/surge-at-every-cell>

Domain: Data Modeling · Difficulty: hard · Seniority: senior

## Problem

Grab's dispatch team runs a live map showing, for every geohash cell in a city, a surge multiplier driven by how far open requests outrun available drivers and a congestion score comparing observed speed to the road's free-flow speed, refreshed each minute. The schema backs both the always-on map, which needs only the current value in each cell, and analysts who replay how surge built up across a city over the past week. The cells nest, so a coarse cell's surge has to reconcile against the finer cells inside it and up to the one city each cell sits in, and you cannot get there by averaging one cell's ratio with another's.

## Worked solution and explanation

### Why this problem exists in real interviews

Underneath the surge-map costume, this is a semi-additive measure problem fused with a current-state-versus-history split. Surge and congestion are ratios, and ratios do not roll up: you cannot average the surge of forty finer cells to get the surge of the cell that contains them. The candidate who stores only the multiplier paints themselves into a corner the moment the city overview asks for one number per district, and the candidate who serves the live map off the raw event stream ships a dashboard that scans billions of rows to find the latest minute.

> **Trick to Solving**
>
> Two recognitions crack this. First, store the ingredients (requests, drivers, observed speed, free-flow speed), not just the factors, so a coarse cell is recomputed from its children's numerators and denominators. Second, split the serving path: a tiny current-state table for the map, a periodic-snapshot fact for replay. The map never touches the history table.

---

### Break down the requirements

#### Step 1: Declare the grain out loud

One row in fact_cell_metrics is one geohash cell in one one-minute bucket. That is a periodic snapshot, not an event log. Fixing the grain is what makes surge across cells comparable and lets replay scan a clean time series instead of reconstructing state from a firehose of demand events.

#### Step 2: Store ingredients, never bare ratios

The prompt defines surge as open_requests over available_drivers and congestion as observed_speed over free_flow_speed, so keep all four columns. When a precision-5 cell needs a value, you sum the children's requests and drivers and divide once. Sum the ratios instead and the number is meaningless.

#### Step 3: Split the map from the replay

current_cell_state holds exactly one row per active cell, updated in place each minute. The live map reads it by geohash in single-digit milliseconds. fact_cell_metrics keeps every minute for the 7-day replay window. Same measures, two grains, two access patterns.

#### Step 4: Model the nesting and the city tie in the dimension

Geohash prefixes nest by construction. dim_geohash stores parent_geohash and precision so a self-reference handles rollup without a separate bridge table, and it carries a city foreign key so the overview resolves any cell to its city straight from the dimension instead of routing through the fact tables.

---

### The reference model

Below is one defensible design. The anchor is that fact_cell_metrics is the immutable per-minute history and current_cell_state is a derived projection holding only the latest minute, both reading the same conformed geohash and city dimensions. The city key sits on dim_geohash so a cell resolves to its city even before you touch a fact row.

> **How It Scales**
>
> At 40k active precision-7 cells per minute across 30 cities, fact_cell_metrics grows about 1.2M rows per minute, so partition it by minute_bucket (or by day with minute sub-clustering) to keep the 7-day replay pruning tight. current_cell_state stays around 1.2M rows total, one per active cell, so the map read is an index seek by geohash, not a scan. The rollup to precision-5 sums children within a partition, never across the whole history.

> **Interviewers Watch For**
>
> A strong candidate says 'semi-additive' and 'periodic snapshot' unprompted, and justifies the second serving table with one sentence about the latest-per-cell scan cost. They state the grain before drawing a single table, they put a city key on the geohash dimension so a cell resolves to its city without a fact join, and they catch that the city overview cannot average child surges.

> **Common Pitfall**
>
> Storing only surge_factor and congestion_factor, or collapsing the inputs into a single demand_count instead of keeping both the request and driver sides. The dashboard works in the demo, then the first city-level rollup divides by the wrong denominator and reports a surge nobody can reproduce. A close second: pointing the live map at the raw event log and watching every redraw re-derive the latest state. A subtler one: scoping cells to cities only through the fact tables, so the geohash dimension alone cannot tell you which city a cell belongs to.

---

**Ingredients plus derived factors**

Keep requests, drivers, and both speeds on the snapshot.

* Coarse cells recompute correctly from children
* Slightly wider fact rows
* Ratios stay auditable against their inputs

**Bare ratios only**

Store only surge_factor and congestion_factor.

* Narrow rows, simple at the finest grain
* Rollups to precision-5 and city are wrong
* No way to reconcile a disputed multiplier

---

## Common follow-up questions

- After 7 days you only keep 5-minute rollups. How do you compress the per-minute fact into that without breaking the ratio reconciliation? _(Tests whether the candidate rolls up the ingredients and recomputes the factors, rather than averaging stored ratios.)_
- A late traffic event arrives 3 minutes after its bucket closed. How does the snapshot and the current-state table handle it? _(Tests late-arriving data handling and whether current_cell_state guards against an out-of-order overwrite.)_
- Product wants to A/B two surge formulas in the same city at once. What changes in the model? _(Tests whether the candidate adds a formula-version dimension rather than overloading the existing measures.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/surge-at-every-cell)
- [Data Modeling Interview Questions](https://datadriven.io/data-modeling-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.