# Trending Dishes Dashboard

> What's everyone eating? The answer changes hourly.

Canonical URL: <https://datadriven.io/problems/trending_dishes_dashboard>

Domain: Data Modeling · Difficulty: medium · Seniority: L5

## Problem

Product wants a dashboard showing what dishes are trending in each city right now. Think: 'pad thai is up 40% in Austin this week.' We need the data model to power this, and the dashboard refreshes every 15 minutes.

## Worked solution and explanation

### Why this problem exists in real interviews

This probes whether you can balance atomic grain against pre-aggregation for a low-latency dashboard. The signal is two decisions: normalizing dish names into `dim_dishes` so counts do not fragment across menu variants, and materializing `agg_dish_city_hourly` so a 15-minute refresh never scans raw orders.

> **Trick to Solving**
>
> Before drawing any tables, a strong candidate asks: "what is the refresh budget and the worst-case query volume, and can I afford raw scans?" A 15-minute refresh at 10M orders per day means pre-aggregation is the only option. The dish dimension matters because the same dish lives under many names in raw order data.
> 
> 1. Confirm refresh SLA and raw volume
> 2. Normalize dish names into `dim_dishes`
> 3. Pre-aggregate at (dish, city, hour) grain
> 4. Store trending ratios as a separate materialization

---

### Break down the requirements

#### Step 1: Normalize dish names with dim_dishes

Restaurants call the same thing by different names. A canonical dimension with a `canonical_name` and synonyms map prevents trending signals from fragmenting.

#### Step 2: Keep fact_orders at order grain

The atomic fact remains one row per order. This is the source of truth for every downstream rollup and the input to the aggregation job.

#### Step 3: Pre-aggregate at dish-city-hour

`agg_dish_city_hourly` is built on a schedule. A 15-minute refresh reads this table, never the raw fact.

#### Step 4: Materialize trending_scores

Trending ratio is `recent_count / baseline_count`. Storing the computed values with `computed_at` lets the dashboard read a single row per dish and city.

---

### The solution

Below is one defensible model. The conceptual anchor is the pre-aggregation layer: raw orders feed `agg_dish_city_hourly`, which feeds `trending_scores`.

> **Why this design holds up**
>
> The atomic fact stays available for ad hoc analysis while the dashboard reads a compact materialization. The dish dimension unifies menu variants into one canonical signal. The aggregation layer pays for itself within the first refresh.

> **What strong candidates do**
>
> They ask about refresh budget before drawing boxes. They recognize the dish normalization problem without being told. They explain why the atomic fact still exists: for backfills, incident debugging, and future metrics.

> **Red flags to avoid**
>
> Computing trending scores from raw orders at query time cannot meet a 15-minute SLA. Collapsing the atomic fact into the aggregate blocks backfills and new metrics. Skipping dish normalization fragments trending signals across menu synonyms.

---

### The analysis pattern

**Currently trending dishes per city**

```sql
SELECT
    d.canonical_name,
    c.city_name,
    t.trending_ratio
FROM trending_scores t
JOIN dim_dishes d ON d.dish_sk = t.dish_sk
JOIN dim_cities c ON c.city_sk = t.city_sk
WHERE t.computed_at = (SELECT MAX(computed_at) FROM trending_scores)
  AND t.trending_ratio > 1.2
ORDER BY c.city_name, t.trending_ratio DESC
```

---

### Trade-offs and alternatives

**Pre-aggregation layer plus canonical dish dimension**

Refresh SLA is achievable. Atomic fact remains for backfills. Requires an aggregation job and a dish resolution pipeline. Storage cost is low relative to raw orders.

**On-demand query over raw orders**

No aggregation job to maintain. Simpler to reason about. Cannot meet a 15-minute refresh at high volume without a columnar engine and aggressive caching.

---

## Common follow-up questions

- How would you handle a new menu item that has no synonym mapping yet? _(Tests fallback logic when `dim_dishes` has no canonical match.)_
- What is the exact definition of `recent_count` and `baseline_count`? _(Tests whether trending windows are explicit and consistent.)_
- How would you backfill `agg_dish_city_hourly` if a dish synonym mapping changes retroactively? _(Tests reprocessability of the aggregation layer.)_
- How would you extend the dashboard to support a 1-minute refresh during peak hours? _(Tests pre-aggregation cadence and potential stream processing.)_
- How would you prevent a single viral order spike from skewing trending ratios? _(Tests smoothing, floor thresholds on baseline counts, and outlier handling.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/trending_dishes_dashboard)
- [Data Modeling Interview Questions](https://datadriven.io/data-modeling-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.