# Toll Road Sensor Analytics

> Cars enter, cars exit. Except when they don't.

Canonical URL: <https://datadriven.io/problems/toll_road_sensor_analytics>

Domain: Data Modeling · Difficulty: easy · Seniority: L4

## Problem

We operate a toll road network with 200 sensor-equipped lanes across 15 locations. Each sensor captures license plate reads, timestamps, and lane metadata. We need to compute real-time traffic volume, average transit times between checkpoints, and flag anomalies for toll evasion detection. Design the data model.

## Worked solution and explanation

### Why this problem exists in real interviews

This probes whether you can collapse a pair of raw sensor events into a single business-meaningful entity: the crossing. The signal is declaring the grain as one row per crossing rather than mirroring the raw entry and exit events, which makes duration a subtraction instead of a window function.

> **Trick to Solving**
>
> Before drawing any tables, a strong candidate asks: "what is the business fact here, an event or a crossing?" The business cares about crossings. Sensor events are raw input; the model persists the paired result.
> 
> 1. Collapse entry and exit events into one crossing row
> 2. Store both timestamps so duration is trivial
> 3. Keep vehicles as a dimension keyed by license plate
> 4. Capture direction and lane as crossing attributes

---

### Break down the requirements

#### Step 1: Declare the grain as the crossing

One row per crossing, not one row per sensor read. Duration is `exit_time - entry_time`, no window function needed.

#### Step 2: Keep vehicles as a separate dimension

`vehicles` holds license plate and vehicle class. Repeated reads of the same plate resolve to one vehicle row, which enables frequency analysis.

#### Step 3: Capture direction and lane on the crossing

Direction is an attribute of the crossing, derivable from the sensor pair. Storing it avoids recomputation on every query.

#### Step 4: Use the crossing for anomaly detection

Transit time below a physical minimum flags potential evasion. The stored duration makes this a simple filter.

---

### The solution

Below is one defensible model. The conceptual anchor is collapsing raw sensor events into a crossing entity with both timestamps stored.

> **Why this design holds up**
>
> Collapsing the event pair at ingestion keeps query patterns simple and cheap. Volume, latency, and evasion queries are all single-table scans. The vehicle dimension lets frequency and repeat-offender questions work with a join.

> **What strong candidates do**
>
> They distinguish raw events from the business fact. They store the paired timestamps rather than computing duration on every query. They name the matching rule: same vehicle, ordered sensors, within a time window.

> **Red flags to avoid**
>
> Storing raw sensor reads as the canonical fact forces every query into a window function. Using license plate as the primary key couples the model to source quality. Omitting the matching rule lets unmatched entries pollute transit-time averages.

---

### The analysis pattern

**Hourly volume and average transit time by direction**

```sql
SELECT
    DATE_TRUNC('hour', entry_time) AS hour,
    direction,
    COUNT(*) AS crossings,
    AVG(EXTRACT(EPOCH FROM (exit_time - entry_time))) AS avg_transit_sec
FROM crossings
WHERE entry_time >= NOW() - INTERVAL '24 hours'
GROUP BY 1, 2
```

---

### Trade-offs and alternatives

**Paired crossing fact**

Duration is a subtraction. Queries are single-table. Late-arriving exits need a matching job that updates pending rows.

**Raw sensor event log**

Every sensor read is a row. No matching job at ingestion. Every duration and volume query reconstructs pairs with a window function.

---

## Common follow-up questions

- What if an exit event arrives seconds after the daily partition boundary? _(Tests late-arriving event handling and pending-crossing updates.)_
- How would you flag a vehicle with a transit time below the physical minimum? _(Tests whether anomaly detection is a filter on stored duration.)_
- What is the matching rule that pairs entry to exit events? _(Tests explicit statement of the join on vehicle and sensor order within a time window.)_
- How would you handle two vehicles with the same license plate read in different lanes? _(Tests data-quality handling and whether lane disambiguates.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/toll_road_sensor_analytics)
- [Data Modeling Interview Questions](https://datadriven.io/data-modeling-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.