# Half a Million Rental Cars

> Every vehicle is reporting. Every rental matters.

Canonical URL: <https://datadriven.io/problems/half_a_million_rental_cars>

Domain: Pipeline Design · Difficulty: medium · Seniority: L5

## Problem

We operate a fleet of 500,000 rental vehicles across thousands of locations globally, with each vehicle emitting continuous telematics data alongside rental transactions from our reservation system and maintenance events logged by service centers. Our operations team currently has no unified view across these three sources and is flying blind on fleet utilization and vehicle health. Design the end-to-end data pipeline and the warehouse architecture that serves it.

## Worked solution and explanation

### Why this problem exists in real interviews

Three consumers (ops, rentals, maintenance) have three different freshness budgets; vehicle attributes change over time, so historical rentals need point-in-time joins; and zone and parked alerts have to fire fast. The trap is either serving everyone from one shared store at the slowest budget, or losing point-in-time correctness when a vehicle's depot changes.

The default reach is one nightly batch into a shared warehouse. Ops sees yesterday's positions and dispatches off stale data. Historical rentals join to today's vehicle attributes and report depot information that's wrong for that rental's date. Zone and parked alerts run as nightly scans and ops finds out about a vehicle in a restricted zone the next morning.

> **Trick to Solving**
>
> Streaming for ops, batch for rentals and maintenance, vehicle attributes as a slowly-changing dimension, stream-side detection for zone and parked alerts.
> 
> 1. Telematics streams to ops within minutes; rental and maintenance facts batch on slower cadences.
> 2. Vehicle attributes (depot, damage, odometer) live as a slowly-changing dimension keyed on (vehicle_id, valid_from, valid_to); rental facts join on rental_date BETWEEN valid_from AND valid_to.
> 3. Zone and parked detection run on the streaming consumer; alerts fire to ops within minutes when a state changes.

---

### Walk the requirements

#### Step 1: Three consumers, three cadences off one telematics stream

Telematics flows into a streaming consumer that updates the ops live store within minutes; rental and maintenance facts read from batches on hourly and daily cadences. Without separate cadences, either ops is stuck on a slow path or rental and maintenance pay for streaming compute they don't need.
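
A minimal sketch of the streaming side of this split, assuming each ping carries a vehicle-assigned event timestamp. The names `Ping`, `LiveStore`, and `consume` are hypothetical, and the in-memory dict and list stand in for whatever live store and raw landing zone the real design uses. The point it illustrates is that the ops path keeps only the newest position per vehicle, while every raw event is still retained for the slower batch jobs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ping:
    vehicle_id: str
    event_ts: int  # epoch seconds, assumed to be set on the vehicle
    lat: float
    lon: float


class LiveStore:
    """In-memory stand-in for the ops live store: one row per vehicle."""

    def __init__(self) -> None:
        self.latest: dict[str, Ping] = {}

    def apply(self, ping: Ping) -> bool:
        """Upsert the ping; return False when it is not newer than the stored one."""
        current = self.latest.get(ping.vehicle_id)
        if current is not None and ping.event_ts <= current.event_ts:
            return False  # late or duplicate ping: keep the newer position
        self.latest[ping.vehicle_id] = ping
        return True


def consume(pings, store: LiveStore, batch_buffer: list) -> None:
    """Update the live store per ping and retain every raw event for batch."""
    for ping in pings:
        store.apply(ping)
        batch_buffer.append(ping)  # hourly/daily jobs read from here


store = LiveStore()
raw_events: list = []
consume(
    [Ping("v1", 100, 1.0, 2.0), Ping("v1", 90, 9.0, 9.0), Ping("v2", 95, 3.0, 4.0)],
    store,
    raw_events,
)
```

Comparing on event time rather than arrival order is a design choice in this sketch: a ping delayed by a poor cellular connection should not overwrite a newer position that ops is already looking at.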

#### Step 2: Vehicle attributes as a slowly-changing dimension for historical rentals

Vehicles change depot, accumulate damage, rack up odometer readings. The vehicle dimension is keyed on (vehicle_id, valid_from, valid_to); each change closes the current row and writes a new one, so the validity intervals for a vehicle never overlap. Rental facts join on rental_date BETWEEN valid_from AND valid_to, so each rental matches exactly one dimension row and reports the depot, damage status, and odometer as they were on that date. Joining to today's attributes is the version that silently rewrites history every time a vehicle changes depots; the SCD plus point-in-time join is the contract.
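
The SCD write and the point-in-time join can be sketched with Python's built-in `sqlite3` module. Table names, column names, the depot codes, and the `9999-12-31` sentinel are all assumptions for illustration. One deliberate swap: the join below uses a half-open interval (`>= valid_from AND < valid_to`) instead of `BETWEEN`, because `BETWEEN` is inclusive on both ends and a rental dated exactly on a change boundary would match two rows when adjacent rows share that date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_vehicle (
    vehicle_id TEXT,
    depot      TEXT,
    valid_from TEXT,  -- ISO date, inclusive
    valid_to   TEXT   -- ISO date, exclusive
);
CREATE TABLE fact_rental (rental_id TEXT, vehicle_id TEXT, rental_date TEXT);
""")

OPEN_END = "9999-12-31"  # hypothetical sentinel marking the current row


def change_depot(vehicle_id: str, depot: str, effective: str) -> None:
    """Close the vehicle's current row at `effective`, then open a new one."""
    conn.execute(
        "UPDATE dim_vehicle SET valid_to = ? WHERE vehicle_id = ? AND valid_to = ?",
        (effective, vehicle_id, OPEN_END),
    )
    conn.execute(
        "INSERT INTO dim_vehicle VALUES (?, ?, ?, ?)",
        (vehicle_id, depot, effective, OPEN_END),
    )


change_depot("v1", "AMS", "2024-01-01")
change_depot("v1", "BER", "2024-03-01")  # depot move: AMS row is closed
conn.executemany(
    "INSERT INTO fact_rental VALUES (?, ?, ?)",
    [("r1", "v1", "2024-02-10"), ("r2", "v1", "2024-03-01")],
)

POINT_IN_TIME = """
SELECT r.rental_id, d.depot
FROM fact_rental r
JOIN dim_vehicle d
  ON d.vehicle_id = r.vehicle_id
 AND r.rental_date >= d.valid_from
 AND r.rental_date <  d.valid_to
ORDER BY r.rental_id
"""
rows = conn.execute(POINT_IN_TIME).fetchall()
print(rows)  # [('r1', 'AMS'), ('r2', 'BER')]
```

Rental `r1` predates the move and reports the old depot; `r2` falls on the change date and matches only the new row because the closed row's `valid_to` is exclusive.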

#### Step 3: Zone and parked detection on the stream, alerts to ops

A stream-side detector compares each vehicle's position against zone boundaries and tracks idle time. When a vehicle enters a restricted zone during a rental, leaves the area it is supposed to stay in, or sits parked too long, an alert fires to ops within minutes. A 'nightly scan' is the version where ops finds out about a vehicle in a restricted zone the next morning; stream-side detection is what makes the alert actionable.
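
A rough sketch of the per-vehicle state such a detector might hold. Everything here is an assumption for illustration: the `Zone`, `VehicleState`, and `Detector` names, the bounding-box zone (real zones would more likely be polygons), the use of reported speed to decide "parked", and the threshold value. It also leaves out the check that the vehicle is currently on a rental. The behavior it shows is that alerts fire on a state change rather than on every ping.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Zone:
    """Axis-aligned bounding box standing in for a real zone geometry."""
    name: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


@dataclass
class VehicleState:
    in_zone: bool = False
    parked_since: Optional[int] = None  # epoch seconds of first stationary ping
    parked_alerted: bool = False


class Detector:
    def __init__(self, restricted: Zone, max_parked_s: int) -> None:
        self.restricted = restricted
        self.max_parked_s = max_parked_s
        self.state: dict[str, VehicleState] = {}

    def on_ping(self, vehicle_id: str, ts: int, lat: float, lon: float,
                speed_kmh: float) -> list[str]:
        """Return the alerts triggered by this ping (empty when nothing changed)."""
        st = self.state.setdefault(vehicle_id, VehicleState())
        alerts: list[str] = []

        # Zone check: alert only on the outside -> inside transition.
        inside = self.restricted.contains(lat, lon)
        if inside and not st.in_zone:
            alerts.append(f"{vehicle_id} entered {self.restricted.name}")
        st.in_zone = inside

        # Parked check: start a timer on the first stationary ping, alert once.
        if speed_kmh > 0:
            st.parked_since, st.parked_alerted = None, False
        else:
            if st.parked_since is None:
                st.parked_since = ts
            parked_for = ts - st.parked_since
            if not st.parked_alerted and parked_for >= self.max_parked_s:
                alerts.append(f"{vehicle_id} parked for {parked_for}s")
                st.parked_alerted = True
        return alerts


detector = Detector(Zone("restricted", 0.0, 1.0, 0.0, 1.0), max_parked_s=600)
```

Keeping `in_zone` and `parked_alerted` in state is what prevents a vehicle sitting inside a zone from re-alerting on every ping; it is also the per-vehicle state cost that this design accepts.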

---

### The shape that fits

> **What this design gives up**
>
> Three consumer paths cost more than one shared store; the SCD grows the vehicle dimension over time and the point-in-time range join is more expensive than an equi-join; stream-side detection holds state per vehicle. Implementation cost is the price; the win is ops within minutes, historical rentals that report what was true then, and zone alerts that fire fast enough to act on.

> **What reviewers check**
>
> A reviewer looks at the canvas for these properties:
> - A streaming path serves ops within minutes; rental and maintenance batch on slower cadences.
> - Vehicle attributes are a slowly-changing dimension; rental facts join on rental-date for point-in-time correctness.
> - Zone and parked detection run on the streaming consumer with alerts to ops.
> - A unified warehouse anchors the cross-consumer view.

> **The mistake that ships**
>
> What gets shipped runs one nightly batch into a shared warehouse. Ops dispatches off stale data; historical rentals join to today's vehicle attributes; zone alerts run nightly. The eventual rebuild adds the streaming ops path, the SCD vehicle dimension, and the stream-side zone detection.

---

## Common follow-up questions

- A vehicle's depot changes mid-rental. What does this design report for that rental, and what does ops see? _(Tests whether the candidate sees the SCD writing a new row at the depot change; the rental fact joins by rental_date, so the rental reports the depot in effect on its start date, which is the prior depot. Ops's live view shows the current depot from the streaming store. The two views answer different questions correctly.)_
- A vehicle's idle timer fires while it's actually being serviced. What in this design avoids the false positive? _(Tests whether the candidate sees the maintenance events feeding the zone detector's state: a vehicle in active maintenance suppresses the idle alert. The zone detector reads a small piece of cross-stream state to avoid alerting on legitimate idle time.)_
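
The suppression in the second follow-up can be sketched as a small gate in front of the idle alert. The class name and the `service_started` / `service_completed` event types are hypothetical, and the sketch assumes service start and completion events reach the streaming consumer quickly enough to matter, rather than only arriving with the daily maintenance batch.

```python
class IdleAlertGate:
    """Holds the set of vehicles in active maintenance, fed by maintenance events."""

    def __init__(self) -> None:
        self.in_maintenance: set[str] = set()

    def on_maintenance_event(self, vehicle_id: str, event_type: str) -> None:
        # Event type names are assumptions; unknown types are ignored.
        if event_type == "service_started":
            self.in_maintenance.add(vehicle_id)
        elif event_type == "service_completed":
            self.in_maintenance.discard(vehicle_id)

    def should_alert(self, vehicle_id: str, idle_s: int, threshold_s: int) -> bool:
        """Alert on long idle time unless the vehicle is being serviced."""
        if vehicle_id in self.in_maintenance:
            return False  # legitimate idle time: suppress the alert
        return idle_s >= threshold_s


gate = IdleAlertGate()
gate.on_maintenance_event("v1", "service_started")
suppressed = gate.should_alert("v1", idle_s=7200, threshold_s=3600)
gate.on_maintenance_event("v1", "service_completed")
fires_again = gate.should_alert("v1", idle_s=7200, threshold_s=3600)
```

The cross-stream state is deliberately small: one set membership per vehicle, not a join against the full maintenance history.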

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.