# The Other Seat

Canonical URL: <https://datadriven.io/problems/the-other-seat>

Domain: Data Modeling · Difficulty: hard · Seniority: senior

## Problem

We run a ride-hailing marketplace where the same person can sign up to drive and to ride, so the model has to capture both roles without splitting one human into two records. Design the entities for drivers, riders, vehicles, and the trips that connect them, knowing that every completed trip records the fare, the vehicle used, and a separate rating in each direction. A driver may switch vehicles between trips, and a person's displayed rating is the running result of every rating they have received.

## Worked solution and explanation

### Why this problem exists in real interviews

This is a polymorphic identity and junction-fact problem wearing a 'model the driver and the rider' costume. The line everyone walks into: drawing a drivers table and a riders table as two disjoint islands. That choice quietly breaks twice. The same human who drives on Saturday and hails a car on Friday becomes two unrelated rows, and the trip that connects a driver to a rider now needs a foreign key into two different tables depending on direction. The real skill being probed is whether you see one users dimension with role-specific subtypes, and a trip fact that points at that single identity on both sides.

> **Trick to solving**
>
> Ask 'what is the atomic person?' before 'what is a driver?'. The atom is a user. Driver and rider are roles a user plays, not separate species. Once identity is unified, the trip is just a junction fact with two FKs into the same dim_users, and ratings, fares, and vehicles all hang off that trip at trip grain.

---

### Break down the requirements

#### Step 1: Unify the identity first

One dim_users row per human carries the shared attributes: name, phone, signup. Driver-only fields (license, approval status, current vehicle) go in dim_driver_profiles; rider-only fields (default payment) go in dim_rider_profiles. Both subtypes share user_id as their key. A person who does both gets one users row and two profile rows, never two identities.

#### Step 2: Declare the trip grain

fact_trips is one row per completed trip. It carries two foreign keys back to the user identity: driver_user_id and rider_user_id. That single fact expresses the many-to-many between drivers and riders without a bridge table, because the trip itself is the bridge.

#### Step 3: Snapshot what drifts

Fare, surge multiplier, and the vehicle used all change over time. Store them on the trip row. The driver's current vehicle is a separate attribute on the profile; a trip from six months ago must remember the car that was actually driven, not the one the driver uses today.

#### Step 4: Put both ratings on the trip

A trip produces two independent ratings: the driver's rating of the rider and the rider's rating of the driver. Both live on fact_trips. The number shown on a profile is a trailing aggregate of the ratings that person received, computed at read time, never a stored mutable scalar.

---

### The reference model

Below is one defensible design. The anchor is that dim_users is the single identity, the two profile tables are subtypes keyed by user_id, and fact_trips references that identity twice while snapshotting the drifting measures and both directional ratings.

**Reference schema DDL**

```sql
CREATE TABLE dim_users (
    user_id    BIGINT PRIMARY KEY,
    full_name  TEXT,
    phone      TEXT,
    home_city  TEXT,
    signup_at  TIMESTAMPTZ
);

CREATE TABLE dim_vehicles (
    vehicle_id     BIGINT PRIMARY KEY,
    owner_user_id  BIGINT REFERENCES dim_users(user_id),
    make           TEXT,
    model          TEXT,
    plate          TEXT,
    seat_capacity  INT
);

CREATE TABLE dim_driver_profiles (
    user_id            BIGINT PRIMARY KEY REFERENCES dim_users(user_id),
    license_number     TEXT,
    current_vehicle_id BIGINT REFERENCES dim_vehicles(vehicle_id),
    status             TEXT,
    driver_since       DATE
);

CREATE TABLE dim_rider_profiles (
    user_id            BIGINT PRIMARY KEY REFERENCES dim_users(user_id),
    default_payment_id BIGINT,
    rider_since        DATE
);

CREATE TABLE fact_trips (
    trip_id          BIGINT PRIMARY KEY,
    driver_user_id   BIGINT REFERENCES dim_users(user_id),
    rider_user_id    BIGINT REFERENCES dim_users(user_id),
    vehicle_id       BIGINT REFERENCES dim_vehicles(vehicle_id),
    requested_at     TIMESTAMPTZ,
    completed_at     TIMESTAMPTZ,
    fare_amount      NUMERIC,
    surge_multiplier NUMERIC,
    rating_of_driver SMALLINT,
    rating_of_rider  SMALLINT
);
```

*Driver and rider profiles are subtypes keyed by the same user_id, so one human is one identity. fact_trips is one row per completed trip with two FKs into that identity, plus snapshots of the vehicle, fare, surge, and both directional ratings.*

> **Why this works**
>
> Identity is modeled once, so a human who drives and rides reconciles to one user_id and every trip on both sides rolls up to the same person. The trip fact carries the relationship and the drifting measures, so revenue and equipment stay correct under change, and ratings stay queryable as aggregates.

> **Interviewers watch for**
>
> The strong candidate asks 'can a driver also be a rider?' in the first minute and unifies identity before drawing a single table. They volunteer that the displayed rating is a derived aggregate, and they snapshot the vehicle onto the trip without being prompted. Weak candidates build two parallel tables and then stall when asked how a trip joins them or how the same person appears in both.

> **Common pitfall**
>
> Storing a single avg_rating column on the driver profile and overwriting it after each trip. It loses the per-trip history, cannot express the trailing-100 window, and makes the driver-side and rider-side ratings impossible to separate. A second trap is joining old trips to driver_profiles.current_vehicle_id, which re-attributes every historical trip to the car the driver happens to use now.

> **How it scales**
>
> At about 25 million trips per day, fact_trips dominates and the dimensions are comparatively tiny. Partition fact_trips by completed_at (daily) so history queries and rating windows prune cleanly, and cluster by driver_user_id or rider_user_id depending on the hotter read. The subtype dimensions stay small enough to broadcast in joins, so a rider's trip history is a single partition-pruned scan.

---

### The analysis pattern

**Each driver's current rating as the trailing average of ratings received**

```sql
SELECT
    u.user_id,
    u.full_name,
    ROUND(AVG(t.rating_of_driver), 2) AS driver_rating,
    COUNT(*) AS rated_trips
FROM fact_trips t
JOIN dim_users u ON u.user_id = t.driver_user_id
WHERE t.rating_of_driver IS NOT NULL
  AND t.completed_at >= NOW() - INTERVAL '180 days'
GROUP BY u.user_id, u.full_name
ORDER BY driver_rating DESC
```

*The rating is derived from trip-grain ratings, never stored as a mutable column.*

---

### Trade-offs and alternatives

**Unified users + role subtypes**

One dim_users, two profile subtypes, trip references identity twice.

* Same human is one identity across both roles
* Trip join is symmetric and obvious
* Slight extra join to reach role-specific fields

**Separate drivers and riders tables**

Disjoint dim_drivers and dim_riders, each with its own key.

* Simpler to draw on minute one
* A dual-role human becomes two rows that never reconcile
* The trip needs different FK targets per direction and dedupe across roles becomes a project

---

## Common follow-up questions

- A driver gets suspended and later reinstated. How do you let analysts see a trip's driver status as of the trip date? _(Tests SCD Type 2 on the driver profile or a status_events log versus a single current-status column.)_
- The product team wants to support a trip with multiple riders sharing one car. How does fact_trips change? _(Tests whether the candidate introduces a trip-to-rider bridge or re-grains the fact to one row per rider per trip.)_
- Surge pricing is recomputed every few seconds. Where does the surge value the rider actually paid get pinned? _(Tests snapshot-on-fact discipline versus joining to a volatile pricing table at read time.)_
- How would you compute a person's rating using only the last 100 ratings they received, efficiently, at this scale? _(Tests windowed aggregation and whether a periodic snapshot table is warranted over recomputing on every read.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the-other-seat)
- [Data Modeling Interview Questions](https://datadriven.io/data-modeling-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.