# Event Ticketing System Data Model

> JSON in. Reporting warehouse out. Design both ends.

Canonical URL: <https://datadriven.io/problems/event_ticketing_system_data_model>

Domain: Data Modeling · Difficulty: easy · Seniority: L3

## Problem

We run an IT helpdesk platform. Users submit support tickets, which are assigned to agents. Tickets go through multiple status changes before being resolved. SLA compliance is critical: P1 tickets must be resolved within 4 hours, P2 within 24 hours. Design the schema, and describe how you would load data from a JSON API feed into it.

## Worked solution and explanation

### Why this problem exists in real interviews

This probes whether a candidate can recognize that **SLA compliance is a history problem, not a current-state problem**. A status column on `tickets` loses the minute the ticket transitions; the answer lives in an immutable status event log paired with a priority dimension.

> **Trick to Solving**
>
> Before drawing tables, a strong candidate asks: do we need to measure time spent in each status, and does SLA depend on priority? The signal here is 'SLA breach for P1 vs P2,' which demands both a status history and priority modeled as a dimension with thresholds.
> 
> 1. Keep `tickets` as the header with current status cached
> 2. Append status transitions to `ticket_status_events`
> 3. Make priority a dimension with SLA hours per tier
> 4. Compute SLA breach as a query over events

---

### Break down the requirements

#### Step 1: Separate the header from the history

`tickets` is the stable ticket identity. `ticket_status_events` is the append-only log of status changes. The current status on `tickets` is a cached convenience, not the source of truth.

#### Step 2: Model priority as a dimension

`priority_sla` carries `(priority, resolution_sla_hours, response_sla_hours)`. P1 and P2 are rows, not strings scattered across the ticket table, so changing the SLA thresholds is a single update.

#### Step 3: Append status changes

Every status transition (new, assigned, in-progress, waiting, resolved) is a new row on `ticket_status_events`. SLA breach time can be derived from the first 'resolved' event compared with the create time plus the SLA hours from the priority dimension.

#### Step 4: Keep agents and users as conformed dimensions

Both are first-class dimensions. `tickets` carries the current assigned agent; historical reassignments live in the event log.

---

### The solution

Below is one defensible design: a stable ticket header, an append-only status event log, and a priority dimension that encodes the SLA contract.

> **Why this works**
>
> SLA breach is computed, not stored. The first resolved event minus the create time, compared to the priority dimension's hours, yields the answer for any point in history. The trade-off is one extra join on the priority dimension in every SLA query.

> **Interviewers watch for**
>
> A strong candidate proposes the append-only history before being asked and names the priority dimension as the SLA contract. They also note that `tickets.current_status` is a denormalization for read speed. Weak candidates mutate `tickets.status` and then cannot measure time in each status without scanning an audit table that does not exist.

> **Common pitfall**
>
> Hardcoding SLA hours in a CASE expression. Business hours change, and every dashboard now carries stale constants that drift from the real SLA contract.

---

### The analysis pattern

**SLA breach rate by priority for the last week**

```sql
WITH resolved AS (
    SELECT
        t.ticket_id,
        t.priority,
        t.created_at,
        MIN(e.event_ts) AS resolved_at
    FROM tickets t
    JOIN ticket_status_events e ON e.ticket_id = t.ticket_id
    WHERE e.to_status = 'resolved'
      AND t.created_at >= CURRENT_DATE - INTERVAL '7 days'
    GROUP BY t.ticket_id, t.priority, t.created_at
)
SELECT
    r.priority,
    COUNT(*) AS resolved_count,
    SUM(CASE WHEN r.resolved_at > r.created_at + MAKE_INTERVAL(hours => p.resolution_sla_hours) THEN 1 ELSE 0 END) AS breached
FROM resolved r
JOIN priority_sla p ON p.priority = r.priority
GROUP BY r.priority
```

---

### Trade-offs and alternatives

**Header plus append-only event log**

Clean relational queries, indexable events, easy SLA computation. Cost: every status change is two writes (event plus cached current_status).

**Single tickets table with status history JSON**

One row per ticket, history nested inside a JSONB column. Cost: history queries rely on JSONB functions, window queries are awkward, and indexing the nested events is expensive.

---

## Common follow-up questions

- How do you handle business hours vs. calendar hours in the SLA calculation? _(Tests whether the priority dimension carries a business calendar reference.)_
- What if a ticket is reopened after being resolved? _(Tests whether the SLA clock restarts or carries the cumulative in-status duration.)_
- How would you detect agents who flip status fast to game SLA metrics? _(Tests anomaly detection on status event velocity.)_
- How do you compute average time in each status per agent? _(Tests window functions over ticket_status_events.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/event_ticketing_system_data_model)
- [Data Modeling Interview Questions](https://datadriven.io/data-modeling-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.