# Average Event Progression Time

> How fast do users move through the funnel?

Canonical URL: <https://datadriven.io/problems/average_event_progression_time>

Domain: SQL · Difficulty: hard · Seniority: L4

## Problem

The onboarding team believes users who move quickly between events are more likely to convert, but they have no data to prove it. For every user with more than one event, compute the average number of seconds between consecutive events so the team can segment fast movers from slow ones. Return each qualifying user and their average gap, ordered by user ID.

## Worked solution and explanation

### Why this problem exists in real interviews

Onboarding teams hand you this when they suspect velocity drives conversion. The interviewer wants to see if you reach for a window function on a 300M-row event log, or quietly write a self-join that would melt the warehouse. They also want to hear you say `PARTITION BY user_id` out loud, because forgetting it computes gaps across strangers.

---

### Break down the requirements

#### Step 1: Define the gap

For each `event_data` row, the gap is `event_timestamp` minus the previous `event_timestamp` for that same `user_id`. Reach for `LAG(event_timestamp) OVER (PARTITION BY user_id ORDER BY event_timestamp)`.

#### Step 2: Drop the first event per user

`LAG` returns NULL on the first row of each partition. Filter `WHERE prev_timestamp IS NOT NULL` so it doesn't poison the average.

#### Step 3: Average per user

`GROUP BY user_id` and `AVG((JULIANDAY(event_timestamp) , JULIANDAY(prev_timestamp)) * 86400)` to land in seconds. Multiply by 86400 because Julian day math returns fractional days.

#### Step 4: Enforce the more-than-one-event filter

Users with a single event have zero non-NULL gaps and disappear from the `GROUP BY` naturally. No extra `HAVING COUNT(*) > 1` needed once you filter NULLs upstream.

#### Step 5: Order for the team

`ORDER BY user_id` at the outer layer. The window sort is internal to the partition, not the final output.

---

### The solution

**PER-USER AVG GAP VIA LAG**

```sql
WITH event_order AS (
  SELECT
    user_id,
    event_timestamp,
    LAG(event_timestamp) OVER (
      PARTITION BY user_id ORDER BY event_timestamp
    ) AS prev_timestamp
  FROM event_data
),
user_gaps AS (
  SELECT
    user_id,
    AVG((JULIANDAY(event_timestamp) , JULIANDAY(prev_timestamp)) * 86400)
      AS avg_progression_seconds
  FROM event_order
  WHERE prev_timestamp IS NOT NULL
  GROUP BY user_id
)
SELECT user_id, avg_progression_seconds
FROM user_gaps
ORDER BY user_id;
```

> **Cost Analysis**
>
> 300M rows. The window does one partition-sort per `user_id`, so O(N log N) total, single pass. A self-join on `event_data e1 JOIN event_data e2 ON e1.user_id = e2.user_id AND e2.ts > e1.ts` is O(N^2) per user and will OOM. Partition pruning on `event_timestamp` still applies if the team scopes a date range.

> **Interviewers Watch For**
>
> Ask aloud: 'consecutive in `event_timestamp` order, or in `event_type` workflow order?' The prompt says timestamp, but voicing the ambiguity scores points. Also ask whether duplicate timestamps for the same `user_id` should be deduped before LAG, because tied timestamps make ordering nondeterministic.

> **Common Pitfall**
>
> Omitting `PARTITION BY user_id` and only writing `ORDER BY event_timestamp`. The window then crosses users, so a user's first event takes its `prev_timestamp` from some other user's last event. The query runs, returns numbers, and is entirely wrong.

> **The Elegant Move**
>
> `JULIANDAY(a) , JULIANDAY(b)` returns days as a float. Multiply by 86400 once, inside `AVG`, and you never juggle units again. Avoid `strftime('%s', ...)` arithmetic; it truncates to integer seconds and biases short gaps downward.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you compute the median gap per user instead of the mean? _(Probes whether you know `PERCENTILE_CONT` or a `NTILE`/`ROW_NUMBER` trick, and that AVG hides outliers in onboarding funnels.)_
- What changes if you only care about gaps between specific `event_type` pairs, like signup to first_action? _(Tests whether you switch to `LAG(event_timestamp) FILTER (WHERE event_type IN (...))` or a pre-filtered CTE, and how that interacts with the partition.)_
- How would you make this incremental so you don't rescan 300M rows nightly? _(Checks pipeline judgment: per-user running sum + count stored in a state table, updated from the day's new events.)_
- Two events with the same `event_timestamp` for one user. What does your query do? _(Tests awareness that `ORDER BY event_timestamp` alone is nondeterministic; you need a tiebreaker like `event_id`.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/average_event_progression_time)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.