# Average Fulfillment Lag

> Ordered, then... waiting.

Canonical URL: <https://datadriven.io/problems/average_fulfillment_lag>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The operations team is measuring fulfillment lag across the customer base. For each user, compute the average number of days between their transaction dates and today, only considering transactions with a positive amount. Present users from the longest average lag to the shortest.

## Worked solution and explanation

### Why this problem exists in real interviews

Fulfillment lag is the canonical date-arithmetic prompt. The interviewer wants to see you treat `julianday('now') - julianday(transaction_date)` as a real expression with a real type (float days), wrap it in `CAST` deliberately, and aggregate per user. They are also checking whether you pin down the reference date and the cohort filter before touching the keyboard.

---

### Break down the requirements

#### Step 1: Pin the cohort

Per-user metric, so `GROUP BY user_id`. The cohort filter is `total_amount > 0`, which goes in `WHERE`, not `HAVING`, since it filters rows not groups.

#### Step 2: Compute days, one row at a time

`julianday('now') - julianday(transaction_date)` gives float days since the transaction. Wrap in `CAST(... AS INTEGER)` to truncate to whole days before averaging.

#### Step 3: Aggregate and rank

`AVG(...)` over the truncated days per user. Then `ORDER BY avg_days DESC` to push the longest lags to the top, matching the operations team's worst-offender view.

---

### The solution

**AVG DAYS SINCE TRANSACTION PER USER**

```sql
SELECT
  user_id,
  AVG(CAST(julianday('now') - julianday(transaction_date) AS INTEGER)) AS avg_days
FROM transactions
WHERE total_amount > 0
GROUP BY user_id
ORDER BY avg_days DESC
```

> **Cost Analysis**
>
> `transactions` is 30M rows partitioned by `transaction_date`, but there is no date predicate here, so every partition gets read. Expect a full scan into a hash aggregate keyed on `user_id` (high cardinality, likely millions of groups). If ops only cares about recent activity, push a `transaction_date >= ?` predicate to prune partitions.

> **Interviewers Watch For**
>
> Ask two things out loud before writing: **as of what date?** (`'now'` resolves at query time, so cached results drift) and **fractional or whole days?** (the `CAST` to INTEGER truncates each row before `AVG`, which is coarser than averaging the raw float). State your answers, then write.

> **Common Pitfall**
>
> Writing `AVG(julianday('now')) - AVG(julianday(transaction_date))` instead of averaging the per-row difference. It happens to give the same number for `AVG`, but the moment you swap to `MIN`, `MAX`, or add a `CAST`, the two diverge. Keep the subtraction inside the aggregate.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- How would you make this reproducible across reruns? _(Probes that you understand `'now'` is evaluated at query time and should be replaced with a bound parameter or snapshot date.)_
- Finance wants fractional days. What changes? _(Tests whether you can explain why the inner `CAST AS INTEGER` is the lossy step and drop it cleanly.)_
- Only users with at least 5 qualifying transactions. Where does that go? _(Checks `HAVING COUNT(*) >= 5` vs `WHERE`, and that you know the filter is on the aggregated group.)_
- Same metric, but bucketed by month of transaction_date. How? _(Probes `GROUP BY user_id, strftime('%Y-%m', transaction_date)` and partition pruning awareness.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/average_fulfillment_lag)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.