# Status Report

> Where are orders getting stuck?

Canonical URL: <https://datadriven.io/problems/status_report>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The operations team suspects that too many orders are sitting in non-terminal states. For each order status, they want to see the number of orders and the average profit. Skip any orders that have no status on record. Only include statuses that have at least five orders behind them, and list them from most orders to fewest.

## Worked solution and explanation

### Why this problem exists in real interviews

This order processing problem uses the `orders` table to evaluate HAVING for post-aggregation filtering. Watch how the `status` column interact in the grouping and filtering logic.

---

### Break down the requirements

#### Step 1: Filter out null values

Exclude rows where `status` is NULL. This prevents nulls from polluting aggregations or creating phantom groups.

#### Step 2: Aggregate by `status`

`GROUP BY status` collapses rows to one per group. The aggregate functions (`SUM`, `COUNT`, `AVG`, etc.) compute the metric for each group.

#### Step 3: Filter groups with HAVING

HAVING applies after GROUP BY, filtering out groups that do not meet the threshold. This cannot be done in WHERE because the aggregate has not been computed yet.

#### Step 4: Sort the final output

The `ORDER BY` clause ensures the result appears in the expected sequence. Interviewers check that the sort direction matches the prompt.

---

### The solution

**Having filter for status report**

```sql
SELECT status, COUNT(*) AS order_count, AVG(profit) AS avg_profit
FROM orders
WHERE status IS NOT NULL
GROUP BY status
HAVING COUNT(*) >= 5
ORDER BY order_count DESC
```

> **Cost Analysis**
>
> With ~500K rows, the GROUP BY reduces the working set before any downstream operations. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether you use HAVING (not WHERE) to filter after aggregation; how you handle NULL values and whether you account for them in filters and aggregations.

> **Common Pitfall**
>
> Putting the aggregate condition in WHERE instead of HAVING causes a syntax error. WHERE runs before GROUP BY; HAVING runs after.

---

## Common follow-up questions

- The `status` column in `orders` has a 4% null rate. How does your query handle rows where `status` is NULL, and could that silently change the result count? _(Tests whether the candidate accounts for NULLs in `orders.status` and understands how aggregates skip NULL values.)_
- If `orders` grew to contain billions of rows, which part of your query would become the bottleneck given the cardinality of `order_id`? _(Tests ability to identify performance hotspots related to `orders.order_id` at scale.)_
- If the HAVING threshold in your query changed from a fixed number to a percentile, how would you restructure the query? _(Tests ability to replace static HAVING filters with dynamic subquery-based thresholds.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/status_report)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.