# 7-Check Rolling Average

> Seven entries hold the trend.

Canonical URL: <https://datadriven.io/problems/7_check_rolling_average>

Domain: SQL · Difficulty: medium · Seniority: L5

## Problem

The platform reliability team monitors latency trends per service. For each service's health check history, compute a 7-check rolling average of latency using the current check and the 6 checks immediately before it, ordered by check timestamp. Return the service name, check timestamp, raw latency, and the rolling average.

## Worked solution and explanation

### Why this problem exists in real interviews

This probes **window functions**, combined with **window frame clauses**. The combination forces candidates to reason about query structure rather than rely on memorized templates.

> **Trick to Solving**
>
> A "rolling" or "moving" average requires a **window frame clause**. Without `ROWS BETWEEN ... AND CURRENT ROW`, the default frame includes all rows in the partition.
> 
> 1. Identify the window size (7 rows)
> 2. Use `ROWS BETWEEN 6 PRECEDING AND CURRENT ROW`
> 3. Partition by the grouping column, order by the time column

---

### Break down the requirements

#### Step 1: Identify the window dimensions

The prompt specifies a 7-check window. The `ROWS BETWEEN 6 PRECEDING AND CURRENT ROW` frame captures exactly 7 rows.

#### Step 2: Partition by `svc_name`

`PARTITION BY svc_name` restarts the window for each group.

#### Step 3: Compute AVG inside the window

`AVG(latency)` over the frame gives the rolling mean. Rows with fewer than 7 predecessors use whatever is available.

---

### The solution

**7-row rolling average with window frame**

```sql
SELECT
    svc_name,
    checked,
    latency,
    AVG(latency) OVER (
    PARTITION BY svc_name
    ORDER BY checked
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS rolling_avg
FROM svc_health
ORDER BY    svc_name, checked
```

> **Cost Analysis**
>
> The main table has 5M rows. Partitioned on `checked`, so queries filtering on that column skip most partitions. The window function runs after grouping, so it operates on the reduced result set.

> **Interviewers Watch For**
>
> Interviewers check whether the candidate chooses the right window function variant and can articulate why.

> **Common Pitfall**
>
> Omitting the `ROWS BETWEEN` frame clause changes the window default, which can silently include all partition rows instead of just the lookback period.

---

## Common follow-up questions

- What if the data volume increased 10x? _(Tests scaling awareness: would the query plan change, and where is the bottleneck?)_
- How would you handle this if the data arrived as a streaming feed? _(Tests whether the candidate can think beyond batch SQL toward incremental computation.)_
- Could you rewrite this using a CTE instead of a subquery, or vice versa? _(Tests fluency with query refactoring and understanding that CTEs and subqueries are often interchangeable.)_
- What is the difference between ROWS and RANGE in a window frame? _(ROWS is physical (row count), RANGE is logical (value-based). They differ when duplicates exist.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/7_check_rolling_average)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.