# The Middle Ground

> Strip the outliers from both ends. What does the core actually add up to?

Canonical URL: <https://datadriven.io/problems/sum_excluding_extremes>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

Given a set of latency values, sum everything that falls strictly between the global minimum and maximum, excluding the extremes themselves. Return the min, the max, and the sum of everything in between.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests whether a candidate can demonstrate writing subqueries that integrate with the outer query. This pattern appears frequently in mid-level SQL rounds where interviewers want to see structured thinking.

> **Trick to Solving**
>
> Finding the 'second highest' or 'runner-up' is a classic pattern. Avoid ORDER BY/LIMIT when precision matters.
> 
> 1. Use a scalar subquery to find the maximum
> 2. Filter the outer query to values strictly less than that maximum
> 3. Take the MAX of the filtered set

---

### Break down the requirements

#### Step 1: Apply the range filter

The WHERE clause restricts rows to the target range. Applying this filter early reduces the volume flowing into downstream operations.

#### Step 2: Use a subquery to find the reference value

The scalar subquery computes a single value (like the maximum) that the outer query filters against. This avoids a self-join.

---

### The solution

**Subquery for global min/max then sum latency between the extremes**

```sql
SELECT (SELECT MIN(latency) FROM api_calls) AS min_latency, (SELECT MAX(latency) FROM api_calls) AS max_latency, SUM(latency) AS sum_between
FROM api_calls
WHERE latency > (SELECT MIN(latency) FROM api_calls) AND latency < (SELECT MAX(latency) FROM api_calls)
```

> **Cost Analysis**
>
> With ~200M rows, the query performs a single sequential scan. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether you use a subquery or self-join, and can explain the tradeoffs.

> **Common Pitfall**
>
> Returning extra columns that the prompt did not ask for, or using the wrong column alias, causes a grading mismatch even when the logic is correct.

---

## Common follow-up questions

- If multiple rows share the exact minimum latency value, does your query exclude all of them or only one? _(Tests understanding of value-based exclusion; all rows at the min or max value should be excluded.)_
- What does the sum return if all latency values are identical, making min equal to max? _(Tests the degenerate case where nothing falls strictly between equal bounds, yielding NULL or zero.)_
- Could you solve this in a single pass without a subquery using window functions, and what are the trade-offs? _(Tests ability to use MIN/MAX as window aggregates and awareness of full-table scan cost.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/sum_excluding_extremes)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.