# Second Highest Value

> Almost the top. Not quite.

Canonical URL: <https://datadriven.io/problems/second_highest_value>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

Find the second-highest metric value across the entire employee metrics table. Multiple employees might share the top spot; the second-highest is the next unique value below them. Return a single number.

## Worked solution and explanation

### Why this problem exists in real interviews

This challenge asks you to apply DENSE_RANK for gapless ranking to the `employee_metrics` table, simulating a real analytics workflow. Pay attention to the `metric_value` column as they drive the aggregation and output.

---

### Break down the requirements

#### Step 1: Apply the WHERE filter

Filter rows before any aggregation. This ensures only qualifying data enters the computation, keeping the result correct and the scan minimal.

#### Step 2: Rank results with a window function

`DENSE_RANK()` assigns consecutive ranks without gaps. Unlike `ROW_NUMBER()`, tied values share the same rank, which matters when the prompt asks for ties.

#### Step 3: Use a subquery to find the reference value

The scalar subquery computes a single value (like the maximum) that the outer query filters against. This avoids a self-join.

---

### The solution

**Dense-rank for second highest value**

```sql
SELECT metric_value
FROM (SELECT DISTINCT metric_value, DENSE_RANK() OVER (ORDER BY metric_value DESC) AS rnk FROM employee_metrics) ranked
WHERE rnk = 2
```

> **Cost Analysis**
>
> With ~10,000 rows, the window function runs on the reduced set after filtering and grouping. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for which ranking function you choose and whether you can explain the difference between `ROW_NUMBER`, `RANK`, and `DENSE_RANK`; whether you use a subquery or self-join, and can explain the tradeoffs.

> **Common Pitfall**
>
> Using `ROW_NUMBER()` when the prompt requires tie inclusion silently drops tied rows. Read the prompt for language about ties or 'include all'.

---

## Common follow-up questions

- What would happen to your result if `employee_metrics.metric_value` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `metric_value` and uses DISTINCT or deduplication where needed.)_
- The `metric_value` column in `employee_metrics` is heavily skewed toward a few popular values. How would data skew affect parallel execution of your query? _(Tests understanding of skew in `employee_metrics.metric_value` and its impact on distributed query performance.)_
- What is the default window frame for your window function, and would explicitly setting ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW change anything? _(Tests knowledge of implicit vs explicit window frame specifications.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/second_highest_value)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.