# Second Highest Cloud Cost

> The second biggest bill on record.

Canonical URL: <https://datadriven.io/problems/second_highest_cloud_cost>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

The FinOps team already identified the peak cost entry and now wants the runner-up. What is the second highest unique cloud cost amount on record?

## Worked solution and explanation

### Why this problem exists in real interviews

This cloud cost problem uses the `cloud_costs` table to test top-N selection with a twist: the answer must skip the highest *distinct* value, not just the first row. Watch how the `amount` column drives both the deduplication and the ordering logic.

---

### Break down the requirements

#### Step 1: Deduplicate the result with DISTINCT

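`SELECT DISTINCT` removes duplicate rows from the output. Here it is essential because several cost entries can share the same `amount`, and the prompt asks for the second highest *unique* amount. Without deduplication, a peak value recorded twice would occupy both the first and second positions.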
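To see why deduplication matters for this prompt, here is a small sketch using Python's built-in `sqlite3` module and an in-memory table. The three sample rows and the `cost_id` column are hypothetical; only `cloud_costs.amount` comes from the problem. When the top amount appears twice, skipping one row without `DISTINCT` lands on the same peak value again.

```python
import sqlite3

# Hypothetical sample data: the peak amount (500) is recorded twice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cloud_costs (cost_id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO cloud_costs (amount) VALUES (?)",
    [(500.0,), (500.0,), (300.0,)],
)

# Without DISTINCT, OFFSET 1 skips only one copy of the peak.
without_distinct = conn.execute(
    "SELECT amount FROM cloud_costs ORDER BY amount DESC LIMIT 1 OFFSET 1"
).fetchone()

# With DISTINCT, each amount appears once, so OFFSET 1 reaches the runner-up.
with_distinct = conn.execute(
    "SELECT DISTINCT amount FROM cloud_costs ORDER BY amount DESC LIMIT 1 OFFSET 1"
).fetchone()

print(without_distinct)  # (500.0,)
print(with_distinct)     # (300.0,)
```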

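#### Step 2: Order, skip, and limit the output

`ORDER BY amount DESC` sorts the distinct amounts from highest to lowest, `OFFSET 1` skips the peak, and `LIMIT 1` returns the next value. The sort must be deterministic; because `DISTINCT` leaves exactly one row per amount, there are no ties to break, so no tiebreaker column is needed here.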
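The same sort-skip-take logic can be mirrored in plain Python, which makes the edge cases easy to reason about. `second_highest_distinct` is a hypothetical helper, not part of the problem: it deduplicates with a `set`, sorts descending, skips one value, and takes one. Like the SQL query returning zero rows, it yields nothing (here `None`) when fewer than two distinct amounts exist.

```python
from typing import Iterable, Optional

def second_highest_distinct(amounts: Iterable[float]) -> Optional[float]:
    """Python analogue of DISTINCT + ORDER BY DESC + LIMIT 1 OFFSET 1."""
    ranked = sorted(set(amounts), reverse=True)  # DISTINCT, then ORDER BY amount DESC
    window = ranked[1:2]                          # OFFSET 1, LIMIT 1
    # An empty window mirrors the SQL query returning zero rows.
    return window[0] if window else None

print(second_highest_distinct([500.0, 500.0, 300.0, 120.0]))  # 300.0
print(second_highest_distinct([500.0, 500.0]))                # None
```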

---

### The solution

**Deduplicate amounts with DISTINCT, then skip the peak with OFFSET**

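```sql
-- Exclude NULLs: Postgres sorts NULLs first under DESC by default,
-- so a NULL amount would otherwise be skipped as if it were the peak.
SELECT DISTINCT amount
FROM cloud_costs
WHERE amount IS NOT NULL
ORDER BY amount DESC
LIMIT 1 OFFSET 1;
```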
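The core `DISTINCT ... OFFSET 1` pattern can be sanity-checked on a throwaway SQLite database through Python's standard `sqlite3` module. This is SQLite rather than Postgres, so treat it as a sketch, not a replica of the sandbox; the sample amounts are made up. The sketch also runs a common alternative, taking the largest amount strictly below the overall maximum, to highlight one behavioral difference: when no runner-up exists, the `OFFSET` version returns zero rows while the `MAX` version returns a single `NULL`.

```python
import sqlite3

OFFSET_QUERY = """
SELECT DISTINCT amount
FROM cloud_costs
ORDER BY amount DESC
LIMIT 1 OFFSET 1
"""

# Alternative: the largest amount strictly below the overall maximum.
MAX_BELOW_MAX_QUERY = """
SELECT MAX(amount) AS amount
FROM cloud_costs
WHERE amount < (SELECT MAX(amount) FROM cloud_costs)
"""

def run(amounts, query):
    """Load hypothetical amounts into a fresh in-memory table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cloud_costs (amount REAL)")
    conn.executemany("INSERT INTO cloud_costs VALUES (?)", [(a,) for a in amounts])
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

typical = [500.0, 500.0, 300.0, 120.0]
single_value = [500.0, 500.0]

print(run(typical, OFFSET_QUERY))              # [(300.0,)]
print(run(typical, MAX_BELOW_MAX_QUERY))       # [(300.0,)]
print(run(single_value, OFFSET_QUERY))         # []
print(run(single_value, MAX_BELOW_MAX_QUERY))  # [(None,)]
```

Which shape is preferable depends on the consumer: a report that expects exactly one row may prefer the `MAX` form, while a job that treats "no rows" as a signal may prefer the `OFFSET` form.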

> **Cost Analysis**
>
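> With ~10M rows and no filter or join, the query reads all of `cloud_costs` in a single sequential scan, then sorts or hashes the amounts to deduplicate them. A B-tree index on `amount` can let Postgres read values in descending order straight from the index and stop early once the second distinct value has been found, instead of processing the whole table.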
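Conceptually, the answer needs only one pass and constant memory: nothing has to be fully sorted if you keep the two largest distinct values seen so far. The hypothetical `second_highest_one_pass` below sketches that idea in Python; it illustrates the complexity argument and is not a description of what the Postgres planner actually does.

```python
from typing import Iterable, Optional

def second_highest_one_pass(amounts: Iterable[float]) -> Optional[float]:
    """Track the top two distinct values in a single pass with O(1) extra memory."""
    first: Optional[float] = None   # highest distinct value seen so far
    second: Optional[float] = None  # runner-up distinct value seen so far
    for a in amounts:
        if first is None or a > first:
            # New peak: the old peak becomes the runner-up.
            second, first = first, a
        elif a < first and (second is None or a > second):
            # Strictly below the peak but above the current runner-up.
            second = a
        # Otherwise a duplicates the peak or is not above the runner-up: no change.
    return second

print(second_highest_one_pass([500.0, 500.0, 300.0, 120.0]))  # 300.0
print(second_highest_one_pass([500.0, 500.0]))                # None
```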

> **Interviewers Watch For**
>
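> Interviewers watch for whether you deduplicate before skipping the top value (the prompt asks for the second highest *unique* amount), whether the query returns exactly the single column the prompt specifies, and how quickly you identify the core operation and write clean, minimal code.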

> **Common Pitfall**
>
> Using LIMIT without ORDER BY returns an arbitrary subset. Always pair LIMIT with a deterministic ORDER BY.

---

## Common follow-up questions

- What would happen to your result if `cloud_costs.amount` contained duplicate values that you did not expect? _(Tests whether the candidate considers data quality issues in `amount` and uses DISTINCT or deduplication where needed.)_
- If `cloud_costs` grew to contain billions of rows, which part of your query would become the bottleneck, and how would the number of distinct `amount` values affect it? _(Tests ability to identify the sort and deduplication of `cloud_costs.amount` as the performance hotspot at scale.)_
- If this query ran as a scheduled job, how would you add monitoring to detect when the result set is suspiciously empty? _(Tests operational awareness around scheduled query jobs.)_
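One way to approach the monitoring question is to wrap the query so an empty result fails the job instead of passing silently. This is a minimal sketch using SQLite through Python's `sqlite3`; `checked_second_highest` and `EmptyResultError` are hypothetical names, and in a real scheduler the raised error would feed whatever alerting the job already uses.

```python
import sqlite3

QUERY = """
SELECT DISTINCT amount
FROM cloud_costs
ORDER BY amount DESC
LIMIT 1 OFFSET 1
"""

class EmptyResultError(RuntimeError):
    """Hypothetical error raised when the scheduled query finds no runner-up amount."""

def checked_second_highest(conn: sqlite3.Connection) -> float:
    """Run the query and fail loudly instead of silently publishing nothing."""
    row = conn.execute(QUERY).fetchone()
    if row is None:
        # Fewer than two distinct amounts: the upstream load may be missing or partial.
        distinct_count = conn.execute(
            "SELECT COUNT(DISTINCT amount) FROM cloud_costs"
        ).fetchone()[0]
        raise EmptyResultError(
            f"expected >= 2 distinct amounts, found {distinct_count}"
        )
    return row[0]
```

Including the distinct count in the error message gives whoever is paged a first clue about whether the table is empty or simply holds a single repeated amount.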

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/second_highest_cloud_cost)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.