# Deploy Reliability Scores

> A reliability scoreboard for deploy teams.

Canonical URL: <https://datadriven.io/problems/deploy_reliability_scores>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

Each deploy has an author and a service; matching deploys to health checks on service name gives a latency score. Compute the average score per author, ranked from highest to lowest. Show each author and their average score.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests a JOIN between two tables on a shared key with aggregation. Computing an average of a derived score probes whether you can combine data from deployment and health check tables correctly.

---

### Break down the requirements

#### Step 1: Join deploys to health checks

Join `deploy_logs` to `svc_health` on `svc_name` to associate each deploy with health check latency.

#### Step 2: Compute average score per author

`GROUP BY author` with `AVG(latency)` (or a derived score) gives each author's reliability score.

#### Step 3: Rank descending

`ORDER BY avg_score DESC` as specified.

---

### The solution

**Cross-table average with author ranking**

```sql
SELECT
    dl.author,
    ROUND(AVG(sh.latency::NUMERIC), 2) AS avg_score
FROM deploy_logs dl
JOIN svc_health sh ON dl.svc_name = sh.svc_name
GROUP BY dl.author
ORDER BY avg_score DESC
```

> **Cost Analysis**
>
> Join of 1M deploy_logs to 30M svc_health on svc_name. This is a many-to-many join that can produce a large intermediate result. The GROUP BY collapses it to one row per author.

> **Interviewers Watch For**
>
> Whether the candidate recognizes that the join can produce many rows per deploy (one per health check). The average is computed across all matching health checks for all of an author's deploys.

> **Common Pitfall**
>
> The many-to-many join means authors who deploy to high-traffic services (with many health checks) are over-represented. A weighted average or pre-aggregation per service would be more fair.

---

## Common follow-up questions

- How would you weight the score by the number of deploys? _(Tests weighted average or pre-aggregation strategies.)_
- What if latency is stored as text with non-numeric values? _(Tests safe casting: latency ~ '^[0-9]+$' filter before cast.)_
- How would you also show the number of deploys per author? _(Add COUNT(DISTINCT dl.log_id) to the SELECT.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/deploy_reliability_scores)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.