# First Deploy Attribution

> The first deploy per service.

Canonical URL: <https://datadriven.io/problems/first_deploy_attribution>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The release engineering team wants to measure onboarding velocity: how often are engineers deploying to a service for the very first time? For each service, show the total deployment count alongside how many of those were an author's first-ever deployment to that service.

## Worked solution and explanation

### What this is really asking

`author` plus `svc_name` defines a first deploy: the earliest `deploy_at` for that pair is `rn = 1`. Per service, you count all rows and the subset where rn = 1.

---

### Break down the requirements

#### Step 1: Rank within (author, svc_name)

ROW_NUMBER OVER (PARTITION BY author, svc_name ORDER BY deploy_at) tags each row; rn = 1 is the first deploy to that service by that author.

#### Step 2: Aggregate per service

Group by svc_name. COUNT(*) is total; SUM(CASE WHEN rn = 1 THEN 1 ELSE 0 END) is the first-deploy subset.

#### Step 3: Order by service name

ORDER BY svc_name for stable output.

---

### The solution

**FIRST DEPLOY ATTRIBUTION**

```sql
SELECT
  dl.svc_name,
  COUNT(*) AS total_deploys,
  SUM(CASE WHEN rn = 1 THEN 1 ELSE 0 END) AS first_time_deploys
FROM (
  SELECT
    deploy_logs.*,
    ROW_NUMBER() OVER (
      PARTITION BY author, svc_name
      ORDER BY deploy_at
    ) AS rn
  FROM deploy_logs
) dl
GROUP BY dl.svc_name
ORDER BY dl.svc_name;
```

> **Cost Analysis**
>
> One sort to assign ROW_NUMBER, one hash aggregate by svc_name. 800k rows fits in memory; an index on (author, svc_name, deploy_at) skips the sort.

> **Interviewers Watch For**
>
> Whether you partition by (author, svc_name) or just author. Author alone counts an engineer's lifetime first deploy, not the per-service first the prompt asks for.

> **Common Pitfall**
>
> Filtering `WHERE rn = 1` in the outer query collapses `total_deploys` into `first_time_deploys`. Keep every row and gate the second count with CASE inside SUM.

> **The False Start**
>
> First instinct is COUNT(DISTINCT author) per svc_name for the first-deploy column. It returns the same number here because each (author, svc_name) has one rn = 1, but it falls apart once the follow-ups add status filters or tiebreakers.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- What if two deploys by the same author share a deploy_at? _(ROW_NUMBER picks one arbitrarily. Add log_id to ORDER BY, or use RANK if both should count as first.)_
- How would you also report the share of first-time deploys per service? _(Wrap with `1.0 * first_time_deploys / total_deploys`, or use AVG over the CASE expression.)_
- How would you restrict to successful deploys only? _(Add `WHERE status = 'success'` inside the subquery so failed attempts do not count as someone's first.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/first_deploy_attribution)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.