# Tenure Mentorship Match

> Pair by tenure. Longest with newest.

Canonical URL: <https://datadriven.io/problems/tenure_mentorship_match>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The engineering org is piloting a mentorship program that pairs each engineer with the most tenured colleague on the same service, using the earliest deployment timestamp as a proxy for tenure. For each author, show them alongside their service's most tenured deployer.

## Worked solution and explanation

### Why this problem exists in real interviews

This probes your ability to perform a self-join combined with a window function or correlated subquery. The interviewer wants to see if you can identify the "most tenured" colleague per partition and then pair every member of that partition with that person, a pattern that appears frequently in org-analytics queries.

---

### Break down the requirements

#### Step 1: Identify distinct author-service pairs

`SELECT DISTINCT author, svc_name` from `deploy_logs` gives the set of engineers and the services they have deployed to.

#### Step 2: Find earliest deploy per author per service

`MIN(deploy_at)` grouped by `author, svc_name` serves as the tenure proxy: the earlier the first deploy, the more tenured the person on that service.

#### Step 3: Rank authors by tenure within each service

Use `ROW_NUMBER() OVER (PARTITION BY svc_name ORDER BY first_deploy ASC)` to find the most tenured deployer (rank 1) per service.

#### Step 4: Self-join to pair each author with rank-1 mentor

Join every author back to the rank-1 author for their service. Exclude self-matches if the author IS the most tenured person.

---

### The solution

**CTE with window rank then self-join for mentorship pairs**

```sql
WITH tenure AS (
    SELECT
        author,
        svc_name,
        MIN(deploy_at) AS first_deploy
    FROM deploy_logs
    GROUP BY author, svc_name
),
ranked AS (
    SELECT
        author,
        svc_name,
        first_deploy,
        ROW_NUMBER() OVER (PARTITION BY svc_name ORDER BY first_deploy ASC) AS rnk
    FROM tenure
)
SELECT
    r.author,
    m.author AS mentor,
    m.first_deploy AS mentor_hire_date
FROM ranked r
JOIN ranked m
    ON r.svc_name = m.svc_name
    AND m.rnk = 1
WHERE r.author != m.author
ORDER BY r.author
```

> **Cost Analysis**
>
> The `GROUP BY` reduces 1M rows to roughly 12K (120 authors x 100 services, though sparse). The window function sorts 12K rows per partition, which is near-instant. The self-join is also on the 12K-row CTE, so the entire query is lightweight after the initial aggregation.

> **Interviewers Watch For**
>
> Interviewers want to see that you aggregate to the correct grain before ranking. A candidate who tries to rank raw deploy rows will get wrong results because a single author may have hundreds of deploys to the same service.

> **Common Pitfall**
>
> Using `RANK()` instead of `ROW_NUMBER()` when you want exactly one mentor per service. If two authors share the same earliest deploy timestamp, `RANK` returns both, creating duplicate mentorship pairs. The prompt says "the most tenured," implying a single match per service.

---

## Common follow-up questions

- What if an author deploys to multiple services and you need their primary service mentor only? _(Tests adding a second ranking layer to pick the author's most-frequent service.)_
- How would you handle ties in the earliest deploy timestamp? _(Probes tiebreaker strategy: add a secondary sort on `author` name or use `DENSE_RANK`.)_
- What if the deploy_logs table grows to 1B rows? _(Tests whether you would pre-aggregate into a summary table or add a partition-pruning filter.)_
- Could you also return the mentor's total deploy count for that service? _(Tests whether you can layer an additional aggregate onto the CTE without breaking the join.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/tenure_mentorship_match)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.