# Unmatched Deploy Services

> Two registries. They do not agree.

Canonical URL: <https://datadriven.io/problems/unmatched_deploy_services>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

We maintain two pipeline registries and need to reconcile them. Find service names that appear in deploy logs but have no matching entry in the CI builds table. List them alphabetically.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests set difference logic between two tables with different schemas. Interviewers check whether you can identify values in one table that have no counterpart in another using anti-join or EXCEPT patterns.

---

### Break down the requirements

#### Step 1: Get distinct service names from deploy logs

`SELECT DISTINCT svc_name FROM deploy_logs` produces the set of deployed services.

#### Step 2: Exclude services present in CI builds

The CI builds table uses `repo_name` as the service identifier. Use `EXCEPT` or `NOT EXISTS` to find deploy services without a match.

#### Step 3: Order alphabetically

`ORDER BY svc_name` satisfies the alphabetical sort requirement.

---

### The solution

**Set difference between two registries**

```sql
SELECT DISTINCT svc_name
FROM deploy_logs
WHERE svc_name NOT IN (
    SELECT DISTINCT repo_name FROM ci_builds
)
ORDER BY svc_name
```

> **Cost Analysis**
>
> The subquery extracts 60 distinct repo_names from 3M rows. The main query scans 800K deploy_logs and checks each svc_name against a 60-element set. This is fast with an index on ci_builds(repo_name).

> **Interviewers Watch For**
>
> Whether you recognize that the two tables use different column names (`svc_name` vs `repo_name`) for what is conceptually the same entity. Schema reconciliation is a real-world data engineering task.

> **Common Pitfall**
>
> Using NOT IN when the subquery could return NULLs. If `repo_name` has any NULL values, the entire NOT IN predicate returns UNKNOWN and no rows are returned. Use NOT EXISTS to be safe.

---

## Common follow-up questions

- How would you find services in CI builds but not in deploy logs? _(Reverse the direction of the set difference.)_
- What if service names have inconsistent casing between the two tables? _(Use LOWER() on both sides for case-insensitive comparison.)_
- How would you find the symmetric difference (services in either but not both)? _(Tests FULL OUTER JOIN or UNION of both anti-joins.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/unmatched_deploy_services)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.