# Longest Deploy With Full Identifier

> The longest deployment. Full ID.

Canonical URL: <https://datadriven.io/problems/longest_deploy_with_full_identifier>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

We need full identification for the deployment(s) with the longest duration. Show a label combining the service name and version, along with the duration in seconds. If multiple deployments tie for the longest, include all of them.

## Worked solution and explanation

### Why this problem exists in real interviews

This problem tests top-N selection with ties on `deploy_logs`: find the longest `dur_secs` and return every row that matches it, labeled with `svc_name` and `version`. Interviewers use it as a fundamentals check because tie handling and NULL behavior reveal depth of understanding.

---

### Break down the requirements

#### Step 1: Read from `deploy_logs`

The query targets `deploy_logs`, which has 8 columns. Only three are needed for the output: `svc_name` and `version` for the label, and `dur_secs` for the duration.

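#### Step 2: Keep every row with the longest duration

Find the maximum `dur_secs` and keep every row that matches it. `ORDER BY dur_secs DESC LIMIT 1` returns exactly one row, so it silently drops tied deployments, which the prompt says to include.

#### Step 3: Return the result set

Select the label, built by concatenating `svc_name` and `version`, and `dur_secs`. Alias the label so the output has exactly two columns.

---

### The solution

**String concatenation with a MAX subquery for longest**

```sql
-- Label each deployment as service:version and keep all rows tied for the maximum duration.
SELECT svc_name || ':' || version AS full_identifier, dur_secs
FROM deploy_logs
WHERE dur_secs = (SELECT MAX(dur_secs) FROM deploy_logs)
```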
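
Because the prompt asks for every deployment tied for the longest duration, one alternative worth knowing is `FETCH FIRST ... WITH TIES`, available in Postgres 13 and later. The sketch below is illustrative only: `sample_logs` and its three rows are hypothetical stand-ins for `deploy_logs`, included so the query runs on its own.

```sql
-- Hypothetical sample rows standing in for deploy_logs (the real table has 8 columns).
WITH sample_logs (svc_name, version, dur_secs) AS (
    VALUES
        ('auth',    'v1.4.0', 320),
        ('billing', 'v2.0.1', 910),
        ('search',  'v3.2.0', 910)
)
SELECT svc_name || ':' || version AS full_identifier, dur_secs
FROM sample_logs
ORDER BY dur_secs DESC
-- WITH TIES keeps every row whose dur_secs equals that of the first row.
FETCH FIRST 1 ROW WITH TIES;
```

On this sample data both `billing:v2.0.1` and `search:v3.2.0` come back with a duration of 910. The order of the two tied rows is not guaranteed unless a tiebreaker is added to the `ORDER BY`, and adding one would change which rows count as ties.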

> **Cost Analysis**
>
> The query scans 700K rows from `deploy_logs` when no index covers `dur_secs`. It uses no CTEs, so optimization fences do not apply here. For production workloads, consider an index on `dur_secs` so the engine can locate the longest duration without reading the whole table.
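
One way to explore the cost of this kind of query is to build a throwaway table of the same size and compare plans with and without an index on the duration column. This is a rough sketch, assuming Postgres: `deploy_logs_demo`, the index name, and the synthetic values are all hypothetical, and the planner's actual choice depends on statistics and configuration.

```sql
-- Hypothetical demo table; the real deploy_logs has 8 columns.
CREATE TEMP TABLE deploy_logs_demo (
    svc_name text,
    version  text,
    dur_secs integer
);

-- Fill with 700K synthetic rows (values are made up for illustration).
INSERT INTO deploy_logs_demo (svc_name, version, dur_secs)
SELECT 'svc_' || (g % 20)::text,
       'v1.' || g::text,
       (random() * 3600)::integer
FROM generate_series(1, 700000) AS g;

-- Hypothetical index name on the column used to find the longest deploy.
CREATE INDEX idx_deploy_logs_demo_dur_secs ON deploy_logs_demo (dur_secs);
ANALYZE deploy_logs_demo;

-- Inspect the plan; drop the index and rerun EXPLAIN to compare.
EXPLAIN
SELECT svc_name || ':' || version AS full_identifier, dur_secs
FROM deploy_logs_demo
WHERE dur_secs = (SELECT MAX(dur_secs) FROM deploy_logs_demo);
```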

> **Interviewers Watch For**
>
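> This query is short enough that it needs no CTEs. What the interviewer looks for is whether you notice the tie requirement: a solution that can only ever return one row misses the "include all of them" part of the prompt.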

> **Common Pitfall**
>
> Returning more columns than the prompt asks for can trigger a "wrong schema" failure in automated grading. Match the output specification exactly.

---

## Common follow-up questions

- What happens to your result if deploy_logs.dur_secs contains NULLs for some rows? _(Tests whether the candidate accounts for NULL behavior in aggregates and comparisons on dur_secs.)_
- How would you verify that duplicate rows in deploy_logs (for example, repeated deploy_logs.log_id values) are not producing duplicate rows in your result? _(Tests data quality awareness and deduplication strategies.)_
- With millions of rows in deploy_logs and a high-cardinality deploy_logs.log_id, what index strategy would you use to keep this query performant? _(Tests indexing knowledge, including which column an index must cover to help a query driven by dur_secs.)_
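
The NULL question above can be explored with a small experiment. The sketch below assumes Postgres, where `ORDER BY ... DESC` sorts NULLs first by default; `sample_logs` and its rows are hypothetical and exist only to make the behavior visible.

```sql
-- Hypothetical sample table; one row has a NULL duration.
CREATE TEMP TABLE sample_logs (
    svc_name text,
    version  text,
    dur_secs integer
);

INSERT INTO sample_logs (svc_name, version, dur_secs) VALUES
    ('auth',    'v1.4.0', 320),
    ('billing', 'v2.0.1', 910),
    ('search',  'v3.2.0', NULL);

-- Under DESC, Postgres places NULLs first, so this returns the search row
-- with a NULL duration rather than the longest real deployment.
SELECT svc_name || ':' || version AS full_identifier, dur_secs
FROM sample_logs
ORDER BY dur_secs DESC
LIMIT 1;

-- MAX ignores NULLs, and NULL = 910 is not true, so only billing:v2.0.1 is returned.
SELECT svc_name || ':' || version AS full_identifier, dur_secs
FROM sample_logs
WHERE dur_secs = (SELECT MAX(dur_secs) FROM sample_logs);
```

If sorting is preferred over the subquery, `ORDER BY dur_secs DESC NULLS LAST` moves the NULL rows to the end.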

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/longest_deploy_with_full_identifier)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.