# Successful Pipeline Runs

> Which pipelines completed successfully?

Canonical URL: <https://datadriven.io/problems/successful_pipeline_runs>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

The `data_pipes` table tracks every execution of our data pipelines. Pull each pipeline name alongside how many times it completed successfully, sorted from most successful runs to fewest.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests whether a candidate can write a clean, correct filter-and-aggregate query under time pressure. Interviewers use this kind of foundational check early in a round to verify baseline proficiency.

---

### Break down the requirements

#### Step 1: Select the target columns

The SELECT clause picks exactly the columns the prompt asks for: the pipeline name and a count of its successful runs. Returning extra columns or missing a required alias would fail the grading check.

#### Step 2: Verify the output shape

Confirm the result has the expected columns, is sorted by success count from highest to lowest, and contains exactly one row per pipeline. A quick sanity check on row count catches logic errors before submission.
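
One way to run that sanity check is sketched below. It assumes the `data_pipes` table and the `pipe_name` and `status` columns used in the solution query, and that `pipe_name` is never NULL; the output column names are illustrative only.

```sql
-- Wrap the per-pipeline counts in a CTE and compare them with the raw table.
WITH per_pipeline AS (
    SELECT pipe_name, COUNT(*) AS success_count
    FROM data_pipes
    WHERE status = 'success'
    GROUP BY pipe_name
)
SELECT
    COUNT(*)                  AS result_rows,        -- rows the answer returns
    COUNT(DISTINCT pipe_name) AS distinct_pipelines, -- should equal result_rows
    SUM(success_count)        AS counted_runs,       -- should equal successful_runs
    (SELECT COUNT(*)
     FROM data_pipes
     WHERE status = 'success') AS successful_runs    -- direct count for comparison
FROM per_pipeline;
```

If `result_rows` and `distinct_pipelines` differ, the grouping produced duplicate names; if `counted_runs` and `successful_runs` differ, the filter or the count is wrong.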

---

### The solution

**Group data_pipes by name and count successful completions**

```sql
SELECT pipe_name, COUNT(*) AS success_count
FROM data_pipes
WHERE status = 'success'
GROUP BY pipe_name
ORDER BY success_count DESC
```

> **Cost Analysis**
>
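> With ~70,000 rows, the query performs a single sequential scan, then filters and aggregates in one pass. There is no join here, so the only candidate for an index is the filter column, `status`. Whether the planner uses such an index depends on how selective `status = 'success'` is, and at this table size a sequential scan is often the cheaper plan.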
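
To check the plan rather than guess, something like the following could be run against a Postgres database where you are allowed to create indexes. This is a sketch under assumptions: the index name `idx_data_pipes_success` is hypothetical, and a partial index is only one of several reasonable choices.

```sql
-- Inspect the plan and actual timings for the solution query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT pipe_name, COUNT(*) AS success_count
FROM data_pipes
WHERE status = 'success'
GROUP BY pipe_name
ORDER BY success_count DESC;

-- Hypothetical partial index: stores pipe_name only for successful runs,
-- so it stays small and matches the query's WHERE clause exactly.
CREATE INDEX idx_data_pipes_success
    ON data_pipes (pipe_name)
    WHERE status = 'success';
```

Re-running the `EXPLAIN` after creating the index shows whether the planner switches away from the sequential scan. If most runs are successful, it may reasonably keep the sequential scan.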

> **Interviewers Watch For**
>
> Interviewers watch for two things: whether the query returns exactly the columns and ordering the prompt specifies, and how quickly you identify the core operation and write clean, minimal code.

> **Common Pitfall**
>
> Returning extra columns that the prompt did not ask for, or using the wrong column alias, causes a grading mismatch even when the logic is correct.

---

## Common follow-up questions

- If pipe_name contains trailing whitespace in some rows, how would that affect your GROUP BY result? _(Tests awareness of dirty data causing phantom groups; TRIM would collapse them.)_
- Should you filter with WHERE status = 'success' or use HAVING with a conditional count, and what is the performance difference? _(Tests understanding of predicate pushdown; WHERE filters before grouping, reducing work.)_
- How would you also display each pipeline's average rows_out alongside its success count? _(Tests whether the candidate can add an aggregate on a different column without breaking the grouping.)_
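
The three follow-ups above can be sketched as variations on the solution query. These are illustrative answers, not the only acceptable ones. They assume `rows_out` is a numeric column on `data_pipes`, and the alias `avg_rows_out` is hypothetical.

```sql
-- 1. Trailing whitespace: normalize the name before grouping so that
--    'orders' and 'orders  ' land in the same group.
SELECT TRIM(pipe_name) AS pipe_name, COUNT(*) AS success_count
FROM data_pipes
WHERE status = 'success'
GROUP BY TRIM(pipe_name)
ORDER BY success_count DESC;

-- 2. Conditional count with HAVING: same rows as the WHERE version,
--    but every run is grouped first and the filter is applied afterwards.
--    Dropping the HAVING line would keep pipelines with zero successes.
SELECT pipe_name,
       COUNT(*) FILTER (WHERE status = 'success') AS success_count
FROM data_pipes
GROUP BY pipe_name
HAVING COUNT(*) FILTER (WHERE status = 'success') > 0
ORDER BY success_count DESC;

-- 3. Average rows_out alongside the count. Because of the WHERE clause,
--    the average covers successful runs only (an assumption about intent).
SELECT pipe_name,
       COUNT(*)      AS success_count,
       AVG(rows_out) AS avg_rows_out
FROM data_pipes
WHERE status = 'success'
GROUP BY pipe_name
ORDER BY success_count DESC;
```

The first variant groups by the trimmed expression rather than the alias, because in Postgres a bare `pipe_name` in `GROUP BY` would resolve to the untrimmed input column.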

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/successful_pipeline_runs)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.