# Transaction Source Features

> One pipeline reviewed them. What did it see?

Canonical URL: <https://datadriven.io/problems/transaction_source_features>

Domain: SQL · Difficulty: easy · Seniority: L3

## Problem

Return all feature names from the transactions source that have a recorded average value.

## Worked solution and explanation

### Why this problem exists in real interviews

This is a basic multi-condition filter. Interviewers use it to check that you correctly combine `WHERE` conditions with `AND` and handle NULLs in the filter.

---

### Break down the requirements

#### Step 1: Filter to transactions source

`WHERE source = 'transactions'` restricts to features originating from the transactions pipeline.

#### Step 2: Exclude NULL average values

`AND avg_val IS NOT NULL` ensures only features with a recorded average are returned. With 10% null fraction, this removes ~1.2M rows.

---

### The solution

**Two-condition filter for source and non-null metric**

```sql
SELECT feat_name
FROM ml_features
WHERE source = 'transactions'
  AND avg_val IS NOT NULL
```

> **Cost Analysis**
>
> With 8 sources, the filter reduces 12M rows to ~1.5M. The NULL check further narrows to ~1.35M. An index on `(source, avg_val)` enables an efficient seek.

> **Interviewers Watch For**
>
> Candidates who omit the NULL check. The phrase "have a recorded average value" specifically excludes NULLs.

> **Common Pitfall**
>
> Using `avg_val != NULL` instead of `avg_val IS NOT NULL`. In SQL, any comparison with NULL using `=` or `!=` returns NULL (unknown), not TRUE or FALSE.

---

## Common follow-up questions

- What if you needed distinct feature names only? _(Wrap in SELECT DISTINCT or add GROUP BY feat_name.)_
- How would you also return the average of avg_val per feature? _(Add GROUP BY feat_name with AVG(avg_val).)_
- What if 'transactions' was stored with inconsistent casing? _(Use LOWER(source) = 'transactions' or fix the data upstream.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/transaction_source_features)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.