# Service Component Classification

> Classified by naming pattern.

Canonical URL: <https://datadriven.io/problems/service_component_classification>

Domain: SQL · Difficulty: medium · Seniority: L3

## Problem

We're enriching the service catalog. If a service name contains '/', classify it as 'Multi-Component'; otherwise classify it as 'Single-Component'. Show each unique service name with its classification.

## Worked solution and explanation

### Why this problem exists in real interviews

This problem targets pattern matching with LIKE across the `svc_health` table. You need to work with the `svc_name` column to satisfy the requirements.

---

### Break down the requirements

#### Step 1: Classify rows with a CASE expression

The `CASE` expression evaluates conditions top to bottom and returns the first match. Order matters: put the most restrictive condition first to avoid misclassification.

#### Step 2: Deduplicate the result with DISTINCT

`SELECT DISTINCT` removes duplicate rows from the output. This is necessary when joins or subqueries can produce repeated combinations.

---

### The solution

**Pattern-match for service component classification**

```sql
SELECT DISTINCT svc_name, CASE WHEN svc_name LIKE '%/%' THEN 'Multi-Component' ELSE 'Single-Component' END AS svc_class
FROM svc_health
```

> **Cost Analysis**
>
> With ~20M rows, the query performs a single sequential scan. An index on the filter/join columns would reduce the scan to a seek.

> **Interviewers Watch For**
>
> Interviewers watch for whether the query returns exactly the columns and ordering the prompt specifies; how quickly you identify the core operation and write clean, minimal code.

> **Common Pitfall**
>
> Forgetting the `%` wildcard or placing it on the wrong side changes the match semantics entirely. `LIKE 'x%'` matches prefixes; `LIKE '%x'` matches suffixes.

---

## Common follow-up questions

- If `latency` in `svc_health` is NULL for some rows, how would your aggregation or join logic be affected? _(Probes understanding of NULL propagation through joins and aggregate functions on `svc_health.latency`.)_
- If `svc_health` grew to contain billions of rows, which part of your query would become the bottleneck given the cardinality of `latency`? _(Tests ability to identify performance hotspots related to `svc_health.latency` at scale.)_
- Your LIKE pattern may prevent index usage on `svc_health`. How would you restructure the filter to be index-friendly? _(Tests understanding of leading-wildcard LIKE and its impact on index scans.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/service_component_classification)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.