# Classify Services by Name

> The name tells you what it is. Mostly.

Canonical URL: <https://datadriven.io/problems/classify_services_by_name>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The platform catalog needs a service taxonomy. Classify each service: 'api_service' if the name contains 'api', 'cache_service' if it contains 'cache' or 'redis', 'database' if it contains 'db' or 'postgres', and 'other' for everything else. Each service should appear only once. Show the service name and its category, limited to the first 100 results.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests CASE WHEN with string matching for classification logic, combined with deduplication. Interviewers check whether you can handle overlapping conditions (a service name could match multiple patterns) by ordering CASE branches correctly.

---

### Break down the requirements

#### Step 1: Deduplicate service names

`SELECT DISTINCT svc_name` ensures each service appears once.

#### Step 2: Classify with CASE WHEN

Apply pattern checks in priority order: 'api', then 'cache'/'redis', then 'db'/'postgres', with 'other' as the default. CASE evaluates top-to-bottom, so first match wins.

#### Step 3: Limit results

`LIMIT 100` restricts output to the first 100 services.

---

### The solution

**CASE-based classification with deduplication**

```sql
SELECT DISTINCT
    svc_name,
    CASE
        WHEN svc_name LIKE '%api%' THEN 'api_service'
        WHEN svc_name LIKE '%cache%' OR svc_name LIKE '%redis%' THEN 'cache_service'
        WHEN svc_name LIKE '%db%' OR svc_name LIKE '%postgres%' THEN 'database'
        ELSE 'other'
    END AS category
FROM svc_health
LIMIT 100
```

> **Cost Analysis**
>
> The DISTINCT requires scanning all 20M rows and deduplicating on `svc_name`. The CASE evaluation is O(1) per row. Output is at most 100 rows after LIMIT.

> **Interviewers Watch For**
>
> Whether the candidate recognizes that CASE short-circuits: a service named 'api-cache' would be classified as 'api_service' because that branch is checked first. Strong candidates discuss this ordering explicitly.

> **Common Pitfall**
>
> Using `LIKE 'api'` without wildcards would only match the exact string 'api', not 'user-api' or 'api-gateway'. Always include `%` wildcards for substring matching.

---

## Common follow-up questions

- What if the classification rules had overlapping categories? _(Tests CASE WHEN ordering and the first-match-wins semantics.)_
- How would you make the matching case-insensitive? _(Tests LOWER() or ILIKE for portable case-insensitive classification.)_
- What if you needed to count services per category? _(Wrap in a subquery and GROUP BY category, testing layered queries.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/classify_services_by_name)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.