# Clean Latency Cast

> The latency column is a string. It should not be.

Canonical URL: <https://datadriven.io/problems/clean_latency_cast>

Domain: SQL · Difficulty: easy · Seniority: L4

## Problem

During a data quality investigation, you found that the latency column in service health records contains some non-numeric strings. Return all records with latency converted to an integer, excluding any rows where the conversion would fail. Return all available fields.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests safe type casting and data quality handling. Interviewers use this to probe whether you know how to validate and convert text columns to numeric types without crashing the query.

> **Trick to Solving**
>
> The key is filtering out non-numeric strings before casting. PostgreSQL offers a regex check or TRY_CAST pattern. Look for phrases like "excluding rows where the conversion would fail" as the signal.
> 
> 1. Filter rows where `latency` matches a numeric pattern
> 2. Cast the validated strings to integer
> 3. Return all fields with the converted column

---

### Break down the requirements

#### Step 1: Identify valid numeric strings

Use `latency ~ '^[0-9]+$'` (PostgreSQL regex) to match only strings containing digits. This excludes 'N/A', 'timeout', or other non-numeric values.

#### Step 2: Cast to integer

`latency::INTEGER` converts the validated string to an integer.

---

### The solution

**Regex validation before type cast**

```sql
SELECT
    check_id,
    svc_name,
    status,
    latency::INTEGER AS latency,
    uptime,
    checked,
    region
FROM svc_health
WHERE latency ~ '^[0-9]+$'
```

> **Cost Analysis**
>
> Full scan of 20M rows with regex evaluation per row. The regex is simple (digit check) so each evaluation is fast. No index helps since every row must be validated.

> **Interviewers Watch For**
>
> Whether you use a safe validation pattern (regex, TRY_CAST) vs an unsafe bare cast that would throw on bad data. The safe approach demonstrates production-readiness.

> **Common Pitfall**
>
> Using `latency::INTEGER` without the regex filter crashes the query on the first non-numeric row. Always validate before casting, or use `TRY_CAST` where available.

---

## Common follow-up questions

- What if latency values include decimals like '3.14'? _(The regex would need to allow dots: '^[0-9]+(\.[0-9]+)?$', and the cast target changes to NUMERIC.)_
- Does your database support TRY_CAST or TRY_TO_NUMBER? _(Tests awareness of database-specific safe cast functions (SQL Server TRY_CAST, Snowflake TRY_TO_NUMBER).)_
- How would you report which rows failed the conversion? _(Use the negated WHERE condition to select rows that do NOT match the pattern.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/clean_latency_cast)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.