# Extract Deploy Versions

> The version number is buried in the log.

Canonical URL: <https://datadriven.io/problems/extract_deploy_versions>

Domain: SQL · Difficulty: medium · Seniority: L4

## Problem

The release team needs to compare version numbers numerically, but they're stored as strings with prefixes and separators like 'v1.2-3'. For staging deployments, strip the non-numeric characters from the version and return it as an integer alongside the service name.

## Worked solution and explanation

### What this is really asking

`REPLACE(REPLACE(REPLACE(version, 'v', ''), '.', ''), '-', '')` strips three literal characters and trusts that nothing else appears in the string. CAST AS INTEGER then turns the remaining digits into a number for staging rows only.

---

### Break down the requirements

#### Step 1: Strip the known noise

Nest three REPLACE calls. Each one removes one fixed character ('v', then '.', then '-'). Order does not matter here because the three sets do not overlap.

#### Step 2: Cast and filter

Wrap the stripped string in CAST(... AS INTEGER) so version_num is numeric and orderable. Filter env_name = 'staging' in WHERE to drop prod and dev rows.

---

### The solution

**EXTRACT DEPLOY VERSIONS**

```sql
SELECT svc_name,
       CAST(REPLACE(REPLACE(REPLACE(version, 'v', ''), '.', ''), '-', '') AS INTEGER) AS version_num
FROM deploy_logs
WHERE env_name = 'staging'
```

> **Cost Analysis**
>
> Single scan of deploy_logs (600k rows) with a WHERE on env_name. An index on env_name would prune to staging cheaply; the three REPLACE calls are O(length) per row and trivial at this scale.

> **Interviewers Watch For**
>
> Does the cast survive every value? If any staging version contains 'rc', '+build', or whitespace, CAST will error or truncate. A regex-based digit extractor is more defensible if the format is not strictly enforced.

> **Common Pitfall**
>
> Treating the result as a real version comparator. '1.2-3' becomes 123 and '1.23' also becomes 123. The numeric form collapses semantically distinct versions; fine for sort within one service, dangerous across services.

---

### COMMON FOLLOW-UP QUESTIONS

## Common follow-up questions

- What breaks if a version string contains letters other than 'v', like 'v1.2-rc3'? _(REPLACE only strips 'v', so 'rc' stays. CAST AS INTEGER then fails or returns 0 depending on the engine.)_
- How would you make this robust across formats? _(Use a regex like REGEXP_REPLACE(version, '[^0-9]', '') so any non-digit is stripped, not just the three known characters.)_
- Why is 1.2.10 a problem for numeric comparison? _(It collapses to 1210, which sorts below 1.3.0 (130). Lexicographic-by-component compare or split-and-cast is needed for correct semver ordering.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/extract_deploy_versions)
- [SQL Interview Questions](https://datadriven.io/sql-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.