# The Number Miner

> JSON strings are hiding numeric secrets - dig them out.

Canonical URL: <https://datadriven.io/problems/the_number_miner>

Domain: Python · Difficulty: medium · Seniority: L3

## Problem

Given a list of JSON strings, parse each one and recursively extract every integer value, excluding floats and booleans. Return a dict that maps each distinct integer, written as a string key (for example `"42"`), to its total count across all parsed strings.

## Worked solution and explanation

### Why this problem exists in real interviews

This tests **JSON parsing combined with type-based filtering and counting**. It checks whether candidates can traverse parsed JSON structures, identify integer values, and aggregate counts efficiently.

---

### Break down the requirements

#### Step 1: Parse each JSON string

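Use `json.loads` to convert each string into Python objects. JSON objects become `dict`, arrays become `list`, numbers without a fraction or exponent become `int`, other numbers become `float`, and `true`/`false` become `bool`. The top-level value is not guaranteed to be a dict.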
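
A quick check of how the standard library's `json.loads` maps JSON values to Python types. The sample document is made up for illustration:

```python
import json

# A made-up document that mixes every JSON scalar type with some nesting.
doc = json.loads(
    '{"id": 7, "price": 9.5, "ratio": 2.0, "big": 1e3,'
    ' "ok": true, "tags": [1, "2"], "meta": null}'
)

# Print the Python type each top-level field was parsed into.
for key, val in doc.items():
    print(key, type(val).__name__)
```

Only `id` comes back as an `int`. `2.0` and `1e3` parse to `float`, `true` parses to `bool`, and the quoted `"2"` inside `tags` stays a string, so none of those should be counted.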

#### Step 2: Extract integer values from each parsed object

Walk the parsed value recursively: descend into every dict value and every list element, and keep each scalar that is an `int` but not a `bool`.
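
One way to express this walk is a recursive generator. The name `iter_ints` is hypothetical; this is a sketch that assumes the input has already been parsed by `json.loads`:

```python
def iter_ints(node):
    """Yield every int (excluding bool) found anywhere inside a parsed JSON value."""
    if isinstance(node, dict):
        # JSON object keys are always strings, so only the values can hold integers.
        for val in node.values():
            yield from iter_ints(val)
    elif isinstance(node, list):
        for item in node:
            yield from iter_ints(item)
    elif isinstance(node, int) and not isinstance(node, bool):
        yield node
    # Strings, floats, booleans and None fall through and yield nothing.


parsed = {"a": 1, "b": {"c": [2, 3.0, True, {"d": 1}]}, "e": "4"}
print(list(iter_ints(parsed)))  # [1, 2, 1]
```

A generator keeps extraction separate from counting, so the same walk can feed a dict, a `Counter`, or a streaming sink.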

#### Step 3: Count occurrences across all strings

Use a frequency dictionary to tally each integer's total count across all parsed strings, keyed by the integer's string form (for example `"42"`), since the output requires string keys.
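
The tally can lean on `collections.Counter` from the standard library. This sketch assumes the integers have already been extracted into a flat list (the sample values are invented) and converts the keys to strings at the end:

```python
from collections import Counter

# Integers already extracted from all parsed documents (sample values).
extracted = [1, 2, 1, -5, 2, 1]

# Counter tallies occurrences; the dict comprehension switches keys to strings.
counts = {str(num): total for num, total in Counter(extracted).items()}
print(counts)  # {'1': 3, '2': 2, '-5': 1}
```

A `Counter` would merge `1` and `True` into a single key, because `True == 1` and both hash the same. That is one more reason to drop booleans before counting.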

---

### The solution

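**Recursive JSON traversal with integer extraction and counting**

```python
import json


def mine_numbers(json_strings):
    """Count every integer (excluding booleans) found anywhere in the parsed JSON."""
    counts = {}

    def walk(node):
        if isinstance(node, dict):
            for val in node.values():  # JSON object keys are always strings
                walk(val)
        elif isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, int) and not isinstance(node, bool):
            key = str(node)  # output keys are strings, per the problem statement
            counts[key] = counts.get(key, 0) + 1

    for s in json_strings:
        walk(json.loads(s))
    return counts
```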

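> **Time and Space Complexity**
>
> **Time:** O(L) where L is the total length of the input strings: `json.loads` runs in time linear in its input, and the traversal touches each parsed value (object, array, or scalar) once.
>
> **Space:** O(d + m) where d is the number of distinct integers found and m is the size of the largest single parsed document, which is held in memory while it is traversed. A recursive traversal also uses stack depth proportional to the deepest nesting level.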

> **Interviewers Watch For**
>
> Do you exclude booleans? In Python, `bool` is a subclass of `int`, so `isinstance(True, int)` returns `True`. Checking `not isinstance(val, bool)` is necessary to filter them out.
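
A small demonstration of why the extra check matters, using only built-ins. The list of values is invented for illustration:

```python
# bool is a subclass of int, so a plain isinstance check lets booleans through.
print(issubclass(bool, int))  # True
print(isinstance(True, int))  # True

values = [1, True, 0, False, 2.0]

naive = [v for v in values if isinstance(v, int)]
strict = [v for v in values if isinstance(v, int) and not isinstance(v, bool)]
print(naive)   # [1, True, 0, False]
print(strict)  # [1, 0]

# Equivalent alternative here: compare the exact type instead of using isinstance.
exact = [v for v in values if type(v) is int]
print(exact)   # [1, 0]
```

`type(v) is int` also rejects any other `int` subclass, such as an `IntEnum` member. `json.loads` never produces those, so for parsed JSON the two filters behave the same.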

> **Common Pitfall**
>
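> Iterating only over top-level values. The problem asks for integers at every depth, so a loop over the outer dict's values silently skips anything inside nested objects or arrays. It also assumes the parsed top-level value is a dict, when valid JSON can just as well be an array or a bare number.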
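
A related trap is assuming every parsed document is a dict. This sketch feeds a few valid JSON strings (chosen for illustration) through `json.loads` and shows that a dict-style loop breaks on a bare number:

```python
import json

# Each string is a valid JSON document; only the first parses to a dict.
samples = ['{"a": 1}', '[1, [2, 3]]', '42', 'true', '"7"']
kinds = [type(json.loads(s)).__name__ for s in samples]
print(kinds)  # ['dict', 'list', 'int', 'bool', 'str']

# A loop written for dicts fails when the top-level value is a bare number.
try:
    obj = json.loads('42')
    for key in obj:
        pass
except TypeError as exc:
    print("TypeError:", exc)  # 'int' object is not iterable
```

Dispatching on the node's type at every level, including the root, handles all of these cases uniformly.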

---

## Common follow-up questions

- What if the JSON were nested very deeply? _(Tests turning the recursive traversal into an iterative one with an explicit stack to stay clear of Python's recursion limit.)_
- How would you also extract float values? _(Tests adjusting the isinstance check and deciding whether to count ints and floats separately.)_
- What if JSON parsing fails on some strings? _(Tests wrapping json.loads in try/except for graceful error handling.)_
- What if the input were a stream of JSON lines (JSONL)? _(Tests line-by-line processing for memory efficiency.)_
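
Several of these follow-ups can be combined in one sketch. It assumes JSONL input arriving as any iterable of lines (a file object works), skips lines that fail to parse instead of aborting, and replaces recursion with an explicit stack. The function name `mine_numbers_jsonl` and the skip-and-count policy for bad lines are hypothetical choices, not part of the original problem:

```python
import io
import json
from collections import Counter


def mine_numbers_jsonl(lines):
    """Count non-bool ints across JSON lines; returns (counts, number_of_bad_lines)."""
    counts = Counter()
    bad_lines = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        try:
            root = json.loads(line)
        except json.JSONDecodeError:
            bad_lines += 1  # hypothetical policy: skip malformed records but count them
            continue
        stack = [root]  # explicit stack instead of recursion
        while stack:
            node = stack.pop()
            if isinstance(node, dict):
                stack.extend(node.values())
            elif isinstance(node, list):
                stack.extend(node)
            elif isinstance(node, int) and not isinstance(node, bool):
                counts[str(node)] += 1
    return dict(counts), bad_lines


stream = io.StringIO('{"a": 1, "b": [2, 1]}\nnot json\n\n[3, {"c": true}]\n')
print(mine_numbers_jsonl(stream))  # ({'1': 2, '2': 1, '3': 1}, 1)
```

Only one record is parsed at a time, so memory depends on the largest line rather than the whole stream. The explicit stack removes recursion from the traversal only: CPython's `json` parser enforces its own nesting limit and can still raise `RecursionError` on pathologically deep input.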

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the_number_miner)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.