# The Fine Print

> Risk rules nest deep. Decide which transactions clear them.

Canonical URL: <https://datadriven.io/problems/the-fine-print>

Domain: Python · Difficulty: medium · Seniority: mid

## Problem

A trade-surveillance system screens transaction `records` against a `query`, a predicate tree whose leaf nodes compare one field to a value with `>`, `>=`, `<`, `<=`, `==`, or `!=`, and whose interior nodes nest `and`, `or`, and `not` to any depth. Return the records that satisfy the whole tree, keeping them in their original order. A leaf comparison is false for any record that lacks the field it names.

## Worked solution and explanation

### Why this problem exists in real interviews

This is a recursive predicate evaluator wearing a trade-surveillance costume. The skill being probed: can you evaluate a boolean tree of and/or/not over comparison leaves, nested to any depth, against a stream of records? Anyone can filter where amount is greater than 30. The trick is that the query is not a flat list of conditions, it is a tree, and the weak candidate carries the SQL WHERE-clause mental model into Python and writes one loop over a list of comparisons. Feed that loop a node with an `or` inside a `not` and it either ignores the branch or blows up trying to read `node['field']` on a connective dict. Get this wrong and every record that should clear a nested OR branch silently disappears from the result.

---

### The trap everyone walks into

> **The flat AND-list assumption**
>
> The first example is an `and` of two things, so it is tempting to special-case exactly that: iterate the clauses, require all of them. That hardcodes the two-level shape. The query is defined recursively, a clause can itself be an `or` of an `and` of a `not`, so the only structure that survives is one that calls itself on each child. If your evaluator is not recursive, you have solved a different, easier problem than the one asked.

**Naive flat loop**

all(compare(r, c) for c in query['conditions']) assumes query is a list of leaves joined by AND. It cannot represent or/not at all, and it raises KeyError the moment a connective node appears.

**Recursive walk**

matches(record, node) inspects the node's shape: an and/or/not connective recurses on its children, a leaf does the comparison. Depth is unbounded and the same function handles every level.

---

### Break down the requirements

#### Step 1: Dispatch on the node's shape

A node is a connective if it carries an 'and', 'or', or 'not' key, otherwise it is a comparison leaf with 'field', 'op', and 'value'. Check the connective keys FIRST, before you ever reach for node['field'], so a connective dict never gets mistaken for a leaf.

#### Step 2: Let all / any / not carry the boolean logic

and maps to all(...) over the recursive results of its clauses, or maps to any(...), and not inverts the single child. all and any short-circuit for free, so a cheap clause that fails an AND stops the walk early. This is where the recursion pays off: each child is just another call to the same function.

#### Step 3: Evaluate a leaf, and absorb the missing field

Before comparing, ask whether the field is present. If it is absent, the comparison is false by definition, return False rather than letting Python raise on a missing key. With the field present, dispatch the operator string to the matching comparison. Folding the absent-field rule into the leaf keeps every connective above it oblivious to the special case.

#### Step 4: Keep the records in input order

Filter with a comprehension over records in their original sequence and return the record dicts themselves, not their indices or copies. Order preservation falls out of iterating the input once; nothing sorts or reshuffles.

---

### The solution

**Recursive predicate walk with operator dispatch**

```python
import operator

OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}

def filter_records(records, query):
    def matches(record, node):
        if "and" in node:
            return all(matches(record, clause) for clause in node["and"])
        if "or" in node:
            return any(matches(record, clause) for clause in node["or"])
        if "not" in node:
            return not matches(record, node["not"])
        field = node["field"]
        if field not in record:
            return False
        return OPS[node["op"]](record[field], node["value"])
    return [record for record in records if matches(record, query)]
```

> **Complexity**
>
> Time is O(n * t) where n is the number of records and t is the number of nodes in the query tree: each record drives one walk of the tree, and short-circuiting often visits fewer nodes. Space is O(d) for the recursion stack, where d is the tree's depth, plus the output list. A surveillance batch is typically thousands of records against a query of a few dozen nodes, so this is effectively linear in the records.

> **Interviewers Watch For**
>
> Whether you check the connective keys before indexing node['field'], whether you reach for a single recursive helper instead of unrolling the first example's shape, and whether you state the missing-field rule out loud as a decision (skip versus default versus raise) rather than letting it crash. Dispatching operators through a dict or the operator module, instead of an if/elif ladder of six branches, is a small extra signal of fluency.

> **Common Pitfall**
>
> The subtle one is the interaction of a missing field with not. Under the rule that an absent field makes the leaf False, a record lacking 'region' satisfies not(region == 'EU'), because the inner comparison is False and not False is True. Candidates who special-case missing fields at the wrong level, treating absence as a hard reject of the whole record, get this backwards and wrongly exclude it. Keep the absent-field decision inside the leaf and let not invert the boolean like any other.

---

## Common follow-up questions

- How would you support an 'in' operator whose value is a list, or a 'between' leaf with two bounds? _(Tests whether the leaf evaluator is extensible: a new op should be a new entry in the dispatch table, not a rewrite of the walk.)_
- The same query is applied to millions of records. How would you avoid re-walking the tree's structure for every row? _(Tests compiling the query once into a closure or predicate function, so per-record work is just the comparisons, not the dispatch.)_
- What should happen when a field is present but its type does not support the operator, say comparing a string with a numeric '>' bound? _(Tests reasoning about error semantics: coerce, treat as non-match, or surface a validation error before the batch runs.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/the-fine-print)
- [Python Interview Questions](https://datadriven.io/python-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.