# Loan Application Reporting Schema

> Approved, declined, or pending. Design the tables that say so.

Canonical URL: <https://datadriven.io/problems/loan_application_reporting_schema>

Domain: Data Modeling · Difficulty: medium · Seniority: L4

## Problem

We're a lending platform. Customers submit loan applications. Our risk team reviews each application and either approves or declines it. Approved applications may or may not result in funded loans, since the customer must accept the offer. The analytics team needs approval rate breakdowns by customer segment. Design the data model.

## Worked solution and explanation

### Why this problem exists in real interviews

This probes whether a candidate separates **approved application** from **funded loan** and understands when a dimension attribute must be snapshotted at event time. Risk and finance teams live or die by these distinctions; conflating them produces wrong funnels and wrong underwriting backtests.

> **Trick to Solving**
>
> When a prompt says “approved applications may or may not result in funded loans,” the trick is to resist merging them. Before drawing tables, a strong candidate asks: do risk teams backtest on the applicant snapshot at application time or the current customer record?
> 
> 1. Keep loan_applications, loan_offers, and funded_loans as distinct entities
> 2. Snapshot credit_score and income on loan_applications
> 3. Never join current customer state into a historical application query
> 4. Let offers and funding carry their own grain

---

### Break down the requirements

#### Step 1: Model the funnel as three entities

`loan_applications`, `loan_offers`, `funded_loans`. Each has different cardinality and different lifecycle semantics.

#### Step 2: Snapshot risk attributes at application time

`credit_score_at_apply` and `stated_income_at_apply` live on `loan_applications`. The current customer record drifts; the underwriting signal does not.

#### Step 3: Use status enums on applications

`application_status` captures submitted, approved, declined with a nullable `decline_reason`. Status transitions are the funnel.

#### Step 4: Keep offers 1:1 with approvals

Every approved application yields an offer; not every offer is accepted. `loan_offers` has amount, APR, term.

---

### The solution

Below is one conceptually sound model. The snapshotted segment attributes on `loan_applications` are the load-bearing choice for risk backtests.

> **Why this works**
>
> Snapshotting underwriting inputs on the application row makes risk backtests deterministic. The three-entity funnel preserves drop-off analysis: applied to approved, approved to offered, offered to funded.

> **Interviewers watch for**
>
> A strong candidate refuses to compute approval-time credit score by joining a current customer attribute. They also distinguish offer amount from funded principal, which are often different.

> **Common pitfall**
>
> Treating approval and funding as the same row. A customer who is approved but never signs the disclosure still counts as approved for funnel math but never funded. Merging them destroys that distinction and silently inflates conversion.

---

### The analysis pattern

**Funnel conversion by credit tier**

```sql
SELECT
    CASE
        WHEN a.credit_score_at_apply >= 740 THEN 'prime'
        WHEN a.credit_score_at_apply >= 670 THEN 'near_prime'
        ELSE 'subprime'
    END AS tier,
    COUNT(*) AS applications,
    COUNT(*) FILTER (WHERE a.status = 'approved') AS approved,
    COUNT(f.funded_loan_id) AS funded
FROM loan_applications a
LEFT JOIN loan_offers o ON o.application_id = a.application_id
LEFT JOIN funded_loans f ON f.offer_id = o.offer_id
WHERE a.applied_at >= NOW() - INTERVAL '90 days'
GROUP BY tier
ORDER BY tier
```

---

### Trade-offs and alternatives

**Snapshot on application row**

credit_score and income frozen on the application.

* Risk backtests are deterministic
* Application rows are self-sufficient
* Slightly denormalized

**Type 2 customer dimension**

Current and historical customer attributes in a Type 2 dimension.

* Single canonical source for customer state
* Application queries need temporal joins
* Larger storage and more complex ETL

---

## Common follow-up questions

- How do you handle an applicant who applies three times in 90 days? _(Tests whether application_id uniqueness lives at attempt grain and whether dedup windows exist.)_
- A regulator asks for the exact credit score used at approval on a specific loan. Where does it come from? _(Tests whether the snapshot is on the application row or buried in a reconstructed dimension.)_
- How would you model a co-applicant without duplicating customer rows? _(Tests whether the candidate introduces an application_applicants bridge table.)_
- Funded principal sometimes differs from offered amount. Why, and where does that live? _(Tests whether the candidate preserves the offer/funding distinction.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/loan_application_reporting_schema)
- [Data Modeling Interview Questions](https://datadriven.io/data-modeling-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.