# A marketing team computes revenue across hundreds of millions of rows by scanning raw Parquet in S3

Canonical URL: <https://datadriven.io/problems/a-marketing-team-computes-revenue-across-hundreds-of-million-48dcbd6c>

Domain: Pipeline Design · Difficulty: medium

## Problem

A marketing team computes revenue across hundreds of millions of rows by scanning raw Parquet files in S3 with Pandas. Each query takes 18 minutes, and nothing enforces a schema. Applying the section's data-warehouse framing, add an analytical layer between the lake and the dashboard, and replace the Pandas transform with a warehouse-native one so that the columnar layout and separated compute deliver the speedup the section names.
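One way to picture the analytical layer is a typed fact table that rejects malformed rows at load time, plus a SQL aggregate that the dashboard reads instead of re-scanning raw files. The sketch below is only an illustration under stated assumptions: the table name `fact_orders`, the columns `order_id`, `campaign`, and `amount_cents`, and the sample rows are all hypothetical, and the standard-library `sqlite3` module stands in for a real warehouse. SQLite is a row store with no separated compute, so it shows the shape of the design (schema enforcement plus a warehouse-native transform), not the speedup.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from the lake.
# Column order: (order_id, campaign, amount_cents). All names are assumptions.
RAW_ROWS = [
    (1, "spring_sale", 1250),
    (2, "spring_sale", 4000),
    (3, "newsletter", 999),
    (4, "newsletter", "n/a"),  # malformed amount: the schema should reject it
    (5, None, 500),            # missing campaign: the schema should reject it
]


def build_warehouse(conn):
    """Create a typed fact table; constraints play the role of schema enforcement."""
    conn.execute(
        """
        CREATE TABLE fact_orders (
            order_id     INTEGER PRIMARY KEY,
            campaign     TEXT NOT NULL,
            amount_cents INTEGER NOT NULL
                CHECK (typeof(amount_cents) = 'integer' AND amount_cents >= 0)
        )
        """
    )


def load(conn, rows):
    """Insert rows one by one and return the rows the schema rejected."""
    rejected = []
    for row in rows:
        try:
            conn.execute("INSERT INTO fact_orders VALUES (?, ?, ?)", row)
        except sqlite3.IntegrityError:
            # NOT NULL and CHECK violations land here; good rows stay loaded.
            rejected.append(row)
    conn.commit()
    return rejected


def revenue_by_campaign(conn):
    """Warehouse-native transform: the aggregation runs in SQL, not in Pandas."""
    cur = conn.execute(
        """
        SELECT campaign, SUM(amount_cents) AS revenue_cents
        FROM fact_orders
        GROUP BY campaign
        ORDER BY campaign
        """
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_warehouse(conn)
    bad_rows = load(conn, RAW_ROWS)
    print("rejected:", bad_rows)
    print("revenue:", revenue_by_campaign(conn))
```

In a real deployment the same two pieces would likely live in a columnar warehouse, where the engine reads only the columns the query touches (here `campaign` and `amount_cents`) and compute scales independently of storage. Those two properties, rather than the SQL itself, are what the problem credits for the speedup.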

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/a-marketing-team-computes-revenue-across-hundreds-of-million-48dcbd6c)
- [System Design Interview Questions](https://datadriven.io/data-engineering-system-design)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.