# An IoT pipeline writes hourly batch output as GZIP-compressed CSV files in a partitioned curated zone

Canonical URL: <https://datadriven.io/problems/an-iot-pipeline-writes-hourly-batch-output-as-gzip-compresse-92757732>

Domain: Pipeline Design · Difficulty: medium

## Problem

An IoT pipeline writes hourly batch output as GZIP-compressed CSV files in a partitioned curated zone. Each 5 GB file is not splittable, so Spark must read each file, and therefore each partition, with a single task, and the dashboard takes 40 minutes to load. Apply the section's compression-and-splittability framing and replace the curated GZIP-CSV format with the splittable columnar format that the section names as the de facto choice.
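To see why splittability matters here, the rough arithmetic below compares scan-task counts for one 5 GB file. It is a simplified sketch that assumes Spark's default `spark.sql.files.maxPartitionBytes` of 128 MB alone sets the split size. It ignores `openCostInBytes`, default parallelism, and small-file packing, and `estimate_read_tasks` is a hypothetical helper rather than a Spark API.

```python
import math

# Assumption: Spark's default spark.sql.files.maxPartitionBytes (128 MB) sets the split size.
MAX_PARTITION_BYTES = 128 * 1024 * 1024


def estimate_read_tasks(file_sizes_bytes, splittable, max_partition_bytes=MAX_PARTITION_BYTES):
    """Hypothetical helper: rough count of scan tasks Spark would plan for these files.

    Simplification: ignores openCostInBytes, default parallelism, and the
    packing of several small files into one task.
    """
    if not splittable:
        # A GZIP stream must be decompressed from its start, so each file is one task.
        return len(file_sizes_bytes)
    # A splittable file is cut into roughly max_partition_bytes-sized chunks.
    return sum(math.ceil(size / max_partition_bytes) for size in file_sizes_bytes)


five_gb = 5 * 1024 ** 3
print(estimate_read_tasks([five_gb], splittable=False))  # 1 task
print(estimate_read_tasks([five_gb], splittable=True))   # 40 tasks
```

For a columnar format, the real split points follow the file's internal block boundaries (for example, row groups). Treat the count of 40 as an approximation. The contrast still holds: a single task reads the whole 5 GB when the file is GZIP-compressed, while about 40 tasks can run in parallel when the file is splittable.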

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/an-iot-pipeline-writes-hourly-batch-output-as-gzip-compresse-92757732)
- [System Design Interview Questions](https://datadriven.io/data-engineering-system-design)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.