How Much Will This Cost?
Concepts covered: paCostOptimization
Storage cost is the question that separates engineers who build pipelines from engineers who own pipelines. The interviewer wants to see that you think about money as a first-class engineering constraint.
S3 Storage Tiers
A common production setup: 500 GB/day ingestion in Parquet. That's ~15 TB/month raw. With 2-year retention, you're looking at 360 TB once the retention window is full. At S3 Standard pricing (about $0.023/GB per month), that's $8,280/month. Move data older than 90 days to Standard-IA (about $0.0125/GB) and data older than 1 year to Glacier Instant Retrieval (about $0.004/GB), and the same 360 TB costs roughly $3,500/month, a reduction of close to 60% from a lifecycle policy that takes 20 minutes to configure.
Format Impact on Cost
File format directly affects your storage bill. A 1 TB/day JSON pipeline costs $23/TB × 30 TB = $690/month in S3 Standard. Convert to Parquet with Snappy compression (typical 5:1 r
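The arithmetic in this section can be checked with a short script. This is a minimal sketch, not a pricing tool: the per-GB prices are assumed list prices (roughly us-east-1), the function names are hypothetical, a "month" is 30 days to match the 15 TB/month figure, and retrieval, request, transition, and minimum-storage-duration charges are ignored.

```python
# Assumed S3 list prices in USD per GB-month (illustrative only; check the
# current pricing page for your region before quoting numbers).
PRICE_PER_GB_MONTH = {
    "standard": 0.023,
    "standard_ia": 0.0125,
    "glacier_instant": 0.004,
}
GB_PER_TB = 1000  # decimal TB, matching the arithmetic in the text


def monthly_cost(tb, tier="standard"):
    """Monthly storage cost in USD for `tb` terabytes held in one tier."""
    return tb * GB_PER_TB * PRICE_PER_GB_MONTH[tier]


def steady_state_tb(daily_gb, retention_days, ia_after_days, glacier_after_days):
    """TB held in each tier once a full retention window has accumulated."""
    standard_days = min(ia_after_days, retention_days)
    ia_days = max(min(glacier_after_days, retention_days) - standard_days, 0)
    glacier_days = retention_days - standard_days - ia_days
    per_day_tb = daily_gb / GB_PER_TB
    return {
        "standard": standard_days * per_day_tb,
        "standard_ia": ia_days * per_day_tb,
        "glacier_instant": glacier_days * per_day_tb,
    }


def tiered_monthly_cost(tb_by_tier):
    """Sum the monthly cost across all tiers."""
    return sum(monthly_cost(tb, tier) for tier, tb in tb_by_tier.items())


# 500 GB/day with 24 thirty-day months of retention (720 days, 360 TB total),
# moving data to IA after 90 days and to Glacier Instant after 365 days.
tiers = steady_state_tb(500, 720, 90, 365)
all_standard = monthly_cost(sum(tiers.values()))
tiered = tiered_monthly_cost(tiers)

# 1 TB/day JSON pipeline (30 TB/month) versus the same data at a 5:1 ratio.
json_cost = monthly_cost(30)
parquet_cost = monthly_cost(30 / 5)

print(f"All Standard: ${all_standard:,.0f}/month")
print(f"Tiered:       ${tiered:,.0f}/month")
print(f"JSON:         ${json_cost:,.0f}/month")
print(f"Parquet 5:1:  ${parquet_cost:,.0f}/month")
```

With the prices assumed here, 360 TB held entirely in Standard comes to $8,280/month and the tiered layout to about $3,460/month; the 30 TB JSON month costs $690 and drops to $138 at a 5:1 compression ratio. The exact savings depend on the prices and age cut-offs you plug in, which is why it helps to keep them as parameters and state them out loud in an interview.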
About This Interactive Section
This section is part of the "The Storage Question: Intermediate" lesson on DataDriven, a free data engineering interview prep platform. Each section includes explanations, worked examples, and hands-on code challenges that execute in real time. SQL queries run against a live PostgreSQL database. Python runs in a sandboxed Docker container. Data modeling problems validate against interactive schema canvases. All content is framed around what data engineering interviewers actually test at companies like Meta, Google, Amazon, Netflix, Stripe, and Databricks.