How Much Will This Cost?
Concepts: paCompression
Storage cost is the question that separates engineers who build pipelines from engineers who own pipelines. The interviewer wants to see that you treat money as a first-class engineering constraint.
S3 Storage Tiers
A common production setup: 500 GB/day ingestion in Parquet. That's ~15 TB/month raw. With 2-year retention, you're looking at 360 TB. At S3 Standard pricing (about $0.023/GB-month), that's $8,280/month. Move data older than 90 days to Standard-IA and older than 1 year to Glacier Instant Retrieval, and the same 360 TB costs ~$2,100/month - a 75% reduction with a lifecycle policy that takes 20 minutes to configure.
Format Impact on Cost
File format directly affects your storage bill. A 1 TB/day JSON pipeline accumulates 30 TB in a month, which costs $23/TB × 30 TB = $690/month in S3 Standard. Convert to Parquet with Snappy compression (typical 5:1 r