Real Data, Fake Patients
A hard Pipeline Design interview practice problem on DataDriven. Write and execute real pipeline design code with instant grading.
- Domain
- Pipeline Design
- Difficulty
- hard
- Seniority
- L6
Problem
Our data engineering team works with protected health information in production and we need a way to provide engineers with realistic data in development and testing environments without exposing real patient records. Design a pipeline that ingests from production, de-identifies PHI, and delivers a statistically representative synthetic dataset to the dev environment.
Summary
Dev needs production data. HIPAA says absolutely not.
Practice This Problem
Solve this Pipeline Design problem with real code execution. DataDriven runs your solution and grades it automatically.