HIPAA-Compliant PHI De-identification Pipeline for Development

A hard Pipeline Design interview practice problem on DataDriven. Write and execute real pipeline design code with instant grading.

Domain: Pipeline Design
Difficulty: hard
Seniority: staff

Problem

Our data engineering team works with protected health information (PHI) in production. We need to give engineers realistic data in development and testing environments without exposing real patient records. Design a pipeline that ingests data from production, de-identifies the PHI, and delivers a statistically representative synthetic dataset to the dev environment.
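
To make the de-identification stage concrete, here is a minimal Python sketch of a per-record transform. It is one possible starting point, not a reference solution. The field names (`patient_id`, `zip`, `birth_date`, and so on), the `PSEUDONYM_KEY` constant, and the `deidentify` and `pseudonymize` helpers are all hypothetical. The rules loosely follow the HIPAA Safe Harbor approach: drop direct identifiers, truncate ZIP codes to three digits, reduce dates to the year, and cap ages at 90. It does not cover the synthetic-data generation step, and it skips the Safe Harbor population check for three-digit ZIP codes.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical key for illustration; a real pipeline would load it from a
# secrets manager and keep it out of the dev environment entirely.
PSEUDONYM_KEY = b"example-key-not-for-production"

# Hypothetical column names treated as direct identifiers and dropped.
DIRECT_IDENTIFIERS = {"name", "ssn", "email", "phone", "address"}


def pseudonymize(value: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Keyed hash: the same input always yields the same token, so joins
    across tables still work, but the token is not reversible without the key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def deidentify(record: dict, today: date) -> dict:
    """Return a copy of `record` with identifying fields removed or generalized."""
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # drop direct identifiers outright
        elif field == "patient_id":
            out[field] = pseudonymize(str(value))
        elif field == "zip":
            # Keep only the first three digits. Assumption: the population
            # check for sparsely populated ZIP3 areas happens elsewhere.
            out["zip3"] = str(value)[:3]
        elif field == "birth_date":
            had_birthday = (today.month, today.day) >= (value.month, value.day)
            age = today.year - value.year - (0 if had_birthday else 1)
            out["age"] = min(age, 90)  # 90 stands for "90 or older"
        elif isinstance(value, date):
            out[field + "_year"] = value.year  # keep the year only
        else:
            out[field] = value  # non-identifying fields pass through
    return out


# Usage with a made-up record.
sample = {
    "patient_id": 123,
    "name": "Jane Doe",
    "zip": "94110",
    "birth_date": date(1930, 5, 1),
    "admit_date": date(2023, 3, 14),
    "diagnosis_code": "E11.9",
}
clean = deidentify(sample, today=date(2024, 1, 1))
```

A keyed hash is used rather than a plain hash so that someone with access to the dev data cannot rebuild the mapping by hashing known patient IDs. Whether such tokens are acceptable under a given compliance review is a separate question that this sketch does not settle.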
