HIPAA-Compliant PHI De-identification Pipeline for Development

A hard Pipeline Design mock interview question on DataDriven. Practice with AI-powered feedback, real code execution, and a hire/no-hire decision.

Domain: Pipeline Design
Difficulty: hard
Seniority: staff

Interview Prompt

Our data engineering team works with protected health information in production and we need a way to provide engineers with realistic data in development and testing environments without exposing real patient records. Design a pipeline that ingests from production, de-identifies PHI, and delivers a statistically representative synthetic dataset to the dev environment.
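
As a starting point for the de-identification step in the prompt above, here is a minimal Python sketch of one record-level transform. Everything specific in it is an assumption for illustration and is not given in the prompt: the field names (`patient_id`, `name`, `admit_date`, `discharge_date`, `zip`, `age`, `diagnosis_code`), the hard-coded `SECRET_KEY`, and the choice of techniques (keyed hashing, per-patient date shifting, 3-digit ZIP truncation, age capping at 90). It does not cover synthetic data generation, and whether these transforms satisfy a particular HIPAA de-identification method would need to be confirmed separately.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical key for illustration only; a real pipeline would load this
# from a secrets manager and keep it out of the dev environment.
SECRET_KEY = b"example-key-not-for-production"


def pseudonymize(value: str) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256, first 16 hex chars)."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def date_shift_days(patient_id: str, max_days: int = 30) -> int:
    """Derive a stable per-patient offset in [-max_days, max_days]."""
    digest = hmac.new(
        SECRET_KEY, b"shift:" + patient_id.encode("utf-8"), hashlib.sha256
    ).digest()
    return int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days


def deidentify(record: dict) -> dict:
    """Return a copy of the record with direct identifiers removed or generalized."""
    # The same shift is applied to every date for a patient, so intervals
    # such as length of stay are preserved.
    shift = timedelta(days=date_shift_days(record["patient_id"]))
    return {
        "patient_key": pseudonymize(record["patient_id"]),
        "admit_date": record["admit_date"] + shift,
        "discharge_date": record["discharge_date"] + shift,
        "zip3": record["zip"][:3],      # keep only the first three ZIP digits
        "age": min(record["age"], 90),  # group ages of 90 and above together
        "diagnosis_code": record["diagnosis_code"],
        # "name" is intentionally dropped rather than transformed.
    }


sample = {
    "patient_id": "MRN-001",
    "name": "Jane Doe",
    "admit_date": date(2024, 1, 10),
    "discharge_date": date(2024, 1, 15),
    "zip": "94110",
    "age": 93,
    "diagnosis_code": "E11.9",
}
result = deidentify(sample)
```

Using a keyed hash rather than a plain hash means the mapping cannot be rebuilt by hashing guessed identifiers without the key, and deriving the date offset from the patient identifier keeps a patient's records consistent across tables. Both are design choices a candidate might defend or replace during the interview.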

How This Interview Works

  1. Read the vague prompt (just like a real interview)
  2. Ask clarifying questions to the AI interviewer
  3. Write your pipeline design solution with real code execution
  4. Get instant feedback and a hire/no-hire decision