Staff Data Engineer Top Tech Architecture
MongoDB
Real Data, Fake Patients
A hard Pipeline Design mock interview question on DataDriven. Practice with AI-powered feedback, real code execution, and a hire/no-hire decision.
- Domain: Pipeline Design
- Difficulty: hard
- Seniority: staff
Interview Prompt
Our data engineering team works with protected health information (PHI) in production, and we need a way to give engineers realistic data in development and testing environments without exposing real patient records. Design a pipeline that ingests from production, de-identifies PHI, and delivers a statistically representative synthetic dataset to the dev environment.
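To make the de-identification stage of the prompt concrete, here is a minimal Python sketch of one possible record-level transform. It is an illustration under stated assumptions, not a reference answer: the field names (`patient_id`, `name`, `visit_date`, `age`, `diagnosis_code`), the hard-coded key, and the 30-day shift window are all hypothetical, and the sketch does not cover ingestion or synthetic data generation.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical key for illustration only; a real pipeline would load this
# from a secrets manager and never ship it to the dev environment.
SECRET_KEY = b"example-key-not-for-production"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash so joins across tables still work."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def date_shift_days(patient_id: str, key: bytes = SECRET_KEY, max_days: int = 30) -> int:
    """Derive a stable per-patient offset in [-max_days, max_days].

    Using the same offset for every date of one patient preserves the
    intervals between that patient's events.
    """
    digest = hmac.new(key, b"shift:" + patient_id.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days


def age_band(age: int) -> str:
    """Generalize an exact age to a ten-year band, grouping 90 and older together."""
    if age >= 90:
        return "90+"
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def deidentify(record: dict) -> dict:
    """Return a copy of the record with direct identifiers dropped or transformed."""
    shift = timedelta(days=date_shift_days(record["patient_id"]))
    return {
        # Direct identifiers such as "name" are simply not copied over.
        "patient_id": pseudonymize(record["patient_id"]),
        "visit_date": record["visit_date"] + shift,
        "age_band": age_band(record["age"]),
        "diagnosis_code": record["diagnosis_code"],
    }


if __name__ == "__main__":
    sample = {
        "patient_id": "P001",
        "name": "Jane Doe",
        "visit_date": date(2024, 3, 15),
        "age": 47,
        "diagnosis_code": "E11.9",
    }
    print(deidentify(sample))
```

A keyed hash (rather than a plain hash) is used here so that someone without the key cannot rebuild the mapping by hashing guessed identifiers. Whether pseudonymized records like these are acceptable in dev, or only as input to a separate synthetic-data step, is one of the clarifying questions worth asking the interviewer.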
How This Interview Works
- Read the vague prompt (just like a real interview)
- Ask clarifying questions to the AI interviewer
- Write your pipeline design solution with real code execution
- Get instant feedback and a hire/no-hire decision