DataDriven

A Shared Drive Full of Contracts

A medium Pipeline Design interview practice problem on DataDriven. Write and execute real pipeline design code with instant grading.

Domain: Pipeline Design
Difficulty: medium
Seniority: L5

Problem

Our legal team receives thousands of contract documents every month as PDFs and scanned images. They need to search across all of them and extract key terms like party names, dates, and obligations. Right now every document lives in a shared drive and search is impossible. Design a pipeline to ingest these documents and make the content queryable.
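One possible starting point, sketched under stated assumptions: the code below covers only the later stages of such a pipeline (key-term extraction and a searchable index). It assumes an earlier stage has already turned each PDF or scanned image into plain text; that OCR and parsing step is not shown. The names `ContractRecord`, `extract_terms`, `build_index`, and `search`, the sample contracts, and the regular expressions are all hypothetical illustrations, not part of the problem statement. Real contracts would need far more robust extraction than these patterns.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical record for one contract after text extraction (OCR / PDF parsing).
@dataclass
class ContractRecord:
    doc_id: str
    text: str
    dates: list = field(default_factory=list)
    parties: list = field(default_factory=list)

# Illustrative patterns only: ISO-style dates and "between X and Y" party clauses.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
PARTIES_RE = re.compile(r"between\s+(.+?)\s+and\s+(.+?)[,.]", re.IGNORECASE)
TOKEN_RE = re.compile(r"[a-z0-9]+")

def extract_terms(doc_id, text):
    """Pull dates and party names out of already-extracted contract text."""
    dates = DATE_RE.findall(text)
    match = PARTIES_RE.search(text)
    parties = [match.group(1), match.group(2)] if match else []
    return ContractRecord(doc_id, text, dates, parties)

def build_index(records):
    """Build an inverted index mapping each lowercase token to document ids."""
    index = defaultdict(set)
    for rec in records:
        for token in TOKEN_RE.findall(rec.text.lower()):
            index[token].add(rec.doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every token in the query."""
    tokens = TOKEN_RE.findall(query.lower())
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Assumed output of an upstream OCR / text-extraction stage (made-up sample data).
raw_texts = {
    "c1": "This agreement is made between Acme Corp and Globex LLC, "
          "effective 2024-01-15. Acme Corp shall deliver monthly reports.",
    "c2": "Lease between Initech and Hooli. Term begins 2023-06-01.",
}

records = [extract_terms(doc_id, text) for doc_id, text in raw_texts.items()]
index = build_index(records)
```

With this sample data, `search(index, "acme reports")` matches only `c1`, while `search(index, "between")` matches both documents. In a full design the in-memory dictionary would likely be replaced by a search engine or database, and the extracted fields would be stored alongside a pointer to the original file so reviewers can verify them.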

Summary

Buried in PDFs. The data is in there somewhere.
