Kill the UDF
A hard Spark mock interview question on DataDriven. Practice with AI-powered feedback, real code execution, and a hire/no-hire decision.
- Domain: Spark
- Difficulty: hard
- Seniority: senior
Interview Prompt
A PySpark pipeline uses a Python UDF to score 2 billion credit transactions per day for fraud. The UDF computes a risk score from 12 columns using thresholds, lookups, and weighted sums, all of which can be expressed with native Spark functions. The job takes 4 hours, and profiling shows 70% of that time in ArrowEvalPython (serializing data between the JVM and Python). Rewrite the UDF as native Spark SQL expressions.
How This Interview Works
- Read the vague prompt (just like a real interview)
- Ask clarifying questions to the AI interviewer
- Write your Spark solution with real code execution
- Get instant feedback and a hire/no-hire decision