Phase Two, Logical Optimization: The Rule Rewrites

With a resolved plan in hand, Catalyst enters the phase most people mean when they say optimization: logical optimization, a battery of rule-based rewrites that transform the plan into an equivalent but cheaper one. These are the optimizations from the beginner tier, now placed in their proper home. Each rule is a small, provably correct transformation, and Catalyst applies them repeatedly until the plan stops changing.

The headline rules are the ones that move and shrink work. Predicate pushdown moves filters down toward the data source, so fewer rows flow through the expensive operations above. Projection pushdown, or column pruning, drops columns the query never uses, so only the data that is needed gets carried forward. Constant folding evaluates expressions that do not depend on the data once, at planning time, rather than once for every row.
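The fixed-point loop and the three headline rules can be sketched as a toy model in Python. This is not Catalyst's API: Catalyst is written in Scala and works on its own plan trees, while every class and function below (`Lit`, `Col`, `BinOp`, `Scan`, `Filter`, `Project`, `constant_folding`, `predicate_pushdown`, `column_pruning`, `transform_up`, `optimize`) is a hypothetical stand-in invented for illustration. The sketch assumes projections only select plain columns, which makes swapping a filter below a projection trivially safe, and it caps the loop at an arbitrary 100 iterations as a safeguard.

```python
# Hypothetical toy optimizer (not Catalyst's API): each rule maps a plan node to an equivalent one.
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Lit:  # literal value
    value: int

@dataclass(frozen=True)
class Col:  # column reference
    name: str

@dataclass(frozen=True)
class BinOp:  # binary expression; op is "+" or ">"
    op: str
    left: "Expr"
    right: "Expr"

Expr = Union[Lit, Col, BinOp]

@dataclass(frozen=True)
class Scan:  # read `columns` from `table`
    table: str
    columns: tuple

@dataclass(frozen=True)
class Filter:  # keep rows where `cond` holds
    cond: Expr
    child: "Plan"

@dataclass(frozen=True)
class Project:  # select plain columns only (no computed expressions)
    columns: tuple
    child: "Plan"

Plan = Union[Scan, Filter, Project]

def refs(e: Expr) -> set:  # column names an expression reads
    if isinstance(e, Col):
        return {e.name}
    if isinstance(e, BinOp):
        return refs(e.left) | refs(e.right)
    return set()

def fold(e: Expr) -> Expr:  # evaluate literal-only additions once, bottom-up
    if isinstance(e, BinOp):
        left, right = fold(e.left), fold(e.right)
        if e.op == "+" and isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(left.value + right.value)
        return BinOp(e.op, left, right)
    return e

def constant_folding(p: Plan) -> Plan:
    return Filter(fold(p.cond), p.child) if isinstance(p, Filter) else p

def predicate_pushdown(p: Plan) -> Plan:
    # Filter over Project -> Project over Filter, so rows are dropped earlier.
    if isinstance(p, Filter) and isinstance(p.child, Project):
        if refs(p.cond) <= set(p.child.columns):
            return Project(p.child.columns, Filter(p.cond, p.child.child))
    return p

def column_pruning(p: Plan) -> Plan:
    # Project over [Filter over] Scan: read only the columns that are used.
    if isinstance(p, Project):
        needed, inner = set(p.columns), p.child
        if isinstance(inner, Filter):
            needed |= refs(inner.cond)
            inner = inner.child
        if isinstance(inner, Scan):
            pruned = Scan(inner.table, tuple(c for c in inner.columns if c in needed))
            child = Filter(p.child.cond, pruned) if isinstance(p.child, Filter) else pruned
            return Project(p.columns, child)
    return p

def transform_up(p: Plan, rule) -> Plan:  # rewrite children first, then the node
    if isinstance(p, Filter):
        p = Filter(p.cond, transform_up(p.child, rule))
    elif isinstance(p, Project):
        p = Project(p.columns, transform_up(p.child, rule))
    return rule(p)

RULES = [constant_folding, predicate_pushdown, column_pruning]

def optimize(plan: Plan, max_iterations: int = 100) -> Plan:  # rerun rules until no change
    for _ in range(max_iterations):
        new_plan = plan
        for rule in RULES:
            new_plan = transform_up(new_plan, rule)
        if new_plan == plan:  # fixed point: no rule changed anything
            return plan
        plan = new_plan
    return plan

# Roughly: SELECT name FROM (SELECT name, age FROM people) WHERE age > 18 + 3
plan = Project(("name",), Filter(BinOp(">", Col("age"), BinOp("+", Lit(18), Lit(3))),
               Project(("name", "age"), Scan("people", ("name", "age", "city", "bio")))))
print(optimize(plan))
```

In the printed plan, `18 + 3` has been folded to `21`, the filter sits beneath the inner projection, and the scan reads only `name` and `age` instead of all four columns. A second pass over that plan changes nothing, which is what ends the loop. The two stacked projections survive because this sketch has no rule for merging them; a fuller rule set would typically include one.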

About This Interactive Section

This section is part of the Inside Catalyst: The Four Phases lesson on DataDriven, a free data engineering interview prep platform.
