DataDriven
LearnPracticeInterviewDiscussDaily
HelpContactPrivacyTermsSecurityiOS App

© 2026 DataDriven

Loading lesson...

  1. Home
  2. Learn
  3. Heap & Top-K: Advanced

Heap & Top-K: Advanced

Staff-level heap mastery: streaming, lazy deletion, distributed priority.

Staff-level heap mastery: streaming, lazy deletion, distributed priority.

Category
Python
Difficulty
advanced
Duration
45 minutes
Challenges
0 hands-on challenges

Topics covered: Two-Heap Running Median: Full Implementation, Lazy Deletion: Removing Arbitrary Elements from a Heap, External Sort and K-Way Merge: Design at Scale, Priority Queues in Distributed Systems, Design a Real-Time Leaderboard System

Lesson Sections

  1. Two-Heap Running Median: Full Implementation (concepts: pyStreamingMedian, pyLazyDeletion, pyTwoHeapFull)

    You saw the basic two-heap MedianFinder in the intermediate lesson. The advanced version adds two things: a proof of why this is O(n log n) and why it cannot be beaten for streaming, and the full sliding window implementation with lazy deletion. This is the complete picture. If you can implement this correctly from scratch in an interview, you are demonstrating algorithm design skill that puts you firmly in the staff-level candidate pool. Why O(n log n) is the Lower Bound for Streaming Median Fo

  2. Lazy Deletion: Removing Arbitrary Elements from a Heap (concepts: pyLazyHeap, pyReprioritize, pyCancellableQueue)

    This is one of the most underrated heap techniques in competitive programming and production systems. heapq gives you no way to efficiently remove an arbitrary element. heapq.remove() exists but it is O(n) — it does a linear scan. For a priority queue where elements can be cancelled or reprioritized, O(n) deletion makes the whole data structure O(n²) in the worst case. The lazy deletion trick brings it back to O(log n) amortized. It is used in cache eviction, scheduler cancellation, and priority

  3. External Sort and K-Way Merge: Design at Scale (concepts: pyExternalSort, pyKWayMergeDesign, pyMergePass)

    External sort is one of the most important algorithms in data engineering that almost no one can fully explain on demand. It is how ORDER BY works when the sort spills to disk in Postgres. It is how Spark shuffle merge works. It is how Hadoop reduces sorted partitions. The core is a two-phase algorithm: sort phase (produce sorted chunks) and merge phase (K-way merge all chunks). The heap is the central data structure of the merge phase. At the staff level, you are expected to reason about chunk

  4. Priority Queues in Distributed Systems (concepts: pyKafkaPriority, pyFlinkTimers, pyCAPTheorem)

    In a single process, a heap is the priority queue. In a distributed system, maintaining a global priority order across multiple machines is a fundamentally different problem with CAP theorem implications. Kafka, Flink, and other distributed systems have specific, principled answers to how they handle priority ordering. Knowing these answers, and being able to articulate the CAP tradeoffs, is what separates a senior DE who 'has used Kafka' from a staff-level candidate who understands what Kafka c

  5. Design a Real-Time Leaderboard System (concepts: pyLeaderboard, pyRedisSortedSet, pySegmentTree)

    The real-time leaderboard is the canonical system design problem that tests heap knowledge at the staff level. Top-K users by score with frequent score updates. Millions of users. Millisecond read latency. You need to support: update score (frequent), get top-K users (read-heavy), get a user's rank (occasional). This question is deliberately open-ended because the right answer depends on scale and consistency requirements. Three main options: in-memory heap, Redis sorted set, segment tree. Each

Related

  • All Lessons
  • Practice Problems
  • Mock Interview Practice
  • Daily Challenges