Complexity at Scale: Distributed Systems

Everything changes when your data lives on multiple machines. The Big O analysis you have learned so far assumes that accessing any piece of data takes the same amount of time. On a single computer, that is roughly true. But in a distributed system such as Spark, Snowflake, or BigQuery, some data is local (on the same machine) and some is remote (on a different machine across the network). Depending on where the local data sits (CPU cache or main memory), accessing remote data can be roughly 1,000 to 1,000,000 times slower than accessing local data. This single fact reshapes how you think about complexity.

Network I/O: The New Bottleneck

On a single machine, the CPU is usually the bottleneck: an O(n²) algorithm is slow because it performs too many computations. In a distributed system, the network is almost always the bottleneck. An algorithm that touches every machine once,