Loading lesson...
Choosing the right box for every value
Choosing the right box for every value
Topics covered: The FLOAT Money Bug, String Types & Platform Traps, Temporal Types & DST, ENUM Traps, Type Review Framework
FLOAT and DOUBLE store numbers in binary scientific notation. They cannot represent 0.1 exactly, just like 1/3 cannot be represented exactly in decimal. This is not a bug. It is how IEEE 754 floating-point works. The problem is that financial calculations accumulate these tiny errors across millions of rows until someone notices that the monthly revenue report is off by $4,700. The $0.01 rounding error does not matter on one row. On 50 million payment transactions per month, it compounds to thou
String types seem simple, but they differ across platforms in ways that cause real production issues. VARCHAR length limits, CHAR padding behavior, and TEXT vs VARCHAR semantics vary between PostgreSQL, MySQL, Snowflake, and BigQuery. Code that works on one platform can silently truncate data on another. Platform Differences That Bite UTF-8 and Character vs Byte Length VARCHAR(100) means 100 characters in PostgreSQL but 100 bytes in some MySQL configurations. A single emoji or CJK character can
Timestamps are the most dangerous data type in production systems because they look correct until daylight saving time happens, or until your pipeline runs in a different timezone than the system that generated the data. The most common bug: a pipeline running in UTC processes events timestamped in local time and shifts an entire day of data into the wrong partition. The DST Trap On the second Sunday of March in the US, clocks spring forward from 2:00 AM to 3:00 AM. The time 2:30 AM does not exi
ENUM types look like a clean way to constrain a column to a fixed set of values: status can only be 'active', 'inactive', or 'suspended'. The problem is that the set of values is baked into the schema. Adding a new status requires an ALTER TABLE, which in some databases locks the table or requires a migration. In production, the set of valid values almost always grows. The ALTER TABLE Problem In PostgreSQL, adding a value to an ENUM type is fast (ALTER TYPE ... ADD VALUE). But removing or renami
When reviewing a schema, apply the same checklist to every column. This catches type bugs before they reach production. The cost of fixing a type in design is a one-line change. The cost of fixing a type in production is a migration, a backfill, and a data reconciliation. The Checklist Common Type Smells The Operational Cost of Getting It Wrong Type changes on large tables are expensive. Changing a column from INT to BIGINT on a 1-billion-row table in PostgreSQL requires a full table rewrite. In