Schema Types
The FLOAT Money Bug
Pick integer or decimal for each column
| Expression | FLOAT Result | DECIMAL Result |
|---|---|---|
| 0.1 + 0.2 | 0.30000000000000004 | 0.30 |
| SUM of $0.10 across 1M rows | $100,000.000000014 | $100,000.00 |
| Revenue discrepancy at scale | Pennies to thousands | Zero |
The $0.01 rounding error does not matter on one row. On 50 million payment transactions per month, it compounds to thousands of dollars that cannot be reconciled with the bank statement.
When FLOAT Is Actually Fine
String Types & Platform Traps
Size text columns for real-world data
- Variable length, up to n characters
- Most common string type
- Length is enforced: inserts beyond n fail (Postgres) or truncate (MySQL)
- Storage is proportional to actual content, not declared length
- Fixed length, always n characters
- Pads with spaces to fill the declared length
- Comparisons must account for trailing spaces
- Rarely useful in modern schemas; prefer VARCHAR
Platform Differences That Bite
| Platform | VARCHAR Max | TEXT Type | Gotcha |
|---|---|---|---|
| PostgreSQL | 1 GB | TEXT = VARCHAR without limit | No performance difference between VARCHAR(255) and TEXT |
| MySQL | 65,535 bytes (row total) | TEXT is stored off-row | VARCHAR(255) is significantly faster than TEXT for indexes |
| Snowflake | 16 MB | VARCHAR = STRING = TEXT | All are aliases; no length enforcement |
| BigQuery | No limit | STRING only | No VARCHAR at all; everything is STRING |
UTF-8 and Character vs Byte Length
Temporal Types & DST
Model dates, times, and durations correctly
The DST Trap
- No timezone stored
- 2:30 AM on DST day is ambiguous or impossible
- Cannot convert between timezones reliably
- Looks correct until it is not
- Stored as UTC internally
- 2:30 AM ET is unambiguously stored as 7:30 AM UTC
- Converts correctly to any timezone
- Always use this for event timestamps
Platform Behavior
| Platform | TIMESTAMP | TIMESTAMPTZ | Recommendation |
|---|---|---|---|
| PostgreSQL | Stores as-is, no conversion | Converts input to UTC on write, converts to session TZ on read | Always use TIMESTAMPTZ |
| Snowflake | Stores as-is (TIMESTAMP_NTZ) | Stores with TZ (TIMESTAMP_TZ or _LTZ) | Use TIMESTAMP_TZ for event data |
| BigQuery | DATETIME (no TZ) | TIMESTAMP (always UTC) | Use TIMESTAMP for events, DATETIME for calendar dates |
| MySQL | Converts to UTC on write | Not supported (TIMESTAMP is implicitly TZ-aware) | MySQL TIMESTAMP is already TZ-aware but limited to 2038 |
MySQL's TIMESTAMP type overflows on January 19, 2038 (the Unix epoch limit). If you are building anything that stores dates beyond 2038, use DATETIME in MySQL. In every other database, use the timezone-aware type.
ENUM Traps
Use booleans, enums, and UUIDs effectively
The ALTER TABLE Problem
- Values baked into schema
- Adding values requires ALTER TABLE
- Removing/renaming values is painful
- Good for truly fixed sets (day_of_week, boolean)
- Values stored as data, not schema
- Adding values is a simple INSERT
- Removing values is a DELETE or soft-delete
- Scales to any number of values
The one exception where ENUMs are fine: truly immutable value sets like day_of_week or boolean_flag. If the value set has not changed in 20 years and will not change in the next 20, an ENUM is cleaner than a lookup table.
Type Review Framework
Enforce data quality through type choices
The Checklist
Common Type Smells
| Smell | Problem | Fix |
|---|---|---|
| FLOAT for prices | Rounding errors compound at scale | DECIMAL(10,2) or BIGINT cents |
| VARCHAR(255) everywhere | Arbitrary limit, often meaningless | Use platform-appropriate defaults; VARCHAR(255) in Postgres is fine, in MySQL it affects index size |
| TIMESTAMP without TZ | Ambiguous timezone; DST bugs | TIMESTAMPTZ stored as UTC |
| ENUM for growing value sets | ALTER TABLE required to add values | Lookup table with FK |
| Nullable BOOLEAN | Three-state logic: true/false/NULL | NOT NULL DEFAULT false, or use an ENUM with explicit 'unknown' |
The Operational Cost of Getting It Wrong
- Most type changes require full table rewrite. VARCHAR widening is metadata-only. INT to BIGINT is a rewrite.
- Most type changes are instant metadata operations. Exceptions: changing between numeric families (INT to VARCHAR).
- Type changes are not supported in-place. You must create a new column, backfill, drop the old one, rename.
- ALTER TABLE rewrites the table for almost all type changes. Use pt-online-schema-change for zero-downtime migrations.
> You are designing a payments table for a fintech product. The table stores transaction amounts, timestamps, and payment statuses.
Schema Types
Choosing the right box for every value
- Category
- Data Modeling
- Duration
- 22 minutes
- Challenges
- 12 hands-on challenges
Topics covered: The FLOAT Money Bug, String Types & Platform Traps, Temporal Types & DST, ENUM Traps, Type Review Framework
Lesson Sections
- The FLOAT Money Bug
FLOAT and DOUBLE store numbers in binary scientific notation. They cannot represent 0.1 exactly, just like 1/3 cannot be represented exactly in decimal. This is not a bug. It is how IEEE 754 floating-point works. The problem is that financial calculations accumulate these tiny errors across millions of rows until someone notices that the monthly revenue report is off by $4,700. The $0.01 rounding error does not matter on one row. On 50 million payment transactions per month, it compounds to thou
- String Types & Platform Traps
String types seem simple, but they differ across platforms in ways that cause real production issues. VARCHAR length limits, CHAR padding behavior, and TEXT vs VARCHAR semantics vary between PostgreSQL, MySQL, Snowflake, and BigQuery. Code that works on one platform can silently truncate data on another. Platform Differences That Bite UTF-8 and Character vs Byte Length VARCHAR(100) means 100 characters in PostgreSQL but 100 bytes in some MySQL configurations. A single emoji or CJK character can
- Temporal Types & DST (concepts: dmDataTypes)
Timestamps are the most dangerous data type in production systems because they look correct until daylight saving time happens, or until your pipeline runs in a different timezone than the system that generated the data. The most common bug: a pipeline running in UTC processes events timestamped in local time and shifts an entire day of data into the wrong partition. The DST Trap On the second Sunday of March in the US, clocks spring forward from 2:00 AM to 3:00 AM. The time 2:30 AM does not exi
- ENUM Traps
ENUM types look like a clean way to constrain a column to a fixed set of values: status can only be 'active', 'inactive', or 'suspended'. The problem is that the set of values is baked into the schema. Adding a new status requires an ALTER TABLE, which in some databases locks the table or requires a migration. In production, the set of valid values almost always grows. The ALTER TABLE Problem In PostgreSQL, adding a value to an ENUM type is fast (ALTER TYPE ... ADD VALUE). But removing or renami
- Type Review Framework
When reviewing a schema, apply the same checklist to every column. This catches type bugs before they reach production. The cost of fixing a type in design is a one-line change. The cost of fixing a type in production is a migration, a backfill, and a data reconciliation. The Checklist Common Type Smells The Operational Cost of Getting It Wrong Type changes on large tables are expensive. Changing a column from INT to BIGINT on a 1-billion-row table in PostgreSQL requires a full table rewrite. In