Schema Types

Choosing the wrong data type for a column is one of the most expensive mistakes in schema design. The bug hides for months, then surfaces as a $0.01 rounding error on 50 million transactions, or a timezone shift that double-counts an entire day of revenue. This lesson walks through the types that cause the most real-world problems: floating-point money, string encoding traps, timestamp timezone behavior, and the ENUM pitfalls that make schema migrations painful.

The FLOAT Money Bug

Daily Life
Interviews

Pick integer or decimal for each column

FLOATDECIMAL/NUMERICBIGINT Cents
FLOAT
Approximate Numeric
Stores numbers using IEEE 754 binary representation. Fast math, but cannot represent 0.1 exactly. Fine for scientific data; catastrophic for money.
DECIMAL/NUMERIC
Exact Numeric
Stores numbers as exact decimal values. Slower math, but 0.10 + 0.20 = 0.30 always. Required for financial data.
BIGINT Cents
Integer Money
Store money as cents in a BIGINT column. $19.99 becomes 1999. Zero rounding risk. This is how Stripe and most payment systems work.
FLOAT and DOUBLE store numbers in binary scientific notation. They cannot represent 0.1 exactly, just like 1/3 cannot be represented exactly in decimal. This is not a bug. It is how IEEE 754 floating-point works. The problem is that financial calculations accumulate these tiny errors across millions of rows until someone notices that the monthly revenue report is off by $4,700.
ExpressionFLOAT ResultDECIMAL Result
0.1 + 0.20.300000000000000040.30
SUM of $0.10 across 1M rows$100,000.000000014$100,000.00
Revenue discrepancy at scalePennies to thousandsZero

The $0.01 rounding error does not matter on one row. On 50 million payment transactions per month, it compounds to thousands of dollars that cannot be reconciled with the bank statement.

When FLOAT Is Actually Fine

FLOAT is not universally bad. It is the wrong choice for money, but the right choice for sensor readings, GPS coordinates, scientific measurements, and ML feature values where approximate representation is acceptable and performance matters. FLOAT operations are hardware-accelerated; DECIMAL operations are not.
Never use FLOAT for money, prices, balances, tax calculations, or anything that touches financial reconciliation.
DECIMAL(10,2) for most financial columns: 10 digits total, 2 after the decimal point.
BIGINT cents for payment systems: $19.99 stored as 1999. Divide by 100 at the presentation layer.
FLOAT for sensor readings, GPS coordinates, ML features, and scientific data where exact precision is not required.
TIP
If you inherit a table that stores money as FLOAT, the migration to DECIMAL is straightforward but must be done carefully. CAST(float_col AS DECIMAL(12,2)) can introduce rounding on the cast itself. Compare the SUM before and after migration and document any discrepancy.

String Types & Platform Traps

Daily Life
Interviews

Size text columns for real-world data

String types seem simple, but they differ across platforms in ways that cause real production issues. VARCHAR length limits, CHAR padding behavior, and TEXT vs VARCHAR semantics vary between PostgreSQL, MySQL, Snowflake, and BigQuery. Code that works on one platform can silently truncate data on another.
VARCHAR(n)
  • Variable length, up to n characters
  • Most common string type
  • Length is enforced: inserts beyond n fail (Postgres) or truncate (MySQL)
  • Storage is proportional to actual content, not declared length
CHAR(n)
  • Fixed length, always n characters
  • Pads with spaces to fill the declared length
  • Comparisons must account for trailing spaces
  • Rarely useful in modern schemas; prefer VARCHAR

Platform Differences That Bite

PlatformVARCHAR MaxTEXT TypeGotcha
PostgreSQL1 GBTEXT = VARCHAR without limitNo performance difference between VARCHAR(255) and TEXT
MySQL65,535 bytes (row total)TEXT is stored off-rowVARCHAR(255) is significantly faster than TEXT for indexes
Snowflake16 MBVARCHAR = STRING = TEXTAll are aliases; no length enforcement
BigQueryNo limitSTRING onlyNo VARCHAR at all; everything is STRING
MySQL silently truncates VARCHAR inserts that exceed the declared length in non-strict mode. You lose data without an error.
PostgreSQL treats VARCHAR(255) and TEXT identically for performance. Declaring a length limit is about data validation, not optimization.
In cloud warehouses (Snowflake, BigQuery), string length limits are effectively meaningless. Use them for documentation, not enforcement.

UTF-8 and Character vs Byte Length

VARCHAR(100) means 100 characters in PostgreSQL but 100 bytes in some MySQL configurations. A single emoji or CJK character can be 3-4 bytes in UTF-8. A VARCHAR(100) column in byte-mode MySQL can only hold 25 four-byte characters. Always confirm whether your database measures string length in characters or bytes.
TIP
When migrating between databases, check string length semantics. A VARCHAR(255) in PostgreSQL (character-counted) holds more data than VARCHAR(255) in MySQL with latin1 encoding (byte-counted with multi-byte characters). Test with real production data before cutting over.

Temporal Types & DST

Daily Life
Interviews

Model dates, times, and durations correctly

TIMESTAMPTIMESTAMPTZDATE
TIMESTAMP
No Timezone (Naive)
Stores a date and time with no timezone information. '2024-03-10 02:30:00' could be UTC, EST, or PST. You do not know. This ambiguity causes bugs.
TIMESTAMPTZ
With Timezone (Aware)
Stores a date and time normalized to UTC internally. Displays in the session's timezone. '2024-03-10 02:30:00-05:00' is unambiguous. Always prefer this.
DATE
Date Only
Year, month, day. No time component. Beware: 'today' depends on the timezone of the system generating it. March 10 in UTC is still March 9 in PST.
Timestamps are the most dangerous data type in production systems because they look correct until daylight saving time happens, or until your pipeline runs in a different timezone than the system that generated the data. The most common bug: a pipeline running in UTC processes events timestamped in local time and shifts an entire day of data into the wrong partition.

The DST Trap

On the second Sunday of March in the US, clocks spring forward from 2:00 AM to 3:00 AM. The time 2:30 AM does not exist. On the first Sunday of November, clocks fall back from 2:00 AM to 1:00 AM. The time 1:30 AM occurs twice. If your timestamp column is TIMESTAMP (naive), and someone records an event at '2024-03-10 02:30:00' in Eastern time, that time literally did not happen. Your system now has impossible data.
TIMESTAMP (Naive)
  • No timezone stored
  • 2:30 AM on DST day is ambiguous or impossible
  • Cannot convert between timezones reliably
  • Looks correct until it is not
TIMESTAMPTZ (Aware)
  • Stored as UTC internally
  • 2:30 AM ET is unambiguously stored as 7:30 AM UTC
  • Converts correctly to any timezone
  • Always use this for event timestamps

Platform Behavior

PlatformTIMESTAMPTIMESTAMPTZRecommendation
PostgreSQLStores as-is, no conversionConverts input to UTC on write, converts to session TZ on readAlways use TIMESTAMPTZ
SnowflakeStores as-is (TIMESTAMP_NTZ)Stores with TZ (TIMESTAMP_TZ or _LTZ)Use TIMESTAMP_TZ for event data
BigQueryDATETIME (no TZ)TIMESTAMP (always UTC)Use TIMESTAMP for events, DATETIME for calendar dates
MySQLConverts to UTC on writeNot supported (TIMESTAMP is implicitly TZ-aware)MySQL TIMESTAMP is already TZ-aware but limited to 2038

MySQL's TIMESTAMP type overflows on January 19, 2038 (the Unix epoch limit). If you are building anything that stores dates beyond 2038, use DATETIME in MySQL. In every other database, use the timezone-aware type.

TIP
The simplest rule: store all timestamps in UTC. Convert to local time only at the presentation layer (dashboard, API response). This eliminates every DST ambiguity and makes cross-timezone comparisons safe.

ENUM Traps

Daily Life
Interviews

Use booleans, enums, and UUIDs effectively

ENUM types look like a clean way to constrain a column to a fixed set of values: status can only be 'active', 'inactive', or 'suspended'. The problem is that the set of values is baked into the schema. Adding a new status requires an ALTER TABLE, which in some databases locks the table or requires a migration. In production, the set of valid values almost always grows.

The ALTER TABLE Problem

In PostgreSQL, adding a value to an ENUM type is fast (ALTER TYPE ... ADD VALUE). But removing or renaming a value is not supported. You must create a new type, migrate the column, and drop the old type. In MySQL, ALTER TABLE on an ENUM column rewrites the entire table. On a billion-row table, that is an hours-long operation that locks writes.
ENUM Column
  • Values baked into schema
  • Adding values requires ALTER TABLE
  • Removing/renaming values is painful
  • Good for truly fixed sets (day_of_week, boolean)
Lookup Table + FK
  • Values stored as data, not schema
  • Adding values is a simple INSERT
  • Removing values is a DELETE or soft-delete
  • Scales to any number of values
order_statusPKstatus_idstatus_nameis_activeordersPKorder_idcustomer_idFKstatus_idamount
A lookup table with a FK is the production-safe alternative to ENUMs. The set of valid values lives in a table row, not in the schema definition. Adding a new status is an INSERT, not an ALTER TABLE. This is the pattern used at virtually every company operating at scale.

The one exception where ENUMs are fine: truly immutable value sets like day_of_week or boolean_flag. If the value set has not changed in 20 years and will not change in the next 20, an ENUM is cleaner than a lookup table.

TIP
If you find yourself saying 'we might need to add a new value later,' do not use an ENUM. Use a lookup table. The cost of a lookup table is one extra JOIN. The cost of an ENUM migration on a large table is downtime.

Type Review Framework

Daily Life
Interviews

Enforce data quality through type choices

When reviewing a schema, apply the same checklist to every column. This catches type bugs before they reach production. The cost of fixing a type in design is a one-line change. The cost of fixing a type in production is a migration, a backfill, and a data reconciliation.

The Checklist

Numeric columns that touch money: is it DECIMAL or BIGINT cents? If FLOAT, flag it.
String columns: is there a length limit? Does the platform enforce it or silently truncate?
Timestamp columns: is it TIMESTAMPTZ (timezone-aware) or TIMESTAMP (naive)? If naive, how do you know the timezone?
ENUM columns: will the value set ever change? If yes, use a lookup table instead.
Boolean columns: are NULLs allowed? A nullable boolean has three states (true, false, unknown), which is almost never what you want.

Common Type Smells

SmellProblemFix
FLOAT for pricesRounding errors compound at scaleDECIMAL(10,2) or BIGINT cents
VARCHAR(255) everywhereArbitrary limit, often meaninglessUse platform-appropriate defaults; VARCHAR(255) in Postgres is fine, in MySQL it affects index size
TIMESTAMP without TZAmbiguous timezone; DST bugsTIMESTAMPTZ stored as UTC
ENUM for growing value setsALTER TABLE required to add valuesLookup table with FK
Nullable BOOLEANThree-state logic: true/false/NULLNOT NULL DEFAULT false, or use an ENUM with explicit 'unknown'

The Operational Cost of Getting It Wrong

Type changes on large tables are expensive. Changing a column from INT to BIGINT on a 1-billion-row table in PostgreSQL requires a full table rewrite. In Snowflake, type changes are metadata-only for some types (VARCHAR widening) but require a rewrite for others (INT to BIGINT). Always check your platform's ALTER TABLE behavior before declaring a type in production.
  • Most type changes require full table rewrite. VARCHAR widening is metadata-only. INT to BIGINT is a rewrite.
  • Most type changes are instant metadata operations. Exceptions: changing between numeric families (INT to VARCHAR).
  • Type changes are not supported in-place. You must create a new column, backfill, drop the old one, rename.
  • ALTER TABLE rewrites the table for almost all type changes. Use pt-online-schema-change for zero-downtime migrations.
TIP
Get the type right the first time. Review every column in your schema against this checklist before deploying to production. The 5 minutes spent reviewing saves weeks of migration work later.
PUTTING IT ALL TOGETHER

> You are designing a payments table for a fintech product. The table stores transaction amounts, timestamps, and payment statuses.

You store amount as BIGINT in cents (not FLOAT) because financial reconciliation requires exact arithmetic across millions of transactions.
You use TIMESTAMPTZ for created_at because the payment gateway operates across US timezones and DST transitions must not shift transactions between days.
You create a payment_status lookup table instead of an ENUM because the product team plans to add 'refund_pending' and 'chargeback' statuses next quarter.
KEY TAKEAWAYS
Never FLOAT for money: use DECIMAL(10,2) or BIGINT cents to avoid rounding errors
VARCHAR behavior varies by platform: MySQL truncates silently, PostgreSQL enforces, Snowflake ignores limits
Always TIMESTAMPTZ in UTC: naive timestamps cause DST bugs and cross-timezone comparison failures
ENUM for immutable sets only: use lookup tables for any value set that might grow
Review types before production: fixing a type in design is one line; fixing it in production is a migration

Schema Types

Choosing the right box for every value

Category
Data Modeling
Duration
22 minutes
Challenges
12 hands-on challenges

Topics covered: The FLOAT Money Bug, String Types & Platform Traps, Temporal Types & DST, ENUM Traps, Type Review Framework

Lesson Sections

  1. The FLOAT Money Bug

    FLOAT and DOUBLE store numbers in binary scientific notation. They cannot represent 0.1 exactly, just like 1/3 cannot be represented exactly in decimal. This is not a bug. It is how IEEE 754 floating-point works. The problem is that financial calculations accumulate these tiny errors across millions of rows until someone notices that the monthly revenue report is off by $4,700. The $0.01 rounding error does not matter on one row. On 50 million payment transactions per month, it compounds to thou

  2. String Types & Platform Traps

    String types seem simple, but they differ across platforms in ways that cause real production issues. VARCHAR length limits, CHAR padding behavior, and TEXT vs VARCHAR semantics vary between PostgreSQL, MySQL, Snowflake, and BigQuery. Code that works on one platform can silently truncate data on another. Platform Differences That Bite UTF-8 and Character vs Byte Length VARCHAR(100) means 100 characters in PostgreSQL but 100 bytes in some MySQL configurations. A single emoji or CJK character can

  3. Temporal Types & DST (concepts: dmDataTypes)

    Timestamps are the most dangerous data type in production systems because they look correct until daylight saving time happens, or until your pipeline runs in a different timezone than the system that generated the data. The most common bug: a pipeline running in UTC processes events timestamped in local time and shifts an entire day of data into the wrong partition. The DST Trap On the second Sunday of March in the US, clocks spring forward from 2:00 AM to 3:00 AM. The time 2:30 AM does not exi

  4. ENUM Traps

    ENUM types look like a clean way to constrain a column to a fixed set of values: status can only be 'active', 'inactive', or 'suspended'. The problem is that the set of values is baked into the schema. Adding a new status requires an ALTER TABLE, which in some databases locks the table or requires a migration. In production, the set of valid values almost always grows. The ALTER TABLE Problem In PostgreSQL, adding a value to an ENUM type is fast (ALTER TYPE ... ADD VALUE). But removing or renami

  5. Type Review Framework

    When reviewing a schema, apply the same checklist to every column. This catches type bugs before they reach production. The cost of fixing a type in design is a one-line change. The cost of fixing a type in production is a migration, a backfill, and a data reconciliation. The Checklist Common Type Smells The Operational Cost of Getting It Wrong Type changes on large tables are expensive. Changing a column from INT to BIGINT on a 1-billion-row table in PostgreSQL requires a full table rewrite. In