Schema Types

Choosing the wrong data type for a column is one of the most expensive mistakes in schema design. The bug hides for months, then surfaces as a $0.01 rounding error on 50 million transactions, or a timezone shift that double-counts an entire day of revenue. This lesson walks through the types that cause the most real-world problems: floating-point money, string encoding traps, timestamp timezone behavior, and the ENUM pitfalls that make schema migrations painful.

The FLOAT Money Bug

Daily Life

Interviews

Pick integer or decimal for each column

FLOATDECIMAL/NUMERICBIGINT Cents

FLOAT

Approximate Numeric

Stores numbers using IEEE 754 binary representation. Fast math, but cannot represent 0.1 exactly. Fine for scientific data; catastrophic for money.

DECIMAL/NUMERIC

Exact Numeric

Stores numbers as exact decimal values. Slower math, but 0.10 + 0.20 = 0.30 always. Required for financial data.

BIGINT Cents

Integer Money

Store money as cents in a BIGINT column. $19.99 becomes 1999. Zero rounding risk. This is how Stripe and most payment systems work.

FLOAT and DOUBLE store numbers in binary scientific notation. They cannot represent 0.1 exactly, just like 1/3 cannot be represented exactly in decimal. This is not a bug. It is how IEEE 754 floating-point works. The problem is that financial calculations accumulate these tiny errors across millions of rows until someone notices that the monthly revenue report is off by $4,700.

Expression	FLOAT Result	DECIMAL Result
0.1 + 0.2	0.30000000000000004	0.30
SUM of $0.10 across 1M rows	$100,000.000000014	$100,000.00
Revenue discrepancy at scale	Pennies to thousands	Zero

The $0.01 rounding error does not matter on one row. On 50 million payment transactions per month, it compounds to thousands of dollars that cannot be reconciled with the bank statement.

When FLOAT Is Actually Fine

FLOAT is not universally bad. It is the wrong choice for money, but the right choice for sensor readings, GPS coordinates, scientific measurements, and ML feature values where approximate representation is acceptable and performance matters. FLOAT operations are hardware-accelerated; DECIMAL operations are not.

Never use FLOAT for money, prices, balances, tax calculations, or anything that touches financial reconciliation.

DECIMAL(10,2) for most financial columns: 10 digits total, 2 after the decimal point.

BIGINT cents for payment systems: $19.99 stored as 1999. Divide by 100 at the presentation layer.

FLOAT for sensor readings, GPS coordinates, ML features, and scientific data where exact precision is not required.

TIP

If you inherit a table that stores money as FLOAT, the migration to DECIMAL is straightforward but must be done carefully. CAST(float_col AS DECIMAL(12,2)) can introduce rounding on the cast itself. Compare the SUM before and after migration and document any discrepancy.

String Types & Platform Traps

Daily Life

Interviews

Size text columns for real-world data

String types seem simple, but they differ across platforms in ways that cause real production issues. VARCHAR length limits, CHAR padding behavior, and TEXT vs VARCHAR semantics vary between PostgreSQL, MySQL, Snowflake, and BigQuery. Code that works on one platform can silently truncate data on another.

✓VARCHAR(n)

Variable length, up to n characters
Most common string type
Length is enforced: inserts beyond n fail (Postgres) or truncate (MySQL)
Storage is proportional to actual content, not declared length

⚠CHAR(n)

Fixed length, always n characters
Pads with spaces to fill the declared length
Comparisons must account for trailing spaces
Rarely useful in modern schemas; prefer VARCHAR

Platform Differences That Bite

Platform	VARCHAR Max	TEXT Type	Gotcha
PostgreSQL	1 GB	TEXT = VARCHAR without limit	No performance difference between VARCHAR(255) and TEXT
MySQL	65,535 bytes (row total)	TEXT is stored off-row	VARCHAR(255) is significantly faster than TEXT for indexes
Snowflake	16 MB	VARCHAR = STRING = TEXT	All are aliases; no length enforcement
BigQuery	No limit	STRING only	No VARCHAR at all; everything is STRING

MySQL silently truncates VARCHAR inserts that exceed the declared length in non-strict mode. You lose data without an error.

PostgreSQL treats VARCHAR(255) and TEXT identically for performance. Declaring a length limit is about data validation, not optimization.

In cloud warehouses (Snowflake, BigQuery), string length limits are effectively meaningless. Use them for documentation, not enforcement.

UTF-8 and Character vs Byte Length

VARCHAR(100) means 100 characters in PostgreSQL but 100 bytes in some MySQL configurations. A single emoji or CJK character can be 3-4 bytes in UTF-8. A VARCHAR(100) column in byte-mode MySQL can only hold 25 four-byte characters. Always confirm whether your database measures string length in characters or bytes.

TIP

When migrating between databases, check string length semantics. A VARCHAR(255) in PostgreSQL (character-counted) holds more data than VARCHAR(255) in MySQL with latin1 encoding (byte-counted with multi-byte characters). Test with real production data before cutting over.

Temporal Types & DST

Daily Life

Interviews

Model dates, times, and durations correctly

TIMESTAMPTIMESTAMPTZDATE

TIMESTAMP

No Timezone (Naive)

Stores a date and time with no timezone information. '2024-03-10 02:30:00' could be UTC, EST, or PST. You do not know. This ambiguity causes bugs.

TIMESTAMPTZ

With Timezone (Aware)

Stores a date and time normalized to UTC internally. Displays in the session's timezone. '2024-03-10 02:30:00-05:00' is unambiguous. Always prefer this.

DATE

Date Only

Year, month, day. No time component. Beware: 'today' depends on the timezone of the system generating it. March 10 in UTC is still March 9 in PST.

Timestamps are the most dangerous data type in production systems because they look correct until daylight saving time happens, or until your pipeline runs in a different timezone than the system that generated the data. The most common bug: a pipeline running in UTC processes events timestamped in local time and shifts an entire day of data into the wrong partition.

The DST Trap

On the second Sunday of March in the US, clocks spring forward from 2:00 AM to 3:00 AM. The time 2:30 AM does not exist. On the first Sunday of November, clocks fall back from 2:00 AM to 1:00 AM. The time 1:30 AM occurs twice. If your timestamp column is TIMESTAMP (naive), and someone records an event at '2024-03-10 02:30:00' in Eastern time, that time literally did not happen. Your system now has impossible data.

✗TIMESTAMP (Naive)

No timezone stored
2:30 AM on DST day is ambiguous or impossible
Cannot convert between timezones reliably
Looks correct until it is not

✓TIMESTAMPTZ (Aware)

Stored as UTC internally
2:30 AM ET is unambiguously stored as 7:30 AM UTC
Converts correctly to any timezone
Always use this for event timestamps

Platform Behavior

Platform	TIMESTAMP	TIMESTAMPTZ	Recommendation
PostgreSQL	Stores as-is, no conversion	Converts input to UTC on write, converts to session TZ on read	Always use TIMESTAMPTZ
Snowflake	Stores as-is (TIMESTAMP_NTZ)	Stores with TZ (TIMESTAMP_TZ or _LTZ)	Use TIMESTAMP_TZ for event data
BigQuery	DATETIME (no TZ)	TIMESTAMP (always UTC)	Use TIMESTAMP for events, DATETIME for calendar dates
MySQL	Converts to UTC on write	Not supported (TIMESTAMP is implicitly TZ-aware)	MySQL TIMESTAMP is already TZ-aware but limited to 2038

MySQL's TIMESTAMP type overflows on January 19, 2038 (the Unix epoch limit). If you are building anything that stores dates beyond 2038, use DATETIME in MySQL. In every other database, use the timezone-aware type.

TIP

The simplest rule: store all timestamps in UTC. Convert to local time only at the presentation layer (dashboard, API response). This eliminates every DST ambiguity and makes cross-timezone comparisons safe.

ENUM Traps

Daily Life

Interviews

Use booleans, enums, and UUIDs effectively

ENUM types look like a clean way to constrain a column to a fixed set of values: status can only be 'active', 'inactive', or 'suspended'. The problem is that the set of values is baked into the schema. Adding a new status requires an ALTER TABLE, which in some databases locks the table or requires a migration. In production, the set of valid values almost always grows.

The ALTER TABLE Problem

In PostgreSQL, adding a value to an ENUM type is fast (ALTER TYPE ... ADD VALUE). But removing or renaming a value is not supported. You must create a new type, migrate the column, and drop the old type. In MySQL, ALTER TABLE on an ENUM column rewrites the entire table. On a billion-row table, that is an hours-long operation that locks writes.

⚠ENUM Column

Values baked into schema
Adding values requires ALTER TABLE
Removing/renaming values is painful
Good for truly fixed sets (day_of_week, boolean)

✓Lookup Table + FK

Values stored as data, not schema
Adding values is a simple INSERT
Removing values is a DELETE or soft-delete
Scales to any number of values

order_status

PKstatus_id

status_name

is_active

orders

PKorder_id

customer_id

FKstatus_id

amount

A lookup table with a FK is the production-safe alternative to ENUMs. The set of valid values lives in a table row, not in the schema definition. Adding a new status is an INSERT, not an ALTER TABLE. This is the pattern used at virtually every company operating at scale.

The one exception where ENUMs are fine: truly immutable value sets like day_of_week or boolean_flag. If the value set has not changed in 20 years and will not change in the next 20, an ENUM is cleaner than a lookup table.

TIP

If you find yourself saying 'we might need to add a new value later,' do not use an ENUM. Use a lookup table. The cost of a lookup table is one extra JOIN. The cost of an ENUM migration on a large table is downtime.

Type Review Framework

Daily Life

Interviews

Enforce data quality through type choices

When reviewing a schema, apply the same checklist to every column. This catches type bugs before they reach production. The cost of fixing a type in design is a one-line change. The cost of fixing a type in production is a migration, a backfill, and a data reconciliation.

The Checklist

Numeric columns that touch money: is it DECIMAL or BIGINT cents? If FLOAT, flag it.

String columns: is there a length limit? Does the platform enforce it or silently truncate?

Timestamp columns: is it TIMESTAMPTZ (timezone-aware) or TIMESTAMP (naive)? If naive, how do you know the timezone?

ENUM columns: will the value set ever change? If yes, use a lookup table instead.

Boolean columns: are NULLs allowed? A nullable boolean has three states (true, false, unknown), which is almost never what you want.

Common Type Smells

Smell	Problem	Fix
FLOAT for prices	Rounding errors compound at scale	DECIMAL(10,2) or BIGINT cents
VARCHAR(255) everywhere	Arbitrary limit, often meaningless	Use platform-appropriate defaults; VARCHAR(255) in Postgres is fine, in MySQL it affects index size
TIMESTAMP without TZ	Ambiguous timezone; DST bugs	TIMESTAMPTZ stored as UTC
ENUM for growing value sets	ALTER TABLE required to add values	Lookup table with FK
Nullable BOOLEAN	Three-state logic: true/false/NULL	NOT NULL DEFAULT false, or use an ENUM with explicit 'unknown'

The Operational Cost of Getting It Wrong

Type changes on large tables are expensive. Changing a column from INT to BIGINT on a 1-billion-row table in PostgreSQL requires a full table rewrite. In Snowflake, type changes are metadata-only for some types (VARCHAR widening) but require a rewrite for others (INT to BIGINT). Always check your platform's ALTER TABLE behavior before declaring a type in production.

Most type changes require full table rewrite. VARCHAR widening is metadata-only. INT to BIGINT is a rewrite.
Most type changes are instant metadata operations. Exceptions: changing between numeric families (INT to VARCHAR).
Type changes are not supported in-place. You must create a new column, backfill, drop the old one, rename.
ALTER TABLE rewrites the table for almost all type changes. Use pt-online-schema-change for zero-downtime migrations.

TIP

Get the type right the first time. Review every column in your schema against this checklist before deploying to production. The 5 minutes spent reviewing saves weeks of migration work later.

❯❯❯PUTTING IT ALL TOGETHER

> You are designing a payments table for a fintech product. The table stores transaction amounts, timestamps, and payment statuses.

You store amount as BIGINT in cents (not FLOAT) because financial reconciliation requires exact arithmetic across millions of transactions.

You use TIMESTAMPTZ for created_at because the payment gateway operates across US timezones and DST transitions must not shift transactions between days.

You create a payment_status lookup table instead of an ENUM because the product team plans to add 'refund_pending' and 'chargeback' statuses next quarter.

KEY TAKEAWAYS

Never FLOAT for money: use DECIMAL(10,2) or BIGINT cents to avoid rounding errors

VARCHAR behavior varies by platform: MySQL truncates silently, PostgreSQL enforces, Snowflake ignores limits

Always TIMESTAMPTZ in UTC: naive timestamps cause DST bugs and cross-timezone comparison failures

ENUM for immutable sets only: use lookup tables for any value set that might grow

Review types before production: fixing a type in design is one line; fixing it in production is a migration

Choosing the right box for every value

Category: Data Modeling
Duration: 22 minutes
Challenges: 12 hands-on challenges

Topics covered: The FLOAT Money Bug, String Types & Platform Traps, Temporal Types & DST, ENUM Traps, Type Review Framework

Lesson Sections

The FLOAT Money Bug (concepts: dmDataTypes)
FLOAT and DOUBLE store numbers in binary scientific notation. They cannot represent 0.1 exactly, just like 1/3 cannot be represented exactly in decimal. This is not a bug. It is how IEEE 754 floating-point works. The problem is that financial calculations accumulate these tiny errors across millions of rows until someone notices that the monthly revenue report is off by $4,700. The $0.01 rounding error does not matter on one row. On 50 million payment transactions per month, it compounds to thou
String Types & Platform Traps (concepts: dmDataTypes)
String types seem simple, but they differ across platforms in ways that cause real production issues. VARCHAR length limits, CHAR padding behavior, and TEXT vs VARCHAR semantics vary between PostgreSQL, MySQL, Snowflake, and BigQuery. Code that works on one platform can silently truncate data on another. Platform Differences That Bite UTF-8 and Character vs Byte Length VARCHAR(100) means 100 characters in PostgreSQL but 100 bytes in some MySQL configurations. A single emoji or CJK character can
Temporal Types & DST (concepts: dmDataTypes)
Timestamps are the most dangerous data type in production systems because they look correct until daylight saving time happens, or until your pipeline runs in a different timezone than the system that generated the data. The most common bug: a pipeline running in UTC processes events timestamped in local time and shifts an entire day of data into the wrong partition. The DST Trap On the second Sunday of March in the US, clocks spring forward from 2:00 AM to 3:00 AM. The time 2:30 AM does not exi
ENUM Traps (concepts: dmDataTypes)
ENUM types look like a clean way to constrain a column to a fixed set of values: status can only be 'active', 'inactive', or 'suspended'. The problem is that the set of values is baked into the schema. Adding a new status requires an ALTER TABLE, which in some databases locks the table or requires a migration. In production, the set of valid values almost always grows. The ALTER TABLE Problem In PostgreSQL, adding a value to an ENUM type is fast (ALTER TYPE ... ADD VALUE). But removing or renami
Type Review Framework (concepts: dmDataTypes)
When reviewing a schema, apply the same checklist to every column. This catches type bugs before they reach production. The cost of fixing a type in design is a one-line change. The cost of fixing a type in production is a migration, a backfill, and a data reconciliation. The Checklist Common Type Smells The Operational Cost of Getting It Wrong Type changes on large tables are expensive. Changing a column from INT to BIGINT on a 1-billion-row table in PostgreSQL requires a full table rewrite. In