Data Types: Intermediate

Databricks ingests raw CSV files from thousands of enterprise customers every day, and each file arrives with its own quirks: revenue columns stored as strings with dollar signs, timestamps in a dozen different formats, and boolean flags encoded as the letter Y. Their ETL pipelines use explicit type casting and validation at the point of ingestion to catch these issues before they flow downstream into financial models where a silent precision loss in a FLOAT column could corrupt a quarterly earnings calculation. The type conversion patterns you will learn in this lesson are the same ones production-grade pipelines use to ensure that every value arrives in the right type, with the right precision, from the very first row.

BOOLEAN and NULL logic

Daily Life
Interviews

Combine true/false with unknown values

In the beginner lesson, you learned that BOOLEAN stores TRUE or FALSE. But SQL booleans have a critical third state: NULL. This is called three-valued logic, and it breaks many assumptions that work in normal programming.

TRUE
Confirmed positive. Stored as 1 byte (0x01).
FALSE
Confirmed negative. Stored as 1 byte (0x00).
NULL
Unknown or undefined. Not true, not false.
//

Truth Table Surprises

NULL propagates through logical expressions in ways that surprise most developers. Any operation involving NULL produces NULL, not TRUE or FALSE:

Truth Table Surprises
  • TRUE AND NULLNULL. Could be TRUE AND TRUE or TRUE AND FALSE
  • FALSE AND NULLFALSE. FALSE AND anything is FALSE
  • TRUE OR NULLTRUE. TRUE OR anything is TRUE
  • FALSE OR NULLNULL. Could be FALSE OR TRUE or FALSE OR FALSE
  • NOT NULLNULL. NOT unknown is still unknown
  • NULL = NULLNULL. You cannot compare unknowns
1SELECT
2 TRUE
3 AND NULL AS and_unk,
4 FALSE
5 AND NULL AS and_f,
6 TRUE
7 OR NULL AS or_t,
8 FALSE
9 OR NULL AS or_unk,
10 NOT NULL AS not_unk,
11 NULL IS NULL AS is_null
12FROM users
13LIMIT 1
Result
and_unkand_for_tor_unknot_unkis_null
NULL01NULLNULL1
TIP
The most common NULL bug: WHERE status = NULL returns zero rows, because nothing equals NULL, not even NULL itself. Always use IS NULL or IS NOT NULL instead.

Boolean Filtering Patterns

Different filtering styles handle NULL values differently. Choose the pattern that matches your intent.

Shorter, but treats NULL the same as FALSE. Rows where is_active IS NULL are silently excluded.
1SELECT
2 user_id,
3 name
4FROM users
5WHERE is_active

Handling NULL correctly often pairs with type conversion. Try formatting a price column with exact decimal precision.

> Complete the query to format the price column as a fixed two-decimal number.

SELECT
  product_name,
  ___(price AS ___(10, 2)) AS formatted_price
FROM products
VARCHAR
TRY_CAST
DECIMAL
CAST

Three-valued logic also affects aggregations. COUNT(*) counts all rows including those with NULL values, but COUNT(column) skips NULL rows. This distinction changes results when NULL values are present.

The IS NULL and IS NOT NULL operators are the correct way to test for NULL. Using = NULL or != NULL always evaluates to NULL (unknown), causing those rows to be silently excluded from results.

Boolean filtering with WHERE is_active and WHERE is_active = TRUE behave identically for non-NULL values, but differ in clarity. Use whichever makes intent most obvious to the next reader.

DECIMAL precision and scale

Daily Life
Interviews

Store exact numbers for financial data

Floating-point numbers (FLOAT, DOUBLE) approximate values using binary fractions. This causes the famous 0.1 + 0.2 problem: the result is 0.30000000000000004, not 0.3. For financial calculations, use DECIMAL instead. It stores numbers as exact base-10 digits.

DECIMAL(p, s) Explained
  • DECIMAL(10,2): 10 total digits, 2 after the decimal point
  • Can store up to 99,999,999.99
  • Values are rounded to fit: 123.456 becomes 123.46
  • Scale cannot exceed precision
//

Common Configurations

Currency (USD)PercentageUnit priceScientific
Currency (USD)
DECIMAL(19,4)
Up to trillions, 4 d.p.
Percentage
DECIMAL(5,4)
0.0000 to 9.9999 range
Unit price
DECIMAL(10,2)
Up to 99,999,999.99
Scientific
DECIMAL(38,18)
Maximum precision avail.
1SELECT
2 CAST(123.456 AS DECIMAL(5, 2)) AS rounded,
3 CAST(0.0525 AS DECIMAL(5, 4)) AS pct,
4 CAST(10 AS DECIMAL(10, 4)) / 3 AS division,
5 CAST(99999.99 AS DECIMAL(7, 2)) AS max_price
Result
roundedpctdivisionmax_price
123.460.05253.333399999.99

Avoiding Numeric Pitfalls

Integer division and floating-point approximation cause subtle bugs in calculations.
Do
  • Cast at least one operand to DECIMAL before division: 10.0 / 4 = 2.5
  • Use explicit CAST(1 AS DECIMAL) / 3 to signal fractional intent
  • Check for integer division any time you see the / operator in queries
Don't
  • Divide two integers directly: 10 / 4 silently returns 2, not 2.5
  • Rely on implicit conversion for division results
  • Assume the database will warn you about truncated decimals
TIP
Use DECIMAL(19,4) for all financial calculations. The 4 decimal places handle sub-cent precision needed for tax calculations, currency conversion, and compound interest. Never use FLOAT for money.
Which type would you use for a financial total column?
DOUBLE uses binary fractions, so 0.1 + 0.2 = 0.30000000000000004. Over thousands of transactions, rounding errors accumulate into real money discrepancies.
1SELECT
2 order_id,
3 CAST(subtotal AS DOUBLE) + CAST(
4 tax
5 AS DOUBLE
6 ) AS total
7FROM orders
Choosing the right numeric type is critical for financial accuracy.

> Complete this query to calculate an exact financial total using the correct type.

SELECT
  order_id,
  ___(subtotal + tax AS ___(19, 4)) AS total
FROM orders
FLOAT
CAST
TRY_CAST
DECIMAL

DECIMAL(19,4) is the industry standard for financial calculations, providing 4 decimal places for sub-cent precision.

Rounding errors from FLOAT may seem small per row, but they accumulate across thousands of transactions into real money discrepancies.

When migrating from FLOAT to DECIMAL, always verify totals match before and after conversion to catch any accumulated drift.

TIMESTAMP vs TIMESTAMP WITH TIME ZONE

Daily Life
Interviews

Track moments across global systems

SQL provides two primary temporal types that seem similar but solve fundamentally different problems. Choosing the wrong one causes bugs that only appear when users span multiple time zones or when daylight saving time transitions happen.
TIMESTAMP
  • Calendar date + time, NO timezone
  • Stores value exactly as given
  • Does not adjust for DST
  • Use for: recurring schedules
TIMESTAMP WITH TIME ZONE
  • Absolute point in time + timezone
  • Stored internally as UTC
  • Converts to session TZ on read
  • Use for: event logs, audit trails
Choosing between these types depends on whether your value represents a calendar label or a specific moment in time.
TIMESTAMP for schedules
"Backup runs at 3am daily" stays 3am in every timezone.
WITH TIME ZONE for events
"User clicked buy at this exact moment" needs UTC precision.
DST transition danger
TIMESTAMP without timezone is ambiguous during DST: 2am happens twice.
Cross-timezone coordination
TIMESTAMP WITH TIME ZONE ensures "same moment" across all offices.
1SELECT
2 CURRENT_DATE AS today,
3 CURRENT_TIMESTAMP AS now_utc,
4 CURRENT_DATE + INTERVAL '1' DAY AS tomorrow,
5 CURRENT_DATE - INTERVAL '1' MONTH AS last_month,
6 DATE_FORMAT(
7 CURRENT_TIMESTAMP,
8 'yyyy-MM-dd'
9 ) AS formatted
Result
todaynow_utctomorrowlast_monthformatted
2025-01-152025-01-15 14:30:252025-01-162024-12-152025-01-15

Choosing Temporal Types

Select the right temporal type based on whether you need time precision and timezone awareness.
Casting to DATE discards time information. You lose the ability to distinguish morning vs evening events, and the CAST on every row prevents index usage.
1SELECT
2 user_id,
3 CAST(event_time AS DATE) AS event_day
4FROM user_events
5WHERE CAST(event_time AS DATE) = DATE '2025-03-15'

> Complete the query to extract just the date portion from order timestamps.

SELECT
  order_id,
  ___(created_at AS ___) AS order_date
FROM orders
TIMESTAMP
FORMAT
CAST
DATE

Storing the full TIMESTAMP and truncating to DATE at query time is the best practice. This preserves detail for future analysis while still supporting daily grouping when needed.

TIMESTAMP WITH TIME ZONE is preferred for event data in global applications. It stores the absolute UTC moment and converts to the session timezone only when displaying results.

Plain TIMESTAMP without timezone information is appropriate for calendar-based data like scheduled reports, recurring reminders, or business-hours definitions that should not shift with timezone.

Time zones and UTC handling

Daily Life
Interviews

Standardize times for distributed data

Time zones are one of the most error-prone areas in data engineering. A timestamp without timezone context is ambiguous. Is "2025-03-15 14:00:00" in UTC? Eastern Time? The server's timezone?
Incorrect assumptions lead to data shifted by hours, causing wrong aggregations and confused stakeholders.
Three Timezone Representations
  • UTC: Universal reference point, no DST ambiguity, monotonic
  • Named zones: America/New_York handles EST/EDT automatically
  • Fixed offsets: +05:00 is static, ignores DST changes
1SELECT
2 created_at,
3 created_at AT TIME ZONE 'UTC' AS utc_time
4FROM event_data
5WHERE created_at > TIMESTAMP '2025-01-01'
TIP
Daylight Saving Time trap: On DST transition days, local times can be ambiguous (2am occurs twice in fall) or non-existent (2am is skipped in spring). Storing UTC avoids this entirely.

Timezone Best Practices

Following these guidelines helps avoid timezone-related bugs in global applications.
Do
  • Store event timestamps in UTC
  • Use TIMESTAMP WITH TIME ZONE
  • Convert to user TZ only at display
  • Use ISO 8601: YYYY-MM-DD
Don't
  • Store local times without TZ info
  • Use plain TIMESTAMP for global event logs
  • Assume server timezone matches users
  • Use ambiguous formats like MM/DD/YY

> Complete this query to display event timestamps in the New York timezone.

SELECT
  event_id,
  ___ AT TIME ZONE ___ AS local_time
FROM event_data
event_timestamp
'EST'
event_type
'America/New_York'
Always use full IANA timezone names like America/New_York rather than abbreviations like EST, which are ambiguous and do not handle daylight saving transitions.
Storing timestamps in UTC and converting to local time at display is the most reliable strategy for global applications.
AT TIME ZONE adjusts both the displayed value and the timezone offset, ensuring correct representation across any region.

TRY_CAST and implicit casts

Daily Life
Interviews

Convert types without crashing on bad data

In the beginner lesson, you learned CAST for type conversion. But CAST crashes the entire query if any single value fails to convert. TRY_CAST is the production-safe alternative: it returns NULL instead of throwing an error.

1SELECT
2 CAST('123' AS INTEGER) AS works,
3 CAST('abc' AS INTEGER) AS fails
4FROM users
5LIMIT 1
1SELECT
2 CAST('123' AS INTEGER) AS string_to_int,
3 CAST('45.67' AS DOUBLE) AS string_to_double,
4 COALESCE(NULLIF('abc', 'abc'), '0') AS fallback,
5 123 AS int_value,
6 45.67 AS double_value,
7 'abc' AS string_value
8FROM users
9LIMIT 1
Result
string_to_intstring_to_doublefallbackint_valuedouble_valuestring_value
12345.67012345.67abc

Implicit Type Coercion

SQL automatically converts between compatible types in certain situations. This "implicit conversion" happens silently, which can be both convenient and dangerous.
BIT
INTEGER
DECIMAL
FLOAT

Example: INTEGER + DECIMAL = DECIMAL, because DECIMAL can represent all integers exactly.

> Complete the query to safely convert stored values to integers without crashing on bad data.

SELECT
  ___(kv_value AS ___)
FROM kv_store
TRY_CAST
CAST
VARCHAR
INTEGER

TRY_CAST combined with COALESCE provides a complete safety net: TRY_CAST handles conversion failures and COALESCE substitutes a default value so downstream queries never encounter unexpected NULLs.

Implicit type coercion happens silently and can cause queries on indexed columns to fall back to full table scans. Explicit CAST makes intent clear and avoids unintended performance regressions.

Validating and casting data at ingestion time is far more efficient than casting at every query. Store data in the correct type once during ETL so all subsequent reads pay no conversion cost.
PUTTING IT ALL TOGETHER

> You are a data analyst at Brex investigating a reported discrepancy in monthly revenue totals. Transaction subtotals and taxes stored as FLOAT are producing rounding errors that accumulate into real dollar differences across thousands of payments.

BOOLEAN NULL logic explains why some transactions with unknown status are silently excluded from revenue WHERE clauses instead of surfacing as a data quality alert.
DECIMAL(19,4) stores subtotal and tax amounts as exact base-10 digits, eliminating the floating-point rounding that is causing the revenue discrepancy.
TIMESTAMP WITH TIME ZONE ensures all payment events are stored in UTC so cross-region totals are not shifted by daylight saving transitions or server timezone assumptions.
TRY_CAST safely converts legacy VARCHAR amount fields from an acquired company without crashing the pipeline when values like N/A appear in the imported data.
KEY TAKEAWAYS
SQL BOOLEAN has THREE values: TRUE, FALSE, and NULL (unknown)
NULL = NULL returns NULL, not TRUE; always use IS NULL
DECIMAL(p,s) provides exact math; use DECIMAL(19,4) for financial data
Integer division trap: 10/4 = 2; cast to DECIMAL first for decimal results
Use TIMESTAMP WITH TIME ZONE for events; use plain TIMESTAMP for schedules
Store all timestamps in UTC; convert to user timezone only for display
TRY_CAST returns NULL on failure instead of crashing; use for untrusted data
Implicit type conversion can bypass indexes and cause 100-1000x slowdowns

Data Types: Intermediate

Where pennies vanish and NULLs defy

Category
SQL
Difficulty
intermediate
Duration
19 minutes
Challenges
0 hands-on challenges

Topics covered: BOOLEAN and NULL logic, DECIMAL precision and scale, TIMESTAMP vs TIMESTAMP WITH TIME ZONE, Time zones and UTC handling, TRY_CAST and implicit casts

Lesson Sections

  1. BOOLEAN and NULL logic

    Truth Table Surprises Boolean Filtering Patterns

  2. DECIMAL precision and scale (concepts: sqlDecimalType)

    Common Configurations Avoiding Numeric Pitfalls Integer division and floating-point approximation cause subtle bugs in calculations. Which type would you use for a financial total column? Choosing the right numeric type is critical for financial accuracy.

  3. TIMESTAMP vs TIMESTAMP WITH TIME ZONE (concepts: sqlTimestampType)

    SQL provides two primary temporal types that seem similar but solve fundamentally different problems. Choosing the wrong one causes bugs that only appear when users span multiple time zones or when daylight saving time transitions happen. Choosing between these types depends on whether your value represents a calendar label or a specific moment in time. Choosing Temporal Types Select the right temporal type based on whether you need time precision and timezone awareness.

  4. Time zones and UTC handling (concepts: sqlTimezones)

    Time zones are one of the most error-prone areas in data engineering. A timestamp without timezone context is ambiguous. Is "2025-03-15 14:00:00" in UTC? Eastern Time? The server's timezone? Incorrect assumptions lead to data shifted by hours, causing wrong aggregations and confused stakeholders. Timezone Best Practices Following these guidelines helps avoid timezone-related bugs in global applications. Always use full IANA timezone names like America/New_York rather than abbreviations like EST,

  5. TRY_CAST and implicit casts (concepts: sqlTryCast)

    Implicit Type Coercion SQL automatically converts between compatible types in certain situations. This "implicit conversion" happens silently, which can be both convenient and dangerous. Validating and casting data at ingestion time is far more efficient than casting at every query. Store data in the correct type once during ETL so all subsequent reads pay no conversion cost.