Data Types: Intermediate
Databricks ingests raw CSV files from thousands of enterprise customers every day, and each file arrives with its own quirks: revenue columns stored as strings with dollar signs, timestamps in a dozen different formats, and boolean flags encoded as the letter Y. Their ETL pipelines use explicit type casting and validation at the point of ingestion to catch these issues before they flow downstream into financial models where a silent precision loss in a FLOAT column could corrupt a quarterly earnings calculation. The type conversion patterns you will learn in this lesson are the same ones production-grade pipelines use to ensure that every value arrives in the right type, with the right precision, from the very first row.
BOOLEAN and NULL logic
Combine true/false with unknown values
In the beginner lesson, you learned that BOOLEAN stores TRUE or FALSE. But SQL booleans have a critical third state: NULL. This is called three-valued logic, and it breaks many assumptions that work in normal programming.
Truth Table Surprises
NULL propagates through logical expressions in ways that surprise most developers. Any operation involving NULL produces NULL, not TRUE or FALSE:
TRUE AND NULL→NULL. Could beTRUEANDTRUEorTRUEANDFALSEFALSE AND NULL→FALSE.FALSEANDanything isFALSETRUE OR NULL→TRUE.TRUEORanything isTRUEFALSE OR NULL→NULL. Could beFALSEORTRUEorFALSEORFALSENOT NULL→NULL.NOTunknown is still unknownNULL = NULL→NULL. You cannot compare unknowns
NULL bug: WHERE status = NULL returns zero rows, because nothing equals NULL, not even NULL itself. Always use IS NULL or IS NOT NULL instead.Boolean Filtering Patterns
Different filtering styles handle NULL values differently. Choose the pattern that matches your intent.
Handling NULL correctly often pairs with type conversion. Try formatting a price column with exact decimal precision.
> Complete the query to format the price column as a fixed two-decimal number.
SELECT product_name, (price AS (10, 2)) AS formatted_price FROM products
Three-valued logic also affects aggregations. COUNT(*) counts all rows including those with NULL values, but COUNT(column) skips NULL rows. This distinction changes results when NULL values are present.
The IS NULL and IS NOT NULL operators are the correct way to test for NULL. Using = NULL or != NULL always evaluates to NULL (unknown), causing those rows to be silently excluded from results.
Boolean filtering with WHERE is_active and WHERE is_active = TRUE behave identically for non-NULL values, but differ in clarity. Use whichever makes intent most obvious to the next reader.
DECIMAL precision and scale
Store exact numbers for financial data
Floating-point numbers (FLOAT, DOUBLE) approximate values using binary fractions. This causes the famous 0.1 + 0.2 problem: the result is 0.30000000000000004, not 0.3. For financial calculations, use DECIMAL instead. It stores numbers as exact base-10 digits.
DECIMAL(10,2): 10 total digits, 2 after the decimal point- Can store up to 99,999,999.99
- Values are rounded to fit: 123.456 becomes 123.46
- Scale cannot exceed precision
Common Configurations
Avoiding Numeric Pitfalls
- Cast at least one operand to DECIMAL before division: 10.0 / 4 = 2.5
- Use explicit CAST(1 AS DECIMAL) / 3 to signal fractional intent
- Check for integer division any time you see the / operator in queries
- Divide two integers directly: 10 / 4 silently returns 2, not 2.5
- Rely on implicit conversion for division results
- Assume the database will warn you about truncated decimals
DECIMAL(19,4) for all financial calculations. The 4 decimal places handle sub-cent precision needed for tax calculations, currency conversion, and compound interest. Never use FLOAT for money.> Complete this query to calculate an exact financial total using the correct type.
SELECT order_id, (subtotal + tax AS (19, 4)) AS total FROM orders
DECIMAL(19,4) is the industry standard for financial calculations, providing 4 decimal places for sub-cent precision.
Rounding errors from FLOAT may seem small per row, but they accumulate across thousands of transactions into real money discrepancies.
When migrating from FLOAT to DECIMAL, always verify totals match before and after conversion to catch any accumulated drift.
TIMESTAMP vs TIMESTAMP WITH TIME ZONE
Track moments across global systems
- Calendar date + time, NO timezone
- Stores value exactly as given
- Does not adjust for DST
- Use for: recurring schedules
- Absolute point in time + timezone
- Stored internally as UTC
- Converts to session TZ on read
- Use for: event logs, audit trails
Choosing Temporal Types
> Complete the query to extract just the date portion from order timestamps.
SELECT order_id, (created_at AS ) AS order_date FROM orders
Storing the full TIMESTAMP and truncating to DATE at query time is the best practice. This preserves detail for future analysis while still supporting daily grouping when needed.
TIMESTAMP WITH TIME ZONE is preferred for event data in global applications. It stores the absolute UTC moment and converts to the session timezone only when displaying results.
Plain TIMESTAMP without timezone information is appropriate for calendar-based data like scheduled reports, recurring reminders, or business-hours definitions that should not shift with timezone.
Time zones and UTC handling
Standardize times for distributed data
- UTC: Universal reference point, no DST ambiguity, monotonic
- Named zones: America/New_York handles EST/EDT automatically
- Fixed offsets: +05:00 is static, ignores DST changes
Timezone Best Practices
- Store event timestamps in UTC
- Use TIMESTAMP WITH TIME ZONE
- Convert to user TZ only at display
- Use ISO 8601: YYYY-MM-DD
- Store local times without TZ info
- Use plain TIMESTAMP for global event logs
- Assume server timezone matches users
- Use ambiguous formats like MM/DD/YY
> Complete this query to display event timestamps in the New York timezone.
SELECT event_id, AT TIME ZONE AS local_time FROM event_data
TRY_CAST and implicit casts
Convert types without crashing on bad data
In the beginner lesson, you learned CAST for type conversion. But CAST crashes the entire query if any single value fails to convert. TRY_CAST is the production-safe alternative: it returns NULL instead of throwing an error.
Implicit Type Coercion
Example: INTEGER + DECIMAL = DECIMAL, because DECIMAL can represent all integers exactly.
> Complete the query to safely convert stored values to integers without crashing on bad data.
SELECT (kv_value AS ) FROM kv_store
TRY_CAST combined with COALESCE provides a complete safety net: TRY_CAST handles conversion failures and COALESCE substitutes a default value so downstream queries never encounter unexpected NULLs.
Implicit type coercion happens silently and can cause queries on indexed columns to fall back to full table scans. Explicit CAST makes intent clear and avoids unintended performance regressions.
> You are a data analyst at Brex investigating a reported discrepancy in monthly revenue totals. Transaction subtotals and taxes stored as FLOAT are producing rounding errors that accumulate into real dollar differences across thousands of payments.
BOOLEAN NULL logic explains why some transactions with unknown status are silently excluded from revenue WHERE clauses instead of surfacing as a data quality alert.DECIMAL(19,4) stores subtotal and tax amounts as exact base-10 digits, eliminating the floating-point rounding that is causing the revenue discrepancy.TIMESTAMP WITH TIME ZONE ensures all payment events are stored in UTC so cross-region totals are not shifted by daylight saving transitions or server timezone assumptions.TRY_CAST safely converts legacy VARCHAR amount fields from an acquired company without crashing the pipeline when values like N/A appear in the imported data.BOOLEAN has THREE values: TRUE, FALSE, and NULL (unknown)NULL = NULL returns NULL, not TRUE; always use IS NULLDECIMAL(p,s) provides exact math; use DECIMAL(19,4) for financial dataDECIMAL first for decimal resultsTIMESTAMP WITH TIME ZONE for events; use plain TIMESTAMP for schedulesTRY_CAST returns NULL on failure instead of crashing; use for untrusted dataData Types: Intermediate
Where pennies vanish and NULLs defy
- Category
- SQL
- Difficulty
- intermediate
- Duration
- 19 minutes
- Challenges
- 0 hands-on challenges
Topics covered: BOOLEAN and NULL logic, DECIMAL precision and scale, TIMESTAMP vs TIMESTAMP WITH TIME ZONE, Time zones and UTC handling, TRY_CAST and implicit casts
Lesson Sections
- BOOLEAN and NULL logic
Truth Table Surprises Boolean Filtering Patterns
- DECIMAL precision and scale (concepts: sqlDecimalType)
Common Configurations Avoiding Numeric Pitfalls Integer division and floating-point approximation cause subtle bugs in calculations. Which type would you use for a financial total column? Choosing the right numeric type is critical for financial accuracy.
- TIMESTAMP vs TIMESTAMP WITH TIME ZONE (concepts: sqlTimestampType)
SQL provides two primary temporal types that seem similar but solve fundamentally different problems. Choosing the wrong one causes bugs that only appear when users span multiple time zones or when daylight saving time transitions happen. Choosing between these types depends on whether your value represents a calendar label or a specific moment in time. Choosing Temporal Types Select the right temporal type based on whether you need time precision and timezone awareness.
- Time zones and UTC handling (concepts: sqlTimezones)
Time zones are one of the most error-prone areas in data engineering. A timestamp without timezone context is ambiguous. Is "2025-03-15 14:00:00" in UTC? Eastern Time? The server's timezone? Incorrect assumptions lead to data shifted by hours, causing wrong aggregations and confused stakeholders. Timezone Best Practices Following these guidelines helps avoid timezone-related bugs in global applications. Always use full IANA timezone names like America/New_York rather than abbreviations like EST,
- TRY_CAST and implicit casts (concepts: sqlTryCast)
Implicit Type Coercion SQL automatically converts between compatible types in certain situations. This "implicit conversion" happens silently, which can be both convenient and dangerous. Validating and casting data at ingestion time is far more efficient than casting at every query. Store data in the correct type once during ETL so all subsequent reads pay no conversion cost.