SQL Reference

SQL Date Functions

You'll reach for DATE_TRUNC more times in your first month on the job than any other date function. Cohort retention, weekly active users, monthly revenue rollups: all of them are one DATE_TRUNC call away. This page is the reference you'll keep open in a second tab. It covers Postgres, Snowflake, BigQuery, and MySQL syntax side by side so you don't have to hunt for the right flavor.

DATE_DIFF interview frequency: 6%
SQL dialects shown: 4
Date-focused problems: 78
Rounds analyzed: 1,042

Source: DataDriven analysis of 1,042 verified data engineering interview rounds.

DATE_TRUNC

DATE_TRUNC rounds a timestamp down to the specified precision. “Truncating to month” turns 2024-03-15 14:30:00 into 2024-03-01 00:00:00. This is the single most-used date function in analytical SQL because it powers every “group by month” and “group by week” query.

-- DATE_TRUNC(precision, timestamp)
-- PostgreSQL, Snowflake, Redshift
SELECT DATE_TRUNC('month', '2024-03-15 14:30:00'::timestamp);
-- 2024-03-01 00:00:00

SELECT DATE_TRUNC('week', '2024-03-15'::date);
-- 2024-03-11 (Monday of that week in PostgreSQL)

SELECT DATE_TRUNC('quarter', '2024-08-20'::date);
-- 2024-07-01

SELECT DATE_TRUNC('year', '2024-08-20'::date);
-- 2024-01-01

-- BigQuery uses DATE_TRUNC(date, part)
-- SELECT DATE_TRUNC(DATE '2024-03-15', MONTH);

-- MySQL has no DATE_TRUNC; use DATE_FORMAT
-- SELECT DATE_FORMAT('2024-03-15', '%Y-%m-01');
-- Most common pattern: group by month
SELECT
  DATE_TRUNC('month', order_date) AS month,
  COUNT(*) AS order_count,
  SUM(revenue) AS total_revenue
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;

Interview note: DATE_TRUNC appears in nearly every time-series interview question. If the question says “by month” or “by week,” reach for DATE_TRUNC first. It is cleaner than EXTRACT(MONTH FROM ...) for grouping because it preserves the year (January 2024 vs January 2025).

EXTRACT

EXTRACT pulls a single numeric component from a date or timestamp. Use it when you need just the year, month, day, hour, or day of week as a number.

-- EXTRACT(field FROM timestamp)
SELECT EXTRACT(YEAR FROM '2024-03-15'::date);    -- 2024
SELECT EXTRACT(MONTH FROM '2024-03-15'::date);   -- 3
SELECT EXTRACT(DAY FROM '2024-03-15'::date);     -- 15
SELECT EXTRACT(HOUR FROM '2024-03-15 14:30:00'::timestamp);  -- 14
SELECT EXTRACT(DOW FROM '2024-03-15'::date);     -- 5 (Friday, 0=Sunday)
SELECT EXTRACT(ISODOW FROM '2024-03-15'::date);  -- 5 (Friday, 1=Monday)
SELECT EXTRACT(EPOCH FROM '2024-03-15'::date);   -- Unix timestamp

-- Use case: find peak hours
SELECT
  EXTRACT(HOUR FROM event_time) AS hour_of_day,
  COUNT(*) AS event_count
FROM events
GROUP BY 1
ORDER BY 2 DESC;

-- Use case: day-of-week analysis
SELECT
  EXTRACT(ISODOW FROM order_date) AS day_of_week,
  CASE EXTRACT(ISODOW FROM order_date)
    WHEN 1 THEN 'Monday'
    WHEN 2 THEN 'Tuesday'
    WHEN 3 THEN 'Wednesday'
    WHEN 4 THEN 'Thursday'
    WHEN 5 THEN 'Friday'
    WHEN 6 THEN 'Saturday'
    WHEN 7 THEN 'Sunday'
  END AS day_name,
  COUNT(*) AS order_count
FROM orders
GROUP BY 1
ORDER BY 1;

Common mistake: Do not use EXTRACT(MONTH ...) for grouping time series. EXTRACT gives you just the month number (1-12), so January 2024 and January 2025 both become 1 and get grouped together. Use DATE_TRUNC for time-series grouping. EXTRACT is for cross-year analysis like “which month of the year has the most sales.”
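To see the collision concretely, here is a minimal Python sketch (the order dates are invented for illustration) that groups the same dates two ways, mirroring EXTRACT(MONTH ...) versus DATE_TRUNC('month', ...):

```python
from collections import Counter
from datetime import date

# Hypothetical order dates spanning two years
orders = [date(2024, 1, 5), date(2024, 1, 20), date(2025, 1, 3)]

# EXTRACT(MONTH ...)-style key: month number only
by_month_number = Counter(d.month for d in orders)

# DATE_TRUNC('month', ...)-style key: first day of the month,
# which preserves the year
by_truncated_month = Counter(d.replace(day=1) for d in orders)

print(by_month_number)     # Counter({1: 3}) -- both Januaries collide
print(by_truncated_month)  # two separate buckets, one per year
```

The month-number key merges January 2024 and January 2025 into one bucket; the truncated-date key keeps them apart, which is exactly why DATE_TRUNC is the right tool for time series.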

DATEDIFF and Date Arithmetic

Calculating the difference between two dates is one of the most common operations in data pipelines. The syntax varies more across databases than any other date function.

-- PostgreSQL: subtract dates directly
SELECT '2024-03-15'::date - '2024-01-01'::date;  -- 74 (integer days)

-- PostgreSQL: subtract timestamps (returns interval)
SELECT '2024-03-15 14:00'::timestamp - '2024-03-15 10:30'::timestamp;
-- '03:30:00'

-- Snowflake: DATEDIFF(unit, start, end)
SELECT DATEDIFF('day', '2024-01-01', '2024-03-15');     -- 74
SELECT DATEDIFF('month', '2024-01-01', '2024-03-15');   -- 2
SELECT DATEDIFF('hour', '2024-03-15 10:00', '2024-03-15 14:30');  -- 4

-- BigQuery: DATE_DIFF(end, start, unit)
-- SELECT DATE_DIFF(DATE '2024-03-15', DATE '2024-01-01', DAY);   -- 74

-- MySQL: DATEDIFF(end, start) returns days only
-- SELECT DATEDIFF('2024-03-15', '2024-01-01');          -- 74
-- For other units: TIMESTAMPDIFF(MONTH, start, end)

Date Arithmetic with INTERVAL

-- Add/subtract time using INTERVAL
SELECT CURRENT_DATE + INTERVAL '7 days';       -- 7 days from now
SELECT CURRENT_DATE - INTERVAL '1 month';      -- 1 month ago
SELECT CURRENT_TIMESTAMP + INTERVAL '3 hours'; -- 3 hours from now

-- Common pattern: filter last N days
SELECT *
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days';

-- Retention calculation: days between signup and first purchase
SELECT
  u.user_id,
  u.signup_date,
  MIN(o.order_date) AS first_order_date,
  MIN(o.order_date) - u.signup_date AS days_to_first_order
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id, u.signup_date;

Interview pattern: Retention and cohort analysis questions always involve date arithmetic. “What percentage of users who signed up in January made a purchase within 7 days?” requires subtracting signup_date from order_date and checking if the result is <= 7.

CURRENT_DATE and CURRENT_TIMESTAMP

These functions return the current date or timestamp at query execution time. They are used for filtering recent data, calculating ages, and stamping records.

-- Current date and time (no parentheses in standard SQL)
SELECT CURRENT_DATE;       -- 2024-03-15
SELECT CURRENT_TIMESTAMP;  -- 2024-03-15 14:30:00+00

-- Snowflake also supports:
-- SELECT CURRENT_DATE();
-- SELECT CURRENT_TIMESTAMP();
-- SELECT GETDATE();       -- SQL Server

-- Common patterns
-- Orders from today
SELECT * FROM orders WHERE order_date = CURRENT_DATE;

-- Orders from the last 24 hours
SELECT * FROM orders
WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '24 hours';

-- Account age in days
SELECT
  user_id,
  signup_date,
  CURRENT_DATE - signup_date AS account_age_days
FROM users;

-- Stamp an ETL load
INSERT INTO load_log (table_name, loaded_at, row_count)
VALUES ('orders', CURRENT_TIMESTAMP, 1523);

Timezone Handling

Timezones are one of the hardest problems in data engineering. The rules change between databases, daylight saving time creates ambiguity, and a single wrong assumption can shift every metric by a day. Here is the practical approach.

-- PostgreSQL: AT TIME ZONE converts between timezones
-- timestamp WITHOUT time zone -> interprets as the specified zone
SELECT '2024-03-15 14:30:00'::timestamp AT TIME ZONE 'America/New_York';
-- 2024-03-15 18:30:00+00  (interpreted as Eastern; shown here in a UTC session)

-- timestamptz -> converts to display in the specified zone
SELECT '2024-03-15 18:30:00+00'::timestamptz AT TIME ZONE 'America/New_York';
-- 2024-03-15 14:30:00  (displays in Eastern)

-- Snowflake: CONVERT_TIMEZONE
SELECT CONVERT_TIMEZONE('UTC', 'America/New_York', created_at);

-- BigQuery: TIMESTAMP functions with timezone
-- SELECT TIMESTAMP('2024-03-15 14:30:00', 'America/New_York');

-- Group by local day (not UTC day)
SELECT
  DATE_TRUNC(
    'day',
    created_at AT TIME ZONE 'America/New_York'
  ) AS local_day,
  COUNT(*) AS order_count
FROM orders
GROUP BY 1
ORDER BY 1;

Rule 1: Store in UTC

All timestamps in your warehouse should be stored in UTC. This removes ambiguity. Convert to local time only at the query layer (for dashboards and reports) or in the application layer. Storing in local time creates daylight saving bugs that are extremely hard to debug.

Rule 2: Use Named Timezones

Use 'America/New_York' instead of 'EST' or '-05:00'. Named timezones automatically handle daylight saving transitions. Fixed offsets do not. EST is always UTC-5, but New York is UTC-5 in winter and UTC-4 in summer.
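The difference is easy to demonstrate with Python's stdlib zoneinfo (Python 3.9+; on systems without an OS timezone database you may need the tzdata package). The dates are arbitrary examples:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

ny = ZoneInfo("America/New_York")
fixed_est = timezone(timedelta(hours=-5))  # a fixed -05:00 offset, like 'EST'

winter = datetime(2024, 1, 15, 12, 0, tzinfo=ny)
summer = datetime(2024, 7, 15, 12, 0, tzinfo=ny)

# The named zone tracks daylight saving time automatically
print(winter.utcoffset() == timedelta(hours=-5))  # True (standard time)
print(summer.utcoffset() == timedelta(hours=-4))  # True (daylight time)

# The fixed offset does not: noon New York time in July
# reads as 11:00 under a hard-coded -05:00
print(summer.astimezone(fixed_est).hour)  # 11
```

The same failure mode applies in SQL: a hard-coded offset silently shifts every summer timestamp by an hour.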

Rule 3: Convert Before Truncating

If your business operates in US Eastern and you need daily metrics, convert to Eastern BEFORE applying DATE_TRUNC. Otherwise, an order placed at 11 PM Eastern (4 AM UTC next day) lands in the wrong day.
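A small Python sketch of that exact failure, using an invented order timestamp: truncating the UTC value puts the order on the wrong business day, while converting first recovers the correct one.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

eastern = ZoneInfo("America/New_York")

# An order placed at 11 PM Eastern on March 15, stored in UTC
order_utc = datetime(2024, 3, 15, 23, 0, tzinfo=eastern).astimezone(timezone.utc)

# Truncating in UTC puts the order on March 16 -- the wrong business day
utc_day = order_utc.date()

# Converting to Eastern first recovers the correct day
local_day = order_utc.astimezone(eastern).date()

print(utc_day)    # 2024-03-16
print(local_day)  # 2024-03-15
```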

Date Functions in Data Pipelines

These patterns appear in production dbt models, ETL jobs, and analytical queries.

Cohort Analysis

-- Monthly cohort retention
-- AGE()'s month field wraps at 12, so combine the year and month parts
SELECT
  DATE_TRUNC('month', u.signup_date) AS cohort_month,
  EXTRACT(YEAR FROM AGE(o.order_date, u.signup_date)) * 12
    + EXTRACT(MONTH FROM AGE(o.order_date, u.signup_date)) AS months_since_signup,
  COUNT(DISTINCT o.user_id) AS active_users
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY 1, 2
ORDER BY 1, 2;

Rolling Aggregations

-- 7-day rolling average revenue
SELECT
  date,
  revenue,
  AVG(revenue) OVER (
    ORDER BY date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS rolling_7d_avg
FROM daily_revenue
ORDER BY date;

Freshness Monitoring

-- Check data freshness per table
SELECT
  table_name,
  MAX(loaded_at) AS last_load,
  EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - MAX(loaded_at))) / 3600
    AS hours_since_last_load,
  CASE
    WHEN MAX(loaded_at) < CURRENT_TIMESTAMP - INTERVAL '2 hours'
      THEN 'STALE'
    ELSE 'FRESH'
  END AS freshness_status
FROM load_log
GROUP BY table_name
ORDER BY hours_since_last_load DESC;

3 Date Function Interview Questions

When you get a date question in an interview, the trick is to slow down and figure out what the interviewer is really checking. Is it your syntax? Your handling of gaps? Your ability to reason about time zones? Look for the phrase that gives the pattern away ("by month," "within 7 days," "consecutive"), then write the query. The three examples below cover the patterns that come up again and again.

Q1: Write a query to show monthly revenue for the last 12 months, including months with zero revenue.

What they test:

DATE_TRUNC for monthly grouping, date spine generation for filling gaps, COALESCE for zero-filling.

Approach:

Generate a date spine of 12 months using generate_series or a recursive CTE. LEFT JOIN the orders table (aggregated by month) onto the spine. COALESCE the revenue to 0 for months with no orders.
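One way to sketch the spine-plus-LEFT-JOIN approach so it actually runs: SQLite via Python's stdlib. SQLite has neither generate_series nor DATE_TRUNC, so a recursive CTE builds the spine and strftime('%Y-%m', ...) stands in for monthly truncation. The table and data are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (order_date TEXT, revenue REAL);
INSERT INTO orders VALUES
  ('2024-01-15', 100), ('2024-01-20', 50), ('2024-03-02', 75);
""")

rows = con.execute("""
WITH RECURSIVE months(m) AS (
  SELECT '2024-01-01'
  UNION ALL
  SELECT date(m, '+1 month') FROM months WHERE m < '2024-12-01'
)
SELECT
  strftime('%Y-%m', m) AS month,
  COALESCE(SUM(o.revenue), 0) AS total_revenue  -- zero-fill empty months
FROM months
LEFT JOIN orders o ON strftime('%Y-%m', o.order_date) = strftime('%Y-%m', m)
GROUP BY 1
ORDER BY 1
""").fetchall()

for month, revenue in rows:
    print(month, revenue)  # February shows 0, not a missing row
```

The shape is identical in PostgreSQL with generate_series and DATE_TRUNC; only the spine-generation syntax changes.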

Q2: Calculate the 7-day retention rate for users who signed up in January 2024.

What they test:

Date arithmetic (signup_date + 7), date range filtering, ratio calculation with conditional counting.

Approach:

Filter users where signup_date is in January 2024. LEFT JOIN to activity table where activity_date is between signup_date and signup_date + 7. Count distinct users with activity divided by total users.
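A runnable sketch of that approach in SQLite via Python's stdlib, with invented users and activity (this version counts activity strictly after signup and up to 7 days later; clarify the boundary convention with your interviewer):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users (user_id INTEGER, signup_date TEXT);
CREATE TABLE activity (user_id INTEGER, activity_date TEXT);
INSERT INTO users VALUES (1, '2024-01-05'), (2, '2024-01-10'), (3, '2024-02-01');
INSERT INTO activity VALUES (1, '2024-01-08'), (2, '2024-01-25');
""")

(rate,) = con.execute("""
SELECT
  COUNT(DISTINCT a.user_id) * 1.0 / COUNT(DISTINCT u.user_id) AS retention_7d
FROM users u
LEFT JOIN activity a
  ON a.user_id = u.user_id
 AND a.activity_date > u.signup_date
 AND a.activity_date <= date(u.signup_date, '+7 days')  -- 7-day window
WHERE u.signup_date >= '2024-01-01' AND u.signup_date < '2024-02-01'
""").fetchone()

print(rate)  # 0.5: user 1 returned within 7 days, user 2 did not
```

The `* 1.0` forces float division, a classic gotcha when computing ratios from integer counts.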

Q3: Find the average time between consecutive orders for each customer.

What they test:

LAG window function with date subtraction, then averaging the intervals. Tests both date arithmetic and window function skills.

Approach:

Use LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) to get the previous order date. Subtract to get days between orders. Then AVG the differences per customer. Handle customers with only one order (NULL interval).
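A runnable sketch of that approach, again in SQLite via Python's stdlib (window functions require SQLite 3.25+, which ships with modern Python builds); julianday() subtraction stands in for date arithmetic, and the data is invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (customer_id INTEGER, order_date TEXT);
INSERT INTO orders VALUES
  (1, '2024-01-01'), (1, '2024-01-11'), (1, '2024-01-31'),
  (2, '2024-02-01');
""")

rows = con.execute("""
WITH gaps AS (
  SELECT
    customer_id,
    julianday(order_date)
      - julianday(LAG(order_date) OVER (
          PARTITION BY customer_id ORDER BY order_date
        )) AS days_since_prev   -- NULL for each customer's first order
  FROM orders
)
SELECT customer_id, AVG(days_since_prev) AS avg_days_between_orders
FROM gaps
GROUP BY customer_id
ORDER BY customer_id
""").fetchall()

print(rows)  # [(1, 15.0), (2, None)]
```

Note how the single-order customer falls out naturally: their only gap is NULL, AVG ignores NULLs, and the average comes back NULL rather than a misleading number.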

SQL Date Functions FAQ

What are the most important SQL date functions for data engineers?
DATE_TRUNC and EXTRACT are the two most-used date functions in data engineering. DATE_TRUNC rounds a timestamp down to a specified precision (day, week, month, quarter, year) and is essential for time-series grouping. EXTRACT pulls a specific component (year, month, day, hour) from a timestamp. After those, DATEDIFF (or date subtraction), CURRENT_DATE/CURRENT_TIMESTAMP, and INTERVAL are the next most common. These five functions cover 95% of date manipulation needs in analytical SQL.
How do date functions differ across SQL databases?
The core concepts are the same but the syntax varies. DATE_TRUNC works in PostgreSQL, Snowflake, BigQuery, and Redshift (BigQuery reverses the argument order). MySQL does not have DATE_TRUNC; use DATE_FORMAT or date arithmetic instead. Date differences vary the most: MySQL uses DATEDIFF(end, start) and returns days only, SQL Server and Snowflake use DATEDIFF(unit, start, end), BigQuery uses DATE_DIFF(end, start, unit), and PostgreSQL uses simple subtraction (date1 - date2). EXTRACT is mostly consistent, but the field names vary (DAYOFWEEK vs DOW vs ISODOW). Always check your specific database's documentation.
How should I handle timezones in SQL?
Store timestamps in UTC. Always. Then convert to local time at query time using AT TIME ZONE (PostgreSQL, Snowflake) or CONVERT_TIMEZONE (Snowflake) or TIMESTAMP(timestamp, timezone) (BigQuery). This avoids ambiguity around daylight saving time transitions. When grouping by day, specify whose day you mean: DATE_TRUNC('day', created_at AT TIME ZONE 'America/New_York') groups by US Eastern days, not UTC days. In interviews, if you mention timezone handling unprompted, it signals real-world pipeline experience.

Time-Series Queries, One At A Time

You'll get comfortable with DATE_TRUNC and window-based date math faster than you expect. Start with one problem tonight.