SQL Reference

SQL Window Functions Cheat Sheet

Most candidates memorize ROW_NUMBER and call themselves comfortable with window functions. Interviewers don't buy it. Window functions appear in roughly 35% of senior-level DE interviews we've analyzed, and the ones that actually separate strong candidates from weak ones are never the simple ranking questions. They're the frame clause questions, the LAG with a default value questions, and the "what does PARTITION BY without ORDER BY even do" questions.

This isn't a reference card for the things you already know. It's a reference card for the things you think you know, and a quiet check on the ones you don't.

35%

Senior rounds use them

21%

Use PARTITION BY

15%

Use ROW_NUMBER

9%

Aggregate windows

Source: DataDriven analysis of 1,042 verified data engineering interview rounds.

Window Function Syntax

function_name(args) OVER (
  [PARTITION BY partition_columns]
  [ORDER BY sort_columns [ASC|DESC]]
  [frame_clause]
)

PARTITION BY: Divides rows into groups. The function resets for each group. Omitting it treats the entire result set as one partition.

ORDER BY: Defines the row order within each partition. Required for ranking and offset functions. Optional for aggregate functions (but changes their behavior when present).

Frame clause: Defines which rows within the partition are included in the calculation. Only applies to aggregate functions and FIRST_VALUE / LAST_VALUE.

Ranking Functions

Don't default to ROW_NUMBER just because it's the one you remember. The interviewer asking for "the top three by salary per department" is usually checking whether you reach for DENSE_RANK or RANK when ties matter, and whether you know the difference. ROW_NUMBER with a tied ORDER BY produces nondeterministic output. That's a bug, not a feature, and it's the first thing a sharp interviewer probes.

ROW_NUMBER()

Assigns a unique sequential integer to each row. Ties get different numbers (arbitrary tiebreak).

-- Number each employee within their department by salary
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC)
-- Result: 1, 2, 3, 4 (no gaps, no ties)

RANK()

Like ROW_NUMBER but ties get the same rank. Leaves gaps after ties.

-- Rank employees by salary (gaps after ties)
RANK() OVER (PARTITION BY dept ORDER BY salary DESC)
-- Result: 1, 2, 2, 4 (gap: 3 is skipped)

DENSE_RANK()

Like RANK but no gaps. Ties get the same rank. The next distinct value gets the next integer.

-- Dense rank: no gaps after ties
DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC)
-- Result: 1, 2, 2, 3 (no gap)

NTILE(n)

Divides the partition into n approximately equal groups and assigns a group number (1 to n) to each row.

-- Divide employees into 4 salary quartiles
NTILE(4) OVER (ORDER BY salary)
-- Result: 1, 1, 2, 2, 3, 3, 4, 4 (for 8 rows)

Aggregate Window Functions

Standard aggregate functions work as window functions when used with OVER(). Without ORDER BY, they compute over the entire partition. With ORDER BY, they compute a running aggregate (using the default frame).

SUM()

-- Running total of revenue by date
SUM(revenue) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING)

-- Total revenue per department (on every row)
SUM(revenue) OVER (PARTITION BY dept)

-- 7-day rolling sum
SUM(revenue) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)

AVG()

-- Running average
AVG(salary) OVER (ORDER BY hire_date ROWS UNBOUNDED PRECEDING)

-- 30-day moving average
AVG(revenue) OVER (ORDER BY date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW)

-- Department average (compare each row to its group)
AVG(salary) OVER (PARTITION BY dept)

COUNT()

-- Running count
COUNT(*) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING)

-- Total rows in partition
COUNT(*) OVER (PARTITION BY dept)

-- Count of non-null values in a window
COUNT(email) OVER (PARTITION BY dept)

MIN() / MAX()

-- Running minimum
MIN(price) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING)

-- Max salary in each department
MAX(salary) OVER (PARTITION BY dept)

-- Sliding window min (3-row window)
MIN(temperature) OVER (ORDER BY timestamp ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)

Offset Functions

Offset functions access a value from a different row relative to the current row. They require ORDER BY and do not use a frame clause (except FIRST_VALUE and LAST_VALUE).

LAG(column, offset, default)

Returns the value from a previous row. Offset defaults to 1. Default value used when there is no previous row.

-- Previous day's revenue
LAG(revenue, 1, 0) OVER (ORDER BY date)

-- Revenue from 7 days ago
LAG(revenue, 7) OVER (ORDER BY date)

-- Day-over-day change
revenue - LAG(revenue, 1) OVER (ORDER BY date) AS daily_change

LEAD(column, offset, default)

Returns the value from a subsequent row. Mirror of LAG.

-- Next day's revenue
LEAD(revenue, 1) OVER (ORDER BY date)

-- Time to next event (session analysis)
LEAD(event_time) OVER (PARTITION BY user_id ORDER BY event_time)
  - event_time AS time_to_next

FIRST_VALUE(column)

Returns the first value in the window frame. Respects the frame clause.

-- First order date per customer
FIRST_VALUE(order_date) OVER (
  PARTITION BY customer_id ORDER BY order_date
  ROWS UNBOUNDED PRECEDING
)

-- Earliest hire in each department
FIRST_VALUE(name) OVER (
  PARTITION BY dept ORDER BY hire_date
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)

LAST_VALUE(column)

Returns the last value in the window frame. Important: the default frame ends at CURRENT ROW, so you usually need to specify ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

-- Most recent order per customer (needs full frame)
LAST_VALUE(order_date) OVER (
  PARTITION BY customer_id ORDER BY order_date
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)

-- WRONG: default frame stops at current row
LAST_VALUE(order_date) OVER (
  PARTITION BY customer_id ORDER BY order_date
)  -- this gives the CURRENT row, not the last

NTH_VALUE(column, n)

Returns the nth value in the window frame. Useful for getting the second-highest, third-most-recent, etc.

-- Second highest salary per department
NTH_VALUE(salary, 2) OVER (
  PARTITION BY dept ORDER BY salary DESC
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)

Distribution Functions

PERCENT_RANK()

Relative rank as a percentage: (rank - 1) / (total_rows - 1). Returns 0.0 for the first row, 1.0 for the last.

PERCENT_RANK() OVER (ORDER BY salary)
-- Salary at 50th percentile gets ~0.5

CUME_DIST()

Cumulative distribution: fraction of rows with values less than or equal to the current row. Always returns a value between 0 (exclusive) and 1 (inclusive).

CUME_DIST() OVER (ORDER BY score)
-- 0.25 means 25% of rows have a score <= this row's score

Frame Clause Reference

The frame clause defines which rows are included in the window calculation. It only applies to aggregate functions and FIRST_VALUE / LAST_VALUE / NTH_VALUE.

{ROWS | RANGE | GROUPS} BETWEEN
  {UNBOUNDED PRECEDING | n PRECEDING | CURRENT ROW}
AND
  {CURRENT ROW | n FOLLOWING | UNBOUNDED FOLLOWING}
FrameMeaningUse Case
ROWS UNBOUNDED PRECEDINGFrom partition start to current rowRunning total, running count
ROWS BETWEEN 6 PRECEDING AND CURRENT ROWCurrent row + 6 rows before7-day rolling average
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWINGPrevious row, current, next rowSmoothing, 3-point average
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWINGEntire partitionPartition total on every row
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWINGCurrent row to partition endReverse running total

Interview note: The most common interview mistake with window functions is forgetting the default frame. When ORDER BY is present, the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This means LAST_VALUE returns the current row, not the actual last row. Always specify the frame explicitly when using FIRST_VALUE, LAST_VALUE, or NTH_VALUE.

PARTITION BY + ORDER BY Combos

PatternOVER ClauseResult
Global totalOVER ()Same value on every row
Group totalOVER (PARTITION BY dept)Same value within each dept
Global running totalOVER (ORDER BY date)Cumulative across all rows
Group running totalOVER (PARTITION BY dept ORDER BY date)Cumulative within each dept
Group rankingOVER (PARTITION BY dept ORDER BY salary DESC)Rank resets per dept

Common Interview Patterns

Top-N Per Group

WITH ranked AS (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY dept ORDER BY salary DESC
    ) AS rn
  FROM employees
)
SELECT * FROM ranked WHERE rn <= 3;

Year-over-Year Comparison

SELECT
  month,
  revenue,
  LAG(revenue, 12) OVER (ORDER BY month) AS prev_year,
  revenue - LAG(revenue, 12) OVER (ORDER BY month) AS yoy_change
FROM monthly_revenue;

Percentage of Group Total

SELECT
  dept,
  name,
  salary,
  ROUND(
    100.0 * salary / SUM(salary) OVER (PARTITION BY dept),
    1
  ) AS pct_of_dept
FROM employees;

Consecutive Streaks

-- Find consecutive days of positive revenue
WITH flagged AS (
  SELECT
    date,
    revenue,
    date - ROW_NUMBER() OVER (ORDER BY date)::INT AS grp
  FROM daily_metrics
  WHERE revenue > 0
)
SELECT MIN(date) AS streak_start, MAX(date) AS streak_end,
  COUNT(*) AS streak_length
FROM flagged
GROUP BY grp
ORDER BY streak_length DESC;

Window Functions FAQ

What is a window function in SQL?+
A window function computes a value for each row based on a set of rows related to that row (the 'window'). Unlike GROUP BY, which collapses rows into groups, window functions keep every row in the result and add the computed value as a new column. The window is defined by PARTITION BY (which rows to include) and ORDER BY (how to order them within the partition). Window functions include ranking functions (ROW_NUMBER, RANK, DENSE_RANK), aggregate functions (SUM, AVG, COUNT, MIN, MAX), offset functions (LAG, LEAD, FIRST_VALUE, LAST_VALUE), and distribution functions (NTILE, PERCENT_RANK, CUME_DIST).
What is the difference between ROWS and RANGE in a frame clause?+
ROWS counts individual rows. RANGE considers the value of the ORDER BY expression. With ROWS BETWEEN 2 PRECEDING AND CURRENT ROW, the frame always includes exactly 3 rows. With RANGE BETWEEN 2 PRECEDING AND CURRENT ROW, the frame includes all rows whose ORDER BY value is within 2 of the current row's value, which could be more or fewer than 3 rows depending on duplicates and gaps. For most use cases, ROWS is more predictable. RANGE is useful when you want value-based windows (like 'all rows within 7 days of the current row').
What is the default frame clause when ORDER BY is specified?+
When ORDER BY is present and no explicit frame clause is given, the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This means the window extends from the first row in the partition up to and including all rows with the same ORDER BY value as the current row. For SUM, this produces a running total. Be aware that RANGE (not ROWS) is the default: if there are ties in the ORDER BY column, all tied rows get the same running total. To get a strict row-by-row cumulative sum, explicitly specify ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

The 35% of senior rounds you can't bluff through

Candidates who memorize the syntax still get caught on frame defaults and tied orderings. The fix is reps against a real engine where the wrong answer actually looks wrong.