Window Functions: Intermediate

Robinhood shows every user a 30-day moving average price line and a daily portfolio change percentage, and those rolling calculations are computed in SQL using LAG, LEAD, and ROWS BETWEEN frame specifications rather than in application code. By computing trailing averages and period-over-period deltas directly in the database, Robinhood's dashboards update in milliseconds instead of seconds, even as millions of users simultaneously check their portfolios during volatile trading sessions. The intermediate window function techniques in this lesson are the exact patterns powering that kind of real-time financial analytics.

LAG and LEAD

Daily Life

Interviews

Compare each row to the one before or after

LAG and LEAD let you look backward and forward across rows without leaving your query. Every dataset has a first row with nothing before it, and that boundary is where period-over-period comparisons can quietly break.

THE CASE OF THE DISAPPEARING GROWTH

Nadia Reyes came in first thing this morning. Her growth report covers six months of revenue data. January is blank where every other month has a number. The source data is clean, no errors, no missing records. Nobody can explain the gap.

monthly_revenue

month_num	month	revenue
1	Jan	45000
2	Feb	52000
3	Mar	48000
4	Apr	61000
5	May	58000
6	Jun	72000

LAG and LEAD are window functions that reach backward or forward to access values from other rows without a self-join. They are essential for comparing a row to the row before or after it.

Think of your result set as a spreadsheet. When you are on row 5, LAG lets you peek at row 4 (or row 3, or any row above), and LEAD lets you peek at row 6 (or row 7, or any row below). The database does not need to join the table to itself or use a correlated subquery. It simply remembers what it already processed and knows what comes next.

LAG and LEAD Fundamentals

Understanding the syntax and behavior of these functions is essential before applying them to real analytics scenarios.

The Syntax

	SELECT
	sale_month,
	revenue,
	LAG(revenue, 1) OVER (
	ORDER BY sale_month
	) AS prev_revenue
	FROM transactions

LAG accepts up to three arguments: the column to read, an optional offset (defaults to 1), and an optional default value for when there is no previous row. LEAD works identically but looks forward instead of backward.

Why Not a Self-Join?

Before window functions existed, comparing a row to its predecessor required joining a table to itself. You would match each row to the previous row by offsetting a date or sequence column. This works, but it has real costs.

•Self-Join Approach

Scans the table twice
Silently drops rows with missing data
Complex join conditions to maintain

•LAG / LEAD

Single ordered pass over data
Returns NULL or your default for gaps
Clean, readable syntax

Month-Over-Month Comparison

The most common use of LAG is period-over-period comparison. By subtracting the previous period value from the current one, you get the change between periods in a single pass.

	SELECT
	sale_month,
	revenue,
	LAG(revenue, 1) OVER (
	ORDER BY sale_month
	) AS prev_month,
	revenue - LAG(revenue, 1) OVER (
	ORDER BY sale_month
	) AS mom_change
	FROM transactions

Result

sale_month	revenue	prev_month	mom_change
2025-01	42000	NULL	NULL
2025-02	45000	42000	3000
2025-03	39000	45000	-6000
2025-04	51000	39000	12000

The first row has NULL for prev_month because there is no row before January. The arithmetic with NULL also produces NULL for mom_change. You can handle this with COALESCE or by providing a default value in the LAG call itself.

Practical Applications

Beyond simple comparisons, LAG and LEAD unlock powerful patterns for forecasting, change detection, and anomaly identification.

LEAD for Change Detection

While LAG is used to look backward, LEAD is equally valuable for forward-looking analysis. A common use case is detecting upcoming changes. For example, if you have a table of subscription plan changes, LEAD can show what plan a customer will switch to next, letting you identify potential downgrades before they happen.

	SELECT
	customer_id,
	plan_name,
	change_date,
	LEAD(plan_name, 1) OVER (
	PARTITION BY customer_id
	ORDER BY change_date
	) AS next_plan,
	LEAD(change_date, 1) OVER (
	PARTITION BY customer_id
	ORDER BY change_date
	) AS next_change_date
	FROM plan_changes

Result

customer_id	plan_name	change_date	next_plan	next_change_date
C1	Premium	2025-01-01	Basic	2025-04-01
C1	Basic	2025-04-01	NULL	NULL
C2	Basic	2025-02-15	Premium	2025-03-15
C2	Premium	2025-03-15	NULL	NULL

Customer C1 is about to downgrade from Premium to Basic. Customer C2 upgraded from Basic to Premium. The NULL values in next_plan indicate the most recent plan, the one they are still on. This kind of forward-looking analysis powers churn prediction models and retention campaigns.

Detecting Sequential Gaps

LAG is also powerful for detecting gaps or anomalies in sequences. If you expect events to occur daily, you can flag days where the gap between consecutive events is larger than expected. This is critical in monitoring pipelines where a missing day of data could indicate a broken ETL job or a sensor failure.

	SELECT
	event_date,
	sensor_id,
	LAG(event_date, 1) OVER (
	PARTITION BY sensor_id
	ORDER BY event_date
	) AS prev_date,
	DATE_DIFF(
	'day',
	LAG(event_date, 1) OVER (
	PARTITION BY sensor_id
	ORDER BY event_date
	),
	event_date
	) AS days_since_last
	FROM sensor_readings

Result

event_date	sensor_id	prev_date	days_since_last
2025-03-01	S1	NULL	NULL
2025-03-02	S1	2025-03-01	1
2025-03-05	S1	2025-03-02	3
2025-03-06	S1	2025-03-05	1

The row with days_since_last = 3 reveals a gap: sensor S1 skipped March 3rd and 4th. You could wrap this in a WHERE clause (via a subquery) to filter only the rows where the gap exceeds your threshold. In a production pipeline, this pattern feeds automated alerting systems that notify engineers when data stops flowing.

TIP

LAG and LEAD always require ORDER BY inside the OVER clause. Without ordering, "previous" and "next" have no meaning. The database will reject the query or produce unpredictable results.

Larger Offsets

The offset parameter lets you look further back or ahead. LAG(revenue, 12) looks 12 rows back, useful for year-over-year comparison with monthly data:

	SELECT
	sale_month,
	revenue,
	LAG(revenue, 12) OVER (
	ORDER BY sale_month
	) AS same_month_last_year,
	revenue - LAG(revenue, 12) OVER (
	ORDER BY sale_month
	) AS yoy_change
	FROM transactions

Result

sale_month	revenue	same_month_last_year	yoy_change
2024-01	38000	NULL	NULL
2024-02	41000	NULL	NULL
2025-01	42000	38000	4000
2025-02	45000	41000	4000

The first 12 rows will have NULL for same_month_last_year because there are no rows 12 positions back yet. Once you have a full year of data, every row gets a comparison. Year-over-year analysis is one of the most requested metrics in business intelligence dashboards, and LAG with offset 12 is by far the cleanest way to compute it.

> Complete this query to rank metrics by value, with ties sharing the same rank and no gaps in the numbering.

SELECT
  metric_name,
  metric_value,
  ___() OVER (
    ORDER BY metric_value ___
  ) AS ranking
FROM employee_metrics

DENSE_RANK

DESC

RANK

ASC

DENSE_RANK is the correct choice whenever you want to count how many distinct performance levels exist above a given row, because it assigns consecutive integers with no gaps even when multiple rows share the same value.

Sorting in descending order puts the highest scores at rank 1, which matches the intuitive meaning of "first place" and is the convention used in leaderboards, competitive rankings, and compensation benchmarking tools.

LAG and LEAD are complementary to ranking functions because once you have ranked rows you can use LAG to compare each row to the one immediately above or below it in the ranking, which enables tier-boundary analysis without additional joins.

FIRST, LAST, NTH VALUE

Daily Life

Interviews

Pull specific values from a window

FIRST_VALUE, LAST_VALUE, and NTH_VALUE let you pull specific positional values from a window. They answer questions like "what was the first order amount in each category?" or "what was the third highest salary in each department?"

These functions are different from LAG and LEAD in an important way. LAG and LEAD access rows relative to the current row (1 row back, 2 rows forward). Positional functions access rows relative to the window frame itself (the first row in the frame, the last row, or the Nth row). This distinction matters because the frame can be the entire partition or just a sliding subset of it.

Common Use Cases

FIRST_VALUE: First purchase date per customer (acquisition dates)
LAST_VALUE: Last login time per user (activity monitoring)
NTH_VALUE: Third highest sale per region (outlier detection)

Positional Function Basics

Each positional function accesses a specific row within the window frame. Understanding their syntax and default behaviors is essential for correct usage.

FIRST_VALUE

FIRST_VALUE returns the value from the first row in the window frame. This is commonly used to compare every row against the starting point. Unlike LAST_VALUE, FIRST_VALUE works correctly with the default frame because the first row of the frame is always included.

	SELECT
	employee_name,
	department,
	salary,
	FIRST_VALUE(salary) OVER (
	PARTITION BY department
	ORDER BY salary DESC
	) AS top_salary,
	salary - FIRST_VALUE(salary) OVER (
	PARTITION BY department
	ORDER BY salary DESC
	) AS gap_from_top
	FROM employee_metrics

Result

employee_name	department	salary	top_salary	gap_from_top
Alice	Engineering	150000	150000	0
Bob	Engineering	130000	150000	-20000
Carol	Engineering	120000	150000	-30000
Diana	Sales	110000	110000	0
Eve	Sales	95000	110000	-15000

Every employee can see how far their salary is from the highest in their department. The person with the top salary has a gap of zero. This pattern is commonly used in compensation analysis to show each employee their relative position without revealing other individuals' exact salaries.

The LAST_VALUE Frame Trap

LAST_VALUE has a subtle but critical gotcha that catches nearly everyone the first time. The default window frame is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This means the "last value" in the frame is always the current row itself.

To understand why this happens, picture the frame as a sliding window that starts at the first row of the partition and ends at the current row. When the database processes row 1, the frame contains only row 1, so the last value is row 1. When it processes row 2, the frame contains rows 1 and 2, so the last value is row 2. The frame always ends at "where you are now," so LAST_VALUE just echoes the current row.

employee	salary	default_last	true_last
Alice	150000	150000	120000
Bob	130000	130000	120000
Carol	120000	120000	120000

Notice how default_last just echoes each row's own salary? That is because the frame ends at CURRENT ROW, so the "last" value is always the current row. To get the actual last value in the entire partition, you must explicitly expand the frame:

	SELECT
	employee_name,
	salary,
	LAST_VALUE(salary) OVER (
	ORDER BY salary DESC
	ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
	) AS lowest_salary
	FROM employee_metrics

Result

employee_name	salary	lowest_salary
Alice	150000	95000
Bob	130000	95000
Carol	120000	95000
Diana	110000	95000
Eve	95000	95000

By specifying ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, the frame spans the entire partition. Now LAST_VALUE correctly returns the last value (the lowest salary when ordered descending).

Whenever you use LAST_VALUE or NTH_VALUE, always add ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING unless you specifically want the frame to end at the current row. This is one of the most frequently tested window function concepts in interviews.

NTH_VALUE

NTH_VALUE returns the value from the Nth row in the window frame. It is useful when you need a specific position, such as the second highest or third most recent.

	SELECT
	department,
	employee_name,
	salary,
	NTH_VALUE(salary, 2) OVER (
	PARTITION BY department
	ORDER BY salary DESC
	ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
	) AS second_highest
	FROM employee_metrics

Result

department	employee_name	salary	second_highest
Engineering	Alice	150000	130000
Engineering	Bob	130000	130000
Engineering	Carol	120000	130000
Sales	Diana	110000	95000
Sales	Eve	95000	95000

Notice the full frame specification again. Without it, NTH_VALUE suffers from the same trap as LAST_VALUE: the first row in each partition would return NULL for second_highest because the frame at that point only contains one row.

> Complete this query to show each employee's salary alongside the highest salary in their department.

SELECT
  department,
  metric_name,
  metric_value,
  ___(___) OVER (
    PARTITION BY department
    ORDER BY metric_value DESC
  ) AS top_value
FROM employee_metrics

FIRST_VALUE

metric_value

LAST_VALUE

NTH_VALUE

FIRST_VALUE with ORDER BY DESC gives the maximum value, while LAST_VALUE needs an explicit frame clause to see the entire partition.

Always add ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING when using LAST_VALUE or NTH_VALUE to avoid unexpected NULLs.

NTH_VALUE is useful for specific positions like second highest or third most recent, but also requires an explicit full-partition frame.

Frame Specifications

Daily Life

Interviews

Define exactly which rows a window covers

Frame specifications define exactly which rows participate in a window calculation. Every window function operates on a frame, and understanding frames is the key to unlocking the full power of window functions.

Up to this point, you have been relying on the default frame, which extends from the beginning of the partition to the current row. That default works for running totals but fails for many other patterns. To build moving averages, trailing windows, and centered calculations, you need to take explicit control of the frame boundaries.

Frame Syntax and Types

Frame clauses follow a consistent syntax pattern that gives you fine-grained control over exactly which rows are included in each calculation.

ROWS BETWEEN Syntax

The frame clause follows ORDER BY inside the OVER clause. The syntax is ROWS BETWEEN followed by a start bound and an end bound:

UNBOUNDED PRECEDING

Start from the very first row in the partition.

N PRECEDING

Start N rows before the current row.

CURRENT ROW

The current row being processed.

N FOLLOWING

End N rows after the current row.

UNBOUNDED FOLLOWING

Extend to the very last row in the partition.

ROWS vs RANGE

SQL offers two frame types: ROWS and RANGE. They behave differently when your ORDER BY column contains duplicate values, which is common in real data.

ROWS counts physical rows. "3 PRECEDING" means exactly 3 rows above the current one, regardless of the values in those rows. RANGE, on the other hand, groups by value. "3 PRECEDING" with RANGE means all rows whose ORDER BY value is within 3 units of the current row's value. If multiple rows share the same value, RANGE treats them as a single logical group.

	SELECT
	sale_date,
	revenue,
	SUM(revenue) OVER (
	ORDER BY sale_date
	ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
	) AS rolling_sum
	FROM transactions

ROWSRANGE

ROWS

Row position

Fixed sliding windows

RANGE

Column value

Calendar-based windows

In practice, ROWS is used far more often than RANGE. ROWS is simpler to reason about, more predictable, and more widely supported across database engines. Use RANGE only when you specifically need value-based grouping, such as computing a sum of all transactions within a dollar amount range of the current row.

Common Frame Patterns

	SELECT
	sale_month,
	revenue,
	SUM(revenue) OVER (
	ORDER BY sale_month
	ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
	) AS cumulative
	FROM transactions

Real-World Frame Patterns

Moving averages and trailing calculations are ubiquitous in analytics. These patterns smooth out noise and reveal underlying trends in time-series data.

Sliding Window Average

Moving averages smooth out noise in time-series data. A 3-day moving average at each point takes the average of the previous day, current day, and next day. This is called a centered moving average because the current row is in the middle of the window.

	SELECT
	sale_date,
	revenue,
	AVG(revenue) OVER (
	ORDER BY sale_date
	ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
	) AS moving_avg_3
	FROM daily_revenue

Result

sale_date	revenue	moving_avg_3
2025-03-01	1200	1350
2025-03-02	1500	1400
2025-03-03	1500	1433
2025-03-04	1300	1367
2025-03-05	1300	1300

The first row only has 2 values in its frame (itself and the next row) because there is no preceding row. The last row similarly only has 2 values. The AVG function automatically handles partial frames by averaging whatever rows are present. This edge behavior is important to understand: moving averages at the boundaries of your data are based on fewer data points and may be less reliable.

7-Day Trailing Average

A trailing window only looks backward, which is useful when you cannot use future data (such as real-time dashboards where tomorrow has not happened yet). The 7-day trailing average is one of the most widely used patterns in business analytics.

Here is how the frame works for each row. On day 1, the frame contains only day 1 (no preceding rows exist). On day 2, it contains days 1 and 2. By day 7, the frame is full: it contains the current day plus the 6 preceding days. From day 7 onward, the frame always contains exactly 7 rows, sliding forward one day at a time.

	SELECT
	sale_date,
	revenue,
	AVG(revenue) OVER (
	ORDER BY sale_date
	ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
	) AS trailing_7_day_avg,
	SUM(revenue) OVER (
	ORDER BY sale_date
	ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
	) AS trailing_7_day_total
	FROM daily_revenue

Result

sale_date	revenue	trailing_7_day_avg	trailing_7_day_total
2025-03-01	1200	1200	1200
2025-03-02	1500	1350	2700
2025-03-03	1500	1400	4200
2025-03-07	1800	1457	8700
2025-03-08	1400	1471	9300

The frame ROWS BETWEEN 6 PRECEDING AND CURRENT ROW creates a trailing 7-day window (6 previous rows plus the current one). Early rows where fewer than 7 days of data exist will average whatever is available. Note that the trailing average stabilizes after the first full week of data, making subsequent values more reliable for trend analysis.

Bounded vs Unbounded Frames

Bounded frames (like 6 PRECEDING AND CURRENT ROW) limit the number of rows the function examines at each step. Unbounded frames (like UNBOUNDED PRECEDING AND CURRENT ROW) grow as the partition progresses. This distinction has practical implications.

With a bounded frame, the database only needs to maintain a fixed-size window in memory. Adding one row to the front and dropping one from the back is efficient. With an unbounded frame, the running calculation must account for every row seen so far. For SUM and COUNT, the database can maintain a running total incrementally. But functions that require sorting or comparing all values in the frame cannot take this shortcut, making unbounded frames expensive on very large partitions.

TIP

Use bounded frames for fixed-size windows like moving averages and trailing totals. Use unbounded frames for cumulative calculations like running totals and percentage of total.

The concept of moving averages has a surprisingly long history outside of databases.

> Complete this query to compare each day's cost with the previous day.

SELECT
  bill_date,
  amount,
  ___(amount, ___) OVER (
    ORDER BY bill_date
  ) AS prev_amount
FROM cloud_costs

LAG

LEAD

LAG with an offset of 1 is the standard tool for day-over-day or period-over-period comparisons because it reads the value from the preceding row in a single pass without requiring the table to be joined to itself.

Providing a default value as the third argument to LAG replaces the NULL that appears on the first row of each partition, which prevents NULL propagation when you subsequently subtract the lagged value from the current value.

Frame specifications and LAG serve different purposes: frames control which rows an aggregate function sees at each step, while LAG always jumps to a fixed offset regardless of any frame clause you specify.

NTILE and Bucketing

Daily Life

Interviews

Split rows into equal-sized groups

NTILE divides the ordered rows in a partition into a specified number of roughly equal groups and assigns each row a bucket number from 1 to N. It is the go-to function for creating quartiles, deciles, and custom percentile buckets.

Percentile bucketing is fundamental to data engineering and analytics. Nearly every organization segments its users, customers, or products into tiers based on some metric. The top 25 percent of customers by spending might receive premium support. The bottom 10 percent of products by revenue might be candidates for discontinuation. NTILE automates this segmentation cleanly.

NTILE Fundamentals

The NTILE function follows a predictable pattern for dividing rows into equal groups, making it straightforward to apply once you understand its behavior.

Basic NTILE Usage

Consider a customer segmentation scenario. A marketing team wants to divide customers into four spending tiers to target campaigns differently. The top quartile gets exclusive offers, the bottom quartile gets re-engagement campaigns.

	SELECT
	customer_name,
	total_spent,
	NTILE(4) OVER (
	ORDER BY total_spent DESC
	) AS spending_quartile
	FROM customer_summary

Result

customer_name	total_spent	spending_quartile
Alice	15200	1
Bob	12800	1
Carol	9800	2
Diana	7600	2
Eve	5400	3
Frank	3200	3
Grace	1800	4
Hank	900	4

With 8 rows divided into 4 groups, each group gets exactly 2 rows. Quartile 1 contains the highest spenders and quartile 4 the lowest. This is a clean division because 8 divides evenly by 4.

Handling Remainders

When the row count does not divide evenly, NTILE distributes the extra rows to the earlier buckets. With 10 rows and 4 buckets, the first two buckets get 3 rows each and the last two get 2 rows each.

NTILE Remainder Distribution

10 rows into 4 buckets → 3, 3, 2, 2
11 rows into 4 buckets → 3, 3, 3, 2
12 rows into 4 buckets → 3, 3, 3, 3
100 rows into 10 buckets → 10 each (even)
103 rows into 10 buckets → 11, 11, 11, 10, 10, 10, 10, 10, 10, 10

The rule is simple: if there are R remainder rows, the first R buckets each get one extra row. This ensures the difference in group size is never more than one.

Deciles and Percentiles

Use NTILE(3) for tertiles (3 groups), NTILE(10) for deciles (10 groups), or NTILE(100) for percentiles. Deciles are popular for salary analysis and customer segmentation. Percentiles give the finest granularity.

	SELECT
	employee_name,
	salary,
	NTILE(10) OVER (
	ORDER BY salary
	) AS salary_decile
	FROM employee_metrics

Result

employee_name	salary	salary_decile
Eve	65000	1
Frank	72000	2
Bob	85000	3
Carol	95000	5
Alice	130000	9
Diana	150000	10

Employees in decile 10 are in the top 10 percent of salaries. Those in decile 1 are in the bottom 10 percent. This is far more informative than a simple rank because it immediately conveys relative position.

NTILE in Practice

Understanding when and how to use NTILE effectively requires comparing it to alternatives and knowing how to combine it with other window clauses.

NTILE vs CASE Bucketing

Before NTILE, analysts created buckets using manual CASE statements with hardcoded thresholds. This approach is fragile because the thresholds must be updated whenever the data distribution changes. NTILE adapts automatically.

•Manual CASE

Hardcoded thresholds
Must update when data changes
Buckets may be very unequal
Verbose: one WHEN per bucket

•NTILE

Adapts to data automatically
Always produces equal groups
One line of SQL
Works with any partition

Use manual CASE when your bucket boundaries have business meaning (such as "under 50K is junior, 50-100K is mid, over 100K is senior"). Use NTILE when you want equal-sized groups based on relative position, such as "top 25 percent of performers."

NTILE with PARTITION BY

Combine NTILE with PARTITION to create buckets within groups. This is essential when the same absolute value means different things in different contexts.

	SELECT
	department,
	employee_name,
	salary,
	NTILE(4) OVER (
	PARTITION BY department
	ORDER BY salary DESC
	) AS dept_quartile
	FROM employee_metrics

Result

department	employee_name	salary	dept_quartile
Engineering	Alice	150000	1
Engineering	Bob	130000	2
Engineering	Carol	120000	3
Engineering	Dave	95000	4
Sales	Diana	110000	1
Sales	Eve	95000	2
Sales	Frank	85000	3
Sales	Grace	72000	4

Now quartiles are relative to each department. Carol earns 120K and is in quartile 3 for Engineering, but Diana earns less (110K) and is in quartile 1 for Sales. Context matters.

How many buckets should you use for your analysis?

	SELECT
	customer_name,
	total_spent,
	NTILE(4) OVER (
	ORDER BY total_spent DESC
	) AS spending_quartile
	FROM customer_summary

Note that NTILE does not consider the actual values when creating buckets. It only considers row position. Two employees with identical salaries could end up in different buckets if they fall on a boundary. If you need value-aware bucketing, use PERCENT_RANK or CUME_DIST instead.

> Complete this query to divide customers into 4 spending tiers.

SELECT
  customer_name,
  total_spent,
  ___(___) OVER (
    ORDER BY total_spent DESC
  ) AS tier
FROM customers

NTILE

RANK

NTILE distributes rows as evenly as possible. If 10 rows are split into 4 buckets, two buckets get 3 rows and two get 2.

NTILE assigns buckets based on row position, not actual values. Two customers with identical spending could end up in different tiers at a boundary.

For value-aware bucketing where identical values should always share the same tier, use PERCENT_RANK or CUME_DIST instead.

Multiple Windows

Daily Life

Interviews

Run different calculations in one query

Real-world analysis often requires comparing a row from multiple perspectives in the same query. You might need a rank within a department, a running total across the company, and a comparison to the previous month all in one result set.

Rank within category

RANK with PARTITION BY category

Percentage of total revenue

SUM with UNBOUNDED frame

Month-over-month growth

LAG for period comparison

Quartile placement

NTILE for percentile bucketing

There is no named WINDOW clause. You must write the full OVER specification for each window function. This means you will see multiple OVER clauses in a single SELECT, each potentially with different partitioning, ordering, and framing.

Multiple Window Definitions

Each window function in a query can have its own partitioning, ordering, and framing. This flexibility lets you answer multiple questions in a single pass over the data.

Different Partitions

Here is a practical example: for each sale, show the rank within its category and the rank across all categories.

	SELECT
	product_name,
	category,
	revenue,
	RANK() OVER (
	PARTITION BY category
	ORDER BY revenue DESC
	) AS category_rank,
	RANK() OVER (
	ORDER BY revenue DESC
	) AS overall_rank
	FROM product_sales

Result

product_name	category	revenue	category_rank	overall_rank
Laptop Pro	Electronics	89000	1	1
Smart Watch	Electronics	45000	2	3
Winter Jacket	Clothing	52000	1	2
Running Shoes	Clothing	31000	2	4

The first RANK partitions by category so each category has its own ranking. The second RANK has no partition, ranking all products together. Winter Jacket is ranked 1st in Clothing but 2nd overall.

Different Orderings

You can also use different orderings in the same query. This is useful for showing how the same data looks from different angles.

	SELECT
	employee_name,
	department,
	salary,
	hire_date,
	ROW_NUMBER() OVER (
	ORDER BY salary DESC
	) AS salary_rank,
	ROW_NUMBER() OVER (
	ORDER BY hire_date
	) AS seniority_rank
	FROM employee_metrics

Result

employee_name	department	salary	hire_date	salary_rank	seniority_rank
Alice	Engineering	150000	2020-03-15	1	3
Bob	Engineering	130000	2018-06-01	2	1
Carol	Sales	120000	2019-01-10	3	2
Diana	Sales	110000	2021-09-22	4	4

Alice earns the most (salary rank 1) but is only the third most senior employee (seniority rank 3). Bob earns less but has been at the company the longest. Multiple window orderings reveal these relationships without needing separate queries.

Multiple Query Perspectives

The most powerful pattern combines aggregate windows, positional functions, and ranking functions with different partitions and frames in the same query. This is the bread and butter of analytics dashboards.

	SELECT
	sale_month,
	department,
	revenue,
	SUM(revenue) OVER (
	PARTITION BY department
	ORDER BY sale_month
	) AS running_total,
	LAG(revenue, 1) OVER (
	PARTITION BY department
	ORDER BY sale_month
	) AS prev_revenue,
	NTILE(3) OVER (
	ORDER BY revenue DESC
	) AS rev_tier
	FROM monthly_dept_revenue

Result

sale_month	department	revenue	running_total	prev_revenue	rev_tier
2025-01	Engineering	50000	50000	NULL	1
2025-02	Engineering	55000	105000	50000	1
2025-03	Engineering	48000	153000	55000	2
2025-01	Sales	30000	30000	NULL	3
2025-02	Sales	35000	65000	30000	2

This single query produces three different analytical perspectives: a department-level running total, month-over-month comparison via LAG, and a company-wide revenue tier via NTILE. Each OVER clause is tailored to its specific purpose.

Performance Considerations

Understanding how the database processes multiple windows helps you write efficient queries without sacrificing analytical power.

Shared Window Performance

When multiple window functions share the same OVER clause (same PARTITION BY, same ORDER BY, same frame), the query optimizer can often compute them in a single pass over the data. The database sorts the data once and evaluates all matching window functions together.

When window functions have different OVER clauses, the database may need to sort the data multiple times, once for each distinct window specification. On large datasets, each additional sort is expensive. This is why grouping window functions by their OVER clause is good practice.

TIP

Do not sacrifice clarity for performance. If your analysis requires different partitions and orderings, write them out. Only optimize window specifications if profiling shows the sorts are a bottleneck.

AggregatePositional

Aggregate

Frame totals

SUM, AVG across the frame

Positional

Row lookups

LAG, LEAD, FIRST_VALUE

Now let's look at the key rules for writing window functions correctly and avoiding common pitfalls:

✓Do

Always include ORDER BY in LAG, LEAD, and positional functions
Use explicit frame specs with LAST_VALUE and NTH_VALUE
Provide default values in LAG/LEAD to avoid NULL arithmetic
Use NTILE for equal-group bucketing instead of manual CASE logic
Comment your intent when using multiple windows in one query
Use ROWS frames unless you specifically need value-based RANGE grouping

✗Don't

Do not use LAST_VALUE without expanding the frame to UNBOUNDED FOLLOWING
Do not assume NTILE buckets are exactly equal when rows do not divide evenly
Do not confuse ROWS and RANGE frame types when dealing with duplicate values
Do not use reserved words like row or rank as column aliases
Do not forget that LAG and LEAD ignore the frame clause entirely

> Fill in the missing parts to calculate how much cost changed from one billing period to the next.

SELECT
  bill_date,
  amount,
  ___(___, 1) OVER (
    ___ bill_date
  ) AS prev_amount,
  amount - ___(amount, 1) OVER (
    ORDER BY bill_date
  ) AS mom_change
FROM cloud_costs

LAG

amount

ORDER BY

LAG

LEAD

PARTITION BY

Offset and cumulative window techniques become intuitive with repetition. Put these patterns to the test with real-world challenges.

Combining LAG for period comparison with NTILE for segmentation in the same query is a common analytics pattern because it simultaneously shows how a row changed from the previous period and where it ranks relative to all other rows.

Each window function in a SELECT list is evaluated independently against its own OVER clause, which means the database may sort the data multiple times if the specifications differ, so aligning OVER clauses where possible improves query performance on large datasets.

❯❯❯PUTTING IT ALL TOGETHER

> You are a growth analyst at Spotify tracking month-over-month revenue changes per product line to surface trends for the quarterly business review, and your stakeholders want both period comparisons and bucket tiers in a single report.

LAG() pulls the prior month revenue for each product line in one pass, letting you subtract it from the current month to compute the change directly.

FIRST_VALUE() and LAST_VALUE() with explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frames anchor each row to the quarter start and end for boundary comparisons.

Frame specifications using ROWS BETWEEN N PRECEDING AND CURRENT ROW produce a rolling three-month average that smooths seasonal noise across product lines.

NTILE() divides product lines into performance quartiles so leadership can immediately see which lines rank in the top versus bottom tier each quarter.

KEY TAKEAWAYS

LAG/LEAD access previous/next rows in a single pass; use offset and default value params to handle edges

LAST_VALUE and NTH_VALUE need an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frame to see the full partition

Frame specs control which rows participate: use ROWS for physical row counts (moving averages) and RANGE for value-based grouping

NTILE(N) divides rows into N equal buckets for quartile/percentile segmentation

Combine multiple window functions with different partitions and orderings in one query for multi-perspective analysis

Look back. Look forward. Frame it.

Category: SQL
Difficulty: intermediate
Duration: 39 minutes
Challenges: 5 hands-on challenges

Topics covered: LAG and LEAD, FIRST, LAST, NTH VALUE, Frame Specifications, NTILE and Bucketing, Multiple Windows

Lesson Sections

LAG and LEAD (concepts: sqlLagLead)
LAG and LEAD Fundamentals Understanding the syntax and behavior of these functions is essential before applying them to real analytics scenarios. The Syntax Why Not a Self-Join? Before window functions existed, comparing a row to its predecessor required joining a table to itself. You would match each row to the previous row by offsetting a date or sequence column. This works, but it has real costs. Month-Over-Month Comparison Practical Applications LEAD for Change Detection Detecting Sequential
FIRST, LAST, NTH VALUE (concepts: sqlFirstLastValue)
Positional Function Basics Each positional function accesses a specific row within the window frame. Understanding their syntax and default behaviors is essential for correct usage. FIRST_VALUE Every employee can see how far their salary is from the highest in their department. The person with the top salary has a gap of zero. This pattern is commonly used in compensation analysis to show each employee their relative position without revealing other individuals' exact salaries. The LAST_VALUE Fr
Frame Specifications (concepts: sqlWindowFrame)
Frame specifications define exactly which rows participate in a window calculation. Every window function operates on a frame, and understanding frames is the key to unlocking the full power of window functions. Up to this point, you have been relying on the default frame, which extends from the beginning of the partition to the current row. That default works for running totals but fails for many other patterns. To build moving averages, trailing windows, and centered calculations, you need to
NTILE and Bucketing (concepts: sqlNtile)
NTILE Fundamentals Basic NTILE Usage Consider a customer segmentation scenario. A marketing team wants to divide customers into four spending tiers to target campaigns differently. The top quartile gets exclusive offers, the bottom quartile gets re-engagement campaigns. With 8 rows divided into 4 groups, each group gets exactly 2 rows. Quartile 1 contains the highest spenders and quartile 4 the lowest. This is a clean division because 8 divides evenly by 4. Handling Remainders The rule is simple
Multiple Windows (concepts: sqlMultipleWindows)
Real-world analysis often requires comparing a row from multiple perspectives in the same query. You might need a rank within a department, a running total across the company, and a comparison to the previous month all in one result set. Multiple Window Definitions Each window function in a query can have its own partitioning, ordering, and framing. This flexibility lets you answer multiple questions in a single pass over the data. Different Partitions Here is a practical example: for each sale,