Window Functions: Beginner

Amazon ranks products across 12 million listings in real time, and it does not accomplish this by running a separate query for each of its hundreds of product categories. Instead, Amazon's data engineers use window functions to compute rankings within every category in a single pass over the data, using ROW_NUMBER and RANK with PARTITION BY to reset the ranking independently for Electronics, Clothing, Books, and every other vertical simultaneously. The alternative would be thousands of individual GROUP BY queries stitched together, which would be impossibly slow and expensive at scale. When you understand window functions, you understand how the world's largest retailer keeps its search results fresh.

What is a Window Function?

Daily Life

Interviews

Calculate across rows without collapsing them

Window functions compute aggregates while keeping every row in your result intact. GROUP BY can tell you the highest sale per region, but it cannot show that figure alongside each individual transaction. That is exactly the gap window functions fill.

THE CASE OF THE DUPLICATED TOTALS

Ava Mitchell flagged a salary report with the same department total repeated on every row. Individual salaries are all different, but the totals look copy-pasted. Something calculated those numbers on purpose.

employees

id	name	department	salary
1	Chen	Engineering	85000
2	Patel	Engineering	92000
3	Kim	Engineering	78000
4	Osei	Marketing	68000
5	Tanaka	Marketing	71000
6	Ruiz	Sales	88000

A window function performs a calculation across a set of rows that are related to the current row. Unlike GROUP BY, which collapses many rows into one summary row, a window function keeps every row and adds the computed value as a new column. The term "window" refers to the set of rows the function looks at when computing its result for the current row.

Think of it this way: GROUP BY answers questions like "what is the average salary per department?" by producing one row per department. Window functions answer questions like "what is each employee's salary compared to their department average?" by keeping every employee row and adding the department average as a new column next to it.

The key to window functions is the OVER clause. Every window function includes OVER() after the function name. The OVER clause defines which rows the function should consider, called the "window" of rows. Without the OVER clause, these functions behave like ordinary aggregate functions that collapse rows.

What Window Functions Solve

Consider a common analytical question: "Show me each employee's salary alongside the average salary for their department." This requires two things at once: individual row detail (each employee's name and salary) and aggregate context (the department average). Before window functions existed, you had to write a subquery or a self-join to combine these. Window functions let you do it in a single, clean pass.

Comparing Approaches

Let us walk through a concrete example to see the difference between GROUP BY and window functions.

Here is our employee data:

employee_name	department	salary
Alice	Engineering	120000
Bob	Engineering	110000
Carol	Engineering	130000
Dave	Sales	80000
Eve	Sales	90000
Frank	Marketing	95000

With GROUP BY, we get one row per department. The individual employees disappear entirely. With a window function, we keep all six rows and add the department average as a new column. Switch between the two approaches below to see the difference:

	SELECT
	department,
	AVG(salary) AS avg_salary
	FROM employee_metrics
	GROUP BY department

GROUP BY Result (Collapsed)

department	avg_salary
Engineering	120000.00
Marketing	95000.00
Sales	85000.00

Only three rows remain. All employee-level detail is gone. You cannot tell who earns what.

Preserved Rows in Results

	SELECT
	employee_name,
	department,
	salary,
	AVG(salary) OVER (
	PARTITION BY department
	) AS dept_avg
	FROM employee_metrics

Result

employee_name	department	salary	dept_avg
Alice	Engineering	120000	120000.00
Bob	Engineering	110000	120000.00
Carol	Engineering	130000	120000.00
Dave	Sales	80000	85000.00
Eve	Sales	90000	85000.00
Frank	Marketing	95000	95000.00

Notice how every row is preserved. Alice, Bob, and Carol each see the Engineering average of 120,000 next to their own salary. Dave and Eve each see the Sales average of 85,000. The OVER clause with PARTITION BY department tells SQL: "Calculate the average within each department independently, and attach the result to every row in that department."

•GROUP BY

Collapses rows into groups
Returns one row per group
Loses individual row detail
Cannot mix detail and summary
Must list all non-aggregated columns

•Window Functions

Keeps every row intact
Adds computed column to each row
Preserves individual detail
Detail and summary side by side
No restrictions on selected columns

TIP

Use GROUP BY when you only need summaries (total sales per region, count of users per country). Use window functions when you need each row to have context about its group (each sale's percentage of the regional total, each user's rank within their country).

The OVER Clause

The OVER clause is what makes a function a window function. It can contain up to three parts, each controlling a different aspect of which rows are included and how they are processed:

PARTITION BY

Divides rows into groups (partitions). Each group is processed independently, similar to GROUP BY but keeps all rows.

ORDER BY

Defines row ordering within each partition. Required for ranking functions. Determines direction of running totals.

Frame (ROWS/RANGE)

Controls which rows near the current row are included. Defaults to preceding rows plus current row with ORDER BY.

> Complete this query to add a running total of sales alongside each row.

SELECT
  user_id,
  total_amount,
  ___(total_amount) ___ (
    ORDER BY transaction_date
  ) AS running_total
FROM transactions

SUM

OVER

GROUP BY

HAVING

An empty OVER() clause means "use all rows in the entire result set as the window." This gives every row the same computed value. Adding PARTITION BY narrows each window to rows sharing the same partition value. Adding ORDER BY inside the OVER clause defines row ordering, which is required for ranking functions and controls the direction of running totals.

You can combine all three parts. For example, OVER (PARTITION BY department ORDER BY salary DESC) means: "Within each department, process rows from highest salary to lowest." The function then runs independently for each department, in salary order.

What happens when you change the OVER clause from empty to partitioned?

	SELECT
	employee_name,
	salary,
	SUM(salary) OVER () AS total
	FROM employee_metrics

Window functions execute after WHERE, GROUP BY, and HAVING. They operate on the result set that those clauses produce. You cannot reference a window function in a WHERE clause. If you need to filter on a window function result, wrap it in a subquery and filter in the outer query.

PARTITION BY Deep Dive

Daily Life

Interviews

Run calculations within grouped subsets

PARTITION BY is the window function equivalent of GROUP BY. It divides the result set into independent subsets (partitions), and the window function runs separately within each partition. Crucially, unlike GROUP BY, it does NOT collapse rows. Every row remains in the output with its partition's computed value attached.

Think of PARTITION BY as drawing invisible boundaries around groups of rows. Within each boundary, the window function operates as if the other groups do not exist. Rankings restart, sums reset, and counts begin from scratch for each partition.

Single Column Partition

The most common use case is partitioning by a category column to rank or aggregate within each category. You have already seen this pattern: partitioning employees by department to rank salaries within each department. Here is another example using MAX to find the highest price in each product category:

	SELECT
	product_name,
	category,
	price,
	MAX(price) OVER (
	PARTITION BY category
	) AS category_max_price
	FROM products

Result

product_name	category	price	category_max_price
Laptop	Electronics	1200	1200
Phone	Electronics	800	1200
Tablet	Electronics	500	1200
Jacket	Clothing	150	150
Shoes	Clothing	120	150

Every Electronics row sees 1,200 as the category max price (the Laptop). Every Clothing row sees 150 (the Jacket). The partitions are completely independent. You can immediately see how each product compares to the most expensive item in its category.

Advanced Partitioning

Once you understand basic partitioning, you can apply more sophisticated techniques to handle complex real-world scenarios like multi-dimensional groupings and edge cases.

Multiple Partition Columns

You can partition by multiple columns to create finer-grained groups. Each unique combination of partition values becomes its own window. This is useful when your data has multiple dimensions of grouping:

	SELECT
	employee_name,
	department,
	fiscal_year,
	salary,
	AVG(salary) OVER (
	PARTITION BY department, fiscal_year
	) AS dept_year_avg
	FROM employee_salaries

Result

employee_name	department	fiscal_year	salary	dept_year_avg
Carol	Engineering	2024	130000	125000.00
Alice	Engineering	2024	120000	125000.00
Carol	Engineering	2025	140000	132500.00
Bob	Engineering	2025	125000	132500.00
Eve	Sales	2024	90000	85000.00
Dave	Sales	2024	80000	85000.00

Partitioning by both department and fiscal_year creates separate windows like "Engineering 2024", "Engineering 2025", "Sales 2024", and so on. The average resets for each unique combination. Carol and Alice share the Engineering 2024 average of 125,000, while Carol and Bob in Engineering 2025 share an average of 132,500.

NULL in PARTITION BY

When a partition column contains NULL values, all NULL rows are placed into the SAME partition. This is different from most SQL operations where NULL does not equal NULL. In the context of window function partitioning, NULL values are grouped together as if they were equal:

	SELECT
	employee_name,
	department,
	salary,
	COUNT(*) OVER (
	PARTITION BY department
	) AS dept_count
	FROM employee_metrics

Result

employee_name	department	salary	dept_count
Alice	Engineering	120000	3
Bob	Engineering	110000	3
Carol	Engineering	130000	3
Dave	NULL	70000	2
Eve	NULL	65000	2

Dave and Eve have no department (NULL). They land in the same partition and share a dept_count of 2. Be aware of this behavior when your partition columns might have missing values. If you need to exclude NULLs from partitioning, filter them out with a WHERE clause before the window function runs.

PARTITION BY vs None

When you omit PARTITION BY entirely, the entire result set is treated as a single partition. This means the window function considers ALL rows together. You can use both in the same query to get company-wide and department-level stats side by side:

	SELECT
	employee_name,
	department,
	salary,
	SUM(salary) OVER () AS company_total,
	SUM(salary) OVER (
	PARTITION BY department
	) AS dept_total
	FROM employee_metrics

Result

employee_name	department	salary	company_total	dept_total
Alice	Engineering	120000	625000	360000
Bob	Engineering	110000	625000	360000
Carol	Engineering	130000	625000	360000
Dave	Sales	80000	625000	170000
Eve	Sales	90000	625000	170000
Frank	Marketing	95000	625000	95000

The company_total column uses no partition, so every row sees the same grand total of 625,000. The dept_total column partitions by department, so Engineering rows see 360,000, Sales rows see 170,000, and Marketing sees 95,000. Having both in the same query lets you compute percentages like "Engineering represents 360,000 / 625,000 = 57.6% of total payroll."

How does the number of partition columns affect your results?

	SELECT
	employee_name,
	department,
	salary,
	SUM(salary) OVER (
	PARTITION BY department
	) AS dept_total
	FROM employee_metrics

No PARTITION BY

Entire result set is one window. All rows see the same computed value.

Single column partition

One window per unique value. Most common pattern for department, region, or category analysis.

Multi-column partition

One window per unique combination. Use for year+department, region+product type, etc.

NULLs in partition columns

All NULL rows form their own partition. They are grouped together, not excluded.

With these partition behaviors in mind, here are the key guidelines for working with window functions.

✓Do

Use PARTITION BY to scope calculations to logical groups
Combine window functions with regular columns freely
Use multiple partition columns for fine-grained grouping
Filter NULL partition columns with WHERE when you need to exclude them

✗Don't

Don't use GROUP BY when you need to keep individual rows
Don't reference window functions in WHERE clauses
Don't assume NULL rows are excluded from partitions. NULLs form their own group
Don't confuse PARTITION BY (no row collapse) with GROUP BY (collapses rows)

> Complete this query to show each metric alongside the total for its department.

SELECT
  metric_name,
  department,
  metric_value,
  ___(metric_value) OVER (
    ___ ___
  ) AS dept_total
FROM employee_metrics

SUM

PARTITION BY

department

GROUP BY

AVG

PARTITION BY Summary

No PARTITION BY: entire result set is one window
Single column: one window per unique value in that column
Multiple columns: one window per unique combination of values
NULL values: all NULLs are grouped into the same partition

Understanding PARTITION BY is the key to unlocking the full power of window functions. Every window function in the following sections builds on the partitioning behavior covered here.

The pattern of PARTITION BY combined with any window function is the foundation: it computes a value across a group of related rows while keeping all rows in the result. The upcoming sections apply this to ranking (ROW_NUMBER, RANK) and running calculations (SUM/AVG/COUNT OVER).

ROW_NUMBER

Daily Life

Interviews

Assign unique position numbers to rows

ROW_NUMBER() assigns a unique sequential integer to each row within its partition, starting at 1. It is the most commonly used window function and appears in almost every data engineering interview. You will use it for numbering rows, pagination, deduplication, and picking the top N items per group.

Basic ROW_NUMBER

At minimum, ROW_NUMBER() needs an ORDER BY inside the OVER clause to know how to order the numbering. Without ORDER BY, the row numbering is non-deterministic and could change between query executions:

	SELECT
	employee_name,
	salary,
	ROW_NUMBER() OVER (
	ORDER BY salary DESC
	) AS salary_position
	FROM employee_metrics

Result

employee_name	salary	salary_position
Carol	130000	1
Alice	120000	2
Bob	110000	3
Frank	95000	4
Eve	90000	5
Dave	80000	6

Every employee gets a unique number from 1 to 6, ordered by salary from highest to lowest. Carol earns the most so she gets position 1. Dave earns the least so he gets position 6. No two rows ever share the same number.

ROW_NUMBER + PARTITION BY

	SELECT
	employee_name,
	department,
	salary,
	ROW_NUMBER() OVER (
	PARTITION BY department
	ORDER BY salary DESC
	) AS dept_salary_position
	FROM employee_metrics

Adding PARTITION BY restarts the numbering for each group. The numbering begins at 1 within each partition independently. To get only the top earner per department, wrap this in a subquery and filter WHERE dept_salary_position = 1.

Multiple ORDER BY Columns

You can order by multiple columns to control tie-breaking. When two rows have the same value in the first column, the second column determines their order. This makes the numbering deterministic even when values repeat:

	SELECT
	employee_name,
	department,
	salary,
	hire_date,
	ROW_NUMBER() OVER (
	ORDER BY salary DESC, hire_date
	) AS position
	FROM employee_metrics

Result

employee_name	department	salary	hire_date	position
Carol	Engineering	130000	2022-03-15	1
Alice	Engineering	120000	2020-06-01	2
Bob	Engineering	110000	2021-09-10	3
Frank	Marketing	95000	2019-01-20	4
Eve	Sales	90000	2023-02-14	5
Dave	Sales	80000	2021-11-30	6

If two employees had the same salary, the one hired earlier (hire_date ASC) would get the lower position number. Adding a second sort column eliminates ambiguity and ensures consistent results across repeated runs.

Handling NULLs in ORDER BY

When your ordering column contains NULL values, you should explicitly control where they appear using NULLS LAST or NULLS FIRST. Without this, the default NULL positioning depends on your database engine and can surprise you:

	SELECT
	employee_name,
	bonus,
	ROW_NUMBER() OVER (
	ORDER BY bonus DESC NULLS LAST
	) AS bonus_position
	FROM employee_metrics

Result

employee_name	bonus	bonus_position
Carol	15000	1
Alice	12000	2
Eve	8000	3
Bob	NULL	4
Dave	NULL	5

NULLS LAST pushes rows with missing bonus values to the end. The default NULL sort order varies, so it is best practice to always specify NULLS LAST or NULLS FIRST explicitly so your query behaves the same across all database engines.

ROW_NUMBER Applications

ROW_NUMBER has powerful applications including deduplication, pagination, and selecting top-N per group.

Dedup with ROW_NUMBER

One of the most common interview patterns is using ROW_NUMBER to deduplicate data. If a table has duplicate rows, you can number them within each group and keep only the first one. This pattern uses ROW_NUMBER in a subquery, then filters for row number 1 in the outer query:

	SELECT
	*
	FROM (
	SELECT
	user_id,
	email,
	created_at,
	ROW_NUMBER() OVER (
	PARTITION BY user_id
	ORDER BY created_at DESC
	) AS rn
	FROM user_signups
	) AS sub
	WHERE rn = 1

Result

user_id	email	created_at	rn
101	alice@email.com	2025-01-15	1
102	bob@email.com	2025-01-10	1
103	carol@email.com	2025-01-12	1

This keeps only the most recent signup per user. The PARTITION BY user_id groups rows by user, ORDER BY created_at DESC puts the newest first, and the outer WHERE rn = 1 keeps only the first row per user. This is cleaner than using GROUP BY with MAX and a self-join.

Which approach would you use to find the most recent signup per user?

	SELECT
	u.user_id,
	u.email,
	u.created_at
	FROM user_signups AS u
	INNER JOIN (
	SELECT
	user_id,
	MAX(created_at) AS max_date
	FROM user_signups
	GROUP BY user_id
	) AS latest
	ON u.user_id = latest.user_id
	AND u.created_at = latest.max_date

TIP

ROW_NUMBER() always produces unique numbers, even when rows have identical values. If two employees have the same salary, one will get position 3 and the other position 4. The tie-breaking is arbitrary unless you add more columns to ORDER BY. If ties matter, use RANK or DENSE_RANK instead.

> Complete this query to number metrics within each department by value.

SELECT
  department,
  metric_name,
  metric_value,
  ___() OVER (
    PARTITION BY ___
    ORDER BY metric_value ___
  ) AS rank
FROM employee_metrics

ROW_NUMBER

department

DESC

RANK

ASC

ROW_NUMBER always produces unique sequential numbers, even when rows have identical values in the ORDER BY column.

The deduplication pattern filters WHERE rn = 1 after ROW_NUMBER to keep only the top row per partition.

If ties matter and rows with the same value should share a rank, use RANK or DENSE_RANK instead of ROW_NUMBER.

RANK and DENSE_RANK

Daily Life

Interviews

Rank rows with or without gaps

When rows share the same value, ROW_NUMBER() arbitrarily assigns different numbers. But sometimes ties matter. If two students scored 92 on an exam, they should share the same rank. RANK() and DENSE_RANK() handle ties explicitly, giving identical values the same rank. They differ in what happens to the numbers that come after a tie.

How Ties Work

Consider five students with test scores where two students scored 92. Here is how the three ranking functions differ on the exact same data:

student	score	ROW_NUMBER	RANK	DENSE_RANK
Alice	98	1	1	1
Bob	92	2	2	2
Carol	92	3	2	2
Dave	85	4	4	3
Eve	78	5	5	4

ROW_NUMBER: Bob gets 2, Carol gets 3. No ties allowed. The order between Bob and Carol is arbitrary since they have the same score. One of them just happens to be processed first.

RANK: Bob and Carol both get 2. The next value skips to 4. Gaps appear after ties. Think of it like Olympic medals: if two athletes tie for silver, the next athlete gets bronze (3rd place), but in RANK's logic, they skip to 4th. There is no rank 3 because two rows occupied position 2.

DENSE_RANK: Bob and Carol both get 2. The next value is 3. No gaps. Every consecutive rank value is used. If you need to know "how many distinct score levels exist above Dave," DENSE_RANK tells you directly: 2 (which are rank 1 and rank 2).

When Ties Matter: Rankings

Imagine you are building a scholarship system. The school gives scholarships to students ranked in the top 3. If two students tie for rank 2, should the student just below them be ranked 3rd or 4th? The answer depends on your business rules, and that determines whether you use RANK or DENSE_RANK.

RANK Example

RANK assigns tied rows the same number and then skips ahead. If two rows share rank 2, the next row gets rank 4. The rank value tells you: "how many rows scored higher than or equal to this row?"

	SELECT
	student_name,
	score,
	RANK() OVER (ORDER BY score DESC) AS score_rank
	FROM test_scores

Result

student_name	score	score_rank
Alice	98	1
Bob	92	2
Carol	92	2
Dave	85	4
Eve	78	5

Bob and Carol tied at 92, so both get rank 2. RANK then skips to 4 for Dave. There is no rank 3 because two rows occupied position 2. Dave's rank of 4 tells you that exactly 3 students scored higher than or tied with him.

DENSE_RANK Example

DENSE_RANK assigns tied rows the same number but never skips values. The next row after a tie gets the very next integer. This is useful when you care about "how many distinct rank levels exist" rather than "what position would this be in a sorted list."

	SELECT
	student_name,
	score,
	DENSE_RANK() OVER (
	ORDER BY score DESC
	) AS dense_score_rank
	FROM test_scores

Result

student_name	score	dense_score_rank
Alice	98	1
Bob	92	2
Carol	92	2
Dave	85	3
Eve	78	4

With DENSE_RANK, Dave gets 3 instead of 4. No gaps in the sequence. There are exactly 4 distinct rank levels in this dataset. If your scholarship covers "the top 3 rank levels," DENSE_RANK gives you Alice, Bob, Carol, and Dave (ranks 1, 2, and 3), while RANK would only give you Alice, Bob, and Carol (ranks 1 and 2, with rank 3 never assigned).

Choosing Between Functions

The choice between these three functions comes down to how you want ties handled and whether gaps in the numbering matter. Compare them side by side on the same query:

	SELECT
	product_name,
	revenue,
	ROW_NUMBER() OVER (
	ORDER BY revenue DESC
	) AS position
	FROM products

> Rank metrics by value within each department, giving tied values the same rank and skipping ahead after ties.

SELECT
  metric_name,
  department,
  metric_value,
  ___() OVER (
    ___ department
    ORDER BY metric_value ___
  ) AS value_rank
FROM employee_metrics

RANK

PARTITION BY

DESC

ROW_NUMBER

DENSE_RANK

RANK and DENSE_RANK both give tied rows the same number. The only difference is what happens after a tie: RANK skips the next value (1, 2, 2, 4), while DENSE_RANK never skips (1, 2, 2, 3).

Choose RANK when rank values should reflect position in a sorted list ("3rd out of 10"), and choose DENSE_RANK when you need to count distinct rank levels ("how many score tiers exist above this student?").

When ties do not matter and you need a unique number for every row, use ROW_NUMBER instead. It guarantees no two rows share the same value, making it the right choice for deduplication and pagination.

SUM/COUNT/AVG OVER

Daily Life

Interviews

Compute running totals alongside detail rows

You already know SUM, COUNT, and AVG as aggregate functions with GROUP BY. When you pair them with OVER(), they become window functions. Instead of collapsing rows, they compute values across a window while preserving every row. This is one of the most practical applications of window functions in day-to-day data work.

Regular SUM vs SUM OVER

Before diving into running totals, let us clearly see the difference between a regular aggregate SUM and a window function SUM. The regular SUM collapses rows; the window SUM preserves them:

	SELECT
	region,
	SUM(amount) AS total_amount
	FROM orders
	GROUP BY region

Running Totals: SUM OVER

A running total adds each new value to the accumulated sum so far. When you use SUM() with OVER(ORDER BY ...), SQL computes the sum from the first row up to and including the current row. Let us walk through each row step by step:

	SELECT
	order_date,
	amount,
	SUM(amount) OVER (
	ORDER BY order_date
	) AS running_total
	FROM transactions

Result

order_date	amount	running_total
2025-01-01	500	500
2025-01-02	300	800
2025-01-03	700	1500
2025-01-04	200	1700
2025-01-05	400	2100

Here is how each row's running total is calculated: Row 1 (Jan 1): just 500, nothing before it. Row 2 (Jan 2): 500 + 300 = 800. Row 3 (Jan 3): 500 + 300 + 700 = 1,500. Row 4 (Jan 4): 500 + 300 + 700 + 200 = 1,700. Row 5 (Jan 5): 500 + 300 + 700 + 200 + 400 = 2,100. Each row accumulates all previous amounts plus its own.

Running totals are perfect for cumulative revenue, inventory levels, progress toward a goal, or account balances. Any "how much so far" question is a running total use case.

Grand Totals: SUM OVER ()

An empty OVER() clause (no PARTITION BY, no ORDER BY) means "use all rows." This is great for computing each row's percentage of a total without needing a subquery:

	SELECT
	department,
	salary,
	SUM(salary) OVER () AS total_payroll,
	ROUND(
	salary * 100 / SUM(salary) OVER (
	),
	1
	) AS pct_of_payroll
	FROM employee_metrics

Result

department	salary	total_payroll	pct_of_payroll
Engineering	120000	625000	19.2
Engineering	110000	625000	17.6
Engineering	130000	625000	20.8
Sales	80000	625000	12.8
Sales	90000	625000	14.4
Marketing	95000	625000	15.2

Every row sees the same total_payroll of 625,000 because the window includes all rows. Each employee's percentage of total payroll is calculated inline without needing a subquery or a separate GROUP BY query. Carol at 130,000 represents 20.8% of the total payroll.

Running Calculations

Adding ORDER BY to window aggregates creates running calculations that accumulate row by row.

COUNT OVER: Running Counts

COUNT() as a window function counts rows within the window. With PARTITION BY and no ORDER BY, it gives you the total count for the partition. With ORDER BY, it gives a running count that increments row by row:

	SELECT
	order_id,
	region,
	COUNT(*) OVER (
	PARTITION BY region
	) AS region_total_orders,
	COUNT(*) OVER (
	PARTITION BY region
	ORDER BY order_id
	) AS running_count
	FROM orders

Result

order_id	region	region_total_orders	running_count
101	East	3	1
104	East	3	2
107	East	3	3
102	West	2	1
106	West	2	2

The region_total_orders column shows the total count for the entire partition (3 for East, 2 for West) because there is no ORDER BY. The running_count column increments one by one as it processes rows within each partition because the ORDER BY limits the window to rows up to the current row.

AVG OVER: Cumulative Avg

AVG() as a window function computes the average across the window. With ORDER BY, it calculates an expanding average (also called a cumulative average) from the first row to the current row. The average adjusts as more data accumulates:

	SELECT
	order_date,
	amount,
	AVG(amount) OVER (
	ORDER BY order_date
	) AS cumulative_avg
	FROM transactions

Result

order_date	amount	cumulative_avg
2025-01-01	500	500.00
2025-01-02	300	400.00
2025-01-03	700	500.00
2025-01-04	200	425.00
2025-01-05	400	420.00

On day 1, the average is just 500 (only one value). On day 2, it is (500 + 300) / 2 = 400. On day 3, it is (500 + 300 + 700) / 3 = 500. On day 4, it drops to (500 + 300 + 700 + 200) / 4 = 425 because the low value of 200 pulls the average down. The average adjusts with every new row.

Default Window Frame

When you use an aggregate window function with ORDER BY inside OVER, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This means "from the very first row up to and including the current row." That is why you get running totals and cumulative averages automatically when you include ORDER BY.

Without ORDER BY in the OVER clause, the default frame includes ALL rows in the partition. That is why SUM() OVER (PARTITION BY region) gives you the same total on every row within the partition. The presence or absence of ORDER BY fundamentally changes the behavior from "grand total per partition" to "running total within the partition."

TIP

Without ORDER BY in the OVER clause, the default frame includes ALL rows in the partition (giving partition-wide totals). With ORDER BY, it only includes rows up to the current row (giving running totals). This distinction is the single most important thing to understand about aggregate window functions.

> Complete this query to show each product alongside the overall average price across all products.

SELECT
  product_name,
  price,
  ___(price) ___ () AS avg_price
FROM products

AVG

OVER

SUM

GROUP BY

Using AVG with an empty OVER() clause is the most efficient way to compare each row to a grand average because the database computes the average once and attaches it to every row, avoiding the need for a subquery or a self-join.

When you pair AVG with ORDER BY inside OVER, the result shifts from a fixed grand average to a cumulative average that changes with each row, reflecting the average of all rows processed so far in the ordering.

Aggregate window functions like SUM, COUNT, AVG, MIN, and MAX all follow the same pattern: add OVER() to turn them into window functions that preserve every row instead of collapsing the result set.

❯❯❯PUTTING IT ALL TOGETHER

> You are a data analyst at Shopify ranking products within each category by monthly revenue so the merchandising team can identify top performers and allocate promotional budget per vertical.

The OVER() clause turns aggregate-style functions into per-row calculations, keeping all product rows visible alongside their ranking.

PARTITION BY divides the product catalog into independent category windows so ranking resets cleanly for Electronics, Apparel, and Home separately.

ROW_NUMBER() assigns a unique sequential rank to every product even when two items share identical monthly revenue, guaranteeing one top pick per category.

RANK() and DENSE_RANK() expose tie behavior differently, so choose DENSE_RANK() when the merchandising team needs gapless tier counts across tied revenue bands.

KEY TAKEAWAYS

Window functions add computed columns to every row without collapsing data, unlike GROUP BY

ROW_NUMBER gives unique numbers (great for deduplication); RANK and DENSE_RANK handle ties with or without gaps

Aggregates like SUM/AVG/COUNT become window functions with OVER; adding ORDER BY makes them cumulative

PARTITION BY scopes calculations to independent groups; omitting it treats all rows as one window

Window functions run after WHERE/GROUP BY/HAVING, so filter on them via a subquery

Every row and the big picture at once

Category: SQL
Difficulty: beginner
Duration: 39 minutes
Challenges: 5 hands-on challenges

Topics covered: What is a Window Function?, PARTITION BY Deep Dive, ROW_NUMBER, RANK and DENSE_RANK, SUM/COUNT/AVG OVER

Lesson Sections

What is a Window Function? (concepts: sqlAggregateOver)
What Window Functions Solve Consider a common analytical question: "Show me each employee's salary alongside the average salary for their department." This requires two things at once: individual row detail (each employee's name and salary) and aggregate context (the department average). Before window functions existed, you had to write a subquery or a self-join to combine these. Window functions let you do it in a single, clean pass. Comparing Approaches Here is our employee data: GROUP BY Resu
PARTITION BY Deep Dive (concepts: sqlPartitionBy)
Single Column Partition Every Electronics row sees 1,200 as the category max price (the Laptop). Every Clothing row sees 150 (the Jacket). The partitions are completely independent. You can immediately see how each product compares to the most expensive item in its category. Advanced Partitioning Once you understand basic partitioning, you can apply more sophisticated techniques to handle complex real-world scenarios like multi-dimensional groupings and edge cases. Multiple Partition Columns NUL
ROW_NUMBER (concepts: sqlRowNumber)
Basic ROW_NUMBER Every employee gets a unique number from 1 to 6, ordered by salary from highest to lowest. Carol earns the most so she gets position 1. Dave earns the least so he gets position 6. No two rows ever share the same number. ROW_NUMBER + PARTITION BY Multiple ORDER BY Columns Handling NULLs in ORDER BY ROW_NUMBER Applications Dedup with ROW_NUMBER Which approach would you use to find the most recent signup per user?
RANK and DENSE_RANK (concepts: sqlRankDenseRank)
How Ties Work Consider five students with test scores where two students scored 92. Here is how the three ranking functions differ on the exact same data: When Ties Matter: Rankings RANK Example DENSE_RANK Example Choosing Between Functions The choice between these three functions comes down to how you want ties handled and whether gaps in the numbering matter. Compare them side by side on the same query:
SUM/COUNT/AVG OVER (concepts: sqlAggregateOver)
Regular SUM vs SUM OVER Running Totals: SUM OVER Here is how each row's running total is calculated: Row 1 (Jan 1): just 500, nothing before it. Row 2 (Jan 2): 500 + 300 = 800. Row 3 (Jan 3): 500 + 300 + 700 = 1,500. Row 4 (Jan 4): 500 + 300 + 700 + 200 = 1,700. Row 5 (Jan 5): 500 + 300 + 700 + 200 + 400 = 2,100. Each row accumulates all previous amounts plus its own. Running totals are perfect for cumulative revenue, inventory levels, progress toward a goal, or account balances. Any "how much s