Window Functions: Beginner

Amazon ranks products across 12 million listings in real time, and it does not accomplish this by running a separate query for each of its hundreds of product categories. Instead, Amazon's data engineers use window functions to compute rankings within every category in a single pass over the data, using ROW_NUMBER and RANK with PARTITION BY to reset the ranking independently for Electronics, Clothing, Books, and every other vertical simultaneously. The alternative would be thousands of individual GROUP BY queries stitched together, which would be impossibly slow and expensive at scale. When you understand window functions, you understand how the world's largest retailer keeps its search results fresh.

What is a Window Function?

Daily Life
Interviews

Calculate across rows without collapsing them

Window functions compute aggregates while keeping every row in your result intact. GROUP BY can tell you the highest sale per region, but it cannot show that figure alongside each individual transaction. That is exactly the gap window functions fill.

THE CASE OF THE DUPLICATED TOTALS

Ava Mitchell flagged a salary report with the same department total repeated on every row. Individual salaries are all different, but the totals look copy-pasted. Something calculated those numbers on purpose.

Six employees. Three departments. Count the output rows.

employees
idnamedepartmentsalary
1ChenEngineering85000
2PatelEngineering92000
3KimEngineering78000
4OseiMarketing68000
5TanakaMarketing71000
6RuizSales88000

A window function performs a calculation across a set of rows that are related to the current row. Unlike GROUP BY, which collapses many rows into one summary row, a window function keeps every row and adds the computed value as a new column. The term "window" refers to the set of rows the function looks at when computing its result for the current row.

Think of it this way: GROUP BY answers questions like "what is the average salary per department?" by producing one row per department. Window functions answer questions like "what is each employee's salary compared to their department average?" by keeping every employee row and adding the department average as a new column next to it.

The key to window functions is the OVER clause. Every window function includes OVER() after the function name. The OVER clause defines which rows the function should consider, called the "window" of rows. Without the OVER clause, these functions behave like ordinary aggregate functions that collapse rows.

//

What Window Functions Solve

Consider a common analytical question: "Show me each employee's salary alongside the average salary for their department." This requires two things at once: individual row detail (each employee's name and salary) and aggregate context (the department average). Before window functions existed, you had to write a subquery or a self-join to combine these. Window functions let you do it in a single, clean pass.

Comparing Approaches

Let us walk through a concrete example to see the difference between GROUP BY and window functions.

Here is our employee data:
employee_namedepartmentsalary
AliceEngineering120000
BobEngineering110000
CarolEngineering130000
DaveSales80000
EveSales90000
FrankMarketing95000

With GROUP BY, we get one row per department. The individual employees disappear entirely. With a window function, we keep all six rows and add the department average as a new column. Switch between the two approaches below to see the difference:

1SELECT
2 department,
3 AVG(salary) AS avg_salary
4FROM employee_metrics
5GROUP BY department
//

GROUP BY Result (Collapsed)

departmentavg_salary
Engineering120000.00
Marketing95000.00
Sales85000.00
Only three rows remain. All employee-level detail is gone. You cannot tell who earns what.
//

Preserved Rows in Results

1SELECT
2 employee_name,
3 department,
4 salary,
5 AVG(salary) OVER (
6 PARTITION BY department
7 ) AS dept_avg
8FROM employee_metrics
Result
employee_namedepartmentsalarydept_avg
AliceEngineering120000120000.00
BobEngineering110000120000.00
CarolEngineering130000120000.00
DaveSales8000085000.00
EveSales9000085000.00
FrankMarketing9500095000.00

Notice how every row is preserved. Alice, Bob, and Carol each see the Engineering average of 120,000 next to their own salary. Dave and Eve each see the Sales average of 85,000. The OVER clause with PARTITION BY department tells SQL: "Calculate the average within each department independently, and attach the result to every row in that department."

GROUP BY
  • Collapses rows into groups
  • Returns one row per group
  • Loses individual row detail
  • Cannot mix detail and summary
  • Must list all non-aggregated columns
Window Functions
  • Keeps every row intact
  • Adds computed column to each row
  • Preserves individual detail
  • Detail and summary side by side
  • No restrictions on selected columns
TIP
Use GROUP BY when you only need summaries (total sales per region, count of users per country). Use window functions when you need each row to have context about its group (each sale's percentage of the regional total, each user's rank within their country).
//

The OVER Clause

The OVER clause is what makes a function a window function. It can contain up to three parts, each controlling a different aspect of which rows are included and how they are processed:

PARTITION BY
PARTITION BY
Divides rows into groups (partitions). Each group is processed independently, similar to GROUP BY but keeps all rows.
ORDER BY
ORDER BY
Defines row ordering within each partition. Required for ranking functions. Determines direction of running totals.
Frame (ROWS/RANGE)
Frame (ROWS/RANGE)
Controls which rows near the current row are included. Defaults to preceding rows plus current row with ORDER BY.

> Complete this query to add a running total of sales alongside each row.

SELECT
  user_id,
  total_amount,
  ___(total_amount) ___ (
    ORDER BY transaction_date
  ) AS running_total
FROM transactions
HAVING
GROUP BY
SUM
OVER

An empty OVER() clause means "use all rows in the entire result set as the window." This gives every row the same computed value. Adding PARTITION BY narrows each window to rows sharing the same partition value. Adding ORDER BY inside the OVER clause defines row ordering, which is required for ranking functions and controls the direction of running totals.

You can combine all three parts. For example, OVER (PARTITION BY department ORDER BY salary DESC) means: "Within each department, process rows from highest salary to lowest." The function then runs independently for each department, in salary order.

What happens when you change the OVER clause from empty to partitioned?

1SELECT
2 employee_name,
3 salary,
4 SUM(salary) OVER () AS total
5FROM employee_metrics

Window functions execute after WHERE, GROUP BY, and HAVING. They operate on the result set that those clauses produce. You cannot reference a window function in a WHERE clause. If you need to filter on a window function result, wrap it in a subquery and filter in the outer query.

PARTITION BY Deep Dive

Daily Life
Interviews

Run calculations within grouped subsets

PARTITION BY is the window function equivalent of GROUP BY. It divides the result set into independent subsets (partitions), and the window function runs separately within each partition. Crucially, unlike GROUP BY, it does NOT collapse rows. Every row remains in the output with its partition's computed value attached.

Think of PARTITION BY as drawing invisible boundaries around groups of rows. Within each boundary, the window function operates as if the other groups do not exist. Rankings restart, sums reset, and counts begin from scratch for each partition.

//

Single Column Partition

The most common use case is partitioning by a category column to rank or aggregate within each category. You have already seen this pattern: partitioning employees by department to rank salaries within each department. Here is another example using MAX to find the highest price in each product category:

1SELECT
2 product_name,
3 category,
4 price,
5 MAX(price) OVER (
6 PARTITION BY category
7 ) AS category_max_price
8FROM products
Result
product_namecategorypricecategory_max_price
LaptopElectronics12001200
PhoneElectronics8001200
TabletElectronics5001200
JacketClothing150150
ShoesClothing120150
Every Electronics row sees 1,200 as the category max price (the Laptop). Every Clothing row sees 150 (the Jacket). The partitions are completely independent. You can immediately see how each product compares to the most expensive item in its category.

Advanced Partitioning

Once you understand basic partitioning, you can apply more sophisticated techniques to handle complex real-world scenarios like multi-dimensional groupings and edge cases.
//

Multiple Partition Columns

You can partition by multiple columns to create finer-grained groups. Each unique combination of partition values becomes its own window. This is useful when your data has multiple dimensions of grouping:

1SELECT
2 employee_name,
3 department,
4 fiscal_year,
5 salary,
6 AVG(salary) OVER (
7 PARTITION BY department, fiscal_year
8 ) AS dept_year_avg
9FROM employee_salaries
Result
employee_namedepartmentfiscal_yearsalarydept_year_avg
CarolEngineering2024130000125000.00
AliceEngineering2024120000125000.00
CarolEngineering2025140000132500.00
BobEngineering2025125000132500.00
EveSales20249000085000.00
DaveSales20248000085000.00

Partitioning by both department and fiscal_year creates separate windows like "Engineering 2024", "Engineering 2025", "Sales 2024", and so on. The average resets for each unique combination. Carol and Alice share the Engineering 2024 average of 125,000, while Carol and Bob in Engineering 2025 share an average of 132,500.

//

NULL in PARTITION BY

When a partition column contains NULL values, all NULL rows are placed into the SAME partition. This is different from most SQL operations where NULL does not equal NULL. In the context of window function partitioning, NULL values are grouped together as if they were equal:

1SELECT
2 employee_name,
3 department,
4 salary,
5 COUNT(*) OVER (
6 PARTITION BY department
7 ) AS dept_count
8FROM employee_metrics
Result
employee_namedepartmentsalarydept_count
AliceEngineering1200003
BobEngineering1100003
CarolEngineering1300003
DaveNULL700002
EveNULL650002

Dave and Eve have no department (NULL). They land in the same partition and share a dept_count of 2. Be aware of this behavior when your partition columns might have missing values. If you need to exclude NULLs from partitioning, filter them out with a WHERE clause before the window function runs.

//

PARTITION BY vs None

When you omit PARTITION BY entirely, the entire result set is treated as a single partition. This means the window function considers ALL rows together. You can use both in the same query to get company-wide and department-level stats side by side:

1SELECT
2 employee_name,
3 department,
4 salary,
5 SUM(salary) OVER () AS company_total,
6 SUM(salary) OVER (
7 PARTITION BY department
8 ) AS dept_total
9FROM employee_metrics
Result
employee_namedepartmentsalarycompany_totaldept_total
AliceEngineering120000625000360000
BobEngineering110000625000360000
CarolEngineering130000625000360000
DaveSales80000625000170000
EveSales90000625000170000
FrankMarketing9500062500095000

The company_total column uses no partition, so every row sees the same grand total of 625,000. The dept_total column partitions by department, so Engineering rows see 360,000, Sales rows see 170,000, and Marketing sees 95,000. Having both in the same query lets you compute percentages like "Engineering represents 360,000 / 625,000 = 57.6% of total payroll."

How does the number of partition columns affect your results?
1SELECT
2 employee_name,
3 department,
4 salary,
5 SUM(salary) OVER (
6 PARTITION BY department
7 ) AS dept_total
8FROM employee_metrics
No PARTITION BY
No PARTITION BY
Entire result set is one window. All rows see the same computed value.
Single column partition
Single column partition
One window per unique value. Most common pattern for department, region, or category analysis.
Multi-column partition
Multi-column partition
One window per unique combination. Use for year+department, region+product type, etc.
NULLs in partition columns
NULLs in partition columns
All NULL rows form their own partition. They are grouped together, not excluded.

With these partition behaviors in mind, here are the key guidelines for working with window functions.

Do
  • Use PARTITION BY to scope calculations to logical groups
  • Combine window functions with regular columns freely
  • Use multiple partition columns for fine-grained grouping
  • Filter NULL partition columns with WHERE when you need to exclude them
Don't
  • Don't use GROUP BY when you need to keep individual rows
  • Don't reference window functions in WHERE clauses
  • Don't assume NULL rows are excluded from partitions. NULLs form their own group
  • Don't confuse PARTITION BY (no row collapse) with GROUP BY (collapses rows)

> Complete this query to show each metric alongside the total for its department.

SELECT
  metric_name,
  department,
  metric_value,
  ___(metric_value) OVER (
    ___ ___
  ) AS dept_total
FROM employee_metrics
GROUP BY
SUM
AVG
PARTITION BY
department
PARTITION BY Summary
  • No PARTITION BY: entire result set is one window
  • Single column: one window per unique value in that column
  • Multiple columns: one window per unique combination of values
  • NULL values: all NULLs are grouped into the same partition

Understanding PARTITION BY is the key to unlocking the full power of window functions. Every window function in the following sections builds on the partitioning behavior covered here.

The pattern of PARTITION BY combined with any window function is the foundation: it computes a value across a group of related rows while keeping all rows in the result. The upcoming sections apply this to ranking (ROW_NUMBER, RANK) and running calculations (SUM/AVG/COUNT OVER).

ROW_NUMBER

Daily Life
Interviews

Assign unique position numbers to rows

ROW_NUMBER() assigns a unique sequential integer to each row within its partition, starting at 1. It is the most commonly used window function and appears in almost every data engineering interview. You will use it for numbering rows, pagination, deduplication, and picking the top N items per group.

//

Basic ROW_NUMBER

At minimum, ROW_NUMBER() needs an ORDER BY inside the OVER clause to know how to order the numbering. Without ORDER BY, the row numbering is non-deterministic and could change between query executions:

1SELECT
2 employee_name,
3 salary,
4 ROW_NUMBER() OVER (
5 ORDER BY salary DESC
6 ) AS salary_position
7FROM employee_metrics
Result
employee_namesalarysalary_position
Carol1300001
Alice1200002
Bob1100003
Frank950004
Eve900005
Dave800006
Every employee gets a unique number from 1 to 6, ordered by salary from highest to lowest. Carol earns the most so she gets position 1. Dave earns the least so he gets position 6. No two rows ever share the same number.
//

ROW_NUMBER + PARTITION BY

1SELECT
2 employee_name,
3 department,
4 salary,
5 ROW_NUMBER() OVER (
6 PARTITION BY department
7 ORDER BY salary DESC
8 ) AS dept_salary_position
9FROM employee_metrics

Adding PARTITION BY restarts the numbering for each group. The numbering begins at 1 within each partition independently. To get only the top earner per department, wrap this in a subquery and filter WHERE dept_salary_position = 1.

//

Multiple ORDER BY Columns

You can order by multiple columns to control tie-breaking. When two rows have the same value in the first column, the second column determines their order. This makes the numbering deterministic even when values repeat:

1SELECT
2 employee_name,
3 department,
4 salary,
5 hire_date,
6 ROW_NUMBER() OVER (
7 ORDER BY salary DESC, hire_date
8 ) AS position
9FROM employee_metrics
Result
employee_namedepartmentsalaryhire_dateposition
CarolEngineering1300002022-03-151
AliceEngineering1200002020-06-012
BobEngineering1100002021-09-103
FrankMarketing950002019-01-204
EveSales900002023-02-145
DaveSales800002021-11-306

If two employees had the same salary, the one hired earlier (hire_date ASC) would get the lower position number. Adding a second sort column eliminates ambiguity and ensures consistent results across repeated runs.

//

Handling NULLs in ORDER BY

When your ordering column contains NULL values, you should explicitly control where they appear using NULLS LAST or NULLS FIRST. Without this, the default NULL positioning depends on your database engine and can surprise you:

1SELECT
2 employee_name,
3 bonus,
4 ROW_NUMBER() OVER (
5 ORDER BY bonus DESC NULLS LAST
6 ) AS bonus_position
7FROM employee_metrics
Result
employee_namebonusbonus_position
Carol150001
Alice120002
Eve80003
BobNULL4
DaveNULL5

NULLS LAST pushes rows with missing bonus values to the end. The default NULL sort order varies, so it is best practice to always specify NULLS LAST or NULLS FIRST explicitly so your query behaves the same across all database engines.

ROW_NUMBER Applications

ROW_NUMBER has powerful applications including deduplication, pagination, and selecting top-N per group.

//

Dedup with ROW_NUMBER

One of the most common interview patterns is using ROW_NUMBER to deduplicate data. If a table has duplicate rows, you can number them within each group and keep only the first one. This pattern uses ROW_NUMBER in a subquery, then filters for row number 1 in the outer query:

1SELECT
2 *
3FROM (
4 SELECT
5 user_id,
6 email,
7 created_at,
8 ROW_NUMBER() OVER (
9 PARTITION BY user_id
10 ORDER BY created_at DESC
11 ) AS rn
12 FROM user_signups
13) AS sub
14WHERE rn = 1
Result
user_idemailcreated_atrn
101alice@email.com2025-01-151
102bob@email.com2025-01-101
103carol@email.com2025-01-121

This keeps only the most recent signup per user. The PARTITION BY user_id groups rows by user, ORDER BY created_at DESC puts the newest first, and the outer WHERE rn = 1 keeps only the first row per user. This is cleaner than using GROUP BY with MAX and a self-join.

Which approach would you use to find the most recent signup per user?
1SELECT
2 u.user_id,
3 u.email,
4 u.created_at
5FROM user_signups AS u
6INNER JOIN (
7 SELECT
8 user_id,
9 MAX(created_at) AS max_date
10 FROM user_signups
11 GROUP BY user_id
12) AS latest
13 ON u.user_id = latest.user_id
14 AND u.created_at = latest.max_date
TIP
ROW_NUMBER() always produces unique numbers, even when rows have identical values. If two employees have the same salary, one will get position 3 and the other position 4. The tie-breaking is arbitrary unless you add more columns to ORDER BY. If ties matter, use RANK or DENSE_RANK instead.

> Complete this query to number metrics within each department by value.

SELECT
  department,
  metric_name,
  metric_value,
  ___() OVER (
    PARTITION BY ___
    ORDER BY metric_value ___
  ) AS rank
FROM employee_metrics
department
DESC
ROW_NUMBER
ASC
RANK

ROW_NUMBER always produces unique sequential numbers, even when rows have identical values in the ORDER BY column.

The deduplication pattern filters WHERE rn = 1 after ROW_NUMBER to keep only the top row per partition.

If ties matter and rows with the same value should share a rank, use RANK or DENSE_RANK instead of ROW_NUMBER.

RANK and DENSE_RANK

Daily Life
Interviews

Rank rows with or without gaps

When rows share the same value, ROW_NUMBER() arbitrarily assigns different numbers. But sometimes ties matter. If two students scored 92 on an exam, they should share the same rank. RANK() and DENSE_RANK() handle ties explicitly, giving identical values the same rank. They differ in what happens to the numbers that come after a tie.

//

How Ties Work

Consider five students with test scores where two students scored 92. Here is how the three ranking functions differ on the exact same data:
studentscoreROW_NUMBERRANKDENSE_RANK
Alice98111
Bob92222
Carol92322
Dave85443
Eve78554

ROW_NUMBER: Bob gets 2, Carol gets 3. No ties allowed. The order between Bob and Carol is arbitrary since they have the same score. One of them just happens to be processed first.

RANK: Bob and Carol both get 2. The next value skips to 4. Gaps appear after ties. Think of it like Olympic medals: if two athletes tie for silver, the next athlete gets bronze (3rd place), but in RANK's logic, they skip to 4th. There is no rank 3 because two rows occupied position 2.

DENSE_RANK: Bob and Carol both get 2. The next value is 3. No gaps. Every consecutive rank value is used. If you need to know "how many distinct score levels exist above Dave," DENSE_RANK tells you directly: 2 (which are rank 1 and rank 2).

//

When Ties Matter: Rankings

Imagine you are building a scholarship system. The school gives scholarships to students ranked in the top 3. If two students tie for rank 2, should the student just below them be ranked 3rd or 4th? The answer depends on your business rules, and that determines whether you use RANK or DENSE_RANK.

//

RANK Example

RANK assigns tied rows the same number and then skips ahead. If two rows share rank 2, the next row gets rank 4. The rank value tells you: "how many rows scored higher than or equal to this row?"

1SELECT
2 student_name,
3 score,
4 RANK() OVER (ORDER BY score DESC) AS score_rank
5FROM test_scores
Result
student_namescorescore_rank
Alice981
Bob922
Carol922
Dave854
Eve785

Bob and Carol tied at 92, so both get rank 2. RANK then skips to 4 for Dave. There is no rank 3 because two rows occupied position 2. Dave's rank of 4 tells you that exactly 3 students scored higher than or tied with him.

//

DENSE_RANK Example

DENSE_RANK assigns tied rows the same number but never skips values. The next row after a tie gets the very next integer. This is useful when you care about "how many distinct rank levels exist" rather than "what position would this be in a sorted list."

1SELECT
2 student_name,
3 score,
4 DENSE_RANK() OVER (
5 ORDER BY score DESC
6 ) AS dense_score_rank
7FROM test_scores
Result
student_namescoredense_score_rank
Alice981
Bob922
Carol922
Dave853
Eve784

With DENSE_RANK, Dave gets 3 instead of 4. No gaps in the sequence. There are exactly 4 distinct rank levels in this dataset. If your scholarship covers "the top 3 rank levels," DENSE_RANK gives you Alice, Bob, Carol, and Dave (ranks 1, 2, and 3), while RANK would only give you Alice, Bob, and Carol (ranks 1 and 2, with rank 3 never assigned).

Choosing Between Functions

The choice between these three functions comes down to how you want ties handled and whether gaps in the numbering matter. Compare them side by side on the same query:
1SELECT
2 product_name,
3 revenue,
4 ROW_NUMBER() OVER (
5 ORDER BY revenue DESC
6 ) AS position
7FROM products

> Rank metrics by value within each department, giving tied values the same rank and skipping ahead after ties.

SELECT
  metric_name,
  department,
  metric_value,
  ___() OVER (
    ___ department
    ORDER BY metric_value ___
  ) AS value_rank
FROM employee_metrics
ROW_NUMBER
DENSE_RANK
PARTITION BY
RANK
DESC

RANK and DENSE_RANK both give tied rows the same number. The only difference is what happens after a tie: RANK skips the next value (1, 2, 2, 4), while DENSE_RANK never skips (1, 2, 2, 3).

Choose RANK when rank values should reflect position in a sorted list ("3rd out of 10"), and choose DENSE_RANK when you need to count distinct rank levels ("how many score tiers exist above this student?").

When ties do not matter and you need a unique number for every row, use ROW_NUMBER instead. It guarantees no two rows share the same value, making it the right choice for deduplication and pagination.

SUM/COUNT/AVG OVER

Daily Life
Interviews

Compute running totals alongside detail rows

You already know SUM, COUNT, and AVG as aggregate functions with GROUP BY. When you pair them with OVER(), they become window functions. Instead of collapsing rows, they compute values across a window while preserving every row. This is one of the most practical applications of window functions in day-to-day data work.

//

Regular SUM vs SUM OVER

Before diving into running totals, let us clearly see the difference between a regular aggregate SUM and a window function SUM. The regular SUM collapses rows; the window SUM preserves them:

1SELECT
2 region,
3 SUM(amount) AS total_amount
4FROM orders
5GROUP BY region
//

Running Totals: SUM OVER

A running total adds each new value to the accumulated sum so far. When you use SUM() with OVER(ORDER BY ...), SQL computes the sum from the first row up to and including the current row. Let us walk through each row step by step:

1SELECT
2 order_date,
3 amount,
4 SUM(amount) OVER (
5 ORDER BY order_date
6 ) AS running_total
7FROM transactions
Result
order_dateamountrunning_total
2025-01-01500500
2025-01-02300800
2025-01-037001500
2025-01-042001700
2025-01-054002100
Here is how each row's running total is calculated: Row 1 (Jan 1): just 500, nothing before it. Row 2 (Jan 2): 500 + 300 = 800. Row 3 (Jan 3): 500 + 300 + 700 = 1,500. Row 4 (Jan 4): 500 + 300 + 700 + 200 = 1,700. Row 5 (Jan 5): 500 + 300 + 700 + 200 + 400 = 2,100. Each row accumulates all previous amounts plus its own.
Running totals are perfect for cumulative revenue, inventory levels, progress toward a goal, or account balances. Any "how much so far" question is a running total use case.
//

Grand Totals: SUM OVER ()

An empty OVER() clause (no PARTITION BY, no ORDER BY) means "use all rows." This is great for computing each row's percentage of a total without needing a subquery:

1SELECT
2 department,
3 salary,
4 SUM(salary) OVER () AS total_payroll,
5 ROUND(
6 salary * 100 / SUM(salary) OVER (
7 ),
8 1
9 ) AS pct_of_payroll
10FROM employee_metrics
Result
departmentsalarytotal_payrollpct_of_payroll
Engineering12000062500019.2
Engineering11000062500017.6
Engineering13000062500020.8
Sales8000062500012.8
Sales9000062500014.4
Marketing9500062500015.2

Every row sees the same total_payroll of 625,000 because the window includes all rows. Each employee's percentage of total payroll is calculated inline without needing a subquery or a separate GROUP BY query. Carol at 130,000 represents 20.8% of the total payroll.

Running Calculations

Adding ORDER BY to window aggregates creates running calculations that accumulate row by row.

//

COUNT OVER: Running Counts

COUNT() as a window function counts rows within the window. With PARTITION BY and no ORDER BY, it gives you the total count for the partition. With ORDER BY, it gives a running count that increments row by row:

1SELECT
2 order_id,
3 region,
4 COUNT(*) OVER (
5 PARTITION BY region
6 ) AS region_total_orders,
7 COUNT(*) OVER (
8 PARTITION BY region
9 ORDER BY order_id
10 ) AS running_count
11FROM orders
Result
order_idregionregion_total_ordersrunning_count
101East31
104East32
107East33
102West21
106West22

The region_total_orders column shows the total count for the entire partition (3 for East, 2 for West) because there is no ORDER BY. The running_count column increments one by one as it processes rows within each partition because the ORDER BY limits the window to rows up to the current row.

//

AVG OVER: Cumulative Avg

AVG() as a window function computes the average across the window. With ORDER BY, it calculates an expanding average (also called a cumulative average) from the first row to the current row. The average adjusts as more data accumulates:

1SELECT
2 order_date,
3 amount,
4 AVG(amount) OVER (
5 ORDER BY order_date
6 ) AS cumulative_avg
7FROM transactions
Result
order_dateamountcumulative_avg
2025-01-01500500.00
2025-01-02300400.00
2025-01-03700500.00
2025-01-04200425.00
2025-01-05400420.00
On day 1, the average is just 500 (only one value). On day 2, it is (500 + 300) / 2 = 400. On day 3, it is (500 + 300 + 700) / 3 = 500. On day 4, it drops to (500 + 300 + 700 + 200) / 4 = 425 because the low value of 200 pulls the average down. The average adjusts with every new row.
//

Default Window Frame

When you use an aggregate window function with ORDER BY inside OVER, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This means "from the very first row up to and including the current row." That is why you get running totals and cumulative averages automatically when you include ORDER BY.

Without ORDER BY in the OVER clause, the default frame includes ALL rows in the partition. That is why SUM() OVER (PARTITION BY region) gives you the same total on every row within the partition. The presence or absence of ORDER BY fundamentally changes the behavior from "grand total per partition" to "running total within the partition."

TIP
Without ORDER BY in the OVER clause, the default frame includes ALL rows in the partition (giving partition-wide totals). With ORDER BY, it only includes rows up to the current row (giving running totals). This distinction is the single most important thing to understand about aggregate window functions.

> Complete this query to show each product alongside the overall average price across all products.

SELECT
  product_name,
  price,
  ___(price) ___ () AS avg_price
FROM products
OVER
AVG
GROUP BY
SUM

Using AVG with an empty OVER() clause is the most efficient way to compare each row to a grand average because the database computes the average once and attaches it to every row, avoiding the need for a subquery or a self-join.

When you pair AVG with ORDER BY inside OVER, the result shifts from a fixed grand average to a cumulative average that changes with each row, reflecting the average of all rows processed so far in the ordering.

Aggregate window functions like SUM, COUNT, AVG, MIN, and MAX all follow the same pattern: add OVER() to turn them into window functions that preserve every row instead of collapsing the result set.

PUTTING IT ALL TOGETHER

> You are a data analyst at Shopify ranking products within each category by monthly revenue so the merchandising team can identify top performers and allocate promotional budget per vertical.

The OVER() clause turns aggregate-style functions into per-row calculations, keeping all product rows visible alongside their ranking.
PARTITION BY divides the product catalog into independent category windows so ranking resets cleanly for Electronics, Apparel, and Home separately.
ROW_NUMBER() assigns a unique sequential rank to every product even when two items share identical monthly revenue, guaranteeing one top pick per category.
RANK() and DENSE_RANK() expose tie behavior differently, so choose DENSE_RANK() when the merchandising team needs gapless tier counts across tied revenue bands.
KEY TAKEAWAYS
Window functions add computed columns to every row without collapsing data, unlike GROUP BY
ROW_NUMBER gives unique numbers (great for deduplication); RANK and DENSE_RANK handle ties with or without gaps
Aggregates like SUM/AVG/COUNT become window functions with OVER; adding ORDER BY makes them cumulative
PARTITION BY scopes calculations to independent groups; omitting it treats all rows as one window
Window functions run after WHERE/GROUP BY/HAVING, so filter on them via a subquery

Every row and the big picture at once

Category
SQL
Difficulty
beginner
Duration
39 minutes
Challenges
5 hands-on challenges

Topics covered: What is a Window Function?, PARTITION BY Deep Dive, ROW_NUMBER, RANK and DENSE_RANK, SUM/COUNT/AVG OVER

Lesson Sections

  1. What is a Window Function? (concepts: sqlPartitionBy)

    What Window Functions Solve Consider a common analytical question: "Show me each employee's salary alongside the average salary for their department." This requires two things at once: individual row detail (each employee's name and salary) and aggregate context (the department average). Before window functions existed, you had to write a subquery or a self-join to combine these. Window functions let you do it in a single, clean pass. Comparing Approaches Here is our employee data: GROUP BY Resu

  2. PARTITION BY Deep Dive

    Single Column Partition Every Electronics row sees 1,200 as the category max price (the Laptop). Every Clothing row sees 150 (the Jacket). The partitions are completely independent. You can immediately see how each product compares to the most expensive item in its category. Advanced Partitioning Once you understand basic partitioning, you can apply more sophisticated techniques to handle complex real-world scenarios like multi-dimensional groupings and edge cases. Multiple Partition Columns NUL

  3. ROW_NUMBER (concepts: sqlRowNumber)

    Basic ROW_NUMBER Every employee gets a unique number from 1 to 6, ordered by salary from highest to lowest. Carol earns the most so she gets position 1. Dave earns the least so he gets position 6. No two rows ever share the same number. ROW_NUMBER + PARTITION BY Multiple ORDER BY Columns Handling NULLs in ORDER BY ROW_NUMBER Applications Dedup with ROW_NUMBER Which approach would you use to find the most recent signup per user?

  4. RANK and DENSE_RANK (concepts: sqlRankDenseRank)

    How Ties Work Consider five students with test scores where two students scored 92. Here is how the three ranking functions differ on the exact same data: When Ties Matter: Rankings RANK Example DENSE_RANK Example Choosing Between Functions The choice between these three functions comes down to how you want ties handled and whether gaps in the numbering matter. Compare them side by side on the same query:

  5. SUM/COUNT/AVG OVER (concepts: sqlAggregateOver)

    Regular SUM vs SUM OVER Running Totals: SUM OVER Here is how each row's running total is calculated: Row 1 (Jan 1): just 500, nothing before it. Row 2 (Jan 2): 500 + 300 = 800. Row 3 (Jan 3): 500 + 300 + 700 = 1,500. Row 4 (Jan 4): 500 + 300 + 700 + 200 = 1,700. Row 5 (Jan 5): 500 + 300 + 700 + 200 + 400 = 2,100. Each row accumulates all previous amounts plus its own. Running totals are perfect for cumulative revenue, inventory levels, progress toward a goal, or account balances. Any "how much s