Summary: in this tutorial, you will learn how to use the SQLite CUME_DIST()
function to calculate the cumulative distribution of a value within a window or partition.
Introduction to SQLite CUME_DIST()
Function
The CUME_DIST()
is a window function that calculates the cumulative distribution of value within a window or partition.
The following shows the syntax of the CUME_DIST()
function:
CUME_DIST()
OVER (
[PARTITION BY partition_expression]
[ORDER BY order_list]
)
Code language: SQL (Structured Query Language) (sql)
In this syntax:
- The
PARTITION BY
clause specifies how the rows are grouped into partitions to which theCUME_DIST()
function applies. If you skip thePARTITION BY
clause, the function treats the whole result set as a single partition. - The
ORDER BY
clause specifies the order of rows in each partition to which theCUME_DIST()
function applies. If you omit theORDER BY
clause, the function returns 1 for all rows.
Suppose N
is the value of the current row of the column specified in the ORDER BY
clause and the order of rows is from low to high, the cumulative distribution of c is calculated using the following formula:
The number of rows with values <= N / The number of rows in the window or partition
Code language: SQL (Structured Query Language) (sql)
The return value of the CUME_DIST()
function is greater than 0 and less than or equal to 1:
0 < CUME_DIST() <= 1
Code language: SQL (Structured Query Language) (sql)
The rows with same values receive the same result.
SQLite CUME_DIST()
function example
First, create a new table named CumeDistDemo
for the demonstration:
CREATE TABLE CumeDistDemo(
Id INTEGER PRIMARY KEY,
value INT
);
Code language: SQL (Structured Query Language) (sql)
Second, insert some rows into the CumeDistDemo
table:
INSERT INTO CumeDistDemo(value)
VALUES(1000),(1200),(1200),(1400),(2000);
Code language: SQL (Structured Query Language) (sql)
Third, query data from the CumeDistDemo
table:
SELECT
Id,
Value
FROM
CumeDistDemo;
Code language: SQL (Structured Query Language) (sql)
Fourth, calculate the cumulative distribution of the values in the CumeDistDemo
table:
SELECT
Value,
CUME_DIST()
OVER (
ORDER BY value
) CumulativeDistribution
FROM
CumeDistDemo;
Code language: SQL (Structured Query Language) (sql)
Here is the output:
Because we skipped the PARTITION BY
clause, the function treated the whole result set as a single partition. Therefore, the number of rows to be evaluated is 5.
For the row id 1, the number of rows with the value <= 1000 is 1 therefore the cumulative distribution of 1000 is 1/ 5 = 0.2
For the row id 2, the number of rows with the value <= 1200 is 3 therefore the cumulative distribution of 1200 is 3/ 5 = 0.6.
The same logic is applied to the row with id 3, 4 and 5.
In this tutorial, you have learned how to calculate the cumulative distribution of a value in a set of values using the the SQLite CUME_DIST()
function.