SQLite CUME_DIST

Summary: in this tutorial, you will learn how to use the SQLite CUME_DIST() function to calculate the cumulative distribution of a value within a window or partition.

Introduction to SQLite CUME_DIST() Function

The CUME_DIST() is a window function that calculates the cumulative distribution of value within a window or partition.

The following shows the syntax of the CUME_DIST() function:

CUME_DIST() 
OVER (
	[PARTITION BY partition_expression]
	[ORDER BY order_list]
)
Code language: SQL (Structured Query Language) (sql)

In this syntax:

  • The PARTITION BY clause specifies how the rows are grouped into partitions to which the CUME_DIST() function applies. If you skip the PARTITION BY clause, the function treats the whole result set as a single partition.
  • The ORDER BY clause specifies the order of rows in each partition to which the CUME_DIST() function applies. If you omit the ORDER BY clause, the function returns 1 for all rows.

Suppose N is the value of the current row of the column specified in the ORDER BY clause and the order of rows is from low to high, the cumulative distribution of c is calculated using the following formula:

The number of rows with values <= N / The number of rows in the window or partition
Code language: SQL (Structured Query Language) (sql)

The return value of the CUME_DIST() function is greater than 0 and less than or equal to 1:

0 < CUME_DIST() <= 1
Code language: SQL (Structured Query Language) (sql)

The rows with same values receive the same result.

SQLite CUME_DIST() function example

First, create a new table named CumeDistDemo for the demonstration:

CREATE TABLE CumeDistDemo(
	Id INTEGER PRIMARY KEY,
	value INT
);
Code language: SQL (Structured Query Language) (sql)

Second, insert some rows into the CumeDistDemo table:

INSERT INTO CumeDistDemo(value)
VALUES(1000),(1200),(1200),(1400),(2000);
Code language: SQL (Structured Query Language) (sql)

Third, query data from the CumeDistDemo table:

SELECT 
	Id,
	Value 
FROM 
	CumeDistDemo;
Code language: SQL (Structured Query Language) (sql)
SQLite CUME_DIST Sample Table

Fourth, calculate the cumulative distribution of the values in the CumeDistDemo table:

SELECT 
	Value, 
	CUME_DIST() 
	OVER (
		ORDER BY value
	) CumulativeDistribution
FROM
	CumeDistDemo;
Code language: SQL (Structured Query Language) (sql)

Here is the output:

SQLite CUME_DIST Function Example

Because we skipped the PARTITION BY clause, the function treated the whole result set as a single partition. Therefore, the number of rows to be evaluated is 5.

For the row id 1, the number of rows with the value <= 1000 is 1 therefore the cumulative distribution of 1000 is 1/ 5 = 0.2

For the row id 2, the number of rows with the value <= 1200 is 3 therefore the cumulative distribution of 1200 is 3/ 5 = 0.6.

The same logic is applied to the row with id 3, 4 and 5.

In this tutorial, you have learned how to calculate the cumulative distribution of a value in a set of values using the the SQLite CUME_DIST() function.

Was this tutorial helpful ?