ckmeans | /index.js | simplestatistics@v7.7.5

function ckmeans

import { ckmeans } from "https://deno.land/x/simplestatistics@v7.7.5/index.js";

ckmeans(x, nClusters)

Ckmeans clustering is an improvement on heuristic-based clustering approaches like Jenks. The algorithm was developed by Haizhou Wang and Mingzhou Song as a dynamic programming approach to the problem of clustering numeric data into groups with the least within-group sum of squared deviations.

Minimizing the difference within groups (what Wang & Song refer to as withinss, or within sum-of-squares) means that groups are optimally homogeneous within and the data is split into representative groups. This is very useful for visualization, where you may want to represent a continuous variable in discrete color or style groups. This function can provide groups that emphasize differences between data.
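To make the withinss quantity concrete, here is a minimal sketch that computes it for two hand-written groupings of the same data. The helper name `withinSumOfSquares` is hypothetical and not part of simple-statistics; the first grouping is the one shown in the Examples section, the second is an arbitrary alternative chosen for comparison.

```javascript
// Hypothetical helper (not part of simple-statistics): for each group, sum the
// squared deviations of its values from the group mean, then add up the groups.
function withinSumOfSquares(groups) {
  let total = 0;
  for (const group of groups) {
    const mean = group.reduce((sum, v) => sum + v, 0) / group.length;
    for (const v of group) {
      total += (v - mean) ** 2;
    }
  }
  return total;
}

// Grouping from the documented example, and an arbitrary alternative grouping
// of the same ten values.
const optimal = [[-1, -1, -1, -1], [2, 2, 2], [4, 5, 6]];
const alternative = [[-1, -1, -1, -1, 2], [2, 2, 4], [5, 6]];

console.log(withinSumOfSquares(optimal)); // 2
console.log(withinSumOfSquares(alternative) > withinSumOfSquares(optimal)); // true
```

The lower the total, the more homogeneous the groups; the grouping from the documented example scores lower than the alternative.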

Being a dynamic programming approach, this algorithm is based on two matrices that store incrementally computed values for squared deviations and backtracking indexes.
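The two-matrix idea can be illustrated with a simplified sketch. This is not the library's code: it uses the straightforward O(kn^2) recurrence rather than the faster divide and conquer variant, and the names `ckmeansSketch`, `cost`, and `start` are assumptions made for illustration.

```javascript
// Simplified O(k * n^2) dynamic programming sketch; names are illustrative only.
function ckmeansSketch(values, nClusters) {
  if (nClusters < 1 || nClusters > values.length) {
    throw new Error("nClusters must be between 1 and the number of values");
  }
  const sorted = values.slice().sort((a, b) => a - b);
  const n = sorted.length;

  // Prefix sums let us get the squared deviation of sorted[i..j] in O(1).
  const sum = [0];
  const sumSq = [0];
  for (const v of sorted) {
    sum.push(sum[sum.length - 1] + v);
    sumSq.push(sumSq[sumSq.length - 1] + v * v);
  }
  const ssq = (i, j) => {
    const s = sum[j + 1] - sum[i];
    return Math.max(0, sumSq[j + 1] - sumSq[i] - (s * s) / (j - i + 1));
  };

  // cost[c][j]: minimal withinss when sorted[0..j] is split into c + 1 clusters.
  // start[c][j]: index at which the last of those clusters begins (backtracking).
  const cost = [];
  const start = [];
  for (let c = 0; c < nClusters; c++) {
    cost.push(new Array(n).fill(Infinity));
    start.push(new Array(n).fill(0));
  }
  for (let j = 0; j < n; j++) cost[0][j] = ssq(0, j);
  for (let c = 1; c < nClusters; c++) {
    for (let j = c; j < n; j++) {
      for (let i = c; i <= j; i++) {
        const candidate = cost[c - 1][i - 1] + ssq(i, j);
        if (candidate < cost[c][j]) {
          cost[c][j] = candidate;
          start[c][j] = i;
        }
      }
    }
  }

  // Walk the backtracking matrix from the last cluster to the first.
  const clusters = [];
  let end = n - 1;
  for (let c = nClusters - 1; c >= 0; c--) {
    const begin = start[c][end];
    clusters.unshift(sorted.slice(begin, end + 1));
    end = begin - 1;
  }
  return clusters;
}

console.log(ckmeansSketch([-1, 2, -1, 2, 4, 5, 6, -1, 2, -1], 3));
// [[-1, -1, -1, -1], [2, 2, 2], [4, 5, 6]]
```

Sorting first is what makes the one-dimensional problem tractable: every optimal cluster is a contiguous run of the sorted values, so only the boundaries between runs need to be searched.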

This implementation is based on Ckmeans 3.4.6, which introduced a new divide and conquer approach that improved runtime from O(kn^2) to O(kn log(n)).

Unlike the original implementation, this implementation does not include any code to automatically determine the optimal number of clusters: this information needs to be explicitly provided.

References

Haizhou Wang and Mingzhou Song. Ckmeans.1d.dp: Optimal k-means Clustering in One Dimension by Dynamic Programming. The R Journal, Vol. 3/2, December 2011. ISSN 2073-4859.

Examples

ckmeans([-1, 2, -1, 2, 4, 5, 6, -1, 2, -1], 3);
// The input, clustered into groups of similar numbers.
//= [[-1, -1, -1, -1], [2, 2, 2], [4, 5, 6]]
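For the visualization use case, the grouped output can be turned into a lookup from value to color class. The sketch below starts from the clusters shown in the example rather than calling the library, and assumes clusters arrive in ascending order with sorted contents, as they do in that example. The palette values and the `colorFor` helper are hypothetical.

```javascript
// Clusters copied from the documented example output.
const clusters = [[-1, -1, -1, -1], [2, 2, 2], [4, 5, 6]];

// Hypothetical palette, one color per cluster, lightest to darkest.
const palette = ["#deebf7", "#9ecae1", "#3182bd"];

// Smallest value of each cluster, used as the lower bound of its class.
const lowerBounds = clusters.map((cluster) => cluster[0]);

// Hypothetical helper: pick the last class whose lower bound the value reaches.
// Values below the first bound fall into the first class.
function colorFor(value) {
  let index = 0;
  for (let i = 0; i < lowerBounds.length; i++) {
    if (value >= lowerBounds[i]) index = i;
  }
  return palette[index];
}

console.log(colorFor(-1)); // "#deebf7"
console.log(colorFor(2)); // "#9ecae1"
console.log(colorFor(5)); // "#3182bd"
```

Using lower bounds as breaks means values that were not in the original data, such as 3, still map to a class.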

Parameters

x

input data, as an array of number values

nClusters

number of desired classes. This cannot be greater than the number of values in the data array.