Things on a Heap

|
A Collection of Programming Ramblings by chjdev

The k-Means Algorithm Visualized

This quick post started as a challenge to myself: “I wonder if I can bang out a k-means implementation with visualization from memory in less than 2 game of thrones episodes…” The result is an immutable, functional implementation in ES6 including a visualization in D3.js.

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The naïve algorithm is pretty simple:

  • decide on a distance measure (e.g. Euclidean distance)
  • decide a-priori how many clusters you want and randomly position the centroid in the feature space.
  • now iteratively
    • assign each point to a cluster based on minimum distance to the centroids
    • move the centroids to the actual cluster centroid positions
    • repeat until change below threshold

The Euclidean version is well suited to find circular clusters, the choice of measure always depends on the problem at hand. Also there are approaches to infer the optimal number of clusters, but that is left to a post in the future.

The full ECMAScript code can be found here or viewed directly on Github. A transpiled version using Babel can be found here.

You have a question or found an issue?
Then head over to Github and open an Issue please!