The k-Means Algorithm Visualized
This quick post started as a challenge to myself: “I wonder if I can bang out a k-means implementation with visualization from memory in less than 2 game of thrones episodes…” The result is an immutable, functional implementation in ES6 including a visualization in D3.js.
k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.
The naïve algorithm is pretty simple:
- decide on a distance measure (e.g. Euclidean distance)
- decide a-priori how many clusters you want and randomly position the centroid in the feature space.
- now iteratively
- assign each point to a cluster based on minimum distance to the centroids
- move the centroids to the actual cluster centroid positions
- repeat until change below threshold
The Euclidean version is well suited to find circular clusters, the choice of measure always depends on the problem at hand. Also there are approaches to infer the optimal number of clusters, but that is left to a post in the future.
Then head over to Github and open an Issue please!