
Points are not only the most primitive datatype in computational geometry, but they are the most commonly used ones. If you need a quick overview of points, you can check this post. They are most commonly used in communicating the location of an event (the location of purchase, pickup point, a patient, etc…). But Whenever the count of these points became large enough, it would be hard to visualize them or make a pattern out of them. So sadly they may become useless. But the good news is that they can become more valuable than ever if you know what to do.
So if you cannot understand them, cluster them i.e. separate points into groups. But how? We will see later in this post. In many cases, clusters (groups) are easier to understand and reason about. For example, clustering students’ points into groups to assign a school bus for each group or performing suitability analysis to choose the best distribution for new hospitals given the residences of a city. Enough chitchat, let’s see.
Cluster Formation
First and foremost, we need to capture the points we are dealing with, If we are lucky (as I assume that we are) these points would fit in memory for processing. We also need to perform some operations on these newly acquired points like actually cluster them for instance, get the centroid of a specific cluster, and Compare a set of centroids against others. These operations can be abstracted in the header file below.
#ifndef POINT_POINTCLOUD_H
#define POINT_POINTCLOUD_H
#include "../points/point.h"
#include <iostream>
class pointCloud {
public:
pointCloud();
explicit pointCloud(vector<point *> &points);
vector<vector<point *>> cluster(int clusters);
double diameter();
string asWKT();
static point * getCentroid(const vector<point *> & pointCluster);
static bool isSameCentroids(const vector<point *> & a,const vector<point *> & b);
static void PrintClusters(const vector<vector<point *>> &clusters);
private:
vector<point *> points;
void assignPointsToCentroid(vector<point *> ¢roids, vector<vector<point *>> &ps);
};
#endif //POINT_POINTCLOUD_H
So as we can see, the class constructor accepts a vector of points as an input to be used later. The Cluster method can be invoked to split the points into groups. The diameter method gets the diameter of the point cloud. asWKT returns a WKT representation of the point cloud. As well as a couple of static methods whose functions are quite clear.
So, without further ado, let’s dive into the constructor. As shown in the code snippet below, the constructor is quite simple. It simply stores the vector of points into the points property.
It is worth noting that this algorithm has a time complexity of O(t*k*n*d) where
- t: number of iterations nade (depends on the initialization)
- k: number of clusters required
- n: number of points
- d: dimensionality of the points
Which can be close to quadratic O(n2) in most cases. Although they are many more efficient implementations that go to linear complexity. check this one for example.
pointCloud::pointCloud(vector<point *> &points) {
this->points = points;
}
The Clustering Logic
To actually cluster them we will use a simple yet intuitive algorithm called “k-means”. This algorithm takes an input value other than the points to be clustered. This input is the number of clusters to be generated denoted as n.
- The algorithm starts by selecting n points randomly (or preferably based on some heuristic) and assumes that these points are the perfect set of centroids.
- Then we assign each of the points in our point cloud to one of the cluster centroids based on euclidean distance.
- Then we recalculate the centroid for each cluster of points.
- If the old set of cluster centroid matches the newly acquired one, we know that this is the actual perfect set of clusters. So we return.
- If not we loop to step 2 until convergence.
This algorithm is guaranteed to be finite, although the full proof is out of this post’s scope. Listed below is a simple straightforward implementation of this algorithm.
vector<vector<point *>> pointCloud::cluster(int clusterCount) {
//initially assume any points as centroids
vector<point *> centroids = vector<point *>(points.begin(),points.begin()+clusterCount);
vector<vector<point *>> clusters = vector<vector<point *>>(clusterCount);
while (true){
//assign points to centroids
clusters = vector<vector<point *>>(clusterCount);
assignPointsToCentroid(centroids,clusters);
vector<point *> newCentroids ;
//update centroids
for (int i = 0; auto v : clusters) {
newCentroids.push_back(getCentroid(v));
i++;
}
//break when centroids match prev iteration
if(isSameCentroids(newCentroids,centroids))break;
else{
centroids = newCentroids;
}
}
return clusters;
}
void pointCloud::assignPointsToCentroid(vector<point *> ¢roids, vector<vector<point *>>&ps) {
for (int i = 0; i < points.size(); ++i) {
float bestDistance = INTMAX_MAX;
int bestCentroid = 0;
//find the best cluster for it
for (int j = 0;j<centroids.size();j++) {
float distance = centroids[j]->distance(points[i]);
if(bestDistance > distance){
bestDistance = distance;
bestCentroid = j;
}
}
ps[bestCentroid].push_back(points[i]);
}
}
bool pointCloud::isSameCentroids(const vector<point *> &a,const vector<point *> &b) {
if(a.size()!=b.size())return false;
for (int i = 0; i < a.size(); ++i) {
if(a[i]->inProximity(0.000001,b[i]))continue;
else return false;
}
return true;
}
To make this class testable, we need to provide some methods to print the input cloud of points as WKT (friendly format in most of the computational geometry platforms such as ARC desktop or QGIS) and it goes like this:
string pointCloud::asWKT() {
string s = "GeometryCollection(";
for(point * p : points)
s+=p->asWKT()+",";
s = s.substr(0,s.size()-1);
s+=")";
return s;
}
And we need also to print the out clusters, which is a trivial implementation:
void pointCloud::PrintClusters(const vector<vector<point *>> &clusters) {
for (int i = 0; i < clusters.size(); ++i) {
cout<< string(5,'-')<<"cluster \""<<i<<"\" points"<< string(5,'-')<<endl;
for (point * p : clusters[i])
cout<<p->asWKT()<<endl;
}
}
See it working
To try out this simple algorithm we need to perform a set of steps:
- Create a list of random points
- Use this list to form a point cloud instance
- Print the representation of this point cloud
- Visualize it
- Cluster the point cloud
- Visualize the clusters
The code to perform this task goes al follows:
#include <iostream>
#include "points/point.h"
#include "pointCloud/pointCloud.h"
#include <unordered_set>
int main() {
vector<point *> points;
//make 10 2D points
for (int i = 0; i < 10; ++i) {
vector<float> coords;
float randx = (float)(rand()%100)/100.0;
float randy = (float)(rand()%100)/100.0;
coords.push_back(randx);
coords.push_back(randy);
points.push_back(new point(coords));
}
//make point cloud
pointCloud * pc = new pointCloud(points);
cout<<pc->asWKT()<<endl;
//cluster them into 2 clusters
auto clusters = pc->cluster(3);
pointCloud::PrintClusters(clusters);
return 0;
}
The output of this program execution was as follows:
GeometryCollection(POINT(0.070000 0.490000 ),POINT(0.730000 0.580000 ),POINT(0.300000 0.720000 ),POINT(0.440000 0.780000 ),POINT(0.230000 0.090000 ),POINT(0.400000 0.650000 ),POINT(0.920000 0.420000 ),POINT(0.870000 0.030000 ),POINT(0.270000 0.290000 ),POINT(0.400000 0.120000 ))
-----cluster "0" points-----
POINT(0.070000 0.490000 )
POINT(0.230000 0.090000 )
POINT(0.270000 0.290000 )
POINT(0.400000 0.120000 )
-----cluster "1" points-----
POINT(0.730000 0.580000 )
POINT(0.920000 0.420000 )
POINT(0.870000 0.030000 )
-----cluster "2" points-----
POINT(0.300000 0.720000 )
POINT(0.440000 0.780000 )
POINT(0.400000 0.650000 )
The visualization of this point cloud was done using QGIS software and a quickWKT extension.

The output of our clustering algorithm can be visualized as follows:

In the last figure, each color represents a cluster.














