Do they belong together?

cloud of points

Points are not only the most primitive datatype in computational geometry, but they are the most commonly used ones. If you need a quick overview of points, you can check this post. They are most commonly used in communicating the location of an event (the location of purchase, pickup point, a patient, etc…). But Whenever the count of these points became large enough, it would be hard to visualize them or make a pattern out of them. So sadly they may become useless. But the good news is that they can become more valuable than ever if you know what to do.

So if you cannot understand them, cluster them i.e. separate points into groups. But how? We will see later in this post. In many cases, clusters (groups) are easier to understand and reason about. For example, clustering students’ points into groups to assign a school bus for each group or performing suitability analysis to choose the best distribution for new hospitals given the residences of a city. Enough chitchat, let’s see.

Cluster Formation

First and foremost, we need to capture the points we are dealing with, If we are lucky (as I assume that we are) these points would fit in memory for processing. We also need to perform some operations on these newly acquired points like actually cluster them for instance, get the centroid of a specific cluster, and Compare a set of centroids against others. These operations can be abstracted in the header file below.

  • All code snippets of this blog can be found here
  • We are building on our point classes here
  • If you like this post, share, and give the repo a star
#ifndef POINT_POINTCLOUD_H
#define POINT_POINTCLOUD_H
#include "../points/point.h"
#include <iostream>
class pointCloud {
public:
    pointCloud();
    explicit pointCloud(vector<point *> &points);
    vector<vector<point *>> cluster(int clusters);
    double diameter();
    string asWKT();
    static point * getCentroid(const vector<point *> & pointCluster);
    static bool isSameCentroids(const vector<point *> & a,const vector<point *> & b);
    static void PrintClusters(const vector<vector<point *>> &clusters);
private:
    vector<point *> points;
    void assignPointsToCentroid(vector<point *> &centroids, vector<vector<point *>> &ps);
};
#endif //POINT_POINTCLOUD_H

So as we can see, the class constructor accepts a vector of points as an input to be used later. The Cluster method can be invoked to split the points into groups. The diameter method gets the diameter of the point cloud. asWKT returns a WKT representation of the point cloud. As well as a couple of static methods whose functions are quite clear.

So, without further ado, let’s dive into the constructor. As shown in the code snippet below, the constructor is quite simple. It simply stores the vector of points into the points property.

It is worth noting that this algorithm has a time complexity of O(t*k*n*d) where

  • t: number of iterations nade (depends on the initialization)
  • k: number of clusters required
  • n: number of points
  • d: dimensionality of the points

Which can be close to quadratic O(n2) in most cases. Although they are many more efficient implementations that go to linear complexity. check this one for example.

pointCloud::pointCloud(vector<point *> &points) {
    this->points = points;
}

The Clustering Logic

To actually cluster them we will use a simple yet intuitive algorithm called “k-means”. This algorithm takes an input value other than the points to be clustered. This input is the number of clusters to be generated denoted as n.

  1. The algorithm starts by selecting n points randomly (or preferably based on some heuristic) and assumes that these points are the perfect set of centroids.
  2. Then we assign each of the points in our point cloud to one of the cluster centroids based on euclidean distance.
  3. Then we recalculate the centroid for each cluster of points.
  4. If the old set of cluster centroid matches the newly acquired one, we know that this is the actual perfect set of clusters. So we return.
  5. If not we loop to step 2 until convergence.

This algorithm is guaranteed to be finite, although the full proof is out of this post’s scope. Listed below is a simple straightforward implementation of this algorithm.

vector<vector<point *>> pointCloud::cluster(int clusterCount) {
    //initially assume any points as centroids
    vector<point *> centroids = vector<point *>(points.begin(),points.begin()+clusterCount);
    vector<vector<point *>> clusters = vector<vector<point *>>(clusterCount);
    while (true){
        //assign points to centroids
        clusters = vector<vector<point *>>(clusterCount);
        assignPointsToCentroid(centroids,clusters);
        vector<point *> newCentroids ;
        //update centroids
        for (int i = 0; auto v : clusters) {
            newCentroids.push_back(getCentroid(v));
            i++;
        }
        //break when centroids match prev iteration
        if(isSameCentroids(newCentroids,centroids))break;
        else{
            centroids = newCentroids;
        }
    }
    return clusters;
}

void pointCloud::assignPointsToCentroid(vector<point *> &centroids, vector<vector<point *>>&ps) {
    for (int i = 0; i < points.size(); ++i) {
        float bestDistance = INTMAX_MAX;
        int bestCentroid = 0;
        //find the best cluster for it
        for (int j = 0;j<centroids.size();j++) {
            float distance = centroids[j]->distance(points[i]);
            if(bestDistance > distance){
                bestDistance = distance;
                bestCentroid = j;
            }
        }
        ps[bestCentroid].push_back(points[i]);
    }
}

bool pointCloud::isSameCentroids(const vector<point *> &a,const vector<point *> &b) {
    if(a.size()!=b.size())return false;
    for (int i = 0; i < a.size(); ++i) {
        if(a[i]->inProximity(0.000001,b[i]))continue;
        else return false;
    }
    return true;
}

To make this class testable, we need to provide some methods to print the input cloud of points as WKT (friendly format in most of the computational geometry platforms such as ARC desktop or QGIS) and it goes like this:

string pointCloud::asWKT() {
    string s = "GeometryCollection(";
        for(point * p : points)
            s+=p->asWKT()+",";
        s = s.substr(0,s.size()-1);
    s+=")";
    return s;
}

And we need also to print the out clusters, which is a trivial implementation:

void pointCloud::PrintClusters(const vector<vector<point *>> &clusters) {
    for (int i = 0; i < clusters.size(); ++i) {
        cout<< string(5,'-')<<"cluster \""<<i<<"\" points"<< string(5,'-')<<endl;
        for (point * p : clusters[i])
            cout<<p->asWKT()<<endl;
    }
}

See it working

To try out this simple algorithm we need to perform a set of steps:

  1. Create a list of random points
  2. Use this list to form a point cloud instance
  3. Print the representation of this point cloud
  4. Visualize it
  5. Cluster the point cloud
  6. Visualize the clusters

The code to perform this task goes al follows:

#include <iostream>
#include "points/point.h"
#include "pointCloud/pointCloud.h"
#include <unordered_set>
int main() {
    vector<point *> points;
    //make 10 2D points
    for (int i = 0; i < 10; ++i) {
        vector<float> coords;
        float randx = (float)(rand()%100)/100.0;
        float randy = (float)(rand()%100)/100.0;
        coords.push_back(randx);
        coords.push_back(randy);
        points.push_back(new point(coords));
    }
    //make point cloud
    pointCloud * pc = new pointCloud(points);
    cout<<pc->asWKT()<<endl;
    //cluster them into 2 clusters
    auto clusters = pc->cluster(3);
    pointCloud::PrintClusters(clusters);
    return 0;
}

The output of this program execution was as follows:

GeometryCollection(POINT(0.070000 0.490000 ),POINT(0.730000 0.580000 ),POINT(0.300000 0.720000 ),POINT(0.440000 0.780000 ),POINT(0.230000 0.090000 ),POINT(0.400000 0.650000 ),POINT(0.920000 0.420000 ),POINT(0.870000 0.030000 ),POINT(0.270000 0.290000 ),POINT(0.400000 0.120000 ))
-----cluster "0" points-----
POINT(0.070000 0.490000 )
POINT(0.230000 0.090000 )
POINT(0.270000 0.290000 )
POINT(0.400000 0.120000 )
-----cluster "1" points-----
POINT(0.730000 0.580000 )
POINT(0.920000 0.420000 )
POINT(0.870000 0.030000 )
-----cluster "2" points-----
POINT(0.300000 0.720000 )
POINT(0.440000 0.780000 )
POINT(0.400000 0.650000 )

The visualization of this point cloud was done using QGIS software and a quickWKT extension.

Randomly Generated points

The output of our clustering algorithm can be visualized as follows:

After Clustering

In the last figure, each color represents a cluster.

Introduction to computational geometry

Computational geometry has a wide range of applications and yet it is rarely discussed in tutorials. So I’m starting this series as an introduction to this interesting topic. I’m going to start from the very basics like point representation, standard formats, simple operations to complex topics like hyper plane routing in convex hulls. The rest of the post is organized as follows:

  • What is Computational geometry and it’s applications
  • Standard representations of computational geometry objects
  • Types of simple geometries
  • Simple operations on geometrical objects

What is Computational geometries

Computational geometry is a branch of computer science concerned with the representation, standardization, and making operations on geometrical objects like road networks, classification of point clouds or building simple models like human skeletons. Rarely any application that requires simulation of models such as building games, tracking, spacial data analysis or viewing customized maps does not depend heavily on computational geometry.

Standard Representations of Computational Geometries

There are several standards when it comes to computational geometries. Each of which is best fit a different situation. For example the representation used to serve a map can be quite different than the one used to communicate data throw an API and neither are suited for performing database operations.

For example most databases are familiar with both the WKT and WKB, While GEOJSON format is an API friendly. For map viewing there are several protocols such as WFS and MVT for serving vector data and WMS, WTMS and TMS are widely used for serving raster images. For serving large data files the commonly used formats are SHP, KML, etc….

Upcoming posts will explore each and everyone of these techniques as well as comparisons and implementations. keep tuned.

Types of Simple Geometries

Simple geometries start always with a point. A point can be identified by a vector which contain n elements, where n is the number of dimensions. For example, if we consider a 1D plain, then a point is simply a vector contains one number (coordinate) and in this special case only can a point represented as a scaler. If you were operating in a 2D space, a point will contain two elements p=[x, y], and in 3D it would be p=[x, y, z] and so on.

The second is lines, lines can be represented as two points, Lines extend beyond these points to span an infinite length while keeping its orientation. So in short a line is an ordered pair of points L=[p1,p2]. If the line has finite length, it can be represented as a set of points having the same orientation.

Lines that does not conform with the above discreption, such as roads are called line strings or poly lines, this kind of line is composed of a vector of lines where each line end connects to the next one beginning.

Triangles, squares, hexagons and polygon are all considered polygons. A polygon is represented as a set of lines, where each all the lines forms a closed shape.

Shapes where its inside is emptied or form more complex type of shape can be considered either multi-polygons or geometry collection, where geometry collection are used to define an object or more than one of the aforementioned types fused together.

Simple Operations on Geometries

Operations on geometries varies from calculating the distance of a line, or between two points. To smart grouping of points into clusters, investigating the center of a propagating signal, deciding whether a point, line, etc.. crosses or fits into another. To more complicated tasks such as routing, calculating shortest path, and more. If you are interested please like the post, and our page light syntax so you do not miss any go it