An Efficient and Randomized Clustering Algorithm that utilizes Randomized Algorithms on K-Medians
“You cannot eat a cluster of grapes at once, but it is very easy if you eat them one by one.” - Jacques Roumain
I started this project to investigate why none of the repositories I had explored online examined K-Medians and how it works. My findings are in the document Sharan_RClusterFinal.pdf. I wrote this code to conduct that research.
K-Medians clustering is a type of unsupervised learning, which is used when you have unlabeled data (i.e., data without defined categories or groups). The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K. This project could be of use to two different sets of people:
The data set in the file data_1024.csv (which is in fact tab separated) was given to me without a legitimate source. When charted, the results are as shown below:
The performance of my classifier is as shown below:
There are four different clustering algorithms in this repository. Each is either a vanilla implementation or one modified using principles from the subject of Randomized Algorithms. To find out more about their performance, please refer to the document Sharan_RClusterFinal.pdf. For the theory behind the modifications, please look at the same document and the references it cites.
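As a rough illustration of what a vanilla K-Medians loop does, and not the code in this repository, a minimal sketch might look like the following. The names k_medians and manhattan, the use of Manhattan distance, and the random choice of initial centers are all assumptions made for this example.

```python
import random
from statistics import median


def manhattan(p, q):
    """L1 distance between two points of equal dimension."""
    return sum(abs(a - b) for a, b in zip(p, q))


def k_medians(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups.

    Hypothetical helper for illustration only; returns (centers, labels).
    """
    rng = random.Random(seed)
    # Assumed initialization: pick k distinct input points at random.
    centers = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda j: manhattan(p, centers[j]))
            for p in points
        ]
        # Update step: move each center to the coordinate-wise median
        # of the points assigned to it.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(median(col) for col in zip(*members)))
            else:
                # An empty cluster keeps its previous center.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # no center moved, so the assignment is stable
        centers = new_centers
    return centers, labels
```

Calling k_medians(points, 2) on a list of 2-D tuples returns the two centers and one cluster label per point. The median update is what distinguishes this from K-Means, which would use the mean in the same step.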
To run a particular clustering algorithm, in this case the Randomized K-Medians algorithm, type the following into a terminal:
python rand_kmedians.py
Please refer to my documentation and comments in order to gain more insight into the programs.
[1] Mayo, M.M., 2016. An Arithmetic-Based Deterministic Centroid Initialization Method for the k-Means Clustering Algorithm.
MIT