This package contains code for computing clustering validity indices in Java, using the K-means implementation from Weka. The package includes the following clustering validity indices: Silhouette, Dunn, BD-Silhouette, BD-Dunn, Davies-Bouldin, Calinski-Harabasz, MaximumDiameter, SquaredDistance, AverageDistance, AverageBetweenClusterDistance, MinimumDistance.
It wraps Weka's clustering algorithm with a set of validity indices that help you decide the optimal number of clusters for a dataset.
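To illustrate how such an index scores a partition, here is a minimal sketch of the Dunn index in plain Java. It uses one common definition (smallest distance between points in different clusters divided by the largest distance between points in the same cluster); the package's own implementation may differ. The class name DunnIndexSketch, its methods, and the toy data are hypothetical and are not part of this repository.

```java
class DunnIndexSketch {

    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Dunn index under the assumed definition: minimum between-cluster
    // point distance divided by maximum within-cluster point distance.
    // Higher values indicate compact, well-separated clusters.
    // Note: the result is infinite or NaN for degenerate inputs
    // (a single cluster, or only singleton clusters).
    static double dunn(double[][] points, int[] labels) {
        double minBetween = Double.POSITIVE_INFINITY;
        double maxWithin = 0.0;
        for (int i = 0; i < points.length; i++) {
            for (int j = i + 1; j < points.length; j++) {
                double d = distance(points[i], points[j]);
                if (labels[i] == labels[j]) {
                    maxWithin = Math.max(maxWithin, d);
                } else {
                    minBetween = Math.min(minBetween, d);
                }
            }
        }
        return minBetween / maxWithin;
    }

    public static void main(String[] args) {
        // Toy data: three tight pairs of points, far apart from each other.
        double[][] points = {
            {0, 0}, {0, 1}, {10, 0}, {10, 1}, {20, 0}, {20, 1}
        };
        int[] twoClusters = {0, 0, 0, 0, 1, 1};   // merges the first two pairs
        int[] threeClusters = {0, 0, 1, 1, 2, 2}; // one cluster per pair
        System.out.println("k=2 Dunn: " + dunn(points, twoClusters));
        System.out.println("k=3 Dunn: " + dunn(points, threeClusters));
    }
}
```

On this toy data the three-cluster labelling scores higher than the two-cluster one, which is the kind of comparison across candidate cluster counts that the indices are meant to support.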
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
This is a Maven project that depends on OpenCSV 2.4 and Weka 3.8.0. Both dependencies are declared in the pom.xml file in the repository.
<dependency>
    <groupId>au.com.bytecode</groupId>
    <artifactId>opencsv</artifactId>
    <version>2.4</version>
</dependency>
<dependency>
    <groupId>nz.ac.waikato.cms.weka</groupId>
    <artifactId>weka-stable</artifactId>
    <version>3.8.0</version>
</dependency>
WekaCluster is the main class. It takes 4 arguments, which can be set in the code or passed on the command line when you run the jar file:
int minNumCluster = 2;
int maxNumCluster = 10;
int selector = SMALLDATA;
String fileName = "SmallDataset.csv";
String folderFile = "C:\\datasets\\";
String pathFile = folderFile + fileName;
String outFile = getFileNameOutput(selector, fileName);
With this configuration, the application loads a file called SmallDataset.csv from “C:/datasets” and saves the results as “Results-SmallDataset.csv” in the application folder.
If you prefer to run it from a terminal with java, use one of the following commands:
java -jar smallDataIndices.jar 2 10 C:/datasets/SmallDataset.csv Results.csv ALL
java -jar smallDataIndices.jar 10 20 datasets/dataset.csv results.csv SMALLDATA
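The positional arguments in the commands above could be read along the following lines. This is only a sketch: the order (minimum clusters, maximum clusters, input CSV, output CSV, selector) is inferred from the two example commands, the class CliArgsSketch is hypothetical, and the range check (2 <= min <= max) is an assumption rather than documented behaviour of WekaCluster.

```java
class CliArgsSketch {
    final int minNumCluster;
    final int maxNumCluster;
    final String pathFile;  // input CSV path
    final String outFile;   // output CSV path
    final String selector;  // e.g. "ALL" or "SMALLDATA", kept as raw text here

    CliArgsSketch(String[] args) {
        if (args.length != 5) {
            throw new IllegalArgumentException(
                "Usage: <minNumCluster> <maxNumCluster> <inputCsv> <outputCsv> <selector>");
        }
        // parseInt throws NumberFormatException on non-numeric input.
        minNumCluster = Integer.parseInt(args[0]);
        maxNumCluster = Integer.parseInt(args[1]);
        // Assumed sanity check; not taken from the original code.
        if (minNumCluster < 2 || maxNumCluster < minNumCluster) {
            throw new IllegalArgumentException("Expected 2 <= minNumCluster <= maxNumCluster");
        }
        pathFile = args[2];
        outFile = args[3];
        selector = args[4];
    }

    public static void main(String[] args) {
        // Mirrors the first example command.
        CliArgsSketch parsed = new CliArgsSketch(new String[] {
            "2", "10", "C:/datasets/SmallDataset.csv", "Results.csv", "ALL"
        });
        System.out.println("Clusters " + parsed.minNumCluster + ".." + parsed.maxNumCluster
            + ", input " + parsed.pathFile + ", output " + parsed.outFile
            + ", selector " + parsed.selector);
    }
}
```

Validating the arguments before any file is loaded means a mistyped range or a missing argument fails with a usage message instead of partway through a clustering run.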
Please cite as: Luna-Romera, J.M., García-Gutiérrez, J., Martínez-Ballesteros, M. et al. Prog Artif Intell (2017). https://doi.org/10.1007/s13748-017-0135-3 (https://link.springer.com/article/10.1007%2Fs13748-017-0135-3)