Implementation of the following algorithms:
- K-Means initialization (9 different methods) [1]
 - Random Swap algorithm [2]
 - Genetic Algorithm [3]
 
This package is also needed as a dependency to be able to build some of the projects in the UEF organization (like Random Swap)
Compile:
make cbkm
Example with 15 clusters:
./cbkm -O -S15 datasets/s1.txt output.txt
./cbkm -O -S15 datasets/s1.ts output.ts
Steinley initialization method, two repeats:
./cbkm -I5 -O -R2 -S15 datasets/s1.txt output.txt
K-means++ initialization method, two repeats:
./cbkm -I2 -O -R2 -S15 datasets/s1.txt output.txt
Run ./cbkm for help:
CBKM	Version 0.65	4.4.2017
Repeated K-means algorithm.
Usage: CBKM [-option] <training set> [initial cb/pa] <result codebook>
For example: CBKM bridge initial tmp
  Options:
  -Bn              = Sample percentage 10000/(1..10000) (0..10000, default=1000)
  -Hn1,n2          = 
                      n1: Hybrid methods combining K-means
                            0 = None (default)
                            1 = Kmeans+DensityPeaks
                      n2: Hybrid method variant (0..10000, default=1)
  -I[n1,n2,..n4]   = 
                      n1: Initialization method
                            0 = DensityPeaks
                            1 = Random (default)
                            2 = K-means++
                            3 = Bradley
                            4 = Projection
                            5 = Luxburg
                            6 = MaxMin
                            7 = Random partition (Forgy)
                            8 = Splitting
                            9 = Sorting heuristic
                      n2: Initialization method variant (Maxmin-variant: 1=rand point first, 2=mean first) (0..10000, default=1)
                      n3: Initialization method option (0..10000, default=0)
                      n4: Initialization method (third) option (0..10000, default=0)
  -O               = Overwrite existing file (default=NO)
  -P               = Save partition file (default=NO)
  -Qn              = Quiet level (0..99, default=2)
  -Rn              = Number of repeats (0..1000000, default=1)
  -Sn              = Number of clusters (1..5000, default=256)
  -Tn              = Iterations, 0=INF (0..50000000, default=0)
  -Zn              = Random seed (0=clock)  (0..2147483647, default=0)
Compile:
make cbrs
Example with 15 clusters:
./cbrs -O -S15 datasets/s1.txt output.txt
Run ./cbrs  for help
Compile:
make cbga
Example with 15 clusters:
./cbga -O -S15 datasets/s1.txt output.txt
Run ./cbga for help
Debug
Compile with debug flags:
make cbkm DEBUG=-g
[1] P Fränti, S Sieranoja, "How much can k-means be improved by using better initialization and repeats?", Pattern Recognition, 93, 95-112, 2019. https://doi.org/10.1016/j.patcog.2019.04.014
[2] P. Fränti, "Efficiency of random swap clustering", Journal of Big Data, 5:13, 1-29, 2018. (pdf) JF=1 https://doi.org/10.1186/s40537-018-0122-y
[3] P. Fränti, "Genetic algorithm with deterministic crossover for vector quantization", Pattern Recognition Letters, 21 (1), 61-68, 2000. (pdf) IF=1.03 https://doi.org/10.1016/S0167-8655(99)00133-6