Commit 148167e

Merge pull request #59 from PyDataBlog/experimental
Release of v0.1.3
2 parents fa8cca8 + 461c3d6 commit 148167e

6 files changed: +328 -165 lines changed

Project.toml

+1 -1
@@ -1,7 +1,7 @@
 name = "ParallelKMeans"
 uuid = "42b8e9d4-006b-409a-8472-7f34b3fb58af"
 authors = ["Bernard Brenyah", "Andrey Oskin"]
-version = "0.1.2"
+version = "0.1.3"
 
 [deps]
 Distances = "b4f34e82-e78d-54a5-968a-f98e89d6e8f7"

docs/src/benchmark_image.png

98.7 KB

docs/src/index.md

+18 -20
@@ -1,9 +1,5 @@
 # [ParallelKMeans.jl Package](https://github.com/PyDataBlog/ParallelKMeans.jl)
 
-```@contents
-Depth = 4
-```
-
 ## Motivation
 
 It's actually a funny story led to the development of this package.
@@ -61,13 +57,14 @@ git checkout experimental
 - [X] Interface for inclusion in Alan Turing Institute's [MLJModels](https://github.com/alan-turing-institute/MLJModels.jl#who-is-this-repo-for).
 - [X] Full Implementation of Triangle inequality based on [Elkan - 2003 Using the Triangle Inequality to Accelerate K-Means"](https://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf).
 - [ ] Implementation of [Geometric methods to accelerate k-means algorithm](http://cs.baylor.edu/~hamerly/papers/sdm2016_rysavy_hamerly.pdf).
+- [ ] Support for other distance metrics supported by [Distances.jl](https://github.com/JuliaStats/Distances.jl#supported-distances).
 - [ ] Native support for tabular data inputs outside of MLJModels' interface.
 - [ ] Refactoring and finalizaiton of API desgin.
 - [ ] GPU support.
-- [ ] Even faster Kmeans implementation based on recent literature.
+- [ ] Implementation of other K-Means algorithm variants based on recent literature.
 - [ ] Optimization of code base.
 - [ ] Improved Documentation
-- [ ] More benchmark tests
+- [ ] More benchmark tests.
 
 ## How To Use
 
@@ -83,7 +80,7 @@ multi_results = kmeans(X, 3; max_iters=300)
 results = kmeans(X, 3; n_threads=1, max_iters=300)
 ```
 
-The main design goal is to offer all available variations of the KMeans algorithm to end users as composable elements. By default, Lloyd's implementation is used but users can specify different variations of the KMeans clustering algorithm via this interface
+The main design goal is to offer all available variations of the KMeans algorithm to end users as composable elements. By default, Lloyd's implementation is used but users can specify different variations of the KMeans clustering algorithm via this interface;
 
 ```julia
 some_results = kmeans([algo], input_matrix, k; kwargs)
@@ -105,8 +102,8 @@ r.converged # whether the procedure converged
 
 - [Lloyd()](https://cs.nyu.edu/~roweis/csc2515-2006/readings/lloyd57.pdf)
 - [Hamerly()](https://www.researchgate.net/publication/220906984_Making_k-means_Even_Faster)
+- [Elkan()](https://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf)
 - [Geometric()](http://cs.baylor.edu/~hamerly/papers/sdm2016_rysavy_hamerly.pdf) - (Coming soon)
-- [Elkan()](https://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf) - (Coming soon)
 - [MiniBatch()](https://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf) - (Coming soon)
 
 ### Practical Usage Examples
@@ -162,22 +159,23 @@ Currently, the benchmark speed tests are based on the search for optimal number
 
 _________________________________________________________________________________________________________
 
-| 1 million (ms) | 100k (ms) | 10k (ms) | 1k (ms) | package                 | language |
-|:--------------:|:---------:|:--------:|:-------:|:-----------------------:|:--------:|
-| 600184.00      | 31959.00  | 832.25   | 18.19   | Clustering.jl           | Julia    |
-| 35733.00       | 4473.00   | 255.71   | 8.94    | Lloyd                   | Julia    |
-| 12617.00       | 1655.00   | 122.53   | 7.98    | Hamerly                 | Julia    |
-| 1430000.00     | 146000.00 | 5770.00  | 344.00  | Sklearn Kmeans          | Python   |
-| 30100.00       | 3750.00   | 613.00   | 201.00  | Sklearn MiniBatchKmeans | Python   |
-| 218200.00      | 15510.00  | 733.70   | 19.47   | Knor                    | R        |
-
+|1 million (ms)|100k (ms)|10k (ms)|1k (ms)|package                |language|
+|:------------:|:-------:|:------:|:-----:|:---------------------:|:------:|
+|    666840    |  34034  |709.049 |17.686 |     Clustering.jl     | Julia  |
+|    21730     |  2975   |163.771 | 6.444 | ParallelKMeans Lloyd  | Julia  |
+|    11784     |  1339   | 94.233 |  6.6  |ParallelKMeans Hamerly | Julia  |
+|    17591     |  1074   | 81.995 | 6.953 | ParallelKMeans Elkan  | Julia  |
+|   1430000    | 146000  |  5770  |  344  |    Sklearn Kmeans     | Python |
+|    30100     |  3750   |  613   |  201  |Sklearn MiniBatchKmeans| Python |
+|    218200    |  15510  | 733.7  | 19.47 |         Knor          |   R    |
 _________________________________________________________________________________________________________
 
 ## Release History
 
-- 0.1.0 Initial release
-- 0.1.1 Added interface for MLJ
-- 0.1.2 Added Elkan algorithm
+- 0.1.0 Initial release.
+- 0.1.1 Added interface for MLJ.
+- 0.1.2 Added Elkan algorithm.
+- 0.1.3 Faster & optimized execution.
 
 ## Contributing
 