This project implements a clustering algorithm based on Symmetric Non-negative Matrix Factorization (SymNMF), along with a K-means clustering algorithm for comparison on various datasets. The core SymNMF routines are written in C and exposed to Python through a custom extension module, with the aim of improving clustering performance.
Requirements:
- Python 3.x
- C Compiler
- NumPy
- Pandas
Project files:
- symnmf.py : Python implementation of SymNMF
- symnmf.c : C implementation of SymNMF
- symnmf.h : C header file
- symnmfmodule.c : Python C API Wrapper for SymNMF
- analysis.py : Analysis and comparison script
- setup.py : Build script for Python C extension
- Makefile : Make script for C executable
- Text_file_input_example.txt : Example data points input file
Follow these steps to get your clustering project up and running:
Build the C executable. Open your terminal and execute:
make
Build the Python C extension. In the terminal, run:
python3 setup.py build_ext --inplace
Run the SymNMF algorithm with:
python3 symnmf.py <k> <goal> <file_name.txt>
Where:
- <k> : Number of clusters
- <goal> : Algorithm goal (one of symnmf, sym, ddg, norm)
- <file_name.txt> : Data points input file (e.g., Text_file_input_example.txt)
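This README does not spell out what each goal computes. As a rough illustration only, the sketch below assumes the common SymNMF formulation: a Gaussian similarity matrix with a zero diagonal for sym, the diagonal degree matrix for ddg, and the symmetrically normalized similarity for norm. The function names are hypothetical and are not taken from symnmf.py or symnmf.c; the symnmf goal itself (the factorization step) is not covered here.

```python
import numpy as np

def similarity(X):
    # Assumed definition: a_ij = exp(-||x_i - x_j||^2 / 2), with a_ii = 0
    diff = X[:, None, :] - X[None, :, :]
    A = np.exp(-np.sum(diff ** 2, axis=2) / 2.0)
    np.fill_diagonal(A, 0.0)
    return A

def degree(A):
    # Diagonal degree matrix: d_ii is the sum of row i of A
    return np.diag(A.sum(axis=1))

def normalized(A):
    # Assumed definition: W = D^{-1/2} A D^{-1/2}
    # Requires every row sum to be positive (at least two data points)
    inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * inv_sqrt[:, None] * inv_sqrt[None, :]
```

Under these assumptions, each function takes and returns NumPy arrays, so `normalized(similarity(X))` would yield the matrix that a factorization step could then work on.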
Text File Input Format:
The input file (for example, Text_file_input_example.txt) should contain one data point per line, with the coordinates of each vector separated by commas.
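A file in this format can be read into an array with NumPy. The following is a minimal sketch, not necessarily how symnmf.py reads its input; `load_points` is a hypothetical helper name.

```python
import numpy as np

def load_points(path):
    """Read comma-separated vectors, one data point per line, into an (n, d) array."""
    # ndmin=2 keeps a single-line file as shape (1, d) instead of (d,)
    return np.loadtxt(path, delimiter=",", ndmin=2)

# Example usage (assumes the file exists in the working directory):
# points = load_points("Text_file_input_example.txt")
```

Each row of the returned array is one data point, and each column is one coordinate.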
To perform the data analysis, execute:
python3 analysis.py <file_name.txt>
- Lior Kovtun
- Shalev Baruch