ComicFinder is an AI-powered content-based recommendation system built using Python and OpenAI Embeddings. It helps users discover semantically similar manga, manhwa, manhua, and webtoons based on natural language descriptions, genres, or titles, making it ideal for fans seeking personalized recommendations beyond keyword search.
https://comicfinder.streamlit.app/
- Recommends similar manga/manhwa/manhua/webtoons based on descriptions or titles
- Utilizes precomputed clean_embeddings.npy for fast results
- Embedding generation using the OpenAI Embeddings API
- Fast cosine similarity search for real-time recommendations
- Clean Streamlit-based frontend
- Organized data and scripts for easy retraining or extension
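The cosine similarity search over precomputed embeddings can be sketched roughly as below. This is a minimal illustration and not the code in app.py or recommend.py: the function name `top_k_similar` and the toy vectors are assumptions. In the project, the matrix would presumably come from `np.load("data/clean_embeddings.npy")` and the query vector from the embeddings API.

```python
import numpy as np


def top_k_similar(query_vec, embeddings, k=5):
    """Return (indices, scores) of the k rows most cosine-similar to query_vec."""
    embeddings = np.asarray(embeddings, dtype=np.float64)
    query_vec = np.asarray(query_vec, dtype=np.float64)

    # Normalize rows and the query so a dot product equals cosine similarity.
    row_norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit_rows = embeddings / np.clip(row_norms, 1e-12, None)
    unit_query = query_vec / max(np.linalg.norm(query_vec), 1e-12)

    scores = unit_rows @ unit_query
    k = min(k, len(scores))
    # Sort by descending score and keep the first k indices.
    top = np.argsort(-scores)[:k]
    return top, scores[top]


# Toy stand-in for the real embedding matrix (hypothetical 3-dimensional vectors).
toy_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
indices, scores = top_k_similar([1.0, 0.0, 0.0], toy_embeddings, k=2)
print(indices, scores)
```

The returned indices would then be used to look up the matching rows in clean_data.csv, assuming the CSV rows and embedding rows are stored in the same order.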
comic-recommender/
├── app.py                      # Main application script
├── data/
│   ├── data.csv                # Original manhwa dataset
│   ├── clean_data.csv          # Cleaned and preprocessed data
│   └── clean_embeddings.npy    # (Ignored from Git, must be downloaded separately)
├── scripts/
│   ├── clean_dataset.py        # Data cleaning script
│   ├── generate_embeddings.py  # Embedding generation
│   └── recommend.py            # Similarity-based recommendations (CLI version)
├── .env                        # Store API keys
├── requirements.txt            # Python dependencies
└── README.md                   # You're here!
git clone https://github.com/AdityaEXP/ComicFinder.git
cd ComicFinder
# Optional: Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
streamlit run app.py
- Find romance manhwa similar to What's Wrong with Secretary Kim?
- Get fantasy webtoon recommendations with strong male leads
- Discover hidden manga gems with character development arcs
- Replace genre filters with AI-powered natural language queries
Since clean_embeddings.npy is large, it's not included in this repo. Download clean_embeddings.npy and place it in data/, or generate it yourself with your own OpenAI API key, which costs around $0.02 per generation.
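For readers who want to regenerate the file, the following is a rough sketch of how batched embedding generation might look. It is not the contents of scripts/generate_embeddings.py: the helper names, the batch size, and the model name `text-embedding-3-small` are all assumptions, as is the choice of which text field gets embedded.

```python
import numpy as np


def embed_texts(client, texts, model="text-embedding-3-small", batch_size=100):
    """Embed texts in batches; returns a float32 array of shape (len(texts), dim)."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        response = client.embeddings.create(model=model, input=batch)
        # Sort by index so each vector lines up with its input text.
        for item in sorted(response.data, key=lambda d: d.index):
            vectors.append(item.embedding)
    return np.asarray(vectors, dtype=np.float32)


def build_embeddings_file(api_key, texts, out_path="data/clean_embeddings.npy"):
    """Embed a list of description strings and save the matrix as a .npy file."""
    # Imported here so embed_texts can be exercised without the openai package.
    from openai import OpenAI

    client = OpenAI(api_key=api_key)
    np.save(out_path, embed_texts(client, texts))
```

Calling `build_embeddings_file(key, descriptions)` with the cleaned descriptions would write the matrix in the same row order as the input list, which is what a later similarity lookup relies on.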
Create a .env file for your OpenAI API Key
OPENAIKEY=sk-xxxxxx
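As an illustration of how the key could be read at runtime, here is a small standard-library sketch. The helper names `load_env_file` and `get_openai_key` are hypothetical and not taken from the project, which may well use the python-dotenv package's `load_dotenv` instead; only the variable name `OPENAIKEY` comes from the example above.

```python
import os


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a .env file into os.environ (existing values win)."""
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and anything without an '=' sign.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            os.environ.setdefault(key.strip(), value.strip().strip("'\""))


def get_openai_key():
    """Return the key stored under OPENAIKEY, or raise a clear error."""
    load_env_file()
    key = os.environ.get("OPENAIKEY")
    if not key:
        raise RuntimeError("OPENAIKEY is not set; add it to .env or the environment")
    return key
```

Failing early with a clear message tends to be friendlier than letting the API client raise an authentication error later on.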
MIT: free to use, modify, and distribute.
Aditya, AI + Python + Web3 Enthusiast
- Replace the brute-force cosine similarity search with FAISS for faster lookups
- Add anime and web series datasets
- Create an automated pipeline that scrapes data from web pages or APIs and updates the dataset periodically
- Improve search quality by building richer embeddings from more data
This project uses data inspired by or adapted from the following Kaggle dataset:
Kaggle - Manhwa and Webtoon Dataset
Credit to Victor Soeiro for compiling and sharing this dataset.
