A content-based recommendation system uses natural language processing and machine learning to suggest articles to users based on the content they have previously read.
This project implements a content-based recommendation system that fetches news articles from The Guardian API, processes them using machine learning techniques, and serves personalized article recommendations through a Streamlit web app. The app allows users to log in, read articles, like or dislike them, and view recommendations.
- Environment Setup:
  - Configure a Python environment using Visual Studio Code.
  - Install the necessary packages, including pandas, scikit-learn, Streamlit, and nltk for natural language processing.
- Data Collection:
  - Fetch news articles using The Guardian API. You can do this through the [Notebook](./Data%20Fecthing%20The%20Guardian.ipynb).
  - Store each article's metadata and content for further processing.
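
The data collection step might look roughly like the sketch below. It is not the notebook's code: the helper names (`build_search_url`, `parse_articles`, `fetch_articles`) are hypothetical, and the endpoint, query parameters, and response field names reflect The Guardian Open Platform content search as commonly documented, so they should be checked against the official API reference before use.

```python
import json
import urllib.parse
import urllib.request

# The Guardian Open Platform content search endpoint (verify against the API docs).
SEARCH_URL = "https://content.guardianapis.com/search"


def build_search_url(api_key, query, page=1, page_size=50):
    """Build a search URL that also requests each article's headline and body text."""
    params = {
        "q": query,
        "page": page,
        "page-size": page_size,
        "show-fields": "headline,bodyText",
        "api-key": api_key,
    }
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_articles(payload):
    """Flatten a decoded search response into a list of plain dicts."""
    articles = []
    for item in payload.get("response", {}).get("results", []):
        fields = item.get("fields", {})
        articles.append({
            "id": item.get("id"),
            "title": fields.get("headline") or item.get("webTitle"),
            "body": fields.get("bodyText", ""),
            "published": item.get("webPublicationDate"),
            "url": item.get("webUrl"),
        })
    return articles


def fetch_articles(api_key, query, page=1):
    """Download one page of results (needs network access and a valid API key)."""
    with urllib.request.urlopen(build_search_url(api_key, query, page)) as resp:
        return parse_articles(json.load(resp))
```

Keeping URL building and response parsing separate from the network call makes it possible to check the parsing logic against a saved response without an API key.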
- Data Processing:
  - Clean and preprocess the article data, focusing on key fields like title, body, and publication date.
  - Use TF-IDF (Term Frequency-Inverse Document Frequency) to vectorize the text content.
  - Calculate cosine similarity between articles to measure how closely related they are.
  - Use nltk to include synonyms in the search functionality, enhancing the search experience.
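
As a minimal sketch of the TF-IDF and cosine similarity steps, the example below uses scikit-learn's `TfidfVectorizer` and `cosine_similarity`. The three sample sentences are illustrative stand-ins, not project data, and the `stop_words="english"` setting is an assumption about how the text might be preprocessed.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative stand-ins for article bodies; not data from the project.
documents = [
    "The central bank raised interest rates again",
    "Interest rates rise as the bank fights inflation",
    "The football team won the championship final",
]

# Turn each document into a TF-IDF weighted term vector,
# dropping common English stop words (an assumed preprocessing choice).
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(documents)

# Pairwise cosine similarity: entry [i, j] compares document i with document j.
similarity = cosine_similarity(tfidf_matrix)

# Rank the other documents by similarity to document 0, most similar first.
ranked = sorted(
    (j for j in range(len(documents)) if j != 0),
    key=lambda j: similarity[0, j],
    reverse=True,
)
```

Here the two finance sentences share terms such as "bank" and "rates", so they score higher against each other than either does against the football sentence.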
- Recommendation System Development:
  - Implement a content-based filtering approach to recommend articles similar to those previously read by the user.
  - Allow users to provide feedback (like/dislike) to improve future recommendations.
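
One way the like/dislike feedback could feed into content-based ranking is sketched below. This is an assumption about the scoring, not the project's confirmed algorithm: the `recommend` function and its `dislike_weight` parameter are hypothetical, and the input is any precomputed article-to-article similarity matrix.

```python
def recommend(similarity, liked, disliked=(), top_n=5, dislike_weight=0.5):
    """Rank unrated articles by similarity to liked ones, penalizing disliked ones.

    similarity: square matrix (nested lists or a NumPy array) of article scores.
    liked / disliked: indices of articles the user has rated.
    dislike_weight: hypothetical tuning knob for how strongly dislikes count.
    """
    seen = set(liked) | set(disliked)
    scores = {}
    for candidate in range(len(similarity)):
        if candidate in seen:
            continue  # never recommend something the user already rated
        like_score = sum(similarity[i][candidate] for i in liked)
        dislike_score = sum(similarity[i][candidate] for i in disliked)
        scores[candidate] = like_score - dislike_weight * dislike_score
    # Highest combined score first, truncated to the requested number of items.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]
```

Subtracting a weighted dislike score pushes down articles that resemble ones the user rejected, while already-rated articles are excluded from the results entirely.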
- Deployment and User Interaction:
  - Serve the model through a Streamlit app.
  - Provide a user interface for searching articles, viewing recommendations, and tracking reading history.
  - Include user login functionality, allowing multiple users to maintain separate preferences.
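
The article search in the app is described as synonym-aware through nltk. A hedged sketch of that idea follows; the function names are hypothetical, the WordNet lookup assumes the corpus has been downloaded once with `nltk.download("wordnet")`, and matching on titles only is a simplification for illustration.

```python
import re


def wordnet_synonyms(word):
    """Look up synonyms in WordNet via nltk (requires the WordNet corpus)."""
    from nltk.corpus import wordnet  # imported lazily so the rest runs without nltk

    names = set()
    for synset in wordnet.synsets(word):
        for lemma in synset.lemmas():
            names.add(lemma.name().replace("_", " ").lower())
    return names


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def expand_query(query, synonyms_for=wordnet_synonyms):
    """Return the query terms plus every synonym the lookup function provides."""
    terms = tokenize(query)
    for term in list(terms):
        terms |= set(synonyms_for(term))
    return terms


def search_titles(titles, query, synonyms_for=wordnet_synonyms):
    """Return indices of titles sharing a word with the expanded query.

    Multi-word synonyms never match in this simple word-level comparison.
    """
    terms = expand_query(query, synonyms_for)
    return [i for i, title in enumerate(titles) if tokenize(title) & terms]
```

Passing the synonym lookup as a parameter lets the search be tried with a small hand-written dictionary before WordNet is installed, for example `search_titles(titles, "car", lambda w: {"automobile"} if w == "car" else set())`.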
- Python: Main programming language for the project.
- Streamlit: For building the interactive web application.
- The Guardian API: For fetching news articles.
- pandas & NumPy: For data manipulation and analysis.
- scikit-learn: For machine learning tasks, including TF-IDF vectorization and cosine similarity.
- nltk: For natural language processing, including handling synonyms.
- Matplotlib: For visualizing user feedback statistics.
- Demonstrate the ability to build and deploy a content-based recommendation system.
- Provide a user-friendly web application for interacting with the recommendation system.
- Implement user feedback mechanisms to refine and personalize recommendations.
- Clone the repository to your local machine.
- Set up the environment with the necessary packages as described.
- Fetch the latest articles from The Guardian using the provided scripts.
- Run the Streamlit app to start the recommendation system.
- Explore, search, and get recommendations based on your reading history.
- Enhance the model by incorporating user feedback (likes/dislikes) into the recommendation algorithm.
- Explore additional machine learning algorithms for improving recommendations.