An intelligent RSS feed aggregator that uses AI to classify and prioritize articles based on your interests. The system fetches articles from RSS feeds, analyzes their content using OpenAI's API, and creates a prioritized reading queue based on customizable topic preferences.
- AI-Powered Classification: Uses OpenAI's GPT models to automatically categorize articles by topic
- Customizable Scoring: Define your own topic preferences with positive/negative scoring weights
- Persistent Queue: Maintains a priority queue of articles across runs
- Duplicate Prevention: Tracks processed URLs to avoid re-processing articles
- Discord Bot Integration: Optional Discord bot for notifications and interaction
- Automated Scheduling: Windows Task Scheduler integration for regular updates
- LLM Customization: A Discord-accessible LLM command that modifies the project configuration, making it easy to revise the rules
- Reading Channels: Ability for administrators to designate certain channels as "reading channels"
- Setup Wizard: A script that automatically creates the configuration files for each user's settings
rss-queue/
├── scraper.py # Main RSS scraping and processing logic
├── llm_handler.py # OpenAI API integration for content classification
├── scoring.py # Article scoring based on topic rules
├── bot.py # Discord bot (optional)
├── setup.ps1 # Windows setup script
├── run_scraper.bat # Batch file for running scraper
├── requirements.txt # Python dependencies
└── config/
├── config.json # General configuration
├── feeds.txt # Your RSS feed URLs (create from example)
├── feeds.example.txt # Example RSS feeds
├── rules.json # Topic scoring rules (create from example)
├── rules.example.json # Example topic rules
├── articles.json # Persistent article metadata (now in temp/)
└── processed_urls.txt # Tracking file for processed articles (now in temp/)
- Python 3.8+
- OpenAI API key
- Discord bot token (optional; needed only for bot functionality)
- Clone the repository:
  git clone https://github.com/caitlynslashai/rss-queue.git
  cd rss-queue
- Run the automated setup:
  - Open PowerShell as Administrator
  - Navigate to the project directory
  - Allow script execution:
    Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope Process
  - Run the setup script:
    .\setup.ps1
The setup script will:
- Create a Python virtual environment
- Install required dependencies
- Set up Windows Task Scheduler for automated runs
- Create a virtual environment:
  python -m venv venv
  venv\Scripts\activate  # Windows
  source venv/bin/activate  # Linux/Mac
- Install dependencies:
  pip install -r requirements.txt
- Configure environment variables. Create a .env file in the project root:
  OPENAI_API_KEY=your_openai_api_key_here
  DISCORD_BOT_TOKEN=your_discord_bot_token_here  # Optional
- Set up the configuration files:
  # Copy the example files, then customize them
  copy config\feeds.example.txt config\feeds.txt
  copy config\rules.example.json config\rules.json
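For readers curious what loading the .env file above involves, here is a minimal standard-library sketch. The project may well use a library such as python-dotenv instead; load_env_file is a hypothetical helper, and its comment handling (full-line # comments and " #" inline comments) is an assumption chosen to match the example file.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Hypothetical helper: load KEY=VALUE lines into os.environ.

    Blank lines and lines starting with '#' are skipped, and text after
    ' #' is treated as an inline comment (an assumption that matches the
    example .env). Variables already set in the environment are kept.
    """
    loaded: Dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.split(" #", 1)[0].strip()  # drop inline comment
        loaded[key] = value
        os.environ.setdefault(key, value)  # do not override real env vars
    return loaded
```

Using setdefault means a key exported in the shell (for example by Task Scheduler) takes precedence over the file.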
Add your RSS feed URLs to config/feeds.txt, one per line:
https://example.com/feed.xml
https://anotherblog.com/rss
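A minimal sketch of how a one-URL-per-line file like this can be read. load_feeds is a hypothetical helper, not a function from scraper.py, and treating lines that start with # as comments is an assumption.

```python
from pathlib import Path
from typing import List


def load_feeds(path: str = "config/feeds.txt") -> List[str]:
    """Hypothetical helper: read feed URLs, one per line.

    Blank lines are ignored, lines starting with '#' are treated as
    comments (an assumption), and duplicate URLs are dropped while
    preserving the original order.
    """
    feeds: List[str] = []
    seen = set()
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        url = raw.strip()
        if not url or url.startswith("#") or url in seen:
            continue
        seen.add(url)
        feeds.append(url)
    return feeds
```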
In config/rules.json, define scoring weights for different topics. Higher scores mean higher priority:
{
"topic_rules": {
"AI Safety": 100,
"Technology": 75,
"Politics": -25,
"Celebrity": -100,
"Default": 0
}
}
- Positive scores: Topics you want to prioritize
- Negative scores: Topics you want to deprioritize
- Default: Score for unclassified content
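To illustrate how these weights turn into priorities, the sketch below assumes the classifier returns a single topic label per article and that any label missing from topic_rules falls back to Default. load_rules, score_topic and rank are hypothetical names; the real scoring.py may combine topics differently.

```python
import json
from typing import Dict, List, Tuple


def load_rules(path: str = "config/rules.json") -> Dict[str, int]:
    """Read the topic_rules mapping from rules.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["topic_rules"]


def score_topic(topic: str, rules: Dict[str, int]) -> int:
    """Weight for a topic; topics not listed fall back to Default (or 0)."""
    return rules.get(topic, rules.get("Default", 0))


def rank(articles: List[Tuple[str, str]], rules: Dict[str, int]) -> List[Tuple[str, str]]:
    """Sort (title, topic) pairs from highest to lowest score.

    sorted() is stable, so articles with equal scores keep their order.
    """
    return sorted(articles, key=lambda a: score_topic(a[1], rules), reverse=True)
```

With the example rules, an article classified as Politics scores -25 while one with an unlisted topic scores 0, so the unlisted one is read first.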
General settings live in config/config.json:
{
"TRUNCATION_LENGTH": 2000
}
- TRUNCATION_LENGTH: Maximum number of characters of article text sent to the AI for classification (higher values increase API costs)
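A sketch of how TRUNCATION_LENGTH might be applied before text is sent to the API. The function names, the fallback of 2000 when config.json is missing, and the whitespace collapsing are all assumptions for illustration, not documented behavior of scraper.py.

```python
import json


def load_truncation_length(path: str = "config/config.json", default: int = 2000) -> int:
    """Read TRUNCATION_LENGTH, falling back to a hypothetical default."""
    try:
        with open(path, encoding="utf-8") as f:
            config = json.load(f)
    except FileNotFoundError:
        return default
    return int(config.get("TRUNCATION_LENGTH", default))


def truncate_for_classification(text: str, limit: int) -> str:
    """Collapse runs of whitespace, then keep at most `limit` characters."""
    return " ".join(text.split())[:limit]
```

Collapsing whitespace first means the character budget is spent on words rather than on blank lines left over from HTML extraction.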
Run the scraper manually with:
python scraper.py
This will:
- Fetch new articles from your RSS feeds
- Classify each article using AI
- Add articles to the priority queue
- Display the prioritized reading list
The setup script configures Windows Task Scheduler to run the scraper automatically. You can modify the schedule through Task Scheduler or by editing the setup script.
- Feed Processing: The scraper reads RSS feeds from config/feeds.txt
- Content Extraction: Uses Readability and BeautifulSoup to extract article text
- AI Classification: Sends truncated content to OpenAI for topic classification
- Scoring: Applies your custom rules to generate priority scores
- Queue Management: Maintains a persistent priority queue of articles
- Output: Displays articles in priority order for reading
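The steps above can be sketched as a small priority queue built on Python's heapq module. This is a hypothetical illustration, not the project's implementation: ReadingQueue is an invented class, classify stands in for the OpenAI call, and the queue and processed URLs are saved to a single JSON file for brevity, whereas the project keeps articles.json and processed_urls.txt separately.

```python
import heapq
import itertools
import json
from typing import Callable, Dict, Iterable, List, Set, Tuple

# (negated score, insertion order, url, title): heapq is a min-heap, so the
# score is negated to pop the highest-priority article first, and the counter
# breaks ties by arrival order.
Entry = Tuple[int, int, str, str]


class ReadingQueue:
    """Hypothetical persistent reading queue, not the project's actual class."""

    def __init__(self) -> None:
        self._heap: List[Entry] = []
        self._counter = itertools.count()
        self.processed: Set[str] = set()

    def add_articles(
        self,
        articles: Iterable[Dict[str, str]],
        classify: Callable[[str], str],
        rules: Dict[str, int],
        limit: int = 2000,
    ) -> int:
        """Classify, score and enqueue unseen articles; return how many were added."""
        added = 0
        for art in articles:
            if art["url"] in self.processed:  # duplicate prevention
                continue
            topic = classify(art["text"][:limit])  # only truncated text is classified
            score = rules.get(topic, rules.get("Default", 0))
            heapq.heappush(self._heap, (-score, next(self._counter), art["url"], art["title"]))
            self.processed.add(art["url"])
            added += 1
        return added

    def pop_next(self) -> Tuple[str, str]:
        """Remove and return (url, title) of the highest-priority article."""
        _, _, url, title = heapq.heappop(self._heap)
        return url, title

    def __len__(self) -> int:
        return len(self._heap)

    def save(self, path: str) -> None:
        """Write the queue and processed URLs to one JSON file."""
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"queue": self._heap, "processed": sorted(self.processed)}, f)

    @classmethod
    def load(cls, path: str) -> "ReadingQueue":
        """Rebuild a queue saved by save()."""
        queue = cls()
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        queue._heap = [tuple(entry) for entry in data["queue"]]  # JSON lists -> tuples
        heapq.heapify(queue._heap)
        queue.processed = set(data["processed"])
        # Continue numbering after the largest saved insertion index.
        next_index = max((entry[1] for entry in queue._heap), default=-1) + 1
        queue._counter = itertools.count(next_index)
        return queue
```

The insertion counter also keeps heapq from ever comparing URLs or titles, so two articles with the same score come out in the order they were fetched.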
The system uses OpenAI's gpt-4.1-nano model for cost efficiency. Costs depend on:
- Number of new articles processed
- TRUNCATION_LENGTH setting (longer = more expensive)
Typical costs are minimal for personal use with a few RSS feeds.
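A back-of-the-envelope estimate can make these cost factors concrete. The sketch assumes about 4 characters per token (OpenAI's rough rule of thumb for English text), counts only the truncated article text, and ignores prompt overhead and output tokens; the price is a parameter you supply from OpenAI's current pricing, not a quoted figure. estimate_monthly_input_cost is a hypothetical helper.

```python
def estimate_monthly_input_cost(
    articles_per_day: float,
    truncation_length: int,
    usd_per_million_input_tokens: float,
    chars_per_token: float = 4.0,
    days: int = 30,
) -> float:
    """Rough input-token cost for classifying truncated article text.

    Assumes every article fills the full truncation length and ~4
    characters per token, and ignores prompt overhead and output tokens,
    so treat the result as a ballpark figure only.
    """
    tokens_per_article = truncation_length / chars_per_token
    total_tokens = articles_per_day * days * tokens_per_article
    return total_tokens / 1_000_000 * usd_per_million_input_tokens
```

For example, 50 new articles a day at a TRUNCATION_LENGTH of 2000 is about 500 tokens per article, or roughly 750,000 input tokens over 30 days; multiply by the per-million-token price to get the monthly figure.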
- The bot currently syncs articles to disk only once per minute. If the bot is closed or crashes within a minute of running the /next command, the article will not be removed from the queue.
Check scraper.log for detailed execution logs and error messages.
This project is open source. See the repository for license details.