An intelligent, automated news aggregation system that scrapes, processes, and curates AI-related content from multiple sources, then delivers personalized daily digests via email. Features a beautiful web interface for user subscriptions with instant email confirmation.
Live Demo: https://ai-news-aggregator-digest.vercel.app
Email Service: Uses Gmail SMTP for reliable email delivery to any recipient. No domain verification required!
Beautiful, modern landing page with hero section and feature highlights
User-friendly subscription form with real-time validation
Instant email confirmation sent to new subscribers
The aggregator scrapes from 9 diverse AI news sources:
| Source | Type | Content |
|---|---|---|
| YouTube | Video | AI channels with transcript extraction |
| OpenAI Blog | RSS | Official OpenAI news and updates |
| Anthropic | RSS (3 feeds) | News, Research, Engineering |
| Google AI | RSS | Google AI blog content |
| HuggingFace Blog | RSS | ML/AI tutorials and announcements |
| HuggingFace Papers | Web Scraping | Trending research papers |
| TechCrunch AI | RSS | AI industry news and startups |
| MIT Technology Review | RSS | In-depth AI analysis |
| VentureBeat AI | RSS | AI business and enterprise news |
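Eight of the nine sources are plain RSS feeds, so the core of a scraper is just fetching and parsing XML. A minimal stdlib-only sketch of the parsing step (the real scrapers use `feedparser`; the sample feed and field names here are illustrative):

```python
import xml.etree.ElementTree as ET

def parse_rss_items(rss_xml: str) -> list[dict]:
    """Parse <item> entries from an RSS 2.0 document into article dicts."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
            "published_at": item.findtext("pubDate", default=""),
        })
    return items

SAMPLE = """<rss version="2.0"><channel>
  <item><title>New model release</title><link>https://example.com/a</link>
        <pubDate>Mon, 01 Jan 2024 08:00:00 GMT</pubDate></item>
</channel></rss>"""

print(parse_rss_items(SAMPLE)[0]["title"])  # New model release
```

The only per-source differences are the feed URL and which fields get stored, which is why the RSS scrapers stay small.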
- Beautiful, responsive React frontend with modern UI/UX
- User-friendly subscription form with real-time validation
- Instant email confirmation upon registration
- Built with React 19, Tailwind CSS 4, and Vite
- RESTful API for subscription management
- CORS-enabled for frontend integration
- Automatic email confirmation sending
- Subscriber count tracking
- Production-ready with Uvicorn
- YouTube channels - AI videos with transcript extraction
- OpenAI Blog - Official OpenAI news and updates (RSS)
- Anthropic - 3 feeds: News, Research, Engineering (RSS with markdown)
- Google AI - Google AI blog content (RSS with markdown)
- HuggingFace Blog - ML/AI tutorials and announcements (RSS with markdown)
- HuggingFace Papers - Trending research papers (web scraping)
- TechCrunch AI - AI industry news and startups (RSS with markdown)
- MIT Technology Review - In-depth AI analysis (RSS with markdown)
- VentureBeat AI - AI business and enterprise news (RSS with markdown)
- Intelligent content summarization using Google Gemini
- Personalized content ranking based on user profiles
- Automated digest generation for all articles
- Relevance scoring (0-10 scale) based on user interests
- Customizable ranking algorithms
- Top-N article selection for daily digests
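Put together, curation amounts to: attach a 0-10 relevance score to each article, then keep the top N. A small sketch of the selection step, with hard-coded scores standing in for the Gemini-assigned ones:

```python
import heapq

def select_top_articles(scored, n=10):
    """Keep the n highest-scoring articles, best first.
    `scored` is a list of (relevance_score, article_dict) pairs."""
    return [a for _, a in heapq.nlargest(n, scored, key=lambda p: p[0])]

scored = [
    (8.5, {"title": "LLM survey"}),
    (3.0, {"title": "Unrelated gadget"}),
    (9.2, {"title": "RL breakthrough"}),
]
print([a["title"] for a in select_top_articles(scored, n=2)])
# ['RL breakthrough', 'LLM survey']
```

`heapq.nlargest` avoids sorting the whole list when N is much smaller than the number of digests.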
- Automated daily email digests
- Instant confirmation emails for new subscribers
- Personalized introductions
- Clean, readable HTML formatting
- Multi-recipient support with activation/deactivation
- PostgreSQL database for persistent storage
- Duplicate detection and prevention
- Comprehensive article metadata tracking
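Duplicate detection boils down to skipping articles whose URL is already stored. A sketch of the idea (the actual repository relies on the unique url/guid column in PostgreSQL rather than an in-memory set):

```python
def filter_new_articles(scraped, known_urls):
    """Return only articles whose URL has not been stored yet."""
    seen = set(known_urls)
    fresh = []
    for article in scraped:
        url = article["url"]
        if url not in seen:
            seen.add(url)  # also dedupes within this batch
            fresh.append(article)
    return fresh

batch = [{"url": "https://a.example/1"}, {"url": "https://a.example/1"},
         {"url": "https://a.example/2"}]
print(len(filter_new_articles(batch, known_urls={"https://a.example/2"})))  # 1
```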
- Quick Start
- Installation
- Configuration
- Usage
- Architecture
- API Reference
- Deployment
- Development
- Troubleshooting
Quick Links:
- Quick Reference Card - Essential commands and configs
- Email Setup Guide - Detailed Gmail SMTP setup
- Python 3.12+
- Node.js 18+
- PostgreSQL database (Neon, Supabase, or local)
- Google Gemini API key
- Gmail account with App Password (free, no domain needed!)
- Clone and install dependencies:
```bash
git clone <repository-url>
cd ai-news-aggregator
pip install -r requirements.txt
cd frontend && npm install && cd ..
```
- Configure environment:
```bash
cp .env.example .env
# Edit .env with your credentials:
# - Get Gemini API key from https://makersuite.google.com/app/apikey
# - Use your Gmail and App Password (see setup guide below)
# - Update DATABASE_URL with your PostgreSQL connection string
```
- Initialize database:
```bash
python3 -m app.database.init_db
```
- Start the application:
```bash
# Terminal 1 - Backend
python start_api.py

# Terminal 2 - Frontend
cd frontend && npm run dev
```
- Visit http://localhost:5173 and subscribe!
Backend:
```bash
pip install -r requirements.txt
```

Frontend:
```bash
cd frontend
npm install
```

Option 1: Cloud Database (Recommended)

Use Neon, Supabase, or another PostgreSQL provider:
```bash
# Set DATABASE_URL in .env
DATABASE_URL=postgresql://user:password@host:port/database?sslmode=require

# Initialize tables
python3 -m app.database.init_db
```

Option 2: Local PostgreSQL
```bash
# Install PostgreSQL, then:
createdb ai_news_aggregator
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/ai_news_aggregator
python3 -m app.database.init_db
```

Backend Configuration (backend/.env):
```bash
# ============================================================================
# AI CONFIGURATION
# ============================================================================
# Google Gemini API Key
# Get from: https://makersuite.google.com/app/apikey
GEMINI_API_KEY=your_gemini_api_key_here

# ============================================================================
# EMAIL CONFIGURATION (Gmail SMTP)
# ============================================================================
# Your Gmail address
MY_EMAIL=your_email@gmail.com

# Gmail App Password (NOT your regular password!)
# Setup guide:
# 1. Enable 2-factor authentication: https://myaccount.google.com/security
# 2. Generate App Password: https://myaccount.google.com/apppasswords
# 3. Use the 16-character password here (spaces optional)
APP_PASSWORD=xxxx xxxx xxxx xxxx

# ============================================================================
# DATABASE CONFIGURATION
# ============================================================================
# PostgreSQL connection string (Neon, Supabase, or local)
# Format: postgresql://username:password@host:port/database?sslmode=require
DATABASE_URL=postgresql://user:password@host:port/database?sslmode=require

# ============================================================================
# FRONTEND URL (for CORS)
# ============================================================================
# Frontend URL for CORS configuration
CLIENT_URL=http://localhost:5173
```

Frontend Configuration (frontend/.env):
```bash
# Backend API URL
VITE_BASE_URL=http://localhost:8000
```

Important Notes:
- Use your Gmail address for `MY_EMAIL`
- Generate an App Password (NOT your regular Gmail password)
- Gmail allows 500 emails/day on the free tier - perfect for newsletters!
- No domain verification needed - works immediately!
- Get your Gemini API key for free from Google AI Studio
- Never commit `.env` files with real values to version control
Create frontend/.env:
```bash
VITE_BASE_URL=http://localhost:8000
```

Customize your interests in `app/profiles/user_profile.py`:
```python
USER_PROFILE = {
    "interests": [
        "Large Language Models",
        "Computer Vision",
        "Reinforcement Learning",
        # Add your interests
    ],
    "technical_level": "advanced",
    "focus_areas": ["research", "applications", "tools"]
}
```

Add YouTube channels in `app/config.py`:
```python
YOUTUBE_CHANNELS = [
    "UCn8ujwUInbJkBhffxqAPBVQ",  # Channel ID
    # Add more channels
]
```

The web interface provides a beautiful subscription experience:
- Start both servers:
```bash
python start_api.py          # Terminal 1
cd frontend && npm run dev   # Terminal 2
```
- Visit http://localhost:5173
- Users can subscribe with their email
- They receive instant confirmation emails
Run the complete daily pipeline:
```bash
python main.py
```

With custom parameters:
```bash
# Scrape last 48 hours, send top 15 articles
python main.py 48 15
```

Manage email recipients:
```bash
# Add a recipient
python app/manage_emails.py add john@example.com "John Doe"

# List all recipients
python app/manage_emails.py list

# Activate/deactivate
python app/manage_emails.py activate john@example.com
python app/manage_emails.py deactivate john@example.com

# Delete a recipient
python app/manage_emails.py delete john@example.com
```

The main pipeline executes 6 stages:
- Scraping: Fetch new content from all sources
- Anthropic Processing: Extract full markdown from articles
- Google Processing: Extract full markdown from articles
- YouTube Processing: Extract transcripts from videos
- Digest Generation: Create AI summaries for all content
- Email Delivery: Rank, curate, and send personalized digest
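Conceptually, the runner is an ordered list of stages where one failing stage is recorded rather than aborting the whole run. A sketch of that control flow (not the actual `daily_runner` code; the stage callables here are stand-ins):

```python
def run_pipeline(stages):
    """Run (name, callable) stages in order; collect per-stage status."""
    results = {}
    for name, stage in stages:
        try:
            results[name] = ("ok", stage())
        except Exception as exc:
            # A failed stage is logged, not fatal: later stages still run.
            results[name] = ("failed", str(exc))
    return results

results = run_pipeline([
    ("scraping", lambda: 28),      # pretend 28 articles scraped
    ("digests", lambda: 1 / 0),    # simulated failure
    ("email", lambda: "sent"),
])
print(results["digests"][0])  # failed
```

Keeping stages independent this way means a broken scraper costs only that source's articles, not the day's digest.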
```
┌───────────────────────────────────────────────────────────────────┐
│                      FRONTEND (React + Vite)                      │
│   ┌──────────────┐      ┌──────────────┐      ┌──────────────┐    │
│   │     Hero     │      │  Subscribe   │      │   Features   │    │
│   │  Component   │      │     Form     │      │  Component   │    │
│   └──────────────┘      └──────────────┘      └──────────────┘    │
└─────────────────────────────────┬─────────────────────────────────┘
                                  │ POST /api/subscribe
                                  ▼
┌───────────────────────────────────────────────────────────────────┐
│                     BACKEND (FastAPI + Uvicorn)                   │
│   • POST /api/subscribe                                           │
│   • GET  /api/subscribers/count                                   │
│   • GET  /health                                                  │
└──────────────────────┬──────────────────────────┬─────────────────┘
                       │                          │
                       ▼                          ▼
          ┌───────────────────────┐   ┌───────────────────────┐
          │       DATABASE        │   │     EMAIL SERVICE     │
          │     (PostgreSQL)      │   │     (Gmail SMTP)      │
          └───────────────────────┘   └───────────────────────┘
                       ▲
                       │
┌──────────────────────┴────────────────────────────────────────────┐
│                     CONTENT PIPELINE (Python)                     │
│  1. Scraping → 2. Processing → 3. AI Agents → 4. Email Delivery   │
└───────────────────────────────────────────────────────────────────┘
```
```
ai-news-aggregator/
├── app/
│   ├── agent/                    # AI agents for processing
│   │   ├── curator_agent.py      # Content ranking & curation
│   │   ├── digest_agent.py       # Summary generation
│   │   └── email_agent.py        # Email content creation
│   ├── database/                 # Data layer
│   │   ├── models.py             # SQLAlchemy models
│   │   ├── repository.py         # Data access layer
│   │   ├── connection.py         # Database connection
│   │   └── init_db.py            # Database initialization
│   ├── scrapers/                 # Content scrapers (9 sources)
│   │   ├── youtube.py            # YouTube channel scraper
│   │   ├── openai.py             # OpenAI blog scraper
│   │   ├── anthropic.py          # Anthropic blog scraper (3 feeds)
│   │   ├── google.py             # Google AI blog scraper
│   │   ├── huggingface.py        # HuggingFace blog scraper
│   │   ├── huggingface_papers.py # HuggingFace papers scraper
│   │   ├── techcrunch.py         # TechCrunch AI scraper
│   │   ├── mittr.py              # MIT Technology Review scraper
│   │   └── venturebeat.py        # VentureBeat AI scraper
│   ├── services/                 # Business logic
│   │   ├── email_service.py      # Email service (Gmail SMTP)
│   │   └── process_*.py          # Content processors
│   ├── profiles/                 # User profiles
│   │   └── user_profile.py       # User preferences & interests
│   ├── api.py                    # FastAPI application
│   ├── config.py                 # Configuration
│   ├── runner.py                 # Scraper orchestration
│   └── daily_runner.py           # Main pipeline
├── frontend/                     # React frontend
│   ├── src/
│   │   ├── components/           # React components
│   │   ├── utils/                # API client
│   │   ├── App.jsx
│   │   └── main.jsx
│   └── package.json
├── main.py                       # Entry point
├── start_api.py                  # FastAPI server startup
├── manage_emails.py              # Email management CLI
└── requirements.txt              # Python dependencies
```
```
-- Users/Subscribers
emails
├── id (PK)
├── email (UNIQUE)
├── name
├── is_active
└── created_at

-- Content Tables (9 sources)
youtube_videos, openai_articles, anthropic_articles, google_articles,
huggingface_articles, huggingface_papers, techcrunch_articles,
mittr_articles, venturebeat_articles
├── id/guid (PK)
├── title
├── url
├── content/transcript/markdown
└── published_at

-- Processed Content
digests
├── id (PK)
├── article_type
├── article_id
├── title
├── summary
└── created_at
```

Subscribe a new user:
```
POST /api/subscribe
Content-Type: application/json

{
  "email": "user@example.com",
  "name": "John Doe"
}
```
(`name` is optional.)

Response:
```json
{
  "success": true,
  "message": "Successfully subscribed! Check your email for confirmation.",
  "email": "user@example.com"
}
```

Get subscriber count:
```
GET /api/subscribers/count
```
Response:
```json
{ "count": 42 }
```

Health check:
```
GET /health
```
Response:
```json
{ "status": "healthy" }
```

JavaScript:
```javascript
async function subscribe(email, name) {
  const response = await fetch('http://localhost:8000/api/subscribe', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ email, name }),
  });
  return response.json();
}
```

Python:
```python
import requests

response = requests.post(
    'http://localhost:8000/api/subscribe',
    json={'email': 'user@example.com', 'name': 'John Doe'}
)
print(response.json())
```

The email service uses Gmail SMTP in this format:
```python
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart

# Create message
msg = MIMEMultipart('alternative')
msg['Subject'] = "Hello World"
msg['From'] = "your_email@gmail.com"
msg['To'] = "recipient@example.com"

# Add HTML content
html_part = MIMEText("<p>Congrats on sending your <strong>first email</strong>!</p>", 'html')
msg.attach(html_part)

# Send via Gmail SMTP
with smtplib.SMTP("smtp.gmail.com", 587) as server:
    server.starttls()
    server.login("your_email@gmail.com", "your_app_password")
    server.send_message(msg)
```

This is already implemented in `app/services/email_service.py` for both confirmation and digest emails.
Visit http://localhost:8000/docs for Swagger UI with interactive API testing.
Option 1: Railway / Render / Fly.io

Create a Procfile:
```
web: uvicorn app.api:app --host 0.0.0.0 --port $PORT
```

Set environment variables and deploy.

Option 2: Docker
```dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "app.api:app", "--host", "0.0.0.0", "--port", "8000"]
```

Option 3: VPS
```bash
# Install dependencies
sudo apt update && sudo apt install python3-pip nginx

# Clone and setup
git clone <repo-url>
cd ai-news-aggregator
pip install -r requirements.txt

# Create systemd service
sudo nano /etc/systemd/system/ai-news-api.service
```

Vercel (Recommended):
- Push to GitHub
- Import the project in Vercel
- Set the root directory to `frontend`
- Add an environment variable: `VITE_BASE_URL=https://your-api-domain.com`
- Deploy

Netlify:
```bash
cd frontend
npm run build
# Deploy the dist/ folder
```

Gmail SMTP (Recommended - Free & Easy):
- Enable 2-factor authentication: https://myaccount.google.com/security
- Generate App Password: https://myaccount.google.com/apppasswords
- Add to environment variables:
```bash
MY_EMAIL=your_email@gmail.com
APP_PASSWORD=your_16_char_app_password
```
- Done! No domain verification needed, works immediately!
Cron (Linux/macOS):
```bash
crontab -e
# Add: 0 8 * * * cd /path/to/project && python main.py
```

GitHub Actions:
```yaml
name: Daily Digest
on:
  schedule:
    - cron: '0 8 * * *'

jobs:
  send-digest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.12'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run pipeline
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}
          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
          MY_EMAIL: ${{ secrets.MY_EMAIL }}
          APP_PASSWORD: ${{ secrets.APP_PASSWORD }}
        run: python main.py
```

Free Tier:
- Backend: Railway/Render free tier
- Frontend: Vercel/Netlify free tier
- Database: Neon free tier
- Email: Gmail SMTP (free, 500 emails/day)
- Total: $0/month
Production (1000 subscribers):
- Backend: $5-10/month
- Frontend: Free
- Database: $10-20/month
- Email: $20/month
- Total: ~$35-50/month
- Create a scraper in `app/scrapers/new_source.py`
- Add a model in `app/database/models.py`
- Update the repository in `app/database/repository.py`
- Register in `SCRAPER_REGISTRY` in `app/runner.py`
- Add a processing service in `app/services/`
Test database connection:
```bash
python -m app.database.check_connection
```

Test email service:
```bash
# Quick email test
python test_resend.py

# Full email service test
python test_email_service.py
```

Verify setup:
```bash
python verify_setup.py
```

Digest Agent: Generates concise summaries
- Model: Gemini 2.5 Flash
- Output: Title + 2-3 sentence summary
Curator Agent: Ranks content by relevance
- Model: Gemini 2.5 Flash
- Scoring: 0-10 scale based on user profile
Email Agent: Creates personalized email content
- Model: Gemini 2.5 Flash
- Output: Greeting + introduction + article list
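All three agents work by embedding the user profile and article text into a prompt. A hedged sketch of what the curator's prompt construction might look like (`build_ranking_prompt` is illustrative, not the actual code in `curator_agent.py`):

```python
def build_ranking_prompt(profile: dict, digest: dict) -> str:
    """Assemble a scoring prompt from the user profile and one digest."""
    interests = ", ".join(profile["interests"])
    return (
        f"You rank AI news for a {profile['technical_level']} reader "
        f"interested in: {interests}.\n"
        f"Article: {digest['title']}\n{digest['summary']}\n"
        "Reply with a single relevance score from 0 to 10."
    )

profile = {"interests": ["Large Language Models"], "technical_level": "advanced"}
digest = {"title": "Scaling laws revisited", "summary": "New results on LLM scaling."}
prompt = build_ranking_prompt(profile, digest)
print("0 to 10" in prompt)  # True
```

The Gemini call itself (via `google-genai`) then just sends this prompt and parses the numeric reply.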
The application uses Gmail SMTP for all email delivery. Simple, reliable, and works immediately!
```python
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart

# Create message
msg = MIMEMultipart('alternative')
msg['Subject'] = "Hello World"
msg['From'] = "your_email@gmail.com"
msg['To'] = "user@example.com"

# Add HTML content
html_part = MIMEText("<p>Your <strong>email content</strong> here</p>", 'html')
msg.attach(html_part)

# Send via Gmail SMTP
with smtplib.SMTP("smtp.gmail.com", 587) as server:
    server.starttls()
    server.login("your_email@gmail.com", "your_app_password")
    server.send_message(msg)
```

Key Features:
- ✅ Send to ANY email address (no domain verification!)
- ✅ Instant confirmation emails for new subscribers
- ✅ Daily digest emails with beautiful HTML formatting
- ✅ Batch sending to all active subscribers
- ✅ 500 emails/day free tier (plenty for newsletters!)
- ✅ Comprehensive error handling and logging
Setup Gmail App Password:
- Enable 2-factor authentication: https://myaccount.google.com/security
- Generate App Password: https://myaccount.google.com/apppasswords
- Add to `.env`: `APP_PASSWORD=your_16_char_password`
Implementation:
- Located in `app/services/email_service.py`
- Used by both the API (`app/api.py`) and the pipeline (`app/daily_runner.py`)
- Supports both confirmation and digest email types
Testing:
```bash
# Quick test
python test_resend.py

# Full test with database integration
python test_email_service.py
```

Port already in use:
```bash
lsof -i :8000
kill -9 <PID>
```

Package errors:
```bash
pip install -r requirements.txt --force-reinstall
```

Port already in use:
```bash
lsof -i :5173
kill -9 <PID>
```

Dependencies:
```bash
cd frontend
rm -rf node_modules package-lock.json
npm install
```

Connection failed:
```bash
# Verify DATABASE_URL
echo $DATABASE_URL

# Reinitialize
python -m app.database.init_db
```

Not sending:
- Verify MY_EMAIL and APP_PASSWORD are correct in `.env`
- Make sure you're using an App Password, NOT your regular Gmail password
- Check that 2-factor authentication is enabled on your Google account
- Look at backend logs for errors
- Test with: `python test_email_service.py`

Common Gmail SMTP errors:
- `535 Authentication failed`: wrong App Password or 2FA not enabled
- `534 Application-specific password required`: use an App Password, not your regular password
- `Connection refused`: check firewall/network settings
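When wrapping the send call, these failures can be caught explicitly so logs say more than a raw traceback. A sketch (`send_safely` is illustrative, not the actual `email_service.py` code; the `FakeServer` stub only simulates the 535 error):

```python
import smtplib

def send_safely(server, msg):
    """Send msg, translating common Gmail SMTP failures into readable strings."""
    try:
        server.send_message(msg)
        return "sent"
    except smtplib.SMTPAuthenticationError:
        return "auth failed: check App Password and that 2FA is enabled"
    except smtplib.SMTPException as exc:
        return f"smtp error: {exc}"

class FakeServer:  # stands in for a connected smtplib.SMTP in this sketch
    def send_message(self, msg):
        raise smtplib.SMTPAuthenticationError(535, b"5.7.8 Authentication failed")

print(send_safely(FakeServer(), msg=None))
# auth failed: check App Password and that 2FA is enabled
```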
Setup App Password:
- Go to https://myaccount.google.com/apppasswords
- Select "Mail" and your device
- Copy the 16-character password
- Add to `.env`: `APP_PASSWORD=xxxx xxxx xxxx xxxx` (spaces optional)
- Ensure backend is on port 8000
- Ensure frontend is on port 5173
- Check CORS settings in `app/api.py`
```
============================================================
Starting Daily AI News Aggregator Pipeline
============================================================
[1/9] Scraping articles from sources...
✓ Scraped 28 total articles from all sources
[2/9] Processing Anthropic markdown...
✓ Processed 2 Anthropic articles (0 failed)
[3/9] Processing Google markdown...
✓ Processed 4 Google articles (0 failed)
[4/9] Processing HuggingFace markdown...
✓ Processed 0 HuggingFace articles (0 failed)
[5/9] Processing TechCrunch markdown...
✓ Processed 4 TechCrunch articles (0 failed)
[6/9] Processing MIT TR markdown...
✓ Processed 0 MIT TR articles (0 failed)
[7/9] Processing VentureBeat markdown...
✓ Processed 1 VentureBeat articles (0 failed)
[8/9] Processing YouTube transcripts...
✓ Processed 5 transcripts (0 unavailable)
[9/9] Creating digests and sending email...
✓ Created 20 digests (0 failed out of 20 total)
✓ Email sent successfully with 10 articles
============================================================
Pipeline Summary
============================================================
Duration: 45.3 seconds
Scraped: 28 articles from 9 sources
Processed: 28 articles
Digests: 20 created
Email: Sent to 5 subscribers
============================================================
```
Backend:
- `sqlalchemy` - Database ORM
- `psycopg2-binary` - PostgreSQL adapter
- `google-genai` - Google Gemini API
- `fastapi` - Web framework
- `uvicorn` - ASGI server
- `resend` - Email delivery
- `beautifulsoup4` - HTML parsing
- `feedparser` - RSS feed parsing
- `youtube-transcript-api` - YouTube transcripts
- `pydantic` - Data validation
- `python-dotenv` - Environment management

Frontend:
- `react` - UI framework
- `vite` - Build tool
- `tailwindcss` - Styling
Contributions are welcome! Areas for improvement:
- Additional content sources
- Enhanced ranking algorithms
- UI/dashboard for monitoring
- Webhook support for real-time updates
- Multi-language support
- Advanced filtering options
- Unsubscribe functionality
- User preference management
- Analytics dashboard
This project is open source and available under the MIT License.
Built with:
- Google Gemini for AI processing
- Gmail SMTP for email delivery
- Neon for PostgreSQL hosting
- FastAPI for the backend
- React for the frontend
For issues or questions:
- Run `python verify_setup.py` to check configuration
- Check logs for error messages
- Verify environment variables are set correctly
- Test database connection
- Ensure API keys are valid
9 Active Sources:
- 8 RSS feeds (fast, reliable)
- 1 web scraper (HuggingFace Papers)
- Covers: Company blogs, research, news outlets, video content
Performance:
- Scraping: 5-10 seconds for all sources
- No JavaScript rendering needed
- Low maintenance overhead
- Excellent content diversity
Built with ❤️ for the AI community
Get started now: run `python start_api.py` and visit http://localhost:5173