An AI-powered multi-agent system that transforms interview audio recordings into comprehensive learning documents with Q&A pairs and model answers using CrewAI.
This project implements a 6-agent sequential workflow (sketched in code below) that:
- Transcribes interview audio using GPT-4o audio preview
- Extracts every Q&A pair from the interview
- Generates comprehensive model answers for each question
- Analyzes performance gaps and learning opportunities
- Formats everything into a clean study document
- Exports results as structured markdown files
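A minimal sketch of how such a sequential crew is wired in CrewAI, abbreviated to the first two of the six agents (the real definitions live in `config/agents.yaml` and `config/tasks.yaml`; the roles and prompts here are paraphrased):

```python
from crewai import Agent, Crew, Process, Task

# Two of the six agents, paraphrased for illustration.
transcriber = Agent(
    role="Audio Intelligence Specialist",
    goal="Accurately transcribe interview audio to text",
    backstory="Expert at producing faithful, verbatim transcripts.",
)
extractor = Agent(
    role="Interview Q&A Extractor",
    goal="Extract every Q&A pair and organize them by topic",
    backstory="Meticulous analyst who never drops a question.",
)

transcribe = Task(
    description="Transcribe the audio file at {audio_path}.",
    expected_output="A complete, verbatim transcript.",
    agent=transcriber,
)
extract = Task(
    description="Extract all Q&A pairs from the transcript.",
    expected_output="Every question and answer, grouped by topic.",
    agent=extractor,
)

# Process.sequential feeds each task's output into the next task's context.
crew = Crew(
    agents=[transcriber, extractor],
    tasks=[transcribe, extract],
    process=Process.sequential,
)
result = crew.kickoff(inputs={"audio_path": "interview.mp3"})
```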
## Features

- **Audio Transcription**: Supports MP3, WAV, M4A, OGG, and WebM formats up to 20MB (see the validation sketch below)
- **Complete Q&A Extraction**: Captures every question and answer from the interview
- **Model Answer Generation**: Provides comprehensive correct answers for all questions
- **Topic Organization**: Groups related questions into logical sections
- **Learning Focused**: Designed for interview preparation and knowledge archiving
- **Multi-Model Support**: Uses a model matched to each specific task
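A hedged sketch of how the transcription step and its limits could be implemented against OpenRouter's OpenAI-compatible endpoint (the function name and prompt are illustrative, not the project's actual tool):

```python
import base64
import os
from pathlib import Path

from openai import OpenAI

SUPPORTED_FORMATS = {".mp3", ".wav", ".m4a", ".ogg", ".webm"}
MAX_BYTES = 20 * 1024 * 1024  # 20MB upload limit

def transcribe_audio(path: str) -> str:
    """Validate the file, then ask GPT-4o Audio Preview for a transcript."""
    audio = Path(path)
    if audio.suffix.lower() not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {audio.suffix}")
    if audio.stat().st_size > MAX_BYTES:
        raise ValueError("Audio file exceeds the 20MB limit")

    client = OpenAI(
        api_key=os.environ["OPENROUTER_API_KEY"],
        base_url=os.environ.get("OPENROUTER_BASE_URL", "https://openrouter.ai/api/v1"),
    )
    encoded = base64.b64encode(audio.read_bytes()).decode("ascii")
    response = client.chat.completions.create(
        model="openai/gpt-4o-audio-preview",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this interview verbatim."},
                # Note: the audio API natively accepts wav/mp3; other
                # containers may need conversion before upload.
                {"type": "input_audio",
                 "input_audio": {"data": encoded,
                                 "format": audio.suffix.lstrip(".")}},
            ],
        }],
    )
    return response.choices[0].message.content
```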
## Prerequisites

- Python 3.8 or higher
- OpenRouter API key (get one at https://openrouter.ai/keys)
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/internship-agent.git
   cd internship-agent
   ```

2. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Configure API keys:

   ```bash
   cp .env.example .env
   # Edit .env and add your OPENROUTER_API_KEY
   ```
## Usage

Basic run:

```bash
python src/main.py --audio interview.mp3
```

With company and role metadata:

```bash
python src/main.py --audio interview.mp3 --company "Tech Corp" --role "Software Engineer"
```

With a custom output directory:

```bash
python src/main.py --audio interview.wav --output results/
```
### Command-Line Options

- `--audio`: Path to the interview audio file (required)
- `--company`: Company name for the interview (default: "Unknown Company")
- `--role`: Role applied for (default: "Unknown Role")
- `--output`: Output directory for the analysis (default: "interviews/")
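These flags map directly onto argparse; a sketch of the entry point's parser:

```python
import argparse

def parse_args() -> argparse.Namespace:
    """Parse the CLI flags described above."""
    parser = argparse.ArgumentParser(
        description="Turn an interview recording into a learning document."
    )
    parser.add_argument("--audio", required=True,
                        help="Path to the interview audio file")
    parser.add_argument("--company", default="Unknown Company",
                        help="Company name for the interview")
    parser.add_argument("--role", default="Unknown Role",
                        help="Role applied for")
    parser.add_argument("--output", default="interviews/",
                        help="Output directory for the analysis")
    return parser.parse_args()
```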
## The Agents

The six agents run in the order below; each task's output feeds the next.

| # | Role | Model | Task |
|---|------|-------|------|
| 1 | Audio Intelligence Specialist | GPT-4o Audio Preview | Accurately transcribe audio to text |
| 2 | Interview Q&A Extractor | Google Gemini 2.5 Flash Lite | Extract all Q&A pairs and organize them by topic |
| 3 | Model Answer Creator | Google Gemini 2.5 Flash Lite | Generate comprehensive model answers for every question |
| 4 | Learning Performance Analyst | Google Gemini 2.5 Flash Lite | Compare answers and create a learning roadmap |
| 5 | Learning Document Creator | Google Gemini 2.5 Flash Lite | Format Q&As with model answers clearly |
| 6 | Digital Records Manager | Google Gemini 2.5 Flash Lite | Export and verify completeness |
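Agent definitions are read from `config/agents.yaml`; a sketch of how entries might be hydrated into CrewAI agents (the YAML keys and default model are assumptions, not the project's actual schema):

```python
import yaml  # PyYAML
from crewai import Agent

def load_agents(config_path: str = "config/agents.yaml") -> dict:
    """Build one CrewAI Agent per entry in the YAML config."""
    with open(config_path) as f:
        config = yaml.safe_load(f)
    return {
        name: Agent(
            role=spec["role"],          # assumed key names
            goal=spec["goal"],
            backstory=spec["backstory"],
            llm=spec.get("llm", "openrouter/google/gemini-2.5-flash-lite"),
        )
        for name, spec in config.items()
    }
```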
## Project Structure

```
internship-agent/
├── src/
│   ├── agents/          # Agent definitions
│   ├── tasks/           # Task configurations
│   ├── tools/           # Custom tools (transcription, file writer, Notion)
│   ├── crew.py          # Main crew orchestration
│   └── main.py          # CLI entry point
├── config/
│   ├── agents.yaml      # Agent configurations
│   └── tasks.yaml       # Task configurations
├── audio/               # Input audio files
├── interviews/          # Output analysis reports
├── requirements.txt     # Python dependencies
├── .env.example         # Environment variables template
└── README.md            # This file
```
## Output Format

The generated learning document follows this structure:
```markdown
# Interview Learning Document

## Interview Metadata
- Company: [Company Name]
- Role: [Position]
- Interviewer: [Name]
- Candidate: [Name]

## Topic: [e.g., Options Pricing]

### Question 1: [Exact question text]

**Candidate's Answer:**
[Complete answer from candidate]

**MODEL ANSWER:**
[Comprehensive correct answer with explanations]

**Key Learning Points:**
- Concept 1
- Concept 2

---
```
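The export step (agent 6) ultimately writes this markdown into the output directory; a sketch, with a hypothetical filename scheme:

```python
from datetime import datetime
from pathlib import Path

def export_document(markdown: str, company: str,
                    output_dir: str = "interviews/") -> Path:
    """Write the formatted learning document to disk and return its path."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y-%m-%d_%H%M")  # illustrative naming
    path = out / f"{company.lower().replace(' ', '_')}_{stamp}.md"
    path.write_text(markdown, encoding="utf-8")
    return path
```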
## Configuration

Set these in your `.env` file:

```bash
# Required
OPENROUTER_API_KEY=your_openrouter_api_key_here
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1

# Optional (for Notion export)
NOTION_TOKEN=your_notion_token_here
NOTION_DATABASE_ID=your_notion_database_id_here
```
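At startup these can be loaded with python-dotenv; a minimal sketch:

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root

OPENROUTER_API_KEY = os.environ["OPENROUTER_API_KEY"]  # required
OPENROUTER_BASE_URL = os.getenv("OPENROUTER_BASE_URL",
                                "https://openrouter.ai/api/v1")
NOTION_TOKEN = os.getenv("NOTION_TOKEN")              # optional: Notion export
NOTION_DATABASE_ID = os.getenv("NOTION_DATABASE_ID")  # optional: Notion export
```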
### Models

Models can be changed in `config/agents.yaml`. Current models:

- Transcription: `openai/gpt-4o-audio-preview`
- All other agents: `google/gemini-2.5-flash-lite` (cost-effective and efficient)
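With CrewAI's `LLM` wrapper, pointing an agent at an OpenRouter-hosted model looks roughly like this (a sketch; exact parameter names can vary across CrewAI versions):

```python
import os

from crewai import LLM, Agent

# Route requests through OpenRouter's OpenAI-compatible endpoint.
gemini = LLM(
    model="openrouter/google/gemini-2.5-flash-lite",
    base_url=os.getenv("OPENROUTER_BASE_URL", "https://openrouter.ai/api/v1"),
    api_key=os.environ["OPENROUTER_API_KEY"],
)

extractor = Agent(
    role="Interview Q&A Extractor",
    goal="Extract every Q&A pair from the transcript",
    backstory="Meticulous and exhaustive.",
    llm=gemini,
)
```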
## Testing

To test with a sample audio file:

1. Place an interview audio file in the `audio/` directory.
2. Run:

   ```bash
   python src/main.py --audio audio/sample_interview.mp3
   ```

3. Check the `interviews/` directory for the output.
## Contributing

This is a homework project, but suggestions and improvements are welcome!
## License

MIT License - see the `LICENSE` file for details.
## Roadmap

- Coding interview support:
  - After transcription, a new agent determines whether the session was a live-coding interview.
  - If it was, the workflow routes to an agent that solves the coding tasks.
  - After solving, another agent verifies the solution inside a code sandbox (see the sketch below).
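One possible shape for that routing step, as a placeholder sketch (the markers and branch names are hypothetical; a real version would ask a cheap LLM to classify the transcript):

```python
def is_live_coding_interview(transcript: str) -> bool:
    """Placeholder classifier; a real implementation would use an LLM call."""
    markers = ("shared screen", "write a function", "let's code", "whiteboard")
    return any(marker in transcript.lower() for marker in markers)

def route(transcript: str) -> str:
    """Pick the branch of the pipeline that should handle this interview."""
    if is_live_coding_interview(transcript):
        return "coding-branch"    # solver agent, then sandbox checker
    return "learning-branch"      # existing 6-agent learning flow
```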
## Acknowledgments

- CrewAI framework for multi-agent orchestration
- OpenRouter for unified AI model access
- MIT AI Studio for the assignment framework
Built with CrewAI | Powered by OpenRouter