Conversational AI Agent Playground

A web dashboard for managing and interacting with Agora's Conversational AI Agents. It provides an interface for creating, updating, and monitoring AI agents that hold real-time conversations, with support for both traditional LLM and Multimodal Large Language Model (MLLM) configurations.

Project Structure

convo_ai_studio/
├── src/
│   ├── js/
│   │   ├── api.js                    # Core API integration with Agora
│   │   ├── audio.js                  # Audio processing and visualization
│   │   ├── conversational-ai-api.js  # Conversational AI API handling
│   │   ├── subtitles.js              # Live subtitles and chat history
│   │   ├── ui.js                     # UI components and event handlers
│   │   └── utils.js                  # Utility functions and helpers
│   ├── css/
│   │   ├── styles.css                # Application styles
│   │   └── modern-ui-library.css     # Modern UI component library
│   ├── lib/
│   │   └── microsoftVoicesByLang.js  # Microsoft TTS voice definitions
│   └── media/
│       ├── comvoai_demo.mp4          # Demo video
│       └── *.png                     # Screenshots
├── DOCS/
│   ├── FEATURES.md                   # Complete feature list
│   ├── SETUP.md                      # Detailed setup instructions
│   ├── API.md                        # API endpoint documentation
│   ├── VENDORS.md                    # Vendor configuration guide
│   ├── ARCHITECTURE.md               # Technical architecture
│   └── BROWSER_COMPATIBILITY.md      # Browser requirements
├── index.html                        # Main application interface
├── README.md                         # This file
└── GUIDE.md                          # Detailed usage guide

Quick Start

Clone the repository:

git clone https://github.com/AgoraIO-Community/ConvoAI-Playground.git
cd ConvoAI-Playground

Set up your API credentials:
- Click the "Set API Credentials" button
- Enter your Agora Customer ID, Customer Secret, and App ID
- Optionally enter your App Certificate (required for local token generation)

Configure and create your agent:
- Choose your AI mode (LLM or MLLM)
- Configure agent settings, TTS/ASR vendors, and optional features
- Create your agent and start interacting

SIP/Phone Management (Optional):
- Import and manage phone numbers
- Initiate outbound calls with pipeline ID support
- Select the override checkbox to send the complete configuration even when a pipeline ID is set
- Retrieve call records and status
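
The Customer ID and Customer Secret from the credentials step are the pair that Agora's RESTful APIs accept through HTTP Basic authentication. As a minimal sketch of how a client could turn them into a request header (the helper name `buildBasicAuthHeader` is hypothetical and is not taken from `api.js`):

```javascript
// Hypothetical helper: encodes Agora REST credentials for HTTP Basic authentication.
function buildBasicAuthHeader(customerId, customerSecret) {
  if (!customerId || !customerSecret) {
    throw new Error("Customer ID and Customer Secret are both required");
  }
  const pair = `${customerId}:${customerSecret}`;
  // btoa exists in browsers and recent Node.js; Buffer is the fallback for older Node.js.
  const encoded =
    typeof btoa === "function"
      ? btoa(pair)
      : Buffer.from(pair, "utf8").toString("base64");
  return `Basic ${encoded}`;
}

// Usage: attach the header to a REST request (placeholder credentials).
const headers = {
  Authorization: buildBasicAuthHeader("my-customer-id", "my-customer-secret"),
  "Content-Type": "application/json",
};
```

Because the encoded value is only Base64, not encryption, the credentials should be treated as secrets and sent over HTTPS only.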

For detailed setup instructions, see SETUP.md.

Documentation

SETUP.md - Detailed setup and configuration guide
FEATURES.md - Complete list of features and capabilities
API.md - API endpoint documentation and integration details
VENDORS.md - TTS, ASR, and AI Avatar vendor configuration
ARCHITECTURE.md - Technical architecture and module details
BROWSER_COMPATIBILITY.md - Browser requirements and compatibility information
GUIDE.md - Detailed usage guide and walkthrough

Key Features

Dual AI Model Support: Traditional LLM and Multimodal LLM (MLLM) configurations
- LLM Mode: Support for OpenAI, Anthropic, Gemini, Vertex AI, and custom LLM providers
- MLLM Mode: Real-time multimodal conversations with OpenAI Realtime API and Google Vertex AI
- Vertex AI MLLM support with native audio, ADC credentials, and project configuration
Comprehensive TTS Support: Microsoft, ElevenLabs, Cartesia, OpenAI, Hume AI, Rime, Fish Audio, Groq, Google, PlayHT, Sarvam, and Amazon Polly TTS
Advanced ASR Integration: Agora (ARES), Microsoft, Deepgram, OpenAI, Speechmatics, AssemblyAI, Amazon Transcribe, Google, Sarvam, and Custom ASR with extensive language support
AI Avatar Support: Akool and HeyGen avatar vendors with real-time video streaming
- HeyGen-specific settings: quality control, idle timeout, and activity timeout
- Automatic client UID configuration for avatar-agent communication
MCP Servers (Model Context Protocol): Tool calling support with multiple server configurations
- Configure multiple MCP servers with custom endpoints
- Support for http, sse, and streamable_http transport protocols
- Tool availability and allowed tools configuration
- Automatic enable_tools flag in advanced_features when enabled
SIP/Phone Management: Complete phone number and call management
- Import, update, and manage phone numbers
- Initiate outbound calls via SIP with pipeline ID support
- Override checkbox to send the complete configuration even when a pipeline ID is set
- Retrieve call records and status
- Inbound and outbound configuration with allowed addresses
Real-time Audio & Visual: Comprehensive multimedia experience
- Live audio visualization with waveform display
- Camera integration with preview overlay and device selection
- Multi-camera device selection and configuration
- Microphone and camera device management with persistent storage
- Permission management with automatic fallback
Live Subtitles & Chat: Real-time conversation tracking
- Real-time subtitle display with overlay functionality
- Live chat history with message timestamps
- RTM and Data Stream subtitle modes
- Copy and clear functionality for chat history
Advanced Configuration: Extensive customization options
- VAD & Turn Detection: Agora VAD, Server VAD, and Semantic VAD
- SAL (Selective Attention Locking): Voice print locking and recognition
  - Locking mode: Seamless voice locking in 10 seconds
  - Recognition mode: Voice recognition with speaker identification
  - Sample URL management for voiceprints
- Silence management with configurable timeouts and actions
- Farewell configuration with graceful timeout
- Custom parameters with type validation (string, number, array, object)
Smart Validation: Context-aware validation for agent creation and SIP calls
Local Token Generation: Built-in Agora RTC + RTM token generator
- One-click token generation for agent, avatar, and client UIDs
- 30-minute token expiration with PUBLISHER role
Modern UI Design: Professional interface with enhanced user experience
- Beautiful gradient buttons and modern form inputs
- Enhanced visual styling with smooth animations
- Responsive design with proper overflow handling
- Comprehensive tooltips and help text
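
To illustrate the MCP Servers item above, here is a minimal sketch of how a configuration object might gain MCP server entries and the `enable_tools` flag. Only the three transport values and `advanced_features.enable_tools` come from this README; the helper name `withMcpServers` and the property names `mcp_servers`, `name`, `endpoint`, `transport`, `allowed_tools`, and `enable_rtm` are assumptions made for illustration, so check API.md for the real request shape.

```javascript
// Transport protocols listed in this README.
const MCP_TRANSPORTS = ["http", "sse", "streamable_http"];

// Hypothetical helper: adds MCP server entries to an agent config and
// switches on advanced_features.enable_tools when at least one server is present.
// Property names other than advanced_features.enable_tools are assumptions.
function withMcpServers(agentConfig, servers) {
  for (const server of servers) {
    if (!MCP_TRANSPORTS.includes(server.transport)) {
      throw new Error(`Unsupported MCP transport: ${server.transport}`);
    }
  }
  // Copy instead of mutating the caller's objects.
  const result = { ...agentConfig, mcp_servers: servers.map((s) => ({ ...s })) };
  if (servers.length > 0) {
    result.advanced_features = {
      ...agentConfig.advanced_features,
      enable_tools: true,
    };
  }
  return result;
}

// Usage with placeholder values.
const config = withMcpServers(
  { advanced_features: { enable_rtm: true } },
  [
    {
      name: "docs",
      endpoint: "https://example.com/mcp",
      transport: "sse",
      allowed_tools: ["search"],
    },
  ]
);
```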

For a complete feature list, see FEATURES.md.
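
The custom parameters feature listed above validates values against four types: string, number, array, and object. The following is a rough sketch of what such a check could look like, assuming values arrive as raw text from a form field; `parseCustomParameter` is a hypothetical name and this is not the application's actual validation code.

```javascript
// Hypothetical helper: parses a custom parameter's raw text according to its declared type.
function parseCustomParameter(type, rawValue) {
  switch (type) {
    case "string":
      return rawValue;
    case "number": {
      const n = Number(rawValue);
      // Number("") is 0, so reject blank input explicitly.
      if (rawValue.trim() === "" || !Number.isFinite(n)) {
        throw new Error(`Not a valid number: ${rawValue}`);
      }
      return n;
    }
    case "array":
    case "object": {
      const parsed = JSON.parse(rawValue); // throws SyntaxError on malformed JSON
      const isArray = Array.isArray(parsed);
      const isPlainObject =
        parsed !== null && typeof parsed === "object" && !isArray;
      if ((type === "array" && !isArray) || (type === "object" && !isPlainObject)) {
        throw new Error(`Expected a JSON ${type}`);
      }
      return parsed;
    }
    default:
      throw new Error(`Unknown parameter type: ${type}`);
  }
}
```

Parsing arrays and objects as JSON keeps the accepted syntax unambiguous, at the cost of requiring double-quoted keys and strings.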

Token Generation

The application includes a built-in Agora token generator that creates RTC + RTM tokens locally. This feature allows you to generate tokens without relying on a server-side token service.

How to Use Token Generation

Set the App Certificate (an optional credential in general, but required for token generation):
- Open "Set API Credentials"
- Enter your App Certificate (optional field with tooltip)
- Save credentials

Generate Tokens:
- Agent RTC Token: Click "Generate" next to the Agora RTC Token field in Agent Settings
- Avatar RTC Token: Click "Generate" next to the Avatar RTC Token field in AI Avatar Settings
- Client RTC Token: Click "Generate" next to the Client RTC Token field on the main page

Token Configuration:
- All tokens use the channel name from Agent Settings
- Token expiration: 30 minutes (1800 seconds)
- Privilege expiration: 30 minutes (1800 seconds)
- Role: PUBLISHER (allows publishing audio, video, and data streams)

Requirements

App ID (required)
App Certificate (required for token generation)
Channel Name (from Agent Settings)
UID (Agent RTC UID, Avatar RTC UID, or Client RTC UID)

The token generator uses the buildTokenWithRtm method from the RtcTokenBuilder2 library, which creates tokens that support both RTC (Real-Time Communication) and RTM (Real-Time Messaging) services.
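
The requirements and token settings above can be sketched as a small wrapper. This is an illustration under stated assumptions, not the application's code: `generateRtcRtmToken` is a hypothetical name, the builder object and the PUBLISHER role constant are passed in rather than imported, and the argument order given to `buildTokenWithRtm` (app ID, certificate, channel, account, role, token expiration, privilege expiration) is an assumption that should be verified against the RtcTokenBuilder2 source.

```javascript
const TOKEN_TTL_SECONDS = 1800; // 30 minutes, as listed under Token Configuration

// Hypothetical wrapper: checks the four required inputs, then delegates to an
// injected builder that is assumed to expose buildTokenWithRtm.
function generateRtcRtmToken(builder, publisherRole, params) {
  const { appId, appCertificate, channelName, uid } = params;
  const missing = Object.entries({ appId, appCertificate, channelName, uid })
    .filter(([, value]) => value === undefined || value === null || value === "")
    .map(([name]) => name);
  if (missing.length > 0) {
    throw new Error(`Missing required value(s): ${missing.join(", ")}`);
  }
  // Argument order is an assumption; confirm it against the library before use.
  return builder.buildTokenWithRtm(
    appId,
    appCertificate,
    channelName,
    String(uid), // account, passed as a string
    publisherRole,
    TOKEN_TTL_SECONDS, // token expiration
    TOKEN_TTL_SECONDS // privilege expiration
  );
}
```

Injecting the builder keeps the sketch independent of how the library is loaded in the page, and makes it possible to exercise the validation with a stub.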

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For issues and questions:

Create an issue on GitHub
Check the GUIDE.md for detailed usage instructions
Review the demo video in the media folder
Consult the DOCS folder for detailed documentation

Agora ConversationalAI Backend v2.0 - A comprehensive web dashboard for managing and interacting with Agora's Conversational AI Agents. Features include:

Dual AI Model Support: Traditional LLM and Multimodal LLM (MLLM) with Vertex AI integration
MCP Servers: Model Context Protocol support for tool calling with multiple server configurations
Comprehensive Vendor Support: 10+ TTS vendors, 9+ ASR vendors, and 2 AI Avatar vendors
Advanced Features: AIVAD, RTM, SAL (Selective Attention Locking), custom parameters, and more
SIP/Phone Management: Complete phone number and call management with pipeline support
Real-time Capabilities: Live subtitles, chat history, audio visualization, and camera integration
Device Management: Advanced microphone and camera selection with permission handling
Modern UI: Professional design with gradient buttons, tooltips, and responsive layout
Local Token Generation: Built-in RTC + RTM token generator for secure authentication
Smart Validation: Context-aware validation and error handling throughout the application
