
openai compatible worker #314


Draft pull request: wants to merge 30 commits into base `main`
- `1720b78` LiveKit Pipeline Agent (#4) (benxu3, Nov 26, 2024)
- `de2a7cb` add token in qr (benxu3, Nov 26, 2024)
- `76f0847` fix dependency issues (benxu3, Nov 26, 2024)
- `37198d8` Merge branch 'main' of https://github.com/benxu3/01 (benxu3, Nov 26, 2024)
- `2f53be0` update livekit server and profile docs (benxu3, Dec 9, 2024)
- `f6c13a1` send multimodal message on startup of multimodal agent (benxu3, Dec 9, 2024)
- `07672f4` add voice assistant state communication and clear chat context (benxu3, Dec 9, 2024)
- `178ffc8` update logging with debug env variable (benxu3, Dec 9, 2024)
- `c2de04a` update profiles to be compatible with new interpreter (benxu3, Dec 9, 2024)
- `bba33db` update server to use new interpreter (benxu3, Dec 9, 2024)
- `ba2813d` upgrade interpreter and livekit agents (benxu3, Dec 9, 2024)
- `cedda96` use participant token in meet_url (benxu3, Dec 9, 2024)
- `8f6d5fd` remove assistant fnc (benxu3, Dec 9, 2024)
- `4e77a57` remove duplicate fnc_ctx declaration (benxu3, Dec 9, 2024)
- `84e05db` add local setup docs (benxu3, Dec 30, 2024)
- `095b704` add basic interrupt logic (benxu3, Dec 30, 2024)
- `6084e25` refactor logging outside logic (benxu3, Dec 30, 2024)
- `bd6f530` replace hosted livekit meet with local meet link (benxu3, Dec 30, 2024)
- `6110e70` add local stt & tts, add anticipation logic, remove video context acc… (benxu3, Dec 30, 2024)
- `4c271b1` remove separate transcriptions (benxu3, Dec 30, 2024)
- `f68f83c` update local and default profile (benxu3, Dec 30, 2024)
- `3f6ba52` add meet flag and better error handling (benxu3, Dec 30, 2024)
- `7207add` run worker in dev mode (benxu3, Dec 30, 2024)
- `0c6a2cd` fix error on local tts docs (benxu3, Dec 31, 2024)
- `f989731` move tts and stt to 9001 and 9002 (benxu3, Dec 31, 2024)
- `ab8055e` draft main cli (benxu3, Jan 1, 2025)
- `a2f86af` make request based on updated chat ctx in anticipation (benxu3, Jan 1, 2025)
- `ce52aa6` fix cli bug in main (benxu3, Jan 1, 2025)
- `16fb2b3` remove test.py (benxu3, Jan 1, 2025)
- `8c89960` revert anticipation to default (benxu3, Jan 1, 2025)
94 changes: 48 additions & 46 deletions docs/server/configure.mdx
@@ -23,11 +23,11 @@ poetry run 01 --profile <profile_name>

### Standard Profiles

`default.py` is the default profile that is used when no profile is specified. The default TTS is OpenAI.
`default.py` is the default profile that is used when no profile is specified. The default TTS service is Elevenlabs.

`fast.py` uses elevenlabs and groq, which are the fastest providers.
`fast.py` uses Cartesia for TTS and Cerebras Llama3.1-8b, which are the fastest providers.

`local.py` uses coqui TTS and runs the --local explorer from Open Interpreter.
`local.py` requires additional setup to be used with LiveKit. It uses faster-whisper for STT, ollama/codestral for the LLM (default), and piper for TTS (default).

### Custom Profiles

@@ -46,38 +46,16 @@ poetry run 01 --profile <profile_name>
### Example Profile

````python
from interpreter import AsyncInterpreter
interpreter = AsyncInterpreter()
from interpreter import Interpreter
interpreter = Interpreter()

# This is an Open Interpreter compatible profile.
# Visit https://01.openinterpreter.com/profile for all options.

# 01 supports OpenAI, ElevenLabs, and Coqui (Local) TTS providers
# 01 supports OpenAI, ElevenLabs, Cartesia, and Coqui (Local) TTS providers
# {OpenAI: "openai", ElevenLabs: "elevenlabs", Coqui: "coqui"}
interpreter.tts = "openai"

# Connect your 01 to a language model
interpreter.llm.model = "gpt-4o"
interpreter.llm.context_window = 100000
interpreter.llm.max_tokens = 4096
# interpreter.llm.api_key = "<your_openai_api_key_here>"

# Tell your 01 where to find and save skills
interpreter.computer.skills.path = "./skills"

# Extra settings
interpreter.computer.import_computer_api = True
interpreter.computer.import_skills = True
interpreter.computer.run("python", "computer") # This will trigger those imports
interpreter.auto_run = True
interpreter.loop = True
interpreter.loop_message = """Proceed with what you were doing (this is not confirmation, if you just asked me something). You CAN run code on my machine. If you want to run code, start your message with "```"! If the entire task is done, say exactly 'The task is done.' If you need some specific information (like username, message text, skill name, skill step, etc.) say EXACTLY 'Please provide more information.' If it's impossible, say 'The task is impossible.' (If I haven't provided a task, say exactly 'Let me know what you'd like to do next.') Otherwise keep going. CRITICAL: REMEMBER TO FOLLOW ALL PREVIOUS INSTRUCTIONS. If I'm teaching you something, remember to run the related `computer.skills.new_skill` function."""
interpreter.loop_breakers = [
"The task is done.",
"The task is impossible.",
"Let me know what you'd like to do next.",
"Please provide more information.",
]
interpreter.tts = "elevenlabs"
interpreter.stt = "deepgram"

# Set the identity and personality of your 01
interpreter.system_message = """
@@ -89,17 +67,37 @@ You can install new packages.
Be concise. Your messages are being read aloud to the user. DO NOT MAKE PLANS. RUN CODE QUICKLY.
Try to spread complex tasks over multiple code blocks. Don't try to do complex tasks in one go.
Manually summarize text."""

# Add additional instructions for the 01
interpreter.instructions = "Be very concise in your responses."


# Connect your 01 to a language model
interpreter.model = "claude-3-5-sonnet-20240620"
interpreter.provider = "anthropic"
interpreter.max_tokens = 4096
interpreter.temperature = 0
interpreter.api_key = "<your_anthropic_api_key_here>"

# Extra settings
interpreter.tools = ["interpreter", "editor"] # Enabled tool modules
interpreter.auto_run = True # Whether to auto-run tools without confirmation
interpreter.tool_calling = True # Whether to allow tool/function calling

interpreter.allowed_paths = [] # List of allowed paths
interpreter.allowed_commands = [] # List of allowed commands
````

### Hosted LLMs

The default LLM for 01 is GPT-4-Turbo. You can find this in the default profile in `software/source/server/profiles/default.py`.
The default LLM for 01 is Claude 3.5 Sonnet. You can find this in the default profile in `software/source/server/profiles/default.py`.

The fast profile uses Llama3-8b served by Groq. You can find this in the fast profile in `software/source/server/profiles/fast.py`.
The fast profile uses Llama3.1-8b served by Cerebras. You can find this in the fast profile in `software/source/server/profiles/fast.py`.

```python
# Set your profile with a hosted LLM
interpreter.llm.model = "gpt-4o"
interpreter.model = "claude-3-5-sonnet-20240620"
interpreter.provider = "anthropic"
```

### Local LLMs
@@ -110,34 +108,38 @@ Using the local profile launches the Local Explorer where you can select your in

```python
# Set your profile with a local LLM
interpreter.llm.model = "ollama/codestral"
interpreter.model = "ollama/codestral"

# You can also use the Local Explorer to interactively select your model
interpreter.local_setup()
```
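As a rough illustration of how a `provider/model` string like `ollama/codestral` maps onto a local inference endpoint, here is a sketch (a hypothetical helper, not part of the 01 codebase) that splits off the provider prefix and pairs the model name with Ollama's standard OpenAI-compatible API address:

```python
def resolve_local_model(model_string: str) -> dict:
    """Split a "provider/model" profile string into connection details."""
    provider, _, model = model_string.partition("/")
    if provider != "ollama":
        raise ValueError(f"unsupported local provider: {provider}")
    # Ollama serves an OpenAI-compatible API at this address by default.
    return {"base_url": "http://localhost:11434/v1", "model": model}

print(resolve_local_model("ollama/codestral"))
```

This only illustrates the naming convention; the actual routing is handled by the interpreter itself.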

### Hosted TTS

01 supports OpenAI and Elevenlabs for hosted TTS.
01 supports OpenAI, Elevenlabs, and Cartesia for hosted TTS.

```python
# Set your profile with a hosted TTS service
interpreter.tts = "elevenlabs"
```

### Local TTS
### Local TTS and STT with LiveKit

For local TTS, Coqui is used.
We recommend having Docker installed for the easiest setup. Local TTS and STT rely on the [openedai-speech](https://github.com/matatonic/openedai-speech?tab=readme-ov-file) and [faster-whisper-server](https://github.com/fedirz/faster-whisper-server) projects, respectively.

#### Local TTS
1. Clone the [openedai-speech](https://github.com/matatonic/openedai-speech?tab=readme-ov-file) repository
2. Follow the Docker image instructions for your system. By default, run `docker compose -f docker-compose.min.yml up --publish 9001:8000` from the repository root.
3. Set your profile to use the local TTS service:
```python
# Set your profile with a local TTS service
interpreter.tts = "coqui"
interpreter.tts = "local"
```

<Note>
When using the Livekit server, the interpreter.tts setting in your profile
will be ignored. The Livekit server currently only works with Deepgram for
speech recognition and Eleven Labs for text-to-speech. We are working on
introducing all-local functionality for the Livekit server as soon as
possible.
</Note>
#### Local STT
1. Clone the [faster-whisper-server](https://github.com/fedirz/faster-whisper-server) repository
2. Follow the Docker Compose quick start instructions for your system.
3. Run `docker run --publish 9002:8000 --volume ~/.cache/huggingface:/root/.cache/huggingface --env WHISPER__MODEL=Systran/faster-whisper-small --detach fedirz/faster-whisper-server:latest-cpu` to publish to port 9002 instead of the default 8000 (since our TTS setup uses that port range).
4. Set your profile to use the local STT service:
```python
interpreter.stt = "local"
```
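With both containers running, each service exposes an OpenAI-compatible API on its own port. The hypothetical helper below (not part of the 01 codebase; the port numbers assume the `9001`/`9002` mapping above) shows where the speech endpoints end up:

```python
def local_speech_endpoints(tts_port: int = 9001, stt_port: int = 9002) -> dict:
    """Build the OpenAI-compatible endpoint URLs for the local services."""
    return {
        # openedai-speech mirrors OpenAI's /v1/audio/speech route.
        "tts": f"http://localhost:{tts_port}/v1/audio/speech",
        # faster-whisper-server mirrors OpenAI's /v1/audio/transcriptions route.
        "stt": f"http://localhost:{stt_port}/v1/audio/transcriptions",
    }

print(local_speech_endpoints())
```

Any OpenAI-style client pointed at these base URLs should be able to exercise the services directly, which is handy for checking the containers before starting the full pipeline.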
26 changes: 15 additions & 11 deletions docs/server/livekit.mdx
@@ -69,6 +69,18 @@ Replace the placeholders with your actual API keys.

### Starting the Server

**To use the mobile app, run the following command:**

```bash
poetry run 01 --server livekit --qr --expose
```

To customize the profile, append the `--profile` flag with the profile name:

```bash
poetry run 01 --server livekit --qr --expose --profile fast
```

To start the Livekit server, run the following command:

```bash
@@ -87,18 +99,10 @@ To expose over the internet via ngrok
poetry run 01 --server livekit --expose
```

In order to use the mobile app over the web, use both flags

```bash
poetry run 01 --server livekit --qr --expose
```

<Note>
Currently, our Livekit server only works with Deepgram and Eleven Labs. We are
working to introduce all-local functionality as soon as possible. By setting
your profile (see [Configure Your Profile](/software/configure)), you can
still change your LLM to be a local LLM, but the `interpreter.tts` value will
be ignored for the Livekit server.
The Livekit server now supports local STT and TTS for a fully local pipeline.
Setup instructions are provided in the [configuring your 01](/server/configure#local-tts-and-stt-with-livekit) section.

</Note>

## Livekit vs. Light Server