|
| 1 | +--- |
| 2 | +title: v1.0.1 Release |
| 3 | +description: Version 1.0.1 Release |
| 4 | +slug: v1.0.1-release |
| 5 | +authors: [tim, ethan] |
| 6 | +tags: [release] |
| 7 | +image: https://i.imgur.com/mErPwqL.png |
| 8 | +hide_table_of_contents: false |
| 9 | +--- |
| 10 | + |
| 11 | + |
| 12 | +# Open-LLM-VTuber v1.0.1 Release 💥 |
| 13 | + |
| 14 | +This release marks a significant milestone for Open-LLM-VTuber, featuring a complete rewrite of the backend and frontend with more than 240 new commits, along with numerous enhancements and new features. If you used an earlier version, `v1.0.0` is essentially a new app. |
| 15 | + |
| 16 | +⚠️ Direct upgrades from older versions are impossible due to architectural changes. Please refer to our **[new documentation site](https://open-llm-vtuber.github.io/docs/intro)** for installation. |
| 17 | + |
| 18 | +(A bug was found in v1.0.0 shortly after its release, so please skip it and use v1.0.1 instead.) |
| 19 | + |
| 24 | + |
| 25 | +<!-- truncate --> |
| 26 | + |
| 27 | +## ✨ Highlights |
| 28 | +* **Vision Capability:** Video chat with the AI. |
| 29 | +* **Desktop Pet Mode:** A new Desktop Pet Mode lets you have your VTuber companion directly on your desktop. |
| 30 | +* **Brand New Frontend:** A completely redesigned frontend built with React, Chakra UI, and Vite offers a modern user experience. It is available as both a web app and a desktop app, and lives in the [Open-LLM-VTuber-Web](https://github.com/Open-LLM-VTuber/Open-LLM-VTuber-Web) repository. |
| 31 | +* **Chat History Management:** Implemented a system to store and retrieve conversation history, enabling persistent interactions with your AI. |
| 32 | +* **New LLM support:** Many new (stateless) LLM providers are now supported (and refactored), including Ollama, OpenAI, Gemini, Claude, Mistral, DeepSeek, Zhipu, and llama.cpp. |
| 33 | +* **DeepSeek R1 Reasoning model support**: The reasoning chain will be displayed but not spoken. See your waifu's inner thoughts! |
| 34 | +* **Major Backend Rewrite:** The core of Open-LLM-VTuber has been rebuilt from the ground up, focusing on asynchronous operations, improved memory management, and a more modular architecture. |
| 35 | +* **Refactored Configuration:** The `conf.yaml` file was restructured, and `config_alts` has been renamed to `characters`. |
| 36 | +* **TTS Preprocessor**: Text inside `asterisks`, `brackets`, `parentheses`, and `angle brackets` will no longer be spoken by the TTS. |
| 37 | +* **Dependency management:** Switched to `uv` for dependency management and removed unused dependencies such as `rich`, `playsound3`, and `sounddevice`. |
| 38 | +* **Documentation Site:** A comprehensive documentation site is now live at [https://open-llm-vtuber.github.io/](https://open-llm-vtuber.github.io/). |
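
To illustrate the TTS Preprocessor highlight above, here is a minimal sketch of how delimited text could be stripped before it reaches the TTS engine. The function name `strip_unspoken` and the regular expressions are assumptions made for this example, not the project's actual implementation, and nested delimiters are not handled.

```python
import re

# Hypothetical patterns: each one matches a single, non-nested delimited span.
_UNSPOKEN_PATTERNS = [
    re.compile(r"\*[^*]*\*"),   # *asterisks*
    re.compile(r"\[[^\]]*\]"),  # [brackets]
    re.compile(r"\([^)]*\)"),   # (parentheses)
    re.compile(r"<[^>]*>"),     # <angle brackets>
]


def strip_unspoken(text: str) -> str:
    """Return text with delimited spans removed and whitespace collapsed."""
    for pattern in _UNSPOKEN_PATTERNS:
        # Replace each span with a space so neighboring words do not merge.
        text = pattern.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(strip_unspoken("*waves* Hello [smile] there (quietly) <aside> friend"))
```

The text shown on screen can stay untouched; only the copy handed to the TTS engine would pass through a filter like this.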
| 39 | + |
| 40 | +## 📋 Detailed Changes |
| 41 | + |
| 42 | +### 🧮 Backend |
| 43 | + |
| 44 | +* **Architecture:** |
| 45 | + * The project structure has been reorganized to use the `src/` directory. |
| 46 | + * The backend is now fully asynchronous, improving responsiveness. |
| 47 | + * CLI mode (`main.py`) has been removed. |
| 48 | + * The "exit word" has been removed. |
| 49 | + * Models are initialized and managed using `ServiceContext`, offering better memory management, particularly when switching characters. |
| 50 | + * Refactored LLMs into `agent` and `stateless_llm`, supporting a wider range of LLMs with a new agent interface: `basic_memory_agent` and `hume_ai_agent`. |
| 51 | +* **LLM (Language Model) Enhancements:** |
| 52 | + * New (and old but refactored) providers: Ollama, OpenAI (and any OpenAI Compatible API), Gemini, Claude, Mistral, DeepSeek, Zhipu, llama.cpp. |
| 53 | + * `temperature` parameter added. |
| 54 | + * No more tokens will be generated after interruption, improving the responsiveness of voice interruption. |
| 55 | + * Ollama models are preloaded at startup, kept in memory for the server's duration, and unloaded at exit. |
| 56 | + * Added a `hf_mirror` flag to specify whether to use the Hugging Face mirror source. |
| 57 | +* **TTS (Text-to-Speech) Enhancements:** |
| 58 | + * TTS now generates multiple audio segments concurrently and sends them sequentially, reducing latency. |
| 59 | + * New interruption logic for smoother transitions. |
| 60 | + * Added filters (`asterisks`, `brackets`, `parentheses`) to prevent unwanted text from being spoken. |
| 61 | + * Implemented `faster_first_response` feature to prioritize the synthesis and playback of the first sentence fragment, minimizing latency. |
| 62 | +* **ASR (Automatic Speech Recognition) Enhancements:** |
| 63 | + * Made Sherpa-onnx ASR with the **SenseVoiceSmall int8** model the default for both English and Chinese presets, with automatic model download. |
| 64 | + * Added a `provider` option for sherpa-onnx-asr. |
| 65 | +* **Other Improvements:** |
| 66 | + * Chat log persistence is used to maintain conversation history. |
| 67 | + * All `print` statements are replaced with `loguru` for structured logging. |
| 68 | + * Added a Chinese configuration preset: `conf.CN.yaml`. |
| 69 | + * Basic AI proactive speaking (experimental). |
| 70 | + * Added some checks to the CI/CD process. |
| 71 | + * Added an input/output type system to agents. |
| 72 | + * Added **Tencent Translate** in https://github.com/Open-LLM-VTuber/Open-LLM-VTuber/pull/107 |
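
The TTS item above (segments generated concurrently, then sent sequentially) can be sketched with `asyncio`. This is only an illustration of the general pattern under stated assumptions: `fake_synthesize` is a stand-in for a real TTS call, and `speak_in_order` is a hypothetical name, not a function from the Open-LLM-VTuber codebase.

```python
import asyncio


async def fake_synthesize(sentence: str) -> bytes:
    """Stand-in for a real TTS call; longer sentences take longer."""
    await asyncio.sleep(0.01 * len(sentence))
    return sentence.encode("utf-8")


async def speak_in_order(sentences: list[str]) -> list[bytes]:
    """Synthesize all sentences concurrently, deliver them in input order."""
    # Start every synthesis task right away so the work overlaps in time.
    tasks = [asyncio.create_task(fake_synthesize(s)) for s in sentences]
    delivered = []
    # Await the tasks in submission order: a segment is delivered only once
    # it is ready, even if later segments finished earlier.
    for task in tasks:
        delivered.append(await task)  # a real server would send audio here
    return delivered


def run_demo() -> list[bytes]:
    return asyncio.run(
        speak_in_order(["A long first sentence.", "Short.", "Mid one."])
    )


if __name__ == "__main__":
    print(run_demo())
```

Because synthesis overlaps, the total wait approaches the duration of the slowest segment rather than the sum of all segments, while playback order still matches the text.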
| 73 | + |
| 74 | +### 🖥️ Frontend |
| 75 | + |
| 76 | +* **New frontend built with Electron, React, Chakra UI, and Vite.** |
| 77 | +* **Multi-Mode in Single Codebase:** |
| 78 | + * Web Mode: Browser interface |
| 79 | + * Window Mode: Desktop window |
| 80 | + * Pet Mode: Transparent desktop companion |
| 81 | + * Seamless context sharing between Window and Pet modes, allowing for the preservation of settings, history, connections, and model states. |
| 82 | +* **Enhanced UI Features** |
| 83 | + * Responsive layout with collapsible sidebar and footer |
| 84 | + * Customizable Live2D model interactions: mouse tracking for eye movement, click-triggered animations, and drag-and-resize capabilities. |
| 85 | + * Persistent local storage for user preference settings, including background, VAD configuration, Live2D size and interactions, and agent behavior. |
| 86 | + * Supports viewing, loading, and deleting conversation history with streaming subtitles. |
| 87 | + * (Electron pet mode) A transparent, always-on-top desktop companion with click-through on non-interactive areas, a draggable and hideable Live2D model and UI, and right-click menu controls. |
| 88 | + * Camera and screen capturing panel |
| 89 | + * Switch characters easily |
| 90 | + |
| 91 | +### 📖 Documentation |
| 92 | + |
| 93 | +* Rewritten README file. |
| 94 | +* New comprehensive documentation with a dedicated website. |
| 95 | + |
| 96 | +### 🧹 Cleanup |
| 97 | + |
| 98 | +* Removed unused and legacy code, including `TaskQueue.py`, `scripts/install_piper_tts.py`, `model_manager_old.py`, `service_context_old.py`, `main.py`, `asr_with_vad`, `vad`, `start_cli`, `fake_llm`, `MemGPT`, the `pywhispercpp` submodule, and CoreML script. |
| 99 | +* Removed unused dependencies: `rich`, `playsound3`, `sounddevice`, among others. |
| 100 | +* Removed configuration options that are no longer relevant: `VOICE_INPUT_ON`, `MIC_IN_BROWSER`, `LIVE2D`, `EXTRA_SYSTEM_PROMPT_RAG`, `AI_NAME`, `USER_NAME`, `SAVE_CHAT_HISTORY`, `CHAT_HISTORY_DIR`, `RAG_ON`, `LLMASSIST_RAG_ON`, `SAY_SENTENCE_SEPARATELY`, `MEMORY_SNAPSHOT`, `PRELOAD_MODELS`, `tts_on`. |
| 101 | + |
| 102 | + |
| 103 | +## ⚠️⚠️⚠️ Critical Upgrade Notice |
| 104 | + |
| 105 | + |
| 106 | +1. No Direct Upgrades - Previous installations are incompatible |
| 107 | + |
| 108 | +2. Fresh Install Required - Follow new documentation |
| 109 | + |
| 110 | +3. Config Changes - Back up existing configurations before migration |
| 111 | + |
| 112 | +### Why the Hassle? 💡 |
| 113 | + |
| 114 | +1. UV dependency manager replaces legacy systems |
| 115 | +2. Complete configuration schema overhaul |
| 116 | + |
| 117 | + |
| 118 | + |
| 119 | +Please check out the [new documentation](https://open-llm-vtuber.github.io/docs/quick-start/) to install Open-LLM-VTuber again. Fortunately, thanks to `uv`, there should be fewer headaches during installation. |
| 120 | + |
| 121 | + |
| 122 | +## 🎉 Contributors |
| 123 | +- @t41372 (that's me). |
| 124 | +- @ylxmf2005, the creator of the new frontend, who also implemented LLM vision capability, chat history management, TTS concurrency, the Hume AI agent, better sentence division, a better Live2D configuration, countless bug fixes, and more. He also wrote the majority of the documentation and provided countless insights. Version `v1.0.0` was a close collaboration with him and wouldn't have existed without his tremendous contribution. |
| 125 | +- @Stewitch, who added the hf_mirror option and is currently working on a launcher for this project to streamline the installation and configuration process. It's still a work in progress but will be completed very soon. https://github.com/Stewitch/LiZhen |
| 126 | +- @Fluchw, who added the Tencent translator and helped us fix the translator bug. |
| 127 | + |
| 128 | +And all the other contributors who worked on this project in previous versions. |
| 129 | + |
| 130 | + |
| 131 | +**Full Changelog**: https://github.com/Open-LLM-VTuber/Open-LLM-VTuber/compare/v0.5.2...v1.0.0 |
| 132 | + |
| 133 | + |
| 134 | +## (Relatively) faster download links for users in mainland China |
| 135 | +Open-LLM-VTuber-v1.0.3.zip (includes the SenseVoice model for sherpa-onnx ASR, so it does not need to be pulled from GitHub) |
| 136 | +- https://pub-17317087be374bc68161ac63de2022a5.r2.dev/v1.0.3/Open-LLM-VTuber-v1.0.3.zip |
| 137 | + |
| 138 | +open-llm-vtuber-electron-1.0.0-frontend.exe (desktop frontend, Windows) |
| 139 | +- https://pub-17317087be374bc68161ac63de2022a5.r2.dev/v1.0.3/open-llm-vtuber-electron-1.0.0-setup.exe |
| 140 | + |
| 141 | +open-llm-vtuber-electron-1.0.0-frontend.dmg (desktop frontend, macOS) |
| 142 | +- https://pub-17317087be374bc68161ac63de2022a5.r2.dev/v1.0.3/open-llm-vtuber-electron-1.0.0.dmg |