[Platform] Introduce `Speech` support #943

Guikingone · 2025-11-22T18:04:39Z

Q	A
Bug fix?	no
New feature?	yes
Docs?	yes
Issues	--
License	MIT

Summary:

Support for STT (speech-to-text), TTS (text-to-speech) and STS (speech-to-speech)
Interfaces for SpeechProviders and SpeechListeners
Introduction of a SpeechConfiguration
New configuration block for configuring providers (requires a platform each time)
Introduce SpeechBag and SpeechAwareTrait

OskarStark · 2025-11-23T09:30:23Z

I think we should perhaps introduce capabilities to platforms as well, rather than having a Voice component. As far as I understand, I cannot use the Voice component standalone, right?

I don't think a dedicated component is the way to go here.

Guikingone · 2025-11-23T09:32:20Z

We can introduce it via the Platform, which could be easier. The voice features can be used without agents, but they will require at least the Platform.

Will update the PR to match this approach 👍🏻

OskarStark · 2025-11-23T09:33:36Z

I agree, Agent scope is not needed 👍🏻

chr-hertel · 2025-11-23T10:29:49Z

Hi @Guikingone, I agree that we lack some kind of guidance on how voices work, but the same goes for other binary stuff like creating images or videos.

So there are two things I would like to understand:

what's the high-level goal here - like what do you want to build?
why is it an extra component and not part of Platform?

btw, "speech" is more common than "voice" isn't it?
btw2, have you seen the demo around audio and video?

Guikingone · 2025-11-23T10:35:36Z

what's the high-level goal here - like what do you want to build?

The main goal is to let an agent/platform "listen" and respond to input through voice / speech ("voice" is used as sugar here and could be renamed to "speech"). This creates a workflow where you submit voice, call the platform that transforms it into speech / text (depending on the situation you're in), and return the result to the user without friction.

why is it an extra component and not part of Platform?

It is now part of Platform, I just pushed an update on it following the comment from @OskarStark.

btw, "speech" is more common than "voice" isn't it?

Agreed, could be renamed to Speech.

btw2, have you seen the demo around audio and video?

Yes, the goal is to ease it with a "built-in" approach / API that stays transparent for the user.

chr-hertel · 2025-11-23T10:50:15Z

just realized we should rename the "audio" demo to "speech" as well - and i'm def not really happy with that solution there.

can we make it as easy as the structured output - like with a listener?

i like that starting point:

$result = $platform->invoke('eleven_multilingual_v2', new Text('Hello world'), [
    'voice' => 'Dslrhjl3ZpzrctukrQSN', // Brad (https://elevenlabs.io/app/voice-library?voiceId=Dslrhjl3ZpzrctukrQSN)
]);

echo $result->asVoice();

what would be the return type here? would it be the same as asBinary() or asDataUri()?

Guikingone · 2025-11-23T14:42:20Z

can we make it as easy as the structured output - like with a listener?

Could be something to explore, the API is not locked for now.

what would be the return type here? would it be the same as asBinary() or asDataUri()?

My first approach was to do the same thing as asBinary() to ease usage.

src/agent/composer.json

This PR was merged into the main branch.

Discussion
----------

[Demo][Website] Rename audio demo to speech

Q	A
Bug fix?	no
New feature?	no
Docs?	
Issues	
License	MIT

Following a discussion of #943

Commits
-------

ffc2b64 Rename audio demo to speech

Guikingone · 2025-11-25T12:56:56Z

Well, it might seem weird, but here we go: STT, TTS and STS are working like a charm ... 👀

src/agent/src/Agent.php

src/ai-bundle/config/options.php

chr-hertel · 2025-12-07T23:16:21Z

Haven't made up my mind here yet - it's not clicking with me - currently that change spreads across the entire platform.

is it an option to have it as a subscriber for the platform, like STT on invoke and TTS on result? instead of extending the platform interface?

Guikingone · 2025-12-08T07:57:27Z

Haven't made up my mind here yet - it's not clicking with me - currently that change spreads across the entire platform.

I have thought about this point and have a solution: it would remove the need to add SpeechConfiguration to every platform.

is it an option to have it as a subscriber for the platform, like STT on invoke and TTS on result? instead of extending the platform interface?

If it's related to the previous point, let me push a refactoring and we can discuss this one further 🙂

Guikingone · 2025-12-08T18:36:40Z

Ok, here's the refactoring:

A new SpeechAwarePlatform is introduced. Its goal is to decorate the "platforms" that support STT / TTS. In the current state, ElevenLabs is the only one implemented and tested. It exposes a getSpeechConfiguration method that returns the current configuration for STT / TTS: no more methods in existing platforms, no extra arguments, nothing, just a decorator.
A new SpeechAwarePlatformInterface is introduced to enforce the usage of the method and to provide a cleaner API that separates "speech-supporting" platforms from non-supporting ones. If your platform is not decorated, well, you're not supporting it.
This refactoring keeps the current API "as is" and introduces speech support without breaking the public API. Plus, it's only enabled if a platform that supports STT / TTS is configured, so there are no extra services if not needed.
This also improves the current API for providers / listeners and reduces the current usage of providers / listeners to Invocation / Result, as you mentioned @chr-hertel.
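
To make the decorator idea above concrete, here is a minimal sketch. It only assumes what is stated in this comment: a SpeechAwarePlatform that wraps a platform, a SpeechAwarePlatformInterface, and a getSpeechConfiguration method. Everything else (the simplified PlatformInterface signature, the SpeechConfiguration fields, EchoPlatform) is hypothetical and not taken from the PR's code.

```php
<?php

declare(strict_types=1);

// Hypothetical, simplified stand-in for the platform contract.
// The real interface in the Platform component has a richer signature.
interface PlatformInterface
{
    public function invoke(string $model, string $input, array $options = []): string;
}

// Hypothetical value object; the actual fields of SpeechConfiguration
// are not shown in this thread.
final class SpeechConfiguration
{
    public function __construct(
        public readonly ?string $ttsModel = null,
        public readonly ?string $sttModel = null,
    ) {
    }
}

// Only decorated platforms implement this interface, so an instanceof
// check is enough to know whether speech is supported.
interface SpeechAwarePlatformInterface extends PlatformInterface
{
    public function getSpeechConfiguration(): SpeechConfiguration;
}

// The decorator: it delegates invoke() untouched and only adds the
// configuration accessor, so the wrapped platform needs no new methods.
final class SpeechAwarePlatform implements SpeechAwarePlatformInterface
{
    public function __construct(
        private readonly PlatformInterface $platform,
        private readonly SpeechConfiguration $configuration,
    ) {
    }

    public function invoke(string $model, string $input, array $options = []): string
    {
        return $this->platform->invoke($model, $input, $options);
    }

    public function getSpeechConfiguration(): SpeechConfiguration
    {
        return $this->configuration;
    }
}

// Hypothetical platform used only to exercise the decorator.
final class EchoPlatform implements PlatformInterface
{
    public function invoke(string $model, string $input, array $options = []): string
    {
        return sprintf('%s: %s', $model, $input);
    }
}

$plain = new EchoPlatform();
$decorated = new SpeechAwarePlatform(
    $plain,
    new SpeechConfiguration(ttsModel: 'eleven_multilingual_v2'),
);
```

With this shape, calling code can branch on `$platform instanceof SpeechAwarePlatformInterface` instead of passing a nullable configuration around, which matches the point about reducing null in method signatures.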

PS: Not to mention that it reduces the use of null in methods 😅

Guikingone force-pushed the agent/voice_provider branch from 2c573eb to 8dd5cd5 Compare November 23, 2025 09:30

Guikingone changed the title ~~[Voice] Introduce the component~~ [Platform] Introduce VoiceProviders and VoiceListeners Nov 23, 2025

Guikingone changed the title ~~[Platform] Introduce VoiceProviders and VoiceListeners~~ [Platform] Introduce Speech support via Platform Nov 23, 2025

OskarStark reviewed Nov 23, 2025

src/agent/composer.json (outdated, resolved)

Guikingone force-pushed the agent/voice_provider branch from 79ddf87 to f011c3e Compare November 23, 2025 17:41

chr-hertel mentioned this pull request Nov 23, 2025

[Demo][Website] Rename audio demo to speech #958

Merged

Guikingone force-pushed the agent/voice_provider branch from dcae952 to be04280 Compare November 24, 2025 14:32

OskarStark changed the title ~~[Platform] Introduce Speech support via Platform~~ [Platform] Introduce Speech support Nov 24, 2025

Guikingone force-pushed the agent/voice_provider branch from be04280 to b319521 Compare November 25, 2025 12:49

Guikingone force-pushed the agent/voice_provider branch 3 times, most recently from 120f391 to 1963409 Compare November 26, 2025 12:42

Guikingone marked this pull request as ready for review November 26, 2025 12:44

Guikingone requested review from Nyholm and chr-hertel as code owners November 26, 2025 12:44

carsonbot added the Feature, Platform and Status: Needs Review labels Nov 26, 2025

Guikingone marked this pull request as draft November 26, 2025 12:46

Guikingone marked this pull request as ready for review November 26, 2025 13:00

Guikingone force-pushed the agent/voice_provider branch from be85dda to 74bd8cb Compare November 26, 2025 13:00

Guikingone requested a review from OskarStark November 26, 2025 13:00

Guikingone force-pushed the agent/voice_provider branch from 74bd8cb to 2832ca3 Compare November 28, 2025 08:16

OskarStark reviewed Nov 28, 2025

src/agent/src/Agent.php (resolved)

src/ai-bundle/config/options.php (resolved)

Guikingone force-pushed the agent/voice_provider branch 5 times, most recently from 3a966f9 to 982bf06 Compare December 5, 2025 10:15

Guikingone requested a review from OskarStark December 5, 2025 10:16

Guikingone force-pushed the agent/voice_provider branch 4 times, most recently from 948eaa5 to 84b6f93 Compare December 7, 2025 19:45

Guikingone force-pushed the agent/voice_provider branch 2 times, most recently from 1b4e225 to a0f88b0 Compare December 8, 2025 18:31

Guikingone force-pushed the agent/voice_provider branch 4 times, most recently from 774f1be to e2ef357 Compare December 10, 2025 16:58

feat(platform): add Speech

f1eb5ce

Guikingone force-pushed the agent/voice_provider branch from e2ef357 to f1eb5ce Compare December 10, 2025 16:59

refactor(platform): simplified API

dc2519b


4 participants
