Microsoft Azure Speech Service SSML

Official resources

Speech Markdown formatter coverage

The microsoft-azure formatter supports Azure Text-to-Speech features including automatic MSTTS namespace injection and neural voice styles.

SSML Element Support Matrix

The following table shows which Azure SSML elements are supported by Speech Markdown:

SSML Element	Status	Speech Markdown Syntax	Notes
Core W3C SSML
`<speak>`	✅ Full	Automatic	Root element with automatic `xmlns:mstts` injection when needed
`<voice>`	✅ Full	`(text)[voice:"name"]` or `#[voice:"name"]`	Voice selection and switching
`<lang>`	✅ Full	`(text)[lang:"locale"]` or `#[lang:"locale"]`	Language/accent switching
`<p>`	✅ Full	Automatic (optional)	Paragraph tags via `includeParagraphTag` option
`<s>`	❌ Not supported	N/A	Sentence tags not implemented
`<break>`	✅ Full	`[break:"time"]` or `[break:"strength"]`	Pauses with time or strength
`<prosody>`	✅ Full	`(text)[rate:"value"]`, `[pitch:"value"]`, `[volume:"value"]`	Rate, pitch, volume control
`<say-as>`	✅ Partial	`(text)[address]`, `[number]`, `[ordinal]`, `[telephone]`, `[fraction]`, `[date:"format"]`, `[time:"format"]`, `[characters]`	Interpret-as types supported
`<phoneme>`	✅ Full	`(text)[ipa:"pronunciation"]`	IPA pronunciation
`<sub>`	✅ Full	`(text)[sub:"alias"]`	Text substitution
`<emphasis>`	✅ Full	`++text++` (strong), `+text+` (moderate), `-text-` (reduced), `~text~` (none)	Word-level stress with 4 levels
`<audio>`	✅ Full	`!audio("url")`	Audio file playback
`<bookmark>`	✅ Full	`[mark:"name"]`	Bookmark markers (generates `<bookmark mark="..."/>` for Azure SDK events)
`<lexicon>`	❌ Not supported	N/A	Not implemented (but supported by Azure TTS API)
`<math>`	❌ Not supported	N/A	Not implemented
Azure MSTTS Extensions
`<mstts:express-as>`	✅ Full	`(text)[style]` or `(text)[style:"degree"]`	33 styles with intensity control (0.01-2.0)
`<mstts:express-as role="">`	✅ Full	`(text)[style:"name";role:"value"]` or `(text)[excited:"1.5";role:"Girl"]`	Combine style with role attribute using semicolon delimiter
`<mstts:silence>`	❌ Not supported	N/A	Use `[break:"time"]` instead
`<mstts:dialog>` / `<mstts:turn>`	❌ Not supported	N/A	Multi-speaker dialog would require a grammar extension
`<mstts:backgroundaudio>`	❌ Not supported	N/A	Use raw SSML passthrough
`<mstts:viseme>`	❌ Not supported	N/A	Use raw SSML passthrough
`<mstts:audioduration>`	❌ Not supported	N/A	Use raw SSML passthrough
`<mstts:ttsembedding>`	❌ Not supported	N/A	Use raw SSML passthrough
`<mstts:voiceconversion>`	❌ Not supported	N/A	Use raw SSML passthrough

Core SSML Features

Say-as conversions: the address, fraction, ordinal, telephone, number, and characters modifiers map to <say-as>, with cardinal or digits chosen automatically for numeric text
Dates and times: the date and time modifiers default to the ymd and hms12 formats, respectively
Pronunciation: sub and ipa modifiers generate <sub alias="..."> and <phoneme alphabet="ipa" ph="..."> tags
Prosody: Rate, pitch, and volume modifiers control <prosody> attributes
Voice selection: voice modifier generates <voice name="..."> tags for switching between Azure neural voices
Audio playback: !audio("url") generates <audio src="url"> tags
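As a rough illustration of the output shapes listed above, the following TypeScript sketch renders the sub, ipa, and prosody modifiers. The helper names (`escapeXml`, `renderSub`, `renderIpa`, `renderProsody`) are hypothetical and are not part of the Speech Markdown API. Escaping values for XML is an assumption about good practice, not a description of what the formatter does internally.

```typescript
// Hypothetical helpers, not Speech Markdown's API: they only reproduce the
// output shapes described above for the sub, ipa, and prosody modifiers.

// Escapes the five XML special characters for element text and attribute values.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// (text)[sub:"alias"]
function renderSub(text: string, alias: string): string {
  return `<sub alias="${escapeXml(alias)}">${escapeXml(text)}</sub>`;
}

// (text)[ipa:"pronunciation"]
function renderIpa(text: string, ph: string): string {
  return `<phoneme alphabet="ipa" ph="${escapeXml(ph)}">${escapeXml(text)}</phoneme>`;
}

// (text)[rate:"..."], [pitch:"..."], [volume:"..."]: only the attributes given are emitted.
interface ProsodyOptions {
  rate?: string;
  pitch?: string;
  volume?: string;
}

function renderProsody(text: string, opts: ProsodyOptions): string {
  // Fixed attribute order keeps the output deterministic.
  const keys: (keyof ProsodyOptions)[] = ["rate", "pitch", "volume"];
  const attrs = keys
    .filter((k) => opts[k] !== undefined)
    .map((k) => ` ${k}="${escapeXml(opts[k] as string)}"`)
    .join("");
  return `<prosody${attrs}>${escapeXml(text)}</prosody>`;
}

console.log(renderSub("Al", "aluminum")); // <sub alias="aluminum">Al</sub>
console.log(renderProsody("Slow down", { rate: "slow", volume: "soft" }));
// <prosody rate="slow" volume="soft">Slow down</prosody>
```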

Azure MSTTS Extensions

Automatic Namespace Injection

The formatter automatically detects when Azure-specific MSTTS tags are present in the generated SSML and injects the required xmlns:mstts="https://www.w3.org/2001/mstts" namespace declaration into the <speak> tag. This ensures valid SSML without manual intervention.
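The detection-and-injection step can be pictured with the TypeScript sketch below. It is a minimal approximation of the behavior described above and is not the formatter's actual source. The function name `injectMsttsNamespace` is hypothetical, and the approach of checking for an `mstts:` prefix with a regular expression is an assumption made for illustration.

```typescript
// Hypothetical sketch of MSTTS namespace injection; not the formatter's real code.
const MSTTS_NS = "https://www.w3.org/2001/mstts";

function injectMsttsNamespace(ssml: string): string {
  // No mstts-prefixed start or end tag: the declaration is not needed.
  if (!/<\/?mstts:/.test(ssml)) {
    return ssml;
  }
  // The prefix is already declared: leave the document untouched.
  if (/xmlns:mstts\s*=/.test(ssml)) {
    return ssml;
  }
  // Append the declaration to the first <speak> start tag, keeping its other attributes.
  return ssml.replace(/<speak\b([^>]*)>/, `<speak$1 xmlns:mstts="${MSTTS_NS}">`);
}

console.log(
  injectMsttsNamespace(
    '<speak version="1.0"><mstts:express-as style="cheerful">Hi</mstts:express-as></speak>',
  ),
);
// <speak version="1.0" xmlns:mstts="https://www.w3.org/2001/mstts"><mstts:express-as style="cheerful">Hi</mstts:express-as></speak>
```

Checking for an existing declaration first makes the step idempotent, so running it twice on the same SSML never produces a duplicate attribute.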

Express-As Styles

Azure neural voices support 33 emotional and scenario-specific speaking styles through the mstts:express-as element.

Emotional Styles:

(text)[excited] - Excited, enthusiastic delivery
(text)[disappointed] - Disappointed, let-down delivery
(text)[friendly] - Warm, friendly delivery
(text)[cheerful] - Upbeat, cheerful delivery
(text)[sad] - Sad, melancholic delivery
(text)[angry] - Angry, irritated delivery
(text)[fearful] - Fearful, anxious delivery
(text)[empathetic] - Caring, empathetic delivery
(text)[calm] - Calm, composed delivery
(text)[hopeful] - Hopeful, optimistic delivery
(text)[terrified] - Terrified, extremely fearful delivery
(text)[unfriendly] - Cold, unfriendly delivery
(text)[gentle] - Gentle, soft delivery
(text)[serious] - Serious, formal delivery
(text)[depressed] - Depressed, low-energy delivery
(text)[embarrassed] - Embarrassed, awkward delivery
(text)[disgruntled] - Disgruntled, annoyed delivery
(text)[envious] - Envious, jealous delivery
(text)[affectionate] - Affectionate, loving delivery

Scenario-Specific Styles:

(text)[newscaster] - News broadcast style
(text)[shouting] - Shouting, loud delivery
(text)[whispering] - Whispering, quiet delivery
(text)[lyrical] - Lyrical, singing-like delivery
(text)[assistant] - Digital assistant style
(text)[chat] - Casual chat style
(text)[customerservice] - Customer service style
(text)[poetry-reading] - Poetry reading style (section-level only)
(text)[narration-professional] - Professional narration style (section-level only)
(text)[narration-relaxed] - Soothing, melodious narration style (section-level only)
(text)[newscast-casual] - Casual news style (section-level only)
(text)[newscast-formal] - Formal, confident, authoritative news style (section-level only)
(text)[documentary-narration] - Relaxed, interested documentary style (section-level only)
(text)[advertisement_upbeat] - Excited, high-energy advertising style
(text)[sports_commentary] - Relaxed, interested sports broadcasting style
(text)[sports_commentary_excited] - Intensive, energetic sports broadcasting style

Style Degree:

Control style intensity with numeric values between 0.01 and 2.0 (default 1.0):

(This is slightly excited)[excited:"0.5"]
(This is very excited)[excited:"1.8"]
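For orientation, the following TypeScript sketch shows how a style with a degree could be rendered as Azure's `<mstts:express-as>` element, using Azure's `styledegree` attribute and rejecting values outside the documented 0.01 to 2.0 range. The helper `renderExpressAs` is hypothetical. Whether the real formatter throws, clamps, or ignores out-of-range degrees is not stated here, so the choice to throw is an assumption.

```typescript
// Hypothetical helper, not the formatter's source: renders a style modifier as
// <mstts:express-as>, rejecting degrees outside the 0.01-2.0 range.
function renderExpressAs(text: string, style: string, degree?: number): string {
  let attrs = `style="${style}"`;
  if (degree !== undefined) {
    if (!Number.isFinite(degree) || degree < 0.01 || degree > 2.0) {
      throw new RangeError(`style degree must be between 0.01 and 2.0, got ${degree}`);
    }
    // Azure's attribute is styledegree; omitting it leaves the default of 1.0.
    attrs += ` styledegree="${degree}"`;
  }
  return `<mstts:express-as ${attrs}>${text}</mstts:express-as>`;
}

console.log(renderExpressAs("This is slightly excited", "excited", 0.5));
// <mstts:express-as style="excited" styledegree="0.5">This is slightly excited</mstts:express-as>
```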

Role Attribute:

Combine style with character voice roles using semicolon-delimited syntax:

(Hello there!)[excited;role:"Girl"]
(Hello there!)[excited:"1.5";role:"Girl"]
(Bonjour!)[style:"friendly";role:"YoungAdultFemale"]

Supported roles: Girl, Boy, YoungAdultFemale, YoungAdultMale, OlderAdultFemale, OlderAdultMale, SeniorFemale, SeniorMale

Note: Role support varies by voice. Check the Azure voice gallery for availability.
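The semicolon-delimited modifier syntax above can be parsed along the lines of this TypeScript sketch. It is not Speech Markdown's grammar. The function `parseStyleModifiers` and the `StyleSpec` shape are hypothetical, and the sketch assumes that any key other than `style` or `role` is itself a style name whose optional value is the degree. The role list is the one given above.

```typescript
// Hypothetical parser for modifiers such as excited:"1.5";role:"Girl".
// A sketch only; Speech Markdown's real grammar may differ.
const AZURE_ROLES = new Set([
  "Girl", "Boy", "YoungAdultFemale", "YoungAdultMale",
  "OlderAdultFemale", "OlderAdultMale", "SeniorFemale", "SeniorMale",
]);

interface StyleSpec {
  style: string;
  degree?: number;
  role?: string;
}

function parseStyleModifiers(modifiers: string): StyleSpec {
  let style: string | undefined;
  let degree: number | undefined;
  let role: string | undefined;
  for (const part of modifiers.split(";")) {
    // Each part is either key or key:"value"; the quotes are stripped from the value.
    const match = /^\s*([\w-]+)\s*(?::\s*"([^"]*)")?\s*$/.exec(part);
    if (!match) throw new SyntaxError(`Cannot parse modifier: ${part}`);
    const [, key, value] = match;
    if (key === "role") {
      if (value === undefined || !AZURE_ROLES.has(value)) {
        throw new RangeError(`Unsupported role: ${value}`);
      }
      role = value;
    } else if (key === "style") {
      // style:"friendly" names the style explicitly.
      style = value;
    } else {
      // Any other key is treated as a style name; its optional value is the degree.
      style = key;
      if (value !== undefined) {
        degree = Number(value);
        if (Number.isNaN(degree)) throw new SyntaxError(`Invalid degree: ${value}`);
      }
    }
  }
  if (style === undefined) throw new SyntaxError("No style given");
  return { style, degree, role };
}

console.log(parseStyleModifiers('excited:"1.5";role:"Girl"'));
```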

Section-Level Styles:

Apply styles to entire sections:

#[excited]
This entire section is excited!
Multiple sentences work too.

Language Switching

Switch languages or accents using the lang modifier:

In Paris, they pronounce it (Paris)[lang:"fr-FR"]. #[voice:"Brian"][lang:"en-GB"]
This section uses Brian's voice with a British accent. #[voice][lang]

Unsupported Features

The following Azure SSML features are not supported by Speech Markdown. Use raw SSML passthrough for these features.

Multi-Speaker Dialog

Azure multi-talker voices (e.g., en-US-MultiTalker-Ava-Andrew:DragonHDLatestNeural) support mstts:dialog and mstts:turn elements for conversational exchanges. Supporting them in Speech Markdown would require a grammar extension.

Example:

<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='https://www.w3.org/2001/mstts' xml:lang='en-US'>
  <voice name='en-US-MultiTalker-Ava-Andrew:DragonHDLatestNeural'>
    <mstts:dialog>
      <mstts:turn speaker="ava">Hello, Andrew!</mstts:turn>
      <mstts:turn speaker="andrew">Hey Ava!</mstts:turn>
    </mstts:dialog>
  </voice>
</speak>
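Because dialog must go through raw SSML passthrough, it can help to generate that SSML programmatically rather than by hand. The TypeScript sketch below produces a document shaped like the example above. The names `buildDialogSsml` and `DialogTurn` are hypothetical, the default `xml:lang` of `en-US` is an assumption, and the sketch does not check whether a given speaker name is valid for the chosen multi-talker voice.

```typescript
// Hypothetical helper for building raw dialog SSML to pass through; not part of Speech Markdown.
interface DialogTurn {
  speaker: string; // e.g. "ava" or "andrew" for the multi-talker voice above
  text: string;
}

// Escapes XML special characters in speaker names and spoken text.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function buildDialogSsml(voice: string, turns: DialogTurn[], lang = "en-US"): string {
  if (turns.length === 0) throw new Error("A dialog needs at least one turn");
  const body = turns
    .map((t) => `      <mstts:turn speaker="${escapeXml(t.speaker)}">${escapeXml(t.text)}</mstts:turn>`)
    .join("\n");
  return [
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="${lang}">`,
    `  <voice name="${escapeXml(voice)}">`,
    "    <mstts:dialog>",
    body,
    "    </mstts:dialog>",
    "  </voice>",
    "</speak>",
  ].join("\n");
}

console.log(
  buildDialogSsml("en-US-MultiTalker-Ava-Andrew:DragonHDLatestNeural", [
    { speaker: "ava", text: "Hello, Andrew!" },
    { speaker: "andrew", text: "Hey Ava!" },
  ]),
);
```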

Advanced MSTTS Features

Not implemented. Use raw SSML passthrough:

<mstts:silence> - Precise silence control (use [break:"time"] as alternative)
<mstts:backgroundaudio> - Background audio with fade in/out
<mstts:viseme> - Viseme output for lip-sync
<mstts:audioduration> - Audio duration control
<mstts:ttsembedding> - Custom voice embedding
<mstts:voiceconversion> - Voice conversion

Other W3C SSML Elements

Not implemented. Use raw SSML passthrough:

<lexicon> - Custom pronunciation lexicons (use [ipa:"pronunciation"] for individual words)
<math> - MathML content
<s> - Sentence boundaries (use punctuation)

Disabled Say-As Types

The following say-as types are disabled because Azure does not support them:

expletive - Bleep out profanity
interjection - Interjection pronunciation
unit - Unit pronunciation

Platform Comparison

Azure vs Amazon Alexa

Azure:

33 express-as styles vs Alexa's 2 emotions (excited, disappointed)
Numeric style intensity (0.01-2.0) vs Alexa's 3 levels (low, medium, high)
8 role attributes for character voices
Multi-speaker dialog (mstts:dialog), available through raw SSML passthrough only
Automatic namespace injection

Alexa:

amazon:effect for whisper
amazon:domain for music, news, and long-form content
amazon:auto-breaths and amazon:breath for natural pauses
Speechcons and interjections

Both:

Standard SSML (say-as, prosody, phoneme, sub, break)
Voice selection and language switching
Newscaster/news style

Azure vs Google Assistant

Azure:

33 express-as styles; Google Assistant SSML has no emotional speaking styles
8 role attributes for character voices
Multi-speaker dialog (mstts:dialog), available through raw SSML passthrough only
Automatic namespace injection

Google:

Simpler SSML dialect
Better cross-platform compatibility

Both:

Standard SSML (say-as, prosody, phoneme, sub, break)
Voice selection and language switching

Voice Catalogue

Run `npm run docs:update-voices` with either the `AZURE_SPEECH_KEY`/`AZURE_SPEECH_REGION` or the `MICROSOFT_TOKEN`/`MICROSOFT_REGION` environment variables set to generate `data/azure-voices.md`. This file lists voice names, locales, genders, voice types, styles, and sample rates retrieved from the Speech Service REST API.
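The following TypeScript sketch outlines the kind of request such a script can make. It is not the repository's `docs:update-voices` script. It assumes Node.js 18 or later (for the global `fetch`) and the public Speech Service `voices/list` endpoint, which authenticates with an `Ocp-Apim-Subscription-Key` header. The `AzureVoice` interface covers only the subset of response fields used here, and the Markdown row layout is an illustrative choice.

```typescript
// Sketch only: fetch the Azure voice list and format rows for a Markdown table.
// Assumes Node.js 18+ (global fetch); field names follow the voices/list response.
interface AzureVoice {
  ShortName: string;
  Locale: string;
  Gender: string;
  VoiceType: string;
  StyleList?: string[]; // present only for voices that support styles
  SampleRateHertz: string;
}

async function fetchVoices(key: string, region: string): Promise<AzureVoice[]> {
  const url = `https://${region}.tts.speech.microsoft.com/cognitiveservices/voices/list`;
  const res = await fetch(url, { headers: { "Ocp-Apim-Subscription-Key": key } });
  if (!res.ok) {
    throw new Error(`Voice list request failed with HTTP ${res.status}`);
  }
  return (await res.json()) as AzureVoice[];
}

// Formats one voice as a Markdown table row; "-" marks voices without styles.
function toMarkdownRow(v: AzureVoice): string {
  const styles = v.StyleList && v.StyleList.length > 0 ? v.StyleList.join(", ") : "-";
  return `| ${v.ShortName} | ${v.Locale} | ${v.Gender} | ${v.VoiceType} | ${styles} | ${v.SampleRateHertz} |`;
}
```

A caller would pass the key and region (for example, from `AZURE_SPEECH_KEY` and `AZURE_SPEECH_REGION`) to `fetchVoices` and map the result through `toMarkdownRow` under a header row. Keeping the formatting function separate from the network call makes it easy to test without credentials.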
