The Text to Speech (TTS) node converts text into spoken audio and plays it to the caller in real time.

Behavior

  • Sends text to the configured TTS provider
  • Streams the synthesized audio to the caller
  • Supports variable interpolation in the text (e.g., {{customer_name}})
  • Emits TTS.END on completion or TTS.ERROR on failure
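
The interpolation step can be pictured with a short sketch. This is only an illustration of `{{variable}}` substitution, not the platform's actual implementation: the helper name `interpolate`, the regular expression, and the choice to leave unknown placeholders untouched are all assumptions.

```python
import re

# Hypothetical pattern: matches {{name}} with optional inner whitespace.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def interpolate(template: str, variables: dict) -> str:
    """Replace each {{name}} with its value; leave unknown placeholders as-is."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name in variables:
            return str(variables[name])
        # Assumption: an unresolved variable is kept verbatim rather than dropped.
        return match.group(0)

    return _PLACEHOLDER.sub(substitute, template)


greeting = interpolate(
    "Hello {{customer_name}}, thank you for calling.",
    {"customer_name": "Ada"},
)
```

How the real node treats a missing variable (empty string, literal placeholder, or an error) is not stated here, so it is worth testing before relying on it.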

Configuration

| Parameter | Type | Default | Range / Options | Description |
|---|---|---|---|---|
| text | string | "" | | The text to speak. Supports {{variable}} interpolation. |
| provider | enum | "ElevenlabsTTSConfig" | Provider-specific | TTS provider to use |
| model | enum | "eleven_flash_v2_5" | Model-specific | TTS model identifier |
| voice | string | "default" | | Voice identifier |
| voiceId | string | "" | | Specific voice ID from the provider |
| language | string | "en" | | Language code (e.g., en, en-US, es) |
| speed | number | 1.0 | 0.5–2.0 | Speech rate multiplier |
| pitch | number | 1.0 | 0.5–2.0 | Speech pitch multiplier |
| emphasis | enum | "moderate" | none, moderate, strong, reduced | Emphasis level for speech delivery |
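
As a rough illustration of how these parameters fit together, the sketch below builds a configuration from the documented defaults and checks the documented ranges. The function `build_tts_config` and the dictionary layout are hypothetical; the platform may store and validate node settings differently.

```python
# Defaults and allowed values copied from the Configuration table.
DEFAULTS = {
    "text": "",
    "provider": "ElevenlabsTTSConfig",
    "model": "eleven_flash_v2_5",
    "voice": "default",
    "voiceId": "",
    "language": "en",
    "speed": 1.0,
    "pitch": 1.0,
    "emphasis": "moderate",
}
EMPHASIS_OPTIONS = {"none", "moderate", "strong", "reduced"}


def build_tts_config(**overrides) -> dict:
    """Hypothetical helper: merge overrides into the defaults and validate."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    config = {**DEFAULTS, **overrides}

    # speed and pitch share the documented 0.5 to 2.0 range.
    for name in ("speed", "pitch"):
        if not 0.5 <= config[name] <= 2.0:
            raise ValueError(f"{name} must be between 0.5 and 2.0, got {config[name]}")

    if config["emphasis"] not in EMPHASIS_OPTIONS:
        raise ValueError(f"emphasis must be one of {sorted(EMPHASIS_OPTIONS)}")

    return config


slow_config = build_tts_config(
    text="Press 1 for billing, or press 2 for technical support.",
    speed=0.8,
)
```

Provider and model values are not validated in this sketch because their valid options are provider-specific and not listed here.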

Output handles

| Handle | Description |
|---|---|
| End | Speech playback completed; flow continues to the next node |
| Error | TTS generation or playback failed |
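
The relationship between the emitted events and the two handles can be sketched as a simple lookup. This assumes that TTS.END corresponds to the End handle and TTS.ERROR to the Error handle; the function `next_node_for` and the target node names are hypothetical and only illustrate the branching.

```python
def next_node_for(event: str, end_target: str, error_target: str) -> str:
    """Hypothetical router: pick the downstream node for a TTS event."""
    routes = {
        "TTS.END": end_target,      # assumed to fire the End handle
        "TTS.ERROR": error_target,  # assumed to fire the Error handle
    }
    if event not in routes:
        raise ValueError(f"unexpected TTS event: {event}")
    return routes[event]


# Example wiring: continue to a menu on success, fall back to a transfer on failure.
after_success = next_node_for("TTS.END", "main_menu", "transfer_to_agent")
after_failure = next_node_for("TTS.ERROR", "main_menu", "transfer_to_agent")
```

Connecting the Error handle to a fallback path, rather than leaving it unwired, keeps the caller from reaching a dead end if synthesis fails.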

Use cases

  • Personalize a greeting: after looking up the caller in a Database node, set text to "Hello {{customer_name}}, thank you for calling."
  • Slow down complex instructions: set speed to 0.8 so callers have time to process what they hear.
  • Support multiple languages: use a Switch node on the caller's language preference, then route to separate TTS nodes, each configured with the matching language and voiceId.
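
The multi-language routing can be sketched as a lookup from language preference to per-language settings. Everything below is illustrative: the voice IDs are placeholders rather than real provider IDs, and `select_tts_config` with its fallback rules is an assumption, not how the Switch node is implemented.

```python
# One entry per TTS node in the flow; voiceId values are placeholders.
TTS_BY_LANGUAGE = {
    "en": {"language": "en", "voiceId": "voice-en-placeholder"},
    "es": {"language": "es", "voiceId": "voice-es-placeholder"},
}


def select_tts_config(preferred_language: str, fallback: str = "en") -> dict:
    """Hypothetical switch: exact match, then base language, then fallback."""
    if preferred_language in TTS_BY_LANGUAGE:
        return TTS_BY_LANGUAGE[preferred_language]

    # Reduce a regional code such as "en-US" to its base language "en".
    base_language = preferred_language.split("-")[0]
    if base_language in TTS_BY_LANGUAGE:
        return TTS_BY_LANGUAGE[base_language]

    return TTS_BY_LANGUAGE[fallback]


spanish = select_tts_config("es")
regional = select_tts_config("en-US")
unsupported = select_tts_config("fr")
```

A default branch, as in the fallback above, helps ensure that a caller with an unsupported language preference still hears a prompt.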