The Speech Input node captures the caller’s spoken words using automatic speech recognition (ASR) and stores the transcript and confidence score.

Behavior

  • Listens for the caller’s speech using the configured ASR model
  • Returns a transcript with a confidence score
  • If confidence is below the threshold, emits SPEECH.NO_MATCH
  • Supports grammar constraints and recognition hints for improved accuracy
  • Falls back to DTMF input when speech recognition fails, if fallbackToDTMF is enabled

Configuration

Recognition

| Parameter | Type | Default | Range / Options | Description |
| --- | --- | --- | --- | --- |
| language | string | "en-US" | | Language code for recognition |
| model | enum | "default" | default, enhanced, medical, phone_call | ASR model selection |
| confidenceThreshold | number | 0.7 | 0–1 | Minimum confidence score to accept (below this triggers NO_MATCH) |
| grammar | string | "" | | Grammar constraint (SRGS format or plain-text word list) |
| hints | string[] | [] | | Recognition hints: words or phrases the ASR model should prioritize |
| partialResults | boolean | false | | Return partial (interim) results during recognition |
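
As a rough illustration of the recognition parameters, the sketch below models them as a Python dataclass with the documented defaults and checks the two constrained fields. The class name `RecognitionConfig` and the `validate` helper are hypothetical and not part of the product; only the parameter names, defaults, and allowed values come from the table.

```python
from dataclasses import dataclass, field
from typing import List

# Model names come from the Recognition table; the rest is illustrative.
ASR_MODELS = {"default", "enhanced", "medical", "phone_call"}


@dataclass
class RecognitionConfig:
    """Hypothetical container mirroring the documented defaults."""
    language: str = "en-US"
    model: str = "default"
    confidenceThreshold: float = 0.7
    grammar: str = ""
    hints: List[str] = field(default_factory=list)
    partialResults: bool = False


def validate(config: RecognitionConfig) -> List[str]:
    """Return a list of problems; an empty list means the config is valid."""
    problems = []
    if config.model not in ASR_MODELS:
        problems.append(f"model must be one of {sorted(ASR_MODELS)}")
    if not 0 <= config.confidenceThreshold <= 1:
        problems.append("confidenceThreshold must be between 0 and 1")
    return problems
```

Checking a configuration before deploying a flow, as in `validate(RecognitionConfig(model="medical"))`, is one way to catch a misspelled model name or an out-of-range threshold early.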

Timing & retries

| Parameter | Type | Default | Range | Description |
| --- | --- | --- | --- | --- |
| timeout | number | 10 | 1–60 | Maximum listening duration in seconds |
| silenceTimeout | number | 3 | 1–15 | Stop listening after this many seconds of silence |
| retries | number | 2 | 1–5 | Number of retry attempts on no match |
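
The retry behavior can be pictured as a simple loop. This is a minimal sketch under two assumptions that the table does not spell out: that `retries` counts re-prompts in addition to the first attempt, and that only a No Match outcome (not a Timeout) triggers a retry. The `listen` callable is a hypothetical stand-in for one recognition attempt.

```python
from typing import Callable


def run_with_retries(listen: Callable[[], str], retries: int = 2) -> str:
    """Call listen() once, then re-prompt up to `retries` times on "No Match".

    `listen` is a hypothetical function that performs one recognition
    attempt and returns "Success", "No Match", or "Timeout".
    """
    if not 1 <= retries <= 5:
        raise ValueError("retries must be between 1 and 5")
    outcome = listen()
    for _ in range(retries):
        if outcome != "No Match":
            break  # Success or Timeout ends the loop (assumption)
        outcome = listen()
    return outcome
```

With the default of 2, a caller who is never understood would be prompted three times in total under these assumptions before the node exits through No Match.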

Output & fallback

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| variableName | string | "speech_input" | Variable to store the recognized transcript |
| fallbackToDTMF | boolean | true | Accept DTMF input if speech recognition fails |

ASR models

| Model | Best for |
| --- | --- |
| default | General-purpose recognition |
| enhanced | Higher accuracy with larger vocabulary |
| medical | Medical terminology and clinical conversations |
| phone_call | Optimized for telephony audio quality |

Output handles

| Handle | Description |
| --- | --- |
| Success | Speech recognized with confidence at or above the threshold |
| No Match | Speech detected, but confidence is below the threshold or the speech does not match the grammar |
| Timeout | No speech detected within the timeout period |
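
The routing between the three handles can be sketched as a small decision function. This is an illustration only: `choose_handle` and its parameters are hypothetical names, and representing "no speech detected" as a `None` transcript is an assumption. The comparison follows the rule that confidence below the threshold is a No Match.

```python
from typing import Optional


def choose_handle(transcript: Optional[str], confidence: float,
                  threshold: float = 0.7,
                  matches_grammar: bool = True) -> str:
    """Pick an output handle for one recognition attempt (illustrative)."""
    if transcript is None:
        # Assumption: no speech within the timeout yields no transcript.
        return "Timeout"
    if confidence < threshold or not matches_grammar:
        return "No Match"
    return "Success"
```

For example, `choose_handle("yes", 0.69)` returns `"No Match"` with the default threshold of 0.7, while `choose_handle("yes", 0.7)` returns `"Success"`.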

Output variables

| Variable | Type | Description |
| --- | --- | --- |
| transcript | string | The recognized speech text |
| confidence | number | Recognition confidence score (0–1) |

Use cases

  • Set hints to common first names and confidenceThreshold to 0.6 for flexible name recognition. Store the result in {{customer_name}}.
  • Set grammar to "yes no" and model to "phone_call" for reliable binary responses over phone lines.
  • Set model to "medical" and add hints for medication names and symptoms. Set silenceTimeout to 5 to give patients time to think.
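
The three use cases above could be expressed as configuration overrides along these lines. This is a sketch, not the product's actual configuration format: the `make_config` helper is hypothetical, the sample hint values are invented for illustration, and setting `variableName` to `"customer_name"` is an assumption about how `{{customer_name}}` gets populated.

```python
# Defaults as listed in the Configuration tables.
DEFAULTS = {
    "language": "en-US",
    "model": "default",
    "confidenceThreshold": 0.7,
    "grammar": "",
    "hints": [],
    "partialResults": False,
    "timeout": 10,
    "silenceTimeout": 3,
    "retries": 2,
    "variableName": "speech_input",
    "fallbackToDTMF": True,
}


def make_config(**overrides):
    """Hypothetical helper: merge overrides onto the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


# Flexible name recognition (hint values are made-up examples).
name_capture = make_config(
    hints=["Alice", "Bob", "Carlos"],
    confidenceThreshold=0.6,
    variableName="customer_name",  # assumption, see lead-in
)

# Binary yes/no responses over phone lines.
yes_no = make_config(grammar="yes no", model="phone_call")

# Medical intake with extra thinking time (hint values are made-up examples).
symptoms = make_config(
    model="medical",
    hints=["ibuprofen", "metformin", "headache"],
    silenceTimeout=5,
)
```

Parameters not named in a use case keep their defaults, so `yes_no["retries"]` is still 2.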