This version of the Speechly documentation is no longer actively maintained. For up-to-date documentation, see the latest version.

Client Library API Reference

Library | Installation and usage | API Reference
JavaScript | Installation and usage (GitHub) | API reference (TypeDoc in GitHub)
React | Installation and usage | API reference (TypeDoc in GitHub)
Unity and C# | Installation and usage (GitHub) | API reference (DocFX)
Android (Kotlin) | Installation and usage (GitHub) |
iOS (Swift) | Installation and usage (GitHub) |

See also Web Speech API, Web Components, Unreal Engine 4 and gRPC protobuf definitions.

Overview

Preparing the client library and the audio source

// Pseudocode. Preparation steps may vary per client library.
// See specific API reference for details.
speechly_client = new SpeechlyClient()
speechly_client.initialize()
// Open the microphone or audio source before attaching
speechly_client.attach( audio_source )

Preparation consists of the following tasks:

  1. Creating the client instance.
  2. Initializing Speechly’s speech recognition engine.
  3. Opening and attaching an audio source (e.g. microphone).

An authorization token needs to be provided during the preparation steps. Either an app_id or project_id can be used. These can be acquired via the Dashboard or Command Line Tool.


Provide a handler for the results

// Pseudocode. See specific API reference for details.
speechly_client.onSegmentChange( fn(segment) )

fn is the function you pass for handling the words, intents and entities detected by the speech recognition engine. They are passed in a segment structure.

As the user speaks, the handler is called repeatedly with updated results.
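
Because the handler is called repeatedly for the same segment, one common pattern is to treat each call as a full replacement of what was shown before, and to act on the result only once the segment is final. The TypeScript sketch below illustrates that pattern with a locally defined type. `SegmentUpdate`, `createSegmentHandler` and `onFinal` are hypothetical names used for illustration and are not part of any Speechly client library.

```typescript
// Minimal local stand-in for the segment structure described in this document.
interface SegmentUpdate {
  id: number;
  isFinal: boolean;
  words: { value: string }[];
}

// Builds a handler whose `handle` method could play the role of `fn`.
// Every call overwrites the tentative text; final text is reported exactly once.
function createSegmentHandler(onFinal: (text: string) => void) {
  let currentText = "";
  return {
    handle(segment: SegmentUpdate): void {
      currentText = segment.words.map((w) => w.value).join(" ");
      if (segment.isFinal) {
        onFinal(currentText);
        currentText = ""; // later calls refer to the next segment
      }
    },
    getCurrentText: (): string => currentText,
  };
}
```

A user interface would typically render `getCurrentText()` as a live preview and trigger actions only from the `onFinal` callback.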


Start speech processing

// Pseudocode
context_id = await speechly_client.start( app_id )

Starts streaming audio from the microphone (or other audio source) to the speech recognition engine. The client library gathers result events from the HTTP/gRPC API and fires onSegmentChange and other relevant callbacks.

Authorizing with a project_id during preparation lets you direct the audio to any app configuration within the project by passing the app_id argument.


Stop speech processing

// Pseudocode
await speechly_client.stop()

Stops streaming audio to the speech recognition engine and waits for the remaining results to arrive. Callbacks fire until the audio stream has been fully processed.


The Segment data structures

// Pseudocode
struct Segment {
    contextId: string,
    id: int,
    isFinal: boolean,
    intent: Intent,
    entities: list<Entity>,
    words: list<Transcript>
}
Name | Type | Description
contextId | string | The audio context to which this segment belongs (UUID).
id | int | The zero-based index of this segment within the audio context. An audio context can consist of several consecutive segments.
isFinal | boolean | Indicates whether this is the last time the callback is called with this segment. Subsequent calls to the callback within the same audio context refer to the next segment. Note that none of the data associated with this segment is carried over to the next segment.
intent | Intent | The intent associated with this segment. There can only be one intent for a segment.
entities | List<Entity> | A list of entities. There can be several entities that belong to the same segment.
words | List<Transcript> | A list of Transcript objects. Together these contain the text produced by speech recognition.
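
Since an audio context can contain several segments, and each segment is updated many times before it becomes final, it can help to store updates under a key built from contextId and id. The following TypeScript sketch assumes only the three identifying fields from the table above. `SegmentRef`, `segmentKey` and `SegmentStore` are hypothetical names for illustration, not library types.

```typescript
// Only the identifying fields of a segment are declared here.
interface SegmentRef {
  contextId: string;
  id: number;
  isFinal: boolean;
}

// A segment is identified by its audio context plus its index within it.
function segmentKey(segment: SegmentRef): string {
  return `${segment.contextId}/${segment.id}`;
}

// Keeps the most recent update of every segment seen so far.
class SegmentStore {
  private readonly segments = new Map<string, SegmentRef>();

  // Later updates of the same segment replace earlier ones.
  update(segment: SegmentRef): void {
    this.segments.set(segmentKey(segment), segment);
  }

  // Finalised segments of one audio context, in spoken order.
  finalSegments(contextId: string): SegmentRef[] {
    return Array.from(this.segments.values())
      .filter((s) => s.contextId === contextId && s.isFinal)
      .sort((a, b) => a.id - b.id);
  }
}
```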

Intent

Intent { name: string, isFinal: boolean }
Name | Type | Description
name | string | Name of the intent.
isFinal | boolean | Indicates whether the intent name is finalised. When isFinal is false, the name of the intent can still change in subsequent calls to the callback. When isFinal is true, the intent name is guaranteed not to change until the segment changes.
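
One way to use this flag is to show a tentative intent as a preview but trigger irreversible actions only once the name is final. A minimal TypeScript sketch, assuming the Intent shape given above; `confirmedIntent` is a hypothetical helper, not a library function.

```typescript
interface Intent {
  name: string;
  isFinal: boolean;
}

// Returns the intent name only once it can no longer change
// within the current segment, otherwise undefined.
function confirmedIntent(intent: Intent): string | undefined {
  return intent.isFinal ? intent.name : undefined;
}
```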

Entity

Entity { type: string, value: string, isFinal: boolean,
         startIndex: int, endIndex: int }
Name | Type | Description
type | string | The name of the entity.
value | string | The value of the entity.
isFinal | boolean | Indicates whether the entity is finalised. Behaves in the same way as Intent.isFinal.
startIndex | int | Index of the Transcript that contains the first token of the transcript span this entity was extracted from.
endIndex | int | Index of the Transcript that contains the last token of the transcript span this entity was extracted from.
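
The two indices make it possible to recover the spoken words an entity was extracted from, for example to highlight them in a transcript. The TypeScript sketch below assumes that endIndex is inclusive, that is, it points at the Transcript holding the last token of the span. If your client library treats the end index as exclusive, change the comparison accordingly. `EntitySpan` and `entityWords` are hypothetical names for illustration.

```typescript
interface Transcript {
  index: number;
  value: string;
  isFinal: boolean;
}

// Only the span fields of an entity are needed here.
interface EntitySpan {
  startIndex: number;
  endIndex: number;
}

// ASSUMPTION: endIndex is inclusive (index of the last word in the span).
function entityWords(entity: EntitySpan, words: Transcript[]): string[] {
  return words
    .filter((w) => w.index >= entity.startIndex && w.index <= entity.endIndex)
    .sort((a, b) => a.index - b.index)
    .map((w) => w.value);
}
```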

Transcript

Transcript { index: int, value: string, isFinal: boolean }
Name | Type | Description
index | int | Position of this Transcript in the complete transcript.
value | string | The word of this Transcript.
isFinal | boolean | Indicates whether the word associated with this Transcript is final, or whether it can change in subsequent calls to the callback.
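
To display the text of a segment, the Transcript objects can be ordered by index and joined, with words that may still change marked visually. A minimal TypeScript sketch, assuming the Transcript shape above; `renderTranscript` and the parenthesis convention for tentative words are illustrative choices, not library behaviour.

```typescript
interface Transcript {
  index: number;
  value: string;
  isFinal: boolean;
}

// Joins words in transcript order; tentative words are wrapped in parentheses.
function renderTranscript(words: Transcript[]): string {
  return words
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => a.index - b.index)
    .map((w) => (w.isFinal ? w.value : `(${w.value})`))
    .join(" ");
}
```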

Last updated by Mathias Lindholm on October 25, 2022 at 21:29 +0300
