The following lists the most important APIs used for spoken language understanding. This document is based on the API definitions on GitHub.

speechly.slu.v1.BatchAPI

Run SLU operations on audio sources without actively waiting for the results.

Methods

name	request	response	description
ProcessAudio	ProcessAudioRequest stream	ProcessAudioResponse	Create a new background SLU operation for a single audio source. An audio source can be either audio chunks sent via repeated ProcessAudioRequests, or the URI of a file that is reachable from the API. The response includes an `id` that is used to match the operation to the results. A `reference` identifier can also be set. The destination can be a webhook URL, in which case the results are posted there when they are ready. The payload is an instance of `Operation`.
QueryStatus	QueryStatusRequest	QueryStatusResponse	Query the status of a given batch operation. If the `ProcessAudioRequest` did not define a `results_uri` as a destination, the results are returned in the `QueryStatusResponse`.

speechly.slu.v1.SLU

Service that implements Speechly SLU (Spoken Language Understanding) API.

To use this service you MUST use an access token from Speechly Identity API. The token MUST be passed in gRPC metadata with Authorization key and Bearer ACCESS_TOKEN as value, e.g. in Go:

ctx := context.Background()
ctx = metadata.AppendToOutgoingContext(ctx, "Authorization", "Bearer "+accessToken)
stream, err := speechlySLUClient.Stream(ctx)

Methods

name	request	response	description
Stream	SLURequest stream	SLUResponse stream	Performs bidirectional streaming speech recognition: receive results while sending audio. First request MUST be an SLUConfig message with the configuration that describes the audio format being sent. This RPC can handle multiple logical audio segments with the use of `SLUEvent_START` and `SLUEvent_STOP` messages, which are used to indicate the beginning and the end of a segment. A typical call timeline will look like this: 1. Client starts the RPC. 2. Client sends `SLUConfig` message with audio configuration. 3. Client sends `SLUEvent.START`. 4. Client sends audio and receives responses from the server. 5. Client sends `SLUEvent.STOP`. 6. Client sends `SLUEvent.START`. 7. Client sends audio and receives responses from the server. 8. Client sends `SLUEvent.STOP`. 9. Client closes the stream and receives responses from the server until EOF is received. NB: the client does not have to wait until the server acknowledges the start / stop events, this is done asynchronously. The client can deduplicate responses based on the audio context ID, which will be present in every response message.
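
The call timeline above can be sketched as a small client-side check of message ordering. The names below (`msgKind`, `validateSequence`) are hypothetical stand-ins, not part of the Speechly client. The sketch is stricter than the text: rejecting a repeated config and audio outside a context are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// msgKind is a hypothetical stand-in for the kinds of SLURequest a client sends.
type msgKind int

const (
	kindConfig msgKind = iota
	kindStart
	kindAudio
	kindStop
)

// validateSequence checks the ordering from the timeline:
// config first, then any number of start, audio..., stop groups.
func validateSequence(msgs []msgKind) error {
	if len(msgs) == 0 || msgs[0] != kindConfig {
		return errors.New("first message must be the config")
	}
	inContext := false
	for i, m := range msgs[1:] {
		pos := i + 1 // position in the full message list
		switch m {
		case kindConfig:
			// Assumption: the config is sent exactly once.
			return fmt.Errorf("message %d: config sent more than once", pos)
		case kindStart:
			if inContext {
				return fmt.Errorf("message %d: start inside an open context", pos)
			}
			inContext = true
		case kindAudio:
			// Assumption: audio is only sent between a start and a stop.
			if !inContext {
				return fmt.Errorf("message %d: audio outside a context", pos)
			}
		case kindStop:
			if !inContext {
				return fmt.Errorf("message %d: stop without a start", pos)
			}
			inContext = false
		}
	}
	if inContext {
		return errors.New("stream closed with an open context")
	}
	return nil
}

func main() {
	ok := []msgKind{kindConfig, kindStart, kindAudio, kindStop, kindStart, kindAudio, kindStop}
	fmt.Println("two contexts:", validateSequence(ok))
	bad := []msgKind{kindStart, kindAudio, kindStop}
	fmt.Println("missing config:", validateSequence(bad))
}
```

A check like this could run in tests of a client implementation; the server remains the authority on what it accepts.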

speechly.slu.v1.WLU

Service that implements Speechly WLU (Written Language Understanding).

To use this service you MUST use an access token from Speechly Identity API. The token MUST be passed in gRPC metadata with Authorization key and Bearer ACCESS_TOKEN as value, e.g. in Go:

ctx := context.Background()
ctx = metadata.AppendToOutgoingContext(ctx, "Authorization", "Bearer "+accessToken)
res, err := speechlyWLUClient.Text(ctx, req)

Methods

name	request	response	description
Text	WLURequest	WLUResponse	Performs recognition of a text with specified language.
Texts	TextsRequest	TextsResponse	Performs recognition of a batch of texts with specified language.

Messages

AudioConfiguration
Operation
Option
SLUConfig.Option
SLUStart.Option
ProcessAudioRequest
ProcessAudioResponse
QueryStatusRequest
QueryStatusResponse
RoundTripMeasurementRequest
RoundTripMeasurementResponse
SLUConfig
SLUEntity
SLUError
SLUEvent
SLUFinished
SLUIntent
SLURequest
SLUResponse
SLUSegmentEnd
SLUStart
SLUStarted
SLUStop
SLUTentativeEntities
SLUTentativeTranscript
SLUTranscript
TextsRequest
TextsResponse
Transcript
WLUEntity
WLUIntent
WLURequest
WLUResponse
WLUSegment
WLUToken

AudioConfiguration

Describes the audio content of the batch operation.

Fields

name	type	description
encoding	Encoding	The encoding of the audio data sent in the stream. Required.
channels	int32	The number of channels in the input audio data. Required.
sample_rate_hertz	int32	Sample rate in Hertz of the audio data sent in the stream (e.g. 16000). Required.
language_codes	string	The language(s) of the audio sent in the stream as a BCP-47 language tag (e.g. “en-US”). Defaults to the target application language. Optional.

Operation

Describes a single batch operation.

Fields

name	type	description
id	string	The id of the operation.
reference	string	The reference id of the operation, if given.
status	Status	The current status of the operation.
language_code	string	The language code of the detected language.
app_id	string	The application context for the operation.
device_id	string	The device or microphone id for the audio, if applicable.
transcripts	Transcript	If the operation status is STATUS_DONE and the destination is not set, the results of the operation.

Option

Option to change the default behaviour of the SLU.

Fields

name	type	description
key	string	The key of the option to be set.
value	string	The values to set the option to.

SLUConfig.Option

Option to change the default behaviour of the SLU.

Fields

name	type	description
key	string	The key of the option to be set.
value	string	The values to set the option to.

SLUStart.Option

Option to change the default behaviour of the SLU.

Fields

name	type	description
key	string	The key of the option to be set.
value	string	The values to set the option to.

ProcessAudioRequest

If sending a stream of ProcessAudioRequest messages, the first one must contain the AudioConfiguration for the audio data. The config is ignored in the following messages.

Fields

name	type	description
app_id	string	The processing context, Speechly application ID. Required.
config	AudioConfiguration	Audio configuration. Required.
audio	bytes	Raw audio data.
uri	string	URI of audio data.
results_uri	string	The results JSON will be posted to the given URI. If not given, the results must be fetched using `QueryStatus`. Optional.
reference	string	Reference id for the operation. For example an identifier of the source system. Optional.
options	Option	Additional operation specific options. Optional.
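
The streaming rule above (configuration in the first message, audio in the following ones) can be sketched as below. The struct types are local stand-ins for the messages in this document, not the generated protobuf types. Sending the configuration in a message of its own and repeating `app_id` on every message are assumptions of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// audioConfiguration is a local stand-in for part of AudioConfiguration.
type audioConfiguration struct {
	Channels        int32
	SampleRateHertz int32
}

// processAudioRequest is a local stand-in for ProcessAudioRequest.
type processAudioRequest struct {
	AppID  string
	Config *audioConfiguration // set on the first message only
	Audio  []byte
}

// buildRequests returns one config message followed by audio chunks
// of at most chunkSize bytes each.
func buildRequests(appID string, cfg audioConfiguration, audio []byte, chunkSize int) ([]processAudioRequest, error) {
	if chunkSize <= 0 {
		return nil, errors.New("chunkSize must be positive")
	}
	// Assumption: the first message carries only the configuration.
	reqs := []processAudioRequest{{AppID: appID, Config: &cfg}}
	for start := 0; start < len(audio); start += chunkSize {
		end := start + chunkSize
		if end > len(audio) {
			end = len(audio)
		}
		reqs = append(reqs, processAudioRequest{AppID: appID, Audio: audio[start:end]})
	}
	return reqs, nil
}

func main() {
	cfg := audioConfiguration{Channels: 1, SampleRateHertz: 16000}
	reqs, err := buildRequests("my-app-id", cfg, make([]byte, 10), 4)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, r := range reqs {
		fmt.Printf("message %d: has config=%v, audio bytes=%d\n", i, r.Config != nil, len(r.Audio))
	}
}
```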

ProcessAudioResponse

Fields

name	type	description
operation	Operation	The details of the created operation.

QueryStatusRequest

Query the status of an operation. Either id or reference must be given.

Fields

name	type	description
id	string	ID of an audio processing operation.
reference	string	Reference ID of an operation.

QueryStatusResponse

Fields

name	type	description
operation	Operation	The details of the audio processing operation.
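
When no `results_uri` is set, a client has to poll `QueryStatus` until the operation is done. A minimal sketch of such a loop follows. `queryFunc` is a hypothetical wrapper around the real call, and `STATUS_PENDING` is a placeholder, since `STATUS_DONE` is the only status value this document names.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// operation is a local stand-in for the Operation message.
type operation struct {
	ID     string
	Status string
}

// statusDone is the only status value named in this document.
const statusDone = "STATUS_DONE"

// queryFunc is a hypothetical wrapper around the QueryStatus call.
type queryFunc func(id, reference string) (operation, error)

// waitUntilDone polls query until the operation reports statusDone
// or the attempts are used up.
func waitUntilDone(query queryFunc, id, reference string, attempts int, interval time.Duration) (operation, error) {
	if id == "" && reference == "" {
		return operation{}, errors.New("either id or reference must be given")
	}
	for i := 0; i < attempts; i++ {
		op, err := query(id, reference)
		if err != nil {
			return operation{}, err
		}
		if op.Status == statusDone {
			return op, nil
		}
		time.Sleep(interval)
	}
	return operation{}, fmt.Errorf("operation not done after %d attempts", attempts)
}

func main() {
	calls := 0
	// fake stands in for a real QueryStatus call; it reports done on the third poll.
	// "STATUS_PENDING" is a placeholder value, not taken from the API.
	fake := func(id, reference string) (operation, error) {
		calls++
		if calls < 3 {
			return operation{ID: id, Status: "STATUS_PENDING"}, nil
		}
		return operation{ID: id, Status: statusDone}, nil
	}
	op, err := waitUntilDone(fake, "op-1", "", 5, 10*time.Millisecond)
	fmt.Println(op.Status, calls, err)
}
```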

RoundTripMeasurementRequest

Network latency measurement request. Sent from the server to measure the time it takes for the client to receive a message and the server to receive the client’s response. Also known as RTT.

Fields

name	type	description
id	int32	Measurement id. Multiple measurements can be sent during one connection, so the response should contain the same `id` as in the request.

RoundTripMeasurementResponse

Response sent from the client immediately after seeing the RoundTripMeasurementRequest.

Fields

name	type	description
id	int32	`id` should match the request’s id.

SLUConfig

Describes the configuration of the audio sent by the client. Currently the API only supports single-channel Linear PCM with sample rate of 16 kHz.

Fields

name	type	description
encoding	Encoding	The encoding of the audio data sent in the stream. Required.
channels	int32	The number of channels in the input audio data. Required.
sample_rate_hertz	int32	Sample rate in Hertz of the audio data sent in the stream. Required.
language_code	string	The language of the audio sent in the stream as a BCP-47 language tag (e.g. “en-US”). Defaults to the target application language.
options	Option	Special options to change the default behaviour of the SLU for all logical audio segments.

SLUEntity

Describes an SLU entity.

An entity is a specific object in the phrase that falls into some kind of category, e.g. in a SAL example “*book book a [burger restaurant](restaurant_type) for [tomorrow](date)”, “burger restaurant” would be an entity of type restaurant_type, and “tomorrow” would be an entity of type date.

An entity has start and end indices which map to the indices of words in SLUTranscript messages, e.g. in the example “book a burger restaurant for tomorrow” it would be:

Entity “burger restaurant” - start_position = 2, end_position = 4
Entity “tomorrow” - start_position = 5, end_position = 6

The start index is inclusive, but the end index is exclusive, i.e. the interval is [start_position, end_position).
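
Under the half-open convention stated above, and with zero-based word indices, the words of an entity can be recovered by slicing the word list. The `entity` type below is a local illustration, not the generated protobuf struct.

```go
package main

import (
	"fmt"
	"strings"
)

// entity mirrors the start_position / end_position fields of SLUEntity.
type entity struct {
	Type  string
	Start int // inclusive
	End   int // exclusive
}

// entityWords returns the words covered by the interval [Start, End).
func entityWords(words []string, e entity) (string, error) {
	if e.Start < 0 || e.End > len(words) || e.Start >= e.End {
		return "", fmt.Errorf("invalid span [%d, %d) for %d words", e.Start, e.End, len(words))
	}
	// A Go slice expression is half-open too, so the indices map directly.
	return strings.Join(words[e.Start:e.End], " "), nil
}

func main() {
	words := strings.Fields("book a burger restaurant for tomorrow")
	entities := []entity{
		{Type: "restaurant_type", Start: 2, End: 4},
		{Type: "date", Start: 5, End: 6},
	}
	for _, e := range entities {
		value, err := entityWords(words, e)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s = %q\n", e.Type, value)
	}
}
```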

Fields

name	type	description
entity	string	The type of the entity, e.g. `restaurant_type` or `date`.
value	string	The value of the entity, e.g. `burger restaurant` or `tomorrow`.
start_position	int32	The starting index of the entity in the phrase, maps to the `index` field in `SLUTranscript`. Inclusive.
end_position	int32	The finishing index of the entity in the phrase, maps to the `index` field in `SLUTranscript`. Exclusive.

SLUError

Describes the error that happened when processing an audio context. DEPRECATED: Will not be returned. Any errors are returned as gRPC status codes with detail messages.

Fields

name	type	description
code	string	Error code (refer to documentation for specific codes).
message	string	Error message.

SLUEvent

Indicates the beginning and the end of a logical audio segment (audio context in Speechly terms).

Fields

name	type	description
event	Event	The event type being sent. Required.
app_id	string	The `appId` for the utterance. Required in the `START` event if the authorization token is project based. The given application must be part of the project set in the token. Not required if the authorization token is application based.

SLUFinished

Indicates that the API has stopped processing current audio context. It guarantees that no new messages for that context will be sent by the server.

Fields

name	type	description
error	SLUError	DEPRECATED An error which has happened when processing the context, if any.

SLUIntent

Describes an SLU intent. There can be only one intent per SLU segment.

Fields

name	type	description
intent	string	The value of the intent, as defined in SAL.

SLURequest

Top-level message sent by the client for the Stream method.

Fields

name	type	description
config	SLUConfig	Describes the configuration of the audio sent by the client. MUST be the first message sent to the stream.
event	SLUEvent	Indicates the beginning and the end of a logical audio segment (audio context in Speechly terms). A context MUST be preceded by a start event and concluded with a stop event, otherwise the server WILL terminate the stream with an error. DEPRECATED in favour of SLUStart and SLUStop.
audio	bytes	Contains a chunk of the audio being streamed.
rtt_response	RoundTripMeasurementResponse	Response to an RTT measurement request from server. Should be sent immediately after receiving the RoundTripMeasurementRequest in the stream. If ignored, no round trip measurements are made.
start	SLUStart	Indicates the beginning of a logical audio segment (audio context in Speechly terms). A context MUST be preceded by a SLUStart, (or the deprecated SLUEvent start event) otherwise the server WILL terminate the stream with an error.
stop	SLUStop	Indicates the end of a logical audio segment (audio context in Speechly terms). A context MUST be concluded with a SLUStop, (or the deprecated SLUEvent stop event) otherwise the server WILL terminate the stream with an error.

SLUResponse

Top-level message sent by the server for the Stream method.

Fields

name	type	description
audio_context	string	The ID of the audio context that this response belongs to.
segment_id	int32	The ID of the SLU segment that this response belongs to. This will be 0 for SLUStarted and SLUFinished responses.
transcript	SLUTranscript	Final SLU transcript.
entity	SLUEntity	Final SLU entity.
intent	SLUIntent	Final SLU intent.
segment_end	SLUSegmentEnd	A special marker message that indicates that the segment with specified `segment_id` has been finalised and no new responses belonging to that segment will be sent. The client is expected to discard any tentative responses in this segment.
tentative_transcript	SLUTentativeTranscript	Tentative SLU transcript.
tentative_entities	SLUTentativeEntities	Tentative SLU entities.
tentative_intent	SLUIntent	Tentative SLU intent.
started	SLUStarted	A special marker message that indicates that the audio context with specified `audio_context` id has been started by the API and all audio data sent by the client will be processed in that context. This message is an asynchronous acknowledgement for client-side SLUEvent_START message.
finished	SLUFinished	A special marker message that indicates that the audio context with specified `audio_context` id has been stopped by the API and no new responses for that context will be sent. The client is expected to discard any non-finalised segments. This message is an asynchronous acknowledgement for client-side SLUEvent_STOP message.
rtt_request	RoundTripMeasurementRequest	Initiates a round trip network latency measurement. The response handler should respond to this message by sending a RoundTripMeasurementResponse in the request stream. The measurement is stored server side and used to minimise the latency in the future.
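
One way a client might track these responses is to keep state per segment: final words are appended, tentative words are replaced by each newer tentative result, and `segment_end` discards whatever is still tentative. The `tracker` type and its methods are hypothetical and cover only transcripts, not entities or intents.

```go
package main

import "fmt"

// segKey identifies a segment by audio context ID and segment ID.
type segKey struct {
	AudioContext string
	SegmentID    int32
}

// segmentState is a hypothetical client-side accumulator for one segment.
type segmentState struct {
	FinalWords     []string // words from final `transcript` responses
	TentativeWords []string // words from the latest `tentative_transcript`
	Ended          bool     // set once `segment_end` has been received
}

// tracker keeps the state of every segment seen on the stream.
type tracker struct {
	segs map[segKey]*segmentState
}

func newTracker() *tracker {
	return &tracker{segs: make(map[segKey]*segmentState)}
}

// state returns the state for k, creating it on first use.
func (t *tracker) state(k segKey) *segmentState {
	s, ok := t.segs[k]
	if !ok {
		s = &segmentState{}
		t.segs[k] = s
	}
	return s
}

// onTentative replaces the tentative words with the newest interim result.
func (t *tracker) onTentative(k segKey, words []string) {
	if s := t.state(k); !s.Ended {
		s.TentativeWords = append([]string(nil), words...)
	}
}

// onFinalWord appends one finalised word.
func (t *tracker) onFinalWord(k segKey, word string) {
	if s := t.state(k); !s.Ended {
		s.FinalWords = append(s.FinalWords, word)
	}
}

// onSegmentEnd finalises the segment and discards tentative results.
func (t *tracker) onSegmentEnd(k segKey) {
	s := t.state(k)
	s.TentativeWords = nil
	s.Ended = true
}

func main() {
	t := newTracker()
	k := segKey{AudioContext: "ctx-1", SegmentID: 0}
	t.onTentative(k, []string{"find", "me", "a", "tea"})
	for _, w := range []string{"find", "me", "a", "red", "t-shirt"} {
		t.onFinalWord(k, w)
	}
	t.onSegmentEnd(k)
	t.onTentative(k, []string{"late"}) // ignored: the segment has ended
	fmt.Println(t.state(k).FinalWords, t.state(k).TentativeWords)
}
```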

SLUSegmentEnd

Indicates the end of the segment. Upon receiving this, the segment should be finalised and all future messages for that segment (if any) discarded.

Fields

name	type	description

SLUStart

Indicates the beginning of a logical audio segment (audio context in Speechly terms).

Fields

name	type	description
app_id	string	The `appId` for the utterance. Required if the authorization token is project based. The given application must be part of the project set in the token. Not required if the authorization token is application based.
options	Option	Special options to change the default behaviour of the SLU for this audio segment.

SLUStarted

Indicates that the API has started processing the portion of audio as new audio context. This does not guarantee that the server will not send any more messages for the previous audio context.

Fields

name	type	description

SLUStop

Indicates the end of a logical audio segment (audio context in Speechly terms).

Fields

name	type	description

SLUTentativeEntities

Describes tentative entities.

Fields

name	type	description
tentative_entities	SLUEntity	A list of entities, which must be treated as tentative. This is not an aggregate of all entities in the audio, but rather it ONLY contains entities that have not been finalised yet. e.g. if at the start there are two tentatively recognised entities - [“burger restaurant”, “tomorrow”] but then the API marks “burger restaurant” as final and recognises a new tentative entity “for two”, this will contain [“tomorrow”, “for two”].

SLUTentativeTranscript

Describes a tentative transcript.

Tentative transcript is an interim recognition result, which may change over time, e.g. a phrase “find me a red t-shirt” can be tentatively recognised as “find me a tea”, until the API processes the audio completely.

Fields

name	type	description
tentative_transcript	string	Aggregated tentative transcript from the beginning of the audio until current moment in time. Consecutive transcripts will have this value appended to, e.g. if in the first message it’s “find me”, in the next it may be “find me a t-shirt”.
tentative_words	SLUTranscript	A list of individual words which compose `tentative_transcript`. All words must be considered tentative.

SLUTranscript

Describes an SLU transcript. A transcript is a speech-to-text element of the phrase, i.e. a word recognised from the audio.

Fields

name	type	description
word	string	The word recognised from the audio.
index	int32	The position of the word in the whole phrase, zero-based.
start_time	int32	The start time of the word in the audio, in milliseconds from the beginning of the audio.
end_time	int32	The end time of the word in the audio, in milliseconds from the beginning of the audio.

TextsRequest

Top-level message sent by the client for the Texts method.

Fields

name	type	description
app_id	string	The target application for the texts request. Required.
requests	WLURequest	List of WLURequest. Required.

TextsResponse

Top-level message sent by the server for the Texts method.

Fields

name	type	description
responses	WLUResponse	List of WLUResponses. Required.

Transcript

Describes an SLU transcript. A transcript is a speech-to-text element of the phrase, i.e. a word recognised from the audio.

Fields

name	type	description
word	string	The word recognised from the audio.
index	int32	The position of the word in the whole phrase, zero-based.
start_time	int32	The start time of the word in the audio, in milliseconds from the beginning of the audio.
end_time	int32	The end time of the word in the audio, in milliseconds from the beginning of the audio.

WLUEntity

Describes a single entity in a segment.

An entity has start and end indices which map to the indices of words in WLUToken messages, e.g. in the example “book a burger restaurant for tomorrow” it would be:

Entity “burger restaurant” - start_position = 2, end_position = 4
Entity “tomorrow” - start_position = 5, end_position = 6

The start index is inclusive, but the end index is exclusive, i.e. the interval is [start_position, end_position).

Fields

name	type	description
entity	string	The type of the entity, e.g. `restaurant_type` or `date`.
value	string	The value of the entity, e.g. `burger restaurant` or `tomorrow`.
start_position	int32	The starting index of the entity in the phrase, maps to the `index` field in `WLUToken`. Inclusive.
end_position	int32	The finishing index of the entity in the phrase, maps to the `index` field in `WLUToken`. Exclusive.

WLUIntent

Describes the intent of a segment. There can only be one intent per segment.

Fields

name	type	description
intent	string	The value of the intent, as defined in SAL.

WLURequest

Top-level message sent by the client for the Text method.

Fields

name	type	description
language_code	string	The language of the text sent in the request as a BCP-47 language tag (e.g. “en-US”). Required.
text	string	The text to recognise. Required.
reference_time	Timestamp	The reference time for postprocessing. By default, the current date is used. Optional.

WLUResponse

Top-level message sent by the server for the Text method.

Fields

name	type	description
segments	WLUSegment	A list of WLU segments.

WLUSegment

Describes a WLU segment. A segment is a logical portion of text denoted by its intent, e.g. in a phrase “book me a flight and rent a car” there would be a segment for “book me a flight” and another for “rent a car”.

Fields

name	type	description
text	string	The portion of text that contains this segment.
tokens	WLUToken	The list of word tokens which are contained in this segment.
entities	WLUEntity	The list of entities which are contained in this segment.
intent	WLUIntent	The intent that defines this segment.
annotated_text	string	The value of text annotated in SAL format.

WLUToken

Describes a single word token in a segment.

Fields

name	type	description
word	string	The value of the word.
index	int32	Position of the token in the text.

speechly.identity.v2.IdentityAPI

Speechly Identity API is used for creating access tokens for the Speechly APIs.

Methods

name	request	response	description
Login	LoginRequest	LoginResponse	Performs a login for a specific Speechly application. Returns an access token which can be used to access the Speechly API.

ApplicationScope

Used as the scope in LoginRequest when the access is for a single Speechly application.

Fields

name	type	description
app_id	string	Speechly application ID. The defined application can be accessed with the returned token. Required.
config_id	string	Define a specific model configuration to use. Defaults to the application’s latest configuration.

LoginRequest

Top-level message sent by the client for the Login method.

Fields

name	type	description
device_id	string	A unique end-user device identifier. Must be a `UUID`. Required.
application	ApplicationScope	Login scope application: use the given application context for all utterances.
project	ProjectScope	Login scope project: define the target application per utterance. The target applications must be located in the same project.
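
Since `device_id` must be a UUID, a client needs some way to produce one. The sketch below generates a random (version 4) UUID with the standard library only. The choice of version 4 is an assumption, as this document does not say which UUID version is expected, and `newDeviceID` is a hypothetical helper name.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newDeviceID returns a random (version 4) UUID in its canonical text form.
func newDeviceID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set the version to 4
	b[8] = (b[8] & 0x3f) | 0x80 // set the RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	id, err := newDeviceID()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(id)
}
```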

LoginResponse

Top-level message returned by the server for the Login method.

Fields

name	type	description
token	string	Access token which can be used for the Speechly API. The token is a JSON Web Token and includes all standard claims, as well as custom ones. The token expires, so you should check whether it has expired before using it. It is safe to cache the token for future use until its expiration date.
valid_for_s	uint32	Number of seconds the returned token is valid.
expires_at_epoch	uint64	Token expiration time in seconds after 1970-01-01 (“unix time”).
expires_at	string	ISO-formatted UTC timestamp of the expiration time of the returned token.
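
The caching advice for `token` can be sketched as an expiry check against `expires_at_epoch`. The `cachedToken` type and the safety margin are assumptions of this sketch; the margin only guards against a token expiring shortly after the check.

```go
package main

import (
	"fmt"
	"time"
)

// cachedToken holds the LoginResponse fields needed for an expiry check.
type cachedToken struct {
	Token          string
	ExpiresAtEpoch uint64 // seconds since 1970-01-01, as in expires_at_epoch
}

// usable reports whether the token is still valid at nowEpoch,
// keeping marginSeconds of headroom before the expiration time.
func (c cachedToken) usable(nowEpoch, marginSeconds uint64) bool {
	if c.Token == "" {
		return false
	}
	return nowEpoch+marginSeconds < c.ExpiresAtEpoch
}

func main() {
	now := uint64(time.Now().Unix())
	tok := cachedToken{Token: "example-token", ExpiresAtEpoch: now + 3600}
	fmt.Println("fresh token usable:", tok.usable(now, 60))
	fmt.Println("usable 10 s before expiry:", tok.usable(now+3590, 60))
}
```

When `usable` returns false, the client would call `Login` again and replace the cached value.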

ProjectScope

Used as the scope in LoginRequest when access is required for every application in a Speechly project.

Fields

name	type	description
project_id	string	Speechly project ID. Every application in the same project is accessible with the same token. Required.
