To make coding against the Speechly Streaming API easier, we provide client libraries for various platforms that reduce the amount of code you need to write and make your code more robust.
Make sure you have created and deployed a Speechly application. Take note of the App ID, you’ll need it shortly.
You’ll also need a React app. Use your existing app, or create a new one using:
npx create-react-app my-app
1. Install the @speechly/react-client package:
npm install @speechly/react-client
# or
yarn add @speechly/react-client
2. Import SpeechProvider and wrap the app with it, passing the App ID of your Speechly application:
// index.js
import React from 'react';
import ReactDOM from 'react-dom/client';
import { SpeechProvider } from '@speechly/react-client';
import App from './App';

const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(
  <React.StrictMode>
    <SpeechProvider
      appId="YOUR_APP_ID"
      logSegments
      debug
    >
      <App />
    </SpeechProvider>
  </React.StrictMode>
);
SpeechProvider properties:

- appId: your Speechly application ID (required)
- logSegments: logs the speech segment emitted by the client library (optional)
- debug: prints the decoder status, status changes and possible errors (optional)

3. Start the development server:
npm run start
# or
yarn start
Navigate to http://localhost:3000 to see your app running. Next, let’s capture some audio!
There are two ways of capturing audio from the browser microphone: with a push-to-talk button or with voice activity detection (VAD). To use other audio sources, like a MediaStream or pre-recorded audio files, see the Use other audio sources section.
Import the useSpeechContext hook, create a button to initialize the microphone, another button for toggling the microphone, and then display the transcript:
import { useSpeechContext } from '@speechly/react-client';
function App() {
const { segment, listening, attachMicrophone, start, stop } = useSpeechContext();
return (
<div className="App">
<button onClick={attachMicrophone}>Initialize microphone</button>
<button onPointerDown={start} onPointerUp={stop}>
{listening ? 'Listening…' : 'Push to talk'}
</button>
<p>
{segment && segment.words.map(word => word.value).join(' ')}
</p>
</div>
);
}
Enable vad in SpeechProvider:
<SpeechProvider
appId="YOUR_APP_ID"
logSegments
debug
vad={{ enabled: true }}
>
Import the useSpeechContext hook, create a button to initialize the microphone and then display the transcript:
import { useSpeechContext } from '@speechly/react-client';
function App() {
const { segment, attachMicrophone } = useSpeechContext();
return (
<div className="App">
<button onClick={attachMicrophone}>Initialize microphone</button>
<p>
{segment && segment.words.map(word => word.value).join(' ')}
</p>
</div>
);
}
When VAD is enabled, it takes control of calling start and stop, so no microphone button is needed. To adjust the VAD behavior, see VadOptions for available options.
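Since VAD starts and stops listening automatically, the only UI you might still want is some feedback about what the client is doing. Here is a minimal sketch building on the component above, using only the listening flag already exposed by useSpeechContext (the "Waiting for speech" label is just an illustrative string):

function App() {
  const { segment, listening, attachMicrophone } = useSpeechContext();

  return (
    <div className="App">
      <button onClick={attachMicrophone}>Initialize microphone</button>
      {/* VAD calls start/stop for us; we only reflect the current state */}
      <p>{listening ? 'Listening…' : 'Waiting for speech'}</p>
      <p>
        {segment && segment.words.map(word => word.value).join(' ')}
      </p>
    </div>
  );
}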
Regardless of which way you chose, go ahead and press the Initialize microphone button and you should be prompted for microphone permissions. Press Allow and say something like “hello world” to verify that the integration works. If you have logSegments enabled, open the developer console to see the raw speech segments.
A common practice in React apps is to use the useEffect hook to subscribe to changes in segment and useState to make changes to the app state.
Use segment.isFinal to check the segment state. When false, the segment might be updated several times. When true, the segment won’t be updated anymore, and subsequent callbacks within the same audio context refer to the next segment.
List all transcripts and show the tentative transcript:
const { segment } = useSpeechContext();
const [tentativeTranscript, setTentativeTranscript] = useState("");
const [transcripts, setTranscripts] = useState([]);
useEffect(() => {
if (segment) {
// Handle speech segment and make tentative changes to app state
const plainString = segment.words.map(word => word.value).join(' ');
setTentativeTranscript(plainString);
if (segment.isFinal) {
// Handle speech segment and make permanent changes to app state
setTentativeTranscript("");
setTranscripts(current => [...current, plainString]);
}
}
}, [segment]);
return (
<div className="App">
{transcripts?.map((value) =>
<p>{value}</p>
)}
{tentativeTranscript && <p><em>{tentativeTranscript}</em></p>}
</div>
);
Each word in the segment.words array has startTimestamp and endTimestamp properties, indicating the start/end timestamp of the word from the start of the audio stream, counted in milliseconds.
1. Extract the startTimestamp from the first word in the segment and store it in an object together with the transcript:
const [tentativeTranscript, setTentativeTranscript] = useState("");
const [transcripts, setTranscripts] = useState([]);
useEffect(() => {
if (segment) {
const plainString = segment.words.map(word => word.value).join(' ');
setTentativeTranscript(plainString);
if (segment.isFinal) {
setTentativeTranscript("");
setTranscripts(current => [...current, {
timestamp: segment.words[0].startTimestamp,
value: plainString,
}]);
}
}
}, [segment]);
2. Show and format the timestamp:
return (
<div className="App">
{transcripts?.map(({ timestamp, value }) =>
<p>{formatDuration(timestamp)}: {value}</p>
)}
{tentativeTranscript && <em>{tentativeTranscript}</em>}
</div>
);
In this example we’re using the format-duration package for formatting milliseconds to a standard duration string.
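If you want to do the same, the usage is roughly the following sketch; it assumes format-duration is installed and that its default export takes a millisecond value (check the package’s README for the exact output format):

import formatDuration from 'format-duration';

// Formats a millisecond value into a duration string,
// e.g. formatDuration(65000) should produce something like "1:05"
const label = formatDuration(timestamp);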
Speechly can extract intents and entities from the user’s speech based on your training data. They are available in the segment data structure via segment.intent and segment.entities. Note that each segment can only have one intent, but possibly several entities.
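To get a feel for the data, you can log both from the same useEffect used earlier. A minimal sketch using only fields that appear elsewhere in this guide (the logged values depend entirely on your configuration):

useEffect(() => {
  if (segment) {
    // Each segment carries at most one intent…
    console.log('intent:', segment.intent?.intent);
    // …but possibly several entities
    segment.entities.forEach(entity => {
      console.log('entity:', entity.type, entity.value);
    });
  }
}, [segment]);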
1. Update the configuration, providing example utterances for what’s offensive and what’s not:
*offensive thats [bullshit | bollocks](profanity)
*not_offensive thats [interesting | nice | great]
Remember to declare an entity named profanity of type string and to deploy the changes.
2. Create a component for showing the transcript and bleeping out entities with the type profanity:
const findEntity = (word, entities) =>
entities.find(entity => word.index >= entity.startPosition && word.index < entity.endPosition)
const matchWordsWithEntities = (words, entities) =>
words.flatMap(word => [{ word, entity: findEntity(word, entities) }])
export const MyComponent = ({ entities, words }) => {
const wordsWithEntities = matchWordsWithEntities(words, entities);
return (
<div className="MyComponent">
{wordsWithEntities?.map(({ word, entity }) =>
<>
{!entity && (
<span>{word.value} </span>
)}
{entity && (
<span>{entity.type === "profanity" ? "******" : entity.value} </span>
)}
</>
)}
</div>
)
}
3. Update and render the allSegments array while showing an alert if an intent has the value offensive:
import { MyComponent } from './components/MyComponent';
const { segment } = useSpeechContext();
const [allSegments, setAllSegments] = useState([]);
const updateOrAddSegment = segment => {
const newArray = Array.from(allSegments);
const i = newArray.findIndex(item => item.contextId === segment.contextId);
if (i > -1) newArray[i] = segment;
else newArray.push(segment);
setAllSegments(newArray);
};
useEffect(() => {
if (segment) {
updateOrAddSegment(segment);
if (segment.isFinal) {
if (segment.intent?.intent === "offensive") {
window.alert("Hey, no cursing here ☝️");
}
updateOrAddSegment(segment);
}
}
}, [segment]);
return (
<div className="App">
{allSegments?.map(({ entities, words }) =>
<MyComponent entities={entities} words={words} />
)}
</div>
);
The example above assumes that your entities are single word entities. If you expect entities with multiple words like “two weeks” or “next tuesday”, you need to handle the matching of words with entities.
Each word in segment.words has an index property, indicating its index within a segment. Similarly, each entity in segment.entities has startPosition and endPosition properties: startPosition is the index of the entity’s first word and endPosition is the index just past its last word, which is why the code checks word.index < entity.endPosition.
You can modify the matchWordsWithEntities function to deal with this:
const isWordPartOfEntity = (entity, word) =>
  word && word.index >= entity.startPosition && word.index < entity.endPosition;

const matchWordsWithEntities = (words, entities) => {
  const wordsBuilder = [];
  words.forEach(word => {
    if (!word) return;
    const wordEntity = entities.find(entity => isWordPartOfEntity(entity, word));
    if (!wordEntity) {
      // Word is not part of any entity, keep it as-is
      wordsBuilder[word.index] = { word };
    } else {
      // Combine all words belonging to this entity into a single value
      const combinedWordValue = words
        .filter(word => isWordPartOfEntity(wordEntity, word))
        .map(word => word.value)
        .join(" ");
      wordsBuilder[wordEntity.startPosition] = {
        word: { ...word, value: combinedWordValue, index: wordEntity.startPosition },
        entity: wordEntity
      };
    }
  });
  return wordsBuilder.filter(item => item.word);
};
If you’d like new speech segments to be emitted when the user pauses for a short while, you need to enable Silence triggered segmentation. You can enable it in the Speechly Dashboard by going to Application → Settings, or by adding it to your config.yaml. If you have trouble figuring out a good value, start with 720 and play around with it.
Depending on what your app does and how your segment handling is written, this may or may not require changes to the code. For example, enabling this in the profanity example above will introduce a bug where the updated segment sometimes overrides the previous one!
To fix this, add another condition to the updateOrAddSegment function:
const updateOrAddSegment = segment => {
const newArray = Array.from(allSegments);
//highlight-next-line
const i = newArray.findIndex(item => item.contextId === segment.contextId && item.id === segment.id);
if (i > -1) newArray[i] = segment;
else newArray.push(segment);
setAllSegments(newArray);
};
The reason for this is that an audio context (contextId) can consist of several consecutive segments (id). While Silence triggered segmentation was disabled, only one segment per context was emitted. Now that it’s enabled, we need to make sure that both contextId and id match in order to perform an update.
In addition to using the browser microphone, you can also attach a MediaStream or upload a pre-recorded audio file. For this purpose the React client library exposes a client interface, which provides low-level access to the underlying Speechly BrowserClient.
When using a MediaStream or a file as the audio source, consider using VAD, as it provides a better hands-free experience while eliminating timeout related issues.
To use your webcam (or some other MediaStream), use the client.attach() method to attach it:
const { client, segment } = useSpeechContext();
const handleMediaStream = async (stream) => {
await client.attach(stream);
};
return (
<div className="App">
<Webcam muted audio={true} onUserMedia={handleMediaStream} />
{/* ... */}
</div>
);
In this example we’re using the react-webcam package to create a webcam component. The component has a callback function for when it receives a MediaStream. We need to provide a handler that attaches the MediaStream to the client.
If you have VAD enabled and need to stop listening (for example if the user muted their microphone), you can use the client.adjustAudioProcessor() method and pass { vad: { enabled: false } } to it. You can use the same method to pass other VadOptions as well.
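For example, a mute toggle might look like the following minimal sketch; the muted state and toggleMute handler are illustrative names, not part of the client API:

const { client } = useSpeechContext();
const [muted, setMuted] = useState(false);

const toggleMute = () => {
  // Disable VAD while muted so the client stops listening,
  // re-enable it when the user unmutes
  client.adjustAudioProcessor({ vad: { enabled: muted } });
  setMuted(!muted);
};

return (
  <button onClick={toggleMute}>{muted ? 'Unmute' : 'Mute'}</button>
);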
To upload a pre-recorded audio file, use the client.uploadAudioData() method:
const { client, segment } = useSpeechContext();
const sendAudioToSpeechly = async (files) => {
if (files === null) return
try {
const buffer = await files[0].arrayBuffer();
await client.uploadAudioData(buffer);
} catch (e) {
console.error(e)
}
}
return (
<div className="App">
<input
type="file"
accept="audio/*"
onChange={e => sendAudioToSpeechly(e.target.files)}
onClick={e => e.target.value = ""}
/>
{/* ... */}
</div>
)
Please note that the client.uploadAudioData() method expects the audio data to be in binary format.