In this blog post, we'll explore how to implement speech-to-text conversion in a React application using the Web Speech API. This functionality can be incredibly useful for accessibility features, voice commands, or any application that benefits from voice input.
Try the SpeechRecognition component here:
Let's break down the key parts of our SpeechRecognition component that enable speech-to-text conversion.
Firstly, we'll create our SpeechRecognition component and initialise it.
const SpeechRecognition: React.FC = () => {
  const [isListening, setIsListening] = useState<boolean>(false);
  const [transcript, setTranscript] = useState<string>('');
  const [recognition, setRecognition] = useState<CustomSpeechRecognition | null>(null);
  const [error, setError] = useState<string>('');

  // other code...
};
We use React hooks to manage our component's state:

- isListening: Tracks whether we're currently listening for speech.
- transcript: Stores the converted text from speech.
- recognition: Holds the speech recognition instance. This object allows web applications to capture and transcribe spoken words into text.
- error: Captures any errors that occur during the process.

One of the first things we have to do when the component mounts is to check that the Web Speech API is supported.
useEffect(() => {
  if ('webkitSpeechRecognition' in window) {
    const recognitionInstance = new window.webkitSpeechRecognition();
    recognitionInstance.continuous = true;
    recognitionInstance.interimResults = true;

    // Event handlers setup...

    setRecognition(recognitionInstance);
  } else {
    setError('Speech recognition is not supported in this browser.');
  }
}, []);
In the useEffect hook, we first check whether the browser exposes webkitSpeechRecognition. The window.webkitSpeechRecognition() object in the Web Speech API creates a new instance of a speech recognition service; this object provides methods and properties for controlling speech recognition, such as starting and stopping recognition, accessing recognized text, and handling the events raised during recognition. We then configure the instance: continuous keeps it listening across pauses rather than stopping after the first utterance, and interimResults delivers provisional transcripts while the user is still speaking (the instance also exposes other settings, such as lang for choosing the recognition language).

The Web Speech API allows us to be notified of some key events:
recognitionInstance.onresult = (event: SpeechRecognitionEvent) => {
  const current = event.resultIndex;
  const transcript = event.results[current][0].transcript;
  setTranscript(transcript);
};

recognitionInstance.onerror = (event: SpeechRecognitionError) => {
  console.error('Speech recognition error', event.error);
  setIsListening(false);
  // Error handling logic...
};

recognitionInstance.onend = () => {
  setIsListening(false);
};
We set up three crucial event handlers:

- onresult: Triggered when speech is recognized. It updates our transcript state with the converted text.
- onerror: Handles any errors that occur during speech recognition.
- onend: Updates our state when speech recognition ends.
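One thing to be aware of: because interimResults is enabled, onresult fires repeatedly while the user speaks, and the handler above overwrites the transcript with only the most recent result segment. If you want to keep everything said so far, a variation along these lines can help (a sketch of our own, not part of the component; it assumes an extra interimTranscript piece of state):

recognitionInstance.onresult = (event: SpeechRecognitionEvent) => {
  let interim = '';
  // Walk the results from resultIndex onwards: phrases the API has marked as
  // final are appended to the transcript, in-progress ones are collected separately.
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (result.isFinal) {
      setTranscript((previous) => previous + result[0].transcript + ' ');
    } else {
      interim += result[0].transcript;
    }
  }
  setInterimTranscript(interim); // assumed extra state: useState<string>('')
};

Appending only final results avoids interim phrases flickering in and out as the recogniser revises them.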
In a real-world app, it is useful to be able to toggle the speech recognition feature on and off. This can be achieved in the following way:
const toggleListening = useCallback(() => {
  if (isListening) {
    recognition?.stop();
  } else {
    setError('');
    navigator.mediaDevices
      .getUserMedia({ audio: true })
      .then(() => {
        recognition?.start();
        setIsListening(true);
      })
      .catch((err: Error) => {
        console.error('Error accessing the microphone', err);
        setError(
          'Unable to access the microphone. Please ensure you have given permission.',
        );
      });
  }
}, [isListening, recognition]);
This function toggles speech recognition on and off. When starting, we clear any previous error, request microphone access with getUserMedia, and only call start() on the recognition instance once permission has been granted; if access is refused, we show a helpful error message instead. When stopping, we simply call stop() and let the onend handler reset the listening state.
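One detail worth noting: getUserMedia resolves with a MediaStream whose tracks the code above never stops, so in some browsers the microphone indicator can stay lit after recognition ends. A variation that keeps a handle on the stream and releases it when toggling off might look like this (a sketch of our own; mediaStreamRef is an extra ref, and useRef would also need importing from React):

const mediaStreamRef = useRef<MediaStream | null>(null); // extra ref, not in the original component

const toggleListening = useCallback(() => {
  if (isListening) {
    recognition?.stop();
    // Release the microphone so the browser's recording indicator switches off.
    mediaStreamRef.current?.getTracks().forEach((track) => track.stop());
    mediaStreamRef.current = null;
  } else {
    setError('');
    navigator.mediaDevices
      .getUserMedia({ audio: true })
      .then((stream) => {
        mediaStreamRef.current = stream; // keep a handle so we can stop it later
        recognition?.start();
        setIsListening(true);
      })
      .catch(() => {
        setError(
          'Unable to access the microphone. Please ensure you have given permission.',
        );
      });
  }
}, [isListening, recognition]);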
return (
  <div className={styles.container}>
    {error && <div className={styles.errorMessage}>{error}</div>}
    <Button
      onClick={toggleListening}
      variant="primary"
      userStyles={styles.button}
      disabled={!recognition}
    >
      {isListening ? 'Stop Listening' : 'Start Listening'}
    </Button>
    <div className={styles.transcript}>
      <p>{transcript}</p>
    </div>
  </div>
);
Our component renders an error message when something goes wrong, a button that toggles listening on and off (disabled until a recognition instance is available), and a container that displays the live transcript.
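Since the transcript updates asynchronously while the user speaks, it can also be worth marking the transcript container as a live region so screen readers announce new text as it arrives; a small, optional tweak to the markup above:

<div className={styles.transcript} aria-live="polite">
  <p>{transcript}</p>
</div>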
When the code is first compiled, it produces some TypeScript errors, because TypeScript's built-in DOM typings don't declare webkitSpeechRecognition on the Window object. To fix these, we have to declare some new types:
interface SpeechRecognitionEvent extends Event {
  resultIndex: number;
  results: SpeechRecognitionResultList;
}

interface SpeechRecognitionError extends Event {
  error: string;
}

interface CustomSpeechRecognition extends EventTarget {
  continuous: boolean;
  interimResults: boolean;
  start: () => void;
  stop: () => void;
  onresult: (event: SpeechRecognitionEvent) => void;
  onerror: (event: SpeechRecognitionError) => void;
  onend: () => void;
}

declare global {
  interface Window {
    webkitSpeechRecognition: new () => CustomSpeechRecognition;
  }
}
Then, using TypeScript's declaration merging feature, we extend the Window interface with a webkitSpeechRecognition property, allowing TypeScript to recognize it as valid.
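It's worth noting that webkitSpeechRecognition is the prefixed constructor exposed by Chromium-based browsers and recent versions of Safari, while the specification also defines an unprefixed SpeechRecognition constructor. If you want the support check to cover both, something along these lines could be used (a sketch only; the optional SpeechRecognition property on Window and the RecognitionCtor name are our own additions, not part of the component above):

declare global {
  interface Window {
    // Optional, since not every browser provides the unprefixed constructor.
    SpeechRecognition?: new () => CustomSpeechRecognition;
  }
}

// Prefer the standard constructor and fall back to the webkit-prefixed one.
const RecognitionCtor =
  window.SpeechRecognition ?? window.webkitSpeechRecognition ?? null;

if (RecognitionCtor) {
  const recognitionInstance = new RecognitionCtor();
  // ...configure the instance and attach handlers exactly as before
} else {
  setError('Speech recognition is not supported in this browser.');
}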
This React component demonstrates the core functionality of speech-to-text conversion using the Web Speech API. It handles:

- Detecting whether the browser supports speech recognition.
- Requesting microphone access and surfacing permission errors.
- Starting and stopping recognition on demand.
- Displaying the converted text as it arrives.
The development of this component illustrates the core concepts of using the Web Speech API to build a speech-to-text capability in the browser. This was a fun project to build, but it's clear from the results that, whilst it is a cool technology, it requires further refinement before it could be used in a production environment. The potential for such a capability is vast, and it could provide some serious enhancements to existing and future web applications.
Here's a complete listing of the code:
'use client';

import React, { useState, useEffect, useCallback } from 'react';
import Button from '../button/button';
import Tooltip from '../tooltip/tooltip';
import { styles } from './web-speech.styles';

const micOn = (
  <svg
    xmlns="http://www.w3.org/2000/svg"
    height="24px"
    viewBox="0 -960 960 960"
    width="24px"
    fill="#fff"
  >
    <path d="M480-400q-50 0-85-35t-35-85v-240q0-50 35-85t85-35q50 0 85 35t35 85v240q0 50-35 85t-85 35Zm0-240Zm-40 520v-123q-104-14-172-93t-68-184h80q0 83 58.5 141.5T480-320q83 0 141.5-58.5T680-520h80q0 105-68 184t-172 93v123h-80Zm40-360q17 0 28.5-11.5T520-520v-240q0-17-11.5-28.5T480-800q-17 0-28.5 11.5T440-760v240q0 17 11.5 28.5T480-480Z" />
  </svg>
);

const micOff = (
  <svg
    xmlns="http://www.w3.org/2000/svg"
    height="24px"
    viewBox="0 -960 960 960"
    width="24px"
    fill="#fff"
  >
    <path d="m710-362-58-58q14-23 21-48t7-52h80q0 44-13 83.5T710-362ZM480-594Zm112 112-72-72v-206q0-17-11.5-28.5T480-800q-17 0-28.5 11.5T440-760v126l-80-80v-46q0-50 35-85t85-35q50 0 85 35t35 85v240q0 11-2.5 20t-5.5 18ZM440-120v-123q-104-14-172-93t-68-184h80q0 83 57.5 141.5T480-320q34 0 64.5-10.5T600-360l57 57q-29 23-63.5 39T520-243v123h-80Zm352 64L56-792l56-56 736 736-56 56Z" />
  </svg>
);

interface SpeechRecognitionEvent extends Event {
  resultIndex: number;
  results: SpeechRecognitionResultList;
}

interface SpeechRecognitionError extends Event {
  error: string;
}

interface CustomSpeechRecognition extends EventTarget {
  continuous: boolean;
  interimResults: boolean;
  start: () => void;
  stop: () => void;
  onresult: (event: SpeechRecognitionEvent) => void;
  onerror: (event: SpeechRecognitionError) => void;
  onend: () => void;
}

declare global {
  interface Window {
    webkitSpeechRecognition: new () => CustomSpeechRecognition;
  }
}

const SpeechRecognition: React.FC = () => {
  const [isListening, setIsListening] = useState<boolean>(false);
  const [transcript, setTranscript] = useState<string>('');
  const [recognition, setRecognition] =
    useState<CustomSpeechRecognition | null>(null);
  const [error, setError] = useState<string>('');

  useEffect(() => {
    if ('webkitSpeechRecognition' in window) {
      const recognitionInstance = new window.webkitSpeechRecognition();
      recognitionInstance.continuous = true;
      recognitionInstance.interimResults = true;

      recognitionInstance.onresult = (event: SpeechRecognitionEvent) => {
        const current = event.resultIndex;
        const transcript = event.results[current][0].transcript;
        setTranscript(transcript);
      };

      recognitionInstance.onerror = (event: SpeechRecognitionError) => {
        console.error('Speech recognition error', event.error);
        setIsListening(false);
        if (event.error === 'not-allowed') {
          setError(
            'Microphone access was denied. Please allow microphone access and try again.',
          );
        } else {
          setError(`Speech recognition error: ${event.error}`);
        }
      };

      recognitionInstance.onend = () => {
        setIsListening(false);
      };

      setRecognition(recognitionInstance);
    } else {
      setError('Speech recognition is not supported in this browser.');
    }
  }, []);

  const toggleListening = useCallback(() => {
    if (isListening) {
      recognition?.stop();
    } else {
      setError('');
      navigator.mediaDevices
        .getUserMedia({ audio: true })
        .then(() => {
          recognition?.start();
          setIsListening(true);
        })
        .catch((err: Error) => {
          console.error('Error accessing the microphone', err);
          setError(
            'Unable to access the microphone. Please ensure you have given permission.',
          );
        });
    }
  }, [isListening, recognition]);

  return (
    <div className={styles.container}>
      {error && <div className={styles.errorMessage}>{error}</div>}
      <Tooltip
        content={isListening ? 'Listening' : 'Not Listening'}
        position="right"
      >
        <Button
          onClick={toggleListening}
          variant="primary"
          userStyles={styles.button}
          disabled={!recognition}
          icon={isListening ? micOn : micOff}
          iconPosition="left"
        ></Button>
      </Tooltip>
      <div className={styles.transcript}>
        <p>{transcript}</p>
      </div>
    </div>
  );
};

export default SpeechRecognition;
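Finally, to use the component it can simply be rendered from a page or another component, for example (the import path here is an assumption based on the file layout above):

import SpeechRecognition from '../components/web-speech/web-speech'; // assumed path

export default function VoiceDemoPage() {
  return (
    <main>
      <h1>Speech to Text</h1>
      <SpeechRecognition />
    </main>
  );
}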