Using Azure’s Speech SDK to transcribe real-time audio

Pallav Raval · Published in Analytics Vidhya · 3 min read · Jul 19, 2024
Photo by Steve Johnson on Unsplash

Advancements in LLMs are arriving at a rapid rate, with all the big tech companies, from Microsoft and Google to Amazon and Meta, jumping straight into the field. In this article I will demonstrate how easy it is to transcribe real-time audio.

Photo by BoliviaInteligente on Unsplash

In this article, we’ll walk through the creation of a real-time transcription web app using Azure Cognitive Services. We’ll delve into the code, explaining each part to help you understand how to implement this functionality.

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Real-Time Transcription</title>
<script src="https://aka.ms/csspeech/jsbrowserpackageraw"></script>
</head>
<body>
<h1>Real-Time Transcription</h1>
<button id="startButton">Start Listening</button>
<button id="stopButton" disabled>Stop Listening</button>
<p id="transcription"></p>
<a id="downloadLink" href="#" download="transcription.txt" style="display:none;">Download Transcription</a>
</body>
</html>

This is a basic HTML page with two buttons to start and stop listening to audio from the input device. Once the stop button is clicked, the page provides a link to download the transcript as a text file.

Now let’s move on to the JavaScript code:

<script>
    const subscriptionKey = "your_subscription_key";
    const serviceRegion = "your_service_region";

    let audioConfig;
    let speechConfig;
    let recognizer;
    let accumulatedTranscription = "";

    document.getElementById('startButton').addEventListener('click', function () {
        startListening();
    });

    document.getElementById('stopButton').addEventListener('click', function () {
        stopListening();
    });

    function startListening() {
        speechConfig = SpeechSDK.SpeechConfig.fromSubscription(subscriptionKey, serviceRegion);
        audioConfig = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();

        recognizer = new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);

        // Fires repeatedly with interim (partial) results while the user is speaking.
        recognizer.recognizing = function (s, e) {
            document.getElementById('transcription').innerText = e.result.text;
        };

        // Fires once per utterance with the final recognition result.
        recognizer.recognized = function (s, e) {
            if (e.result.reason === SpeechSDK.ResultReason.RecognizedSpeech) {
                document.getElementById('transcription').innerText = e.result.text;
                accumulatedTranscription += e.result.text + "\n";
            } else if (e.result.reason === SpeechSDK.ResultReason.NoMatch) {
                document.getElementById('transcription').innerText = "No speech could be recognized.";
            }
        };

        recognizer.canceled = function (s, e) {
            console.error(`Canceled: ${e.reason}`);
            if (e.reason === SpeechSDK.CancellationReason.Error) {
                console.error(`Error details: ${e.errorDetails}`);
            }
            stopListening();
        };

        recognizer.sessionStopped = function (s, e) {
            console.log("Session stopped.");
            stopListening();
        };

        recognizer.startContinuousRecognitionAsync();

        document.getElementById('startButton').disabled = true;
        document.getElementById('stopButton').disabled = false;
    }

    function stopListening() {
        // Guard against double invocation: the stop button and the
        // sessionStopped/canceled callbacks can all call this function.
        if (!recognizer) {
            return;
        }

        recognizer.stopContinuousRecognitionAsync(() => {
            recognizer.close();
            recognizer = undefined;

            // Wrap the accumulated text in a Blob and expose it via the download link.
            const blob = new Blob([accumulatedTranscription], { type: 'text/plain' });
            const url = URL.createObjectURL(blob);
            const downloadLink = document.getElementById('downloadLink');
            downloadLink.href = url;
            downloadLink.style.display = 'block';
        });

        document.getElementById('startButton').disabled = false;
        document.getElementById('stopButton').disabled = true;
    }
</script>

Fill in the constants with the subscription key and region from your Azure account before running the code. The startListening function initializes the recognizer, starts continuous recognition, and updates the button states. Clicking the start button makes the page request audio access from the user’s available audio inputs. Clicking the stop button stops listening and enables a download link for the generated transcript: the stopListening function stops the recognition process, finalizes the transcription, creates a downloadable text file, and updates the UI to enable the download link.
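By default the recognizer uses the service’s default recognition language and the system’s default microphone. If you want to pin these down, SpeechConfig and AudioConfig expose a couple of settings. The snippet below is a configuration sketch, not part of the walkthrough above; "your-device-id" is a placeholder you would obtain from navigator.mediaDevices.enumerateDevices():

```javascript
// Sketch: optional recognizer configuration. "your-device-id" is a
// placeholder; real device IDs come from navigator.mediaDevices.enumerateDevices().
speechConfig = SpeechSDK.SpeechConfig.fromSubscription(subscriptionKey, serviceRegion);
speechConfig.speechRecognitionLanguage = "en-US"; // language to transcribe

// Use a specific microphone instead of the default input.
audioConfig = SpeechSDK.AudioConfig.fromMicrophoneInput("your-device-id");

recognizer = new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);
```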

Screenshot: the webpage requests access to an audio input
Screenshot: real-time transcription on the go
Screenshot: downloadable link for the transcript

With this code, you can create a real-time transcription web app using Azure Cognitive Services. This tool can be invaluable for various applications, making speech-to-text functionality easily accessible through a web browser. By understanding each part of the code, you can further customize and extend this functionality to suit your needs.
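One easy customization is to prefix each finalized phrase with an elapsed-time stamp before appending it to accumulatedTranscription. The helper below is my own sketch, not an SDK function; you would record a session start time in startListening and call it inside the recognized handler:

```javascript
// Sketch of a timestamping helper (not part of the Speech SDK). Given the
// session start time and the current time in milliseconds, it formats the
// recognized text as "[mm:ss] text" for the downloadable transcript.
function formatEntry(startMs, nowMs, text) {
    const totalSeconds = Math.floor((nowMs - startMs) / 1000);
    const mm = String(Math.floor(totalSeconds / 60)).padStart(2, "0");
    const ss = String(totalSeconds % 60).padStart(2, "0");
    return `[${mm}:${ss}] ${text}`;
}

// Inside the recognized handler you would then write:
//   accumulatedTranscription += formatEntry(sessionStart, Date.now(), e.result.text) + "\n";
```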
