Convert Text to Speech Using Web Speech API in JavaScript | by Gourav Kajal | Apr, 2022

Let's listen to these words!

Photo by Daiga Ellaby on Unsplash

I spend a lot of time on Medium. Sometimes to write about something, but mostly to read: to read about the experiences that other developers are willing to share with the community.

Recently, I noticed that Medium has a play button on every story. At first, I thought this privilege was given to only a few stories or writers, but then I realized it is available to all readers. This means we can even listen to the stories on Medium. Awesome!

Then, like a typical developer, I wondered: how did they do that? I knew that in the JavaScript world we have a web API for that, called the Web Speech API, but I had never used it or learned about it.

So today, let's learn about this web API together and build a working example as well.

The Web Speech API lets web apps work with voice data. It has two parts: speech synthesis (text-to-speech) and speech recognition. In this article, we'll create a simple webpage that implements text-to-speech using the Web Speech API.

For this demo, let's create a new directory with two new files: index.html and text-to-speech.js.

In the HTML file, let's set up the following elements:

  • A select menu with no options. Using JavaScript, we'll fill the empty select menu with a list of available voices
  • Range sliders for volume, pitch, and rate
  • A textarea to type in
  • Control buttons for the speech
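Whatever markup you write, the JavaScript later in this article finds these elements with document.querySelector, so their ids have to match. The selectors below are collected from that code; findMissingElements is a hypothetical helper (not part of the original demo) that you could paste into the browser console to check your page.

```javascript
// Selectors that text-to-speech.js (as written in this article) looks up.
const REQUIRED_SELECTORS = [
  "#voices",
  "#volume", "#volume-label",
  "#rate", "#rate-label",
  "#pitch", "#pitch-label",
  "textarea",
  "#start", "#pause", "#resume", "#cancel",
];

// Return the selectors that match no element in `doc` (normally `document`).
function findMissingElements(doc) {
  return REQUIRED_SELECTORS.filter((selector) => doc.querySelector(selector) === null);
}

// In the browser console: findMissingElements(document) should return [].
```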

In this demo, we're going to use Bootstrap 5 for the styling. Here's some code:

This is how it will look in the browser:


In the JavaScript file, we're mainly going to work with two interfaces, SpeechSynthesis and SpeechSynthesisUtterance, plus the window.speechSynthesis property. So, let's understand them briefly.

JavaScript SpeechSynthesis Interface

This is the main interface of the speech synthesis service; it controls the synthesis, or production, of speech from the text input. This interface is used to start, stop, pause, and resume speech, as well as to access the voices the device supports.

The methods provided by this interface are as follows:

  • speak(): Adds an utterance (a SpeechSynthesisUtterance object) to the queue; it is spoken once there are no pending utterances ahead of it. This is the method we'll use to play our text
  • pause(): Pauses the speech that is currently playing
  • resume(): Resumes paused speech
  • cancel(): Removes all utterances from the queue, including the one currently being spoken
  • getVoices(): Returns an array of SpeechSynthesisVoice objects, one for each voice the device supports
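As a small illustration of how pause() and resume() fit together, here is a sketch of a single play/pause toggle. It relies on two read-only flags of SpeechSynthesis that the list above doesn't mention, speaking and paused; the function name togglePause is our own, not part of the API.

```javascript
// Toggle playback on a SpeechSynthesis controller (normally window.speechSynthesis).
// Note: `speaking` stays true while speech is paused, so check `paused` first.
function togglePause(synth) {
  if (synth.paused) {
    synth.resume();
    return "resumed";
  }
  if (synth.speaking) {
    synth.pause();
    return "paused";
  }
  return "idle"; // nothing is being spoken, so there is nothing to pause
}
```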

JavaScript window.speechSynthesis Property

This property of the JavaScript window object references the SpeechSynthesis controller, the object on which we call methods such as speak().

We'll understand this better when we jump into the code.
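Because window.speechSynthesis simply doesn't exist in browsers without speech synthesis support (or outside a browser), it can be worth checking for it before wiring up the page. A minimal sketch; getSynth is a hypothetical helper name.

```javascript
// Return the SpeechSynthesis controller exposed on `scope` (window in a
// browser), or null when the Web Speech API isn't available there.
function getSynth(scope = globalThis) {
  if (scope && "speechSynthesis" in scope) {
    return scope.speechSynthesis;
  }
  return null;
}

const synth = getSynth();
if (!synth) {
  console.warn("Speech synthesis is not supported in this environment.");
}
```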

JavaScript SpeechSynthesisUtterance Interface

This is the interface that represents the speech, or utterance, to be produced from the provided text, along with its language, volume, pitch, rate of speech, and so on. After creating an object of this interface, we pass it to the speak() method of the SpeechSynthesis object to play the speech.
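Put together, the flow this paragraph describes is: create an utterance, then hand it to speak(). The sketch below passes the text straight to the SpeechSynthesisUtterance constructor, which accepts it as an optional argument. The synth and Utterance parameters default to the browser globals and exist only so the function can be exercised with stand-ins; speakText is a hypothetical name.

```javascript
// Create an utterance for `text` and queue it on the synthesis controller.
// Returns the utterance, or null if the Web Speech API is missing.
function speakText(
  text,
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance,
) {
  if (!synth || !Utterance) return null;
  const utterance = new Utterance(text); // equivalent to setting utterance.text
  synth.speak(utterance);
  return utterance;
}

// In the browser: speakText("Hello from the Web Speech API!");
```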

There are six properties on the SpeechSynthesisUtterance interface that we can tweak. They are as follows:

Lang:

The lang property gets and sets the language of the utterance. If it is unset, the page's lang attribute (for example, <html lang="en">) is used, or the user-agent default if that isn't set either.

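// Create the utterance that the rest of the code configures
let speech = new SpeechSynthesisUtterance();
speech.lang = "en";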
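If you'd rather not depend on the page's lang attribute, you can set the value explicitly; lang takes a BCP 47 language tag such as "en" or "en-US". The sketch below mirrors the fallback order just described, approximating the user-agent default with navigator.language, and adds a final "en" default of our own choosing. resolveLang is a hypothetical helper.

```javascript
// Pick a BCP 47 tag: the page's <html lang>, else the browser language, else "en".
function resolveLang(htmlLang, browserLang) {
  const fromPage = (htmlLang || "").trim();
  if (fromPage) return fromPage;
  return browserLang || "en";
}

// In the browser:
// speech.lang = resolveLang(document.documentElement.lang, navigator.language);
```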

Text:

The text property gets and sets the text that will be synthesized when the utterance is spoken. The text is provided as plain text. In our example, the text property must be set when the start button is pressed.

Let's give the button a click listener. When the button is clicked, we retrieve the value of the textarea and assign it to this property.

document.querySelector("#start").addEventListener("click", () => {
  speech.text = document.querySelector("textarea").value;
});


Volume:

The volume property gets and sets the volume of the utterance. It is a float ranging from 0 (lowest) to 1 (highest). If this property isn't set, the default value is 1.

Add an input listener to the volume range slider and update the volume property whenever the slider value changes. The slider's min, max, and default values have already been specified in the HTML tag.

Next to the range slider, we'll add a <span> that displays the current volume on the webpage.

document.querySelector("#volume").addEventListener("input", () => {
  // Get volume value from the input
  const volume = document.querySelector("#volume").value;

  // Set volume property of the SpeechSynthesisUtterance instance
  speech.volume = volume;

  // Update the volume label
  document.querySelector("#volume-label").innerHTML = volume;
});
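One detail worth knowing: an input's value is always a string. Assigning it to speech.volume works because the browser converts it to a number, but converting explicitly and clamping to the documented range makes the intent clear and guards against values coming from other kinds of inputs. A minimal sketch with a hypothetical clampToRange helper; the bounds shown are the ones given above for volume.

```javascript
// Convert a slider/input string to a number inside [min, max].
// Non-numeric input falls back to `fallback` (the property's default).
function clampToRange(raw, min, max, fallback) {
  const value = Number.parseFloat(raw);
  if (Number.isNaN(value)) return fallback;
  return Math.min(max, Math.max(min, value));
}

// Volume: 0 (lowest) to 1 (highest), default 1.
// speech.volume = clampToRange(document.querySelector("#volume").value, 0, 1, 1);
```

The same helper covers rate and pitch by passing their own bounds and default.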


Rate:

The rate property gets and sets the speaking rate of the utterance. It is a float that can range from 0.1 (lowest) to 10 (highest). If this property isn't set, the default value is 1.

Let's do the same thing for rate that we did for volume.

document.querySelector("#rate").addEventListener("input", () => {
  // Get rate value from the input
  const rate = document.querySelector("#rate").value;

  // Set rate property of the SpeechSynthesisUtterance instance
  speech.rate = rate;

  // Update the rate label
  document.querySelector("#rate-label").innerHTML = rate;
});
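The volume and rate listeners differ only in the element id and the property name, and the pitch listener follows the same pattern. If that repetition bothers you, a small helper can bind any of them. This sketch assumes the #name / #name-label id convention used in this article; bindRangeControl is a hypothetical name, and the document and utterance are passed in rather than read from globals.

```javascript
// Wire a range input (#<name>) to the utterance property <name> and
// mirror the value in its label (#<name>-label).
function bindRangeControl(doc, utterance, name) {
  const input = doc.querySelector(`#${name}`);
  const label = doc.querySelector(`#${name}-label`);
  input.addEventListener("input", () => {
    const value = Number(input.value); // input values are strings
    utterance[name] = value;
    label.textContent = String(value);
  });
}

// In the browser:
// ["volume", "rate", "pitch"].forEach((name) => bindRangeControl(document, speech, name));
```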


Pitch:

The pitch property gets and sets the pitch of the utterance. It is a float ranging from 0 (lowest) to 2 (highest). If this property isn't set, the default value is 1.

Let's do the same thing for pitch that we did for rate and volume.

document.querySelector("#pitch").addEventListener("input", () => {
  // Get pitch value from the input
  const pitch = document.querySelector("#pitch").value;

  // Set pitch property of the SpeechSynthesisUtterance instance
  speech.pitch = pitch;

  // Update the pitch label
  document.querySelector("#pitch-label").innerHTML = pitch;
});


Voice:

The voice property gets and sets the voice that will be used to speak the utterance. It should be set to one of the SpeechSynthesisVoice objects. If it isn't set, the most suitable default voice available for the utterance's lang setting is used.

To set the voice of the utterance, we need to retrieve the list of available voices from the window object. The voices are not available immediately when the window loads; loading them is an asynchronous operation. When the voices have loaded, a voiceschanged event is fired, so we can assign a function to be executed at that point.

window.speechSynthesis.onvoiceschanged = () => {
  // On voices loaded
};

Using window.speechSynthesis.getVoices(), we can retrieve the list of voices. It returns an array of available SpeechSynthesisVoice objects. Let's save the list in a global array and use it to fill the webpage's select menu with the available voices.
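Here's one way to fill in that handler. Browser timing differs: some browsers (Chrome, for example) may return an empty array from getVoices() until voiceschanged fires, while in others the list may already be available, so this sketch populates the menu right away and again on every change. It uses each voice's index as the option value, which is what the change listener below relies on. toVoiceOptions and refreshVoices are hypothetical names, and the sketch leaves speech.voice untouched, so the default voice is used until the user picks one.

```javascript
// Global list of voices, read by the #voices change listener.
let voices = [];

// Turn SpeechSynthesisVoice objects into option data. The option value is
// the voice's index in `voices`, so voices[select.value] finds it again.
function toVoiceOptions(voiceList) {
  return voiceList.map((voice, i) => ({
    label: `${voice.name} (${voice.lang})`,
    value: String(i),
  }));
}

// Re-read the voice list and rebuild the <select> from scratch.
// `makeOption` defaults to the browser's Option constructor.
function refreshVoices(synth, select, makeOption = (label, value) => new Option(label, value)) {
  voices = synth.getVoices();
  select.innerHTML = ""; // drop options from an earlier call
  for (const { label, value } of toVoiceOptions(voices)) {
    select.add(makeOption(label, value));
  }
}

// Browser-only wiring: populate now, and again whenever the voices change.
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  const select = document.querySelector("#voices");
  refreshVoices(window.speechSynthesis, select);
  window.speechSynthesis.onvoiceschanged = () => refreshVoices(window.speechSynthesis, select);
}
```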

Now that the voice menu has been populated, we can add a change event listener to it that updates the voice of the SpeechSynthesisUtterance instance. When the user picks a voice, we'll use the index number (which is set as the value of each option) together with the global array of voices to update it.

document.querySelector("#voices").addEventListener("change", () => {
  speech.voice = voices[document.querySelector("#voices").value];
});
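Until the user touches the menu, speech.voice stays at whatever was set before (or the browser default). If you want a sensible starting voice, SpeechSynthesisVoice exposes a lang tag and a default flag you can use. A sketch; pickVoiceIndex and its preference order (matching language, then the browser's default voice, then the first voice) are our own choices.

```javascript
// Choose a starting voice index: first voice whose lang starts with
// `preferredLang`, else the browser's default voice, else index 0.
// Returns -1 when there are no voices at all.
function pickVoiceIndex(voiceList, preferredLang) {
  if (voiceList.length === 0) return -1;
  const prefix = preferredLang.toLowerCase();
  const byLang = voiceList.findIndex((v) => v.lang.toLowerCase().startsWith(prefix));
  if (byLang !== -1) return byLang;
  const byDefault = voiceList.findIndex((v) => v.default);
  return byDefault !== -1 ? byDefault : 0;
}

// In the browser, after the menu has been filled:
// const i = pickVoiceIndex(voices, "en");
// if (i !== -1) { speech.voice = voices[i]; document.querySelector("#voices").value = String(i); }
```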


If you remember, our index.html has a few control buttons: start, pause, resume, and cancel. Let's make them work using the SpeechSynthesis interface and its methods.


Start:

When the start button is pressed, the SpeechSynthesisUtterance instance should be passed to window.speechSynthesis.speak(). This starts the process of converting the text into speech.

The text property must be set before calling this method.

If you start another text-to-speech request while one is already playing, the new one is queued behind the current one.

document.querySelector("#start").addEventListener("click", () => {
  // Set the text property before calling speak()
  speech.text = document.querySelector("textarea").value;
  window.speechSynthesis.speak(speech);
});
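speak() returns immediately; the utterance fires an end event once it has been spoken (and an error event if synthesis fails or is cancelled). Wrapping that in a Promise is handy if you want to, say, disable the Start button until playback finishes. A sketch; speakAndWait is a hypothetical name, and because it overwrites the utterance's onend and onerror handlers, it is best used with one call per utterance.

```javascript
// Queue `utterance` on `synth` and resolve when it has been spoken.
// Rejects with the SpeechSynthesisErrorEvent's error code on failure.
function speakAndWait(synth, utterance) {
  return new Promise((resolve, reject) => {
    utterance.onend = () => resolve();
    utterance.onerror = (event) => reject(new Error(event.error));
    synth.speak(utterance);
  });
}

// In the browser (hypothetical button handling):
// startButton.disabled = true;
// speakAndWait(window.speechSynthesis, speech)
//   .catch((err) => console.warn(err.message))
//   .finally(() => { startButton.disabled = false; });
```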


Pause:

To pause the currently running SpeechSynthesisUtterance instance, we can use window.speechSynthesis.pause().

document.querySelector("#pause").addEventListener("click", () => {
  window.speechSynthesis.pause();
});


Resume:

To resume the currently paused SpeechSynthesisUtterance instance, we can use window.speechSynthesis.resume().

document.querySelector("#resume").addEventListener("click", () => {
  window.speechSynthesis.resume();
});


Cancel:

We can cancel the SpeechSynthesisUtterance instance that is currently running using window.speechSynthesis.cancel(). This also removes any utterances still waiting in the queue.

document.querySelector("#cancel").addEventListener("click", () => {
  window.speechSynthesis.cancel();
});

Now we're done with all the controls, and we've already set up the required properties. So, here's the final version of text-to-speech.js:
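As a stand-in for that file, here is a sketch that assembles the snippets from this article into one script. It is our reconstruction, not the author's exact file: everything is wrapped in a hypothetical initTextToSpeech function that receives the document, the synthesis controller and the utterance constructor (so it can be tried outside a browser), it reads slider values with Number() and writes labels with textContent, and it assumes the element ids used throughout this article.

```javascript
// text-to-speech.js: a reconstruction assembling this article's snippets.
function initTextToSpeech(doc, synth, Utterance) {
  const speech = new Utterance();
  speech.lang = "en";
  let voices = [];
  const $ = (selector) => doc.querySelector(selector);

  // Fill the voice menu now and whenever the browser's voice list changes.
  const loadVoices = () => {
    voices = synth.getVoices();
    speech.voice = voices[0] ?? null;
    const select = $("#voices");
    select.innerHTML = "";
    voices.forEach((voice, i) => {
      const option = doc.createElement("option");
      option.value = String(i); // index into `voices`
      option.textContent = voice.name;
      select.appendChild(option);
    });
  };
  loadVoices();
  synth.onvoiceschanged = loadVoices;

  $("#voices").addEventListener("change", () => {
    speech.voice = voices[$("#voices").value];
  });

  // Volume, rate and pitch sliders, each mirrored in a <span> label.
  for (const name of ["volume", "rate", "pitch"]) {
    $(`#${name}`).addEventListener("input", () => {
      const value = Number($(`#${name}`).value);
      speech[name] = value;
      $(`#${name}-label`).textContent = String(value);
    });
  }

  // Control buttons.
  $("#start").addEventListener("click", () => {
    speech.text = $("textarea").value;
    synth.speak(speech);
  });
  $("#pause").addEventListener("click", () => synth.pause());
  $("#resume").addEventListener("click", () => synth.resume());
  $("#cancel").addEventListener("click", () => synth.cancel());

  return speech;
}

// Run automatically only in a browser that supports speech synthesis.
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  initTextToSpeech(document, window.speechSynthesis, SpeechSynthesisUtterance);
}
```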

And here's the final output in the browser:

Final Output

Now, simply enter some text in the textarea, click the Start button, and listen to the words you've just written.
