Using Amazon Polly Text-to-Speech Service
Writing a blog about your latest product’s features? Case studies on your latest’s customer integrations? Opinions and insights into your industry, or your customer’s industries? Well written blog posts not only inform, they can also showcase you or your organization’s experience and expertise. However, if you blog on a regular basis, you know there is a lot of content out there to steal a reader’s interest. Your audience’s interest in your post can be short-lived.
A great way to extend the reach of your posts and maintain interest longer is to produce an audio version. Audio versions can be included along with the written post. Audio versions may also be shared separately on popular services such as YouTube and SoundCloud. Many multi-tasking readers may prefer to listen to a post as opposed to reading. I often listen while commuting, working out, or running.
Amazon Polly’s text-to-speech (TTS) capabilities make it easy to convert a written post to a lifelike, professionally-sounding audio version. In this brief post, we will learn how to use Amazon Polly to convert blog posts to audio.
Most posts only require minor modifications to optimize them for conversion to audio.
- Adding an opening statement to your post, like ‘Audio version of the post’, followed by the post’s title, provides a good way to start the audio. For example, ‘Audio Introduction to: Getting Started with Data Analytics using Jupyter Notebooks, PySpark, and Docker’.
- If your post includes code, you probably want to exclude those sections. If the post contains a large amount of code, you might consider only creating an audio introduction to the post.
- If your post contains graphs or charts, which are referenced in the post, I suggest adding text, such as ‘See the Chart’, along with the caption of the graph or chart. For example, ‘See the Chart — AWS Marketplace: Product Delivery Methods (February 2020)’.
- Create a simple URL for your post and add it to the audio, either at the beginning or end of the post. For example, ‘To read the full version of this post, including code samples, please go to tiny url dot com forward slash streaming warehouse’.
Amazon Polly offers the ability to use custom lexicons, or vocabularies. According to AWS, you can modify the pronunciation of particular words, such as company names, acronyms, foreign words, and neologisms. If you write industry-specific or highly technical blogs, you will find creating a lexicon is probably necessary to ensure your accompanying audio sounds accurate. In my own technical posts, I most often use a custom lexicon file for acronyms and company names. While many acronyms are spelled out, others are not and have unique pronunciations. Likewise, many company names have a unique pronunciation.
Take for example the following acronyms, which I used in my last few posts: PaaS, BYOL, ELA, PAYG, IPv4, IPv6, IAM, ENI. Using the default lexicon of Amazon Polly, we end up with incorrect pronunciations for all these acronyms 🎧 listen to the results.
Now 🎧 listen to the pronunciation of the same acronyms, after we apply a custom lexicon.
Lexicons must conform to the Pronunciation Lexicon Specification (PLS) W3C recommendation. The lexicon files are in XML format. Below is a snippet of a sample lexicon files.
Amazon Polly supports synthesizing speech from either plain text or SSML input. From the Amazon Polly’s Management Console, copy and paste your post’s prepared text into the ‘Plain text’ tab.
Next, choose the Voice Engine. If you are using English, I suggest ‘Neural’. According to AWS, Amazon Polly has a Neural TTS (NTTS) system that can produce even higher quality voices than its standard voices. The NTTS system produces the most natural and human-like text-to-speech voices possible.
Choose your Language and Region. Then, select your Voice; I prefer ‘Joanna’. In my opinion, her voice has a natural, lifelike sound. If you prefer a male voice, ‘Matthew’ is quite natural sounding. Lastly, upload your lexicon file(s).
To start the process, choose ‘Synthesize to S3’. Indicate the S3 bucket you would like the mp3 format audio file, output into. You can also add a prefix to the mp3 files. For most average length posts, the text synthesis process takes less than one minute. To be notified, you can include an Amazon SNS topic ARN. Select ‘Synthesize’.
Polly creates a synthesis task.
The synthesis tasks may be viewed from the ‘S3 synthesis tasks’ tab.
Once the synthesis task is complete, the resulting mp3 audio file may be viewed and downloaded from the S3 Management Console. If you are using a Mac, QuickTime Player works great to review the audio file.
Amazon Polly may also be used from the AWS CLI or using the AWS SDK. In the example below, we have replicated the same operations performed in the Console, this time using the AWS CLI. First, upload your lexicon file(s) using the
polly put-lexicon command. Each lexicon can only be up to 4,000 characters in size. Then call the
polly start-speech-synthesis-task command to create a synthesis task.
The output should look similar to the screengrab, below. The results will be identical to using the Console.
You can check the task’s results using the
polly list-speech-synthesis-tasks command.
In this brief post, we saw a great use case for Amazon Polly, converting your written blog posts into audio. Creating audio versions of our blogs is a great way to extend the reach of the post to a potentially new audience and maintain your current audience’s interest a little longer. Amazon Polly has several other features and capabilities to explore.
This blog represents my own view points and not of my employer, Amazon Web Services.