Natural-sounding text-to-speech in the terminal (and more).
This is NOT intended to be a completely free, pick-up-and-use TTS solution. It is simply a wrapper around Google’s Cloud Text-to-Speech API.
Pre-requisites
You will need:
- A GCP account with billing enabled.
- Google gives you 1 million characters free every month. That’s nearly 10 books a month. See pricing.
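- Once you have a GCP account, enable the Cloud Text-to-Speech API and create a service account with a JSON key.
- Export the path to the service account key as GOOGLE_APPLICATION_CREDENTIALS in your shell, e.g. export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account.json". You will need to do this in every new shell, so add it to your shell configuration or to a script that runs gosling.
- An internet connection whenever you need some text spoken to you.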
- I have only tested this on Linux. Commands for playing audio will be different on other platforms.
Go to the latest release, scroll down to “Assets”, and download the file for your platform. Unzip it and run the gosling binary inside.
If you have Go installed, you can install it with:
go install github.com/Samyak2/gosling@latest
To convert a text file to speech:
gosling input.txt output.mp3
Then play the resulting output.mp3 file with your audio player.
To read text from standard input, pass - as the input file:
echo "hello there" | gosling - output.mp3
Play audio directly
If you have the play command, which usually comes with the sox package (sudo dnf install sox on Fedora), pass - as the output file to write the audio to standard output and pipe it into play:
echo "hello there" | gosling - - | play -t mp3 -
If you have the ffplay command, which is part of FFmpeg:
echo "hello there" | gosling - - | ffplay -nodisp -autoexit -
Run gosling --help to see all options:
Usage: gosling <input-file> <output-file>
Arguments:
  <input-file>     Text file to read from. Use - for standard input.
  <output-file>    Audio file to write to. Use - for standard output.
Flags:
  -h, --help                           Show context-sensitive help.
  -l, --language-code="en-US"          Language code to use for the synthesis. See full list at: https://cloud.google.com/text-to-speech/docs/voices
  -v, --voice-name="en-US-Wavenet-A"   Voice name to use for the synthesis. Use an empty string to let the GCP API choose. See full list at: https://cloud.google.com/text-to-speech/docs/voices
      --pitch=-3                       Pitch adjustment in the range [-20.0, 20.0]. Use a negative number to decrease the pitch. See: https://cloud.google.com/text-to-speech/docs/reference/rest/v1/text/synthesize
  -r, --speaking-rate=1.0              Speaking rate/speed in the range [0.25, 4.0]. See: https://cloud.google.com/text-to-speech/docs/reference/rest/v1/text/synthesize
      --volume-gain=0.0                Volume gain (in dB) in the range [-96.0, 16.0]. See: https://cloud.google.com/text-to-speech/docs/reference/rest/v1/text/synthesize
  -s, --[no-]ssml                      Use if text has SSML. Default is plain text. See: https://cloud.google.com/text-to-speech/docs/basics#speech_synthesis_markup_language_ssml_support
      --service-endpoint=STRING        GCP Service Endpoint. You'll need to set this if you want a Neural2 voice. See: https://cloud.google.com/text-to-speech/docs/endpoints
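The ranges in the help output can also be checked locally before making an API call. A minimal Go sketch, using a hypothetical synthOptions type whose fields mirror --pitch, --speaking-rate and --volume-gain (gosling itself may simply let the API reject out-of-range values):

```go
package main

import (
	"fmt"
	"os"
)

// synthOptions mirrors the numeric flags from gosling --help (hypothetical type).
type synthOptions struct {
	Pitch        float64 // --pitch, in semitones
	SpeakingRate float64 // --speaking-rate, 1.0 is normal speed
	VolumeGainDb float64 // --volume-gain, in dB
}

// checkRange returns an error if v lies outside the closed interval [lo, hi].
func checkRange(name string, v, lo, hi float64) error {
	if v < lo || v > hi {
		return fmt.Errorf("%s=%g is outside [%g, %g]", name, v, lo, hi)
	}
	return nil
}

// Validate applies the ranges listed in the help output.
func (o synthOptions) Validate() error {
	if err := checkRange("--pitch", o.Pitch, -20, 20); err != nil {
		return err
	}
	if err := checkRange("--speaking-rate", o.SpeakingRate, 0.25, 4.0); err != nil {
		return err
	}
	return checkRange("--volume-gain", o.VolumeGainDb, -96, 16)
}

func main() {
	// The defaults shown by gosling --help.
	opts := synthOptions{Pitch: -3, SpeakingRate: 1.0, VolumeGainDb: 0.0}
	if err := opts.Validate(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("options OK")
}
```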
The voice sounds too robotic
By default, with the default language (en-US), gosling uses a WaveNet-based voice (en-US-Wavenet-A). If you use a different language, make sure to also switch to a WaveNet-based voice for that language with --voice-name.
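A small check on the --language-code and --voice-name values can catch a voice that belongs to another language or is a plain Standard voice. This is a sketch, not part of gosling: parseVoiceName and checkVoice are hypothetical helpers, and they assume voice names follow the <language>-<REGION>-<type>-<variant> pattern seen in Google’s voice list (for example en-US-Wavenet-A).

```go
package main

import (
	"fmt"
	"strings"
)

// parseVoiceName splits a voice name such as "en-US-Wavenet-A" into its
// language code ("en-US") and voice type ("Wavenet"). It assumes the
// <language>-<REGION>-<type>-<variant> naming pattern.
func parseVoiceName(voice string) (languageCode, voiceType string, err error) {
	parts := strings.Split(voice, "-")
	if len(parts) < 4 {
		return "", "", fmt.Errorf("unexpected voice name %q", voice)
	}
	return parts[0] + "-" + parts[1], parts[2], nil
}

// checkVoice returns an error if voice does not match languageCode, or if it
// is a "Standard" voice rather than a WaveNet (or better) one.
func checkVoice(languageCode, voice string) error {
	lang, voiceType, err := parseVoiceName(voice)
	if err != nil {
		return err
	}
	if !strings.EqualFold(lang, languageCode) {
		return fmt.Errorf("voice %q is for %s, not %s", voice, lang, languageCode)
	}
	if strings.EqualFold(voiceType, "Standard") {
		return fmt.Errorf("voice %q is a Standard voice; try a WaveNet or Neural2 one", voice)
	}
	return nil
}

func main() {
	for _, v := range []string{"en-US-Wavenet-A", "en-US-Standard-A", "de-DE-Wavenet-B"} {
		if err := checkVoice("en-US", v); err != nil {
			fmt.Println("warning:", err)
		} else {
			fmt.Println("ok:", v)
		}
	}
}
```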
If WaveNet is not good enough, try a Neural2 voice (search for Neural2 in the voice list if you need other languages). Neural2 voices need --service-endpoint to be set:
gosling input.txt output.mp3 --service-endpoint 'https://us-central1-texttospeech.googleapis.com' -v en-US-Neural2-A
TODO: this endpoint currently times out for all TTS requests; I’m not sure why.
If Neural2 isn’t good enough either, well… you’ll have to take this up with Google.
Why am I getting the error “google: could not find default credentials”?
- You did not read the Pre-requisites section.
- You forgot to export the GOOGLE_APPLICATION_CREDENTIALS environment variable in your shell.
- Something is wrong with your GCP service account. See the page linked from the error message.
Why don’t --pitch and --volume-gain have short versions?
These options can take negative values, and the command-line parser I use behaves weirdly with negative numbers and short flags. I removed the short versions so they don’t become a pitfall. With the long form, the value can be attached with an equals sign, as in --pitch=-3.
How do I use this with Foliate?
I use this script:
#!/bin/bash
# requires gosling and sox
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account.json"
gosling - - | play -t mp3 - &
trap 'kill $!; exit 0' INT
wait
Save this to a file (for example, /path/to/foliate-gosling.sh) and make it executable with chmod +x /path/to/foliate-gosling.sh.
TODO: this only works with English text. I need to figure out a way to convert FOLIATE_TTS_LANG_LOWER to Google’s format.
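One possible approach to this TODO, sketched in Go: normalize the tag to the language-REGION form used in Google’s voice list (for example en-US). It assumes FOLIATE_TTS_LANG_LOWER holds a lowercase tag such as en-us or en_us, so check Foliate’s documentation for the exact format. toGoogleLanguageCode is a hypothetical helper, and script subtags such as zh-hant-tw are not handled.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// toGoogleLanguageCode converts a tag like "en-us" or "en_us" to "en-US".
// A bare language such as "de" is returned lowercased and otherwise unchanged.
// Everything after the first separator is treated as the region (assumption).
func toGoogleLanguageCode(tag string) string {
	tag = strings.TrimSpace(strings.ReplaceAll(tag, "_", "-"))
	parts := strings.SplitN(tag, "-", 2)
	lang := strings.ToLower(parts[0])
	if len(parts) == 1 || parts[1] == "" {
		return lang
	}
	return lang + "-" + strings.ToUpper(parts[1])
}

func main() {
	tag := os.Getenv("FOLIATE_TTS_LANG_LOWER")
	if tag == "" {
		tag = "en-US" // gosling's default language
	}
	// Print the converted code so a shell script can capture it with $(...).
	fmt.Println(toGoogleLanguageCode(tag))
}
```

The printed code could then be passed to gosling with -l, together with -v "" so that the API picks a voice for that language.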
When I’m too lazy to read an article, I use Google Assistant’s “read me this article” feature on my phone. It’s extremely good, especially with text-only articles. I could not find an alternative on desktop (specifically, Linux).
Yes, there are quite a few text-to-speech apps on Linux. Most of them sound either like R2-D2 or like something from the depths of the void. The only one I found that sounds bearable uses an undocumented Google Translate API (probably a ToS violation?). There are also some pre-trained neural-network-based models, but they sound like a person speaking through a very low-bandwidth voice call, and they skip over numbers and abbreviations as if they never existed.
The only text-to-speech that sounded good was Google’s. So I thought – “they must have a GCP API for this”. And they did. And I hacked this together.
- speech-dispatcher support. This will allow using it in Firefox’s reader mode, for example.
- Some pre-processing of raw text – remove extra/unnecessary punctuation, better formatting for numbers, etc.
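As a rough illustration of what the pre-processing item could involve (the rules here are examples, not the author’s plan), a Go sketch with a hypothetical cleanText helper that collapses runs of repeated punctuation and extra whitespace before the text is sent for synthesis:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// A punctuation mark followed by two or more marks from the same set,
	// e.g. "!!!", "....." or "-----". RE2 has no backreferences, so mixed runs
	// such as "?!!" are collapsed to their first mark as well.
	repeatedPunct = regexp.MustCompile(`([!?.,;:*=_~-])[!?.,;:*=_~-]{2,}`)
	// Runs of spaces and tabs.
	horizontalSpace = regexp.MustCompile(`[ \t]+`)
	// Three or more newlines; keep at most one blank line between paragraphs.
	extraNewlines = regexp.MustCompile(`\n{3,}`)
)

// cleanText applies the rules above and trims surrounding whitespace.
func cleanText(s string) string {
	s = repeatedPunct.ReplaceAllString(s, "$1")
	s = horizontalSpace.ReplaceAllString(s, " ")
	s = extraNewlines.ReplaceAllString(s, "\n\n")
	return strings.TrimSpace(s)
}

func main() {
	fmt.Printf("%q\n", cleanText("Wow!!!   That   was\n\n\n\nfast.....  "))
}
```

Runs of only two marks, such as "--" or "?!", are left alone, so ordinary dashes and interrobang-style endings survive.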