Natural sounding text-to-speech in the terminal (and more).


Pre-requisites

This is NOT intended to be a completely free, pick-up-and-use TTS solution. In fact, it is simply a wrapper around Google’s Cloud Text-to-Speech API.

You will need:

  • A GCP account with billing enabled.
    • Google gives you 1 million characters free every month. That’s nearly 10 books a month. See pricing.
    • Once you have a GCP account, enable the TTS API and get a service account.
    • Export the service account credentials in your shell. This has to be done in every new shell, so either add the line to your shell configuration or run gosling through a small wrapper script.
      export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account.json"
  • Internet connection every time you need some text spoken to you.
  • I have only tested this on Linux. Commands for playing audio will be different on other platforms.
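The wrapper-script option mentioned in the credentials step could look roughly like the sketch below. The function name `gosling_with_creds` and the variable `GOSLING_KEY_FILE` are made up for this example and are not part of gosling; the key path is the same placeholder used above.

```shell
# Hypothetical convenience wrapper; GOSLING_KEY_FILE is not something gosling reads.
# Replace the placeholder path with the location of your own key file.
GOSLING_KEY_FILE="${GOSLING_KEY_FILE:-/path/to/your/service-account.json}"

gosling_with_creds() {
    # Keep a value that is already exported; otherwise use the key file above.
    GOOGLE_APPLICATION_CREDENTIALS="${GOOGLE_APPLICATION_CREDENTIALS:-$GOSLING_KEY_FILE}" \
        gosling "$@"
}
```

Put this in your shell configuration and call `gosling_with_creds input.txt output.mp3` instead of `gosling`. The variable is only set for that one command, so the rest of the shell session is left untouched.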


Pre-built binaries

Go to the latest release, scroll down to “Assets” and download the correct file for your platform. Unzip the file and run the gosling binary inside.


If you have go installed

go install github.com/Samyak2/[email protected]


Text file

gosling input.txt output.mp3

Play the resulting output.mp3 file using your audio player.

Standard input

echo "hello there" | gosling - output.mp3

Play audio directly

If you have the play command, which is usually a part of the sox package (sudo dnf install sox on Fedora):

echo "hello there" | gosling - - | play -t mp3 -

If you have the ffplay command, which is a part of ffmpeg:

echo "hello there" | gosling - - | ffplay -nodisp -autoexit -
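If you switch between machines, a small helper can pick whichever of the two players is installed. This is only a sketch: `play_mp3_stdin` is a made-up name, and it uses exactly the two player invocations shown above.

```shell
# Hypothetical helper: play MP3 data from standard input with sox or ffmpeg.
play_mp3_stdin() {
    if command -v play >/dev/null 2>&1; then
        # sox is installed
        play -t mp3 -
    elif command -v ffplay >/dev/null 2>&1; then
        # ffmpeg is installed
        ffplay -nodisp -autoexit -
    else
        echo "no supported player found (install sox or ffmpeg)" >&2
        return 1
    fi
}
```

Usage: `echo "hello there" | gosling - - | play_mp3_stdin`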


gosling has many options for language, voice and audio output.

See gosling --help for all options.

Usage: gosling <input-file> <output-file>

  <input-file>     Text file to read from. Use - for standard input.
  <output-file>    Audio file to write to. Use - for standard output.

  -h, --help                            Show context-sensitive help.
  -l, --language-code="en-US"           Language code to use for the synthesis. See full list at: https://cloud.google.com/text-to-speech/docs/voices
  -v, --voice-name="en-US-Wavenet-A"    Voice name to use for the synthesis. Use an empty string to let the GCP API choose. See full list at: https://cloud.google.com/text-to-speech/docs/voices
      --pitch=-3                        Pitch adjustment in the range [-20.0, 20.0]. Use a negative number to decrease the pitch. See:
  -r, --speaking-rate=1.0               Speaking rate/speed in the range [0.25, 4.0]. See: https://cloud.google.com/text-to-speech/docs/reference/rest/v1/text/synthesize
      --volume-gain=0.0                 Volume gain (in dB) in the range [-96.0, 16.0]. See: https://cloud.google.com/text-to-speech/docs/reference/rest/v1/text/synthesize
  -s, --[no-]ssml                       Use if text has SSML. Default is plain text. See: https://cloud.google.com/text-to-speech/docs/basics#speech_synthesis_markup_language_ssml_support
      --service-endpoint=STRING         GCP Service Endpoint. You'll need to set this if you want a Neural2 voice. See: https://cloud.google.com/text-to-speech/docs/endpoints.
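As an illustration of how these options combine, here is a sketch that reads a file in British English, a little faster and slightly lower-pitched than the defaults. The function name `narrate_gb` is made up, and the voice name is an assumption: check that it appears in Google's voice list before relying on it.

```shell
# Hypothetical preset built only from flags listed in the help output above.
narrate_gb() {
    local input="$1" output="$2"
    # --pitch uses the "=" form because its value is negative.
    gosling "$input" "$output" \
        --language-code="en-GB" \
        --voice-name="en-GB-Wavenet-A" \
        --speaking-rate=1.25 \
        --pitch=-2
}
```

Usage: `narrate_gb input.txt output.mp3`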


The voice sounds too robotic


By default, gosling uses a WaveNet-based voice (en-US-Wavenet-A) for its default language (en-US). If you switch to a different language, also switch to a WaveNet-based voice for that language with --voice-name.


If WaveNet is not good enough, try using a Neural2 voice type (search for Neural2 in the voice list if you need other languages):

gosling input.txt output.mp3 --service-endpoint 'https://us-central1-texttospeech.googleapis.com' -v en-US-Neural2-A

TODO: this endpoint is currently timing out for all TTS requests, not sure why.

If Neural2 isn’t good enough either, well… you’ll have to take this up with Google.
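Since the Neural2 endpoint may time out, one option is to try it first and fall back to the default voice. The sketch below assumes that gosling exits with a non-zero status when a request fails, which is not confirmed here. The function name `synthesize_with_fallback` is made up.

```shell
# Assumption: gosling returns a non-zero exit status when synthesis fails.
synthesize_with_fallback() {
    local input="$1" output="$2"
    # First attempt: Neural2 voice through the regional endpoint.
    if gosling "$input" "$output" \
        --service-endpoint 'https://us-central1-texttospeech.googleapis.com' \
        -v en-US-Neural2-A; then
        return 0
    fi
    echo "Neural2 request failed, retrying with the default WaveNet voice" >&2
    # Second attempt: default endpoint and default voice.
    gosling "$input" "$output"
}
```

This is meant for file output (`synthesize_with_fallback input.txt output.mp3`). When writing to standard output, a failed first attempt could already have emitted partial data.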

Why am I getting the error “google: could not find default credentials”?


  • You did not read the Pre-requisites section.
  • You forgot to export the GOOGLE_APPLICATION_CREDENTIALS environment variable in your shell.
  • Something is wrong with your GCP service account. See this page that is also linked from the error.
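The first two causes, and the most obvious form of the third, can be checked locally before calling gosling. The following is a rough diagnostic sketch. `check_credentials` is a made-up name, and the last check only looks for the `"service_account"` marker that Google's JSON key files contain; it does not verify that the account itself works.

```shell
# Hypothetical diagnostic; returns 0 only if the credentials look plausible.
check_credentials() {
    local key="${GOOGLE_APPLICATION_CREDENTIALS:-}"
    if [ -z "$key" ]; then
        echo "GOOGLE_APPLICATION_CREDENTIALS is not set in this shell" >&2
        return 1
    fi
    if [ ! -r "$key" ]; then
        echo "cannot read key file: $key" >&2
        return 2
    fi
    # Service account key files carry "type": "service_account".
    if ! grep -q '"service_account"' "$key"; then
        echo "$key does not look like a service account key" >&2
        return 3
    fi
    echo "credentials look plausible: $key"
}
```

Run `check_credentials` in the same shell where gosling fails. If it passes and the error persists, the problem is more likely on the GCP side.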

Why don’t --pitch and --volume-gain have short versions?

These options can take negative values, and the command-line parser I use behaves weirdly when a short flag is followed by a negative number. I removed the short versions so that nobody trips over this.

How do I use this with foliate?

I use this script:

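#!/bin/bash
# requires gosling and sox
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account.json"
gosling - - | play -t mp3 - &
# stop playback when the script is interrupted
trap 'kill $!; exit 0' INT
# keep the script alive until playback finishes so the trap can still fire
wait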

Save this to a file, for example /path/to/foliate-gosling.sh, and make it executable with chmod +x /path/to/foliate-gosling.sh.

TODO: this only works with English text. I need to figure out a way to convert FOLIATE_TTS_LANG_LOWER to Google’s format.
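One possible starting point for that conversion is sketched below. It assumes that FOLIATE_TTS_LANG_LOWER holds a lowercase tag such as `en-us` or `de`, which is a guess based on the variable name. It only upper-cases everything after the first hyphen, so it will not produce Google-specific codes (for example, Mandarin is listed as `cmn-CN`). The function name `to_google_lang` is made up.

```shell
# Hypothetical converter: "en-us" -> "en-US", "de" -> "de".
to_google_lang() {
    local tag="$1" lang region
    lang="${tag%%-*}"
    case "$tag" in
        *-*)
            # Upper-case everything after the first hyphen.
            region="$(printf '%s' "${tag#*-}" | tr '[:lower:]' '[:upper:]')"
            printf '%s-%s\n' "$lang" "$region"
            ;;
        *)
            # No region part; pass the language through unchanged.
            printf '%s\n' "$lang"
            ;;
    esac
}
```

In the script, the gosling call could then become `gosling - - -l "$(to_google_lang "$FOLIATE_TTS_LANG_LOWER")" -v ""`, with the empty voice name letting the GCP API choose a voice. Whether Google accepts a tag without a region part is not verified here.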

But why?

When I’m too lazy to read an article, I use Google Assistant’s “read me this article” feature on my phone. It’s extremely good, especially with text-only articles. I could not find an alternative on desktop (specifically, Linux).

Yes, there are quite a few text-to-speech apps on Linux. Most of them sound either like R2D2 or like something from the depths of the void. The only one I found that sounds bearable uses an undocumented Google Translate API (probably a ToS violation?). There are also some pre-trained neural-network-based models, but they sound like a person on a very low-bandwidth voice call, and they skip over numbers and abbreviations as if they never existed.

The only text-to-speech that sounded good was Google’s. So I thought – “they must have a GCP API for this”. And they did. And I hacked this together.


Planned features

  • speech-dispatcher support. This will allow using it in Firefox’s reader mode, for example.
  • Some pre-processing of raw text – remove extra/unnecessary punctuation, better formatting for numbers, etc.



