What do you do if you need to be present on a call but you've lost your voice? Why, write an 11-line shell script to replace it, of course!

Our first port of call is, naturally, Google! So what is the first result for "linux text to speech"? Well, for me it's RPi Text to Speech (Speech Synthesis), and deep down there, after Cepstral, it's Festival!

So how do we use that?

First we need to install it. Since I wrote this on an Ubuntu system I do:

$ sudo apt install festival

This installs the festival command, which has a convenient --tts option!

$ echo hello world | festival --tts

This however has two problems:

  1. It is fatiguing on the fingers to tweak the parameters and run the command.
  2. The output of the command is to the speakers rather than a microphone.

We can fix problem 1 with a trivial shell script to produce output after every line instead.

#!/bin/sh
while read -r REPLY; do
        printf %s "$REPLY" | festival --tts
done

The problem of outputting to a microphone is somewhat more involved.

It's possible to loop your speaker output through a recording device in Pulse Audio by setting the recording device to a "Monitor" device.

It's no doubt possible to drive this from the command-line, but since my chat software is graphical I've no problem using pavucontrol.

Once the chat software is running, open the "Recording" tab and change the input device of the application.

This works but is unsatisfying: you need a second output device, otherwise other sounds will be broadcast too, and it is prone to causing feedback.

What we need is some kind of virtual microphone. As usual, the likes of Google and StackOverflow come in handy: a virtual microphone is what Pulse Audio calls a "null sink".

We can create a null sink and give it a recognisable name by running:

pacmd load-module module-null-sink sink_name=Festival
pacmd update-sink-proplist Festival device.description=Festival

Then we can remove it again by running:

pacmd unload-module module-null-sink
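
Unloading by module name removes every null sink that happens to be loaded, not just ours. As a hedged sketch of a narrower approach, assuming pactl is installed alongside pacmd: pactl load-module prints the index of the module it loaded, so we can keep that index and unload only that module. The names create_sink, remove_sink and FESTIVAL_MODULE_ID are made up for illustration.

```shell
#!/bin/sh
# Sketch: remember the module index so only our own null sink is removed.
# create_sink, remove_sink and FESTIVAL_MODULE_ID are illustrative names.

create_sink() {
        # pactl load-module prints the index of the module it just loaded.
        FESTIVAL_MODULE_ID=$(pactl load-module module-null-sink \
                sink_name=Festival) || return 1
}

remove_sink() {
        # Unload only the module we loaded, if we loaded one at all.
        [ -n "${FESTIVAL_MODULE_ID:-}" ] &&
                pactl unload-module "$FESTIVAL_MODULE_ID"
}
```

A script could call create_sink at the start and register remove_sink with trap remove_sink EXIT, so the sink goes away when the script ends normally. Whether the EXIT trap also runs when the script is interrupted depends on the shell, so treat this as a starting point rather than a complete fix.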

So how do we get festival to play its output to that?

We can't start the command and then tweak the parameters in pavucontrol, because it doesn't run long enough for us to change them before it starts playing.

We can play audio to a specified device with the paplay command, but how do we get Festival to send its output through it?

Fortunately Festival lets you set some parameters in its scripting language.

We need to pick a common audio format that paplay can read and festival can produce. We can set this with:

(Parameter.set 'Audio_Required_Format 'aiff)

We need to tell festival to play audio through a specified Pulse Audio device. The best way I could find to do this was setting Audio_Method to Audio_Command and Audio_Command to a paplay command.

(Parameter.set 'Audio_Method 'Audio_Command)
(Parameter.set 'Audio_Command "paplay $FILE --client-name=Festival --stream-name=Speech --device=Festival")

Festival lets us run commands on its command-line so the final script we get is:

#!/bin/sh
pacmd load-module module-null-sink sink_name=Festival
pacmd update-sink-proplist Festival device.description=Festival
while read -r REPLY; do
        festival --batch \
                '(Parameter.set '\''Audio_Required_Format '\''aiff)' \
                '(Parameter.set '\''Audio_Method '\''Audio_Command)' \
                '(Parameter.set '\''Audio_Command "paplay $FILE --client-name=Festival --stream-name=Speech --device=Festival")' \
                '(SayText "'"$REPLY"'")'
done
pacmd unload-module module-null-sink

To use it:

  1. Run that in a terminal window.
  2. Start your chat program.
  3. Start pavucontrol and change the input device of your program to Festival.
  4. Type lines of text into the terminal to speak.

Since the goal of this project was only to take part in a group chat without being able to speak, development stopped there.

Should further development be warranted, other changes could include:

  1. The module load and unload process is pretty fragile. Would need to use an API that is tied to process lifetime or at least unload by ID rather than name.
  2. No escaping mechanism for $REPLY. Would need to learn string escaping in lisp.
  3. Lots of work done per line of text. Festival has a server mode which could reduce the amount of work per line.
  4. Investigate a way to pipe audio directly between Festival and Pulse Audio. text2wave exists to write to a file, possibly standard output, and pacat exists to take audio from standard input and put it to speakers, but I couldn't get it to work at the time.
  5. Replace festival entirely. It is in need of maintainership, and has been broken in Fedora releases, so replacing the voice generation with pyttsx, espeak or flite could help.
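
The escaping problem in item 2 could be approached along these lines. This is a minimal sketch, not something tested against Festival itself: it assumes Festival's Scheme reader treats \" and \\ as escapes inside double-quoted strings, and scheme_escape is a made-up name.

```shell
#!/bin/sh
# Sketch: escape a line of text for use inside a double-quoted Scheme string.
# Assumption: Festival's reader accepts \" and \\ inside strings.
# scheme_escape is an illustrative name, not part of Festival.

scheme_escape() {
        # Double the backslashes first, then protect the double quotes.
        printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}
```

Inside the loop, the last argument to festival would then become "(SayText \"$(scheme_escape "$REPLY")\")", so a typed line such as say "hi" is passed to Festival as say \"hi\" rather than ending the string early.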