This is going to be (I hope) the first of a series of posts about voice recognition.
I decided to control my LEGO RC Tracked Racer with my recent FTDI-based IR transmitter. While reading some blogs I found myself thinking… hey, I can use voice control on my Ubuntu laptop, it doesn’t seem too difficult!
So, in a nutshell:
- install pocketsphinx
- create a keyphrase list
- write a bash script to parse commands and control the LEGO
- glue it all
There are a few open source speech recognition projects. I picked Sphinx from Carnegie Mellon University, mainly because it is available in Debian and Ubuntu and it has a lighter version, pocketsphinx, for smaller devices like Android or Raspberry Pi (of course I also thought that, with some luck and sweat, it could be used with ev3dev later on).
pocketsphinx is a command line tool but it can also be used from Python through a library. I ran some quick tests but gave up when complexity started to increase: pyaudio and gstreamer may be OK on Ubuntu or Raspberry Pi but the EV3 will most probably choke, so let’s try just shell scripts first.
I decided to have 5 commands for my LEGO (4 directions and STOP). Documentation suggests that it is best to use keyphrases with at least 3 syllables so I created this keyphrase_list.txt file:
move forward /1e-12/
move backward /1e-5/
turn left /1e-12/
turn right /1e-14/
stop /1e-20/
The numbers represent detection threshold values. I started with /1e-10/ for all of them and then adjusted each one by trial and error for better results. I am not quite happy yet and will probably use just “front” and “back” instead of “forward” and “backward”.
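Since tuning starts from the same threshold for every phrase, the initial list can be generated instead of typed. This is only a minimal sketch: the function name make_kws_list is made up for illustration, and it assumes one phrase per input line.

```shell
#!/bin/bash
# Hypothetical helper (not part of pocketsphinx): read one phrase per line
# from stdin and print it followed by a detection threshold in slashes.
# The default of 1e-10 is just the starting value to tune from.
make_kws_list() {
  local threshold="${1:-1e-10}" phrase
  while read -r phrase; do
    if [ -n "$phrase" ]; then
      printf '%s /%s/\n' "$phrase" "$threshold"
    fi
  done
}

# Example use (writes the starting list, to be tuned by hand afterwards):
#   printf 'move forward\nmove backward\nturn left\nturn right\nstop\n' \
#     | make_kws_list > keyphrase_list.txt
```

Each threshold still has to be adjusted by hand afterwards; the sketch only saves the initial typing.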
I also created a Sphinx knowledge base compilation with CMU’s Sphinx Knowledge Base Tool, using a file with the same keyphrases:
move forward
move backward
turn left
turn right
stop
Your Sphinx knowledge base compilation has been successfully processed!
This generated a ‘TAR0772.tgz’ file containing 5 files:
0772.dic            110   Pronunciation Dictionary
0772.lm             1.3K  Language Model
0772.log_pronounce  100   Log File
0772.sent           98    Corpus (processed)
0772.vocab          43    Word List
I made some tests with these files as parameters for the pocketsphinx_continuous command and also for the Python library, but for the next examples they don’t seem to be required. They will be used later 🙂
Now to test it, just run this command and start speaking:
$ pocketsphinx_continuous -inmic yes -kws keyphrase_list.txt -logfn /dev/null
READY....
Listening...
READY....
Listening...
stop
READY....
Listening...
^C
So I just use the pocketsphinx_continuous command to keep listening to what I say to the microphone (“-inmic yes”) and find my keyphrases (“-kws keyphrase_list.txt”) without filling my console with log messages (“-logfn /dev/null”).
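The session output mixes status lines (“READY....”, “Listening...”) with the detected keyphrases. A small filter can drop everything that is not one of the five keyphrases before anything else sees it. This is a sketch only: the name filter_keyphrases is made up, and it assumes each keyphrase arrives alone on its own line, possibly padded with whitespace.

```shell
#!/bin/bash
# Hypothetical filter: pass through only lines that are exactly one of the
# five keyphrases. "read -r line" with the default IFS strips leading and
# trailing whitespace, so padded lines still match.
filter_keyphrases() {
  local line
  while read -r line; do
    case "$line" in
      "move forward"|"move backward"|"turn left"|"turn right"|"stop")
        printf '%s\n' "$line"
        ;;
    esac
  done
}

# Example use:
#   printf 'READY....\nListening...\nstop\n' | filter_keyphrases
```

It is written in plain bash, with no external tools, so it does not add another buffering stage to the pipeline.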
Each time a keyphrase is detected with enough confidence it is displayed, so I just need to redirect the output of this command to a shell script that parses it and sends the right IR codes to my LEGO:
#!/bin/bash
# Read one detected keyphrase per line from stdin and send the matching IR codes.
# Each movement is a short 0.2 s burst followed by a brake.
while read -r -a words
do
  case "${words[0]}" in
    move)
      if [ "${words[1]}" = "forward" ]; then
        echo "FRONT"
        irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct FORWARD_BACKWARD
        sleep 0.2
        irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct BRAKE_BRAKE
      fi
      if [ "${words[1]}" = "backward" ]; then
        echo "BACK"
        irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct BACKWARD_FORWARD
        sleep 0.2
        irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct BRAKE_BRAKE
      fi
      ;;
    turn)
      if [ "${words[1]}" = "left" ]; then
        echo "LEFT"
        irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct FORWARD_FORWARD
        sleep 0.2
        irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct BRAKE_BRAKE
      fi
      if [ "${words[1]}" = "right" ]; then
        echo "RIGHT"
        irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct BACKWARD_BACKWARD
        sleep 0.2
        irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct BRAKE_BRAKE
      fi
      ;;
    stop)
      echo "STOP"
      irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct BRAKE_BRAKE
      ;;
    *)
      # Anything else (status lines, unknown words) is ignored
      echo "?"
      ;;
  esac
done
Not pretty but it works – we can test in the command line like this:
$ echo "move forward" | ./transmitter.sh
FRONT
Of course, the ‘irsend’ commands only work if lircd is running and controlling an IR transmitter.
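To try the parsing logic on a machine without lircd or an IR transmitter, the irsend calls could go through a small wrapper with a dry-run switch. This is just a sketch: send_ir, DRY_RUN, LIRC_SOCKET and LIRC_REMOTE are names I made up here, while the socket path and remote name are the ones used in the script above.

```shell
#!/bin/bash
# Hypothetical wrapper around irsend. With DRY_RUN=1 it only prints the
# command it would run, so the rest of the script can be tested without
# lircd or any IR hardware.
LIRC_SOCKET="/var/run/lirc/lircd"
LIRC_REMOTE="LEGO_Combo_Direct"

send_ir() {
  local code="$1"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "irsend -d $LIRC_SOCKET SEND_ONCE $LIRC_REMOTE $code"
  else
    irsend -d "$LIRC_SOCKET" SEND_ONCE "$LIRC_REMOTE" "$code"
  fi
}

# Example use:
#   DRY_RUN=1
#   send_ir FORWARD_BACKWARD; sleep 0.2; send_ir BRAKE_BRAKE
```

A side effect is that the socket path and remote name live in one place instead of being repeated on every line.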
Now to glue everything together we need a trick: the Ubuntu version of pocketsphinx doesn’t flush stdout, so piping its output to my script wasn’t working. I found that I need to use the “unbuffer” command from the “expect” package:
$ sudo apt install expect
$ mkfifo pipe
So in one console window I send the output, unbuffered, to the pipe I created:
$ unbuffer pocketsphinx_continuous -inmic yes -kws keyphrase_list.txt -logfn /dev/null > pipe
And in another console window I read the pipe and send it to the transmitter.sh script:
$ cat pipe | ./transmitter.sh
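For anyone who has not used named pipes before, here is a small self-contained sketch of the same writer/reader arrangement, with printf standing in for pocketsphinx and cat standing in for the transmitter script. The function name demo_fifo and the temporary directory are only for illustration.

```shell
#!/bin/bash
# Hypothetical demo of a named pipe: one process writes keyphrases into the
# FIFO in the background while another reads them, as in the two consoles.
demo_fifo() {
  local dir fifo
  dir=$(mktemp -d)
  fifo="$dir/pipe"
  mkfifo "$fifo"

  # Writer (stand-in for "unbuffer pocketsphinx_continuous ... > pipe")
  printf 'move forward\nstop\n' > "$fifo" &

  # Reader (stand-in for "cat pipe | ./transmitter.sh")
  cat "$fifo"

  wait            # let the background writer finish
  rm -r "$dir"    # clean up the temporary FIFO
}

# Example use:
#   demo_fifo
```

Opening the FIFO blocks until both ends are connected, which is why it does not matter which console window is started first.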
And that’s it.