LEGO Voice Control

This post is part 1 of 2 of LEGO Voice Control

This is going to be (I hope) the first of a series of posts about voice recognition.

I decided to control my LEGO RC Tracked Racer with my recent FTDI-based IR transmitter. While reading some blogs I found myself thinking… hey, I can use voice control on my Ubuntu laptop, it doesn’t seem too difficult!

So, in a nutshell:

  • install pocketsphinx
  • create a keyphrase list
  • write a bash script to parse commands and control the LEGO
  • glue it all together

There are a few open source speech recognition projects to choose from. I picked Sphinx from Carnegie Mellon University, mainly because it is available in Debian and Ubuntu and there is a lighter version, pocketsphinx, for less powerful devices like Android phones or the Raspberry Pi (of course, I also thought that with some luck and sweat it could be used with ev3dev later on).

pocketsphinx is a command line tool but it can also be used from Python through a library. I made some quick tests but gave up when the complexity started to increase – pyaudio and gstreamer may be OK on Ubuntu or a Raspberry Pi but the EV3 will most probably choke, so let’s try plain shell scripts first.

I decided to have 5 commands for my LEGO (4 directions and STOP). The documentation suggests that it is best to use phrases with at least 3 syllables, so I created this keyphrase_list.txt file:

move forward /1e-12/
move backward /1e-5/
turn left /1e-12/
turn right /1e-14/
stop /1e-20/

The numbers are detection threshold values. I started with /1e-10/ for all of them and then adapted each one for better results, by trial and error. I’m not quite happy yet and will probably use just “front” and “back” instead of “forward” and “backward”.
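
By the way, tuning the thresholds by speaking the same phrases over and over gets tedious. A trick that may help (just a sketch, assuming the default ALSA capture device – the file name is only an example) is to record a short sample once and re-run the detection on that file after each change:

$ arecord -f S16_LE -c 1 -r 16000 -d 5 sample.wav   # 16 kHz, 16-bit mono, the format pocketsphinx expects
$ pocketsphinx_continuous -infile sample.wav -kws keyphrase_list.txt -logfn /dev/null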

I also created a Sphinx knowledge base compilation with CMU’s Sphinx Knowledge Base Tool, using a file with the same keyphrases:

move forward
move backward
turn left
turn right
stop

Your Sphinx knowledge base compilation has been successfully processed!

This generated a ‘TAR0772.tgz’ file containing 5 files:

0772.dic             110   Pronunciation Dictionary
0772.lm              1.3K  Language Model
0772.log_pronounce   100   Log File
0772.sent             98   Corpus (processed)
0772.vocab            43   Word List

I made some tests using these files as parameters for the pocketsphinx_continuous command, and also with the Python library, but for the next examples they don’t seem to be required. They will be used later 🙂
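
For reference, passing them looks like this (the same form I’ll use later on the EV3):

$ pocketsphinx_continuous -inmic yes -kws keyphrase_list.txt -lm 0772.lm -dict 0772.dic -logfn /dev/null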

Now, to test it, just run this command and start speaking:

$ pocketsphinx_continuous -inmic yes -kws keyphrase_list.txt -logfn /dev/null
READY....
Listening...
READY....
Listening...
stop
READY....
Listening...
^C

So I just use the pocketsphinx_continuous command to keep listening to what I say to the microphone (“-inmic yes”) and find my keyphrases (“-kws keyphrase_list.txt”) without filling my console with log messages (“-logfn /dev/null”).

Each time a keyphrase is detected with enough confidence it is displayed, so I just need to redirect the output of this command to a shell script that parses it and sends the right IR codes to my LEGO:

#!/bin/bash

while read -a words
do

case "${words[0]}" in

  move)
    if [ "${words[1]}" = "forward" ]; then
      echo "FRONT"
      irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct FORWARD_BACKWARD
      sleep 0.2
      irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct BRAKE_BRAKE
    fi
    if [ "${words[1]}" = "backward" ]; then
      echo "BACK"
      irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct BACKWARD_FORWARD
      sleep 0.2
      irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct BRAKE_BRAKE
    fi
    ;;
  turn)
    if [ "${words[1]}" = "left" ]; then
      echo "LEFT"
      irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct FORWARD_FORWARD
      sleep 0.2
      irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct BRAKE_BRAKE
    fi
    if [ "${words[1]}" = "right" ]; then
      echo "RIGHT"
      irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct BACKWARD_BACKWARD
      sleep 0.2
      irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct BRAKE_BRAKE
    fi    
    ;;

  stop)
    echo "STOP"
    irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct BRAKE_BRAKE
    ;;

  *)
    echo "?"
    ;;

esac
done

Not pretty but it works – we can test it on the command line like this:

$ echo "move forward" | ./transmitter.sh
FRONT

Of course, the ‘irsend’ commands only work if lircd is running and controlling an IR transmitter.
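
irsend can also be used as a quick sanity check: its LIST directive asks the running lircd which remotes and codes it knows about (assuming the same socket path as above):

$ irsend -d /var/run/lirc/lircd LIST "" ""                # list the configured remotes
$ irsend -d /var/run/lirc/lircd LIST LEGO_Combo_Direct "" # list the codes of our remote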

Now, to glue everything together, we need a trick: the Ubuntu version of pocketsphinx doesn’t flush stdout, so piping its output to my script wasn’t working. I found that I needed the “unbuffer” command from the “expect” package, and a named pipe:

$ sudo apt install expect
$ mkfifo pipe

So in one console window I send the output, unbuffered, to the pipe I created:

$ unbuffer pocketsphinx_continuous -inmic yes -kws keyphrase_list.txt -logfn /dev/null > pipe

And in another console window I read the pipe and send it to the transmitter.sh script:

$ cat pipe | ./transmitter.sh
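
By the way, since unbuffer already solves the flushing problem, the named pipe isn’t strictly necessary – the same thing should also work as a single command in one console window:

$ unbuffer pocketsphinx_continuous -inmic yes -kws keyphrase_list.txt -logfn /dev/null | ./transmitter.sh

The two-window version just has the advantage of showing the recognizer output and the script output separately.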

And that’s it.


LEGO Voice Control – EV3

This post is part 2 of 2 of LEGO Voice Control

And now the big test – will it work with EV3?

So, with ev3dev up to date:

Linux ev3dev 4.4.47-19-ev3dev-ev3 #1 PREEMPT Wed Feb 8 14:15:28 CST 2017 armv5tejl GNU/Linux

I can’t find any microphone at the moment so I’ll use the mic of my Logitech C270 webcam – ev3dev sees it as a UVC device, as you can see with dmesg:

...
[ 1343.702215] usb 1-1.2: new full-speed USB device number 7 using ohci
[ 1343.949201] usb 1-1.2: New USB device found, idVendor=046d, idProduct=0825
[ 1343.949288] usb 1-1.2: New USB device strings: Mfr=0, Product=0, SerialNumber=2
[ 1343.949342] usb 1-1.2: SerialNumber: F1E48D60
[ 1344.106161] usb 1-1.2: set resolution quirk: cval->res = 384
[ 1344.500684] Linux video capture interface: v2.00
[ 1344.720788] uvcvideo: Found UVC 1.00 device <unnamed> (046d:0825)
[ 1344.749629] input: UVC Camera (046d:0825) as /devices/platform/ohci.0/usb1/1-1/1-1.2/1-1.2:1.0/input/input3
[ 1344.772321] usbcore: registered new interface driver uvcvideo
[ 1344.772372] USB Video Class driver (1.1.1)
[ 1352.171498] usb 1-1.2: reset full-speed USB device number 7 using ohci
...

and we can check with “alsamixer” that ALSA works fine with the webcam’s microphone:

First press F6 to select the sound card (as far as ALSA is concerned, the webcam is a sound card).

Then press F5 to view all sound devices – there is just one, the mic.
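
If you already know the card number, alsamixer can also jump straight to it:

$ alsamixer -c 1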

We also need to know how ALSA addresses the mic:

arecord -l
**** List of CAPTURE Hardware Devices ****
card 1: U0x46d0x825 [USB Device 0x46d:0x825], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0

Card 1, Device 0 means we should use ‘hw:1,0’.
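
Before involving pocketsphinx, it may be worth a quick capture test (the playback step assumes the EV3 speaker works with aplay):

$ arecord -D hw:1,0 -f S16_LE -c 1 -r 16000 -d 5 test.wav
$ aplay test.wav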

Now we just follow the same process we used with Ubuntu. First we install pocketsphinx:

sudo apt install pocketsphinx
...
The following extra packages will be installed:
  javascript-common libblas-common libblas3 libjs-jquery liblapack3 libpocketsphinx1 libsphinxbase1
  pocketsphinx-hmm-en-hub4wsj pocketsphinx-lm-en-hub4
Suggested packages:
  apache2 lighttpd httpd
The following NEW packages will be installed:
  javascript-common libblas-common libblas3 libjs-jquery liblapack3 libpocketsphinx1 libsphinxbase1
  pocketsphinx pocketsphinx-hmm-en-hub4wsj pocketsphinx-lm-en-hub4
0 upgraded, 10 newly installed, 0 to remove and 0 not upgraded.
Need to get 8910 kB of archives.
After this operation, 30.0 MB of additional disk space will be used.
..

Although the Ubuntu and Debian packages seem to be the same, the maintainers made some different choices: in Ubuntu, the ‘pocketsphinx-hmm-en-hub4wsj’ and ‘pocketsphinx-lm-en-hub4’ packages are missing.

So we copy 3 files from our previous work in Ubuntu:

  • keyphrase_list.txt
  • 0772.lm
  • 0772.dic

And we test it:

pocketsphinx_continuous -kws keyphrase_list.txt -adcdev hw:1,0 -lm 0772.lm -dict 0772.dic -inmic yes -logfn /dev/null

We get a “Warning: Could not find Capture element” but… yes, it works!

Of course it is slow… there is a big delay at startup before it displays “READY….” and also a big delay between each “Listening…” cycle. But it works! Isn’t open source great?

So we install expect again, to get unbuffer, and create our pipe:

sudo apt install expect
mkfifo pipe

and we rewrite our ‘transmitter.sh’ to command two EV3 motors (let’s call it “controller.sh” this time):

#!/bin/bash

while read -a words
do
case "${words[1]}" in

  move)
    if [ "${words[2]}" = "forward" ]; then
      echo "FRONT"
      echo run-timed > /sys/class/tacho-motor/motor0/command
      echo run-timed > /sys/class/tacho-motor/motor1/command
      sleep 0.2
    fi

    if [ "${words[2]}" = "backward" ]; then
      echo "BACK"
      sleep 0.2
    fi
    ;;

  turn)
    if [ "${words[2]}" = "left" ]; then
      echo "LEFT"
      echo run-timed > /sys/class/tacho-motor/motor1/command
      sleep 0.2
    fi

    if [ "${words[2]}" = "right" ]; then
      echo "RIGHT"
      echo run-timed > /sys/class/tacho-motor/motor0/command
      sleep 0.2
    fi    
    ;;

  stop)
    echo "STOP"
    ;;

  *)
    echo "?"
    echo "${words[1]}"
    echo "${words[2]}"
    ;;
esac
done

For some reason I don’t yet understand, I had to change 2 things that worked fine on Ubuntu:

  • increase the index of the arguments (“${words[1]}” and “${words[2]}” instead of “${words[0]}” and “${words[1]}”)
  • use capital letters for the keywords

This script sends “run-timed” commands to the motor file descriptors (you can read a good explanation in the ev3dev tutorial ‘Using the Tacho-Motor Class’). I didn’t write commands for “move backward” this time (it would require extra lines to change direction – not difficult, but I didn’t want to grow the script too much).
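
Just as an illustration (a sketch I haven’t tested on the robot), a “move backward” branch could temporarily invert both motors through the polarity attribute of the same tacho-motor class:

    if [ "${words[2]}" = "backward" ]; then
      echo "BACK"
      # invert both motors, run them, then restore normal polarity
      echo inversed > /sys/class/tacho-motor/motor0/polarity
      echo inversed > /sys/class/tacho-motor/motor1/polarity
      echo run-timed > /sys/class/tacho-motor/motor0/command
      echo run-timed > /sys/class/tacho-motor/motor1/command
      sleep 0.2
      echo normal > /sys/class/tacho-motor/motor0/polarity
      echo normal > /sys/class/tacho-motor/motor1/polarity
    fi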

Before we can use this script we need to initialize the motors, so we use this other script, “init.sh”:

#!/bin/bash

echo 1050 > /sys/class/tacho-motor/motor0/speed_sp
echo 200 > /sys/class/tacho-motor/motor0/time_sp
echo 1050 > /sys/class/tacho-motor/motor1/speed_sp
echo 200 > /sys/class/tacho-motor/motor1/time_sp

(it just sets the maximum speed on motor0 and motor1 and sets the duration of each “run-timed” command to 200 ms).
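
Note that motor0 and motor1 are numbered by detection order, not by output port – the address attribute of each motor shows which port it is actually plugged into:

$ cat /sys/class/tacho-motor/motor*/address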

So we open a second ssh session to our EV3 and run in the first session:

unbuffer pocketsphinx_continuous -kws keyphrase_list.txt -adcdev hw:1,0 -lm 0772.lm -dict 0772.dic -inmic yes -logfn /dev/null > pipe

and in the second session:

cat pipe | ./controller.sh

And presto!

The robot is a RileyRover, a “very quick to build” design from Damien Kee.