LEGO Voice Control

This post is part 1 of 2 of LEGO Voice Control

This is going to be (I hope) the first of a series of posts about voice recognition.

I decided to control my LEGO RC Tracked Racer with my recent FTDI-based IR transmitter. While reading some blogs I found myself thinking… hey, I can use voice control on my Ubuntu laptop, it doesn’t seem too difficult!

So, in a nutshell:

  • install pocketsphinx
  • create a keyphrase list
  • write a bash script to parse commands and control the LEGO
  • glue it all together

There are a few open source speech recognition projects. I picked Sphinx from Carnegie Mellon University, mainly because it is available in Debian and Ubuntu and has a lighter version, pocketsphinx, for smaller devices like Android phones or the Raspberry Pi (of course I also thought that, with some luck and sweat, it could be used with ev3dev later on).

pocketsphinx is a command line tool, but it can also be used from Python through a library. I made some quick tests but gave up when the complexity started to increase: pyaudio and gstreamer may be OK on Ubuntu or a Raspberry Pi, but the EV3 will most probably choke, so let’s try just shell scripts first.

I decided to have 5 commands for my LEGO (4 directions and STOP). The documentation suggests that it is best to use phrases with at least 3 syllables, so I created this keyphrase_list.txt file:

move forward /1e-12/
move backward /1e-5/
turn left /1e-12/
turn right /1e-14/
stop /1e-20/

The numbers are detection thresholds. I started with /1e-10/ for all of them and then adjusted each one by trial and error to get better results. I’m not quite happy yet and will probably use just “front” and “back” instead of “forward” and “backward”.
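As a starting point for that trial and error, a list with one common threshold can be generated instead of typed by hand. This is only a sketch: make_keyphrase_list is a hypothetical helper name, not something pocketsphinx provides, and it simply prints the list to stdout in the format shown above.

```shell
#!/bin/bash
# Hypothetical helper (not part of pocketsphinx): print one keyphrase per
# line, each followed by the same detection threshold between slashes.
make_keyphrase_list() {
  local threshold="$1"
  shift
  local phrase
  for phrase in "$@"; do
    printf '%s /%s/\n' "$phrase" "$threshold"
  done
}

# Print a starting list where every phrase uses /1e-10/.
make_keyphrase_list 1e-10 \
  "move forward" "move backward" "turn left" "turn right" "stop"
```

Redirecting the output to keyphrase_list.txt gives a file that can then be edited line by line while tuning each threshold.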

I also created a Sphinx knowledge base compilation with CMU’s Sphinx Knowledge Base Tool, using a file with the same keyphrases:

move forward
move backward
turn left
turn right
stop

Your Sphinx knowledge base compilation has been successfully processed!

This generated a ‘TAR0772.tgz’ file containing 5 files:

[TXT] 0772.dic                110    Pronunciation Dictionary
[   ] 0772.lm                 1.3K   Language Model
[   ] 0772.log_pronounce      100    Log File
[   ] 0772.sent                98    Corpus (processed)
[   ] 0772.vocab               43    Word List

I made some tests using these files as parameters for the pocketsphinx_continuous command and also for the Python library, but they don’t seem to be required for the next examples. They will be used later, though 🙂

Now to test it, just run this command and start speaking:

$ pocketsphinx_continuous -inmic yes -kws keyphrase_list.txt -logfn /dev/null
READY....
Listening...
READY....
Listening...
stop
READY....
Listening...
^C

So I just use the pocketsphinx_continuous command to keep listening to what I say into the microphone (“-inmic yes”) and find my keyphrases (“-kws keyphrase_list.txt”) without filling my console with log messages (“-logfn /dev/null”).

Each time a keyphrase is detected with enough confidence it is printed, so I just need to redirect the output of this command to a shell script that parses it and sends the right IR codes to my LEGO:

#!/bin/bash

while read -a words
do

case "${words[0]}" in

  move)
    if [ "${words[1]}" = "forward" ]; then
      echo "FRONT"
      irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct FORWARD_BACKWARD
      sleep 0.2
      irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct BRAKE_BRAKE
    fi
    if [ "${words[1]}" = "backward" ]; then
      echo "BACK"
      irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct BACKWARD_FORWARD
      sleep 0.2
      irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct BRAKE_BRAKE
    fi
    ;;
  turn)
    if [ "${words[1]}" = "left" ]; then
      echo "LEFT"
      irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct FORWARD_FORWARD
      sleep 0.2
      irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct BRAKE_BRAKE
    fi
    if [ "${words[1]}" = "right" ]; then
      echo "RIGHT"
      irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct BACKWARD_BACKWARD
      sleep 0.2
      irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct BRAKE_BRAKE
    fi    
    ;;

  stop)
    echo "STOP"
    irsend -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct BRAKE_BRAKE
    ;;

  *)
    echo "?"
    ;;

esac

done

Not pretty but it works – we can test in the command line like this:

$ echo "move forward" | ./transmitter.sh
FRONT

Of course, the ‘irsend’ commands only work if lircd is running and controlling an IR transmitter.
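The send, wait, brake pattern repeats four times in the script above, so it could be folded into a function. The version below is a sketch rather than the script I actually ran: IRSEND, send_ir, pulse and handle_phrase are names made up for this example, and the IRSEND variable is there only so that the logic can be tried without lircd.

```shell
#!/bin/bash
# Sketch of a more compact transmitter script. IRSEND, send_ir, pulse and
# handle_phrase are hypothetical names introduced for this example.

# Set IRSEND=echo for a dry run that prints the arguments instead of
# talking to lircd.
IRSEND="${IRSEND:-irsend}"

# Send one LEGO_Combo_Direct code through lircd.
send_ir() {
  "$IRSEND" -d /var/run/lirc/lircd SEND_ONCE LEGO_Combo_Direct "$1"
}

# Run the motors for 0.2 seconds, then brake both of them.
pulse() {
  send_ir "$1"
  sleep 0.2
  send_ir BRAKE_BRAKE
}

# Map one recognized phrase (passed as separate words) to an action.
handle_phrase() {
  case "$*" in
    "move forward")  echo "FRONT"; pulse FORWARD_BACKWARD ;;
    "move backward") echo "BACK";  pulse BACKWARD_FORWARD ;;
    "turn left")     echo "LEFT";  pulse FORWARD_FORWARD ;;
    "turn right")    echo "RIGHT"; pulse BACKWARD_BACKWARD ;;
    "stop")          echo "STOP";  send_ir BRAKE_BRAKE ;;
    *)               echo "?" ;;
  esac
}

# Main loop, one phrase per input line. It is commented out here so the
# functions can be sourced and tried on their own first.
# while read -r -a words; do handle_phrase "${words[@]}"; done
```

With IRSEND=echo, calling handle_phrase stop prints STOP followed by the arguments that would have been passed to irsend. One difference from the original script: this version matches the whole phrase, so a line with extra words after “move forward” falls through to “?”.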

Now, to glue everything together we need a trick: the Ubuntu version of pocketsphinx doesn’t flush stdout, so piping its output to my script wasn’t working. I found that I need to use the “unbuffer” command from the “expect” package, together with a named pipe:

$ sudo apt install expect
$ mkfifo pipe

So in one console window I send the output, unbuffered, to the pipe I created:

$ unbuffer pocketsphinx_continuous -inmic yes -kws keyphrase_list.txt -logfn /dev/null > pipe

And in another console window I read the pipe and send it to the transmitter.sh script:

$ cat pipe | ./transmitter.sh
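The named pipe plumbing can be tried on its own, without a microphone or an IR transmitter. In this sketch printf stands in for pocketsphinx_continuous and a plain read loop stands in for transmitter.sh; fifo_demo and the temporary directory are made up for the example.

```shell
#!/bin/bash
# Hypothetical demo of the named-pipe plumbing with stand-ins for both ends.
fifo_demo() {
  local dir pipe line
  dir="$(mktemp -d)" || return 1
  pipe="$dir/pipe"
  mkfifo "$pipe" || return 1

  # Writer (stand-in for the recognizer): runs in the background and
  # blocks until a reader opens the pipe.
  printf 'move forward\nstop\n' > "$pipe" &

  # Reader (stand-in for transmitter.sh): gets one phrase per line and
  # stops when the writer closes its end of the pipe.
  while read -r line; do
    echo "got: $line"
  done < "$pipe"

  wait
  rm -rf "$dir"
}

fifo_demo
```

The writer and the reader only meet through the file system entry created by mkfifo, which is why the two commands can live in separate console windows.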

And that’s it.