Make your own Voice Command App using Java and Sphinx4

Hello and welcome to another Java tutorial. In this tutorial we’ll create a voice command application using Java and the Sphinx4 speech recognition library.

If you are new to the term Voice Command, there are plenty of real-world examples. If you are an Android user, you have probably used the Google app: you say “Ok Google”, it listens for your command, and if you say something like “open google” it launches Chrome and opens Google.com.

Now, when you speak into your mic, the computer cannot understand what you are saying on its own. We need to give it the ability to recognize the words we say and convert them into a form it can work with, which is the idea behind the term Speech Recognition. You might be wondering how in the world we are going to do that. We don’t have to worry, because a library called Sphinx4 does all the complex work for us; we only have to call a few methods to build our Voice Command app.

Approach

So what is it that our app is going to do?

  1. We will speak a command like “open terminal” in our mic
  2. Sphinx4 detects and recognizes the words that we speak
  3. Sphinx4 outputs the recognized words
  4. We compare the words to our list of commands
  5. If a command exists, it’ll execute a certain task
  6. Wait for another command and repeat step 2

That is our basic approach to creating the Voice Command application.
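The steps above can be tried out without a microphone by feeding the loop a fixed list of phrases in place of Sphinx4’s output. This is only a sketch: the class name ApproachSketch and the handle method are made up for illustration, and the real recognizer is wired in later in the tutorial.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of steps 3-6. The recognized phrases come from a plain
// list here, standing in for whatever Sphinx4 would output.
class ApproachSketch {
    // Our list of commands (step 4).
    static final Set<String> COMMANDS = new HashSet<>(Arrays.asList(
            "open terminal", "close terminal"));

    // Returns the phrases that matched a known command, in the order heard.
    static List<String> handle(List<String> recognizedPhrases) {
        List<String> executed = new ArrayList<>();
        for (String phrase : recognizedPhrases) {            // step 6: keep listening
            String normalized = phrase.trim().toLowerCase(Locale.ROOT);
            if (COMMANDS.contains(normalized)) {             // steps 4 and 5
                executed.add(normalized);                    // a real app would run a task here
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        System.out.println(handle(Arrays.asList("open terminal", "hello", "Close Terminal")));
    }
}
```

Phrases that are not in the command list are simply ignored, which is the behaviour we want when the recognizer mishears something.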

Requirements

For this app, you’ll require the following:

  1. Java 8
  2. Sphinx4 (Download the latest Alpha 5 Version)
    Go to https://oss.sonatype.org/#nexus-search and download the Alpha 5 sphinx4-core.jar and sphinx4-data.jar files.
  3. Netbeans IDE
  4. A good quality Microphone.

About Models

There are basically three models required for speech recognition in Sphinx4:

  1. Acoustic Model
  2. Phonetic Dictionary (File ends with .dic extension)
  3. Language Model (File ends with .lm extension)

The sphinx4-data.jar ships with the English acoustic model by default, so we will use that. If you are working with another language, you’ll have to download its model from Here.

Since we are creating a Voice Command app, we’ll create our own Language Model and Phonetic Dictionary, because our vocabulary is limited to our commands. Now let’s create the files we need.

Creating Language Model & Dictionary

As said above, our vocabulary is limited, so making the model and dictionary will be a breeze thanks to the Sphinx Online Base Generator. But first we have to make a corpus file (the text from which the Language Model is built) containing our commands. For this tutorial I’ll be choosing 4 commands.

  1. open file manager
  2. open browser
  3. close browser
  4. close file manager

Now type these commands into a text file (for example corpus.txt) and save it. Then navigate to the Sphinx Online Base Generator, click Choose File and select your corpus text file. In response the site will give you a list of files. For now we are interested in the files ending with the .dic and .lm extensions, so download them and place them in your project folder.

Importing Jar Files

We’ll create a new Java Project in NetBeans and then import the jar files required by Sphinx4. Once you have created your project, go to

Run > Set Project Configuration > Customize > Libraries > Add JAR/Folder

Now select the 2 jar files you downloaded earlier, sphinx4-core.jar and sphinx4-data.jar.

Press OK and you are all set. Now let’s get to the coding part.

Coding the Application

Now that we are done creating and importing the required files, we have to create a Configuration object and pass it to the Recognizer so that it can make use of them. Create a new class called VoiceLauncher in the project, which will serve as our main class.

//Imports
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;
import java.io.IOException;

/**
 *
 * @author ex094
 */
public class VoiceLauncher {
    public static void main(String[] args) throws IOException {
        // Configuration Object
        Configuration configuration = new Configuration();

        // Set path to the acoustic model.
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        // Set path to the dictionary.
        configuration.setDictionaryPath("PATH_TO_YOUR_.DIC_FILE");
        // Set path to the language model.
        configuration.setLanguageModelPath("PATH_TO_YOUR_.LM_FILE");
      }

}

Replace PATH_TO_YOUR_.DIC_FILE with the path to the .dic file and PATH_TO_YOUR_.LM_FILE with the path to the .lm file you downloaded from the Sphinx Online Base Generator in the Creating Language Model & Dictionary section.
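A wrong path is an easy mistake at this step, so it can help to confirm that both files exist before handing them to Sphinx4. The sketch below uses only the standard java.nio.file API; the class name ModelFiles and the method requireReadable are hypothetical helpers, not part of Sphinx4. It is meant for the dictionary and language model files on disk, not for the "resource:" acoustic model path.

```java
import java.io.FileNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: checks that a model file can be read before it is
// passed to the Sphinx4 Configuration object.
class ModelFiles {
    // Returns the location unchanged, or throws if the file is missing or unreadable.
    static String requireReadable(String location) throws FileNotFoundException {
        Path path = Paths.get(location);
        if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
            throw new FileNotFoundException("Cannot read model file: " + path.toAbsolutePath());
        }
        return location;
    }

    public static void main(String[] args) {
        // Pass the .dic and .lm paths as program arguments to check them.
        for (String location : args) {
            try {
                System.out.println("OK: " + requireReadable(location));
            } catch (FileNotFoundException e) {
                System.out.println(e.getMessage());
            }
        }
    }
}
```

Since the method returns the path it was given, it could be wrapped around the arguments to setDictionaryPath and setLanguageModelPath so that a typo fails early with a readable message.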

The configuration object is now set and we need to pass it to the recognizer. We also need the recognizer to use our microphone as its input source. Thankfully the latest version (Alpha 5) makes this really easy: we just have to create a LiveSpeechRecognizer object, pass in the configuration and call the startRecognition method.

//Imports
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;
import java.io.IOException;

/**
 *
 * @author ex094
 */
public class VoiceLauncher {
    public static void main(String[] args) throws IOException {
        //Configuration Object
        Configuration configuration = new Configuration();

        // Set path to the acoustic model.
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        // Set path to the dictionary.
        configuration.setDictionaryPath("PATH_TO_YOUR_.DIC_FILE");
        // Set path to the language model.
        configuration.setLanguageModelPath("PATH_TO_YOUR_.LM_FILE");

        //Recognizer Object, Pass the Configuration object
        LiveSpeechRecognizer recognize = new LiveSpeechRecognizer(configuration);

        //Start Recognition Process (The bool parameter clears the previous cache if true)
        recognize.startRecognition(true);
     }

}

Now that the recognition process has started, the recognizer will pick up your speech whenever you speak into the mic and process it. For the voice command app we need to know which command the user gave, so we need the recognizer to give us the text it recognized from the speech.

For that we will use the getHypothesis() method of the SpeechResult object. Using a while loop, we can keep fetching results for everything the user says.

//Imports
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;
import java.io.IOException;

/**
 *
 * @author ex094
 */
public class VoiceLauncher {
    public static void main(String[] args) throws IOException {
        //Configuration Object
        Configuration configuration = new Configuration();

        // Set path to the acoustic model.
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        // Set path to the dictionary.
        configuration.setDictionaryPath("PATH_TO_YOUR_.DIC_FILE");
        // Set path to the language model.
        configuration.setLanguageModelPath("PATH_TO_YOUR_.LM_FILE");

        //Recognizer object, Pass the Configuration object
        LiveSpeechRecognizer recognize = new LiveSpeechRecognizer(configuration);

        //Start Recognition Process (The bool parameter clears the previous cache if true)
        recognize.startRecognition(true);

        //Create SpeechResult Object
        SpeechResult result;

        //Checking if recognizer has recognized the speech
        while ((result = recognize.getResult()) != null) {
            //Get the recognize speech
            String command = result.getHypothesis();
        }
    }
}

The command variable stores the recognized speech (the command that you speak) as a String, so we can check whether it matches any command in our list and then act on it. We will be using if conditions, but you could also use a switch statement.

//Imports
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;
import java.io.IOException;

/**
 *
 * @author ex094
 */
public class VoiceLauncher {
    public static void main(String[] args) throws IOException {
        //Configuration Object
        Configuration configuration = new Configuration();

        // Set path to the acoustic model.
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        // Set path to the dictionary.
        configuration.setDictionaryPath("/home/ex094/Desktop/4220.dic");
        // Set path to the language model.
        configuration.setLanguageModelPath("/home/ex094/Desktop/4220.lm");

        //Recognizer object, Pass the Configuration object
        LiveSpeechRecognizer recognize = new LiveSpeechRecognizer(configuration);

        //Start Recognition Process (The bool parameter clears the previous cache if true)
        recognize.startRecognition(true);

        //Creating SpeechResult object
        SpeechResult result;

        //Check if recognizer recognized the speech
        while ((result = recognize.getResult()) != null) {

            //Get the recognized speech
            String command = result.getHypothesis();
            //Match recognized speech with our commands
            if(command.equalsIgnoreCase("open file manager")) {
                System.out.println("File Manager Opened!");
            } else if (command.equalsIgnoreCase("close file manager")) {
                System.out.println("File Manager Closed!");
            } else if (command.equalsIgnoreCase("open browser")) {
                System.out.println("Browser Opened!");
            } else if (command.equalsIgnoreCase("close browser")) {
                System.out.println("Browser Closed!");
            }
        }
    }
}

Since the recognized speech is stored in our command variable, we can compare it using ordinary String comparison. Now run the code and speak one of the 4 commands into your mic. If you say “open file manager”, it should print “File Manager Opened!”.
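For comparison, here is how the same matching might look with a switch statement. Java’s switch on strings is case-sensitive, so this sketch lowercases the hypothesis first to keep the behaviour of equalsIgnoreCase. The class name SwitchMatcher and the method describe are made up for illustration.

```java
import java.util.Locale;

// Hypothetical switch-based version of the if-else chain above.
class SwitchMatcher {
    // Maps a recognized phrase to the message to print,
    // or returns null if the phrase is not one of our commands.
    static String describe(String command) {
        switch (command.trim().toLowerCase(Locale.ROOT)) {
            case "open file manager":
                return "File Manager Opened!";
            case "close file manager":
                return "File Manager Closed!";
            case "open browser":
                return "Browser Opened!";
            case "close browser":
                return "Browser Closed!";
            default:
                return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("OPEN BROWSER"));
    }
}
```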

After your testing is complete, it’s time to add real commands, like one that actually opens the file manager when you say “open file manager”. We will store the shell command in a variable and then use Runtime.exec(), which returns a Process, to execute it.

//Imports
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;
import java.io.IOException;

/**
 *
 * @author ex094
 */
public class VoiceLauncher {
    public static void main(String[] args) throws IOException {
        //Configuration Object
        Configuration configuration = new Configuration();

        // Set path to the acoustic model.
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        // Set path to the dictionary.
        configuration.setDictionaryPath("/home/ex094/Desktop/4220.dic");
        // Set path to the language model.
        configuration.setLanguageModelPath("/home/ex094/Desktop/4220.lm");

        //Recognizer object, Pass the Configuration object
        LiveSpeechRecognizer recognize = new LiveSpeechRecognizer(configuration);

        //Start Recognition Process (The bool parameter clears the previous cache if true)
        recognize.startRecognition(true);

        //Creating SpeechResult object
        SpeechResult result;

        //Check if recognizer recognized the speech
        while ((result = recognize.getResult()) != null) {

            //Get the recognized speech
            String command = result.getHypothesis();
            String work = null;
            Process p;

            if(command.equalsIgnoreCase("open file manager")) {
                work = "nautilus";
            } else if (command.equalsIgnoreCase("close file manager")) {
                work = "pkill nautilus";
            } else if (command.equalsIgnoreCase("open browser")) {
                work = "google-chrome";
            } else if (command.equalsIgnoreCase("close browser")) {
                work = "pkill google-chrome";
            }
            //In case command recognized is none of the above hence work might be null
            if(work != null) {
                //Execute the command
                p = Runtime.getRuntime().exec(work);
            }
        }
    }
}

Run this code and speak the “open browser” command into your mic; it should open the browser. Test all the other commands as well.
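The commands above (nautilus, pkill, google-chrome) are Linux-specific. One way to make the launcher a little more portable is to pick the program per operating system and hand it to ProcessBuilder as a list of arguments instead of a single string. This is a sketch under assumptions: the class CommandTable and its methods are hypothetical, the Windows entries are the ones suggested in the comments at the end of this post, and chrome has to be on the PATH (or be replaced with the full path to chrome.exe).

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical lookup of the program to run for each spoken command.
class CommandTable {
    // Returns the program and its arguments for a spoken command,
    // or an empty list when the phrase is not a known command.
    static List<String> argsFor(String spoken, boolean windows) {
        switch (spoken.trim().toLowerCase(Locale.ROOT)) {
            case "open file manager":
                return windows ? Arrays.asList("explorer") : Arrays.asList("nautilus");
            case "close file manager":
                return windows ? Arrays.asList("taskkill", "/IM", "explorer.exe", "/F")
                               : Arrays.asList("pkill", "nautilus");
            case "open browser":
                return windows ? Arrays.asList("chrome") : Arrays.asList("google-chrome");
            case "close browser":
                return windows ? Arrays.asList("taskkill", "/IM", "chrome.exe", "/F")
                               : Arrays.asList("pkill", "google-chrome");
            default:
                return Collections.emptyList();
        }
    }

    // Detects Windows from the standard os.name system property.
    static boolean isWindows() {
        return System.getProperty("os.name").toLowerCase(Locale.ROOT).startsWith("windows");
    }

    // Starts the program for a spoken command; returns false if the phrase is unknown.
    static boolean run(String spoken) throws IOException {
        List<String> args = argsFor(spoken, isWindows());
        if (args.isEmpty()) {
            return false;
        }
        new ProcessBuilder(args).start();
        return true;
    }
}
```

Inside the recognition loop, a call such as CommandTable.run(command) could stand in for the if-else chain and the Runtime.exec call. Passing the arguments as a list also avoids problems with program paths that contain spaces.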

Adding more commands

To add more commands, add them to your existing corpus.txt file, repeat the steps from the Creating Language Model & Dictionary section, and add a matching branch for each new command in the code.

If-Else Spaghetti

There might come a time when you have a lot of commands in your program, and putting them all in if-else branches would be an absolute mess. So what to do? The best thing would be to load all the commands from the corpus text file into a Hashtable (or HashMap) and map each speech command to its respective executable command. I’ll add updated code to the GitHub repo of this tutorial in case you need it.
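Here is a minimal sketch of that idea. It assumes the mapping is built in code with a HashMap rather than loaded from the corpus file, since the corpus only lists the spoken phrases and the executable for each phrase has to come from somewhere else. The class name CommandMap and the lookup method are hypothetical and are not taken from the GitHub repo.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical table-driven replacement for the if-else chain.
class CommandMap {
    // Spoken phrase -> shell command, using the Linux commands from this tutorial.
    private static final Map<String, String> COMMANDS = new HashMap<>();

    static {
        COMMANDS.put("open file manager", "nautilus");
        COMMANDS.put("close file manager", "pkill nautilus");
        COMMANDS.put("open browser", "google-chrome");
        COMMANDS.put("close browser", "pkill google-chrome");
    }

    // Returns the shell command for a recognized phrase, or null if there is none.
    static String lookup(String hypothesis) {
        return COMMANDS.get(hypothesis.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(lookup("Open Browser"));
    }
}
```

Inside the recognition loop, String work = CommandMap.lookup(result.getHypothesis()); would then take the place of the whole chain, with the existing work != null check left as it is. Adding a command becomes a single put call.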

Github Repo: A Simple Voice Command App powered by Java and Sphinx4

That’s it for this tutorial, have fun with your voice command app 🙂
Regards,
Ex094

 

 


28 thoughts on “Make your own Voice Command App using Java and Sphinx4”

  1. Hey man do you mind assisting me? I coded the same thing (win10) and I got this. It seems as if the code can’t find any exe.
    18:38:52.673 INFO
    Exception in thread “main” java.io.IOException: Cannot run program “google-chrome”: CreateProcess error=2, The system cannot find the file specified
    at java.lang.ProcessBuilder.start(Unknown Source)
    at java.lang.Runtime.exec(Unknown Source)
    at java.lang.Runtime.exec(Unknown Source)
    at java.lang.Runtime.exec(Unknown Source)
    at VoiceLauncher.main(VoiceLauncher.java:51)
    Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
    at java.lang.ProcessImpl.create(Native Method)
    at java.lang.ProcessImpl.<init>(Unknown Source)
    at java.lang.ProcessImpl.start(Unknown Source)
    … 5 more

      1. Same result unfortunately, here is my code.
        //Imports
        import edu.cmu.sphinx.api.Configuration;
        import edu.cmu.sphinx.api.LiveSpeechRecognizer;
        import edu.cmu.sphinx.api.SpeechResult;
        import java.io.IOException;

        /**
        *
        * @author ex094
        */
        public class VoiceLauncher {
            public static void main(String[] args) throws IOException {
                // Configuration Object
                Configuration configuration = new Configuration();

                // Set path to the acoustic model.
                configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
                // Set path to the dictionary.
                configuration.setDictionaryPath("/Users/melko/Desktop/9535.dic");
                // Set path to the language model.
                configuration.setLanguageModelPath("/Users/melko/Desktop/9535.lm");
                //Recognizer object, Pass the Configuration object
                LiveSpeechRecognizer recognize = new LiveSpeechRecognizer(configuration);

                //Start Recognition Process (The bool parameter clears the previous cache if true)
                recognize.startRecognition(true);

                //Creating SpeechResult object
                SpeechResult result;

                //Check if recognizer recognized the speech
                while ((result = recognize.getResult()) != null) {

                    //Get the recognized speech
                    String command = result.getHypothesis();
                    String work = null;
                    Process p;

                    if(command.equalsIgnoreCase("open file manager")) {
                        work = "nautilus";
                    } else if (command.equalsIgnoreCase("close file manager")) {
                        work = "pkill nautilus";
                    } else if (command.equalsIgnoreCase("open browser")) {
                        work = "google-chrome";
                    } else if (command.equalsIgnoreCase("close browser")) {
                        work = "pkill google-chrome";
                    }
                    //In case command recognized is none of the above hence work might be null
                    if(work != null) {
                        //Execute the command
                        p = Runtime.getRuntime().exec(work);
                    }
                }
            }
        }

    1. I don’t think that’s an issue, But you need to change


      if(command.equalsIgnoreCase("open file manager")) {
          work = "nautilus";
      } else if (command.equalsIgnoreCase("close file manager")) {
          work = "pkill nautilus";
      } else if (command.equalsIgnoreCase("open browser")) {
          work = "google-chrome";
      } else if (command.equalsIgnoreCase("close browser")) {
          work = "pkill google-chrome";
      }

      To


      if(command.equalsIgnoreCase("open file manager")) {
          work = "explorer";
      } else if (command.equalsIgnoreCase("close file manager")) {
          work = "Taskkill /IM explorer.exe /F";
      } else if (command.equalsIgnoreCase("open browser")) {
          work = "chrome";
      } else if (command.equalsIgnoreCase("close browser")) {
          work = "Taskkill /IM chrome.exe /F";
      }

      These changes were made because Windows CMD commands differ a bit from Linux

      And also you must have Chrome installed, Tell me if it works

    2. Also, instead of using just “chrome”, try giving the whole path to the chrome.exe file like:
      “C:\Program Files (x86)\Google\Chrome\Application\chrome.exe”
      So the part of the code

      } else if (command.equalsIgnoreCase("open browser")) {
          work = "google-chrome";

      Becomes


      } else if (command.equalsIgnoreCase("open browser")) {
          work = "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe";

      1. Thanks man! Everything worked as it should. One concern though, when I close the explorer.exe it shuts down a vital process in the windows system. Is there any way I can open the file explorer and close it without the issue?

      2. I’m glad it’s sorted out 🙂
        Regarding the explorer, I was facing the same thing in Linux: when I issued that command, all of the file managers got closed no matter how many I had open, but no critical processes were shut down. I’m still looking for a solution, so for the time being try something else instead of the File Manager.

  2. I think I found a fix, but it’s in ubuntu Linux and I lack the skills to change it to windows. Do you mind trying this and providing a windows version? Thanks!
    When there are multiple processes using the same process name, you may want to kill one of the processes but not all. Doing a “pkill processname” will kill all processes with that name (no matter how many processes there are).

    Here’s a bash script that I wrote that allows you specify which process of the processname’s you want to kill.

    you run it like so: “script.sh processname filtername”

    This would kill all processname’s that match filtername.

    Here is script.sh:

    #!/bin/bash
    lookup=$1
    finelookup=$2

    PIDs="$(pgrep -l $lookup | tr ' ' '=' | awk -F= '{print $1}')"

    for i in ${PIDs} ; do
        toKill="$(ps -ef | grep $i | grep $finelookup)"
        if [ "$toKill" != "" ]; then
            kill -9 $i
        fi
    done

  3. I fixed the previous error but now when i run then it shows:
    Exception in thread “main” java.lang.IllegalArgumentException: Empty command
    at java.lang.Runtime.exec(Runtime.java:444)
    at java.lang.Runtime.exec(Runtime.java:347)
    at my.Heather.VoiceController.main(VoiceController.java:66)

    My code:
    /*
    * To change this license header, choose License Headers in Project Properties.
    * To change this template file, choose Tools | Templates
    * and open the template in the editor.
    */
    package my.Heather;
    import edu.cmu.sphinx.api.Configuration;
    import edu.cmu.sphinx.api.LiveSpeechRecognizer;
    import edu.cmu.sphinx.api.SpeechResult;
    import java.io.IOException;
    import javax.swing.JOptionPane;
    import java.awt.Desktop;
    import java.io.File;
    /**
    *
    * @author ****
    */
    public class VoiceController {
        public static void main(String[] args) throws Exception {
            //create configuration object and pass it to the recognizer so that it can make use of the required files
            Configuration config = new Configuration();
            config.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
            config.setDictionaryPath("8482.dic");
            config.setLanguageModelPath("8482.lm");
            /*livespeechrecogniser is for the mic,to it’d obj
            pass config and call startRecognition,
            for other three call recognizer using set_Path,for mic call startRecognition*/
            LiveSpeechRecognizer recognize = new LiveSpeechRecognizer(config);
            recognize.startRecognition(true);//true clears the prev cache
            SpeechResult res;
            while ((res = recognize.getResult()) != null) {//if app recognized the cmd
                String cmd = res.getHypothesis();//take the hypothesis in a string to display it to the user
                String w = "";
                Process p;
                if (cmd.equalsIgnoreCase("open file manager")) {
                    System.out.println("Opening File Manager");
                    w = "explorer";
                } else if (cmd.equalsIgnoreCase("close file manager")) {
                    w = "Taskkill/IM explorer.exe/F";
                    System.out.println("Closing File Manager");
                } else if (cmd.equalsIgnoreCase("open browser")) {
                    w = "chrome";
                    System.out.println("Opening Browser");
                } else if (cmd.equalsIgnoreCase("close Browser")) {
                    w = "Taskkill/IM chrome.exe/F";
                    System.out.println("Closing Browser");
                } else if (cmd.equalsIgnoreCase("praju kaju")) {
                    System.out.println("praju kaju");
                    /* try
                    {
                    File f=new File("c:\\Users\\Prakash\\Desktop\\Surprised.jpg");
                    Desktop dt=Desktop.getDesktop();
                    dt.open(f);
                    JOptionPane.showMessageDialog(null,"WHO’S THAT ????");
                    System.out.println("Displaying Image");
                    }
                    catch(Exception ee){}
                    */
                }
                if (w != null) {
                    p = Runtime.getRuntime().exec(w);
                }
            }
        }
    }

  4. Can you help me? on IMTOOL site it says “You may not need to exhastively list all possible sentences: the decoder will allow fragments to recombine into new sentences.”, is there a way to not allow fragments to recombine into new sentences?

    1. For nested commands, you can do it like this:

        
              //Check if recognizer recognized the speech
              while ((result = recognize.getResult()) != null) {
                  
                  //Get the recognized speech
                  //System.out.println(result.getHypothesis());
                  String command = result.getHypothesis();
                  
                  //Starting point for nested command
                  if(command.equalsIgnoreCase("open")) {
                      System.out.println("Awaiting Subcommand!");
                      while ((result = recognize.getResult()) != null) {
                          command = result.getHypothesis();
                          //Nested Commands
                          if(command.equalsIgnoreCase("notepad")) {
                              System.out.println("Opening Notepad!");
                          }
                          else if(command.equalsIgnoreCase("calculator")) {
                              System.out.println("Opening Calculator!");
                          }
                          //Break out of the Nested Command
                          break;
                      }
                  }
              }
      
  5. To close a window in Windows, what is the command.. I think pkill is for linux.. its not working in my code.

    1. Yes, PKILL is for Linux; on Windows it’s TASKKILL. Unfortunately, killing a program by name closes all instances of it; you cannot close a single window this way.

      1. Thanks for reply.. there is another problem I face.. I want to make a language model file for all the english word but the corpus.txt file is too large so that Sphinx Online Base Generator taking so much time and at the end is show time out error.. So can you help me to build the .lm file for my corpus.txt or is there any other tools that can help to build that.. Thanks..
