TLSphinx

TLSphinx is a Swift wrapper around Pocketsphinx, a portable library based on CMU Sphinx, that allows an application to perform speech recognition without the audio ever leaving the device.

This repository has two main parts. The first is a synthesized version of the pocketsphinx and sphinx base repositories with a module map to access the library as a Clang module. This module is accessed under the name Sphinx and has two submodules, Pocket and Base, in reference to pocketsphinx and sphinx base.

The second part is TLSphinx, a Swift framework that uses the Sphinx Clang module and exposes a Swift-like API that talks to pocketsphinx.

Note: I wrote a blog post about TLSphinx on the Tryolabs Blog. Check it out for a short history of why I wrote this.

Usage

The framework provides three classes:

  • Config describes the configuration needed to recognize speech.
  • Decoder is the main class that provides the API to perform all decoding.
  • Hypotesis is the result of a decode attempt. It has text and score properties.

Config

Represents the cmd_ln_t opaque structure in Sphinx. The default constructor takes a list of tuples of the form (param name, param value), where "param name" is the name of one of the parameters recognized by Sphinx. The examples below pass the acoustic model, the language model and the dictionary. For a complete list of recognized parameters check the Sphinx docs.

The class has a public property to turn on/off the debug info from Sphinx:

public var showDebugInfo: Bool
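
To make the (param name, param value) shape concrete, here is a minimal sketch in current Swift syntax that flattens such pairs into a flat flag/value argument list, the form that command-line style parsers usually expect. The helper flattenArguments and the paths are hypothetical and are not part of TLSphinx; how Config hands its parameters to Sphinx internally is not described in this document.

```swift
// Hypothetical helper, not part of TLSphinx: flattens (name, value) pairs
// into a flat list of the form [name1, value1, name2, value2, ...].
func flattenArguments(_ args: [(String, String)]) -> [String] {
    return args.flatMap { [$0.0, $0.1] }
}

// Placeholder paths, assumed for illustration only.
let arguments = flattenArguments([
    ("-hmm", "model/acoustic"),
    ("-lm", "model/language.lm"),
    ("-dict", "model/words.dic"),
])
print(arguments)
```

Keeping the parameters as pairs until the last moment makes it harder to pass a flag without its value.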

Decoder

Represents the ps_decoder_t opaque struct in Sphinx. The default constructor takes a Config object as a parameter.

The class provides functions to decode speech from a file or from the mic. The result is returned as an optional Hypotesis object, following the naming convention of the Pocketsphinx API. The functions are:

To decode speech from a file:

public func decodeSpeechAtPath (filePath: String, complete: (Hypotesis?) -> ())

The audio file that filePath points to must have the following characteristics:

  • single-channel (monaural)
  • little-endian
  • unheadered
  • 16-bit signed
  • PCM
  • sampled at 16000 Hz
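
The format above can be sketched in a few lines of current Swift. The helpers rawPCMData and durationInSeconds are hypothetical and are not part of TLSphinx. They assume input samples in the range -1.0...1.0 and show how those samples would be laid out as unheadered 16-bit signed little-endian PCM, and how long a raw file of a given size plays at 16000 Hz.

```swift
import Foundation

// Constants taken from the format list above.
let sampleRate = 16_000   // samples per second
let bytesPerSample = 2    // 16-bit signed, single channel

// Hypothetical helper: converts samples in -1.0...1.0 into unheadered
// 16-bit signed little-endian PCM bytes (low byte first).
func rawPCMData(from samples: [Float]) -> Data {
    var data = Data(capacity: samples.count * bytesPerSample)
    for sample in samples {
        // Treat NaN as silence and clamp everything else to the valid range.
        let clamped: Float = sample.isNaN ? 0 : max(-1.0, min(1.0, sample))
        let value = Int16(clamped * Float(Int16.max))
        let bits = UInt16(bitPattern: value)
        data.append(UInt8(bits & 0x00FF))  // low byte
        data.append(UInt8(bits >> 8))      // high byte
    }
    return data
}

// Hypothetical helper: duration of a raw file in this format.
func durationInSeconds(byteCount: Int) -> Double {
    return Double(byteCount) / Double(sampleRate * bytesPerSample)
}
```

Since the file has no header, its size is the only clue to its length: one second of audio in this format takes 32000 bytes.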

To control the size of the buffer used to read the file, the Decoder class has a public property:

public var bufferSize: Int
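
As a rough illustration of what a read buffer size controls, the sketch below reads a file in chunks of at most bufferSize bytes. It is written in current Swift syntax, and the helper readFile is hypothetical: it is not TLSphinx's implementation, and it assumes the buffer size is measured in bytes, which this document does not state.

```swift
import Foundation

// Hypothetical helper, not TLSphinx's implementation: reads the file at
// `path` in chunks of at most `bufferSize` bytes and passes each chunk to
// `body`. Returns false if the size is invalid or the file cannot be opened.
@discardableResult
func readFile(atPath path: String, bufferSize: Int, body: (Data) -> Void) -> Bool {
    guard bufferSize > 0, let handle = FileHandle(forReadingAtPath: path) else {
        return false
    }
    defer { handle.closeFile() }

    while true {
        let chunk = handle.readData(ofLength: bufferSize)
        if chunk.isEmpty { break }  // end of file
        body(chunk)
    }
    return true
}
```

A larger buffer means fewer reads with more memory held at once; a smaller one means the opposite.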

To decode a live audio stream from the mic:

public func startDecodingSpeech (utteranceComplete: (Hypotesis?) -> ())
public func stopDecodingSpeech ()

You can use the same Decoder instance many times.

Hypotesis

This struct represents the result of a decode attempt. It has a text property holding the best-scoring text and a score property holding its score. This struct implements Printable so you can print it with println(hypotesis_value).
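
For readers on newer Swift versions, where Printable was renamed CustomStringConvertible, a stand-in with the same shape might look like the sketch below. SketchHypothesis and describe are hypothetical names; the Int type of score and the description format are assumptions, not the framework's definitions.

```swift
// Hypothetical stand-in for the framework's result type. The Int score
// and the description format are assumptions.
struct SketchHypothesis: CustomStringConvertible {
    let text: String
    let score: Int

    var description: String {
        return "Text: \(text) - Score: \(score)"
    }
}

// Hypothetical handler that takes an optional result, like the completion
// closures of the decode functions do.
func describe(_ hypothesis: SketchHypothesis?) -> String {
    guard let hypothesis = hypothesis else { return "No hypothesis" }
    return String(describing: hypothesis)
}
```

Because the result is optional, every caller has to decide what to do when decoding produces nothing.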

Examples

Processing an Audio File

As an example let's see how to decode the speech in an audio file. To do so you first need to create a Config object and pass it to the Decoder constructor. With the decoder you can perform automatic speech recognition from an audio file like so:

import TLSphinx

let hmm = ...   // Path to the acoustic model
let lm = ...    // Path to the language model
let dict = ...  // Path to the language dictionary

if let config = Config(args: ("-hmm", hmm), ("-lm", lm), ("-dict", dict)) {
  if let decoder = Decoder(config:config) {
      
      let audioFile = ... // Path to an audio file
      
      decoder.decodeSpeechAtPath(audioFile) {
          
          if let hyp: Hypotesis = $0 {
              // Print the decoder text and score
              println("Text: \(hyp.text) - Score: \(hyp.score)")
          } else {
              // Can't decode any speech because of an error
          }
      }
  } else {
      // Handle Decoder() fail
  }
} else {
  // Handle Config() fail  
}

The decodeSpeechAtPath function performs the decoding in the background. Once the process finishes, the complete closure is called on the main thread.
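
The pattern described here, work in the background and the completion closure on the main thread, can be sketched with Grand Central Dispatch in current Swift syntax. The helper runInBackground is hypothetical and is not how TLSphinx is necessarily implemented; the callbackQueue parameter is an addition for illustration and defaults to the main queue.

```swift
import Foundation

// Hypothetical helper, not TLSphinx's implementation: runs `work` on a
// background queue, then delivers its result to `complete` on
// `callbackQueue` (the main queue unless another queue is given).
func runInBackground<T>(_ work: @escaping () -> T,
                        callbackQueue: DispatchQueue = .main,
                        complete: @escaping (T) -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        let result = work()  // slow work, off the main thread
        callbackQueue.async {
            complete(result) // safe place to update the UI when on main
        }
    }
}
```

In an app, the closure passed as complete can touch UI state directly, because it runs on the main queue. In a command-line tool, the main queue only delivers callbacks while a run loop or dispatchMain() is active.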

Speech from the Mic

import TLSphinx

let hmm = ...   // Path to the acoustic model
let lm = ...    // Path to the language model
let dict = ...  // Path to the language dictionary

if let config = Config(args: ("-hmm", hmm), ("-lm", lm), ("-dict", dict)) {
  if let decoder = Decoder(config:config) {
      
      decoder.startDecodingSpeech {
          
          if let hyp: Hypotesis = $0 {
              println(hyp)
          } else {
              // Can't decode any speech because of an error
          }
      }
  } else {
      // Handle Decoder() fail
  }
} else {
  // Handle Config() fail  
}

// At some point in the future, stop listening to the mic.
// Note: decoder must still be in scope here, e.g. stored in a property.
decoder.stopDecodingSpeech()

Installation

The easiest way to integrate TLSphinx is using Carthage or a similar method to get the framework bundle. This lets you integrate the framework and the Sphinx module without magic.

Carthage

In your Cartfile add a reference to the latest version of TLSphinx:

github "Tryolabs/TLSphinx" ~> 1.0.2

Then run carthage update and follow the standard installation instructions described on the Carthage site.

You must also tell Xcode where to find the Sphinx module, which is located in the Carthage checkout. To do so:

  • add $(SRCROOT)/Carthage/Checkouts/TLSphinx/Sphinx/include to Header Search Paths and mark it as recursive
  • add $(SRCROOT)/Carthage/Checkouts/TLSphinx/Sphinx/lib to Library Search Paths and mark it as recursive
  • in Swift Compiler - Search Paths add $(SRCROOT)/Carthage/Checkouts/TLSphinx/Sphinx/include to Import Paths

Manual

Download the project from this repository and drag the TLSphinx project into your Xcode project. If you encounter any errors about missing headers and/or libraries for Sphinx, add the Sphinx/include directory to your header search path and Sphinx/lib to the library search path, and mark both as recursive.

Community

Join us on Slack!

Author

BrunoBerisso

License

TLSphinx is available under the MIT license. See the LICENSE file for more info.