Automatic Speech Recognition (ASR) on Linux is becoming easier. Several packages are available for users as well as developers. This document explains the basics of speech recognition and surveys some of the available software.
Carnegie Mellon University is dedicated to speech technology research, development, and deployment, and we hope this page will be a vehicle to make our work available online. CMU has a historic position in computational speech research, and continues to test the limits of the art.
Welcome to Speech Software at CMU
These pages provide a distribution mechanism for a number of speech-related software systems developed at, hosted at, or substantially used within the CMU Speech Group. These pages are part of our continuing goal to provide state-of-the-art, stable, free software components to allow anyone to build and use speech technology systems.
The CMU Sphinx Group Open Source Speech Recognition Engines
The Sphinx Group at Carnegie Mellon University is committed to releasing the long-time, DARPA-funded Sphinx projects widely, in order to stimulate the creation of speech-using tools and applications, and to advance the state of the art both directly in speech recognition and in related areas, including dialog systems and speech synthesis.
The Open Mind Speech project is part of the Open Mind Initiative and aims to develop free (GPL) speech recognition tools and applications, as well as to collect speech data from e-citizens over the Internet. The main target will still be Linux (and other UNIX flavors). The software will be designed so that it can be easily integrated into any application, window manager, or desktop environment (KDE and GNOME). Open Mind Speech uses the Overflow environment.
Praat - Phonetically analyzes, manipulates, and synthesizes speech
'Praat' is a computer program with which phoneticians can analyze, synthesize, and manipulate speech, and create high-quality pictures for articles and theses. It has functions for speech analysis, speech synthesis, learning algorithms, labelling and segmentation, speech manipulation, listening experiments, and more.