Speech recognition, also known as automatic
speech recognition (ASR), computer speech recognition, or speech-to-text, is a
capability which enables a program to process human speech into a written
format. Our speech recognition technology is pioneering in the industry. Unlike others who
rely on third-party service providers, we develop our own unique
speech processing engine, capable of accurately handling Cantonese, Mandarin and English. Our strength is in Cantonese, and our model is optimized
from time to time.
Low error rate even in environments with
Preciseness is high even with low audio
Transcribe precisely for both long and short
A pre-defined magic word will wake up the system
Cloud/on-premises deployment flexibility
Down to a single character
Improves accuracy by 80%
Cantonese mixed with English
Speech recognition is used in a wide range of areas, from
robots to smart homes to autonomous vehicles.
Authors: Tom Ko, Vijayaditya Peddinti, Daniel Povey, Michael L. Seltzer, Sanjeev Khudanpur, March 2017
The environmental robustness of DNN-based acoustic models can be significantly improved by using multi-condition training data. However, as data collection is a costly proposition, simulation of the desired conditions is a frequently adopted strategy. In this paper we detail a data augmentation approach for far-field ASR. We examine the impact of using simulated room impulse responses (RIRs), as real RIRs can be difficult to acquire, and also the effect of adding point-source noises. We find that the performance gap between using simulated and real RIRs can be eliminated when point-source noises are added. Further we show that the trained acoustic models not only perform well in the distant-talking scenario but also provide better results in the close-talking scenario. We evaluate our approach on several LVCSR tasks which can adequately represent both scenarios.
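The augmentation idea in the abstract above (reverberate clean speech with a room impulse response, then add a reverberated point-source noise) can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_rir` is a hypothetical stand-in that produces an exponentially decaying noise tail rather than a properly simulated RIR, and the function names, signal lengths and the 10 dB SNR are all assumptions chosen for illustration.

```python
import math
import random


def toy_rir(length, decay, rng):
    """Toy impulse response: a direct path followed by a decaying random tail.

    Assumption: a stand-in for a simulated RIR, not the paper's simulator.
    """
    rir = [rng.gauss(0.0, 1.0) * math.exp(-decay * n) for n in range(length)]
    rir[0] = 1.0  # direct path
    return rir


def convolve(signal, kernel):
    """Full linear convolution; output length is len(signal) + len(kernel) - 1."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def power(x):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(speech, noise, snr_db):
    """Scale the noise so the speech-to-noise power ratio equals snr_db, then mix."""
    scale = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


def augment(speech, noise, speech_rir, noise_rir, snr_db):
    """Reverberate speech and a point-source noise with separate RIRs, then mix.

    Assumes the noise is at least as long as the speech.
    """
    rev_speech = convolve(speech, speech_rir)[: len(speech)]
    rev_noise = convolve(noise, noise_rir)[: len(speech)]
    return add_noise_at_snr(rev_speech, rev_noise, snr_db)


# Synthetic stand-ins for real recordings: a 440 Hz tone at 8 kHz and white noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
noise = [rng.gauss(0.0, 1.0) for _ in range(800)]
far_field = augment(
    speech, noise, toy_rir(64, 0.1, rng), toy_rir(64, 0.1, rng), snr_db=10.0
)
```

The speech and the noise are given different impulse responses because a point-source noise sits at a different position in the room from the talker. In practice, an array library would replace the pure-Python convolution.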
Authors: Tuomas Virtanen, Rita Singh, Bhiksha Raj
Automatic speech recognition (ASR) systems are finding increasing use in everyday life. Many of the commonplace environments where the systems are used are noisy, for example users calling up a voice search system from a busy cafeteria or a street. This can result in degraded speech recordings and adversely affect the performance of speech recognition systems. As the use of ASR systems increases, knowledge of the state-of-the-art in techniques to deal with such problems becomes critical to system and application engineers and researchers who work with or on ASR technologies. This book presents a comprehensive survey of the state-of-the-art in techniques used to improve the robustness of speech recognition systems to these degrading external influences.
Key features:
- Reviews all the main noise robust ASR approaches, including signal separation, voice activity detection, robust feature extraction, model compensation and adaptation, missing data techniques and recognition of reverberant speech.
- Acts as a timely exposition of the topic in light of more widespread use in the future of ASR technology in challenging environments.
- Addresses robustness issues and signal degradation which are both key requirements for practitioners of ASR.
- Includes contributions from top ASR researchers from leading research units in the field.