ParaASR - Speech Recognition

Speech recognition, also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text, is the capability that enables a program to convert human speech into written text.

Our speech recognition technology is pioneering in the industry. Unlike competitors that rely on third-party service providers, we develop our own speech processing engine, which accurately handles Cantonese, Mandarin and English. Our particular strength is Cantonese, and we optimize our model on an ongoing basis.

Key Features

Noise tolerant

Low error rate even in environments with background noise

Low-bandwidth tolerant

High accuracy even with low-bandwidth audio sources

Flexible recognition

Accurate transcription of both long and short sentences

Silence and magic word detectors

A predefined magic word wakes up the system

Cloud-based

Flexible cloud or on-premise deployment

Trainable and editable

Down to a single character

NLP

Improves accuracy by 80%

Mixed language capability

Optimized for Cantonese mixed with English

Applications

Speech recognition is used in a wide range of areas, from robots and smart homes to self-driving vehicles.

Robots

Smart home

IoT

Autopilot

Research Paper

A study on data augmentation of reverberant speech for robust speech recognition

Authors: Tom Ko, Vijayaditya Peddinti, Daniel Povey, Michael L. Seltzer, Sanjeev Khudanpur, March 2017
The environmental robustness of DNN-based acoustic models can be significantly improved by using multi-condition training data. However, as data collection is a costly proposition, simulation of the desired conditions is a frequently adopted strategy. In this paper we detail a data augmentation approach for far-field ASR. We examine the impact of using simulated room impulse responses (RIRs), as real RIRs can be difficult to acquire, and also the effect of adding point-source noises. We find that the performance gap between using simulated and real RIRs can be eliminated when point-source noises are added. Further we show that the trained acoustic models not only perform well in the distant-talking scenario but also provide better results in the close-talking scenario. We evaluate our approach on several LVCSR tasks which can adequately represent both scenarios.
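The augmentation recipe summarized above (reverberate clean speech with a room impulse response, then add noise) can be sketched in a few lines of Python. This is a minimal illustration, assuming mono floating-point samples that share one sample rate; the function names (`convolve`, `power`, `add_noise_at_snr`, `augment`) are hypothetical and are not taken from the paper or its toolkit. For simplicity the noise is added directly to the reverberant speech; a fuller simulation would also place the noise source in the room and convolve it with its own RIR.

```python
import math


def convolve(signal, rir):
    """Full linear convolution: simulates passing dry speech through a room (RIR)."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


def power(x):
    """Mean power of a signal."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) == snr_db, then add it."""
    # Tile (or trim) the noise to match the speech length.
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    scale = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


def augment(clean, rir, noise, snr_db):
    """Hypothetical helper: reverberate a clean utterance, then add noise at snr_db."""
    # Keep the original length so labels/alignments still line up.
    reverberant = convolve(clean, rir)[: len(clean)]
    return add_noise_at_snr(reverberant, noise, snr_db)
```

The double loop keeps the sketch dependency-free; for real-length audio an FFT-based convolution would be the usual choice for speed.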


Techniques for Noise Robustness in Automatic Speech Recognition

Authors: Tuomas Virtanen, Rita Singh, Bhiksha Raj, October 2012
Automatic speech recognition (ASR) systems are finding increasing use in everyday life. Many of the commonplace environments where the systems are used are noisy, for example users calling up a voice search system from a busy cafeteria or a street. This can result in degraded speech recordings and adversely affect the performance of speech recognition systems. As the use of ASR systems increases, knowledge of the state-of-the-art in techniques to deal with such problems becomes critical to system and application engineers and researchers who work with or on ASR technologies. This book presents a comprehensive survey of the state-of-the-art in techniques used to improve the robustness of speech recognition systems to these degrading external influences.

Key features:
- Reviews all the main noise robust ASR approaches, including signal separation, voice activity detection, robust feature extraction, model compensation and adaptation, missing data techniques and recognition of reverberant speech.
- Acts as a timely exposition of the topic in light of more widespread use in the future of ASR technology in challenging environments.
- Addresses robustness issues and signal degradation which are both key requirements for practitioners of ASR.
- Includes contributions from top ASR researchers from leading research units in the field.

Wiley Online
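Voice activity detection, one of the approaches the book reviews, is in its simplest form a per-frame energy threshold, the same basic idea behind a simple silence detector. The sketch below assumes samples normalized to [-1, 1]; the function names, the 160-sample frame (10 ms at an assumed 16 kHz rate) and the -30 dB threshold are illustrative assumptions, not ParaASR's detector or a method taken from the book.

```python
import math


def frame_energies_db(samples, frame_len):
    """Split samples into non-overlapping frames and return each frame's energy in dB."""
    energies = []
    # Any trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        e = sum(x * x for x in frame) / frame_len
        energies.append(10 * math.log10(e + 1e-12))  # epsilon avoids log(0) on digital silence
    return energies


def detect_speech(samples, frame_len=160, threshold_db=-30.0):
    """Hypothetical VAD: one boolean per frame, True if its energy exceeds the threshold."""
    return [e > threshold_db for e in frame_energies_db(samples, frame_len)]
```

A fixed energy threshold tends to misfire in exactly the noisy conditions the book addresses, which is why more robust detectors track the noise floor, smooth decisions over neighbouring frames, or use trained classifiers.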

Artificial intelligence speech recognition model for correcting spoken English teaching

Authors: Ran Duan, Yingli Wang, Haoxin Qin
Artificial intelligence speech recognition technology is an important direction in the field of human-computer interaction. The use of speech recognition technology to assist teachers in the correction of spoken English pronunciation in teaching has certain effects and can help students without being constrained by places, time and teachers. Based on artificial intelligence speech recognition technology, this paper improves and analyzes speech recognition algorithms, and uses effective algorithms as the system algorithms of artificial intelligence models. Meanwhile, based on phoneme-level speech error correction, after introducing the basic knowledge, construction and training of acoustic models, the basic process of speech cutting, including the front-end processing of speech and the extraction of feature parameters, is elaborated. In addition, this study designed a control experiment to verify and analyze the artificial intelligence speech recognition correction model. The research results show that the method proposed in this paper has a certain effect.
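The "front-end processing of speech" mentioned in this abstract commonly means pre-emphasis, splitting the signal into short overlapping frames, and applying a window before feature parameters (such as MFCCs) are extracted. The following sketch shows those generic steps only; the coefficient 0.97, the 400/160-sample frame and hop (25 ms / 10 ms at an assumed 16 kHz rate), the Hamming window and the function names are conventional choices assumed here, not details taken from the paper.

```python
import math


def pre_emphasis(x, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; boosts high frequencies before analysis."""
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def frame_signal(x, frame_len=400, hop=160):
    """Cut x into overlapping frames; a trailing partial frame is dropped."""
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]


def hamming(n_points):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1)) for n in range(n_points)]


def front_end(x, alpha=0.97, frame_len=400, hop=160):
    """Hypothetical front end: pre-emphasize, frame, then window each frame."""
    window = hamming(frame_len)
    frames = frame_signal(pre_emphasis(x, alpha), frame_len, hop)
    return [[s * w for s, w in zip(frame, window)] for frame in frames]
```

Each windowed frame would then go to a spectral analysis step (for example an FFT followed by a mel filterbank) to produce the feature vectors the acoustic model consumes.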

