
Speech Recognition

Submitted By dhiblg
ABSTRACT:
This report introduces and motivates the use of hybrid robust feature extraction techniques for continuous speech recognition of Bengali numerical digits. Speech recognizers use a parametric form of the signal to obtain the most important distinguishing features of the speech signal for the recognition task. In this paper, Linear Predictive Coding (LPC), Mel-Frequency Cepstral Coefficients (MFCC), and Perceptual Linear Prediction coefficients (PLP), along with the hybrid feature Bark Frequency Cepstral Coefficients (BFCC), are used for language identification. BFCC and Revised Perceptual Linear Prediction coefficients (RPLP) were obtained from a combination of MFCC and PLP. Two different classifiers, Vector Quantization (VQ) with Dynamic Time Warping (DTW) and the Gaussian Mixture Model (GMM), were used for classification. The experiments show a better identification rate with the hybrid feature extraction techniques than with the conventional feature extraction methods. BFCC performed better than MFCC with both classifiers. RPLP combined with GMM gave the best identification performance among all the feature extraction techniques.
Key words: Linear Predictive Coding (LPC), Perceptual Linear Prediction (PLP), Revised Perceptual Linear Prediction (RPLP), Bark Frequency Cepstral Coefficient (BFCC), Mel Frequency Cepstral Coefficient (MFCC), Vector Quantization (VQ), Gaussian Mixture Model (GMM), Dynamic Time Warping (DTW), Hidden Markov Model (HMM).

Introduction:
Speech is the predominant mode of human communication. Although much of the knowledge passed from generation to generation is in written form, speech remains the preferred mode for everyday interaction. It is natural to assume that speech will also be the preferred mode for human-machine interaction. Speech is efficient and convenient, and it leaves our hands free to perform other tasks.
Speech recognition systems fall into two broad categories. An isolated word recognition system recognizes one word at a time. To use such a system, the speaker must pause between words. A continuous speech recognition system, on the other hand, recognizes speech as we normally speak it, with words flowing together in a continuous stream. Most systems currently on the market use isolated word recognition techniques. Continuous speech recognition systems are under active development and are nearing practical use.
The basic architecture of a spoken language understanding system is shown in the block diagram below:

Fig. 1: The architecture of a speech processing system
A microphone converts the sounds produced by the speaker into a varying electric current that corresponds to the complex sound wave. To serve as input to a computer, this signal must be digitized, which is done by an analog-to-digital converter. The digitized signal is then processed to extract various features, such as the intensity of the sound at different frequencies and the change in intensity over time. The challenge for speech recognition is to reduce this data to a manageable representation. In fact, most current speech recognition systems end up classifying each segment of the signal as only one of 256 distinct categories.
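The text does not say how a segment is assigned to one of the 256 categories. One common approach, assumed here purely for illustration, is vector quantization: each segment's feature vector is mapped to the index of the nearest entry in a 256-entry codebook. In the sketch below the codebook is random placeholder data, the feature dimension of 12 is arbitrary, and the function name `quantize` is hypothetical; in practice the codebook would be trained from speech data.

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    features = np.asarray(features, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance between every feature vector and every codeword.
    diffs = features[:, None, :] - codebook[None, :, :]
    distances = np.sum(diffs ** 2, axis=2)
    # The index of the closest codeword is the category assigned to the segment.
    return np.argmin(distances, axis=1)

rng = np.random.default_rng(0)
# Placeholder codebook (assumption): 256 random codewords of dimension 12.
codebook = rng.standard_normal((256, 12))
# Placeholder features (assumption): one 12-dimensional vector per segment.
features = rng.standard_normal((50, 12))

symbols = quantize(features, codebook)
print(symbols[:10])  # each value is a category index between 0 and 255
```

After this step an utterance is represented as a sequence of small integers rather than raw samples, which is the kind of manageable representation the paragraph above describes.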
The first technique is to represent the signal as a sequence of segments. The larger the segment, the more data is available for classification, but the less sensitive the classification is to the rapid transitions that must be captured to reliably recognize stops and other transient consonants.
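The segmentation step can be sketched as follows. The frame length of 25 ms, the hop of 10 ms, and the 16 kHz sampling rate are illustrative assumptions rather than values given in this report, and the function name `split_into_frames` is hypothetical. A longer frame gives more samples per segment, while a shorter one follows rapid transitions more closely.

```python
import numpy as np

def split_into_frames(signal, frame_len, hop_len):
    """Split a 1-D signal into overlapping, fixed-length segments (frames)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        # Signal is too short to hold even one full frame.
        return np.empty((0, frame_len))
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    starts = np.arange(n_frames) * hop_len
    # Any trailing samples that do not fill a whole frame are dropped.
    return np.stack([signal[s:s + frame_len] for s in starts])

# Illustrative settings (assumptions): 16 kHz audio, 25 ms frames, 10 ms hop.
sample_rate = 16000
frame_len = sample_rate * 25 // 1000   # 400 samples per frame
hop_len = sample_rate * 10 // 1000     # 160 samples between frame starts

# One second of random noise stands in for a recorded utterance.
signal = np.random.default_rng(0).standard_normal(sample_rate)
frames = split_into_frames(signal, frame_len, hop_len)
print(frames.shape)
```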
Now the task is to characterize the signal within each segment in a way that captures the information that most reliably identifies particular speech sounds over a wide range of conditions. Two simple measures on a segment are:
Overall Intensity – intensity can be measured by the sum of the squares of the sample values in the segment, and it is a good indicator of whether the segment is part of a voiced phoneme, an unvoiced phoneme, or silence.
Peak measurement – the average time between significant intensity peaks in the signal tends to reflect the fundamental frequency in voiced speech.
Other measures on a segment can be obtained by performing a spectral analysis, using a technique such as the Fast Fourier Transform (FFT) to analyze the intensity of the signal at different frequencies. The spectral analysis can be used for many purposes, such as identifying the peaks in the spectrum that tend to reflect the formants in voiced speech, and different spectral patterns reflect different sorts of consonants. Often the spectral analysis is used as an intermediate stage for additional processing that extracts the key aspects of the spectrum. A large number of techniques are used in the literature to reduce this information to a few key features.
Once the signal processing has reduced the signal to a sequence of symbols, the speech recognition task looks more like a traditional parsing problem. Specifically, the recognizer is given a sequence of symbols and must identify the most likely sequence of words that could have generated the input. Hidden Markov Models (HMMs) can be used effectively for this task: the extracted features serve as the input to the speech recognition system, which applies HMM techniques.
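As a minimal sketch of the segment-level measures described above, the following computes the overall intensity as a sum of squares and uses an FFT to find the strongest frequency component of a segment. The 8 kHz sampling rate, the Hann window, the 200 Hz test tone, and all function names are assumptions made for illustration. It does not implement the peak-timing measure or any formant tracking.

```python
import numpy as np

def segment_energy(segment):
    """Overall intensity: the sum of the squared sample values."""
    segment = np.asarray(segment, dtype=float)
    return float(np.sum(segment ** 2))

def magnitude_spectrum(segment, sample_rate):
    """Return (frequencies in Hz, magnitudes) from a windowed real FFT."""
    segment = np.asarray(segment, dtype=float)
    # A Hann window reduces leakage caused by cutting the signal into segments.
    windowed = segment * np.hanning(len(segment))
    magnitudes = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sample_rate)
    return freqs, magnitudes

def strongest_frequency(segment, sample_rate):
    """Frequency (Hz) of the largest peak in the magnitude spectrum."""
    freqs, magnitudes = magnitude_spectrum(segment, sample_rate)
    return float(freqs[np.argmax(magnitudes)])

# Synthetic test data (assumption): a 50 ms, 200 Hz tone and a silent segment.
sample_rate = 8000
t = np.arange(400) / sample_rate
tone = np.sin(2 * np.pi * 200 * t)
silence = np.zeros(400)

print(segment_energy(tone), segment_energy(silence))
print(strongest_frequency(tone, sample_rate))
```

A high energy value separates the tone from silence, and the strongest spectral peak falls at the tone's frequency. Real speech segments would show several peaks, which is where formant analysis and further feature extraction come in.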
Previous Works in Speech Recognition:
The first speech recognizer appeared in 1952 and consisted of a device for the recognition of single spoken digits. Another early device was the IBM Shoebox, exhibited at the 1962 Seattle World's Fair. Popular speech recognition conferences held each year or two include SpeechTEK and SpeechTEK Europe, ICASSP, Eurospeech/ICSLP (now named Interspeech), and the IEEE ASRU. Conferences in the field of natural language processing, such as ACL, NAACL, EMNLP, and HLT, are beginning to include papers on speech processing. Important journals include the IEEE Transactions on Speech and Audio Processing (now named IEEE Transactions on Audio, Speech and Language Processing), Computer Speech and Language, and Speech Communication. Research is still ongoing in several areas of speech processing, and the field covers many different topics. Work on speech processing for different languages is also in progress, as is research on digit recognition in many languages. Our topic here is Bengali digit recognition, on which a good deal of research has already been done. We have listed some of our predecessors' papers in the references.
