Voice processing

Speech processing is the research area concerned with the study of speech signals and the methods needed to process them. Research areas within speech processing include:

Speech Recognition

Automatic speech recognition (ASR) is the process of converting a speech signal, captured by a telephone or by a microphone connected to a PC, into the linguistic and phonetic information uttered by the speaker. Speech is the most natural form of interaction, and ASR can be used in telephone services, in human-machine interfaces for issuing commands, in dictation systems, and so on. ASR technologies are usually based on hidden Markov models (HMMs), which can handle natural language and continuous speech.

Speech Technology for Language Learning

One of the most important applications of speech technology is language learning. In Chile, for example, one area of research interest is the use of speech technology for learning English as a second language. ASR and the estimation of prosodic parameters can be investigated as tools for evaluating pronunciation quality.

Speaker Recognition

Speaker recognition is the process of automatically recognizing who is speaking, using the speech signal as biometric information. It is divided into speaker identification (SI) and speaker verification (SV). SI is the task of associating recorded speech with one of N known speakers, so it is a 1:N classification problem. In SV, by contrast, the goal is to confirm or reject the identity claimed by a speaker, so it is a 1:1 problem.
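The difference between the 1:N and 1:1 problems can be illustrated with a minimal sketch. It assumes each speaker is represented by a fixed-length vector and that vectors are compared by cosine similarity. The speaker names, the vectors, and the 0.8 threshold are hypothetical values chosen for illustration, not part of any particular system.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(test_vec, enrolled):
    """SI (1:N): return the enrolled speaker whose model scores highest."""
    return max(enrolled, key=lambda name: cosine_similarity(test_vec, enrolled[name]))

def verify(test_vec, claimed, enrolled, threshold=0.8):
    """SV (1:1): accept the claimed identity only if its score reaches the threshold."""
    return cosine_similarity(test_vec, enrolled[claimed]) >= threshold

# Hypothetical 3-dimensional speaker vectors, for illustration only.
enrolled = {
    "alice": [1.0, 0.0, 0.0],
    "bob": [0.0, 1.0, 0.0],
    "carol": [0.0, 0.0, 1.0],
}
test_vec = [0.9, 0.1, 0.0]

best = identify(test_vec, enrolled)                 # compares against all N speakers
accept_alice = verify(test_vec, "alice", enrolled)  # compares against one claimed speaker
accept_bob = verify(test_vec, "bob", enrolled)
```

Identification always returns one of the N enrolled speakers, while verification returns an accept or reject decision whose error rates depend on the chosen threshold.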

Robust Speech Processing

Robustness is one of the main areas of interest in research on ASR and SV systems. Robustness problems of research interest in speech technology include additive noise, channel mismatch, and coding-decoding distortion.
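Additive noise is often studied by corrupting clean speech at a controlled signal-to-noise ratio (SNR). The sketch below assumes white Gaussian noise and uses a synthetic sine wave in place of real speech. The function names, the 8 kHz sampling rate, and the 10 dB target are illustrative assumptions.

```python
import math
import random

def add_noise_at_snr(signal, snr_db, rng):
    """Return the signal plus white Gaussian noise scaled to the target SNR in dB."""
    signal_power = sum(s * s for s in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    # SNR_dB = 10 * log10(signal_power / noise_power), solved for the noise power.
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [s + scale * n for s, n in zip(signal, noise)]

def measured_snr_db(clean, noisy):
    """SNR in dB computed from a clean signal and its noisy version."""
    signal_power = sum(s * s for s in clean) / len(clean)
    noise_power = sum((y - s) ** 2 for s, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)

rng = random.Random(0)  # fixed seed so the run is repeatable
# One second of a 440 Hz tone at an assumed 8 kHz sampling rate (stand-in for speech).
clean = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noisy = add_noise_at_snr(clean, 10.0, rng)
snr = measured_snr_db(clean, noisy)
```

Running a recognizer on `noisy` at several SNR values is one common way to measure how quickly accuracy degrades as noise increases.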

QoS on the Internet for Real-Time Applications

The Internet was designed for elastic traffic based on TCP, which adjusts its transmission rate to network conditions. The development of new real-time speech applications, however, has raised the problem of how to guarantee quality of service (QoS) levels.

Speech Transmission over IP

Speech transmitted over the Internet is affected by packet loss and coding-decoding distortion. Problems of research interest in this field include the accuracy of ASR over IP and the subjective evaluation of the quality of IP networks.
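The effect of packet loss on a speech stream can be simulated with a minimal sketch. It assumes independent random losses and fills each lost packet with silence. Real networks often lose packets in bursts, and real decoders usually apply more elaborate concealment. The function name and parameters are hypothetical.

```python
import random

def transmit(samples, packet_size, loss_prob, rng):
    """Split samples into packets, drop each with probability loss_prob,
    and replace every dropped packet with zeros (silence)."""
    received = []
    lost = 0
    total = 0
    for start in range(0, len(samples), packet_size):
        packet = samples[start:start + packet_size]
        total += 1
        if rng.random() < loss_prob:
            lost += 1
            received.extend([0.0] * len(packet))  # zero-fill the gap
        else:
            received.extend(packet)
    return received, lost / total

rng = random.Random(1)  # fixed seed so the run is repeatable
samples = [float(i) for i in range(1, 801)]  # placeholder for decoded speech samples
received, loss_rate = transmit(samples, packet_size=160, loss_prob=0.1, rng=rng)
```

Feeding `received` to a recognizer, or to listeners in a subjective test, gives a simple way to relate the observed loss rate to recognition accuracy or perceived quality.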

Evaluation of the Usability of Dialogue Systems

Usability measures how well users can employ an interface to achieve specific goals effectively, efficiently, and with satisfaction in a specific context of use. Usability evaluation is studied in order to optimize the design of dialogue systems from the user's point of view and to evaluate the reliability of a given service provided by ASR or SV.