10:00
Oral Session 4-KZ - Emotional Speech Analysis
Chair: Bjoern Schuller
10:00
25 mins
|
Stress and Emotion Recognition Based on Log-Gabor Filter Analysis of Speech Spectrograms
Ling He, Margaret Lech, Namunu Maddage, Nicholas Allen
Abstract: We present new methods that derive characteristic features from speech magnitude spectrograms. Two of the presented approaches have been found to be particularly efficient for automatic stress and emotion classification. In the first approach, the spectrograms are subdivided into ERB frequency bands and the average energy of each band is calculated. In the second approach, the spectrograms are passed through a bank of 12 log-Gabor filters, and the outputs are averaged and passed through an optimal feature selection procedure based on mutual information criteria. The proposed methods were tested using single vowels, words and sentences from the SUSAS database with 3 classes of stress, and spontaneous speech recordings made by psychologists (ORI) with 5 emotional classes. Classification with a Gaussian mixture model yields correct classification rates of 58%-82% for different SUSAS data sets and 40%-53% for the ORI database.
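The first approach in the abstract above (average energy per ERB band) can be illustrated with a minimal sketch. It is not the authors' implementation: the Glasberg-Moore ERB-rate formula, the number of bands (`n_bands=8`), the equal spacing of band edges on the ERB-rate scale, and the function names are all assumptions made for illustration.

```python
import math

def hz_to_erb_rate(f_hz):
    # Glasberg-Moore ERB-rate scale (assumed here; the abstract does not name a formula).
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(erb):
    # Inverse of hz_to_erb_rate.
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437

def erb_band_energies(spectrogram, freqs_hz, n_bands=8):
    """Average energy per ERB band.

    spectrogram: list of rows, one row of magnitudes per frequency bin.
    freqs_hz:    ascending centre frequency of each bin, same length as spectrogram.
    n_bands:     hypothetical band count, chosen only for this sketch.
    """
    lo_erb = hz_to_erb_rate(freqs_hz[0])
    hi_erb = hz_to_erb_rate(freqs_hz[-1])
    step = (hi_erb - lo_erb) / n_bands
    # Band edges equally spaced on the ERB-rate scale, mapped back to Hz.
    edges_hz = [erb_rate_to_hz(lo_erb + i * step) for i in range(n_bands + 1)]
    edges_hz[0], edges_hz[-1] = freqs_hz[0], freqs_hz[-1]  # guard against rounding

    sums = [0.0] * n_bands
    counts = [0] * n_bands
    for row, f in zip(spectrogram, freqs_hz):
        band = n_bands - 1  # the top edge falls into the last band
        for b in range(n_bands):
            if f < edges_hz[b + 1]:
                band = b
                break
        sums[band] += sum(m * m for m in row)  # energy = squared magnitude
        counts[band] += len(row)
    # Bands that contain no bins are reported as 0.0.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

Calling `erb_band_energies(spec, freqs)` returns one value per band, which could then serve as a feature vector for a classifier.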
|
10:25
25 mins
|
Recognition of emotions in speech by a hierarchical approach
Zhongzhe Xiao, Emmanuel Dellandrea, Weibei Dou, Liming Chen
Abstract: This paper deals with speech emotion analysis within the context of increasing awareness of the wide application potential of affective computing. Unlike most works in the literature, which mainly rely on classical frequency and energy based features along with a single global classifier for emotion recognition, we propose some new harmonic and Zipf based features for better speech emotion characterization in the valence dimension, and a multi-stage classification scheme driven by a dimensional emotion model for better emotional class discrimination. Evaluated on the Berlin dataset with 68 features and six emotion states, our approach shows its effectiveness, achieving a 68.60% classification rate and reaching 71.52% when a gender classification is applied first. On the DES dataset with five emotion states, our approach achieves an 81% recognition rate, whereas the best performance in the literature, to our knowledge, is 76.15% on the same dataset.
|
10:50
25 mins
|
Affect Sensing in Speech: Studying Fusion of Linguistic and Acoustic Features
Alexander Osherenko, Thurid Vogt, Elisabeth André
Abstract: Recently, there has been considerable interest in the recognition of affect in speech. In this paper, we investigate how information fusion using linguistic (lexical, stylometric, deictic) and acoustic information can be utilized for this purpose and present a comprehensive study of fusion. We examine fusion at the decision level and at the feature level and discuss the results obtained.
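The difference between the two fusion levels named in the abstract above can be sketched as follows. This is an illustrative assumption rather than the authors' method: the abstract does not state the combination rule, so the weighted average of class probabilities, the `weight` parameter, and the function names are hypothetical.

```python
def feature_level_fusion(linguistic_features, acoustic_features):
    # Feature-level fusion: concatenate both feature vectors so that a single
    # classifier can be trained on the joint representation.
    return list(linguistic_features) + list(acoustic_features)

def decision_level_fusion(linguistic_probs, acoustic_probs, weight=0.5):
    # Decision-level fusion: each modality has its own classifier, and only
    # their outputs are combined. A weighted average is assumed here.
    labels = sorted(set(linguistic_probs) | set(acoustic_probs))
    fused = {
        label: weight * linguistic_probs.get(label, 0.0)
        + (1.0 - weight) * acoustic_probs.get(label, 0.0)
        for label in labels
    }
    best = max(fused, key=fused.get)
    return best, fused

# Hypothetical per-modality class probabilities for one utterance.
linguistic = {"positive": 0.7, "negative": 0.3}
acoustic = {"positive": 0.4, "negative": 0.6}
label, scores = decision_level_fusion(linguistic, acoustic)
```

With equal weights, the fused score for "positive" is the mean of 0.7 and 0.4, so the linguistic classifier's stronger vote decides the label in this made-up example.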
|