16:06   Poster Session 2 – Speech Analysis & Synthesis, Affective Interfaces & Artificial Agents - 1st Balcony
Chairs: Bart van Straalen and Anton Nijholt
Mixtract: A Directable Musical Expression System
Mitsuyo Hashida, Shunji Tanaka, Haruhiro Katayose
Abstract: This paper describes a music performance design system focusing on phrasing, and the design and development of an intuitive interface to assist music performance design. The proposed interface has an editor to control the parameter curves of “dynamics” and “tempos” of hierarchical phrase structures, and supports analysis mechanisms for hierarchical phrase structures that lighten the users’ work of music interpretation. We are interested in how a system can assist users in designing music performances, rather than in developing a fully automatic system. Unlike most automatic performance rendering systems to date, the system presented here focuses on assisting the process of music interpretation and on conveying the musical interpretive intent to the system. The advantage of the proposed system was verified by the shortened time required for music performance design. The proposed system is also beneficial in that it can serve as a platform to test various possibilities of phrasing expression.
The Power of Words: Enhancing Music Mood Estimation with Textual Input of Lyrics
Ying-Sian Wu, Wei-rong Chu, Chung-Yi Chi, Daniel C Wu, Richard Tzong-Han Tsai, Jane Yung-jen Hsu
Abstract: Music mood estimation (MME) is a key technology in mood-based music recommendation. While mainstream MME research nowadays relies on audio music analysis, the significance of lyrics text in predicting song emotion has been gaining attention in recent years. One major impediment to MME research is the lack of a clearly labeled and publicly available dataset annotating the emotion ratings of lyrics text and audio separately. In light of this, we compiled a dataset of 600 pop songs (iPop) from the mood ratings of 246 participants who experienced three different song sessions: lyrics text (L), audio music track (M), and lyrics text plus audio music track (LM). We then applied statistical analysis to estimate how lyrics text and audio contribute to a song's overall valence-arousal (V-A) mood ratings. Our results show that lyrics text is not only a valid measure for estimating a song's mood ratings but also provides supplementary information that can improve audio-only MME systems. Furthermore, a detailed examination suggests that lyrics text (L) ratings are better estimators of the overall mood ratings of a song (LM) in cases where L and M ratings conflict. We then construct an MME system that employs features extracted from both lyrics text and audio music track, and use it to validate the conclusions of our statistical analysis. In estimating either the V or the A rating, the model with lyrics text plus audio track features performs better than the models with only lyrics text or only audio track features. These results support the conclusions drawn from the statistical analysis.
Pitch Envelope based Frame Level Score Reweighed Algorithm for Emotion Robust Speaker Recognition
Dongdong Li, Yingchun Yang, Ting Huang
Abstract: Speech with various emotions degrades the performance of speaker recognition systems. In this paper, a novel score normalization approach called PFLSR is introduced to compensate for the influence of affective speech on speaker recognition. The approach assumes that, for most frames, the maximum likelihood model is not easily changed by expressive corruption. The test frames are therefore divided into two parts according to F0: the heavily affected ones and the slightly affected ones. The scores of the slightly affected frames are reweighted to strengthen their confidence and to optimize the final accumulated frame scores over the whole test utterance. The experiments are conducted on the Mandarin Affective Speech Corpus. An improvement of 15.1% in identification rate over traditional speaker recognition is achieved.
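As a rough illustration of the frame-level reweighting idea in this abstract, the sketch below splits test frames by an F0 threshold and up-weights the scores of the slightly affected ones before accumulating them per speaker. The threshold rule, the weight value, and all names (`reweighted_scores`, `identify`, `f0_threshold`, `boost`) are assumptions made for illustration; the abstract does not specify them, and this is not the authors' PFLSR implementation.

```python
from typing import Dict, List


def reweighted_scores(
    frame_scores: Dict[str, List[float]],  # per-speaker, per-frame log-likelihoods
    f0: List[float],                       # per-frame pitch values in Hz
    f0_threshold: float = 250.0,           # hypothetical cut-off between the two frame groups
    boost: float = 2.0,                    # hypothetical weight for slightly affected frames
) -> Dict[str, float]:
    """Accumulate frame scores per speaker, emphasizing low-F0 frames."""
    # Assumption: frames at or below the threshold count as "slightly affected".
    weights = [boost if pitch <= f0_threshold else 1.0 for pitch in f0]
    totals: Dict[str, float] = {}
    for speaker, scores in frame_scores.items():
        if len(scores) != len(f0):
            raise ValueError(f"score/F0 length mismatch for speaker {speaker!r}")
        totals[speaker] = sum(w * s for w, s in zip(weights, scores))
    return totals


def identify(frame_scores: Dict[str, List[float]], f0: List[float], **kwargs) -> str:
    """Return the speaker with the highest accumulated (reweighted) score."""
    totals = reweighted_scores(frame_scores, f0, **kwargs)
    return max(totals, key=totals.get)
```

With `boost=1.0` the function reduces to plain score accumulation, which makes it easy to compare the decision with and without reweighting on the same frames.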
Transmission of vocal emotion: do we have to care about the listener? The case of the Italian speech corpus EMOVO
Carlo Giovannella, Riccardo Santoboni, Davide Conflitti, Andrea Paoloni
Abstract: The evaluation of emotionally colored nonsense sentences contained in the Italian vocal database EMOVO has been performed by means of a new testing tool based on Plutchik's finite-state model of emotions. The validation of the corpus also took into account the ability of the listeners to recognize a given emotion. Such a detailed analysis allowed us to identify unreliable listeners and to carry out a more accurate assessment of the vocal database and of the speakers.
Understanding Behavioral Problems in Text-based Communication Using Neuroscientific Perspective
Didem Gökçay, Şeref Arıkan, Gülsen Yıldırım
Abstract: In face-to-face communication, humans handle a variety of inputs in addition to the target content. Many affective cues such as facial expressions, body postures, and characteristics of speech, as well as environmental sensory inputs and even the mood of the interacting parties, influence the overall meaning extracted from communication. However, text-based computer mediated communication (e.g., instant messaging, email, chat) generally exhibits poor media content in terms of these inputs. In particular, peers communicating through computer mediated communication (CMC) are prone to making wrong emotional judgments. Because of the tight connectivity of emotion and cognition, emotional judgment errors cause errors in the perception of the received message and shift behavioral preference toward fearless, disinhibited, aggressive, and deceptive content in the responses. In this study, we put forward a cognitive neuroscience perspective to show the similarity between the behavioral problems brought about by text-based CMC platforms and the cognitive and emotional behavioral problems exhibited by brain-damaged patient populations. We present brief examples of behavioral deficits observed in patients with damage to the amygdala and/or orbito-frontal cortex (OFC) and show that these deficits bear striking similarities to those seen on text-based CMC platforms. While we consider ourselves to communicate similarly in face-to-face and computerized text-based environments, our brains produce dissimilar cognitive input and output in these two environments. Our conclusion is that when the communication problems introduced by the limited social cues in email and chat are seen from a neurological perspective, developing solutions for these problems becomes a priority.
Accounting for irony and emotional oscillation in computer architectures
Artemy Kotov
Abstract: We demonstrate a computer architecture that operates on semantic structures (sentence meanings or representations of events) and simulates several emotional phenomena: top-down emotional processing, hypocrisy, emotional oscillation, sarcasm and irony. The phenomena can be simulated through the interaction between emotional processing and operations on semantics. We rely on a multimodal corpus of oral exams to observe the usage of emotional expressive cues in situations of strong conflict between internal motivation and external social limitations. We apply the observations to make the computer model simulate the observed cases of combined emotional expressions.
Ambient Telephony: Designing a Communication System for Enhancing Social Presence in Home Mediated Communication
Jorge Peregrín Emparanza, Pavan Dadlani, Boris de Ruyter, Aki Harma
Abstract: The experience of telephonic communication in the home environment has remained very similar for decades: practical, but intrusive, and providing little experience of social presence. This paper presents work aimed at improving the experience of social presence in telephony. We present the results of several user studies on telephone usage and, based on these, propose the use of distributed speakerphone systems (or ambient telephones). We report empirical research comparing two different ambient telephone systems. The first is an ambient system in which the arrays of loudspeakers and microphones are embedded in the ceiling and in the home audio system around the home. In the second, we replaced the embedded system with a distributed set of clearly visible and tangible speakerphone units. We report lessons learned and implications for the design of ambient telephone systems.
Engineering affective computing: a unifying software architecture
Alexis Clay, Nadine Rouillon Couture, Laurence Nigay
Abstract: In the field of affective computing, one of the most exciting motivations is to enable a computer to sense users' emotions. To achieve this goal an interactive application has to incorporate emotional sensitivity. Following an engineering approach, the key point is then to define a unifying software architecture that allows any interactive system to become emotionally sensitive. Most research focuses on identifying and validating interpretation systems and/or emotional characteristics from different modalities. However, there is little focus on modeling a generic software architecture for emotion recognition. Therefore, we propose an integrative approach and define such a generic software architecture based on the grounding theory of multimodality. We state that emotion recognition should be multimodal and serve as a tool for interaction. As such, we use results on multimodality in interactive applications to propose the emotion branch, a component-based architecture model for emotion recognition systems that integrates itself within general models for interactive systems. The emotion branch unifies the architectures of existing emotion recognition applications following the usual three-level schema: capturing signals from sensors, extracting and analyzing emotionally relevant characteristics from the obtained data, and interpreting these characteristics into an emotion. We illustrate the feasibility and the advantages of the emotion branch with a test case that we developed for gesture-based emotion recognition.
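The three-level schema named in this abstract (capture, analysis, interpretation) can be pictured as three pluggable components chained together. The sketch below is only an illustration of that chaining; the class `EmotionPipeline`, the stand-in components, the "energy" feature, and the 0.5 threshold are all hypothetical and are not taken from the emotion branch model itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Signal = List[float]          # raw samples from a sensor
Features = Dict[str, float]   # emotionally relevant characteristics


@dataclass
class EmotionPipeline:
    """Chains the three levels: capture -> analysis -> interpretation."""
    capture: Callable[[], Signal]
    analyze: Callable[[Signal], Features]
    interpret: Callable[[Features], str]

    def run(self) -> str:
        signal = self.capture()
        features = self.analyze(signal)
        return self.interpret(features)


def fake_capture() -> Signal:
    # Stand-in for a real sensor; the values are made up for the example.
    return [0.1, 0.9, 0.4, 1.2]


def movement_energy(signal: Signal) -> Features:
    # Hypothetical feature: mean squared amplitude of the captured signal.
    return {"energy": sum(x * x for x in signal) / len(signal)}


def threshold_interpreter(features: Features) -> str:
    # Hypothetical rule mapping the feature to a coarse label.
    return "excited" if features["energy"] > 0.5 else "calm"


pipeline = EmotionPipeline(fake_capture, movement_energy, threshold_interpreter)
label = pipeline.run()
```

Because each level is passed in as a plain callable, any one component can be swapped (for example, a different sensor or interpreter) without touching the other two.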
Game Adaptivity Impact on Affective Physical Interaction
Georgios Yannakakis
Abstract: Adaptive human-computer interaction is necessary for successfully closing the affective loop within intelligent interactive systems. This paper investigates the impact of adaptivity on the physiological state and the expressed emotional preferences of users. A physical interactive game is used as a test-bed system and its real-time adaptation mechanism is evaluated using a survey experiment. Results reveal that the expressed entertainment preferences are consistent with the affective model constructed and that adaptation generates dissimilar physiological responses with respect to preferences.
Creating Emotional Communication with Interactive Artwork
Matt Iacobini, Tina Gonsalves, Chris Frith, Nadia Bianchi Berthouze
Abstract: This pilot study contributes to the building of an art installation exploring emotional contagion. The aim of the final art installation is to build an emotional communication loop with the audience. It will do so by: a) eliciting emotional responses in its audience through videos of emotional portraits, b) monitoring the audience’s facial expressions and c) reacting to or mimicking changes in the audience’s emotional expressions. The study presented in this paper aims to inform this project by creating a better understanding of how people emotionally engage with this type of artwork and what dynamics characterize the interaction. The analysis of our early experiments indicates that the system can elicit visible emotional reactions from its audience, that their reaction patterns vary in relation to their reported perceptions of the quality of the interaction, and that this loop can bring people to reflect on their own emotions.
Social Networking Service for Mobile Communities Based on Spatial Cumulative Gossiping
Arttu Akseli Lamsa, Jani Mantyjarvi
Abstract: A social networking service for mobile devices is presented and evaluated. The operating principle of the service is inspired by human-like cumulative gossiping. The service is used through a browser, and the implementation utilises standard ad hoc communication between smartphones. The cumulative gossiping protocol is built at the service level, on top of the existing ad hoc communication method. The service is evaluated with six social groups. Results suggest that there is great social demand for this kind of social networking service because of its similarity to human gossiping. In addition, a number of proposals for technical solutions to further enhance the end user experience of the service were gathered.
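To make the notion of cumulative gossiping concrete, the sketch below models devices that merge everything they know whenever they meet, so information spreads through chains of encounters. This is a generic gossip illustration under assumed names (`Node`, `meet`); the abstract does not describe the actual protocol, its message format, or how spatial proximity triggers an exchange.

```python
from typing import Iterable, Set


class Node:
    """A device that accumulates gossip items from every encounter."""

    def __init__(self, name: str, own_items: Iterable[str]) -> None:
        self.name = name
        self.known: Set[str] = set(own_items)

    def meet(self, other: "Node") -> None:
        # Assumption: an encounter (e.g. two phones within ad hoc range)
        # leaves both sides with the union of what either knew before.
        merged = self.known | other.known
        self.known = set(merged)
        other.known = set(merged)


alice = Node("alice", ["alice: concert on Friday"])
bob = Node("bob", ["bob: cafe opens at 9"])
carol = Node("carol", ["carol: lost keys"])

alice.meet(bob)   # alice and bob now share two items
bob.meet(carol)   # carol also learns alice's item, relayed via bob
```

After these two encounters, carol holds alice's item without ever meeting alice, while alice does not yet hold carol's item. That asymmetry is what makes the spreading cumulative and dependent on who has met whom.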
Simulation of the Dynamics of Virtual Characters' Emotions and Social Relations
Magalie Ochs, Nicolas Sabouret
Abstract: One of the main challenges is to give life to believable virtual characters. Research shows that emotions and social relations, which are closely related, play a key role in determining the behavior of individuals. In order to improve the believability of virtual characters' behavior, we propose in this article a method to compute virtual characters' emotions based on attitudes, and a model of their influence on the dynamics of social relations. Based on this work, a tool for simulating the evolution of the emotions and social relations of virtual characters has been implemented.
An Ambient Agent Model for Group Emotion Support
Rob Duell, Zulfiqar Ali Memon, Jan Treur, C. Natalie van der Wal
Abstract: This paper introduces an agent-based support model for group emotion, to be used by ambient systems to support teams in their emotion dynamics. Using model-based reasoning, an ambient agent analyzes the team’s emotion level for present and future time points. If the team’s emotion level is found to be becoming deficient, the ambient agent provides support to the team by proposing, for example, that the team leader give a pep talk to certain team members. The support model has been formally designed, and simulation experiments have been performed within a dedicated software environment.
Study of consumer’s emotion during product interviews
Christophe Vaudable, Christine Balague, Laurence Y. Devillers
Abstract: This study focuses on consumers’ emotions during interviews about products. We have based our analysis on the annotation of video-taped dialogs. The collected corpus has been annotated by two experts, and a perceptive test has then been carried out with 40 subjects. The interviews have shown many “real-life” complex emotions. In this paper, we mainly present results showing the impact of the context (judges’ personality and subjects’ eating habits) on the production and perception of emotional states.
Theme Detection an Exploration of Opinion Subjectivity
Amitava Das, Sivaji Bandyopadhyay
Abstract: Work in opinion mining and classification often assumes the incoming documents to be opinionated. An opinion mining system produces false hits when attempting to compute polarity values for non-subjective or factual sentences or documents. It therefore becomes imperative to decide whether a given document contains subjective information, as well as to identify which portions of the document are subjective or factual. In this work a Theme Detection technique has been developed for more generic, domain-independent subjectivity detection that classifies sentences with a binary feature: opinionated or non-opinionated. The Theme Detection technique examines sentence-level opinion and then accumulates the opinion clues to reach discourse-level subjectivity. The subjectivity detection system has been evaluated on the Multi Perspective Question Answering (MPQA) corpus as well as on a Bengali corpus. The system evaluation has shown precision and recall values of 76.08 and 83.33 for English and 72.16 and 76.00 for Bengali, respectively.
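For readers who want a single figure combining the precision and recall values quoted in this abstract, the standard F-measure can be computed as below. The function name `f_measure` is chosen here for illustration, and the resulting F1 values are derived from the quoted numbers; they are not reported in the abstract.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision <= 0.0 and recall <= 0.0:
        return 0.0  # avoid division by zero when both inputs are zero
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as quoted in the abstract (same scale as the inputs).
f1_english = f_measure(76.08, 83.33)
f1_bengali = f_measure(72.16, 76.00)
```

On the quoted figures this gives an F1 of about 79.5 for English and about 74.0 for Bengali, on the same scale as the precision and recall values.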
Protocol CINEMO: The use of fiction for collecting emotional data in naturalistic controlled oriented context
Nicolas Rollet, Agnes Delaborde, Laurence Y. Devillers
Abstract: In this study we have collected a corpus for training a model for emotion detection in the context of monitoring artificial agents by voice. In order to control the occurrence of a varied range of affective states for modeling purposes, we used acted emotional expression through play-dubbing exercises. We observed that some natural affective states occurred during those exercises. We ran a perceptive test in order to validate our annotations: anger and sadness were the most consensual annotations.
openEAR - Introducing the Munich Open-Source Emotion and Affect Recognition Toolkit
Florian Eyben, Martin Wöllmer, Björn Schuller
Abstract: Various open-source toolkits exist for speech recognition and speech processing. These toolkits have brought a great benefit to the research community, i.e., speeding up research. Yet, no such freely available toolkit exists for automatic affect recognition from speech. We herein introduce a novel open-source affect and emotion recognition engine, which integrates all necessary components in one highly efficient software package. The components include audio recording and audio file reading, state-of-the-art paralinguistic feature extraction and pluggable classification modules. In this paper we introduce the engine and extensive baseline results. Pre-trained models for four affect recognition tasks are included in the openEAR distribution. The engine is tailored for multi-threaded, incremental on-line processing of live input in real time; however, it can also be used for batch processing of databases.