Emotional Speech Corpus Construction, Annotation and Distribution

Charlie Cullen, Dublin Institute of Technology
Brian Vaughan, Dublin Institute of Technology
Spyros Kousidis, Dublin Institute of Technology

Document Type: Conference Paper

The Sixth International Conference on Language Resources and Evaluation (LREC 2008)

Abstract

Advances in speech/emotion recognition and in emotional speech synthesis depend largely on the availability of annotated emotional speech corpora. Although corpora are commonly purpose-built for specific applications or research purposes, it would be desirable to re-use existing ones. However, widely accepted standards are lacking in areas such as audio quality and the metadata annotation needed to perform queries, as are mutually agreed definitions, including that of emotion itself. The work described here is a developing process of emotional asset acquisition, annotation and on-line publishing for emotional rating by end users. It attempts to address some of the above issues while remaining flexible on practical matters such as re-usability, standardisation and access. The paper is divided into three parts: (1) a method for obtaining “genuine” emotional speech recordings, namely Mood Induction Procedures (MIP 4), while recording in a controlled environment; (2) the analysis and annotation of the recorded assets via a purpose-built audio analysis tool; and (3) an implementation of the IMDI corpus annotation schema.