This item is available under a Creative Commons License for non-commercial use only
Advances in both speech/emotion recognition and emotional speech synthesis depend largely on the availability of annotated emotional speech corpora. Although corpora are commonly purpose-built for specific applications or research purposes, it would be desirable to re-use existing ones. However, there is a lack of widely accepted standards in areas such as audio quality and metadata annotation to support queries, as well as a lack of mutually agreed definitions, as in ‘what is emotion?’. The work described here is a developing process of emotional asset acquisition, annotation and on-line publishing for emotional rating by end users. It attempts to address some of the above issues while remaining flexible on practical matters such as re-usability, standardisation and access. The paper is divided into three parts: (1) a method for obtaining “genuine” emotional speech recordings, namely Mood Induction Procedures (MIP 4), while recording in a controlled environment; (2) the analysis and annotation of the recorded assets via a purpose-built audio analysis tool; and (3) an implementation of the IMDI corpus annotation schema.
Cullen, C., Vaughan, B., Kousidis, S.: Emotional Speech Corpus Construction, Annotation and Distribution. LREC: The Sixth International Conference on Language Resources and Evaluation, 2008.