Conference papers

DIT Speech Corpus

Dermot Campbell, Technological University Dublin
Yi Wang, Technological University Dublin
John D. Kelleher, Technological University Dublin
Marty Meinardi, Technological University Dublin
Bunny Richardson, Technological University Dublin

Document Type

Conference Paper

Rights

Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence

Disciplines

Computer Sciences, Linguistics

Publication Details

3rd IVACS. Nottingham, UK. June 2006.

Abstract

DIT’s nascent speech corpus will allow a body of spoken material to be searched for features of informal native speech via a normalised transcription. Once a feature has been located, the original sound file can be played at normal speed or slowed down so that the recorded speech can be studied more closely. The DIT speech corpus treats speed of delivery as a key element in producing the elisions, assimilations, reductions and co-articulations characteristic of native-to-native dialogues. A lack of training in this spoken register can lead to poor preparation for the world of real speech and even to a degree of social exclusion. It is also envisaged that non-native speech will be included in the corpus so that comparisons can be drawn between native speech and various nativised productions of the same items. The database will therefore support multi-factorial queries, depending on user needs. The optimal segmentation of the normalised transcript is, however, far from clear, and this presentation will touch on some of the difficulties. While the tone unit, as proposed by David Brazil, for example, is attractive as a base unit for displaying the concordanced speech corpus, it raises problems when semantic segmentation and actual phonetic delivery diverge. The rationale for the currently adopted minimal unit will be explained, and members of the audience will be invited to offer feedback on any requirements their own use of corpora would place on the database.

Recommended Citation

Campbell, D. et al. (2006) DIT Speech Corpus. 3rd IVACS. Nottingham, UK. 23–24 June.

Funder

EU FP6
