Articles

Quantitative Fine-grained Human Evaluation of Machine Translation Systems: a Case Study on English to Croatian

Filip Klubicka, Technological University Dublin
Antonio Toral, University of Groningen
Victor Manuel Sanchez-Cartagena, Prompsit Language Engineering

Document Type

Article

Rights

Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence

Disciplines

Computer Sciences, Information Science, Linguistics, 6.5 OTHER HUMANITIES

Publication Details

Machine Translation.

Abstract

This paper presents a quantitative, fine-grained manual evaluation approach for comparing the performance of different machine translation (MT) systems. We build upon the well-established Multidimensional Quality Metrics (MQM) error taxonomy and implement a novel method that assesses whether the differences in performance for MQM error types between different MT systems are statistically significant. We conduct a case study for English-to-Croatian, a language direction that involves translating into a morphologically rich language, for which we compare three MT systems belonging to different paradigms: pure phrase-based, factored phrase-based and neural. First, we designed an MQM-compliant error taxonomy tailored to the relevant linguistic phenomena of Slavic languages, which made the annotation process feasible and accurate. Two annotators then annotated the errors in the MT outputs following this taxonomy. Subsequently, we carried out a statistical analysis, which showed that the best-performing system (neural) reduces the errors produced by the worst system (pure phrase-based) by more than half (54%). Moreover, we conducted an additional analysis of agreement errors in which we distinguished between short-distance (phrase-level) and long-distance (sentence-level) errors. We found that phrase-based MT approaches are of limited use for long-distance agreement phenomena, for which neural MT proved especially effective.
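
The abstract does not name the statistical test used, so the following is only an illustrative sketch of how a per-error-type comparison between two systems could be checked for significance. It assumes a Pearson chi-squared test on a 2x2 table (units with the error vs. units without, for each system). The function names, the token total and the error counts are all hypothetical and are not taken from the paper.

```python
import math


def chi_squared_2x2(errors_a, total_a, errors_b, total_b):
    """Pearson chi-squared test (no continuity correction) on a 2x2 table.

    Rows: system A, system B. Columns: units with the error, units without.
    Returns (statistic, p_value) for 1 degree of freedom.
    Assumes every expected cell count is non-zero.
    """
    table = [
        [errors_a, total_a - errors_a],
        [errors_b, total_b - errors_b],
    ]
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    grand_total = sum(row_sums)

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / grand_total
            statistic += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-squared survival function
    # equals erfc(sqrt(x / 2)), so no external library is needed.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def relative_error_reduction(errors_baseline, errors_system):
    """Fraction of the baseline's errors that the other system removes."""
    return (errors_baseline - errors_system) / errors_baseline


# Hypothetical counts for one error type (NOT results from the paper).
TOTAL_TOKENS = 1000
ERRORS_PHRASE_BASED = 200
ERRORS_NEURAL = 110

statistic, p_value = chi_squared_2x2(
    ERRORS_PHRASE_BASED, TOTAL_TOKENS, ERRORS_NEURAL, TOTAL_TOKENS
)
reduction = relative_error_reduction(ERRORS_PHRASE_BASED, ERRORS_NEURAL)

print(f"chi2 = {statistic:.2f}, p = {p_value:.2e}")
print(f"relative error reduction = {reduction:.0%}")
```

In a setup like this, the test would be repeated for each error type in the taxonomy and for each pair of systems. Because that involves many comparisons, a multiple-testing correction may also be worth considering.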

DOI

https://doi.org/10.1007/s10590-018-9214-x

Recommended Citation

Klubicka, F., Toral, A. & Sanchez-Cartagena, V. (2018). Quantitative fine-grained human evaluation of machine translation systems: a case study on English to Croatian. Machine Translation, pg. 1-21. doi.org/10.1007/s10590-018-9214-x

Included in

Computational Engineering Commons, Digital Humanities Commons, Language Interpretation and Translation Commons, Modern Languages Commons, Other Computer Engineering Commons, Slavic Languages and Societies Commons
