Document Type

Theses, Ph.D


This item is available under a Creative Commons License for non-commercial use only


Computer Sciences

Publication Details

Thesis submitted for the degree of Doctor of Philosophy to the School of Computing, Dublin Institute of Technology, October 2017.


An idiom is a multiword expression (MWE) whose meaning is non-compositional, i.e., the meaning of the expression is different from the meaning of its individual components. Idioms are complex constructions of language used creatively across almost all text genres. Idioms pose problems for natural language processing (NLP) systems due to their non-compositional nature, and the correct processing of idioms can improve a wide range of NLP systems. Current approaches to idiom processing vary in terms of the amount of discourse history required to extract the features necessary to build representations for the expressions. These features are, in general, statistics extracted from the text and often fail to capture all the nuances involved in idiom usage.

We argue in this thesis that more flexible representations must be used to process idioms in a range of idiom-related tasks. We demonstrate that high-dimensional representations allow idiom classifiers to better model the interactions between global and local features and thereby improve the performance of these systems with regard to processing idioms. In support of this thesis we demonstrate that distributed representations of sentences, such as those generated by a Recurrent Neural Network (RNN), greatly reduce the amount of discourse history required to process idioms, and that by using those representations a "general" classifier, one that can take any expression as input and classify it as either an idiomatic or a literal usage, is feasible. We also propose and evaluate a novel technique for adding an attention module to an RNN-based Language Model (RNN-LM) in order to bring forward past information. The results of our evaluation experiments demonstrate that this attention module improves the perplexity such models achieve when processing idioms. Our analysis also shows that it improves the performance of RNN-LMs on literal language and, at the same time, helps to bridge long-distance dependencies and reduce the number of parameters required in RNN-LMs to achieve state-of-the-art performance. We investigate the adaptation of this novel RNN-LM to Neural Machine Translation (NMT) systems and show that, although the results are mixed, it improves the translation of idioms into languages that require distant reordering, such as German. We also show that these models are suited to small corpora for in-domain translation for language pairs such as English/Brazilian-Portuguese.
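As an illustration of the two technical notions mentioned above, the following is a minimal sketch of attention over past hidden states and of perplexity. It is not the architecture proposed in the thesis: the dot-product scoring function, the function names (`attend_over_history`, `perplexity`) and the array shapes are all assumptions made for illustration only.

```python
import numpy as np


def softmax(x):
    # Subtract the maximum for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()


def attend_over_history(past_states, current_state):
    """Dot-product attention over past hidden states (assumed scoring function).

    past_states: array of shape (t, d), one row per earlier time step.
    current_state: array of shape (d,), the hidden state at the current step.
    Returns the context vector (shape (d,)) and the attention weights (shape (t,)).
    """
    scores = past_states @ current_state   # one score per past time step
    weights = softmax(scores)              # non-negative, sums to 1
    context = weights @ past_states        # weighted average of past states
    return context, weights


def perplexity(token_probs):
    # Perplexity is the exponential of the mean negative log-probability
    # that the model assigned to each observed token.
    token_probs = np.asarray(token_probs, dtype=float)
    return float(np.exp(-np.mean(np.log(token_probs))))
```

In a sketch like this, the context vector would typically be combined with the current hidden state before predicting the next word, which is one way past information can be "brought forward". Lower perplexity indicates a better language model: a model that assigns probability 0.25 to every observed token has a perplexity of 4.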