Document Type

Conference Paper


This item is available under a Creative Commons License for non-commercial use only


Statistics, Computer Sciences

Publication Details

Presented at the Joint Workshop on Interfaces and Human Decision Making for Recommender Systems (IntRS@RecSys 2017)


Labels representing value judgements are commonly elicited using an interval scale of absolute values. Data collected in such a manner is not always reliable. Psychologists have long recognized a number of biases to which many human raters are prone, and which result in disagreement among raters as to the true gold standard rating of any particular object. We hypothesize that the issues arising from rater bias may be mitigated by treating the data received as an ordered set of preferences rather than a collection of absolute values. We experiment on real-world and artificially generated data, finding that treating label ratings as ordinal, rather than interval data results in an increased inter-rater reliability. This finding has the potential to improve the efficiency of data collection for applications such as Top-N recommender systems; where we are primarily interested in the ranked order of items, rather than the absolute scores which they have been assigned.