This item is available under a Creative Commons License for non-commercial use only


Information Science

Publication Details

Previously appeared in Elsevier's Expert Systems with Applications, Volume 40, Issue 4, March 2013, Pages 1372–1380.


After credit has been granted, lenders use behavioural scoring to assess the likelihood of default occurring during a specific outcome period. This assessment is based on customers’ repayment performance over a given fixed period. Often the outcome period and the fixed performance period are selected arbitrarily, which causes instability in predictions. Behavioural scoring has not received the same attention from researchers as application scoring. The bias towards application scoring research can be attributed, in part, to the large volume of data required for behavioural scoring studies. Furthermore, the commercial sensitivities associated with such a large pool of customer data often prohibit the publication of work in this area. This paper focuses on behavioural scoring and evaluates the contrasting effects of altering the performance period and outcome period using seven years’ worth of data from the Irish market. The results indicate that a 12-month performance period yields an easier prediction task than historical payment periods of other lengths. This article also quantifies the differences in the classification performance of logistic regression that arise from outcome periods of different lengths. Our findings show that the performance of a logistic regression classifier degrades significantly when the outcome window is extended beyond 6 months. Finally, we consider different approaches to defining default. Typically, a customer is labelled as a default risk based on whether the account is in default either (i) at the end of the outcome period or (ii) at any time during the outcome period. This paper studies both approaches and finds that the latter results in an easier classification problem, that is, it gives the highest assurance that the classification will be correct.
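
The two default definitions described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the representation of an account as a list of monthly "months in arrears" counts, and the threshold of 3 months in arrears are all assumptions made for illustration; the abstract does not state how default is measured.

```python
from typing import List, Sequence, Tuple

# Assumed for illustration only: an account counts as "in default" in a month
# when it is at least this many months in arrears.
DEFAULT_THRESHOLD = 3


def split_windows(
    history: Sequence[int], performance_months: int, outcome_months: int
) -> Tuple[List[int], List[int]]:
    """Split a monthly arrears history into a performance window followed
    immediately by an outcome window."""
    if performance_months <= 0 or outcome_months <= 0:
        raise ValueError("window lengths must be positive")
    needed = performance_months + outcome_months
    if len(history) < needed:
        raise ValueError("history is shorter than the two windows combined")
    performance = list(history[:performance_months])
    outcome = list(history[performance_months:needed])
    return performance, outcome


def default_at_end(outcome: Sequence[int], threshold: int = DEFAULT_THRESHOLD) -> bool:
    """Definition (i): the account is in default in the last month of the outcome period."""
    return outcome[-1] >= threshold


def default_ever(outcome: Sequence[int], threshold: int = DEFAULT_THRESHOLD) -> bool:
    """Definition (ii): the account is in default in any month of the outcome period."""
    return any(months >= threshold for months in outcome)


# Hypothetical account: 12 months of performance data, then a 6-month outcome window.
history = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 2, 3, 2, 1, 0]
performance, outcome = split_windows(history, performance_months=12, outcome_months=6)

label_end = default_at_end(outcome)   # recovered by the final month
label_ever = default_ever(outcome)    # reached the threshold mid-window
```

In this hypothetical account the customer reaches the assumed threshold midway through the outcome window and then recovers, so the two definitions assign different labels. Accounts of this kind are where the choice of definition changes the classification target.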