Document Type

Dissertation

Rights

This item is available under a Creative Commons License for non-commercial use only

Disciplines

1.2 COMPUTER AND INFORMATION SCIENCE, Computer Sciences, Information Science

Publication Details

A dissertation submitted in partial fulfilment of the requirements of Dublin Institute of Technology for the degree of M.Sc. in Computing (Data Analytics).

Abstract

Developing predictive models for classification problems on imbalanced datasets is one of the basic difficulties in data mining and decision analytics. A classifier’s performance can decline dramatically when it is applied to an imbalanced dataset. Standard classifiers such as logistic regression and Support Vector Machine (SVM) are appropriate for balanced training sets but provide suboptimal classification results when used on unbalanced datasets. Using prediction accuracy as the performance metric encourages a bias towards the majority class, while the rare instances remain undetected even though the model reports a high overall accuracy. Minority instances may also be treated as noise, and vice versa (Haixiang et al., 2017). A wide range of class-imbalance learning techniques has been introduced to overcome these problems, although each has its own advantages and shortcomings. This paper details the behavior of a novel imbalanced learning technique, the Synthetic Informative Minority Over-Sampling (SIMO) algorithm leveraging Support Vector Machine (SVM), on small datasets of fewer than 200 records. The base classifiers, logistic regression and SVM, are used to validate the impact of SIMO on classifier performance in terms of the metrics G-mean and Area Under Curve. SIMO is compared with the other algorithms SMOTE, Borderline-SMOTE and ADASYN to evaluate its performance relative to theirs.
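
The bias of accuracy towards the majority class can be illustrated with a small sketch. It is not taken from the dissertation: the toy labels and the helper names `accuracy` and `g_mean` are hypothetical, and G-mean is assumed to follow the usual binary definition, the geometric mean of sensitivity and specificity.

```python
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity (binary case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sqrt(sensitivity * specificity)


# Hypothetical toy data: 95 majority records (0) and 5 minority records (1).
y_true = [0] * 95 + [1] * 5

# Predictor A always outputs the majority class.
pred_a = [0] * 100

# Predictor B misclassifies 10 majority records but finds 4 of 5 minority ones.
pred_b = [0] * 85 + [1] * 10 + [1] * 4 + [0] * 1

acc_a, gm_a = accuracy(y_true, pred_a), g_mean(y_true, pred_a)
acc_b, gm_b = accuracy(y_true, pred_b), g_mean(y_true, pred_b)

print(f"A: accuracy={acc_a:.2f}, G-mean={gm_a:.2f}")
print(f"B: accuracy={acc_b:.2f}, G-mean={gm_b:.2f}")
```

In this toy case predictor A scores higher on accuracy although it never detects a minority record, while G-mean ranks predictor B higher, which is why a metric of this kind is preferred over accuracy when classes are imbalanced.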

DOI

10.21427/D71N6B
