Document Type

Theses, Masters

Rights

This item is available under a Creative Commons License for non-commercial use only

Disciplines

Computer Sciences

Publication Details

A dissertation submitted in partial fulfilment of the requirements of Dublin Institute of Technology for the degree of M.Sc. in Computing (Data Analytics), March 2015.

Abstract

Mobile phone applications (apps) can generate background traffic when the end-user is not actively using the app. If this background traffic could be accurately identified, network operators could de-prioritise this traffic and free up network bandwidth for priority network traffic. The background app traffic should have IP packet features that could be utilised by a machine learning algorithm to identify app-generated (passive) traffic as opposed to user-generated (active) traffic. Previous research in the area of IP traffic classification focused on classifying high level network traffic types originating on a PC device. This research was concerned with classifying low level app traffic originating on mobile phone device. An innovative experiment setup was designed in order to answer the research question. A mobile phone running Android OS was configured to capture app network data. Three specific data trace procedures where then designed to comprehensively capture sample active and passive app traffic data. Feature generation in previous research recommend computing new features based on IP packet data. This research proposes a different approach. Feature generation was enabled by exposing inherent IP packet attributes as opposed to computing new features. Specific evaluation metrics were also designed in order to quantify the accuracy of the machine learning models at classifying active and passive app traffic. Three decision tree models were implemented; C5.0, C&R tree and CHAID tree. Each model was built using a standard implementation and with boosting. The findings indicate that passive app network traffic can be classified with an accuracy up to 84.8% using a CHAID decision tree algorithm with model boosting enabled. The finding also suggested that features derived from the inherent IP packet attributes, such as time frame delta and bytes in flight, had significant predictive value.

Share

COinS