Document Type

Theses, Masters


This item is available under a Creative Commons License for non-commercial use only


Computer Sciences

Publication Details

Dublin Institute of Technology, School of Computing, College of Sciences and Health, 2016.


This project originates from the Kaggle competition “Bike Sharing Demand”, which provides a dataset from Capital Bikeshare in Washington, D.C. and asks participants to combine historical usage patterns with weather data in order to forecast bike rental demand. This dissertation extends that work beyond the model-building phase to cover all phases of KDD (Knowledge Discovery in Databases). It focuses on Citi Bike, one of the biggest bike share programs in the world: it collects Citi Bike data, weather data and holiday data from three different databases and integrates them into a model-ready format. Four basic predictive models are built and compared using multiple modelling algorithms, and five techniques are then used to improve the accuracy of the random forest model; the final model’s RMSLE (root mean squared logarithmic error, with 10-fold cross-validation) decreases from 0.499 to 0.265. The dissertation draws on lessons from the Kaggle Bike Sharing Demand case study and seeks to build an optimized predictive model with the smallest error rate. The project answers the general question “How many bikes will be needed to meet users’ demand at a given future time?”; future work will focus on the activity of each docking station. In practical terms, the dissertation provides an overview of a solution to the bike rebalancing problem and helps to better manage the Citi Bike program.
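
For readers unfamiliar with the error measure quoted in the abstract, the following is a minimal sketch of how RMSLE is commonly computed, using the standard definition (the square root of the mean squared difference between log(1 + predicted) and log(1 + actual)). The function name `rmsle` and its arguments are illustrative assumptions, not code taken from the dissertation.

```python
import math


def rmsle(predicted, actual):
    """Root mean squared logarithmic error for two equal-length sequences.

    Hypothetical helper for illustration; assumes all values are >= 0,
    as is the case for rental counts.
    """
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # log1p(x) computes log(1 + x), so a count of zero is handled safely.
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for p, a in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))
```

Because the error is taken on a logarithmic scale, a prediction that is off by a given ratio is penalised similarly for quiet and busy hours, which suits demand counts that vary widely over the day.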