Prediction of bike-sharing trip counts: comparing parametric spatial regression models to a geographically weighted XGBoost algorithm

Abstract

Regression models are commonly applied in the analysis of transportation data. This research aims to broaden the range of methods used for this task by modeling the spatial distribution of bike-sharing trips in Cologne, Germany, with both parametric regression models and a modified machine learning approach, each incorporating measures to account for spatial autocorrelation. The independent variables comprise land use types, elements of the transport system, and sociodemographic characteristics. Among several regression models with different underlying distributions, a Tweedie generalized additive model is selected on the basis of its AIC, RMSE, and sMAPE values and compared to an XGBoost model. To capture spatial relationships, spatial splines are included in the Tweedie model, while the estimates of the XGBoost model are modified using geographically weighted regression. Each method has its advantages: XGBoost achieves far better RMSE and sMAPE values and therefore a better model fit, whereas the Tweedie model allows an easier interpretation of the influence of the independent variables, including spatial effects.
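
For readers unfamiliar with the two error measures named above, the following is a minimal sketch of how RMSE and sMAPE could be computed for observed and predicted trip counts. It is not taken from the study: the function names `rmse` and `smape` are illustrative, and the sMAPE variant shown (mean of 2|y − ŷ| / (|y| + |ŷ|), expressed in percent, with a zero term when both values are zero) is an assumption, since several definitions of sMAPE are in use and the abstract does not state which one was applied.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    n = len(observed)
    squared_errors = ((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sum(squared_errors) / n)


def smape(observed, predicted):
    """Symmetric mean absolute percentage error, in percent.

    Assumed variant: mean of 2*|o - p| / (|o| + |p|) * 100.
    A pair where both values are zero contributes 0, which is a
    convention chosen here for zero trip counts, not a fixed standard.
    """
    terms = []
    for o, p in zip(observed, predicted):
        denominator = abs(o) + abs(p)
        if denominator == 0:
            terms.append(0.0)
        else:
            terms.append(2.0 * abs(o - p) / denominator)
    return 100.0 * sum(terms) / len(terms)
```

Under this definition sMAPE is bounded between 0 and 200 percent, which makes it less sensitive than RMSE to a few spatial units with very high trip counts; this is one plausible reason for reporting both measures side by side.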
