Model and Algorithm Selection in Statistical Learning and Optimization
Date
2014-02-07
Abstract
Modern data-driven statistical techniques, e.g., non-linear classification and
regression machine learning methods, play an increasingly important role in applied data analysis
and quantitative research. For real-world problems, we do not know
a priori which methods will work best. Furthermore, most of the available models depend on
so-called hyper- or control parameters, which can drastically influence their performance.
This leads to a vast space of potential models, which cannot be explored exhaustively.
Modern optimization techniques, often either evolutionary or model-based, are employed to speed up
the search through this space.
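To make the tuning problem concrete, here is a minimal Python sketch that is not taken from the dissertation: a single hyperparameter (the neighborhood size `k` of a k-nearest-neighbor regressor) is tuned by estimating each candidate's error with k-fold cross-validation and searching the candidate space at random under a fixed evaluation budget. Random search is a deliberately simple stand-in for the evolutionary or model-based optimizers mentioned above; the function names, the synthetic data, and the budget are illustrative assumptions.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pair of a one-dimensional regression task


def knn_predict(train: Sequence[Point], x: float, k: int) -> float:
    """Predict y at x as the mean response of the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return statistics.fmean(y for _, y in nearest)


def cv_error(data: Sequence[Point], k: int, folds: int = 5) -> float:
    """Estimate the mean squared error of k-NN by `folds`-fold cross-validation."""
    errors: List[float] = []
    for f in range(folds):
        test = data[f::folds]  # every folds-th point forms the held-out fold
        train = [p for i, p in enumerate(data) if i % folds != f]
        errors.extend((knn_predict(train, x, k) - y) ** 2 for x, y in test)
    return statistics.fmean(errors)


def random_search(data: Sequence[Point], candidates: Sequence[int],
                  budget: int, seed: int = 0) -> Tuple[int, float]:
    """Evaluate `budget` randomly drawn candidates and keep the best one."""
    rng = random.Random(seed)
    best_k, best_err = candidates[0], math.inf
    for _ in range(budget):
        k = rng.choice(candidates)
        err = cv_error(data, k)
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err


# Illustrative (hypothetical) data: noisy samples of sin(x) on [0, 6].
rng = random.Random(42)
xs = [rng.uniform(0, 6) for _ in range(100)]
data = [(x, math.sin(x) + rng.gauss(0, 0.1)) for x in xs]

best_k, err = random_search(data, candidates=list(range(1, 41)), budget=15)
print(f"best k = {best_k}, CV MSE = {err:.4f}")
```

The budget of 15 evaluations covers only part of the 40 candidates, which mirrors the situation described above, where the space of models cannot be explored exhaustively.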
A very similar problem occurs in continuous and discrete optimization and, in general,
in many other areas where problem instances are solved by algorithmic approaches: many competing
techniques exist, some of them heavily parametrized. Again, little is known about
how to make the correct choice for a given application.
These general problems are called algorithm selection and algorithm configuration. Instead of relying on
tedious manual trial and error, one should employ the available computational power
in a methodical fashion to obtain an appropriate algorithmic choice, and support this
process with machine-learning techniques that discover and exploit as much of the
search space structure as possible.
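Treating algorithm selection as a learning problem can be illustrated with an equally minimal Python sketch, again not the dissertation's own method: each problem instance is described by a numeric feature vector, the runtimes of all candidate algorithms on a set of training instances are known, and a new instance is assigned the algorithm that was fastest on its nearest training instance. The nearest-neighbor rule, the solver names, and the runtime table are hypothetical; the "single best" algorithm (lowest mean runtime over all training instances) is a common baseline that such a per-instance selector should beat.

```python
import math
from typing import Dict, List, Sequence

Runtimes = Dict[str, float]  # algorithm name -> measured runtime on one instance


def single_best(runtimes: List[Runtimes]) -> str:
    """Baseline: the algorithm with the lowest mean runtime over all training instances."""
    algorithms = runtimes[0].keys()
    return min(algorithms, key=lambda a: sum(r[a] for r in runtimes) / len(runtimes))


def select_nn(features: List[Sequence[float]], runtimes: List[Runtimes],
              query: Sequence[float]) -> str:
    """Per-instance selection: fastest algorithm on the nearest training instance."""
    nearest = min(range(len(features)), key=lambda i: math.dist(features[i], query))
    return min(runtimes[nearest], key=runtimes[nearest].get)


# Hypothetical training set: two instance features and runtimes (seconds) of two solvers.
features = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
runtimes = [
    {"solver_a": 1.0, "solver_b": 5.0},
    {"solver_a": 1.5, "solver_b": 4.0},
    {"solver_a": 9.0, "solver_b": 2.0},
    {"solver_a": 8.0, "solver_b": 2.5},
]

print(single_best(runtimes))                        # solver_b
print(select_nn(features, runtimes, (0.12, 0.2)))   # solver_a
print(select_nn(features, runtimes, (0.9, 0.85)))   # solver_b
```

On instances with small feature values the selector picks `solver_a`, which the single-best baseline would never choose; in practice one would typically use a stronger learner than a single nearest neighbor and assess it with resampling.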
In this cumulative dissertation I summarize nine papers that deal with the problem of model and
algorithm selection in the areas of machine learning and optimization. Issues in benchmarking,
resampling, efficient model tuning, feature selection and automatic algorithm selection are addressed and
solved using modern techniques. I apply these methods to tasks from engineering, music data analysis
and black-box optimization.
The dissertation concludes by summarizing my published R packages for such tasks and specifically
discusses two packages: one for parallelization on high-performance computing clusters and one for parallel statistical
experiments.
Keywords
model selection, algorithm selection, algorithm configuration, tuning, benchmarking, machine learning