Cost-constrained feature selection in binary classification: adaptations for greedy forward selection and genetic algorithms
Date
2020-01-28
Abstract
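Background: With modern methods in biotechnology, the search for biomarkers has advanced to a challenging
statistical task exploring high-dimensional data sets. Feature selection is a widely researched preprocessing step to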
handle huge numbers of biomarker candidates and has special importance for the analysis of biomedical data. Such
data sets often include many input features not related to the diagnostic or therapeutic target variable. A less
researched, but also relevant, aspect for medical applications is the cost of different biomarker candidates. These costs
are often financial costs, but can also refer to other aspects, for example the decision between a painful biopsy marker
and a simple urine test. In this paper, we propose extensions to two feature selection methods to control the total
amount of such costs: greedy forward selection and genetic algorithms. In comprehensive simulation studies of
binary classification tasks, we compare the predictive performance, the run-time and the detection rate of relevant
features for the newly proposed methods and five baseline alternatives to handle budget constraints.
Results: In simulations with a predefined budget constraint, our proposed methods outperform the baseline
alternatives, with just minor differences between them. Only in the scenario without an actual budget constraint did our
adapted greedy forward selection approach show a clear drop in performance compared to the other methods.
However, introducing a hyperparameter to adapt the benefit-cost trade-off in this method could overcome this
weakness.
Conclusions: In feature cost scenarios where a total budget has to be met, common feature selection algorithms are
often not suitable for identifying well-performing subsets for a modelling task. Adaptations of these algorithms such as
the ones proposed in this paper can help to tackle this problem.
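To illustrate the kind of adaptation described above, the following is a minimal sketch of greedy forward selection under a total cost budget. It is not the authors' implementation: the function name, the toy `score` function, the example gains and costs, and the trade-off exponent `xi` are all hypothetical choices made for illustration. The sketch assumes that every feature has a strictly positive cost and that `score` returns a performance value where higher is better (in practice this would be a model evaluation, for example cross-validated performance).

```python
from typing import Callable, Dict, FrozenSet, List


def greedy_forward_selection(
    costs: Dict[str, float],
    score: Callable[[FrozenSet[str]], float],
    budget: float,
    xi: float = 1.0,
) -> List[str]:
    """Illustrative cost-constrained greedy forward selection.

    In each step, the affordable feature with the largest ratio
    gain / cost**xi is added. xi is a hypothetical trade-off
    hyperparameter: xi = 0 ranks candidates by gain alone, while the
    budget is still enforced. Costs are assumed to be > 0.
    """
    selected: List[str] = []
    spent = 0.0
    current = score(frozenset())
    while True:
        best_name, best_ratio, best_score = None, 0.0, current
        for name, cost in costs.items():
            # Skip features already chosen or no longer affordable.
            if name in selected or spent + cost > budget:
                continue
            candidate = score(frozenset(selected + [name]))
            gain = candidate - current
            ratio = gain / (cost ** xi)
            # Only features that strictly improve the score qualify.
            if gain > 0 and ratio > best_ratio:
                best_name, best_ratio, best_score = name, ratio, candidate
        if best_name is None:
            break  # nothing affordable improves the score
        selected.append(best_name)
        spent += costs[best_name]
        current = best_score
    return selected


# Hypothetical toy data: additive integer "performance points" per feature.
gains = {"biopsy": 30, "urine": 10, "blood": 15, "noise": 0}
costs = {"biopsy": 10.0, "urine": 1.0, "blood": 2.0, "noise": 1.0}


def toy_score(subset: FrozenSet[str]) -> float:
    # Stand-in for a real model evaluation.
    return float(sum(gains[f] for f in subset))


cost_aware = greedy_forward_selection(costs, toy_score, budget=10.0, xi=1.0)
cost_blind = greedy_forward_selection(costs, toy_score, budget=10.0, xi=0.0)
```

In this toy setting, the cost-aware ranking (`xi = 1`) picks the two cheap markers and can no longer afford the expensive one, whereas ranking by gain alone (`xi = 0`) spends the whole budget on the single most informative marker. This mirrors the benefit-cost trade-off that a hyperparameter of this kind is meant to adjust.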
Keywords
Feature cost, Genetic algorithm, Budget constraint, Cost limit, Feature selection