Modelling with feature costs under a total cost budget constraint

Jagdhuber, Rudolf

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Rahnenführer, Jörg	-
dc.contributor.author	Jagdhuber, Rudolf	-
dc.date.accessioned	2020-11-06T08:43:56Z	-
dc.date.available	2020-11-06T08:43:56Z	-
dc.date.issued	2020	-
dc.identifier.uri	http://hdl.handle.net/2003/39806	-
dc.identifier.uri	http://dx.doi.org/10.17877/DE290R-21697	-
dc.description.abstract	In modern high-dimensional data sets, feature selection is an essential pre-processing step for many statistical modelling tasks. The field of cost-sensitive feature selection extends the concepts of feature selection by introducing so-called feature costs. These do not necessarily relate to financial costs but can be seen as a general construct for numerically quantifying any disfavored aspect of a feature, such as the run-time of a measurement procedure or the patient harm of a biomarker test. There are multiple ways to define a cost-sensitive feature selection setup. The strategy applied in this thesis is to introduce an additive cost budget as an upper bound on the total costs. This extends the standard feature selection problem by an additional constraint on the sum of costs of the included features. Main areas of research in this field include adaptations of standard feature selection algorithms to account for this additional constraint. However, cost-aware selection criteria also play an important role in the overall performance of these methods and need to be discussed in detail as well. This cumulative dissertation summarizes the work of three papers in this field. Two of these introduce new methods for cost-sensitive feature selection with a fixed budget constraint. The third discusses a common trade-off criterion of performance and cost. For this criterion, an analysis of the selection outcome in different setups revealed a reduced ability to distinguish between information and noise. This can be counteracted, for example, by introducing a hyperparameter in the criterion. The presented research on new cost-sensitive methods comprises adaptations of Greedy Forward Selection, Genetic Algorithms, and filter approaches, as well as a novel Random Forest-based algorithm, which selects individual trees from a low-cost tree ensemble. Central concepts of each method are discussed, and thorough simulation studies evaluating individual strengths and weaknesses are provided. Every simulation study includes artificial as well as real-world data examples to validate results in a broad context. Finally, all chapters present discussions with practical recommendations on the application of the proposed methods and conclude with an outlook on possible further research for the respective topics.	en
dc.language.iso	en	de
dc.subject	Feature costs	en
dc.subject	Cost-sensitive learning	en
dc.subject	Feature selection	en
dc.subject	Random forest	en
dc.subject.ddc	310	-
dc.title	Modelling with feature costs under a total cost budget constraint	en
dc.type	Text	de
dc.contributor.referee	Ligges, Uwe	-
dc.date.accepted	2020-10-28	-
dc.type.publicationtype	doctoralThesis	de
dc.subject.rswk	Statistisches Modell	de
dc.subject.rswk	Merkmalsextraktion	de
dc.subject.rswk	Modellauswahl	de
dc.subject.rswk	Maschinelles Lernen	de
dcterms.accessRights	open access	-
eldorado.secondarypublication	false	de
Appears in Collections:	Statistische Methoden in der Genetik und Chemometrie

Files in This Item:

File	Description	Size	Format
Jagdhuber_Dissertation.pdf	DNB	854.26 kB	Adobe PDF

This item is protected by original copyright
