How is a data-driven approach better than random choice in label space division for multi-label classification?

Szymanski, Piotr; Kajdanowicz, Tomasz; Kersting, Kristian

Full metadata record

DC Field	Value	Language
dc.contributor.author	Szymanski, Piotr	-
dc.contributor.author	Kajdanowicz, Tomasz	-
dc.contributor.author	Kersting, Kristian	-
dc.date.accessioned	2019-11-15T12:34:48Z	-
dc.date.available	2019-11-15T12:34:48Z	-
dc.date.issued	2016-07-30	-
dc.identifier.uri	http://hdl.handle.net/2003/38382	-
dc.identifier.uri	http://dx.doi.org/10.17877/DE290R-20315	-
dc.description.abstract	We propose using five data-driven community detection approaches from social networks to partition the label space in the task of multi-label classification, as an alternative to the random partitioning into equal subsets performed by RAkELd. We evaluate modularity maximization using the fast greedy and leading eigenvector approximations, as well as the infomap, walktrap and label propagation algorithms. For this purpose, we propose to construct a label co-occurrence graph (in both weighted and unweighted versions) from the training data and to perform community detection on it to partition the label set. Each partition then constitutes the label space of a separate multi-label classification sub-problem. As a result, we obtain an ensemble of multi-label classifiers that jointly covers the whole label space. Based on the binary relevance and label powerset classification methods, we compare community detection methods for label space division against random baselines on 12 benchmark datasets over five evaluation measures. We discover that data-driven approaches are more efficient, and more likely to outperform RAkELd than binary relevance or label powerset are, in every evaluated measure. For all measures apart from Hamming loss, data-driven approaches are significantly better than RAkELd (α = 0.05), and at least one data-driven approach is more likely to outperform RAkELd than a priori methods in the case of RAkELd’s best performance. This is the largest RAkELd evaluation published to date, with 250 samplings per value for 10 values of the RAkELd parameter k on 12 datasets.	en
dc.language.iso	en	de
dc.relation.ispartofseries	Entropy;18(8)	-
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	-
dc.subject	Label space clustering	en
dc.subject	Label co-occurrence	en
dc.subject	Label grouping	en
dc.subject	Multi-label classification	en
dc.subject	Clustering	en
dc.subject	Machine learning	en
dc.subject	Random k-label sets	en
dc.subject	Ensemble classification	en
dc.subject.ddc	004	-
dc.title	How is a data-driven approach better than random choice in label space division for multi-label classification?	en
dc.type	Text	de
dc.type.publicationtype	article	de
dcterms.accessRights	open access	-
eldorado.secondarypublication	true	de
eldorado.secondarypublication.primaryidentifier	https://doi.org/10.3390/e18080282	de
eldorado.secondarypublication.primarycitation	Szymański, P.; Kajdanowicz, T.; Kersting, K. How Is a Data-Driven Approach Better than Random Choice in Label Space Division for Multi-Label Classification? Entropy 2016, 18, 282.	de
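The label-space division summarized in the abstract can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: all function names and the toy label matrix are hypothetical. Only one of the five evaluated community detection methods (label propagation) is shown, and it is implemented from scratch rather than through any graph library. The random baseline assumes the usual RAkELd convention of shuffling the labels and cutting them into disjoint subsets of size k, with a possibly smaller last subset.

```python
# Illustrative sketch only; all function names here are hypothetical, not from the paper.
import random
from collections import Counter, defaultdict
from itertools import combinations


def label_cooccurrence_graph(Y, weighted=True):
    """Edges of the label co-occurrence graph built from a binary label matrix Y.

    Labels i and j are joined if some training example carries both; in the
    weighted version the edge weight counts such examples, otherwise it is 1.
    """
    weights = Counter()
    for row in Y:
        active = [j for j, v in enumerate(row) if v]
        for i, j in combinations(active, 2):
            weights[(i, j)] += 1
    if not weighted:
        weights = Counter({edge: 1 for edge in weights})
    return weights


def label_propagation(n_labels, edges, seed=0, max_iter=100):
    """Partition labels with asynchronous label propagation.

    Each label repeatedly joins the community with the largest total edge
    weight among its neighbours (ties broken at random) until no label changes.
    """
    rng = random.Random(seed)
    adj = defaultdict(dict)
    for (i, j), w in edges.items():
        adj[i][j] = w
        adj[j][i] = w
    community = list(range(n_labels))  # every label starts in its own community
    order = list(range(n_labels))
    for _ in range(max_iter):
        rng.shuffle(order)
        changed = False
        for u in order:
            if not adj[u]:
                continue  # a label that never co-occurs stays alone
            scores = Counter()
            for v, w in adj[u].items():
                scores[community[v]] += w
            best = max(scores.values())
            candidates = sorted(c for c, s in scores.items() if s == best)
            if community[u] not in candidates:
                community[u] = rng.choice(candidates)
                changed = True
        if not changed:
            break
    groups = defaultdict(list)
    for label, c in enumerate(community):
        groups[c].append(label)
    return sorted(groups.values())


def rakeld_partition(n_labels, k, seed=0):
    """Random baseline: shuffle the labels and cut them into disjoint subsets
    of size k (the last subset may be smaller)."""
    rng = random.Random(seed)
    labels = list(range(n_labels))
    rng.shuffle(labels)
    return [sorted(labels[i:i + k]) for i in range(0, n_labels, k)]


def subproblem_targets(Y, subset):
    """Label powerset targets for one sub-problem: each example's labels
    restricted to `subset`, encoded as a tuple (one class per distinct tuple)."""
    return [tuple(row[j] for j in subset) for row in Y]


# Toy training label matrix (hypothetical): 6 examples, 6 labels.
Y = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 0],
]

data_driven = label_propagation(6, label_cooccurrence_graph(Y))
random_split = rakeld_partition(6, k=2)
print(data_driven)  # [[0, 1], [2, 3], [4, 5]]
print(random_split)
for subset in data_driven:
    print(subset, subproblem_targets(Y, subset))
```

With the toy matrix, label propagation groups the labels that co-occur ({0, 1}, {2, 3}, {4, 5}), whereas the random split ignores co-occurrence entirely. Each group would then be handed to its own binary relevance or label powerset classifier; subproblem_targets shows how the label powerset targets of one sub-problem could be formed.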
Appears in Collections:	LS 08 Künstliche Intelligenz

Files in This Item:

File	Description	Size	Format
entropy-18-00282.pdf		2.97 MB	Adobe PDF

This item is protected by original copyright

This item is licensed under a Creative Commons License