Measuring and resource-efficient optimization of clustering quality

Lenssen, Lars Filipp

Files

Dissertation_Lenssen.pdf (3.76 MB)

Date

2025

Abstract

Cluster analysis is a fundamental task in exploratory data mining, widely used to uncover hidden structure in datasets across many fields. Its applications range from identifying subgroups in gene expression data for disease research to segmenting customer bases in industry. Over time, a diverse range of clustering methods has been developed to handle the complex structure of different data domains. Despite these advances, key challenges remain, particularly in evaluating the quality of clustering results and in optimizing the performance of clustering algorithms. This dissertation presents the Average Medoid Silhouette (AMS), an improved version of the Average Silhouette Width (ASW), and introduces the FastMSC and FasterMSC algorithms, which optimize the AMS directly. It also proposes the DynMSC algorithm to simplify determining the optimal number of clusters. For categorical data, it introduces the Average Relative Entropy Score (ARES) and the Minimum Relative Entropy Contrast (MREC), which form the basis of the CatRED algorithm, an agglomerative hierarchical method applied in information systems research.
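The difference between the ASW and a medoid-based silhouette can be illustrated with a small Python sketch. It assumes the standard definitions: the ASW compares, for each point, the mean distance to its own cluster a(i) with the mean distance to the nearest other cluster b(i), which requires all pairwise distances; the medoid silhouette instead uses only the distance to the nearest medoid d1(i) and to the second-nearest medoid d2(i), giving s'(i) = 1 - d1(i)/d2(i). The names `asw`, `ams`, `naive_msc` and `select_k` are hypothetical, as is the convention s'(i) = 0 when d2(i) = 0. `naive_msc` is a plain swap-based local search that maximizes the AMS, not FastMSC or FasterMSC, and `select_k` is a brute-force loop over k, not DynMSC.

```python
# Sketch: Average Silhouette Width (ASW) vs. Average Medoid Silhouette (AMS).
# All function names are hypothetical; dist is a full symmetric distance matrix.


def asw(dist, labels):
    """Classic ASW: touches every pairwise distance (O(N^2) lookups)."""
    n = len(labels)
    clusters = set(labels)  # assumes at least two clusters
    total = 0.0
    for i in range(n):
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j in range(n):
            if j != i:
                sums[labels[j]] += dist[i][j]
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sums[own] / counts[own]  # mean distance within own cluster
        # mean distance to the nearest other cluster
        b = min(sums[c] / counts[c] for c in clusters if c != own)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / n


def ams(dist, medoids):
    """Medoid silhouette: only N*k lookups. Each point is scored against its
    nearest (d1) and second-nearest (d2) medoid: s'(i) = 1 - d1/d2."""
    total = 0.0
    for row in dist:
        d1, d2 = sorted(row[m] for m in medoids)[:2]  # assumes k >= 2
        total += 1.0 - d1 / d2 if d2 > 0 else 0.0  # assumed: s'(i) = 0 if d2 == 0
    return total / len(dist)


def naive_msc(dist, k, max_iter=100):
    """Naive swap-based local search that directly maximizes the AMS.
    Re-evaluates the full AMS for every candidate swap (slow on purpose)."""
    n = len(dist)
    medoids = list(range(k))  # arbitrary initialization (assumption)
    best = ams(dist, medoids)
    for _ in range(max_iter):
        improved = False
        for pos in range(k):
            for x in range(n):
                if x in medoids:
                    continue
                # replace the medoid at position pos by non-medoid x
                cand = medoids[:pos] + [x] + medoids[pos + 1:]
                score = ams(dist, cand)
                if score > best + 1e-12:
                    best, medoids, improved = score, cand, True
        if not improved:
            break  # local optimum: no single swap increases the AMS
    return medoids, best


def select_k(dist, k_max):
    """Brute-force choice of k: rerun the search for every k, keep the best AMS."""
    results = {k: naive_msc(dist, k) for k in range(2, k_max + 1)}
    best_k = max(results, key=lambda k: results[k][1])
    return best_k, results[best_k][0]


if __name__ == "__main__":
    pts = [0, 1, 2, 10, 11, 12]  # two well-separated groups on a line
    dist = [[abs(a - b) for b in pts] for a in pts]
    print(asw(dist, [0, 0, 0, 1, 1, 1]))
    print(naive_msc(dist, 2))
    print(select_k(dist, 3))
```

The naive search is included only to make "optimizing the AMS directly" concrete; because `ams` needs N·k distances per evaluation rather than N², each candidate swap is cheap to score, but the exhaustive loop over all swaps and all k is still far from the resource-efficient algorithms named above.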

Keywords

Cluster analysis, Clustering quality, Clustering evaluation

Subjects based on RSWK

Cluster-Analyse (cluster analysis)

URI

http://hdl.handle.net/2003/43678
http://dx.doi.org/10.17877/DE290R-25451

Collections

LS 08 Künstliche Intelligenz