Measuring and resource-efficient optimization of clustering quality

Date

2025

Abstract

Cluster analysis is a fundamental task in exploratory data mining, widely used to uncover hidden structures in datasets across many fields. Its applications range from identifying subgroups in gene expression data for disease research to segmenting customer bases in industry. Over time, a diverse range of clustering methods has been developed to handle the complex structure of different data domains. Despite these advances, key challenges remain, particularly in evaluating the quality of clustering results and optimizing the performance of clustering algorithms. This research presents the Average Medoid Silhouette (AMS), an improved version of the Average Silhouette Width (ASW), and introduces the FastMSC and FasterMSC algorithms, which optimize the AMS directly. The DynMSC algorithm is also proposed to simplify determining the optimal number of clusters. For categorical data, the Average Relative Entropy Score (ARES) and Minimum Relative Entropy Contrast (MREC) are introduced; together they form the basis of the CatRED algorithm, an agglomerative hierarchical method applied in information systems research.
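
The abstract names the AMS without giving its formula. As a rough illustration only, the sketch below assumes the medoid silhouette as it is commonly defined in the literature: each point scores 1 - d1/d2, where d1 and d2 are its distances to the nearest and second-nearest medoid, and the AMS is the mean of these scores. The function name, the use of Euclidean distance, and the score of 0 when d2 is zero are assumptions of this sketch and are not taken from the thesis.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def average_medoid_silhouette(points, medoids):
    """Mean of 1 - d1/d2 over all points (hypothetical helper).

    d1 and d2 are the distances from a point to its nearest and
    second-nearest medoid. Values close to 1 indicate points that lie
    much closer to their own medoid than to any other medoid.
    """
    if len(medoids) < 2:
        raise ValueError("the medoid silhouette needs at least two medoids")
    if not points:
        raise ValueError("no points given")

    total = 0.0
    for p in points:
        # Two smallest distances from p to any medoid.
        d1, d2 = sorted(dist(p, m) for m in medoids)[:2]
        # Assumption of this sketch: if d2 == 0 (duplicate medoids),
        # the point contributes a score of 0.
        total += (1.0 - d1 / d2) if d2 > 0 else 0.0
    return total / len(points)


# Two well-separated groups, each with one medoid.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
medoids = [(0, 0), (10, 0)]
ams = average_medoid_silhouette(points, medoids)
```

Unlike the ASW, which averages distances to all members of each cluster, this score needs only point-to-medoid distances, which is why it is cheap to evaluate for a given set of medoids.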

Keywords

Cluster analysis, Clustering quality, Clustering evaluation

Subjects based on RSWK

Cluster-Analyse
