Coresets and streaming algorithms for the k-means problem and related clustering objectives

Schmidt, Melanie

Coresets and streaming algorithms for the k-means problem and related clustering objectives

dc.contributor.advisor	Sohler, Christian
dc.contributor.author	Schmidt, Melanie
dc.contributor.referee	Blömer, Johannes
dc.date.accepted	2014-12-17
dc.date.accessioned	2015-05-28T08:50:55Z
dc.date.available	2015-05-28T08:50:55Z
dc.date.issued	2014
dc.description.abstract	The k-means problem seeks a clustering that minimizes the sum of squared errors cost function: For input points P from the Euclidean space R^d and any solution consisting of k centers from R^d, the cost is the sum of the squared distances of any point to its closest center. This thesis studies concepts used for large input point sets. For inputs with many points, the term coreset refers to a reduced version with less but weighted points. For inputs with high-dimensional points, dimensionality reduction is used to reduce the number of dimensions. In both cases, the reduced version has to maintain the cost function up to an epsilon-fraction for all choices of k centers. We study coreset constructions and dimensionality reductions for the k-means problem. Further, we develop coreset constructions in the data stream model. Here, the data is so large that it should only be read once and cannot be stored in main memory. The input might even arrive as a stream of points in an arbitrary order. Thus, a data stream algorithm has to continuously process the input while it arrives and can only store small summaries. In the second part of the thesis, the obtained results are extended to related clustering objectives. Projective clustering minimizes the squared distances to k subspaces instead of k points. Kernel k-means is an extension of k-means to scenarios where the target clustering is not linearly separable. In addition to extensions to these objectives, we study coreset constructions for a probabilistic clustering problem where input points are given as distributions over a finite set of locations.	en
dc.identifier.uri	http://hdl.handle.net/2003/34099
dc.identifier.uri	http://dx.doi.org/10.17877/DE290R-43
dc.language.iso	en	de
dc.subject	K-means clustering	en
dc.subject	Kernmengen	en
dc.subject	Big data	en
dc.subject	Dimensionsreduktion	de
dc.subject.ddc	004
dc.title	Coresets and streaming algorithms for the k-means problem and related clustering objectives	en
dc.type	Text	de
dc.type.publicationtype	doctoralThesis	de
dcterms.accessRights	open access
eldorado.dnb.deposit	true	de

Dateien

Originalbündel

Gerade angezeigt 1 - 1 von 1

Name:: Dissertation.pdf
Größe:: 3.22 MB
Format:: Adobe Portable Document Format
Beschreibung:: DNB

Herunterladen

Lizenzbündel

Gerade angezeigt 1 - 1 von 1

Name:: license.txt
Größe:: 3.12 KB
Format:: Item-specific license agreed upon to submission
Beschreibung:

Herunterladen

Sammlungen

LS 02 Komplexitätstheorie und Effiziente Algorithmen