Authors: Springer, Tobias
Title: Total frame potential and its applications in data clustering
Language (ISO): en
Abstract: Short time series arise in a variety of fields such as biology and the social sciences. In the statistical analysis of microarray gene expression data, clustering short time series is an important objective for identifying subsets of genes that share a temporal expression pattern. An established method, the Short Time Series Expression Miner (STEM) by Ernst et al. (2005), assigns time series data to the closest of suitably selected prototypes, followed by the selection of significant clusters and their eventual grouping. For the clustering of normalized d-dimensional data we propose to minimize the penalized frame potential by Springer et al. (2011). The functional contains the "Total Frame Potential" of Finite Unit Norm Tight Frames (FUNTFs), see Benedetto and Fickus (2003), and includes a data-driven component for the selection of prototypes. The idea of using the frame potential in combination with a data-dependent term for optimization was originally proposed by Benedetto, Czaja and Ehler (2010) for finding sparse representations. We show that the solution of the corresponding constrained optimization problem is naturally connected to the spherical Dirichlet cells of the given normalized data. Furthermore, the minimizers of the functional are located in the interior of the Dirichlet cells. The objective function is differentiable at the minimum and satisfies a matrix-valued extremal condition. The general problem is closely related to the search for point configurations on the unit sphere, as in Tammes' problem (1930) or Thomson's problem (1904). Moreover, the minimization is connected to problems in matrix completion (see, e.g., Candes and Tao (2009) or Mazumder, Hastie and Tibshirani (2010)). The work contains an exhaustive analysis of the proposed functional using methods from calculus, linear algebra and nonlinear programming. Numerical results from the application to real and artificial data are included.
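As a rough illustration of two ingredients named in the abstract, the sketch below computes the classical frame potential of Benedetto and Fickus for a set of unit vectors and assigns normalized data to the closest prototype. It does not reproduce the penalized functional of the dissertation, whose exact form is not given here. The function names `frame_potential` and `assign_to_prototypes` and the example vectors are hypothetical and chosen only for illustration.

```python
import numpy as np


def frame_potential(vectors):
    """Classical frame potential: sum over all i, j of <x_i, x_j>^2.

    `vectors` holds one vector per row. For N unit vectors in R^d with
    N >= d, the value is bounded below by N^2 / d, and the bound is
    attained exactly by finite unit norm tight frames.
    """
    gram = vectors @ vectors.T
    return float(np.sum(gram ** 2))


def assign_to_prototypes(data, prototypes):
    """Index of the closest prototype for each row of `data`.

    Assumes all rows are unit vectors, so the smallest Euclidean distance
    corresponds to the largest inner product.
    """
    return np.argmax(data @ prototypes.T, axis=1)


# Illustrative prototypes: three unit vectors in R^2 at 120 degree spacing.
angles = np.deg2rad([0.0, 120.0, 240.0])
prototypes = np.column_stack([np.cos(angles), np.sin(angles)])

# Illustrative normalized data points (made up for this sketch).
data = np.array([[1.0, 0.0], [0.0, 1.0]])

potential = frame_potential(prototypes)
labels = assign_to_prototypes(data, prototypes)
```

For these three prototypes the potential equals the lower bound N^2 / d = 9 / 2, which is what one expects of a tight frame. The assignment step partitions the sphere into the cells of points nearest to each prototype.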
Subject Headings: Data clustering
Finite unit norm tight frames
Frame potential
Gene expression data
Short time series
URI: http://hdl.handle.net/2003/31251
http://dx.doi.org/10.17877/DE290R-7234
Issue Date: 2013-12-06
Appears in Collections: Lehrstuhl VIII Approximationstheorie

Files in This Item:
File: Dissertation.pdf
Description: DNB
Size: 1.29 MB
Format: Adobe PDF


This item is protected by original copyright


