Cluster Analysis

Ickstadt, Katja; Müller, Tina; Selinski, Silvia

Full metadata record

DC Field	Value	Language
dc.contributor.author	Ickstadt, Katja	de
dc.contributor.author	Müller, Tina	de
dc.contributor.author	Selinski, Silvia	de
dc.date.accessioned	2005-05-31T10:57:22Z	-
dc.date.available	2005-05-31T10:57:22Z	-
dc.date.issued	2005	de
dc.identifier.uri	http://hdl.handle.net/2003/21354	-
dc.identifier.uri	http://dx.doi.org/10.17877/DE290R-6830	-
dc.description.abstract	The issue of suitable similarity measures for a particular kind of genetic data, so-called SNP data, arises, e.g., from the GENICA (The Interdisciplinary Study Group on Gene Environmental Interactions and Breast Cancer in Germany) case-control study of sporadic breast cancer. The GENICA study aims to investigate the influence and interaction of single nucleotide polymorphic (SNP) loci and exogenous risk factors. It is very unlikely that a single main effect, say only one polymorphism, is responsible for a disease as complex as sporadic breast cancer, since the role of a single gene within the carcinogenic process is limited (Garte, 2001). Nevertheless, it is assumed that a number of interacting SNPs in combination with certain environmental risk factors increase the individual susceptibility. The search for SNP patterns in the present data set may be performed by a variety of clustering and classification approaches. Here we consider the problem of adequate similarity measures for variables or subjects as an indispensable basis for a further cluster analysis. The term 'similarity' is still vague for SNP data. A main problem arises from the general structure of such data sets: the proportion of hetero- or homozygous SNPs is rather small compared with the homozygous reference sequence. Thus, the relevant information on combinations of genetic alterations is often masked by a large number of common occurrences of homozygous reference types. Therefore, we examine different similarity measures, conventional ones as well as new coefficients which we created especially for SNP data. Furthermore, we compare the resulting partitions with each other, adapting the clustering of clustering methods of Rand (1971) for different similarity measures.	en
dc.format.extent	175409 bytes	-
dc.format.extent	563804 bytes	-
dc.format.mimetype	application/pdf	-
dc.format.mimetype	application/postscript	-
dc.language.iso	en	de
dc.publisher	Universität Dortmund	de
dc.subject	cluster analysis	en
dc.subject	clustering methods	en
dc.subject	GENICA	en
dc.subject	similarity	en
dc.subject	single nucleotide polymorphism	en
dc.subject	sporadic breast cancer	en
dc.subject.ddc	310	de
dc.title	Cluster Analysis	en
dc.title.alternative	A Comparison of Different Similarity Measures for SNP Data	en
dc.type	Text	de
dc.type.publicationtype	report	en
dcterms.accessRights	open access	-
Appears in Collections:	Sonderforschungsbereich (SFB) 475

Files in This Item:

File	Description	Size	Format
14_05.pdf	DNB	171.3 kB	Adobe PDF
14_05.ps		550.59 kB	Postscript

This item is protected by original copyright
