Bernholt, Thorsten; Ickstadt, Katja; Nunkesser, Robin; Schwender, Holger; Wegener, Ingo
2007-07-13
http://hdl.handle.net/2003/24441
10.17877/DE290R-14487

Motivation: Not individual single nucleotide polymorphisms (SNPs), but high-order interactions of SNPs are assumed to be responsible for complex diseases such as cancer. Therefore, one of the major goals of genetic association studies concerned with such genotype data is the identification of these high-order interactions. This search is additionally impeded by the fact that these interactions are often explanatory for only a relatively small subgroup of patients. Unfortunately, most of the feature selection methods proposed in the literature fail at this task, since they can either identify only individual variables or low-order interactions, or try to find rules that are explanatory for a high percentage of the observations. In this paper, we present a procedure based on genetic programming and multi-valued logic that enables the identification of high-order interactions of categorical variables such as SNPs. This method, called GPAS (Genetic Programming for Association Studies), can be used not only for feature selection but also for discrimination.

Results: In an application to the genotype data from the GENICA study, an association study concerned with sporadic breast cancer, GPAS is able to identify high-order interactions of SNPs leading to a considerably increased breast cancer risk for different subsets of patients that are not found by other feature selection methods.
As an application to a subset of the HapMap data shows, GPAS is not restricted to association studies comprising several tens of SNPs, but can also be employed to analyze whole-genome data.

en
Genetic association study; Genetic programming; Genetic Programming for Association Study; GPAS; High-order interaction; Multi-valued logic; Single nucleotide polymorphism; SNP
004
Detecting high-order interactions of single nucleotide polymorphisms using genetic programming
report