Sonderforschungsbereich (SFB) 876
Bringing together the fields of embedded systems and data analysis (data mining) enables a wealth of applications in computer science, biomedicine, physics, and mechanical engineering. On the one hand, embedded systems are optimized through data analysis; on the other hand, analysis algorithms can be implemented in hardware, for example on FPGAs. The severe constraints of embedded systems in computing capacity, memory, and energy require new algorithms for learning methods. These resource-constrained learning methods can equally be applied to very large volumes of data on servers.
Recent Submissions
Providing Information by Resource-Constrained Data Analysis (2020-12-31)
Morik, Katharina; Rhode, Wolfgang
The Collaborative Research Center SFB 876 (Providing Information by Resource-Constrained Data Analysis) brings together the research fields of data analysis (Data Mining, Knowledge Discovery in Data Bases, Machine Learning, Statistics) and embedded systems and enhances their methods such that information from distributed, dynamic masses of data becomes available anytime and anywhere. The research center approaches these problems with new algorithms respecting the resource constraints in the different scenarios. This Technical Report presents the work of the members of the integrated graduate school.

Providing Information by Resource-Constrained Data Analysis (2019-12-31)
Morik, Katharina; Rhode, Wolfgang
The Collaborative Research Center SFB 876 (Providing Information by Resource-Constrained Data Analysis) brings together the research fields of data analysis (Data Mining, Knowledge Discovery in Data Bases, Machine Learning, Statistics) and embedded systems and enhances their methods such that information from distributed, dynamic masses of data becomes available anytime and anywhere. The research center approaches these problems with new algorithms respecting the resource constraints in the different scenarios. This Technical Report presents the work of the members of the integrated graduate school.

Providing Information by Resource-Constrained Data Analysis (2018-12-31)
Morik, Katharina; Rhode, Wolfgang
The Collaborative Research Center SFB 876 (Providing Information by Resource-Constrained Data Analysis) brings together the research fields of data analysis (Data Mining, Knowledge Discovery in Data Bases, Machine Learning, Statistics) and embedded systems and enhances their methods such that information from distributed, dynamic masses of data becomes available anytime and anywhere.
The research center approaches these problems with new algorithms respecting the resource constraints in the different scenarios. This Technical Report presents the work of the members of the integrated graduate school.

RISE Germany Internship: Applying Deep Learning Methods to the Search for Astrophysical Tau Neutrinos (2017-11)
Martin, William

Feature Selection for High-Dimensional Data with RapidMiner (2011-01)
Sangkyun, Lee; Schowe, Benjamin; Sivakumar, Viswanath; Morik, Katharina
Feature selection is an important task in machine learning, reducing the dimensionality of learning problems by selecting a few relevant features without losing too much information. Focusing on smaller sets of features, we can learn simpler models from data that are easier to understand and to apply. In fact, simpler models are more robust to input noise and outliers, often leading to better prediction performance than models trained in higher dimensions with all features. We implement several feature selection algorithms in an extension of RapidMiner that scale well with the number of features compared to the existing feature selection operators in RapidMiner.

Energy-Efficient GPS-Based Positioning in the Android Operating System (2011-03)
Streicher, Jochen; Spincyk, Olaf
We present our ongoing collaborative work on EnDroid, an energy-efficient GPS-based positioning system for the Android Operating System. EnDroid is based on the EnTracked positioning system, developed at the University of Aarhus, Denmark. We describe the current prototypical state of our implementation and present our experiences and conclusions from a preliminary evaluation of EnDroid on the Google Nexus One smartphone.
Although the preliminary results seem to support the approach, there are still several open questions, both at the application interface and at the hardware management level.

Probabilistic Graphical Models in RapidMiner (2011-02)
Piatkowski, Nico
This report describes the technical background and usage of the GraphMod plug-in for RapidMiner. The plug-in enables RapidMiner to load factor graphs and to interpret the Label and Attributes contained in an Example as assignments to random variables. A set of examples that belong to the same Batch is treated as an assignment to a whole factor graph. New operators allow the estimation of factor weights, the computation of the single-node marginal probability functions, and the computation of the most probable assignment for each Labelnode with several methods. All algorithms are optimized for parallel execution on common multi-core processors and NVIDIA CUDA capable many-core processors (also known as Graphics Processing Units).

Technical report for Collaborative Research Center SFB 876 - Graduate School (2011-10)
Morik, Katharina; Rhode, Wolfgang

Computing on High Performance Clusters with R: Packages BatchJobs and BatchExperiments (2012-01)
Bischl, Bernd; Lang, Michel; Mersmann, Olaf; Rahnenführer, Jörg; Weihs, Claus
Empirical analysis of statistical algorithms often demands time-consuming experiments which are best performed on high performance computing clusters. We present two R packages which greatly simplify working in batch computing environments. The package BatchJobs implements the basic objects and procedures to control a batch cluster within R. It is structured around cluster versions of the well-known higher order functions Map, Reduce and Filter from functional programming. An important feature is that the state of computation is persistently available in a database. The user can query the status of jobs and then continue working with a desired subset.
The second package, BatchExperiments, is tailored for the still very general scenario of analyzing arbitrary algorithms on problem instances. It extends BatchJobs by letting the user define an array of jobs of the kind “apply algorithm A to problem instance P and store results”. It is possible to associate statistical designs with parameters of algorithms and problems and therefore to systematically study their influence on the results. In general our main contributions are: (a) Portability: Both packages use a clear and well-defined interface to the batch system which makes them applicable in most high-performance computing environments. (b) Reproducibility: Every computational part has an associated seed that the user can control to ensure reproducibility even when the underlying batch system changes. (c) Efficiency: Batch computing clusters can be used efficiently and completely from within R. (d) Abstraction and good software design: The code layers for algorithms, experiment definitions and execution are cleanly separated and enable the writing of readable and maintainable code.

Technical report for Collaborative Research Center SFB 876 - Graduate School (2012-09)
Morik, Katharina; Rhode, Wolfgang

Optimization plugin for RapidMiner (2012-04)
Umaashankar, Venkatesh; Sangkyun, Lee
Optimization in general means selecting the best choice out of various alternatives, one which reduces the cost or disadvantage of an objective. Optimization problems are very common in fields such as economics, finance, and logistics. Optimization is a science of its own, and machine learning or data mining is a diverse, growing field which applies techniques from various other areas to find useful insights from data. Many machine learning problems can be modelled and solved as optimization problems, which means optimization already provides a set of well-established methods and algorithms to solve machine learning problems.
Due to the importance of optimization in machine learning, machine learning researchers have recently been contributing remarkable improvements to the field of optimization. We implement several popular optimization strategies and algorithms as a plugin for RapidMiner, which adds an optimization toolkit to the existing arsenal of operators in RapidMiner.

The Streams Framework (2012)
Bockermann, Christian; Blom, Hendrik
In this report, we present the streams library, a generic Java-based library for designing data stream processes. The streams library defines a simple abstraction layer for data processing and provides a small set of online algorithms for counting and classification. Moreover, it integrates existing libraries such as MOA. Processes are defined in XML files following the semantics and ideas of well-established tools like Ant, Maven or the Spring Framework. The streams library can be easily embedded into existing software, used as a standalone tool, or used to define compute graphs that are executed on other back-end systems such as the Storm stream engine. This report reflects the status of the streams framework in version 0.9.6. As the framework is continuously enhanced, the report is extended alongside it. The most recent version of this report is available online.

Measuring the Power Consumption of Smartphones (2012-03)
Manning-Dahan, Tyler; Putzke, Markus; Wietfeld, Christian
Smartphones are becoming a part of everyday life and as such, a better understanding of hardware and software power consumption is crucial to develop more efficient smartphones. In order to extend battery life, application developers and phone designers must become aware of the limitations of a phone’s CPU power, as well as the LCD display consumption and connectivity via WiFi, 3G, and GPS systems. We present power consumption measurements of an HTC Incredible S and compare these results to known analytical models.
The evaluation shows that power consumption varies considerably across different types of smartphones and that well-known models underestimate the actual consumption. The results illustrate that touching the screen nearly doubles the power consumption, which is not captured by any analytical model. Moreover, we show how the transmitted packet size of WiFi and cellular communications affects the power consumption.

Unimodal regression using Bernstein-Schoenberg-splines and penalties (2012-06)
Köllmann, Claudia; Bornkamp, Björn; Ickstadt, Katja
Research in the field of nonparametric shape constrained regression has been intensive. However, only a few publications explicitly deal with unimodality although there is need for such methods in applications, for example, in dose-response analysis. In this paper we propose unimodal spline regression methods that make use of Bernstein-Schoenberg-splines and their shape preservation property. To achieve unimodal and smooth solutions we use penalized splines, and extend the penalized spline approach towards penalizing against general parametric functions, instead of using just difference penalties. For tuning parameter selection under a unimodality constraint, a restricted maximum likelihood approach and an alternative Bayesian approach for unimodal regression are developed. We compare the proposed methodologies to other common approaches in a simulation study and apply them to a dose-response data set.
All results suggest that the unimodality constraint or the combination of unimodality and a penalty can substantially improve estimation of the functional relationship.

Preserving Confidentiality in Multiagent Systems - An Internship Project within the DAAD RISE Program (2013-05)
Dilger, Daniel; Krümpelmann, Patrick; Tadros, Cornelia
RISE (Research Internships in Science and Engineering) is a summer internship program for undergraduate students from the United States, Canada and the UK organized by the DAAD (Deutscher Akademischer Austausch Dienst). Within the project A5 in the Collaborative Research Center SFB 876, we have planned and conducted an internship project in the RISE program that should support our research. Daniel Dilger was the intern and was supervised by the PhD students Patrick Krümpelmann and Cornelia Tadros. The aim was to model an application scenario for our prototype implementation of a confidentiality preserving multiagent system and to run experiments with that prototype.

Technical report for Collaborative Research Center SFB 876 - Graduate School (2013-10)
Morik, Katharina; Rhode, Wolfgang

RobPer: An R Package to Calculate Periodograms for Light Curves Based On Robust Regression (2013-02)
Thieler, Anita Monika; Fried, Roland; Rathjens, Jonathan
An important task in astroparticle physics is the detection of periodicities in irregularly sampled time series, called light curves. The classic Fourier periodogram cannot deal with irregular sampling and with the measurement accuracies that are typically given for each observation of a light curve. Hence, methods to fit periodic functions using weighted regression were developed in the past to calculate periodograms. We present the R package RobPer, which allows combining different periodic functions and regression techniques to calculate periodograms. Possible regression techniques are least squares, least absolute deviation, least trimmed, M-, S- and τ-regression.
Measurement accuracies can be taken into account by including weights. Our periodogram function covers most of the approaches that have been tried earlier and provides new model-regression combinations that have not been used before. To detect valid periods, we apply an outlier search on the periodogram instead of using fixed critical values that are theoretically only justified in the case of least squares regression, independent periodogram bars and a null hypothesis allowing only normal white noise. This outlier search can be performed using RobPer as well. Finally, the package also includes a generator for artificial light curves, e.g., for simulation studies.

Preprocessing of Affymetrix Exon Expression Arrays (2013-03)
Sangkyun, Lee; Schramm, Alexander
The activity of genes can be captured by measuring the amount of messenger RNAs transcribed from the genes, or from their subunits called exons. In our study, we use the Affymetrix Human Exon ST v1.0 microarrays to measure the activity of exons in Neuroblastoma cancer patients. The purpose is to discover a small number of genes or exons that play important roles in differentiating high-risk patients from low-risk counterparts. Although the technology has been improved over the past 15 years, array measurements can still be contaminated by various factors, including human error. Since the number of arrays is often only a few hundred, atypical errors can hardly be canceled out by large numbers of normal arrays. In this article we describe how we filter out low-quality arrays in a principled way, so that we can obtain more reliable results in downstream analyses.

A Survey of the Stream Processing Landscape (2014-05)
Bockermann, Christian
The continuous processing of streaming data has become an important aspect in many applications.
Over the last years, a variety of streaming platforms has been developed, and a number of open source frameworks are available for the implementation of streaming applications. In this report, we survey the landscape of existing streaming platforms. Starting with an overview of the developments in the recent past, we discuss the requirements of modern streaming architectures and present the ways these are approached by the different frameworks.

Random projections for Bayesian regression (2014-04)
Geppert, Leo N.; Ickstadt, Katja; Munteanu, Alexander; Sohler, Christian
This article introduces random projections applied as a data reduction technique for Bayesian regression analysis. We show sufficient conditions under which the entire d-dimensional distribution is preserved under random projections by reducing the number of data points from n to k ∈ O(poly(d/ε)) in the case n ≫ d. Under mild assumptions, we prove that evaluating a Gaussian likelihood function based on the projected data instead of the original data yields a (1 + O(ε))-approximation in the ℓ2-Wasserstein distance. Our main result states that the posterior distribution of a Bayesian linear regression is approximated up to a small error depending on only an ε-fraction of its defining parameters when using either improper non-informative priors or arbitrary Gaussian priors. Our empirical evaluations involve different simulated settings of Bayesian linear regression. Our experiments underline that the proposed method is able to recover the regression model while considerably reducing the total run-time.
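
The data reduction idea in the last abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The Rademacher (±1/√k) projection matrix, the toy data, the sizes n = 2000 and k = 200, and the helper names `ols_2d` and `sketch` are all assumptions chosen for illustration. The example computes only the least squares estimate, which coincides with the posterior mean of a Bayesian linear regression under a flat prior and Gaussian noise; it does not approximate the full posterior.

```python
import random

def ols_2d(rows, targets):
    """Least squares for a two-column design via the 2x2 normal equations."""
    a = sum(r[0] * r[0] for r in rows)
    b = sum(r[0] * r[1] for r in rows)
    c = sum(r[1] * r[1] for r in rows)
    u = sum(r[0] * t for r, t in zip(rows, targets))
    v = sum(r[1] * t for r, t in zip(rows, targets))
    det = a * c - b * b
    return ((c * u - b * v) / det, (a * v - b * u) / det)

def sketch(rows, targets, k, rng):
    """Reduce n data points to k rows with a random +-1/sqrt(k) projection.

    The projection matrix is never stored; each entry is drawn on the fly
    and applied to the design row and its target at the same time.
    """
    scale = 1.0 / k ** 0.5
    out_rows = [[0.0, 0.0] for _ in range(k)]
    out_targets = [0.0] * k
    for r, t in zip(rows, targets):
        for i in range(k):
            s = scale if rng.random() < 0.5 else -scale
            out_rows[i][0] += s * r[0]
            out_rows[i][1] += s * r[1]
            out_targets[i] += s * t
    return out_rows, out_targets

rng = random.Random(0)
n, k = 2000, 200  # hypothetical sizes with n much larger than d = 2
xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
rows = [[1.0, x] for x in xs]  # intercept column and one covariate
targets = [1.0 + 2.0 * x + rng.gauss(0.0, 0.1) for x in xs]

full = ols_2d(rows, targets)  # estimate from all n points
small_rows, small_targets = sketch(rows, targets, k, rng)
approx = ols_2d(small_rows, small_targets)  # estimate from k projected rows

print("full data:", full)
print("sketched :", approx)
```

In this toy setting the two estimates should agree closely even though the second regression sees only k rows. How large k must be for a given accuracy is the subject of the report itself and is not derived here.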