Eldorado Community:
http://hdl.handle.net/2003/9
2020-02-17T19:50:25Z
http://hdl.handle.net/2003/38571
Title: Providing Information by Resource-Constrained Data Analysis
Authors: Morik, Katharina; Rhode, Wolfgang
Abstract: The Collaborative Research Center SFB 876 (Providing Information by Resource-Constrained Data Analysis) brings together the research fields of data analysis (Data Mining, Knowledge Discovery in Databases, Machine Learning, Statistics) and embedded systems and enhances their methods such that information from distributed, dynamic masses of data becomes available anytime and anywhere. The research center approaches these problems with new algorithms respecting the resource constraints in the different scenarios. This Technical Report presents the work of the members of the integrated graduate school.
2020-02-14T15:13:47Z
http://hdl.handle.net/2003/38570
Title: Explicit results on conditional distributions of generalized exponential mixtures
Authors: Klüppelberg, Claudia; Seifert, Miriam Isabel
Abstract: For independent exponentially distributed random variables X_i, i ∈ N, with distinct rates λ_i we consider sums ∑_{i∈A} X_i for A ⊆ N, which follow generalized exponential mixture (GEM) distributions. We provide novel
explicit results on the conditional distribution of the total sum ∑_{i∈N} X_i given that a subset sum
∑_{j∈A} X_j exceeds a certain threshold value t > 0, and vice versa. Moreover, we investigate the characteristic tail behavior of these conditional distributions for t → ∞. Finally, we illustrate how our probabilistic results can be applied in practice by providing examples
from both reliability theory and risk management.
2020-02-14T15:10:11Z
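As a rough illustration of the setting in the abstract above, the sketch below evaluates the survival function of a sum of independent exponential variables with distinct rates, using the standard closed form for such sums (a signed mixture of exponentials). It covers only the unconditional sum, not the paper's conditional results, and all function names are hypothetical.

```python
from math import exp, prod


def mixture_weights(rates):
    """Weights c_i = prod_{j != i} rate_j / (rate_j - rate_i).

    The weights sum to 1 but can be negative, hence a signed mixture.
    """
    if len(set(rates)) != len(rates):
        raise ValueError("rates must be distinct")
    return [
        prod(lj / (lj - li) for j, lj in enumerate(rates) if j != i)
        for i, li in enumerate(rates)
    ]


def survival(rates, t):
    """P(X_1 + ... + X_n > t) for independent exponentials, t >= 0."""
    weights = mixture_weights(rates)
    return sum(c * exp(-lam * t) for c, lam in zip(weights, rates))


def conditional_exceedance(rates, t, s):
    """P(sum > t + s | sum > t), a simple tail quantity (illustrative only)."""
    return survival(rates, t + s) / survival(rates, t)
```

With a single rate the conditional exceedance reduces to the memoryless property of the exponential distribution; with several distinct rates it depends on t.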
http://hdl.handle.net/2003/38530
Title: Prediction in locally stationary time series
Authors: Dette, Holger; Wu, Weichi
Abstract: We develop an estimator for the high-dimensional covariance matrix of a locally
stationary process with a smoothly varying trend and use this statistic to derive consistent
predictors in non-stationary time series. In contrast to the currently available
methods for this problem, the predictor developed here does not rely on fitting an
autoregressive model and does not require a vanishing trend. The finite sample properties
of the new methodology are illustrated by means of a simulation study and a
data example.
2020-01-17T15:20:09Z
http://hdl.handle.net/2003/38386
Title: Detecting structural breaks in eigensystems of functional time series
Authors: Dette, Holger; Kutta, Tim
Abstract: Detecting structural changes in functional data is a prominent topic in the statistical
literature. However, not all trends in the data are important in applications, but only
those of large enough influence. In this paper we address the problem of identifying
relevant changes in the eigenfunctions and eigenvalues of covariance kernels of L^2[0, 1]-valued
time series. By self-normalization techniques we derive pivotal, asymptotically
consistent tests for relevant changes in these characteristics of the second order structure
and investigate their finite sample properties in a simulation study. The applicability of
our approach is demonstrated by analyzing German annual temperature data.
2019-11-19T12:07:46Z
http://hdl.handle.net/2003/38379
Title: Equivalence tests for binary efficacy-toxicity responses
Authors: Möllenhoff, Kathrin; Dette, Holger; Bretz, Frank
Abstract: Clinical trials often aim to compare a new drug with a reference treatment in terms of efficacy and/or toxicity depending on covariates such as, for example, the dose level of the drug. Equivalence of these treatments can be claimed if the difference in average outcome is below a certain threshold over the covariate range. In this paper we assume that the efficacy and toxicity of the treatments are measured as binary outcome variables and we address two problems. First, we develop a new test procedure for the assessment of equivalence of two treatments over the entire covariate range for a single binary endpoint. Our approach is based on a parametric bootstrap, which generates data under the constraint that the distance between the curves is equal to the pre-specified equivalence threshold. Second, we address equivalence for bivariate binary (correlated) outcomes by extending the previous approach for a univariate response. For this purpose we use a 2-dimensional Gumbel model for binary efficacy-toxicity responses. We investigate the operating characteristics of the proposed approaches by means of a simulation study and present a case study as an illustration.
2019-11-13T13:37:14Z
http://hdl.handle.net/2003/38260
Title: Convergence of spectral density estimators in the locally stationary framework
Authors: Kawka, Rafael
Abstract: Locally stationary processes are characterised by spectral densities that are functions
of rescaled time. We study the asymptotic properties of spectral density
estimators in the locally stationary framework. In particular, we show that for a
locally stationary process with time-varying spectral density function f(u, λ), standard
spectral density estimators consistently estimate the time-averaged spectral
density ∫_0^1 f(u, λ) du. This result is complemented by some illustrative examples
and applications including HAC-inference in the multiple linear regression model
and a simple visual tool for the detection of unconditional heteroskedasticity.
2019-10-02T14:28:23Z
http://hdl.handle.net/2003/38259
Title: Steuer versus Emissionshandel: Optionen für die Ausgestaltung einer CO2-Bepreisung
Authors: Frondel, Manuel
Abstract: In the view of economists, greenhouse gases in
Europe can be avoided most cost-efficiently by extending the EU emissions trading system,
so far limited to the energy sector and industry, to all sectors not yet
integrated in it. However, majorities in the European Union must be found for
extending emissions trading. As long as this extension does not have the approval
of all member states, the introduction of national CO2 pricing in these sectors
could be considered and, in principle, implemented in two
ways: via emissions trading, either established separately as a national
trading system or through an opt-in of Germany's not yet integrated sectors
into the existing EU emissions trading system, or by introducing a
national CO2 tax. The weighing of the advantages and disadvantages of both options
undertaken in this article, CO2 tax versus emissions trading, shows that a CO2
tax has serious drawbacks, above all its lack of accuracy in
achieving prescribed emission targets.
2019-10-02T14:27:25Z
http://hdl.handle.net/2003/38258
Title: Cognitive reflection and the valuation of energy efficiency
Authors: Andor, Mark A.; Frondel, Manuel; Gerster, Andreas; Sommer, Stephan
Abstract: Based on a stated-choice experiment among about 3,600 German household
heads on the purchase of electricity-using durables, this paper explores the impact
of cognitive reflection on consumers’ valuation of energy efficiency, as well as its
interaction with consumers’ response to the EU energy label. Using a standard
cognitive reflection test, our results indicate that consumers with low cognitive
reflection scores value energy efficiency less than those with high scores. Furthermore,
we find that consumers with a low level of cognitive reflection respond more
strongly to grade-like energy efficiency classes than to detailed information on
annual energy use.
2019-10-02T14:24:46Z
http://hdl.handle.net/2003/38256
Title: Two-sample tests for relevant differences in the eigenfunctions of covariance operators
Authors: Aue, Alexander; Dette, Holger; Rice, Gregory
Abstract: This paper deals with two-sample tests for functional time series data, which have become widely
available in conjunction with the advent of modern complex observation systems. Here, particular interest
is in evaluating whether two sets of functional time series observations share the shape of their primary
modes of variation as encoded by the eigenfunctions of the respective covariance operators. To this end,
a novel testing approach is introduced that connects with, and extends, existing literature in two main
ways. First, tests are set up in the relevant testing framework, where interest is not in testing an exact
null hypothesis but rather in detecting deviations deemed sufficiently relevant, with relevance determined
by the practitioner and perhaps guided by domain experts. Second, the proposed test statistics rely on
a self-normalization principle that helps to avoid the notoriously difficult task of estimating the long-run
covariance structure of the underlying functional time series. The main theoretical result of this paper is
the derivation of the large-sample behavior of the proposed test statistics. Empirical evidence, indicating
that the proposed procedures work well in finite samples and compare favorably with competing methods,
is provided through a simulation study, and an application to annual temperature data.
2019-10-02T13:48:07Z
http://hdl.handle.net/2003/38224
Title: A generalized method of moments estimator for structural vector autoregressions based on higher moments
Authors: Keweloh, Alexander Sascha
Abstract: I propose a generalized method of moments estimator for structural vector
autoregressions with independent and non-Gaussian shocks. The shocks are
identified by exploiting information contained in higher moments of the
data. Extending the standard identification approach, which relies on the
covariance, to the coskewness and cokurtosis makes it possible to identify and
estimate the simultaneous interaction without any further restrictions. I
analyze the finite sample properties of the estimator and apply it to
illustrate the simultaneous interaction between economic activity, oil and
stock prices.
2019-09-11T15:31:03Z
http://hdl.handle.net/2003/38213
Title: Efficient model-based bioequivalence testing
Authors: Möllenhoff, Kathrin; Loingeville, Florence; Bertrand, Julie; Nguyen, Thu Thuy; Sharan, Satish; Sun, Guoying; Grosser, Stella; Zhao, Liang; Fang, Lanyan; Mentré, France; Dette, Holger
Abstract: The classical approach to analyze pharmacokinetic (PK) data in bioequivalence studies
aiming to compare two different formulations is to perform noncompartmental analysis
(NCA) followed by two one-sided tests (TOST). In this regard the PK parameters AUC
and Cmax are obtained for both treatment groups and their geometric mean ratios are
considered. According to current guidelines by the U.S. Food and Drug Administration
and the European Medicines Agency the formulations are deemed to be similar if the
90% confidence interval for these ratios falls between 0.8 and 1.25. As NCA is not a
reliable approach in case of sparse designs, a model-based alternative has already been
proposed for the estimation of AUC and Cmax using non-linear mixed effects models.
Here we propose a test other than the TOST, called BOT, and evaluate it through a
simulation study both for NCA and model-based approaches. For products with high
variability in PK parameters, this method appears to have type I errors closer to the
conventionally accepted significance level of 0.05, suggesting its potential use in situations
where conventional bioequivalence analysis is not applicable.
2019-09-10T09:07:58Z
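The acceptance rule quoted in the abstract above (a 90% confidence interval for the geometric mean ratio lying within 0.8 and 1.25) can be sketched as follows. This is not the BOT procedure or the model-based approach of the paper. It assumes paired measurements of a PK parameter such as AUC and, to stay within the standard library, a normal approximation where a t-quantile would normally be used. All names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist, mean, stdev


def gmr_confidence_interval(test, reference, level=0.90):
    """Approximate CI for the geometric mean ratio test/reference.

    Assumes paired observations and a normal approximation on the log scale.
    """
    diffs = [log(t) - log(r) for t, r in zip(test, reference)]
    centre = mean(diffs)
    std_err = stdev(diffs) / sqrt(len(diffs))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return exp(centre - z * std_err), exp(centre + z * std_err)


def within_limits(interval, lower=0.8, upper=1.25):
    """True if the whole interval lies inside the acceptance limits."""
    return lower <= interval[0] and interval[1] <= upper
```

For example, `within_limits(gmr_confidence_interval(auc_test, auc_reference))` would report whether two hypothetical lists of AUC values pass this simplified check.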
http://hdl.handle.net/2003/38207
Title: A note on Herglotz’s theorem for time series on function spaces
Authors: van Delft, Anne; Eichler, Michael
Abstract: In this article, we prove Herglotz’s theorem for Hilbert-valued time series. This requires the notion of an operator-valued measure, which we shall make precise for our setting. Herglotz’s theorem for functional time series makes it possible to generalize existing results that are central to frequency domain analysis on the function space. In particular, we use this result to prove the existence of a functional Cramér representation of a large class of processes, including those with jumps in the spectral distribution and long-memory processes. We furthermore obtain an optimal finite-dimensional reduction of the time series under weaker assumptions than available in the literature. The results of this paper therefore enable Fourier analysis for processes of which the spectral density operator does not necessarily exist.
2019-09-06T13:29:27Z
http://hdl.handle.net/2003/38206
Title: Testing for stationarity of functional time series in the frequency domain
Authors: Aue, Alexander; van Delft, Anne
Abstract: Interest in functional time series has spiked in the recent past with papers covering both methodology and applications being published at a much increased pace. This article contributes to the research in this area by proposing a new stationarity test for functional time series based on frequency domain methods. The proposed test statistic is based on joint dimension reduction via functional principal components analysis across the spectral density operators at all Fourier frequencies, explicitly allowing for frequency-dependent levels of truncation to adapt to the dynamics of the underlying functional time series. The properties of the test are derived both under the null hypothesis of stationary functional time series and under the smooth alternative of locally stationary functional time series. The methodology is theoretically justified through asymptotic results. Evidence from simulation studies and an application to annual temperature curves suggests that the test works well in finite samples.
2019-09-06T13:27:32Z
http://hdl.handle.net/2003/38205
Title: A note on quadratic forms of stationary functional time series under mild conditions
Authors: van Delft, Anne
Abstract: We study the distributional properties of a quadratic form of a stationary functional time series under mild moment conditions. As an important application, we obtain consistency rates of estimators of spectral density operators and prove joint weak convergence to a vector of complex Gaussian random operators. Weak convergence is established based on an approximation of the form via transforms of Hilbert-valued martingale difference sequences. As a side-result, the distributional properties of the long-run covariance operator are established.
2019-09-06T13:25:34Z
http://hdl.handle.net/2003/38204
Title: Sampling distributions of optimal portfolio weights and characteristics in low and large dimensions
Authors: Bodnar, Taras; Dette, Holger; Parolya, Nestor; Thorsén, Erik
Abstract: Optimal portfolio selection problems are determined by the (unknown) parameters of
the data generating process. If an investor wants to realise the position suggested by the
optimal portfolios, he/she needs to estimate the unknown parameters and to incorporate the
parameter uncertainty into the decision process. Most often, the parameters of interest
are the population mean vector and the population covariance matrix of the asset return
distribution. In this paper we characterise the exact sampling distribution of the
estimated optimal portfolio weights and their characteristics by deriving their sampling
distribution, which is presented in terms of a stochastic representation. This approach possesses
several advantages: (i) it determines the sampling distribution of the estimated
optimal portfolio weights by expressions which could be used to draw samples from this
distribution efficiently; (ii) the application of the derived stochastic representation provides
an easy way to obtain the asymptotic approximation of the sampling distribution.
The latter property is used to show that the high-dimensional asymptotic distribution
of optimal portfolio weights is a multivariate normal and to determine its parameters.
Moreover, a consistent estimator of optimal portfolio weights and their characteristics
is derived under the high-dimensional settings. Via an extensive simulation study, we
investigate the finite-sample performance of the derived asymptotic approximation and
study its robustness to the violation of the model assumptions used in the derivation of
the theoretical results.
2019-09-06T13:23:21Z
http://hdl.handle.net/2003/38196
Title: Identifying shifts between two regression curves
Authors: Dette, Holger; Sankar Dhar, Subhra; Wu, Weichi
Abstract: This article studies the problem whether two convex (concave) regression functions
modelling the relation between a response and covariate in two samples differ by a shift
in the horizontal and/or vertical axis. We consider a nonparametric situation assuming
only smoothness of the regression functions. A graphical tool based on the derivatives
of the regression functions and their inverses is proposed to answer this question and
studied in several examples. We also formalize this question in a corresponding hypothesis
and develop a statistical test. The asymptotic properties of the corresponding
test statistic are investigated under the null hypothesis and local alternatives. In contrast
to most of the literature on comparing shape invariant models, which requires
independent data, the procedure is applicable for dependent and non-stationary data.
We also illustrate the finite sample properties of the new test by means of a small
simulation study and a real data example.
2019-08-30T14:17:53Z
http://hdl.handle.net/2003/38195
Title: Prediction in regression models with continuous observations
Authors: Dette, Holger; Pepelyshev, Andrey; Zhigljavsky, Anatoly
Abstract: We consider the problem of predicting values of a random process or field satisfying a linear model y(x) = θ^T f(x) + ε(x), where errors ε(x) are correlated. This is a common problem in kriging, where the case of discrete observations is standard. By focussing on the case of continuous observations, we derive expressions for the best linear unbiased predictors and their mean squared error. Our results are also applicable in the case where the derivatives of the process y are available, and either a response or one of its derivatives needs to be predicted. The theoretical results are illustrated by several examples, in particular for the popular Matérn 3/2 kernel.
2019-08-30T14:16:22Z
http://hdl.handle.net/2003/38165
Title: Volatility forecasting accuracy for Bitcoin
Authors: Köchling, Gerrit; Schmidtke, Philipp; Posch, Peter N.
Abstract: We analyse the quality of Bitcoin volatility forecasting of GARCH-type
models applying the commonly used volatility proxy based on squared daily
returns as well as a jump-robust proxy based on intra-day returns and vary
the degrees of asymmetry in robust loss functions. We construct model
confidence sets (MCS) which contain superior models with a high probability
and find them to be systematically smaller for asymmetric loss functions
and the jump-robust proxy. Our findings suggest a cautious use of GARCH
models in forecasting Bitcoin's volatility.
2019-08-05T12:52:14Z
http://hdl.handle.net/2003/38137
Title: Optimal designs for estimating individual coefficients in polynomial regression with no intercept
Authors: Dette, Holger; Melas, Viatcheslav B.; Shpilev, Petr
Abstract: In a seminal paper Studden (1968) characterized c-optimal designs in regression
models, where the regression functions form a Chebyshev system. He used these
results to determine the optimal design for estimating the individual coefficients in a
polynomial regression model on the interval [-1, 1] explicitly. In this note we identify
the optimal design for estimating the individual coefficients in a polynomial regression
model with no intercept (here the regression functions do not form a Chebyshev
system).
2019-07-12T10:43:47Z
http://hdl.handle.net/2003/38088
Title: Financial risk measures for a network of individual agents holding portfolios of light-tailed objects
Authors: Klüppelberg, Claudia; Seifert, Miriam Isabel
Abstract: We investigate a financial network of agents holding portfolios of independent
light-tailed risky objects whose losses are asymptotically exponentially
distributed with distinct tail parameters. We show that the
asymptotic distributions of portfolio losses belong to the class of functional
exponential mixtures which we introduce in this paper. We also
provide statements for Value-at-Risk and Expected Shortfall risk measures
as well as for their conditional counterparts. Compared to heavy
tail settings we establish important qualitative differences in the asymptotic
behavior of portfolio risks under a light tail assumption which have
to be accounted for in practical risk management.
2019-06-07T13:25:12Z
http://hdl.handle.net/2003/38081
Title: A new approach for open-end sequential change point monitoring
Authors: Gösmann, Josua; Kley, Tobias; Dette, Holger
Abstract: We propose a new sequential monitoring scheme for changes in the parameters of
a multivariate time series. In contrast to procedures proposed in the literature which
compare an estimator from the training sample with an estimator calculated from the
remaining data, we suggest dividing the sample at each time point after the training
sample. Estimators from the sample before and after all separation points are then
continuously compared by calculating a maximum of norms of their differences. For open-end
scenarios our approach yields an asymptotic level α procedure, which is consistent
under the alternative of a change in the parameter.
2019-06-06T11:30:05Z
http://hdl.handle.net/2003/38076
Title: Wirtschaftliche Aktivität und Emissionen: Die Umweltkuznetskurve
Authors: Wagner, Martin; Knorre, Fabian
Abstract: Since the beginning of the industrial revolution, the mean global temperature has risen by about
one degree Celsius. There is no doubt that this rise is also driven to a substantial degree
by human activities, through emissions of carbon dioxide
and other greenhouse gases. What do the relationships between economic
activity and emissions look like? Do emissions necessarily rise with increasing
economic activity? In this chapter we want to shed light on some fundamental problems
that arise in the statistical, or more precisely econometric, analysis of these
relationships. These problems are symptomatic of relationships in
economics and are one reason why econometrics has established itself as a
discipline in its own right.
2019-05-29T14:22:06Z
http://hdl.handle.net/2003/38046
Title: Limit theorems for locally stationary processes
Authors: Kawka, Rafael
Abstract: We present limit theorems for locally stationary processes that have a one-sided
time-varying moving average representation. In particular, we prove a central limit
theorem (CLT), a weak and a strong law of large numbers (WLLN, SLLN) and a
law of the iterated logarithm (LIL) under mild assumptions that are closely related
to those originally imposed by Dahlhaus and Polonik (2006).
2019-05-10T14:01:56Z
http://hdl.handle.net/2003/38039
Title: Some explicit solutions of c-optimal design problems for polynomial regression
Authors: Dette, Holger; Melas, Viatcheslav B.; Shpilev, Petr
Abstract: In this paper we consider the optimal design problem for extrapolation and estimation
of the slope at a given point, say z, in a polynomial regression with no intercept.
We provide explicit solutions of these problems in many cases and characterize those
values of z, where this is not possible.
2019-05-03T11:27:26Z
http://hdl.handle.net/2003/38014
Title: On scale estimation under shifts in the mean
Authors: Axt, Ieva; Fried, Roland
Abstract: In many situations it is crucial to estimate the variance properly. Ordinary variance estimators
perform poorly in the presence of shifts in the mean. We investigate an approach
based on non-overlapping blocks, which yields good results in this change-point scenario.
We show the strong consistency and the asymptotic normality of such blocks-estimators
of the variance under rather general conditions. For estimation of the standard deviation
a blocks-estimator based on average standard deviations turns out to be preferable over
the square root of the average variances. We provide recommendations on the appropriate
choice of the block size and compare this blocks-approach with difference-based
estimators. If level shifts occur rather frequently even better results can be obtained by
adaptive trimming of the blocks under the assumption of normality.
2019-04-12T11:16:18Z
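The blocks idea described in the abstract above can be illustrated with a minimal sketch: split the series into non-overlapping blocks, compute the sample variance within each block, and average. Dropping an incomplete trailing block and using the ordinary sample variance per block are assumptions made here. The adaptive trimming mentioned in the abstract is not shown, and the function name is hypothetical.

```python
from statistics import mean, variance


def blocks_variance(values, block_size):
    """Average of sample variances over non-overlapping blocks.

    A trailing block shorter than block_size is dropped (assumption).
    """
    if block_size < 2:
        raise ValueError("block_size must be at least 2")
    blocks = [
        values[start:start + block_size]
        for start in range(0, len(values) - block_size + 1, block_size)
    ]
    if not blocks:
        raise ValueError("series is shorter than one block")
    return mean(variance(block) for block in blocks)
```

On the toy series `[0, 1, 0, 1, 10, 11, 10, 11]` with a level shift in the middle, `blocks_variance(series, 4)` gives 1/3, whereas the ordinary sample variance of the whole series is about 28.9, because the latter absorbs the shift in the mean.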
http://hdl.handle.net/2003/37979
Title: Optimal designs for model averaging in non-nested models
Authors: Alhorn, Kira; Dette, Holger; Schorning, Kirsten
Abstract: In this paper we construct optimal designs for frequentist model averaging estimation.
We derive the asymptotic distribution of the model averaging estimate with fixed weights
in the case where the competing models are non-nested and none of these models is correctly
specified. A Bayesian optimal design minimizes an expectation of the asymptotic
mean squared error of the model averaging estimate calculated with respect to a suitable
prior distribution. We demonstrate that Bayesian optimal designs can improve the
accuracy of model averaging substantially. Moreover, the derived designs also improve
the accuracy of estimation in a model selected by model selection and model averaging
estimates with random weights.
2019-04-03T15:30:14Z
http://hdl.handle.net/2003/37944
Title: WTA-WTP disparity: The role of perceived realism of the valuation setting
Authors: Frondel, Manuel; Sommer, Stephan; Tomberg, Lukas
Abstract: Based on a survey among more than 5,000 German households and a single-binary
choice experiment in which we randomly split the respondents into two groups, this
paper elicits both households’ willingness to pay (WTP) for power supply security
and their willingness to accept (WTA) compensations for a reduced security level.
In accord with numerous empirical studies, we find that the mean WTA value substantially
exceeds the mean WTP bid, in our empirical example by a factor of 3.56.
Yet, the WTA-WTP ratio decreases to 2.35 among respondents who believe that the
hypothetical valuation setting is likely to become true. Conversely, the WTA-WTP
ratio increases to 3.81 among respondents who deem the setting unlikely. Given this
discrepancy, we conclude that to diminish the WTA-WTP disparity resulting from
stated-preference surveys at least to some extent, inquiring about respondents’ perception
of the realism of the valuation setting is an essential element of any survey
design.
2019-03-18T11:07:26Z
http://hdl.handle.net/2003/37916
Title: Employee representation and innovation – disentangling the effect of legal and voluntary representation institutions in Germany
Authors: Kraft, Kornelius; Lammers, Alexander
Abstract: This paper studies the effect of employee representation bodies provided by management on product and process innovations. In contrast to statutory forms of co-determination such as works councils, participative practices initiated by management are not equipped with any legally granted rights at all. Such alternative forms of employee representation are far less frequently and thoroughly analyzed than works councils. We compare the effects of these co-determination institutions established voluntarily with those initiated on a legal basis on different kinds of innovation measures. We differentiate between process and product (incremental and radical) innovations. To tackle endogeneity, the estimations are based on recursive bivariate and multivariate probit models. Results show that employee representation provided voluntarily by management supports incremental as well as radical product and process innovations. The effect is much more pronounced when endogeneity is taken into account. Works councils, however, only exhibit a positive effect on incremental innovations. Moreover, the results point to a substitutive relationship between both types of employee representation.
2019-02-14T15:33:46Z
http://hdl.handle.net/2003/37915
Title: Equivalence of regression curves sharing common parameters
Authors: Möllenhoff, Kathrin; Bretz, Frank; Dette, Holger
Abstract: In clinical trials the comparison of two different populations is a frequently addressed
problem. Non-linear (parametric) regression models are commonly used to
describe the relationship between covariates such as the dose and a response variable in
the two groups. In some situations it is reasonable to assume some model parameters
to be the same, for instance the placebo effect or the maximum treatment effect. In
this paper we develop a (parametric) bootstrap test to establish the similarity of two
regression curves sharing some common parameters. We show by theoretical arguments
and by means of a simulation study that the new test controls its level and
achieves a reasonable power. Moreover, it is demonstrated that under the assumption
of common parameters a considerably more powerful test can be constructed compared
to the test which does not use this assumption. Finally, we illustrate potential
applications of the new methodology by a clinical trial example.
2019-02-14T15:31:27Z
http://hdl.handle.net/2003/37904
Title: The empirical process of residuals from an inverse regression
Authors: Kutta, Tim; Bissantz, Nicolai; Chown, Justin; Dette, Holger
Abstract: In this paper we investigate an indirect regression model characterized by the
Radon transformation. This model is useful for recovery of medical images obtained by computed tomography scans. The indirect regression function is estimated using a series estimator
motivated by a spectral cut-off technique. Further, we investigate the empirical process of
residuals from this regression, and show that it satisfies a functional central limit theorem.
2019-02-06T12:57:27Z
http://hdl.handle.net/2003/37839
Title: Generalized sign tests based on sign depth
Authors: Leckey, Kevin; Malcherczyk, Dennis; Müller, Christine H.
Abstract: We introduce generalized sign tests based on K-sign depth, K-depth for
short. These so-called K-depth tests are motivated by simplicial regression
depth. Since they depend only on the signs of the residuals, these test statistics
are easy to comprehend and outlier robust. We show that the K-depth test with
K = 2 is equivalent to the classical sign test so that K-depth tests with K > 2
are generalizations of the classical sign test. Since the K-depth test with K = 2 is
equivalent to the classical sign test, it has the same drawbacks as the classical sign
test. However, the generalized sign tests with K > 2 are much more powerful. We
show this by deriving their behavior at observations with few sign changes. Thereby
we also prove an upper bound for the K-depth which is attained by observations
with alternating signs of residuals. Furthermore, we prove the consistency of the
K-depth. Finally, we demonstrate the good power of the K-depth tests for relevance
testing, quadratic regression, and tests for explosive AR(2) and nonlinear AR(1)
regression.2018-12-17T16:56:45ZOptimal designs for series estimation in nonparametric regression with correlated data
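As an illustration of the K-depth abstract above (Leckey, Malcherczyk and Müller), the following is a minimal sketch of a sign-depth computation. The abstract does not spell out the definition; the sketch assumes the K-sign depth is the fraction of index-ordered K-tuples of residuals whose signs alternate. The function name `k_sign_depth` and the brute-force enumeration are illustrative choices, not the authors' implementation.

```python
from itertools import combinations
from math import comb


def k_sign_depth(residuals, k):
    """Fraction of index-ordered k-tuples whose residual signs alternate.

    Assumed definition (not stated in the abstract): a tuple counts if its
    signs are +,-,+,... or -,+,-,...; tuples containing a zero never count.
    """
    signs = [1 if r > 0 else -1 if r < 0 else 0 for r in residuals]
    n = len(signs)
    if k < 2 or k > n:
        raise ValueError("k must satisfy 2 <= k <= number of residuals")
    alternating = 0
    # combinations() keeps the original index order within each tuple.
    for tup in combinations(signs, k):
        if 0 in tup:
            continue
        if all(tup[i] == -tup[i + 1] for i in range(k - 1)):
            alternating += 1
    return alternating / comb(n, k)


residuals = [0.5, -1.2, 0.3, 0.8, -0.1]
n_pos = sum(r > 0 for r in residuals)
n_neg = sum(r < 0 for r in residuals)

depth2 = k_sign_depth(residuals, 2)
depth3 = k_sign_depth(residuals, 3)

# For K = 2 the depth depends only on the counts of positive and negative
# residuals, which is the information the classical sign test uses.
depth2_from_counts = n_pos * n_neg / comb(len(residuals), 2)
print(depth2, depth2_from_counts, depth3)
```

Under this assumed definition, the K = 2 depth is a function of the number of positive residuals alone, which matches the abstract's statement that the K = 2 test is equivalent to the classical sign test. For K > 2 the ordering of the signs matters as well.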
http://hdl.handle.net/2003/37836
Title: Optimal designs for series estimation in nonparametric regression with correlated data
Authors: Dette, Holger; Schorning, Kirsten; Konstantinou, Maria
Abstract: In this paper we investigate the problem of designing experiments for series estimators in nonparametric regression models with correlated observations. We use projection based estimators to derive an explicit solution of the best linear oracle estimator in the continuous time model for all Markovian-type error processes. These solutions are then used to construct estimators, which can be calculated from the available data along with their corresponding optimal design points. Our results are illustrated by means of a simulation study, which demonstrates that the new series estimator has a better performance than the commonly used techniques based on the optimal linear unbiased estimators. Moreover, we show that the performance of the estimators proposed in this paper can be further improved by choosing the design points appropriately.2018-12-14T14:05:07ZGoodness-of-fit testing the error distribution in multivariate indirect regression
http://hdl.handle.net/2003/37835
Title: Goodness-of-fit testing the error distribution in multivariate indirect regression
Authors: Chown, Justin; Bissantz, Nicolai; Dette, Holger
Abstract: We propose a goodness-of-fit test for the distribution of errors from a multivariate
indirect regression model. The test statistic is based on the Khmaladze transformation of the
empirical process of standardized residuals. This goodness-of-fit test is consistent at the root-n
rate of convergence, and the test can maintain power against local alternatives converging to
the null at a root-n rate.2018-12-14T14:03:05ZA similarity measure for second order properties of non-stationary functional time series with applications to clustering and testing
http://hdl.handle.net/2003/37828
Title: A similarity measure for second order properties of non-stationary functional time series with applications to clustering and testing
Authors: van Delft, Anne; Dette, Holger
Abstract: Due to the surge of data storage techniques, the need for the development of appropriate techniques to identify patterns and to extract knowledge from the resulting enormous data sets, which can be viewed as collections of dependent functional data, is of increasing interest in many scientific areas. We develop a similarity measure for spectral density operators of a collection of functional time series, which is based on the aggregation of Hilbert-Schmidt differences of the individual time-varying spectral density operators. Under fairly general conditions, the asymptotic properties of the corresponding estimator are derived and asymptotic normality is established. The introduced statistic lends itself naturally to quantify (dis)-similarity between functional time series, which we subsequently exploit in order to build a spectral clustering algorithm. Our algorithm is the first of its kind in the analysis of non-stationary (functional) time series and makes it possible to discover particular patterns by grouping together ‘similar’ series into clusters, thereby reducing the complexity of the analysis considerably. The algorithm is simple to implement and computationally feasible. As a further application we provide a simple test for the hypothesis that the second order properties of two non-stationary functional time series coincide.2018-12-04T08:35:05ZAliasing effects for random fields over spheres of arbitrary dimension
http://hdl.handle.net/2003/37827
Title: Aliasing effects for random fields over spheres of arbitrary dimension
Authors: Durastanti, Claudio; Patschkowski, Tim
Abstract: In this paper, aliasing effects are investigated for random fields defined on the d-dimensional
sphere S^d and reconstructed from discrete samples. First, we introduce the concept of an aliasing function
on S^d. The aliasing function allows one to identify explicitly the aliases of a given harmonic coefficient in
the Fourier decomposition. Then, we exploit this tool to establish the aliases of the harmonic coefficients approximated by means of the quadrature procedure named spherical uniform sampling. Subsequently, we
study the consequences of the aliasing errors in the approximation of the angular power spectrum of an isotropic random field, the harmonic decomposition of its covariance function. Finally, we show that band-limited
random fields are alias-free, under the assumption of a sufficiently large number of nodes in the quadrature rule.2018-12-04T08:32:55ZIncreased market transparency in Germany’s gasoline market: The death of rockets and feathers?
http://hdl.handle.net/2003/37826
Title: Increased market transparency in Germany’s gasoline market: The death of rockets and feathers?
Authors: Frondel, Manuel; Horvath, Marco; Vance, Colin; Kihm, Alexander
Abstract: Drawing on a consumer search model and a unique panel data set of daily
fuel prices covering over 5,000 fuel stations in Germany, this paper documents a
change in the price setting behavior of retail gas stations following the introduction of
a legally mandated on-line price portal. Prior to the introduction of the portal in 2013,
positive asymmetry is found on the basis of error correction models, with prices following
the “rockets and feathers” pattern documented in many commodity markets,
particularly in retail markets for fuels. In the aftermath of the portal’s introduction, by
contrast, negative asymmetry is observed: fuel price decreases in response to refinery
price decreases are stronger than fuel price increases due to refinery price increases.
This reversal in price pass-through, which is found among both branded and unbranded
stations, suggests welfare gains for consumers from increased market transparency.2018-12-04T08:30:36ZStatistical analysis of the lifetime of diamond impregnated tools for core drilling of concrete
http://hdl.handle.net/2003/37814
Title: Statistical analysis of the lifetime of diamond impregnated tools for core drilling of concrete
Authors: Malevich, Nadja; Müller, Christine H.; Kansteiner, Michael; Biermann, Dirk; Ferreira, Manuel; Tillmann, Wolfgang
Abstract: The lifetime of diamond impregnated tools for core drilling of concrete
is studied via the lifetimes of the single diamonds on the tool. Thereby, the number
of visible and active diamonds on the tool surface is determined by microscopical
inspections of the tool at given points in time. This leads to interval-censored lifetime
data if only the diamonds visible at the beginning are considered. If also the
lifetimes of diamonds appearing during the drilling process are included then the
lifetimes are doubly interval-censored. A statistical method is presented to analyse
the interval-censored data as well as the doubly interval-censored data. The method
is applied to three series of experiments which differ in the size of the diamonds
and the type of concrete. It turns out that the lifetimes of small diamonds used for
drilling into conventional concrete are much shorter than the lifetimes when using
large diamonds or high strength concrete.2018-11-27T11:46:56ZDetection of anomalous sequences in crack data of a bridge monitoring
http://hdl.handle.net/2003/37813
Title: Detection of anomalous sequences in crack data of a bridge monitoring
Authors: Abbas, Sermad; Fried, Roland; Heinrich, Jens; Horn, Melanie; Jakubzik, Mirko; Kohlenbach, Johanna; Maurer, Reinhard; Michels, Anne; Müller, Christine H.
Abstract: For estimating the remaining lifetime of old prestressed concrete bridges,
monitoring of crack widths can be used. However, the time series of crack widths
show a strong variation mainly caused by temperature and traffic. Additionally, sequences
with extreme volatility appear where the cause is unknown. They are called
anomalous sequences in the following. We present and compare four methods which
aim to detect these anomalous sequences in the time series. Volatilities caused by
traffic should not be detected.2018-11-27T11:45:06ZMultiscale change point detection for dependent data
http://hdl.handle.net/2003/37806
Title: Multiscale change point detection for dependent data
Authors: Dette, Holger; Schüler, Theresa; Vetter, Mathias
Abstract: In this paper we study the theoretical properties of the simultaneous multiscale change
point estimator (SMUCE) proposed by Frick et al. (2014) in regression models with dependent
error processes. Empirical studies show that in this case the change point estimate
is inconsistent, but it is not known if alternatives suggested in the literature for correlated
data are consistent. We propose a modification of SMUCE scaling the basic statistic by
the long run variance of the error process, which is estimated by a difference-type variance
estimator calculated from local means from different blocks. For this modification we prove
model consistency for physically dependent error processes and illustrate the finite sample
performance by means of a simulation study.2018-11-16T13:21:21ZPanel cointegrating polynomial regressions: Group-mean fully modified OLS estimation and inference
http://hdl.handle.net/2003/37669
Title: Panel cointegrating polynomial regressions: Group-mean fully modified OLS estimation and inference
Authors: Wagner, Martin; Reichold, Karsten
Abstract: This paper considers group-mean fully modified OLS estimation for a panel of cointegrating
polynomial regressions, i.e., regressions that include an integrated process and its powers as
explanatory variables. The stationary errors are allowed to be serially correlated, the regressor
to be endogenous and, as usual in the nonstationary panel literature, we include
individual-specific fixed effects. We consider a fixed cross-section dimension, asymptotics in the time
dimension only and show that the estimator allows for standard asymptotic inference in this
setting. In both the simulations as well as an illustrative application estimating environmental
Kuznets curves for carbon dioxide emissions we compare our group-mean estimator with the
pooled fully modified OLS estimator of de Jong and Wagner (2018).2018-11-13T12:32:37ZConsistency for the negative binomial regression with fixed covariate
http://hdl.handle.net/2003/37352
Title: Consistency for the negative binomial regression with fixed covariate
Authors: Weißbach, Rafael; Radloff, Lucas
Abstract: We model an overdispersed count as a dependent measurement, by means of
the Negative Binomial distribution. We consider quantitative regressors that
are fixed by design. The expectation of the dependent variable is assumed to
be a known function of a linear combination involving regressors and their coefficients. In the NB1-parametrization of the negative binomial distribution,
the variance is a linear function of the expectation, inflated by the dispersion
parameter, and the model is not a generalized linear model. We apply a general result of
Bradley and Gart (1962) to derive weak consistency and asymptotic normality of the maximum likelihood estimator for all parameters. To this end, we
show (i) how to bound the logarithmic density by a function that is linear
in the outcome of the dependent variable, independently of the parameter.
Furthermore (ii) the positive definiteness of the matrix related to the Fisher
information is shown with the Cauchy-Schwarz inequality.2018-10-31T13:29:33ZUsing the extremal index for value-at-risk backtesting
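For the negative binomial abstract above (Weißbach and Radloff), the mean-variance relation it describes can be written out as follows. The notation is an assumption, not taken from the abstract: $g$ stands for the known function, $x_i$ for the fixed regressors, $\beta$ for the coefficients, and $\delta$ for the dispersion parameter. The NB2 line is added only for contrast, using the common textbook form with dispersion $\alpha$.

```latex
% NB1: variance is linear in the mean (notation assumed for illustration)
\mathbb{E}[Y_i] = \mu_i = g\!\left(x_i^{\top}\beta\right),
\qquad
\operatorname{Var}(Y_i) = (1 + \delta)\,\mu_i, \quad \delta > 0 .

% NB2, for contrast: variance is quadratic in the mean
\operatorname{Var}(Y_i) = \mu_i + \alpha\,\mu_i^{2}, \quad \alpha > 0 .
```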
http://hdl.handle.net/2003/37201
Title: Using the extremal index for value-at-risk backtesting
Authors: Bücher, Axel; Posch, Peter N.; Schmidtke, Philipp
Abstract: We introduce a set of new Value-at-Risk independence backtests by establishing a
connection between the independence property of Value-at-Risk forecasts and the
extremal index, a general measure of extremal clustering of stationary sequences.
We introduce a sequence of relative excess returns whose extremal index has to
be estimated. We compare our backtest to both popular and recent competitors
using Monte-Carlo simulations and find considerable power in many scenarios.
In an applied section we perform realistic out-of-sample forecasts with common
forecasting models and discuss advantages and pitfalls of our approach.2018-10-19T14:45:07ZSwitching to green electricity: Spillover effects on household consumption
http://hdl.handle.net/2003/37200
Title: Switching to green electricity: Spillover effects on household consumption
Authors: Sommer, Stephan
Abstract: One way to reduce emissions from the consumption of electricity is switching to
green electricity suppliers. This paper identifies the determinants of adopting green electricity
and the effect on electricity consumption, using panel data on more than 9,000
households. To control for potential self-selection into green electricity tariffs, an endogenous
dummy treatment effects model is estimated. The results suggest that wealthier
and better-educated households are more likely to adopt green electricity. Moreover, we
find that switching to green electricity decreases electricity consumption and households
supplied by green electricity are less price-responsive. Consequently, enforcing higher
prices for conventional electricity might prove effective in reducing both greenhouse gas
emissions and electricity consumption at the household level.2018-10-19T14:43:05ZRISE Germany Internship: Applying Deep Learning Methods to the Search for Astrophysical Tau Neutrinos
http://hdl.handle.net/2003/37190
Title: RISE Germany Internship: Applying Deep Learning Methods to the Search for Astrophysical Tau Neutrinos
Authors: Martin, William2018-10-12T12:28:22ZFeature Selection for High-Dimensional Data with RapidMiner
http://hdl.handle.net/2003/37189
Title: Feature Selection for High-Dimensional Data with RapidMiner
Authors: Sangkyun, Lee; Schowe, Benjamin; Sivakumar, Viswanath; Morik, Katharina
Abstract: Feature selection is an important task in machine learning, reducing dimensionality of learning problems by selecting a few relevant features without losing too much information. Focusing on smaller sets of features, we can learn simpler models from data that are easier to understand and to apply. In fact, simpler models are more robust to input noise and outliers, often leading to better prediction performance than the models trained in higher dimensions with all features. We implement several feature selection algorithms in an extension of RapidMiner that scale well with the number of features compared to the existing feature selection operators in RapidMiner.2018-10-12T12:25:02ZEnergy-Efficient GPS-Based Positioning in the Android Operating System
http://hdl.handle.net/2003/37188
Title: Energy-Efficient GPS-Based Positioning in the Android Operating System
Authors: Streicher, Jochen; Spincyk, Olaf
Abstract: We present our ongoing collaborative work on EnDroid, an energy-efficient GPS-based positioning system for the Android Operating System. EnDroid is based on the EnTracked positioning system, developed at the University of Aarhus, Denmark. We describe the current prototypical state of our implementation and present our experiences and conclusions from preliminarily evaluating EnDroid on the Google Nexus One Smartphone. Although the preliminary results seem to support the approach, there are still several open questions, both at the application interface and at the hardware management level.2018-10-12T12:23:41ZProbabilistic Graphical Models in RapidMiner
http://hdl.handle.net/2003/37187
Title: Probabilistic Graphical Models in RapidMiner
Authors: Piatkowski, Nico
Abstract: This report describes the technical background and usage of the GraphMod plug-in for RapidMiner. The plug-in enables RapidMiner to load factor graphs and interpret Label and Attributes which are contained in an Example as assignments to random variables. A set of examples which belong to the same Batch is treated as an assignment to a whole factor graph. New operators allow the estimation of factor weights, the computation of the single-node marginal probability functions and the computation of the most probable assignment for each Label node with several methods. All algorithms are optimized for parallel execution on common multi-core processors and NVIDIA CUDA capable many-core processors (also known as Graphics Processing Units).2018-10-12T12:22:11ZTechnical report for Collaborative Research Center SFB 876 - Graduate School
http://hdl.handle.net/2003/37186
Title: Technical report for Collaborative Research Center SFB 876 - Graduate School
Authors: Morik, Katharina; Rhode, Wolfgang2018-10-12T09:18:51ZComputing on High Performance Clusters with R: Packages BatchJobs and BatchExperiments
http://hdl.handle.net/2003/37185
Title: Computing on High Performance Clusters with R: Packages BatchJobs and BatchExperiments
Authors: Bischl, Bernd; Lang, Michel; Mersmann, Olaf; Rahnenführer, Jörg; Weihs, Claus
Abstract: Empirical analysis of statistical algorithms often demands time-consuming experiments which are best performed on high performance computing clusters. We present two R packages which greatly simplify working in batch computing environments. The package BatchJobs implements the basic objects and procedures to control a batch cluster within R. It is structured around cluster versions of the well-known higher order functions Map, Reduce and Filter from functional programming. An important feature is that the state of computation is persistently available in a database. The user can query the status of jobs and then continue working with a desired subset. The second package, BatchExperiments, is tailored for the still very general scenario of analyzing arbitrary algorithms on problem instances. It extends BatchJobs by letting the user define an array of jobs of the kind “apply algorithm A to problem instance P and store results”. It is possible to associate statistical designs with parameters of algorithms and problems and therefore to systematically study their influence on the results. In general our main contributions are: (a) Portability: Both packages use a clear and well-defined interface to the batch system which makes them applicable in most high-performance computing environments. (b) Reproducibility: Every computational part has an associated seed that the user can control to ensure reproducibility even when the underlying batch system changes. (c) Efficiency: Efficiently use batch computing clusters completely within R. (d) Abstraction and good software design: The code layers for algorithms, experiment definitions and execution are cleanly separated and enable the writing of readable and maintainable code.2018-10-12T09:16:55ZTechnical report for Collaborative Research Center SFB 876 - Graduate School
http://hdl.handle.net/2003/37184
Title: Technical report for Collaborative Research Center SFB 876 - Graduate School
Authors: Morik, Katharina; Rhode, Wolfgang2018-10-12T09:14:26ZOptimization plugin for RapidMiner
http://hdl.handle.net/2003/37183
Title: Optimization plugin for RapidMiner
Authors: Umaashankar, Venkatesh; Sangkyun, Lee
Abstract: Optimization in general means selecting the best choice out of various alternatives, which reduces the cost or disadvantage of an objective. Optimization problems are very popular in fields such as economics, finance, logistics, etc. Optimization is a science of its own, and machine learning or data mining is a diverse growing field which applies techniques from various other areas to find useful insights from data. Many machine learning problems can be modelled and solved as optimization problems, which means optimization already provides a set of well established methods and algorithms to solve machine learning problems. Due to the importance of optimization in machine learning, in recent times, machine learning researchers have been contributing remarkable improvements to the field of optimization. We implement several popular optimization strategies and algorithms as a plugin for RapidMiner, which adds an optimization tool kit to the existing arsenal of operators in RapidMiner.2018-10-12T09:12:51ZThe Streams Framework
http://hdl.handle.net/2003/37182
Title: The Streams Framework
Authors: Bockermann, Christian; Blom, Hendrik
Abstract: In this report, we present the streams library, a generic Java-based library for designing data stream processes. The streams library defines a simple abstraction layer for data processing and provides a small set of online algorithms for counting and classification. Moreover, it integrates existing libraries such as MOA. Processes are defined in XML files following the semantics and ideas of well established tools like Ant, Maven or the Spring Framework. The streams library can be easily embedded into existing software, used as a standalone tool or be used to define compute graphs that are executed on other back end systems such as the Storm stream engine. This report reflects the status of the streams framework in version 0.9.6. As the framework is continuously enhanced, the report is extended along with it. The most recent version of this report is available online.2018-10-12T09:11:13ZMeasuring the Power Consumption of Smartphones
http://hdl.handle.net/2003/37181
Title: Measuring the Power Consumption of Smartphones
Authors: Manning-Dahan, Tyler; Putzke, Markus; Wietfeld, Christian
Abstract: Smartphones are becoming a part of everyday life and as such, a better understanding of hardware and software power consumption is crucial to develop more efficient smartphones. In order to extend battery life, application developers and phone designers must become aware of the limitations of a phone’s CPU power, as well as the LCD display consumption and connectivity via WiFi, 3G, and GPS systems. We present power consumption measurements of an HTC Incredible S and compare these results to known analytical models. The evaluation shows that power consumption varies considerably across different types of smartphones and that well known models underestimate the actual consumption. The results illustrate that touching the screen nearly doubles the power consumption, which is not captured by any analytical model. Moreover, we show how the transmitted packet size of WiFi and cellular communications affects the power consumption.2018-10-12T09:08:46ZUnimodal regression using Bernstein-Schoenberg-splines and penalties
http://hdl.handle.net/2003/37180
Title: Unimodal regression using Bernstein-Schoenberg-splines and penalties
Authors: Köllmann, Claudia; Bornkamp, Björn; Ickstadt, Katja
Abstract: Research in the field of nonparametric shape constrained regression has been intensive. However, only a few publications explicitly deal with unimodality although there is need for such methods in applications, for example, in dose-response analysis. In this paper we propose unimodal spline regression methods that make use of Bernstein-Schoenberg-splines and their shape preservation property. To achieve unimodal and smooth solutions we use penalized splines, and extend the penalized spline approach towards penalizing against general parametric functions, instead of using just difference penalties. For tuning parameter selection under a unimodality constraint a restricted maximum likelihood and an alternative Bayesian approach for unimodal regression are developed. We compare the proposed methodologies to other common approaches in a simulation study and apply them to a dose-response data set. All results suggest that the unimodality constraint or the combination of unimodality and a penalty can substantially improve estimation of the functional relationship.2018-10-12T09:07:11ZPreserving Confidentiality in Multiagent Systems - An Internship Project within the DAAD RISE Program
http://hdl.handle.net/2003/37179
Title: Preserving Confidentiality in Multiagent Systems - An Internship Project within the DAAD RISE Program
Authors: Dilger, Daniel; Krümpelmann, Patrick; Tadros, Cornelia
Abstract: RISE (Research Internships in Science and Engineering) is a summer internship program for undergraduate students from the United States, Canada and the UK organized by the DAAD (Deutscher Akademischer Austausch Dienst). Within the project A5 in the Collaborative Research Center SFB 876, we have planned and conducted an internship project in the RISE program that should support our research. Daniel Dilger was the intern and has been supervised by the PhD students Patrick Krümpelmann and Cornelia Tadros. The aim was to model an application scenario for our prototype implementation of a confidentiality preserving multiagent system and to run experiments with that prototype.2018-10-12T09:05:30ZTechnical report for Collaborative Research Center SFB 876 - Graduate School
http://hdl.handle.net/2003/37178
Title: Technical report for Collaborative Research Center SFB 876 - Graduate School
Authors: Morik, Katharina; Rhode, Wolfgang2018-10-12T08:47:15ZRobPer: An R Package to Calculate Periodograms for Light Curves Based On Robust Regression
http://hdl.handle.net/2003/37177
Title: RobPer: An R Package to Calculate Periodograms for Light Curves Based On Robust Regression
Authors: Thieler, Anita Monika; Fried, Roland; Rathjens, Jonathan
Abstract: An important task in astroparticle physics is the detection of periodicities in irregularly sampled time series, called light curves. The classic Fourier periodogram cannot deal with irregular sampling and with the measurement accuracies that are typically given for each observation of a light curve. Hence, methods to fit periodic functions using weighted regression were developed in the past to calculate periodograms. We present the R Package RobPer, which allows combining different periodic functions and regression techniques to calculate periodograms. Possible regression techniques are least squares, least absolute deviation, least trimmed squares, M-, S- and τ-regression. Measurement accuracies can be taken into account by including weights. Our periodogram function covers most of the approaches that have been tried earlier and provides new model-regression-combinations that have not been used before. To detect valid periods, we apply an outlier search on the periodogram instead of using fixed critical values that are theoretically only justified in case of least squares regression, independent periodogram bars and a null hypothesis allowing only normal white noise. This outlier search can be performed using RobPer as well. Finally, the package also includes a generator to generate artificial light curves, e.g., for simulation studies.2018-10-12T08:44:24ZPreprocessing of Affymetrix Exon Expression Arrays
http://hdl.handle.net/2003/37176
Title: Preprocessing of Affymetrix Exon Expression Arrays
Authors: Sangkyun, Lee; Schramm, Alexander
Abstract: The activity of genes can be captured by measuring the amount of messenger RNAs transcribed from the genes, or from their subunits called exons. In our study, we use the Affymetrix Human Exon ST v1.0 micro arrays to measure the activity of exons in Neuroblastoma cancer patients. The purpose is to discover a small number of genes or exons that play important roles in differentiating high-risk patients from low-risk counterparts. Although the technology has been improved for the past 15 years, array measurements still can be contaminated by various factors, including human error. Since the number of arrays is often only a few hundred, atypical errors can hardly be canceled by large numbers of normal arrays. In this article we describe how we filter out low-quality arrays in a principled way, so that we can obtain more reliable results in downstream analyses.2018-10-12T08:39:44ZA Survey of the Stream Processing Landscape
http://hdl.handle.net/2003/37175
Title: A Survey of the Stream Processing Landscape
Authors: Bockermann, Christian
Abstract: The continuous processing of streaming data has become an important aspect in many applications. Over the last years a variety of different streaming platforms has been developed and a number of open source frameworks is available for the implementation of streaming applications. In this report, we will survey the landscape of existing streaming platforms. Starting with an overview of the evolving developments in the recent past, we will discuss the requirements of modern streaming architectures and present the ways these are approached by the different frameworks.2018-10-12T08:38:07ZRandom projections for Bayesian regression
http://hdl.handle.net/2003/37174
Title: Random projections for Bayesian regression
Authors: Geppert, Leo N.; Ickstadt, Katja; Munteanu, Alexander; Sohler, Christian
Abstract: This article introduces random projections applied as a data reduction technique for Bayesian regression analysis. We show sufficient conditions under which the entire d-dimensional distribution is preserved under random projections by reducing the number of data points from n to k ∈ O(poly(d/ε)) in the case n ≫ d. Under mild assumptions, we prove that evaluating a Gaussian likelihood function based on the projected data instead of the original data yields a (1 + O(ε))-approximation in the ℓ2-Wasserstein distance. Our main result states that the posterior distribution of a Bayesian linear regression is approximated up to a small error depending on only an ε-fraction of its defining parameters when using either improper non-informative priors or arbitrary Gaussian priors. Our empirical evaluations involve different simulated settings of Bayesian linear regression. Our experiments underline that the proposed method is able to recover the regression model while considerably reducing the total run-time.2018-10-12T08:35:55ZRessourcenbeschränkte Analyse von Ionenmobilitätsspektren mit dem Raspberry Pi
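To make the data-reduction idea in the random projections abstract above (Geppert, Ickstadt, Munteanu and Sohler) concrete, here is a minimal sketch. It assumes a dense Gaussian projection matrix, which is only one possible choice and is not confirmed by the abstract, and it compares least-squares solutions, which coincide with the posterior mean of the coefficients under a flat improper prior. The names `sketch_rows`, `beta_full` and `beta_sketch` and all data are made up for illustration.

```python
import numpy as np


def sketch_rows(X, y, k, rng):
    """Reduce n observations to k by multiplying with a k x n Gaussian matrix.

    The 1/sqrt(k) scaling keeps squared norms unchanged in expectation.
    A Gaussian matrix is an assumption made for this sketch only.
    """
    n = X.shape[0]
    S = rng.normal(loc=0.0, scale=1.0 / np.sqrt(k), size=(k, n))
    return S @ X, S @ y


rng = np.random.default_rng(0)
n, d, k = 5000, 5, 500  # n >> d, reduced to k rows

# Simulated linear model with known coefficients and small Gaussian noise.
X = rng.normal(size=(n, d))
beta_true = np.arange(1, d + 1, dtype=float)
y = X @ beta_true + rng.normal(scale=0.1, size=n)

X_s, y_s = sketch_rows(X, y, k, rng)

# Least-squares fits on the full and on the projected data.
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
beta_sketch = np.linalg.lstsq(X_s, y_s, rcond=None)[0]

max_abs_diff = float(np.max(np.abs(beta_full - beta_sketch)))
print("largest coefficient difference:", max_abs_diff)
```

The projected problem has k rows instead of n, so any downstream sampler or solver works on a much smaller data set. How large k has to be for a given accuracy is what the abstract's O(poly(d/ε)) bound addresses; the values above are arbitrary.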
http://hdl.handle.net/2003/37173
Title: Ressourcenbeschränkte Analyse von Ionenmobilitätsspektren mit dem Raspberry Pi
Authors: Egorov, Alexey; König, Alexander; Köppen, Marcel; Kühn, Henning; Kullack, Isabell; Kuthe, Elias; Mitkovska, Suzana; Niehage, Robert; Pawelko, Andreas; Sträßer, Manuel; Striewe, Christian; D'Addario, Marianna; Kopczynski, Dominik; Rahmann, Sven
Abstract: The composition of ambient or exhaled air can provide a great deal of information that can help, for example, to identify a disease or its cause. The molecules of the substances contained in the air each have different sizes and shapes, so that it is possible to separate them from one another and to determine how frequently they occur from deflections in an air measurement. These deflections are called peaks. Their detection is the subject of current research. Applications of such measurements range from medical monitoring of hospital patients to checking the ambient air in particular areas.2018-10-12T08:34:17ZTechnical report for Collaborative Research Center SFB 876 - Graduate School
http://hdl.handle.net/2003/37172
Title: Technical report for Collaborative Research Center SFB 876 - Graduate School
Authors: Morik, Katharina; Rhode, Wolfgang2018-10-12T08:30:12ZDemixing empirical distribution functions
http://hdl.handle.net/2003/37171
Title: Demixing empirical distribution functions
Authors: Munteanu, Alexander; Wornowizki, Max
Abstract: We consider the two-sample homogeneity problem where the information contained in two samples is used to test the equality of the underlying distributions. For instance, in cases where one sample stems from a simulation procedure modelling the data generating process of the other sample consisting of observed data, a mere rejection of the null hypothesis is unsatisfactory. Instead, the data analyst would like to know how the simulation can be improved while changing it as little as possible. Based on the popular Kolmogorov-Smirnov test and a general nonparametric mixture model, we propose an algorithm which determines an appropriate correction distribution function describing how the simulation procedure can be corrected. It is constructed in such a way that complementing the simulation sample by a given proportion of observations sampled from the correction distribution does not lead to a rejection of the null hypothesis of equal distributions when the modified and the observed sample are compared. We prove our algorithm to run in linear time and evaluate it on simulated and real spectrometry data showing that it leads to intuitive results. We illustrate its practical performance considering runtime as well as accuracy in a real world scenario.2018-10-12T08:28:28ZData Modeling of Ubiquitous System Software
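The demixing abstract above builds on the two-sample Kolmogorov-Smirnov statistic, the largest absolute gap between two empirical distribution functions. The following minimal sketch computes that statistic and shows, on toy data, how complementing a simulated sample can shrink it. The function name `ks_statistic` and the samples are illustrative assumptions, and the report's correction algorithm itself is not reproduced here.

```python
from bisect import bisect_right

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic sup_t |F_x(t) - F_y(t)|.

    Both empirical distribution functions are right-continuous step
    functions, so the supremum is attained at one of the observed values.
    """
    xs, ys = sorted(x), sorted(y)
    return max(
        abs(bisect_right(xs, t) / len(xs) - bisect_right(ys, t) / len(ys))
        for t in xs + ys
    )

simulated = [1, 2, 3, 4]              # toy simulation output
observed = [1, 2, 3, 4, 5, 6]         # toy observed data
correction = [5, 6]                   # hypothetical draws from a correction distribution

before = ks_statistic(simulated, observed)
after = ks_statistic(simulated + correction, observed)
print(before, after)
```

In this toy case the simulation misses the upper third of the observed range, and adding the two correction points closes the gap completely.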
http://hdl.handle.net/2003/37170
Title: Data Modeling of Ubiquitous System Software
Authors: Streicher, Jochen
Abstract: The multitude of events and internal data structures in complex modern system software are an excellent target for data analysis. The tools to collect the data range from low-level tracing frameworks to more sophisticated ones with specialized data collection and processing languages. However, these lack information on the relationship between different data sources and between currently and already collected data. We describe a formal data model that captures the structure of data streams in the system software as well as the relationships between them.2018-10-12T08:26:55ZBeyond unimodal regression: modelling multimodality with piecewise unimodal, mixture or additive regression
http://hdl.handle.net/2003/37169
Title: Beyond unimodal regression: modelling multimodality with piecewise unimodal, mixture or additive regression
Authors: Köllmann, Claudia; Ickstadt, Katja; Fried, Roland
Abstract: Research in the field of nonparametric shape constrained regression has been extensive and there is need for such methods in various application areas, since shape constraints can reflect prior knowledge about the underlying relationship. It is, for example, often natural that some intensity first increases and then decreases over time, which can be described by a unimodal shape constraint. But the prior knowledge in different applications is also of increasing complexity and data shapes may vary from few to plenty of modes and from piecewise unimodal to superpositions of unimodal function courses. Thus, we go beyond unimodal regression in this report and capture multimodality by employing piecewise unimodal regression, mixture regression or additive regression models. We give an overview of the statistical methods, namely the unimodal spline regression approach and its aforementioned extensions for use with multimodal data. The usefulness of the methods is demonstrated by applying them to data sets from three different application areas: breath gas analysis, marine biology and astroparticle physics. Though the three application areas are quite different, the proposed extensions of unimodal regression yield very helpful results in each of them. This encourages using the methodologies proposed here in many other areas of application as well.2018-10-12T08:25:06ZLogistic Regression in Datastreams
http://hdl.handle.net/2003/37168
Title: Logistic Regression in Datastreams
Authors: Schwiegelshohn, Chris; Sohler, Christian
Abstract: Learning from data streams is a well researched task both in theory and practice. As remarked by Clarkson, Hazan and Woodruff, many classification problems cannot be very well solved in a streaming setting. For previous model assumptions, there exist simple, yet highly artificial lower bounds prohibiting space efficient one-pass algorithms. At the same time, several classification algorithms are often successfully used in practice. To overcome this gap, we give a model relaxing the constraints that previously made classification impossible from a theoretical point of view and under these model assumptions provide the first (1 + epsilon)-approximate algorithms for sketching the objective values of logistic regression and perceptron classifiers in data streams.2018-10-12T08:23:06ZUnderstanding Where Your Classifier Does (Not) Work - the SCaPE Model Class for Exceptional Model Mining
http://hdl.handle.net/2003/37167
Title: Understanding Where Your Classifier Does (Not) Work - the SCaPE Model Class for Exceptional Model Mining
Authors: Duivesteijn, Wouter; Thaele, Julia
Abstract: FACT, the First G-APD Cherenkov Telescope, detects air showers induced by high-energy cosmic particles. It is desirable to classify a shower as being induced by a gamma ray or a background particle. Generally, it is nontrivial to get any feedback on the real-life training task, but we can attempt to understand how our classifier works by investigating its performance on Monte Carlo simulated data. To this end, in this paper we develop the SCaPE (Soft Classifier Performance Evaluation) model class for Exceptional Model Mining, which is a Local Pattern Mining framework devoted to highlighting unusual interplay between multiple targets. In our Monte Carlo simulated data, we take as targets the computed classifier probabilities and the binary column containing the ground truth: which kind of particle induced the corresponding shower. Using a newly developed quality measure based on ranking loss, the SCaPE model class highlights subspaces of the search space where the classifier performs particularly well or poorly. These subspaces arrive in terms of conditions on attributes of the data, hence they come in a language a domain expert understands, which should help them understand where their classifier does (not) work. Additional experiments are carried out on nine UCI datasets. Found subgroups highlight subspaces whose difficulty for classification is corroborated by astrophysical interpretation, as well as subspaces that warrant further investigation.2018-10-12T08:21:08ZAngerona - A Multiagent Framework for Logic Based Agents with Application to Secrecy Preservation
http://hdl.handle.net/2003/37166
Title: Angerona - A Multiagent Framework for Logic Based Agents with Application to Secrecy Preservation
Authors: Krümpelmann, Patrick; Janus, Tim; Kern-Isberner, Gabriele2018-10-11T13:54:58ZUntersuchungen zur Analyse von deutschsprachigen Textdaten
http://hdl.handle.net/2003/37165
Title: Untersuchungen zur Analyse von deutschsprachigen Textdaten
Authors: Morik, Katharina; Jung, Alexander; Weckwerth, Jan; Rötner, Stefan; Hess, Sibylle; Buschjäger, Sebastian; Pfahler, Lukas2018-10-11T13:53:10ZTechnical report for Collaborative Research Center SFB 876 - Graduate School
http://hdl.handle.net/2003/37164
Title: Technical report for Collaborative Research Center SFB 876 - Graduate School
Authors: Morik, Katharina; Rhode, Wolfgang2018-10-11T13:50:51ZPerformance Analysis for Parallel R Programs: Towards Efficient Resource Utilization
http://hdl.handle.net/2003/37163
Title: Performance Analysis for Parallel R Programs: Towards Efficient Resource Utilization
Authors: Kotthaus, Helena; Korb, Ingo; Marwedel, Peter
Abstract: Parallel computing is becoming more and more popular, since R is increasingly used to process large data sets. We have therefore improved traceR to also allow profiling of parallel applications. TraceR can be used for common cases like parallelization on multiple cores or parallelization on multiple machines. For the parallel performance analysis we added measurements like CPU utilization of parallel tasks and measurements for analyzing the memory usage of parallel programs during execution. With our parallel performance analysis we concentrate on applications that are embarrassingly parallel, consisting of independent tasks. One example application which is embarrassingly parallel and also has a high resource utilization is model selection. Here the goal is to find the best machine learning algorithm configuration for building a model for the given data. Therefore one has to search through a huge model space. Since the gain from parallel execution can be negated if the memory requirements of all parallel processes exceed the capacity of the system, our profiling data can serve as a constraint to determine the degree of parallelism and also to guide distribution of parallel R applications. Our goal is to provide a resource-aware parallelization strategy. To develop such a strategy we first need to analyze the performance of parallel applications. In the following we therefore will describe different parallel example applications and show how traceR is applied to analyze parallel R applications.2018-10-11T13:48:34ZData Reduction for CORSIKA
http://hdl.handle.net/2003/37162
Title: Data Reduction for CORSIKA
Authors: Baack, Dominik
Abstract: For the analysis of data measured by experiments, simulated Monte Carlo data is essential. It is used to test the understanding of the experiment, for separation of signal and background and for reconstruction of real physical properties from observable parameters. With increasing size of the experiments, more and more simulated data is needed. To optimize the simulation and to reduce the huge amount of calculation time needed, two different methods exist. The first method is the low-level optimization of the source code. The second one is the reduction of the actually needed Monte Carlo data. This report focuses on the cosmic ray simulation CORSIKA, which simulates cosmic ray induced particle showers within the atmosphere. In the case of CORSIKA, big parts of the program are already optimized. Additionally, parts of the source code are only accessible in binary form so the first method of optimization is nearly impossible. Therefore the preferred method here is the reduction of unnecessarily generated data. This report presents a modified and extended internal structure for CORSIKA, which is shown in Figure 2. The modifications can be divided into two modules: Dynamic Stack and Remote Control. Both have complementary approaches to reduce the number of simulation cycles needed and provide an easy API for customizations without assuming any of the CORSIKA code or structure.2018-10-11T13:44:45ZRISE Germany Internship: Application of Data Mining Methods on IceCube Event Reconstructions
http://hdl.handle.net/2003/37161
Title: RISE Germany Internship: Application of Data Mining Methods on IceCube Event Reconstructions
Authors: Bhasin, Srishti; Börner, Mathis
Abstract: In this report the results from a 3-month internship are presented. The goal of the internship was to apply data mining methods to low level IceCube data in order to reconstruct the particle energies. IceCube is a neutrino observatory located at the geographical South Pole, built with the aim of detecting high energy astrophysical neutrinos. The detector consists of 5160 photomultipliers, located 1.5-2.5 kilometers beneath the icecap, which detect Cherenkov light radiated by charged particle propagation through the ice. The reconstruction of detected events directly at the pole is challenging, due to heavy constraints on resources. Due to this, only rudimentary reconstructions are performed on-site. The final results are obtained months later, once the data has been transported from the detector. An effective and prompt reconstruction directly at the pole would open a lot of new possibilities for follow-up studies of detected events. The application of state-of-the-art data mining methods can help to obtain these reconstructions on-site.2018-10-11T13:42:44ZOnline Gauß-Prozesse zur Regression auf FPGAs
http://hdl.handle.net/2003/37160
Title: Online Gauß-Prozesse zur Regression auf FPGAs
Authors: Buschjäger, Sebastian
Abstract: FPGAs can be used as a fast and energy-efficient execution platform, which however offers no runtime environment for file abstractions or peripheral access. For this reason, the surrounding system must be designed in addition to the actual implementation. This system design has changed considerably with the third generation of available tool support for FPGAs, which results in differences from the existing literature. The design procedure for the current generation of FPGAs and tools is presented first, and on that basis a suitable runtime environment for machine learning algorithms on the FPGA is designed. The aim is a system architecture that is as modular and energy-efficient as possible, so that it can readily be used in embedded systems and, thanks to the modularity of the system, the machine learning algorithm can easily be exchanged. An exemplary implementation of a Gaussian process on the FPGA then demonstrates the embedding into the overall system, with emphasis on the highest possible speed of the hardware implementation. To the author's knowledge, the realization of an energy-efficient system architecture for different machine learning algorithms is new, since in the existing literature a new system is designed for each algorithm. Furthermore, to the author's knowledge the implementation of Gaussian processes on FPGAs is also new, which results in further differences from the existing literature.2018-10-11T13:40:05ZEasyTCGA: An R package for easy batch downloading of TCGA data from FireBrowse
http://hdl.handle.net/2003/37159
Title: EasyTCGA: An R package for easy batch downloading of TCGA data from FireBrowse
Authors: Kliewer, Viktoria; Sangkyun, Lee
Abstract: Many organizations investigate cancer, including the National Institutes of Health, USA. Genomics (CCG). The Cancer Genome Atlas (TCGA) is an establishment of the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) that has created maps of the key genomic changes in more than 30 cancer types. The aim of TCGA is to improve the ability to diagnose, treat and prevent cancer through genome analysis. TCGA provides a publicly available data set. The Broad Institute TCGA GDAC Firehose arranges this data set so that it can be loaded directly using FireBrowse. FireBrowse provides a simple way to download and explore TCGA data and TCGA analyses. The data is downloaded as zip files. Mario Deng created an R client called FirebrowseR with the objective of getting the TCGA data from FireBrowse conveniently. The size of record sets to download is limited. EasyTCGA is an R package providing easy batch downloading of particular TCGA data from FireBrowse using FirebrowseR. The key advantage of EasyTCGA is that the whole available data set of interest is downloaded at once as a single data frame. The focus of this technical report is on the presentation of the R package EasyTCGA. For that reason, domain-specific terms and variables, such as the biological data, are not explained. All relevant background information is available at the given URLs. EasyTCGA can download clinical data, sample-level log2 miRSeq and mRNASeq expression values, selected columns from the MAF (Mutation Annotation File) generated by MutSig and significantly mutated genes, as scored by MutSig.2018-10-11T13:27:16ZTechnical report for Collaborative Research Center SFB 876 - Graduate School
http://hdl.handle.net/2003/37158
Title: Technical report for Collaborative Research Center SFB 876 - Graduate School
Authors: Morik, Katharina; Rhode, Wolfgang2018-10-11T11:46:05ZPG594 -- Big Data
http://hdl.handle.net/2003/37157
Title: PG594 -- Big Data
Authors: Asmi, Mohamed; Bainczyk, Alexander; Bunse, Mirko; Gaidel, Dennis; May, Michael; Pfeiffer, Christian; Schieweck, Alexander; Schönberger, Lea; Stelzner, Karl; Sturm, David; Wiethoff, Carolin; Xu, Lili
Abstract: In today's world, processing large amounts of data is becoming ever more important. A multitude of technologies, frameworks and software solutions are used for this, which were either designed explicitly for the big data domain or can be ported to big data systems. The goal of this project group (PG) is to acquire expert knowledge of current tools and systems in the big data domain by working on a real scientific problem. From the winter semester 2015/2016 to the end of the summer semester 2016, this project group worked on processing and analysing the data of the First G-APD Cherenkov Telescope (FACT), operated by the Department of Physics on the island of La Palma. The telescope delivers data in the terabyte range every day, which must first be indexed and then processed efficiently using the cluster of Collaborative Research Center 876, so that in the best case this project group can support the work of the physicists with its results. How exactly this is to be done is examined in more detail on the following pages, beginning with the dedicated use case, taking into account the necessary domain and technical foundations, and ending with the final results.2018-10-11T11:43:22ZRISE Germany Internship: Unfolding FACT Data
http://hdl.handle.net/2003/37156
Title: RISE Germany Internship: Unfolding FACT Data
Authors: Bieker, Jacob; Börner, Mathis; Brügge, Kai; Nöthe, Maximillian
Abstract: In this report the results from a 10-week internship are presented. The goal of the internship was to apply different unfolding approaches to conduct measurements of energy spectra from data acquired by FACT, the First G-APD Cherenkov Telescope. FACT is the first operational telescope of its kind, employing a camera equipped with silicon photomultipliers (G-APD aka SiPM) to primarily detect gamma rays. Improving the unfolding method can help with better interpretation of the data and more accurate physics results without the need for new equipment or more observations. The approaches tested during this internship range from simplistic matrix inversion to an improvement over the previous standard (TRUEE).2018-10-11T11:41:22ZAutomated Data Collection for Modelling Texas Instruments Ultra Low-Power Chargers
http://hdl.handle.net/2003/37155
Title: Automated Data Collection for Modelling Texas Instruments Ultra Low-Power Chargers
Authors: Masoudinejad, Mojtaba
Abstract: Some IoT designers develop an ad-hoc conversion solution designed specifically for their entity. However, having Maximum Power Point Tracking (MPPT), battery control, converter and switching logic would require a series of components. These devices will increase the initial cost and the overall energy loss overhead of this middleware between the EH and the storage. Nevertheless, these issues can be overcome by integrating all these elements and logic into one single chip. Currently, there are three Texas Instruments (TI) chips from the BQ255XX series and one ST chip (SPV1050) available off the shelf, specially designed for low energy environments. Among them, TI's BQ25505 and BQ25570 chips promise a better performance out of the box and are dominant in the market. Although multiple designers have used these chips in their IoT devices, no analytical study of them is available. Some basic information about these devices is available through their datasheets. However, for a reliable design and fast analysis of the overall energy performance of an IoT device, these chips have to be modelled.2018-10-11T11:39:45ZTechnical report for Collaborative Research Center SFB 876 - Graduate School
http://hdl.handle.net/2003/37154
Title: Technical report for Collaborative Research Center SFB 876 - Graduate School
Authors: Morik, Katharina; Rhode, Wolfgang2018-10-11T11:37:34ZA Power Model for DC-DC Boost Converters Operating in PFM Mode
http://hdl.handle.net/2003/37153
Title: A Power Model for DC-DC Boost Converters Operating in PFM Mode
Authors: Masoudinejad, Mojtaba
Abstract: The next generation of computing will lie outside the traditional stationary computing realm. In this future paradigm, many non-stationary objects around us sense and act on the environment while they are connected to each other via the internet. During the last few years, the number of these devices has been growing rapidly. This is causing an explosion of small computing platforms for commercial, consumer, and industrial use cases. The overall concept of IoT is based on the communication (mainly through the internet) between multiple entities which are generalised as things. Given the diversity of the application fields, a large number of entities are considered things, from simple one-bit sensors to complex robots. Some concepts even consider human beings as entities within an IoT system. This leads to ambiguity in the definition of objects. Consequently, no unified definition of things is accepted among different communities. However, Cyber Physical Systems (CPS) as embedded devices with communication capabilities would fit into most (if not all) of them.2018-10-11T11:34:22ZMathematical modelling of the quality-based order assignment problem
http://hdl.handle.net/2003/37152
Title: Mathematical modelling of the quality-based order assignment problem
Authors: Schmitt, Jacqueline; Hahn, Florian; Deuse, Jochen
Abstract: The increasing global competition forces companies to reduce their production costs and increase the quality of their products at the same time. Due to individualized customer needs, there can be numerous customer requirements to the products that need to be fulfilled to ensure customer satisfaction. Therefore, many companies established a quality management (QM) system, which aims for continuous improvement of performance regarding system, process, and product quality. Basic concepts and requirements for QM systems can be found in the ISO 9000 standards series. A main principle hereby is the customer orientation so that individualized customer needs can be considered within the design of internal quality testing gates. Within this technical report we present two approaches to model the product to customer order assignment problem (PCO-AP) mathematically as a 0,1 assignment problem (0,1-AP) and generalized assignment problem (GAP).2018-10-11T11:32:41ZModel-Based Optimization of Subgroup Weights for Survival Analysis
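For orientation, the 0,1 assignment problem named in the order assignment abstract above has the following textbook form. The symbols are assumptions for illustration rather than the report's notation: products i = 1, ..., n, customer orders j = 1, ..., m, a cost c_{ij} for assigning product i to order j, and binary decision variables x_{ij}.

```latex
\begin{align*}
\min_{x}\quad & \sum_{i=1}^{n}\sum_{j=1}^{m} c_{ij}\,x_{ij} \\
\text{s.t.}\quad
  & \sum_{i=1}^{n} x_{ij} = 1, && j = 1,\dots,m
    \quad\text{(each order receives exactly one product)} \\
  & \sum_{j=1}^{m} x_{ij} \le 1, && i = 1,\dots,n
    \quad\text{(each product is used at most once)} \\
  & x_{ij} \in \{0,1\}, && i = 1,\dots,n,\; j = 1,\dots,m
\end{align*}
```

The generalized assignment problem relaxes the one-to-one structure: in its usual statement, the second constraint is replaced by a capacity constraint of the form \sum_{j} a_{ij} x_{ij} \le b_i, with resource demands a_{ij} and capacities b_i.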
http://hdl.handle.net/2003/37151
Title: Model-Based Optimization of Subgroup Weights for Survival Analysis
Authors: Richter, Jakob; Madjar, Katrin; Rahnenführer, Jörg
Abstract: Obtaining a reliable prediction model for a specific cancer subgroup or cohort is often difficult due to the limited number of samples and, in survival analysis, even more so due to potentially high censoring rates. Sometimes similar datasets are available for other patient subgroups with the same or a similar disease and treatment, e.g., from other clinical centers. Simple pooling of all subgroups can decrease the variance of the predicted parameters of the prediction models, but also increase the bias due to potentially high heterogeneity between the cohorts.
A promising compromise is to identify which subgroups are similar enough to the specific subgroup of interest and then include only these for model building.
Similarity here refers to the relationship between input and output in the prediction model, and not necessarily to the distributions of the input and output variables themselves.
Here, we propose a subgroup-based weighted likelihood approach and evaluate it on a set of lung cancer cohorts. When interested in a prediction model for a specific subgroup, then for every other subgroup, an individual weight determines the strength with which its observations enter into the likelihood-based optimization of the model parameters. A weight close to 0 indicates that a subgroup should be discarded, and a weight close to 1 indicates that the subgroup fully enters into the model building process.
MBO (model based optimization) can be used to quickly find a good prediction model in the presence of a large number of hyperparameters to be tuned. Here, we use MBO to identify the best model for survival prediction in lung cancer subgroups, where besides the parameters of a Cox model additionally the individual values of the subgroup weights are optimized. Interestingly, often the resulting models with highest prediction quality are obtained for a mixed weight structure, i.e. both weights close to 0, weights close to 1, and medium weights are optimal, reflecting the similarity of the corresponding cancer subgroups.2018-10-11T11:30:33ZEfficient Track Reconstruction on Modern Hardware
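The subgroup-based weighted likelihood described in the abstract above can be written schematically as follows. The notation is an assumption for illustration: subgroups G_1, ..., G_S, model parameters \beta, a log-likelihood contribution \ell_i(\beta) for observation i (for a Cox model, a partial-likelihood contribution), and one weight w_s per subgroup.

```latex
\ell_w(\beta) \;=\; \sum_{s=1}^{S} w_s \sum_{i \in G_s} \ell_i(\beta),
\qquad w_s \in [0,1].
```

Under this reading, the subgroup of interest would carry weight 1, a weight near 0 effectively removes a subgroup from the fit, and the remaining weights are the quantities tuned together with the other hyperparameters.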
http://hdl.handle.net/2003/37150
Title: Efficient Track Reconstruction on Modern Hardware
Authors: Lindemann, Thomas
Abstract: Particle physics has become a massively data-intensive discipline. Huge particle accelerators — such as the Large Hadron Collider (LHC) at CERN — produce vast amounts of experimental data — 4 TB/s in the case of the LHCb experiment at CERN — which often must be processed in real time. Named after the b-quark, LHCb is one of the four big experiments at CERN. The general scope is to explain the matter/anti-matter asymmetry. The main focus is the study of particle decays involving beauty and charm quarks.
In the LHCb Project, a continuous stream of hits is produced by the several stages of the LHCb detector. Given the low probability of observing an “interesting” collision, physicists produce a vast number of collision experiments in the hope of finding a few interesting ones. Thus, the event data have to be processed in real time, since current storage technology cannot hold all collision events permanently. Analyzing these data volumes has become the key limitation of the domain: any improvement in analysis performance translates into better insights on the physics side.
In this report, we present the results of our current experiments with the HybridSeeding track reconstruction algorithm.2018-10-11T11:28:48ZPanel cointegrating polynomial regression analysis and the environmental Kuznets curve
http://hdl.handle.net/2003/37148
Title: Panel cointegrating polynomial regression analysis and the environmental Kuznets curve
Authors: de Jong, Robert M.; Wagner, Martin
Abstract: This paper develops a modified and a fully modified OLS estimator for a panel of cointegrating
polynomial regressions, i.e. regressions that include an integrated process and its powers
as explanatory variables. The stationary errors are allowed to be serially correlated and the
regressors are allowed to be endogenous and we allow for individual and time fixed effects. Inspired
by Phillips and Moon (1999) we consider a cross-sectional i.i.d. random linear process
framework. The modified OLS estimator utilizes the large cross-sectional dimension that allows
to consistently estimate and subtract an additive bias term without the need to also transform
the dependent variable as required in fully modified OLS estimation. Both developed estimators
have zero mean Gaussian limiting distributions and thus allow for standard asymptotic inference.
Our illustrative application indicates that the developed methods are a potentially useful
addition to not least the environmental Kuznets curve literature's toolkit.2018-10-10T13:23:00ZCombining uncertainty with uncertainty to get certainty? Efficiency analysis for regulation purposes
http://hdl.handle.net/2003/37146
Title: Combining uncertainty with uncertainty to get certainty? Efficiency analysis for regulation purposes
Authors: Andor, Mark; Parmeter, Christopher; Sommer, Stephan
Abstract: Data envelopment analysis (DEA) and stochastic frontier analysis (SFA),
as well as combinations thereof, are widely applied in incentive regulation
practice, where the assessment of efficiency plays a major role in regulation
design and benchmarking. Using a Monte Carlo simulation experiment,
this paper compares the performance of six alternative methods commonly
applied by regulators. Our results demonstrate that combination approaches,
such as taking the maximum or the mean over DEA and SFA efficiency
scores, have certain practical merits and might offer a useful alternative
to strict reliance on a singular method. In particular, the results highlight
that taking the maximum not only minimizes the risk of underestimation,
but can also improve the precision of efficiency estimation. Based on our results,
we give recommendations for the estimation of individual efficiencies
for regulation purposes and beyond.2018-10-10T13:17:53ZTesting relevant hypotheses in functional time series via self-normalization
http://hdl.handle.net/2003/37138
Title: Testing relevant hypotheses in functional time series via self-normalization
Authors: Dette, Holger; Kokot, Kevin; Volgushev, Stanislav
Abstract: In this paper we develop methodology for testing relevant hypotheses in a tuning-free
way. Our main focus is on functional time series, but extensions to other settings are also
discussed. Instead of testing for exact equality, for example for the equality of two mean
functions from two independent time series, we propose to test a relevant deviation under
the null hypothesis. In the two sample problem this means that an L2-distance between
the two mean functions is smaller than a pre-specified threshold. For such hypotheses
self-normalization, which was introduced by Shao (2010) and Shao and Zhang (2010) and
is commonly used to avoid the estimation of nuisance parameters, is not directly applicable.
We develop new self-normalized procedures for testing relevant hypotheses in the one
sample, two sample and change point problem and investigate their asymptotic properties.
Finite sample properties of the proposed tests are illustrated by means of a simulation study
and a data example.2018-10-05T12:05:01ZOptimal designs for inspection times of interval-censored data
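In the two sample case, the relevant hypotheses described in the self-normalization abstract above can be written out as below. This is a sketch under assumed notation: mean functions \mu_1 and \mu_2 on the unit interval, a squared L2-distance, and a pre-specified threshold \Delta > 0; the paper may parametrize the distance differently.

```latex
H_0:\ \int_0^1 \bigl(\mu_1(t) - \mu_2(t)\bigr)^2 \, dt \;\le\; \Delta
\qquad\text{versus}\qquad
H_1:\ \int_0^1 \bigl(\mu_1(t) - \mu_2(t)\bigr)^2 \, dt \;>\; \Delta
```

Setting \Delta = 0 would recover the classical test for exact equality, which is the case the abstract sets aside in favour of a tolerated deviation.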
http://hdl.handle.net/2003/37137
Title: Optimal designs for inspection times of interval-censored data
Authors: Malevich, Nadja; Müller, Christine H.
Abstract: We treat optimal equidistant and optimal non-equidistant inspection
times for interval-censored data with exponential distribution. We provide
in particular a recursive formula for calculating the optimal non-equidistant
inspection times which is similar to a formula for optimal spacing of quantiles
for asymptotically best linear estimates based on order statistics. This formula
provides an upper bound for the standardized Fisher information which
is reached for the optimal non-equidistant inspection times if the number of
inspections is converging to infinity. The same upper bound is also shown for
the optimal equidistant inspection times. Since optimal equidistant inspection
times are easier to calculate and easier to handle in practice, we study the
efficiency of optimal equidistant inspection times with respect to optimal non-equidistant
inspection times. Moreover, since the optimal inspection times are
only locally optimal, we provide also some results concerning maximin efficient
designs.2018-10-05T12:02:35ZOn second order conditions in the multivariate block maxima and peak over threshold method
http://hdl.handle.net/2003/37120
Title: On second order conditions in the multivariate block maxima and peak over threshold method
Authors: Bücher, Axel; Volgushev, Stanislav; Zou, Nan
Abstract: Second order conditions provide a natural framework for establishing asymptotic
results about estimators for tail related quantities. Such conditions are typically
tailored to the estimation principle at hand, and may be vastly different for estimators
based on the block maxima (BM) method or the peak-over-threshold (POT)
approach. In this paper we provide details on the relationship between typical second
order conditions for BM and POT methods in the multivariate case. We show that the
two conditions typically imply each other, but with a possibly different second order
parameter. The latter implies that, depending on the data generating process, one
of the two methods can attain faster convergence rates than the other. The class of
multivariate Archimax copulas is examined in detail; we find that this class contains
models for which the second order parameter is smaller for the BM method and vice
versa. The theory is illustrated by a small simulation study.
2018-09-05T09:43:56Z
The Phillips unit root tests for polynomials of integrated processes
http://hdl.handle.net/2003/37119
Title: The Phillips unit root tests for polynomials of integrated processes
Authors: Stypka, Oliver; Wagner, Martin
Abstract: We show that the Phillips (1987) unit root tests have nuisance-parameter-free limiting
distributions when applied to polynomials of integrated processes driven by linear process errors.
This substantially generalizes a similar result of Wagner (2012), which allows only for serially
uncorrelated errors. The result is based on novel kernel weighted sum limit results involving powers
of integrated processes. These results also allow us to consider additional modifications of the
Phillips (1987) tests applicable to polynomials of integrated processes.
2018-09-05T09:41:26Z
Detecting deviations from second-order stationarity in locally stationary functional time series
http://hdl.handle.net/2003/37118
Title: Detecting deviations from second-order stationarity in locally stationary functional time series
Authors: Bücher, Axel; Dette, Holger; Heinrichs, Florian
Abstract: A time-domain test for the assumption of second order stationarity of a
functional time series is proposed. The test is based on combining individual cumulative
sum tests which are designed to be sensitive to changes in the mean, variance and
autocovariance operators, respectively. The combination of their dependent p-values
relies on a joint dependent block multiplier bootstrap of the individual test statistics.
Conditions under which the proposed combined testing procedure is asymptotically
valid under stationarity are provided. A procedure is proposed to automatically choose
the block length parameter needed for the construction of the bootstrap. The finite-sample
behavior of the proposed test is investigated in Monte Carlo experiments and
an illustration on a real data set is provided.
2018-09-05T09:38:49Z
The U. S. fracking boom: Impact on oil prices
http://hdl.handle.net/2003/37078
Title: The U. S. fracking boom: Impact on oil prices
Authors: Frondel, Manuel; Horvath, Marco
Abstract: As of late 2008, the steady decline of U. S. crude oil production over the last decades was reversed by the increased adoption of the hydraulic fracturing (“fracking”) technology. Adapting the supply-side model proposed by Kaufmann et al. (2004) to assess OPEC’s ability to influence real oil prices, this paper investigates the effect of the increase in U. S. oil production due to fracking on world oil prices. Among our key results obtained from (dynamic) OLS estimations, there is a statistically significant negative long-run relationship between increased U. S. oil production and oil prices.
2018-07-31T14:08:06Z
Foreign competition and executive compensation in the manufacturing industry – A comparison between Germany and the U.S.
http://hdl.handle.net/2003/37058
Title: Foreign competition and executive compensation in the manufacturing industry – A comparison between Germany and the U.S.
Authors: Dyballa, Katharina; Kraft, Kornelius
Abstract: In this study we use import penetration as a proxy for foreign competition in order to empirically analyze (1) the impact of foreign competition on managerial compensation, (2) differences in the impact between Germany and the U.S. and (3) whether the impact of import penetration is driven by implied efficiency effects. We use data from the manufacturing industry covering the periods 1984-2010 for Germany and 1992-2011 for the U.S. and apply system GMM in order to address potential endogeneity problems. It turns out that foreign competition leads to an increase in average per capita executive compensation in both countries. The impact of foreign competition on pay-performance sensitivity differs between the U.S. and Germany. A differentiation between imported intermediates (efficient sourcing strategy) and final inputs (competition) reveals that the impact of import penetration is not biased by efficiency effects.
2018-07-24T13:42:57Z
Optimal designs for frequentist model averaging
http://hdl.handle.net/2003/37014
Title: Optimal designs for frequentist model averaging
Authors: Alhorn, Kira; Schorning, Kirsten; Dette, Holger
Abstract: We consider the problem of designing experiments for the estimation of a target in
regression analysis if there is uncertainty about the parametric form of the regression
function. A new optimality criterion is proposed, which minimizes the asymptotic mean
squared error of the frequentist model averaging estimate by the choice of an experimental
design. Necessary conditions for the optimal solution of a locally and Bayesian optimal
design problem are established. The results are illustrated in several examples and it is
demonstrated that Bayesian optimal designs can reduce the mean squared
error of the model averaging estimator by up to 45%.
2018-07-16T12:03:35Z
The price response of residential electricity demand in Germany: A dynamic approach
http://hdl.handle.net/2003/36966
Title: The price response of residential electricity demand in Germany: A dynamic approach
Authors: Frondel, Manuel; Kussel, Gerhard; Sommer, Stephan
Abstract: Due to growing concerns about climate change, policy-makers from all
around the world establish measures, such as carbon taxes, to lower electricity demand
and energy consumption in general. Drawing on household panel data from
the German Residential Energy Consumption Survey (GRECS) that span over nine
years (2006-2014) and employing the sum of regulated price components as an instrument
for the likely endogenous electricity price, we gauge the response of residential
electricity demand to price increases on the basis of the dynamic Blundell-Bond estimator
to account for potential simultaneity and endogeneity problems, as well as
the Nickell bias. Estimating short- and long-run price elasticities of -0.44 and -0.66,
respectively, our results indicate that price measures may be effective in dampening
residential electricity consumption, particularly in the long run. Yet, we also find that
responses to price changes are very heterogeneous across household groups.
2018-07-09T14:49:29Z
On axiomizing and extending the quasi-arithmetic mean
http://hdl.handle.net/2003/36880
Title: On axiomizing and extending the quasi-arithmetic mean
Authors: Hansen, Maurice
Abstract: Quasi-arithmetic means contain many other mean value concepts
such as the arithmetic, the geometric or the harmonic mean as
special cases. Treating quasi-arithmetic means as sequences of mappings
from I^n into I (for some real interval I) this paper shows that under
mild additional conditions this mapping is uniquely determined by its
values on I^2. This extends a well-known result by Huntington [4] where
this claim is proven only for special cases.
2018-05-29T08:46:48Z
Simar and Wilson two-stage efficiency analysis for Stata
http://hdl.handle.net/2003/36879
Title: Simar and Wilson two-stage efficiency analysis for Stata
Authors: Badunenko, Oleg; Tauchmann, Harald
Abstract: When analyzing what determines the efficiency of production, regressing
efficiency scores estimated by DEA on explanatory variables has much intuitive
appeal. Simar and Wilson (2007) show that this naïve two-stage estimation
procedure suffers from severe flaws that render its results, and in particular
statistical inference based on them, questionable. At the same time they propose
a statistically grounded bootstrap-based two-stage estimator that eliminates the
above-mentioned weaknesses of its naïve predecessors and comes in two variants.
This article introduces the new Stata command simarwilson that implements
either variant of the suggested estimator in Stata. The command allows for various
options, and extends the original procedure in some respects. For instance, it
allows for analyzing both output- and input-oriented efficiency. To demonstrate
the capabilities of the new command simarwilson we use data from the Penn
World Tables and the Global Competitiveness Report by the World Economic
Forum to perform a cross-country empirical study of the importance of a country's
quality of governance for the efficiency of its output production.
2018-05-25T13:51:06Z
Robust discrimination between long-range dependence and a change in mean
http://hdl.handle.net/2003/36842
Title: Robust discrimination between long-range dependence and a change in mean
Authors: Gerstenberger, Carina
Abstract: In this paper we introduce a Wilcoxon change-point testing procedure that is robust to outliers,
for distinguishing between short-range dependent time series with a change in mean at an unknown
time and stationary long-range dependent time series. We establish the asymptotic
distribution of the test statistic under the null hypothesis for L1 near epoch dependent
processes and show its consistency under the alternative. The Wilcoxon-type testing procedure,
like the CUSUM-type testing procedure of Berkes, Horvath, Kokoszka and
Shao (2006), requires estimating the location of a possible change-point and then using
pre- and post-break subsamples to discriminate between short- and long-range dependence.
A simulation study examines the empirical size and power of the Wilcoxon-type testing
procedure in standard cases and with disturbances by outliers. It shows that in standard
cases the Wilcoxon-type testing procedure performs as well as the CUSUM-type testing
procedure but outperforms it in the presence of outliers.
2018-04-23T12:22:48Z
Deviations from triangular arbitrage parity in foreign exchange and bitcoin markets
http://hdl.handle.net/2003/36820
Title: Deviations from triangular arbitrage parity in foreign exchange and bitcoin markets
Authors: Reynolds, Julia; Sögner, Leopold; Wagner, Martin; Wied, Dominik
Abstract: This paper applies new econometric tools to monitor and detect so-called "financial market dislocations",
defined as periods in which substantial deviations from arbitrage parities take place. In particular,
we focus on deviations from the triangular arbitrage parity for exchange rate triplets. Due to
increasing media attention towards mispricing in the market for cryptocurrencies, we include the cryptocurrency Bitcoin in addition to fiat currencies. We do not find evidence for substantial deviations
from the triangular arbitrage parity when only traditional fiat currencies are concerned. However, we
document significant deviations from triangular arbitrage parities in the newer markets for Bitcoin.
2018-03-27T14:57:15Z
Efficient designs for the estimation of mixed and self carryover effects
http://hdl.handle.net/2003/36819
Title: Efficient designs for the estimation of mixed and self carryover effects
Authors: Kunert, Joachim; Mielke, Johanna
Abstract: Biosimilars are copies of biological medicines that are developed by a competitor
after the patent for the originator drug has expired. Extensive clinical trials are
required to show therapeutic equivalence between the biosimilar and its reference
product before a biosimilar can be sold on the market. However, even after more
than 10 years of experience with biosimilars in Europe, there is still some uncertainty
as to whether patients who are already taking the reference product can switch between
the biosimilar and its reference product. One convenient way to assess the impact
of switches is the analysis of mixed and self carryover effects: if the products are
switchable, there should not be any difference in the carryover effects. This paper
determines a series of simple designs which are highly efficient for the comparison
of the mixed and self carryover effects of two treatments. The proof of efficiency
is not straightforward because the information matrix of the efficient designs is not
completely symmetric.
2018-03-27T14:54:28Z