# Sonderforschungsbereich (SFB) 475


The Collaborative Research Centre (Sonderforschungsbereich) 475 "Reduction of Complexity in Multivariate Data Structures" (Komplexitätsreduktion in multivariaten Datenstrukturen) was established at the University of Dortmund on 1 July 1997. In it, scientists from statistics, computer science, mathematics, mechanical engineering, biometrics, medicine, economics and engineering at the Universities of Dortmund, Essen and Bochum, together with the institutes for economic research (RWI Essen) and occupational physiology (IfADo, Dortmund) and the surgical clinic (Chirurgische Klinik, Dortmund), work together across disciplines.

The main objective of the SFB is research on data-oriented statistical modelling for complex problems in the empirical and experimental sciences.

One application focus is the analysis of economic data, which are available in almost unlimited quantities from capital markets, business-cycle diagnosis and economic forecasting. The Collaborative Research Centre aims to develop methods and strategies for extracting the essential information from these large and complex data sets.

A second focus is the statistical investigation and modelling of biological, medical and engineering phenomena. The results of analysing data that are recorded online, e.g. at the bedside in an intensive care unit, or that arise in complex production processes, are to be made available to decision-makers practically in real time, for diagnosis and appropriate intervention.

Overall, the mutual exchange between newly developed statistical methods on the one hand and data-intensive applications in the biological, engineering and economic sciences on the other is a defining characteristic of SFB 475.

The Collaborative Research Centre currently has an annual budget of about one million € and about 20 additional staff positions.

## Browse

### Recent Submissions

**Classifying U. S. Business Cycles 1948 to 1997 - Meyer/Weinberg Revisited** (2002-06). Heilemann, Ullrich; Münch, Heinz Josef.

**D-optimal plans for variable selection in data bases** (2009-08-05T10:05:44Z). Schiffner, Julia; Weihs, Claus. This paper is based on an article by Pumplün et al. (2005a) that investigates the use of Design of Experiments in data bases in order to select variables that are relevant for classification in situations where a sufficient number of measurements of the explanatory variables is available, but measuring the class label is hard, e.g. expensive or time-consuming. Pumplün et al. searched for D-optimal designs in existing data sets by means of a genetic algorithm and assessed variable importance based on the plans found. If the design matrix is standardized, these D-optimal plans are almost orthogonal and the explanatory variables are nearly uncorrelated. Thus Pumplün et al. expected that the variables' importance for discrimination could be judged independently of each other. In a simulation study, Pumplün et al. applied this approach in combination with five classification methods to eight data sets and compared the resulting error rates with those obtained from variable selection on the basis of the complete data sets. Based on the D-optimal plans, considerably lower error rates were achieved in some cases. Although Pumplün et al. (2005a) obtained some promising results, for several reasons it remained unclear whether D-optimality is actually beneficial for variable selection. For example, the D-efficiency and orthogonality of the resulting plans were not investigated, and a comparison with variable selection based on random samples of observations of the same size as the D-optimal plans was missing. In this paper we extend the simulation study of Pumplün et al. (2005a) in order to verify their results and as a basis for further research in this field. Moreover, in Pumplün et al.
D-optimal plans are only used for data preprocessing, that is, variable selection. The classification models are estimated on the whole data set in order to assess the effects of D-optimality on variable selection separately. Since the number of measurements of the class label is in fact limited, one would normally also use the observations employed for variable selection for learning. For this reason, our simulation study additionally investigates the appropriateness of D-optimal plans for training classification methods. It turned out that, in general, there is no difference in terms of error rate between variable selection on the basis of D-optimal plans and variable selection on random samples. However, for training linear classification methods, D-optimal plans seem to be beneficial.

**Heterogeneity in the cyclical sensitivity of job-to-job flows** (2009-08-05T10:04:43Z). Schaffner, Sandra. Although the cyclical aspects of worker reallocation are investigated in numerous studies, only scarce empirical evidence exists for Germany. Kluve, Schaffner, and Schmidt (2009) emphasize the heterogeneity of cyclical influences for different subgroups of workers, defined by age, gender and skills. This paper contributes to this literature by extending this analysis to job-to-job flows. In fact, job-to-job transitions are found to be the largest flows in the German labor market. The findings suggest that job-finding rates and job-to-job transitions are procyclical, while separation rates are acyclical or even countercyclical. The empirical framework employed here allows demographic groups to vary in their cyclical sensitivity. In Germany, young workers have the highest transition rates into and out of employment and between different jobs. Additionally, these transitions are more volatile than those of medium-aged or old workers.
By contrast, old workers experience low transition rates and less pronounced swings than the core group of medium-aged, medium-skilled men. JEL Codes: E32, J63, J64, E24

**Biases in the measurement of labour market dynamics** (2009-08-05T10:03:46Z). Bachmann, Ronald; Schaffner, Sandra. This paper analyses worker transitions on the German labour market derived from different data sources. These include the two German micro data sets which provide high-frequency observations on workers' employment and unemployment histories: the German Socio-Economic Panel (SOEP) and the IAB Employment Subsample (IABS). This exercise thus yields a comprehensive overview of German labour market dynamics. Furthermore, it highlights the differences between the results obtained from a retrospective survey, the SOEP, and a process-induced administrative data set, the IABS. In particular, our analysis shows which groups of the labour market are particularly affected by measurement error. We also show which role measurement issues play when establishing the stylised facts about the cyclicality of labour market dynamics. JEL classification: J63; J64; J62

**Interventions in INGARCH processes** (2009-08-05T10:02:48Z). Fokianos, Konstantinos; Fried, Roland. We study the problem of intervention effects generating various types of outliers in a linear count time series model. This model belongs to the class of observation-driven models and extends the class of Gaussian linear time series models within the exponential family framework. Studies of the effects of covariates and interventions in count time series models have largely lagged behind because the underlying process, whose behavior determines the dynamics of the observed process, is not observed. We suggest a computationally feasible approach to these problems, focusing especially on the detection and estimation of sudden shifts and outliers.
To identify such unusual events successfully, we employ the maximum of score tests, whose critical values in finite samples are determined by a parametric bootstrap. The usefulness of the proposed methods is illustrated using simulated and real data examples.

**Smoothing densities under shape constraints** (2009-08-05T10:01:31Z). Davies, Paul Laurie; Meise, Monika. In Davies and Kovac (2004), the taut string method was proposed for calculating a density which is consistent with the data and has the minimum number of peaks. The main disadvantage of the taut string density is that it is piecewise constant. In this paper, a procedure is presented which gives a smoother density by minimizing the total variation of a derivative of the density subject to the number, positions and heights of the local extreme values obtained from the taut string density. 2000 MSC primary: 62G07

**Optimal designs for an interference model** (2009-08-05T09:59:58Z). Kunert, Joachim; Mersmann, Sabine. Kunert and Martin (2000) determined optimal and efficient block designs in a model for field trials with interference effects, for block sizes up to 4. In this paper we use Kushner's method (Kushner, 1997) of finding optimal approximate designs to extend the work of Kunert and Martin (2000) to optimal designs with five or more plots per block. We give an overall upper bound $a_{t,b,k}$ for the trace of the information matrix of any design and show that a universally optimal approximate design will have all its sequences from merely four different equivalence classes. We further determine the efficiency of a binary type I orthogonal array under the general $\phi_p$-criterion.
We find that these designs achieve high efficiencies of more than 0.94.

**Optimal designs for estimating critical effective dose under model uncertainty in a dose response study** (2009-05-11T12:06:19Z). Dette, Holger; Pepelyshev, Andrey; Shpilev, Piter; Wong, Weng Kee. In the last few years, toxicologists have increasingly been using a class of models to describe a continuous response. This class consists of nested nonlinear models and is used for estimating various parameters in the models or some meaningful function of the model parameters. Our work here is the first to address design issues for this class of models, which is popular among toxicologists. Specifically, we construct a variety of optimal designs under model uncertainty and study their properties for estimating the critical effective dose (CED), which is model dependent. Two types of optimal designs are proposed: (i) designs that maximize the minimum of the efficiencies for estimating the CED, regardless of which member of the class is the appropriate model, and (ii) dual-objective optimal designs that simultaneously select the most appropriate model and provide the best estimates of the CED. We compare the relative efficiencies of these optimal designs and other commonly used designs for estimating the CED. To facilitate the use of these designs, we have constructed a website on which practitioners can generate tailor-made designs for their settings.

**The importance of two-sided heterogeneity for the cyclicality of labour market dynamics** (2009-04-30T10:46:55Z). Bachmann, Ronald; David, Peggy. Using two data sets derived from German administrative data, including a linked employer-employee data set, we investigate the cyclicality of worker and job flows. The analysis stresses the importance of two-sided labour market heterogeneity in this context, taking into account both observed and unobserved characteristics.
We find that small firms hire mainly unemployed workers, and that they do so at the beginning of an economic expansion. Later in the expansion, hirings more frequently result from direct job-to-job transitions, with employed workers moving to larger firms. Contrary to our expectations, workers moving to larger firms do not experience significantly larger wage gains than workers moving to smaller establishments. Furthermore, our econometric analysis shows that the interaction of unobserved heterogeneities on the two sides of the labour market plays a more important role for employed job seekers than for the unemployed.

**Constructing irregular histograms by penalized likelihood** (2009-04-30T10:42:23Z). Gather, Ursula; Mildenberger, Thoralf; Rozenholc, Yves. We propose a fully automatic procedure for the construction of irregular histograms. For a given number of bins, the maximum likelihood histogram is known to be the result of a dynamic programming algorithm. To choose the number of bins, we propose two different penalties motivated by recent work in model selection by Castellan [1] and Massart [2]. We give a complete description of the algorithm and a proper tuning of the penalties. Finally, we compare our procedure to other existing proposals for a wide range of different densities and sample sizes. [1] Castellan, G., 1999. Modified Akaike's criterion for histogram density estimation. Technical Report 99.61, Université de Paris-Sud. [2] Massart, P., 2007. Concentration inequalities and model selection. Lecture Notes in Mathematics Vol. 1896, Springer, New York.

**Consistency of the kernel density estimator - a survey** (2009-04-30T10:40:09Z). Weißbach, Rafael; Wied, Dominik. Various consistency proofs for the kernel density estimator have been developed over the last few decades.
Important milestones are pointwise consistency and almost sure uniform convergence with a fixed bandwidth on the one hand, and the rate of convergence with a fixed or even a variable bandwidth on the other. While considering global properties of the empirical distribution function is sufficient for strong consistency, proofs of exact convergence rates use deeper information about the underlying empirical processes. A unifying feature, however, is that earlier and more recent proofs alike use bounds on the probability that a sum of random variables deviates from its mean.

**Kernelized design of experiments** (2009-04-30T10:38:27Z). Rüping, Stefan; Weihs, Claus. This paper describes an approach for selecting instances in regression problems in cases where observations x are readily available, but obtaining labels y is hard. Given a database of observations, an algorithm inspired by statistical design of experiments and kernel methods is presented that selects a set of k instances in order to maximize the prediction performance of a support vector machine. It is shown that the algorithm significantly outperforms related approaches on a number of real-world datasets.

**An exact upper limit for the variance bias in the carry-over model with correlated errors** (2009-04-30T10:31:47Z). Sailer, Oliver. The analysis of crossover designs assuming i.i.d. errors leads to biased variance estimates whenever the true covariance structure is not spherical. As a result, the OLS F-test for treatment differences is not valid. Bellavance et al. (Biometrics 52:607-612, 1996) use simulations to show that a modified F-test based on an estimate of the within-subject covariance matrix allows for nearly unbiased tests. Kunert and Utzig (JRSS B 55:919-927, 1993) propose an alternative test that does not need an estimate of the covariance matrix.
However, for designs with more than three observations per subject, Kunert and Utzig (1993) only give a rough upper bound for the worst-case variance bias. This may lead to overly conservative tests. In this paper we derive an exact upper limit for the variance bias due to carry-over for an arbitrary number of observations per subject. The result holds for a certain class of highly efficient carry-over balanced designs.

**Frequency estimation by DFT interpolation** (2009-04-30T10:23:11Z). Bischl, Bernd; Ligges, Uwe; Weihs, Claus. This article comments on a frequency estimator proposed by [1] and shows empirically that it exhibits a much larger mean squared error than a well-known frequency estimator by [2]. It is demonstrated that the performance can be greatly improved by a heuristic adjustment [3]. Furthermore, references to two modern techniques are given, both of which nearly attain the Cramér-Rao bound for this estimation problem. [1] U. Ligges. Transkription monophoner Gesangszeitreihen. PhD thesis, TU Dortmund, 2006. [2] B. G. Quinn. Estimating frequency by interpolation using Fourier coefficients. IEEE Transactions on Signal Processing, 42(5):1264-1268, May 1994. [3] E. Jacobsen. Frequency estimation page. www.ericjacobsen.org/fe2/fe2.htm.

**A likelihood ratio test for stationarity of rating transitions** (2009-01-13T08:05:43Z). Walter, Ronja; Weißbach, Rafael. For a time-continuous discrete-state Markov process as a model for rating transitions, we study time-stationarity by means of a likelihood ratio test. For multiple Markov process data from a multiplicative intensity model, maximum likelihood parameter estimates can be represented as martingale transforms of the processes counting transitions between the rating states. As a consequence, the profile partial likelihood ratio is asymptotically $\chi^2$-distributed.
An internal rating data set reveals highly significant non-stationarity.

**A geometric characterization of c-optimal designs for heteroscedastic regression** (2009-01-13T08:04:40Z). Dette, Holger; Holland-Letz, Tim. We consider the common nonlinear regression model where the variance as well as the mean is a parametric function of the explanatory variables. The c-optimal design problem is investigated in the case when the parameters of both the mean and the variance function are of interest. A geometric characterization of c-optimal designs in this context is presented, which generalizes the classical result of Elfving (1952) for c-optimal designs. As in Elfving's famous characterization, c-optimal designs can be described as representations of boundary points of a convex set. However, when parameters of interest appear in the variance, the structure of the Elfving set is different. Roughly speaking, the Elfving set corresponding to a heteroscedastic regression model is the convex hull of a set of ellipsoids induced by the underlying model and indexed by the design space. The c-optimal designs are characterized as representations of the points where the line in the direction of the vector c intersects the boundary of the new Elfving set. The theory is illustrated in several examples, including pharmacokinetic models with random effects.

**Convergence analysis of generalized iteratively reweighted least squares algorithms on convex function spaces** (2009-01-13T08:03:26Z). Bissantz, Nicolai; Dümbgen, Lutz; Munk, Axel; Stratmann, Bernd. The computation of robust regression estimates often relies on the minimization of a convex functional on a convex set. In this paper we discuss a general technique, closely related to majorization-minimization algorithms, for iteratively computing the minimizers of a large class of convex functionals.
Our approach is based on a quadratic approximation of the functional to be minimized and includes the iteratively reweighted least squares algorithm as a special case. We prove convergence on convex function spaces for general coercive and convex functionals F and derive geometric convergence in certain unconstrained settings. The algorithm is applied to TV-penalized quantile regression and compared with a step-size-corrected Newton-Raphson algorithm. It is found that the iteratively reweighted least squares algorithm typically performs significantly better in the first steps, whereas the Newton-type method outpaces it only after many iterations. Finally, in the setting of bivariate regression with unimodality constraints, we illustrate how this algorithm makes it possible to use highly efficient algorithms for special quadratic programs in more complex settings.

**Bipower-type estimation in a noisy diffusion setting** (2009-01-13T08:02:17Z). Podolskij, Mark; Vetter, Mathias. We consider a new class of estimators for volatility functionals in the setting of frequently observed Itô diffusions which are disturbed by i.i.d. noise. These statistics extend the approach of pre-averaging as a general method for the estimation of the integrated volatility in the presence of microstructure noise and are closely related to the original concept of bipower variation in the no-noise case. We show that this approach provides efficient estimators for a large class of integrated powers of volatility and prove the associated (stable) central limit theorems. In a more general Itô semimartingale framework, this method can be used to define both estimators for the entire quadratic variation of the underlying process and jump-robust estimators which are consistent for various functionals of volatility.
As a by-product, we obtain a simple test for the presence of jumps in the underlying semimartingale.

**Robustness of optimal designs for the Michaelis-Menten model under a variation of criteria** (2009-01-13T08:00:41Z). Dette, Holger; Kiss, Christine; Wong, Weng Kee. The Michaelis-Menten model has been and continues to be one of the most widely used models in many diverse fields. In the biomedical sciences, the model remains ubiquitous in biochemistry, enzyme kinetics studies, nutrition science and the pharmaceutical sciences. Despite its wide-ranging applications across disciplines, design issues for this model have been given short shrift. This paper focuses on design issues and provides a variety of optimal designs for this model. In addition, we evaluate robustness properties of the optimal designs under a variation in optimality criteria. To facilitate the use of optimal design ideas in practice, we have designed a website for generating and comparing different types of tailor-made optimal designs and user-supplied designs for the Michaelis-Menten and related models.

**Practical considerations for optimal designs in clinical dose finding studies** (2009-01-13T07:59:29Z). Bretz, Frank; Dette, Holger; Pinheiro, Jose. Determining an adequate dose level for a drug and, more broadly, characterizing its dose response relationship are key objectives in the clinical development of any medicinal drug. If the dose is set too high, safety and tolerability problems are likely to result, while selecting too low a dose makes it difficult to establish adequate efficacy in the confirmatory phase, possibly leading to a failed program. Hence, dose finding studies are of critical importance in drug development and need to be planned carefully. In this paper we focus on practical considerations for establishing efficient study designs to estimate target doses of interest.
We consider optimal designs for both the estimation of the minimum effective dose (MED) and the dose achieving 100p% of the maximum treatment effect (EDp). These designs are compared with D-optimal designs for a given dose response model. Extensions to robust designs accounting for model uncertainty are also discussed. A case study is used to motivate and illustrate the methods from this paper.
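
The dose-finding abstract above targets two doses, the MED and the EDp, without naming a dose response model. As a minimal sketch, assuming the Emax model $f(d) = E_0 + E_{max}\,d/(ED_{50} + d)$ (our assumption; the paper may use other models), both targets have closed forms: solving $E_{max}\,d/(ED_{50} + d) = p\,E_{max}$ gives $ED_p = p\,ED_{50}/(1 - p)$, and solving for an improvement $\Delta$ over placebo gives $MED = \Delta\,ED_{50}/(E_{max} - \Delta)$. The function names and parameter values are hypothetical.

```python
def emax_response(dose: float, e0: float, emax: float, ed50: float) -> float:
    """Mean response of the (assumed) Emax model f(d) = e0 + emax * d / (ed50 + d)."""
    return e0 + emax * dose / (ed50 + dose)


def ed_p(p: float, ed50: float) -> float:
    """Dose that achieves 100p% of the maximum effect emax under the Emax model."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # Solve emax * d / (ed50 + d) = p * emax for d.
    return p * ed50 / (1.0 - p)


def med(delta: float, emax: float, ed50: float) -> float:
    """Dose whose effect over placebo equals delta (requires 0 < delta < emax)."""
    if not 0.0 < delta < emax:
        raise ValueError("delta must lie strictly between 0 and emax")
    # Solve emax * d / (ed50 + d) = delta for d.
    return delta * ed50 / (emax - delta)


# Hypothetical parameter values, for illustration only.
e0, emax, ed50 = 0.2, 1.0, 25.0
print(ed_p(0.9, ed50))        # dose reaching 90% of the maximum effect
print(med(0.3, emax, ed50))   # dose with an improvement of 0.3 over placebo
```

Because both targets are nonlinear functions of $ED_{50}$ and $E_{max}$, designs that are efficient for estimating them can differ from D-optimal designs, which is the comparison the abstract describes.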
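
The Michaelis-Menten abstract above concerns optimal designs for $\eta(x) = V_{max}\,x/(K_m + x)$. As one concrete illustration (a classical result for locally D-optimal designs with homoscedastic errors, not necessarily one of the designs in the paper), the locally D-optimal design on $[0, x_{max}]$ puts weight 1/2 on each of $x_{max}$ and $K_m x_{max}/(2K_m + x_{max})$. This sketch computes that design for hypothetical parameter guesses and checks it with the Kiefer-Wolfowitz equivalence theorem: for a two-parameter model, the sensitivity $g(x)^T M^{-1} g(x)$ must not exceed 2 anywhere on the design space.

```python
import numpy as np


def mm_gradient(x: float, vmax: float, km: float) -> np.ndarray:
    """Gradient of eta(x) = vmax * x / (km + x) with respect to (vmax, km)."""
    return np.array([x / (km + x), -vmax * x / (km + x) ** 2])


def information_matrix(points, weights, vmax: float, km: float) -> np.ndarray:
    """Information matrix of an approximate design, assuming homoscedastic errors."""
    m = np.zeros((2, 2))
    for x, w in zip(points, weights):
        g = mm_gradient(x, vmax, km)
        m += w * np.outer(g, g)
    return m


def locally_d_optimal(x_max: float, km: float) -> np.ndarray:
    """Support of the locally D-optimal design on [0, x_max]; each point has weight 1/2."""
    return np.array([km * x_max / (2.0 * km + x_max), x_max])


def sensitivity(x: float, m_inv: np.ndarray, vmax: float, km: float) -> float:
    """d(x) = g(x)^T M^{-1} g(x); D-optimality holds iff d(x) <= 2 on the design space."""
    g = mm_gradient(x, vmax, km)
    return float(g @ m_inv @ g)


vmax, km, x_max = 1.0, 1.0, 10.0   # hypothetical local parameter guesses
support = locally_d_optimal(x_max, km)
m_inv = np.linalg.inv(information_matrix(support, [0.5, 0.5], vmax, km))
grid = np.linspace(0.0, x_max, 2001)
print(support, max(sensitivity(x, m_inv, vmax, km) for x in grid))
```

The design is only locally optimal: it depends on the guess for $K_m$ (not on $V_{max}$), which is one reason robustness under changes of criteria and parameters matters in practice.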
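
The DFT-interpolation abstract above uses Quinn's estimator [2] as its benchmark. The following is a minimal sketch of Quinn's first estimator in the form commonly given (for instance on the frequency estimation page [3]), applied to a hypothetical noise-free complex tone; it is neither the estimator of [1] nor the heuristic adjustment studied in the paper.

```python
import numpy as np


def quinn_first_estimator(x: np.ndarray) -> float:
    """Frequency of a single complex tone, in DFT bins, by Quinn's first estimator.

    Interpolates between the largest DFT coefficient X[k] and its neighbours
    using the real parts of the ratios X[k-1]/X[k] and X[k+1]/X[k].
    """
    spectrum = np.fft.fft(x)
    n = len(x)
    k = int(np.argmax(np.abs(spectrum)))
    alpha1 = (spectrum[(k - 1) % n] / spectrum[k]).real
    alpha2 = (spectrum[(k + 1) % n] / spectrum[k]).real
    delta1 = alpha1 / (1.0 - alpha1)
    delta2 = -alpha2 / (1.0 - alpha2)
    # If both candidate offsets point to the right of k, use delta2; otherwise delta1.
    delta = delta2 if (delta1 > 0 and delta2 > 0) else delta1
    return k + delta


n = 64
t = np.arange(n)
tone = np.exp(2j * np.pi * 10.3 * t / n)   # hypothetical noise-free tone at 10.3 bins
print(quinn_first_estimator(tone))          # close to 10.3
```

To convert the result to Hz, multiply by the sampling rate divided by n. With noise, the mean squared error of such interpolators is what the paper compares against the Cramér-Rao bound.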
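
The irregular-histogram abstract above states that, for a fixed number of bins, the maximum likelihood histogram is obtained by dynamic programming. This sketch shows only that step, under our simplifying assumption that bin edges are restricted to an equispaced grid of candidate points; it does not implement the Castellan- or Massart-type penalties that the paper uses to choose the number of bins. The log-likelihood of a histogram with counts $N_j$ and bin widths $|I_j|$ is $\sum_j N_j \log\big(N_j/(n|I_j|)\big)$.

```python
import numpy as np


def ml_histogram(data, n_bins: int, n_candidates: int = 50):
    """Maximum likelihood histogram with n_bins bins, edges restricted to a candidate grid.

    Dynamic programming over best[d, j] = largest log-likelihood of a histogram with
    d bins covering [grid[0], grid[j]].
    """
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    if not 1 <= n_bins <= n_candidates:
        raise ValueError("need 1 <= n_bins <= n_candidates")
    grid = np.linspace(x[0], x[-1], n_candidates + 1)
    cum = np.searchsorted(x, grid, side="right")  # observations <= each candidate edge
    cum[0] = 0  # the first bin is closed on the left, so it also counts x[0]

    def bin_loglik(i: int, j: int) -> float:
        count = cum[j] - cum[i]
        if count == 0:
            return 0.0  # convention 0 * log 0 = 0
        return count * np.log(count / (n * (grid[j] - grid[i])))

    best = np.full((n_bins + 1, n_candidates + 1), -np.inf)
    back = np.zeros((n_bins + 1, n_candidates + 1), dtype=int)
    best[0, 0] = 0.0
    for d in range(1, n_bins + 1):
        for j in range(d, n_candidates + 1):
            for i in range(d - 1, j):
                value = best[d - 1, i] + bin_loglik(i, j)
                if value > best[d, j]:
                    best[d, j], back[d, j] = value, i
    # Follow the back-pointers from the right end of the grid to recover the edges.
    idx = [n_candidates]
    for d in range(n_bins, 0, -1):
        idx.append(back[d, idx[-1]])
    idx.reverse()
    edges = grid[idx]
    heights = np.diff(cum[idx]) / (n * np.diff(edges))
    return edges, heights, float(best[n_bins, n_candidates])


rng = np.random.default_rng(1)
sample = np.concatenate([rng.normal(0.0, 1.0, 400), rng.normal(4.0, 0.3, 100)])  # hypothetical data
edges, heights, loglik = ml_histogram(sample, n_bins=5)
print(np.round(edges, 2), loglik)
```

The maximized log-likelihood never decreases as bins are added, which is why a penalty on the number of bins is needed to pick a sensible histogram.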
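
The bipower-type abstract above builds on the original bipower variation for the no-noise case. This sketch shows only that classical starting point, not the pre-averaged estimators of the paper: realized variance $\sum_i r_i^2$ estimates the whole quadratic variation, jumps included, while bipower variation $(\pi/2)\sum_i |r_i||r_{i-1}|$ estimates the integrated variance and is hardly affected by a jump. The simulated path is hypothetical.

```python
import numpy as np


def realized_variance(increments) -> float:
    """Sum of squared increments; estimates the quadratic variation, jumps included."""
    r = np.asarray(increments, dtype=float)
    return float(np.sum(r ** 2))


def bipower_variation(increments) -> float:
    """(pi/2) * sum |r_i| |r_{i-1}|; estimates the integrated variance, robust to jumps."""
    r = np.abs(np.asarray(increments, dtype=float))
    return float(np.pi / 2.0 * np.sum(r[1:] * r[:-1]))


rng = np.random.default_rng(0)
n = 10_000
# Hypothetical path: Brownian motion with sigma = 1 on [0, 1], observed without noise.
increments = rng.normal(scale=np.sqrt(1.0 / n), size=n)
with_jump = increments.copy()
with_jump[n // 2] += 1.0   # add a single jump of size 1

print(realized_variance(increments), bipower_variation(increments))  # both near 1
print(realized_variance(with_jump), bipower_variation(with_jump))    # RV near 2, BV near 1
```

The gap between the two quantities is the basic idea behind bipower-based jump tests. With microstructure noise, both estimators break down at high frequencies, which motivates pre-averaging.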
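
The IRLS abstract above describes minimizing a convex functional through successive quadratic approximations, with iteratively reweighted least squares as a special case. As a minimal sketch (our own simple instance, not the paper's general algorithm or its TV-penalized setting), least absolute deviations regression, i.e. quantile regression at level 1/2, can be computed this way: the quadratic majorizer $|r| \le r^2/(2|r_0|) + |r_0|/2$ at the current residual $r_0$ yields weights $w_i = 1/|r_i|$. The floor `eps` on the residuals is a hypothetical smoothing choice that keeps the weights finite.

```python
import numpy as np


def irls_lad(X, y, n_iter: int = 100, eps: float = 1e-6, tol: float = 1e-10) -> np.ndarray:
    """Least absolute deviations regression via iteratively reweighted least squares.

    Each step solves the weighted least squares problem with weights
    w_i = 1 / max(|r_i|, eps), which minimizes a quadratic majorizer of sum_i |r_i|.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least squares start
    for _ in range(n_iter):
        r = y - X @ beta
        w = 1.0 / np.maximum(np.abs(r), eps)
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)  # normal equations X'WX b = X'Wy
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta


rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0.0, 10.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)
y[:20] += 50.0                                  # 10% gross outliers (hypothetical data)
X = np.column_stack([np.ones(n), x])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_lad = irls_lad(X, y)
print(beta_ols, beta_lad)
```

The outliers shift the least squares intercept substantially, while the IRLS solution stays close to the true line. In line with the abstract, such reweighting schemes typically make large progress in the first few steps.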
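
The D-optimal plans abstract above describes searching an existing data set for a subset of observations whose standardized design matrix has maximal $\det(X_S^T X_S)$. Pumplün et al. used a genetic algorithm for this; the sketch below swaps in a simpler first-improvement row-exchange heuristic (in the spirit of Fedorov-type exchange algorithms) purely to illustrate the search. The function names, subset size and data are hypothetical.

```python
import numpy as np


def log_det_information(X: np.ndarray, rows) -> float:
    """log det(X_S^T X_S) for the rows S of X (-inf if the submatrix is singular)."""
    sub = X[list(rows)]
    sign, logdet = np.linalg.slogdet(sub.T @ sub)
    return float(logdet) if sign > 0 else -np.inf


def d_optimal_subset(X: np.ndarray, k: int, max_passes: int = 20, seed: int = 0):
    """Pick k rows of X by first-improvement row exchange to increase det(X_S^T X_S)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    selected = [int(i) for i in rng.choice(n, size=k, replace=False)]
    best = log_det_information(X, selected)
    for _ in range(max_passes):
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in selected:
                    continue
                trial = selected.copy()
                trial[pos] = cand       # swap one selected row for an unselected one
                value = log_det_information(X, trial)
                if value > best + 1e-12:
                    selected, best, improved = trial, value, True
        if not improved:
            break                       # no single exchange improves the determinant
    return sorted(selected), best


rng = np.random.default_rng(2)
data = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))   # hypothetical correlated variables
X = (data - data.mean(axis=0)) / data.std(axis=0)            # standardized design matrix
rows, value = d_optimal_subset(X, k=12)
print(rows, value)
```

An exchange heuristic only guarantees a local optimum; restarting from several seeds, or using a genetic algorithm as in the original study, explores the search space more widely.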
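
The kernel density survey above is about the consistency of $\hat f_h(x) = (nh)^{-1}\sum_i K\big((x - X_i)/h\big)$. As a small illustration, not taken from the survey, this sketch evaluates a Gaussian-kernel estimate on a grid for standard normal samples and compares the maximal error over the grid for two sample sizes, using the bandwidth $h = n^{-1/5}$ (a hypothetical choice satisfying $h \to 0$ and $nh \to \infty$).

```python
import numpy as np


def gaussian_kde(sample, grid, bandwidth: float) -> np.ndarray:
    """Kernel density estimate f_h(x) = 1/(n h) * sum_i K((x - X_i) / h), Gaussian kernel K."""
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    u = (grid[:, None] - sample[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (sample.size * bandwidth)


rng = np.random.default_rng(3)
grid = np.linspace(-6.0, 6.0, 481)
true_density = np.exp(-0.5 * grid ** 2) / np.sqrt(2.0 * np.pi)
errors = {}
for n in (200, 4000):
    sample = rng.normal(size=n)
    estimate = gaussian_kde(sample, grid, bandwidth=n ** (-1 / 5))  # hypothetical bandwidth rule
    errors[n] = float(np.max(np.abs(estimate - true_density)))     # sup-error on the grid
print(errors)
```

The shrinking sup-error is only an empirical hint of uniform consistency; the proofs surveyed rely on exponential bounds for deviations of sums of random variables from their means.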
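
The rating-transition abstract above tests time-stationarity of a Markov intensity model. As a minimal sketch, under our simplifying assumptions of piecewise-constant intensities in a fixed number of periods and no covariates, the maximum likelihood intensity for $h \to j$ is $\hat q_{hj} = N_{hj}/R_h$ (transition count over time at risk in state $h$). The linear terms of the log-likelihood cancel, so the LR statistic for equal intensities across periods reduces to $2\sum_p \sum_{h \ne j} N^{(p)}_{hj} \log\big(\hat q^{(p)}_{hj}/\hat q_{hj}\big)$. Counting degrees of freedom only over transition types observed at least once is also our assumption, and all data below are hypothetical.

```python
import numpy as np


def lr_stationarity(counts, exposure):
    """LR statistic for H0: transition intensities are equal across periods.

    counts[p, h, j]: number of h -> j transitions in period p (diagonal ignored)
    exposure[p, h]: total time spent in state h during period p (assumed > 0)
    Returns (statistic, degrees_of_freedom).
    """
    counts = np.asarray(counts, dtype=float)
    exposure = np.asarray(exposure, dtype=float)
    n_periods, n_states, _ = counts.shape
    off_diag = ~np.eye(n_states, dtype=bool)

    q_period = counts / exposure[:, :, None]                          # period-specific MLEs
    q_pooled = counts.sum(axis=0) / exposure.sum(axis=0)[:, None]     # pooled MLEs under H0

    with np.errstate(divide="ignore", invalid="ignore"):
        terms = counts * np.log(q_period / q_pooled[None, :, :])
    terms = np.where((counts > 0) & off_diag[None, :, :], terms, 0.0)  # 0 * log 0 = 0
    statistic = 2.0 * terms.sum()

    observed = off_diag & (counts.sum(axis=0) > 0)   # transition types seen at least once
    df = (n_periods - 1) * int(observed.sum())
    return float(statistic), df


# Hypothetical counts for 3 rating states (state 2 absorbing) in 2 periods.
counts = np.array([
    [[0, 12, 3], [8, 0, 10], [0, 0, 0]],
    [[0, 30, 9], [5, 0, 4], [0, 0, 0]],
])
exposure = np.array([[100.0, 80.0, 20.0], [110.0, 75.0, 25.0]])
stat, df = lr_stationarity(counts, exposure)
print(stat, df)  # compare stat with a chi-square quantile with df degrees of freedom
```

Under stationarity, the statistic is compared with a $\chi^2$ distribution, matching the asymptotic result stated in the abstract; the paper's setting (multiple Markov processes, partial likelihood) is more general than this sketch.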