SFB 823 A new approach for open-end sequential change point monitoring D iscussion P aper Josua Gösmann, Tobias Kley, Holger Dette Nr. 11/2019 A new approach for open-end sequential change point monitoring Josua Go¨smann Ruhr-Universita¨t Bochum Fakulta¨t fu¨r Mathematik 44780 Bochum, Germany josua.goesmann@ruhr-uni-bochum.de (corresponding author) Tobias Kley University of Bristol School of Mathematics Bristol BS8 1TH, United Kingdom tobias.kley@bristol.ac.uk Holger Dette Ruhr-Universita¨t Bochum Fakulta¨t fu¨r Mathematik 44780 Bochum, Germany holger.dette@ruhr-uni-bochum.de June 3, 2019 Abstract We propose a new sequential monitoring scheme for changes in the parameters of a multivariate time series. In contrast to procedures proposed in the literature which compare an estimator from the training sample with an estimator calculated from the remaining data, we suggest to divide the sample at each time point after the training sample. Estimators from the sample before and after all separation points are then continuously compared calculating a maximum of norms of their differences. For open- end scenarios our approach yields an asymptotic level α procedure, which is consistent under the alternative of a change in the parameter. JEL classification: C01,C22 Keywords and phrases: change point analysis, open-end procedures, sequential moni- toring 1 1 Introduction Nowadays, nearly all fields of applications require sophisticated statistical modelling and statistical inference to draw scientific conclusions from the observed data. In many cases data is time dependent and the involved model parameters or the model itself may not be necessarily stable. In such situations it is of particular importance to detect changes in the processed data as soon as possible and to adapt the statistical analysis accordingly. These changes are usually called change points or structural breaks in the literature. Due to its universality, methods for change point analysis have a vast field of possible applica- tions - ranging from natural sciences [for example biology and meteorology] to humanities [economics, finance, social sciences]. Since the seminal papers of Page (1954, 1955) the problem of detecting change points in time series has received substantial attention in the statistical literature. The contributions to this field can be roughly divided into the areas of retrospective and sequential change point analysis. In the retrospective case, historical data sets are examined with the aim to test for changes and identify their position within the data. In this setup, the data is assumed to be com- pletely available before the statistical analysis is started (a-posteriori analysis). A compre- hensive overview of retrospective change point analysis can be found in Aue and Horva´th (2013). In many practical applications, however, data arrives consecutively and breaks can occur at any new data point. In such cases the statistical analysis for changes in the pro- cessed data has to start immediately with the target to detect changes as soon as possible. This field of statistics is called sequential change point detection [sometimes also: online change point detection]. In the major part of the 20th century the problem of sequential change point detection was tackled using procedures [mostly called control charts, see Lai (1995, 2001) for comprehen- sive reviews], which are optimized to have a minimal detection delay but do not control the probability of a false alarm (type I error). A new paradigm was then introduced by Chu et al. (1996), who use initial data sets and therefrom employ invariance principles to also control the type I error. The methods developed under this paradigm [see below] can again be subdivided into closed-end and open-end approaches. In closed-end scenarios monitoring is stopped at a fixed pre-defined point of time, while in open-end scenarios monitoring can - in principle - continue forever if no change point is detected. In the paper at hand we develop a new approach for sequential change point detection in an open-end scenario. To be more precise let {Xt}t∈Z denote a d-dimensional time series and let Ft be the distribution function of the random variable Xt at time t. We are studying monitoring procedures for detecting changes of a parameter θt = θ(Ft), where θ = θ(F ) is a p-dimensional parameter of a distribution function F on Rd (such as the mean, variance, correlation etc.). In particular we will develop a decision rule for the hypothesis 2 of a constant parameter, that is H0 : θ1 = · · · = θm = θm+1 = θm+2 = . . . , (1.1) against the alternative that the parameter changes (once) at some time m+k? with k? ≥ 1, that is H1 : ∃k? ∈ N : θ1 = · · · = θm+k?−1 6= θm+k? = θm+k?+1 = . . . . (1.2) In this setup, which was originally introduced by Chu et al. (1996), the first m observations are assumed to be stable and will serve as an initial training set. The problem of sequential change point detection in the hypotheses paradigm as pictured above has received substan- tial interest in the literature. Since the seminal paper of Chu et al. (1996) several authors have worked in this area. Aue et al. (2006), Aue et al. (2009), Fremdt (2014b) and Aue et al. (2014) developed methodology for detecting changes in the coefficients of a linear model, while Wied and Galeano (2013) and Pape et al. (2016) considered sequential mon- itoring schemes for changes in special functionals such as the correlation or variance. A Mosum-approach was employed by Leisch et al. (2000), Horva´th et al. (2008) or Chen and Tian (2010) to monitor the mean and linear models, respectively. Recently, Hoga (2017) used an `1-norm to detect changes in the mean and variance of a multivariate time series, Kirch and Weber (2018) defined a unifying framework for detecting changes in different parameters with the help of several statistics and Otto and Breitung (2019) considered a Backward CUSUM, which monitors changes based on recursive residuals in a linear model. A helpful but not exhaustive overview of different sequential procedures can be found in Section 1, in particular Table 1, of Anatolyev and Kosenok (2018). A common feature of all procedures in the cited literature consists in the comparison of estimators from different subsamples of the data. To be precise, let X1, . . . , Xm denote an initial training sample and X1, . . . , Xm, . . . , Xm+k be the available data at time m + k. Several authors propose to investigate the differences θˆm1 − θˆm+km+1 , (1.3) (in dependence of k), where θˆji denotes the estimator of the parameter from the sample Xi, . . . , Xj . In the sequential change point literature monitoring schemes based on the differences (1.3) are usually called (ordinary) CUSUM procedures and have been considered by Horva´th et al. (2004), Aue et al. (2006, 2009, 2014), Schmitz and Steinebach (2010) or Hoga (2017). Other authors suggest using a function of the differences { θˆm1 − θˆm+km+j+1 } j=0,...,k−1 (1.4) 3 (in dependence of k) and the corresponding procedures are usually called Page-CUSUM tests [see Fremdt (2014b), Aue et al. (2015), or Kirch and Weber (2018) among others]. As an alternative we propose - following ideas of Dette and Go¨smann (2018) - a monitoring scheme based on a function of the differences { θˆm+j1 − θˆm+km+j+1 } j=0,...,k−1 . (1.5) The intuitive advantage of (1.5) over (1.3) is the screening for all possible positions of the change point, which takes into account that the change point not necessarily comes with observation Xm+1 and so θˆ m+k m+1 maybe ‘corrupted’ by pre-change observations. This issue is also partially addressed by (1.4), where different positions are examined and compared with the estimator of the parameter from the training sample. We will demonstrate in Sec- tion 4 that sequential monitoring schemes based on the differences (1.5) yield a substantial improvement in power compared to the commonly used methods based on (1.3) and (1.4). To avoid misunderstandings, the reader should note that a (total) comparison based on differences of the form (1.5), is typically also called a CUSUM-approach in the retrospec- tive change point analysis [see Aue and Horva´th (2013) for a comprehensive overview of (retrospective) change point analysis]. The present paper is devoted to a rigorous statistical analysis of a sequential monitoring based on the differences defined in (1.5) in the context of an open-end scenario. In Section 2 we introduce the new procedure and develop a corresponding asymptotic theory to obtain critical values such that monitoring can be performed at a controlled type I error. The theory is broadly applicable to detect changes in a general parameter θ of a multivariate time series. As all monitoring schemes in this context the method depends on a threshold function and we also discuss the choice of this function. In particular we establish an interesting result regarding this choice and establish a connection to corresponding ideas made by Horva´th et al. (2004) and Fremdt (2014b), which may also be of interest in closed- end scenarios. In Section 3 we discuss several special cases and demonstrate that the new methodology is applicable to detect changes in the mean and the parameters of a linear model. Finally, we present a small simulation study in Section 4, where we compare our approach to those developed by Horva´th et al. (2004) and Fremdt (2014b). In particular we demonstrate that the monitoring scheme based on the differences (1.5) yields a test with a controlled type I error and a smaller type II error than the procedures in the cited references. 4 2 Asymptotic properties Throughout this paper let F denote a d-dimensional distribution function and θ = θ(F ) a p-dimensional parameter of F . We will denote by Fˆ ji (z) = 1 j − i+ 1 j∑ t=i I{Xt ≤ z} (2.1) the empirical distribution function of observations Xi, . . . , Xj (here the inequality is under- stood component-wise) and consider the canonical estimator θˆji = θ(Fˆ j i ) of the parameter θ from the sample Xi, . . . , Xj . To test the hypotheses (1.1) and (1.2) in the described online setting in a open-end scenario we propose a monitoring scheme defined by Eˆm(k) = m −1/2 k−1max j=0 (k − j) ∥∥∥θˆm+j1 − θˆm+km+j+1∥∥∥ Σˆ−1 , (2.2) where the statistic Σˆ denotes an estimator of the long-run variance matrix Σ (defined in Assumption 2.2) and the symbol ‖v‖2A = v>Av denotes a weighted norm of the vector v induced by the positive definite matrix A. The monitoring is then performed as follows. With observation Xm+k arriving, one computes Eˆm(k) and compares it to an appropriate threshold function, which is also called weighting function in the literature, say w. If Eˆm(k) > cαw(k/m) (2.3) monitoring is stopped and the null hypothesis (1.1) is rejected in favor of the alternative (1.2). If the inequality (2.3) does not hold, monitoring is continued with the next observa- tion Xm+k+1. We will derive the limiting distribution of sup ∞ k=1 Eˆm(k)/w(k/m) in Theorem 2.6 below to determine the constant cα involved in (2.3), such that the test keeps a nominal level of α (asymptotically as m→∞). Remark 2.1 The statistic (2.2) is related to a detection scheme, which was recently pro- posed by Dette and Go¨smann (2018) for the closed-end case, where monitoring ends with observation mT , for some T ∈ N. These authors considered the statistic Dˆm(k) = m −3/2 k−1max j=0 (m+ j)(k − j)‖θˆm+j1 − θˆm+km+j+1‖Σˆ−1 , (2.4) and showed mT max k=1 Dˆm(k) w(k/m) D =⇒ max t∈[0,T ] max s∈[0,T ] |(s+ 1)W (t+ 1)− (t+ 1)W (s+ 1)| w(t) , (2.5) where W denotes a p-dimensional Brownian motion and throughout this paper the symbol 5 D =⇒ denotes weak convergence (in the space under consideration). However, this statistic cannot be considered in an open-end scenario for the typical threshold functions considered in the literature satisfying lim supt→∞ t/w(t) <∞ (in this case the limit on the right-hand side of (2.5) would be almost surely infinite for T =∞). As threshold functions satisfying lim supt→∞ t2/w(t) < ∞ will cause a loss in power as demonstrated in an unpublished simulation study, we propose to replace the factor (m+ j) in (2.4) by the size of the initial sample m, which leads to the monitoring scheme defined by (2.2). To discuss the asymptotic properties of our approach, we require the following notation. The symbol P =⇒ denotes convergence in probability. The process {W (s)}s∈[0,∞) will usually represent a standard p-dimensional Brownian motion. For a vector v ∈ Rd, we denote by |v| = (∑di=1 v2i )1/2 its Euclidean norm. For the sake of a clear distinction we will employ n sup i=1 a(i) for discrete indexing (with integer arguments) and sup 0≤x≤1 a(x) for continuous indexing (with arguments taken from the interval [0, 1] or another subset of R). Next, we define the influence function (assuming its existence) by IF(x, F, θ) = lim ε↘0 θ((1− ε)F + εδx)− θ(F ) ε , (2.6) where δx(z) = I{x ≤ z} is the distribution function of the Dirac measure at the point x ∈ Rd and the inequality in the indicator is again understood component-wise. We will focus on functionals that allow for an asymptotic linearization in terms of the influence function, that is θˆji − θ = θ(Fˆ ji )− θ(F ) = 1 j − i+ 1 j∑ t=i IF(Xt, F, θ) +Ri,j (2.7) with asymptotically negligible remainder terms Ri,j . Finally, for the sake of readability we introduce the following abbreviation IF t = IF(Xt, Ft, θ) , where Ft is again the distribution function of Xt. Under the null hypothesis (1.1) we will impose the following assumptions on the underlying time series. 6 Assumption 2.2 (Approximation) The time series {Xt}t∈Z is (strictly) stationary, such that Ft = F for all t ∈ Z. Further, for each m ∈ N there exist two independent, p- dimensional standard Brownian motions Wm,1 and Wm,2, such that for some positive con- stant ξ < 1/2 the following approximations hold ∞ sup k=1 1 kξ ∣∣∣∣ m+k∑ t=m+1 IF t − √ ΣWm,1(k) ∣∣∣∣ = OP(1) (2.8) and 1 mξ ∣∣∣∣ m∑ t=1 IF t − √ ΣWm,2(k) ∣∣∣∣ = OP(1) (2.9) as m→∞, where Σ = ∑t∈Z Cov (IF0, IF t) ∈ Rp×p denotes the long-run variance matrix of the process {IF t}t∈Z, which we assume to exist and to be non-singular. Assumption 2.3 (Threshold function) The threshold function w : R≥0 → R+ is uniformly continuous, has a positive lower bound, say `w > 0, and satisfies lim sup t→∞ t w(t) <∞ . Assumption 2.4 (Linearization) The remainder terms in the linearization (2.7) satisfy k max i,j=1 i c(α)) ≤ α . (2.12) 8 The following corollary then states that our approach leads to a level α detection scheme. Corollary 2.7 Grant the Assumptions of Theorem 2.6 and further let c(α) satisfy inequality (2.12), then lim sup m→∞ P ( ∞ sup k=1 Eˆm(k) w(k/m) > c(α) ) ≤ α . The limit distribution obtained in Theorem 2.6 strongly depends on the considered threshold. A special family of thresholds that has received considerable attention in the literature [see Horva´th et al. (2004), Fremdt (2014b), Kirch and Weber (2018) among many others] is given by wγ(t) = (1 + t) max {( t 1 + t )γ , ε } with 0 ≤ γ < 1/2 , (2.13) where the cutoff ε > 0 can be chosen arbitrary small in applications and only serves to reduce the assumptions and technical arguments in the proof [see also Wied and Galeano (2013)]. With these functions the limit distribution in (2.11), with the threshold function as the denominator, can be simplified to an expression that is more easily tractable via simulations. Straightforward calculations yield that Assumption 2.3 is satisfied by the function wγ and the limit distribution in Theorem 2.6 simplifies as follows. Corollary 2.8 For a p-dimensional Brownian motion W with independent components it holds that sup 0≤t<∞ max 0≤s≤t t+ 1 wγ(t) ∣∣∣W( s s+ 1 ) −W ( t t+ 1 )∣∣∣ D= sup 0≤t<1 max 0≤s≤t 1 max{tγ , ε} ∣∣∣W (t)−W (s)∣∣∣ := L1,γ . For the investigation of the consistency of the monitoring scheme (2.2) we require the following assumption. Assumption 2.9 Under the alternative H1 defined in (1.2) let θ(1) := θ(F1) = θ(F2) = · · · = θ(Fm+k∗) 6= θ(2) := θ(Fm+k∗+1) = θ(Fm+k∗+1) = · · · Further assume that k∗ is independent of m and that the process {IF t} is of the following order before and after the change, respectively, 1√ m ∣∣∣∣m+k∗∑ t=1 IF t ∣∣∣∣ = OP(1) and 1√m ∣∣∣∣ 2m∑ t=m+k∗+1 IF t ∣∣∣∣ = OP(1) . (2.14) 9 Additionally assume that the remainders defined in (2.4) satisfy |R1,m+k∗ | = OP ( 1√ m ) and |Rm+k∗+1,2m| = OP ( 1√ m ) . (2.15) Remark 2.10 The assumptions stated above are substantially weaker than those used to investigate the asymptotic properties of sup∞k=1 Eˆm(k)/w( k m) under the null hypothesis. Basically, we only assume reasonable behavior of the time series before and after the change point and can drop the uniform approximation in Assumption 2.2 and the uniform negligi- bility of the remainders in Assumption 2.4. It is easy to see, that (2.14) is already satisfied if both, the phases before and after the change fulfill a central limit theorem. Finally, it is worth mentioning that one can also derive the subsequent results when replacing the 2m by cm for an arbitrary constant c > 1, however - for the sake of better readability - we will work with this (minimally stricter) assumption. The next Theorem yields consistency under the alternative hypothesis. Theorem 2.11 Assume that the alternative hypothesis (1.2) and Assumptions 2.3 and 2.9 hold. If further Σˆm is a consistent and non-singular estimator of the long-run variance matrix Σ it holds that ∞ sup k=1 Eˆm(k) w(k/m) P =⇒∞ . Consequently, lim m→∞P ( ∞ sup k=1 Eˆm(k) w(k/m) > c ) = 1 for any constant c ∈ R. 3 Some specific change point problems In this section we briefly illustrate how the theory developed in Section 2 can be employed to construct monitoring schemes for a specific parameter of the distribution function. For the sake of brevity we restrict ourselves to the mean and the parameters in a linear model. Other examples such as the variance or quantiles can be found in Dette and Go¨smann (2018). 10 3.1 Changes in the mean The sequential detection of changes in the mean µ(F ) = EF [X] = ∫ Rd xdF (x) . has been extensively discussed in the literature [see Aue and Horva´th (2004), Fremdt (2014b) or Hoga (2017) among many others]. Is is easy to verify (and well known), that the influence function for the mean is given by IF(x, F, µ) = x− EF [X] , and Assumption 2.4 and the second part of Assumption 2.9 are obviously satisfied in this case since we have Ri,j = 0 for all i, j. For the remaining assumptions in Section 2 it now suffices that the centered time series { Xt − E[Xt] } t∈Z fulfills Assumption 2.2, which also implies the remaining part of Assumption 2.9 [see also the discussion in Remark 2.5]. In this situation both, Theorem 2.6 and Theorem 2.11 are valid provided that the chosen threshold fulfills Assumption 2.3. 3.2 Changes in linear models Consider the time-dependent linear model Yt = P > t βt + εt , (3.1) where the random variables {Pt}t∈N are the Rp-valued predictors, βt ∈ Rp is a p-dimensional parameter and {εt}t∈N is a centered random sequence independent of {Pt}t∈N. The identi- fication of changes in the vector of parameters in the linear model represents the prototype problem in sequential change point detection as it has been extensively studied in the lit- erature [see Chu et al. (1996), Horva´th et al. (2004), Aue et al. (2009), Fremdt (2014b), among many others]. This situation is covered by the general theory developed in Section 2 and 3. To be precise let Xt = (P > t , Yt) > ∈ Rd , d = p+ 1 and t = 1, 2 . . . (3.2) 11 be the the joint vectors of predictor and response with (joint) distribution function Ft, such that the marginal distributions of Yt and Pt are given by Ft,Y = Ft(∞, . . . ,∞, ·) and Ft,P = Ft(·, . . . , ·,∞) , respectively, where we will assume that the predictor sequence is stationary, that is Ft,P = FP . In a first step we will consider the case, where the moment matrix M := E[P1P>1 ] = ∫ Rd p · p>dFP (p) is known (we will discuss later on why this assumption is non-restrictive) and non-singular. In this setup, the parameter βt can be represented as a functional of the distribution function Ft, that is βt = β(Ft) := M −1 · ∫ Rd p · ydFt(y, p) = M−1 · E [ PtYt ] , which leads to the estimators βˆji = β(Fˆ j i ) = M−1 j − i+ 1 j∑ t=i PtYt (3.3) from the sample (Pi, Yi), . . . , (Pj , Yj). To compute the influence function, let (p, y) ∈ Rp×R, then IF((p, y), Ft, β) = lim η↘0 β ( (1− η)Ft + εδ(p,y) )− β(Ft) η = lim η↘0 M−1 [ (1− η)E[PtYt] + ηpy η − βt η ] = M−1 ( py − E[PtYt] ) , which is the influence function (for β) in the linear model stated above [see for example Hampel et al. (1986) for a comprehensive discussion of influence functions]. In the following, we will use the notation IF t = IF ( Xt, Ft, β ) again. Note that IF t = M−1 ( PtYt − E[PtYt] ) = M−1PtYt − βt , (3.4) 12 which directly gives E[IF t] = 0. Under the null hypothesis the random sequence (Xt)t∈N is stationary and the linearization defined in (2.7) simplifies to βˆji − β1 = β(Fˆ ji )− β1 = M−1 j − i+ 1 j∑ t=i PtYt − β1 = 1 j − i+ 1 j∑ t=i ( M−1PtYt − βt ) = 1 j − i+ 1 j∑ t=i IF t . (3.5) Consequently, the remainders in (2.7) vanish and Assumption 2.4 is obviously satisfied. Next, note that the long-run variance matrix is given by Σ = ∑ t∈Z Cov (IF0, IF t) = M−1ΓM−1 (3.6) with Γ = ∑ t∈Z Cov ( Y0P0, YtPt ) , which can be estimated by Σˆ = M−1ΓˆM−1 where Γˆ is an estimator for Γ. Observing (3.5) it is now easy to see that in the resulting statistic Eˆm the matrix M cancels out, that is Eˆm(k) = m −1/2 k−1max j=0 (k − j) ∥∥∥βˆm+j1 − βˆm+km+j+1∥∥∥ Σˆ−1 = m−1/2 k−1max j=0 (k − j) ∥∥∥ 1 m+ j m+j∑ t=1 YtPt − 1 k − j m+k∑ t=m+j+1 YtPt ∥∥∥ Γˆ−1 (3.7) and therefore does not depend on the matrix M . We therefore obtain the following result, which describes the asymptotic properties of the monitoring scheme based on the statistic Eˆm for a change in the parameter in the linear regression model (3.1). The proof is a direct consequence of Theorem 2.6 and 2.11. Corollary 3.1 Assume that the predictor sequence {Pt}t∈N is strictly stationary with a non- singular second moment matrix M = E[P1P>1 ]. Let Γˆm denote a non-singular, consistent estimator of the non-singular long-run variance matrix Γ defined in (4.8). Further suppose that the sequences {Pt}t∈N and {εt}t∈N are independent and let the threshold function under consideration fulfill Assumption 2.3. (i) Under the null hypothesis H0 of no change assume additionally that the sequence {IF t}t∈N defined in (3.4) admits the approximation in Assumption 2.2. Then moni- toring based on the statistic Eˆ in (3.7) is an asymptotic level α procedure. (ii) Under the alternative hypothesis H1 assume that {IF t}t∈N fulfills (2.14) of Assump- tion 2.9. Then the monitoring based on the statistic Eˆ in (3.7) is consistent. 13 4 Finite sample properties In this section we investigate the finite sample properties of our monitoring procedure and demonstrate its superiority with respect to the available methodology. We choose the following two statistics as our benchmark Qˆm(k) := k m1/2 ∥∥∥θˆm1 − θˆm+km ∥∥∥ Σˆ−1 , Pˆm(k) := k−1 max j=0 k − j m1/2 ∥∥∥θˆm1 − θˆm+km+j+1∥∥∥ Σˆ−1 . (4.1) The procedure based on Qˆ was originally proposed by Horva´th et al. (2004) for detecting changes in the parameters of linear models and since then reconsidered for example by Aue et al. (2012), Wied and Galeano (2013) and Pape et al. (2016) for the detection of changes in the CAPM-model, correlation and variances, respectively. A statistic of the type Pˆ was recently proposed by Fremdt (2014b) and has been already reconsidered by Kirch and Weber (2018). In the simulation study we will restrict ourselves to the commonly used class of threshold functions wγ defined in (2.13), where we set the involved, technical constant ε = 0. Under the assumptions made in Section 2, it can be shown by similar arguments as given in Appendix A that ∞ sup k=1 Qˆm(k) wγ(k/m) D =⇒ sup 0≤t<1 |W (t)| max{tγ , ε} =: L2,γ (4.2) and ∞ sup k=1 Pˆm(k) wγ(k/m) D =⇒ sup 0≤t<1 max 0≤s≤t 1 max{tγ , ε} ∣∣∣W (t)− 1− t 1− sW (s) ∣∣∣ =: L3,γ , (4.3) where W denotes a p-dimensional Brownian motion. For detailed proofs (under slightly different assumptions) of (4.2) and (4.3), the reader is relegated to Horva´th et al. (2004) and Fremdt (2014b), where procedures of these types are considered in the special case of a linear model. Recall the notation of L1,γ introduced in Corollary 2.8. By (4.2), (4.3) and Corollary 2.7 the necessary critical values for the procedures Eˆ, Qˆ and Pˆ combined with threshold wγ are given as the (1− α)-quantiles of the distributions L1,γ , L2,γ and L3,γ , respectively and can easily be obtained by Monte Carlo simulations. The quantiles are listed in Table 1 for dimensions p = 1 and p = 2 and have been calculated by 10000 runs simulating the cor- responding distributions where the underlying Brownian motions have been approximated on a grid of 5000 points. In Sections 4.1 and 4.2 below, we will examine the finite sample properties of the three statistics for the detection of changes in the mean and in the regres- 14 sion coefficients of a linear model, respectively. All subsequent results presented in these sections are based on 1000 independent simulation runs and a fixed test level of α = 0.05. L1,γ L2,γ L3,γ p γ \α 0.01 0.05 0.1 0.01 0.05 0.1 0.01 0.05 0.1 1 0 2.9762 2.4721 2.2175 2.8262 2.2599 1.9914 2.7912 2.2365 1.9497 0.25 3.1050 2.5975 2.3542 2.9638 2.4296 2.1758 2.9445 2.3860 2.1060 0.45 3.4269 2.9701 2.7398 3.3817 2.9241 2.7002 3.3015 2.7992 2.5437 2 0 3.4022 2.8943 2.6562 3.2272 2.6794 2.4008 3.2461 2.6957 2.4266 0.25 3.5279 3.0948 2.7781 3.3322 2.7981 2.5481 3.3630 2.8433 2.5911 0.45 3.8502 3.3912 3.1509 3.7010 3.2046 2.9543 3.7467 3.2966 3.0620 Table 1: (1-α)-quantiles of the distributions L1,γ, L2,γ and L3,γ for different choices of γ and different dimensions of the parameter. The cutoff constant was set to ε = 0. The results for L2,γ and L3,γ for p = 1 are taken from Fremdt (2014b) and Horva´th et al. (2004), respectively. 4.1 Changes in the mean In this section we will compare the finite sample properties of the procedures based on the statistics Eˆ, Pˆ and Qˆ for changes in the mean as outlaid in Section 3.1. Here we test the null hypothesis of no change which is given by H0 : µ1 = · · · = µm = µm+1 = µm+2 = . . . , (4.4) while the alternative, that the parameter µt changes beyond the initial data set, is defined as H1 : ∃k? ∈ N : µ1 = · · · = µm+k?−1 6= µm+k? = µm+k?+1 = . . . . (4.5) We will consider two different data generating models, a white noise process and an autore- gressive process given by (M1) Xt i.i.d. ∼ N (0, 1) , (M2) Xt = 0.1Xt−1 + εt with εt i.i.d. ∼ N (0, 1) . For the AR(1)-process specified in model (M2), we create a burn-in sample of 100 obser- vations in the first place. To simulate the alternative hypotheses, changes in the mean are added to the data, that is Xδt = Xt if t < m+ k∗ ,Xt + δ if t ≥ m+ k∗ , 15 where δ = E[Xm+k∗ ]− E[Xm+k∗−1] denotes the desired change amount. For the necessary covariance estimation we employ the well known quadratic spectral estimator [see (Andrews, 1991)] with its implementation in the R-package ‘sandwich’ [see Zeileis (2004)]. To take into account the possible appearance of changes only the initial stable segment X1, . . . , Xm is used for this estimate, which is standard in the literature [see for example Horva´th et al. (2004), Wied and Galeano (2013), or Dette and Go¨smann (2018) among many others]. In Table 2 we display the type 1-Error for both time series models and different choices of γ in the threshold functions. The principle observation is, that all three statistical procedures offer a reasonable approximation of the desired nominal level of α = 0.05. The results for the dependent model (M2) are slightly worse than those for the white noise model (M1). This effect may be caused by a less precise estimation of the long-run variance for small sample sizes. Accordingly, this effect is weaker for the case m = 100. In Figures 1, 2, 3 and 4 we illustrate the power of the procedures under the alternative hy- pothesis for increasing values of the change and different change positions for combinations of γ = 0, γ = 0.45 and m = 50, m = 100. The basic tendency in all four plots is similar: While the procedures behave similar for a change close to the initial data set (first row), the method based on Eˆ is clearly superior to the others the more the distance to the initial set grows. To give an example, consider the left plot of the last row in Figure 1. Here the test based on the statistic Eˆ already has a power of 32.9% for a change of µ = 1 whereas the tests based on the statistics Pˆ and Qˆ have power of 24.4% and 22.7%, respectively. The superior performance of Eˆ can most likely be explained by the more accurate estimate of the pre-change parameter by θˆm+j1 , while the the other statistics only involve the estimator θˆm1 [see formulas (2.2) and (4.1)]. For the sake of an appropriate understanding of our findings, the reader should be aware of the fact, that - although we consider open-end procedures here - simulations have to be stopped eventually. Here we chose this stopping point as 1000 (m = 50) or 3000 (m = 100) observations and it is expectable that the testing power of all procedures increases with a later stopping point. Therefore the observed superiority of Eˆ refers to the type 2-Error until the specified stopping point. 16 (M1) (M2) m γ Eˆ Qˆ Pˆ Eˆ Qˆ Pˆ 50 0 5.4% 5.2% 5.5% 8.1% 7.1% 8.2% 0.25 5.0% 4.9% 5.4% 8.3% 7.0% 9.5% 0.45 4.5% 3.6% 4.8% 7.6% 5.6% 9.2% 100 0 4.2% 4.3% 4.9% 6.9% 6.5% 6.9% 0.25 5.0% 4.9% 5.9% 7.6% 6.5% 7.0% 0.45 6.0% 4.9% 7.0% 6.5% 4.8% 7.7% Table 2: Type 1-Error for the open-end procedures based on Eˆ, Qˆ and Pˆ . The size of the known stable data was set to m = 50 (upper part), m = 100 (lower part). 17 (M1) (M2) m+ k∗ = 51 m+ k∗ = 101 m+ k∗ = 501 m+ k∗ = 801 Figure 1: Power of the monitoring procedures for a change in the mean based on the statis- tics Eˆ (solid line), Qˆ (dashed line) and Pˆ (dotted line) with γ = 0 and m = 50. 18 (M1) (M2) m+ k∗ = 51 m+ k∗ = 101 m+ k∗ = 501 m+ k∗ = 801 Figure 2: Power of the monitoring procedures for a change in the mean based on the statis- tics Eˆ (solid line), Qˆ (dashed line) and Pˆ (dotted line) with γ = 0.45 and m = 50. 19 (M1) (M2) m+ k∗ = 101 m+ k∗ = 501 m+ k∗ = 1001 Figure 3: Power of the monitoring procedures for a change in the mean based on the statis- tics Eˆ (solid line), Qˆ (dashed line) and Pˆ (dotted line) with γ = 0 and m = 100. 20 (M1) (M2) m+ k∗ = 101 m+ k∗ = 501 m+ k∗ = 1001 Figure 4: Power of the monitoring procedures for a change in the mean based on the statis- tics Eˆ (solid line), Qˆ (dashed line) and Pˆ (dotted line) with γ = 0.45 and m = 100. 21 4.2 Changes in linear models In this section we present some simulation results for the detection of changes in the linear model (3.1). We aim to detect changes in the unknown parameter vector βt ∈ Rp by testing the null hypothesis H0 : β1 = · · · = βm = βm+1 = βm+2 = . . . , (4.6) against the alternative that the parameter βt changes beyond the initial data set, that is H1 : ∃k? ∈ N : β1 = · · · = βm+k?−1 6= βm+k? = βm+k?+1 = . . . . (4.7) To be precise, we consider the model (3.1) with p = 2 and the following choice of predictors (LM1) Pt = (1, Zt) > , (LM2) Pt = (1, 1 +Gt) > with Gt = σ¯tZt and σ¯2t = 0.5 + 0.2Zt−1 + 0.3σ¯2t−1 , where Zt denotes an i.i.d. sequence of N (0, 0.5) random variables in both models. The parameter vector is fixed at βt = (1, 1) under the null hypothesis and to examine the alternative hypothesis, changes are added to its second component, that is βδt = (1, 1)> if t < m+ k∗ ,(1, 1 + δ)> if t ≥ m+ k∗ . For both scenarios we simulated the residuals εt in model (3.1) as i.i.d. N (0, 0.5) sequences. Note that the models specified above have been already considered by Fremdt (2014b). As pointed out in Section 3.2 the asymptotic variance that needs to be estimated within our procedures is given by Γ = ∑ t∈Z Cov ( P0Y0, PtYt ) . (4.8) We estimate this quantity based on the stable segment of observations (Y1, P1), . . . , (Ym, Pm) using the well known quadratic spectral estimator [see Andrews (1991)] with its implemen- tation in the R-package ‘sandwich’ [see Zeileis (2004)]. The problem of detecting changes in the parameter of the linear model has also been addressed using partial sums of the residuals εˆt = Yt − P>t βˆI in statistics similar to (4.1), where βˆI is an initial estimate of β computed from the initial stable segment [see for example Chu et al. (1996), Horva´th et al. (2004), Fremdt (2014a) among many others]. Our approach directly compares estimators for the vector βt, which are derived using the 22 general methodology introduced in Section 2 and 3. The resulting statistics are obtained replacing θˆ by βˆ in equation (4.1). We also refer to Leisch et al. (2000) for a comparison of residual based methods with methods using the estimators directly (these authors consider a statistic similar to Qˆ). In Table 3 we display the approximation of the nominal level for the three statistics with different values of the parameter γ in the threshold function, where monitoring was stopped after 1500 observations. We observe a reasonable approximation of the nominal level 5% in the case γ = 0, while the rejection probabilities for for γ = 0.25 or γ = 0.45 slightly exceed the desired level of 5%. The fact that larger values of γ ∈ [0, 1/2) can lead to a worse approximation of the desired type 1-Error has also been observed by other authors [see, for example, Wied and Galeano (2013)] and can be explained by a more sensitive threshold function at the monitoring start if γ is chosen close to 1/2. Overall, the approximation is slightly better for the independent case in model (LM1). In Figure 5 we compare the power with respect to the change amount for different change positions, where we restrict ourselves to the case γ = 0 for the sake of brevity. The results are very similar to those provided for the mean functional in Section 4.1. Again the monitoring scheme based on Eˆ outperforms the procedures based on Qˆ and Pˆ , and the superiority is larger for a later change. We omit a detailed discussion and summarize that the empirical findings have indicated superiority (w.r.t. testing power) of the monitoring scheme based on the statistic Eˆ. (LM1) (LM2) γ Eˆ Qˆ Pˆ Eˆ Qˆ Pˆ 0 6.4% 6.5% 6.7% 7.2% 6.7% 7.2% 0.25 7.6% 8.8% 9.1% 8.5% 9.6% 9.5% 0.45 12.0% 12.2% 12.1% 12.6% 12.2% 12.6% Table 3: Type 1-Error for the open-end procedures based on Eˆ, Qˆ and Pˆ . The size of the known stable data was set to m = 100. 23 (LM1) (LM2) m+ k∗ = 51 m+ k∗ = 201 m+ k∗ = 501 Figure 5: Power of the monitoring procedures for a change in the regression parameters for the open-end procedures based on the statistics Eˆ (solid line), Qˆ (dashed line) and Pˆ (dotted line) with γ = 0 and m = 100. 24 5 Closed-end scenarios It is worthwhile to mention that the theory developed so far also covers the case of closed- end scenarios [sometimes also called finite time horizon in the literature]. In this section, we will very briefly discuss this situation and present a small batch of simulation results, which also indicate the superiority of the statistic Eˆ for closed-end scenarios. Note that the null hypothesis in this setup is given by H0 : θ1 = · · · = θm = θm+1 = θm+2 = . . . = θTm , (5.1) which is tested against the alternative that the parameters changes (once) at some time m+ 1 ≤ m+ k? ≤ Tm, that is H1 : ∃k? ∈ N : θ1 = · · · = θm+k?−1 6= θm+k? = θm+k?+1 = . . . = θTm . (5.2) Here the factor T ∈ N controls the length of the monitoring period compared to the size of the initial data set. Under the assumptions stated in Section 2, we can prove a corresponding statement of Theorem 2.6 and Corollary 2.8. Theorem 5.1 Assume that the null hypothesis (5.1) and Assumptions 2.2 - 2.4 hold. If further Σˆm is a consistent and non-singular estimator of the long-run variance matrix Σ it holds that Tm sup k=1 Eˆm(k) wγ(k/m) D =⇒ max 0≤t. The objective function in (6.2) resembles the one in the detector 29 defined in (3.7), but we use the weights of the retrospective tests defined in (B.1). Note that - once monitoring has stopped - the problem of estimating the change time is of retro- spective nature with the constraint that the first m observations are known as stable. The estimated time of change ˆ` is then labeled as ‘found during monitoring’ and recorded. For the next monitoring phase the observations from the change point that was just iden- tified to the last observation previously monitored are considered as a candidate for the next stable segment. In the case where this set would have fewer than mmin elements it is enlarged by adding incoming observations until mmin observations are available. This is described in lines 14–16 of Algorithm 1. The procedure then goes back to line 4 where the retrospective test is applied to assess the suitability of the candidate to be the next stable segment. Note that, if a change is identified in the candidate for the next stable segment, a new candidate is chosen to be the observations starting with that identified change point to the time where monitoring has last stopped or mmin observations onwards from the identi- fied change in case that the candidate would otherwise have fewer than mmin observations; see line 8 of Algorithm 1. In the remaining part of this section, we present the outcomes of the statistical analysis by Algorithm 1 for two data sets related to the United Kingdom (UK) European Union (EU) membership referendum, which took place on 23 June 2016. For our analysis we chose the significance levels to be α = 0.1 and the threshold function w0, as defined in (2.13). All data used was obtained from https://www.ariva.de on 6 April 2019. As our first example, we consider the relation of UK’s currency, Pound Sterling (GBP), to the Eurozone’s currency, the Euro (EUR), and Switzerland’s currency, the Swiss franc (CHF). More precisely, we consider daily log returns of the exchange rate of GBP to the United States dollar (USD) as a response Yt of a linear model as described in (6.1). As predictors we now consider the log returns of EUR to USD (P1,t) and CHF to USD (P2,t). A graphical representation of the exchange rates and associated log returns for the period from Januar 2016 to March 2019 can be seen in Figure 8. The outcomes of Algorithm 1 are presented visually in the graphs. We chose mmin := 30, which corresponds to six weeks of data. No evidence was found for the set of the first 30 observation (6 Jan 2016 to 16 Feb 2016) to contain a change and so it was used as the stable segment for monitoring. The monitoring went on until 30 Jun 2016 and 9 Jun 2016 was obtained as an estimate for the time of change. In the next step a retrospective test is performed to verify whether the observations from 9 Jun 2016 to 20 Jul 2016 contain any changes. No evidence for a change was found. The monitoring procedure is restarted with these 30 observations as the stable segment and monitoring continues until it is stopped on 21 Nov 2016 and the 15 Aug 2016 is identified as a time for a possible change. The period from 15 Aug to 21 Nov 2016 is considered as a candidate for a stable segment and no evidence against this is found. The 71 observation in this segment are used for the third iteration of monitoring, 30 Data: Univariate responses Yt and bivariate predictors (P1,t, P2,t); Input: Minimum size mmin for the stable segment, γ to be used for the threshold function wγ , significance level α; Output: A sequence of change points c1, c2, . . . and labels T1, T2, . . . that indicate whether the changes were identified by the retrospective test during setup (R) or by the sequential procedure during monitoring (S); 1 begin 2 m := mmin, i := 1, s := 1; 3 repeat // Find stable segment 4 while retrospective test does reject do 5 Apply test from Section B to (Yt, P1,t, P2,t), t = s, . . . , s+m− 1; 6 if test rejected then 7 Record the change point: ci := s+ j ∗ − 1, Ti := R, i+ +; 8 Prepare the next test: s := j∗, m := max{j∗ − s,mmin}; 9 end 10 end // Monitor for changes 11 repeat 12 Compute detector Eˆm(k) from (Yt, P1,t, P2,t), t = s, . . . , s+m+ k; 13 until Eˆm(k) exceeds the critical curve cαwγ(k/m); // Estimate time of change 14 Compute ˆ` defined in (6.2); 15 Record the change point: ci := s+ ˆ`, Ti := S, i+ +; 16 Prepare the restart: m := max{s+m+ k − ci + 1,mmin}, s := ci; 17 until terminated by user ; 18 end Algorithm 1: Procedure for repeated monitoring. 31 2016 2017 2018 2019 1. 20 1. 30 1. 40 1. 50 G BP /U SD l l l 2016 2017 2018 2019 − 0. 08 − 0. 02 rtn . G BP /U SD l l l 2016 2017 2018 2019 1. 05 1. 15 1. 25 EU R/ US D l l l 2016 2017 2018 2019 − 0. 02 0. 00 0. 02 rtn . E UR /U SD l l l 2016 2017 2018 2019 0. 98 1. 02 1. 06 CH F/ US D l l l 2016 2017 2018 2019− 0. 01 5 0. 00 0 0. 01 5 rtn . C HF /U SD l l l Figure 8: Results of Algorithm 1 applied to log returns of three foreign exchange rates. Re- sponse: GBP/USD (top), predictors: EUR/USD (middle) and CHF/USD (bottom). Set- tings for Algorithm 1 were mmin := 30, α = 0.1, γ = 0. Shaded areas indicate observations that were used as the stable segment (light gray) or monitored for changes (dark gray). Ver- tical lines indicate times the sequential procedure rejected (blue box) or the estimate time of change (red circle). The estimated times of the change points (times the sequential pro- cedure stopped in brackets) were: 2016-06-09 (2016-06-30), 2016-08-15 (2016-11-21), and 2017-05-17 (2017-12-08). which continues until 8 Dec 2017, when 17 May 2017 is identified as a possible time of a change. Finally, a retrospective test is applied to the observations from 17 May 2017 to 8 Dec 2017 and as no evidence of a change is found monitoring commences with those 148 observations as a stable set without stopping until the end of the available data. As our second example, we consider the relation of the UK’s market to that of the United States (US) and the EU. More precisely, we consider daily log returns of the FTSE 100, a share index of the 100 companies listed on the London Stock Exchange with the highest market capitalization, as a response Yt of the linear model described in (6.1). As predictors we consider the log returns of two similarly constructed indices that are related to the US and EU markets, namely the S&P 500 (P1,t) and the EuroStoxx 50 (P2,t). A graphical representation of the prices and log returns for the period from January 2016 to March 32 2016 2017 2018 20195 50 0 65 00 75 00 FT SE l l 2016 2017 2018 2019 − 0. 04 0. 00 FT SE l l 2016 2017 2018 20191 80 0 22 00 26 00 SP 50 0 l l 2016 2017 2018 2019 − 0. 04 0. 00 0. 04 SP 50 0 l l 2016 2017 2018 2019 28 00 32 00 36 00 Eu rS to xx 50 l l 2016 2017 2018 2019 − 0. 08 − 0. 02 0. 04 Eu rS to xx 50 l l Figure 9: Results of Algorithm 1 applied to log returns of three market indices. Response: FTSE 100 (top), predictors: S&P 500 (middle) and EuroStoxx 50 (bottom). Settings for Algorithm 1 were mmin := 30, α = 0.1, γ = 0. Shaded areas indicate observations that were used as the stable segment (light gray) or monitored for changes (dark gray). Vertical lines indicate times the sequential procedure rejected (blue box) or the estimate time of change (red circle). The estimated times of the change points (times the sequential procedure stopped in brackets) were: 2016-03-10 (2016-05-20), and 2016-06-21 (2016-06-23). 2019 can be seen in Figure 9. The outcomes of Algorithm 1 are presented visually in the graphs. We chose mmin := 30. As the retrospective test affirms that the first 30 observation (5 Jan 2016 to 12 Feb 2016) are stable, they are employed as the stable segment for the first iteration of monitoring. The monitoring went on until 20 May 2016, when the sequential procedure stopped and returned 10 Mar 2016 as the estimated time of the change. The retrospective test identified no changes for the 55 days from 10 Mar 2016 to 20 May 2016 and the second iteration of monitoring then went on until 23 Jun 2016, the day of the UK’s EU referendum. The estimated time for a change point was 21 Jun 2016. The retrospective test found no evidence for the period from the 21 Jun 2016 to the 28 Jul 2016, so these 30 days of observations are used as the stable segment with which monitoring continues without stopping until 29 March 2019 (the end of the available data). 33 7 Summary and outlook In this paper we developed a new monitoring scheme for change point detection in a parame- ter of multivariate time series which is applicable in an open-end scenario. Compared to the commonly used methods we replace the estimator of the parameter from the initial sample X1, . . . , Xm by an estimator from the sample X1, . . . , Xm+j . We then compare this estima- tor with the estimator from the sample Xm+j+1, . . . , Xm+k for every j = 0, . . . , k − 1, For the new statistic the asymptotic distribution under the null hypothesis and the consistency of a corresponding test, which controls the type-1 error, are established. By considering a common class of thresholds wγ defined in (2.13) the limit reduces to an elementary distri- bution, for which quantiles can be obtained by straightforward Monte Carlo simulations. Finally, we demonstrate by a comprehensive simulation study that the new monitoring scheme is superior (in terms of testing power) to a benchmark consisting of common meth- ods proposed in the literature. The new statistic can also be used in closed-end scenarios, for which the same superiority in power is observed. A very challenging subject for future research is the characterization of the asymptotic distribution for the stopping times based on the statistic Eˆ defined in (2.2). Correspond- ing results are already known for the methods based on Qˆ and Pˆ , see Aue and Horva´th (2004) and Fremdt (2014a), respectively. Moreover, as the finite sample properties depend sensitively on the efficient estimation of the long-run variance it is of interest to develop a concept of self-normalization [see Shao and Zhang (2010)], which is applicable in an open-end scenario. Acknowledgments This work has been supported in part by the Collaborative Research Center “Statistical modeling of nonlinear dynamic processes” (SFB 823, Teilprojekt A1, C1) and the Research Training Group ’High-dimensional phenomena in probability - fluctuations and discontinuity’ (RTG 2131) of the German Research Foundation (DFG). References Anatolyev, S. and Kosenok, G. (2018). Sequential Testing with Uniformly Distributed Size. Journal of Time Series Econometrics, 10(2):1–22. Andrews, D. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica, 59(3):817–58. Aue, A., Dienes, C., Fremdt, S., and Steinebach, J. (2015). Reaction times of monitoring schemes for arma time series. Bernoulli, 21(2):1238–1259. Aue, A., Ho¨rmann, S., Horva´th, L., Husˇkova´, M., and Steinebach, J. G. (2012). Sequential testing for the stability of high-frequency portfolio betas. Econometric Theory, 28(4):804–837. Aue, A. and Horva´th, L. (2004). Delay time in sequential detection of change. Statistics & Probability Letters, 67(3):221–231. 34 Aue, A. and Horva´th, L. (2013). Structural breaks in time series. Journal of Time Series Analysis, 34(1):1–16. Aue, A., Horva´th, L., Husˇkova´, M., and Kokoszka, P. (2006). Change-point monitoring in linear models. The Econometrics Journal, 9(3):373–403. Aue, A., Horva´th, L., and Reimherr, M. L. (2009). Delay times of sequential procedures for multiple time series regression models. Journal of Econometrics, 149(2):174 – 190. Aue, A., Ho¨rmann, S., Horva´th, L., and Husˇkova´, M. (2014). Dependent functional linear models with applications to monitoring structural change. Statistica Sinica, 24(3):1043–1073. Chen, Z. and Tian, Z. (2010). Modified procedures for change point monitoring in linear models. Mathematics and Computers in Simulation, 81(1):62 – 75. Chu, C.-S. J., Stinchcombe, M., and White, H. (1996). Monitoring structural change. Econometrica, 64(5):1045–1065. Cso¨rgo¨, M. and Horva´th, L. (1997). Limit theorems in change-point analysis. Wiley series in probability and statistics. Wiley. Dette, H. and Go¨smann, J. (2018). A likelihood ratio approach to sequential change point detection in a genereal class of parameters. To appear in: Journal of the American Statistical Association; https://arxiv.org/abs/1802.07696. Fremdt, S. (2014a). Asymptotic distribution of the delay time in page’s sequential procedure. Journal of Statistical Planning and Inference, 145:74 – 91. Fremdt, S. (2014b). Page’s sequential procedure for change-point detection in time series regression. Statistics: A Journal of Theoretical and Applied Statistics, 48(1):1–28. Hampel, F., Ronchetti, E., Rousseeuw, P., and Stahel, W. (1986). Robust statistics: The approach based on influence functions. Wiley series in probability and mathematical statistics. Probability and mathematical statistics. Wiley. Hoga, Y. (2017). Monitoring multivariate time series. Journal of Multivariate Analysis, 155:105 – 121. Horva´th, L., Husˇkova´, M., Kokoszka, P., and Steinebach, J. (2004). Monitoring changes in linear models. Journal of Statistical Planning and Inference, 126(1):225 – 251. Horva´th, L., Ku¨hn, M., and Steinebach, J. (2008). On the performance of the fluctuation test for structural change. Sequential Analysis, 27(2):126–140. Kirch, C. and Weber, S. (2018). Modified sequential change point procedures based on estimating functions. Electron. J. Statist., 12(1):1579–1613. Lai, T. L. (1995). Sequential changepoint detection in quality control and dynamical systems. Journal of the Royal Statistical Society. Series B (Methodological), 57(4):613–658. Lai, T. L. (2001). Sequential analysis: Some classical problems and new challenges. 11:303–408. Leisch, F., Hornik, K., and Kuan, C.-M. (2000). Monitoring structural changes with the generalized fluctuation test. Econometric Theory, 16(6):835–854. Otto, S. and Breitung, J. (2019). A likelihood ratio approach to sequential change point detection in a genereal class of parameters. http://www.wisostat.uni- koeln.de/sites/statistik/BackwardCUSUM.pdf. Page, E. S. (1954). Continuous inspection schemes. Biometrika, 41(1/2):100–115. Page, E. S. (1955). Control charts with warning lines. Biometrika, 42(1-2):243–257. 35 Pape, K., Wied, D., and Galeano, P. (2016). Monitoring multivariate variance changes. Journal of Empirical Finance, 39(Part A):54 – 68. Schmitz, A. and Steinebach, J. G. (2010). A note on the monitoring of changes in linear models with dependent errors, pages 159–174. Springer Berlin Heidelberg, Berlin, Heidelberg. Shao, X. and Zhang, X. (2010). Testing for change points in time series. Journal of the American Statistical Association, 105(491):1228–1240. Shorack, G. and Wellner, J. (1986). Empirical processes with applications to statistics. Wiley series in probability and mathematical statistics: Probability and mathematical statistics. Wiley. Wied, D. and Galeano, P. (2013). Monitoring correlation change in a sequence of random variables. Journal of Statistical Planning and Inference, 143(1):186 – 196. Zeileis, A. (2004). Econometric computing with hc and hac covariance matrix estimators. Journal of Statistical Software, 11(10):1–17. 36 A Proofs of the results in Section 2 Proof of Theorem 2.6: In the proof we use the following extra notation. Define the statistic Em(k) = m −1/2 k−1max j=0 (k − j) ∥∥∥θˆm+j1 − θˆm+km+j+1∥∥∥ Σ−1 , (A.1) where we have replaced the long-run variance estimator Σˆ by the (unknown) true long-run variance Σ in definition (2.2). Further define E˜m(k) = m −1/2 k−1max j=0 (k − j) ∥∥∥∥ 1m+ j m+j∑ t=1 IF t − 1 k − j m+k∑ t=m+j+1 IF t ∥∥∥∥ Σ−1 , (A.2) where we have replaced the estimator θˆji by corresponding averages of the influence function in (A.1). Finally, the triple (Ω,A,P) will denote the underlying probability space. The proof itself is now split into several Lemmas A.1 - A.5. The first Lemma shows that E˜m(k) and Em(k) are (asymptotically) equivalent. Lemma A.2 will approximate Em(k) by Brownian motions, while Lemma A.3 then yields a limit for this approximation. Lemma A.4 finishes the proof by plugging in the covariance estimator, meaning that Eˆm(k) and Em(k) are asymptotically equivalent. Finally, Lemma A.5 will establish the other representation of the limit distribution in (2.11). In each Lemma, we suppose that the assumptions of Theorem 2.6 are valid. Lemma A.1 (Remove Remainders) It holds that ∞ sup k=1 1 w(k/m) ∣∣∣Em(k)− E˜m(k)∣∣∣ = oP(1) as m→∞. Proof. By the (reverse) triangle inequality and the linearization in (2.7) we obtain∣∣∣Em(k)− E˜m(k)∣∣∣ ≤ m−1/2 k−1max j=0 (k − j) ∥∥∥θˆm+j1 − θˆm+km+j+1 − 1m+ j m+j∑ t=1 IF t + 1 k − j m+k∑ t=m+j+1 IF t ∥∥∥ Σ−1 = m−1/2 k−1max j=0 (k − j) ∥∥∥R1,m+j −Rm+j+1,m+k∥∥∥ Σ−1 , 37 where we used that θt is constant for the last equality. Next, we obtain that ∞ sup k=1 1 m1/2w(k/m) k−1 max j=0 (k − j) ∥∥∥Rm+j+1,m+k∥∥∥ Σ−1 = ∞ sup k=1 (m+ k)1/2 m1/2w(k/m) k−1 max j=0 k − j (m+ k)1/2 ∥∥∥Rm+j+1,m+k∥∥∥ Σ−1 ≤ ∞sup k=1 (m+ k)1/2 m1/2w(k/m) ∞ sup k=1 max 1≤i0 1 + t w(t) <∞ . (A.5) and by similar arguments it holds that ∞ sup k=1 (m+ k)1/2√ mw(k/m) <∞ . (A.6) Further note that due to Assumption 2.4 with probability one ∞ sup k=1 √ m ∣∣∣R1,m+k∣∣∣ ≤ ∞sup k=1 √ m+ k ∣∣∣R1,m+k∣∣∣ = ∞sup k=m √ k ∣∣∣R1,k∣∣∣ j=k,i=1 ≤ ∞sup k=m max 1≤i0 √ t w(t) = o(1) . 39 Next, we can bound the second summand of the right-hand side in (A.8) by ∞ sup k=1 1 w(k/m) √ m k−1 max j=0 m+ k m+ j ∥∥∥ m+j∑ t=m+1 IF t −WΣm,1(j) ∥∥∥ Σ−1 ≤ ∞sup k=1 m+ k w(k/m) √ m k−1 max j=0 1 jξm1−ξ ∥∥∥ m+j∑ t=m+1 IF t −WΣm,1(j) ∥∥∥ Σ−1 ≤ ∞sup k=1 (m+ k)mξ−1 w(k/m) √ m ∞ sup k=1 1 kξ ∥∥∥ m+k∑ t=m+1 IF t −WΣm,1(k) ∥∥∥ Σ−1 , where we used that (m+ j) = (m+ j)ξ(m+ j)1−ξ ≥ jξm1−ξ. Using again Assumption 2.2, the second factor in the last display is of order OP(1). Moreover, following the idea of the proof of Lemma 3 in Aue et al. (2006) it holds that m sup k=1 (m+ k)mξ−1 w(k/m) √ m ≤ msup k=1 (m+ k)mξ−3/2 `w = 2mξ−1/2 `w = o(1) , ∞ sup k=m+1 (m+ k)mξ−1 w(k/m) √ m = mξ−1/2 ∞sup k=m+1 1 + k/m w(k/m) ≤ mξ−1/2 sup t>1 1 + t w(t) = o(1) (A.9) and it remains to treat the third summand of the right-hand side in (A.8), which can be bounded by ∞ sup k=1 1 w(k/m) √ m k−1 max j=0 k − j m+ j ∥∥∥ m∑ t=1 IF t −WΣm,2(m) ∥∥∥ Σ−1 ≤ ∞sup k=1 1 w(k/m) √ m k m ∥∥∥ m∑ t=1 IF t −WΣm,2(m) ∥∥∥ Σ−1 ≤ ∞sup k=1 kmξ−1 w(k/m) √ m 1 mξ ∥∥∥ m∑ t=1 IF t −WΣm,2(m) ∥∥∥ Σ−1 . Using Assumption 2.2 and the arguments in (A.9) this term is of order oP(1), which finishes the proof of Lemma A.2. Lemma A.3 (Obtain limit process) The following weak convergence holds ∞ sup k=1 Pm(k) w(k/m) D =⇒ sup 0≤t<∞ max 0≤s≤t 1 w(t) ∣∣∣∣W1(t)− 1 + t1 + sW1(s) + t− s1 + sW2(1) ∣∣∣∣ as m → ∞, where W1 and W2 denote independent, p-dimensional standard Brownian motions. Proof. First note that due to the scaling properties of the Brownian motion [see for example 40 Chapter 2 of Shorack and Wellner (1986)], it holds in distribution that Pm(k) D = k−1 max j=0 ∣∣∣W1(k/m)− 1 + k/m 1 + j/m W1(j/m) + k/m− j/m 1 + j/m W2(1) ∣∣∣ := P˜m(k) (A.10) and so within the proof we will - without loss of generality - only consider P˜m(k). Addi- tionally, we define the processes L(1)(s, t) := 1 w(t) ∣∣∣ t+ 1 s+ 1 W1(s)−W1(t) ∣∣∣ , L(2)(s, t) := 1 w(t) t− s s+ 1 |W2(1)| . We will show that L(i)(s, t) for i = 1, 2 are uniformly continuous on R+∆ = {(s, t) ∈ R2 | 0 ≤ s ≤ t} with probability one. Case L(1): In the following let ε > 0 be fixed but arbitrary. For almost every ω ∈ Ω the law of iterated logarithm implies that we can choose a C = C(ε, ω) > 0 sufficiently large such that sup t≥C |W1(t)| t < ε 8B , (A.11) with B defined by B := sup t>0 t+ 2 w(t) <∞ . Depending on C, we can split R+∆ into the (overlapping) sets R+∆ =M1(C) ∪M2(C) ∪M3(C) , (A.12) where we use the definitions M1(C) = R+∆ ∩ [0, C + 1]2 , M2(C) = R+∆ ∩ [0, C + 1]× [C,∞) , M3(C) = R+∆ ∩ [C,∞)2 . Further let d denote the maximum distance that is d ( (s1, t1), (s2, t2) ) = max {|s1 − s2|, |t1 − t2|} . Note that by construction of the decomposition in (A.12), whenever d((s1, t1), (s2, t2)) < δ for sufficiently small δ > 0, then there’s j ∈ 1, 2, 3, such that both pairs are in the same 41 subset Mj(C). Thus the a.s. uniform continuity of L(1) follows if we can choose δ > 0 sufficiently small, such that sup d((s1,t1),(s2,t2))<δ (s1,t1),(s2,t2)∈Mj(C) ∣∣∣L(1)(s1, t1)− L(1)(s2, t2)∣∣∣ < ε (A.13) for j = 1, 2, 3. In the following, we will treat each subset separately. Set M1(C): As this set is compact, the (ordinary) almost sure continuity of L(1) already implies that (A.13) holds for j = 1 and δ > 0 sufficiently small. Set M2(C): We have the following bound sup d((s1,t1),(s2,t2))<δ (s1,t1),(s2,t2)∈M2(C) ∣∣∣L(1)(s1, t1)− L(1)(s2, t2)∣∣∣ ≤ sup |t1−t2|<δ 0≤s≤C+1, t1,t2>C ∣∣∣L(1)(s, t1)− L(1)(s, t2)∣∣∣+ sup |s1−s2|<δ 0≤s1,s2≤C+1, t>C ∣∣∣L(1)(s1, t)− L(1)(s2, t)∣∣∣ . (A.14) We will treat both summands of the last display individually. The first one can be bounded again as follows sup |t1−t2|<δ 0≤s≤C+1, t1,t2>C ∣∣∣L(1)(s, t1)− L(1)(s, t2)∣∣∣ ≤ sup |t1−t2|<δ 0≤s≤C+1, t1,t2>C ∣∣∣ 1 w(t1) − 1 w(t2) ∣∣∣∣∣∣ t1 + 1 s+ 1 W1(s)−W1(t1) ∣∣∣ + 1 w(t2) ∣∣∣ t1 − t2 s+ 1 W1(s)−W1(t1) +W1(t2) ∣∣∣ (A.15) and again we will treat both terms separately. For the first term note that sup |t1−t2|<δ 0≤s≤C+1, t1,t2>C |w(t2)− w(t1)| `w t1 + 1 w(t1) ∣∣∣W1(s) s+ 1 − W1(t1) t1 + 1 ∣∣∣ ≤ 2 sup |t1−t2|<δ |w(t2)− w(t1)| `w sup t≥C t+ 1 w(t) sup t≥0 |W1(t)| t+ 1 , and since w is uniformly continuous this expression is smaller than ε/3 for sufficiently small 42 δ. For the second term of the right-hand side of (A.15), note that we have the bound sup |t1−t2|<δ 0≤s≤C+1, t1,t2>C 1 w(t2) ∣∣∣ t1 − t2 s+ 1 W1(s)−W1(t1) +W1(t2) ∣∣∣ ≤ sup |t1−t2|<δ 0≤s≤C+1, t1,t2>C |t1 − t2| w(t2) |W1(s)| s+ 1 + t1 + 1 w(t2) |W1(t1)| t1 + 1 + t2 + 1 w(t2) |W1(t2)| t2 + 1 ≤ δ `w sup s≥0 |W1(s)| s+ 1 + 2 sup t>0 t+ 2 w(t) sup t>C |W1(t)| t+ 1 , and by the choice of C in (A.11) and for sufficiently small δ this is bounded by ε/3. To complete the treatment of M2(C) it only remains to examine the second term on the right-hand side of (A.14). We obtain that sup |s1−s2|<δ 0≤s1,s2≤C+1, t>C ∣∣∣L(1)(s1, t)− L(1)(s2, t)∣∣∣ ≤ sup t>0 t+ 1 w(t) sup |s1−s2|<δ 0≤s1,s2≤C+1 ∣∣∣W1(s1) s1 + 1 − W1(s2) s2 + 1 ∣∣∣ , which can be bounded by ε/3 for sufficently small δ since the first factor is a constant and the function f(s) = W (s+ 1)/(s+ 1) is uniformly continuous on the compact set [0, C + 1] with probability one. Set M3(C): Note that sup d((s1,t1),(s2,t2))<δ (s1,t1),(s2,t2)∈M3(C) ∣∣∣L(1)(s1, t1)− L(1)(s2, t2)∣∣∣ ≤ 2 sup s,t>C |L(1)(s, t)| ≤ 2 sup t>0 t+ 1 w(t) sup s,t>C ∣∣∣W1(s) s+ 1 − W1(t) t+ 1 ∣∣∣ ≤ 4 sup t>0 t+ 1 w(t) sup t≥C |W1(t)| t < ε , where we used the choice of C in (A.11) for the last estimate. This completes the third case and so the almost sure uniform continuity of L(1) on the set R+∆ is established. Case L(2): Again let ε > 0 and suppose that d ( (s1, t1), (s2, t2) ) < δ. It holds that ∣∣L(2)(s1, t1)− L(2)(s2, t2)∣∣ ≤ ∣∣L(2)(s1, t1)− L(2)(s2, t1)∣∣+ ∣∣L(2)(s2, t1)− L(2)(s2, t2)∣∣ (A.16) 43 and note for the first summand of the last display that ∣∣L(2)(s1, t1)− L(2)(s2, t1)∣∣ = |W2(1)| w(t1) ∣∣∣∣ t1 − s1s1 + 1 − t1 − s2s2 + 1 ∣∣∣∣ = |W2(1)|(t1 + 1)w(t1) ∣∣∣∣ s2 − s1(s1 + 1)(s2 + 1) ∣∣∣∣ ≤ |W2(1)|(t1 + 1) w(t1) |s2 − s1| and by Assumption 2.3 the last term is smaller than ε/2 uniformly for all t1 > 0 if δ > 0 is chosen sufficiently small. It remains to examine the second summand of the right-hand side of (A.16). It holds that |L(2)(s2, t1)− L(2)(s2, t2) ∣∣ ≤ ∣∣∣∣ 1w(t1) − 1w(t2) ∣∣∣∣ |t1 − s2|s2 + 1 |W2(1)|+ |W2(1)|w(t2) |t2 − t1|s2 + 1 ≤ |w(t1)− w(t2)| w(t1)w(t2) t1 + δ s2 + 1 |W2(1)|+ |W2(1)| `w |t2 − t1| ≤ |w(t1)− w(t2)| `w t1 + δ w(t1) |W2(1)|+ |W2(1)| `w |t2 − t1| , where we used that s2 ≤ t2 ≤ t1 + δ whenever d ( (s1, t1), (s2, t2)) < δ. By Assumption 2.3 the last display is smaller than ε/2 whenever δ > 0 is sufficiently small and so the almost sure continuity of L(2) is shown. Finally, we can combine our observations to finish the proof. Note that by the results above, also the process L(s, t) := L(1)(s, t)+L(2)(s, t) is uniformly continuous with probability one and further we have the identity sup 1≤t<∞ sup 0≤s≤t L (bsc/m, btc/m) = ∞sup k=1 P˜m(k) and so the Lemma’s claim now follows if we derive that (with probability one) sup 1≤t<∞ sup 0≤s≤t L (bsc/m, btc/m) =⇒ m→∞ sup0≤t<∞ sup 0≤s≤t L ( s, t ) . First, observe that almost surely sup 1≤t<∞ sup 0≤s≤t L (bsc/m, btc/m) = sup 0≤t<∞ sup 0≤s≤t L (bsc/m, btc/m)+ o(1) as m → ∞. Next, we can use the almost sure uniform continuity of L since for arbitrary 44 ε > 0 and almost every ω ∈ Ω we can choose sufficiently large m = m(ε, ω) such that∣∣∣∣ sup 0≤t<∞ sup 0≤s≤t L (bsc/m, btc/m)− sup 0≤t<∞ sup 0≤s≤t L ( s, t )∣∣∣∣ ≤ sup 0≤t<∞ sup 0≤s≤t ∣∣∣∣L(bsc/m, btc/m)− L(s/m, t/m)∣∣∣∣ ≤ sup d((s1,t1),(s2,t2))<1/m (s1,t1),(s2,t2)∈R+∆ ∣∣∣∣L(s1, t1)− L(s2, t2)∣∣∣∣ < ε . Combining Lemma A.1, A.2 and A.3 we have already proven that ∞ sup k=1 Em(k) w(k/m) D =⇒ max 0≤t<∞ max 0≤s≤t 1 w(t) ∣∣∣W1(t)− 1 + t 1 + s W1(s) + t− s 1 + s W2(1) ∣∣∣ , (A.17) and it only remains to investigate the impact of the covariance estimator. Therefore the following Lemma finishes the proof of Theorem 2.6. Lemma A.4 (Plug in of covariance estimator) We have that ∞ sup k=1 Em(k) w(k/m) − ∞sup k=1 Eˆm(k) w(k/m) = oP(1) . Proof. Observe the bound∣∣∣∣ ∞sup k=1 Em(k) w(k/m) − ∞sup k=1 Eˆm(k) w(k/m) ∣∣∣∣ ≤ ∞sup k=1 1 w(k/m) √ m k−1 max j=0 (k − j) ∣∣∣∣(θˆm+j1 − θˆm+km+j+1)>(Σˆ−1m − Σ−1)(θˆm+j1 − θˆm+km+j+1)∣∣∣∣1/2 . (A.18) Next note that for a symmetric matrix A and an arbitrary vector v the Cauchy-Schwarz inequality implies |v>Av|2 ≤ |Av||v| ≤ ‖A‖op|v|2 , and we can bound (A.18) by∥∥∥Σˆ−1m − Σ−1∥∥∥1/2 op ∞ sup k=1 1 w(k/m) √ m k−1 max j=0 (k − j) ∣∣∣θˆm+j1 − θˆm+km+j+1∣∣∣ . (A.19) Since Σˆ is a consistent estimator of Σ, an application of the continuous mapping theorem 45 yields ∥∥∥Σˆ−1m − Σ−1∥∥∥ op = oP(1) . (A.20) Next, the definition of the operator norm yields ∞ sup k=1 1 w(k/m) √ m k−1 max j=0 (k − j)∣∣θˆm+j1 − θˆm+km+j+1∣∣ ≤ ‖Σ1/2‖op ∞sup k=1 1 w(k/m) √ m k−1 max j=0 (k − j)∥∥θˆm+j1 − θˆm+km+j+1∥∥Σ−1 = ‖Σ1/2‖op ∞sup k=1 Em(k) w(k/m) = OP(1) . Now a combination of (A.17) and (A.20) implies that the expression in (A.19) is of order oP(1), which completes the proof of Lemma A.4 and thus also the proof of Theorem 2.6. Combining Lemmas A.1, A.2, A.3 and A.4 we have now established that ∞ sup k=1 Eˆm(k) w(k/m) D =⇒ sup 0≤t<∞ max 0≤s≤t 1 w(t) ∣∣∣∣W1(t)− 1 + t1 + sW1(s) + t− s1 + sW2(1) ∣∣∣∣ and it remains to show the equality in distribution stated in (2.11). Lemma A.5 (Simplify limit distribution) It holds that sup 0≤t<∞ max 0≤s≤t 1 w(t) ∣∣∣∣W1(t)− 1 + t1 + sW1(s) + t− s1 + sW2(1) ∣∣∣∣ D = sup 0≤t<∞ max 0≤s≤t t+ 1 w(t) ∣∣∣W( t t+ 1 ) −W ( s s+ 1 )∣∣∣ , where W is a standard p-dimensional Brownian motion. Proof. In the following let Z denote a standard Gaussian random variable, that is indepen- dent of W1. Observe that 1 w(t) ∣∣∣∣W1(t)− 1 + t1 + sW1(s) + t− s1 + sW2(1) ∣∣∣∣ = 1 w(t)(s+ 1) ∣∣∣(s+ 1)W1(t)− (t+ 1)W1(s) + (t− s)W2(1)∣∣∣ D = 1 w(t)(s+ 1) ∣∣∣(s+ 1)W (t)− (t+ 1)W (s)− (t− s)Z∣∣∣ = 1 w(t)(s+ 1) ∣∣∣(s+ 1)W (t)− (t+ 1)W (s)− (t− s)Z + stZ − stZ∣∣∣ = 1 w(t)(s+ 1) ∣∣∣(s+ 1){W (t)− tZ}− (t+ 1){W (s)− sZ}∣∣∣ . (A.21) 46 Following Horva´th et al. (2004), Fremdt (2014b), computing the covariance function implies the following identity (in distribution){ W (t)− tZ } t≥0 D = { (1 + t)W ( t t+ 1 )} t≥0 . Applying this to (A.21) yields 1 w(t)(s+ 1) ∣∣∣(t+ 1){W (s)− sZ}− (s+ 1){W (t)− tZ}∣∣∣ D = 1 w(t)(s+ 1) ∣∣∣(t+ 1)(s+ 1)W( s s+ 1 ) − (t+ 1)(s+ 1)W ( t t+ 1 )∣∣∣ = 1 w(t) ∣∣∣(t+ 1)W( s s+ 1 ) − (t+ 1)W ( t t+ 1 )∣∣∣ . This completes the proof of Lemma A.5 and also of Theorem 2.6. Proof of Lemma 2.8: Using the definition w(t) = (1 + t) max {( t 1 + t )γ , ε } , we obtain that sup 0≤t<∞ max 0≤s≤t 1 w(t) ∣∣∣(t+ 1)W( s s+ 1 ) − (t+ 1)W ( t t+ 1 )∣∣∣ = sup 0≤t<∞ max 0≤s≤t 1 max {( t 1 + t )γ , ε }∣∣∣W( s s+ 1 ) −W ( t t+ 1 )∣∣∣ = sup 0≤t<∞ max 0≤s≤t 1 max {( t 1 + t )γ , ε }∣∣∣W( s s+ 1 ) −W ( t t+ 1 )∣∣∣ = sup 0≤t<1 max 0≤s≤t 1 max{tγ , ε} ∣∣∣W (s)−W (t)∣∣∣ , where we used that the mapping x 7→ x/(1 + x) is bijective and increasing on the domain [0,∞) with co-domain [0, 1) . Proof of Theorem 2.11: For m sufficiently large we have k∗ < m. Then observe the lower bound ∞ sup k=1 Eˆm(k) w(k/m) = ∞ sup k=1 1 w(k/m)m1/2 k−1 max j=0 (k − j) ∥∥∥θˆm+j1 − θˆm+km+j+1∥∥∥ Σˆ−1 j=k∗ ≥ sup k∗≤k<∞ 1 w(k/m)m1/2 (k − k∗) ∥∥∥θˆm+k∗1 − θˆm+km+k∗+1∥∥∥ Σˆ−1 k=m≥ m− k ∗ m1/2w(1) ∥∥∥θˆm+k∗1 − θˆ2mm+k∗+1∥∥∥ Σˆ−1 . 47 The last display is bounded from below by m− k∗ m1/2w(1) ∥∥∥θ(1) − θ(2)∥∥∥ Σˆ−1 − m− k ∗ m1/2w(1) ∥∥∥θˆm+k∗1 − θ(1)∥∥∥ Σˆ−1 − m− k ∗ m1/2w(1) ∥∥∥θˆ2mm+k∗+1 − θ(2)∥∥∥ Σˆ−1 . Using that Σˆ is (weakly) convergent with non-singular limit, the first summand above diverges to infinity and so it suffices to proof that the remaining two summands are of order OP(1). This follows from the linearization in equation (2.7) since this gives m− k∗ m1/2w(1) ( θˆm+k ∗ 1 − θ(1) ) = m− k∗ m1/2(m+ k∗)w(1) m+k∗∑ t=1 IF t + m− k ∗ m1/2w(1) R1,m+k∗ = OP(1) and m− k∗ m1/2w(1) ( θˆ2mm+k∗+1 − θ(2) ) = 1 m1/2w(1) 2m∑ t=m+k∗+1 IF t + (m− k ∗) m1/2w(1) Rm+k∗+1,2m = OP(1) , where we also applied Assumption 2.9. Now the claim follows. B Retrospective test for changes in linear models Consider the setting outlined in Section 3.2. In this situation we would like to test for stability of the regression vectors in the initial sample, that is H0 : β1 = · · · = βm . Following the ideas stated in chapter 1 of Cso¨rgo¨ and Horva´th (1997) it is sensible to employ the statistic Tm = 1 m3/2 m max j=1 j(m− j)‖βˆj1 − βˆmj+1‖Σˆ , (B.1) where βˆ was defined in (3.3). A sensible estimate for the time of the change is given by j∗ := arg mmax j=1 j(m− j)‖βˆj1 − βˆmj+1‖Σˆ (B.2) 48 Given the assumptions of Section 3.2 and using the theory of the reference one can verify that under the null hypothesis above Tn D =⇒ max t∈[0,1] |B(t)| , where B denotes a p-dimensional Brownian bridge (with independent components). Thus rejecting the null whenever Tn > b1−α with b1−α denoting the (1−α) quantile of the limit distribution yields an appropriate testing procedure for the retrospective problem. 49