SFB 
823 
A new approach for open-end 
sequential change point 
monitoring 
D
iscussion P
aper 
 
Josua Gösmann, Tobias Kley, 
Holger Dette 
  
 
 
Nr. 11/2019 
 
 
 
 
 
 
 
 
 
A new approach for open-end sequential change point
monitoring
Josua Go¨smann
Ruhr-Universita¨t Bochum
Fakulta¨t fu¨r Mathematik
44780 Bochum, Germany
josua.goesmann@ruhr-uni-bochum.de
(corresponding author)
Tobias Kley
University of Bristol
School of Mathematics
Bristol BS8 1TH, United Kingdom
tobias.kley@bristol.ac.uk
Holger Dette
Ruhr-Universita¨t Bochum
Fakulta¨t fu¨r Mathematik
44780 Bochum, Germany
holger.dette@ruhr-uni-bochum.de
June 3, 2019
Abstract
We propose a new sequential monitoring scheme for changes in the parameters of
a multivariate time series. In contrast to procedures proposed in the literature which
compare an estimator from the training sample with an estimator calculated from the
remaining data, we suggest to divide the sample at each time point after the training
sample. Estimators from the sample before and after all separation points are then
continuously compared calculating a maximum of norms of their differences. For open-
end scenarios our approach yields an asymptotic level α procedure, which is consistent
under the alternative of a change in the parameter.
JEL classification: C01,C22
Keywords and phrases: change point analysis, open-end procedures, sequential moni-
toring
1
1 Introduction
Nowadays, nearly all fields of applications require sophisticated statistical modelling and
statistical inference to draw scientific conclusions from the observed data. In many cases
data is time dependent and the involved model parameters or the model itself may not
be necessarily stable. In such situations it is of particular importance to detect changes
in the processed data as soon as possible and to adapt the statistical analysis accordingly.
These changes are usually called change points or structural breaks in the literature. Due
to its universality, methods for change point analysis have a vast field of possible applica-
tions - ranging from natural sciences [for example biology and meteorology] to humanities
[economics, finance, social sciences]. Since the seminal papers of Page (1954, 1955) the
problem of detecting change points in time series has received substantial attention in the
statistical literature. The contributions to this field can be roughly divided into the areas
of retrospective and sequential change point analysis.
In the retrospective case, historical data sets are examined with the aim to test for changes
and identify their position within the data. In this setup, the data is assumed to be com-
pletely available before the statistical analysis is started (a-posteriori analysis). A compre-
hensive overview of retrospective change point analysis can be found in Aue and Horva´th
(2013). In many practical applications, however, data arrives consecutively and breaks can
occur at any new data point. In such cases the statistical analysis for changes in the pro-
cessed data has to start immediately with the target to detect changes as soon as possible.
This field of statistics is called sequential change point detection [sometimes also: online
change point detection].
In the major part of the 20th century the problem of sequential change point detection was
tackled using procedures [mostly called control charts, see Lai (1995, 2001) for comprehen-
sive reviews], which are optimized to have a minimal detection delay but do not control the
probability of a false alarm (type I error). A new paradigm was then introduced by Chu
et al. (1996), who use initial data sets and therefrom employ invariance principles to also
control the type I error. The methods developed under this paradigm [see below] can again
be subdivided into closed-end and open-end approaches. In closed-end scenarios monitoring
is stopped at a fixed pre-defined point of time, while in open-end scenarios monitoring can
- in principle - continue forever if no change point is detected.
In the paper at hand we develop a new approach for sequential change point detection
in an open-end scenario. To be more precise let {Xt}t∈Z denote a d-dimensional time
series and let Ft be the distribution function of the random variable Xt at time t. We
are studying monitoring procedures for detecting changes of a parameter θt = θ(Ft), where
θ = θ(F ) is a p-dimensional parameter of a distribution function F on Rd (such as the mean,
variance, correlation etc.). In particular we will develop a decision rule for the hypothesis
2
of a constant parameter, that is
H0 : θ1 = · · · = θm = θm+1 = θm+2 = . . . , (1.1)
against the alternative that the parameter changes (once) at some time m+k? with k? ≥ 1,
that is
H1 : ∃k? ∈ N : θ1 = · · · = θm+k?−1 6= θm+k? = θm+k?+1 = . . . . (1.2)
In this setup, which was originally introduced by Chu et al. (1996), the first m observations
are assumed to be stable and will serve as an initial training set. The problem of sequential
change point detection in the hypotheses paradigm as pictured above has received substan-
tial interest in the literature. Since the seminal paper of Chu et al. (1996) several authors
have worked in this area. Aue et al. (2006), Aue et al. (2009), Fremdt (2014b) and Aue
et al. (2014) developed methodology for detecting changes in the coefficients of a linear
model, while Wied and Galeano (2013) and Pape et al. (2016) considered sequential mon-
itoring schemes for changes in special functionals such as the correlation or variance. A
Mosum-approach was employed by Leisch et al. (2000), Horva´th et al. (2008) or Chen and
Tian (2010) to monitor the mean and linear models, respectively. Recently, Hoga (2017)
used an `1-norm to detect changes in the mean and variance of a multivariate time series,
Kirch and Weber (2018) defined a unifying framework for detecting changes in different
parameters with the help of several statistics and Otto and Breitung (2019) considered a
Backward CUSUM, which monitors changes based on recursive residuals in a linear model.
A helpful but not exhaustive overview of different sequential procedures can be found in
Section 1, in particular Table 1, of Anatolyev and Kosenok (2018). A common feature of
all procedures in the cited literature consists in the comparison of estimators from different
subsamples of the data. To be precise, let X1, . . . , Xm denote an initial training sample
and X1, . . . , Xm, . . . , Xm+k be the available data at time m + k. Several authors propose
to investigate the differences
θˆm1 − θˆm+km+1 , (1.3)
(in dependence of k), where θˆji denotes the estimator of the parameter from the sample
Xi, . . . , Xj . In the sequential change point literature monitoring schemes based on the
differences (1.3) are usually called (ordinary) CUSUM procedures and have been considered
by Horva´th et al. (2004), Aue et al. (2006, 2009, 2014), Schmitz and Steinebach (2010) or
Hoga (2017). Other authors suggest using a function of the differences
{
θˆm1 − θˆm+km+j+1
}
j=0,...,k−1 (1.4)
3
(in dependence of k) and the corresponding procedures are usually called Page-CUSUM
tests [see Fremdt (2014b), Aue et al. (2015), or Kirch and Weber (2018) among others]. As
an alternative we propose - following ideas of Dette and Go¨smann (2018) - a monitoring
scheme based on a function of the differences
{
θˆm+j1 − θˆm+km+j+1
}
j=0,...,k−1 . (1.5)
The intuitive advantage of (1.5) over (1.3) is the screening for all possible positions of the
change point, which takes into account that the change point not necessarily comes with
observation Xm+1 and so θˆ
m+k
m+1 maybe ‘corrupted’ by pre-change observations. This issue
is also partially addressed by (1.4), where different positions are examined and compared
with the estimator of the parameter from the training sample. We will demonstrate in Sec-
tion 4 that sequential monitoring schemes based on the differences (1.5) yield a substantial
improvement in power compared to the commonly used methods based on (1.3) and (1.4).
To avoid misunderstandings, the reader should note that a (total) comparison based on
differences of the form (1.5), is typically also called a CUSUM-approach in the retrospec-
tive change point analysis [see Aue and Horva´th (2013) for a comprehensive overview of
(retrospective) change point analysis].
The present paper is devoted to a rigorous statistical analysis of a sequential monitoring
based on the differences defined in (1.5) in the context of an open-end scenario. In Section 2
we introduce the new procedure and develop a corresponding asymptotic theory to obtain
critical values such that monitoring can be performed at a controlled type I error. The
theory is broadly applicable to detect changes in a general parameter θ of a multivariate
time series. As all monitoring schemes in this context the method depends on a threshold
function and we also discuss the choice of this function. In particular we establish an
interesting result regarding this choice and establish a connection to corresponding ideas
made by Horva´th et al. (2004) and Fremdt (2014b), which may also be of interest in closed-
end scenarios. In Section 3 we discuss several special cases and demonstrate that the new
methodology is applicable to detect changes in the mean and the parameters of a linear
model. Finally, we present a small simulation study in Section 4, where we compare our
approach to those developed by Horva´th et al. (2004) and Fremdt (2014b). In particular
we demonstrate that the monitoring scheme based on the differences (1.5) yields a test
with a controlled type I error and a smaller type II error than the procedures in the cited
references.
4
2 Asymptotic properties
Throughout this paper let F denote a d-dimensional distribution function and θ = θ(F ) a
p-dimensional parameter of F . We will denote by
Fˆ ji (z) =
1
j − i+ 1
j∑
t=i
I{Xt ≤ z} (2.1)
the empirical distribution function of observations Xi, . . . , Xj (here the inequality is under-
stood component-wise) and consider the canonical estimator θˆji = θ(Fˆ
j
i ) of the parameter
θ from the sample Xi, . . . , Xj .
To test the hypotheses (1.1) and (1.2) in the described online setting in a open-end
scenario we propose a monitoring scheme defined by
Eˆm(k) = m
−1/2 k−1max
j=0
(k − j)
∥∥∥θˆm+j1 − θˆm+km+j+1∥∥∥
Σˆ−1
, (2.2)
where the statistic Σˆ denotes an estimator of the long-run variance matrix Σ (defined in
Assumption 2.2) and the symbol ‖v‖2A = v>Av denotes a weighted norm of the vector v
induced by the positive definite matrix A. The monitoring is then performed as follows.
With observation Xm+k arriving, one computes Eˆm(k) and compares it to an appropriate
threshold function, which is also called weighting function in the literature, say w. If
Eˆm(k) > cαw(k/m) (2.3)
monitoring is stopped and the null hypothesis (1.1) is rejected in favor of the alternative
(1.2). If the inequality (2.3) does not hold, monitoring is continued with the next observa-
tion Xm+k+1. We will derive the limiting distribution of sup
∞
k=1 Eˆm(k)/w(k/m) in Theorem
2.6 below to determine the constant cα involved in (2.3), such that the test keeps a nominal
level of α (asymptotically as m→∞).
Remark 2.1 The statistic (2.2) is related to a detection scheme, which was recently pro-
posed by Dette and Go¨smann (2018) for the closed-end case, where monitoring ends with
observation mT , for some T ∈ N. These authors considered the statistic
Dˆm(k) = m
−3/2 k−1max
j=0
(m+ j)(k − j)‖θˆm+j1 − θˆm+km+j+1‖Σˆ−1 , (2.4)
and showed
mT
max
k=1
Dˆm(k)
w(k/m)
D
=⇒ max
t∈[0,T ]
max
s∈[0,T ]
|(s+ 1)W (t+ 1)− (t+ 1)W (s+ 1)|
w(t)
, (2.5)
where W denotes a p-dimensional Brownian motion and throughout this paper the symbol
5
D
=⇒ denotes weak convergence (in the space under consideration). However, this statistic
cannot be considered in an open-end scenario for the typical threshold functions considered
in the literature satisfying lim supt→∞ t/w(t) <∞ (in this case the limit on the right-hand
side of (2.5) would be almost surely infinite for T =∞). As threshold functions satisfying
lim supt→∞ t2/w(t) < ∞ will cause a loss in power as demonstrated in an unpublished
simulation study, we propose to replace the factor (m+ j) in (2.4) by the size of the initial
sample m, which leads to the monitoring scheme defined by (2.2).
To discuss the asymptotic properties of our approach, we require the following notation.
The symbol
P
=⇒ denotes convergence in probability. The process {W (s)}s∈[0,∞) will usually
represent a standard p-dimensional Brownian motion. For a vector v ∈ Rd, we denote by
|v| = (∑di=1 v2i )1/2 its Euclidean norm. For the sake of a clear distinction we will employ
n
sup
i=1
a(i)
for discrete indexing (with integer arguments) and
sup
0≤x≤1
a(x)
for continuous indexing (with arguments taken from the interval [0, 1] or another subset of
R).
Next, we define the influence function (assuming its existence) by
IF(x, F, θ) = lim
ε↘0
θ((1− ε)F + εδx)− θ(F )
ε
, (2.6)
where δx(z) = I{x ≤ z} is the distribution function of the Dirac measure at the point
x ∈ Rd and the inequality in the indicator is again understood component-wise. We will
focus on functionals that allow for an asymptotic linearization in terms of the influence
function, that is
θˆji − θ = θ(Fˆ ji )− θ(F ) =
1
j − i+ 1
j∑
t=i
IF(Xt, F, θ) +Ri,j (2.7)
with asymptotically negligible remainder terms Ri,j . Finally, for the sake of readability we
introduce the following abbreviation
IF t = IF(Xt, Ft, θ) ,
where Ft is again the distribution function of Xt. Under the null hypothesis (1.1) we will
impose the following assumptions on the underlying time series.
6
Assumption 2.2 (Approximation) The time series {Xt}t∈Z is (strictly) stationary, such
that Ft = F for all t ∈ Z. Further, for each m ∈ N there exist two independent, p-
dimensional standard Brownian motions Wm,1 and Wm,2, such that for some positive con-
stant ξ < 1/2 the following approximations hold
∞
sup
k=1
1
kξ
∣∣∣∣ m+k∑
t=m+1
IF t −
√
ΣWm,1(k)
∣∣∣∣ = OP(1) (2.8)
and
1
mξ
∣∣∣∣ m∑
t=1
IF t −
√
ΣWm,2(k)
∣∣∣∣ = OP(1) (2.9)
as m→∞, where Σ = ∑t∈Z Cov (IF0, IF t) ∈ Rp×p denotes the long-run variance matrix
of the process
{IF t}t∈Z, which we assume to exist and to be non-singular.
Assumption 2.3 (Threshold function) The threshold function w : R≥0 → R+ is uniformly
continuous, has a positive lower bound, say `w > 0, and satisfies
lim sup
t→∞
t
w(t)
<∞ .
Assumption 2.4 (Linearization) The remainder terms in the linearization (2.7) satisfy
k
max
i,j=1
i<j
(j − i+ 1)√
k
|Ri,j | = o(1) (2.10)
as k →∞ with probability one.
Remark 2.5 Let us give a brief explanation on the assumption stated above.
(i) Assumption 2.2 is a uniform invariance principle and frequently used in the (sequen-
tial) change point literature [see for example Aue et al. (2006) or Fremdt (2014b)
among others]. In the one-dimensional case, Assumption (2.8) was verified by Aue
and Horva´th (2004) for different classes of time series including GARCH or strongly
mixing processes and can be easily extended to the multivariate case considered here.
Assumption 2.2 is stronger than a functional central limit theorem, which is usually
sufficient to work in a closed-end setup [see for example Wied and Galeano (2013),
Pape et al. (2016) or Dette and Go¨smann (2018)]
(ii) Assumption 2.3 gives necessary restrictions on the feasible set of threshold functions,
which are required for the existence of a (weak) limit derived in Theorem 2.6. It is also
worth mentioning that the assumption of a lower bounded threshold can be relaxed
7
to
lim
t→0
tγ
w(t)
= 0
for a constant 0 ≤ γ < 1/2. In this case, the assumption for the remainders in (2.10)
has to be replaced by
m+k
max
1=i,j
i<j
(j − i+ 1)
(m+ k)1/2−γ
|Ri,j | = o(1) a.s.
For the sake of a transparent presentation we use the assumption of a lower bound
here as this also simplifies the technical arguments in the proofs later on.
(iii) Assumption 2.4 is crucial for the proof of our main theorem and directly implies
∞
sup
k=1
m+k
max
1=i,j
i<j
(j − i+ 1)
(m+ k)1/2
|Ri,j | = ∞sup
k=m+1
k
max
i,j=1
i<j
(j − i+ 1)
k1/2
|Ri,j | = o(1) a.s.
Note that in the location model θ(F ) = EF [X] we have Ri,j = 0 and (2.10) obviously
holds. In general however, Assumption 2.4 is highly non-trivial and crucially depends
on the structure of the functional θ and the time series {Xt}t∈Z. For a comprehensive
discussion the reader is referred to Dette and Go¨smann (2018), where the estimate
(2.10) has been verified in probability for different functionals including quantiles and
variance.
The following result is the main theorem of this section.
Theorem 2.6 Assume that the null hypothesis (1.1) and Assumptions 2.2 - 2.4 hold. If
further Σˆm is a consistent and non-singular estimator of the long-run variance matrix Σ it
holds that
∞
sup
k=1
Eˆm(k)
w(k/m)
D
=⇒ sup
0≤t<∞
max
0≤s≤t
t+ 1
w(t)
∣∣∣W( s
s+ 1
)
−W
( t
t+ 1
)∣∣∣ , (2.11)
where W is a p-dimensional Brownian motion with independent components.
For the sake of completeness, the reader should note that due to Assumption 2.3 the
asymptotic behaviour of the threshold guarantees that the random variable on the right-
hand side of (2.11) is finite (with probability one).
In light of Theorem 2.6 one can choose a constant c(α), such that
P
(
sup
0≤t<∞
max
0≤s≤t
t+ 1
w(t)
∣∣∣W( s
s+ 1
)
−W
( t
t+ 1
)∣∣∣ > c(α)) ≤ α . (2.12)
8
The following corollary then states that our approach leads to a level α detection scheme.
Corollary 2.7 Grant the Assumptions of Theorem 2.6 and further let c(α) satisfy inequality
(2.12), then
lim sup
m→∞
P
(
∞
sup
k=1
Eˆm(k)
w(k/m)
> c(α)
)
≤ α .
The limit distribution obtained in Theorem 2.6 strongly depends on the considered
threshold. A special family of thresholds that has received considerable attention in the
literature [see Horva´th et al. (2004), Fremdt (2014b), Kirch and Weber (2018) among many
others] is given by
wγ(t) = (1 + t) max
{( t
1 + t
)γ
, ε
}
with 0 ≤ γ < 1/2 , (2.13)
where the cutoff ε > 0 can be chosen arbitrary small in applications and only serves to
reduce the assumptions and technical arguments in the proof [see also Wied and Galeano
(2013)]. With these functions the limit distribution in (2.11), with the threshold function
as the denominator, can be simplified to an expression that is more easily tractable via
simulations. Straightforward calculations yield that Assumption 2.3 is satisfied by the
function wγ and the limit distribution in Theorem 2.6 simplifies as follows.
Corollary 2.8 For a p-dimensional Brownian motion W with independent components it
holds that
sup
0≤t<∞
max
0≤s≤t
t+ 1
wγ(t)
∣∣∣W( s
s+ 1
)
−W
( t
t+ 1
)∣∣∣ D= sup
0≤t<1
max
0≤s≤t
1
max{tγ , ε}
∣∣∣W (t)−W (s)∣∣∣ := L1,γ .
For the investigation of the consistency of the monitoring scheme (2.2) we require the
following assumption.
Assumption 2.9 Under the alternative H1 defined in (1.2) let
θ(1) := θ(F1) = θ(F2) = · · · = θ(Fm+k∗) 6= θ(2) := θ(Fm+k∗+1) = θ(Fm+k∗+1) = · · ·
Further assume that k∗ is independent of m and that the process {IF t} is of the following
order before and after the change, respectively,
1√
m
∣∣∣∣m+k∗∑
t=1
IF t
∣∣∣∣ = OP(1) and 1√m
∣∣∣∣ 2m∑
t=m+k∗+1
IF t
∣∣∣∣ = OP(1) . (2.14)
9
Additionally assume that the remainders defined in (2.4) satisfy
|R1,m+k∗ | = OP
( 1√
m
)
and |Rm+k∗+1,2m| = OP
( 1√
m
)
. (2.15)
Remark 2.10 The assumptions stated above are substantially weaker than those used
to investigate the asymptotic properties of sup∞k=1 Eˆm(k)/w(
k
m) under the null hypothesis.
Basically, we only assume reasonable behavior of the time series before and after the change
point and can drop the uniform approximation in Assumption 2.2 and the uniform negligi-
bility of the remainders in Assumption 2.4. It is easy to see, that (2.14) is already satisfied
if both, the phases before and after the change fulfill a central limit theorem. Finally, it is
worth mentioning that one can also derive the subsequent results when replacing the 2m
by cm for an arbitrary constant c > 1, however - for the sake of better readability - we will
work with this (minimally stricter) assumption.
The next Theorem yields consistency under the alternative hypothesis.
Theorem 2.11 Assume that the alternative hypothesis (1.2) and Assumptions 2.3 and 2.9
hold. If further Σˆm is a consistent and non-singular estimator of the long-run variance
matrix Σ it holds that
∞
sup
k=1
Eˆm(k)
w(k/m)
P
=⇒∞ .
Consequently,
lim
m→∞P
(
∞
sup
k=1
Eˆm(k)
w(k/m)
> c
)
= 1
for any constant c ∈ R.
3 Some specific change point problems
In this section we briefly illustrate how the theory developed in Section 2 can be employed
to construct monitoring schemes for a specific parameter of the distribution function. For
the sake of brevity we restrict ourselves to the mean and the parameters in a linear model.
Other examples such as the variance or quantiles can be found in Dette and Go¨smann
(2018).
10
3.1 Changes in the mean
The sequential detection of changes in the mean
µ(F ) = EF [X] =
∫
Rd
xdF (x) .
has been extensively discussed in the literature [see Aue and Horva´th (2004), Fremdt
(2014b) or Hoga (2017) among many others].
Is is easy to verify (and well known), that the influence function for the mean is given
by
IF(x, F, µ) = x− EF [X] ,
and Assumption 2.4 and the second part of Assumption 2.9 are obviously satisfied in this
case since we have Ri,j = 0 for all i, j. For the remaining assumptions in Section 2 it
now suffices that the centered time series
{
Xt − E[Xt]
}
t∈Z fulfills Assumption 2.2, which
also implies the remaining part of Assumption 2.9 [see also the discussion in Remark 2.5].
In this situation both, Theorem 2.6 and Theorem 2.11 are valid provided that the chosen
threshold fulfills Assumption 2.3.
3.2 Changes in linear models
Consider the time-dependent linear model
Yt = P
>
t βt + εt , (3.1)
where the random variables {Pt}t∈N are the Rp-valued predictors, βt ∈ Rp is a p-dimensional
parameter and {εt}t∈N is a centered random sequence independent of {Pt}t∈N. The identi-
fication of changes in the vector of parameters in the linear model represents the prototype
problem in sequential change point detection as it has been extensively studied in the lit-
erature [see Chu et al. (1996), Horva´th et al. (2004), Aue et al. (2009), Fremdt (2014b),
among many others].
This situation is covered by the general theory developed in Section 2 and 3. To be precise
let
Xt = (P
>
t , Yt)
> ∈ Rd , d = p+ 1 and t = 1, 2 . . . (3.2)
11
be the the joint vectors of predictor and response with (joint) distribution function Ft, such
that the marginal distributions of Yt and Pt are given by
Ft,Y = Ft(∞, . . . ,∞, ·) and Ft,P = Ft(·, . . . , ·,∞) ,
respectively, where we will assume that the predictor sequence is stationary, that is Ft,P =
FP . In a first step we will consider the case, where the moment matrix
M := E[P1P>1 ] =
∫
Rd
p · p>dFP (p)
is known (we will discuss later on why this assumption is non-restrictive) and non-singular.
In this setup, the parameter βt can be represented as a functional of the distribution function
Ft, that is
βt = β(Ft) := M
−1 ·
∫
Rd
p · ydFt(y, p) = M−1 · E
[
PtYt
]
,
which leads to the estimators
βˆji = β(Fˆ
j
i ) =
M−1
j − i+ 1
j∑
t=i
PtYt (3.3)
from the sample (Pi, Yi), . . . , (Pj , Yj). To compute the influence function, let (p, y) ∈ Rp×R,
then
IF((p, y), Ft, β) = lim
η↘0
β
(
(1− η)Ft + εδ(p,y)
)− β(Ft)
η
= lim
η↘0
M−1
[
(1− η)E[PtYt] + ηpy
η
− βt
η
]
= M−1
(
py − E[PtYt]
)
,
which is the influence function (for β) in the linear model stated above [see for example
Hampel et al. (1986) for a comprehensive discussion of influence functions]. In the following,
we will use the notation IF t = IF
(
Xt, Ft, β
)
again. Note that
IF t = M−1
(
PtYt − E[PtYt]
)
= M−1PtYt − βt , (3.4)
12
which directly gives E[IF t] = 0. Under the null hypothesis the random sequence (Xt)t∈N
is stationary and the linearization defined in (2.7) simplifies to
βˆji − β1 = β(Fˆ ji )− β1 =
M−1
j − i+ 1
j∑
t=i
PtYt − β1 = 1
j − i+ 1
j∑
t=i
(
M−1PtYt − βt
)
=
1
j − i+ 1
j∑
t=i
IF t .
(3.5)
Consequently, the remainders in (2.7) vanish and Assumption 2.4 is obviously satisfied.
Next, note that the long-run variance matrix is given by
Σ =
∑
t∈Z
Cov
(IF0, IF t) = M−1ΓM−1 (3.6)
with Γ =
∑
t∈Z Cov
(
Y0P0, YtPt
)
, which can be estimated by Σˆ = M−1ΓˆM−1 where Γˆ is
an estimator for Γ. Observing (3.5) it is now easy to see that in the resulting statistic Eˆm
the matrix M cancels out, that is
Eˆm(k) = m
−1/2 k−1max
j=0
(k − j)
∥∥∥βˆm+j1 − βˆm+km+j+1∥∥∥
Σˆ−1
= m−1/2 k−1max
j=0
(k − j)
∥∥∥ 1
m+ j
m+j∑
t=1
YtPt − 1
k − j
m+k∑
t=m+j+1
YtPt
∥∥∥
Γˆ−1
(3.7)
and therefore does not depend on the matrix M . We therefore obtain the following result,
which describes the asymptotic properties of the monitoring scheme based on the statistic
Eˆm for a change in the parameter in the linear regression model (3.1). The proof is a direct
consequence of Theorem 2.6 and 2.11.
Corollary 3.1 Assume that the predictor sequence {Pt}t∈N is strictly stationary with a non-
singular second moment matrix M = E[P1P>1 ]. Let Γˆm denote a non-singular, consistent
estimator of the non-singular long-run variance matrix Γ defined in (4.8). Further suppose
that the sequences {Pt}t∈N and {εt}t∈N are independent and let the threshold function under
consideration fulfill Assumption 2.3.
(i) Under the null hypothesis H0 of no change assume additionally that the sequence
{IF t}t∈N defined in (3.4) admits the approximation in Assumption 2.2. Then moni-
toring based on the statistic Eˆ in (3.7) is an asymptotic level α procedure.
(ii) Under the alternative hypothesis H1 assume that {IF t}t∈N fulfills (2.14) of Assump-
tion 2.9. Then the monitoring based on the statistic Eˆ in (3.7) is consistent.
13
4 Finite sample properties
In this section we investigate the finite sample properties of our monitoring procedure
and demonstrate its superiority with respect to the available methodology. We choose the
following two statistics as our benchmark
Qˆm(k) :=
k
m1/2
∥∥∥θˆm1 − θˆm+km ∥∥∥
Σˆ−1
,
Pˆm(k) :=
k−1
max
j=0
k − j
m1/2
∥∥∥θˆm1 − θˆm+km+j+1∥∥∥
Σˆ−1
.
(4.1)
The procedure based on Qˆ was originally proposed by Horva´th et al. (2004) for detecting
changes in the parameters of linear models and since then reconsidered for example by Aue
et al. (2012), Wied and Galeano (2013) and Pape et al. (2016) for the detection of changes
in the CAPM-model, correlation and variances, respectively. A statistic of the type Pˆ
was recently proposed by Fremdt (2014b) and has been already reconsidered by Kirch and
Weber (2018). In the simulation study we will restrict ourselves to the commonly used class
of threshold functions wγ defined in (2.13), where we set the involved, technical constant
ε = 0.
Under the assumptions made in Section 2, it can be shown by similar arguments as given
in Appendix A that
∞
sup
k=1
Qˆm(k)
wγ(k/m)
D
=⇒ sup
0≤t<1
|W (t)|
max{tγ , ε} =: L2,γ (4.2)
and
∞
sup
k=1
Pˆm(k)
wγ(k/m)
D
=⇒ sup
0≤t<1
max
0≤s≤t
1
max{tγ , ε}
∣∣∣W (t)− 1− t
1− sW (s)
∣∣∣ =: L3,γ , (4.3)
where W denotes a p-dimensional Brownian motion. For detailed proofs (under slightly
different assumptions) of (4.2) and (4.3), the reader is relegated to Horva´th et al. (2004)
and Fremdt (2014b), where procedures of these types are considered in the special case of
a linear model.
Recall the notation of L1,γ introduced in Corollary 2.8. By (4.2), (4.3) and Corollary 2.7
the necessary critical values for the procedures Eˆ, Qˆ and Pˆ combined with threshold wγ
are given as the (1− α)-quantiles of the distributions L1,γ , L2,γ and L3,γ , respectively and
can easily be obtained by Monte Carlo simulations. The quantiles are listed in Table 1 for
dimensions p = 1 and p = 2 and have been calculated by 10000 runs simulating the cor-
responding distributions where the underlying Brownian motions have been approximated
on a grid of 5000 points. In Sections 4.1 and 4.2 below, we will examine the finite sample
properties of the three statistics for the detection of changes in the mean and in the regres-
14
sion coefficients of a linear model, respectively. All subsequent results presented in these
sections are based on 1000 independent simulation runs and a fixed test level of α = 0.05.
L1,γ L2,γ L3,γ
p γ \α 0.01 0.05 0.1 0.01 0.05 0.1 0.01 0.05 0.1
1
0 2.9762 2.4721 2.2175 2.8262 2.2599 1.9914 2.7912 2.2365 1.9497
0.25 3.1050 2.5975 2.3542 2.9638 2.4296 2.1758 2.9445 2.3860 2.1060
0.45 3.4269 2.9701 2.7398 3.3817 2.9241 2.7002 3.3015 2.7992 2.5437
2
0 3.4022 2.8943 2.6562 3.2272 2.6794 2.4008 3.2461 2.6957 2.4266
0.25 3.5279 3.0948 2.7781 3.3322 2.7981 2.5481 3.3630 2.8433 2.5911
0.45 3.8502 3.3912 3.1509 3.7010 3.2046 2.9543 3.7467 3.2966 3.0620
Table 1: (1-α)-quantiles of the distributions L1,γ, L2,γ and L3,γ for different choices of
γ and different dimensions of the parameter. The cutoff constant was set to ε = 0. The
results for L2,γ and L3,γ for p = 1 are taken from Fremdt (2014b) and Horva´th et al. (2004),
respectively.
4.1 Changes in the mean
In this section we will compare the finite sample properties of the procedures based on the
statistics Eˆ, Pˆ and Qˆ for changes in the mean as outlaid in Section 3.1. Here we test the
null hypothesis of no change which is given by
H0 : µ1 = · · · = µm = µm+1 = µm+2 = . . . , (4.4)
while the alternative, that the parameter µt changes beyond the initial data set, is defined
as
H1 : ∃k? ∈ N : µ1 = · · · = µm+k?−1 6= µm+k? = µm+k?+1 = . . . . (4.5)
We will consider two different data generating models, a white noise process and an autore-
gressive process given by
(M1) Xt i.i.d. ∼ N (0, 1) ,
(M2) Xt = 0.1Xt−1 + εt with εt i.i.d. ∼ N (0, 1) .
For the AR(1)-process specified in model (M2), we create a burn-in sample of 100 obser-
vations in the first place. To simulate the alternative hypotheses, changes in the mean are
added to the data, that is
Xδt =
Xt if t < m+ k∗ ,Xt + δ if t ≥ m+ k∗ ,
15
where δ = E[Xm+k∗ ]− E[Xm+k∗−1] denotes the desired change amount. For the necessary
covariance estimation we employ the well known quadratic spectral estimator [see (Andrews,
1991)] with its implementation in the R-package ‘sandwich’ [see Zeileis (2004)]. To take
into account the possible appearance of changes only the initial stable segment X1, . . . , Xm
is used for this estimate, which is standard in the literature [see for example Horva´th et al.
(2004), Wied and Galeano (2013), or Dette and Go¨smann (2018) among many others].
In Table 2 we display the type 1-Error for both time series models and different choices of γ
in the threshold functions. The principle observation is, that all three statistical procedures
offer a reasonable approximation of the desired nominal level of α = 0.05. The results for
the dependent model (M2) are slightly worse than those for the white noise model (M1).
This effect may be caused by a less precise estimation of the long-run variance for small
sample sizes. Accordingly, this effect is weaker for the case m = 100.
In Figures 1, 2, 3 and 4 we illustrate the power of the procedures under the alternative hy-
pothesis for increasing values of the change and different change positions for combinations
of γ = 0, γ = 0.45 and m = 50, m = 100. The basic tendency in all four plots is similar:
While the procedures behave similar for a change close to the initial data set (first row),
the method based on Eˆ is clearly superior to the others the more the distance to the initial
set grows. To give an example, consider the left plot of the last row in Figure 1. Here the
test based on the statistic Eˆ already has a power of 32.9% for a change of µ = 1 whereas
the tests based on the statistics Pˆ and Qˆ have power of 24.4% and 22.7%, respectively. The
superior performance of Eˆ can most likely be explained by the more accurate estimate of
the pre-change parameter by θˆm+j1 , while the the other statistics only involve the estimator
θˆm1 [see formulas (2.2) and (4.1)].
For the sake of an appropriate understanding of our findings, the reader should be aware
of the fact, that - although we consider open-end procedures here - simulations have to be
stopped eventually. Here we chose this stopping point as 1000 (m = 50) or 3000 (m = 100)
observations and it is expectable that the testing power of all procedures increases with
a later stopping point. Therefore the observed superiority of Eˆ refers to the type 2-Error
until the specified stopping point.
16
(M1) (M2)
m γ Eˆ Qˆ Pˆ Eˆ Qˆ Pˆ
50
0 5.4% 5.2% 5.5% 8.1% 7.1% 8.2%
0.25 5.0% 4.9% 5.4% 8.3% 7.0% 9.5%
0.45 4.5% 3.6% 4.8% 7.6% 5.6% 9.2%
100
0 4.2% 4.3% 4.9% 6.9% 6.5% 6.9%
0.25 5.0% 4.9% 5.9% 7.6% 6.5% 7.0%
0.45 6.0% 4.9% 7.0% 6.5% 4.8% 7.7%
Table 2: Type 1-Error for the open-end procedures based on Eˆ, Qˆ and Pˆ . The size of the
known stable data was set to m = 50 (upper part), m = 100 (lower part).
17
(M1) (M2)
m+ k∗ = 51
m+ k∗ = 101
m+ k∗ = 501
m+ k∗ = 801
Figure 1: Power of the monitoring procedures for a change in the mean based on the statis-
tics Eˆ (solid line), Qˆ (dashed line) and Pˆ (dotted line) with γ = 0 and m = 50.
18
(M1) (M2)
m+ k∗ = 51
m+ k∗ = 101
m+ k∗ = 501
m+ k∗ = 801
Figure 2: Power of the monitoring procedures for a change in the mean based on the statis-
tics Eˆ (solid line), Qˆ (dashed line) and Pˆ (dotted line) with γ = 0.45 and m = 50.
19
(M1) (M2)
m+ k∗ = 101
m+ k∗ = 501
m+ k∗ = 1001
Figure 3: Power of the monitoring procedures for a change in the mean based on the statis-
tics Eˆ (solid line), Qˆ (dashed line) and Pˆ (dotted line) with γ = 0 and m = 100.
20
(M1) (M2)
m+ k∗ = 101
m+ k∗ = 501
m+ k∗ = 1001
Figure 4: Power of the monitoring procedures for a change in the mean based on the statis-
tics Eˆ (solid line), Qˆ (dashed line) and Pˆ (dotted line) with γ = 0.45 and m = 100.
21
4.2 Changes in linear models
In this section we present some simulation results for the detection of changes in the linear
model (3.1). We aim to detect changes in the unknown parameter vector βt ∈ Rp by testing
the null hypothesis
H0 : β1 = · · · = βm = βm+1 = βm+2 = . . . , (4.6)
against the alternative that the parameter βt changes beyond the initial data set, that is
H1 : ∃k? ∈ N : β1 = · · · = βm+k?−1 6= βm+k? = βm+k?+1 = . . . . (4.7)
To be precise, we consider the model (3.1) with p = 2 and the following choice of predictors
(LM1) Pt = (1, Zt)
> ,
(LM2) Pt = (1, 1 +Gt)
> with Gt = σ¯tZt and σ¯2t = 0.5 + 0.2Zt−1 + 0.3σ¯2t−1 ,
where Zt denotes an i.i.d. sequence of N (0, 0.5) random variables in both models. The
parameter vector is fixed at βt = (1, 1) under the null hypothesis and to examine the
alternative hypothesis, changes are added to its second component, that is
βδt =
(1, 1)> if t < m+ k∗ ,(1, 1 + δ)> if t ≥ m+ k∗ .
For both scenarios we simulated the residuals εt in model (3.1) as i.i.d. N (0, 0.5) sequences.
Note that the models specified above have been already considered by Fremdt (2014b). As
pointed out in Section 3.2 the asymptotic variance that needs to be estimated within our
procedures is given by
Γ =
∑
t∈Z
Cov
(
P0Y0, PtYt
)
. (4.8)
We estimate this quantity based on the stable segment of observations (Y1, P1), . . . , (Ym, Pm)
using the well known quadratic spectral estimator [see Andrews (1991)] with its implemen-
tation in the R-package ‘sandwich’ [see Zeileis (2004)].
The problem of detecting changes in the parameter of the linear model has also been
addressed using partial sums of the residuals εˆt = Yt − P>t βˆI in statistics similar to
(4.1), where βˆI is an initial estimate of β computed from the initial stable segment [see
for example Chu et al. (1996), Horva´th et al. (2004), Fremdt (2014a) among many others].
Our approach directly compares estimators for the vector βt, which are derived using the
22
general methodology introduced in Section 2 and 3. The resulting statistics are obtained
replacing θˆ by βˆ in equation (4.1). We also refer to Leisch et al. (2000) for a comparison of
residual based methods with methods using the estimators directly (these authors consider
a statistic similar to Qˆ).
In Table 3 we display the approximation of the nominal level for the three statistics with
different values of the parameter γ in the threshold function, where monitoring was stopped
after 1500 observations. We observe a reasonable approximation of the nominal level 5% in
the case γ = 0, while the rejection probabilities for for γ = 0.25 or γ = 0.45 slightly exceed
the desired level of 5%. The fact that larger values of γ ∈ [0, 1/2) can lead to a worse
approximation of the desired type 1-Error has also been observed by other authors [see, for
example, Wied and Galeano (2013)] and can be explained by a more sensitive threshold
function at the monitoring start if γ is chosen close to 1/2. Overall, the approximation is
slightly better for the independent case in model (LM1).
In Figure 5 we compare the power with respect to the change amount for different change
positions, where we restrict ourselves to the case γ = 0 for the sake of brevity. The
results are very similar to those provided for the mean functional in Section 4.1. Again
the monitoring scheme based on Eˆ outperforms the procedures based on Qˆ and Pˆ , and the
superiority is larger for a later change. We omit a detailed discussion and summarize that
the empirical findings have indicated superiority (w.r.t. testing power) of the monitoring
scheme based on the statistic Eˆ.
(LM1) (LM2)
γ Eˆ Qˆ Pˆ Eˆ Qˆ Pˆ
0 6.4% 6.5% 6.7% 7.2% 6.7% 7.2%
0.25 7.6% 8.8% 9.1% 8.5% 9.6% 9.5%
0.45 12.0% 12.2% 12.1% 12.6% 12.2% 12.6%
Table 3: Type 1-Error for the open-end procedures based on Eˆ, Qˆ and Pˆ . The size of the
known stable data was set to m = 100.
23
(LM1) (LM2)
m+ k∗ = 51
m+ k∗ = 201
m+ k∗ = 501
Figure 5: Power of the monitoring procedures for a change in the regression parameters
for the open-end procedures based on the statistics Eˆ (solid line), Qˆ (dashed line) and Pˆ
(dotted line) with γ = 0 and m = 100.
24
5 Closed-end scenarios
It is worthwhile to mention that the theory developed so far also covers the case of closed-
end scenarios [sometimes also called finite time horizon in the literature]. In this section,
we will very briefly discuss this situation and present a small batch of simulation results,
which also indicate the superiority of the statistic Eˆ for closed-end scenarios. Note that
the null hypothesis in this setup is given by
H0 : θ1 = · · · = θm = θm+1 = θm+2 = . . . = θTm , (5.1)
which is tested against the alternative that the parameters changes (once) at some time
m+ 1 ≤ m+ k? ≤ Tm, that is
H1 : ∃k? ∈ N : θ1 = · · · = θm+k?−1 6= θm+k? = θm+k?+1 = . . . = θTm . (5.2)
Here the factor T ∈ N controls the length of the monitoring period compared to the size of
the initial data set. Under the assumptions stated in Section 2, we can prove a corresponding
statement of Theorem 2.6 and Corollary 2.8.
Theorem 5.1 Assume that the null hypothesis (5.1) and Assumptions 2.2 - 2.4 hold. If
further Σˆm is a consistent and non-singular estimator of the long-run variance matrix Σ it
holds that
Tm
sup
k=1
Eˆm(k)
wγ(k/m)
D
=⇒ max
0≤t<T
max
0≤s≤t
1
wγ(t)
∣∣∣ 1 + t
1 + s
W (1 + s)−W (1 + t)
∣∣∣
D
= max
0<t<T/(T+1)
max
0≤s≤t
1
max{tγ , ε}
∣∣∣W (t)−W (s)∣∣∣ =: L1,γ(T ) ,
(5.3)
where W is a p-dimensional Brownian motion with independent components.
The proof of Theorem 5.1 follows by a straightforward adaption of the proofs of Theorem
2.6 and Corollary 2.8 given in Appndix A. The corresponding results for the tests based on
statistics Qˆ and Pˆ defined in (4.1) read as follows
Tm
max
k=1
Qˆm(k)
wγ(k/m)
D
=⇒ max
0<t<T/(T+1)
|W (t)|
max{tγ , ε} =: L2,γ(T ) (5.4)
and
Tm
max
k=1
Pˆm(k)
wγ(k/m)
D
=⇒ max
0<t<T/(T+1)
max
0≤s≤t
1
max{tγ , ε}
∣∣∣W (t)− 1− t
1− sW (s)
∣∣∣ =: L3,γ(T ) . (5.5)
25
To complete the discussion on closed-end scenarios we will display a small batch of sim-
ulation results for the detection of changes in the mean as described in Section 3.1. For
the sake of brevity, only the choice T = 4 is examined here [unpublished simulation results
show similar outcomes for other choices of T ]. The remaining simulation settings are the
same as used for the simulation study presented in Section 4.1 and in Table 4 we display
the necessary critical values defining the rejection regions for the different procedures.
The approximation of the nominal level under the null hypothesis is displayed in Table 5
and in Figures 6 and 7 the power of the different procedures with respect to change amount
and change position for γ = 0 and γ = 0.45 is illustrated. The results are very similar to
the open-end scenario discussed in Section 4 and confirm the superiority of tests based on
the statistic Eˆ.
L1,γ(4) L2,γ(4) L3,γ(4)
γ \α 0.01 0.05 0.1 0.01 0.05 0.1 0.01 0.05 0.1
0 2.7080 2.1995 1.9811 2.5145 1.9826 1.7380 2.5572 2.0435 1.8019
0.25 2.9558 2.4345 2.2220 2.7602 2.2223 1.9799 2.8210 2.2986 2.0750
0.45 3.3850 2.9371 2.6994 3.2238 2.7398 2.4952 3.3156 2.8626 2.6274
Table 4: (1-α)-quantiles of the distributions L1,γ(4), L2,γ(4) and L3,γ(4) for different choices
of γ. The cutoff constant was set to ε = 0 and the dimension is p = 1.
(M1) (M2)
γ Eˆ Qˆ Pˆ Eˆ Qˆ Pˆ
0 5.2% 5.3% 5.3% 8.0% 7.3% 7.4%
0.25 5.4% 6.0% 5.8% 8.6% 7.5% 8.1%
0.45 4.9% 5.4% 4.5% 6.1% 6.4% 5.9%
Table 5: Type 1-Error for the closed-end procedures for a change in the mean based on the
statistics Eˆ, Qˆ and Pˆ with a training data set of size m = 200 and a monitoring window of
T = 4.
26
(M1) (M2)
m+ k∗ = 201
m+ k∗ = 501
m+ k∗ = 801
Figure 6: Power of the (closed-end) monitoring procedures for a change in the mean based
on the statistics Eˆ (solid line), Qˆ (dashed line) and Pˆ (dotted line) with γ = 0, m = 200
and T = 4.
27
(M1) (M2)
m+ k∗ = 201
m+ k∗ = 501
m+ k∗ = 801
Figure 7: Power of the (closed-end) monitoring procedures for a change in the mean based
on the statistics Eˆ (solid line), Qˆ (dashed line) and Pˆ (dotted line) with γ = 0.45, m = 200
and T = 4.
28
6 Two applications
In this section, we discuss some practical issues that arise when our methodology is applied
to monitor for changes in linear models. As a summary of our recommendations, we provide
Algorithm 1 and two elaborated examples related to the United Kingdom European Union
membership referendum 2016. We consider the linear model
Yt = β1P1,t + β2P2,t + εt , (6.1)
where Yt is a real-valued response and (P1,t, P2,t) is a two-dimensional predictor, which is
a special case of the linear model considered in Section 3.2.
As most monitoring procedures, our approach requires a stable segment of m obser-
vations in which no changes have yet happened. To ensure that a candidate for a stable
segment is suitable, we suggest to make use of well known retrospective change point test
methodology. A comprehensive introduction and summary for this topic can, for example,
be found in Cso¨rgo¨ and Horva´th (1997). For the sake of completeness details of the ret-
rospective test that is employed as part of our analysis are provided in Section B of the
appendix.
In light of our asymptotic arguments in Section 2, we require that m will not be chosen
smaller than a minimum segment size, say mmin, to be specified by the user. In practice,
the observations of times t = 1, . . . ,m are then used as (a first candidate for) a stable
segment and monitoring starts not earlier than at time m+ 1; see line 2 of Algorithm 1.
Before monitoring begins, we apply the retrospective test to the candidate for a stable
segment and use it as such only if the test does not reject the null hypothesis of no change.
Otherwise, if there is evidence of a change, we record the change point as ‘found during
setup’, and continue in this manner with the next mmin observations onwards from the
change that we identified until a segment is found where the retrospective test does not
reject. Obtaining the stable segment for monitoring is described in lines 4–10 of Algorithm 1.
Once a stable segment is available monitoring starts: The detector is updated for every
incoming observation, namely (Yt, P1,t, P2,t), and a decision is made whether to reject the
null hypothesis and stop the procedure or to continue monitoring with the subsequent
observation. This is described in lines 11–13 of Algorithm 1.
After a change has been detected, the monitoring phase ends and an estimate for the
time of the change is obtained via the formula
ˆ`= m+ arg
k−1
max
j=0
m−3/2j(k − j)
∥∥∥ 1
m+ j
s+m+j−1∑
t=s
YtPt − 1
k − j
s+m+k−1∑
t=s+m+j
YtPt
∥∥∥
Γˆ−1
, (6.2)
where Pt = (P1,t, P2,t)
>. The objective function in (6.2) resembles the one in the detector
29
defined in (3.7), but we use the weights of the retrospective tests defined in (B.1). Note
that - once monitoring has stopped - the problem of estimating the change time is of retro-
spective nature with the constraint that the first m observations are known as stable. The
estimated time of change ˆ` is then labeled as ‘found during monitoring’ and recorded.
For the next monitoring phase the observations from the change point that was just iden-
tified to the last observation previously monitored are considered as a candidate for the
next stable segment. In the case where this set would have fewer than mmin elements it
is enlarged by adding incoming observations until mmin observations are available. This is
described in lines 14–16 of Algorithm 1. The procedure then goes back to line 4 where the
retrospective test is applied to assess the suitability of the candidate to be the next stable
segment. Note that, if a change is identified in the candidate for the next stable segment, a
new candidate is chosen to be the observations starting with that identified change point to
the time where monitoring has last stopped or mmin observations onwards from the identi-
fied change in case that the candidate would otherwise have fewer than mmin observations;
see line 8 of Algorithm 1.
In the remaining part of this section, we present the outcomes of the statistical analysis
by Algorithm 1 for two data sets related to the United Kingdom (UK) European Union
(EU) membership referendum, which took place on 23 June 2016. For our analysis we chose
the significance levels to be α = 0.1 and the threshold function w0, as defined in (2.13). All
data used was obtained from https://www.ariva.de on 6 April 2019.
As our first example, we consider the relation of UK’s currency, Pound Sterling (GBP),
to the Eurozone’s currency, the Euro (EUR), and Switzerland’s currency, the Swiss franc
(CHF). More precisely, we consider daily log returns of the exchange rate of GBP to the
United States dollar (USD) as a response Yt of a linear model as described in (6.1). As
predictors we now consider the log returns of EUR to USD (P1,t) and CHF to USD (P2,t).
A graphical representation of the exchange rates and associated log returns for the period
from Januar 2016 to March 2019 can be seen in Figure 8. The outcomes of Algorithm 1
are presented visually in the graphs. We chose mmin := 30, which corresponds to six weeks
of data. No evidence was found for the set of the first 30 observation (6 Jan 2016 to 16
Feb 2016) to contain a change and so it was used as the stable segment for monitoring.
The monitoring went on until 30 Jun 2016 and 9 Jun 2016 was obtained as an estimate
for the time of change. In the next step a retrospective test is performed to verify whether
the observations from 9 Jun 2016 to 20 Jul 2016 contain any changes. No evidence for a
change was found. The monitoring procedure is restarted with these 30 observations as
the stable segment and monitoring continues until it is stopped on 21 Nov 2016 and the
15 Aug 2016 is identified as a time for a possible change. The period from 15 Aug to 21
Nov 2016 is considered as a candidate for a stable segment and no evidence against this is
found. The 71 observation in this segment are used for the third iteration of monitoring,
30
Data: Univariate responses Yt and bivariate predictors (P1,t, P2,t);
Input: Minimum size mmin for the stable segment, γ to be used for the threshold
function wγ , significance level α;
Output: A sequence of change points c1, c2, . . . and labels T1, T2, . . . that indicate
whether the changes were identified by the retrospective test during
setup (R) or by the sequential procedure during monitoring (S);
1 begin
2 m := mmin, i := 1, s := 1;
3 repeat
// Find stable segment
4 while retrospective test does reject do
5 Apply test from Section B to (Yt, P1,t, P2,t), t = s, . . . , s+m− 1;
6 if test rejected then
7 Record the change point: ci := s+ j
∗ − 1, Ti := R, i+ +;
8 Prepare the next test: s := j∗, m := max{j∗ − s,mmin};
9 end
10 end
// Monitor for changes
11 repeat
12 Compute detector Eˆm(k) from (Yt, P1,t, P2,t), t = s, . . . , s+m+ k;
13 until Eˆm(k) exceeds the critical curve cαwγ(k/m);
// Estimate time of change
14 Compute ˆ` defined in (6.2);
15 Record the change point: ci := s+ ˆ`, Ti := S, i+ +;
16 Prepare the restart: m := max{s+m+ k − ci + 1,mmin}, s := ci;
17 until terminated by user ;
18 end
Algorithm 1: Procedure for repeated monitoring.
31
2016 2017 2018 2019
1.
20
1.
30
1.
40
1.
50
G
BP
/U
SD
l l l
2016 2017 2018 2019
−
0.
08
−
0.
02
rtn
. G
BP
/U
SD
l l l
2016 2017 2018 2019
1.
05
1.
15
1.
25
EU
R/
US
D
l l l
2016 2017 2018 2019
−
0.
02
0.
00
0.
02
rtn
. E
UR
/U
SD
l l l
2016 2017 2018 2019
0.
98
1.
02
1.
06
CH
F/
US
D
l l l
2016 2017 2018 2019−
0.
01
5
0.
00
0
0.
01
5
rtn
. C
HF
/U
SD
l l l
Figure 8: Results of Algorithm 1 applied to log returns of three foreign exchange rates. Re-
sponse: GBP/USD (top), predictors: EUR/USD (middle) and CHF/USD (bottom). Set-
tings for Algorithm 1 were mmin := 30, α = 0.1, γ = 0. Shaded areas indicate observations
that were used as the stable segment (light gray) or monitored for changes (dark gray). Ver-
tical lines indicate times the sequential procedure rejected (blue box) or the estimate time
of change (red circle). The estimated times of the change points (times the sequential pro-
cedure stopped in brackets) were: 2016-06-09 (2016-06-30), 2016-08-15 (2016-11-21), and
2017-05-17 (2017-12-08).
which continues until 8 Dec 2017, when 17 May 2017 is identified as a possible time of a
change. Finally, a retrospective test is applied to the observations from 17 May 2017 to 8
Dec 2017 and as no evidence of a change is found monitoring commences with those 148
observations as a stable set without stopping until the end of the available data.
As our second example, we consider the relation of the UK’s market to that of the United
States (US) and the EU. More precisely, we consider daily log returns of the FTSE 100, a
share index of the 100 companies listed on the London Stock Exchange with the highest
market capitalization, as a response Yt of the linear model described in (6.1). As predictors
we consider the log returns of two similarly constructed indices that are related to the US
and EU markets, namely the S&P 500 (P1,t) and the EuroStoxx 50 (P2,t). A graphical
representation of the prices and log returns for the period from January 2016 to March
32
2016 2017 2018 20195
50
0
65
00
75
00
FT
SE
l l
2016 2017 2018 2019
−
0.
04
0.
00
FT
SE
l l
2016 2017 2018 20191
80
0
22
00
26
00
SP
50
0
l l
2016 2017 2018 2019
−
0.
04
0.
00
0.
04
SP
50
0
l l
2016 2017 2018 2019
28
00
32
00
36
00
Eu
rS
to
xx
50
l l
2016 2017 2018 2019
−
0.
08
−
0.
02
0.
04
Eu
rS
to
xx
50
l l
Figure 9: Results of Algorithm 1 applied to log returns of three market indices. Response:
FTSE 100 (top), predictors: S&P 500 (middle) and EuroStoxx 50 (bottom). Settings for
Algorithm 1 were mmin := 30, α = 0.1, γ = 0. Shaded areas indicate observations that were
used as the stable segment (light gray) or monitored for changes (dark gray). Vertical lines
indicate times the sequential procedure rejected (blue box) or the estimate time of change (red
circle). The estimated times of the change points (times the sequential procedure stopped in
brackets) were: 2016-03-10 (2016-05-20), and 2016-06-21 (2016-06-23).
2019 can be seen in Figure 9. The outcomes of Algorithm 1 are presented visually in the
graphs. We chose mmin := 30. As the retrospective test affirms that the first 30 observation
(5 Jan 2016 to 12 Feb 2016) are stable, they are employed as the stable segment for the first
iteration of monitoring. The monitoring went on until 20 May 2016, when the sequential
procedure stopped and returned 10 Mar 2016 as the estimated time of the change. The
retrospective test identified no changes for the 55 days from 10 Mar 2016 to 20 May 2016
and the second iteration of monitoring then went on until 23 Jun 2016, the day of the UK’s
EU referendum. The estimated time for a change point was 21 Jun 2016. The retrospective
test found no evidence for the period from the 21 Jun 2016 to the 28 Jul 2016, so these
30 days of observations are used as the stable segment with which monitoring continues
without stopping until 29 March 2019 (the end of the available data).
33
7 Summary and outlook
In this paper we developed a new monitoring scheme for change point detection in a parame-
ter of multivariate time series which is applicable in an open-end scenario. Compared to the
commonly used methods we replace the estimator of the parameter from the initial sample
X1, . . . , Xm by an estimator from the sample X1, . . . , Xm+j . We then compare this estima-
tor with the estimator from the sample Xm+j+1, . . . , Xm+k for every j = 0, . . . , k − 1, For
the new statistic the asymptotic distribution under the null hypothesis and the consistency
of a corresponding test, which controls the type-1 error, are established. By considering a
common class of thresholds wγ defined in (2.13) the limit reduces to an elementary distri-
bution, for which quantiles can be obtained by straightforward Monte Carlo simulations.
Finally, we demonstrate by a comprehensive simulation study that the new monitoring
scheme is superior (in terms of testing power) to a benchmark consisting of common meth-
ods proposed in the literature. The new statistic can also be used in closed-end scenarios,
for which the same superiority in power is observed.
A very challenging subject for future research is the characterization of the asymptotic
distribution for the stopping times based on the statistic Eˆ defined in (2.2). Correspond-
ing results are already known for the methods based on Qˆ and Pˆ , see Aue and Horva´th
(2004) and Fremdt (2014a), respectively. Moreover, as the finite sample properties depend
sensitively on the efficient estimation of the long-run variance it is of interest to develop
a concept of self-normalization [see Shao and Zhang (2010)], which is applicable in an
open-end scenario.
Acknowledgments This work has been supported in part by the Collaborative Research
Center “Statistical modeling of nonlinear dynamic processes” (SFB 823, Teilprojekt A1, C1)
and the Research Training Group ’High-dimensional phenomena in probability - fluctuations
and discontinuity’ (RTG 2131) of the German Research Foundation (DFG).
References
Anatolyev, S. and Kosenok, G. (2018). Sequential Testing with Uniformly Distributed Size. Journal
of Time Series Econometrics, 10(2):1–22.
Andrews, D. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation.
Econometrica, 59(3):817–58.
Aue, A., Dienes, C., Fremdt, S., and Steinebach, J. (2015). Reaction times of monitoring schemes
for arma time series. Bernoulli, 21(2):1238–1259.
Aue, A., Ho¨rmann, S., Horva´th, L., Husˇkova´, M., and Steinebach, J. G. (2012). Sequential testing
for the stability of high-frequency portfolio betas. Econometric Theory, 28(4):804–837.
Aue, A. and Horva´th, L. (2004). Delay time in sequential detection of change. Statistics & Probability
Letters, 67(3):221–231.
34
Aue, A. and Horva´th, L. (2013). Structural breaks in time series. Journal of Time Series Analysis,
34(1):1–16.
Aue, A., Horva´th, L., Husˇkova´, M., and Kokoszka, P. (2006). Change-point monitoring in linear
models. The Econometrics Journal, 9(3):373–403.
Aue, A., Horva´th, L., and Reimherr, M. L. (2009). Delay times of sequential procedures for multiple
time series regression models. Journal of Econometrics, 149(2):174 – 190.
Aue, A., Ho¨rmann, S., Horva´th, L., and Husˇkova´, M. (2014). Dependent functional linear models
with applications to monitoring structural change. Statistica Sinica, 24(3):1043–1073.
Chen, Z. and Tian, Z. (2010). Modified procedures for change point monitoring in linear models.
Mathematics and Computers in Simulation, 81(1):62 – 75.
Chu, C.-S. J., Stinchcombe, M., and White, H. (1996). Monitoring structural change. Econometrica,
64(5):1045–1065.
Cso¨rgo¨, M. and Horva´th, L. (1997). Limit theorems in change-point analysis. Wiley series in
probability and statistics. Wiley.
Dette, H. and Go¨smann, J. (2018). A likelihood ratio approach to sequential change point detection
in a genereal class of parameters. To appear in: Journal of the American Statistical Association;
https://arxiv.org/abs/1802.07696.
Fremdt, S. (2014a). Asymptotic distribution of the delay time in page’s sequential procedure.
Journal of Statistical Planning and Inference, 145:74 – 91.
Fremdt, S. (2014b). Page’s sequential procedure for change-point detection in time series regression.
Statistics: A Journal of Theoretical and Applied Statistics, 48(1):1–28.
Hampel, F., Ronchetti, E., Rousseeuw, P., and Stahel, W. (1986). Robust statistics: The approach
based on influence functions. Wiley series in probability and mathematical statistics. Probability
and mathematical statistics. Wiley.
Hoga, Y. (2017). Monitoring multivariate time series. Journal of Multivariate Analysis, 155:105 –
121.
Horva´th, L., Husˇkova´, M., Kokoszka, P., and Steinebach, J. (2004). Monitoring changes in linear
models. Journal of Statistical Planning and Inference, 126(1):225 – 251.
Horva´th, L., Ku¨hn, M., and Steinebach, J. (2008). On the performance of the fluctuation test for
structural change. Sequential Analysis, 27(2):126–140.
Kirch, C. and Weber, S. (2018). Modified sequential change point procedures based on estimating
functions. Electron. J. Statist., 12(1):1579–1613.
Lai, T. L. (1995). Sequential changepoint detection in quality control and dynamical systems.
Journal of the Royal Statistical Society. Series B (Methodological), 57(4):613–658.
Lai, T. L. (2001). Sequential analysis: Some classical problems and new challenges. 11:303–408.
Leisch, F., Hornik, K., and Kuan, C.-M. (2000). Monitoring structural changes with the generalized
fluctuation test. Econometric Theory, 16(6):835–854.
Otto, S. and Breitung, J. (2019). A likelihood ratio approach to sequential
change point detection in a genereal class of parameters. http://www.wisostat.uni-
koeln.de/sites/statistik/BackwardCUSUM.pdf.
Page, E. S. (1954). Continuous inspection schemes. Biometrika, 41(1/2):100–115.
Page, E. S. (1955). Control charts with warning lines. Biometrika, 42(1-2):243–257.
35
Pape, K., Wied, D., and Galeano, P. (2016). Monitoring multivariate variance changes. Journal of
Empirical Finance, 39(Part A):54 – 68.
Schmitz, A. and Steinebach, J. G. (2010). A note on the monitoring of changes in linear models
with dependent errors, pages 159–174. Springer Berlin Heidelberg, Berlin, Heidelberg.
Shao, X. and Zhang, X. (2010). Testing for change points in time series. Journal of the American
Statistical Association, 105(491):1228–1240.
Shorack, G. and Wellner, J. (1986). Empirical processes with applications to statistics. Wiley series
in probability and mathematical statistics: Probability and mathematical statistics. Wiley.
Wied, D. and Galeano, P. (2013). Monitoring correlation change in a sequence of random variables.
Journal of Statistical Planning and Inference, 143(1):186 – 196.
Zeileis, A. (2004). Econometric computing with hc and hac covariance matrix estimators. Journal
of Statistical Software, 11(10):1–17.
36
A Proofs of the results in Section 2
Proof of Theorem 2.6: In the proof we use the following extra notation. Define the
statistic
Em(k) = m
−1/2 k−1max
j=0
(k − j)
∥∥∥θˆm+j1 − θˆm+km+j+1∥∥∥
Σ−1
, (A.1)
where we have replaced the long-run variance estimator Σˆ by the (unknown) true long-run
variance Σ in definition (2.2). Further define
E˜m(k) = m
−1/2 k−1max
j=0
(k − j)
∥∥∥∥ 1m+ j
m+j∑
t=1
IF t − 1
k − j
m+k∑
t=m+j+1
IF t
∥∥∥∥
Σ−1
, (A.2)
where we have replaced the estimator θˆji by corresponding averages of the influence function
in (A.1). Finally, the triple (Ω,A,P) will denote the underlying probability space.
The proof itself is now split into several Lemmas A.1 - A.5. The first Lemma shows that
E˜m(k) and Em(k) are (asymptotically) equivalent. Lemma A.2 will approximate Em(k) by
Brownian motions, while Lemma A.3 then yields a limit for this approximation. Lemma A.4
finishes the proof by plugging in the covariance estimator, meaning that Eˆm(k) and Em(k)
are asymptotically equivalent. Finally, Lemma A.5 will establish the other representation
of the limit distribution in (2.11). In each Lemma, we suppose that the assumptions of
Theorem 2.6 are valid.
Lemma A.1 (Remove Remainders) It holds that
∞
sup
k=1
1
w(k/m)
∣∣∣Em(k)− E˜m(k)∣∣∣ = oP(1)
as m→∞.
Proof. By the (reverse) triangle inequality and the linearization in (2.7) we obtain∣∣∣Em(k)− E˜m(k)∣∣∣
≤ m−1/2 k−1max
j=0
(k − j)
∥∥∥θˆm+j1 − θˆm+km+j+1 − 1m+ j
m+j∑
t=1
IF t + 1
k − j
m+k∑
t=m+j+1
IF t
∥∥∥
Σ−1
= m−1/2 k−1max
j=0
(k − j)
∥∥∥R1,m+j −Rm+j+1,m+k∥∥∥
Σ−1
,
37
where we used that θt is constant for the last equality. Next, we obtain that
∞
sup
k=1
1
m1/2w(k/m)
k−1
max
j=0
(k − j)
∥∥∥Rm+j+1,m+k∥∥∥
Σ−1
=
∞
sup
k=1
(m+ k)1/2
m1/2w(k/m)
k−1
max
j=0
k − j
(m+ k)1/2
∥∥∥Rm+j+1,m+k∥∥∥
Σ−1
≤ ∞sup
k=1
(m+ k)1/2
m1/2w(k/m)
∞
sup
k=1
max
1≤i<j≤m+k
j − i+ 1
(m+ k)1/2
∥∥∥Ri,j∥∥∥
Σ−1
.
(A.3)
Similar as in (A.3) it holds
∞
sup
k=1
1
m1/2w(k/m)
k−1
max
j=0
(k − j)
∥∥∥R1,m+j∥∥∥
Σ−1
≤ ∞sup
k=1
m+ k
mw(k/m)
√
m
∞
sup
k=1
k−1
max
j=0
∥∥∥R1,m+j∥∥∥
Σ−1
=
∞
sup
k=1
m+ k
mw(k/m)
∞
sup
k=0
√
m
∥∥∥R1,m+k∥∥∥
Σ−1
(A.4)
Using Assumption 2.3 for the threshold w we obtain
∞
sup
k=1
m+ k
mw(k/m)
=
∞
sup
k=1
1 +
k
m
w(k/m)
≤ sup
t>0
1 + t
w(t)
<∞ . (A.5)
and by similar arguments it holds that
∞
sup
k=1
(m+ k)1/2√
mw(k/m)
<∞ . (A.6)
Further note that due to Assumption 2.4 with probability one
∞
sup
k=1
√
m
∣∣∣R1,m+k∣∣∣ ≤ ∞sup
k=1
√
m+ k
∣∣∣R1,m+k∣∣∣ = ∞sup
k=m
√
k
∣∣∣R1,k∣∣∣
j=k,i=1
≤ ∞sup
k=m
max
1≤i<j≤k
j − i+ 1√
k
∣∣∣Ri,j∣∣∣ = o(1) . (A.7)
Now combining (A.5), (A.6) and (A.7) the bounds derived in (A.3) and (A.4) are of order
oP(1), which finishes the proof of Lemma A.1.
For the proof of the next Lemma we can proceed (roughly) similar to the proof of Lemma
5.2 in Fremdt (2014b).
Lemma A.2 (Approximation with Brownian motions) Define
Pm(k) =
1√
m
k−1
max
j=0
∣∣∣Wm,1(k)− m+ k
m+ j
Wm,1(j) +
k − j
m+ j
Wm,2(m)
∣∣∣ ,
38
then
∞
sup
k=1
1
w(k/m)
∣∣∣E˜m(k)− Pm(k)∣∣∣ = oP(1)
as m→∞.
Proof. For the remainder of the proof let WΣm,i :=
√
ΣWm,i for i = 1, 2 and note that this
implies
Pm(k) =
1√
m
k−1
max
j=0
∥∥∥WΣm,1(k)− m+ km+ j WΣm,1(j) + k − jm+ jWΣm,2(m)∥∥∥Σ−1 .
The last display and the (reverse) triangle inequality then yield∣∣∣E˜m(k)− Pm(k)∣∣∣
≤ 1√
m
k−1
max
j=0
∥∥∥ k − j
m+ j
m+j∑
t=1
IF t −
m+k∑
t=m+j+1
IF t
+WΣm,1(k)−
m+ k
m+ j
WΣm,1(j) +
k − j
m+ j
WΣm,2(m)
∥∥∥
Σ−1
≤ 1√
m
∥∥∥ m+k∑
t=m+1
IF t −WΣm,1(k)
∥∥∥
Σ−1
+
1√
m
k−1
max
j=0
m+ k
m+ j
∥∥∥ m+j∑
t=m+1
IF t −WΣm,1(j)
∥∥∥
Σ−1
+
1√
m
k−1
max
j=0
k − j
m+ j
∥∥∥ m∑
t=1
IF t −WΣm,2(m)
∥∥∥
Σ−1
.
(A.8)
We will treat the three summands of the last display separately. Using the definition of the
operator norm, we derive the following bound for the first summand
∞
sup
k=1
1
w(k/m)
√
m
∥∥∥ m+k∑
t=m+1
IF t −WΣm,1(k)
∥∥∥
Σ−1
≤ ∞sup
k=1
kξ
w(k/m)
√
m
∞
sup
k=1
1
kξ
∥∥∥ m+k∑
t=m+1
IF t −WΣm,1(k)
∥∥∥
Σ−1
By Assumption 2.2 and the estimate ‖x‖Σ−1 ≤ ‖Σ−1‖1/2op |x| for all x ∈ Rp, the second factor
is of order OP(1). Since w has a lower bound `w we obtain for the first factor, that
m
sup
k=1
kξ
w(k/m)
√
m
≤ msup
k=1
kξ
`w
√
m
=
mξ−1/2
`w
= o(1) ,
∞
sup
k=m+1
kξ
w(k/m)
√
m
≤ ∞sup
k=m+1
1
k1/2−ξ
∞
sup
k=1
k1/2
w(k/m)
√
m
≤ 1
m1/2−ξ
sup
t>0
√
t
w(t)
= o(1) .
39
Next, we can bound the second summand of the right-hand side in (A.8) by
∞
sup
k=1
1
w(k/m)
√
m
k−1
max
j=0
m+ k
m+ j
∥∥∥ m+j∑
t=m+1
IF t −WΣm,1(j)
∥∥∥
Σ−1
≤ ∞sup
k=1
m+ k
w(k/m)
√
m
k−1
max
j=0
1
jξm1−ξ
∥∥∥ m+j∑
t=m+1
IF t −WΣm,1(j)
∥∥∥
Σ−1
≤ ∞sup
k=1
(m+ k)mξ−1
w(k/m)
√
m
∞
sup
k=1
1
kξ
∥∥∥ m+k∑
t=m+1
IF t −WΣm,1(k)
∥∥∥
Σ−1
,
where we used that (m+ j) = (m+ j)ξ(m+ j)1−ξ ≥ jξm1−ξ. Using again Assumption 2.2,
the second factor in the last display is of order OP(1). Moreover, following the idea of the
proof of Lemma 3 in Aue et al. (2006) it holds that
m
sup
k=1
(m+ k)mξ−1
w(k/m)
√
m
≤ msup
k=1
(m+ k)mξ−3/2
`w
=
2mξ−1/2
`w
= o(1) ,
∞
sup
k=m+1
(m+ k)mξ−1
w(k/m)
√
m
= mξ−1/2 ∞sup
k=m+1
1 + k/m
w(k/m)
≤ mξ−1/2 sup
t>1
1 + t
w(t)
= o(1)
(A.9)
and it remains to treat the third summand of the right-hand side in (A.8), which can be
bounded by
∞
sup
k=1
1
w(k/m)
√
m
k−1
max
j=0
k − j
m+ j
∥∥∥ m∑
t=1
IF t −WΣm,2(m)
∥∥∥
Σ−1
≤ ∞sup
k=1
1
w(k/m)
√
m
k
m
∥∥∥ m∑
t=1
IF t −WΣm,2(m)
∥∥∥
Σ−1
≤ ∞sup
k=1
kmξ−1
w(k/m)
√
m
1
mξ
∥∥∥ m∑
t=1
IF t −WΣm,2(m)
∥∥∥
Σ−1
.
Using Assumption 2.2 and the arguments in (A.9) this term is of order oP(1), which finishes
the proof of Lemma A.2.
Lemma A.3 (Obtain limit process) The following weak convergence holds
∞
sup
k=1
Pm(k)
w(k/m)
D
=⇒ sup
0≤t<∞
max
0≤s≤t
1
w(t)
∣∣∣∣W1(t)− 1 + t1 + sW1(s) + t− s1 + sW2(1)
∣∣∣∣
as m → ∞, where W1 and W2 denote independent, p-dimensional standard Brownian
motions.
Proof. First note that due to the scaling properties of the Brownian motion [see for example
40
Chapter 2 of Shorack and Wellner (1986)], it holds in distribution that
Pm(k)
D
=
k−1
max
j=0
∣∣∣W1(k/m)− 1 + k/m
1 + j/m
W1(j/m) +
k/m− j/m
1 + j/m
W2(1)
∣∣∣ := P˜m(k) (A.10)
and so within the proof we will - without loss of generality - only consider P˜m(k). Addi-
tionally, we define the processes
L(1)(s, t) :=
1
w(t)
∣∣∣ t+ 1
s+ 1
W1(s)−W1(t)
∣∣∣ ,
L(2)(s, t) :=
1
w(t)
t− s
s+ 1
|W2(1)| .
We will show that L(i)(s, t) for i = 1, 2 are uniformly continuous on R+∆ = {(s, t) ∈ R2 | 0 ≤
s ≤ t} with probability one.
Case L(1): In the following let ε > 0 be fixed but arbitrary. For almost every ω ∈ Ω the
law of iterated logarithm implies that we can choose a C = C(ε, ω) > 0 sufficiently large
such that
sup
t≥C
|W1(t)|
t
<
ε
8B
, (A.11)
with B defined by
B := sup
t>0
t+ 2
w(t)
<∞ .
Depending on C, we can split R+∆ into the (overlapping) sets
R+∆ =M1(C) ∪M2(C) ∪M3(C) , (A.12)
where we use the definitions
M1(C) = R+∆ ∩ [0, C + 1]2 ,
M2(C) = R+∆ ∩ [0, C + 1]× [C,∞) ,
M3(C) = R+∆ ∩ [C,∞)2 .
Further let d denote the maximum distance that is
d
(
(s1, t1), (s2, t2)
)
= max
{|s1 − s2|, |t1 − t2|} .
Note that by construction of the decomposition in (A.12), whenever d((s1, t1), (s2, t2)) < δ
for sufficiently small δ > 0, then there’s j ∈ 1, 2, 3, such that both pairs are in the same
41
subset Mj(C). Thus the a.s. uniform continuity of L(1) follows if we can choose δ > 0
sufficiently small, such that
sup
d((s1,t1),(s2,t2))<δ
(s1,t1),(s2,t2)∈Mj(C)
∣∣∣L(1)(s1, t1)− L(1)(s2, t2)∣∣∣ < ε (A.13)
for j = 1, 2, 3. In the following, we will treat each subset separately.
Set M1(C): As this set is compact, the (ordinary) almost sure continuity of L(1) already
implies that (A.13) holds for j = 1 and δ > 0 sufficiently small.
Set M2(C): We have the following bound
sup
d((s1,t1),(s2,t2))<δ
(s1,t1),(s2,t2)∈M2(C)
∣∣∣L(1)(s1, t1)− L(1)(s2, t2)∣∣∣
≤ sup
|t1−t2|<δ
0≤s≤C+1, t1,t2>C
∣∣∣L(1)(s, t1)− L(1)(s, t2)∣∣∣+ sup
|s1−s2|<δ
0≤s1,s2≤C+1, t>C
∣∣∣L(1)(s1, t)− L(1)(s2, t)∣∣∣ .
(A.14)
We will treat both summands of the last display individually. The first one can be bounded
again as follows
sup
|t1−t2|<δ
0≤s≤C+1, t1,t2>C
∣∣∣L(1)(s, t1)− L(1)(s, t2)∣∣∣
≤ sup
|t1−t2|<δ
0≤s≤C+1, t1,t2>C
∣∣∣ 1
w(t1)
− 1
w(t2)
∣∣∣∣∣∣ t1 + 1
s+ 1
W1(s)−W1(t1)
∣∣∣
+
1
w(t2)
∣∣∣ t1 − t2
s+ 1
W1(s)−W1(t1) +W1(t2)
∣∣∣
(A.15)
and again we will treat both terms separately. For the first term note that
sup
|t1−t2|<δ
0≤s≤C+1, t1,t2>C
|w(t2)− w(t1)|
`w
t1 + 1
w(t1)
∣∣∣W1(s)
s+ 1
− W1(t1)
t1 + 1
∣∣∣
≤ 2 sup
|t1−t2|<δ
|w(t2)− w(t1)|
`w
sup
t≥C
t+ 1
w(t)
sup
t≥0
|W1(t)|
t+ 1
,
and since w is uniformly continuous this expression is smaller than ε/3 for sufficiently small
42
δ. For the second term of the right-hand side of (A.15), note that we have the bound
sup
|t1−t2|<δ
0≤s≤C+1, t1,t2>C
1
w(t2)
∣∣∣ t1 − t2
s+ 1
W1(s)−W1(t1) +W1(t2)
∣∣∣
≤ sup
|t1−t2|<δ
0≤s≤C+1, t1,t2>C
|t1 − t2|
w(t2)
|W1(s)|
s+ 1
+
t1 + 1
w(t2)
|W1(t1)|
t1 + 1
+
t2 + 1
w(t2)
|W1(t2)|
t2 + 1
≤ δ
`w
sup
s≥0
|W1(s)|
s+ 1
+ 2 sup
t>0
t+ 2
w(t)
sup
t>C
|W1(t)|
t+ 1
,
and by the choice of C in (A.11) and for sufficiently small δ this is bounded by ε/3. To
complete the treatment of M2(C) it only remains to examine the second term on the
right-hand side of (A.14). We obtain that
sup
|s1−s2|<δ
0≤s1,s2≤C+1, t>C
∣∣∣L(1)(s1, t)− L(1)(s2, t)∣∣∣
≤ sup
t>0
t+ 1
w(t)
sup
|s1−s2|<δ
0≤s1,s2≤C+1
∣∣∣W1(s1)
s1 + 1
− W1(s2)
s2 + 1
∣∣∣ ,
which can be bounded by ε/3 for sufficently small δ since the first factor is a constant and
the function f(s) = W (s+ 1)/(s+ 1) is uniformly continuous on the compact set [0, C + 1]
with probability one.
Set M3(C): Note that
sup
d((s1,t1),(s2,t2))<δ
(s1,t1),(s2,t2)∈M3(C)
∣∣∣L(1)(s1, t1)− L(1)(s2, t2)∣∣∣
≤ 2 sup
s,t>C
|L(1)(s, t)| ≤ 2 sup
t>0
t+ 1
w(t)
sup
s,t>C
∣∣∣W1(s)
s+ 1
− W1(t)
t+ 1
∣∣∣ ≤ 4 sup
t>0
t+ 1
w(t)
sup
t≥C
|W1(t)|
t
< ε ,
where we used the choice of C in (A.11) for the last estimate.
This completes the third case and so the almost sure uniform continuity of L(1) on the set
R+∆ is established.
Case L(2): Again let ε > 0 and suppose that d
(
(s1, t1), (s2, t2)
)
< δ. It holds that
∣∣L(2)(s1, t1)− L(2)(s2, t2)∣∣ ≤ ∣∣L(2)(s1, t1)− L(2)(s2, t1)∣∣+ ∣∣L(2)(s2, t1)− L(2)(s2, t2)∣∣
(A.16)
43
and note for the first summand of the last display that
∣∣L(2)(s1, t1)− L(2)(s2, t1)∣∣ = |W2(1)|
w(t1)
∣∣∣∣ t1 − s1s1 + 1 − t1 − s2s2 + 1
∣∣∣∣ = |W2(1)|(t1 + 1)w(t1)
∣∣∣∣ s2 − s1(s1 + 1)(s2 + 1)
∣∣∣∣
≤ |W2(1)|(t1 + 1)
w(t1)
|s2 − s1|
and by Assumption 2.3 the last term is smaller than ε/2 uniformly for all t1 > 0 if δ > 0
is chosen sufficiently small. It remains to examine the second summand of the right-hand
side of (A.16). It holds that
|L(2)(s2, t1)− L(2)(s2, t2)
∣∣ ≤ ∣∣∣∣ 1w(t1) − 1w(t2)
∣∣∣∣ |t1 − s2|s2 + 1 |W2(1)|+ |W2(1)|w(t2) |t2 − t1|s2 + 1
≤ |w(t1)− w(t2)|
w(t1)w(t2)
t1 + δ
s2 + 1
|W2(1)|+ |W2(1)|
`w
|t2 − t1|
≤ |w(t1)− w(t2)|
`w
t1 + δ
w(t1)
|W2(1)|+ |W2(1)|
`w
|t2 − t1| ,
where we used that s2 ≤ t2 ≤ t1 + δ whenever d
(
(s1, t1), (s2, t2)) < δ. By Assumption 2.3
the last display is smaller than ε/2 whenever δ > 0 is sufficiently small and so the almost
sure continuity of L(2) is shown.
Finally, we can combine our observations to finish the proof. Note that by the results above,
also the process L(s, t) := L(1)(s, t)+L(2)(s, t) is uniformly continuous with probability one
and further we have the identity
sup
1≤t<∞
sup
0≤s≤t
L
(bsc/m, btc/m) = ∞sup
k=1
P˜m(k)
and so the Lemma’s claim now follows if we derive that (with probability one)
sup
1≤t<∞
sup
0≤s≤t
L
(bsc/m, btc/m) =⇒
m→∞ sup0≤t<∞
sup
0≤s≤t
L
(
s, t
)
.
First, observe that almost surely
sup
1≤t<∞
sup
0≤s≤t
L
(bsc/m, btc/m) = sup
0≤t<∞
sup
0≤s≤t
L
(bsc/m, btc/m)+ o(1)
as m → ∞. Next, we can use the almost sure uniform continuity of L since for arbitrary
44
ε > 0 and almost every ω ∈ Ω we can choose sufficiently large m = m(ε, ω) such that∣∣∣∣ sup
0≤t<∞
sup
0≤s≤t
L
(bsc/m, btc/m)− sup
0≤t<∞
sup
0≤s≤t
L
(
s, t
)∣∣∣∣
≤ sup
0≤t<∞
sup
0≤s≤t
∣∣∣∣L(bsc/m, btc/m)− L(s/m, t/m)∣∣∣∣
≤ sup
d((s1,t1),(s2,t2))<1/m
(s1,t1),(s2,t2)∈R+∆
∣∣∣∣L(s1, t1)− L(s2, t2)∣∣∣∣ < ε .
Combining Lemma A.1, A.2 and A.3 we have already proven that
∞
sup
k=1
Em(k)
w(k/m)
D
=⇒ max
0≤t<∞
max
0≤s≤t
1
w(t)
∣∣∣W1(t)− 1 + t
1 + s
W1(s) +
t− s
1 + s
W2(1)
∣∣∣ , (A.17)
and it only remains to investigate the impact of the covariance estimator. Therefore the
following Lemma finishes the proof of Theorem 2.6.
Lemma A.4 (Plug in of covariance estimator) We have that
∞
sup
k=1
Em(k)
w(k/m)
− ∞sup
k=1
Eˆm(k)
w(k/m)
= oP(1) .
Proof. Observe the bound∣∣∣∣ ∞sup
k=1
Em(k)
w(k/m)
− ∞sup
k=1
Eˆm(k)
w(k/m)
∣∣∣∣
≤ ∞sup
k=1
1
w(k/m)
√
m
k−1
max
j=0
(k − j)
∣∣∣∣(θˆm+j1 − θˆm+km+j+1)>(Σˆ−1m − Σ−1)(θˆm+j1 − θˆm+km+j+1)∣∣∣∣1/2 .
(A.18)
Next note that for a symmetric matrix A and an arbitrary vector v the Cauchy-Schwarz
inequality implies
|v>Av|2 ≤ |Av||v| ≤ ‖A‖op|v|2 ,
and we can bound (A.18) by∥∥∥Σˆ−1m − Σ−1∥∥∥1/2
op
∞
sup
k=1
1
w(k/m)
√
m
k−1
max
j=0
(k − j)
∣∣∣θˆm+j1 − θˆm+km+j+1∣∣∣ . (A.19)
Since Σˆ is a consistent estimator of Σ, an application of the continuous mapping theorem
45
yields ∥∥∥Σˆ−1m − Σ−1∥∥∥
op
= oP(1) . (A.20)
Next, the definition of the operator norm yields
∞
sup
k=1
1
w(k/m)
√
m
k−1
max
j=0
(k − j)∣∣θˆm+j1 − θˆm+km+j+1∣∣
≤ ‖Σ1/2‖op ∞sup
k=1
1
w(k/m)
√
m
k−1
max
j=0
(k − j)∥∥θˆm+j1 − θˆm+km+j+1∥∥Σ−1
= ‖Σ1/2‖op ∞sup
k=1
Em(k)
w(k/m)
= OP(1) .
Now a combination of (A.17) and (A.20) implies that the expression in (A.19) is of order
oP(1), which completes the proof of Lemma A.4 and thus also the proof of Theorem 2.6.
Combining Lemmas A.1, A.2, A.3 and A.4 we have now established that
∞
sup
k=1
Eˆm(k)
w(k/m)
D
=⇒ sup
0≤t<∞
max
0≤s≤t
1
w(t)
∣∣∣∣W1(t)− 1 + t1 + sW1(s) + t− s1 + sW2(1)
∣∣∣∣
and it remains to show the equality in distribution stated in (2.11).
Lemma A.5 (Simplify limit distribution) It holds that
sup
0≤t<∞
max
0≤s≤t
1
w(t)
∣∣∣∣W1(t)− 1 + t1 + sW1(s) + t− s1 + sW2(1)
∣∣∣∣
D
= sup
0≤t<∞
max
0≤s≤t
t+ 1
w(t)
∣∣∣W( t
t+ 1
)
−W
( s
s+ 1
)∣∣∣ ,
where W is a standard p-dimensional Brownian motion.
Proof. In the following let Z denote a standard Gaussian random variable, that is indepen-
dent of W1. Observe that
1
w(t)
∣∣∣∣W1(t)− 1 + t1 + sW1(s) + t− s1 + sW2(1)
∣∣∣∣
=
1
w(t)(s+ 1)
∣∣∣(s+ 1)W1(t)− (t+ 1)W1(s) + (t− s)W2(1)∣∣∣
D
=
1
w(t)(s+ 1)
∣∣∣(s+ 1)W (t)− (t+ 1)W (s)− (t− s)Z∣∣∣
=
1
w(t)(s+ 1)
∣∣∣(s+ 1)W (t)− (t+ 1)W (s)− (t− s)Z + stZ − stZ∣∣∣
=
1
w(t)(s+ 1)
∣∣∣(s+ 1){W (t)− tZ}− (t+ 1){W (s)− sZ}∣∣∣ . (A.21)
46
Following Horva´th et al. (2004), Fremdt (2014b), computing the covariance function implies
the following identity (in distribution){
W (t)− tZ
}
t≥0
D
=
{
(1 + t)W
( t
t+ 1
)}
t≥0
.
Applying this to (A.21) yields
1
w(t)(s+ 1)
∣∣∣(t+ 1){W (s)− sZ}− (s+ 1){W (t)− tZ}∣∣∣
D
=
1
w(t)(s+ 1)
∣∣∣(t+ 1)(s+ 1)W( s
s+ 1
)
− (t+ 1)(s+ 1)W
( t
t+ 1
)∣∣∣
=
1
w(t)
∣∣∣(t+ 1)W( s
s+ 1
)
− (t+ 1)W
( t
t+ 1
)∣∣∣ .
This completes the proof of Lemma A.5 and also of Theorem 2.6.
Proof of Lemma 2.8: Using the definition w(t) = (1 + t) max
{( t
1 + t
)γ
, ε
}
, we obtain
that
sup
0≤t<∞
max
0≤s≤t
1
w(t)
∣∣∣(t+ 1)W( s
s+ 1
)
− (t+ 1)W
( t
t+ 1
)∣∣∣
= sup
0≤t<∞
max
0≤s≤t
1
max
{( t
1 + t
)γ
, ε
}∣∣∣W( s
s+ 1
)
−W
( t
t+ 1
)∣∣∣
= sup
0≤t<∞
max
0≤s≤t
1
max
{( t
1 + t
)γ
, ε
}∣∣∣W( s
s+ 1
)
−W
( t
t+ 1
)∣∣∣
= sup
0≤t<1
max
0≤s≤t
1
max{tγ , ε}
∣∣∣W (s)−W (t)∣∣∣ ,
where we used that the mapping x 7→ x/(1 + x) is bijective and increasing on the domain
[0,∞) with co-domain [0, 1) .
Proof of Theorem 2.11: For m sufficiently large we have k∗ < m. Then observe the
lower bound
∞
sup
k=1
Eˆm(k)
w(k/m)
=
∞
sup
k=1
1
w(k/m)m1/2
k−1
max
j=0
(k − j)
∥∥∥θˆm+j1 − θˆm+km+j+1∥∥∥
Σˆ−1
j=k∗
≥ sup
k∗≤k<∞
1
w(k/m)m1/2
(k − k∗)
∥∥∥θˆm+k∗1 − θˆm+km+k∗+1∥∥∥
Σˆ−1
k=m≥ m− k
∗
m1/2w(1)
∥∥∥θˆm+k∗1 − θˆ2mm+k∗+1∥∥∥
Σˆ−1
.
47
The last display is bounded from below by
m− k∗
m1/2w(1)
∥∥∥θ(1) − θ(2)∥∥∥
Σˆ−1
− m− k
∗
m1/2w(1)
∥∥∥θˆm+k∗1 − θ(1)∥∥∥
Σˆ−1
− m− k
∗
m1/2w(1)
∥∥∥θˆ2mm+k∗+1 − θ(2)∥∥∥
Σˆ−1
.
Using that Σˆ is (weakly) convergent with non-singular limit, the first summand above
diverges to infinity and so it suffices to proof that the remaining two summands are of
order OP(1). This follows from the linearization in equation (2.7) since this gives
m− k∗
m1/2w(1)
(
θˆm+k
∗
1 − θ(1)
)
=
m− k∗
m1/2(m+ k∗)w(1)
m+k∗∑
t=1
IF t + m− k
∗
m1/2w(1)
R1,m+k∗ = OP(1)
and
m− k∗
m1/2w(1)
(
θˆ2mm+k∗+1 − θ(2)
)
=
1
m1/2w(1)
2m∑
t=m+k∗+1
IF t + (m− k
∗)
m1/2w(1)
Rm+k∗+1,2m = OP(1) ,
where we also applied Assumption 2.9. Now the claim follows.
B Retrospective test for changes in linear models
Consider the setting outlined in Section 3.2. In this situation we would like to test for
stability of the regression vectors in the initial sample, that is
H0 : β1 = · · · = βm .
Following the ideas stated in chapter 1 of Cso¨rgo¨ and Horva´th (1997) it is sensible to employ
the statistic
Tm =
1
m3/2
m
max
j=1
j(m− j)‖βˆj1 − βˆmj+1‖Σˆ , (B.1)
where βˆ was defined in (3.3). A sensible estimate for the time of the change is given by
j∗ := arg mmax
j=1
j(m− j)‖βˆj1 − βˆmj+1‖Σˆ (B.2)
48
Given the assumptions of Section 3.2 and using the theory of the reference one can verify
that under the null hypothesis above
Tn
D
=⇒ max
t∈[0,1]
|B(t)| ,
where B denotes a p-dimensional Brownian bridge (with independent components). Thus
rejecting the null whenever
Tn > b1−α
with b1−α denoting the (1−α) quantile of the limit distribution yields an appropriate testing
procedure for the retrospective problem.
49