UNIVERSITY OF DORTMUND
REIHE COMPUTATIONAL INTELLIGENCE
COLLABORATIVE RESEARCH CENTER 531
Design and Management of Complex Technical Processes
and Systems by means of Computational Intelligence Methods
Analysis of the (1+1) EA for a Noisy OneMax
Stefan Droste
No. CI-161/04
Technical Report ISSN 1433-3325 February 2004
Secretary of the SFB 531 · University of Dortmund · Dept. of Computer Science/XI
44221 Dortmund · Germany
This work is a product of the Collaborative Research Center 531, “Computational
Intelligence,” at the University of Dortmund and was printed with financial support of
the Deutsche Forschungsgemeinschaft.
Analysis of the (1+1) EA for a Noisy OneMax
Stefan Droste
LS Informatik 2, Universita¨t Dortmund, 44221 Dortmund, Germany
stefan.droste@udo.edu
Abstract. In practical applications evaluating a fitness function is fre-
quently subject to noise, i. e., the “true fitness” is disturbed by some
random variations. Evolutionary algorithms (EAs) are often successfully
applied to noisy problems, where they have turned out to be particularly
robust. Theoretical results on the behavior of EAs for noisy functions
are comparatively very rare, especially for discrete search spaces. Here
we present an analysis of the (1+1) EA for a noisy variant of OneMax
and compute the maximal noise strength allowing the (1+1) EA a poly-
nomial runtime asymptotically exactly. The methods used in the proofs
are presented in a general form with clearly stated conditions in order to
simplify further applications.
1 Introduction
When trying to optimize problems in practice, it is rarely possible to determine
the fitness value of a search point exactly. Most often noise changes the fitness
value by some (typically small) amount because of interferences during the ex-
periment resulting in the fitness value. Evolutionary algorithms (EAs) are often
good at coping with noise due to the use of a population of search points. But
there are only few theoretical results about the effects of noise on the perfor-
mance of EAs for discrete search spaces, while continuous search spaces found
more attention (see [AB03] for an overview).
In the last years the rigorous runtime analysis of EAs (e. g. see [DJW02],
[JW99] or [WW03]) has proven to be a useful approach for extending our knowl-
edge about EAs, besides other well-known approaches like schema theory or
macroscopic, statistical, fitness landscape resp. local analysis. Here we extend
this approach to the optimization of noisy functions by EAs. Since this is the
first rigorous analysis of this kind, we start with a simple EA and fitness func-
tion: the (1+1) EA for a noisy variant of OneMax. The (1+1) EA uses only
one individual, changed by mutation only, and has been analyzed for a number
of different fitness functions (e. g. see [Rud97] or [DJW02]), greatly extending
the knowledge about the random processes underlying the (1+1) EA and ap-
propriate proof techniques. Note that all these fitness functions are static and
noise-free (just recently, dynamic variants of OneMax have been analyzed, see
[Dro02] and [Dro03]), which is not very common in practical applications.
 This research was partly supported by the Deutsche Forschungsgemeinschaft as part
of the Collaborative Research Center “Computational Intelligence” (531).
In this paper we analyze the influence of the noise strength on the runtime
of the (1+1) EA until the optimum of OneMax is found. We show that noise,
which changes one uniformly chosen bit before evaluation with probability p,
makes an efficient optimization of the (1+1) EA impossible if and only if p
grows asymptotically faster than log(n)/n (where n is the dimension of the search
space). This is especially interesting since log(n)/n is also the critical value for
the (1+1) EA on a dynamic OneMax where in each step with probability p one
uniformly chosen bit of the target bit string changes. Although the two processes
have no obvious interpretation in terms of each other, we can generalize the
methods used to analyze the dynamic OneMax (see [Dro02]) to noisy OneMax.
In the next section, we formally define the (1+1) EA, our noise model, and
the runtime, our performance measure. Section 3 presents the methods used in
Section 4 to analyze the runtime of the (1+1) EA for noisy OneMax. This
separation should increase comprehensibility, because the methods in Section 3
clearly state the conditions necessary for applying them, while Section 4 shows
that the (1+1) EA on noisy OneMax fulfills these conditions. We finish with
some conclusions.
2 The (1+1) EA and Noisy OneMax
In this section, we formally define the (1+1) EA and the noise model investi-
gated. The (1+1) EA is the most basic EA since it uses only one individual
changed by mutation only. This simplicity makes it an ideal starting point for
theoretical analyzes (see [DJW02]). Furthermore, the (1+1) EA has proven to
be surprisingly efficient for some functions compared to more involved EAs (see
[FM92]), making its analysis worth while in itself.
The (1+1) EA for noisy functions differs from the (1+1) EA for noise-free
evaluation except only in one point: since the fitness-value of the parental search
point evaluated in a generation before is not guaranteed to be the correct one,
both the parent and the child are evaluated in each generation. Without this
resampling a noisy evaluation can only be corrected if a copy of the search point
is evaluated (e. g. if the mutation has no effect). Let fN : {0, 1}n → R denote
the noisy function. Then the (1+1) EA looks as follows:
Definition 1 ((1+1) EA for maximization of noisy functions).
1. Set t := 0 and choose xt ∈ {0, 1}n randomly uniformly.
2. Set x′ := xt and independently flip each bit of x′ with prob. 1/n.
3. If fN (x′) ≥ fN(xt), set xt+1 := x′, else xt+1 := xt.
4. Set t := t + 1 and go to step 2.
In the following we will look at the case that fN results from the true fitness
function f : {0, 1}n → {0, 1} by randomly changing the argument, i. e. fN(x) =
f(N(x)), where the random function N : {0, 1}n → {0, 1}n represents the noise.
Hence, our noise model considers noise which takes effect before evaluation. We
assume that the noisy evaluations are independent, i. e., the result of previous
evaluations does not influence the actual evaluation.
We are interested in the number of generations the (1+1) EA needs to eval-
uate a maximum of the true fitness function f for the first time. Since there
are two fitness evaluations per generation, the number of generations uniquely
determines the number of fitness evaluations, which is often the most costly op-
eration in practice. Considering an optimum of the true fitness function as an
optimum of the noisy function seems to be a more natural choice than investi-
gating the number of generations until a search point with maximal noisy fitness
value is evaluated, since such a point can be “far away” from a true maximum.
Hence, we analyze the number Tf,N of generations the (1+1) EA needs to find
a maximum of the underlying true fitness function f under noise N :
Definition 2 (Runtime of the (1+1) EA for noisy functions). The run-
time Tf,N of the (1+1) EA for a noisy function fN is the number of generations
until a maximum of the true fitness function f is found:
Tf,N := min{t ∈ N0 | f(xt) = max{f(x) |x ∈ {0, 1}n}}.
We analyze the noise model N1p changing with probability p exactly one uni-
formly chosen bit of x, while no bit is changed with probability 1 − p. Such a
model can be appropriate for applications where the genotype-phenotype map-
ping is error-prone, but the evaluation of the phenotype exact, e. g. when the
search point is transformed by a complicated process before being evaluated.
Definition 3 (One-bit noise N 1p). Let p ∈ [0, 1]. The random noise function
N1p : {0, 1}
n
→ {0, 1}n is defined as follows:
∀x, y ∈ {0, 1}n : P (N1p (x) = y) =



1 − p if x = y,
p/n if H(x, y) = 1,
0 if H(x, y) > 2.
We say that noisy evaluation increases resp. decreases the fitness of a point
x ∈ {0, 1}n if fN (x) > f(x) resp. fN(x) < f(x). For f = OneMax (with
OneMax(x
1
, . . . , xn) := x1 + · · · + xn) the noise function N1p can only change
the fitness of a point by one, and the probability of N1p increasing resp. decreasing
the fitness of x is directly proportional to n −OneMax(x) resp. OneMax(x).
Our goal is to determine the critical noise strength p = p(n) such that the
runtime of the (1+1) EA is polynomial in n with high probability if the noise is
at most p, but super-polynomial with high probability if the noise is asymptoti-
cally larger than p. Although only one bit of the search point to be evaluated is
changed by N1p , this can make the (1+1) EA accept a child x
′ whose true fitness
OneMax(x′) is by two smaller than the true fitness OneMax(xt) of the parent
xt (see Section 4). This differs from the (1+1) EA for a dynamically changing
OneMax, where the target bit string is changed by the operator N1p and the
Hamming distance to the target is to be minimized (see [Dro02] for an analysis
that the expected runtime is polynomial if and only if p = O(log(n)/n)). Al-
though in the latter process the distance to the optimum can only increase by
one in one step, our techniques can also cope with the first process and show
that in this case p = O(log(n)/n) is the critical noise strength, too.
To distinguish more clearly between the key properties of the (1+1) EA on
noisy OneMax resulting in polynomial resp. super-polynomial runtime and the
necessary technical proofs, we show some general methods for proving polyno-
mial upper and super-polynomial lower bounds in the next section.
3 Techniques for Bounding Markov Chains
In this section, we present some general results that can be applied to analyze
the (1+1) EA for the noisy OneMax presented in the last section. They are
presented in a general form with clearly stated conditions in order to make them
easily applicable to other problems. Furthermore, this should make their proofs
more comprehensible, since they need not to take care of the specific details of
the EA analyzed as long as the EA fulfills the general conditions.
The (1+1) EA on noisy OneMax is a Markov process on the state space
{0, . . . , n}, where state i represents that the actual individual xt has true fitness
OneMax(xt) = i. Hence, we identify the process by (n + 1)2 transition prob-
abilities p
·,· = (pi,j)i,j∈{0,...,n}, where pi,j ∈ [0, 1] is the probability of moving
from state i to state j in one step. Let Ti,j be the number of steps to come from
state i to j for the first time, i. e. Ti,n is the runtime when starting in state i.
3.1 Upper Bound Techniques
Our first result, implicitly already used in [Dro02], upper bounds the runtime of
a process p
·,· that can only move from i to i−1, i or i+1 in one step (a so-called
{−1, 0, +1}-process) by a polynomial:
Lemma 1. Let p
·,· be a {−1, 0, +1}-process on {0, . . . , n}. If there are two con-
stants c+, c− ∈ R+ such that
∀i ∈ {0, . . . , n} : pi,i+1 ≥ c
+
·
n − i
n
and pi,i−1 ≤ c
−
·
log(n)
n
·
i
n
,
then for all i ∈ {0, . . . , n − 1}
E(Ti,i+1) ≤
n1+c
−/(c+·ln(2))
c+ · (n − i)
and E(T
0,n) = O
(
n1+c
−/(c+·ln(2))
· log(n)
)
.
Proof. It is well known (e. g. see [DJW00]) that E(Ti,i+1) for {−1, 0, +1}-processes
is
∑i
k=0
1
pk,k+1 ·
∏i
l=k+1
pl,l−1
pl,l+1 . Utilizing the bounds for pi,i−1 and pi,i+1, we get:
E(Ti,i+1) ≤
i
∑
k=0
n
c+(n − k)
·
i
∏
l=k+1
n
c+(n − l)
·
c− · log(n) · l
n2
=
i
∑
k=0
n
c+(n − k)
·
(
c−
c+
log(n)
n
)i−k
·
i!
k!
·
(n − i − 1)!
(n − k − 1)!
=
n
c+
·
i
∑
k=0
(
c−
c+
log(n)
n
)i−k (i
k
)
(n−k
i−k
)
(n − i)
≤
n
c+(n − i)
(
1 +
c−
c+
log(n)
n
)i
≤
n
c+(n − i)
· exp
(
c−
c+
log(n)
n
· i
)
≤
n1+c
−/(c+·ln(2))
c+(n − i)
.
Summing up these values for all i ∈ {0, . . . , n−1} gives the desired upper bound
on E(T
0,n) because
∑n
i=1 1/i is O(log(n)). unionsq
To upper bound a Markov process p
·,· (i. e. replacing it by a process whose
finite runtime to come from i to n stochastically dominates the runtime Ti,n
of the old process for all i < n) being no {−1, 0, +1}-process, we can proceed
as follows: first, we “delete” all improvements by more than one, i. e., we set
all probabilities pi,i+d for d ≥ 2 to zero and increase pi,i by
∑n−i
d=2 pi,i+d. This
leads to a process whose runtime stochastically dominates the runtime of the
old process (see [Dro03]), where a random variable X stochastically dominates
a random variable Y if for all values d of X and Y : P (X ≥ d) ≥ P (Y ≥ d).
Afterwards, we have a process that can move from state i only to states
0, . . . , i + 1, a so-called ≤1-process. To upper bound such a ≤ 1-process p
·,· by a
{−1, 0, 1}-process p˜
·,·, we can use the following lemma proven in [Dro03]:
Lemma 2. Let (p
·,·) be a ≤1-process and (p˜·,·) be a {−1, 0, 1}-process on the
state space {0, . . . , n}. If the following conditions hold
1. ∀i ∈ {1, . . . , n − 1} : pi,0 ≤
∏i
k=1 p˜k,k−1,
2. ∀i ∈ {1, . . . , n − 1} : ∀j ∈ {1, . . . , i− 1} : pi,j ≤ p˜j,j
∏i
k=j+1 p˜k,k−1,
3. ∀i ∈ {0, . . . , n − 1} : pi,i+1 ≥ p˜i,i+1,
then E(Ti,n) ≤ E(T˜i,n) for all i ∈ {0, . . . , n}.
Now we can use Lemma 1 to upper bound the expected runtime of this slower
{−1, 0, +1}-process p˜
·,· by a polynomial if the transition probabilities fulfill the
conditions. We will see in Section 4 that the process resulting from the (1+1) EA
on noisy OneMax for small noise strength p = O(log(n)/n) has this form.
3.2 Lower Bound Techniques
If the probabilities pi,i−1 of the process p·,· are only a little bit larger than nec-
essary for Lemma 1, i. e. by a non-constant factor α(n), the expected runtime of
the process is super-polynomially. To be more exact, the runtime is polynomially
only with super-polynomially small probability o(1/poly(n)), i. e. smaller than
1/q(n) for any polynomial q (see also [Dro02] and [Dro03]):
Lemma 3. Let p
·,· be a Markov process on {0, . . . , n}. If for a function α : N+ →
R with α(n)
n→∞
−→ ∞ and α(n) ≤ n/ log(n) the conditions
1. ∀ i ≥ 0, d = ω(log(n)) : pi,i+d = o(1/poly(n)),
2. ∀ i ≥ n − α(n) log(n), d = ω(1) : pi,i+d = o(1/poly(n)), and
3. ∃ c+, c− ∈ R+ : ∀ i ≥ n − α(n) log(n) :
n−i
∑
j=1
pi,i+j ≤ c+ ·
n − i
n
and pi,i−1 ≥ c− · α(n) ·
log(n)
n
·
i
n
hold then for all i ≤ n − α(n) log(n) the probability that Ti,n is polynomial is
super-polynomially small, i. e. o(1/poly(n)).
Proof. Let It ∈ {0, . . . , n} be the state of the process at time step t. Since for
all i the probability of a direct step from i to i + d with d = ω(log(n)) is super-
polynomially small, we assume that only improving steps by O(log(n)) happen.
Hence, regardless of the initial state i ≤ n − α(n) log(n), the process reaches
state n only via a state between n − α(n) log(n) and n − α(n) log(n)/2. Since
a constant factor does not matter for our definition of α(n), for an arbitrary
constant a ∈ ]0, 1[ there have to be time steps t
1
< t
2
, such that It ∈ {n −
α(n)a log(n), . . . , n−1} for all t ∈ {t
1
, . . . , t
2
−1}, and It
2
= n (a similar technique
is used in [RRS95]). We show that it is super-polynomially unlikely to “bridge
the gap” from state n − α(n)a log(n) to n in polynomially many steps. Hence,
we can assume that i ≥ n − α(n)a log(n) and that no improvements by more
than α(n)d for a constant d ∈ ]0, 1[ do happen.
Since (n − α(n)a log(n))/n converges to 1, we bound pi,i−1 and
∑n−i
d=1 pi,i+d
for all n larger than a constant n
0
, i. e. n large enough:
pi,i−1 ≥
c−
2
· α(n)
log(n)
n
and
n−i
∑
d=1
pi,i+d ≤ c
+
· α(n)a
log(n)
n
.
Therefore, the probability of an improving step under the condition that the
step changes the state (i. e. the step is effective) is at most
c+α(n)a log(n)/n
c+α(n)a log(n)/n + c−α(n) log(n)/(2n)
≤
2c+
c−
·
1
α(n)1−a
.
Hence, the expected number of improving steps during t effective steps is at
most (2c+/c−) · t/α(n)1−a. However, to bridge the gap of size α(n)a log(n) in t
effective steps consisting of t+ improving and t− decreasing steps, we must have
α(n)d · t+ − t− ≥ α(n)a · log(n) ⇐⇒ t+ ≥
α(n)a log(n) + t
α(n)d + 1
.
The last term is at least t/α(n)c+d for every constant c > 0 and n large enough.
If we choose a, c, and d with c + d < 1 − a, the number of improving steps
necessary to reach state n is by a non-constant factor larger than its expected
number. For n large enough, this factor is at least 2.
By Chernoff bounds (see [MR95]) the probability that the sum of binary
random variables is by a constant factor 1 + δ larger than its expected value µ
is upper bounded by (we use δ = 1 and µ = (2c+/c−)α(n)a−1t)
(
exp(δ)
(1 + δ)1+δ
)µ
=
(
exp(1)
4
)
(2c+/c−)α(n)a−1t
.
If t is small, this bound is not super-polynomially small. But since the process
must bridge a gap of size α(n)a log(n) and any improvement by more than α(n)d
is excluded, the number t of effective steps must be at least α(n)a−d log(n).
Hence, the above bound is at most (exp(1)/4)(2c
+/c−)α(n)2a−d−1 log(n), which is
super-polynomially small if 2a− d− 1 > 0. Since a = 2/3, c = 1/7, and d = 1/7
fulfill 2a−d−1 > 0 and c+d < 1−a, the probability of every phase of polynomial
length reaching the optimum is super-polynomially small. unionsq
If the noise strength is too large, this result cannot be applied to lower bound
the (1+1) EA on noisy OneMax. In this case a step decreasing the state is by a
constant factor more likely than a step increasing it if the process is close to the
optimal state n. If furthermore an improvement by d is by a non-constant factor
more likely than an improvement by d + 1, we can apply the following result:
Lemma 4. Let p
·,· be a Markov process on {0, . . . , n} such that
1. ∀ i ≥ 0, d = ω(log(n)) : pi,i+d = o(1/poly(n)),
2. ∃ δ, ε, γ > 0: ∀ i ≥ n − nδ, d ∈ {1, . . . , n − i} :
i
∑
j=1
pi,i−j ≥ (1 + ε) ·
n−i
∑
j=1
pi,i+j and
pi,i+d
pi,i+d+1
≥ nγ .
Then for all i ≤ n − nδ the probability of Ti,n being polynomially is super-
polynomially small.
Proof. Analogously to the proof of Lemma 3, we can assume that to reach the
target state n, there has to be a phase ranging from time steps t
1
to t
2
, such
that It
2
= n and for all t ∈ {t
1
, . . . , t
2
− 1} we have It ∈ {n − nδ, . . . , n − 1} .
We show that every phase of polynomial length is super-polynomially unlikely.
Let Xt ∈ {−(n − nδ), . . . , nδ} be the change of state in time step t, i. e.
P (Xt = d) = pi,i+d if the process is in state i at time step t. The phase can only
be successful if Xt
1
+ · · · + Xt
2
= nδ, which will be shown to be exponentially
unlikely. Since Xt < 0 is at least by a factor 1 + ε more likely than Xt > 0, we
make the process only faster if we assume that P (Xt < 0) = (1 + ε) ·P (Xt > 0)
and that every step decreasing the number of ones decreases it by exactly one.
Since pi,i+d/pi,i+d+1 is at least nγ for d > 0, we replace the step size d of an
improving step by the value of a geometrically distributed random variable Y
with success probability 1−n−γ : this leads to a stochastically dominating process
because P (Y = d)/P (Y = d+1) is exactly nγ . All in all, we have replaced every
step of p
·,· in the considered phase by the value of a {−1, . . . , nγ}-valued random
variable X ′ defined by
P (X ′ = d) =
{
1+ε
2+ε if d = −1
1
2+ε · (n
−γ)d−1 · (1 − n−γ) if d ≥ 1
(Note that X ′ < 0 is by a factor 1+ ε more likely than X ′ > 0.) Hence, the ran-
dom variable St(n) = X ′
1
+ · · ·+X ′t(n) stochastically dominates the improvement
of the process p
·,· during a phase of length t(n) starting from a state at least
n−nδ. As the expected value of Y is 1/(1−n−γ), for n large enough we have
E(X ′) = −
1 + ε
2 + ε
+
1
2 + ε
·
1
1 − n−γ
= −
1 + ε
2 + ε
+
1
2 + ε
·
nγ
nγ − 1
≤ −
ε/2
2 + ε
.
Hence, the expected value of St(n) is for n large enough and a constant ε′ > 0
at most −t(n)ε′. In order for a phase of length t(n) to be successful St(n) has to
be at least nγ . We show that even P (St(n) > 0) is super-polynomially unlikely
for all possible phase lengths t(n). To estimate P (St(n) > 0) we cannot use
Chernoff bounds, since St(n) is no sum of {0, 1}-valued, but {−1, . . . , nγ}-valued
random variables. In this situation we use Hoeffding’s inequality ([Hoe63]), a
generalization of Chernoff bounds, stating in our notation
P
(
St(n) − E(St(n)) ≥ t(n)ε
)
≤ exp
(
−2t(n)ε2
z2
)
where z is the number of different values of X ′i. Because of Condition 1 we
can assume that no improvements by more than log(n)2 − 2 happen implying
z = log(n)2 and t(n) ≥ nγ/ log(n)2. Therefore, Hoeffding’s inequality gives us an
exponentially small upper bound on the probability, that a phase of polynomial
length is successful. unionsq
4 Application of the Techniques for the (1+1) EA on
Noisy OneMax
Now we apply the techniques presented in the previous section to determine the
maximal noise strength p under which the (1+1) EA is still able to optimize
OneMax with single-bit noise in expected polynomial time. Hence, let Muti ∈
{−i, . . . , n − i} be the random variable denoting the change of the number of
ones after a mutation of a search point x with OneMax(x) = i. Depending on
the value d = 0 of Mut i we determine if and when this mutation leads to a
change of the number of ones of the actual individual by d:
– If Mut i = d with d ≥ 2, the mutation will be accepted regardless of the noisy
evaluations. Hence, for all i ∈ {0, . . . , n − 2} and d ∈ {2, . . . , n − i}:
pi,i+d = P (Mut i = d).
– If Mut i = 1, the mutation will be accepted except in case that the noisy
evaluation of the parent increases its value while the noisy evaluation of the
child decreases its value. Hence, for all i ∈ {0, . . . , n − 1}:
pi,i+1 = P (Mut i = 1) ·
(
1 − p
n − i
n
· p
i + 1
n
)
. (1)
– If Mut i = −1, the mutation will be accepted if and only if one of the three
following cases happens:
• the noisy evaluation of the parent decreases its value and the evaluation
of the child is not disturbed by noise,
• the noisy evaluation of the parent decreases its value and the noisy eval-
uation of the child increases it value, or
• the evaluation of the parent is not changed by noise and the noisy eval-
uation of the child increases its value.
Hence, for all i ∈ {1, . . . , n}:
pi,i−1 = P (Mut i = −1) ·
(
p
i
n
(
1 − p + p
n − i + 1
n
)
+ (1 − p)p
n − i + 1
n
)
.
(2)
– If Mut i = −2, the mutation will be accepted only if the noisy evaluation of
the parent decreases its value and the noisy evaluation of the child increases
its value. Hence, for all i ∈ {2, . . . , n}:
pi,i−2 = P (Mut i = −2) ·
(
p
i
n
p
n − i + 2
n
)
. (3)
As every mutation decreasing the number of ones by at least two is not accepted,
pi,i−d = 0 for all d ≥ 2.
It is obvious that the probability P (Mut i = d) is essential when analyzing
the (1+1) EA on noisy OneMax. We can easily show, that this probability is
O(((n − i)/n)d) for d > 0 and O((i/n)d) for d < 0:
d > 0 : P (Mut i = d) =
n−i
∑
k=d
(
n − i
k
)
·
(
i
k − d
)
·
(
1
n
)
2k−d (
1 −
1
n
)n−2k+d
≤
(
n − i
n
)d (
1 −
1
n
)n−d 1
d!
+
n−i
∑
k=d+1
(
n − i
n
)k 1
k!
. (4)
d < 0 : P (Mut i = d) =
i
∑
k=d
(
i
k
)
·
(
n − i
k − d
)
·
(
1
n
)
2k−d (
1 −
1
n
)n−2k+d
≤
(
i
n
)d
·
(
1 −
1
n
)n−d
·
1
d!
+
i
∑
k=d+1
(
i
n
)k
·
1
k!
. (5)
These bounds are asymptotically tight as long as d is a constant:
d > 0 : P (Mut i = d) =
n−i
∑
k=d
(
n − i
k
)
·
(
i
k − d
)
·
(
1
n
)
2k−d (
1 −
1
n
)n−2k+d
≥
(
n − i
d
)(
1
n
)d
· exp(−1) ≥
(
n − i
n
)d
·
exp(−1)
dd
. (6)
d < 0 : P (Mut i = d) =
i
∑
k=d
(
i
k
)
·
(
n − i
k − d
)
·
(
1
n
)
2k−d (
1 −
1
n
)n−2k+d
≥
(
i
d
)(
1
n
)d
· exp(−1) ≥
(
i
n
)d
·
exp(−1)
dd
. (7)
4.1 A Polynomial Upper Bound for p = O(log(n)/n)
First, we want to upper bound the expected runtime of the (1+1) EA on noisy
OneMax for p = O(log(n)/n). Therefore, we “delete” all transitions going from
i to i + d with d ≥ 2 (making the process only slower) and bound the remaining
transition probabilities in the following way for n large enough:
pi,i+1
(1),(6)
≥
n − i
n
· exp(−1) ·
(
1 − p2
(n − i)(i + 1)
n2
)
≥
n − i
n
·
3 exp(−1)
4
,
pi,i−1
(2)
≤
i
n
· 2p and pi,i−2
(3)
≤
(
i
2
)(
1
n
)
2
p2 =
i(i − 1)
n2
·
p2
2
.
To upper bound p
·,·, we replace it by the following {−1, 0, +1}-process p˜·,·:
p˜i,i+1 =
n − i
n
·
3 exp(−1)
4
and p˜i,i−1 :=
i
n
· 4p.
It is obvious that p˜i,i+1 ≥ pi,i+1. Since 3 exp(−1)/4 < 1/2 and 4p converges to
0, p˜i,i is at least 1/2 for n large enough. Hence, p˜i,i−1 · p˜i−1,i−1 ≥ pi,i−1 and
p˜i,i−1 · p˜i−1,i−2 · p˜i−2,i−2 ≥ pi,i−2 hold, implying that Lemma 2 guarantees that
p˜
·,· upper bounds p·,·. Finally, applying Lemma 1 to p˜·,· gives us:
Theorem 1. The expected runtime TOneMax,N1p of the (1+1) EA on OneMax
with noise function N1p is polynomial for all p = O(log(n)/n).
4.2 A Super-polynomial Lower Bound for p = ω(log(n)/n)
Let us now look at the case that p grows asymptotically faster than log(n)/n,
i. e. p ≥ γ(n) log(n)/n for some function γ : N+ → R with γ(n)
n→∞
−→ ∞. In the
following, we want to show that the conditions of Lemma 3 hold for p
·,· with
α(n) := min{log(n), γ(n)}, if p ≤ 1 − α(n) log(n)/n.
It is obvious that α(n)
n→∞
−→ ∞. Condition 1 holds, since the probabil-
ity P (Mut i = d) is at most 1/d!, which is super-polynomially small for d =
ω(log(n)). Furthermore, pi,i+d = o(1/poly(n)) for all i ≥ n − log(n)α(n) and
non-constant d because of the upper bound (4) for P (Mut i = d). Hence, condi-
tion 2 holds. We can upper bound
∑n−i
d=1 pi,i+d (using (4)) by
n−i
∑
d=1
n−i
∑
k=d
(
n − i
n
)k
·
1
k!
=
n−i
∑
k=1
(
n − i
n
)k
·
1
(k − 1)!
= O
(
n − i
n
)
.
In turn, we lower bound pi,i−1 for i ≥ n − α(n) log(n) and n large enough by:
pi,i−1
(2),(7)
≥
i
n
· exp(−1) · p ·
(
i
n
(
1 − p + p
n − i + 1
n
)
+ (1 − p)
n − i + 1
n
)
≥
i
n
· exp(−1) ·
1
4
·
α(n) log(n)
n
.
The last inequality holds due to the following case inspection: if p ≤ 1/2, we use
p ≥ α(n) log(n)/n, i/n ≥ 1/2, 1−p ≥ 1/2 and replace all other terms by zero. If
1/2 < p ≤ 1 − α(n) log(n)/n, we use p > 1/2, i/n ≥ 1/2, 1 − p ≥ α(n) log(n)/n,
and replace all other terms by zero. Hence, all conditions of Lemma 3 hold.
Let us now look at the case p > 1 − α(n) log(n)/n. We cannot lower bound
pi,i−1 by Θ(α(n)(log(n)/n)(i/n)) anymore: e. g. for p = 1 decreasing the number
of ones implies that the evaluation of the parent decreases its fitness, while the
evaluation of the child increases its fitness, i. e. pi,i−1 = P (Mut i = −1) · (i/n) ·
(n − i + 1)/n, which is smaller than α(n)(log(n)/n)(i/n) for i close to n.
So, we use Lemma 4 to lower bound the runtime for p > 1 − α(n) log(n)/n.
First, we show that decreasing the number of ones is by a constant factor more
likely than increasing it, if i ≥ n − n1/3:
n−i
∑
d=1
pi,i+d ≤
n−i
∑
d=1
P (Mut i = d)
(4)
≤
n − i
n
·
(
1 −
1
n
)n−1
+
n−i
∑
d=2
O
(
(
n − i
n
)d
)
=
n − i
n
·
(
1 −
1
n
)n−1
+ O
(
1
n4/3
)
,
pi,i−1
(2)
≥
P (Mut i = −1) · p2 ·
i
n
n − i
n
(7)
≥
(
i
n
· p
)
2
(
1 −
1
n
)n−1
·
n − i
n
,
pi,i−2
(3)
≥ P (Mut i = −2) · p
2
·
i
n
n − i
n
(7)
≥
i2(i − 1)
2n3
· p2 ·
(
1 −
1
n
)n−2
·
n − i
n
.
Hence, we have
∑n−i
d=1 pi,i+d
pi,i−1 + pi,i−2
≤
n−i
n ·
(
1 − 1n
)n−1
+ O
(
1
n4/3
)
( i
n · p
)
2
(
1 − 1n
)n−1
·
n−i
n +
i2(i−1)
2n3 · p
2
·
(
1 − 1n
)n−2
·
n−i
n
=
1 + O( 1n1/3 )
( i
n · p
)
2
+ i
2
(i−1)
2n3 · p
2
·
(
1 − 1n
)
−1
.
Since the denominator converges to 3/2 and the nominator to 1, the last term
is at least 4/3 for n large enough. Hence, an impairment is by a constant more
likely than an improvement for all i ≥ n − n1/3 and n large enough.
To apply Lemma 4 we must show additionally that pi+d/pi+d+1 is at least
nγ for all i ≥ n − n1/3, d ≥ 1, and a constant γ > 0:
pi+d
pi+d+1
(1),(4),(6)
≥
Ω
(
(n−i
n
)d
)
O
(
(n−i
n
)d+1
) = Ω
(
n
n − i
)
= Ω(n2/3).
Hence, Lemma 4 tells us that the runtime of the (1+1) EA on noisy OneMax for
p ≥ 1−α(n) log(n) is polynomial only with super-polynomially small probability:
Theorem 2. The runtime TOneMax,N1p of the (1+1) EA on OneMax with noise
function N1p is polynomial with super-polynomially small probability for all p =
ω(log(n)/n).
5 Conclusions
We have analyzed the runtime of the (1+1) EA on a variant of OneMax where
every evaluation is subject to noise with probability p and noise changes one
uniformly chosen bit. In this case, the (1+1) EA optimizes OneMax with high
probability in polynomial time if and only if p is O(log(n)/n). This is the first
rigorous analysis of an EA for a noisy fitness function without any assumptions.
Moreover, the paper presents the methods used to prove this result in a general
and modular way in order to help understanding the ideas and to simplify ap-
plication to other EAs and/or noise models. For instance, generalizing the result
for a noise model where every bit is changed by noise with some probability p is
an obvious goal for future research.
Acknowledgements
Hereby I thank Jens Ja¨gersku¨pper, Tobias Storch, and Carsten Witt for valuable
advice and discussions.
References
[AB03] Dirk V. Arnold and H.-G. Beyer. A comparison of evolution strategies with
other direct search methods in the presence of noise. Computational Opti-
mization and Applications, 24:135 – 159, 2003.
[DJW00] S. Droste, Th. Jansen, and I. Wegener. Dynamic parameter control in simple
evolutionary algorithms. In Proceedings of FOGA 2000, pages 275–294, 2000.
[DJW02] S. Droste, Th. Jansen, and I. Wegener. On the analysis of the (1 + 1) EA.
Theoretical Computer Science, (276):51–81, 2002.
[Dro02] S. Droste. Analysis of the (1+1) EA for a dynamically changing OneMax-
variant. In Proceedings of CEC 2002, pages 55–60, 2002.
[Dro03] S. Droste. Analysis of the (1+1) EA for a dynamically bit-wise changing
OneMax-variant. In Proceedings of GECCO 2003, pages 909–921, 2003.
[FM92] S. Forrest and M. Mitchell. Relative building block fitness and the building
block hypothesis. In Proceedings of FOGA 1992, pages 198–226, 1992.
[Hoe63] W. Hoeffding. Probability inequalities for sums of bounded random variables.
Journal of the American Statistical Association, 58:13–30, 1963.
[JW99] Th. Jansen and I. Wegener. On the analysis of evolutionary algorithms –
a proof that crossover really can help. In Proceedings of ESA 1999, pages
184–193, 1999.
[MR95] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University
Press, 1995.
[RRS95] Y. Rabani, Y. Rabinovich, and A. Sinclair. A computational view of popula-
tion genetics. In Proceedings of STOC 1995, pages 83–92, 1995.
[Rud97] G. Rudolph. Convergence Properties of Evolutionary Algorithms. Verlag Dr.
Kovacˇ, 1997.
[WW03] I. Wegener and C. Witt. On the optimization of monotone polynomials by
the (1+1) EA and randomized local search. In Proceedings of GECCO 2003,
pages 622–633, 2003.