UNIVERSITY OF DORTMUND REIHE COMPUTATIONAL INTELLIGENCE COLLABORATIVE RESEARCH CENTER 531 Design and Management of Complex Technical Processes and Systems by means of Computational Intelligence Methods Analysis of the (1+1) EA for a Noisy OneMax Stefan Droste No. CI-161/04 Technical Report ISSN 1433-3325 February 2004 Secretary of the SFB 531 · University of Dortmund · Dept. of Computer Science/XI 44221 Dortmund · Germany This work is a product of the Collaborative Research Center 531, “Computational Intelligence,” at the University of Dortmund and was printed with financial support of the Deutsche Forschungsgemeinschaft. Analysis of the (1+1) EA for a Noisy OneMax Stefan Droste LS Informatik 2, Universita¨t Dortmund, 44221 Dortmund, Germany stefan.droste@udo.edu Abstract. In practical applications evaluating a fitness function is fre- quently subject to noise, i. e., the “true fitness” is disturbed by some random variations. Evolutionary algorithms (EAs) are often successfully applied to noisy problems, where they have turned out to be particularly robust. Theoretical results on the behavior of EAs for noisy functions are comparatively very rare, especially for discrete search spaces. Here we present an analysis of the (1+1) EA for a noisy variant of OneMax and compute the maximal noise strength allowing the (1+1) EA a poly- nomial runtime asymptotically exactly. The methods used in the proofs are presented in a general form with clearly stated conditions in order to simplify further applications. 1 Introduction When trying to optimize problems in practice, it is rarely possible to determine the fitness value of a search point exactly. Most often noise changes the fitness value by some (typically small) amount because of interferences during the ex- periment resulting in the fitness value. Evolutionary algorithms (EAs) are often good at coping with noise due to the use of a population of search points. But there are only few theoretical results about the effects of noise on the perfor- mance of EAs for discrete search spaces, while continuous search spaces found more attention (see [AB03] for an overview). In the last years the rigorous runtime analysis of EAs (e. g. see [DJW02], [JW99] or [WW03]) has proven to be a useful approach for extending our knowl- edge about EAs, besides other well-known approaches like schema theory or macroscopic, statistical, fitness landscape resp. local analysis. Here we extend this approach to the optimization of noisy functions by EAs. Since this is the first rigorous analysis of this kind, we start with a simple EA and fitness func- tion: the (1+1) EA for a noisy variant of OneMax. The (1+1) EA uses only one individual, changed by mutation only, and has been analyzed for a number of different fitness functions (e. g. see [Rud97] or [DJW02]), greatly extending the knowledge about the random processes underlying the (1+1) EA and ap- propriate proof techniques. Note that all these fitness functions are static and noise-free (just recently, dynamic variants of OneMax have been analyzed, see [Dro02] and [Dro03]), which is not very common in practical applications.  This research was partly supported by the Deutsche Forschungsgemeinschaft as part of the Collaborative Research Center “Computational Intelligence” (531). In this paper we analyze the influence of the noise strength on the runtime of the (1+1) EA until the optimum of OneMax is found. We show that noise, which changes one uniformly chosen bit before evaluation with probability p, makes an efficient optimization of the (1+1) EA impossible if and only if p grows asymptotically faster than log(n)/n (where n is the dimension of the search space). This is especially interesting since log(n)/n is also the critical value for the (1+1) EA on a dynamic OneMax where in each step with probability p one uniformly chosen bit of the target bit string changes. Although the two processes have no obvious interpretation in terms of each other, we can generalize the methods used to analyze the dynamic OneMax (see [Dro02]) to noisy OneMax. In the next section, we formally define the (1+1) EA, our noise model, and the runtime, our performance measure. Section 3 presents the methods used in Section 4 to analyze the runtime of the (1+1) EA for noisy OneMax. This separation should increase comprehensibility, because the methods in Section 3 clearly state the conditions necessary for applying them, while Section 4 shows that the (1+1) EA on noisy OneMax fulfills these conditions. We finish with some conclusions. 2 The (1+1) EA and Noisy OneMax In this section, we formally define the (1+1) EA and the noise model investi- gated. The (1+1) EA is the most basic EA since it uses only one individual changed by mutation only. This simplicity makes it an ideal starting point for theoretical analyzes (see [DJW02]). Furthermore, the (1+1) EA has proven to be surprisingly efficient for some functions compared to more involved EAs (see [FM92]), making its analysis worth while in itself. The (1+1) EA for noisy functions differs from the (1+1) EA for noise-free evaluation except only in one point: since the fitness-value of the parental search point evaluated in a generation before is not guaranteed to be the correct one, both the parent and the child are evaluated in each generation. Without this resampling a noisy evaluation can only be corrected if a copy of the search point is evaluated (e. g. if the mutation has no effect). Let fN : {0, 1}n → R denote the noisy function. Then the (1+1) EA looks as follows: Definition 1 ((1+1) EA for maximization of noisy functions). 1. Set t := 0 and choose xt ∈ {0, 1}n randomly uniformly. 2. Set x′ := xt and independently flip each bit of x′ with prob. 1/n. 3. If fN (x′) ≥ fN(xt), set xt+1 := x′, else xt+1 := xt. 4. Set t := t + 1 and go to step 2. In the following we will look at the case that fN results from the true fitness function f : {0, 1}n → {0, 1} by randomly changing the argument, i. e. fN(x) = f(N(x)), where the random function N : {0, 1}n → {0, 1}n represents the noise. Hence, our noise model considers noise which takes effect before evaluation. We assume that the noisy evaluations are independent, i. e., the result of previous evaluations does not influence the actual evaluation. We are interested in the number of generations the (1+1) EA needs to eval- uate a maximum of the true fitness function f for the first time. Since there are two fitness evaluations per generation, the number of generations uniquely determines the number of fitness evaluations, which is often the most costly op- eration in practice. Considering an optimum of the true fitness function as an optimum of the noisy function seems to be a more natural choice than investi- gating the number of generations until a search point with maximal noisy fitness value is evaluated, since such a point can be “far away” from a true maximum. Hence, we analyze the number Tf,N of generations the (1+1) EA needs to find a maximum of the underlying true fitness function f under noise N : Definition 2 (Runtime of the (1+1) EA for noisy functions). The run- time Tf,N of the (1+1) EA for a noisy function fN is the number of generations until a maximum of the true fitness function f is found: Tf,N := min{t ∈ N0 | f(xt) = max{f(x) |x ∈ {0, 1}n}}. We analyze the noise model N1p changing with probability p exactly one uni- formly chosen bit of x, while no bit is changed with probability 1 − p. Such a model can be appropriate for applications where the genotype-phenotype map- ping is error-prone, but the evaluation of the phenotype exact, e. g. when the search point is transformed by a complicated process before being evaluated. Definition 3 (One-bit noise N 1p). Let p ∈ [0, 1]. The random noise function N1p : {0, 1} n → {0, 1}n is defined as follows: ∀x, y ∈ {0, 1}n : P (N1p (x) = y) =    1 − p if x = y, p/n if H(x, y) = 1, 0 if H(x, y) > 2. We say that noisy evaluation increases resp. decreases the fitness of a point x ∈ {0, 1}n if fN (x) > f(x) resp. fN(x) < f(x). For f = OneMax (with OneMax(x 1 , . . . , xn) := x1 + · · · + xn) the noise function N1p can only change the fitness of a point by one, and the probability of N1p increasing resp. decreasing the fitness of x is directly proportional to n −OneMax(x) resp. OneMax(x). Our goal is to determine the critical noise strength p = p(n) such that the runtime of the (1+1) EA is polynomial in n with high probability if the noise is at most p, but super-polynomial with high probability if the noise is asymptoti- cally larger than p. Although only one bit of the search point to be evaluated is changed by N1p , this can make the (1+1) EA accept a child x ′ whose true fitness OneMax(x′) is by two smaller than the true fitness OneMax(xt) of the parent xt (see Section 4). This differs from the (1+1) EA for a dynamically changing OneMax, where the target bit string is changed by the operator N1p and the Hamming distance to the target is to be minimized (see [Dro02] for an analysis that the expected runtime is polynomial if and only if p = O(log(n)/n)). Al- though in the latter process the distance to the optimum can only increase by one in one step, our techniques can also cope with the first process and show that in this case p = O(log(n)/n) is the critical noise strength, too. To distinguish more clearly between the key properties of the (1+1) EA on noisy OneMax resulting in polynomial resp. super-polynomial runtime and the necessary technical proofs, we show some general methods for proving polyno- mial upper and super-polynomial lower bounds in the next section. 3 Techniques for Bounding Markov Chains In this section, we present some general results that can be applied to analyze the (1+1) EA for the noisy OneMax presented in the last section. They are presented in a general form with clearly stated conditions in order to make them easily applicable to other problems. Furthermore, this should make their proofs more comprehensible, since they need not to take care of the specific details of the EA analyzed as long as the EA fulfills the general conditions. The (1+1) EA on noisy OneMax is a Markov process on the state space {0, . . . , n}, where state i represents that the actual individual xt has true fitness OneMax(xt) = i. Hence, we identify the process by (n + 1)2 transition prob- abilities p ·,· = (pi,j)i,j∈{0,...,n}, where pi,j ∈ [0, 1] is the probability of moving from state i to state j in one step. Let Ti,j be the number of steps to come from state i to j for the first time, i. e. Ti,n is the runtime when starting in state i. 3.1 Upper Bound Techniques Our first result, implicitly already used in [Dro02], upper bounds the runtime of a process p ·,· that can only move from i to i−1, i or i+1 in one step (a so-called {−1, 0, +1}-process) by a polynomial: Lemma 1. Let p ·,· be a {−1, 0, +1}-process on {0, . . . , n}. If there are two con- stants c+, c− ∈ R+ such that ∀i ∈ {0, . . . , n} : pi,i+1 ≥ c + · n − i n and pi,i−1 ≤ c − · log(n) n · i n , then for all i ∈ {0, . . . , n − 1} E(Ti,i+1) ≤ n1+c −/(c+·ln(2)) c+ · (n − i) and E(T 0,n) = O ( n1+c −/(c+·ln(2)) · log(n) ) . Proof. It is well known (e. g. see [DJW00]) that E(Ti,i+1) for {−1, 0, +1}-processes is ∑i k=0 1 pk,k+1 · ∏i l=k+1 pl,l−1 pl,l+1 . Utilizing the bounds for pi,i−1 and pi,i+1, we get: E(Ti,i+1) ≤ i ∑ k=0 n c+(n − k) · i ∏ l=k+1 n c+(n − l) · c− · log(n) · l n2 = i ∑ k=0 n c+(n − k) · ( c− c+ log(n) n )i−k · i! k! · (n − i − 1)! (n − k − 1)! = n c+ · i ∑ k=0 ( c− c+ log(n) n )i−k (i k ) (n−k i−k ) (n − i) ≤ n c+(n − i) ( 1 + c− c+ log(n) n )i ≤ n c+(n − i) · exp ( c− c+ log(n) n · i ) ≤ n1+c −/(c+·ln(2)) c+(n − i) . Summing up these values for all i ∈ {0, . . . , n−1} gives the desired upper bound on E(T 0,n) because ∑n i=1 1/i is O(log(n)). unionsq To upper bound a Markov process p ·,· (i. e. replacing it by a process whose finite runtime to come from i to n stochastically dominates the runtime Ti,n of the old process for all i < n) being no {−1, 0, +1}-process, we can proceed as follows: first, we “delete” all improvements by more than one, i. e., we set all probabilities pi,i+d for d ≥ 2 to zero and increase pi,i by ∑n−i d=2 pi,i+d. This leads to a process whose runtime stochastically dominates the runtime of the old process (see [Dro03]), where a random variable X stochastically dominates a random variable Y if for all values d of X and Y : P (X ≥ d) ≥ P (Y ≥ d). Afterwards, we have a process that can move from state i only to states 0, . . . , i + 1, a so-called ≤1-process. To upper bound such a ≤ 1-process p ·,· by a {−1, 0, 1}-process p˜ ·,·, we can use the following lemma proven in [Dro03]: Lemma 2. Let (p ·,·) be a ≤1-process and (p˜·,·) be a {−1, 0, 1}-process on the state space {0, . . . , n}. If the following conditions hold 1. ∀i ∈ {1, . . . , n − 1} : pi,0 ≤ ∏i k=1 p˜k,k−1, 2. ∀i ∈ {1, . . . , n − 1} : ∀j ∈ {1, . . . , i− 1} : pi,j ≤ p˜j,j ∏i k=j+1 p˜k,k−1, 3. ∀i ∈ {0, . . . , n − 1} : pi,i+1 ≥ p˜i,i+1, then E(Ti,n) ≤ E(T˜i,n) for all i ∈ {0, . . . , n}. Now we can use Lemma 1 to upper bound the expected runtime of this slower {−1, 0, +1}-process p˜ ·,· by a polynomial if the transition probabilities fulfill the conditions. We will see in Section 4 that the process resulting from the (1+1) EA on noisy OneMax for small noise strength p = O(log(n)/n) has this form. 3.2 Lower Bound Techniques If the probabilities pi,i−1 of the process p·,· are only a little bit larger than nec- essary for Lemma 1, i. e. by a non-constant factor α(n), the expected runtime of the process is super-polynomially. To be more exact, the runtime is polynomially only with super-polynomially small probability o(1/poly(n)), i. e. smaller than 1/q(n) for any polynomial q (see also [Dro02] and [Dro03]): Lemma 3. Let p ·,· be a Markov process on {0, . . . , n}. If for a function α : N+ → R with α(n) n→∞ −→ ∞ and α(n) ≤ n/ log(n) the conditions 1. ∀ i ≥ 0, d = ω(log(n)) : pi,i+d = o(1/poly(n)), 2. ∀ i ≥ n − α(n) log(n), d = ω(1) : pi,i+d = o(1/poly(n)), and 3. ∃ c+, c− ∈ R+ : ∀ i ≥ n − α(n) log(n) : n−i ∑ j=1 pi,i+j ≤ c+ · n − i n and pi,i−1 ≥ c− · α(n) · log(n) n · i n hold then for all i ≤ n − α(n) log(n) the probability that Ti,n is polynomial is super-polynomially small, i. e. o(1/poly(n)). Proof. Let It ∈ {0, . . . , n} be the state of the process at time step t. Since for all i the probability of a direct step from i to i + d with d = ω(log(n)) is super- polynomially small, we assume that only improving steps by O(log(n)) happen. Hence, regardless of the initial state i ≤ n − α(n) log(n), the process reaches state n only via a state between n − α(n) log(n) and n − α(n) log(n)/2. Since a constant factor does not matter for our definition of α(n), for an arbitrary constant a ∈ ]0, 1[ there have to be time steps t 1 < t 2 , such that It ∈ {n − α(n)a log(n), . . . , n−1} for all t ∈ {t 1 , . . . , t 2 −1}, and It 2 = n (a similar technique is used in [RRS95]). We show that it is super-polynomially unlikely to “bridge the gap” from state n − α(n)a log(n) to n in polynomially many steps. Hence, we can assume that i ≥ n − α(n)a log(n) and that no improvements by more than α(n)d for a constant d ∈ ]0, 1[ do happen. Since (n − α(n)a log(n))/n converges to 1, we bound pi,i−1 and ∑n−i d=1 pi,i+d for all n larger than a constant n 0 , i. e. n large enough: pi,i−1 ≥ c− 2 · α(n) log(n) n and n−i ∑ d=1 pi,i+d ≤ c + · α(n)a log(n) n . Therefore, the probability of an improving step under the condition that the step changes the state (i. e. the step is effective) is at most c+α(n)a log(n)/n c+α(n)a log(n)/n + c−α(n) log(n)/(2n) ≤ 2c+ c− · 1 α(n)1−a . Hence, the expected number of improving steps during t effective steps is at most (2c+/c−) · t/α(n)1−a. However, to bridge the gap of size α(n)a log(n) in t effective steps consisting of t+ improving and t− decreasing steps, we must have α(n)d · t+ − t− ≥ α(n)a · log(n) ⇐⇒ t+ ≥ α(n)a log(n) + t α(n)d + 1 . The last term is at least t/α(n)c+d for every constant c > 0 and n large enough. If we choose a, c, and d with c + d < 1 − a, the number of improving steps necessary to reach state n is by a non-constant factor larger than its expected number. For n large enough, this factor is at least 2. By Chernoff bounds (see [MR95]) the probability that the sum of binary random variables is by a constant factor 1 + δ larger than its expected value µ is upper bounded by (we use δ = 1 and µ = (2c+/c−)α(n)a−1t) ( exp(δ) (1 + δ)1+δ )µ = ( exp(1) 4 ) (2c+/c−)α(n)a−1t . If t is small, this bound is not super-polynomially small. But since the process must bridge a gap of size α(n)a log(n) and any improvement by more than α(n)d is excluded, the number t of effective steps must be at least α(n)a−d log(n). Hence, the above bound is at most (exp(1)/4)(2c +/c−)α(n)2a−d−1 log(n), which is super-polynomially small if 2a− d− 1 > 0. Since a = 2/3, c = 1/7, and d = 1/7 fulfill 2a−d−1 > 0 and c+d < 1−a, the probability of every phase of polynomial length reaching the optimum is super-polynomially small. unionsq If the noise strength is too large, this result cannot be applied to lower bound the (1+1) EA on noisy OneMax. In this case a step decreasing the state is by a constant factor more likely than a step increasing it if the process is close to the optimal state n. If furthermore an improvement by d is by a non-constant factor more likely than an improvement by d + 1, we can apply the following result: Lemma 4. Let p ·,· be a Markov process on {0, . . . , n} such that 1. ∀ i ≥ 0, d = ω(log(n)) : pi,i+d = o(1/poly(n)), 2. ∃ δ, ε, γ > 0: ∀ i ≥ n − nδ, d ∈ {1, . . . , n − i} : i ∑ j=1 pi,i−j ≥ (1 + ε) · n−i ∑ j=1 pi,i+j and pi,i+d pi,i+d+1 ≥ nγ . Then for all i ≤ n − nδ the probability of Ti,n being polynomially is super- polynomially small. Proof. Analogously to the proof of Lemma 3, we can assume that to reach the target state n, there has to be a phase ranging from time steps t 1 to t 2 , such that It 2 = n and for all t ∈ {t 1 , . . . , t 2 − 1} we have It ∈ { n − nδ , . . . , n − 1} . We show that every phase of polynomial length is super-polynomially unlikely. Let Xt ∈ {−( n − nδ ), . . . , nδ} be the change of state in time step t, i. e. P (Xt = d) = pi,i+d if the process is in state i at time step t. The phase can only be successful if Xt 1 + · · · + Xt 2 = nδ, which will be shown to be exponentially unlikely. Since Xt < 0 is at least by a factor 1 + ε more likely than Xt > 0, we make the process only faster if we assume that P (Xt < 0) = (1 + ε) ·P (Xt > 0) and that every step decreasing the number of ones decreases it by exactly one. Since pi,i+d/pi,i+d+1 is at least nγ for d > 0, we replace the step size d of an improving step by the value of a geometrically distributed random variable Y with success probability 1−n−γ : this leads to a stochastically dominating process because P (Y = d)/P (Y = d+1) is exactly nγ . All in all, we have replaced every step of p ·,· in the considered phase by the value of a {−1, . . . , nγ}-valued random variable X ′ defined by P (X ′ = d) = { 1+ε 2+ε if d = −1 1 2+ε · (n −γ)d−1 · (1 − n−γ) if d ≥ 1 (Note that X ′ < 0 is by a factor 1+ ε more likely than X ′ > 0.) Hence, the ran- dom variable St(n) = X ′ 1 + · · ·+X ′t(n) stochastically dominates the improvement of the process p ·,· during a phase of length t(n) starting from a state at least n−nδ . As the expected value of Y is 1/(1−n−γ), for n large enough we have E(X ′) = − 1 + ε 2 + ε + 1 2 + ε · 1 1 − n−γ = − 1 + ε 2 + ε + 1 2 + ε · nγ nγ − 1 ≤ − ε/2 2 + ε . Hence, the expected value of St(n) is for n large enough and a constant ε′ > 0 at most −t(n)ε′. In order for a phase of length t(n) to be successful St(n) has to be at least nγ . We show that even P (St(n) > 0) is super-polynomially unlikely for all possible phase lengths t(n). To estimate P (St(n) > 0) we cannot use Chernoff bounds, since St(n) is no sum of {0, 1}-valued, but {−1, . . . , nγ}-valued random variables. In this situation we use Hoeffding’s inequality ([Hoe63]), a generalization of Chernoff bounds, stating in our notation P ( St(n) − E(St(n)) ≥ t(n)ε ) ≤ exp ( −2t(n)ε2 z2 ) where z is the number of different values of X ′i. Because of Condition 1 we can assume that no improvements by more than log(n)2 − 2 happen implying z = log(n)2 and t(n) ≥ nγ/ log(n)2. Therefore, Hoeffding’s inequality gives us an exponentially small upper bound on the probability, that a phase of polynomial length is successful. unionsq 4 Application of the Techniques for the (1+1) EA on Noisy OneMax Now we apply the techniques presented in the previous section to determine the maximal noise strength p under which the (1+1) EA is still able to optimize OneMax with single-bit noise in expected polynomial time. Hence, let Muti ∈ {−i, . . . , n − i} be the random variable denoting the change of the number of ones after a mutation of a search point x with OneMax(x) = i. Depending on the value d = 0 of Mut i we determine if and when this mutation leads to a change of the number of ones of the actual individual by d: – If Mut i = d with d ≥ 2, the mutation will be accepted regardless of the noisy evaluations. Hence, for all i ∈ {0, . . . , n − 2} and d ∈ {2, . . . , n − i}: pi,i+d = P (Mut i = d). – If Mut i = 1, the mutation will be accepted except in case that the noisy evaluation of the parent increases its value while the noisy evaluation of the child decreases its value. Hence, for all i ∈ {0, . . . , n − 1}: pi,i+1 = P (Mut i = 1) · ( 1 − p n − i n · p i + 1 n ) . (1) – If Mut i = −1, the mutation will be accepted if and only if one of the three following cases happens: • the noisy evaluation of the parent decreases its value and the evaluation of the child is not disturbed by noise, • the noisy evaluation of the parent decreases its value and the noisy eval- uation of the child increases it value, or • the evaluation of the parent is not changed by noise and the noisy eval- uation of the child increases its value. Hence, for all i ∈ {1, . . . , n}: pi,i−1 = P (Mut i = −1) · ( p i n ( 1 − p + p n − i + 1 n ) + (1 − p)p n − i + 1 n ) . (2) – If Mut i = −2, the mutation will be accepted only if the noisy evaluation of the parent decreases its value and the noisy evaluation of the child increases its value. Hence, for all i ∈ {2, . . . , n}: pi,i−2 = P (Mut i = −2) · ( p i n p n − i + 2 n ) . (3) As every mutation decreasing the number of ones by at least two is not accepted, pi,i−d = 0 for all d ≥ 2. It is obvious that the probability P (Mut i = d) is essential when analyzing the (1+1) EA on noisy OneMax. We can easily show, that this probability is O(((n − i)/n)d) for d > 0 and O((i/n)d) for d < 0: d > 0 : P (Mut i = d) = n−i ∑ k=d ( n − i k ) · ( i k − d ) · ( 1 n ) 2k−d ( 1 − 1 n )n−2k+d ≤ ( n − i n )d ( 1 − 1 n )n−d 1 d! + n−i ∑ k=d+1 ( n − i n )k 1 k! . (4) d < 0 : P (Mut i = d) = i ∑ k=d ( i k ) · ( n − i k − d ) · ( 1 n ) 2k−d ( 1 − 1 n )n−2k+d ≤ ( i n )d · ( 1 − 1 n )n−d · 1 d! + i ∑ k=d+1 ( i n )k · 1 k! . (5) These bounds are asymptotically tight as long as d is a constant: d > 0 : P (Mut i = d) = n−i ∑ k=d ( n − i k ) · ( i k − d ) · ( 1 n ) 2k−d ( 1 − 1 n )n−2k+d ≥ ( n − i d )( 1 n )d · exp(−1) ≥ ( n − i n )d · exp(−1) dd . (6) d < 0 : P (Mut i = d) = i ∑ k=d ( i k ) · ( n − i k − d ) · ( 1 n ) 2k−d ( 1 − 1 n )n−2k+d ≥ ( i d )( 1 n )d · exp(−1) ≥ ( i n )d · exp(−1) dd . (7) 4.1 A Polynomial Upper Bound for p = O(log(n)/n) First, we want to upper bound the expected runtime of the (1+1) EA on noisy OneMax for p = O(log(n)/n). Therefore, we “delete” all transitions going from i to i + d with d ≥ 2 (making the process only slower) and bound the remaining transition probabilities in the following way for n large enough: pi,i+1 (1),(6) ≥ n − i n · exp(−1) · ( 1 − p2 (n − i)(i + 1) n2 ) ≥ n − i n · 3 exp(−1) 4 , pi,i−1 (2) ≤ i n · 2p and pi,i−2 (3) ≤ ( i 2 )( 1 n ) 2 p2 = i(i − 1) n2 · p2 2 . To upper bound p ·,·, we replace it by the following {−1, 0, +1}-process p˜·,·: p˜i,i+1 = n − i n · 3 exp(−1) 4 and p˜i,i−1 := i n · 4p. It is obvious that p˜i,i+1 ≥ pi,i+1. Since 3 exp(−1)/4 < 1/2 and 4p converges to 0, p˜i,i is at least 1/2 for n large enough. Hence, p˜i,i−1 · p˜i−1,i−1 ≥ pi,i−1 and p˜i,i−1 · p˜i−1,i−2 · p˜i−2,i−2 ≥ pi,i−2 hold, implying that Lemma 2 guarantees that p˜ ·,· upper bounds p·,·. Finally, applying Lemma 1 to p˜·,· gives us: Theorem 1. The expected runtime TOneMax,N1p of the (1+1) EA on OneMax with noise function N1p is polynomial for all p = O(log(n)/n). 4.2 A Super-polynomial Lower Bound for p = ω(log(n)/n) Let us now look at the case that p grows asymptotically faster than log(n)/n, i. e. p ≥ γ(n) log(n)/n for some function γ : N+ → R with γ(n) n→∞ −→ ∞. In the following, we want to show that the conditions of Lemma 3 hold for p ·,· with α(n) := min{log(n), γ(n)}, if p ≤ 1 − α(n) log(n)/n. It is obvious that α(n) n→∞ −→ ∞. Condition 1 holds, since the probabil- ity P (Mut i = d) is at most 1/d!, which is super-polynomially small for d = ω(log(n)). Furthermore, pi,i+d = o(1/poly(n)) for all i ≥ n − log(n)α(n) and non-constant d because of the upper bound (4) for P (Mut i = d). Hence, condi- tion 2 holds. We can upper bound ∑n−i d=1 pi,i+d (using (4)) by n−i ∑ d=1 n−i ∑ k=d ( n − i n )k · 1 k! = n−i ∑ k=1 ( n − i n )k · 1 (k − 1)! = O ( n − i n ) . In turn, we lower bound pi,i−1 for i ≥ n − α(n) log(n) and n large enough by: pi,i−1 (2),(7) ≥ i n · exp(−1) · p · ( i n ( 1 − p + p n − i + 1 n ) + (1 − p) n − i + 1 n ) ≥ i n · exp(−1) · 1 4 · α(n) log(n) n . The last inequality holds due to the following case inspection: if p ≤ 1/2, we use p ≥ α(n) log(n)/n, i/n ≥ 1/2, 1−p ≥ 1/2 and replace all other terms by zero. If 1/2 < p ≤ 1 − α(n) log(n)/n, we use p > 1/2, i/n ≥ 1/2, 1 − p ≥ α(n) log(n)/n, and replace all other terms by zero. Hence, all conditions of Lemma 3 hold. Let us now look at the case p > 1 − α(n) log(n)/n. We cannot lower bound pi,i−1 by Θ(α(n)(log(n)/n)(i/n)) anymore: e. g. for p = 1 decreasing the number of ones implies that the evaluation of the parent decreases its fitness, while the evaluation of the child increases its fitness, i. e. pi,i−1 = P (Mut i = −1) · (i/n) · (n − i + 1)/n, which is smaller than α(n)(log(n)/n)(i/n) for i close to n. So, we use Lemma 4 to lower bound the runtime for p > 1 − α(n) log(n)/n. First, we show that decreasing the number of ones is by a constant factor more likely than increasing it, if i ≥ n − n1/3: n−i ∑ d=1 pi,i+d ≤ n−i ∑ d=1 P (Mut i = d) (4) ≤ n − i n · ( 1 − 1 n )n−1 + n−i ∑ d=2 O ( ( n − i n )d ) = n − i n · ( 1 − 1 n )n−1 + O ( 1 n4/3 ) , pi,i−1 (2) ≥ P (Mut i = −1) · p2 · i n n − i n (7) ≥ ( i n · p ) 2 ( 1 − 1 n )n−1 · n − i n , pi,i−2 (3) ≥ P (Mut i = −2) · p 2 · i n n − i n (7) ≥ i2(i − 1) 2n3 · p2 · ( 1 − 1 n )n−2 · n − i n . Hence, we have ∑n−i d=1 pi,i+d pi,i−1 + pi,i−2 ≤ n−i n · ( 1 − 1n )n−1 + O ( 1 n4/3 ) ( i n · p ) 2 ( 1 − 1n )n−1 · n−i n + i2(i−1) 2n3 · p 2 · ( 1 − 1n )n−2 · n−i n = 1 + O( 1n1/3 ) ( i n · p ) 2 + i 2 (i−1) 2n3 · p 2 · ( 1 − 1n ) −1 . Since the denominator converges to 3/2 and the nominator to 1, the last term is at least 4/3 for n large enough. Hence, an impairment is by a constant more likely than an improvement for all i ≥ n − n1/3 and n large enough. To apply Lemma 4 we must show additionally that pi+d/pi+d+1 is at least nγ for all i ≥ n − n1/3, d ≥ 1, and a constant γ > 0: pi+d pi+d+1 (1),(4),(6) ≥ Ω ( (n−i n )d ) O ( (n−i n )d+1 ) = Ω ( n n − i ) = Ω(n2/3). Hence, Lemma 4 tells us that the runtime of the (1+1) EA on noisy OneMax for p ≥ 1−α(n) log(n) is polynomial only with super-polynomially small probability: Theorem 2. The runtime TOneMax,N1p of the (1+1) EA on OneMax with noise function N1p is polynomial with super-polynomially small probability for all p = ω(log(n)/n). 5 Conclusions We have analyzed the runtime of the (1+1) EA on a variant of OneMax where every evaluation is subject to noise with probability p and noise changes one uniformly chosen bit. In this case, the (1+1) EA optimizes OneMax with high probability in polynomial time if and only if p is O(log(n)/n). This is the first rigorous analysis of an EA for a noisy fitness function without any assumptions. Moreover, the paper presents the methods used to prove this result in a general and modular way in order to help understanding the ideas and to simplify ap- plication to other EAs and/or noise models. For instance, generalizing the result for a noise model where every bit is changed by noise with some probability p is an obvious goal for future research. Acknowledgements Hereby I thank Jens Ja¨gersku¨pper, Tobias Storch, and Carsten Witt for valuable advice and discussions. References [AB03] Dirk V. Arnold and H.-G. Beyer. A comparison of evolution strategies with other direct search methods in the presence of noise. Computational Opti- mization and Applications, 24:135 – 159, 2003. [DJW00] S. Droste, Th. Jansen, and I. Wegener. Dynamic parameter control in simple evolutionary algorithms. In Proceedings of FOGA 2000, pages 275–294, 2000. [DJW02] S. Droste, Th. Jansen, and I. Wegener. On the analysis of the (1 + 1) EA. Theoretical Computer Science, (276):51–81, 2002. [Dro02] S. Droste. Analysis of the (1+1) EA for a dynamically changing OneMax- variant. In Proceedings of CEC 2002, pages 55–60, 2002. [Dro03] S. Droste. Analysis of the (1+1) EA for a dynamically bit-wise changing OneMax-variant. In Proceedings of GECCO 2003, pages 909–921, 2003. [FM92] S. Forrest and M. Mitchell. Relative building block fitness and the building block hypothesis. In Proceedings of FOGA 1992, pages 198–226, 1992. [Hoe63] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13–30, 1963. [JW99] Th. Jansen and I. Wegener. On the analysis of evolutionary algorithms – a proof that crossover really can help. In Proceedings of ESA 1999, pages 184–193, 1999. [MR95] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995. [RRS95] Y. Rabani, Y. Rabinovich, and A. Sinclair. A computational view of popula- tion genetics. In Proceedings of STOC 1995, pages 83–92, 1995. [Rud97] G. Rudolph. Convergence Properties of Evolutionary Algorithms. Verlag Dr. Kovacˇ, 1997. [WW03] I. Wegener and C. Witt. On the optimization of monotone polynomials by the (1+1) EA and randomized local search. In Proceedings of GECCO 2003, pages 622–633, 2003.