# Why Evolutionary Algorithms Cannot Generate Specified Complexity

MetaNews

November 1, 1999

In this piece I want to develop this last point further. I want to sketch a fuller justification for this failure of evolutionary algorithms to generate specified complexity. Note that I'm not going to provide a precise mathematical proof here. I'm simply going to sketch the general mathematical considerations that lead me to regard the attempt to generate specified complexity via evolutionary algorithms as a fool's errand. Indeed, I'm going to argue that the failure of evolutionary algorithms to generate actual specified complexity counts decisively against their ability as general purpose problem solvers and by implication raises substantive doubts about the power of the Darwinian mechanism generally to produce the marvels increasingly attributed to them. Note that in this piece I am keeping the body of the text non-technical. Nonetheless, I include a set of technical appendixes for those who want to see some of the relevant mathematics.

In general terms, the problem of generating specified complexity via an evolutionary algorithm can be conceived as follows. We are given a phase space of possible solutions to a problem and a fitness function over that phase space. Our task is to optimize this fitness function by finding a point in the phase space that attains a certain level of fitness. Think of it this way: The phase space is a vast plane, the fitness function is a vast hollowed-out mountain-range over the plane (complete with low-lying foothills and incredibly high peaks). The task of an evolutionary algorithm is by moving around in the plane to get to some point under the mountain-range where it attains at least a certain height (e.g., 10,000 feet). The collection of all such places on the plane where the mountain range attains at least that height (here 10,000 feet) we will call the **target**. Thus the job of the evolutionary algorithm is by navigating the phase space to find its way into the target (see Appendix 1 below).

Now, the phase space (which we are picturing as a giant plane) usually comes with some additional topological structure, typically given by a metric or distance function (see Appendix 2). This topological structure tells us how points in the phase space are related geometrically to nearby points. Also, even though the phase space is huge, it tends to be finite (strictly finite for problems represented on computer and topologically finite, or what topologists call "compact," in general). Moreover, such spaces typically come with a uniform probability that is adapted to the topology of the phase space (see Appendix 3).

Basically this means that if you get out your tape measure and measure off a three by five foot area in one part of the phase space, the uniform probability will assign it the same probability as a three by five foot area in another portion of the phase space. All the spaces to which I've seen evolutionary algorithms applied do indeed satisfy these two conditions of having a finite topological structure (i.e., they are compact) and possessing a uniform probability. Moreover, this uniform probability is what typically gets used to estimate the complexity/improbability of the target (i.e., the area of the phase space under the mountain range where it attains a certain requisite level--e.g., 10,000 feet).

For instance, in Dawkins's METHINKS*IT*IS*LIKE*A*WEASEL example, the phase space consists of strings of upper case Roman letters and spaces (represented by asterisks) of length 28. A uniform probability on this space assigns equal probability to each of these sequences--approximately 1 in 10^40. It's this improbability that corresponds to the complexity of the target sequence and with respect to which this target sequence constitutes an instance of specified complexity.

In general, given a phase space with a target sitting under those places where the mountain range attains at least a certain level (e.g., 10,000 feet), the (uniform) probability of randomly choosing a point from the phase space and landing in the target will be very small. In Dawkins's example, the target equals the character string METHINKS*IT*IS*LIKE*A*WEASEL and the improbability is 1 in 10^40. For non-toy examples the improbability is typically much less than my universal probability bound of 1 in 10^150 that I justify in The Design Inference (Cambridge, 1998; cf. section 6.5). Indeed, if the probability of the target were not small, a random search through the phase space would suffice to find a point in the target, and there would be no need to construct an evolutionary algorithm to find it.

We therefore suppose that the target is just a tiny portion of the whole phase space; or, in slightly more technical language, the (uniform) probability of the target in relation to the phase space as a whole is exceedingly small. What's more, the target, in virtue of its explicit identification, is specified (certainly this is the case in Dawkins's example where the target includes but one point and coincides with the character string METHINKS*IT*IS*LIKE*A*WEASEL). Thus it would seem that to find a point in the target would be to generate specified complexity.

But let's look deeper. Consider an evolutionary algorithm that does in fact find the target. An evolutionary algorithm can be conceived as a stochastic process that moves around the phase space some finite number of times (see Appendix 4). Let's call the evolutionary algorithm E. The evolutionary algorithm starts at some point E(0) in the phase space (usually chosen at random). Then it moves to E(1). Then to E(2). Then to E(3). Etc. For E successfully to find the target (i.e., to find a point under the mountain range where it attains at least a certain level--e.g., 10,000 feet) then means that within a manageable number of steps n, E is very likely to land in the target--i.e., some one of E(0), E(1), ..., E(n) is likely to land in the target (see Appendix 5). Simply put, the algorithm E has to get us into the target with high probability and in a relatively short number of steps. In the Dawkins example, E(n) rapidly converged to METHINKS*IT*IS*LIKE*A*WEASEL for n around 40.

An evolutionary algorithm needs to be contrasted with pure random sampling. Pure random sampling treats the phase space as a giant urn from which we draw items at random according to the uniform probability. In that case, a random sample from M of size k will contain a point in the target with probability better than 1/2 provided that k is around the reciprocal of the (uniform) probability of the target. Since we are assuming that the probability of the target is less than my universal probability bound of 1 in 10^150 given earlier, it follows that k will need to be at least 10^150. This number is enormous and far exceeds the number of computations conceivable for any traditional computer. Moreover, it doesn't seem that quantum computation is going to render this number tractable either since the points in phase space need to be made explicit in any random sampling scheme (implying decoherence and thus preventing us from exploiting quantum superposition).

Let's now return to the evolutionary algorithm E. We're going to allow ourselves a certain number of steps, call it m, for E to land in the target. Clearly m is going to have to be much less than 10^150 if we're going to program E on a computer and have any hope of E landing in the target. With m fixed, we can determine the probability that E will land in any subset of phase space in m steps (see Appendix 6). For instance, in the Dawkins example, for m = 100 and the target sequence METHINKS*IT*IS*LIKE*A*WEASEL and E the cumulative selection algorithm Dawkins constructed, the probability of E attaining the target in m = 100 steps is approximately 1.

What this means is that even though with respect to the uniform probability on the phase space the target has exceedingly small probability, the probability for the evolutionary algorithm E to get into the target in m steps is no longer small. And since complexity and improbability are for the purposes of specified complexity parallel notions, this means that even though the target is complex and specified with respect to the uniform probability on the phase space, it remains specified but is no longer complex with respect to the probability induced by evolutionary algorithm E.

Does this mean that the evolutionary algorithm has in fact generated complex specified information, but that in referring to a loss of complexity with respect to E I'm simply engaging in some fancy redefinitions to avoid this conclusion? I don't think so. Remember that we are interested in the **generation** of specified complexity and not in its reshuffling.

To see what's at stake here, we need to be clear about a restriction that needs to be placed on E if it is to count as a genuine evolutionary algorithm (i.e., a legitimate correlative of the Darwinian mutation-selection mechanism). It is not, for instance, legitimate for E to be able to survey the mountain range (i.e., fitness landscape), see where in the phase space it attains a global maximum, and then head in that direction. That would be teleology. No, E must be able to navigate its way to the target either by randomly choosing points from the phase space or by using those as starting points and then selecting other points in the phase space based **solely** on the topology of the phase space and without recourse to the fitness function, except to evaluate the fitness function at individual points of the phase space already traversed by E. In other words, E must move around the phase space only on the basis of its topology and the elevation of the fitness function at points in the phase space already traversed by E.

We can think of it this way: E(0), the first point selected by the evolutionary algorithm, is selected randomly from the phase space (i.e., with respect to the uniform probability on the phase space); the fitness function can then be evaluated at E(0) (i.e., we can determine how high the mountain range is at that point E(0) in phase space); given only E(0), the fitness function's height at E(0), and the topology of phase space, E selects E(1); then the height of the fitness function can be evaluated at E(1); then E(2) is selected based only on E(0), E(1), the height of the fitness function at these two points, and the topology of M; etc. In other words: The evolutionary algorithm E must be independent of the fitness function except for those points that E has hitherto traversed and then only insofar as the fitness function is evaluated at those points.

Certainly this means that the evolutionary algorithm E is highly constrained in the use it can make of the fitness function. But there's more. It means that the success of E in hitting the target depends crucially on the structure of the fitness function. If, for instance, the fitness function is totally flat and close to the ground whenever it is outside the target, then it fails to discriminate between points outside the target and so cannot be any help guiding an evolutionary algorithm into the target. For such a fitness function, the probability of the evolutionary algorithm landing in the target is no better than the probability of pure random sampling landing in the target, which as we know is inadequate to get us there (see Appendix 7).

But the problem is even worse. It follows by a combinatorial argument that for any partition of the phase space into pieces none of which has probability more than the probability of the target (which by assumption is less than 1 in 10^150), for the vast majority of these partition elements the probability of the evolutionary algorithm E entering them is going to be no better than pure random sampling. It follows that the vast majority of fitness functions on the phase space that coincide with our original fitness function on the target but reshuffle the function on the partition elements outside the target will not land the evolutionary algorithm in the target (this result is essentially a corollary of the No Free Lunch theorems by Wolpert and Macready). Simply put, the vast majority of fitness functions will not guide E into the target even if they coincide with our original fitness function on the target (see Appendix 8).

This result refutes the claim that evolutionary algorithms can generate specified complexity, for it means that they can yield specified complexity only if such algorithms along with their fitness functions are carefully adapted to the complex specified targets they are meant to attain. In other words, all the specified complexity we get out of an evolutionary algorithm has first to be put into the construction of the evolutionary algorithm and into the fitness function that guides the algorithm. Evolutionary algorithms therefore do not generate or create specified complexity, but merely harness already existing specified complexity. Like a bump under a rug, the specified complexity problem has been shifted around, but it has not been eliminated.

I have omitted many details. I have also omitted some complications which to my mind make the problem of generating specified complexity via evolutionary algorithms even more problematic (in nature, for instance, the fitness function will not stay fixed but vary over time). Some of the details are treated in chapter 6 of my recently released

*Intelligent Design: The Bridge Between Science & Theology*(InterVarsity). A full treatment will have to await a book I'm currently writing (

*Redesigning Science: Why Specified Complexity Is a Reliable Empirical Marker of Actual Design*). But I want to make these preliminary results available because the misconception that one can purchase specified complexity on the cheap is widespread and ill-conceived.

The only known generator of specified complexity that we know is intelligence. Sans intelligence, a process that yields specified complexity merely converts already existing specified complexity. We are seeing a similar phenomenon with inflationary cosmologies, which attempt to wash out cosmological fine-tuning but invariably seem to smuggle it back in. Smuggling in specified complexity is not the same as **generating** specified complexity. I challenge the biological community to take these results seriously, and reevaluate how it understands the generation of specified complexity.

+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_

BRIEF TECHNICAL APPENDIXES

== APPENDIX 1== Let's call the phase space M and the fitness function f. Let's say f maps into the non-negative reals and that by optimization here we mean maximizing f. Thus we'll want to find a point in that subset of M where f attains at least a certain level a* (a* being a positive real number). Let's refer to this subset as the target T. This subset is represented mathematically as T = {x in M | f(x) U a*}, i.e., the set of all x in M where f(x) is at least a*. Our task, then, is to find some point q in T, i.e., a point q in the phase space satisfying f(q) U a*.

== APPENDIX 2== A metric or distance function d on M is a non-negative real-valued function on the cartesian product of M satisfying the following properties: (1) for all x in M, d(x,x) = 0 (2) for all x and y in M, d(x,y) = d(y,x) [symmetry] (3) for all x, y, and z in M, d(x,z) U d(x,y) + d(y,z) [triangle inequality]

==APPENDIX 3== Ordinarily the phase space M satisfies a uniformizability condition, i.e., M possesses a uniform probability measure U that is adapted to the metric on M so that geometrically congruent pieces of M get assigned identical probabilities (for a general account of this see my article "Uniform Probability," Journal of Theoretical Probability, Vol. 3, No. 4, 1990, pp. 611-626).

== APPENDIX 4== An evolutionary algorithm E can be conceived as a stochastic process from the natural numbers N = {0, 1, 2, ...} into the phase space M, but which will truncated after a certain point (we don't allow it to circulate in the phase space endlessly). To call E a stochastic process on the natural numbers N is to say that E is a measurable function from the cartesian product of NxS into M where S is a probability space with some probability measure P.

== APPENDIX 5== More formally, E is a stochastic process on NxS where S has probability P. Thus for E successfully to optimize f with respect to M means that within a manageable number of steps n, the set {E(0,s), E(1,s), ..., E(n,s)} contains a point q in T = {x in M | f(x) U a*} with high probability for s in S, i.e., P({s in S | {E(0,s), E(1,s), ..., E(n,s)} intersects T}) is large.

== APPENDIX 6== E induces a probability assignment E* on M where for each subset A of M (technically, "Borel subset"), E*(A) = P({s in S | {E(0,s), E(1,s), ... E(m,s)} intersects A}). Note that E* is not a probability measure since additivity fails

== APPENDIX 7== For such a fitness function f, for E to land in T = {x in M | f(x) U a*} in the requisite m number of steps, cannot exceed m times the uniform probability of the target, i.e., the number of points in phase space traversed by E times the uniform probability of the target. Since that uniform probability is no greater than 1 in 10^150 and since m must be vastly smaller than 10^150 for E to be computationally tractable, it follows that for such a fitness landscape, evolutionary algorithms are no better than taking purely random samples of size m.

== APPENDIX 8== Let A(1), A(2), ..., A(r) be a partition of M (i.e., a mutually exclusive and exhaustive compartmentalization of M) such that for each element of this partition A(i), U(A(i)) U U(T) (< 1 in 10^150). Also assume that one of the A(i)'s is T. Then r U 10^150 (the sum of the U(A(i))'s for i going from 1 to r is unity). It follows that for any s in S, {E(0,s), E(1,s), ... E(m,s)} intersects at most m+1 of the partition elements. Since m is by assumption vastly smaller than 10^150 (it has to be if the evolutionary algorithm is going to be computationally tractable), permuting f on the partition elements other than the target T = {x in M | f(x) U a*} (thus leaving f fixed on the target) indicates that only a proportion of around m/r of these permutations will be likely to land E in T (m/r is an incredibly small number). On the other hand, the vast majority of these permutations (i.e., a proportion of 1-m/r) will land in T with probability on the order m x U(T) (which remains an incredibly small probability).

This is another posting from the Meta-List. Copyright 1997, 1998, 1999. William Grassie.