Intelligent Design as a Theory of Information

William A. Dembski
February 20, 1997
Originally published at Access Research Network

Abstract: For the scientific community intelligent design represents creationism’s latest grasp at scientific legitimacy. Accordingly, intelligent design is viewed as yet another ill-conceived attempt by creationists to straightjacket science within a religious ideology. But in fact intelligent design can be formulated as a scientific theory having empirical consequences and devoid of religious commitments. Intelligent design can be unpacked as a theory of information. Within such a theory, information becomes a reliable indicator of design as well as a proper object for scientific investigation. In my paper I shall (1) show how information can be reliably detected and measured, and (2) formulate a conservation law that governs the origin and flow of information. My broad conclusion is that information is not reducible to natural causes, and that the origin of information is best sought in intelligent causes. Intelligent design thereby becomes a theory for detecting and measuring information, explaining its origin, and tracing its flow.

1. Information

In Steps Towards Life Manfred Eigen (1992, p. 12) identifies what he regards as the central problem facing origins-of-life research: “Our task is to find an algorithm, a natural law that leads to the origin of information.” Eigen is only half right. To determine how life began, it is indeed necessary to understand the origin of information. Even so, neither algorithms nor natural laws are capable of producing information. The great myth of modern evolutionary biology is that information can be gotten on the cheap without recourse to intelligence. It is this myth I seek to dispel, but to do so I shall need to give an account of information. No one disputes that there is such a thing as information. As Keith Devlin (1991, p. 1) remarks, “Our very lives depend upon it, upon its gathering, storage, manipulation, transmission, security, and so on. Huge amounts of money change hands in exchange for information. People talk about it all the time. Lives are lost in its pursuit. Vast commercial empires are created in order to manufacture equipment to handle it.” But what exactly is information? The burden of this paper is to answer this question, presenting an account of information that is relevant to biology.

What then is information? The fundamental intuition underlying information is not, as is sometimes thought, the transmission of signals across a communication channel, but rather, the actualization of one possibility to the exclusion of others. As Fred Dretske (1981, p. 4) puts it, “Information theory identifies the amount of information associated with, or generated by, the occurrence of an event (or the realization of a state of affairs) with the reduction in uncertainty, the elimination of possibilities, represented by that event or state of affairs.” To be sure, whenever signals are transmitted across a communication channel, one possibility is actualized to the exclusion of others, namely, the signal that was transmitted to the exclusion of those that weren’t. But this is only a special case. Information in the first instance presupposes not some medium of communication, but contingency. Robert Stalnaker (1984, p. 85) makes this point clearly: “Content requires contingency. To learn something, to acquire information, is to rule out possibilities. To understand the information conveyed in a communication is to know what possibilities would be excluded by its truth.” For there to be information, there must be a multiplicity of distinct possibilities any one of which might happen. When one of these possibilities does happen and the others are ruled out, information becomes actualized. Indeed, information in its most general sense can be defined as the actualization of one possibility to the exclusion of others (observe that this definition encompasses both syntactic and semantic information).

This way of defining information may seem counterintuitive since we often speak of the information inherent in possibilities that are never actualized. Thus we may speak of the information inherent in flipping one-hundred heads in a row with a fair coin even if this event never happens. There is no difficulty here. In counterfactual situations the definition of information needs to be applied counterfactually. Thus to consider the information inherent in flipping one-hundred heads in a row with a fair coin, we treat this event/possibility as though it were actualized. Information needs to be referenced not just to the actual world, but also cross-referenced with all possible worlds.

2. Complex Information

How does our definition of information apply to biology, and to science more generally? To render information a useful concept for science we need to do two things: first, show how to measure information; second, introduce a crucial distinction–the distinction between specified and unspecified information. First, let us show how to measure information. In measuring information it is not enough to count the number of possibilities that were excluded, and offer this number as the relevant measure of information. The problem is that a simple enumeration of excluded possibilities tells us nothing about how those possibilities were individuated in the first place. Consider, for instance, the following individuation of poker hands:

(i) A royal flush.
(ii) Everything else.

To learn that something other than a royal flush was dealt (i.e., possibility (ii)) is clearly to acquire less information than to learn that a royal flush was dealt (i.e., possibility (i)). Yet if our measure of information is simply an enumeration of excluded possibilities, the same numerical value must be assigned in both instances since in both instances a single possibility is excluded.

It follows, therefore, that how we measure information needs to be independent of whatever procedure we use to individuate the possibilities under consideration. And the way to do this is not simply to count possibilities, but to assign probabilities to these possibilities. For a thoroughly shuffled deck of cards, the probability of being dealt a royal flush (i.e., possibility (i)) is approximately .000002 whereas the probability of being dealt anything other than a royal flush (i.e., possibility (ii)) is approximately .999998. Probabilities by themselves, however, are not information measures. Although probabilities properly distinguish possibilities according to the information they contain, nonetheless probabilities remain an inconvenient way of measuring information. There are two reasons for this. First, the scaling and directionality of the numbers assigned by probabilities need to be recalibrated. We are clearly acquiring more information when we learn someone was dealt a royal flush than when we learn someone wasn’t dealt a royal flush. And yet the probability of being dealt a royal flush (i.e., .000002) is minuscule compared to the probability of being dealt something other than a royal flush (i.e., .999998). Smaller probabilities signify more information, not less.
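The two probabilities quoted above can be checked by direct counting. The following is a minimal sketch, assuming a standard 52-card deck and five-card hands (the text implies this but does not state it); the function name is illustrative only.

```python
from math import comb

def royal_flush_probability() -> float:
    """Probability that a random five-card hand is a royal flush.

    Assumption: a standard 52-card deck with four suits, so there is
    exactly one royal flush (10, J, Q, K, A) per suit.
    """
    total_hands = comb(52, 5)   # number of distinct five-card hands
    royal_flushes = 4           # one per suit
    return royal_flushes / total_hands

p_royal = royal_flush_probability()
p_other = 1.0 - p_royal

print(f"P(royal flush)   = {p_royal:.7f}")
print(f"P(anything else) = {p_other:.7f}")
```

Under these assumptions the exact value is about .0000015, which rounds to the .000002 used in the text.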

The second reason probabilities are inconvenient for measuring information is that they are multiplicative rather than additive. If I learn that Alice was dealt a royal flush playing poker at Caesar’s Palace and that Bob was dealt a royal flush playing poker at the Mirage, the probability that both Alice and Bob were dealt royal flushes is the product of the individual probabilities. Nonetheless, it is convenient for information to be measured additively so that the measure of information assigned to Alice and Bob jointly being dealt royal flushes equals the measure of information assigned to Alice being dealt a royal flush plus the measure of information assigned to Bob being dealt a royal flush.

Now there is an obvious way to transform probabilities which circumvents both these difficulties, and that is to apply a negative logarithm to the probabilities. Applying a negative logarithm assigns more information to lower probabilities and, because the logarithm of a product is the sum of the logarithms, transforms multiplicative probability measures into additive information measures. What’s more, in deference to communication theorists, it is customary to use the logarithm to the base 2. The rationale for this choice of logarithmic base is as follows. The most convenient way for communication theorists to measure information is in bits. Any message sent across a communication channel can be viewed as a string of 0’s and 1’s. For instance, the ASCII code, as commonly stored, uses strings of eight 0’s and 1’s to represent the characters on a typewriter, with whole words and sentences in turn represented as strings of such character strings. In like manner all communication may be reduced to the transmission of sequences of 0’s and 1’s. Given this reduction, the obvious way for communication theorists to measure information is in number of bits transmitted across a communication channel. And since the negative logarithm to the base 2 of a probability corresponds to the average number of bits needed to identify an event of that probability, the logarithm to the base 2 is the canonical logarithm for communication theorists. Thus we define the measure of information in an event of probability p as -log₂ p (see Shannon and Weaver, 1949, p. 32; Hamming, 1986; or indeed any mathematical introduction to information theory).

What about the additivity of this information measure? Recall the example of Alice being dealt a royal flush playing poker at Caesar’s Palace and Bob being dealt a royal flush playing poker at the Mirage. Let’s call the first event A and the second B. Since randomly dealt poker hands are probabilistically independent, the probability of A and B taken jointly equals the product of the probabilities of A and B taken individually. Symbolically, P(A&B) = P(A)×P(B). Given our logarithmic definition of information we therefore define the amount of information in an event E as I(E) =def -log₂ P(E). It then follows that P(A&B) = P(A)×P(B) if and only if I(A&B) = I(A)+I(B). Since in the example of Alice and Bob P(A) = P(B) = .000002, I(A) = I(B) ≈ 19, and I(A&B) = I(A)+I(B) ≈ 19 + 19 = 38. Thus the amount of information inherent in Alice and Bob jointly obtaining royal flushes is approximately 38 bits.
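The arithmetic in this example can be reproduced in a few lines. This is a sketch only: the helper name `information_bits` is an illustrative choice, and the probability used is the rounded value .000002 from the text rather than the exact poker probability.

```python
from math import isclose, log2

def information_bits(probability: float) -> float:
    """Return I(E) = -log2 P(E), the information in an event, in bits."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return -log2(probability)

p_a = p_b = 0.000002                   # rounded value taken from the text
i_a = information_bits(p_a)
i_b = information_bits(p_b)

# Independence: P(A&B) = P(A) * P(B), so the information should add.
i_joint = information_bits(p_a * p_b)

print(f"I(A)   = {i_a:.2f} bits")
print(f"I(B)   = {i_b:.2f} bits")
print(f"I(A&B) = {i_joint:.2f} bits")
print("additive:", isclose(i_joint, i_a + i_b))
```

With the rounded probability, each hand carries roughly 18.9 bits, which the text rounds to 19, and the joint event carries roughly 37.9 bits, which rounds to 38.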

Since lots of events are probabilistically independent, information measures exhibit lots of additivity. But since lots of events are also correlated, information measures exhibit lots of non-additivity as well. In the case of Alice and Bob, Alice being dealt a royal flush is probabilistically independent of Bob being dealt a royal flush, and so the amount of information in Alice and Bob both being dealt royal flushes equals the sum of the individual amounts of information. But consider now a different example. Alice and Bob together toss a coin five times. Alice observes the first four tosses but is distracted, and so misses the fifth toss. On the other hand, Bob misses the first toss, but observes the last four tosses. Let’s say the actual sequence of tosses is 11001 (1 = heads, 0 = tails). Thus Alice observes 1100* and Bob observes *1001. Let A denote the first observation, B the second. It follows that the amount of information in A&B is the amount of information in the completed sequence 11001, namely, 5 bits. On the other hand, the amount of information in A alone is the amount of information in the incomplete sequence 1100*, namely 4 bits. Similarly, the amount of information in B alone is the amount of information in the incomplete sequence *1001, also 4 bits. This time information doesn’t add up: 5 = I(A&B) ≠ I(A)+I(B) = 4+4 = 8.

Here A and B are correlated. Alice knows all but the last bit of information in the completed sequence 11001. Thus when Bob gives her the incomplete sequence *1001, all Alice really learns is the last bit in this sequence. Similarly, Bob knows all but the first bit of information in the completed sequence 11001. Thus when Alice gives him the incomplete sequence 1100*, all Bob really learns is the first bit in this sequence. What appears to be four bits of information actually ends up being only one bit of information once Alice and Bob factor in the prior information they possess about the completed sequence 11001. If we introduce the idea of conditional information, this is just to say that 5 = I(A&B) = I(A)+I(B|A) = 4+1. I(B|A), the conditional information of B given A, is the amount of information in Bob’s observation once Alice’s observation is taken into account. And this, as we just saw, is 1 bit.
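The coin-toss figures can be verified by brute-force enumeration. The sketch below assumes a fair coin, so that all 32 five-toss sequences are equally likely; the helper names are illustrative and not part of the original account.

```python
from itertools import product
from math import log2

# All 2**5 outcomes of five tosses, assumed equally likely (fair coin).
OUTCOMES = ["".join(bits) for bits in product("01", repeat=5)]

def matches(sequence: str, observation: str) -> bool:
    """True if the sequence agrees with the observation; '*' is an unseen toss."""
    return all(o in ("*", s) for s, o in zip(sequence, observation))

def probability(*observations: str) -> float:
    """Probability that a random sequence agrees with every observation given."""
    hits = [s for s in OUTCOMES if all(matches(s, o) for o in observations)]
    return len(hits) / len(OUTCOMES)

def information_bits(p: float) -> float:
    """I = -log2(p), in bits."""
    return -log2(p)

alice, bob = "1100*", "*1001"

i_a = information_bits(probability(alice))        # Alice's observation alone
i_b = information_bits(probability(bob))          # Bob's observation alone
i_ab = information_bits(probability(alice, bob))  # both observations together
# Conditional information uses P(B|A) = P(A&B) / P(A).
i_b_given_a = information_bits(probability(alice, bob) / probability(alice))

print(i_a, i_b, i_ab, i_b_given_a)
```

Here `i_a + i_b` overshoots `i_ab`, whereas `i_a + i_b_given_a` equals it, matching the 4+1 = 5 decomposition described above.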

I(B|A), like I(A&B), I(A), and I(B), can be represented as the negative logarithm to the base two of a probability, only this time the probability under the logarithm is a conditional as opposed to an unconditional probability. By definition I(B|A) =def -log₂ P(B|A), where P(B|A) is the conditional probability of B given A. But since P(B|A) =def P(A&B)/P(A), and since the logarithm of a quotient is the difference of the logarithms, log₂ P(B|A) = log₂ P(A&B) - log₂ P(A), and so -log₂ P(B|A) = -log₂ P(A&B) + log₂ P(A), which is just I(B|A) = I(A&B) - I(A). This last equation is equivalent to

(*) I(A&B) = I(A)+I(B|A)

Formula (*) holds with full generality, reducing to I(A&B) = I(A)+I(B) when A and B are probabilistically independent (in which case P(B|A) = P(B) and thus I(B|A) = I(B)).
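Formula (*) can also be checked numerically on a pair of correlated events. The example below is not from the original text: it assumes two cards drawn without replacement from a standard 52-card deck, with A = "the first card is an ace" and B = "the second card is an ace". Exact fractions are used so that the probabilities carry no rounding error.

```python
from fractions import Fraction
from math import isclose, log2

def information_bits(p: Fraction) -> float:
    """I = -log2(p), in bits."""
    return -log2(float(p))

# Hypothetical example: two cards drawn without replacement.
p_a = Fraction(4, 52)          # first card is an ace
p_b_given_a = Fraction(3, 51)  # second is an ace, given the first was
p_b = Fraction(4, 52)          # second card is an ace, unconditionally
p_ab = p_a * p_b_given_a       # both cards are aces

i_a = information_bits(p_a)
i_b = information_bits(p_b)
i_b_given_a = information_bits(p_b_given_a)
i_ab = information_bits(p_ab)

# Formula (*): I(A&B) = I(A) + I(B|A) holds even though A and B are correlated.
print("formula (*) holds:", isclose(i_ab, i_a + i_b_given_a))
# The unconditional sum I(A) + I(B) does not equal I(A&B) here.
print("plain additivity holds:", isclose(i_ab, i_a + i_b))
```

Because drawing one ace makes a second ace less likely, B carries slightly more information given A than it does on its own, and only the conditional form of the sum recovers I(A&B).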

Formula (*) asserts that the information in both A and B jointly is the information in A plus the information in B that is not in A. Its point, therefore, is to spell out how much additional information B contributes to A. As such, this formula places tight constraints on the generation of new information. Does, for instance, a computer program, call it A, by outputting some data, call the data B, generate new information? Computer programs are fully deterministic, and so B is fully determined by A. It follows that P(B|A) = 1, and thus I(B|A) = 0 (the logarithm of 1 is always 0). From Formula (*) it therefore follows that I(A&B) = I(A), and therefore that the amount of information in A and B jointly is no more than the amount of information in A by itself.

For an example in the same spirit consider that there is no more information in two copies of Shakespeare’s Hamlet than in a single copy. This is of course patently obvious, and any formal account of information had better agree. To see that our formal account does indeed agree, let A denote the printing of the first copy of Hamlet, and B the printing of the second copy. Once A is given, B is entirely determined. Indeed, the correlation between A and B is perfect. Probabilistically this is expressed by saying the conditional probability of B given A is 1, namely, P(B|A) = 1. In information-theoretic terms this is to say that I(B|A) = 0. As a result I(B|A) drops out of Formula (*), and so I(A&B) = I(A). Our information-theoretic formalism therefore agrees with our intuition that two copies of Hamlet contain no more information than a single copy.
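The same point can be made with a toy model. In this sketch, a hypothetical source A is one of eight equally likely 3-bit strings (a stand-in for a text such as Hamlet), and B is produced from A by a deterministic copy; both the source and the `copy_text` function are assumptions made for illustration.

```python
from math import log2

# Hypothetical source: eight equally likely 3-bit strings.
SOURCES = [format(n, "03b") for n in range(8)]

def copy_text(text: str) -> str:
    """A deterministic process: the output is fixed once the input is given."""
    return text

def information_bits(p: float) -> float:
    """I = -log2(p), written as log2(1/p) so that p = 1 yields 0.0, not -0.0."""
    return log2(1.0 / p)

# Joint distribution over (A, B) pairs where B = copy_text(A).
joint = {}
for source in SOURCES:
    pair = (source, copy_text(source))
    joint[pair] = joint.get(pair, 0.0) + 1.0 / len(SOURCES)

a = "101"
b = copy_text(a)
p_a = 1.0 / len(SOURCES)
p_ab = joint[(a, b)]
p_b_given_a = p_ab / p_a           # equals 1 for a deterministic copy

i_a = information_bits(p_a)
i_b_given_a = information_bits(p_b_given_a)
i_ab = information_bits(p_ab)

print(f"I(A) = {i_a}, I(B|A) = {i_b_given_a}, I(A&B) = {i_ab}")
```

Since the copy adds no possibilities beyond those already present in A, the joint distribution has the same eight outcomes as A alone, and I(A&B) equals I(A).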

Information is a complexity-theoretic notion. Indeed, as a purely formal object, the information measure described here is a complexity measure (cf. Dembski, 1998, ch. 4). Complexity measures arise whenever we assign numbers to degrees of complication. A set of possibilities will often admit varying degrees of complication, ranging from extremely simple to extremely complicated. Complexity measures assign non-negative numbers to these possibilities so that 0 corresponds to the most simple and ∞ to the most complicated. For instance, computational complexity is always measured in terms of either time (i.e., number of computational steps) or space (i.e., size of memory, usually measured in bits or bytes) or some combination of the two. The more difficult a computational problem, the more time and space are required to run the algorithm that solves the problem. For information measures, degree of complication is measured in bits. Given an event A of probability P(A), I(A) = -log₂ P(A) measures the number of bits associated with the probability P(A). We therefore speak of the “complexity of information” and say that the complexity of information increases as I(A) increases (or, correspondingly, as P(A) decreases). We also speak of “simple” and “complex” information according to whether I(A) signifies few or many bits of information. This notion of complexity is important to biology since not just the origin of information stands in question, but the origin of complex information.

3. Complex Specified Information

Given a means of measuring information and determining its complexity, we turn now to the distinction between specified and unspecified information. This is a vast topic whose full elucidation is beyond the scope of this paper (the details can be found in my monograph The Design Inference). Nonetheless, in what follows I shall try to make this distinction intelligible, and offer some hints on how to make it rigorous. For an intuitive grasp of the difference between specified and unspecified information, consider the following example. Suppose an archer stands 50 meters from a large blank wall with bow and arrow in hand. The wall, let us say, is sufficiently large that the archer cannot help but hit it. Consider now two alternative scenarios. In the first scenario the archer simply shoots at the wall. In the second scenario the archer first paints a target on the wall, and then shoots at the wall, squarely hitting the target’s bull’s-eye. Let us suppose that in both scenarios where the arrow lands is identical. In both scenarios the arrow might have landed anywhere on the wall. What’s more, any place where it might land is highly improbable. It follows that in both scenarios highly complex information is actualized. Yet the conclusions we draw from these scenarios are very different. In the first scenario we can conclude absolutely nothing about the archer’s ability as an archer, whereas in the second scenario we have evidence of the archer’s skill.

The obvious difference between the two scenarios is of course that in the first the information follows no pattern whereas in the second it does. Now the information that tends to interest us as rational inquirers generally, and scientists in particular, is not the actualization of arbitrary possibilities which correspond to no patterns, but rather the actualization of circumscribed possibilities which do correspond to patterns. There’s more. Patterned information, though a step in the right direction, still doesn’t quite get us specified information. The problem is that patterns can be concocted after the fact so that instead of helping elucidate information, the patterns are merely read off already actualized information.

To see this, consider a third scenario in which an archer shoots at a wall. As before, we suppose the archer stands 50 meters from a large blank wall with bow and arrow in hand, the wall being so large that the archer cannot help but hit it. And as in the first scenario, the archer shoots at the wall while it is still blank. But this time suppose that after having shot the arrow, and finding the arrow stuck in the wall, the archer paints a target around the arrow so that the arrow sticks squarely in the bull’s-eye. Let us suppose further that the precise place where the arrow lands in this scenario is identical with where it landed in the first two scenarios. Since any place where the arrow might land is highly improbable, in this as in the other scenarios highly complex information has been actualized. What’s more, since the information corresponds to a pattern, we can even say that in this third scenario highly complex patterned information has been actualized. Nevertheless, it would be wrong to say that highly complex specified information has been actualized. Of the three scenarios, only the information in the second scenario is specified. In that scenario, by first painting the target and then shooting the arrow, the pattern is given independently of the information. On the other hand, in this, the third scenario, by first shooting the arrow and then painting the target around it, the pattern is merely read off the information.

Specified information is always patterned information, but patterned information is not always specified information. For specified information not just any pattern will do. We therefore distinguish between the “good” patterns and the “bad” patterns. The “good” patterns will henceforth be called specifications. Specifications are the independently given patterns that are not simply read off information. By contrast, the “bad” patterns will be called fabrications. Fabrications are the post hoc patterns that are simply read off already existing information.

Unlike specifications, fabrications are wholly unenlightening. We are no better off with a fabrication than without one. This is clear from comparing the first and third scenarios. Whether an arrow lands on a blank wall and the wall stays blank (as in the first scenario), or an arrow lands on a blank wall and a target is then painted around the arrow (as in the third scenario), any conclusions we draw about the arrow’s flight remain the same. In either case chance is as good an explanation as any for the arrow’s flight. The fact that the target in the third scenario constitutes a pattern makes no difference since the pattern is constructed entirely in response to where the arrow lands. Only when the pattern is given independently of the arrow’s flight does a hypothesis other than chance come into play. Thus only in the second scenario does it make sense to ask whether we are dealing with a skilled archer. Only in the second scenario does the pattern constitute a specification. In the third scenario the pattern constitutes a mere fabrication.

The distinction between specified and unspecified information may now be defined as follows: the actualization of a possibility (i.e., information) is specified if independently of the possibility’s actualization, the possibility is identifiable by means of a pattern. If not, then the information is unspecified. Note that this definition implies an asymmetry between specified and unspecified information: specified information cannot become unspecified information, though unspecified information may become specified information. Unspecified information need not remain unspecified, but can become specified as our background knowledge increases. For instance, a cryptographic transmission whose cryptosystem we have yet to break will constitute unspecified information. Yet as soon as we break the cryptosystem, the cryptographic transmission becomes specified information.

What is it for a possibility to be identifiable by means of an independently given pattern? A full exposition of specification requires a detailed answer to this question. Unfortunately, such an exposition is beyond the scope of this paper. The key conceptual difficulty here is to characterize the independence condition between patterns and information. This independence condition breaks into two subsidiary conditions: (1) a condition of stochastic conditional independence between the information in question and certain relevant background knowledge; and (2) a tractability condition whereby the pattern in question can be constructed from the aforementioned background knowledge. Although these conditions make good intuitive sense, they are not easily formalized. For the details refer to my monograph The Design Inference.

If formalizing what it means for a pattern to be given independently of a possibility is difficult, determining in practice whether a pattern is given independently of a possibility is much easier. If the pattern is given prior to the possibility being actualized–as in the second scenario above where the target was painted before the arrow was shot–then the pattern is automatically independent of the possibility, and we are dealing with specified information. Patterns given prior to the actualization of a possibility are just the rejection regions of statistics. There is a well-established statistical theory that describes such patterns and their use in probabilistic reasoning. These are clearly specifications since having been given prior to the actualization of some possibility, they have already been identified, and thus are identifiable independently of the possibility being actualized (cf. Hacking, 1965).

Many of the interesting cases of specified information, however, are those in which the pattern is given after a possibility has been actualized. This is certainly the case with the origin of life: life originates first and only afterwards do pattern-forming rational agents (like ourselves) enter the scene. It remains the case, however, that a pattern corresponding to a possibility, though formulated after the possibility has been actualized, can constitute a specification. Certainly this was not the case in the third scenario above where the target was painted around the arrow only after it hit the wall. But consider the following example. Alice and Bob are celebrating their fiftieth wedding anniversary. Their six children all show up bearing gifts. Each gift is part of a matching set of china. There is no duplication of gifts, and together the gifts constitute a complete set of china. Suppose Alice and Bob were satisfied with their old set of china, and had no inkling prior to opening their gifts that they might expect a new set of china. Alice and Bob are therefore without a relevant pattern whither to refer their gifts prior to actually receiving the gifts from their children. Nevertheless, the pattern they explicitly formulate only after receiving the gifts could be formed independently of receiving the gifts–indeed, we all know about matching sets of china and how to distinguish them from unmatched sets. This pattern therefore constitutes a specification. What’s more, there is an obvious inference connected with this specification: Alice and Bob’s children were in collusion, and did not present their gifts as random acts of kindness.

But what about the origin of life? Is life specified? If so, to what patterns does life correspond, and how are these patterns given independently of life’s origin? Obviously, pattern-forming rational agents like ourselves don’t enter the scene till after life originates. Nonetheless, there are functional patterns to which life corresponds, and which are given independently of the actual living systems. An organism is a functional system comprising many functional subsystems. The functionality of organisms can be cashed out in any number of ways. Arno Wouters (1995) cashes it out globally in terms of viability of whole organisms. Michael Behe (1996) cashes it out in terms of the irreducible complexity and minimal function of biochemical systems. Even the staunch Darwinist Richard Dawkins will admit that life is specified functionally, cashing out the functionality of organisms in terms of reproduction of genes. Thus Dawkins (1987, p. 9) will write: “Complicated things have some quality, specifiable in advance, that is highly unlikely to have been acquired by random chance alone. In the case of living things, the quality that is specified in advance is . . . the ability to propagate genes in reproduction.”

Information can be specified. Information can be complex. Information can be both complex and specified. Information that is both complex and specified I call “complex specified information,” or CSI for short. CSI is what all the fuss over information has been about in recent years, not just in biology, but in science generally. It is CSI that for Manfred Eigen constitutes the great mystery of biology, and one he hopes eventually to unravel in terms of algorithms and natural laws. It is CSI that for cosmologists underlies the fine-tuning of the universe, and which the various anthropic principles attempt to understand (cf. Barrow and Tipler, 1986). It is CSI that David Bohm’s quantum potentials are extracting when they scour the microworld for what Bohm calls “active information” (cf. Bohm, 1993, pp. 35-38). It is CSI that enables Maxwell’s demon to outsmart a thermodynamic system tending towards thermal equilibrium (cf. Landauer, 1991, p. 26). It is CSI on which David Chalmers hopes to base a comprehensive theory of human consciousness (cf. Chalmers, 1996, ch. 8). It is CSI that within the Kolmogorov-Chaitin theory of algorithmic information takes the form of highly compressible, non-random strings of digits (cf. Kolmogorov, 1965; Chaitin, 1966).

Nor is CSI confined to science. CSI is indispensable in our everyday lives. The 16-digit number on your VISA card is an example of CSI. The complexity of this number ensures that a would-be thief cannot randomly pick a number and have it turn out to be a valid VISA card number. What’s more, the specification of this number ensures that it is your number, and not anyone else’s. Even your phone number constitutes CSI. As with the VISA card number, the complexity ensures that this number won’t be dialed randomly (at least not too often), and the specification ensures that this number is yours and yours only. All the numbers on our bills, credit slips, and purchase orders represent CSI. CSI makes the world go round. It follows that CSI is a rife field for criminality. CSI is what motivated the greedy Michael Douglas character in the movie Wall Street to lie, cheat, and steal. CSI’s total and absolute control was the objective of the monomaniacal Ben Kingsley character in the movie Sneakers. CSI is the artifact of interest in most techno-thrillers. Ours is an information age, and the information that captivates us is CSI.
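To put a rough number on the complexity of a 16-digit card number, the measure from Section 2 can be applied directly. This sketch rests on a simplifying assumption not made in the text, namely that every 16-digit string is equally likely; real card numbers have issuer prefixes and check digits, so the figure is only an upper bound under that idealization. The function name is illustrative.

```python
from math import log2

def digit_string_bits(num_digits: int, alphabet_size: int = 10) -> float:
    """Bits of information in one specific digit string.

    Assumption: all strings of this length are equally likely, so the
    probability of any one string is alphabet_size ** -num_digits.
    """
    return num_digits * log2(alphabet_size)

card_bits = digit_string_bits(16)
print(f"A 16-digit number carries about {card_bits:.2f} bits")
```

Under this assumption a random guess matches a given 16-digit number with probability 10⁻¹⁶, which corresponds to a little over 53 bits.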

4. Intelligent Design

Whence the origin of complex specified information? In this section I shall argue that intelligent causation, or equivalently design, accounts for the origin of complex specified information. My argument focuses on the nature of intelligent causation, and specifically, on what it is about intelligent causes that makes them detectable. To see why CSI is a reliable indicator of design, we need to examine the nature of intelligent causation. The principal characteristic of intelligent causation is directed contingency, or what we call choice. Whenever an intelligent cause acts, it chooses from a range of competing possibilities. This is true not just of humans, but of animals as well as extra-terrestrial intelligences. A rat navigating a maze must choose whether to go right or left at various points in the maze. When SETI (Search for Extra-Terrestrial Intelligence) researchers attempt to discover intelligence in the extra-terrestrial radio transmissions they are monitoring, they assume an extra-terrestrial intelligence could have chosen any number of possible radio transmissions, and then attempt to match the transmissions they observe with certain patterns as opposed to others (patterns that presumably are markers of intelligence). Whenever a human being utters meaningful speech, a choice is made from a range of possible sound-combinations that might have been uttered. Intelligent causation always entails discrimination, choosing certain things, ruling out others.

Given this characterization of intelligent causes, the crucial question is how to recognize their operation. Intelligent causes act by making a choice. How then do we recognize that an intelligent cause has made a choice? A bottle of ink spills accidentally onto a sheet of paper; someone takes a fountain pen and writes a message on a sheet of paper. In both instances ink is applied to paper. In both instances one among an almost infinite set of possibilities is realized. In both instances a contingency is actualized and others are ruled out. Yet in one instance we infer design, in the other chance. What is the relevant difference? Not only do we need to observe that a contingency was actualized, but we ourselves need also to be able to specify that contingency. The contingency must conform to an independently given pattern, and we must be able independently to formulate that pattern. A random ink blot is unspecifiable; a message written with ink on paper is specifiable. Wittgenstein (1980, p. 1e) made the same point as follows: “We tend to take the speech of a Chinese for inarticulate gurgling. Someone who understands Chinese will recognize language in what he hears. Similarly I often cannot discern the humanity in man.”

In hearing a Chinese utterance, someone who understands Chinese not only recognizes that one from a range of all possible utterances was actualized, but is also able to specify the utterance as coherent Chinese speech. Contrast this with someone who does not understand Chinese. In hearing a Chinese utterance, someone who does not understand Chinese also recognizes that one from a range of possible utterances was actualized, but this time, lacking the ability to understand Chinese, is unable to specify the utterance as coherent speech. To someone who does not understand Chinese, the utterance will appear gibberish. Gibberish–the utterance of nonsense syllables uninterpretable within any natural language–always actualizes one utterance from the range of possible utterances. Nevertheless, gibberish, by corresponding to nothing we can understand in any language, also cannot be specified. As a result, gibberish is never taken for intelligent communication, but always for what Wittgenstein calls “inarticulate gurgling.”

The actualization of one among several competing possibilities, the exclusion of the rest, and the specification of the possibility that was actualized encapsulates how we recognize intelligent causes, or equivalently, how we detect design. Actualization-Exclusion-Specification, this triad constitutes a general criterion for detecting intelligence, be it animal, human, or extra-terrestrial. Actualization establishes that the possibility in question is the one that actually occurred. Exclusion establishes that there was genuine contingency (i.e., that there were other live possibilities, and that these were ruled out). Specification establishes that the actualized possibility conforms to a pattern given independently of its actualization.

Now where does choice, which we’ve cited as the principal characteristic of intelligent causation, figure into this criterion? The problem is that we never witness choice directly. Instead, we witness actualizations of contingency which might be the result of choice (i.e., directed contingency), but which also might be the result of chance (i.e., blind contingency). Now there is only one way to tell the difference–specification. Specification is the only means available to us for distinguishing choice from chance, directed contingency from blind contingency. Actualization and exclusion together guarantee we are dealing with contingency. Specification guarantees we are dealing with a directed contingency. The Actualization-Exclusion-Specification triad is therefore precisely what we need to identify choice and therewith intelligent causation.

Psychologists who study animal learning and behavior have known of the Actualization-Exclusion-Specification triad all along, albeit implicitly. For these psychologists–known as learning theorists–learning is discrimination (cf. Mazur, 1990; Schwartz, 1984). To learn a task an animal must acquire the ability to actualize behaviors suitable for the task as well as the ability to exclude behaviors unsuitable for the task. Moreover, for a psychologist to recognize that an animal has learned a task, it is necessary not only to observe the animal making the appropriate behavior, but also to specify this behavior. Thus to recognize whether a rat has successfully learned how to traverse a maze, a psychologist must first specify the sequence of right and left turns that conducts the rat out of the maze. No doubt, a rat randomly wandering a maze also discriminates a sequence of right and left turns. But by randomly wandering the maze, the rat gives no indication that it can discriminate the appropriate sequence of right and left turns for exiting the maze. Consequently, the psychologist studying the rat will have no reason to think the rat has learned how to traverse the maze. Only if the rat executes the sequence of right and left turns specified by the psychologist will the psychologist recognize that the rat has learned how to traverse the maze. Now it is precisely the learned behaviors we regard as intelligent in animals. Hence it is no surprise that the same scheme for recognizing animal learning recurs for recognizing intelligent causes generally, to wit, actualization, exclusion, and specification.

Now this general scheme for recognizing intelligent causes coincides precisely with how we recognize complex specified information: First, the basic precondition for information to exist must hold, namely, contingency. Thus one must establish that any one of a multiplicity of distinct possibilities might obtain. Next, one must establish that the possibility which was actualized after the others were excluded was also specified. So far the match between this general scheme for recognizing intelligent causation and how we recognize complex specified information is exact. Only one loose end remains–complexity. Although complexity is essential to CSI (corresponding to the first letter of the acronym), its role in this general scheme for recognizing intelligent causation is not immediately evident. In this scheme one among several competing possibilities is actualized, the rest are excluded, and the possibility which was actualized is specified. Where does complexity figure in this scheme?

The answer is that it is there implicitly. To see this, consider again a rat traversing a maze, but now take a very simple maze in which two right turns conduct the rat out of the maze. How will a psychologist studying the rat determine whether it has learned to exit the maze? Just putting the rat in the maze will not be enough. Because the maze is so simple, the rat could by chance just happen to take two right turns, and thereby exit the maze. The psychologist will therefore be uncertain whether the rat actually learned to exit this maze, or whether the rat just got lucky. But contrast this now with a complicated maze in which a rat must take just the right sequence of left and right turns to exit the maze. Suppose the rat must take one hundred appropriate right and left turns, and that any mistake will prevent the rat from exiting the maze. A psychologist who sees the rat take no erroneous turns and in short order exit the maze will be convinced that the rat has indeed learned how to exit the maze, and that this was not dumb luck. With the simple maze there is a substantial probability that the rat will exit the maze by chance; with the complicated maze this is exceedingly improbable. The role of complexity in detecting design is now clear since improbability is precisely what we mean by complexity (cf. section 2).
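The contrast between the two mazes can be put in numbers. The following is only a minimal sketch under assumptions of my own: that each turn is an independent choice between left and right, that a rat wandering at random picks each with probability 1/2, and that complexity is measured in bits as the negative base-2 logarithm of probability. The function names are hypothetical and serve only for illustration.

```python
import math


def chance_exit_probability(num_turns, p_correct=0.5):
    """Probability that a randomly wandering rat makes every turn correctly.

    Assumes (for illustration only) that the turns are independent and that
    each correct turn is taken with probability p_correct.
    """
    return p_correct ** num_turns


def information_bits(probability):
    """Improbability expressed in bits: -log2(p)."""
    return -math.log2(probability)


# Simple maze: two right turns lead out.
simple = chance_exit_probability(2)
# Complicated maze: one hundred correct turns are required.
complicated = chance_exit_probability(100)

print("simple maze:     ", simple, "->", information_bits(simple), "bits")
print("complicated maze:", complicated, "->", information_bits(complicated), "bits")
```

On these assumptions the simple maze is exited by chance one time in four, whereas the complicated maze corresponds to a probability of 2 to the power -100, which is why a single error-free run through it is not credibly attributed to luck.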

This argument for showing that CSI is a reliable indicator of design may now be summarized as follows: CSI is a reliable indicator of design because its recognition coincides with how we recognize intelligent causation generally. In general, to recognize intelligent causation we must establish that one from a range of competing possibilities was actualized, determine which possibilities were excluded, and then specify the possibility that was actualized. What’s more, the competing possibilities that were excluded must be live possibilities, sufficiently numerous so that specifying the possibility that was actualized cannot be attributed to chance. In terms of probability, this means that the possibility that was specified is highly improbable. In terms of complexity, this means that the possibility that was specified is highly complex. All the elements in the general scheme for recognizing intelligent causation (i.e., Actualization-Exclusion-Specification) find their counterpart in complex specified information–CSI. CSI pinpoints what we need to be looking for when we detect design.

As a postscript, I call the reader’s attention to the etymology of the word “intelligent.” The word “intelligent” derives from two Latin words, the preposition inter, meaning between, and the verb lego, meaning to choose or select. Thus according to its etymology, intelligence consists in choosing between. It follows that the etymology of the word “intelligent” parallels the formal analysis of intelligent causation just given. “Intelligent design” is therefore a thoroughly apt phrase, signifying that design is inferred precisely because an intelligent cause has done what only an intelligent cause can do–make a choice.

5. The Law of the Conservation of Information

Evolutionary biology has steadfastly resisted attributing CSI to intelligent causation. Although Manfred Eigen recognizes that the central problem of evolutionary biology is the origin of CSI, he has no thought of attributing CSI to intelligent causation. According to Eigen natural causes are adequate to explain the origin of CSI. The only question for Eigen is which natural causes explain the origin of CSI. The logically prior question of whether natural causes are even in principle capable of explaining the origin of CSI he ignores. And yet it is a question that undermines Eigen’s entire project. Natural causes are in principle incapable of explaining the origin of CSI. To be sure, natural causes can explain the flow of CSI, being ideally suited for transmitting already existing CSI. What natural causes cannot do, however, is originate CSI. This strong proscriptive claim, that natural causes can only transmit CSI but never originate it, I call the Law of Conservation of Information. It is this law that gives definite scientific content to the claim that CSI is intelligently caused. The aim of this last section is briefly to sketch the Law of Conservation of Information (a full treatment will be given in Uncommon Descent, a book I am jointly authoring with Stephen Meyer and Paul Nelson).

To see that natural causes cannot account for CSI is straightforward. Natural causes comprise chance and necessity (cf. Jacques Monod’s book by that title). Because information presupposes contingency, necessity is by definition incapable of producing information, much less complex specified information. For there to be information there must be a multiplicity of live possibilities, one of which is actualized, and the rest of which are excluded. This is contingency. But if some outcome B is necessary given antecedent conditions A, then the probability of B given A is one, and the information in B given A is zero. If B is necessary given A, Formula (*) reduces to I(A&B) = I(A), which is to say that B contributes no new information to A. It follows that necessity is incapable of generating new information. Observe that what Eigen calls “algorithms” and “natural laws” fall under necessity.
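The reduction just described can be written out in two lines. This is a sketch that assumes Formula (*) has the form I(A&B) = I(A) + I(B|A) and that information is measured as the negative base-2 logarithm of probability; both are taken here as assumptions about the earlier sections rather than quoted from them.

```latex
\begin{align*}
P(B \mid A) = 1 \;&\Longrightarrow\; I(B \mid A) = -\log_2 P(B \mid A) = -\log_2 1 = 0, \\
I(A \,\&\, B) &= I(A) + I(B \mid A) = I(A) + 0 = I(A).
\end{align*}
```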

Since information presupposes contingency, let us take a closer look at contingency. Contingency can assume only one of two forms. Either the contingency is a blind, purposeless contingency–which is chance; or it is a guided, purposeful contingency–which is intelligent causation. Since we already know that intelligent causation is capable of generating CSI (cf. section 4), let us next consider whether chance might also be capable of generating CSI. First notice that pure chance, entirely unsupplemented and left to its own devices, is incapable of generating CSI. Chance can generate complex unspecified information, and chance can generate non-complex specified information. What chance cannot generate is information that is jointly complex and specified.

Biologists by and large do not dispute this claim. Most agree that pure chance–what Hume called the Epicurean hypothesis–does not adequately explain CSI. Jacques Monod (1972) is one of the few exceptions, arguing that the origin of life, though vastly improbable, can nonetheless be attributed to chance because of a selection effect. Just as the winner of a lottery is shocked at winning, so we are shocked to have evolved. But the lottery was bound to have a winner, and so too something was bound to have evolved. Something vastly improbable was bound to happen, and so, the fact that it happened to us (i.e., that we were selected–hence the name selection effect) does not preclude chance. This is Monod’s argument and it is fallacious. It fails utterly to come to grips with specification. Moreover, it confuses a necessary condition for life’s existence with its explanation. Monod’s argument has been refuted by the philosophers John Leslie (1989), John Earman (1987), and Richard Swinburne (1979). It has also been refuted by the biologists Francis Crick (1981, ch. 7), Bernd-Olaf Küppers (1990, ch. 6), and Hubert Yockey (1992, ch. 9). Selection effects do nothing to render chance an adequate explanation of CSI.

Most biologists therefore reject pure chance as an adequate explanation of CSI. The problem here is not simply one of faulty statistical reasoning. Pure chance is also scientifically unsatisfying as an explanation of CSI. To explain CSI in terms of pure chance is no more instructive than pleading ignorance or proclaiming CSI a mystery. It is one thing to explain the occurrence of heads on a single coin toss by appealing to chance. It is quite another, as Küppers (1990, p. 59) points out, to follow Monod and take the view that “the specific sequence of the nucleotides in the DNA molecule of the first organism came about by a purely random process in the early history of the earth.” CSI cries out for explanation, and pure chance won’t do. As Richard Dawkins (1987, p. 139) correctly notes, “We can accept a certain amount of luck in our [scientific] explanations, but not too much.”

If chance and necessity left to themselves cannot generate CSI, is it possible that chance and necessity working together might generate CSI? The answer is No. Whenever chance and necessity work together, the respective contributions of chance and necessity can be arranged sequentially. But by arranging the respective contributions of chance and necessity sequentially, it becomes clear that at no point in the sequence is CSI generated. Consider the case of trial-and-error (trial corresponds to necessity and error to chance). Once considered a crude method of problem solving, trial-and-error has so risen in the estimation of scientists that it is now regarded as the ultimate source of wisdom and creativity in nature. The probabilistic algorithms of computer science (e.g., genetic algorithms–see Forrest, 1993) all depend on trial-and-error. So too, the Darwinian mechanism of mutation and natural selection is a trial-and-error combination in which mutation supplies the error and selection the trial. An error is committed after which a trial is made. But at no point is CSI generated.
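The sequential arrangement of chance and necessity in trial-and-error can be made concrete with a toy loop. This is only an illustrative sketch, not a model of any particular genetic algorithm or of the Darwinian mechanism itself: the bit-string representation, the target pattern, the scoring rule, and the function name `trial_and_error` are all hypothetical choices of mine. Each pass performs one chance step (a random mutation, the "error") followed by one necessity step (a deterministic comparison, the "trial"), and the log records that ordering.

```python
import random


def trial_and_error(target, steps, seed=0):
    """Alternate a chance step with a necessity step and log the order.

    Hypothetical toy setup: candidates are lists of bits, and the score is
    the number of positions that agree with a fixed target pattern.
    """
    rng = random.Random(seed)        # seeded so that a run is repeatable
    current = [0] * len(target)      # arbitrary starting state
    log = []

    def score(candidate):
        # Deterministic rule: count positions matching the target.
        return sum(c == t for c, t in zip(candidate, target))

    for _ in range(steps):
        # "Error" (chance): flip one randomly chosen bit.
        position = rng.randrange(len(current))
        candidate = current.copy()
        candidate[position] ^= 1
        log.append("chance")

        # "Trial" (necessity): keep the candidate only if it scores
        # at least as well as the current state.
        if score(candidate) >= score(current):
            current = candidate
        log.append("necessity")

    return current, log, score(current)


target = [1, 0, 1, 1, 0, 1, 0, 0]
final, log, final_score = trial_and_error(target, steps=50, seed=1)
print(log[:4], final, final_score)
```

The log makes the point of the paragraph visible: the run decomposes into a strictly alternating sequence of chance and necessity steps, so each contribution can be examined on its own.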

Natural causes are therefore incapable of generating CSI. This broad conclusion I call the Law of Conservation of Information, or LCI for short. LCI has profound implications for science. Among its corollaries are the following: (1) The CSI in a closed system of natural causes remains constant or decreases. (2) CSI cannot be generated spontaneously, originate endogenously, or organize itself (as these terms are used in origins-of-life research). (3) The CSI in a closed system of natural causes either has been in the system eternally or was at some point added exogenously (implying that the system though now closed was not always closed). (4) In particular, any closed system of natural causes that is also of finite duration received whatever CSI it contains before it became a closed system.

This last corollary is especially pertinent to the nature of science for it shows that scientific explanation is not coextensive with reductive explanation. Richard Dawkins, Daniel Dennett, and many scientists are convinced that proper scientific explanations must be reductive, moving from the complex to the simple. Thus Dawkins (1987, p. 316) will write, “The one thing that makes evolution such a neat theory is that it explains how organized complexity can arise out of primeval simplicity.” Thus Dennett (1995, p. 153) will view any scientific explanation that moves from simple to complex as “question-begging.” Thus Dawkins (1987, p. 13) will explicitly equate proper scientific explanation with what he calls “hierarchical reductionism,” according to which “a complex entity at any particular level in the hierarchy of organization” must properly be explained “in terms of entities only one level down the hierarchy.” While no one will deny that reductive explanation is extremely effective within science, it is hardly the only type of explanation available to science. The divide-and-conquer mode of analysis behind reductive explanation has strictly limited applicability within science. In particular, this mode of analysis is utterly incapable of making headway with CSI. CSI demands an intelligent cause. Natural causes will not do.

William A. Dembski, presented at Naturalism, Theism and the Scientific Enterprise: An Interdisciplinary Conference at the University of Texas, Feb. 20-23, 1997.

References

Barrow, John D. and Frank J. Tipler. 1986. The Anthropic Cosmological Principle. Oxford: Oxford University Press.
Behe, Michael. 1996. Darwin’s Black Box: The Biochemical Challenge to Evolution. New York: The Free Press.
Bohm, David. 1993. The Undivided Universe: An Ontological Interpretation of Quantum Theory. London: Routledge.
Chaitin, Gregory J. 1966. On the Length of Programs for Computing Finite Binary Sequences. Journal of the ACM, 13:547-569.
Chalmers, David J. 1996. The Conscious Mind: In Search of a Fundamental Theory. New York: Oxford University Press.
Crick, Francis. 1981. Life Itself: Its Origin and Nature. New York: Simon and Schuster.
Dawkins, Richard. 1987. The Blind Watchmaker. New York: Norton.
Dembski, William A. 1998. The Design Inference: Eliminating Chance through Small Probabilities. Forthcoming, Cambridge University Press.
Dennett, Daniel C. 1995. Darwin’s Dangerous Idea: Evolution and the Meanings of Life. New York: Simon & Schuster.
Devlin, Keith J. 1991. Logic and Information. New York: Cambridge University Press.
Dretske, Fred I. 1981. Knowledge and the Flow of Information. Cambridge, Mass.: MIT Press.
Earman, John. 1987. The Sap Also Rises: A Critical Examination of the Anthropic Principle. American Philosophical Quarterly, 24(4): 307-317.
Eigen, Manfred. 1992. Steps Towards Life: A Perspective on Evolution, translated by Paul Woolley. Oxford: Oxford University Press.
Forrest, Stephanie. 1993. Genetic Algorithms: Principles of Natural Selection Applied to Computation. Science, 261:872-878.
Hacking, Ian. 1965. Logic of Statistical Inference. Cambridge: Cambridge University Press.
Hamming, R. W. 1986. Coding and Information Theory, 2nd edition. Englewood Cliffs, N. J.: Prentice-Hall.
Kolmogorov, Andrei N. 1965. Three Approaches to the Quantitative Definition of Information. Problemy Peredachi Informatsii (in translation), 1(1): 3-11.
Küppers, Bernd-Olaf. 1990. Information and the Origin of Life. Cambridge, Mass.: MIT Press.
Landauer, Rolf. 1991. Information is Physical. Physics Today, May: 23-29.
Leslie, John. 1989. Universes. London: Routledge.
Mazur, James E. 1990. Learning and Behavior, 2nd edition. Englewood Cliffs, N.J.: Prentice-Hall.
Monod, Jacques. 1972. Chance and Necessity. New York: Vintage.
Schwartz, Barry. 1984. Psychology of Learning and Behavior, 2nd edition. New York: Norton.
Shannon, Claude E. and W. Weaver. 1949. The Mathematical Theory of Communication. Urbana, Ill.: University of Illinois Press.
Stalnaker, Robert. 1984. Inquiry. Cambridge, Mass.: MIT Press.
Swinburne, Richard. 1979. The Existence of God. Oxford: Oxford University Press.
Wittgenstein, Ludwig. 1980. Culture and Value, edited by G. H. von Wright, translated by P. Winch. Chicago: University of Chicago Press.
Wouters, Arno. 1995. Viability Explanation. Biology and Philosophy, 10:435-457.
Yockey, Hubert P. 1992. Information Theory and Molecular Biology. Cambridge: Cambridge University Press.