Waiting Longer for Two Mutations
A roll-up of "Waiting Longer for Two Mutations", Parts 1-5. Original at Michael Behe's Amazon.com Blog.

An interesting paper appeared recently in the journal Genetics, “Waiting for Two Mutations: With Applications to Regulatory Sequence Evolution and the Limits of Darwinian Evolution” (Durrett, R & Schmidt, D. 2008. Genetics 180: 1501-1509). As the title implies, it concerns the time one would have to wait for Darwinian processes to produce some helpful biological feature (here, regulatory sequences in DNA) if two mutations are required instead of just one. It is a theoretical paper, which uses models, math, and computer simulations to reach conclusions, rather than empirical data from field or lab experiments, which is what The Edge of Evolution relies on. The authors declare in the abstract of their manuscript that they aim “to expose flaws in some of Michael Behe’s arguments concerning mathematical limits to Darwinian evolution.” Unsurprisingly (bless their hearts), they pretty much do the exact opposite. Since the journal Genetics publishes letters to the editor (most journals don’t), I sent a reply to the journal. The original paper by Durrett and Schmidt, my response, and their reply were linked in the original blog posts.
In their paper (as I wrote in my reply), “They develop a population genetics model to estimate the waiting time for the occurrence of two mutations, one of which is premised to damage an existing transcription-factor-binding site, and the other of which creates a second, new binding site within the nearby region from a sequence that is already a near match with a binding site sequence (for example, 9 of 10 nucleotides already match).”
The most novel point of their model is that, under some conditions, the number of organisms needed to get two mutations is proportional not to the inverse of the square of the point mutation rate (as it would be if both mutations had to appear simultaneously in the same organism), but to the inverse of the point mutation rate raised to the 3/2 power, that is, the rate times its own square root (because the first mutation would spread in the population before the second appeared, increasing the odds of getting a double mutation). To see what that means, consider that the point mutation rate is roughly one in a hundred million (1 in 10^8). So if two specific mutations had to occur at once, that would be an event of likelihood about 1 in 10^16. On the other hand, under some conditions they modeled, the likelihood would be about 1 in 10^12, ten thousand times more likely than the first situation. Durrett and Schmidt (2008) compare the number they got in their model to the figure I cited from the literature (reference 1), that the development of chloroquine resistance in the malarial parasite is an event of probability about 1 in 10^20, and they remark that it “is 5 million times larger than the calculation we have just given.” The implied conclusion is that I have greatly overstated the difficulty of getting two necessary mutations. Below I show that they are incorrect.
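The two scalings in the paragraph above can be checked with a few lines of arithmetic. This is a minimal sketch that only restates the two proportionalities quoted there, with a point mutation rate of 10^-8 as in the text; it is not Durrett and Schmidt's model, and the function names are hypothetical.

```python
import math

MU = 1e-8  # approximate point mutation rate per site, as quoted in the text

def simultaneous_odds(mu: float) -> float:
    """Odds (1 in N) that two specific mutations arise together: N ~ 1/mu^2."""
    return 1 / mu**2

def sequential_odds(mu: float) -> float:
    """Odds (1 in N) when the first mutation spreads first: N ~ 1/(mu * sqrt(mu))."""
    return 1 / (mu * math.sqrt(mu))

together = simultaneous_odds(MU)
spread_first = sequential_odds(MU)
ratio = together / spread_first  # how much more likely the spread-first route is

print(f"both at once:  1 in 10^{math.log10(together):.0f}")
print(f"spread first:  1 in 10^{math.log10(spread_first):.0f}")
print(f"ratio:         {ratio:.0e}")
```

The ratio of the two is 10^4, the "ten thousand times more likely" figure in the text.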
Serious problems
Interesting as it is, there are some pretty serious problems in the way they applied their model to my arguments, some of which they owned up to in their reply, and some of which they didn’t. When the problems are fixed, however, the resulting number is remarkably close to the empirical value of 1 in 10^20. I will go through the difficulties in turn.
The first problem was a simple oversight. They were modeling the mutation of a ten-nucleotide-long binding site for a regulatory protein in DNA, so they used a value for the mutation rate that was ten times larger than the point mutation rate. However, in the chloroquine-resistance protein discussed in The Edge of Evolution, particular amino acids have to be changed, so the correct rate to use is the point mutation rate. Because the waiting time in their model scales as the inverse of the mutation rate to the 3/2 power, a tenfold error in the rate becomes a factor of 10^1.5, or about 30. So applying their model to the protein underestimates the waiting time by a factor of about 30. As they wrote in their reply, “Behe is right on this point.” I appreciate their agreement here.
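The step from a tenfold rate error to a roughly 30-fold error in waiting time can be sketched directly. This assumes the waiting time scales as the mutation rate to the −3/2 power, as described earlier, and that nothing else in the model changes when the rate is corrected; `waiting_time_factor` is a hypothetical helper, not a function from the paper.

```python
def waiting_time_factor(rate_used: float, rate_correct: float, exponent: float = 1.5) -> float:
    """Factor by which the waiting time is underestimated when a too-large
    mutation rate is used, assuming waiting time ~ rate ** (-exponent)."""
    return (rate_used / rate_correct) ** exponent

# The rate used was ten times the point mutation rate of about 1e-8.
factor = waiting_time_factor(rate_used=1e-7, rate_correct=1e-8)
print(f"waiting time underestimated about {factor:.1f}-fold")
```

The result, about 31.6, is the "factor of about 30" in the paragraph above.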
The second problem has to do with their choice of model. In their original paper they actually developed models for two situations — for when the first mutation is neutral, and for when it is deleterious. When they applied it to the chloroquine-resistance protein, they unfortunately decided to use the neutral model. However, it is very likely that the first protein mutation is deleterious. As I wrote discussing a hypothetical case in Chapter 6 of The Edge:
“Suppose, however, that the first mutation wasn’t a net plus; it was harmful. Only when both mutations occurred together was it beneficial. Then on average a person born with the mutation would leave fewer offspring than otherwise. The mutation would not increase in the population, and evolution would have to skip a step for it to take hold, because nature would need both necessary mutations at once…. The Darwinian magic works well only when intermediate steps are each better (‘more fit’) than preceding steps, so that the mutant gene increases in number in the population as natural selection favors the offspring of people who have it. Yet its usefulness quickly declines when intermediate steps are worse than earlier steps, and is pretty much worthless if several required intervening steps aren’t improvements.”
If the first mutation is indeed deleterious, then Durrett and Schmidt (2008) applied the wrong model to the chloroquine-resistance protein. In fact, if the parasite with the first mutation is only 10% as fit as the unmutated parasite, then the population-spreading effect they calculate for neutral mutations is pretty much eliminated, as their own model for deleterious mutations shows. What do the authors say in their response about this possibility? “We leave it to biologists to debate whether the first PfCRT mutation is that strongly deleterious.” In other words, they don’t know; it is outside their interest as mathematicians. (Again, I appreciate their candor in saying so.) If the first mutation is seriously deleterious, their calculation is off by a factor of 10^4. Combined with the first, 30-fold mistake, their calculation so far is off by about five and a half orders of magnitude (30 × 10^4 = 3 × 10^5).
Making a String of Ones
The third problem also concerns the biology of the system. I’m at a bit of a loss here, because the problem is not hard to see, and yet in their reply they stoutly deny the mistake. In fact, they confidently assert it is I who am mistaken. I had written in my letter, ‘‘… their model is incomplete on its own terms because it does not take into account the probability of one of the nine matching nucleotides in the region that is envisioned to become the new transcription-factor-binding site mutating to an incorrect nucleotide before the 10th mismatched codon mutates to the correct one.’’ They retort, “This conclusion is simply wrong since it assumes that there is only one individual in the population with the first mutation.” That’s incorrect. Let me explain the problem in more detail.
Consider a string of ten digits, either 0 or 1. We start with a string that has nine 1’s and just one 0. We want to convert the single 0 to a 1 without switching any of the 1’s to a 0. Suppose that the switch rate for each digit is one per hundred copies of the string. That is, we copy the string repeatedly, and, if we focus on a particular digit, about every hundredth copy or so that digit has changed. Okay, now cover all of the digits of the string except the 0, and let a random, automated procedure copy the string, with a digit-mutation rate of one in a hundred. After, say, 79 copies, we see that the visible 0 has just changed to a 1. Now we uncover the rest of the digits. What is the likelihood that one of them has changed in the meantime? Since all the digits have the same mutation rate, there is a nine in ten chance that one of the other digits has already changed from a 1 to a 0, and our mutated string still does not match the target of all 1’s. In fact, only about one time out of ten will we uncover the string and find that no other digits have changed except the visible digit. Thus the effective mutation rate for transforming the string with nine matches out of ten to a string with ten matches out of ten will be only one tenth of the basic digit-mutation rate. If the string is a hundred long, the effective mutation rate will be one-hundredth the basic rate, and so on. (This is very similar to the problem of mutating a duplicate gene to a new selectable function before it suffers a degradative mutation, which has been investigated by Lynch and co-workers (reference 2).)
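The copying experiment above can be sketched both exactly and as a simulation. This is a minimal sketch of the thought experiment as described, not a population-genetics model: it assumes every digit flips independently with the same probability p on each copy, treats a 1 that flips as lost for good (back-mutations are ignored), and counts a 1 flipping in the same copy as the 0 as a failure. The function names are hypothetical.

```python
import random

def race_probability(length: int, p: float) -> float:
    """Exact probability that the lone 0 flips to 1 before any of the other
    length - 1 digits changes, under the assumptions stated above."""
    success_now = p * (1 - p) ** (length - 1)  # only the 0 flips on this copy
    nothing_now = (1 - p) ** length            # no digit flips on this copy
    # Geometric series over the copies on which nothing happens.
    return success_now / (1 - nothing_now)

def simulate(length: int, p: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo version of the same copying experiment."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        while True:
            zero_flipped = rng.random() < p
            # any() stops drawing as soon as one of the matching 1's flips.
            one_flipped = any(rng.random() < p for _ in range(length - 1))
            if one_flipped:
                break           # a matching digit was lost first (or at the same time)
            if zero_flipped:
                successes += 1  # the 0 became a 1 with all other digits intact
                break
    return successes / trials

exact_10 = race_probability(10, 0.01)
simulated_10 = simulate(10, 0.01, trials=20_000, seed=1)
exact_100 = race_probability(100, 1e-4)
print(f"10 digits:  exact {exact_10:.4f}, simulated {simulated_10:.4f}")
print(f"100 digits: exact {exact_100:.5f}")
```

With a rate of one in a hundred and ten digits the exact value is just under 1/10, because two digits occasionally flip on the same copy; as the rate shrinks relative to 1/length, the value approaches 1/length, which gives the one-in-ten and one-in-a-hundred figures in the text.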
So, despite their self-assured tone, on this point it is Durrett and Schmidt who are “simply wrong.” And, as I wrote in my letter, since the gene for the chloroquine resistance protein has on the order of a thousand nucleotides, rather than just the ten of Durrett and Schmidt’s postulated regulatory sequence, the effective rate for the second mutation is several orders of magnitude less than they thought. Thus, taking (say) two orders of magnitude for the error here, the factor of 30 error for the initial mutation rate, and the four orders of magnitude for mistakenly using a neutral model instead of a deleterious model, Durrett and Schmidt’s calculation is a cumulative seven and a half orders of magnitude off. Since they had pointed out that their calculation was about five million-fold (about 6.7 orders of magnitude) lower than the empirical result I cited, when their errors are corrected the calculation agrees pretty well with the empirical data.
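The bookkeeping in this paragraph can be laid out explicitly. This minimal sketch simply adds the three correction factors quoted in the text in log10 units and compares the sum with the five-million-fold gap; the two-orders figure for the third error is the text's own rough "say" estimate, not a computed value.

```python
import math

# Correction factors as stated in the text, in orders of magnitude (log10).
corrections = {
    "point rate instead of tenfold rate": math.log10(30),
    "deleterious rather than neutral first mutation": 4.0,
    "losing a matching site first (rough estimate)": 2.0,
}

total = sum(corrections.values())
gap = math.log10(5e6)  # the 5-million-fold difference Durrett and Schmidt reported

for reason, orders in corrections.items():
    print(f"{reason}: {orders:.2f}")
print(f"total correction: {total:.2f} orders of magnitude")
print(f"reported gap:     {gap:.2f} orders of magnitude")
```

The corrections sum to about 7.5 orders of magnitude, against a reported gap of about 6.7.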
An Irrelevant Example
Now I’d like to turn to a couple of other points in Durrett and Schmidt’s reply which aren’t mistakes with their model, but which do reflect conceptual errors. As I quote above, they state in their reply, “This conclusion is simply wrong since it assumes that there is only one individual in the population with the first mutation.” I have shown above that, despite their assertion, my conclusion is right. But where do they get the idea that “it assumes that there is only one individual in the population with the first mutation”? My letter said nothing about “one individual.” Furthermore, I “assumed” nothing. I merely cited empirical results from the literature. The figure of 1 in 10^20 is a citation from the literature on chloroquine resistance of malaria. Unlike their model, it is not a calculation on my part.
Right after this, in their reply Durrett and Schmidt say that the “mistake” I made is a common one, and they go on to illustrate “my” mistake with an example about a lottery winner. Yet their own example shows they are seriously confused about what is going on. They write:
“When Evelyn Adams won the New Jersey lottery on October 23, 1985, and again on February 13, 1986, newspapers quoted odds of 17.1 trillion to 1. That assumes that the winning person and the two lottery dates are specified in advance, but at any point in time there is a population of individuals who have won the lottery and have a chance to win again, and there are many possible pairs of dates on which this event can happen…. The probability that it happens in one lottery 1 year is ~1 in 200.”
No kidding. If one has millions of players, and any of the millions could win twice on any two dates, then the odds are certainly much better that somebody will win on some two dates than that Evelyn Adams would win on October 23, 1985 and February 13, 1986. But that has absolutely nothing to do with the question of changing a correct nucleotide to an incorrect one before changing an incorrect one to a correct one, which is the context in which this odd digression appears. What’s more, it is not the type of situation that Durrett and Schmidt themselves modeled. They asked the question, given a particular ten-base-pair regulatory sequence, and a particular sequence that is matched in nine of ten sites to the regulatory sequence, how long will it take to mutate the particular regulatory sequence, destroying it, and then mutate the particular near-match sequence to a perfect-match sequence? What’s even more, it is not the situation that pertains in chloroquine resistance in malaria. There, several particular amino acid residues in a particular protein (PfCRT) have to mutate to yield effective resistance. It seems to me that the lottery example must be a favorite of Durrett and Schmidt’s, and that they were determined to use it whether it fit the situation or not.
Multiplying Resources
The final conceptual error that Durrett and Schmidt commit is the gratuitous multiplication of probabilistic resources. In their original paper they calculated that the appearance of a particular double mutation in humans would have an expected time of appearance of 216 million years, if one were considering a one kilobase region of the genome. Since the evolution of humans from other primates took much less time than that, Durrett and Schmidt observed that if the DNA “neighborhood” were a thousand times larger, then lots of correct regulatory sites would already be expected to be there. But, then, exactly what is the model? And if the relevant neighborhood is much larger, why did they model a smaller neighborhood? Is there some biological fact they neglected to cite that justified the thousand-fold expansion of what constitutes a “neighborhood,” or were they just trying to squeeze their results post-hoc into what a priori was thought to be a reasonable time frame?
When I pointed this out in my letter, Durrett and Schmidt did not address the problem. Rather, they upped the stakes. They write in their reply, “there are at least 20,000 genes in the human genome and for each gene tens if not hundreds of pairs of mutations that can occur in each one.” The implication is that there are very, very many ways to get two mutations. Well, if that were indeed the case, why did they model a situation where two particular mutations — not just any two — were needed? Why didn’t they model the situation where any two mutations in any of 20,000 genes would suffice? In fact, since that would give a very much shorter time span, why did the journal Genetics and the reviewers of the paper let them get away with such a miscalculation?
The answer, of course, is that in almost any particular situation, almost all possible double mutations (and single mutations and triple mutations and so on) will be useless. Consider the chloroquine-resistance mutation in malaria. There are about 10^6 possible single amino acid mutations in malarial parasite proteins, and 10^12 possible double amino acid mutations (where the changes could be in any two proteins). Yet only a handful are known to be useful to the parasite in fending off the drug, and only one is very effective: the multiple changes in PfCRT. It would be silly to think that just any two mutations would help. The vast majority are completely ineffective. Nonetheless, it is a common conceptual mistake to naively multiply postulated “helpful mutations” when the numbers initially show too few.
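The double-mutation count follows from pairing up the single mutations. A minimal sketch, assuming the order-of-magnitude figure of 10^6 single mutations from the text: counting unordered pairs of distinct single mutations gives about 5 × 10^11, the same order of magnitude as 10^12 (squaring the count, which treats the two changes as ordered, gives exactly 10^12).

```python
from math import comb

single_mutations = 10**6                       # order-of-magnitude figure from the text
double_mutations = comb(single_mutations, 2)   # unordered pairs of distinct single mutations
ordered_pairs = single_mutations**2            # the simple square, counting order

print(f"unordered pairs: {double_mutations:.1e}")
print(f"ordered pairs:   {ordered_pairs:.1e}")
```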
A Very Important Point
Here’s a final important point. Genetics is an excellent journal; its editors and reviewers are top notch; and Durrett and Schmidt themselves are fine researchers. Yet, as I show above, when simple mistakes in the application of their model to malaria are corrected, it agrees closely with the empirical results from the field that I cited. This is very strong support for the central contention of The Edge of Evolution: that it is an extremely difficult evolutionary task for multiple required mutations to occur through Darwinian means, especially if one of the mutations is deleterious. And, as I argue in the book, reasonable application of this point to the protein machinery of the cell makes it very unlikely that life developed through a Darwinian mechanism.
References
1. White, N. J. 2004. Antimalarial drug resistance. J. Clin. Invest. 113: 1084–1092.
2. Lynch, M. and Conery, J. S. 2000. The evolutionary fate and consequences of duplicate genes. Science 290: 1151–1155.