Social Choice and Trees

(I’m going to make a point about the relevance of social choice impossibility results to the drawing of phylogenetic trees, but it’ll take a while to get there.)
I just finished reading Richard Dawkins’ fascinating book The Ancestor’s Tale, which discusses all of life from the perspective of evolutionary biology. In particular, it traces back our own ancestry, marking each point at which the ancestors of other living species diverged from our own. There are all sorts of fascinating things to learn, both about the actual history of evolution and about the general principles of how these sorts of things tend to work.

However, there’s one interesting point he makes several times, related to his central point in The Selfish Gene, which is a challenge to the entire framework of the book. If the relevant unit of selection is often the gene, rather than the individual, then all the phylogenetic trees he draws might not make any sense. You might think that since genes are passed down through individuals, the gene tree should be basically the same as the individual tree. But sexual reproduction means that any given individual inherits only half of each parent’s genes, and we’ve got only about 30,000 genes (well under a million at any rate). Go back about 20 generations and you have roughly a million (2^20) ancestral slots in your family tree, far more than there are genes to go around, so most of those ancestors will have contributed no genes at all to your heredity. Thus, even if we share a common ancestor that far back (or even more recently – apparently within the last 5000 years according to Dawkins), our genes might tell a different story about relationships. We might be closely related ancestrally speaking, but relatively distant in terms of some or all of the genes we share.
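The arithmetic behind that claim can be checked with a rough sketch. The 30,000 figure is the one quoted above; the function name is mine, and the model makes two loud simplifying assumptions: genes are treated as indivisible units that each trace back to exactly one ancestor per generation, and all ancestral slots are treated as distinct people (in reality many slots are filled by the same person). Under those assumptions it only gives an upper bound on the share of ancestors who could have contributed anything.

```python
def max_contributing_fraction(generations, gene_count=30_000):
    """Upper bound on the share of ancestral slots, `generations` back,
    that can have contributed at least one gene.

    Assumptions (illustrative only): each gene is inherited whole from
    exactly one ancestor in that generation, and every slot is a
    distinct individual.
    """
    slots = 2 ** generations          # number of ancestral slots
    # At most one contributing ancestor per gene, and never more than all slots.
    return min(1.0, gene_count / slots)


for g in (10, 15, 20):
    bound = max_contributing_fraction(g)
    print(f"{g} generations: {2 ** g:>9,} slots, at most {bound:.1%} contribute")
```

With only about a thousand slots at 10 generations, the bound is trivial; by 20 generations it has dropped below three percent, which is the sense in which most of those ancestors contribute nothing.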

An obvious example is the sex chromosomes. All of us with a Y chromosome are, in that one respect at least, more closely related to male chimpanzees (and gorillas, and monkeys) than we are to female humans. Even ignoring the sex-linked cases, it turns out that the ABO blood-type genes have existed as a stable mix in the population for tens of millions of years. Thus, anyone with two A alleles is in that sense more closely related to a gibbon with two A alleles than they are to any human with blood type B or O. But it turns out that this happens not only with stable mixes, but also with genes where one allele eventually takes over just by drift. As Dawkins puts it:

The majority of both molecular and morphological characteristics show chimps as our closest relatives. But a sizeable minority show that gorillas are instead, or that chimps are most closely related to gorillas and both are equally close to humans.

This should not surprise us.  Different genes are inherited through different routes. The population ancestral to all three species will have been diverse – each gene having many different lineages. It is quite possible for a gene in humans and gorillas to be descended from one lineage, while in chimps it is descended from a more distantly related one. All that is needed is for anciently diverged genetic lineages to continue through to the chimp-human split so humans can descend from one and chimps from another.*

[*The longer the time between species splits (or the smaller the population size), the more ancestral lineages are lost by genetic drift. So tidy-minded taxonomists, who hope that species trees coincide with gene trees, will find it easier to deal with animals whose divergences are well spaced out in time, unlike African apes. But there are always genes [like the blood-type one] for which separate lineages are systematically maintained by natural selection over huge spans of time.]

So we have to admit that a single tree is not the whole story. Species trees can be drawn, but they must be considered a simplified summary of a multitude of gene trees. I can imagine interpreting a species tree in two different ways. The first is the conventional genealogical interpretation. One species is the closest relative of another if, out of all the species considered, it shares the most recent common genealogical ancestor. The second is, I suspect, the way of the future. A species tree can be seen as depicting the relationships among a democratic majority of the genome. It represents the results of a ‘majority vote’ among gene trees. (The Ancestor’s Tale, p. 136)

However, I worry that this “way of the future” might not end up making sense in every case. Although with humans, chimpanzees, and gorillas, it might be that a clear majority of genes favor the [HC]G tree over the H[CG] or C[HG] alternatives, there may well be lineages in which nothing so nice happens. At least since Arrow’s Impossibility Theorem, it’s been known that majority votes for rankings have some apparently unwanted features. It might be the case that the trees involved here give enough additional structure over and above rankings that the impossibility result doesn’t apply, but the fact that similar impossibility results have been shown for aggregating probability functions and causal graphs suggests to me that something similar can be found for trees.

Thus, even given complete genetic data about some group of species, there might not be any reasonable way to combine the conflicting trees drawn by different genes into a single tree representing the different species.
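To make the worry concrete, here is a minimal sketch of one way a "majority vote" could fail. Dawkins doesn't say how the vote would be taken, so I'm assuming one particular reading: for every set of three species, ask which pair most gene trees group together, then try to build a single tree from the winning answers. The three gene trees on the made-up species a, b, c, d are hypothetical, chosen by hand to produce a Condorcet-style cycle, and the compatibility check is the standard recursive test on rooted triples (the Aho et al. "BUILD" idea), not anything from the book.

```python
from collections import Counter
from itertools import combinations


def leaves(tree):
    """Leaf set of a tree written as nested tuples of species names."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Leaf sets of every subtree, including the whole tree."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    found = [leaves(tree)]
    for child in tree:
        found.extend(clades(child))
    return found


def closest_pair(tree, trio):
    """The pair in `trio` that the tree groups together, excluding the third."""
    for pair in combinations(sorted(trio), 2):
        third = (set(trio) - set(pair)).pop()
        if any(set(pair) <= c and third not in c for c in clades(tree)):
            return pair
    return None  # the tree leaves this trio unresolved


def majority_triples(trees):
    """For each trio of species, the grouping backed by more than half the trees."""
    winners = []
    for trio in combinations(sorted(leaves(trees[0])), 3):
        votes = Counter(closest_pair(t, trio) for t in trees)
        pair, count = votes.most_common(1)[0]
        if pair is not None and count > len(trees) / 2:
            winners.append((pair, (set(trio) - set(pair)).pop()))
    return winners


def compatible(taxa, triples):
    """True if some single rooted tree displays every triple (pair, outsider)."""
    taxa = set(taxa)
    if len(taxa) < 3:
        return True
    relevant = [(p, o) for p, o in triples if set(p) | {o} <= taxa]
    parent = {x: x for x in taxa}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for (x, y), _ in relevant:        # join each winning pair
        parent[find(x)] = find(y)
    groups = {}
    for x in taxa:
        groups.setdefault(find(x), set()).add(x)
    if len(groups) == 1:              # everything glued together: no root split exists
        return False
    return all(compatible(g, relevant) for g in groups.values())


# Three hypothetical gene trees on four hypothetical species.
gene_trees = [
    ((("a", "b"), "c"), "d"),
    ((("b", "c"), "d"), "a"),
    (("c", "d"), ("a", "b")),
]
winners = majority_triples(gene_trees)
print(winners)
print(compatible(leaves(gene_trees[0]), winners))
```

Each winning grouping here is backed by two of the three gene trees, yet the check reports that no single tree displays all of them at once. This is only an illustration under the triple-by-triple reading of the vote; other aggregation rules (for instance, keeping only the clades that appear in more than half the trees) always yield a tree, though possibly a badly unresolved one, so the sketch shows a possibility rather than an impossibility theorem.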

Of course, it’s also plausible that actual historical evolution has never left us with just such a situation, or at least that such situations are exceedingly rare, but these are substantive hypotheses that I think Dawkins hasn’t considered. He admits that he’s not very up on some of the mathematical issues. (At one point he characterizes Bayesian inference as a special type of maximum-likelihood analysis!) I suspect that if these possibilities turn out to be the case, then he would use this as more support for taking a gene-based view of evolution over an individual-based or species-based view. But it would be unfortunate for the organizing conceit of this particular book, which I think is quite nice, and which would make a good structure for an introductory class on evolution, in high school or even elementary school.