Are phylogenetic patterns the same in anthropology and biology?

The use of phylogenetic methods in anthropological fields such as archaeology, linguistics and stemmatology (involving what are often called “culture data”) is based on an analogy between human cultural evolution and biological evolution. We need to understand this analogy thoroughly, including how well anthropology data fit the model of a phylogenetic tree, as used in biology. I provide a direct comparison of anthropology datasets with both phenotype and genotype datasets from biology. The anthropology datasets fit the tree model approximately as well as do the genotype data, which is detectably worse than the fit of the phenotype data. This is true for datasets with <500 parsimony-informative characters, as well as for larger datasets. This implies that cross-cultural (horizontal) processes have been important in the evolution of cultural artifacts, as well as branching historical (vertical) processes, and thus a phylogenetic network will be a more appropriate model than a phylogenetic tree.


D
As noted by Nunn (2011), evolutionary anthropology involves studies such as assessing the taxonomic affinities of newly discovered hominins, reconstructing primate phylogeny, constructing language trees, evaluating the history of copying of written texts or oral tales, and the evolution of tools and other manufactured items. All of these involve the techniques associated with the comparative method, of which phylogenetics is one. We therefore need to quantitatively evaluate the extent to which phylogenetic techniques, as developed in biology, can be applied to anthropology.
The use of phylogenetic methods seems to be relatively unproblematic in the case of physical anthropology (i.e. studies of the origin and evolutionary development of humans as a species; Holliday 2003), although this particular field is concerned as much with population genetics as it is with species phylogenies (especially the emerging field of population genomics). However, the use of phylogenetic methods in other anthropological fields, such as archaeology, linguistics and stemmatology (involving what are often called "culture data"), is based on an analogy between human cultural evolution and biological evolution Pagel 2009;Howe & Windram 2011).
This analogy assumes that the patterns of historical change in anthropology and biology are similar enough that the analytical methods can be combined. It is not actually a requirement, but there is often the assumption that the underlying processes are also similar. Both anthropology and biology apparently involve an evolutionary process, in which the study objects form groups that change via modification of their intrinsic attributes, the attributes being transformed through time from ancestral to derived states (often called "innovations" in anthropology). However, this apparent similarity is basically a metaphor, because human culture is not a collection of biological objects. In Popperian terms, biology is part of the "world that consists of physical bodies" while culture and linguistics are part of the "world of the products of the human mind" (Geisler & List 2013).
If we are going to draw an analogy between anthropological studies and biological studies, and use this analogy to justify the use of certain analytical techniques, then we need to understand the analogy thoroughly. One important aspect of this analogy is how well anthropological data fit the model of a phylogenetic tree, as used in biology, and this is what I investigate here.

The tree model in anthropology
It has often been claimed that cross-cultural processes, such as trade and exchange, have been more important in the evolution of cultural ideas and artifacts (in the broadest sense) than have branching historical processes. That is, horizontal processes dominate rather than vertical processes, as has been traditionally assumed in phylogenetics (especially for eukaryotes). Under these circumstances, the data would fit a network model rather than a tree model, the latter being the basis of most phylogenetic analyses (Felsenstein 2004). If so, then the tree-based phylogenetic methods would be of limited practical use in anthropology.
As a quantitative contribution to the debate, Collard et al. (2006) compared 21 anthropological datasets with 21 biological datasets, to assess the extent to which they fit a phylogenetic tree model. They analyzed each dataset using maximum parsimony, and then calculated the ensemble retention index (RI), which ranges from 0 (none of the characters are compatible with the optimal tree) to 1 (all of the characters are compatible with that tree). Their conclusion was that the range of index values did not differ between the anthropological and biological datasets, and so the anthropological data fit a tree model at least as well as do the biological data.
Unfortunately, there are two problems with the datasets used by Collard et al., and these biases combine to invalidate their conclusion. First of all, among their datasets there is a clear relationship between the RI and the number of parsimony-informative characters, and the datasets with the greatest number of such characters were all biological. Second, the biological datasets included both phenotypic and genotypic characters, and there is no a priori reason to believe that the genotypic datasets are directly comparable to the anthropological ones.
These two problems are illustrated in Figure 1. Among the 42 datasets, the log-linear correlation between the RI and the number of parsimony-informative characters is r = -0.502 (p=0.019), indicating that 25% of the variation in the RI is accounted for solely by variation in the number of parsimony-informative characters. Unfortunately, all of the datasets with >200 parsimonyinformative characters were biological; and this imbalance has a marked influence on the comparison between the two sources of data. (Note that in counting the characters I have made no attempt to distinguish between datasets with multi-state characters and those where the data have been coded as binary. This could be done by re-coding all of the multi-state characters as binary or additive binary. Doing so would refine the analysis but not change the conclusions.) Fig. 1. Scatterplot of the relationship between the (ensemble) retention index and the number of parsimony-informative characters for 21 empirical datasets from biology and 21 datasets from anthropology. Open circles: phenotype data; closed circles: genotype data; open squares: culture data. The data are taken from Collard et al. (2006). Furthermore, 11 of the biological datasets consisted of phenotype data alone (morphology or behavior) and 10 of them consisted of genotype data alone (mtDNA sequences). The graph ( Fig.  1) shows that the genotype datasets have, in general, more parsimony-informative characters and a smaller RI than do the phenotype datasets. This makes inappropriate any comparison with the anthropological data (consisting of human behavior, and the morphology of cultural artifacts), Y due to bias. A Welch t-test (which accounts for the unequal variances of the two samples) comparing the RIs of the anthropological datasets to all of the biological datasets yields t = 0.426 (p=0.673). This inappropriate test apparently supports the conclusion of Collard et al., that the anthropological datasets fit a tree model just as well as do the biological datasets. However, a Welch t-test more appropriately comparing the anthropological datasets solely to the phenotype biological datasets yields t = 2.011 (p=0.063), which suggests that the anthropological datasets might fit a tree model less well than do the biological datasets.
Clearly, we need a revised comparison in order to validly test the original hypothesis proposed by Collard et al.

Revised comparison
My revised comparison consists principally of expanding the collection of phenotype biological datasets, in order to make a valid comparison of them to the anthropological datasets. However, I also increase the total sample size somewhat, and I also make a direct comparison between the genotype biological datasets and some of the anthropological datasets.
The extra biological phenotype datasets were obtained by searching the TreeBASE 2.0 database (Sanderson, et al. 1994) for studies with a title containing the words "morphology" or "behavior" / "behaviour". All 20 studies retrieved for the Vertebrata and Insecta that actually contained some non-genotype data were used (Table 1). (Note that the taxonomic restriction follows the choice of datasets of Collard et al.) All of these datasets have <500 parsimonyinformative characters. Five genotype datasets with >1000 parsimony-informative characters were also used, taken from previous comparative studies that I have conducted (Table 1).
Extra anthropological datasets (Table 1) were also taken from the literature (plus an unpublished one of my own). Four of these datasets match the types of study chosen by Collard et al., which were mainly from archaeology and physical anthropology. However, four studies were also specifically chosen from stemmatology, along with five studies from linguistics.
For each dataset, the search for the maximum-parsimony tree was conducted using the parsimony ratchet (Nixon 1999) with 200 replicates. These searches, and all statistical calculations, were conducted using PAUP* 4.0b10 (Swofford 1998), and the ratchet commands were produced with PAUPrat (Sikes & Lewis 2001). The data are available in the Appendices (Supplementary Online Material).
The new data for the main comparison are illustrated in Figure 2a, with 31 datasets for both the phenotype biological data and the anthropological data, containing <500 parsimony-informative characters. For these data there is no correlation between the character number and the RI (r = -0.164, p=0.203). More to the point, there is a large difference in RI between the anthropological and biological datasets (Welch t-test: t = 4.502, p<0.0001), with the anthropological data fitting a tree model much less well than do the biological data.
This leads to the issue of whether genotype biological data can validly be compared to anthropology data. I discuss the theory in the next section, but one practical problem is that genotype data often have many more characters than do anthropology data. There are, however, two anthropological fields in which thousands of characters can be collected: comparative linguistics and stemmatology. It is therefore at least possible to make a direct comparison with genotype data. I will briefly explain why these fields can produce such large datasets. First, comparative linguistics is the study of the relationships between languages. Typically, lists of word meanings (a few hundred meanings) are recorded for the various languages being studied, and these lists are turned into cognate sets of putatively homologous words (a few thousand sets), which are then coded as a binary matrix (i.e. the multi-state data are re-coded as binary). This can lead to very large datasets; for example, Gray et al. (2009) produced a dataset with 400 languages and >34,000 cognate sets. Second, stemmatology is the study of the relationships between different versions of the same written text. The putatively homologous parts of the study texts are aligned and encoded as a string of characters, usually with each character corresponding to a word. The length of the alignment depends on the length of the text (hundreds to thousands of words). This can also lead to very large datasets; for example, Windram et al. (2008) produced a dataset with 21 texts and 16,548 characters.
Here, I have used 13 genotype datasets, from both the mitochondrial and nuclear genomes, including both protein-coding and RNA-coding sequences, and ranging from 234 to 8,004 parsimony-informative characters (Fig. 2b). For comparison, I have five datasets from linguistics and three from stemmatology, ranging from 185 to 4,554 parsimony-informative characters (Fig.  2b). There is a borderline statistical difference in the RIs of these two groups (Welch t-test: t = 2.096, p=0.063), with the genotype datasets fitting the tree model less well than do the anthropology datasets.
So, while these are only small comparisons, there are two different patterns in the data. That is, in biology there is some indication that the large genotype datasets fit a tree less well than do the smaller phenotype datasets (mean RI ! 0.5 versus 0.7, respectively), whereas there is no difference in fit for the large versus small anthropological datasets (mean RI ! 0.6 in both cases). This possible distinction between genotype and phenotype data in biology might be worth investigating further.

Discussion
The answer to the question posed in the title of this paper is: apparently not, at least for datasets with <500 parsimony-informative characters. Under these circumstances, anthropological datasets fit the maximum-parsimony tree model less well than do phenotype biological datasets. This implies that, indeed, cross-cultural (horizontal) processes have been important in the evolution of cultural artifacts, as well as branching historical (vertical) processes, which is a different conclusion from that reached by Collard et al. This does not invalidate the use of phylogenetic analysis in culture studies, but it does mean that a phylogenetic network would be a more useful model than a phylogenetic tree (Bapteste et al. 2013).
Evolutionary theory has four basic contributions that it could possibly make to anthropology (Pappas & Mooers 2011): (i) correlating human genetics with other aspects of culture; (ii) adopting evolutionary theory to provide causal mechanisms in culture history; (iii) using evolutionary concepts as metaphors; and (iv) applying phylogenetic techniques to anthropological data. Here, we are interested in the use of contribution (iii) as a justification for adopting (iv). Analyzing cultural data using phylogenetics is thus often seen as predicated on the adoption of the tree metaphor for representing phylogenetic history.
The tree metaphor has a long history in cultural studies. For example, the first empirical use of this metaphor in a chronological context appears to have been that of Buffon (1755) in biology, Gallet (1800) in linguistics, andCollin &Schlyter (1827) in stemmatology, all of which long pre-date empirical use within the context of modern evolutionary theory (!elakovsk" 1853 in B linguistics; Mivart 1865 in biology). The tree is thus apparently a powerful as well as a useful metaphor, although its adoption in culture studies has been only sporadic (e.g. see Geisler & List 2013 for linguistics). It is for this reason that the tests carried out by Collard et al. (2006) and myself are of interest.
There are, however, clear limitations to the comparison undertaken here. Notably, the phenotype data were restricted to those from the Vertebrata and Insecta, with one exception (spider behavior). This was apparently adopted by Collard et al. based on the idea of comparing humans with related animals. However, it seems very likely that if the comparison were to be made using other systematic groups then the result would be different. The evolution of plants, for example, is usually considered to have involved extensive hybridization (Mallet 2005), so that plant datasets are less likely to fit a tree model than are animals. A similar argument applies to prokaryotes, which have long been subject to horizontal gene transfer. However, this does not negate the overall conclusion, which is that a tree model is unlikely to be particularly suitable for anthropology studies.
Furthermore, the comparison made here is rather simplistic, and deliberately so. Culture evolution is complex, and a tree is a very simple model. However, the hypothesis tested here is comparative, comparing the use of the tree model in biology and anthropology -I am evaluating the relative applicability of the model, not its realism. So, the test here asks solely how well the data fit a tree within a parsimony context. This has been the most common context in anthropology, to date, and so it was natural for Collard et al. to use a maximum-parsimony analysis, and to assess the fit of the data to the optimal tree. Using the ensemble retention index (Farris 1989) keeps the focus on the characters themselves, rather than on any aspect of the tree building. For simplicity, I have followed suit.
However, in practice, any statistic that quantifies the fit of data to a tree could be used, and any tree-building algorithm could be used. An alternative, but still simple, approach would be to make the comparison using a distance-based analysis (e.g. Bryant et al. 2006). For example, the fit of the data to a neighbor-joining tree could be compared to its fit to a neighbor-net network -this would provide a heuristic assessment of how much better the data fit a reticulating network compared to a tree.
A more sophisticated approach would involve the direct comparison of different phylogenetic models in a likelihood context. The limitation here is the current dearth of explicit alternative models in anthropology. This topic is being actively investigated in the context of phylogenetic networks, which add reticulations in the basic tree model (e.g. Yu et al. 2011), but explicit network models have rarely been developed in anthropology (see Dunn 2014). For example, Ross et al. (2013) note that in genetics there are specific measures of genetic distance but that in their anthropology study they can use only a generic distance measure (Jaccard in their case). A more general point of interest here concerns the applicability of phylogenetic analysis to anthropological studies. It seems to be commonly accepted by proponents that an analogy can be drawn between anthropological data and specifically genotype data in biology Pagel 2009;Howe & Windram 2011), and that this can be used to justify the application of phylogenetics. This seems to be based on a misconception.
In culture studies information is usually passed between "generations" via repeated teaching, learning or imitation, whereas genotype information is passed as DNA-coded information in genes. Languages must be learned anew by each user, as must tool making and artistic procedures, whereas each possessor of a gene inherits a physical copy of the original.

?
Furthermore, the DNA "alphabet" is a very different thing from a language alphabet, for example.
The recording of cultural information in permanent forms, such as written works, does not invalidate this point. Rather, if emphasizes further differences between cultural history and genotype history -that cultural information can skip many generations, and also that it can be transmitted diagonally as well as vertically and horizontally (i.e. simultaneously from one culture to another as well as from the distant past to the present). Genes do not skip generations, at least not in nature.
So, genes are physical objects whereas memes (cultural replicators) are theoretical concepts. It is not necessarily obvious what the replicating units of language, for example, really areequating phonemes with nucleotides seems rather superficial. Concepts such as phonemes seem much more closely related to concepts like phenotype than to genotype. Genotype refers to measurement of features relating solely to the genes, which are the units of inheritance. Phenotype, on the other hand, refers to features that are the product of genes + interactions between genes + interactions between genes and the environment. This is much closer to the concept of characters in cultural studies.
For genetic data the units of comparison are to some extent self-evident (e.g. DNA residues). However, phenotypic data have traditionally required careful study of qualitative features, in order to establish the basis for comparing like with like, including comparative anatomy, developmental studies, and so on. A similar situation pertains in anthropology, where elucidating the nature of the characters to be compared has traditionally occupied much of the time of anthropologists. As far as phylogenetic methods are concerned, for both phenotype and anthropology data establishing the character transformation series is more than half the battle (and its description occupies a large proportion of any publication), and the algorithmic treebuilding is to some extent then simply a graphical representation of the congruent patterns among those transformation series (based on an objective and repeatable method).
One can, nevertheless, question whether all forms of anthropology data have a phenotype in the same conceptual way. For example, some anthropology data refer to physical objects (pottery, rugs, arrowheads, etc) while others refer to immaterial concepts, such as phonemes in linguistics. Immaterial concepts are much more difficult to compare in any meaningful way, because there is no actual object to study. For example, the characters that have been used for phylogeny reconstruction in linguistics include phonological, lexical and syntactic traits. None of these approximates to a genetic character in any way, although they are all features that are "observable" in an objective and repeatable manner. Furthermore, linguistic borrowing cannot be simply equated with introgression or horizontal gene transfer, because the physical processes are quite different.
So, there seems to be no a priori reason to expect anthropological data to be analogous to genotype data. However, no such theoretical objection appears necessarily to apply to an analogy between culture data and phenotype data. The patterns produced by cultural processes and phenotypic processes have much more in common than either does with genotypic processes.
In this sense, the discovery of a practical difference between the phylogenetic patterns of biology and anthropology is of importance. Tree-based phylogenetic analyses will apparently be of considerably less practical usefulness in anthropology than they have been in animal biology. At Z best, they might function as a heuristic tool, but even that is not certain. A phylogenetic tree is a graph, and graphs are excellent tools for exploratory data analysis. However, a phylogenetic tree is a very specific type of graph, with a precise interpretation in terms of representing evolutionary history. If the model underlying that interpretation is inappropriate, then the analysis is unlikely to be much use even as a heuristic.
It would be more productive to employ analysis techniques that accommodate both horizontal and vertical evolutionary processes. Indeed, the first empirical phylogenetic models in stemmatology (Collin & Schlyter 1827), linguistics (Gallet 1800) and biology (Buffon 1755) all actually display reticulation, rather than a bifurcating tree, and so this suggestion is a longstanding one.