Arjun B Prasad, Marc W Allard, Eric D Green.
Abstract
The ongoing generation of prodigious amounts of genomic sequence data from myriad vertebrates is providing unparalleled opportunities for establishing definitive phylogenetic relationships among species. The size and complexities of such comparative sequence data sets not only allow smaller and more difficult branches to be resolved but also present unique challenges, including large computational requirements and the negative consequences of systematic biases. To explore these issues and to clarify the phylogenetic relationships among mammals, we have analyzed a large data set of over 60 megabase pairs (Mb) of high-quality genomic sequence, which we generated from 41 mammals and 3 other vertebrates. All sequences are orthologous to a 1.9-Mb region of the human genome that encompasses the cystic fibrosis transmembrane conductance regulator gene (CFTR). To understand the characteristics and challenges associated with phylogenetic analyses of such a large data set, we partitioned the sequence data in several ways and utilized maximum likelihood, maximum parsimony, and Neighbor-Joining algorithms, implemented in parallel on Linux clusters. These studies yielded well-supported phylogenetic trees, largely confirming other recent molecular phylogenetic analyses. Our results provide support for rooting the placental mammal tree between Atlantogenata (Xenarthra and Afrotheria) and Boreoeutheria (Euarchontoglires and Laurasiatheria), illustrate the difficulty in resolving some branches even with large amounts of data (e.g., in the case of Laurasiatheria), and demonstrate the valuable role that very large comparative sequence data sets can play in refining our understanding of the evolutionary relationships of vertebrates.
Year: 2008 PMID: 18453548 PMCID: PMC2515873 DOI: 10.1093/molbev/msn104
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Multispecies Comparative Sequence Data Set
| Clade | Common Name | Total Sequence | Coding | Conserved Noncoding |
| Catarrhini | Human | 1,877,426 | 20,647 | 102,884 |
| | Chimpanzee | 1,573,483 | 17,962 | 86,513 |
| | Lowland gorilla | 1,761,981 | 20,489 | 93,962 |
| | Sumatran orangutan | 1,478,010 | 18,344 | 80,548 |
| | Red-cheeked gibbon | 2,154,624 | 20,122 | 97,708 |
| | Black and white colobus | 2,023,939 | 20,575 | 99,065 |
| | Vervet monkey | 1,555,031 | 18,638 | 87,051 |
| | Rhesus macaque | 1,678,549 | 20,569 | 92,538 |
| | Olive baboon | 1,680,295 | 20,575 | 89,897 |
| Platyrrhini | White-tufted-ear marmoset | 1,869,361 | 19,783 | 88,306 |
| | Dusky titi | 1,810,674 | 18,263 | 84,974 |
| | Owl monkey | 2,059,585 | 20,581 | 96,483 |
| | Bolivian squirrel monkey | 1,695,311 | 16,692 | 67,548 |
| Strepsirrhini | Small-eared galago | 1,732,353 | 20,373 | 86,512 |
| | Ring-tailed lemur | 1,399,362 | 20,545 | 84,060 |
| | Gray mouse lemur | 1,541,029 | 19,103 | 86,239 |
| Rodentia | Norway rat | 1,883,088 | 20,344 | 78,983 |
| | Mouse | 1,486,509 | 19,079 | 73,094 |
| | Guinea pig | 1,815,594 | 20,504 | 85,548 |
| | 13-lined ground squirrel | 1,757,846 | 20,505 | 89,020 |
| Lagomorpha | New Zealand white rabbit | 1,889,755 | 20,453 | 81,226 |
| Cetartiodactyla | Cow | 2,022,671 | 20,357 | 85,135 |
| | Sheep | 1,816,302 | 20,149 | 76,197 |
| | Indian muntjac | 1,450,172 | 15,340 | 67,216 |
| | Domestic pig | 1,198,526 | 17,006 | 60,133 |
| Perissodactyla | Horse | 1,423,288 | 17,580 | 75,633 |
| Carnivora | Cat | 1,737,938 | 20,374 | 81,560 |
| | Clouded leopard | 1,691,656 | 16,001 | 74,568 |
| | Dog | 1,317,853 | 16,374 | 69,142 |
| | Domestic ferret | 1,494,791 | 20,456 | 75,743 |
| Chiroptera | Seba's short-tailed bat | 1,069,438 | 14,424 | 38,369 |
| | Greater horseshoe bat | 1,684,815 | 20,495 | 85,118 |
| Eulipotyphla | Middle-African hedgehog | 1,985,767 | 20,081 | 72,111 |
| | European common shrew | 1,734,562 | 18,845 | 63,737 |
| Xenarthra | Armadillo | 1,454,970 | 16,850 | 59,554 |
| Afrotheria | African elephant | 2,040,789 | 20,593 | 87,812 |
| | Lesser hedgehog tenrec | 1,765,269 | 18,087 | 74,734 |
| Marsupialia | North American opossum | 1,627,985 | 15,114 | 45,484 |
| | Gray short-tailed opossum | 1,174,555 | 12,480 | 33,565 |
| | Tammar wallaby | 1,846,640 | 18,545 | 61,489 |
| Monotremata | Duck-billed platypus | 1,268,713 | 18,543 | 49,457 |
| Aves | Chicken | 744,025 | 19,934 | 32,648 |
| Actinopterygii | | 257,833 | 16,938 | 7,760 |
| | | 273,621 | 17,033 | 7,779 |
| Total | | 69,805,984 | 825,745 | 3,217,103 |
Total Sequence: the total amount of assembled sequence (in bases) following removal of low-quality sequence and overlaps between BAC sequences (Thomas et al. 2003).
Coding: the number of bases in the coding partition (see text for details).
Conserved Noncoding: the number of bases in the conserved noncoding partition (see text for details).
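As a quick check on how small the two analyzed partitions are relative to the assembled sequence, the following minimal sketch computes their share of the total. The three constants are copied from the Total row of the table; the function name `percent_of_total` is hypothetical and introduced only for illustration.

```python
# Totals (in bases) copied from the "Total" row of the table.
TOTAL_BASES = 69_805_984
CODING_BASES = 825_745
CONSERVED_NONCODING_BASES = 3_217_103


def percent_of_total(part: int, total: int) -> float:
    """Return `part` as a percentage of `total`."""
    return 100.0 * part / total


coding_pct = percent_of_total(CODING_BASES, TOTAL_BASES)
noncoding_pct = percent_of_total(CONSERVED_NONCODING_BASES, TOTAL_BASES)

print(f"coding: {coding_pct:.2f}% of assembled sequence")
print(f"conserved noncoding: {noncoding_pct:.2f}% of assembled sequence")
```

Under these figures the coding partition is a little over 1% of the assembled sequence and the conserved noncoding partition is under 5%.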
ML tree derived from the analysis of the coding sequence partition using RY-coded bases and a codon position partitioned CF + Γ model. Branch lengths indicate likelihood-inferred substitutions per site with a GTR + Γ model. ML bootstrap proportions are listed above Bayesian posterior probabilities for all branches with support below 100% bootstrap proportion and 1.0 Bayesian posterior probability. Platypus was constrained to the mammals (its branch is marked with an asterisk to reflect this). The fishes (Tetraodon and Fugu) were used to root the tree, but their branches are not shown. Branch lengths were optimized using ML from nucleotide-coded data with a GTR + Γ model.
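RY coding, as mentioned in the caption above, collapses each base to its chemical class: purines (A, G) become R and pyrimidines (C, T) become Y. The sketch below is one simple way to do this and is not the authors' code; the name `ry_code` is hypothetical, and leaving gaps and ambiguity codes unchanged is an assumption.

```python
# Purines (A, G) -> R; pyrimidines (C, T, and U for RNA) -> Y.
RY_MAP = {"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"}


def ry_code(seq: str) -> str:
    """Recode a nucleotide sequence into the two-state R/Y alphabet.

    Assumption: gaps and ambiguity characters (e.g. '-', 'N') are
    passed through unchanged.
    """
    return "".join(RY_MAP.get(base, base) for base in seq.upper())


print(ry_code("ACGT-N"))  # prints RYRY-N
```

After recoding, only transversions (R to Y or Y to R) register as changes, so the matrix can be analyzed with a two-state model.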
ML tree derived from the analysis of the coding plus conserved noncoding sequence matrix using RY-coded bases. A CF + Γ model was used, with 4 partitions: 3 for codon positions and 1 for conserved noncoding sequence. Long branches leading to platypus and chicken were abbreviated for clarity. Other features are the same as indicated in figure 1.
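The codon-position partitions referred to above can be illustrated with a short sketch that splits an in-frame coding sequence into its first, second, and third positions. The function name `split_codon_positions` is hypothetical, and the code assumes the sequence starts at codon position 1 and contains no frame-shifting gaps.

```python
from typing import Tuple


def split_codon_positions(cds: str) -> Tuple[str, str, str]:
    """Split an in-frame coding sequence by codon position.

    Assumption: the first character of `cds` is codon position 1.
    Returns (position-1 bases, position-2 bases, position-3 bases).
    """
    return cds[0::3], cds[1::3], cds[2::3]


pos1, pos2, pos3 = split_codon_positions("ATGGCC")
print(pos1, pos2, pos3)  # prints AG TC GC
```

Each of the three strings can then be given its own model parameters, with the conserved noncoding sequence forming a fourth partition.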
Pairwise ILD P Values
Values above the diagonal are for nucleotide (NT)-coded data; values below the diagonal are for RY-coded data.
All protein-coding sequences.
Codon position 1, 2, or 3 (as indicated) within coding sequence.
Conserved noncoding sequence.
ML trees for each of 10 sequential, equal-sized partitions from the coding plus conserved noncoding sequence matrix. Numbers (1–10) reflect the specific partition used. The arrangement of taxa and branches shown in colors other than black varies among partitions. Nodes annotated with hollow circles have less than 50% bootstrap proportions, those with shaded circles have 50% to 75% bootstrap proportions, and those with solid circles have 75% bootstrap proportions or greater. Branches that are the same in all trees are indicated in black, with some collapsed to higher level taxa for simplicity.
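Dividing an alignment into sequential, equal-sized partitions amounts to computing contiguous column ranges. The sketch below shows one way to do so; `sequential_partitions` is a hypothetical name, and giving any leftover columns to the earliest partitions is an assumption, since the caption does not say how a remainder was handled.

```python
from typing import List, Tuple


def sequential_partitions(n_columns: int, k: int = 10) -> List[Tuple[int, int]]:
    """Return k contiguous, half-open (start, end) column ranges.

    Assumption: when n_columns is not divisible by k, the first
    (n_columns % k) partitions each receive one extra column.
    """
    base, extra = divmod(n_columns, k)
    bounds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


# Toy example: 25 alignment columns split into 10 partitions.
for number, (start, end) in enumerate(sequential_partitions(25), start=1):
    print(number, start, end)
```

Each range can be used to slice every row of the alignment, giving one sub-matrix per partition for a separate tree search.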
Relative Likelihood Support for Placental Root across Coding Plus Conserved Noncoding Sequence Matrix
| Placental root | Partition 1 | Partition 2 | Partition 3 | Partition 4 | Partition 5 | Partition 6 | Partition 7 | Partition 8 | Partition 9 | Partition 10 | Total | Combined | SH Test |
| Atlantogenata | 0 | 0 | 0 | 0 | 0 | n/a | 0 | 0 | 0 | 0 | 0 | 0 | Best |
| Epitheria | 4.5 | 1.7 | 2.9 | 9.1 | 3.9 | n/a | 1.6 | 16.7 | 7.5 | 8.0 | 56.0 | 69.4 | <0.0001 |
| Exafroplacentalia | 4.5 | 1.8 | 0.5 | 7.9 | 3.8 | n/a | 1.6 | 16.9 | 6.6 | 6.7 | 50.3 | 62.9 | <0.0001 |
NOTE.—n/a, not applicable.
Partition 6 contains a region where there is incomplete armadillo sequence, so no placental root can be inferred.
Likelihood score for entire coding plus conserved noncoding sequence matrix with 4 partitions for model parameters (3 for codon position and 1 for conserved noncoding).
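The Total column appears to be the sum of the per-partition values, with partition 6 (n/a) skipped; that reading is an assumption, not something the footnotes state. The sketch below reproduces the arithmetic from the table's own numbers. The names `relative_support` and `total_support` are hypothetical.

```python
# Per-partition values for partitions 1-10, copied from the table.
# None marks partition 6, where no placental root could be inferred.
relative_support = {
    "Epitheria": [4.5, 1.7, 2.9, 9.1, 3.9, None, 1.6, 16.7, 7.5, 8.0],
    "Exafroplacentalia": [4.5, 1.8, 0.5, 7.9, 3.8, None, 1.6, 16.9, 6.6, 6.7],
}


def total_support(values):
    """Sum the per-partition values, skipping partitions marked None."""
    return sum(v for v in values if v is not None)


for hypothesis, values in relative_support.items():
    print(f"{hypothesis}: {total_support(values):.1f}")
```

The sum for Exafroplacentalia matches the reported 50.3. The sum for Epitheria comes to 55.9 against the reported 56.0, a difference that would be consistent with rounding of the per-partition values.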
Three possible roots for Placentalia. SH test results from the coding plus conserved noncoding sequence matrix for both nucleotide- and RY-coded matrices. (A) Hypothesis rooting Placentalia between Xenarthra and Epitheria (Boreoeutheria + Afrotheria). (B) Hypothesis rooting Placentalia between Afrotheria and Exafroplacentalia (Boreoeutheria + Xenarthra). (C) Hypothesis rooting Placentalia between Boreoeutheria and Atlantogenata (Afrotheria + Xenarthra).