Chenhong Li, Guillermo Ortí, Gong Zhang, Guoqing Lu.
Abstract
BACKGROUND: Molecular systematics occupies one of the central stages in biology in the genomic era, ushered in by unprecedented progress in DNA technology. The inference of organismal phylogeny is now based on many independent genetic loci, a widely accepted approach to assembling the tree of life. Surprisingly, this approach is hindered by a lack of appropriate nuclear gene markers for many taxonomic groups, especially at high taxonomic levels, partly because of the lack of tools for efficiently developing new phylogenetic markers. We report here a genome-comparison strategy for identifying nuclear gene markers for phylogenetic inference and apply it to the ray-finned fishes, the largest vertebrate clade in need of phylogenetic resolution.
Year: 2007 PMID: 17374158 PMCID: PMC1838417 DOI: 10.1186/1471-2148-7-44
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Figure 1. Single-copy genes are useful markers for phylogeny inference. Gene duplication and subsequent loss may not cause incongruence between gene tree and species tree if gene loss occurs before the first speciation event (a), or before the second speciation event (b). The only case that would cause incongruence is when the gene survived both speciation events and is asymmetrically lost in taxon 2 and taxon 3 (c).
Figure 2. The bioinformatic pipeline for phylogenetic marker development. It involves within- and across-genome sequence comparison, in silico testing with sequences from other species, and experimental validation. Numbers of genes and exons identified for D. rerio are marked with an asterisk. Exon length (L), within-genome similarity (S), between-genome similarity (Sx), and coverage (C) are adjustable parameters (see methods).
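The filtering step of such a pipeline can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `Exon` record, the function names, and every default threshold are hypothetical placeholders, and the direction of each comparison is an assumption (long exons, low similarity to anything else in the same genome, high similarity and coverage against the second genome). Similarity and coverage values are assumed to have been computed beforehand, for example from sequence-similarity search output.

```python
from dataclasses import dataclass


@dataclass
class Exon:
    exon_id: str
    length: int                # exon length in bp
    within_similarity: float   # best similarity to any other region of the same genome (0-1)
    between_similarity: float  # similarity to the best hit in the second genome (0-1)
    coverage: float            # fraction of the exon covered by the cross-genome hit (0-1)


def passes_filters(exon, L=800, S=0.5, Sx=0.7, C=0.3):
    """Return True if an exon satisfies all four adjustable criteria.

    The default values of L, S, Sx and C are placeholders for illustration only.
    """
    return (
        exon.length > L                    # long enough to be phylogenetically informative
        and exon.within_similarity < S     # no close paralog in the same genome (single copy)
        and exon.between_similarity > Sx   # conserved counterpart in the other genome
        and exon.coverage > C              # the cross-genome hit spans enough of the exon
    )


def select_candidates(exons, **thresholds):
    """Return the IDs of exons that pass the filters, preserving input order."""
    return [e.exon_id for e in exons if passes_filters(e, **thresholds)]


if __name__ == "__main__":
    toy = [
        Exon("exon_a", 950, 0.20, 0.85, 0.90),   # passes every filter
        Exon("exon_b", 400, 0.20, 0.85, 0.90),   # too short
        Exon("exon_c", 950, 0.80, 0.85, 0.90),   # has a close paralog
    ]
    print(select_candidates(toy))
```

Because the thresholds are keyword arguments, relaxing one criterion (for instance `select_candidates(toy, L=300)`) shows directly how the candidate list grows or shrinks.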
PCR primers and annealing temperatures used to amplify 10 new markers
| Gene* | Primer | Sequence | Annealing temp (°C) | PCR step |
| zic1 | zic1_F9 | 5' GGACGCAGGACCGCARTAYC 3' | 57 | 1st PCR |
| zic1 | zic1_R967 | 5' CTGTGTGTGTCCTTTTGTGRATYTT 3' | 57 | 1st PCR |
| zic1 | zic1_F16 | 5' GGACCGCAGTATCCCACYMT 3' | 57 | 2nd PCR |
| zic1 | zic1_R963 | 5' GTGTGTCCTTTTGTGAATTTTYAGRT 3' | 57 | 2nd PCR |
| myh6 | myh6_F459 | 5' CATMTTYTCCATCTCAGATAATGC 3' | 53 | 1st PCR |
| myh6 | myh6_R1325 | 5' ATTCTCACCACCATCCAGTTGAA 3' | 53 | 1st PCR |
| myh6 | myh6_F507 | 5' GGAGAATCARTCKGTGCTCATCA 3' | 62 | 2nd PCR |
| myh6 | myh6_R1322 | 5' CTCACCACCATCCAGTTGAACAT 3' | 62 | 2nd PCR |
| RYR3 | RYR3_F15 | 5' GGAACTATYGGTAAGCARATGG 3' | 55 | 1st PCR |
| RYR3 | RYR3_R968 | 5' TGGAAGAAKCCAAAKATGATGC 3' | 55 | 1st PCR |
| RYR3 | RYR3_F22 | 5' TCGGTAAGCARATGGTGGACA 3' | 62 | 2nd PCR |
| RYR3 | RYR3_R931 | 5' AGAATCCRGTGAAGAGCATCCA 3' | 62 | 2nd PCR |
| Ptr | Ptr_F458 | 5' AGAATGGATWACCAACACYTACG 3' | 55 | 1st PCR |
| Ptr | Ptr_R1248 | 5' TAAGGCACAGGATTGAGATGCT 3' | 55 | 1st PCR |
| Ptr | Ptr_F463 | 5' GGATAACCAACACYTACGTCAA 3' | 62 | 2nd PCR |
| Ptr | Ptr_R1242 | 5' ACAGGATTGAGATGCTGTCCA 3' | 62 | 2nd PCR |
| tbr1 | tbr1_F1 | 5' TGTCTACACAGGCTGCGACAT 3' | 57 | 1st PCR |
| tbr1 | tbr1_R820 | 5' GATGTCCTTRGWGCAGTTTTT 3' | 57 | 1st PCR |
| tbr1 | tbr1_F86 | 5' GCCATGMCTGGYTCTTTCCT 3' | 62 | 2nd PCR |
| tbr1 | tbr1_R811 | 5' GGAGCAGTTTTTCTCRCATTC 3' | 62 | 2nd PCR |
| ENC1 | ENC1_F85 | 5' GACATGCTGGAGTTTCAGGA 3' | 53 | 1st PCR |
| ENC1 | ENC1_R982 | 5' ACTTGTTRGCMACTGGGTCAAA 3' | 53 | 1st PCR |
| ENC1 | ENC1_F88 | 5' ATGCTGGAGTTTCAGGACAT 3' | 62 | 2nd PCR |
| ENC1 | ENC1_R975 | 5' AGCMACTGGGTCAAACTGCTC 3' | 62 | 2nd PCR |
| Glyt | Glyt_F559 | 5' GGACTGTCMAAGATGACCACMT 3' | 55 | 1st PCR |
| Glyt | Glyt_R1562 | 5' CCCAAGAGGTTCTTGTTRAAGAT 3' | 55 | 1st PCR |
| Glyt | Glyt_F577 | 5' ACATGGTACCAGTATGGCTTTGT 3' | 62 | 2nd PCR |
| Glyt | Glyt_R1464 | 5' GTAAGGCATATASGTGTTCTCTCC 3' | 62 | 2nd PCR |
| SH3PX3 | SH3PX3_F461 | 5' GTATGGTSGGCAGGAACYTGAA 3' | 55 | 1st PCR |
| SH3PX3 | SH3PX3_R1303 | 5' CAAACAKCTCYCCGATGTTCTC 3' | 55 | 1st PCR |
| SH3PX3 | SH3PX3_F532 | 5' GACGTTCCCATGATGGCWAAAAT 3' | 62 | 2nd PCR |
| SH3PX3 | SH3PX3_R1299 | 5' CATCTCYCCGATGTTCTCGTA 3' | 62 | 2nd PCR |
| plagl2 | plagl2_F9 | 5' CCACACACTCYCCACAGAA 3' | 55 | 1st PCR |
| plagl2 | plagl2_R930 | 5' TTCTCAAGCAGGTATGAGGTAGA 3' | 55 | 1st PCR |
| plagl2 | plagl2_F51 | 5' AAAAGATGTTTCACCGMAAAGA 3' | 62 | 2nd PCR |
| plagl2 | plagl2_R920 | 5' GGTATGAGGTAGATCCSAGCTG 3' | 62 | 2nd PCR |
| sreb2 | sreb2_F10 | 5' ATGGCGAACTAYAGCCATGC 3' | 55 | 1st PCR |
| sreb2 | sreb2_R1094 | 5' CTGGATTTTCTGCAGTASAGGAG 3' | 55 | 1st PCR |
| sreb2 | sreb2_F27 | 5' TGCAGGGGACCACAMCAT 3' | 62 | 2nd PCR |
| sreb2 | sreb2_R1082 | 5' CAGTASAGGAGCGTGGTGCT 3' | 62 | 2nd PCR |
*Gene markers are named following annotations in Ensembl. zic1, zic family member 1; myh6, myosin, heavy polypeptide 6; RYR3 (si:ch211-189g6.1), novel protein similar to vertebrate ryanodine receptor 3; Ptr (si:ch211-105n9.1), hypothetical protein LOC564097; tbr1, T-box brain 1; ENC1 (559445 Entrezgene), similar to ectodermal-neural cortex 1; Glyt (zgc:112079), glycosyltransferase; SH3PX3, similar to SH3 and PX domain containing 3 gene; plagl2, pleiomorphic adenoma gene-like 2; sreb2, super conserved receptor expressed in brain 2.
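Many of the primers above contain IUPAC degenerate bases (R, Y, M, K, S, W), so each listed sequence stands for a small pool of oligonucleotides. The sketch below shows how the pool size and the individual sequences could be enumerated; it is an illustrative helper, not part of the published protocol, and the function names `degeneracy` and `expand` are hypothetical. The example primer is zic1_F9 from the table.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])
    return total


def expand(primer):
    """List every unambiguous sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    zic1_f9 = "GGACGCAGGACCGCARTAYC"  # contains one R and one Y
    print(degeneracy(zic1_f9))        # 2 * 2 = 4 sequences in the pool
    for seq in expand(zic1_f9):
        print(seq)
```

A lookup table keeps the code short and makes an unexpected character fail loudly with a `KeyError` rather than being silently ignored.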
Summary information of the 10 gene markers amplified in 14 taxa
| Gene | Exon ID | No. of bp | No. of var. | No. of PI | Genetic distance (%) | Sub. rate | CI-MP | α | RCV | Treeness |
| zic1 | ENSDARE00000015655 | 894 | 296 | 210 | 13(2.3–22.6) | 0.64 | 0.61 | 1.64 | 0.13 | 0.23 |
| myh6 | ENSDARE00000025410 | 735 | 323 | 235 | 18(7.8–23.2) | 1.35 | 0.54 | 0.68 | 0.11 | 0.22 |
| RYR3 | ENSDARE00000465292 | 825 | 389 | 258 | 18(8–23.6) | 1.25 | 0.56 | 0.67 | 0.11 | 0.21 |
| Ptr | ENSDARE00000145053 | 705 | 304 | 234 | 18(5.3–28.1) | 1.03 | 0.57 | 1.64 | 0.12 | 0.29 |
| tbr1 | ENSDARE00000055502 | 666 | 256 | 170 | 14(3–25.6) | 0.65 | 0.67 | 2.91 | 0.10 | 0.28 |
| ENC1 | ENSDARE00000367269 | 810 | 312 | 248 | 16(6.7–24.3) | 1.13 | 0.55 | 1.10 | 0.16 | 0.33 |
| Glyt | ENSDARE00000039808 | 870 | 463 | 335 | 21(6.6–29.7) | 1.18 | 0.60 | 1.70 | 0.12 | 0.27 |
| SH3PX3 | ENSDARE00000117872 | 705 | 290 | 226 | 16(6.2–24) | 1.11 | 0.55 | 1.53 | 0.14 | 0.22 |
| plagl2 | ENSDARE00000136964 | 675 | 250 | 184 | 14.3(5.1–21.5) | 0.81 | 0.61 | 0.92 | 0.10 | 0.33 |
| sreb2 | ENSDARE00000029022 | 987 | 344 | 225 | 13(4–21.6) | 0.85 | 0.61 | 0.88 | 0.11 | 0.23 |
| RAG1 | - | 1344 | 684 | 514 | 20(8.1–29) | 1.28 | 0.57 | 1.68 | 0.05 | 0.23 |
bp, base pairs; var., variable sites; PI, parsimony-informative sites; Genetic distance, average uncorrected distance, with the range of distances in parentheses; Sub. rate, relative substitution rate estimated using a Bayesian approach; CI-MP, consistency index; α, gamma distribution shape parameter; RCV, relative composition variability.
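Three of the simpler columns (variable sites, parsimony-informative sites, and uncorrected distance) can be computed directly from an alignment. The following is a minimal sketch using the usual textbook definitions, not the software used in the study: a site is variable if it shows at least two different bases, parsimony-informative if at least two bases each occur in at least two sequences, and the uncorrected distance is the proportion of compared sites that differ. Ignoring gaps and ambiguous characters is an assumption of this sketch, and the toy alignment is invented for illustration.

```python
from collections import Counter
from itertools import combinations

BASES = set("ACGT")  # gaps and ambiguity codes are ignored (an assumption of this sketch)


def column_counts(alignment, i):
    """Count unambiguous bases in column i of an upper-case alignment."""
    return Counter(seq[i] for seq in alignment if seq[i] in BASES)


def count_variable_sites(alignment):
    """Sites with at least two different bases."""
    return sum(
        1 for i in range(len(alignment[0]))
        if len(column_counts(alignment, i)) > 1
    )


def count_pi_sites(alignment):
    """Sites where at least two bases each occur in at least two sequences."""
    total = 0
    for i in range(len(alignment[0])):
        counts = column_counts(alignment, i)
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            total += 1
    return total


def p_distance(a, b):
    """Uncorrected distance: proportion of compared sites that differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x in BASES and y in BASES]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def mean_p_distance(alignment):
    """Average uncorrected distance over all pairs of sequences."""
    dists = [p_distance(a, b) for a, b in combinations(alignment, 2)]
    return sum(dists) / len(dists)


if __name__ == "__main__":
    toy = ["ACGTAC", "ACGTTC", "ATGTTC", "ATGAAC"]  # invented example
    print(count_variable_sites(toy), count_pi_sites(toy))
    print(round(100 * mean_p_distance(toy), 1), "% mean uncorrected distance")
```

In the toy alignment, columns 2, 4 and 5 vary, but column 4 has a base that appears only once, so it is variable without being parsimony-informative.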
Figure 3. A comparison of the maximum likelihood phylogram inferred in this study with the conventional phylogeny. (a) Left panel – the phylogram of 14 taxa inferred from protein sequences of 10 genes; (b) right panel – a "consensus" phylogeny following Nelson [50]. The numbers on the branches are Bayesian posterior probabilities, ML bootstrap values estimated from protein sequences, and ML bootstrap values estimated from RY-coded nucleotide sequences. Asterisks indicate bootstrap support below 50.
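RY-coding, mentioned in the caption above, recodes each nucleotide as a purine (R: A or G) or a pyrimidine (Y: C or T), so that only transversions remain visible as differences. A minimal sketch of the recoding follows; the function name `ry_code` is hypothetical, and leaving gaps and ambiguity codes unchanged is an assumption of this example rather than something the source specifies.

```python
# Purines (A, G) become R; pyrimidines (C, T) become Y.
RY_TABLE = str.maketrans({"A": "R", "G": "R", "C": "Y", "T": "Y"})


def ry_code(sequence):
    """Return the RY-coded form of a nucleotide sequence.

    Characters other than A, C, G and T (gaps, N, and so on) are left as they are.
    """
    return sequence.upper().translate(RY_TABLE)


if __name__ == "__main__":
    print(ry_code("ACGT-N"))
```

After recoding, a transition such as A to G no longer counts as a change, because both bases map to R.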