Jill M Duarte, P Kerr Wall, Patrick P Edger, Lena L Landherr, Hong Ma, J Chris Pires, Jim Leebens-Mack, Claude W dePamphilis.
Abstract
BACKGROUND: Although the overwhelming majority of genes found in angiosperms are members of gene families, and both gene and genome duplication are pervasive forces in plant genomes, some genes are sufficiently distinct from all other genes in a genome that they can be operationally defined as 'single copy'. Using the gene clustering algorithm MCL-tribe, we have identified a set of 959 genes that are single copy in each of the genomes of Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa. To characterize these genes, we have performed a number of analyses examining GO annotations, coding sequence length, number of exons, number of domains, and presence in distant lineages such as Selaginella and Physcomitrella, together with phylogenetic analyses to estimate copy number in other seed plants and to demonstrate their phylogenetic utility. We then provide examples of how these genes may be used in phylogenetic analyses to reconstruct organismal history, both by using extant coverage in EST databases for seed plants and by de novo amplification via RT-PCR in the family Brassicaceae.
Year: 2010 PMID: 20181251 PMCID: PMC2848037 DOI: 10.1186/1471-2148-10-61
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Figure 1. Shared single copy genes in various combinations of angiosperm genomes. Angiosperm genomes are abbreviated as follows: Ath - Arabidopsis thaliana; Ptr - Populus trichocarpa; Vvi - Vitis vinifera; Osa - Oryza sativa. The number of tribes is the number of PlantTribes found at medium stringency (3.0) that contain a single member from each of the genomes sampled.
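The counting rule in this caption (a tribe qualifies when it holds exactly one member from each sampled genome) can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the input layout, the function name `shared_single_copy`, and the toy tribes are all hypothetical, and the sketch assumes that genomes outside the sampled combination are simply ignored.

```python
from collections import Counter


def shared_single_copy(tribes, genomes):
    """Return IDs of tribes with exactly one member from each listed genome.

    `tribes` maps a tribe ID to a list of (genome abbreviation, gene ID)
    pairs. Members from genomes not in `genomes` are ignored (an assumption
    of this sketch, not a statement about the original analysis).
    """
    selected = []
    for tribe_id, members in tribes.items():
        counts = Counter(genome for genome, _gene in members)
        if all(counts.get(g, 0) == 1 for g in genomes):
            selected.append(tribe_id)
    return sorted(selected)


# Hypothetical toy clusters, for illustration only.
toy_tribes = {
    "tribe_1": [("Ath", "a1"), ("Ptr", "p1"), ("Vvi", "v1"), ("Osa", "o1")],
    "tribe_2": [("Ath", "a2"), ("Ath", "a3"), ("Ptr", "p2"),
                ("Vvi", "v2"), ("Osa", "o2")],
    "tribe_3": [("Ath", "a4"), ("Ptr", "p3"), ("Vvi", "v3")],
}

apvo = shared_single_copy(toy_tribes, ["Ath", "Ptr", "Vvi", "Osa"])
apv = shared_single_copy(toy_tribes, ["Ath", "Ptr", "Vvi"])
print(apvo)  # ['tribe_1']
print(apv)   # ['tribe_1', 'tribe_3']
```

Running the same function over each combination of genome abbreviations would give the per-combination counts that a figure of this kind summarizes.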
Shared single copy nuclear genes are present throughout plant lineages
| Taxonomic Group | Number of single copy APVO PlantTribes present |
|---|---|
| Eurosids | 913 |
| Asterids | 519 |
| Core Eudicots | 76 |
| Basal Eudicots | 189 |
| Monocots | 948 |
| Basal Angiosperms | 48 |
| Gymnosperms | 502 |
| Vascular Plants | 438 |
| Green Algae | 190 |
Summary of additional file 4 with total number of shared single copy nuclear genes present and distribution of count numbers for various taxonomic groups. The eurosids include members from both Rosids I and II; core eudicots comprise families considered basal to either the eurosids or the asterids, such as the Caryophyllales and the Vitales; basal eudicots are represented by the ranunculids Eschscholzia californica and Papaver somniferum; monocots include both members of the Poaceae and non-grass monocots; basal angiosperms include members of the magnoliids as well as Amborella trichopoda and Nuphar advena; gymnosperms include conifers, cycads, Ginkgo and Gnetales; vascular plants include representatives from ferns, mosses and hornworts; and green algae include members of both chlorophyte and charophyte green algal lineages.
Figure 2. Overrepresentation and underrepresentation of shared single copy genes in select GO categories. Bar chart showing GO slim categories that are overrepresented or underrepresented in the APVO PlantTribes, based on the TAIR8 annotation of the Arabidopsis thaliana genome and an initial alpha value of 0.05 with a subsequent Bonferroni correction for multiple tests. The bottom green bar represents the percentage of APVO shared single copy genes with the given annotation; the top blue bar is the percentage of genes with the given annotation for the remainder of the genome. Overrepresentation and underrepresentation were detected using a chi-square test comparing the GO slim annotation of the single copy tribes versus all other genes in the Arabidopsis genome.
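The test described in this caption can be illustrated with a small sketch: a 2x2 chi-square test per GO slim category (single copy genes versus the rest of the genome, annotated versus not annotated), with the alpha of 0.05 divided by the number of categories tested. The function names, the choice of an uncorrected Pearson statistic, and all counts below are assumptions made for illustration; none of the numbers come from the paper.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 d.f., no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def classify_categories(counts, n_single, n_rest, alpha=0.05):
    """Label each category 'over', 'under' or 'not significant'.

    `counts` maps a category to (annotated single copy genes,
    annotated genes in the rest of the genome).
    """
    threshold = alpha / len(counts)  # Bonferroni correction
    results = {}
    for category, (k_single, k_rest) in counts.items():
        stat, p = chi_square_2x2(k_single, n_single - k_single,
                                 k_rest, n_rest - k_rest)
        if p >= threshold:
            call = "not significant"
        elif k_single / n_single > k_rest / n_rest:
            call = "over"
        else:
            call = "under"
        results[category] = (stat, p, call)
    return results


# Hypothetical counts: 1000 single copy genes, 25000 other genes.
toy_counts = {
    "category_A": (200, 2500),  # 20% vs 10%
    "category_B": (20, 2500),   # 2% vs 10%
    "category_C": (100, 2500),  # 10% vs 10%
}
results = classify_categories(toy_counts, n_single=1000, n_rest=25000)
for name, (stat, p, call) in results.items():
    print(f"{name}: chi2={stat:.1f}, p={p:.3g}, {call}")
```

The direction of a significant result is read from the two proportions, which correspond to the green and blue bars of the chart.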
Amplification of shared single copy nuclear genes in Brassicaceae
| Marker | At2g21870 | At2g32520 | At3g47810 | At4g31720 | At4g33250 | At5g47570 | At5g63135 | At5g63135 | At2g13360 | At4g15790 | At4g37830 | At5g23290 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
| 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | |
| 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | |
| 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | |
| 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | |
| 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | |
| 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | |
| 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | |
| 450 | 450 | 350 | 250 | 210 | 300 | 250 | 300 | 900 | 150 | 150 | 150 | |
| 3 | 1 | 1 | 3 | 3 | 3 | 2 | 1 | 1 | 1 | 1 | 2 | |
| 2 | 1 | 1 | 2 | 2 | 2 | 1 | 1 | 1 | 2 | 1 | 1 | |
| 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | |
| 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 2 | |
| 1 | 1 | 0 | 1 | 2 | 1 | 2 | 1 | 1 | 1 | 1 | 2 | |
| 700 | 1000 | 2000 | 1300 | 1200 | 2000 | 2000 | 3000 | 500 | 1000 | 1500 | 700 | |
| 800 | 600 | 600 | 1100 | 700 | 400 | 300 | 200 | 500 | ||||
| 700 | 200 | |||||||||||
PCR and RT-PCR amplification results from Brassicaceae showing number and size of bands. The number of bands amplified is indicated (0-5).
Figure 3. Single copy nuclear genes improve phylogenetic resolution in the Brassicaceae. A single most-parsimonious tree was found from combined analysis of the complete data matrix containing genes At2g32520, At2g13360, and At5g23290 (L = 961, consistency index = 0.774, retention index = 0.529). Bootstrap values are shown above branches. Note that individual gene trees gave similar topologies. The phylogeny is consistent with published phylogenies using more taxa and other molecular markers. Bootstrap values from Beilstein et al., 2006 are shown below branches.
Shared single copy nuclear genes are a rich source of phylogenetic information.
| ATH | Annotation | # SEQ | # NT | # VAR | % PI | >50 MP | >50 ML |
|---|---|---|---|---|---|---|---|
| At2g13360 | AGT1 | 49 | 1203 | 703 | 51% | 63% | 65% |
| At3g47810 | MAIGO 1 | 91 | 573 | 359 | 49% | 66% | 73% |
| At2g32520 | dienelactone hydrolase family protein | 73 | 721 | 536 | 64% | 75% | 81% |
| At3g52300 | ATPQ | 129 | 519 | 395 | 62% | 65% | 66% |
| At5g06360 | Ribosomal protein S8e | 51 | 780 | 449 | 47% | 58% | 58% |
| At5g04600 | RNA recognition motif (RRM)-containing protein | 63 | 579 | 475 | 74% | 68% | 71% |
| At2g21870 | probable ATP synthase 24 kDa subunit, mitochondrial precursor | 95 | 606 | 492 | 68% | 78% | 78% |
| At4g33250 | eukaryotic translation initiation factor 3 subunit 11 (eIF3k) | 60 | 662 | 453 | 59% | 68% | 66% |
| At4g30010 | fiber protein Fb15 | 129 | 251 | 217 | 77% | 62% | 63% |
| At1g77710 | Probable ubiquitin-fold modifier 1 precursor | 143 | 254 | 202 | 55% | 46% | 45% |
| At4g08230 | glycine-rich protein | 51 | 413 | 286 | 54% | 60% | 68% |
| At4g31720 | STG1, TAFII15 | 64 | 448 | 323 | 51% | 63% | 67% |
| At4g37830 | putative cytochrome c oxidase subunit VIa precursor | 151 | 216 | 192 | 71% | 43% | 45% |
| At5g47570 | similar to hypothetical protein 25.t00006 [Brassica oleracea] | 90 | 387 | 288 | 64% | 58% | 65% |
| At5g23290 | putative c-myc binding protein | 69 | 404 | 289 | 64% | 62% | 69% |
| At1g27530 | expressed protein | 60 | 525 | 332 | 51% | 68% | 71% |
| At3g20390 | Endoribonuclease L-PSP, putative | 132 | 372 | 313 | 69% | 70% | 70% |
| At5g63135 | hypothetical protein | 64 | 448 | 323 | 51% | 65% | 63% |
| Concatenated alignment of 13 shared single copy genes | | 69 | 7701 | 5072 | 55% | 81% | 87% |
This table provides the following information about the 18 shared single copy genes that were used for phylogenetic analysis based on EST and finished cDNA sequences: the Arabidopsis thaliana locus ID (ATH); the TAIR annotation in Arabidopsis (Annotation); the number of sequences in the final alignment (# SEQ); the number of nucleotides in the final alignment (# NT); the number of variable characters in the final alignment (# VAR); the percentage of characters that are parsimony-informative (% PI); the percentage of nodes in the MP bootstrap consensus tree that are supported in more than 50% of the bootstrap replicates (>50 MP); and the percentage of nodes in the ML bootstrap consensus tree that are supported in more than 50% of the bootstrap replicates (>50 ML).
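The # VAR and % PI columns can be illustrated with a short sketch that counts variable and parsimony-informative columns in an alignment. It uses the standard definitions (a variable site has at least two character states; a parsimony-informative site has at least two states that each occur in at least two sequences). The treatment of gaps and ambiguous characters as missing data, the use of all alignment columns as the denominator for % PI, and the toy alignment are assumptions of this sketch rather than details taken from the paper.

```python
from collections import Counter

# Characters treated as missing data (an assumption of this sketch).
MISSING = set("-?Nn")


def site_stats(alignment):
    """Return (number of sites, variable sites, parsimony-informative sites)."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("alignment must be non-empty with equal-length sequences")
    n_sites = lengths.pop()
    variable = informative = 0
    for column in zip(*alignment):
        counts = Counter(base.upper() for base in column if base not in MISSING)
        if len(counts) >= 2:
            variable += 1
            # Informative: at least two states, each seen in two or more sequences.
            if sum(1 for c in counts.values() if c >= 2) >= 2:
                informative += 1
    return n_sites, variable, informative


# Hypothetical four-sequence alignment, for illustration only.
toy_alignment = [
    "ACGTAC",
    "ACGTTC",
    "ATGAAC",
    "ATG-TC",
]
n_sites, n_var, n_pi = site_stats(toy_alignment)
print(n_sites, n_var, n_pi)                 # 6 3 2
print(f"% PI = {100 * n_pi / n_sites:.0f}%")  # % PI = 33%
```

In the toy alignment, column 4 is variable but not informative, because the state A occurs in only one sequence.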
Figure 4. Angiosperm phylogeny using ESTs for 13 shared single copy genes. The tree on the left is the MP tree determined from the concatenated data matrix for 13 single copy genes using 69 seed plant taxa. The tree on the right is the ML tree determined from the same concatenated data matrix. Bootstrap values are indicated by the colored bars placed on branches with greater than 50% bootstrap support. Picea sitchensis was used as the outgroup taxon for all analyses. Taxa are color-coded as follows: monocots (green); euasterid I (light blue); euasterid II (dark blue); eurosid I (pink); eurosid II (red); core eudicot (purple); basal eudicot (brown); magnoliid (orange); basal angiosperm (dark gray); gymnosperms (black).