Reinhold Stockenhuber, Stefan Zoller, Rie Shimizu-Inatsugi, Felix Gugerli, Kentaro K Shimizu, Alex Widmer, Martin C Fischer.
Abstract
The lack of DNA sequence information for most non-model organisms impairs the design of primers that are universally applicable for the study of molecular polymorphisms in nuclear markers. Next-generation sequencing (NGS) techniques now provide a powerful approach to overcome this limitation. We present a flexible and inexpensive method to identify large numbers of nuclear primer pairs that amplify in most Brassicaceae species. We first obtained NGS transcriptome sequencing reads from two distantly related Brassicaceae species, Cardamine hirsuta and Arabis alpina, mapped them onto the Arabidopsis thaliana reference genome, and then identified short conserved sequence motifs among the three species bioinformatically. From these, primer pairs to amplify coding regions (nuclear protein coding loci, NPCL) and exon-primed intron-crossing sequences (EPIC) were developed. We identified 2,334 universally applicable primer pairs, targeting 1,164 genes, which provide a large pool of markers as a readily usable genomic resource that will help address novel questions in the Brassicaceae family. Testing a subset of the newly designed nuclear primer pairs revealed that a great majority yielded a single amplicon in all of the 30 investigated Brassicaceae taxa. Sequence analysis and phylogenetic reconstruction with a subset of these markers at different levels of phylogenetic divergence in the mustard family were compared with previous studies. The results corroborate the usefulness of the newly developed primer pairs, e.g., for phylogenetic analyses or population genetic studies. Thus, our method provides a cost-effective approach for designing nuclear loci across a broad range of taxa and is compatible with current NGS technologies.
Year: 2015 PMID: 26061739 PMCID: PMC4465667 DOI: 10.1371/journal.pone.0128181
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Workflow for the identification of conserved nucleotide sequences in multiple Brassicaceae species and subsequent primer design.
Blue boxes refer to the three major steps in the workflow; white boxes indicate the general steps taken. Explanations on the right provide specific results from this study.
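The conserved-motif step of this workflow can be illustrated with a minimal sketch. It assumes the three species' sequences have already been aligned column by column (for example, by mapping reads to the reference), and it simply reports stretches where every sequence carries the same unambiguous base. The function name `conserved_windows` and the default minimum length of 18 are hypothetical choices made for illustration; the record does not state the motif length or the software the authors used.

```python
from typing import List, Tuple


def conserved_windows(aligned: List[str], min_len: int = 18) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges that are identical in all sequences.

    `aligned` holds equal-length, pre-aligned sequences (gaps as '-').
    `min_len` is an assumed threshold for a stretch long enough to host a primer.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    windows: List[Tuple[int, int]] = []
    start = None  # first column of the conserved run currently being extended
    for col in range(length):
        column = {seq[col].upper() for seq in aligned}
        # A column is conserved if all sequences share one unambiguous base.
        identical = len(column) == 1 and column <= set("ACGT")
        if identical and start is None:
            start = col
        elif not identical and start is not None:
            if col - start >= min_len:
                windows.append((start, col))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and length - start >= min_len:
        windows.append((start, length))
    return windows


# Toy alignment standing in for the three species (not real data).
toy_alignment = [
    "ACGTACGTAAGG",
    "ACGTACGTCAGG",
    "ACGTACGT-AGG",
]
```

Calling `conserved_windows(toy_alignment, min_len=4)` keeps only the first eight columns, because the trailing three conserved columns fall below the threshold. In practice, candidate windows like these would then be passed to a primer design tool.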
Species and samples used in this study for primer testing and sequencing.
| Accession Name | Species | Herbarium Number | Collector and Date | Origin | PCR | Sequencing | Family | Lineage | Genus |
|---|---|---|---|---|---|---|---|---|---|
| - | | CH0Z-20100490 | Steiger P., | Switzerland, San Salvatore TI 420 m asl | ☑ | ☑ | outgroup | ☐ | ☐ |
| - | | - | Fischer M., 2011 | 45.90919° N 9.39207° | ☑ | ☐ | ☐ | ☐ | ☐ |
| Col-0 | | - | - | - | ☑ | ☑ | ☑ | ☑ | ☐ |
| - | | Z/ZT MB 14820 | Baltisberger M., 2011 | Davos, Switzerland, 2200–2400 m asl | ☑ | ☑ | ☑ | ☐ | ☑ |
| - | | Z/ZT MB 14821 | Baltisberger M., 2011 | Davos, Switzerland, 2200–2400 m asl | ☑ | ☑ | ☐ | ☐ | ☑ |
| - | | - | Gugerli F. | - | ☑ | ☑ | ☐ | ☐ | ☑ |
| - | | Z/ZT MB 14816 | Baltisberger M., 2011 | Davos, Switzerland, 2200–2400 m asl | ☑ | ☑ | ☐ | ☐ | ☑ |
| - | | Z/ZT MB 14814 | Baltisberger M., 2011 | Davos, Switzerland, 2200–2400 m asl | ☑ | ☑ | ☐ | ☐ | ☑ |
| - | | XX0Z-19820365 | Käser U., 2010 | Botanical Garden Jaen, France | ☑ | ☑ | ☐ | ☑ | ☐ |
| - | | Z/ZT MB 14815 | Baltisberger M., 2011 | Davos, Switzerland, 2200–2400 m asl | ☑ | ☑ | ☑ | outgroup | ☐ |
| - | | - | - | - | ☑ | ☑ | ☐ | ☑ | ☐ |
| - | | XX0Z-20010028 | Käser U., 2010 | Botanical Gardens University Bonn-Germany | ☑ | ☑ | ☑ | ☐ | ☐ |
| - | | - | Marhold K. | Russia | ☑ | ☑ | ☑ | ☐ | ☐ |
| - | | Z/ZT MB 14836 | Baltisberger M., 2011 | Davos, Switzerland, 2200–2400 m asl | ☑ | ☐ | ☐ | ☐ | ☐ |
| - | | Z/ZT MB 14813 | Baltisberger M., 2011 | Davos, Switzerland, 2100 m asl | ☑ | ☐ | ☐ | ☐ | ☐ |
| HAY1 | | - | Shimizu-Inatsugi R., | University of Zurich, Switzerland | ☑ | ☑ | ☑ | ☑ | ☐ |
| - | | Z/ZT MB 14818 | Baltisberger M., 2011 | Davos, Switzerland, 2200–2400 m asl | ☑ | ☐ | ☐ | ☐ | ☐ |
| - | | XX0Z-20001358 | Schneeberger E. | Denmark, Bornholm, Teglkas, Shore | ☑ | ☑ | ☑ | ☐ | ☐ |
| - | | XX0Z-20000361 | Käser U., 2009 | Giardino Botanico Alpino Rezia-Bormio | ☑ | ☑ | ☑ | ☐ | ☐ |
| - | | Z/ZT MB 14830 | Baltisberger M., 2011 | Davos, Switzerland, 2200–2400 m asl | ☑ | ☑ | ☐ | ☐ | outgroup |
| - | | XX0Z-19770612 | Käser U., 2009 | Botanical Garden St. Gallen-Switzerland | ☑ | ☐ | ☐ | ☐ | ☐ |
| - | | Z/ZT MB 14807 | Baltisberger M., 2011 | Davos, Switzerland, 1600 m asl | ☑ | ☑ | ☑ | ☐ | ☐ |
| - | | Z/ZT MB 14817 | Baltisberger M., 2011 | Davos, Switzerland, 2300 m asl | ☑ | ☐ | ☐ | ☐ | ☐ |
| - | | Z/ZT MB 14834 | Baltisberger M., 2011 | Davos, Switzerland, 2200–2400 m asl | ☑ | ☑ | ☑ | ☐ | ☐ |
| - | | XX0Z-20100109 | Käser U., 2010 | EX BG Kiel; University Konstanz-Germany | ☑ | ☑ | ☑ | ☐ | ☐ |
| - | | Z/ZT MB 14819 | Baltisberger M., 2011 | Davos, Switzerland, 2200–2400 m asl | ☑ | ☑ | ☑ | ☐ | ☐ |
| - | | XX0Z-19963427 | Käser U., 2010 | - | ☑ | ☑ | ☑ | ☑ | ☐ |
| - | | CH0Z-20060845 | Affeltranger K., 2006 | Switzerland, Binn VS 1280 m asl | ☑ | ☑ | ☑ | ☐ | ☐ |
| - | | - | Shimizu-Inatsugi R., 2007 | Botanic Garden Zurich, Switzerland | ☑ | ☑ | ☐ | ☑ | ☐ |
| - | | Z/ZT MB 14807 | Baltisberger M., 2011 | Switzerland | ☑ | ☑ | ☑ | ☐ | ☐ |
The use of each species is divided into PCR, sequencing, family, "lineage" and genus. PCR indicates use for PCR amplification and sequencing indicates sequencing of the species, respectively. Family, "lineage" and genus refer to the application of species sequences at the three taxonomic levels that were phylogenetically tested in this study. The term "outgroup" refers to a taxon being sequenced and used as outgroup for phylogenetic analysis at a specific relationship level.
Fig 2. Phylogenetic inference at the family level.
The best Maximum Likelihood phylogram of concatenated gene sequences is shown. Bootstrap support values and posterior probabilities are given above or below the corresponding branches, respectively. Values below 50/0.5 are omitted. "Lineage"-brackets refer to lineages sensu Al-Shehbaz (2012). Percentage amplification success per species is given in brackets next to each species name.
Fig 3. Phylogenetic inference at the "lineage" level.
The best Maximum Likelihood phylogram of concatenated gene sequences is shown. Bootstrap support values and posterior probabilities are given above or below the corresponding branches, respectively. Values below 50/0.5 are omitted. Percentage amplification success per species is given in brackets next to each species name.
Fig 4. Phylogenetic inference at the genus level.
The best Maximum Likelihood phylogram of concatenated gene sequences is shown. Bootstrap support values and posterior probabilities are given above or below the corresponding branches, respectively. Values below 50/0.5 are omitted. Percentage amplification success per species is given in brackets next to each species name.
Details on alignments and markers used in our study as well as often-used markers at three taxonomic levels.
| Family | | | | | | | | | ITS | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Marker structure | e | e | e | ii | e | e | e | ii | nrDNA | cpDNA | cpDNA |
| | 14 | 15 | 13 | 13 | 14 | 14 | 12 | 13 | 14 | 14 | 12 |
| Sequencing success | 93.30% | 100% | 86.70% | 86.70% | 93.30% | 93.30% | 80.00% | 86.70% | - | - | - |
| | 398 | 471 | 401 | 326 | 427 | 430 | 399 | 296 | 599 | 707 | 725 |
| PIC | 78 | 71 | 47 | 29 | 69 | 60 | 95 | 31 | 156 | 57 | 58 |
| PIC % | 19.60% | 15.10% | 11.70% | 8.70% | 16.20% | 14.00% | 23.80% | 10.50% | 26.00% | 8.10% | 8.00% |
| | 0.113 | 0.054 | 0.069 | 0.079 | 0.089 | 0.056 | 0.147 | 0.071 | 0.154 | 0.067 | 0.071 |
| | 47.40% | 47.20% | 41.60% | 43.90% | 47.60% | 48.50% | 43.90% | 45.00% | 54.00% | 30.70% | 25.50% |
| | | | | | | | | | - | - | - |
| Substitution model | GTR+G+I | GTR+G+I | GTR+G | GTR+G | GTR+G+I | GTR+G+I | GTR+G+I | GTR+G+I | - | - | - |
| "Lineage" | | | | | | | | | | | |
| Marker structure | e | ii | ii | i | i | ii | nrDNA | cpDNA | cpDNA | | |
| | 7 | 7 | 7 | 6 | 7 | 6 | 7 | 7 | 7 | | |
| Sequencing success | 100% | 100% | 100% | 87.50% | 100% | 87.50% | - | - | - | | |
| | 470 | 449 | 480 | 336 | 157 | 515 | 599 | 725 | 651 | | |
| PIC | 33 | 33 | 39 | 17 | 4 | 31 | 72 | 14 | 12 | | |
| PIC % | 7.00% | 7.30% | 8.10% | 5.10% | 2.60% | 5.60% | 12.00% | 1.90% | 1.80% | | |
| | 0.058 | 0.097 | 0.084 | 0.071 | 0.042 | 0.077 | 0.167 | 0.043 | 0.041 | | |
| | 47.00% | 37.40% | 36.70% | 45.70% | 44.40% | 41.10% | 55.60% | 30.90% | 25.70% | | |
| | | | | | | | - | - | - | | |
| Substitution model | GTR+G+I | GTR+G | HKY+I (GTR+I) | GTR+I | F81 (GTR) | GTR+G | - | - | - | | |
| Genus | | | | | | | | | | | |
| Marker structure | e | iiii | ii | ii | nrDNA | | | | | | |
| | 6 | 5 | 6 | 4 | 6 | | | | | | |
| Sequencing success | 100% | 83.30% | 100% | 66.70% | - | | | | | | |
| | 470 | 451 | 548 | 528 | 617 | | | | | | |
| PIC | 13 | 13 | 7 | 1 | 36 | | | | | | |
| PIC % | 2.80% | 2.90% | 1.30% | 0.20% | 5.90% | | | | | | |
| | 0.041 | 0.069 | 0.033 | 0.054 | 0.099 | | | | | | |
| | 48.80% | 38.60% | 36.00% | 44.20% | 52.70% | | | | | | |
| | | | | | - | | | | | | |
| Substitution model | GTR+G | GTR+I | GTR+G | GTR+G | - | | | | | | |
Marker structure refers to the structure of the amplified fragment (e = exon, i = one intron in the fragment, ii = two introns in the fragment, iii = three introns in the fragment). Sequencing success is the percentage of obtained, readable sequences. PIC refers to the number of parsimony-informative characters in an alignment. PIC % shows the percentage of parsimony-informative sites within an alignment. Substitution model refers to the substitution model applied for phylogenetic inference; values in brackets refer to the alternative substitution model used in RAxML. Asterisk indicates that information is based on TAIR10.
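The PIC and PIC % values can be reproduced from an alignment with a short sketch. It uses the standard definition of a parsimony-informative site: a column with at least two different character states, each present in at least two sequences. The function names are hypothetical, gaps and ambiguity codes are ignored by assumption, and this is not necessarily how the authors computed the values.

```python
from collections import Counter
from typing import List


def parsimony_informative_sites(alignment: List[str]) -> int:
    """Count columns with at least two states that each occur in two or more sequences.

    Assumes equal-length aligned sequences; only A, C, G and T are counted as states.
    """
    count = 0
    for column in zip(*alignment):
        states = Counter(ch.upper() for ch in column if ch.upper() in "ACGT")
        shared_states = sum(1 for n in states.values() if n >= 2)
        if shared_states >= 2:
            count += 1
    return count


def pic_percent(pic: int, alignment_length: int) -> float:
    """Percentage of parsimony-informative sites in an alignment."""
    return 100.0 * pic / alignment_length


# Toy alignment (not real data): columns 2, 3 and 4 are informative, column 1 is not.
toy = ["AAGT", "AAGC", "ACTC", "ACTT"]
```

As a check against the table, and assuming the row holding 398 is the alignment length, `pic_percent(78, 398)` rounds to 19.6, which agrees with the first family-level column.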
Comparison of our approach and other publications with similar scopes mentioned in our study.
| Study | Taxon range | Method | Target loci | No. of loci found | Standard PCR conditions | Length (bp) |
|---|---|---|---|---|---|---|
| | Order (Actinopterygii) | database mining | NPCL | 154 | no | > 800 |
| | Order (Squamata) | database mining | NPCL | 85 | no | ≥ 700 |
| | Subkingdom (Eumetazoa) | database mining | EPIC | 52 | no | n. A. |
| | Infraclass (Teleostei) | database mining | EPIC | 210 | yes | 207–324 |
| | Family (Lamiaceae) | database mining | EPIC | 50 | no | 362–1717 |
| | Subphylum (Vertebrata) | database mining | NPCL | 102 | yes (nested PCR) | 510–1650 |
| | Order (Cycadales) | database mining | EPIC & UTR | 46 | no | 259–1890 |
| | Genus ( | database mining & RNAseq | NPCL & UTR | 7 | no | 277–796 |
| | Family (Brassicaceae) | database mining & RNAseq | NPCL & EPIC & UTR | 2,334 | yes | 339–787 |
Target loci indicates which fragments were targeted. Standard PCR conditions indicates the availability of a uniform PCR protocol for all markers. Length refers to the fragment length of the regions found in a study. No. of loci found refers to the number of detected primer pairs or loci in the respective study.