Michał T Lorenc, Satomi Hayashi, Jiri Stiller, Hong Lee, Sahana Manoli, Pradeep Ruperao, Paul Visendi, Paul J Berkman, Kaitao Lai, Jacqueline Batley, David Edwards.
Abstract
Single nucleotide polymorphisms (SNPs) are becoming the dominant form of molecular marker for genetic and genomic analysis. Advances in second-generation DNA sequencing provide opportunities to identify very large numbers of SNPs in a range of species. However, SNP identification remains a challenge for large and polyploid genomes due to their size and complexity. We have developed a pipeline, SGSautoSNP (Second-Generation Sequencing AutoSNP), for the robust identification of SNPs in large and complex genomes using Illumina second-generation DNA sequence data, and applied it to discover more than 800,000 SNPs between four hexaploid wheat cultivars across chromosomes 7A, 7B and 7D. All SNPs are presented for download and viewing within a public GBrowse database. Validation suggests that more than 93% of the SNPs represent polymorphisms between wheat cultivars and hence are valuable for detailed diversity analysis, marker-assisted selection and genotyping by sequencing. The pipeline produces output in GFF3, VCF, Flapjack or Illumina Infinium design format for further genotyping of diverse populations. As well as providing an unprecedented resource for wheat diversity analysis, the method establishes a foundation for high-resolution SNP discovery in other large and complex genomes.
Year: 2012 PMID: 24832230 PMCID: PMC4009776 DOI: 10.3390/biology1020370
Source DB: PubMed Journal: Biology (Basel) ISSN: 2079-7737
Summary of wheat cultivar data and mapping.
| Wheat variety | Data generated | Data mapped to reference | % read pairs mapped |
|---|---|---|---|
| Drysdale | 168 Gbp | 8.65 Gbp | 5.14% |
| Excalibur | 146 Gbp | 5.36 Gbp | 3.66% |
| Gladius | 180 Gbp | 8.47 Gbp | 4.70% |
| RAC875 | 132 Gbp | 4.1 Gbp | 3.10% |
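As a rough consistency check on the table above, the mapped percentage can be approximated from the two Gbp columns. This is only a sketch: the table reports the percentage of read pairs mapped and does not say how pairs were counted, so the base-pair ratio computed here is an assumed proxy, and the names `cultivars` and `mapped_percent` are illustrative.

```python
# (data generated, data mapped) in Gbp, copied from the table above.
cultivars = {
    "Drysdale": (168.0, 8.65),
    "Excalibur": (146.0, 5.36),
    "Gladius": (180.0, 8.47),
    "RAC875": (132.0, 4.1),
}


def mapped_percent(generated_gbp, mapped_gbp):
    """Mapped data as a percentage of generated data (base-pair ratio)."""
    return 100.0 * mapped_gbp / generated_gbp


# Assumption: the base-pair ratio approximates the read-pair percentage.
ratios = {name: mapped_percent(g, m) for name, (g, m) in cultivars.items()}

total_generated = sum(g for g, _ in cultivars.values())
total_mapped = sum(m for _, m in cultivars.values())

for name, pct in ratios.items():
    print(f"{name}: {pct:.2f}% of generated bases mapped")
print(f"All cultivars: {total_mapped:.2f} of {total_generated:.0f} Gbp mapped")
```

The computed ratios fall within a few hundredths of a percentage point of the reported column, which suggests the two measures track each other closely for these data.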
Summary of single nucleotide polymorphism (SNP) validation.
| SNP Primer Name | Forward Primer | Reverse Primer | SNP score | Validation |
|---|---|---|---|---|
| UQ7A27 | TAACATAAGCAAAGTTCTATTA | TTTGGAACACAATCGGAACTT | 6 | Failed |
| UQ7A1397 | TCTATTGGATTCTTTCCGAT | TCACCCTGTGGAATGAAAGA | 5 | Failed |
| UQ7A5622 | TTAGCCAAAATGGACCCAAA | CCTCTTTATTCAATCTGGAAACG | 2 | True SNP |
| UQ7A129835 | TTCTTACTGTGGCTGCATCA | GCCATCCTAAACGACCTTCA | 5 | True SNP |
| UQ7A9400 | GCCCATATGCAGTTCATGGT | AGAGCCAAACCTTCCCTGAT | 2 | Failed |
| UQ7A7915 | CATGCCAACCCAAGTAGACC | GAAGCGTGAAAATTTCGTGA | 6 | True SNP |
| UQ7A6107 | TGGTGTTTACGCTGAAGTTACC | CTGGCCTGGGCACTACATA | 6 | True SNP |
| UQ7A2603 | GTCACCAACCAGCTCGAAAT | TTGTAGCTTTGCCTCTGTGAA | 2 | Failed |
| UQ7A3491 | AGTCGCCGGCAGTAAAAATA | CCGAAGAAAATGTGGTGGAG | 4 | True SNP |
| UQ7A4532 | TTTCCTCTAGATCTGTGCAAAATG | CATCCAGGACTGCATAAGCTC | 6 | True SNP |
| UQ7A100138 | TCCCTGGTCCACGAGTTATT | AAATGGTTTGAGCCTTGTGC | 7 | Failed |
| UQ7A136305 | CATCATCTTTGAAAAATCCTAGCC | TGTTCTGCAAGCTTCGTCTG | 5 | True SNP |
| UQ7A155877 | AAGCTGTTGTGCCAGTGTTG | GAGCTAGCGTCGCTGACATA | 4 | True SNP |
| UQ7A180868 | GACCGTCATCGAATGTAGCA | TCGTCCACCCAGACCTTATC | 3 | True SNP |
| UQ7A287189 | GGCGATCATCACTTAAGAAACC | CAGTAATGAGGTTTCTGCTTGG | 2 | Failed |
| UQ7A322716 | TCTGTTCGCAAACCAACG | GTGCGTTATCAGGGGAACAT | 11 | True SNP |
| UQ7A57227 | ATGGGTGAAGGGAATACAGC | TGCATGCACATACAACCAAA | 5 | True SNP |
| UQ7A87191 | TCAGTTCGGTAAGGATGAAGA | GAAGCAGTATGCATCTAAACTTTG | 6 | Heterozygous |
| UQ7B21 | GCAGGGTTAATTTCTAGCAAGC | GCCTTTTATCCAAAGCCATC | 8 | Failed |
| UQ7B484 | CTCAACCTCCCAAGCATGA | GCTATCCAGCTACCCTGTGC | 11 | Failed |
| UQ7B3940 | GCCAGAGGCACTAGCATCAC | GGTAATTGTGGAGCAAGCAA | 6 | True SNP |
| UQ7B4960 | GCATGGCATTTCAAGATCAG | GGAGGAGGACAAAGCCAGAT | 5 | True SNP |
| UQ7B5991 | CCAAGCCACCACCCTTTAT | TAATCCCCGTCATCTCGAAG | 4 | True SNP |
| UQ7B120997 | CTCCTCAGATGACCAATTTGC | CACCAAAATATGCTGTACAATTCTATG | 7 | Failed |
| UQ7B256895 | GCAGCAGAGGTAGGCACTTC | GAAATGCTTCGAGTGTGGTG | 11 | True SNP |
| UQ7B64318 | GGGTCCAGACTTCCACGTTA | CCCACATTAATTTGTACGACCTC | 6 | Failed |
| UQ7B97303 | TGATTCGAGCCCATATAGGAA | AGCCATGCGGAAATATTGAG | 8 | True SNP |
| UQ7D283 | TGAGTAAGACAACAATCAGAGCA | CAATGCGAGCAAAAAGATCA | 5 | True SNP |
| UQ7D429 | TGTGCTGACGTGGCATCTAT | GCATGTGGAAAACGAGTGTG | 3 | True SNP |
| UQ7D689 | CATCTGGCCTCAACATCAAA | TGTTGGTAGTGAGGCACTTCTT | 9 | Failed |
| UQ7D948 | GGCGATACTCGATGAAAGAAA | TTGGAAACTACAATTGCACAAC | 9 | True SNP |
| UQ7D1189 | GCGTGGAGTAGAGGGACAAG | TCCAAAAAGCAAAACAAATGC | 4 | True SNP |
| UQ7D1491 | AGCGCAAGGAGGAGGTTAGT | GAGCCAAGTCCTTGTCAATTT | 7 | True SNP |
| UQ7D1846 | AATGTGTTCCATCCAAGACG | GCCAAGGTCGACATGTGATA | 10 | True SNP |
| UQ7D2314 | AAACAAGTCTGTGTTGCGTCA | TGCAGATACATGGCTCCAGA | 2 | Monomorphic |
| UQ7D20375 | CTGCCACCAAACGGATTAAC | AATGCATTGGCAGTCACAAG | 6 | True SNP |
| UQ7D27168 | TAATGCTATGCCGTGTCAGC | GCCACCTATTATTGAAGGCATC | 2 | True SNP |
| UQ7D38754 | GAGCGAGCAATGCTAGTGTG | GAACCCATTTGATAACCGTGA | 3 | Failed |
| UQ7D59683 | CGTCCACATTGTTGCAAATC | TTGACCCTGAAGGAAGGATG | 6 | True SNP |
| UQ7D68910 | TTGCTTTATGCCACTGGAGA | TAGGCCGTGAAACATCAACA | 3 | True SNP |
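The validation outcomes in the table above can be tallied to see how an accuracy figure might be derived from them. This is a sketch under stated assumptions, not the authors' calculation: it assumes "Failed" marks an assay that gave no usable result and is therefore left out of the denominator, and it reports the rate both with and without the heterozygous assay counted as a valid polymorphism, since the table does not say how that row was treated.

```python
from collections import Counter

T, F = "True SNP", "Failed"

# (primer name, validation outcome) pairs transcribed from the table above.
results = [
    ("UQ7A27", F), ("UQ7A1397", F), ("UQ7A5622", T), ("UQ7A129835", T),
    ("UQ7A9400", F), ("UQ7A7915", T), ("UQ7A6107", T), ("UQ7A2603", F),
    ("UQ7A3491", T), ("UQ7A4532", T), ("UQ7A100138", F), ("UQ7A136305", T),
    ("UQ7A155877", T), ("UQ7A180868", T), ("UQ7A287189", F),
    ("UQ7A322716", T), ("UQ7A57227", T), ("UQ7A87191", "Heterozygous"),
    ("UQ7B21", F), ("UQ7B484", F), ("UQ7B3940", T), ("UQ7B4960", T),
    ("UQ7B5991", T), ("UQ7B120997", F), ("UQ7B256895", T), ("UQ7B64318", F),
    ("UQ7B97303", T),
    ("UQ7D283", T), ("UQ7D429", T), ("UQ7D689", F), ("UQ7D948", T),
    ("UQ7D1189", T), ("UQ7D1491", T), ("UQ7D1846", T),
    ("UQ7D2314", "Monomorphic"), ("UQ7D20375", T), ("UQ7D27168", T),
    ("UQ7D38754", F), ("UQ7D59683", T), ("UQ7D68910", T),
]

counts = Counter(outcome for _, outcome in results)

# Assumption: failed assays are excluded; all other rows count as scored.
scored = len(results) - counts[F]

# Strict rate: only "True SNP" rows count as validated.
strict_rate = counts[T] / scored
# Lenient rate (assumption): the heterozygous call also counts as polymorphic.
lenient_rate = (counts[T] + counts["Heterozygous"]) / scored

print(dict(counts))
print(f"Scored assays: {scored} of {len(results)}")
print(f"Strict rate:  {strict_rate:.1%}")
print(f"Lenient rate: {lenient_rate:.1%}")
```

Both rates depend on the choice of denominator. Including the failed assays would lower them considerably, so the exclusion assumption should be checked against the full paper before quoting either number.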