Santosh Kumar, Travis W Banks, Sylvie Cloutier.
Abstract
Decreasing costs, along with rapid progress in next-generation sequencing and related bioinformatics computing resources, have facilitated large-scale discovery of SNPs in various model and nonmodel plant species. Their large numbers and genome-wide availability make SNPs the marker of choice in partially or completely sequenced genomes. Although excellent reviews have been published on next-generation sequencing, its associated bioinformatics challenges, and the applications of SNPs in genetic studies, a comprehensive review connecting these three intertwined research areas is needed. This paper touches upon various aspects of SNP discovery, highlighting key points in the availability and selection of appropriate sequencing platforms, bioinformatics pipelines, SNP filtering criteria, and applications of SNPs in genetic analyses. The use of next-generation sequencing methodologies in many nonmodel crops, leading to the discovery and implementation of SNPs in various genetic studies, is discussed. The development and improvement of open-source, freely available bioinformatics software have accelerated SNP discovery while reducing the associated cost. Key considerations for SNP filtering and associated pipelines are discussed in specific topics. A list of commonly used software and their sources is compiled for easy access and reference.
Year: 2012 PMID: 23227038 PMCID: PMC3512287 DOI: 10.1155/2012/831460
Source DB: PubMed Journal: Int J Plant Genomics ISSN: 1687-5389
List of most cited/used software for sequence assembly of NGS data. Source locations for these software are compiled in Table 4.
| Name (current version) | Assembly type (algorithm) | Color space | Read length | Gapped alignment | Paired-end | Output format | Platform |
|---|---|---|---|---|---|---|---|
| CLC-Bio¹ | Reference² | Yes | Arbitrary | Yes | Yes | CLC-Bio | Linux/Windows/Mac OS X |
| SeqMan NGen¹ | Reference² | Yes | Arbitrary | Yes | Yes | ACE, BAM | Windows/Mac OS X |
| NextGENe¹ | Reference² | Yes | Arbitrary | Yes | Yes | Next | Windows/Mac OS X |
| Bowtie (2) | Reference (FM-index) | Yes | Arbitrary | Yes | Yes | SAM | Linux/Windows/Mac OS X |
| BWA | Reference (FM-index) | Yes | Arbitrary | Yes | Yes | SAM | Linux |
| SOAP (3) | Reference (FM-index) | Yes | Arbitrary | No | Yes | SOAP2/3 | Linux |
| MAQ (0.6.6) | Reference (Hashing reads) | Yes | ≤127 | Yes | Yes | MAQ | Linux/Solaris/Mac OS X |
| Novoalign (2.07.07) | Reference | Yes | Arbitrary | Yes | Yes | SAM | Linux/Mac OS X |
| Mosaik (1.1.0018) | Reference | Yes | Arbitrary | Yes | Yes | SAM | Linux/Windows/Mac OS X/Solaris |
| SHRiMP (2.2.2) | Reference | Yes | Arbitrary | Yes | Yes | SAM | Linux/Mac OS X |
| Mira (3.4) | Reference² | Yes | Arbitrary | Yes | Yes | FASTA, ACE | Linux |

¹Commercial software. ²Option for de novo assembly and modules included for variant calling.
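Several of the aligners in the table above report SAM as their output format. As a rough illustration of what downstream tools consume, the following minimal Python sketch parses the mandatory tab-separated fields of a single SAM alignment line and checks two standard flag bits. The `SamRecord` type, the helper names, and the example read are hypothetical and are not taken from the paper or from any of the listed programs.

```python
from typing import NamedTuple, Optional


class SamRecord(NamedTuple):
    """Subset of the 11 mandatory SAM fields (illustrative only)."""
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    seq: str     # read sequence


def parse_sam_line(line: str) -> Optional[SamRecord]:
    """Parse one SAM line; header lines (starting with '@') return None."""
    if line.startswith("@"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    return SamRecord(
        qname=fields[0],
        flag=int(fields[1]),
        rname=fields[2],
        pos=int(fields[3]),
        mapq=int(fields[4]),
        cigar=fields[5],
        seq=fields[9],
    )


def is_mapped(rec: SamRecord) -> bool:
    """Flag bit 0x4 is set when the read is unmapped."""
    return not (rec.flag & 0x4)


def is_paired(rec: SamRecord) -> bool:
    """Flag bit 0x1 is set when the read comes from a paired-end template."""
    return bool(rec.flag & 0x1)


# Hypothetical example read, not real data.
example = "read1\t99\tchr1\t100\t60\t10M\t=\t250\t160\tACGTACGTAC\tIIIIIIIIII"
record = parse_sam_line(example)
print(record.rname, record.pos, is_mapped(record), is_paired(record))
```

For real datasets an established SAM/BAM library is usually a better choice than hand-written parsing; the sketch only shows the layout of the records.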
Download information of software used for NGS data.
| Software | Source |
|---|---|
| Bowtie | |
| BWA | |
| SOAP | |
| MAQ | |
| Novoalign | |
| CLC-Bio Genomics | |
| SeqMan NGen | |
| NextGENe | |
| Mosaik | |
| SHRiMP | |
| Mira | |
| Cassava | |
| Newbler | |
| Tablet | |
| SNP-VISTA | |
| Samtools | |
| Savant | |
| SOAPsnp | |
| GATK | The_Genome_Analysis_Toolkit |
| SNVer | |
| MaCH | |
| IMPUTE2 | download_impute2 |
| MEGA | |
| PHYLIP | |
Figure 1: Graphical user interface of Tablet, an assembly visualization program, which displays the reference genome on top and the mapped reads with color-coded SNPs on the bottom.
Commonly used NGS variant calling software. Download information for these software is compiled in Table 4. A more comprehensive list of variant calling programs is available at http://seqanswers.com/wiki/Software/list.
| Software | Multisample support | Reference | Features | Platform |
|---|---|---|---|---|
| Samtools | Yes | Aligned reads | Includes computation of genotype likelihoods and variant calling | Linux |
| SOAPsnp | No | Variant database | Part of SOAP3 for variant calling | Linux |
| GATK | Yes | Aligned reads | Includes variant caller, SNP filter, and SNP quality calibrator | Linux |
| SNVer | Yes | Aligned reads | Fast variant caller, assigning SNP significance based on read depth | Windows, Linux, |
| SHORE | Yes | Aligned reads | Variant calling based on reference sequence even from other species | Linux, |
| MaCH | Yes | Genotype likelihoods | Variant calling with or without LD information | Windows, Linux, Mac OSX |
| IMPUTE2 | Yes | Candidate SNPs and genotype likelihoods | Variant calling and linkage map-based SNP imputation | Windows, Linux, |
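The abstract highlights SNP filtering criteria as a key step after variant calling. The sketch below shows one simple way such a filter might look in Python, assuming the calls have been exported as VCF-style tab-delimited text with read depth stored as `DP` in the INFO column. The function names and the thresholds (`min_qual=30`, `min_depth=10`) are illustrative assumptions, not values recommended by the paper or by any of the tools listed above.

```python
def parse_info(info: str) -> dict:
    """Turn a VCF INFO string such as 'DP=25;AF=0.5' into a dictionary."""
    parsed = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            parsed[key] = value
        elif item:
            parsed[item] = True  # flag-style entry without a value
    return parsed


def filter_snps(vcf_lines, min_qual=30.0, min_depth=10):
    """Keep biallelic single-base SNPs passing quality and depth cutoffs.

    The cutoffs are hypothetical defaults chosen for illustration only.
    """
    kept = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt, qual, _flt, info = line.rstrip("\n").split("\t")[:8]
        # Discard indels, multi-allelic sites, and missing ALT alleles.
        if len(ref) != 1 or len(alt) != 1 or alt == ".":
            continue
        if qual == "." or float(qual) < min_qual:
            continue
        depth = int(parse_info(info).get("DP", 0))
        if depth < min_depth:
            continue
        kept.append((chrom, int(pos), ref, alt))
    return kept


# Hypothetical records, not real data.
calls = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t101\t.\tT\tC\t50\t.\tDP=25",
    "chr1\t202\t.\tA\tG\t12\t.\tDP=30",
    "chr1\t303\t.\tG\tA\t60\t.\tDP=4",
]
print(filter_snps(calls))
```

In practice the appropriate cutoffs depend on sequencing depth, ploidy, and the platform used, so any defaults like these would need tuning per dataset.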
Figure 2: Validation of a T/C SNP by a KASPar assay (KBiosciences, Herts, England). Genotypes with a “T” are represented by black dots with a white cross, clustered in the upper left, and those with a “C” by white dots with a black cross, clustered in the bottom right. The two black dots near the bottom left are negative controls. No heterozygous individuals were present in this population.
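The cluster layout described in the caption can be mimicked with a very small classifier. The Python sketch below assumes each sample is summarized by two normalized, non-negative fluorescence signals, with the x value reporting the “C” allele and the y value reporting the “T” allele, so that “T” genotypes fall in the upper left and “C” genotypes in the bottom right. The function name, the axis assignment, and both thresholds are hypothetical; this is not the clustering algorithm used by the KASPar software.

```python
import math


def call_genotype(x: float, y: float,
                  min_signal: float = 0.3,
                  homo_angle: float = 30.0) -> str:
    """Assign a genotype from two allele-specific fluorescence signals.

    x: signal of the fluorophore reporting the C allele (assumed axis).
    y: signal of the fluorophore reporting the T allele (assumed axis).
    min_signal and homo_angle are illustrative cutoffs, not calibrated values.
    """
    x, y = max(x, 0.0), max(y, 0.0)  # treat negative noise as zero signal
    if math.hypot(x, y) < min_signal:
        return "NTC"  # too little signal: behaves like a negative control
    angle = math.degrees(math.atan2(y, x))  # 0 = pure C signal, 90 = pure T signal
    if angle <= homo_angle:
        return "C/C"  # bottom-right cluster
    if angle >= 90.0 - homo_angle:
        return "T/T"  # upper-left cluster
    return "T/C"      # intermediate cluster, absent in the population shown


# Hypothetical signal pairs, not measurements from the figure.
for sample in [(0.1, 1.0), (1.0, 0.1), (0.05, 0.05), (0.8, 0.8)]:
    print(sample, call_genotype(*sample))
```

An angle-based rule like this is only one possible design; real genotyping software typically fits clusters to the whole plate rather than applying fixed cutoffs to each sample.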
Commonly used genotyping platforms.
| Name | Assay type | Technology | Throughput | Multiplexing | Relative scale |
|---|---|---|---|---|---|
| Genechip | Hybridization | Oligonucleotide array | 96/5 days | Up to 18 × 10⁶ | Small/large |
| Infinium II | Hybridization | Bead array | Up to 128/5 days | Up to 13 × 10⁶ | Large/small-large |
| Goldengate | Primer extension-ligation | Bead array | 172/3 days | Up to 3,072 | Medium/large |
| iPlex | Primer extension | Mass spectrometry (MALDI-TOF) | 3840/2.5 days | Up to 40 | Medium/large |
| Taqman | PCR | Taqman probe | Up to 1536/day | Up to 256 | Medium/medium |
| SNPlex | PCR | Capillary electrophoresis | Up to 1536/3 days | Up to 48 | Medium/large |
| KASPar | PCR | FRET quenching oligos | Up to 96/day | — | Medium/large |
| Invader | Primer annealing/endonuclease digestion | FRET quenching oligos | Up to 384/day | Up to 200,000 | Medium/large |
| HRM | PCR | Melting curve analysis | Up to 1536/day | — | Medium/large |