Jose M Blanca, Laura Pascual, Peio Ziarsolo, Fernando Nuez, Joaquin Cañizares.
Abstract
BACKGROUND: The possibilities offered by next generation sequencing (NGS) platforms are revolutionizing biotechnological laboratories. Moreover, the combination of NGS and affordable high-throughput genotyping technologies is facilitating the rapid discovery and use of SNPs in non-model species. However, this abundance of sequences and polymorphisms creates new software needs. To fulfill these needs, we have developed a powerful, yet easy-to-use application.
Year: 2011 PMID: 21635747 PMCID: PMC3124440 DOI: 10.1186/1471-2164-12-285
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
ngs_backbone filters for SNV selection.
| Description and pass conditions | Value |
|---|---|
| Frequency of most frequent allele in the selected pool is less than | 0.80 |
| Percentage of divergence in the unigene is smaller than or equal to | 4 |
| No duplicated or fragment regions are detected by Blast | -- |
| The distance from intron/exon boundary is greater than | 30 |
| The distance to ends of unigene is greater than | 30 |
| The distance from neighboring SNPs is greater than | 60 |
| The SNV can be detected by endonuclease restriction | -- |
| Select the kind of marker: SNP or indel | -- |
| Frequency of most frequent allele in the selected libraries is less than | 0.67 |
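The numeric pass conditions in the filter table above can be read as a set of threshold checks. The following is a minimal sketch of that reading, not ngs_backbone's own code or API: the `Snv` record, its field names, and the `passes_filters` function are hypothetical, and the non-numeric filters (Blast duplication check, restriction-enzyme detectability, marker kind) are left out.

```python
from dataclasses import dataclass

# Thresholds copied from the filter table; the constant names are illustrative.
MAX_MAJOR_ALLELE_FREQ_POOL = 0.80   # pool filter: frequency must be less than
MAX_MAJOR_ALLELE_FREQ_LIBS = 0.67   # libraries filter: frequency must be less than
MAX_UNIGENE_DIVERGENCE_PCT = 4      # must be smaller than or equal to
MIN_DIST_INTRON_BOUNDARY = 30       # distance must be greater than
MIN_DIST_UNIGENE_END = 30           # distance must be greater than
MIN_DIST_NEIGHBOR_SNV = 60          # distance must be greater than


@dataclass
class Snv:
    """Hypothetical SNV record; not an ngs_backbone data structure."""
    major_allele_freq: float       # frequency of the most frequent allele
    unigene_divergence_pct: float  # percentage of divergence in the unigene
    dist_to_intron: int            # distance to nearest intron/exon boundary
    dist_to_unigene_end: int       # distance to the nearer unigene end
    dist_to_nearest_snv: int       # distance to the nearest neighboring SNV


def passes_filters(snv: Snv,
                   max_major_freq: float = MAX_MAJOR_ALLELE_FREQ_LIBS) -> bool:
    """Return True if the SNV meets every numeric pass condition."""
    return (
        snv.major_allele_freq < max_major_freq
        and snv.unigene_divergence_pct <= MAX_UNIGENE_DIVERGENCE_PCT
        and snv.dist_to_intron > MIN_DIST_INTRON_BOUNDARY
        and snv.dist_to_unigene_end > MIN_DIST_UNIGENE_END
        and snv.dist_to_nearest_snv > MIN_DIST_NEIGHBOR_SNV
    )
```

Note the mix of strict and non-strict comparisons: a divergence of exactly 4 passes, while a distance of exactly 30 from an intron boundary does not.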
Figure 1. ngs_backbone statistical analysis. Sequence length distribution of cleaned Sanger (a) and Illumina (b) sequences. Boxplot of per-base quality with respect to sequence position for Sanger (c) and Illumina (d) sequences. Alignment sequence coverage distribution of Sanger (e) and Illumina (f) sequences.
SNVs detected
| SA | IL | HL |
|---|---|---|
| 17237 | 19052 | 23306 |
| 3389 | 3044 | 5410 |
| 13848 | 16008 | 27896 |
| 16575 | 17005 | 30827 |
| - | 9903 | - |
| 16575 | 9640 | 23360 |
SA: Sanger collection: SNVs detected with Sanger sequences.
IL: Illumina collection: SNVs detected with Illumina sequences.
HL: Higher likelihood collection: SNVs detected with Illumina and Sanger sequences.
SNVs selected in the different collections using different ngs_backbone filters.
| SA | IL | HL | CO | PO |
|---|---|---|---|---|
| 16575 | 9640 | 23360 | 2855 | 514 |
| 11312 | 6763 | 16150 | 1925 | 294 |
| 16502 | 9619 | 23271 | 2847 | 507 |
| 16249 | 8996 | 22523 | 2722 | 510 |
| 4360 | 3434 | 6934 | 860 | 291 |
| 6155 | 4765 | 9730 | 1190 | 98 |
| 645 | 480 | 996 | 129 | 25 |
SA: Sanger collection: SNVs detected with Sanger sequences.
IL: Illumina collection: SNVs detected with Illumina sequences.
HL: Higher likelihood collection: SNVs detected with Illumina and Sanger sequences.
CO: Common collection: SNVs detected in Illumina and Sanger collections.
PO: Polymorphic collection: SNVs with an estimated frequency of most common allele under 0.67.
EU: Easily usable SNV set: SNVs selected using the UCR, I30 and CL30 filters.
Statistics for assayed SNVs in the different collections.
| SNVs | HRM detected | % Polymorphic markers a | Average frequency b | PIC c |
|---|---|---|---|---|
| 14 | -- | 21.4 | 0.98 | 0.04 |
| 14 | 12 | 41.7 | 0.95 | 0.09 |
| 33 | 28 | 71.4 | 0.85 | 0.22 |
| 15 | 13 | 69.2 | 0.80 | 0.28 |
a Percentage of polymorphic markers: number of polymorphic markers with respect to detected HRM markers or total markers for each set.
b Average frequency of most frequent allele of all detected markers or total markers for each set.
c PIC (polymorphism information content) of all detected markers or total markers for each set.
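The two derived statistics in the footnotes can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. The function names are hypothetical. The PIC formula used is the standard Botstein et al. (1980) definition, which this record does not spell out. The count of 5 polymorphic markers in the usage example is inferred from the reported 41.7% of 12 HRM-detected markers rather than stated in the table.

```python
from typing import Sequence


def percent_polymorphic(n_polymorphic: int, n_scored: int) -> float:
    """Percentage of polymorphic markers among the scored markers.

    n_scored is the number of HRM-detected markers, or the total number
    of markers when no HRM detection count is available (footnote a).
    """
    return 100.0 * n_polymorphic / n_scored


def pic(allele_freqs: Sequence[float]) -> float:
    """Polymorphism information content of one marker.

    Standard definition (an assumption here, not taken from the record):
    PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    """
    n = len(allele_freqs)
    homozygosity = sum(p * p for p in allele_freqs)
    cross_term = sum(
        2 * allele_freqs[i] ** 2 * allele_freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 1.0 - homozygosity - cross_term


# Usage: 5 polymorphic markers out of 12 detected (count inferred, see above).
print(round(percent_polymorphic(5, 12), 1))  # 41.7
# A biallelic marker with equal allele frequencies has the maximum biallelic PIC.
print(pic([0.5, 0.5]))  # 0.375
```

Because the table reports PIC averaged over the markers of each set, applying `pic` to the set's average allele frequency will not necessarily reproduce the tabulated value.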