Valentino Ruggieri1,2, Irantzu Anzar2, Andreu Paytuvi2, Roberta Calafiore1, Riccardo Aiese Cigliano2, Walter Sanseverino2, Amalia Barone1.
Abstract
The recent development of Sequence Capture methodology provides a powerful strategy for generating data to assess genetic variation in targeted genomic regions. Here, we present SUPER-CAP, a bioinformatics web tool for handling Sequence Capture data, precisely calculating the allele frequency of variants and building genotype-specific sequences of captured genes. The dataset used to develop this in silico strategy consists of 378 loci and their regulatory regions in a collection of 44 tomato landraces. About 14,000 high-quality variants were identified. The high depth of coverage (>40×) and appropriate filtering criteria allowed the identification of about 4,000 rare variants and 10 genes differing in copy number. We also show that the tool can reconstruct genotype-specific sequences for each genotype from the detected variants, which makes it possible to evaluate the combined effect of multiple variants in the same protein. The architecture and functionality of SUPER-CAP make the software suitable for a broad set of analyses, including SNP discovery and mining. Its functionality, together with its capacity to process large data sets and to detect sequence variation efficiently, makes SUPER-CAP a valuable bioinformatics tool for genomics and breeding purposes.
Keywords: heterozygous variants; sequence reconstruction; target enrichment; web tool analysis
Year: 2017 PMID: 28011720 PMCID: PMC5381350 DOI: 10.1093/dnares/dsw050
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
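The abstract describes rebuilding a genotype-specific sequence from the variants detected in each genotype. The snippet below is only a minimal sketch of that general idea, not SUPER-CAP's actual implementation: the `Variant` tuple layout, the `apply_variants` function, and the toy sequence are all hypothetical, and the variants are assumed to be homozygous, non-overlapping, and given with 0-based positions.

```python
from typing import List, Tuple

# Hypothetical variant record: (0-based position, reference allele, alternate allele)
Variant = Tuple[int, str, str]


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Return a copy of `reference` with every variant substituted in.

    Assumes the variants do not overlap one another.
    """
    seq = reference
    # Work from the rightmost variant so that indels do not shift the
    # coordinates of variants that still have to be applied.
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        if seq[pos:pos + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at position {pos}")
        seq = seq[:pos] + alt + seq[pos + len(ref):]
    return seq


# Toy example: one SNP, one 1-bp deletion, one 1-bp insertion
reference = "ATGGCCTTAGGA"
variants = [(3, "G", "A"), (6, "TT", "T"), (10, "G", "GC")]
reconstructed = apply_variants(reference, variants)
print(reconstructed)  # ATGACCTAGGCA
```

Applying all variants of a genotype to the same gene in one pass is what allows their combined effect on the encoded protein to be inspected, for example by translating the reconstructed coding sequence.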
Figure 1. The SUPER-CAP tool. The main steps and procedures underlying the tool are graphically reported in the workflow.
Alignment statistics for the 44 individual captures (Sample ID)
| Sample ID | Raw reads | Mapped reads | Mapped reads Q > 30 | Mapped reads Q > 30 WD | Alignment rate (%) | On target reads (%) | On target reads + 200 bp (%) | Mean depth of coverage |
|---|---|---|---|---|---|---|---|---|
| 1A | 1,344,456 | 1,327,567 | 1,269,157 | 1,251,088 | 98.74 | 75.45 | 76.06 | 43.2 |
| 3A | 1,314,574 | 1,293,498 | 1,234,679 | 1,213,125 | 98.4 | 75.07 | 75.65 | 42.42 |
| 5A | 1,413,706 | 1,391,398 | 1,331,163 | 1,310,092 | 98.42 | 75.94 | 76.55 | 45.97 |
| 8A | 1,376,510 | 1,342,280 | 1,279,940 | 1,262,285 | 97.51 | 74.93 | 75.51 | 43.94 |
| 14A | 1,580,074 | 1,231,066 | 1,142,665 | 1,128,063 | 77.91 | 67.22 | 67.75 | 39.67 |
| 15A | 1,686,492 | 1,340,263 | 1,274,719 | 1,262,193 | 79.47 | 75.66 | 76.27 | 44.92 |
| 20A | 1,480,340 | 1,162,656 | 1,101,676 | 1,056,514 | 78.54 | 71.73 | 72.28 | 40.27 |
| 21A | 1,304,914 | 1,289,290 | 1,228,450 | 1,211,447 | 98.8 | 74.62 | 75.21 | 41.77 |
| 26A | 1,103,352 | 871,462 | 830,910 | 821,995 | 78.98 | 75.72 | 76.35 | 30.03 |
| 27A | 1,469,212 | 1,436,034 | 1,365,188 | 1,344,283 | 97.74 | 74.73 | 75.3 | 47.02 |
| 28A | 1,354,892 | 1,332,311 | 1,264,586 | 1,190,895 | 98.33 | 70.39 | 70.87 | 40.76 |
| 30A | 1,397,308 | 1,379,189 | 1,319,320 | 1,300,299 | 98.7 | 75.76 | 76.43 | 45.43 |
| 32A | 1,893,224 | 1,480,474 | 1,410,253 | 1,394,575 | 78.2 | 74.55 | 75.18 | 48.69 |
| 34A | 1,891,936 | 1,478,312 | 1,411,043 | 1,396,990 | 78.14 | 75.18 | 75.84 | 48.54 |
| 35A | 1,788,530 | 1,398,636 | 1,333,126 | 1,316,721 | 78.2 | 74.51 | 75.16 | 46.02 |
| 38A | 1,886,966 | 1,468,035 | 1,399,866 | 1,386,113 | 77.8 | 74.57 | 75.21 | 48.23 |
| 40A | 2,651,822 | 2,575,073 | 2,463,126 | 2,432,466 | 97.11 | 76.67 | 77.25 | 88.92 |
| 41A | 2,683,868 | 2,594,090 | 2,482,444 | 2,451,378 | 96.65 | 76.75 | 77.31 | 90.36 |
| 42A | 1,450,568 | 1,424,424 | 1,348,207 | 1,329,068 | 98.2 | 74.28 | 74.84 | 46.26 |
| 43A | 2,082,552 | 2,006,847 | 1,921,991 | 1,899,495 | 96.36 | 77.02 | 77.59 | 70.57 |
| 45A | 2,409,462 | 2,316,282 | 2,221,361 | 2,193,201 | 96.13 | 76.59 | 77.18 | 80.52 |
| 57A | 1,297,772 | 1,275,552 | 1,216,245 | 1,198,785 | 98.29 | 75 | 75.56 | 42.68 |
| 64A | 1,446,368 | 1,429,441 | 1,366,044 | 1,346,218 | 98.83 | 75.98 | 76.56 | 47.44 |
| 66A | 2,267,108 | 2,176,890 | 2,081,441 | 2,057,807 | 96.02 | 76.59 | 77.17 | 74.86 |
| 70A | 1,372,572 | 1,297,126 | 1,223,744 | 1,206,911 | 94.5 | 72.75 | 73.28 | 41.49 |
| 75A | 1,799,384 | 1,396,835 | 1,323,958 | 1,312,523 | 77.63 | 73.81 | 74.38 | 45.59 |
| 78A | 2,060,104 | 1,992,077 | 1,903,906 | 1,876,376 | 96.7 | 76.81 | 77.4 | 72.14 |
| 79A | 1,666,724 | 1,295,896 | 1,235,321 | 1,224,643 | 77.75 | 75.02 | 75.63 | 44.71 |
| 85A | 1,879,204 | 1,439,306 | 1,364,666 | 1,344,083 | 76.59 | 73.83 | 74.42 | 47.2 |
| 87A | 2,342,600 | 2,247,877 | 2,150,105 | 2,124,074 | 95.96 | 76.24 | 76.83 | 77.91 |
| 92A | 2,247,742 | 2,165,543 | 2,069,466 | 2,043,464 | 96.34 | 75.25 | 75.84 | 73.65 |
| 93A | 1,877,134 | 1,465,142 | 1,394,154 | 1,379,154 | 78.05 | 74.36 | 74.95 | 48.29 |
| 94A | 2,105,320 | 2,019,814 | 1,927,953 | 1,904,794 | 95.94 | 75.59 | 76.25 | 68.51 |
| 97A | 2,489,394 | 2,402,528 | 2,299,398 | 2,270,879 | 96.51 | 76.11 | 76.72 | 82.59 |
| 99A | 2,479,650 | 2,380,804 | 2,280,336 | 2,251,710 | 96.01 | 76.49 | 77.11 | 81.97 |
| 102A | 1,760,260 | 1,374,039 | 1,305,068 | 1,293,203 | 78.06 | 73.31 | 73.91 | 44.39 |
| 103A | 1,969,444 | 1,886,176 | 1,780,373 | 1,747,969 | 95.77 | 71.5 | 72 | 61.9 |
| 105A | 1,942,196 | 1,516,300 | 1,450,772 | 1,438,620 | 78.07 | 76.3 | 76.9 | 51.32 |
| 109A | 1,835,360 | 1,428,227 | 1,356,340 | 1,343,895 | 77.82 | 74.8 | 75.39 | 49.28 |
| 111A | 2,541,684 | 2,006,126 | 1,913,540 | 1,893,325 | 78.93 | 76.22 | 76.82 | 68.21 |
| 115A | 1,547,164 | 1,229,818 | 1,166,492 | 1,099,375 | 79.49 | 70.75 | 71.3 | 41.5 |
| 117A | 1,682,246 | 1,333,089 | 1,273,573 | 1,262,226 | 79.24 | 76.42 | 77.03 | 44.83 |
| 118A | 1,665,304 | 1,320,669 | 1,260,016 | 1,248,760 | 79.3 | 75.09 | 75.72 | 43.63 |
| 120A | 1,651,964 | 1,308,823 | 1,249,979 | 1,237,454 | 79.23 | 76.03 | 76.64 | 43.62 |
The number of raw reads (Raw reads), mapped reads (Mapped reads), mapped reads with quality Q > 30 (Mapped reads Q > 30) and mapped reads with quality Q > 30 without duplicates (Mapped reads Q > 30 WD) are reported. For each sample, the alignment rate (Alignment rate), the specificity (On target reads), the specificity including the 200 bp flanking regions (On target reads + 200 bp) and the average depth of coverage (Mean depth of coverage) are also reported.
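As a quick check on how the columns relate, the alignment rate appears to be the ratio of mapped reads to raw reads; this is inferred from the numbers, not stated in the legend. The sketch below reproduces the value for sample 1A and, as a further illustration, derives the share of Q > 30 reads removed as duplicates. The `percent` helper and the variable names are hypothetical.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Values for sample 1A, copied from the table above
raw_reads = 1_344_456
mapped_reads = 1_327_567
mapped_q30 = 1_269_157
mapped_q30_wd = 1_251_088  # Q > 30, duplicates removed

# Assumption: alignment rate = mapped reads / raw reads
alignment_rate = percent(mapped_reads, raw_reads)

# Derived illustration: share of Q > 30 reads discarded as duplicates
duplicate_share = percent(mapped_q30 - mapped_q30_wd, mapped_q30)

print(f"Alignment rate: {alignment_rate}%")  # 98.74%, as reported for 1A
print(f"Duplicates among Q > 30 reads: {duplicate_share}%")
```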
Figure 2. Distribution of the variants detected in each genotype by variant type. The proportions of homozygous (Ho) and heterozygous (He) SNPs and INDELs are reported, together with the number of private variants. A dendrogram built from the complete set of variants shows the relationships among the genotypes.
Figure 3. Average He variant density (He density) for each gene family (reported as EC number) in each category (LARGE, MEDIUM, SMALL, SINGLE). He density is expressed as the average number of He variants per 10 kbp. Error bars represent the SD of the mean.
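The density measure in this caption is a simple normalization of variant counts by sequence length. The sketch below shows one plausible way to compute it; the `he_density` function, the gene lengths, and the variant counts are invented for illustration, and the use of the sample standard deviation is an assumption, since the caption does not specify how the SD was calculated.

```python
from statistics import mean, stdev


def he_density(n_he_variants: int, region_length_bp: int,
               window_bp: int = 10_000) -> float:
    """Number of He variants per `window_bp` base pairs of sequence."""
    if region_length_bp <= 0:
        raise ValueError("region length must be positive")
    return n_he_variants * window_bp / region_length_bp


# Hypothetical gene family: (He variant count, gene length in bp)
family = [(6, 4_000), (2, 5_000), (3, 2_500)]

densities = [he_density(n, length) for n, length in family]
family_mean = mean(densities)
family_sd = stdev(densities)  # sample SD; an assumption about the caption

print(densities)
print(round(family_mean, 2), round(family_sd, 2))
```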
Figure 4. Distribution and classification of the discovered variants by type (transitions vs transversions and insertions vs deletions) and by the genetic feature in which they were found (promoter, exon, intron, UTRs).
Distribution of the 6,098 genic variants by type of predicted effect (PREDICTED EFFECT) on the corresponding protein, as predicted by SnpEff.
| PREDICTED EFFECT | No. of variants |
|---|---|
| 3′ UTR variant | 256 |
| 5′ UTR variant | 155 |
| Intron variant | 4,354 |
| 5′ UTR premature start codon gain variant | 13 |
| Splice region variant & intron variant | 115 |
| Synonymous variant | 601 |
| Missense variant | 559 |
| Inframe deletion | 7 |
| Inframe insertion | 10 |
| Frameshift variant | 14 |
| Splice acceptor variant | 2 |
| Stop gained variant | 9 |
| Stop lost variant | 3 |
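The counts in this table can be tallied to confirm the stated total and to express each class as a share of all genic variants. The snippet below is an illustrative sketch only: the dictionary simply transcribes the table, and the "protein-altering" grouping is a choice made here for the example, not a category defined in the source.

```python
# Counts transcribed from the table above
effect_counts = {
    "3' UTR variant": 256,
    "5' UTR variant": 155,
    "Intron variant": 4354,
    "5' UTR premature start codon gain variant": 13,
    "Splice region variant & intron variant": 115,
    "Synonymous variant": 601,
    "Missense variant": 559,
    "Inframe deletion": 7,
    "Inframe insertion": 10,
    "Frameshift variant": 14,
    "Splice acceptor variant": 2,
    "Stop gained variant": 9,
    "Stop lost variant": 3,
}

total = sum(effect_counts.values())
print(total)  # 6098, matching the table caption

# Share of each effect class, largest first
for effect, n in sorted(effect_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{effect:45s} {n:5d} {100 * n / total:6.2f}%")

# Illustrative grouping (an assumption): effects that change the protein sequence
protein_altering_classes = [
    "Missense variant", "Inframe deletion", "Inframe insertion",
    "Frameshift variant", "Stop gained variant", "Stop lost variant",
]
protein_altering = sum(effect_counts[e] for e in protein_altering_classes)
print(protein_altering, f"{100 * protein_altering / total:.2f}%")
```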