Bart J G Broeckx, Christophe Hitte, Frank Coopman, Geert E C Verhoeven, Sarah De Keulenaer, Ellen De Meester, Thomas Derrien, Jessica Alfoldi, Kerstin Lindblad-Toh, Tim Bosmans, Ingrid Gielen, Henri Van Bree, Bernadette Van Ryssen, Jimmy H Saunders, Filip Van Nieuwerburgh, Dieter Deforce.
Abstract
By limiting sequencing to those sequences transcribed as mRNA, whole exome sequencing is a cost-efficient technique often used in disease-association studies. We developed two target enrichment designs based on the recently released annotation of the canine genome: the exome-plus design and the exome-CDS design. The exome-plus design combines the exons of the CanFam 3.1 Ensembl annotation, more recently discovered protein-coding exons and a variety of non-coding RNA regions (microRNAs, long non-coding RNAs and antisense transcripts), leading to a total size of ≈ 152 Mb. The exome-CDS was designed as a subset of the exome-plus by omitting all 3' and 5' untranslated regions. This reduced the size of the exome-CDS to ≈ 71 Mb. To test the capturing performance, four exome-plus captures were sequenced on a NextSeq 500 with each capture containing four pre-capture pooled, barcoded samples. At an average sequencing depth of 68.3x, 80% of the regions and well over 90% of the targeted base pairs were completely covered at least 5 times with high reproducibility. Based on the performance of the exome-plus, we estimated the performance of the exome-CDS. Overall, these designs provide flexible solutions for a variety of research questions and are likely to be reliable tools in disease studies.
Year: 2015 PMID: 26235384 PMCID: PMC4522663 DOI: 10.1038/srep12810
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Statistics for exome sequencing of sixteen dogs.
| Sample | Pool | Total reads | Mapped reads | Duplicate reads | Remaining reads | Remaining (%) | Sequencing depth (x) |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 284,357,886 | 264,735,195 | 9,414,179 | 255,321,016 | 89.8 | 85.8 |
| 2 | 1 | 281,522,490 | 261,170,320 | 9,366,318 | 251,804,002 | 89.4 | 84.3 |
| 3 | 1 | 249,659,670 | 231,433,861 | 7,819,714 | 223,614,147 | 89.6 | 75.4 |
| 4 | 1 | 181,728,820 | 168,679,105 | 4,382,284 | 164,296,821 | 90.4 | 55.5 |
| 5 | 2 | 266,996,086 | 251,028,902 | 17,002,907 | 234,025,995 | 87.7 | 75.3 |
| 6 | 2 | 187,857,302 | 176,207,940 | 12,226,544 | 163,981,396 | 87.3 | 53.9 |
| 7 | 2 | 233,403,500 | 216,361,182 | 13,330,685 | 203,030,497 | 87.0 | 65.0 |
| 8 | 2 | 314,005,584 | 289,641,450 | 23,514,154 | 266,127,296 | 84.8 | 82.9 |
| 9 | 3 | 262,726,150 | 246,019,167 | 14,919,187 | 231,099,980 | 88.0 | 74.8 |
| 10 | 3 | 181,120,464 | 169,294,819 | 10,140,076 | 159,154,743 | 87.9 | 51.6 |
| 11 | 3 | 269,017,896 | 247,287,291 | 16,377,215 | 230,910,076 | 85.8 | 73.1 |
| 12 | 3 | 243,350,554 | 227,421,662 | 12,820,659 | 214,601,003 | 88.2 | 69.5 |
| 13 | 4 | 154,004,914 | 142,631,944 | 9,095,086 | 133,536,858 | 86.7 | 42.6 |
| 14 | 4 | 193,942,804 | 175,552,484 | 13,670,936 | 161,881,548 | 83.5 | 50.1 |
| 15 | 4 | 221,094,842 | 204,380,382 | 15,086,795 | 189,293,587 | 85.6 | 59.5 |
| 16 | 4 | 364,079,702 | 337,983,155 | 31,679,889 | 306,303,266 | 84.1 | 93.9 |
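The derived columns of this table can be recomputed from the raw counts. The sketch below assumes that remaining reads are mapped reads minus duplicate reads and that the remaining percentage is taken relative to total reads; both are inferred from the numbers, not stated in the text, and the function names are illustrative.

```python
# Rows copied from the table above:
# (sample, total reads, mapped reads, duplicate reads, sequencing depth in x)
rows = [
    (1, 284_357_886, 264_735_195, 9_414_179, 85.8),
    (2, 281_522_490, 261_170_320, 9_366_318, 84.3),
    (3, 249_659_670, 231_433_861, 7_819_714, 75.4),
    (4, 181_728_820, 168_679_105, 4_382_284, 55.5),
    (5, 266_996_086, 251_028_902, 17_002_907, 75.3),
    (6, 187_857_302, 176_207_940, 12_226_544, 53.9),
    (7, 233_403_500, 216_361_182, 13_330_685, 65.0),
    (8, 314_005_584, 289_641_450, 23_514_154, 82.9),
    (9, 262_726_150, 246_019_167, 14_919_187, 74.8),
    (10, 181_120_464, 169_294_819, 10_140_076, 51.6),
    (11, 269_017_896, 247_287_291, 16_377_215, 73.1),
    (12, 243_350_554, 227_421_662, 12_820_659, 69.5),
    (13, 154_004_914, 142_631_944, 9_095_086, 42.6),
    (14, 193_942_804, 175_552_484, 13_670_936, 50.1),
    (15, 221_094_842, 204_380_382, 15_086_795, 59.5),
    (16, 364_079_702, 337_983_155, 31_679_889, 93.9),
]


def remaining_reads(mapped, duplicates):
    """Assumed definition: mapped reads left after duplicate removal."""
    return mapped - duplicates


def remaining_pct(total, mapped, duplicates):
    """Assumed definition: remaining reads as a percentage of total reads."""
    return 100 * remaining_reads(mapped, duplicates) / total


for sample, total, mapped, dup, depth in rows:
    print(sample, remaining_reads(mapped, dup),
          f"{remaining_pct(total, mapped, dup):.1f}%")

# Unweighted mean of the per-sample sequencing depths.
mean_depth = sum(r[4] for r in rows) / len(rows)
print(f"mean sequencing depth: {mean_depth:.1f}x")
```

Under these assumptions the mean of the sixteen depths comes out at 68.3x, the average sequencing depth quoted in the abstract.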
Regions with a sequencing depth below 5x.
| Sample | Pool | Regions with minimum sequencing depth <5x, n (%) | Regions with maximum sequencing depth <5x, n (%) |
|---|---|---|---|
| 1 | 1 | 42,705 (17.6) | 6,977 (2.9) |
| 2 | 1 | 42,749 (17.6) | 6,980 (2.9) |
| 3 | 1 | 45,739 (18.8) | 7,307 (3.0) |
| 4 | 1 | 55,643 (22.9) | 9,346 (3.8) |
| 5 | 2 | 41,798 (17.2) | 8,502 (3.5) |
| 6 | 2 | 54,390 (22.4) | 10,953 (4.5) |
| 7 | 2 | 50,032 (20.6) | 10,439 (4.3) |
| 8 | 2 | 40,793 (16.8) | 8,238 (3.4) |
| 9 | 3 | 44,312 (18.2) | 8,884 (3.7) |
| 10 | 3 | 57,615 (23.7) | 11,185 (4.6) |
| 11 | 3 | 44,956 (18.5) | 8,698 (3.6) |
| 12 | 3 | 46,948 (19.3) | 9,125 (3.8) |
| 13 | 4 | 66,381 (27.3) | 12,380 (5.1) |
| 14 | 4 | 57,903 (23.8) | 10,446 (4.3) |
| 15 | 4 | 41,031 (16.9) | 7,457 (3.1) |
| 16 | 4 | 54,075 (22.3) | 10,156 (4.2) |
Figure 1. Relation between the minimal percentage covered of each region (%) and the percentage of the total number of regions (%).
For each individual region, the proportion of the region covered at a minimum sequencing depth of 5x was calculated.
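As a minimal sketch of this per-region calculation, the functions below take a list of per-base sequencing depths for one region and return the proportion of positions at or above 5x. The input format, the function names and the toy depth values are assumptions for illustration; the source does not say which tool was used.

```python
def fraction_covered(depths, min_depth=5):
    """Proportion of positions in one region with depth >= min_depth."""
    if not depths:
        raise ValueError("region has no positions")
    return sum(d >= min_depth for d in depths) / len(depths)


def classify_region(depths, min_depth=5):
    """Label a region by how its per-base depths compare to min_depth."""
    if min(depths) >= min_depth:
        return "fully covered"      # minimum depth reaches the threshold
    if max(depths) < min_depth:
        return "not covered"        # maximum depth stays below the threshold
    return "partially covered"      # depth varies around the threshold


# Hypothetical per-base depths for three toy regions.
regions = {
    "region_a": [12, 9, 7, 5],
    "region_b": [8, 6, 4, 2],
    "region_c": [3, 1, 0, 4],
}

for name, depths in regions.items():
    print(name, fraction_covered(depths), classify_region(depths))
```

The two labels "not fully covered" (minimum depth below 5x) and "not covered" (maximum depth below 5x) correspond to the two columns of the table of regions with a sequencing depth below 5x.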
Coverage of targeted base pairs (≥5x).
| Sample | Pool | Base pairs covered, exome-plus, n (%) | Base pairs covered, exome-CDS, n (%) |
|---|---|---|---|
| 1 | 1 | 145,066,553 (95.6) | 67,225,626 (94.3) |
| 2 | 1 | 145,081,909 (95.6) | 67,250,020 (94.4) |
| 3 | 1 | 144,607,207 (95.3) | 67,007,231 (94.0) |
| 4 | 1 | 143,036,351 (94.3) | 66,060,290 (92.7) |
| 5 | 2 | 145,147,282 (95.7) | 67,070,797 (94.1) |
| 6 | 2 | 143,617,018 (94.7) | 66,121,333 (92.8) |
| 7 | 2 | 144,051,293 (95.0) | 66,351,108 (93.1) |
| 8 | 2 | 145,267,122 (95.8) | 67,097,056 (94.2) |
| 9 | 3 | 144,802,496 (95.5) | 66,904,165 (93.9) |
| 10 | 3 | 143,126,270 (94.3) | 65,865,667 (92.4) |
| 11 | 3 | 144,861,681 (95.5) | 66,904,165 (93.9) |
| 12 | 3 | 144,585,713 (95.3) | 66,786,409 (93.7) |
| 13 | 4 | 142,170,547 (93.7) | 65,328,731 (91.7) |
| 14 | 4 | 143,170,286 (94.4) | 66,018,207 (92.7) |
| 15 | 4 | 145,360,509 (95.8) | 67,273,067 (94.4) |
| 16 | 4 | 143,449,544 (94.6) | 66,120,048 (92.8) |
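Each cell gives a count of covered base pairs and the percentage of the design it represents, so the two together imply the target size. The sketch below back-calculates that size for sample 1 as a consistency check; the helper name is illustrative, and because the percentages are rounded to one decimal the result is only approximate.

```python
def implied_target_size(covered_bp, pct_covered):
    """Target size in bp implied by a covered-bp count and its percentage."""
    return covered_bp / (pct_covered / 100)


# Sample 1 values from the table above.
plus_bp = implied_target_size(145_066_553, 95.6)
cds_bp = implied_target_size(67_225_626, 94.3)

print(f"exome-plus: about {plus_bp / 1e6:.1f} Mb")
print(f"exome-CDS:  about {cds_bp / 1e6:.1f} Mb")
```

Both estimates are in line with the design sizes of roughly 152 Mb and 71 Mb given in the abstract.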
Figure 2. Comparison of the GC content per region (% GC) for the completely covered regions with a minimum sequencing depth of 5x (group 1), the regions with a varying sequencing depth (group 2) and the regions with a maximum sequencing depth of 4x (group 3).
Each box spans the first quartile (Q1) to the third quartile (Q3), with the median (Q2) marked inside; the whiskers extend to 1.5 times the interquartile range (Q3–Q1). Outliers are shown as circles. Vertical lines mark the cutoffs at 32.0% GC and 64.5% GC. The yellow boxes show the values for all regions in a group. The green boxes show the values for the regions remaining after exclusion of those that Roche NimbleGen predicted would not be sequenced. The red boxes show the values for the regions remaining after exclusion of all regions containing repeats and Ns.
Variants called inside the target regions.
| Sample | Pool | Exome-plus (n) | Exome-CDS (n) |
|---|---|---|---|
| 1 | 1 | 266,334 | 118,686 |
| 2 | 1 | 267,695 | 119,322 |
| 3 | 1 | 259,499 | 115,874 |
| 4 | 1 | 250,196 | 110,047 |
| 5 | 2 | 271,834 | 118,987 |
| 6 | 2 | 269,445 | 117,882 |
| 7 | 2 | 269,995 | 118,085 |
| 8 | 2 | 273,081 | 119,495 |
| 9 | 3 | 278,462 | 122,429 |
| 10 | 3 | 274,222 | 119,880 |
| 11 | 3 | 278,688 | 122,285 |
| 12 | 3 | 262,793 | 116,038 |
| 13 | 4 | 254,919 | 111,805 |
| 14 | 4 | 260,454 | 114,874 |
| 15 | 4 | 269,313 | 118,527 |
| 16 | 4 | 262,786 | 114,849 |
Figure 3. Venn diagram showing the overlap between the exome-1.0 (≈ 52 Mb), the exome-CDS (≈ 71 Mb) and the exome-plus (≈ 152 Mb).
The numbers shown are the sizes in Mb of the various intersections. Overall, 34.77 Mb is shared by all three designs. Within the target space of the exome-plus, 17.57 Mb is targeted by the exome-1.0 but not by the exome-CDS, and 36.37 Mb is targeted by the exome-CDS but not by the exome-1.0. Finally, 0.09 Mb, 0.12 Mb and 62.99 Mb are targeted uniquely by the exome-1.0, the exome-CDS and the exome-plus, respectively.
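The intersection sizes in this caption can be summed to check them against the stated design sizes. The sketch below assumes that the 17.57 Mb and 36.37 Mb figures are the parts of the exome-plus shared with only the exome-1.0 and only the exome-CDS, respectively, and that no other intersection contributes; this reading is an interpretation of the caption, and the variable names are illustrative.

```python
# Intersection sizes in Mb, as given in the Figure 3 caption.
shared_all = 34.77        # targeted by all three designs
plus_and_1_0 = 17.57      # assumed: exome-plus and exome-1.0, not exome-CDS
plus_and_cds = 36.37      # assumed: exome-plus and exome-CDS, not exome-1.0
unique_1_0 = 0.09
unique_cds = 0.12
unique_plus = 62.99

# Each design is the sum of the Venn segments it participates in.
exome_1_0 = shared_all + plus_and_1_0 + unique_1_0
exome_cds = shared_all + plus_and_cds + unique_cds
exome_plus = shared_all + plus_and_1_0 + plus_and_cds + unique_plus

print(f"exome-1.0:  {exome_1_0:.2f} Mb")
print(f"exome-CDS:  {exome_cds:.2f} Mb")
print(f"exome-plus: {exome_plus:.2f} Mb")
```

Under this reading the totals round to 52 Mb, 71 Mb and 152 Mb, matching the design sizes in the caption.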
Performance parameters of the exome-plus, the exome-CDS and the exome-1.0.
| Parameter | exome-plus | exome-CDS | exome-1.0 |
|---|---|---|---|
| fully covered regions (%) | 79.7 | 85.4 | 84.9 |
| base pairs covered (%) | 95.1 | 93.4 | 90.2 |
| on target (%) | 75.8 | n/a | 90.4 |
| reproducibility regions (%) | 63.5 | 71.4 | 79.9 |
| reproducibility base pairs (%) | 90.4 | 87.7 | 87.4 |
| non-reference variants (n) | 266,857 | 117,442 | 61,820 |
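Several exome-plus and exome-CDS values in this summary appear to be unweighted means over the sixteen samples in the per-sample tables. That is inferred from the numbers, not stated in the text; the sketch below checks it for the covered base pairs and the variant counts, with illustrative variable names.

```python
from statistics import mean

# Per-sample percentages of base pairs covered at >= 5x (samples 1 to 16).
plus_pct = [95.6, 95.6, 95.3, 94.3, 95.7, 94.7, 95.0, 95.8,
            95.5, 94.3, 95.5, 95.3, 93.7, 94.4, 95.8, 94.6]
cds_pct = [94.3, 94.4, 94.0, 92.7, 94.1, 92.8, 93.1, 94.2,
           93.9, 92.4, 93.9, 93.7, 91.7, 92.7, 94.4, 92.8]

# Per-sample variant counts inside the target regions (samples 1 to 16).
plus_variants = [266_334, 267_695, 259_499, 250_196, 271_834, 269_445,
                 269_995, 273_081, 278_462, 274_222, 278_688, 262_793,
                 254_919, 260_454, 269_313, 262_786]
cds_variants = [118_686, 119_322, 115_874, 110_047, 118_987, 117_882,
                118_085, 119_495, 122_429, 119_880, 122_285, 116_038,
                111_805, 114_874, 118_527, 114_849]

print(f"base pairs covered, exome-plus: {mean(plus_pct):.1f}%")
print(f"base pairs covered, exome-CDS:  {mean(cds_pct):.1f}%")
print(f"variants, exome-plus: {round(mean(plus_variants)):,}")
print(f"variants, exome-CDS:  {round(mean(cds_variants)):,}")
```

The means reproduce the 95.1% and 93.4% base-pair coverage and the 266,857 and 117,442 variants listed above.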