Marc Pauper, Erdi Kucuk, Alexander Hoischen, Lisenka E L M Vissers, Christian Gilissen, Aaron M Wenger, Shreyasee Chakraborty, Primo Baybayan, Michael Kwint, Bart van der Sanden, Marcel R Nelen, Ronny Derks, Han G Brunner.
Abstract
Long-read sequencing (LRS) has the potential to comprehensively identify all medically relevant genome variation, including variation commonly missed by short-read sequencing (SRS) approaches. To determine this potential, we performed LRS at approximately 15× to 40× genome coverage using the Pacific Biosciences Sequel I System for five trios. The respective probands were diagnosed with intellectual disability (ID) whose etiology remained unresolved after SRS of exomes and genomes. Systematic assessment of LRS coverage showed that ~35 Mb of the human reference genome was accessible only by LRS and not by SRS. Genome-wide structural variant (SV) calling yielded on average 28,292 SV calls per individual, totaling 12.9 Mb of sequence. Trio-based analyses, which allowed us to study segregation, showed concordance for up to 95% of these SV calls across the genome, and 80% of the LRS SV calls were not identified by SRS. De novo mutation analysis did not identify any de novo SVs, confirming that these are rare events. Because of the high sequence coverage, we were also able to call single nucleotide substitutions. On average, we identified 3 million substitutions per genome, with a Mendelian inheritance concordance of up to 97%. Of these, ~100,000 were located in the ~35 Mb of the genome that was captured only by LRS. Moreover, these variants affected the coding sequence of 64 genes, including 32 known Mendelian disease genes. Our data show the potential added value of LRS compared to SRS for identifying medically relevant genome variation.
Year: 2020 PMID: 33257779 PMCID: PMC8115091 DOI: 10.1038/s41431-020-00770-0
Source DB: PubMed Journal: Eur J Hum Genet ISSN: 1018-4813 Impact factor: 4.246
Overview of samples, sequencing statistics, identified variation, and Mendelian inheritance errors.
| Sample | Coverage (×) | # SVs | Total affected sequence (bp) | SV MIE (%) | Unique SVs in cohort | Study-specific SVs | # SNVs | SNV MIE (%) |
|---|---|---|---|---|---|---|---|---|
| T1P | 15.7 | 29,030 | 12,775,459 | 4,409 | 75 | 18,601 | 3,142,916 | 448,808 |
| T1F | 12.6 | 26,475 | 11,340,949 | (15.2%) | 245 | 14,172 | 2,663,675 | (14.3%) |
| T1M | 14.9 | 27,421 | 12,998,704 | | 236 | 18,334 | 3,007,973 | |
| T2P | 17.3 | 28,766 | 12,974,306 | 3,186 | 81 | 19,098 | 3,383,890 | 290,676 |
| T2F | 14.2 | 27,412 | 12,595,695 | (11.1%) | 244 | 19,549 | 3,127,690 | (8.6%) |
| T2M | 17.2 | 28,649 | 12,851,920 | | 253 | 18,539 | 3,353,748 | |
| T3P | 15.0 | 28,111 | 12,689,520 | 2,681 | 57 | 19,321 | 3,147,660 | 219,392 |
| T3F | 18.3 | 28,971 | 13,327,881 | (9.5%) | 247 | 19,198 | 3,382,461 | (7.0%) |
| T3M | 18.0 | 28,962 | 12,884,749 | | 264 | 19,419 | 3,352,767 | |
| T4P | 16.1 | 28,640 | 12,830,123 | 2,763 | 33 | 19,470 | 3,166,126 | 301,415 |
| T4F | 16.7 | 28,746 | 12,540,739 | (9.6%) | 261 | 21,653 | 3,261,812 | (9.5%) |
| T4M | 16.1 | 28,322 | 12,683,375 | | 256 | 19,523 | 3,101,729 | |
| T5P | 41.6 | 33,056 | 14,085,392 | 2,130 | 16 | 21,539 | 3,956,435 | 125,023 |
| T5F | 37.6 | 33,277 | 14,235,204 | (6.4%) | 228 | 18,974 | 3,905,927 | (3.2%) |
| T5M | 40.1 | 33,138 | 13,949,579 | | 273 | 22,023 | 3,932,800 | |
Columns (from left to right) indicate: sample identifier, average coverage across the genome (GRCh38), number of identified SVs (≥50 bp), total number of bases affected by SVs, number of SVs in the proband with a Mendelian inheritance error (% is indicated below), number of SVs occurring only in this sample, number of SVs found only in this study (not in HG0002 and Audano et al.), number of identified SNVs, number of SNVs in the proband with a Mendelian inheritance error (% is indicated below).
F father, M mother, P proband, SV structural variant, MIE Mendelian Inheritance Error, SNV single nucleotide variant.
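The percentages in the MIE columns can be reproduced from the proband counts in the table. The following is a minimal sketch of that arithmetic, assuming each percentage is the MIE count divided by the proband's total number of calls; the function name `mie_percent` and the dictionary layout are illustrative and do not come from the study.

```python
# Proband counts copied from the table above.
# Keys: total SV calls, SV MIEs, total SNV calls, SNV MIEs.
probands = {
    "T1P": {"sv": 29030, "sv_mie": 4409, "snv": 3142916, "snv_mie": 448808},
    "T2P": {"sv": 28766, "sv_mie": 3186, "snv": 3383890, "snv_mie": 290676},
    "T3P": {"sv": 28111, "sv_mie": 2681, "snv": 3147660, "snv_mie": 219392},
    "T4P": {"sv": 28640, "sv_mie": 2763, "snv": 3166126, "snv_mie": 301415},
    "T5P": {"sv": 33056, "sv_mie": 2130, "snv": 3956435, "snv_mie": 125023},
}


def mie_percent(errors, total):
    """Share of calls with a Mendelian inheritance error, in percent (1 decimal)."""
    return round(100.0 * errors / total, 1)


for sample, counts in probands.items():
    sv_pct = mie_percent(counts["sv_mie"], counts["sv"])
    snv_pct = mie_percent(counts["snv_mie"], counts["snv"])
    print(f"{sample}: SV MIE {sv_pct}%, SNV MIE {snv_pct}%")
```

Under this assumption, the values computed for all five probands match the percentages listed in the table.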
Fig. 1 Comparison of structural variants called with long-read sequencing and short-read sequencing.
A Comparison of structural variants identified in Trio 5 between long-read sequencing using PBSV and short-read sequencing using three algorithms for structural variant detection. The plot depicts the number of structural variants identified by each combination of methods, with the combination indicated below the corresponding bar. Deletions are shown in red, insertions in blue, and inversions in yellow. The bottom-left bar plot depicts the total number of SVs identified with each method. B Pie charts show the number of Mendelian inheritance errors for the three types of SVs identified by LRS and the percentage of concordant calls.
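The per-combination counts in such a plot can be derived by grouping each SV by the exact set of methods that called it. The sketch below illustrates this under a strong simplifying assumption: that calls from different methods have already been matched to shared identifiers. A real comparison would need breakpoint-tolerant matching, and this is not the authors' pipeline. Apart from PBSV, the method names and SV identifiers are hypothetical placeholders.

```python
from collections import Counter


def combination_counts(calls_by_method):
    """Count SVs per exact combination of methods that reported them.

    calls_by_method maps a method name to a set of SV identifiers that are
    assumed to be already harmonised across methods.
    """
    membership = {}
    for method, sv_ids in calls_by_method.items():
        for sv_id in sv_ids:
            membership.setdefault(sv_id, set()).add(method)
    # A frozenset is hashable, so each combination can serve as a Counter key.
    return Counter(frozenset(methods) for methods in membership.values())


# Hypothetical toy input: one LRS caller and two placeholder SRS callers.
example_calls = {
    "PBSV": {"sv1", "sv2", "sv3"},
    "srs_caller_a": {"sv2"},
    "srs_caller_b": {"sv2", "sv4"},
}
example_counts = combination_counts(example_calls)

for combo, n in example_counts.items():
    print(sorted(combo), n)
```

Each key of the resulting counter corresponds to one bar of the plot, and the total per method is simply the size of that method's input set.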
Fig. 2 Schematic digital ideogram depicting genomic regions larger than 1 Kb without LRS or SRS coverage.
From top to bottom, tracks indicate: regions with sequence coverage in SRS but no coverage in LRS (blue); regions with sequence coverage in LRS but no coverage in SRS (red); regions with no sequence coverage in either LRS or SRS (black); genome regions that are difficult to assess, such as centromeres, telomeres, and gaps (orange); regions with segmental duplications larger than 20 Kb (green). Note that the blue, red, black, and orange regions are mutually exclusive; any apparent overlap in the figure is due to its limited resolution.
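Regions covered by one technology but not the other can be found by subtracting one set of covered intervals from the other. The following is a minimal sketch for a single chromosome, assuming half-open `(start, end)` coordinates and a 1 Kb minimum length as in the figure title. The function names and example coordinates are hypothetical, and this is not the authors' pipeline.

```python
def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract(a, b):
    """Return the parts of the intervals in `a` not covered by any interval in `b`."""
    result = []
    b = merge(b)
    for start, end in merge(a):
        cursor = start  # left edge of the part of (start, end) not yet handled
        for b_start, b_end in b:
            if b_end <= cursor:
                continue  # this b interval lies entirely to the left
            if b_start >= end:
                break  # this and all later b intervals lie to the right
            if b_start > cursor:
                result.append((cursor, b_start))  # uncovered gap before b
            cursor = max(cursor, b_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))
    return result


def lrs_only_regions(lrs_covered, srs_covered, min_length=1000):
    """Regions covered in LRS but not in SRS, keeping those of at least min_length bp."""
    return [(s, e) for s, e in subtract(lrs_covered, srs_covered) if e - s >= min_length]


# Hypothetical coordinates on one chromosome.
lrs = [(0, 10000)]
srs = [(0, 2000), (2500, 6000), (8000, 10000)]
print(lrs_only_regions(lrs, srs))
```

Swapping the two arguments gives the SRS-only track, and subtracting both covered sets from the chromosome extent gives the regions without coverage in either technology.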
Fig. 3 Interpretation of variation identified by LRS but not SRS.
From left to right, three groups are shown: the number of genes whose coding regions are affected by an SV, the number of genes affected by a putatively damaging SNV identified only in LRS, and the number of genes whose coding regions are covered less than 10% in SRS but do have coverage in LRS. Individual bars indicate different types of disorders based on the diagnostic gene panels used by the genetic testing laboratory Genome Diagnostics Nijmegen (https://www.radboudumc.nl/en/patientenzorg/onderzoeken/exome-sequencing-diagnostics/information-for-referrers/exome-panels). Numbers in the legend indicate the total number of genes in each of the gene panels.