Daniel M Borràs, Rolf H A M Vossen, Michael Liem, Henk P J Buermans, Hans Dauwerse, Dave van Heusden, Ron T Gansevoort, Johan T den Dunnen, Bart Janssen, Dorien J M Peters, Monique Losekoot, Seyed Yahya Anvar.
Abstract
A genetic diagnosis of autosomal-dominant polycystic kidney disease (ADPKD) is challenging due to allelic heterogeneity, high GC content, and homology of the PKD1 gene with six pseudogenes. Short-read next-generation sequencing approaches, such as whole-genome sequencing and whole-exome sequencing, often fail to reliably characterize complex regions such as PKD1. However, long-read single-molecule sequencing has been shown to be an alternative strategy that could overcome PKD1 complexities and discriminate between homologous regions of PKD1 and its pseudogenes. In this study, we present the increased power of resolution for complex regions using long-read sequencing to characterize a cohort of 19 patients with ADPKD. Our approach provided high sensitivity in identifying PKD1 pathogenic variants, diagnosing 94.7% of the patients. We show that reliable screening of ADPKD patients in a single test by direct long-read sequencing is now possible, without interference from PKD1 homologous sequences, which are commonly introduced by residual amplification of PKD1 pseudogenes. This strategy can be implemented in diagnostics and is highly suitable to sequence and resolve complex genomic regions that are of clinical relevance.
Keywords: ADPKD; DNA diagnostics; PKD1; PacBio; complex genomic regions; long-read sequencing; single-molecule real-time sequencing; variant detection
Year: 2017 PMID: 28378423 PMCID: PMC5488171 DOI: 10.1002/humu.23223
Source DB: PubMed Journal: Hum Mutat ISSN: 1059-7794 Impact factor: 4.878
Figure 1. Flowchart of the applied analytical approach for the identification of potentially pathogenic variants and VUCS in ADPKD patient samples. Key processes in the workflow describe details and thresholds used for (1) sequencing of pooled LR‐PCR amplified fragments with PacBio RSII and postprocessing of reads including alignments and read quality filters; (2) identification of variants using two independent strategies including the reconstruction of allelic sequences, and small variant calling using Quiver; (3) standardization of variant nomenclature to represent a correct HGVS description and facilitate the comparison between datasets; (4) enrichment of variant annotations with VEP (including effect prediction, ClinVar, SIFT, PolyPhen, 1000 Genomes Project, dbSNP, and SwissProt annotations among others), and selection of high‐confidence variants; (5) identification of potentially pathogenic variants and VUCS based on their confidence, effect prediction, and population frequencies.
Figure 2. SMRT sequencing and variant calling of LR‐PCR amplicons. A: Sequencing depth (DP; in number of reads) of the alignments to chromosome 16 and chromosome 4. Number of uniquely aligned reads (y axis, blue line) sequenced with PacBio that mapped to PKD1 and PKD2. Off‐target amplification is discriminated from the main PKD1 gene sequences showing alignments to pseudogene homologous sequences at proximal loci (e.g., PKD1P1, PKD1P5, PKD1P6) (blue boxes). B: Mapping quality (MQ; in Phred quality scores; values >90 were scaled down for visualization purposes), and sequencing depth (DP; in number of reads) of uniquely aligned molecules to PKD1 (NM_001009944.2) for the five LR‐PCR fragments amplified. Mapping quality of alignments with even coverage distribution along the amplified fragments (fragments), including regions with SDs, repetitive elements (repeats), and high GC content (GC%). Despite fragments A and E showing lower coverage, compared with the average sequencing depth of ≥421× (minimum ≥19×; maximum 1,528×), they had sufficient coverage for variant calling within the exon regions, including the first exons of PKD1, with average coverage of ≥55× (minimum ≥24×; maximum 91×) (Supp. Table S4). C: We detected 1,506 intron variants (blue) and 177 coding or splice‐site variants (yellow). The predicted transcript effects of coding and splice‐site variants were quantified (bar chart) as log10 count (x axis).
Figure 3. Comparison of long‐read detected pathogenic variants or VUCS, uniquely identified per patient (y axis), with the screening results for the PKD1 gene locus (x axis; NM_001009944.2). Most of the pathogenic variants (red) could be confirmed by our long‐read strategy (red bars) with high sensitivity for PKD1. Only a single insertion could not be confirmed for patient 16. Other identified nonpathogenic variants or VUCS are shown as black bars and dots for PacBio and Sanger, respectively. The LoH analysis performed (pink or gray boxes) supports the presence of the two large deletions also reported by MLPA (pink boxes). LoH regions are not a direct identification of large deletions but a clear indication of their presence within the amplified LR‐PCR fragments.
Uniquely identified pathogenic variants or variants of unknown clinical significance identified by PacBio sequencing
| Patient | Chromosome | Genomic position | Exon | c. notation | p. notation | SNP ID | Freq (%) | Depth | PolyPhen | VEP impact | Comparison with Sanger sequencing |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | chr16 | 2,161,525 | 15 | c.3643C | p.(Leu1215Val) | rs144338515 | 49.2 | 576 | Possibly damaging (0.899) | Moderate | Overlap |
| 3 | chr16 | 2,160,693 | 15 | c.4475G | p.(Arg1492Pro) | | 32.8 | 563 | Possibly damaging (0.665) | Moderate | Overlap |
| 3 | chr16 | 2,157,963 | 16 | c.6986G | p.(Arg2329Gln) | rs575211353 | 43.3 | 538 | Benign (0.37) | Moderate | Overlap |
| 5 | chr16 | 2,185,509 | 1 | c.182C | p.(Pro61Leu) | | 28.8 | 43 | Benign (0.119) | Moderate | PacBio |
| 11 | chr16 | 2,152,396 | 25 | c.9187C | p.(Arg3063Cys) | rs145906459 | 36.1 | 557 | Benign (0.39) | Moderate | Overlap |
| 14 | chr16 | 2,162,850 | 13 | c.3100A | p.(Asn1034Asp) | rs369180760 | 36.5 | 321 | Benign (0.098) | Moderate | Overlap |
| 19 | chr16 | 2,139,750 | 46 | c.12890A | p.(Lys4297Arg) | rs758833703 | 14.1 | 46 | Benign (0.07) | Moderate | PacBio |
Notes: Sanger‐detected pathogenic variants are shown in bold. PacBio variants were filtered by predicted coding sequence effect (frameshifts, missense, in‐frame deletions, and splicing variants), by DP>15 and >50 subreads, and by variant frequency (>10% for substitutions, and >15% for insertions and deletions) (RefSeq NM_001009944.2).
Additional information on each variant, including SIFT classification and 1000G frequencies among other annotations, can be obtained from the VCF files uploaded to EGA with accession number EGAS00001002106.
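The filtering thresholds given in the table notes can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Variant` record, its field names, and the `passes_filters` function are hypothetical, and the mapping of the effect classes named in the notes onto Sequence Ontology consequence terms is an assumption. The sketch also assumes that "DP>15 and >50 subreads" refers to two separate per-variant counts that must both be exceeded.

```python
from dataclasses import dataclass

# Assumption: the effect classes named in the notes (frameshifts, missense,
# in-frame deletions, splicing variants) expressed as Sequence Ontology terms.
CODING_EFFECTS = {
    "frameshift_variant",
    "missense_variant",
    "inframe_deletion",
    "splice_donor_variant",
    "splice_acceptor_variant",
    "splice_region_variant",
}


@dataclass
class Variant:
    """Hypothetical record holding only the fields the filter needs."""
    kind: str        # "substitution", "insertion", or "deletion"
    effect: str      # predicted consequence term
    depth: int       # sequencing depth (DP) at the variant position
    subreads: int    # number of supporting subreads (assumed separate from DP)
    freq_pct: float  # variant frequency in percent


def passes_filters(v: Variant) -> bool:
    """Apply the thresholds stated in the notes: effect class, DP>15,
    >50 subreads, and frequency >10% (substitutions) or >15% (indels)."""
    if v.effect not in CODING_EFFECTS:
        return False
    if v.depth <= 15 or v.subreads <= 50:
        return False
    min_freq = 10.0 if v.kind == "substitution" else 15.0
    return v.freq_pct > min_freq


# Depth and frequency are taken from the patient 19 row of the table;
# the subread count of 60 is a placeholder, not a reported value.
example = Variant("substitution", "missense_variant",
                  depth=46, subreads=60, freq_pct=14.1)
print(passes_filters(example))
```

Under these assumptions, a substitution at 14.1% frequency passes, whereas an insertion or deletion at the same frequency would be rejected by the stricter 15% threshold.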