Qiandong Zeng, Natalia T Leach, Zhaoqing Zhou, Hui Zhu, Jean A Smith, Lynne S Rosenblum, Angela Kenyon, Ruth A Heim, Marcia Eisenberg, Stanley Letovsky, Patricia M Okamoto.
Abstract
Next-generation sequencing (NGS) is widely used in genetic testing for the highly sensitive detection of single nucleotide changes and small insertions or deletions. However, detection and phasing of structural variants, especially in repetitive or homologous regions, can be problematic due to uneven read coverage or genome reference bias, resulting in false calls. To circumvent this challenge, a computational approach utilizing customized scaffolds as supplementary reference sequences for read alignment was developed, and its effectiveness demonstrated with two CBS gene variants: NM_000071.2:c.833T>C and NM_000071.2:c.[833T>C; 844_845ins68]. Variant c.833T>C is a known causative mutation for homocystinuria, but is not pathogenic when in cis with the insertion, c.844_845ins68, because of alternative splicing. Using simulated reads, the custom scaffolds method resolved all possible combinations with 100% accuracy and, based on > 60,000 clinical specimens, exceeded the performance of current approaches that only align reads to GRCh37/hg19 for the detection of c.833T>C alone or in cis with c.844_845ins68. Furthermore, analysis of two 1000 Genomes Project trios revealed that the c.[833T>C; 844_845ins68] complex variant had previously been undetected in these datasets, likely due to the alignment method used. This approach can be configured for existing workflows to detect other challenging and potentially underrepresented variants, thereby augmenting accurate variant calling in clinical NGS testing.
Year: 2020 PMID: 32929119 PMCID: PMC7490669 DOI: 10.1038/s41598-020-71471-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. c.833T>C and the complex variant, c.[833T>C;844_845ins68], on chromosome 21. (a) Sequence structures of the 68 bp insertion in cis with c.833T>C show that the variant descriptions for c.[833T>C;844_845ins68] and c.832_833ins68 in ClinVar and gnomAD are equivalent. In both panels, coding exon 8 is shaded in gray with the c.833 wild-type base highlighted in pink and the locations of the bases that differ between the wild-type and 68 bp insertion sequences in blue. The duplicated intronic sequence is underlined. In the lower panel, the 68 bp insertion sequences are shown in boxes with the relative locations of the c.833C variant in cis highlighted in yellow. Intronic sequence is in lowercase. (b) Schematic representation of the variant region depicts the exons in blue with the locations of the c.833 base as a normal [T] or pathogenic [C] variant, and the 68 bp insertion between the c.844 and c.845 bases in orange. Dotted lines in gray indicate the region that is excised due to alternative splicing.
Figure 2. Alignment of simulated reads to the custom scaffolds and hg19 reference genome. Examples of simulated read alignments to custom scaffolds CBS_MU (left panel) and CBS_WT (middle panel) and to GRCh37/hg19 reference genome (right panel) are visualized using the Integrative Genomics Viewer (IGV; Broad Institute, Cambridge, MA). Arrows point to the locations of the informative bases that are used for variant calling on each scaffold and the hg19 reference. (a) c.833T>C detected at WT:3184 on Ref:CBS_WT or chr21:44,483,184 on Ref:HG19; (b) c.844_845ins68 detected with Ref:CBS_MU at MU:3210G>C; (c) c.[833T>C;844_845ins68] detected at both MU:3210G>C and MU:3252A>G on Ref:CBS_MU; and (d) c.833T>C in trans with c.[833T>C; 844_845ins68] detected at MU:3210G>C and MU:3252A>G as HOM on Ref:CBS_MU and at WT:3184A>G as HET on Ref:CBS_WT. For (a–c), alignments are shown for simulated homozygous samples. For (b–d), a minority of reads with the 68 bp insertion will align to Ref:CBS_WT and Ref:HG19 due to high sequence homology as shown in the per-base tracks for two of the three mismatched bases (blue “C”, green “A”). A third divergent base is not visible because of soft-clipping by the variant caller.
Table 1. Genotyping and phasing of variants with c.833T>C and/or c.844_845ins68 based on simulated reads.
| Genotype | Aligns to Ref:CBS_WT | Aligns to Ref:CBS_MU | Zygosity at WT:3184 | Zygosity at MU:3210 | Zygosity at MU:3252 |
|---|---|---|---|---|---|
| c.833T>C | Y | – | HET or HOM | – | – |
| c.844_845ins68 | – | Y | – | HOM | – |
| c.[833T>C;844_845ins68] | – | Y | – | HOM | HOM |
| c.[833T>C] in trans with c.[844_845ins68] | Y | Y | HET | HOM | – |
| c.[833T>C] in trans with c.[833T>C;844_845ins68] | Y | Y | HET | HOM | HOM |
| c.[844_845ins68] in trans with c.[833T>C;844_845ins68] | – | Y | – | HOM | HET |
| Wild-type | Y | – | – | – | – |
Shown are the custom alignment scaffolds used and the expected zygosity of the informative base for each genotype tested by simulation. A heterozygous call (HET) is determined by an allele frequency between 20% and < 80%, while a homozygous call (HOM) has an allele frequency ≥ 80%.
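The thresholds in the footnote and the zygosity patterns in the table can be expressed as a small lookup. The following Python sketch is illustrative only and is not the authors' pipeline: the function names `call_zygosity` and `resolve_genotype` are hypothetical, the treatment of an allele fraction below 20% (or no coverage) as "no call" is an assumption, and the labels for the three in trans rows follow the zygosity patterns shown in the table and in Figure 2(d).

```python
from typing import Optional

HET_MIN = 0.20  # footnote: HET is an allele frequency of 20% to < 80%
HOM_MIN = 0.80  # footnote: HOM is an allele frequency >= 80%


def call_zygosity(alt_fraction: Optional[float]) -> str:
    """Return 'HOM', 'HET', or '-' for one informative base."""
    # Assumption: no coverage, or a fraction below 20%, is reported as no call.
    if alt_fraction is None or alt_fraction < HET_MIN:
        return "-"
    return "HOM" if alt_fraction >= HOM_MIN else "HET"


# Keys are the (WT:3184, MU:3210, MU:3252) zygosity calls from the table.
GENOTYPES = {
    ("HET", "-", "-"): "c.833T>C",
    ("HOM", "-", "-"): "c.833T>C",
    ("-", "HOM", "-"): "c.844_845ins68",
    ("-", "HOM", "HOM"): "c.[833T>C;844_845ins68]",
    ("HET", "HOM", "-"): "c.833T>C in trans with c.844_845ins68",
    ("HET", "HOM", "HOM"): "c.833T>C in trans with c.[833T>C;844_845ins68]",
    ("-", "HOM", "HET"): "c.844_845ins68 in trans with c.[833T>C;844_845ins68]",
    ("-", "-", "-"): "wild-type",
}


def resolve_genotype(wt3184: Optional[float],
                     mu3210: Optional[float],
                     mu3252: Optional[float]) -> str:
    """Map alt-allele fractions at the three informative bases to a genotype."""
    key = (call_zygosity(wt3184), call_zygosity(mu3210), call_zygosity(mu3252))
    # Any pattern not listed in the table is flagged rather than guessed.
    return GENOTYPES.get(key, "unresolved")


if __name__ == "__main__":
    # Pattern from Figure 2(d): HET at WT:3184, HOM at MU:3210 and MU:3252.
    print(resolve_genotype(0.45, 1.0, 0.95))
```

Returning "unresolved" for unlisted patterns is a deliberate choice in this sketch, so that an unexpected combination is surfaced for review instead of being forced into one of the tabulated genotypes.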
c.833T>C and c.[833T>C; 844_845ins68] carrier rates.
| Ethnicity | Total samples, gnomAD | Total samples, current study | c.833T>C: gnomAD, %AF (n) | c.833T>C: current study, %AF (n) | c.833T>C: P | c.[833T>C;844_845ins68]: gnomAD, %AF (n) | c.[833T>C;844_845ins68]: current study, %AF (n) | c.[833T>C;844_845ins68]: P |
|---|---|---|---|---|---|---|---|---|
| All Ethnicities | 13,970 | 60,318 | 0.17 (24) | 0.17 (103) | 1 | 21.10 (2948) | 18.49 (11,153) | < 0.0001 |
| African/African American | 4359 | 177 | 0.05 (2) | 0 | 1 | 38.72 (1688) | 45.2 (80) | 0.08 |
| Latin American | 424 | 323 | 0 | 0 | – | 12.54 (53) | 16.1 (52) | 0.17 |
| Ashkenazi Jewish | 145 | 169 | 0 | 0 | – | 7.93 (11) | 10.65 (18) | 0.44 |
| East Asian | 780 | 382 | 0 | 0 | – | 0.26 (2) | 4.71 (18) | < 0.0001 |
| European | 7718 | 518 | 0.29 (22) | 0.39 (2) | 0.66 | 13.7 (1057) | 14.29 (71) | 1 |
| Other (e.g., mixed ethnicities) | 544 | 292 | 0 | 0 | – | 14.76 (80) | 16.44 (48) | 0.55 |
Carrier rates determined in this study are compared with those from gnomAD, with corresponding P values from Fisher's exact test. The gnomAD population frequencies for c.833T>C were calculated using chromosome counts for rs5742905 for all available groups. Variant allele frequencies for the gnomAD European population are from the NFE group and exclude the Finnish group. The c.[833T>C; 844_845ins68] complex variant is described as c.832_833ins68 in gnomAD. Statistical analysis used α = 0.05 as the level of significance for rejecting the null hypothesis. AF, allele frequency; n, number of positive samples; P, P value.
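For readers who want to see how such a comparison works, the sketch below implements a two-sided Fisher's exact test on a 2×2 table of positive and negative sample counts using only the Python standard library. It is an illustration under stated assumptions, not the authors' analysis code: the function name `fisher_exact_two_sided` is hypothetical, and the two-sided rule used here (summing the probabilities of all tables no more likely than the observed one) is one common convention that the source does not confirm.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Here a and b are the positive and negative counts in cohort 1 (for
    example gnomAD), and c and d are the same counts in cohort 2.
    """
    row1, row2 = a + b, c + d
    col1 = a + c                      # total positives across both cohorts
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x positives falling in cohort 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as extreme as the one observed; the small
    # relative tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


if __name__ == "__main__":
    # East Asian row, complex variant: 2 of 780 (gnomAD) vs 18 of 382 (study).
    p = fisher_exact_two_sided(2, 780 - 2, 18, 382 - 18)
    print(f"P = {p:.3g}")
```

The negative counts are obtained by subtracting n from the group's total number of samples, which matches how the %AF (n) columns relate to the totals in the table.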
Figure 3. Detection of the c.833T>C single nucleotide variant in trans with the complex variant, c.[833T>C;844_845ins68]. IGV screenshots display read alignments for a clinical sample in which the pathogenic variant, c.833T>C, is in trans with c.[833T>C;844_845ins68] using either the custom scaffolds or GRCh37/hg19 genome as reference. Variant detection utilizes the informative bases on Ref:CBS_MU and Ref:CBS_WT as indicated by arrows at MU:3210 for the 68 bp insertion, MU:3252 for the c.833T>C variant in cis with the 68 bp insertion and WT:3184 for the c.833T>C variant on the opposite allele (see also Table 1 and Supplementary Fig. S1 online). Per-base coverage tracks show the expected zygosity calls for MU:3210, MU:3252 and WT:3184 as HOM, HOM and HET, respectively. Read alignment to Ref:HG19 shows that only the c.833T>C variant can be detected at chr21:44,483,184.