Maren F Hansen, Ulrike Neckmann, Liss A S Lavik, Trine Vold, Bodil Gilde, Ragnhild K Toft, Wenche Sjursen.
Abstract
The purpose of this study was to develop a massive parallel sequencing (MPS) workflow for diagnostic analysis of mismatch repair (MMR) genes using the GS Junior system (Roche). A pathogenic variant in one of four MMR genes (MLH1, PMS2, MSH6, and MSH2) is the cause of Lynch syndrome (LS), which mainly predisposes to colorectal cancer. We used an amplicon-based sequencing method allowing specific and preferential amplification of the MMR genes, including PMS2, for which several pseudogenes exist. The amplicons were pooled at different ratios to obtain coverage uniformity and maximize the throughput of a single GS Junior run. In total, 60 previously identified and distinct variants (substitutions and indels) were sequenced by MPS and successfully detected. The heterozygote detection range was from 19% to 63% and depended on sequence context and coverage. We were able to distinguish between false-positive and true-positive calls in homopolymeric regions by cross-sample comparison and evaluation of flow signal distributions. In addition, we filtered variants according to a predefined status, which facilitated variant annotation. Our study shows that implementation of MPS in routine diagnostics of LS can accelerate sample throughput and reduce costs without compromising sensitivity, compared to Sanger sequencing.
Keywords: Amplicon sequencing; MLH1; MSH2; MSH6; PMS2; diagnostics; hereditary colorectal cancer; massive parallel sequencing; mismatch repair
Year: 2014 PMID: 24689082 PMCID: PMC3960061 DOI: 10.1002/mgg3.62
Source DB: PubMed Journal: Mol Genet Genomic Med ISSN: 2324-9269 Impact factor: 2.183
Figure 1. Workflow for sequencing the MMR genes. For each sample, 88 singleplex PCRs are set up. The workflow is then split into two parallel workflows. Ten amplicons unsuited for MPS are analyzed by Sanger sequencing. This generates 20 cycle sequencing reactions (forward and reverse direction) that are analyzed by capillary electrophoresis using a 3130xl Genetic Analyzer. Data analysis is done with the software SeqScape v. 2.5 and a variant report is generated. The remaining 78 amplicons are pooled into eight multiplex PCRs that add the sample-specific MIDs. Subsequently, all multiplex reactions to be analyzed in a single GS Junior run are combined into a total pool prior to the emulsion PCR. Sequencing of the enriched amplicons is performed on the GS Junior benchtop sequencer from Roche. Data analysis is done with the GS Amplicon Variant Analyzer (AVA) software and a variant report is generated. Ultimately, the variant reports are combined into a final result report for each patient and test results are sent to the requisitioner.
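The amplicon bookkeeping in this workflow can be checked with a few lines of Python. This is only an arithmetic sketch: the per-sample numbers come from the Figure 1 caption, the figure of eight samples per run comes from the coverage table, and the function and key names are illustrative.

```python
# Counts from the Figure 1 caption; samples per run from the coverage table.
SINGLEPLEX_PCRS_PER_SAMPLE = 88
SANGER_AMPLICONS_PER_SAMPLE = 10
SAMPLES_PER_RUN = 8


def workflow_counts(total_pcrs=SINGLEPLEX_PCRS_PER_SAMPLE,
                    sanger_amplicons=SANGER_AMPLICONS_PER_SAMPLE,
                    samples=SAMPLES_PER_RUN):
    """Derive the per-sample and per-run reaction counts of the two workflows."""
    mps_amplicons = total_pcrs - sanger_amplicons
    return {
        # Amplicons that go on to multiplexing and the GS Junior run.
        "mps_amplicons_per_sample": mps_amplicons,
        # Each Sanger amplicon is sequenced in forward and reverse direction.
        "cycle_sequencing_reactions": sanger_amplicons * 2,
        # Sample-amplicon combinations sequenced in one GS Junior run.
        "mps_amplicons_per_run": mps_amplicons * samples,
    }


counts = workflow_counts()
print(counts)
```

The per-run total agrees with the 624 amplicons listed for each run in the coverage table.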
Figure 2. Histograms of signal distributions from false-positive (FP) and true-positive (TP) variants. (A) Distribution from an overcall of a homopolymeric stretch of six As. The stretch was called as a 7-mer in 23% of the reads. A single peak around 6-mer with long tails indicates a false-positive call. (B) Signal distribution from a true call (c.680_683del in MSH2) leading to the sequence change GAAAGAAAAAAAG→GAAAAAAAG. This distribution shows strong evidence for both 3-mer and 7-mer. Dual peaks, with distributions for both forward and reverse reads centered on the values 3-mer and 7-mer, indicate a true-positive variant. (C) Signal distribution from an overcall of a homopolymeric stretch of six Ts. The stretch was called as a 7-mer in 24% of the reads. A single peak around 6-mer with long tails indicates a false-positive call. (D) Signal distribution from a true-positive call (c.*85T>A in MSH6) of a variant located between two homopolymeric regions (TTTTTTAAAAA). This distribution indicates both 5-mer and 6-mer Ts for this homopolymeric region. Note that histograms A and B are made up of more reads than histograms C and D. Higher coverage gives cleaner distributions and facilitates interpretation.
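The single-peak versus dual-peak reasoning in this caption can be illustrated with a toy heuristic. The sketch below is not the AVA software's algorithm and not the authors' procedure: it assumes flow signals are expressed in homopolymer-length units, the function name and both thresholds are hypothetical, and the signal values are made up for illustration. The idea is that reads supporting a real allele cluster around the alternative length, while overcalled reads sit in the tail of the reference peak, just past the rounding boundary.

```python
def looks_like_true_variant(signals, alt_length,
                            core_halfwidth=0.25, min_core_fraction=0.5):
    """Return True if reads called as alt_length are centered on that length.

    signals: per-read flow signals in homopolymer-length units (assumed).
    core_halfwidth and min_core_fraction are hypothetical thresholds.
    """
    # Reads whose signal rounds to the alternative homopolymer length.
    alt_reads = [s for s in signals if abs(s - alt_length) < 0.5]
    if not alt_reads:
        return False
    # Reads that sit close to the alternative length, not in a neighboring tail.
    core = [s for s in alt_reads if abs(s - alt_length) <= core_halfwidth]
    return len(core) / len(alt_reads) >= min_core_fraction


# Made-up single-peak example: a 6-mer with a long tail, 23% called as 7-mer.
fp_signals = [6.0] * 50 + [6.2] * 15 + [6.4] * 12 + [6.55] * 15 + [6.7] * 6 + [6.9] * 2
# Made-up dual-peak example: clusters centered on 3-mer and 7-mer.
tp_signals = [2.9] * 10 + [3.0] * 20 + [3.1] * 10 + [6.6] * 5 + [6.9] * 15 + [7.0] * 25 + [7.2] * 15

print(looks_like_true_variant(fp_signals, alt_length=7))
print(looks_like_true_variant(tp_signals, alt_length=3))
```

A real evaluation would also inspect forward and reverse reads separately and compare the same position across samples, as described in the abstract.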
All true-positive variants detected by massive parallel sequencing.
| Gene | DNA | dbSNP rsID | Protein | Class | # Samples |
|---|---|---|---|---|---|
| MLH1 | c.-7C>T | rs104894994 | p.(=) | 3 | 1 |
| MLH1 | c.-28A>G | rs56198082 | p.(=) | 3 | 1 |
| MLH1 | c.-93G>A | rs1800734 | p.(=) | 1 | 10 |
| MLH1 | c.39_40dup | Not found | p.(Thr14Argfs*4) | 5 | 1 |
| MLH1 | c.655A>G | rs1799977 | p.(Ile219Val) | 1 | 19 |
| MLH1 | c.866_867del | Not found | p.(His289Profs*17) | 5 | 1 |
| MLH1 | c.1411_1414del | rs63751592 | p.(Lys471Aspfs*19) | 5 | 1 |
| MLH1 | c.1558+14G>A | rs41562513 | p.(=) | 1 | 2 |
| MLH1 | c.1668-19A>G | rs9876116 | p.(=) | 1 | 22 |
| MLH1 | c.1771dup | Not found | p.(Asp591Glyfs*2) | 5 | 1 |
| MLH1 | c.1852_1853delinsGC | rs35502531 | p.(Lys618Ala) | 1 | 1 |
| MLH1 | c.1959G>T | rs1800146 | p.(=) | 2 | 2 |
| MLH1 | c.*35_*37del | rs193922366 | p.(=) | 2 | 3 |
| MSH2 | c.-118T>C | rs2303425 | p.(=) | 1 | 9 |
| MSH2 | c.211+9C>G | rs2303426 | p.(=) | 1 | 25 |
| MSH2 | c.571_573del | Not found | p.(Leu191del) | 4 | 1 |
| MSH2 | c.680_683del | Not found | p.(Arg227Lysfs*18) | 5 | 1 |
| MSH2 | c.965G>A | rs4987188 | p.(Gly322Asp) | 2 | 2 |
| MSH2 | c.969_970del | Not found | p.(Gln324Valfs*8) | 5 | 1 |
| MSH2 | c.1511-9A>T | rs12998837 | p.(=) | 1 | 9 |
| MSH2 | c.1661+12G>A | rs3732183 | p.(=) | 1 | 21 |
| MSH2 | c.1666T>C | rs61756466 | p.(=) | 3 | 1 |
| MSH2 | c.1705_1706del | rs63751463 | p.(Glu569Ilefs*2) | 5 | 1 |
| MSH2 | c.1759G>C | rs63751140 | p.(Gly587Arg) | 5 | 1 |
| MSH2 | c.1786_1788del | rs63749831 | p.(Asn596del) | 4 | 1 |
| MSH2 | c.2006-6T>C | rs2303428 | p.(=) | 1 | 8 |
| MSH2 | c.2120_2122delins14 | Not found | p.(Cys707Serfs*3) | 5 | 1 |
| MSH6 | c.-159C>T | rs41540312 | p.(=) | 1 | 11 |
| MSH6 | c.116G>A | rs1042821 | p.(Gly39Glu) | 1 | 3 |
| MSH6 | c.186C>A | rs1042820 | p.(=) | 1 | 11 |
| MSH6 | c.260+22C>G | rs55927047 | p.(=) | 1 | 11 |
| MSH6 | c.276A>G | rs1800932 | p.(=) | 1 | 10 |
| MSH6 | c.540T>C | rs1800935 | p.(=) | 1 | 19 |
| MSH6 | c.628-56C>T | rs1800936 | p.(=) | 1 | 11 |
| MSH6 | c.642C>T | rs1800937 | p.(=) | 1 | 12 |
| MSH6 | c.1186C>G | rs2020908 | p.(Leu396Val) | 2 | 3 |
| MSH6 | c.1405del | Not found | p.(Tyr469Ilefs*12) | 5 | 1 |
| MSH6 | c.1943del | Not found | p.(Ser648Metfs*6) | 5 | 1 |
| MSH6 | c.2302_2304del | rs63750647 | p.(Pro768del) | 5 | 1 |
| MSH6 | c.2633T>C | rs2020912 | p.(Val878Ala) | 3 | 1 |
| MSH6 | c.3261dup | Not found | p.(Phe1088Leufs*5) | 5 | 1 |
| MSH6 | c.3438+14A>T | rs2020911 | p.(=) | 1 | 19 |
| MSH6 | c.3438+14delinsTT | Not found | p.(=) | 3 | 1 |
| MSH6 | c.3439-16C>T | rs192614006 | p.(=) | 2 | 1 |
| MSH6 | c.3514dup | rs63751327 | p.(Arg1172Lysfs*5) | 5 | 1 |
| MSH6 | c.3699_3702dup | Not found | p.(Leu1235Argfs*4) | 4 | 1 |
| MSH6 | c.3804dup | rs267608118 | p.(Cys1269Metfs*6) | 5 | 1 |
| MSH6 | c.3832_3845del | Not found | p.(Pro1278Tyrfs*6) | 4 | 1 |
| MSH6 | c.3848_3850dup | Not found | p.(Ile1283dup) | 3 | 1 |
| MSH6 | c.4001+12_4001+15del | rs267608134 | p.(=) | 3 | 1 |
| MSH6 | c.4001+42_4001+45dup | Not found | p.(=) | 3 | 1 |
| MSH6 | c.*85T>A | rs2020906 | p.(=) | 2 | 2 |
| PMS2 | c.-154C>G | rs3735296 | p.(=) | 1 | 7 |
| PMS2 | c.52A>G | rs63750123 | p.(Ile18Val) | 3 | 1 |
| PMS2 | c.59G>A | rs10254120 | p.(Arg20Gln) | 1 | 6 |
| PMS2 | c.251-72A>G | rs117831773 | p.(=) | 1 | 2 |
| PMS2 | c.288C>T | rs12532895 | p.(=) | 1 | 6 |
| PMS2 | c.705+17A>G | rs62456182 | p.(=) | 1 | 20 |
| PMS2 | c.823C>T | Not found | p.(Gln275*) | 4 | 1 |
| PMS2 | c.989-1G>T | Not found | p.(=) | 5 | 2 |
| PMS2 | c.1408C>T | rs1805321 | p.(Pro470Ser) | 1 | 20 |
| PMS2 | c.1437C>G | rs63750685 | p.(His479Gln) | 1 | 1 |
| PMS2 | c.1454C>A | rs1805323 | p.(Thr485Lys) | 1 | 7 |
| PMS2 | c.1531A>G | rs2228007 | p.(Thr511Ala) | 1 | 4 |
| PMS2 | c.1621G>A | rs2228006 | p.(Glu541Lys) | 1 | 11 |
| PMS2 | c.1688G>T | rs63750668 | p.(Arg563Leu) | 2 | 1 |
| PMS2 | c.1866G>A | rs1805324 | p.(Met622Ile) | 1 | 1 |
| PMS2 | c.1970del | Not found | p.(Asn657Ilefs*8) | 4 | 1 |
| PMS2 | c.2006+6G>A | rs111905775 | p.(=) | 1 | 7 |
| PMS2 | c.2007-4G>A | rs1805326 | p.(=) | 1 | 6 |
| PMS2 | c.2007-7C>T | rs55954143 | p.(=) | 1 | 6 |
| PMS2 | c.2156del | Not found | p.(Gln719Argfs*6) | 4 | 1 |
The columns shown in order are: Gene name, variant at DNA level, dbSNP rsID, variant at protein level, proposed class of variants, and number of samples the variants were detected in.
Reference sequences: PMS2 (NG_008466.1), MSH6 (NG_007111.1), MLH1 (NG_007109.1), and MSH2 (NG_007110.1).
Variant classes are: 1 = neutral, 2 = likely neutral, 3 = uncertain, 4 = likely pathogenic, and 5 = pathogenic. Classification was done based on published literature, prediction tools, frequency (both publicly available and from our own diagnostic database) and conservation of nucleotides and amino acids.
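The five-class scheme and the DNA-level notation used in the table lend themselves to a small lookup. The sketch below is illustrative only: the class labels are taken from the footnote above, while the function names and the coarse type rules (based purely on the suffix of the c. description) are assumptions, not the annotation pipeline used in the study.

```python
import re

# Class labels as given in the table footnote.
VARIANT_CLASSES = {
    1: "neutral",
    2: "likely neutral",
    3: "uncertain",
    4: "likely pathogenic",
    5: "pathogenic",
}


def variant_type(hgvs_c):
    """Coarse variant type inferred from a DNA-level (c.) description."""
    if "delins" in hgvs_c:          # check before "del": delins contains it
        return "delins"
    if hgvs_c.endswith("dup"):
        return "duplication"
    if hgvs_c.endswith("del"):
        return "deletion"
    if re.search(r"[ACGT]>[ACGT]$", hgvs_c):
        return "substitution"
    return "unknown"


def describe(hgvs_c, variant_class):
    """One-line summary combining variant type and class label."""
    label = VARIANT_CLASSES[variant_class]
    return f"{hgvs_c}: {variant_type(hgvs_c)}, class {variant_class} ({label})"


print(describe("c.680_683del", 5))
print(describe("c.*85T>A", 2))
```

Grouping deletions, duplications, and delins variants together gives the "indels" referred to in the abstract.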
Variants sequenced to test indel detection capabilities of the GS Junior platform.
PMS2 variants confirmed by cDNA sequencing.
r[=]+[989_1015del].
Figure 3. Variant frequency of all true variants identified in runs 1–4 plotted against the coverage of their respective amplicons.
Coverage results for runs 1–4.
| | Run 1 | Run 2 | Run 3 | Run 4 |
|---|---|---|---|---|
| Samples | 8 | 8 | 8 | 8 |
| Amplicons | 624 | 624 | 624 | 624 |
| Passed reads | 143,904 | 116,190 | 90,725 | 118,367 |
| Mapped reads | 120,796 | 87,873 | 68,973 | 82,598 |
| Min/avg/max | 0/194/791 | 13/141/873 | 0/111/326 | 24/132/420 |
| Coverage SD | 108 | 87 | 51 | 53 |
| Variation coefficient | 0.56 | 0.62 | 0.46 | 0.40 |
| Spread corr. factor 90%/95% | 2.23/2.85 | 2.07/2.47 | 1.94/2.35 | 1.84/2.21 |
| Sample capacity | 10/8 | 11/9 | 12/10 | 13/10 |
| Amplicons with no coverage | 1 | 0 | 4 | 0 |
| # amplicons <38 | 9 | 13 | 14 | 7 |
Calculated based on average mapped reads (90,060) for the four sequencing runs.
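Several figures in the coverage table can be re-derived from one another. The sketch below assumes that average coverage is mapped reads divided by the 624 amplicons per run, and that the variation coefficient is the coverage SD divided by that average. These definitions are not stated in the table, but they reproduce its rounded values. All names are illustrative.

```python
# Values copied from the coverage table.
MAPPED_READS = {"Run 1": 120_796, "Run 2": 87_873, "Run 3": 68_973, "Run 4": 82_598}
COVERAGE_SD = {"Run 1": 108, "Run 2": 87, "Run 3": 51, "Run 4": 53}
AMPLICONS_PER_RUN = 624


def mean_coverage(mapped_reads, amplicons=AMPLICONS_PER_RUN):
    """Average reads per amplicon (assumed definition of 'avg' coverage)."""
    return mapped_reads / amplicons


def variation_coefficient(sd, mean):
    """Coefficient of variation: standard deviation relative to the mean."""
    return sd / mean


# Average mapped reads across the four runs, as used in the footnote.
average_mapped_reads = sum(MAPPED_READS.values()) / len(MAPPED_READS)

for run, reads in MAPPED_READS.items():
    avg = mean_coverage(reads)
    cv = variation_coefficient(COVERAGE_SD[run], avg)
    print(f"{run}: avg coverage {avg:.0f}, variation coefficient {cv:.2f}")
print(f"Average mapped reads: {average_mapped_reads:,.0f}")
```

The formula behind the sample capacity row is not given here, so it is deliberately left out of the sketch.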
Figure 4. Distribution of coverage for each of the 78 amplicons for MSH2, MSH6, MLH1, and PMS2 in our best performing run (run 4). The minimum coverage threshold (38×) is indicated with a light gray line. Seven amplicons, all instances of a single amplicon (MLH1_ex12B), were below the 38× threshold.
Figure 5. Correlation between amplicon length and the mean coverage for each amplicon in run 1 (A) and run 4 (B).
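A relationship such as the one plotted in Figure 5 is commonly summarized with a Pearson correlation coefficient. The sketch below shows that calculation in plain Python. Whether the authors used this statistic is not stated, and the amplicon lengths and coverage values are invented placeholders, not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the centered values.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Placeholder values for illustration only (not taken from the paper).
amplicon_lengths_bp = [250, 300, 350, 400, 450]
mean_coverages = [180, 160, 150, 120, 100]

print(f"r = {pearson_r(amplicon_lengths_bp, mean_coverages):.2f}")
```

Computing the coefficient separately for run 1 and run 4 would mirror the two panels of the figure.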