Yangrae Cho1,2, Chul-Ho Lee3, Eun-Goo Jeong1, Min-Ho Kim1, Jong Hui Hong1, Younhee Ko3,4, Bomnun Lee1, Gilly Yun1, Byong Joon Kim1, Jongcheol Jung1, Jongsun Jung5, Jin-Sung Lee6.
Abstract
Next-generation sequencing (NGS) technology has improved enough to discover mutations associated with genetic diseases. Our study evaluated the feasibility of targeted NGS as a primary screening tool to detect causal variants and subsequently predict genetic diseases. We performed parallel computations on 3.7-megabase-targeted regions to detect disease-causing mutations in 103 participants consisting of 81 patients and 22 controls. Data analysis of the participants took about 6 hours using local databases and 200 nodes of a supercomputer. All variants in the selected genes led on average to 3.6 putative diseases for each patient while variants restricted to disease-causing genes identified the correct disease. Notably, only 12% of predicted causal variants were recorded as causal mutations in public databases: 88% had no or insufficient records. In this study, most genetic diseases were caused by rare mutations and public records were inadequate. Most rare variants, however, were not associated with genetic diseases. These data implied that novel, rare variants should not be ignored but interpreted in conjunction with additional clinical data. This step is needed so appropriate advice can be given to primary doctors and parents, thus fulfilling the purpose of this method as a primary screen for rare genetic diseases.
Year: 2017 PMID: 28851938 PMCID: PMC5574920 DOI: 10.1038/s41598-017-09247-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Workflow for generating binary alignment map (BAM) files. These files were used in subsequent genotyping and variant discovery with four separate variant calling methods.
Summary of specifications for the computation time for 103 DNA samples versus one DNA sample.
| Number of samples | Sample size | FASTQ to BAM (h:mm) | BAM to VCF (h:mm) | Total (h:mm) |
|---|---|---|---|---|
| 1 | 1.0GB | 0:45 | 2:30 | 3:15 |
| 1 | 1.5GB | 1:05 | 3:30 | 4:35 |
| 1 | 2.8GB | 2:03 | 6:50 | 8:53 |
| 103 | 165GB (1.5 GB ea.) | 1:05 | 3:30 | 5:00* |
OS: CentOS 6.5; CPU: Intel Xeon E5520 2.3 GHz, 2 sockets × 4 cores × 200 nodes (total = 1,600 cores); disk drive: MAHA distributed parallel file system for diagnosis (MAHA-FsDx, 1.4 petabytes). *The actual time required to analyze 103 samples on 200 nodes of the MAHA-FsDx parallel computer was 5 hours rather than 4 hours 35 minutes.
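The timing figures in the table can be cross-checked with a short sketch. It confirms that each single-sample total equals the sum of the two pipeline stages and recomputes the core count from the footnote. The serial baseline is a hypothetical assumption for illustration only (103 samples processed one after another at the 4:35 listed for a 1.5 GB sample); it was not measured in the study, and the names `parse_hm`, `serial_estimate`, and `speedup_estimate` are illustrative.

```python
from datetime import timedelta


def parse_hm(text: str) -> timedelta:
    """Convert an 'h:mm' string from the table into a timedelta."""
    hours, minutes = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes))


# (sample size, FASTQ to BAM, BAM to VCF, total) for the single-sample rows.
single_sample_rows = [
    ("1.0GB", "0:45", "2:30", "3:15"),
    ("1.5GB", "1:05", "3:30", "4:35"),
    ("2.8GB", "2:03", "6:50", "8:53"),
]

# Each reported total should equal the sum of the two pipeline stages.
stage_sums_match = all(
    parse_hm(to_bam) + parse_hm(to_vcf) == parse_hm(total)
    for _, to_bam, to_vcf, total in single_sample_rows
)

# Core count from the footnote: 2 sockets x 4 cores x 200 nodes.
total_cores = 2 * 4 * 200

# Hypothetical serial baseline (an assumption, not a measurement):
# 103 samples processed sequentially at 4:35 each.
serial_estimate = 103 * parse_hm("4:35")

# Wall-clock time reported for the 103-sample parallel run.
observed_parallel = timedelta(hours=5)

# Ratio of the hypothetical serial time to the observed parallel time.
speedup_estimate = serial_estimate / observed_parallel

print(stage_sums_match, total_cores, serial_estimate, round(speedup_estimate, 1))
```

Under that assumption the ratio is only a rough indication of the benefit of parallel processing, since the single-sample timings and the 200-node run may not be directly comparable.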
Statistics for sequence-read coverages and rare variants leading to the causal variants for rare genetic diseases in each patient.
| | Captured regions | 307 genes | 211 genes | 65 genes |
|---|---|---|---|---|
| Sum of reads on target regions | 575 Mb | 138 Mb | 109 Mb | 43 Mb |
| On-target rate¹ | 30% | 7.20% | 5.20% | 2.20% |
| Length of target regions | 3.7 Mb | 1.3 Mb | 0.9 Mb | 0.35 Mb |
| Mean read depth² | 161× | 128× | 120× | 127× |
| Number of variants | NA | 1186 | 762 | 271 |
| Number of rare variants with probable deleterious effects | NA | 8.6 | 6.2 | 3.4 |
| Number of disease candidates | NA | 3.6 | 2.7 | 1.8 |
¹ On-target rate = [(sum of reads on target regions)/(sum of all reads, 2.1 gigabytes)] × 100. ² Mean read depth = (sum of reads on target regions)/(length of target regions).
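The two footnote formulas can be written out as a minimal sketch. It assumes the 2.1-gigabyte total can be treated as 2,100 Mb in the same unit as the on-target sums, which the source does not state explicitly. Because the table values are rounded, the recomputed figures only approximate the published rates and depths. The function and variable names are illustrative.

```python
# Assumption: the footnote's "2.1 Gigabytes" is treated as 2,100 Mb so that
# it shares a unit with the on-target sums in the table.
TOTAL_READS_MB = 2100.0

# (sum of reads on target in Mb, length of target regions in Mb), from the table.
regions = {
    "Captured regions": (575.0, 3.7),
    "307 genes": (138.0, 1.3),
    "211 genes": (109.0, 0.9),
    "65 genes": (43.0, 0.35),
}


def on_target_rate(on_target_mb: float, total_mb: float = TOTAL_READS_MB) -> float:
    """Footnote 1: (sum of reads on target / sum of all reads) x 100."""
    return on_target_mb / total_mb * 100.0


def mean_read_depth(on_target_mb: float, target_length_mb: float) -> float:
    """Footnote 2: sum of reads on target / length of target regions."""
    return on_target_mb / target_length_mb


for name, (on_target_mb, length_mb) in regions.items():
    rate = on_target_rate(on_target_mb)
    depth = mean_read_depth(on_target_mb, length_mb)
    print(f"{name}: on-target rate {rate:.1f}%, mean read depth {depth:.0f}x")
```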
Figure 2. Read depth distribution for the target, calculated by the formula: sum of read numbers in each bracket divided by the number of total positions. Read depth distribution for the target was similar for both the 65 and 307 target genes. The blue line almost overlapped the orange line.
Call for disorders based on nucleotide sequence variations detected by next-generation sequencing
| Patient # | Gene | Variants | Variant type | Variant call | Heredity¹ |
|---|---|---|---|---|---|
| 37 | FGFR3 | Gly382Arg (c.1138G > A, het) | rs28931614 | Yes | AD |
| … | | | | | |
| 83 | RB1 | c.1215 + 1G > A | rs587776783 | Yes | AD |
| 89 | SOS1 | p.Arg552Gly (c.1654A > G) | rs137852814 | Yes | AD |
| 102 | WT1 | p.Arg462Trp (c.1384 C > T, het) | rs121907900 | Yes | AD |
| … | | | | | |
| 46 | GCH1 | p.Glu242* (c.724 G > , het) | stop | Yes | AD |
| 56 | JAG1 | p.Tyr434* (c.1302 C > A, het) | stop | Yes | AD |
| 62 | NF1 | p.Gln554* (c.1660C > T) | stop | Yes | AD |
| 84 | RB1 | p.Arg272* (c.814 A > T, het) | stop | Yes | AD |
| 34 | FBN1 | c.1698_1712del15 (exon13, het) | ins | Yes | AD |
| 55 | JAG1 | c.1720 + 1dupG (intron13, het) | del | Yes | AD |
| … | | | | | |
| 24 | COL1A2 | c.595-1G > C | SNV | Yes | AD |
| 26 | COMP | p.Asp376His (c.1126 G > C, het) | SNV | Yes | AD |
| 54 | IVD | p.Ala29Thr (c.85G > A, het) | SNV | Yes | AR |
| … | | | | | |
| 97 | TSC2 | p.Gly1204Arg (c.3610G > C, het) | SNV | Yes | AD |
| 39 | GALC | p.Ile562Thr (c.1685T > C, homo) | SNV | No | x 27% |
| … | | | | | |
| 15 | BTK | del. Exon6~10 (both copies) | partial del | Yes | 2 copies |
| 29 | DMD | exon2~exon7 duplication | partial dup | Yes | AD |
| 60 | MECP2 | MECP2 duplication | gene dup | Yes | AD |
¹ Heredity = basis for calling disorders; AD = autosomal dominant inheritance; AR = autosomal recessive inheritance; x = no call. Bold indicates a patient with a disorder resulting from compound heterozygotes (phasing not resolved but regarded as trans). ² Disease call was unsuccessful due to a high allele frequency in healthy populations.
Summary of variant types in 103 participants.
| Type of variant | # of samples | # of variants | Relevant genes | Detection |
|---|---|---|---|---|
| SNV/Trinucleotide expansion | 6 | 6 | CYP21A2¹, IKBKG², PAH¹, PARMS³, VPS33B¹ | Failure |
| Single nucleotide variation (SNV, AD, AR-homo) | 54 | 57 | | Success |
| Compound heterozygote | 13½⁴ | 27 | ABCA12, ACADM, AGL, PRODH, ARSA, ATP7B, GALC, GBA, HEXA, MCCC1, PAH, PKHD1, VPS33B | Success |
| Copy number variation (CNV) | 7½⁴ | 8 | PRODH, BTK, DMD, MECP2, PMP22, STS | Success |
| Carrier (het, inheritance type = AR) | 10 | 10 | ATP7B, BTK, GALT, GBA, HBB, IVD, LMNA, MCCC2, PRODH, SLC22A5 | Success |
| Negative control | 12 | 13 | ABCC8, ACADS, ALPL, CFTR, CYP21A2, GALC, GALT, GJB2, GJB2, NF1, OTC, PMP22, VHL | Success |
| Total number | 103 | 121 | ||
| Predefined variant | 121 | |||
| Correct answer | 115 | |||
| Incorrect answer | 22 | 6 | ||
| Analytical sensitivity | 95% | 115/121 | | |
¹ Variants in introns were detected by targeted NGS (TNGS) but excluded from disease calling due to ambiguity of their biological roles. ² Read depth was zero for IKBKG. ³ Repeat expansion from 20 to 25 was not detected. ⁴ Patient number seven was a heterozygote with two types of variants.
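The totals and the analytical sensitivity in the summary table can be reproduced with a small sketch. It assumes that the half counts (13½ and 7½) reflect patient seven being split between the compound heterozygote and CNV categories, as footnote 4 suggests. The variable names are illustrative.

```python
# (variant type, number of samples, number of variants, detected?) from the table.
# Assumption: 13.5 and 7.5 encode patient seven counted half in each category.
summary_rows = [
    ("SNV/Trinucleotide expansion", 6.0, 6, False),
    ("Single nucleotide variation", 54.0, 57, True),
    ("Compound heterozygote", 13.5, 27, True),
    ("Copy number variation", 7.5, 8, True),
    ("Carrier", 10.0, 10, True),
    ("Negative control", 12.0, 13, True),
]

total_samples = sum(samples for _, samples, _, _ in summary_rows)
total_variants = sum(variants for _, _, variants, _ in summary_rows)
detected_variants = sum(variants for _, _, variants, ok in summary_rows if ok)
missed_variants = total_variants - detected_variants

# Analytical sensitivity = correctly detected variants / predefined variants.
analytical_sensitivity = detected_variants / total_variants

print(total_samples, total_variants, detected_variants, missed_variants)
print(f"Analytical sensitivity: {analytical_sensitivity:.0%}")
```

The row sums agree with the table's totals of 103 participants and 121 predefined variants, of which 115 were called correctly.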