Chun Hang Au1, Anskar Y H Leung2, Ava Kwong3,4,5, Tsun Leung Chan1, Edmond S K Ma6.
Abstract
BACKGROUND: Complex insertions and deletions (indels) from next-generation sequencing (NGS) data were prone to escape detection by currently available variant callers as shown by large-scale human genomics studies. Somatic and germline complex indels in key disease driver genes could be missed in NGS-based genomics studies.
Keywords: Bioinformatics; Complex indel; Next-generation sequencing; Variant calling
Year: 2017 PMID: 28056804 PMCID: PMC5217656 DOI: 10.1186/s12864-016-3449-9
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 INDELseek algorithm as illustrated by the BRCA2 complex indel of sample 2. Left: INDELseek directly reads NGS read alignments in the standard SAM/BAM format. After refining matches and mismatches in the supplied alignments, clusters of closely spaced mismatches, insertions and/or deletions are identified as potential complex indel calls. False positives are removed according to filters based on read base quality, allele frequency and allele sequencing depth. Final complex indel calls are reported in the standard VCF format. Right: A representative BWA-MEM alignment of a sample 2 NGS read is shown. The corresponding reference sequence (chr13:g.32912956_32912969) and the base calls of the read are shown above and below the alignment, respectively. In the alignment refinement step, M operators are refined into matches (=) and mismatches (X). A cluster of closely spaced variants is identified as a potential complex indel call (highlighted as a red box). The complex indel call passes the defined quality thresholds and is reported as a variant call in VCF format, which corresponds to the BRCA2 complex indel c.4467_4474delinsTGTTTTT
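The two central steps in the caption (refining M operators into = and X, then clustering closely spaced variants) can be sketched in a few lines of Python. This is only an illustrative sketch and not the INDELseek implementation: the function names, the `max_gap` threshold of 5 bases, the rule that a cluster needs at least two events, and the toy sequences are all assumptions made here. The quality, allele-frequency and depth filters are not modelled.

```python
import re

def refine_cigar(cigar, ref, read):
    """Walk a CIGAR string and emit one event per aligned base or indel.

    M/=/X blocks are re-evaluated base by base into '=' (match) or 'X'
    (mismatch). Each event is (op, ref_offset, ref_bases, read_bases).
    Only M, =, X, I, D and S are handled in this sketch.
    """
    events = []
    r = q = 0  # offsets into the reference window and the read
    for length, op in re.findall(r"(\d+)([A-Z=])", cigar):
        n = int(length)
        if op in "M=X":
            for k in range(n):
                kind = "=" if ref[r + k] == read[q + k] else "X"
                events.append((kind, r + k, ref[r + k], read[q + k]))
            r += n
            q += n
        elif op == "I":
            events.append(("I", r, "", read[q:q + n]))
            q += n
        elif op == "D":
            events.append(("D", r, ref[r:r + n], ""))
            r += n
        elif op == "S":
            q += n  # soft-clipped bases are skipped in the read
        else:
            raise ValueError(f"CIGAR operator {op!r} not handled in this sketch")
    return events

def call_complex_indels(events, max_gap=5):
    """Cluster non-matching events separated by at most max_gap matched bases.

    max_gap is a hypothetical threshold. Clusters with fewer than two
    events are ignored. Returns (ref_offset, ref_allele, alt_allele) tuples.
    """
    variant_idx = [i for i, e in enumerate(events) if e[0] != "="]
    clusters, current = [], []
    for i in variant_idx:
        # Between two variant events every event is a single matched base.
        if current and i - current[-1] - 1 > max_gap:
            clusters.append(current)
            current = []
        current.append(i)
    if current:
        clusters.append(current)

    calls = []
    for c in clusters:
        if len(c) < 2:
            continue
        span = events[c[0]:c[-1] + 1]
        ref_allele = "".join(e[2] for e in span)
        alt_allele = "".join(e[3] for e in span)
        calls.append((span[0][1], ref_allele, alt_allele))
    return calls

# Toy example (not from the paper): a mismatch directly followed by a deletion.
toy_events = refine_cigar("4M2D4M", "AAACCGGTTT", "AAATGTTT")
toy_calls = call_complex_indels(toy_events)  # [(3, 'CCG', 'T')]
```

In the toy example the mismatch (C to T) and the adjacent 2-base deletion are merged into a single delins-style call (reference `CCG`, alternate `T`) instead of being reported as two separate variants.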
Evaluation of INDELseek complex indel detection performance
| Dataset | Sample count and description | Sensitivity | Specificity |
|---|---|---|---|
| Real NGS data | | | |
| 1. Protein-coding and flanking regions from whole-genome sequencing (random fragments) | 1 (NA12878) | 100% | 100% |
| | 160 putative complex indels | | |
| | 26 negative control loci | | |
| 2. Hereditary breast and/or ovarian cancer panel (amplicons) | 239 | 100% | 100% |
| | 3 positive samples ( | | |
| | 236 negative samples | | |
| 3. Myeloid neoplasm panel (amplicons) | 23 | 100% | 100% |
| | 5 positive samples ( | | |
| | 18 negative samples (NA12878 and 17 healthy controls) | | |
| Semi-simulated data by engineering mutations to real NGS data | | | |
| 1. Whole-genome sequencing (random fragments) | 8671 collected from COSMIC and dbSNP | 93.7% | N/A |
| 2. Hereditary breast and/or ovarian cancer panel (amplicons) | 237 collected from COSMIC and dbSNP | 96.2% | N/A |
| 3. Myeloid neoplasm panel (amplicons) | 576 collected from COSMIC and dbSNP | 94.6% | N/A |
N/A Not applicable
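For reference, the two metrics in the table follow the usual definitions, sketched below. The counts used in the example are an assumption read off the first real-data row: each of the 160 putative complex indels is treated as one positive and each of the 26 negative control loci as one negative, with no misses and no false calls, which is what the reported 100% values imply.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of true variants that were detected: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of negatives that produced no call: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)

# Assumed counts for the NA12878 row: 160 positives, 26 negative loci.
na12878_sensitivity = sensitivity(true_pos=160, false_neg=0)
na12878_specificity = specificity(true_neg=26, false_pos=0)
```

The semi-simulated datasets contain only engineered positives, so there are no negatives from which to compute specificity, which is consistent with the N/A entries.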
Fig. 2 Types of complex indels detected by INDELseek. a Net deletion of bases (e.g. chr3:g.190106073_190106074delGGinsC). b No net change in length (e.g. chr1:g.24201919_24201920delTTinsCC). c Net insertion of bases (e.g. chr15:g.41483633_41483636delCACCinsACACT). Corresponding alignments of reference (Ref) and variant (Var) sequences are shown
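The three categories in the caption differ only in the net length change between the deleted and inserted bases. A minimal sketch of that classification is given below. The function name and the regular expression are assumptions for illustration, and the parser only handles descriptions that spell out both the deleted and the inserted bases, as the three caption examples do.

```python
import re

def classify_complex_indel(hgvs):
    """Classify a 'del<bases>ins<bases>' description by net length change."""
    m = re.search(r"del([ACGT]+)ins([ACGT]+)$", hgvs)
    if m is None:
        raise ValueError("expected a description ending in del<bases>ins<bases>")
    deleted, inserted = m.groups()
    net = len(inserted) - len(deleted)
    if net < 0:
        return "net deletion"
    if net > 0:
        return "net insertion"
    return "no net change in length"

# The three examples from the caption.
examples = [
    "chr3:g.190106073_190106074delGGinsC",
    "chr1:g.24201919_24201920delTTinsCC",
    "chr15:g.41483633_41483636delCACCinsACACT",
]
labels = [classify_complex_indel(e) for e in examples]
```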
Complex indels detected by INDELseek in human clinical samples
| Sample | Gene | Mutation | Allele frequency | Sequencing depth (X) | NGS method | Orthogonal validation |
|---|---|---|---|---|---|---|
| Germline pathogenic mutations in hereditary breast and/or ovarian cancers | | | | | | |
| 1 | | c.4046_4047delinsA p.Thr1349Lysfs*17 | 37.9% | 730 | * | † |
| 2 | BRCA2 | c.4467_4474delinsTGTTTTT p.Lys1489Asnfs*15 | 74.9% | 1272 | * | † |
| 3 | | c.8400_8402delinsAAAA p.Phe2801Lysfs*11 | 33.6% | 4141 | * | † |
| Somatic pathogenic mutations in myeloid neoplasms | | | | | | |
| 4 | | c.1102_1136delinsT p.Lys368Trpfs*51 | 40.8% | 2274 | ‡ | † |
| 5 | | c.1154delAinsCTTGTC p.Lys385Thrfs*47 | 31.9% | 2998 | ‡ | † |
| 6 | | c.1129_1154delinsTGTC p.Lys377Cysfs*46 | 73.6% | 2159 | ‡ | † |
| 7 | | c.1118_1125delinsCTTG p.Asp373Alafs*56 | 15.3% | 3603 | ‡ | § |
| 8 | | c.1620_1627delinsGA p.Ile540_Glu543delinsMetLys | 57.7% | 4629 | ‡ | † |
| 9 | | c.1248_1257delinsTTGG p.Thr417_Asp419delinsTrp | 39.0% | 11109 | ‡ | * |
| 10 | | c.1248_1256delinsTTTCCG p.Thr417_Asp419delinsPheArg | 2.9% | 13724 | ‡ | * |
| | | c.1249_1258delinsGGATGGAACT p.Thr417_Arg420delinsGlyTrpAsnTrp | 3.3% | 13651 | ‡ | * |
| | | c.1250_1258delinsAACCTC p.Thr417_Asp419delinsLysPro | 11.9% | 13525 | ‡ | * |
| | | c.1251_1258delinsCTCCT p.Tyr418_Arg420delinsSerTrp | 2.1% | 13376 | ‡ | * |
| 11 | | c.1250_1256delinsT p.Thr417_Asp419delinsIle | 5.7% | 7326 | ‡ | § |
| | | c.1251_1257delinsAACA p.Tyr418_Asp419delinsThr | 2.2% | 7416 | ‡ | § |
| 12 | | c.1251_1256delinsGGG p.Tyr418_Asp419delinsGly | 2.7% | 14829 | ‡ | * |
| 13 | | c.1253_1258delinsCCG p.Tyr418_Arg420delinsSerGly | 40.7% | 68180 | ‡ | * |
| 14 | | c.1256_1257delinsGTCTA p.Asp419delinsGlyLeu | 17.9% | 19042 | ‡ | * |
*Microfluidic PCR and MiSeq sequencing
†Sanger sequencing
‡Probe extension/ligation and MiSeq sequencing
§PCR fragment analysis
Gained or rescued protein-truncating effect of complex indels
| Gene | Genomic position | Multiple-nucleotide variants (MNV) | Predicted protein change (MNV called as a haplotype) | Predicted protein change (MNV called as separate single-nucleotide variants) |
|---|---|---|---|---|
| Gained protein-truncating effect | | | | |
| | 13:32914101-32914102 | c.5609_5610delTCinsAG | | p.Phe1870Tyr, p.Phe1870Leu |
| | 17:41245984-41245987 | c.1561_1564delGCAGinsTAAA | | p.Asp522Asn, p.Ala521Glu, p.Ala521Ser |
| | 17:41244552-41244553 | c.2995_2996delCTinsTA | | p.Leu999Gln, p.= |
| | 17:7578486-7578488 | c.442_444delGATinsTGA | | p.Asp148Glu, p.Asp148Gly, p.Asp148Tyr |
| | 17:7578286-7578287 | c.562_563delCTinsTA | | p.Leu188Gln, p.= |
| Rescued protein-truncating effect | | | | |
| | 17:7579366-7579368 | c.319_321delTACinsCAA | p.Tyr107Gln | |
| | 17:7578535-7578536 | c.394_395delAAinsTG | p.Lys132Trp | p.Lys132Arg, |
| | 17:7578433-7578434 | c.496_497delTCinsGG | p.Ser166Gly | |
| | 17:7578426-7578431 | c.499_503delinsTACCT | p.Gln167_His168delinsTyrLeu | p.His168Leu, p.Gln167His, |
| | 17:7578210-7578212 | c.637_639delCGAinsTGG | p.Arg213Trp | p.=, |
| | 17:7577508-7577509 | c.772_773delGAinsTT | p.Glu258Leu | p.Glu258Val, |
Bold text indicates predicted protein truncation
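The difference between calling an MNV as one haplotype and calling it as separate single-nucleotide variants can be reproduced with the standard genetic code. The sketch below is a hedged illustration, not the annotation method used in the paper. It uses the two table entries that replace a whole codon (c.442_444delGATinsTGA at codon 148 and c.319_321delTACinsCAA at codon 107), so the reference and variant codons are taken directly from the variant descriptions. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' is a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def haplotype_effect(alt_codon):
    """Amino acid when all substituted bases are applied together."""
    return CODON_TABLE[alt_codon]

def separate_snv_effects(ref_codon, alt_codon):
    """Amino acids when each differing base is applied on its own."""
    effects = []
    for i, (ref_base, alt_base) in enumerate(zip(ref_codon, alt_codon)):
        if ref_base != alt_base:
            single = ref_codon[:i] + alt_base + ref_codon[i + 1:]
            effects.append(CODON_TABLE[single])
    return effects

# Gained truncation: GAT (Asp) -> TGA is a stop only as a haplotype.
gained_hap = haplotype_effect("TGA")              # '*'
gained_sep = separate_snv_effects("GAT", "TGA")   # ['Y', 'G', 'E']

# Rescued truncation: TAC (Tyr) -> CAA is Gln as a haplotype,
# but the third-base change alone would create a stop codon.
rescued_hap = haplotype_effect("CAA")             # 'Q'
rescued_sep = separate_snv_effects("TAC", "CAA")  # ['H', '*']
```

For codon 148 the separate calls give Tyr, Gly and Glu, matching the three missense changes listed in the table, while the haplotype yields a stop codon. For codon 107 the haplotype gives Gln (p.Tyr107Gln), whereas one of the single-base changes taken alone would be read as a stop.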