Ilia Zhidkov, Raphael Cohen, Nophar Geifman, Dan Mishmar, Eitan Rubin.
Abstract
Several methods have been proposed for detecting insertions/deletions (indels) from chromatograms generated by Sanger sequencing. However, most such methods are unsuitable when the mutated and normal variants occur at unequal ratios, as is expected in cancer, with organellar DNA or with alternatively spliced RNAs. In addition, the current methods do not provide robust estimates of the statistical confidence of their results, and the sensitivity of this approach has not been rigorously evaluated. Here, we present CHILD, a tool specifically designed for indel detection in mixtures where one variant is rare. CHILD makes use of standard sequence alignment statistics to evaluate the significance of the results. The sensitivity of CHILD was tested by sequencing controlled mixtures of deleted and undeleted plasmids at various ratios. Our results indicate that CHILD can identify deleted molecules present as just 5% of the mixture. Notably, the results were plasmid/primer-specific; for some primers and/or plasmids, the deleted molecule was only detected when it comprised 10% or more of the mixture. The false positive rate was estimated to be lower than 0.4%. CHILD was implemented as a user-oriented web site, providing a sensitive and experimentally validated method for the detection of rare indel-carrying molecules in common Sanger sequence reads.
Year: 2011 PMID: 21278161 PMCID: PMC3074157 DOI: 10.1093/nar/gkq1354
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Representative fragment of an ABI trace file with mixed intact and 9-bp-deleted templates. The framed bases in the top panel represent the strongest calls, while the bottom sequences represent the second-best calls, which harbor the 9-bp deletion and hence the sequence frame-shift.
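The frame-shift described in the Figure 1 caption can be illustrated with a toy example. The sketch below is not CHILD's algorithm (CHILD relies on sequence alignment statistics); it only assumes that, downstream of a deletion, the second-best calls reproduce the strongest calls read a fixed number of bases ahead. The function name `best_shift`, the sequence and the deletion position are all hypothetical.

```python
def best_shift(primary, secondary, max_shift=60, min_overlap=10):
    """Return (shift, match_fraction) for the offset at which the secondary
    calls best reproduce the primary calls read 'shift' bases ahead."""
    best = (0, 0.0)
    for shift in range(1, max_shift + 1):
        # Pair secondary[i] with primary[i + shift].
        pairs = list(zip(secondary, primary[shift:]))
        if len(pairs) < min_overlap:
            break  # too little overlap left to score reliably
        frac = sum(a == b for a, b in pairs) / len(pairs)
        if frac > best[1]:
            best = (shift, frac)
    return best


# Hypothetical intact template and a 9-bp deletion starting at index 20.
PRIMARY = "ATGGCGTACGTTAGCCTAGGATCCGATTACGCATGCAAGTCTGACCTGAATCGGTACCATGCTAGT"
DEL_START, DEL_LEN = 20, 9

# Calls downstream of the deletion point: the intact template gives the
# strongest calls, the deleted template gives the second-best calls.
primary_down = PRIMARY[DEL_START:]
secondary_down = PRIMARY[DEL_START + DEL_LEN:]

shift, frac = best_shift(primary_down, secondary_down)
print(shift, frac)  # 9 1.0
```

Real traces contain noise and miscalled secondary peaks, so the match fraction would fall below 1.0, and a statistical test would be needed to decide whether the best shift is significant.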
Experimental evaluation of CHILD sensitivity and reproducibility
| % Δ construct | P-value | Predicted Loc | Predicted Size | % Δ construct | P-value | Predicted Loc | Predicted Size |
|---|---|---|---|---|---|---|---|
| Plasmid pMitA | | | | | | | |
| 0 | 1E-111 | 421–423 | 2 | 15 | 8E-168 | 315–366 | 51 |
| | NA | NA | NA | | 2E-29 | 301–352 | 51 |
| | 1E-99 | 183–184 | 1 | | 1E-99 | 23–24 | 1 |
| 2.5 | NA | NA | NA | 20 | 1E-29 | 321–372 | 51 |
| | 2E-68 | 354–405 | 51 | | 6E-36 | 321–372 | 51 |
| | 1E-99 | 105–106 | 1 | | 1E-99 | 292–344 | 52 |
| 5 | 4E-129 | 319–370 | 51 | 25 | 2E-33 | 313–364 | 51 |
| | 7E-140 | 323–374 | 51 | | 7E-179 | 321–372 | 51 |
| | 2E-10 | 312–363 | 51 | | 1E-99 | 308–360 | 52 |
| 7.5 | 4E-19 | 317–368 | 51 | 30 | 1E-158 | 320–371 | 51 |
| | 4E-19 | 310–361 | 51 | | 4E-161 | 323–374 | 51 |
| | 4E-27 | 320–371 | 51 | | 1E-99 | 293–345 | 52 |
| 10 | 3E-25 | 321–372 | 51 | 50 | 2E-29 | 323–374 | 51 |
| | 2E-23 | 323–374 | 51 | | 6E-47 | 389–440 | 51 |
| | 6E-129 | 304–356 | 52 | | 3E-29 | 410–461 | 51 |
The table shows an analysis of the 51- and 9-bp deletion constructs with CHILD.
(Upper part) Results for pMitA and pMitAΔ51.
Analysis was conducted for chromatograms generated with increasing concentration of the deletion-containing construct (%Δconstruct) and with two different sequencing primers (T7 and SP6).
The confidence that the indel is not the result of noise (P-value), the indel positions in the corresponding ABI trace file (Loc) and the inferred length of the indel (Size) are given as provided by CHILD, analyzing each biological replicate separately.
(Lower part) The same analysis as in the upper part, using pMitB and pMitBΔ9.
NA: no indel was found in the corresponding ABI trace file (the alignment was statistically insignificant or otherwise incompatible with an indel).
Indels of length 1–2 bp are likely to be artificial (see text) and are ignored in subsequent analysis.
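The filtering rule in the table notes can be sketched in a few lines. The example below uses only the pMitA entries listed for 0, 2.5, 5 and 15% Δ construct. It assumes that the reported size equals the difference between the two Loc coordinates, which holds for every row listed in the table. The names `ROWS`, `MIN_SIZE`, `indel_size` and `detections` are hypothetical and are not part of CHILD.

```python
# (percent deleted construct, Loc start, Loc end); None marks an "NA" entry.
ROWS = [
    (0.0, 421, 423), (0.0, None, None), (0.0, 183, 184),
    (2.5, None, None), (2.5, 354, 405), (2.5, 105, 106),
    (5.0, 319, 370), (5.0, 323, 374), (5.0, 312, 363),
    (15.0, 315, 366), (15.0, 301, 352), (15.0, 23, 24),
]

MIN_SIZE = 3  # indels of 1-2 bp are treated as likely artifacts and ignored


def indel_size(start, end):
    """Size inferred from the Loc coordinates; None when no indel was found."""
    return None if start is None else end - start


def detections(rows, min_size=MIN_SIZE):
    """Map each concentration to (entries with a retained indel, total entries)."""
    summary = {}
    for pct, start, end in rows:
        hits, total = summary.get(pct, (0, 0))
        size = indel_size(start, end)
        hit = size is not None and size >= min_size
        summary[pct] = (hits + int(hit), total + 1)
    return summary


for pct, (hits, total) in detections(ROWS).items():
    print(f"{pct}%: {hits}/{total} entries with an indel larger than 2 bp")
```

With this subset, no entry at 0% survives the filter, while all three entries at 5% report the 51-bp deletion.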
Figure 2. Examples of n + 1-shifted chromatograms (‘shadow sequences’). Alignment of secondary (upper) and primary (lower) sequences generated using (A) a highly diluted mixture of plasmids pMitA and pMitAΔ51 (2.5%) or (B) a pure sample of the pGEM ezf+ plasmid and a column-purified T7 primer. The primary and secondary sequences from each chromatogram were aligned using SSEARCH (see ‘Materials and Methods’ section).
The performance of indel detection algorithms with controlled mixtures of indel constructs
| % Δ construct | primer | CHILD | ShiftDetector | Indelligent |
|---|---|---|---|---|
| 2.5 | M13 | – | 1 | |
| 1 | – | 1 | ||
| – | ||||
| – | 1 | |||
| 5 | M13 | – | 1 | |
| – | 1 | |||
| – | 1 | |||
| 7.5 | M13 | 1 | ||
| 1 | ||||
| – | 1 | |||
| 10 | M13 | – | 1 | |
| 1 | 1 | |||
| 2, | 1 | |||
| 15 | M13 | – | 1 | |
| 1 | 1,29 | 1 | ||
| 2, | 1 | |||
| 20 | M13 | 1 | ||
| 1,13,32,52 | 1,2 | |||
| 1,2 | ||||
| 25 | M13 | 1,2,3,4 | ||
| 1,13,32 | 1,2,3,4,5,6 | |||
| 28, | 1,2,3,4 | |||
| 30 | M13 | 2, | 1,2,3,4 | |
| 1,32,50 | 1,4,52 | |||
| 52 | 1,4, | |||
| 50 | M13 | 1,2,3,4 | ||
| 5 | T7 | – | – | 1 |
| – | 1 | 1 | ||
| – | – | 1 | ||
| SP6 | – | 2 | 1,2 | |
| – | 2 | 1,2 | ||
| – | 2 | 1,2 | ||
| 10 | T7 | 7 | 1 | |
| 2 | 1 | |||
| 1,2 | ||||
| SP6 | – | 2 | 1,2 | |
| – | 2 | 1,2 | ||
| – | 2 | 1,2 | ||
| 15 | T7 | 1.2 | ||
| 1 | ||||
| 1 | 1 | |||
| SP6 | – | 2 | 1,2 | |
| – | 2 | 1,2 | ||
| – | 2 | 1,2 | ||
| 20 | T7 | – | 1,2, | |
| 1 | 1, | |||
| 1, | ||||
| SP6 | 2 | 1,2,7, | ||
| 2 | 1,8, | |||
| 2 | 1, |
The performance of CHILD, ShiftDetector and Indelligent in resolving traces resulting from mixtures of indel-carrying and wild-type molecules.
The plasmids pMitA and pMitB (see text) were used to generate chromatograms with increasing concentration of the deletion-containing construct (%Δ construct), using the M13, T7 or SP6 primers, and repeating the entire process three times.
Each chromatogram was analyzed with all three algorithms and the results are summarized for each replicate (‘–’ indicates that no result was reported).
Correct indel size assignments are indicated in bold.
For Indelligent, chromatograms were converted using the Sequencher package (GeneCodes, Ann Arbor MI), using default parameters.
For ShiftDetector, the significance cutoff was set to 0.0001 to match the default stringency of CHILD.
Where applicable, multiple indel size predictions are provided as a comma-separated list.
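For readers who want to process the comparison table programmatically, the sketch below shows one way to read a cell that holds a comma-separated list of predicted indel sizes. The helper names `parse_sizes` and `contains_expected` are hypothetical, and the expected size and tolerance in the usage lines are illustrative assumptions rather than values taken from the comparison itself.

```python
def parse_sizes(cell):
    """Turn a table cell such as '1,13,32,52' into a list of integers.

    An empty cell or a dash ('–' or '-') means no result was reported.
    Empty tokens, such as the one after a trailing comma, are skipped.
    """
    cell = cell.strip()
    if cell in {"", "–", "-"}:
        return []
    return [int(tok) for tok in cell.split(",") if tok.strip()]


def contains_expected(cell, expected, tolerance=0):
    """True if any predicted size lies within 'tolerance' bp of 'expected'."""
    return any(abs(size - expected) <= tolerance for size in parse_sizes(cell))


print(parse_sizes("1,13,32,52"))                     # [1, 13, 32, 52]
print(parse_sizes("2,"))                             # [2]
print(parse_sizes("–"))                              # []
print(contains_expected("1,4,52", 51, tolerance=1))  # True
```

Allowing a tolerance of 1 bp is a design choice for this illustration only; an exact match (`tolerance=0`) is the stricter alternative.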