Axel Poulet¹, Maud Privat², Flora Ponelle², Sandrine Viala², Stephanie Decousus¹, Axel Perin¹, Laurence Lafarge², Marie Ollier³, Nagi S El Saghir⁴, Nancy Uhrhammer², Yves-Jean Bignon², Yannick Bidet³.
Abstract
Screening for BRCA mutations in women with a familial risk of breast or ovarian cancer is an ideal application for high-throughput sequencing, which provides large amounts of low-cost data. However, the 454 (Roche) and Ion Torrent (Thermo Fisher) technologies produce homopolymer-associated indel errors, which complicates their use in routine diagnostics. We developed software, named AGSA, that helps to detect false positive mutations in homopolymeric sequences. Seventy-two familial breast cancer cases were analysed in parallel by amplicon 454 pyrosequencing and Sanger dideoxy sequencing for genetic variations in the BRCA genes. All 565 variants detected by dideoxy sequencing were also detected by pyrosequencing. Furthermore, pyrosequencing detected 42 variants that were missed by the Sanger technique. Six amplicons contained homopolymer tracts in the coding sequence that were systematically misread by the software supplied by Roche. Read data plotted as histograms by the AGSA software aided the analysis considerably and allowed validation of the majority of homopolymers. As an optimisation, an additional 250 patients were analysed using microfluidic amplification of the BRCA regions of interest (Access Array, Fluidigm), followed by 454 sequencing and AGSA analysis. AGSA complements a complete high-throughput diagnostic sequencing workflow, reducing time and costs while increasing reliability, notably for homopolymer tracts.
Year: 2016; PMID: 27656653; PMCID: PMC5021467; DOI: 10.1155/2016/5623089
Source DB: PubMed; Journal: Biomed Res Int; Impact factor: 3.411
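The abstract describes AGSA as a tool for flagging false positive indels in homopolymeric sequences. One ingredient of any such check is knowing where the homopolymer tracts are. The Python below is a minimal sketch that locates homopolymer runs in a reference sequence and tests whether an indel position falls inside one. The 4-base minimum run length and the function names are assumptions for illustration; the source does not state AGSA's cutoff or implementation.

```python
import re
from typing import List, Tuple

# Hypothetical cutoff: the source does not give AGSA's minimum homopolymer length.
MIN_HOMOPOLYMER_LENGTH = 4


def find_homopolymers(seq: str, min_len: int = MIN_HOMOPOLYMER_LENGTH) -> List[Tuple[int, int, str]]:
    """Return (start, end, base) for each run of >= min_len identical bases.

    Coordinates are 0-based and half-open, like Python slices.
    """
    runs = []
    # ([ACGT])\1* greedily matches a maximal run of one repeated base.
    for match in re.finditer(r"([ACGT])\1*", seq.upper()):
        if match.end() - match.start() >= min_len:
            runs.append((match.start(), match.end(), match.group(1)))
    return runs


def indel_in_homopolymer(pos: int, runs: List[Tuple[int, int, str]]) -> bool:
    """True if a 0-based indel position lies inside any homopolymer run."""
    return any(start <= pos < end for start, end, _ in runs)


if __name__ == "__main__":
    runs = find_homopolymers("GCTAAAAAGTCCCCTG")
    print(runs)  # [(3, 8, 'A'), (10, 14, 'C')]
    print(indel_in_homopolymer(5, runs))  # True: position 5 is inside the A run
```

An indel called at a position for which `indel_in_homopolymer` returns `True` would then be a candidate for the flowgram review described in the figures.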
Figure 2. Graphic representations of homopolymer flowgrams. Individual flowgrams of indel variants were generated by AGSA. The x-axis represents the signal intensity computed during pyrosequencing. Red bars represent the number of reads in each 0.1 intensity interval; blue bars represent the percentage of variation reported by AVA (intensity interval of 1). The distribution of the standard flowgrams discriminates a real indel (a) from artefacts (b).
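Figure 2 shows two views of the same flowgram data: read counts per 0.1 intensity interval and a percentage of variation on a 1-unit scale. The Python below is a minimal sketch of how those two quantities could be computed from per-read flow signal values for a single homopolymer. It assumes that signal intensity scales with homopolymer length and that a read counts as variant when its signal, rounded to the nearest integer, differs from the reference length. The function names, the rounding rule and the example values are hypothetical; neither AGSA's nor AVA's actual computation is given in the source.

```python
import math
from collections import Counter
from typing import Dict, List

# Red-bar resolution from the Figure 2 caption.
BIN_WIDTH = 0.1


def intensity_histogram(signals: List[float], bin_width: float = BIN_WIDTH) -> Dict[float, int]:
    """Count reads per intensity bin, keyed by the bin's lower edge."""
    counts: Counter = Counter()
    for s in signals:
        # The small epsilon keeps values such as 5.1 (5.1 / 0.1 = 50.99...)
        # in the bin they visibly belong to despite floating-point error.
        k = math.floor(s / bin_width + 1e-9)
        counts[round(k * bin_width, 6)] += 1
    return dict(sorted(counts.items()))


def percent_variation(signals: List[float], reference_length: int) -> float:
    """Percentage of reads whose rounded signal differs from the reference
    homopolymer length (assumed rule, for illustration only)."""
    if not signals:
        return 0.0
    called = [math.floor(s + 0.5) for s in signals]  # round half up
    variant = sum(1 for n in called if n != reference_length)
    return 100.0 * variant / len(signals)


if __name__ == "__main__":
    # Invented flow values for one 5-base homopolymer, for illustration only.
    flows = [4.9, 5.1, 5.0, 4.1, 4.2, 5.2]
    print(intensity_histogram(flows))
    print(f"{percent_variation(flows, 5):.1f}% of reads differ from the reference")
```

With these invented values, two of the six reads round to 4 rather than 5, giving 33.3% variation. How that figure and the shape of the histogram are weighed to separate a real indel from an artefact is illustrated only graphically in the source.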
Figure 1. Organization of the AGSA software. The diagram represents the operational flow of the AGSA software. Red boxes represent input files required to run the software, blue boxes represent output files generated by AGSA, and green boxes are steps where the software performs a test.
Figure 3. AGSA efficiently detected all the variants reported by Sanger analysis. Efficiency of Sanger sequencing versus Roche pyrosequencing analysed with AGSA. Blue bars represent confirmed variants. Green bars represent false positive variants (technical artefacts). For pyrosequencing, false positives are variants not confirmed by Sanger sequencing. For Sanger sequencing, false positives are variants that were not found by pyrosequencing and were not confirmed by a second Sanger run of the same sample. Red bars represent false negative variants. Pyrosequencing produced no false negatives. False negatives for Sanger analysis are variants detected by pyrosequencing and then found on a second Sanger run of the same sample. The yellow bar represents variants that were not called by AGSA because of poor coverage (fewer than 4 variant reads).
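The pyrosequencing categories in Figure 3 can be expressed as set operations. The sketch below assumes that pyrosequencing calls are available as a mapping from a variant identifier to the number of reads carrying it, and Sanger results as a set of confirmed identifiers. The 4-read cutoff comes from the caption; the function name, data layout and identifiers are hypothetical.

```python
from typing import Dict, Set

# From the Figure 3 caption: variants supported by fewer than 4 reads are not called.
MIN_VARIANT_READS = 4


def classify_calls(variant_reads: Dict[str, int], sanger_confirmed: Set[str]) -> Dict[str, Set[str]]:
    """Sort pyrosequencing calls into the Figure 3 categories.

    variant_reads: variant ID -> number of reads carrying the variant.
    sanger_confirmed: IDs of variants confirmed by Sanger sequencing.
    """
    called = {v for v, n in variant_reads.items() if n >= MIN_VARIANT_READS}
    under_covered = {v for v, n in variant_reads.items() if n < MIN_VARIANT_READS}
    return {
        "confirmed": called & sanger_confirmed,
        # For pyrosequencing, a false positive is a call that Sanger did not confirm.
        "false_positive": called - sanger_confirmed,
        # Real variants that pyrosequencing did not report at any read count.
        "false_negative": sanger_confirmed - called - under_covered,
        # Real variants seen in too few reads to be called.
        "low_coverage": sanger_confirmed & under_covered,
    }
```

For example, with hypothetical calls `{"v1": 30, "v2": 25, "v3": 2, "v4": 12}` and Sanger-confirmed variants `{"v1", "v2", "v3", "v5"}`, `v1` and `v2` are confirmed, `v4` is a false positive, `v5` is a false negative and `v3` falls in the low-coverage group.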
Figure 4. Performance of the AGSA software in evaluating homopolymers. AGSA found 299 indel variants in homopolymer sequences. After analysis of individual flowgrams, 246 (86%) were classified as false positive variants and 43 (14%) as potentially true variants. Sanger sequencing confirmed that the 246 AGSA-classified false positives were actually wild-type sequences. Among the 43 potentially real variants, 29 (10% of all indels) were confirmed by Sanger analysis and 14 (5%) were actually wild type on Sanger sequencing.
Figure 5. A composite sample including 28 variants validated by Sanger sequencing was analysed with both the AGSA software and SeqNext, using the same 20% threshold. AGSA detected all 28 variants and reported 10 false positives, whereas SeqNext missed 1 real variant and reported 28 false positives.
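The Figure 5 counts translate directly into sensitivity and precision (positive predictive value). The sketch below computes both from the reported numbers and adds a helper that applies the 20% threshold, assuming that the threshold refers to the percentage of reads carrying the variant, as reported in the comparison table. The helper and class names are hypothetical.

```python
from dataclasses import dataclass

# Figure 5 threshold, assumed to be the fraction of reads carrying the variant.
VAF_THRESHOLD = 0.20


def passes_threshold(variant_reads: int, depth: int, threshold: float = VAF_THRESHOLD) -> bool:
    """Report a variant only if at least `threshold` of the reads carry it."""
    return depth > 0 and variant_reads / depth >= threshold


@dataclass
class Performance:
    true_positives: int
    false_negatives: int
    false_positives: int

    @property
    def sensitivity(self) -> float:
        return self.true_positives / (self.true_positives + self.false_negatives)

    @property
    def precision(self) -> float:
        return self.true_positives / (self.true_positives + self.false_positives)


# Counts from Figure 5 (28 Sanger-validated variants in the composite sample).
agsa = Performance(true_positives=28, false_negatives=0, false_positives=10)
seqnext = Performance(true_positives=27, false_negatives=1, false_positives=28)

if __name__ == "__main__":
    for name, perf in [("AGSA", agsa), ("SeqNext", seqnext)]:
        print(f"{name}: sensitivity={perf.sensitivity:.3f}, precision={perf.precision:.3f}")
```

Run as a script, it prints `AGSA: sensitivity=1.000, precision=0.737` and `SeqNext: sensitivity=0.964, precision=0.491`. Sensitivity is similar for the two tools; the main difference is how many artefacts reach the report.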
Comparison of the variants detected by Sanger sequencing, by GS FLX sequencing with SeqNext analysis, and by GS FLX sequencing with AGSA analysis.
| Gene | HGVS nomenclature | Sanger | AVA + AGSA | SeqNext |
|---|---|---|---|---|
| BRCA1 | c.19-47del29 | h | h (56%) | — |
| | c.81-12delC | — | — | h (41%)† |
| | c.124delA | h | h (48%) | h (48%) |
| | c.212+1G>A | h | h (72%) | — |
| | c.342-343delTC | h | h (36%) | h (37%) |
| | c.671-11dup | — | h (45%)† | h (43%)† |
| | c.798-799del | h | h (48%) | h (48%) |
| | c.1116G>A | h | h (43%) | h (43%) |
| | c.1390dupA | h | h (49%) | h (48%) |
| | c.1823-1826del | h | h (46%) | h (46%) |
| | c.1953_1956delGAAA | h | h (35%) | h (34%) |
| | c.2077G>A | h | h (60%) | h (62%) |
| | c.2082C>T | H | | |
| | c.2269delG | h | h (66%) | h (66%) |
| | c.2612C>T | h | h (42%) | h (42%) |
| | c.3113A>G | h | h (51%) | h (52%) |
| | c.3548A>G | h | h (49%) | h (49%) |
| | c.3839-3843del5ins4 | h | h (52%) | h (52%) |
| | c.4127del | h | h (56%) | h (44%) |
| | c.4214-4215delins5 | — | — | h (23%)† |
| | c.4221delins9 | — | — | h (26%)† |
| | c.4227-4237delins16 | — | — | h (24%)† |
| | c.4243-4244delGA | — | — | h (26%)† |
| | c.4281_4282ins39 | h | h (44%) | |
| | c.4308T>C | h | h (55%) | h (49%) |
| | c.4575-4585del11 | h | h (46%) | h (43%) |
| | c.4810C>T | h | h (58%) | h (53%) |
| | c.5266dupC | h | h (59%) | h (59%) |
| | c.5333-20_5333-19insT | — | — | h (25%)† |
| BRCA2 | c.37_44del8 | h | h (25%) | h (26%) |
| | c.1114A>C | h | h (47%) | h (51%) |
| | c.1246A>G | h | h (47%) | h (49%) |
| | c.1553_1554insT | — | h (31%) | — |
| | c.1748_1749insA | — | h (47%) | h (26%)† |
| | c.1759-1761delinsC | — | — | h (25%)† |
| | c.1774delT | — | — | h (33%)† |
| | c.1804-1806delins3 | — | — | h (21%)† |
| | c.1803dupA | — | — | h (43%)† |
| | c.1815dupA | — | h (68%)† | h (31%)† |
| | c.1823dupA | — | — | h (33%)† |
| | c.1833dupA | — | — | h (21%)† |
| | c.2589T>A | — | — | h (34%)† |
| | c.2803G>A | h | h (39%) | h (40%) |
| | c.3479G>A | — | h (31%) | — |
| | c.3807T>C | h | h (44%) | h (42%) |
| | c.4332-4333delTA | — | — | h (66%)† |
| | c.4350dupT | — | — | h (44%)† |
| | c.4781delins3 | — | — | h (22%)† |
| | c.5073dupA | h | h (42%) | h (41%) |
| | c.5385dupA | — | — | h (22%)† |
| | c.5459_5460insA | — | h (32%) | — |
| | c.7977-10dup | — | — | h (70%)† |
| | c.8125dupA | — | — | h (23%)† |
| | c.8147-8148insA | — | — | h (29%)† |
| | c.8574dup | — | h (38%)† | h (30%)† |
| | c.8797del | — | H† | — |
| | c.8800del | — | h (80%) | — |
| | c.8823dupA | — | — | h (28%)† |
| | c.8946dup | — | h (27%) | — |
| | c.10083del | — | — | h (21%)† |
| | c.10115dupC | — | — | h (50%)† |
| | c.10122delC | — | — | h (32%)† |
| False negatives (out of 64)†† | | 0 (reference) | 0 | 1 |
| False positives (out of 64)† | | 0 (reference) | 10 | 28 |
% of reads carrying the variant is given in parentheses.
Depth of the variant was <40 reads.
† Highlighted cells are false positives; †† highlighted cells are false negatives.
Figure 6. A total of 250 patients were studied for BRCA mutations by Access Array (Fluidigm) amplification combined with 454 pyrosequencing and AGSA analysis. This sequencing methodology was compared with Sanger analysis in terms of the percentage of amplicons requiring reanalysis for BRCA1 (a) and BRCA2 (b), cost per patient (c), and time required to analyse 96 patients (d).