Luiz H Araujo1, Cynthia Timmers1, Konstantin Shilo1, Weiqiang Zhao1, Jianying Zhang2, Lianbo Yu2, Thanemozhi G Natarajan3, Clinton J Miller3, Ayse Selen Yilmaz1,2,4, Tom Liu1, Joseph Amann1, José Roberto Lapa E Silva5, Carlos Gil Ferreira6, David P Carbone1.
Abstract
Tumor specimens are often preserved as formalin-fixed paraffin-embedded (FFPE) tissue blocks, the most common clinical source for DNA sequencing. Herein, we evaluated the effect of pre-sequencing parameters to guide proper sample selection for targeted gene sequencing. Data from 113 FFPE lung tumor specimens were collected, and targeted gene sequencing was performed. Libraries were constructed using custom probes and were paired-end sequenced on a next generation sequencing platform. A PCR-based quality control (QC) assay was utilized to determine DNA quality, and a ratio was generated in comparison to control DNA. We observed that FFPE storage time, PCR/QC ratio, and DNA input in the library preparation were significantly correlated with most parameters of sequencing efficiency, including depth of coverage, alignment rate, insert size, and read quality. A combined score using the three parameters was generated and proved highly accurate in predicting sequencing metrics. We also showed wide read count variability within the genome, with worse coverage in regions of low GC content, such as in KRAS. Sample quality and GC content had independent effects on sequencing depth, and the worst results were observed in regions of low GC content in samples with poor quality. Our data confirm that FFPE samples are a reliable source for targeted gene sequencing in cancer, provided adequate sample quality controls are exercised. Tissue quality should be routinely assessed for pre-analytical factors, and sequencing depth may be limited in genomic regions of low GC content if suboptimal samples are utilized.
Year: 2015 PMID: 26605948 PMCID: PMC4659597 DOI: 10.1371/journal.pone.0143092
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Pre-analytical parameters and sequencing metrics.
| Feature | Median (range) |
|---|---|
| FFPE storage time (years) | 9.17 (0.32–24.22) |
| PCR/QC ratio | 0.19 (0.03–0.58) |
| DNA input (ng) | 899 (77–2,337) |
| Sequence reads and coverage | 5,041,132 (1,464,386–7,738,458) |
| Reads mapped to genome | 4,918,758 (1,148,705–7,597,557) |
| Bases mapped to genome | 390,673,221 (86,893,349–608,314,077) |
| On-target bases mapped to genome | 277,117,362 (63,725,143) |
| Alignment rate (%) | 98.1 (78.4–98.9) |
| Target coverage | 881x (204–1,373) |
| Fraction of bases covered ≥ 20x (%) | 95.4 (78.9–98.8) |
| Fraction of bases covered ≥ 50x (%) | 90.8 (66.7–97.7) |
| Fraction of bases covered ≥ 100x (%) | 84.6 (52.1–95.1) |
| Insert size (in base pairs) | 89.4 (73.5–120.7) |
| Mean base quality score (Phred score) | 35.1 (32.4–35.7) |
| Mean quality score last 20 bases | 31.6 (25.0–32.6) |
| Mean quality score last 10 bases | 31.2 (23.8–32.2) |
| Fraction of bases ≥ Q30 (%) | 98.5 (74.5–100) |
Abbreviations: FFPE, formalin-fixed paraffin-embedded tissue blocks; PCR/QC, PCR-based quality control; Q30, Phred scale quality of 30.
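Two of the metrics in the table above are simple to reproduce from per-base data. The following is a minimal sketch, not the authors' pipeline: the function names and the `depths` values are hypothetical, chosen only to illustrate how a "fraction of bases covered ≥ Nx" figure and the Phred-to-error-probability conversion behind Q30 could be computed.

```python
def fraction_covered(depths, threshold):
    """Return the fraction of target positions with depth >= threshold."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(d >= threshold for d in depths) / len(depths)


def phred_to_error_probability(q):
    """Convert a Phred quality score Q to a base-call error probability.

    By definition Q = -10 * log10(p), so p = 10 ** (-Q / 10).
    """
    return 10 ** (-q / 10)


# Hypothetical per-base depths for a tiny target region (illustration only).
depths = [5, 25, 60, 120, 300, 18, 75, 150]

for threshold in (20, 50, 100):
    pct = 100 * fraction_covered(depths, threshold)
    print(f"Fraction of bases covered >= {threshold}x: {pct:.1f}%")

# Q30 corresponds to an expected error rate of 1 in 1,000 base calls.
print(f"Q30 error probability: {phred_to_error_probability(30):.4f}")
```

In practice the per-base depths would come from a coverage tool run over the target intervals; that step is outside the scope of this sketch.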
Fig 1. Three pre-analytical variables (FFPE storage time, PCR/QC ratio, and DNA input in the library preparation) were significantly correlated with most post-sequencing parameters (A). The pre-analytical variables were classified as below or above median values to illustrate the impact on insert size (B) and on read quality/Phred score (C). Abbreviations: FFPE, formalin-fixed paraffin-embedded tissue blocks; PCR/QC, PCR-based quality control.
Fig 2. FFPE storage time (A), PCR/QC ratio (B), and DNA input (C) were correlated with sequencing depth of coverage. A combined score (D) was constructed based on these three parameters, and was highly correlated with sequencing depth. Abbreviations: FFPE, formalin-fixed paraffin-embedded tissue blocks; PCR/QC, PCR-based quality control.
Summary of correlation analyses between the pre-sequencing combined score and final sequencing parameters.
| Parameter | Pearson correlation coefficient (r) | p-value |
|---|---|---|
| Overall dataset | 0.628 | < 0.01 |
| 20x coverage | 0.779 | < 0.01 |
| 50x coverage | 0.790 | < 0.01 |
| 100x coverage | 0.792 | < 0.01 |
| | 0.650 | < 0.01 |
| | 0.796 | < 0.01 |
| | 0.703 | < 0.01 |
| | 0.666 | < 0.01 |
| Insert size | 0.664 | < 0.01 |
| Alignment rate | 0.570 | < 0.01 |
| Off-target rate | -0.423 | < 0.01 |
| Base quality | 0.476 | < 0.01 |
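The text here does not state how the combined score was calculated, so the following is only an illustrative sketch under a stated assumption: each of the three pre-analytical variables is standardized (z-score), and the score adds PCR/QC ratio and DNA input while subtracting storage time, so that higher scores mean better expected sample quality. The sample values are hypothetical. The Pearson coefficient is then computed against depth of coverage, as in the table above.

```python
from math import sqrt
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to zero mean and unit (population) variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("values must not all be identical")
    return [(v - mu) / sigma for v in values]


def combined_score(storage_years, pcr_qc_ratio, dna_input_ng):
    """ASSUMED scoring rule (not taken from the paper):
    z(PCR/QC ratio) + z(DNA input) - z(storage time), one score per sample.
    """
    zs = z_scores(storage_years)
    zr = z_scores(pcr_qc_ratio)
    zd = z_scores(dna_input_ng)
    return [r + d - s for s, r, d in zip(zs, zr, zd)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical samples (illustration only, not data from the study).
storage = [1.0, 5.0, 10.0, 15.0, 20.0]      # FFPE storage time, years
ratio = [0.50, 0.35, 0.20, 0.10, 0.05]      # PCR/QC ratio
dna = [2000, 1500, 900, 400, 100]           # DNA input, ng
coverage = [1300, 1100, 850, 500, 250]      # mean target coverage, x

scores = combined_score(storage, ratio, dna)
r = pearson_r(scores, coverage)
print(f"Pearson r between combined score and coverage: {r:.3f}")
```

Standardizing before summing keeps any one variable from dominating the score purely because of its units; a weighted or regression-based score would be an equally plausible alternative.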
Fig 3. The combined score was strongly correlated with post-sequencing parameters (A). Correlation with the 50x coverage was used to define pre-analytical thresholds that could predict sequencing efficiency; these are illustrated by smoothing curves (B).
Fig 4. Normalized depth of coverage varies widely across the genome, with worse coverage observed in regions with lower GC content (A). The GC content effect was additive to sample quality in predicting depth of coverage, as observed after stratifying sample quality using the combined score (B). Abbreviations: St. Dev., standard deviation. Note: ** indicates significance at p < 0.01.
GC content and depth of coverage at hotspot positions for recurrent somatic mutations in non-small cell lung cancer.
| Gene | Codon | Exon | Chr position | GC content | Depth, median (range) |
|---|---|---|---|---|---|
| KRAS | G12 | 2 | Chr12:25,398,285 | 0.33 | 51 (3–183) |
| KRAS | Q61 | 3 | Chr12:25,380,276 | 0.36 | 741 (161–1,513) |
| PIK3CA | E545 | 10 | Chr3:178,936,091 | 0.30 | 159 (23–312) |
| PIK3CA | H1047 | 20 | Chr3:178,952,085 | 0.37 | 1,067 (210–1,864) |
| BRAF | V600 | 15 | Chr7:140,453,136 | 0.34 | 459 (95–867) |
| NRAS | Q61 | 3 | Chr1:115,256,529 | 0.40 | 648 (69–1,512) |
| MAP2K1 (MEK1) | Q56 | 2 | Chr15:66,727,451 | 0.50 | 808 (118–1,525) |
| MAP2K1 (MEK1) | D67 | 2 | Chr15:66,727,483 | 0.50 | 1,204 (188–2,185) |
| EGFR | G719 | 18 | Chr7:55,241,708 | 0.51 | 1,409 (289–2,940) |
| EGFR | Dels | 19 | Chr7:55,242,470 | 0.53 | 503 (125–1,224) |
| EGFR | T790 | 20 | Chr7:55,249,071 | 0.55 | 1,534 (344–3,019) |
| EGFR | L858 | 21 | Chr7:55,259,524 | 0.54 | 918 (190–1,826) |
| ERBB2 (HER2) | G776 | 20 | Chr17:37,880,996 | 0.58 | 930 (111–1,593) |
Abbreviations: Chr, chromosome; Dels, deletions.
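The relationship between GC content and depth in the table above can be explored with a short sketch. The GC content and median depth values below are copied from the table. The 0.45 cutoff separating "low" from "high" GC content is an assumption made here for illustration, not a threshold from the study, and `gc_content` is a hypothetical helper showing how a GC fraction is computed for a sequence window.

```python
from statistics import median


def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


# (GC content, median depth) per hotspot, as listed in the table above.
hotspots = [
    (0.33, 51), (0.36, 741), (0.30, 159), (0.37, 1067), (0.34, 459),
    (0.40, 648), (0.50, 808), (0.50, 1204), (0.51, 1409), (0.53, 503),
    (0.55, 1534), (0.54, 918), (0.58, 930),
]

GC_CUTOFF = 0.45  # assumed cutoff, for illustration only

low_gc = [depth for gc, depth in hotspots if gc < GC_CUTOFF]
high_gc = [depth for gc, depth in hotspots if gc >= GC_CUTOFF]

print(f"GC fraction of 'ATGCGC': {gc_content('ATGCGC'):.2f}")
print(f"Median depth, GC <  {GC_CUTOFF}: {median(low_gc)}")
print(f"Median depth, GC >= {GC_CUTOFF}: {median(high_gc)}")
```

This grouping only summarizes the published per-hotspot medians; it does not reproduce the per-sample analysis or the stratification by combined score shown in Fig 4.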