Mathias Stiller, Antje Sucker, Klaus Griewank, Daniela Aust, Gustavo Bruno Baretton, Dirk Schadendorf, Susanne Horn.
Abstract
DNA derived from formalin-fixed and paraffin-embedded (FFPE) tissue has been a challenge for large-scale genomic sequencing, owing to its low quality and quantity. Improved techniques enabling the genome-wide analysis of FFPE material would be of great value, both from a research and a clinical perspective. Comparing a single-strand DNA library preparation method originally developed for ancient DNA to conventional protocols using double-stranded DNA derived from FFPE material, we obtain on average 900-fold more library molecules and improved sequence complexity from as little as 5 ng input DNA. FFPE DNA is highly fragmented, usually to below 100 bp, and up to 60% of reads start after or end prior to adenine residues, suggesting that crosslinks predominate at adenine residues. Similar to ancient DNA, C > T substitutions are slightly increased, with maximum rates of up to 3% at the ends of molecules. In whole exome sequencing of single-strand libraries from lung, breast, colorectal, prostate and skin cancers we identify known cancer mutations. In summary, we show that single-strand library preparation enables genomic sequencing, even from low amounts of degraded FFPE DNA. This method provides a clear advantage both in research and clinical settings, where FFPE material (e.g. from biopsies) often is the only source of DNA available. By improving the genetic characterization that can be performed on conventional archived FFPE tissue, the single-strand library preparation allows scarce samples to be used in personalized medicine and enables larger sample sizes in future sequencing studies.
Keywords: cancer; formalin-fixed paraffin embedded (FFPE) tissue; high-throughput sequencing; library preparation; whole exome sequencing
Year: 2016 PMID: 27463017 PMCID: PMC5312299 DOI: 10.18632/oncotarget.10827
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Samples and library preparation yield
| Sample No. | Cancer | FFPE storage (years) | Library molecules ds NEB | Sequencing library ds NEB | Library molecules ds MPI | Sequencing library ds MPI | Library molecules ss | Sequencing library ss | Times more molecules ss vs. ds NEB |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Melanoma | 7 | 5.58E+06 | | 1.87E+07 | | 2.90E+09 | A5347 | 519 |
| 2 | Melanoma | 8 | 4.58E+06 | | 3.94E+06 | | 5.60E+09 | A5348 | 1223 |
| 3 | Melanoma | 7 | 5.70E+06 | | 2.38E+07 | | 4.10E+09 | A5349 | 720 |
| 4 | Melanoma | 9 | 1.26E+06 | | 3.04E+07 | | 6.32E+09 | A5350 | 5018 |
| NTC1 | Water | n.a. | 3.40E+06 | | 8.74E+05 | | 1.79E+08 | A5351 | n.d. |
| 5 | Melanoma | 8 | 6.95E+06 | | 1.88E+07 | | 6.76E+09 | | 973 |
| 6 | Melanoma | 7 | 7.55E+06 | | 8.92E+06 | | 4.04E+09 | | 535 |
| 7 | Melanoma | 7 | 8.95E+06 | | 3.78E+07 | | 2.56E+09 | | 286 |
| 8 | Melanoma | 8 | 6.18E+06 | | 1.36E+07 | | 6.08E+09 | | 985 |
| 9 | Melanoma | 9 | 1.27E+07 | | 2.46E+07 | | 3.82E+09 | | 300 |
| 10 | Melanoma | 8 | 6.15E+06 | | 5.22E+07 | | 3.83E+09 | | 622 |
| 11 | Melanoma | 4 | 7.35E+06 | | 1.12E+07 | | 6.64E+09 | | 903 |
| 12 | Melanoma | 6 | 8.40E+06 | | 2.10E+07 | | 2.83E+09 | | 337 |
| 13 | Melanoma | 8 | 1.44E+07 | | 1.96E+07 | | 1.16E+10 | | 810 |
| 14 | Melanoma | 9 | 7.30E+06 | | 3.16E+07 | | 5.96E+09 | A8244 | 816 |
| 15 | Melanoma | 4 | 5.73E+06 | | 4.02E+07 | | 2.85E+09 | A8245 | 498 |
| 16 | Melanoma | 4 | 1.14E+07 | | 3.86E+07 | | 3.16E+09 | A8246 | 278 |
| NTC2 | Water | n.a. | n.a. | | 9.60E+05 | | 4.80E+07 | A8247 | n.d. |
| 17 | Melanoma | 13 | 1.68E+06 | MS1NEB | 8.82E+06 | MS1dsMPI | 2.60E+09 | MS1 | 1550 |
| 18 | Melanoma | 15 | 7.98E+06 | MS2NEB | 6.14E+06 | MS2dsMPI | 2.04E+09 | MS2 | 255 |
| 19 | Melanoma | 17 | 6.05E+06 | MS3NEB | 3.66E+07 | MS3dsMPI | 2.72E+09 | MS3 | 450 |
| 20 | Melanoma | 19 | 3.23E+06 | MS4NEB | 2.56E+07 | MS4dsMPI | 8.25E+09 | MS4 | 2558 |
| 21 | Melanoma | 11 | 3.65E+06 | MS5NEB | 1.27E+07 | MS5dsMPI | 3.19E+09 | MS5 | 873 |
| NTC3 | Water | n.a. | 5.28E+05 | | 3.60E+05 | | 6.65E+07 | | |
| 22 | Lung cancer | 11 | n.a. | | n.a. | | 1.44E+09 | A8231 | |
| 23 | Breast cancer | 11 | n.a. | | n.a. | | 1.02E+09 | A8232 | |
| 24 | Colorectal cancer | 11 | n.a. | | n.a. | | 2.36E+09 | A8233 | |
| 25 | Prostate cancer | 11 | n.a. | | n.a. | | 1.25E+09 | A8234 | |
| 26 | Colorectal cancer | 9 | n.a. | | n.a. | | 8.65E+08 | A8235 | |
| 27 | Lung cancer | 9 | n.a. | | n.a. | | 1.58E+09 | A8236 | |
| 28 | Breast cancer | 9 | n.a. | | n.a. | | 6.45E+08 | A8237 | |
| 29 | Prostate cancer | 9 | n.a. | | n.a. | | 7.35E+08 | A8238 | |
| 30 | Lung cancer | 6 | n.a. | | n.a. | | 2.84E+09 | A8239 | |
| 31 | Prostate cancer | 6 | n.a. | | n.a. | | 2.72E+09 | A8240 | |
| 32 | Colorectal cancer | 6 | n.a. | | n.a. | | 1.58E+09 | A8241 | |
| 33 | Breast cancer | 6 | n.a. | | n.a. | | 2.54E+09 | A8242 | |
| NTC4 | Water | n.a. | n.a. | | n.a. | | 6.00E+07 | A8243 | |
Five nanograms of FFPE DNA yield more library molecules with single-strand (ss) than with double-strand (ds) DNA library preparation. ds MPI: a custom method developed for ancient DNA at the Max Planck Institute for Evolutionary Anthropology (EVA), Leipzig [13]. ds NEB: a commercial method from New England Biolabs. n.a.: not available; n.d.: not determined; NTC: negative control (see also the Methods section). For a more detailed overview of the order of sequencing experiments, see Supplementary Table S1.
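The last column of the table is the ratio of ss to ds NEB library molecules. The following is a minimal Python sketch of that arithmetic, using three rows copied from the table. The dictionary and function names are illustrative and not taken from the study. The results differ slightly from the printed ratios (for example about 520 versus 519 for sample 1), which would be expected if the published ratios were computed from unrounded counts; that explanation is an assumption.

```python
# Library molecule counts copied from the table above (sample no. -> counts).
# Names (YIELDS, fold_increase) are illustrative, not from the study.
YIELDS = {
    1: {"ds_neb": 5.58e6, "ss": 2.90e9},
    17: {"ds_neb": 1.68e6, "ss": 2.60e9},
    20: {"ds_neb": 3.23e6, "ss": 8.25e9},
}


def fold_increase(ss_molecules, ds_molecules):
    """Return how many times more molecules the ss library contains."""
    return ss_molecules / ds_molecules


fold = {
    sample: fold_increase(counts["ss"], counts["ds_neb"])
    for sample, counts in YIELDS.items()
}

for sample, value in fold.items():
    print(f"sample {sample}: {value:.0f}-fold more molecules (ss vs. ds NEB)")
```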
Figure 1. Comparison of double-strand and single-strand libraries
(A) Overall copy number yield of library preparation measured in digital droplet PCR. (B) Comparisons of copy number yield. Single-strand (ss) library preparation outperforms both double-strand (ds) methods (P = 3.716 × 10⁻¹² for both the ss vs. dsNEB and the ss vs. dsMPI comparison). Custom ds library preparation (dsMPI) has a higher yield than the commercial method (dsNEB; P = 7.995 × 10⁻⁶ for dsNEB vs. dsMPI). Raw P values from two-sided Wilcoxon rank sum tests (n = 21 samples per library preparation). (C) Molecule length of 100,000 unique, mapped reads in single-strand and double-strand libraries. As merged reads were analyzed, the plot displays reads with a maximum length of 141 bp (2 × 76 bp reads before overlap merging) for ss libraries. (D) Estimated complexity of single-strand (ss) libraries is higher than that of double-strand (ds) libraries.
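The P value of 3.716 × 10⁻¹² reported for both ss comparisons matches the smallest value an exact two-sided rank sum test can return for two groups of 21 values without ties. That minimum is reached when every value in one group exceeds every value in the other, which holds for the ss yields versus either ds method in the sample table. The short check below reproduces that arithmetic; the function name is illustrative, and the idea that an exact (rather than approximate) test was used is an assumption, not something the caption states.

```python
from math import comb


def min_two_sided_rank_sum_p(n_a, n_b):
    """Exact two-sided rank sum P value for completely separated groups.

    With no ties, only one of the C(n_a + n_b, n_a) equally likely rank
    assignments puts all of group A above group B; doubling accounts for
    the mirror-image arrangement in a two-sided test.
    """
    return 2 / comb(n_a + n_b, n_a)


p = min_two_sided_rank_sum_p(21, 21)
print(f"{p:.3e}")  # prints 3.716e-12
```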
Sequencing characteristics of genomic single- and double-strand libraries
| Sample No. | Library type | Library ID | Mergetrimmed reads | Mapped reads | % of mergetrimmed | Unique reads | % of mergetrimmed | Mapped reads of unique | % of unique | Molecules in library | Median molecule length (bp) | Mappable bp | Estimated genome coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ss | A5347 | 1539768 | 1294718 | 84 | 1539768 | 100 | 1282731 | 83 | 2.90E+09 | 76 | 1.84E+11 | 55.64 |
| 2 | ss | A5348 | 1255872 | 1078130 | 86 | 1255872 | 100 | 1070829 | 85 | 5.60E+09 | 58 | 2.77E+11 | 83.92 |
| 3 | ss | A5349 | 1318068 | 1111409 | 84 | 1318068 | 100 | 1102175 | 84 | 4.10E+09 | 81 | 2.78E+11 | 84.15 |
| 4 | ss | A5350 | 1607846 | 1397991 | 87 | 1607846 | 100 | 1390490 | 86 | 6.32E+09 | 62 | 3.39E+11 | 102.69 |
| NTC1 | ss | A5351 | 236065 | 38093 | 16 | 236065 | 100 | 23877 | 10 | 1.79E+08 | n.d. | ||
| 17 | ss | MS1 | 4114005 | 3683777 | 90 | 4036964 | 98 | 3606736 | 89 | 2.60E+09 | 51 | 1.16E+11 | 35.23 |
| 18 | ss | MS2 | 3229456 | 2859033 | 89 | 3176913 | 98 | 2806490 | 88 | 2.04E+09 | 61 | 1.08E+11 | 32.77 |
| 19 | ss | MS3 | 3138691 | 2799387 | 89 | 3101849 | 99 | 2762545 | 89 | 2.72E+09 | 60 | 1.44E+11 | 43.53 |
| 20 | ss | MS4 | 3988112 | 3615532 | 91 | 3959556 | 99 | 3586976 | 91 | 8.25E+09 | 52 | 3.86E+11 | 116.92 |
| 21 | ss | MS5 | 3821539 | 3362766 | 88 | 3738118 | 98 | 3279345 | 88 | 3.19E+09 | 57 | 1.56E+11 | 47.28 |
| 17 | ds | MS1dsMPI | 2868381 | 1623445 | 57 | 2582011 | 90 | 1337075 | 52 | 8.82E+06 | 76 | 3.12E+08 | 0.09 |
| 18 | ds | MS2dsMPI | 3426906 | 1902541 | 56 | 2723388 | 79 | 1199023 | 44 | 6.14E+06 | 86 | 1.85E+08 | 0.06 |
| 19 | ds | MS3dsMPI | 3575035 | 2046427 | 57 | 3339449 | 93 | 1810841 | 54 | 3.66E+07 | 86 | 1.59E+09 | 0.48 |
| 20 | ds | MS4dsMPI | 3400508 | 2065718 | 61 | 3109447 | 91 | 1774657 | 57 | 2.56E+07 | 79 | 1.06E+09 | 0.32 |
| 21 | ds | MS5dsMPI | 3812279 | 2181953 | 57 | 3304041 | 87 | 1673715 | 51 | 1.27E+07 | 82 | 4.57E+08 | 0.14 |
| 17 | ds | MS1NEB | 3910758 | 2194541 | 56 | 2785746 | 71 | 1069529 | 38 | 1.68E+06 | 88 | 4.04E+07 | 0.01 |
| 18 | ds | MS2NEB | 2902274 | 727263 | 25 | 2393922 | 82 | 218911 | 9 | 7.98E+06 | 92 | 5.54E+07 | 0.02 |
| 19 | ds | MS3NEB | 3870670 | 2197359 | 57 | 2369050 | 61 | 695739 | 29 | 6.05E+06 | 99 | 1.08E+08 | 0.03 |
| 20 | ds | MS4NEB | 2788557 | 1631982 | 59 | 1622804 | 58 | 466229 | 29 | 3.23E+06 | 87 | 4.70E+07 | 0.01 |
| 21 | ds | MS5NEB | 3435934 | 1624254 | 47 | 2152694 | 63 | 341014 | 16 | 3.65E+06 | 88 | 3.19E+07 | 0.01 |
The percentages of mapped reads are reported without quality filters and before duplicate read removal, as well as thereafter for the remaining unique reads. ss: single-strand; ds: double-strand; NTC: negative control; n.d.: not determined.
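The "Mappable bp" and "Estimated genome coverage" columns can be reproduced from the other columns of the table. The sketch below shows one relationship that is consistent with the tabulated numbers: molecules in the library, times median molecule length, times the fraction of mergetrimmed reads that are unique and mapped, divided by a genome size of 3.3 × 10⁹ bp. Both the formula and the genome size are inferred from the table values, not stated in the source, and the function names are illustrative.

```python
# Assumed genome size; inferred from the ratio of the "Mappable bp" and
# "Estimated genome coverage" columns, not stated in the source.
GENOME_SIZE_BP = 3.3e9


def mappable_bp(molecules, median_length_bp, mapped_unique_reads, mergetrimmed_reads):
    """Estimate the mappable base pairs contained in a library.

    Assumed relationship: library size x median molecule length x the share
    of mergetrimmed reads that remain after duplicate removal and map.
    """
    return molecules * median_length_bp * mapped_unique_reads / mergetrimmed_reads


def genome_coverage(bp, genome_size_bp=GENOME_SIZE_BP):
    """Convert mappable base pairs into fold coverage of the genome."""
    return bp / genome_size_bp


# Values for ss library A5347 (sample 1) and ds library MS1dsMPI (sample 17).
bp_a5347 = mappable_bp(2.90e9, 76, 1282731, 1539768)
cov_a5347 = genome_coverage(bp_a5347)
bp_ms1_ds = mappable_bp(8.82e6, 76, 1337075, 2868381)
cov_ms1_ds = genome_coverage(bp_ms1_ds)

print(f"A5347:    {bp_a5347:.2e} bp, {cov_a5347:.2f}x")
print(f"MS1dsMPI: {bp_ms1_ds:.2e} bp, {cov_ms1_ds:.2f}x")
```

Under these assumptions the sketch gives roughly 1.84 × 10¹¹ bp and 55.6-fold coverage for A5347, and roughly 3.12 × 10⁸ bp and 0.09-fold coverage for MS1dsMPI, in line with the corresponding table rows.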
Sequencing characteristics of single-strand exome libraries
| Sample No. | Library ID | Mergetrimmed reads | Mapped reads on genome | % of mergetrimmed | Mapped reads on target | % of mergetrimmed | Unique reads | % of mergetrimmed | Mapped reads on target of unique | % of unique | Molecules in library | Median molecule length (bp) | Mappable bp | Estimated exome coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 22 | A8231 | 10400578 | 8880352 | 85 | 3664128 | 35 | 9027766 | 87 | 2694184 | 30 | 1.44E+09 | 72 | 2.69E+10 | 814 |
| 23 | A8232 | 13928657 | 8590615 | 62 | 4673607 | 34 | 7675101 | 55 | 4673607 | 61 | 1.02E+09 | 57 | 1.95E+10 | 591 |
| 24 | A8233 | 14725566 | 11548822 | 78 | 5440538 | 37 | 9148314 | 62 | 1400802 | 15 | 2.36E+09 | 64 | 1.43E+10 | 434 |
| 25 | A8234 | 13510755 | 9111893 | 67 | 4503396 | 33 | 7794873 | 58 | 445740 | 6 | 1.25E+09 | 60 | 2.47E+09 | 75 |
| 26 | A8235 | 13878538 | 9302472 | 67 | 4884491 | 35 | 7652137 | 55 | 434072 | 6 | 8.65E+08 | 59 | 1.60E+09 | 48 |
| 27 | A8236 | 12960033 | 9510227 | 73 | 4625240 | 36 | 7428921 | 57 | 652190 | 9 | 1.58E+09 | 62 | 4.91E+09 | 149 |
| 28 | A8237 | 14125480 | 8905778 | 63 | 4674789 | 33 | 8020581 | 57 | 344949 | 4 | 6.45E+08 | 59 | 9.29E+08 | 28 |
| 29 | A8238 | 13978045 | 8283185 | 59 | 4598067 | 33 | 7644855 | 55 | 190019 | 2 | 7.35E+08 | 57 | 5.70E+08 | 17 |
| 30 | A8239 | 12035341 | 8406977 | 70 | 3801180 | 32 | 8220549 | 68 | 1002280 | 12 | 2.84E+09 | 71 | 1.68E+10 | 509 |
| 31 | A8240 | 10933844 | 6549725 | 60 | 3043865 | 28 | 7709969 | 71 | 698527 | 9 | 2.72E+09 | 71 | 1.23E+10 | 374 |
| 32 | A8241 | 11869932 | 7089421 | 60 | 3557527 | 30 | 7483786 | 63 | 408108 | 5 | 1.58E+09 | 65 | 3.52E+09 | 107 |
| 33 | A8242 | 11869182 | 7085969 | 60 | 3872693 | 33 | 6586746 | 55 | 203156 | 3 | 2.54E+09 | 55 | 2.39E+09 | 72 |
| NTC4 | A8243 | 1588390 | 394319 | 25 | 114244 | 7 | 1327044 | 84 | 10681 | 1 | 6.00E+07 | n.d. | n.d. | n.d. |
| 14 | A8244 | 10484892 | 9184487 | 88 | 4079825 | 39 | 8752541 | 83 | 2839016 | 32 | 5.96E+09 | 75 | 1.21E+11 | 3668 |
| 15 | A8245 | 9956899 | 8806918 | 88 | 3742592 | 38 | 8234199 | 83 | 2480366 | 30 | 2.85E+09 | 86 | 6.11E+10 | 1850 |
| 16 | A8246 | 13621017 | 11778127 | 86 | 5246793 | 39 | 10605136 | 78 | 3062774 | 29 | 3.16E+09 | 71 | 5.04E+10 | 1529 |
| NTC2 | A8247 | 1308857 | 295176 | 23 | 117051 | 9 | 1098888 | 84 | 8913 | 1 | 4.80E+07 | n.d. | n.d. | n.d. |
n.d.: not determined; NTC: negative control.
Figure 2. DNA damage in FFPE DNA
(A) A-fragmentation pattern of FFPE DNA molecules of single-strand library MS1. Frequencies of adenine as the first base of the reference sequence adjacent to the sequenced molecule are observed to be as high as 60%. (B) Substitution frequencies throughout the sequenced FFPE DNA molecules. (C) C > T and G > A substitution frequencies among called variants. (D) Alignment of reads to the human reference genome. C > T damage substitutions in sequencing reads are distinguished from C > T variants (arrow): damage substitutions are dispersed throughout the sequencing reads, whereas C > T variants occur at the same position in a fraction of the sequencing reads. Forward read bases are depicted as points, reverse read bases as commas.
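The distinction drawn in panel (D) can be illustrated with a toy example: a C > T mismatch seen in a single read at a given position looks like dispersed damage, whereas one that recurs at the same reference position in several reads is a variant candidate. The sketch below is only an illustration of that idea under simplifying assumptions (ungapped forward-strand alignments, invented toy sequences, a hypothetical `min_reads` threshold). It is not the variant-calling procedure used in the study.

```python
from collections import Counter


def c_to_t_positions(alignments):
    """Count reads showing a C>T mismatch at each reference position.

    alignments: iterable of (ref_start, ref_seq, read_seq) tuples describing
    ungapped alignments in which ref_seq and read_seq have equal length.
    """
    counts = Counter()
    for ref_start, ref_seq, read_seq in alignments:
        for offset, (ref_base, read_base) in enumerate(zip(ref_seq, read_seq)):
            if ref_base == "C" and read_base == "T":
                counts[ref_start + offset] += 1
    return counts


def recurrent_positions(counts, min_reads=2):
    """Return positions where at least min_reads reads carry the mismatch."""
    return sorted(pos for pos, n in counts.items() if n >= min_reads)


# Invented toy reads over the reference ACGTCAGCC starting at position 100.
toy_reads = [
    (100, "ACGTCA", "ACGTTA"),  # C>T at 104
    (102, "GTCAGC", "GTTAGT"),  # C>T at 104 and at 107
    (103, "TCAGCC", "TTAGCC"),  # C>T at 104
    (101, "CGTCAG", "TGTCAG"),  # C>T at 101 only; reference base kept at 104
]

counts = c_to_t_positions(toy_reads)
candidates = recurrent_positions(counts)
print(candidates)  # [104]
```

Here position 104 carries the substitution in three of the four reads covering it and is kept, while the isolated mismatches at 101 and 107 are treated as damage. A real analysis would also need to handle reverse-strand reads (where the same damage appears as G > A), base qualities and indels.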
Coding missense SNV called in exome sequencing data
| Library and cancer | Chr | Start | Ref | Alt | Coverage | Variant reads | Freq | Gene | Amino acid change | COSMIC IDs; occurrence |
|---|---|---|---|---|---|---|---|---|---|---|
| A8231 | 1 | 115256529 | T | C | 10 | 6 | 0.6 | NRAS | NRAS:NM_002524:exon3:c.A182G:p.Q61R | 584;49(NS),12(lung), and more |
| Lung | 2 | 179469536 | C | A | 11 | 7 | 0.64 | TTN | TTN:NM_003319:exon109:c.G27085T:p.G9029C | 718146,718147,718144,1148633,718145;1(lung) |
| | 7 | 23390935 | C | A | 14 | 4 | 0.29 | IGF2BP3 | IGF2BP3:NM_006547:exon6:c.G672T:p.Q224H | 287378;1(large_intestine) |
| | 17 | 39296361 | A | G | 24 | 11 | 0.46 | KRTAP4-6 | KRTAP4-6:NM_030976:exon1:c.T379C:p.S127P | 1479552;2(breast),1(lung) |
| | 18 | 63511294 | C | T | 8 | 3 | 0.38 | CDH7 | CDH7:NM_004361:exon7:c.C1228T:p.P410S | 1523528,1523527;1(lung) |
| A8233 | 1 | 145248876 | A | G | 18 | 4 | 0.22 | NOTCH2NL | NOTCH2NL:NM_203458:exon2:c.A20G:p.N7S | 291720,291719;1(large_intestine) |
| Colon | 5 | 833915 | G | T | 8 | 4 | 0.5 | ZDHHC11 | ZDHHC11:NM_024786:exon7:c.C908A:p.A303D | 131334,131335;1(liver),4(prostate) |
| | 3 | 195452799 | C | T | 8 | 4 | 0.5 | MUC20 | MUC20:NM_001282506:exon2:c.C1325T:p.T442I | 149548;1(stomach) |
| | 14 | 20296004 | C | T | 9 | 5 | 0.56 | OR4N2 | OR4N2:NM_001004723:exon2:c.C397T:p.P133S | 147728;1(stomach) |
| | 14 | 19571357 | T | C | 10 | 2 | 0.2 | POTEG | POTEG:NM_001005356:exon7:c.T1136C:p.L379S | 1477410;3(breast) |
| A8239 | 1 | 146458025 | T | C | 33 | 10 | 0.3 | NBPF10 | NBPF10:NM_001039703:exon78:c.T9751C:p.Y3251H | 1320027;1(ovary) |
| Lung | 7 | 151927021 | C | A | 10 | 3 | 0.3 | KMT2C | KMT2C:NM_170606:exon18:c.G2963T:p.C988F | 150426,150427;1(stomach) |
| | 9 | 69423641 | T | G | 8 | 4 | 0.5 | ANKRD20A4 | ANKRD20A4:NM_001098805:exon15:c.T1937G:p.M646R | 1490069;1(breast) |
| A8240 | 14 | 19571357 | T | C | 8 | 2 | 0.25 | POTEG | POTEG:NM_001005356:exon7:c.T1136C:p.L379S | 1477410;3(breast) |
| Prostate | 2 | 107049714 | C | G | 9 | 4 | 0.44 | RGPD3 | RGPD3:NM_001144013:exon16:c.G2233C:p.E745Q | 1526982,1526983;1(lung) |
| | 7 | 151962265 | C | T | 10 | 2 | 0.2 | KMT2C | KMT2C:NM_170606:exon8:c.G1042A:p.D348N | 228111,228110;1(skin) |
| A8244 | 1 | 145248876 | A | G | 18 | 2 | 0.11 | NOTCH2NL | NOTCH2NL:NM_203458:exon2:c.A20G:p.N7S | 291720,291719;1(large_intestine) |
| Melanoma | 1 | 145281656 | A | T | 19 | 2 | 0.11 | NOTCH2NL | NOTCH2NL:NM_203458:exon4:c.A586T:p.T196S | 1179143,1179144;1(prostate),1(central_nervous_system) |
| | 13 | 103701690 | G | A | 9 | 5 | 0.56 | SLC10A2 | SLC10A2:NM_000452:exon5:c.C868T:p.P290S | 1666308;1(eye) |
| | 20 | 1902301 | G | A | 13 | 2 | 0.15 | SIRPA | SIRPA:NM_001040023:exon3:c.G697A:p.V233I | 1647366,723497;1(prostate),3(lung) |
| | X | 140993885 | C | T | 8 | 4 | 0.5 | MAGEC1 | MAGEC1:NM_005462:exon4:c.C695T:p.P232L | 226892;1(skin) |
| A8245 | 19 | 7810517 | T | A | 15 | 3 | 0.2 | CD209 | CD209:NM_001144894:exon2:c.A503T:p.Q168L | 221848,221847;1(oesophagus),1(skin),1(cervix) |
| Melanoma | 19 | 7810586 | T | A | 17 | 2 | 0.12 | CD209 | CD209:NM_001144894:exon2:c.A434T:p.Q145L | 221850,221849;1(skin) |
| | 7 | 76126737 | C | T | 9 | 4 | 0.44 | DTX2 | DTX2:NM_001102595:exon5:c.C1093T:p.R365C | 300761;1(ovary),1(large_intestine), and more |
| | 9 | 21239504 | T | C | 24 | 6 | 0.25 | IFNA14 | IFNA14:NM_002172:exon1:c.A431G:p.K144R | 403874;1(lung) |
| A8246 | 1 | 152187935 | C | T | 30 | 5 | 0.17 | HRNR | HRNR:NM_001009931:exon3:c.G6170A:p.R2057Q | 224173;1(skin) |
| Melanoma | 12 | 11461549 | G | C | 19 | 2 | 0.11 | PRB4 | PRB4:NM_002723:exon3:c.C368G:p.P123R | 1628396;1(liver) |
| | 12 | 31250875 | G | C | 9 | 4 | 0.44 | DDX11 | DDX11:NM_001257144:exon18:c.G1819C:p.A607P | 304699,1318021;1(lung),1(large_intestine), and more |
| | 12 | 52699033 | G | A | 10 | 3 | 0.3 | KRT86 | KRT86:NM_002284:exon5:c.G745A:p.V249I | 404426;1(lung) |
| | 15 | 78290635 | C | T | 8 | 3 | 0.38 | TBC1D2B | TBC1D2B:NM_015079:exon13:c.G2707A:p.D903N | 458956,86571;2(NS),2(urinary_tract), and more |
| | 5 | 23526987 | C | G | 23 | 6 | 0.26 | PRDM9 | PRDM9:NM_020227:exon11:c.C1790G:p.T597S | 231055;1(skin),2(prostate),1(large_intestine) |
Subset of variants detected in this study that were also detected previously in cancer sequencing [18]. Full lists of called variants are given in Supplementary Table S6. Freq: frequency.
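The "Freq" column appears to be the number of variant reads divided by the coverage at that position, rounded to two decimals. This reading is inferred from the table values rather than stated in the caption. A minimal sketch with a hypothetical helper name:

```python
def variant_frequency(variant_reads, coverage):
    """Fraction of reads at a position that carry the variant allele."""
    return round(variant_reads / coverage, 2)


# (variant reads, coverage) pairs copied from the first rows of library A8231.
examples = [(6, 10), (7, 11), (4, 14), (11, 24)]
frequencies = [variant_frequency(v, c) for v, c in examples]
print(frequencies)  # [0.6, 0.64, 0.29, 0.46]
```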