Bálint Péterfia1, Alexandra Kalmár2, Árpád V Patai2, István Csabai3, András Bodor3, Tamás Micsik4, Barnabás Wichmann5, Krisztina Egedi4, Péter Hollósi6, Ilona Kovalszky4, Zsolt Tulassay1, Béla Molnár1.
Abstract
Background: To support cancer therapy, the development of low-cost library preparation techniques for targeted next-generation sequencing (NGS) is needed. In this study, we designed and tested a PCR-based library preparation panel with a limited target area for sequencing the top 12 somatic mutation hot spots in colorectal cancer on the GS Junior instrument.
Keywords: Targeted next generation sequencing - colorectal cancer - cancer genotyping - targeted cancer therapy - GS Junior - mutation hot-spots - mutation screening.
Year: 2017 PMID: 28243320 PMCID: PMC5327365 DOI: 10.7150/jca.16037
Source DB: PubMed Journal: J Cancer ISSN: 1837-9664 Impact factor: 4.207
Full cDNA sequences and target regions of selected genes.
| Gene Symbol | cDNA sequence | full cDNA | target region |
|---|---|---|---|
| APC | ENST00000457016 | 1-8532 | 3805-4737 |
| BRAF | ENST00000288602 | 1-2301 | 1766-1809 |
| CTNNB1 | ENST00000349496 | 1-2346 | 14-155 |
| EGFR ex18 | ENST00000275493 | 1-3633 | 2116-2174 |
| EGFR ex21 | ENST00000275493 | 1-3633 | 2540-2620 |
| FBXW7 | ENST00000281708 | 1-2124 | 1290-1418 |
| FBXW7 | ENST00000281708 | 1-2124 | 1425-1561 |
| FBXW7 | ENST00000281708 | 1-2124 | 1701-1812 |
| KRAS | ENST00000311936 | 1-567 | 9-111 |
| MSH6 | ENST00000234420 | 1-4083 | 3240-3300 |
| MSH6 | ENST00000234420 | 1-4083 | 2832-2938 |
| NRAS | ENST00000369535 | 1-570 | 21-73 |
| NRAS | ENST00000369535 | 1-570 | 140-239 |
| PIK3CA | NM_006218.1 | 1-3207 | 1566-1656 |
| PIK3CA | NM_006218.1 | 1-3207 | 3051-3164 |
| SMAD2 | ENST00000262160 | 1-1404 | 895-965 |
| SMAD4 | ENST00000342988 | 1-1659 | 968-1108 |
| SMAD4 | ENST00000342988 | 1-1659 | 1471-1622 |
| SMAD4 | ENST00000342988 | 1-1659 | 281-408 |
| TP53 | ENST00000269305 | 1-1183 | 418-560 |
| TP53 | ENST00000269305 | 1-1183 | 673-782 |
| TP53 | ENST00000269305 | 1-1183 | 783-919 |
Gene Symbol, cDNA Ensembl ID or RefSeq ID of selected genes together with the target regions are listed.
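As a rough cross-check of the "limited target area" mentioned in the abstract, the target sizes can be tallied from the table above. The sketch below assumes that the listed cDNA coordinates are 1-based and inclusive (this is not stated in the source); the names `TARGET_REGIONS` and `target_length_per_gene` are illustrative only.

```python
from collections import defaultdict

# (gene, start, end) target regions copied from the table above.
# Assumption: cDNA coordinates are 1-based and inclusive.
TARGET_REGIONS = [
    ("APC", 3805, 4737),
    ("BRAF", 1766, 1809),
    ("CTNNB1", 14, 155),
    ("EGFR", 2116, 2174),   # exon 18
    ("EGFR", 2540, 2620),   # exon 21
    ("FBXW7", 1290, 1418),
    ("FBXW7", 1425, 1561),
    ("FBXW7", 1701, 1812),
    ("KRAS", 9, 111),
    ("MSH6", 3240, 3300),
    ("MSH6", 2832, 2938),
    ("NRAS", 21, 73),
    ("NRAS", 140, 239),
    ("PIK3CA", 1566, 1656),
    ("PIK3CA", 3051, 3164),
    ("SMAD2", 895, 965),
    ("SMAD4", 968, 1108),
    ("SMAD4", 1471, 1622),
    ("SMAD4", 281, 408),
    ("TP53", 418, 560),
    ("TP53", 673, 782),
    ("TP53", 783, 919),
]


def target_length_per_gene(regions):
    """Sum the inclusive length (end - start + 1) of all regions per gene."""
    totals = defaultdict(int)
    for gene, start, end in regions:
        totals[gene] += end - start + 1
    return dict(totals)


per_gene = target_length_per_gene(TARGET_REGIONS)
total_bp = sum(per_gene.values())

for gene, length in sorted(per_gene.items()):
    print(f"{gene}: {length} bp")
print(f"total target area: {total_bp} bp")
```

The totals describe the targeted cDNA positions only; the amplicons that cover them are longer because they include primer binding sites and flanking sequence.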
List of primer sequences designed to amplify target regions.
| amplicon name | Forward primer sequence (5'-3') | Reverse primer sequence (5'-3') | Chromosome number | Amplicon Start | Amplicon Stop | amplicon length | pool number | primer conc. in the pool (μM) |
|---|---|---|---|---|---|---|---|---|
| APC_1 | TACAGACTTATTGTGTAGAAGATACTCCA | | chr5 | 112175067 | 112175239 | 172 | Pool1 | 1 |
| APC_2 | GCTAATACCCTGCAAATAGCAGAA | AAGAAAATTCAACAGCTTTGTGCCT | chr5 | 112175186 | 112175357 | 171 | Pool2 | 2 |
| APC_3 | GCAGGGTTCTAGTTTATCTTCAGAATCA | GTGAACTGACAGAAGTACATCTGCTA | chr5 | 112175302 | 112175468 | 166 | Pool1 | 4 |
| APC_4 | TCAGGAGACCCCACTCATGTT | GCATGGTTTGTCCAGGGCTATC | chr5 | 112175422 | 112175585 | 163 | Pool2 | 4 |
| APC_5 | CATTATAAGCCCCAGTGATCTTCCA | GCATTTACTGCAGCTTGCTTAGG | chr5 | 112175539 | 112175712 | 173 | Pool1 | 1 |
| APC_6 | ACTGCTGAAAAGAGAGAGAGTGGA | AGCACTCAGGCTGGATGAAC | chr5 | 112175666 | 112175815 | 149 | Pool2 | 2 |
| APC_7 | CGGAAAGTACTCCAGATGGATTTTCTT | CATTTGATTCTTTAGGCTGCTCTGATTC | chr5 | 112175769 | 112175930 | 161 | Pool1 | 0.5 |
| APC_8 | AGGAAAATGACAATGGGAATGAAACA | GACTTTGTTGGCATGGCAGAAAT | chr5 | 112175877 | 112176051 | 174 | Pool2 | 1 |
| BRAF | CATCCACAAAATGGATCCAGACAAC | GCTTGCTCTGATAGGAAAATGAGAT | chr7 | 140453075 | 140453249 | 174 | Pool1 | 0.5 |
| CTNNB1_1 | ATTTCAATGGGTCATATCACAGATTCTT | GTAAGACTGTTGCTGCCAGTG | chr3 | 41265928 | 41266093 | 165 | Pool1 | 2 |
| CTNNB1_2 | AGACAGAAAAGCGGCTGTTAGT | AGGTATCCACATCCTCTTCCTCAG | chr3 | 41266051 | 41266181 | 130 | Pool2 | 1 |
| EGFR_ex18 | GTGACCCTTGTCTCTGTGTTCTT | CTGTGCCAGGGACCTTACC | chr7 | 55241580 | 55241754 | 174 | Pool1 | 4 |
| EGFR_ex21 | CTGGCAGCCAGGAACGTA | GGAAAATGCTGGCTGACCTAAAG | chr7 | 55259454 | 55259602 | 148 | Pool2 | 3 |
| FBXW7_1_1 | GAGCACACTGTCACTATTTCAGTAACT | ACACCTTATATGGGCATACTTCCAC | chr4 | 153249262 | 153249414 | 152 | Pool2 | 2 |
| FBXW7_1_2 | CATGAAGATGCATACAACGCACA | GGTGGAGTATGGTCATCACAAATGAG | chr4 | 153249367 | 153249514 | 147 | Pool1 | 1 |
| FBXW7_2_1 | TGCAACGTGTGTAGACAGGTTT | GCCACTCTTAGGGTTTGGGATAT | chr4 | 153247189 | 153247361 | 172 | Pool2 | 2 |
| FBXW7_2_2 | CATGTAAACACTGGCCTGTCTCA | CCTTGACTAAATCTACCATGTTTTCTCA | chr4 | 153247316 | 153247477 | 161 | Pool1 | 1 |
| FBXW7_3 | CACTGTCCTGTTTTGATATCCCAGA | GGATCTCTTGATACATCAATCCGTGTTT | chr4 | 153245353 | 153245522 | 169 | Pool1 | 1.25 |
| KRAS_kod12_13 | AAAGAATGGTCCTGCACCAGTAA | AAGGCCTGCTGAAAATGACTGA | chr12 | 25398161 | 25398332 | 171 | Pool1 | 1 |
| MSH6_1 | GTCCTATGTGTCGCCCAGTA | CTTCCTCACAGCCTATTAGAATGTCATT | chr2 | 48030601 | 48030744 | 143 | Pool2 | 6 |
| MSH6_2 | CTTTGACTCTGATTATGACCAAGCTCT | CAAATTGCGAGTGGTGAAATTCTCA | chr2 | 48027918 | 48028092 | 174 | Pool1 | 2 |
| NRAS_1 | CCACTGGGCCTCACCTCTAT | CTGATTACTGGTTTCCAACAGGTTCT | chr1 | 115258657 | 115258819 | 162 | Pool1 | 0.75 |
| NRAS_2_1 | GGTAACCTCATTTCCCCATAAAGATTCA | AGTACAGTGCCATGAGAGACCA | chr1 | 115256349 | 115256523 | 174 | Pool1 | 4 |
| NRAS_2_2 | CCTTCGCCTGTCCTCATGTAT | CCCCAGGATTCTTACAGAAAACAAGT | chr1 | 115256481 | 115256605 | 124 | Pool2 | 0.5 |
| PIK3CA_1 | CAGAGTAACAGACTAGCTAGAGACAATGA | CTCCATTTTAGCACTTACCTGTGACT | chr3 | 178935995 | 178936140 | 145 | Pool1 | 0.5 |
| PIK3CA_2 | TGGAATGCCAGAACTACAATCTTTTGA | CTGTTTAATTGTGTGGAAGATCCAATCC | chr3 | 178951969 | 178952137 | 168 | Pool2 | 1 |
| SMAD2 | CTATATGCCTTCTTGTCATTTCTACCGT | GGAGAAACCTTCCATGCATCACA | chr18 | 45374846 | 45374984 | 138 | Pool1 | 3 |
| SMAD4_1_1 | GAAAAACTGTGTTGTGGAGTGCAA | CTCCTACCTGAACATCCATTTCAAAGTA | chr18 | 48591712 | 48591846 | 134 | Pool2 | 2 |
| SMAD4_1_2 | CCTGAGTATTGGTGTTCCATTGCT | TCTCAATGGCTTCTGTCCTGTG | chr18 | 48591795 | 48591969 | 174 | Pool1 | 0.75 |
| SMAD4_2_1 | ATTTAGAATGTAGGGAGGATGGGAAGA | CAGCCTTTCACAAAACTCATCCTG | chr18 | 48604556 | 48604704 | 148 | Pool2 | 0.5 |
| SMAD4_2_2 | GACCTTCGTCGCTTATGCATACT | GGTCTGCAATCGGCATGGTA | chr18 | 48604658 | 48604820 | 162 | Pool1 | 2 |
| SMAD4_3_1 | GTGGCTGGTCGGAAAGGATT | AACTCGTTCGTAGTGATATGGATTCAC | chr18 | 48575056 | 48575214 | 158 | Pool2 | 2 |
| SMAD4_3_2 | CGTTTGACTTAAAATGTGATAGTGTCTGT | CGCGGGCTATCTTCCAAATTTATAAT | chr18 | 48575159 | 48575333 | 174 | Pool1 | 0.5 |
| TP53_1_1 | GACCTAAGAGCAATCAGTGAGGAA | CATCTACAAGCAGTCACAGCAC | chr17 | 7578287 | 7578447 | 160 | Pool1 | 1 |
| TP53_1_2 | CGCCTCACAACCTCCGTCAT | ATGTTTTGCCAACTGGCCAAG | chr17 | 7578406 | 7578533 | 127 | Pool2 | 2 |
| TP53_2_1 | GGGATGTGATGAGAGGTGGAT | CCATCCTCACCATCATCACACTG | chr17 | 7577372 | 7577532 | 160 | Pool1 | 2 |
| TP53_2_2 | GGCTCCTGACCTGGAGTCTT | CATCTTGGGCCTGTGTTATCTCC | chr17 | 7577489 | 7577634 | 145 | Pool2 | 0.75 |
| TP53_3_1 | CGCTTCTTGTCCTGCTTGCTTA | TCCTATCCTGAGTAGTGGTAATCTACTG | chr17 | 7576996 | 7577170 | 174 | Pool1 | 1 |
| TP53_3_2 | GCACCTCAAAGCTGTTCCGT | CAAGGGTGGTTGGGAGTAGATG | chr17 | 7577121 | 7577255 | 134 | Pool2 | 4 |
As indicated, the primers were designed for amplification in two separate multiplex PCR reactions. The genomic positions of the primer recognition sites are provided together with the primer concentrations in the corresponding PCR primer pool.
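In the table, neighbouring amplicons that overlap on the genome appear to alternate between Pool1 and Pool2. The source does not state this as a design rule, so the following is only a minimal sketch for checking the pattern, using the eight APC amplicons from the table. The function name `overlapping_same_pool` is hypothetical, and treating start and stop as inclusive for the overlap test is an assumption.

```python
from itertools import combinations

# (name, start, stop, pool) for the APC amplicons, copied from the table above.
# All eight lie on chr5, so no chromosome comparison is needed here.
APC_AMPLICONS = [
    ("APC_1", 112175067, 112175239, "Pool1"),
    ("APC_2", 112175186, 112175357, "Pool2"),
    ("APC_3", 112175302, 112175468, "Pool1"),
    ("APC_4", 112175422, 112175585, "Pool2"),
    ("APC_5", 112175539, 112175712, "Pool1"),
    ("APC_6", 112175666, 112175815, "Pool2"),
    ("APC_7", 112175769, 112175930, "Pool1"),
    ("APC_8", 112175877, 112176051, "Pool2"),
]


def overlapping_same_pool(amplicons):
    """Return name pairs of amplicons that overlap and share a primer pool."""
    conflicts = []
    for (n1, s1, e1, p1), (n2, s2, e2, p2) in combinations(amplicons, 2):
        # Two intervals overlap when each starts before the other ends.
        overlaps = s1 <= e2 and s2 <= e1
        if overlaps and p1 == p2:
            conflicts.append((n1, n2))
    return conflicts


conflicts = overlapping_same_pool(APC_AMPLICONS)
print("same-pool overlaps:", conflicts)

# The "amplicon length" column equals stop minus start for these rows.
lengths = {name: stop - start for name, start, stop, _ in APC_AMPLICONS}
print(lengths)
```

An empty conflict list means that no two overlapping APC amplicons are amplified in the same multiplex reaction.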
Re-sequencing primers.
| primer name | primer sequence (5'-3') | Chr. Numb. | Primer 5' Start | Primer 3' Stop |
|---|---|---|---|---|
| APC_3676-4192_F | | 5 | 112839246 | 112839269 |
| APC_3676-4192_R | | 5 | 112839806 | 112839786 |
| APC_3968-4425_F | | 5 | 112839540 | 112839556 |
| APC_3968-4425_R | | 5 | 112840040 | 112840024 |
| APC_4296-4804_F | | 5 | 112839869 | 112839889 |
| APC_4296-4804_R | | 5 | 112840421 | 112840398 |
| CTNNB1_14-241_F | | 3 | 41224400 | 41224419 |
| CTNNB1_14-241_R | | 3 | 41224771 | 41224790 |
| SMAD4_250-424_F | | 18 | 51048594 | 51048615 |
| SMAD4_250-424_R | | 18 | 51049096 | 51049121 |
| FBXW7_1645-1855_F | | 4 | 152324496 | 152324476 |
| FBXW7_1645-1855_R | | 4 | 152324128 | 152324105 |
| KRAS_F | GGCCTGCTGAAAATGACTGA | 12 | 25245396 | 25245376 |
| KRAS_R_Biotin | biotin-AGCTGTATCGTCAAGGCACTCT | 12 | 25245316 | 25245338 |
| KRAS_seq | AAACTTGTGGTAGTTGGA | 12 | 25245372 | 25245354 |
| NRAS_F | GATTCTTACAGAAAACAAGTGGTTATAGAT | 1 | 114713978 | 114713948 |
| NRAS_R_Biotin | biotin-GCAAATACACAGAGGAAGCCTTCG | 1 | 114713841 | 114713865 |
| NRAS_seq | CTGTTTGTTGGACATACTG | 1 | 114713940 | 114713921 |
| BRAF_F | TGAAGACCTCACAGTAAAAATAGG | 7 | 140753380 | 140753356 |
| BRAF_R_Biotin | biotin-TCCAGACAACTGTTCAAACTGAT | 7 | 140753289 | 140753312 |
| BRAF_seq | TGATTTTGGTCTAGCTACA | 7 | 140753356 | 140753337 |
List of primers used to validate GS Junior sequencing results. The sequences of the universal 5' M13 tags are underlined. The positions of the primers refer to the GRCh38/hg38 genome. Chr. Numb.: chromosome number; F: forward; R: reverse; Biotin: 5' biotin tag; seq: sequencing primer.
Figure 1. Sequencing depth. Average number of reads grouped by sample (A) and by amplicon (B). Sample types are color-coded and the names of the amplicons are shown on the lower horizontal axis. Please note the logarithmic vertical scale.
Figure 2. Frequency of homopolymer miscalls. Frequency of false positive insertion or deletion variant calls at homopolymer regions of different lengths (A, black line). The frequency of homopolymers of different lengths in the target region is shown by the grey line. The probability of homopolymer miscalling was calculated by dividing the frequency of false positive calls by the frequency of homopolymers (B). Sequencing results of normal control samples that were considered to be free of these mutations.
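The ratio described in the Figure 2 caption can be illustrated with a short sketch. All counts below are invented placeholders, not values from the study, and the function name `miscall_probability` is hypothetical.

```python
def miscall_probability(false_positive_calls, homopolymer_counts):
    """Per homopolymer length: false positive indel calls / homopolymer occurrences.

    Lengths that never occur in the target region are skipped to avoid
    dividing by zero; lengths without false positive calls yield 0.0.
    """
    return {
        length: false_positive_calls.get(length, 0) / count
        for length, count in homopolymer_counts.items()
        if count > 0
    }


# Hypothetical counts keyed by homopolymer length (NOT data from the paper).
example_homopolymers = {3: 200, 4: 80, 5: 20, 6: 5}
example_false_positives = {4: 2, 5: 3, 6: 2}

for length, prob in sorted(
    miscall_probability(example_false_positives, example_homopolymers).items()
):
    print(f"homopolymer length {length}: miscall probability {prob:.3f}")
```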
All variants detected by the GS Junior panel.
| sample type | sample number | mutations (gene symbol, cDNA position, type, percentage in the sample) | other variants | ||
|---|---|---|---|---|---|
| NAT | 1 | APC 3871:C/T (59%) [germ] | |||
| AD-LGD | 9 | KRAS 38:G/A (31%) | |||
| AD-LGD | 10 | APC 4314:A/- (69%) | KRAS 35:G/T (36%) | ||
| AD-LGD | 11 | KRAS 35:G/C (47%) | APC SNP 4479:G/A (100%) | ||
| AD-LGD | 12 | FBXW7 1513:C/T (40%) | KRAS 35:G/A (49%) | APC SNP 4479:G/A (47%) | |
| AD-LGD | 13 | KRAS 35:G/A (31%) | APC SNP 4479:G/A (47%) | ||
| AD-LGD | 14 | CTNNB1 134:C/T (16%) | APC SNP 4479:G/A (100%) | ||
| AD-LGD | 17 | APC 3927:AAAGA/----- (23%) | |||
| AD-LGD | 20 | APC 3964:G/T (21%) | APC SNP 4479:G/A (100%) | ||
| AD-LGD | 21 | APC 4241.:-/T (40%) | APC SNP 4479:G/A (61%) | ||
| AD-LGD | 22 | APC 4033:G/T (38%) | APC SNP 4479:G/A (100%) | ||
| AD-LGD | 24 | APC SNP 4479:G/A (41%) | |||
| AD-HGD | 25 | SMAD4 1528:G/A (47%) | APC SNP 4479:G/A (79%) | ||
| AD-HGD | 26 | BRAF 1799:T/A (33%) | APC SNP 4479:G/A (54%) | ||
| AD-HGD | 27 | APC SNP 4479:G/A (99%) | |||
| AD-HGD | 28 | APC 3925:G/T (26%) | APC SNP 4479:G/A (50%) | ||
| AD-HGD | 29 | APC 3927:AAAGA/----- (55%) | KRAS 35:G/C (29%) | APC SNP 4479:G/A (51%) | |
| AD-HGD | 30 | APC 3927:AAAGA/----- (87%) | KRAS 35:G/A (35%) | SMAD4 400:G/A (82%) | |
| AD-serr | 31 | APC SNP 4479:G/A (47%) | |||
| AD-serr | 33 | BRAF 1799:T/A (29%) | APC SNP 4479:G/A (51%) | ||
| AD-serr | 34 | BRAF 1799:T/A (4%) | APC SNP 4479:G/A (73%) | ||
| AD-serr | 37 | APC 4348:C/T (20%) | FBXW7 1745:C/T (3%) | TP53 845:G/A (6%) | |
| AD-serr | 38 | APC 4348:C/T (4%) | FBXW7 1745:C/T (9%) | TP53 845:G/A (4%) | |
| AD-serr | 39 | CTNNB1 47:C/T (21%) # | TP53 845:G/A (8%) | ||
| AD-serr | 40 | FBXW7 1745:C/T (2%) | TP53 845:G/A (11%) | ||
| AD-serr | 41 | FBXW7 1745:C/T (5%) | TP53 845:G/A (4%) | ||
| CRC | 42 | APC 4135:G/T (73%) + | FBXW7 1740:C/G (22%) [new] + | KRAS 35:G/A (15%) * | |
| CRC | 43 | APC 3927:AAAGA/----- (74%) + | SMAD4 413:C/G (70%) [new] + | TP53 711:G/C (59%) | |
| CRC | 45 | FBXW7 1393:C/T (41%) | FBXW7 1745:C/T (56%) | ||
| CRC | 46 | TP53 814:G/A (23%) | APC SNP 4479:G/A (31%) | ||
| CRC | 47 | APC SNP 4479:G/A (100%) | |||
| CRC | 48 | TP53 733:G/A (12%) | APC SNP 4479:G/A (50%) | ||
| CRC | 49 | APC 3916:G/T (41%) + | APC SNP 4479:G/A (13%) + | ||
| CRC | 50 | NRAS 181:C/A (19%) * | SMAD4 1569:C/G (8%) | TP53 1538:C/T (45%) | APC SNP 4479:G/A (100%) + |
| CRC | 51 | APC 3927:AAAGA/----- (16%) + | FBXW7 1745:C/T (18%) | TP53 845:G/A (18%) | APC SNP 4479:G/A (100%) + |
| CRC | 52 | TP53 845:G/A (13%) | APC SNP 4479:G/A (49%) | ||
| CRC | 53 | KRAS 35:G/T (14%) * | TP53 814:G/A (13%) | APC SNP 4479:G/A (48%) + | |
| CRC | 54 | APC SNP 4479:G/A (97%) | |||
| CRC | 55 | APC 3915:A/- (20%) + | KRAS 35:G/A (13%) # ! | KRAS 38:G/A (6%) * + | APC SNP 4479:G/A (47%) + |
| CRC | 56 | KRAS 35:G/A (43%) * | APC SNP 4479:G/A (74%) + | ||
| CRC | 57 | KRAS 35:G/T (17%) * | SMAD4 1051:G/C (18%) | TP53 845:G/A (14%) | APC SNP 4479:G/A (32%) + |
| CRC | 58 | BRAF 1799:T/A (35%) * | TP53 818:G/A (66%) | TP53 845:G/A (9%) | APC SNP 4479:G/A (95%) + |
| cell line | 59 | SMAD4 1051:G/C (100%) | APC SNP 4479:G/A (100%) | ||
| cell line | 60 | BRAF 1799:T/A (31%) | TP53 818:G/A (100%) | APC SNP 4479:G/A (100%) | |
+ : variants that have been validated by Sanger sequencing with a concordant result. * : variants that have been validated by PyroMark pyrosequencing with a concordant result. # : variants that have been validated by Sanger sequencing with a different result. ! : variants that have been validated by PyroMark pyrosequencing with a different result. [new]: potential new mutation/not described in COSMIC or HGMD databases. [germ]: suspected germline mutation. NAT: normal adjacent tumor; AD-LGD: low-grade adenoma; AD-HGD: high-grade adenoma; AD-serr: serrated adenoma; CRC: colorectal cancer.
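For downstream tabulation, the variant strings in the table above can be split into their parts. The following is a minimal sketch, assuming every cell follows the pattern "GENE [SNP] position:ref/alt (percent%) flags" as seen in the table; the regular expression, the `Variant` class and `parse_variant` are illustrative and not part of the authors' pipeline.

```python
import re
from dataclasses import dataclass, field

# Pattern inferred from the table cells, e.g. "KRAS 35:G/A (31%) *".
# An optional "." after the position is tolerated because one cell
# in the table is written as "4241.:-/T".
VARIANT_RE = re.compile(
    r"^(?P<gene>[A-Z0-9]+)\s+"
    r"(?P<snp>SNP\s+)?"
    r"(?P<pos>\d+)\.?:"
    r"(?P<ref>[ACGT-]+)/(?P<alt>[ACGT-]+)\s+"
    r"\((?P<pct>\d+)%\)"
    r"\s*(?P<flags>.*)$"
)


@dataclass
class Variant:
    gene: str
    position: int      # cDNA position as printed in the table
    ref: str           # "-" characters denote inserted or deleted bases
    alt: str
    percent: int       # percentage of the variant in the sample
    is_snp: bool
    flags: list = field(default_factory=list)  # e.g. "+", "*", "#", "!", "[new]"


def parse_variant(cell):
    """Parse one table cell into a Variant; raise ValueError if it does not match."""
    match = VARIANT_RE.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognised variant string: {cell!r}")
    return Variant(
        gene=match["gene"],
        position=int(match["pos"]),
        ref=match["ref"],
        alt=match["alt"],
        percent=int(match["pct"]),
        is_snp=match["snp"] is not None,
        flags=match["flags"].split(),
    )


print(parse_variant("FBXW7 1740:C/G (22%) [new] +"))
print(parse_variant("APC SNP 4479:G/A (100%) +"))
```

Keeping the validation symbols as a list of flags makes it straightforward to filter, for example, variants confirmed by Sanger sequencing ("+") or flagged as discordant ("#", "!").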
Figure 5. Novel mutations discovered by the panel. The FBXW7 c.1740C>G missense substitution (A) and the SMAD4 c.413C>G nonsense mutation (B) were re-sequenced using a Sanger sequencing protocol. The name of the variant, the sample and the allele frequency obtained by the GS Junior instrument are indicated above the electropherograms. Variable positions are marked with red arrows. The FBXW7 mutation causes a His580Gln amino acid change in the protein sequence, while the SMAD4 mutation creates a stop codon, so the product is a truncated protein. FW: forward sequencing primer; Rev: reverse sequencing primer.
Figure 7. The two false positive mutations identified by the multiplex NGS panel. AVA global alignment of the false positive KRAS (A) and CTNNB1 (B) mutations after GS Junior sequencing. Sequencing depth is illustrated with blue lines, while the allele frequency of variants is visualized as green columns. Consensus sequences of the reads are also shown in the global alignment. AVA detected two KRAS mutations (35G>A and 38G>A) in the CRC_55 sample (A, red arrows), but only 38G>A proved to be valid by Sanger sequencing (C) and pyrosequencing (D). Seven CTNNB1 variants appeared in the AVA global alignment of the AD-serr_39 sample, but only the 47C>T variant passed the AVA criteria and was annotated as a mutation (B, red arrow). Even this variant could not be confirmed by Sanger re-sequencing (E, black arrow), so CTNNB1 47C>T was also considered a false mutation call. Black vertical arrows point at wild type peaks, while red arrows show the peaks or positions of mutations detected by the corresponding method.
Figure 3. The number of mutations per sample in different pathological groups detected with the multiplex panel. The number of mutations (A) and the number of samples bearing multiple mutations (B) increase along the adenoma-CRC sequence (red: samples with 3 mutations; yellow: samples with 2 mutations; green: samples with 1 mutation). A similar tendency is visible in the number of mutations (C) and the proportion of samples with multiple mutations (D) in the different types of adenomas. NAT: normal adjacent tumor samples; AD-LGD: low-grade adenoma samples; AD-HGD: high-grade adenoma samples; AD-serr: serrated adenoma samples; CRC: colorectal cancer samples.
Figure 4. Mutation frequencies in colorectal cancer and different stages of adenomas. The figure shows the proportion of samples in each pathological group bearing at least one mutation in a given gene. NAT: normal adjacent tumor samples; CRC: colorectal cancer samples; AD-LGD: low-grade adenoma samples; AD-HGD: high-grade adenoma samples; AD-serr: serrated adenoma samples.
Figure 6. Re-sequencing by pyrosequencing. KRAS (A), BRAF (B) and NRAS (C) point mutations of CRC samples were validated by pyrosequencing. The name of the variant, the sample and the allele frequency detected by the GS Junior instrument are indicated together with the short sequence to analyze. The 5'-3' direction of the sequencing primer and the positions of the variable nucleotide peaks are marked with black horizontal and red vertical arrows, respectively. Black vertical arrows point at wild type peaks of the variable nucleotides, while red arrows show abnormal peaks denoting the presence of a mutation. Wt: wild type sequence.