Ann-Marie Patch¹, Katia Nones¹, Stephen H Kazakoff¹, Felicity Newell¹, Scott Wood¹, Conrad Leonard¹, Oliver Holmes¹, Qinying Xu¹, Venkateswar Addala¹, Jenette Creaney²,³, Bruce W Robinson²,³, Shujin Fu⁴, Chunyu Geng⁴, Tong Li⁴, Wenwei Zhang⁴, Xinming Liang⁴, Junhua Rao⁴, Jiahao Wang⁴, Mingyu Tian⁴, Yonggang Zhao⁴, Fei Teng⁴, Honglan Gou⁴, Bicheng Yang⁴, Hui Jiang⁴, Feng Mu⁴, John V Pearson¹, Nicola Waddell¹.
Abstract
Technological innovation and increased affordability have contributed to the widespread adoption of genome sequencing technologies in biomedical research. In particular, large cancer research consortia have embraced next-generation sequencing and have used the technology to define the somatic mutation landscape of multiple cancer types. These studies have primarily utilised the Illumina HiSeq platforms. In this study we performed whole genome sequencing of three malignant pleural mesothelioma samples and their matched normal samples using a new platform, the BGISEQ-500, and compared the results with those obtained using the Illumina HiSeq X Ten. Germline and somatic single nucleotide variants (SNVs) and small insertions or deletions (indels) were independently identified from data aligned to the human genome reference. For both the BGISEQ-500 and the HiSeq X Ten, germline calls showed high concordance (>99%) with genotypes from SNP arrays. The germline and somatic SNVs identified by the two sequencing platforms were highly concordant (86% and 72%, respectively). These results indicate the potential applicability of the BGISEQ-500 platform for the identification of somatic and germline SNVs by whole genome sequencing. The BGISEQ-500 datasets described here represent the first publicly available cancer genome sequencing performed using this platform.
Year: 2018 PMID: 29320538 PMCID: PMC5761881 DOI: 10.1371/journal.pone.0190264
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Average genome read depth using BGISEQ-500 and HiSeq X Ten data.
The average whole-genome sequencing read depth for each platform (blue, BGISEQ-500; yellow, HiSeq X Ten) is displayed for each tumour (T) and normal (N) sample from three mesothelioma patients (9869, 11202 and 11398). Prior to variant calling, sequence reads underwent quality filtering, after which the average read depth remained similar between the sequencing platforms. This filtered depth is the more relevant measure because it represents the ‘usable’ portion of the sequencing data for detecting variants. The average quality-filtered read depth is indicated by the shaded bar.
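The caption distinguishes raw from quality-filtered ("usable") depth. As a minimal sketch, assuming average depth is total aligned bases divided by genome length, and assuming a hypothetical filter that drops duplicate and low-mapping-quality reads (the caption does not state which filters were used), the two measures differ as follows. The `Read` class, `passes_filter` and its `min_mapq=20` threshold are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Read:
    aligned_bases: int   # bases aligned to the reference
    mapq: int            # mapping quality
    is_duplicate: bool   # marked as a PCR/optical duplicate


def average_depth(reads, genome_length):
    """Average read depth: total aligned bases divided by genome length."""
    return sum(r.aligned_bases for r in reads) / genome_length


def passes_filter(read, min_mapq=20):
    """Hypothetical quality filter: drop duplicates and low-MAPQ reads."""
    return not read.is_duplicate and read.mapq >= min_mapq


# Toy data: four reads over a 100 bp "genome".
reads = [
    Read(150, 60, False),
    Read(150, 60, True),   # duplicate, removed by the filter
    Read(150, 5, False),   # low MAPQ, removed by the filter
    Read(100, 40, False),
]
raw_depth = average_depth(reads, genome_length=100)
usable_depth = average_depth([r for r in reads if passes_filter(r)], genome_length=100)
print(raw_depth, usable_depth)  # 5.5 2.5
```

In real pipelines depth is usually computed per position from the alignments (for example with `samtools depth`) rather than from read totals, but the ratio above is the same quantity averaged over the genome.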
The percent concordance of germline genotypes ascertained by SNP arrays compared to the BGISEQ-500 and HiSeq X Ten data.
| Patient | SNP array vs BGISEQ-500 (%) | SNP array vs HiSeq X Ten (%) |
|---|---|---|
| 9869 | 99.797 | 99.789 |
| 11202 | 99.794 | 99.794 |
| 11398 | 99.797 | 99.795 |
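Percent genotype concordance of this kind can be sketched as the share of sites genotyped by both the SNP array and sequencing whose genotypes agree. This is a minimal illustration, assuming sites keyed by a hypothetical `(chrom, pos)` tuple, diploid genotypes stored as allele pairs, and only sites present in both datasets counted; the study's exact site selection is not described here.

```python
def normalise(genotype):
    """Treat genotypes as unordered allele pairs, so ("G", "A") == ("A", "G")."""
    return tuple(sorted(genotype))


def genotype_concordance(array_calls, seq_calls):
    """Percent of sites genotyped by both the SNP array and sequencing whose genotypes agree.

    Both arguments map a site key (chrom, pos) to a genotype tuple of two alleles.
    """
    shared = array_calls.keys() & seq_calls.keys()
    if not shared:
        raise ValueError("no sites genotyped by both methods")
    matches = sum(
        1 for site in shared if normalise(array_calls[site]) == normalise(seq_calls[site])
    )
    return 100.0 * matches / len(shared)


array = {("1", 100): ("A", "G"), ("1", 200): ("C", "C"), ("2", 50): ("T", "T"),
         ("2", 75): ("G", "G"), ("3", 10): ("A", "A")}
seq = {("1", 100): ("G", "A"), ("1", 200): ("C", "T"), ("2", 50): ("T", "T"),
       ("2", 75): ("G", "G")}
print(genotype_concordance(array, seq))  # 75.0
```

Here the site on chromosome 3 is absent from the sequencing calls and is ignored; three of the four shared sites agree.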
Number of germline and somatic variants identified in three mesothelioma samples using whole genome sequencing.
The percentage of germline variants identified in this study that are also reported in European population data from gnomAD is given in brackets.
| Variant class | Category | SNV: 9869 | SNV: 11202 | SNV: 11398 | SNV: all patients | Indel: 9869 | Indel: 11202 | Indel: 11398 | Indel: all patients |
|---|---|---|---|---|---|---|---|---|---|
| Germline | Identified in both platforms | 3,033,980 (96.8%) | 3,146,317 (96.8%) | 3,092,543 (96.8%) | 9,272,840 (96.8%) | 193,359 (91.7%) | 190,436 (91.8%) | 185,905 (92%) | 569,700 (91.8%) |
| Germline | HiSeq X Ten only | 313,015 (42.3%) | 321,627 (42.3%) | 407,966 (41.9%) | 1,042,608 (42.1%) | 33,143 (58.5%) | 35,253 (58.4%) | 41,480 (59.2%) | 109,876 (58.7%) |
| Germline | BGISEQ-500 only | 161,128 (4%) | 118,336 (2.4%) | 92,050 (4.1%) | 371,514 (3.55%) | 7,025 (13.8%) | 6,931 (13.8%) | 5,789 (11.6%) | 19,745 (13.1%) |
| Germline | Total | 3,508,123 | 3,586,280 | 3,592,559 | 10,686,962 | 233,527 | 232,620 | 233,174 | 699,321 |
| Somatic | Identified in both platforms | 3,554 | 2,342 | 1,955 | 7,851 | 197 | 168 | 114 | 479 |
| Somatic | HiSeq X Ten only | 697 | 424 | 411 | 1,532 | 135 | 93 | 78 | 306 |
| Somatic | BGISEQ-500 only | 540 | 474 | 493 | 1,507 | 102 | 156 | 229 | 487 |
| Somatic | Total | 4,791 | 3,240 | 2,859 | 10,890 | 434 | 417 | 421 | 1,272 |
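The concordance figures quoted in the abstract can be checked against the combined counts in this table. The sketch below assumes that concordance means calls made by both platforms divided by all calls made by either platform; that definition is an assumption, not stated in the table. Under it, germline SNVs come out at about 86.8% and somatic SNVs at about 72.1%, close to the 86% and 72% in the abstract (which may round or define the denominator slightly differently).

```python
# Combined counts for all three patients, copied from the table above.
counts = {
    ("germline", "SNV"): {"both": 9_272_840, "hiseq_only": 1_042_608, "bgiseq_only": 371_514},
    ("germline", "indel"): {"both": 569_700, "hiseq_only": 109_876, "bgiseq_only": 19_745},
    ("somatic", "SNV"): {"both": 7_851, "hiseq_only": 1_532, "bgiseq_only": 1_507},
    ("somatic", "indel"): {"both": 479, "hiseq_only": 306, "bgiseq_only": 487},
}


def concordance(c):
    """Assumed definition: calls made by both platforms / all calls made by either."""
    total = c["both"] + c["hiseq_only"] + c["bgiseq_only"]
    return 100.0 * c["both"] / total


for (variant_class, variant_type), c in counts.items():
    print(f"{variant_class} {variant_type}: {concordance(c):.1f}%")
```

The same calculation gives about 81.5% for germline indels and 37.7% for somatic indels, showing that indels agree less well between platforms than SNVs.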
Fig 2. Germline variants identified in three mesothelioma samples (patients: 9869, 11202 and 11398) using BGISEQ-500 and HiSeq X Ten data.
The number of germline SNVs (a) and indels (b) identified in each patient using the BGISEQ-500 and HiSeq X Ten platforms. We investigated germline SNVs (c) and indels (d) that were called by only one platform; these fall into three categories: i) identified as germline in the other platform but with low evidence; ii) identified in the other platform but predicted as a somatic variant; or iii) not identified in the other platform. Across the three patients, only 197,434 SNVs (1.85% of the total) were truly unique to the HiSeq X Ten and not identified in the BGISEQ-500 data (c). Similarly, only 38,236 SNVs (0.36% of the total) were truly unique to the BGISEQ-500 and not called in the HiSeq X Ten data (c). The same pattern was observed for indels (d): only 3.23% were unique to the HiSeq X Ten and 0.19% to the BGISEQ-500.
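The three-way breakdown of single-platform calls can be expressed as a small decision procedure. This is a sketch under stated assumptions: variant keys such as `(chrom, pos, ref, alt)`, the `other_calls` mapping of key to `(class, high_confidence)`, and the names `Category` and `classify_unique` are all hypothetical and do not describe the authors' pipeline. The last lines reproduce the caption's percentages, assuming "the total" is the 10,686,962 germline SNVs from the table.

```python
from enum import Enum


class Category(Enum):
    LOW_EVIDENCE = "i"      # same class in the other platform, but low evidence
    OTHER_CLASS = "ii"      # called in the other platform, but as the other class
    NOT_IDENTIFIED = "iii"  # absent from the other platform's calls


def classify_unique(key, expected_class, other_calls):
    """Assign a call made by only one platform to one of the three categories.

    key: hypothetical variant key, e.g. (chrom, pos, ref, alt)
    expected_class: "germline" or "somatic", the class in the calling platform
    other_calls: the other platform's calls, key -> (class, high_confidence)
    """
    if key not in other_calls:
        return Category.NOT_IDENTIFIED
    other_class, high_confidence = other_calls[key]
    if other_class != expected_class:
        return Category.OTHER_CLASS
    if high_confidence:
        raise ValueError(f"{key} is a confident call in both platforms, not a unique one")
    return Category.LOW_EVIDENCE


hiseq_calls = {
    ("1", 1000, "A", "G"): ("germline", False),  # present but low evidence
    ("1", 2000, "C", "T"): ("somatic", True),    # present but called somatic
}
for key in [("1", 1000, "A", "G"), ("1", 2000, "C", "T"), ("2", 500, "G", "A")]:
    print(key, classify_unique(key, "germline", hiseq_calls).name)

# "Truly unique" (category iii) SNVs as a share of all germline SNVs in the table.
total_germline_snvs = 10_686_962
print(f"{100 * 197_434 / total_germline_snvs:.2f}%")  # HiSeq X Ten only: 1.85%
print(f"{100 * 38_236 / total_germline_snvs:.2f}%")   # BGISEQ-500 only: 0.36%
```

Passing `"somatic"` as `expected_class` gives the analogous breakdown used for somatic calls.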
Fig 3. Sequence coverage of germline variants, and length of indels, identified by only one sequencing platform.
Read depth in the HiSeq X Ten (Illumina) data for variants unique to the BGISEQ-500 (a), and read depth in the BGISEQ-500 (BGI) data for variants unique to the HiSeq X Ten (b). The distribution of lengths (number of bases) of indels identified by both sequencing platforms, or unique to the HiSeq X Ten or BGISEQ-500 data, is plotted (c).
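An indel length distribution like the one in panel (c) can be sketched by taking each indel's length as the difference in allele lengths and tallying it per category. This assumes simple VCF-style indels with a shared anchor base (e.g. `A` to `AT`); complex substitutions would need a different rule. The category names and toy records are hypothetical.

```python
from collections import Counter


def indel_length(ref, alt):
    """Indel length in bases for a VCF-style record with a shared anchor base."""
    return abs(len(alt) - len(ref))


def length_distributions(indels_by_category):
    """Map each category ("both", "hiseq_only", "bgiseq_only") to a Counter of indel lengths."""
    return {
        category: Counter(indel_length(ref, alt) for ref, alt in indels)
        for category, indels in indels_by_category.items()
    }


toy = {
    "both": [("A", "AT"), ("ATG", "A"), ("G", "GC")],
    "hiseq_only": [("C", "CTTT")],
    "bgiseq_only": [("TAA", "T")],
}
dists = length_distributions(toy)
print(dists["both"])  # Counter({1: 2, 2: 1})
```

Using the absolute difference pools insertions and deletions of the same size; keeping the sign of `len(alt) - len(ref)` would separate them if needed.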
Fig 4. Somatic variants in mesothelioma patients identified using BGISEQ-500 and HiSeq X Ten data.
A summary of the somatic variants identified in three mesothelioma patient samples (patient IDs: 9869, 11202 and 11398) using the two sequencing platforms. The number of somatic SNVs (a) and indels (b) identified using the BGISEQ-500 and HiSeq X Ten platforms in each patient. Somatic SNVs (c) and indels (d) that were called by only one platform fall into three categories: i) identified as somatic in the other platform but with low evidence; ii) identified in the other platform but predicted as a germline variant; or iii) not identified in the other platform.
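The per-platform counts in panels (a) and (b) rest on splitting two call sets into shared and platform-specific subsets. A minimal sketch, assuming each call is reduced to a hypothetical exact-match key `(chrom, pos, ref, alt)`; in practice indels usually need to be normalised (for example left-aligned) first, because the same event can be written differently by different pipelines.

```python
def partition_calls(bgiseq_calls, hiseq_calls):
    """Split two sets of variant keys into shared and platform-specific subsets."""
    return {
        "both": bgiseq_calls & hiseq_calls,
        "hiseq_only": hiseq_calls - bgiseq_calls,
        "bgiseq_only": bgiseq_calls - hiseq_calls,
    }


# Toy somatic calls keyed by (chrom, pos, ref, alt).
bgiseq = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 300, "G", "GA")}
hiseq = {("1", 100, "A", "G"), ("2", 300, "G", "GA"), ("3", 400, "T", "C")}
parts = partition_calls(bgiseq, hiseq)
print({name: len(keys) for name, keys in parts.items()})
# {'both': 2, 'hiseq_only': 1, 'bgiseq_only': 1}
```

The single-platform subsets are the inputs that would then be examined for the three categories described in the caption.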