Julian Gruendner¹, Nicolas Wolf¹, Lars Tögel², Florian Haller², Hans-Ulrich Prokosch¹, Jan Christoph¹.
Abstract
BACKGROUND: The introduction of next-generation sequencing (NGS) into molecular cancer diagnostics has led to an increase in the data available for the identification and evaluation of driver mutations and for defining personalized cancer treatment regimens. The meaningful combination of omics data, ie, pathogenic gene variants and alterations, with other patient data to understand the full picture of malignancy has been challenging.
Keywords: Fast Healthcare Interoperability Resources; GEnome MINIng; data analysis; data standardization; genetic databases; next-generation sequencing
Year: 2020 PMID: 33026356 PMCID: PMC7578821 DOI: 10.2196/19879
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Figure 1. Genomics pipeline and analysis system. NGS: next-generation sequencing; VCF: variant call format.
Figure 2. Part 1: Next-generation sequencing processing and analysis pipeline architecture. GUI: graphical user interface; VCF: variant call format; GEMINI: GEnome MINIng.
Figure 3. GEMINI pipeline schema. SnpEff: single nucleotide polymorphism effect; GEMINI: GEnome MINIng.
Figure 4. Exemplified user interface.
Figure 5. Combining genomics and patient data. VCF: variant call format; GEMINI: GEnome MINIng; FHIR: Fast Healthcare Interoperability Resources.
Comparison of the statistical evaluation of the 1000 Genomes Project and GEMINI (GEnome MINIng) pipeline.
| Mutations | 1000 Genomes Project (n) | GEMINI (n) |
| | | |
| Total variants | ~1,300,000 | 1,275,275 |
| Average per sample | ~105,000 | 104,757 |
| | | |
| Total variants | ~59,000 | 59,157 |
| Average per sample | ~13,000 | 12,715 |
| | | |
| Total variants | 432 | 432 |
| Average per sample | 26 | 26 |
Comparison of the filtered results of the mutations in the GEMINI (GEnome MINIng) and Illumina pipeline.
| Chromosome | Position | Codon change (according to the Human Genome Variation Society coding) | Illumina | GEMINI |
| 1 | 40366658 | c.539A>G | ✓ | ✓ |
| 1 | 40366659 | c.537_538insCG | ✓ | ✓ |
| 2 | 215632255 | c.1518_1519delTGinsCA | ✓ | ✓ |
| 3 | 12645693 | c.776C>G | ✓ | ✓ |
| 4 | 106156187 | c.1088C>T | ✓ | |
| 5 | 112178795 | c.7504G>A | ✓ | ✓ |
| 5 | 176520270 | c.1189_1190delGGinsAC | ✓ | ✓ |
| 5 | 176522605 | c.1702_1704delCCAinsGCC | ✓ | ✓ |
| 7 | 116411990 | c.3029C>T | ✓ | ✓ |
| 9 | 21970916 | c.442G>A | | ✓ |
| 16 | 3778424 | c.6624A>C | ✓ | ✓ |
| 16 | 3779338 | c.5709dupG | ✓ | ✓ |
| 16 | 3779338 | c.5709delG | ✓ | ✓ |
| 16 | 3779361 | c.5687A>C | ✓ | ✓ |
| 17 | 7579408 | c.277_278delCT | ✓ | ✓ |
| 22 | 41546158 | c.2773C>A | ✓ | ✓ |
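The agreement between the two pipelines in the table above can be summarized with a short script. This is only an illustrative sketch, not part of the published pipeline: the function name `concordance` and the tuple layout are assumptions, and the two rows with a single check mark are read here as "Illumina only" (chromosome 4) and "GEMINI only" (chromosome 9), which is an interpretation of the table layout.

```python
# Rows copied from the comparison table:
# (chromosome, position, HGVS coding change, found_by_illumina, found_by_gemini)
# Assumption: the single check marks on the chr 4 and chr 9 rows are read as
# "Illumina only" and "GEMINI only", respectively.
ROWS = [
    ("1", 40366658, "c.539A>G", True, True),
    ("1", 40366659, "c.537_538insCG", True, True),
    ("2", 215632255, "c.1518_1519delTGinsCA", True, True),
    ("3", 12645693, "c.776C>G", True, True),
    ("4", 106156187, "c.1088C>T", True, False),
    ("5", 112178795, "c.7504G>A", True, True),
    ("5", 176520270, "c.1189_1190delGGinsAC", True, True),
    ("5", 176522605, "c.1702_1704delCCAinsGCC", True, True),
    ("7", 116411990, "c.3029C>T", True, True),
    ("9", 21970916, "c.442G>A", False, True),
    ("16", 3778424, "c.6624A>C", True, True),
    ("16", 3779338, "c.5709dupG", True, True),
    ("16", 3779338, "c.5709delG", True, True),
    ("16", 3779361, "c.5687A>C", True, True),
    ("17", 7579408, "c.277_278delCT", True, True),
    ("22", 41546158, "c.2773C>A", True, True),
]


def concordance(rows):
    """Count variants reported by both pipelines or by only one of them."""
    both = sum(1 for r in rows if r[3] and r[4])
    illumina_only = [r[:3] for r in rows if r[3] and not r[4]]
    gemini_only = [r[:3] for r in rows if r[4] and not r[3]]
    return {
        "total": len(rows),
        "both": both,
        "illumina_only": illumina_only,
        "gemini_only": gemini_only,
        "shared_fraction": both / len(rows),
    }


summary = concordance(ROWS)
print(summary["both"], "of", summary["total"], "variants reported by both pipelines")
```

Under this reading, the discordant variants are listed explicitly in `illumina_only` and `gemini_only`, which makes them easy to review by hand.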
Example of a prepared data set.ᵃ
| Patient_ID | Gender | Date of birth | Disease | <prefix>ᵇ ref | <prefix>ᵇ alt | <prefix>ᵇ gts |
| 28 | male | 01.10.41 | 1 | G | C | C |
ᵃThis table shows example values only.
ᵇ
Format of the initial raw data set.ᵃ
| Patient_IDᵇ | Genderᶜ | Date of birthᵈ | Diagnosisᵉ (ICDᶠ-10 code) | Geneʰ |
| PSEUDO-ID-1 | Female | 01.01.50 | 2019-01-01T00:00:00+00:00 | |
| PSEUDO-ID-2 | Male | 01.01.50 | 2019-01-01T00:00:00+00:00 | |
| PSEUDO-ID-3 | Male | 01.01.50 | - | - |
| PSEUDO-ID-4 | Male | 01.01.50 | 2019-01-01T00:00:00+00:00 | - |
ᵃThe combined data set comprised 206 patients, 135 diagnoses, and 152 genes, so the values shown in this table are only examples; the entire data set cannot be represented here.
ᵇExample patient IDs.
ᶜExample genders.
ᵈExample patient birth dates.
ᵉExample values: C20 or C61, with the diagnosis timestamp. (One column per diagnosis; if a patient has no diagnosis, the column is empty in that patient's row.)
ᶠInternational Classification of Diseases.
ʰExample gene names. (One column per gene; if a patient has no mutation in a gene, the column is empty in that patient's row.)
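The wide layout described in the footnotes (one column per diagnosis and one per gene) is often easier to analyze after reshaping into one record per patient and non-empty cell. The following is a minimal sketch under stated assumptions: the helper `wide_to_long`, the column names `C20`, `C61`, and `TET2`, and the cell value `mutated` are hypothetical illustrations and are not taken from the actual data set.

```python
# Cells that count as "no value" in the wide table (empty or a dash).
EMPTY = {"", "-"}


def wide_to_long(rows, id_col, diagnosis_cols, gene_cols):
    """Turn one-column-per-diagnosis/gene rows into one record per non-empty cell."""
    records = []
    for row in rows:
        for kind, cols in (("diagnosis", diagnosis_cols), ("gene", gene_cols)):
            for col in cols:
                value = (row.get(col) or "").strip()
                if value not in EMPTY:
                    records.append(
                        {
                            "patient_id": row[id_col],
                            "kind": kind,
                            "name": col,
                            "value": value,
                        }
                    )
    return records


# Hypothetical example rows; column names and gene cell content are assumptions.
example_rows = [
    {"Patient_ID": "PSEUDO-ID-1", "C20": "2019-01-01T00:00:00+00:00", "C61": "", "TET2": "mutated"},
    {"Patient_ID": "PSEUDO-ID-3", "C20": "-", "C61": "-", "TET2": "-"},
]

long_records = wide_to_long(example_rows, "Patient_ID", ["C20", "C61"], ["TET2"])
for record in long_records:
    print(record)
```

Patients without any diagnosis or mutation simply contribute no records, which mirrors the empty cells in the wide table.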
Figure 6. Distribution (%) of the gene mutations by location. Y-axis: location (number of patients); x-axis: gene. For example, 80% for stomach for TET2 means that 4 of 5 patients with stomach cancer had a mutation in TET2.
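The percentages described in the Figure 6 caption can be reproduced with a small calculation: for each location, divide the number of patients carrying a mutation in a gene by the number of patients with that location. The sketch below assumes a simple list of patient records; the field names `location` and `genes`, the function name, and the example patients are hypothetical and only mirror the "4 of 5" example from the caption.

```python
from collections import defaultdict


def mutation_share_by_location(patients):
    """Return {location: {gene: percent of patients at that location with a mutation}}."""
    totals = defaultdict(int)
    mutated = defaultdict(lambda: defaultdict(int))
    for patient in patients:
        location = patient["location"]
        totals[location] += 1
        # Count each gene at most once per patient.
        for gene in set(patient["genes"]):
            mutated[location][gene] += 1
    return {
        location: {
            gene: 100.0 * count / totals[location]
            for gene, count in mutated[location].items()
        }
        for location in totals
    }


# Hypothetical patients matching the caption's example: 4 of 5 stomach
# cancer patients carry a TET2 mutation.
example_patients = [
    {"location": "stomach", "genes": ["TET2"]},
    {"location": "stomach", "genes": ["TET2"]},
    {"location": "stomach", "genes": ["TET2"]},
    {"location": "stomach", "genes": ["TET2"]},
    {"location": "stomach", "genes": []},
]

shares = mutation_share_by_location(example_patients)
print(shares["stomach"]["TET2"])
```

Because the denominator is the number of patients per location, the values for different genes at one location do not need to sum to 100%.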