Michaela Kuhlen, Julia Taeubner, Triantafyllia Brozou, Dagmar Wieczorek, Reiner Siebert, Arndt Borkhardt.
Abstract
The discovery of cancer-predisposing syndromes (CPSs) using next-generation sequencing (NGS) technologies is of increasing importance in pediatric oncology with regard to diagnosis, treatment, surveillance, family counselling and research. Recent studies indicate that a considerable percentage of childhood cancers are associated with CPSs. However, the ratio of CPSs caused by inherited vs. de novo mutations (DNMs), the risk of recurrence, and even the total number of genes that should be considered true cancer-predisposing genes are still unknown. In contrast to sequencing only single index patients, family-based NGS of the germline is a very powerful tool for providing unique insights into inheritance patterns (e.g., DNMs, parental mosaicism) and types of aberrations (e.g., SNV, CNV, indels, SV). Furthermore, functional perturbations of key cancer pathways (e.g., TP53, FA/BRCA) by at least two co-inherited heterozygous digenic mutations from each parent, together with currently unrecognized rare variants and unmeasured genetic interactions between common and rare variants, may be a widespread genetic phenomenon in the germline of affected children. Therefore, family-based trio sequencing has the potential to reveal a striking new landscape of inheritance in childhood cancer and to facilitate efforts to integrate individualized treatment strategies, including personalized and preventive medicine and cancer surveillance programs. As cancer genetics is becoming an increasingly common approach in modern oncology, trio sequencing should also be routinely integrated into pediatric oncology.
Year: 2018 PMID: 30305723 PMCID: PMC6755997 DOI: 10.1038/s41388-018-0520-9
Source DB: PubMed Journal: Oncogene ISSN: 0950-9232 Impact factor: 9.867
Overview of trio-NGS studies
| Aim of the study | Sample size | Tissue type | Type of sequencing (WGS or WES) | Analysis pipeline | Silent mutations | Main research findings | Reference |
|---|---|---|---|---|---|---|---|
| Determination of the contribution of post-zygotic events to | 50 parent-child trios | Peripheral blood | whole-genome sequencing (80-fold coverage for defining | Variants were called with CG software v.2.4. | not considered | 6.5 % of the presumed germline | [ |
| National study of the human genome in order to discover a complete set of variations between individual genomes. Trio approach to identify | 10 parent-child trios | Peripheral blood | whole-genome sequencing (high depth, 50×) | SNVs and short indels were called using the genome analysis toolkit (GATK) | not considered | They reported 536k novel SNVs and 283k novel short indels from mapping approaches and developed a population-wide | [ |
| The first direct comparative analysis of male and female germline mutation rates from the complete genome of two parent-child trios | 2 parent-child trios | Peripheral blood | whole-genome sequencing | Three different algorithms, the family-aware probabilistic Illumina-read–based method, the family-aware Illumina genotype-likelihood–based method and the sample-independent multiple technology genotype–based method were developed for DNM discovery | not considered | Identification of 49 and 35 germline | [ |
| Diagnostic exome sequencing was immediately successful in diagnosing patients in whom traditional technologies were uninformative. The data demonstrate the utility of family-based exome sequencing and analysis to obtain the highest reported detection rate in an unselected clinical cohort, illustrating the utility of diagnostic exome sequencing as a transformative technology for the molecular diagnosis of genetic disease | 500 parent-child trios | Peripheral blood | whole-exome sequencing | The sequence data were aligned to the reference human genome (GRCh37), and variant calls were generated using CASAVA (Consensus Assessment of Sequence And Variation, Illumina) and Pindel. The HGMD, the Single Nucleotide Polymorphism database, the 1000 Genomes Project, HapMap data, and online search engines like PubMed were used to search for previously described gene mutations and polymorphisms. Data were annotated with the Ambry Variant Analyzer tool, including nucleotide and amino acid conservation, biochemical nature of amino acid substitutions, population frequency (Exome Variant Server (National Heart, Lung, and Blood Institute Grand Opportunity Exome Sequencing Project) and the 1000 Genomes Project), and predicted functional impact (including PolyPhen and SIFT in silico prediction tools). Sequence alignments of the reads were viewed using IGV (Integrative Genomics Viewer) | synonymous variants were filtered, except those at the first and last nucleotide position of an exon | The diagnostic rate was significantly higher among families undergoing a trio (37%) as compared to a singleton (21%) whole-exome sequencing strategy. Overall, 30.4% (152/500) of patients undergoing WES data analysis had a positive gene finding in a characterized gene. Approximately 26% (130/500) received a definitive molecular diagnosis and 4.4% (22/500) received a likely positive result with relevant alterations detected in characterized genes. Among 416 patients who underwent novel gene analysis, 7.5% (31) were positive for a novel gene finding. The overall positive rate among all gene types was 38.5% (160/416). Uncertain findings in characterized genes were found in 8.8% of probands (44/500). Approximately half of all patients (52%) had no relevant gene findings (215/416) | [ |
| Analysis of 11,020 | 250 parent-child trios (231 trios, 11 families with monozygotic twins and 8 families with dizygotic twins) | Peripheral blood | whole-genome sequencing | Alignment and variant calling were devised on the basis of GATK best practices v2. Sequence data were mapped to the human reference genome Build 37 using bwa 0.5.9-r16, duplicate reads were removed using Picard tools, local indel realignment was performed around indels using GATK IndelRealigner and base qualities were recalibrated using GATK BaseQualityScoreRecalibration. Variants were called using GATK UnifiedGenotyper v1.4 on all samples simultaneously and filtered using GATK VariantQualityScoreRecalibration | Silent mutations were considered. Gene-level mutation rates, separately estimating synonymous, missense and nonsense mutation rates were additionally calculated. | The study shows that | [ |
| Identification of | 50 parent-child trios (patients with severe ID and their unaffected parents) | N/A | whole-genome sequencing (average genome-wide coverage of 80-fold) | | not considered | Severe intellectual disability (ID) occurs in 0.5% of newborns. 84 | [ |
| Trio approach to investigate mutational signature and differences between maternally and paternally derived DNMs. A data set of 7,216 autosomal | 816 parent-child trios | Peripheral blood | whole-genome (average genome-wide coverage of 60-fold) | cgatools calldiff program, to identify the parental origin of the DNM allele, phasing of the | not considered | Results show that the number of | [ |
| Identification and analyses of | 343 families (patient with ASD and at least one unaffected sibling) | Peripheral blood | whole-exome sequencing | Standard Illumina analysis pipeline (CASAVA), BWA for alignment, and GATK for refinements, SNV and indel variant caller: Multinomial Model | Silent mutations were considered, proband versus sibling at 40x coverage (53 to 42) | | [ |
| Small insertions and deletions (indels) and large structural variations (SVs) are major contributors to human genetic diversity and disease. Mutation rates and characteristics of de novo indels and SVs in the general population remain largely unexplored | 231 parent-child trios (11 quartets with monozygotic (MZ) twins, and eight quartets with dizygotic (DZ) twins, for a total of 258 genetically distinct children) | Peripheral blood | whole-genome, medium coverage (14.5x median sequence depth; 38.4x median physical depth of paired-end sequencing data combined with a family-based design) | Reads were aligned to the GRCh37/hg19 human genome reference using BWA 0.5.9-r164. Aligned data were processed following the Genome Analysis Toolkit (GATK) best practices v2. Duplicate reads were marked using Picard tools ( | not considered | This study reports 332 validated | [ |
| Identification of candidate genes for intellectual disability (ID). A meta-analysis on 2,637 | 2,104 parent-child trios (820 patients, 359 females, 461 males, IQ 50-70 and IQ < 30) | Peripheral blood | whole-exome sequencing (median coverage of 75×) | Variants were called using GATK UnifiedGenotyper (version 3.2-2) and annotated with a custom diagnostic annotation pipeline | not considered | Statistical analyses identified 10 new candidate genes (DLG4, PPM1D, RAC1, SMAD6, SON, SOX5, SYNCRIP, TCF20, TLK2 and TRIP12) that are associated with ID | [ |
| Identification of genetic risk factors and the role of | 175 parent-child trios | N/A | whole-exome sequencing | Data was processed with Picard, BWA for mapping and SNPs were called using GATK for all trios | 161 coding region point mutations (101 missense, 50 silent and 10 nonsense mutations) | Half of the patients (46.3%) carried a missense or nonsense | [ |
| Analysis of 21 | 20 parent-child trios (individuals with sporadic ASD and their parents) | Peripheral blood | whole-exome sequencing (sufficient coverage to call variants for ~90% of the primary target, 26.4 Mb) | The exome definition was based on consensus coding sequence (CCDS 2009) of the human reference genome (build36). BWA (0.5.6) reads mapping, consensus genotypes were generated using SAMtools, variant positions were pulled and filtered using the samtools.pl varFilter, variants were then run through a custom pipeline, Haystack, to identify Mendelian errors, possibly | not considered | In total, 21 | [ |
| Meta-analysis of 6,570 mutations showed that germline methylation influences mutation rates and is increased with paternal age in all families | 3 multi-sibling families | Peripheral blood | whole-genome sequencing (24.7-fold coverage on average) | DeNovoGear software | not considered | The mutation rate increases with paternal age in all families, but the number of additional mutations per year differed by more than two-fold between families; in the parental germline, 3.8% of mutations were mosaic, resulting in 1.3% of mutations being shared by siblings; an average of 64 DNMs (43–84) per child was observed; the average genome-wide mutation rate of 1.28 × 10⁻⁸ mutations per nucleotide per generation and the paternal-to-maternal mutation ratio (3.5) are slightly higher than, but compatible with, previous estimates; on average, the number of mutations in the child increases approximately linearly by 2.9 mutations with each additional year of parental age | [ |
| The aim of the study was to identify | 20 parent-child trios (children with intellectual disability and their parents from ten centres in Germany and Switzerland) | Peripheral blood | whole-exome sequencing (samples were sequenced as 100 bp paired-end runs on a HiSeq2000 system (Illumina). Pools of 12 indexed libraries were sequenced on four lanes) | To identify putative | Silent mutations were considered. The study detected on average 10,500 synonymous and 9,600 non-synonymous variants. The synonymous mutation rate was lower in cases compared to controls, whereas the average number of protein-altering (missense, nonsense, frameshift, and splice site) variants was significantly higher in the case group than in the control group | The study identified 87 | [ |
| Analyses of | 238 parent-child trios (928 individuals, 225 families (200 quartets, 25 trios)) | Peripheral blood | whole-exome sequencing | Short read sequences were aligned to hg18 with BWA, variants were predicted using SAMtools, the data was normalized across each family by only analyzing bases with at least 20 unique reads in all family members, to allow an accurate comparison between the | | This data demonstrates that non-synonymous | [ |
| Healthy families participating in the 1000 Genomes Project | N/A | whole-genome and whole-exome sequencing | N/A | Whole-genome sequencing: 74 germline SNVs occur | reviewed in ref. [ | ||
| | 10 parent-child trios (patients with unexplained mental retardation) | Peripheral blood | whole-exome sequencing (median coverage of 42-fold) | diBayes algorithm, SOLiD Small Indel Tool | not considered (excluded all nongenic, intronic (other than canonical splice sites) and synonymous variants) | The discovery of nine | [ |
| Evaluating novel bioinformatics approaches to aid identification of new gene-disease associations. Trio analysis to identify both diagnostic genotypes in known genes and candidate genotypes in novel genes | 119 parent-child trios | Peripheral blood | whole-exome sequencing (on average, 94.2% of the exome-wide consensus coding sequence was covered with at least 10-fold coverage) | Using BWA-0.5.10, sequencing reads were mapped to a Genome Reference Consortium Human Genome Build 37-derived alignment set including decoy sequences; the same reference genome is used in the 1000 Genomes Project. Polymerase chain reaction duplicates were removed using picard-tools. Single-nucleotide variants and small insertions/deletions (indels) were called using the UnifiedGenotyper of the GATK and annotated using SnpEff-3.3 | not analyzed (synonymous mutations were assigned a score of 0) | This study indicates that the application of appropriate bioinformatic approaches to clinical sequence data can also help to implicate novel disease genes and suggest expanded phenotypes for known disease genes. These results suggest that some cases resolved by WES will have direct therapeutic implications for the patient | [ |
Abbreviations: de novo mutations (DNMs), Whole-exome sequencing (WES), Whole-genome sequencing (WGS), autism spectrum disorder (ASD), intellectual disability (ID), amplicon-based deep sequencing (ADS), genome analysis toolkit (GATK), Burrows-Wheeler Aligner (BWA), not applicable (NA)
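
One exome study in the overview table normalized its data across each family by analyzing only bases covered by at least 20 unique reads in all family members. A minimal Python sketch of such a family-wide depth filter is given here; the data layout, the member labels and the function name `callable_sites` are illustrative assumptions and are not taken from any of the cited pipelines.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Threshold quoted in the overview table for one exome study.
MIN_DEPTH = 20

Site = Tuple[str, int]  # (chromosome, position); hypothetical layout


def callable_sites(
    depths: Dict[Site, Dict[str, int]],
    members: Iterable[str] = ("child", "mother", "father"),
    min_depth: int = MIN_DEPTH,
) -> Iterator[Site]:
    """Yield sites where every listed family member reaches min_depth reads."""
    members = tuple(members)
    for site, per_member in depths.items():
        # A member without a depth entry is treated as depth 0.
        if all(per_member.get(m, 0) >= min_depth for m in members):
            yield site


# Toy depths, invented for illustration only.
example = {
    ("chr1", 1000): {"child": 35, "mother": 28, "father": 22},
    ("chr1", 2000): {"child": 40, "mother": 12, "father": 31},  # mother too shallow
    ("chr2", 500): {"child": 20, "mother": 20},  # father missing
}

kept = list(callable_sites(example))
print(kept)
```

Requiring the threshold in every family member, rather than in the child alone, avoids calling a variant "de novo" merely because a parent was too poorly covered for the allele to be seen.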
Inherited versus de novo mutation rates
| | Inherited mutations | De novo mutations |
|---|---|---|
| Single-nucleotide variants (SNVs) in the genome | ~4.4 × 10⁶ (1) | 44–82 (1, 2) |
| SNVs in the exome (coding SNVs) | 22,186 (2) | 1–2 (3) |
| Small insertion and deletions (INDELs) | ~550,000 (1) | up to 9 (4) |
| Copy number variations (CNVs) | ~276 (1, 2) | 0.0077–0.041 (4) |
| Ratio of paternal allele versus maternal allele | 1 : 1 (5, 6) | 3.5–3.9 : 1 (5, 6, 7) |
| Parental age effect at conception | No effect (8) | Strong effect (8) |
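
The de novo figures in this table can be related to a per-nucleotide mutation rate with simple arithmetic. The sketch below uses the rate of 1.28 × 10⁻⁸ mutations per nucleotide per generation reported by one of the trio studies, together with the paternal-to-maternal ratios from the table. The callable genome size of 2.5 × 10⁹ bases is an assumption chosen for illustration, and the function names are hypothetical.

```python
def expected_dnms(rate_per_bp: float, callable_haploid_bp: float) -> float:
    """Expected de novo SNVs per child.

    Each child receives two haploid genomes (one per parent),
    hence the factor of 2.
    """
    return 2 * rate_per_bp * callable_haploid_bp


def paternal_share(paternal_to_maternal_ratio: float) -> float:
    """Fraction of DNMs on the paternal allele for a ratio of r : 1."""
    return paternal_to_maternal_ratio / (paternal_to_maternal_ratio + 1)


RATE = 1.28e-8      # per nucleotide per generation, as reported in one trio study
CALLABLE_BP = 2.5e9  # ASSUMPTION: callable haploid genome size, not from the source

n = expected_dnms(RATE, CALLABLE_BP)
print(round(n))  # falls inside the 44-82 range given in the table

for ratio in (3.5, 3.9):
    print(ratio, round(paternal_share(ratio), 3))
```

Under these assumptions, a paternal-to-maternal ratio of 3.5–3.9 : 1 corresponds to roughly 78–80% of de novo SNVs arising on the paternal allele.
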
Benefits of trio germline sequencing in children with cancer
| | Sequencing of the index patient only | Trio sequencing |
|---|---|---|
| Identification of well-known CPSs | + | + |
| SNVs, indels, SVs, CNVs | + | + |
| Inheritance information including: | | |
| Homozygosity mapping | Isodisomy | + |
| Inference of compound heterozygosity | − | + |
| Inheritance anomalies | − | + |
| De novo mutations incl. age effects | − | + |
| Mosaicism | (+) | + |
| Concomitant variants | + | + |
| Phenotypic variability, age-related penetrance and gender-specific cancer risk | − | + |
| Phasing of variants | − | + |
| Treatment adaptation & surveillance | + | + |
| Risk evaluation of unaffected parents, surveillance & precision prevention | (−) | + |
| Determination of the accurate risk to carry the variant for other family members | − | + |
| Prenatal diagnostics | n/a | + |
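
Several rows of this table (de novo mutations, inheritance anomalies, inference of compound heterozygosity, phasing) depend on comparing the child's genotype with both parental genotypes. The following Python sketch shows that logic for biallelic autosomal sites under simplifying assumptions: genotypes are encoded as alternate-allele counts (0, 1 or 2), and genotyping error and mosaicism are ignored. All function names are hypothetical and this is not the method of any cited study.

```python
from itertools import product
from typing import Optional, Tuple

Trio = Tuple[int, int, int]  # (child, mother, father) alt-allele counts


def classify(child: int, mother: int, father: int) -> str:
    """Classify one biallelic autosomal site from trio genotypes."""
    # Alt-allele counts a parent with genotype g can transmit.
    gametes = {0: {0}, 1: {0, 1}, 2: {1}}
    possible = {m + f for m, f in product(gametes[mother], gametes[father])}
    if child in possible:
        return "consistent with Mendelian inheritance"
    if child == 1 and mother == 0 and father == 0:
        return "de novo candidate"
    return "inheritance anomaly"


def parent_of_origin(child: int, mother: int, father: int) -> Optional[str]:
    """Phase a heterozygous child variant when exactly one parent carries it."""
    if child != 1:
        return None
    if mother >= 1 and father == 0:
        return "mother"
    if father >= 1 and mother == 0:
        return "father"
    return None  # both or neither parent carries the allele: not phasable here


def is_compound_het(variant_a: Trio, variant_b: Trio) -> bool:
    """True if two variants in one gene come from different parents (in trans)."""
    origin_a = parent_of_origin(*variant_a)
    origin_b = parent_of_origin(*variant_b)
    return origin_a is not None and origin_b is not None and origin_a != origin_b


print(classify(1, 0, 0))                      # child het, parents reference
print(classify(2, 1, 0))                      # child homozygous, father reference
print(is_compound_het((1, 1, 0), (1, 0, 1)))  # one variant from each parent
```

With the index patient alone, only the first element of each tuple is available, so none of these three functions can be evaluated; that is the practical meaning of the "−" entries in the first column.
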