Tatiana Maroilley, Maja Tarailo-Graovac.
Abstract
The problem of 'missing heritability' affects both common and rare diseases, hindering discovery, diagnosis, and patient care. The 'missing heritability' concept has been mainly associated with common and complex diseases, where promising modern technological advances, such as genome-wide association studies (GWAS), were unable to uncover the complete genetic mechanism of the disease or trait. Although rare diseases (RDs) have low prevalence individually, collectively they are common. Furthermore, multi-level genetic and phenotypic complexity, combined with the individual rarity of these conditions, poses an important challenge in the quest to identify causative genetic changes in RD patients. In recent years, high-throughput sequencing has accelerated discovery and diagnosis in RDs. However, despite the several-fold increase in the rate of finding genetic causes in RD patients (from ~10% using traditional testing to ~40% using genome-wide genetic testing), the majority of RDs, as is the case for common diseases, still face the 'missing heritability' problem. This review outlines the key role of high-throughput sequencing in uncovering the genetics behind RDs, with a particular focus on genome sequencing. We review current advances and challenges of sequencing technologies, bioinformatics approaches, and resources.
Keywords: bioinformatics; genome sequencing; long/short read sequencing; missing heritability; rare disease; variant annotation; variant detection; variation databases
Year: 2019 PMID: 30987386 PMCID: PMC6523881 DOI: 10.3390/genes10040275
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. ClinVar variome. Representation of ClinVar variant types (as of December 2018). About 13% were structural variants. Variants are annotated according to the Sequence Ontology [19].
Figure 2. Uncovering missing heritability. A spectrum of variants beyond SNVs (single nucleotide variants) contributes to human genetic conditions as either germline or somatic variations. In addition, different types of variants, such as large insertions (including mobile element insertions (MEI)), deletions, and duplications, as well as translocations, inversions, repeat expansions, and other complex changes, may be the source of genetic modifiers with the capacity to alleviate or exacerbate the effect of the primary pathogenic variant, and thus contribute to phenotypic variability (severe-mild-none).
Examples of diagnoses facilitated by Whole Genome Sequencing (WGS).
| Authors | Year | Gene | Disease | Type of Variation | Type of WGS | Ref. |
|---|---|---|---|---|---|---|
| | 2011 | Multiple | Severe congenital abnormalities | De novo SV (chromothripsis) | SOLiD | |
| | 2014 | | Phelan-McDermid syndrome | De novo 66 kb deletion | Complete Genomics | |
| | 2014 | | Cohen syndrome | 1.7 kb and 122 kb deletions | Complete Genomics | |
| | 2014 | | Rett syndrome | De novo 0.6 kb deletion | Complete Genomics | |
| | 2014 | | Intellectual disability | De novo 62 kb interspersed duplication | Complete Genomics | |
| | 2014 | | Cornelia de Lange syndrome | De novo 2.1 kb deletion | Complete Genomics | |
| | 2014 | Multiple | 16p11.2 deletion syndrome | De novo 611 kb deletion | Complete Genomics | |
| | 2014 | | Intellectual disability | De novo 382 kb deletion | Complete Genomics | |
| | 2017 | | DPDD | Large intragenic inversion | Illumina | |
| | 2017 | Multiple | Pulmonary alveolar proteinosis | 425 kb deletion | Illumina | |
| | 2017 | | Polycystic kidney disease | Various, 18/19 probands | PacBio | |
| | 2017 | Multiple | Severe congenital abnormalities | De novo SV (chromothripsis) | ONT 1 + Illumina | |
| | 2018 | | Central hypoventilation syndrome | GCN (25) repeat expansion [+25] | Illumina | |
| | 2018 | | Nemaline myopathy 1 | Large deletion | Illumina | |
| | 2018 | | Tuberous sclerosis type 2 | De novo deep intronic SNV | Illumina | |
| | 2018 | | Ocular albinism | Deep intronic variant | Illumina | |
| | 2018 | | Ornithine transcarbamylase deficiency | Deep intronic variant | Illumina | |
| | 2018 | Multiple | Global developmental delay | Balanced inverted translocation | Illumina | |
| | 2018 | | Global developmental delay | De novo 63 kb tandem duplication | Illumina | |
| | 2018 | | Bardet-Biedl syndrome | Retrotransposon insertion | Illumina | |
| | 2018 | | Epileptic encephalopathy | De novo 13 bp duplication | Illumina | |
| | 2018 | | Glycogen storage disease type Ia | 7.1 kb deletion | ONT 1 | |
| | 2018 | | Carney complex | De novo 2184 bp deletion | PacBio | |
| | 2018 | | Coffin-Siris syndrome | De novo complex SV dupINVinvDEL | Illumina | |
| | 2018 | | Seizures; Intellectual disability | De novo complex SV delINVdup | Illumina | |
| | 2018 | | Cone-rod dystrophy; Hearing loss | Complex homozygous SV delINVdel | Illumina | |
| | 2018 | | Birth asphyxia; Fetal distress | De novo complex SV dupINVdup | Illumina + ONT 1 | |
| | 2018 | | BAFME 3 | TTTCA and TTTTA repeat expansions | PacBio + ONT | |
| | 2018 | | BAFME 3 | TTTCA and TTTTA repeat expansions | PacBio + ONT | |
| | 2018 | | BAFME 3 | TTTCA and TTTTA repeat expansions | PacBio + ONT | |
| | 2019 | | BAFME 3 | 4.6 kb intronic repeat insertion | PacBio | |
1 Oxford Nanopore Technologies. 2 Lionel et al. reported 18 diagnoses by WGS; however, the majority were missed by exome panels because the panels did not include the corresponding gene. The two deep intronic variants included in this table would not have been detected by exome sequencing approaches. 3 Benign adult familial myoclonic epilepsy.
Examples of bioinformatics tools that facilitate comprehensive genome analyses.
| Authors | Year | Tool | Method | Input 1 | Variants Detected | Reference |
|---|---|---|---|---|---|---|
| | 2011 | CNVnator | Read Depth | PE 2 short read WGS | Copy Number Variants | |
| | 2012 | DELLY | Paired-ends, Read depth, Split-reads | Short read WGS | Structural Variants | |
| | 2014 | MToolBox | Read re-alignment | WGS or WES | Mitochondrial Variants | |
| | 2014 | LUMPY | Paired-ends, Read depth, Split-reads | PE short read WGS | Structural Variants | |
| | 2016 | Canvas | Read Depth | WGS or WES | Copy Number Variations | |
| | 2016 | Manta | Pair Read, Split Read | PE short read WGS | Indels, Structural Variants | |
| | 2017 | ExpansionHunter | Sequence-graph | PE short read WGS | Large Expansion of Short Tandem Repeats | |
| | 2017 | DIGTYPER | Breakpoint-Spanning, Split Alignments | PE short read WGS | Inversions, Tandem Duplications | |
| | 2017 | Seeksv | Split Read, Discordant Paired-End, Read Depth, 2 Ends Unmapped | SE/PE 2 short read WGS | Structural Variants + Virus Integration | |
| | 2018 | GangSTR | Enclosing, Fully Repetitive, Spanning and Off-target Fully Repetitive Read Pairs | PE short read WGS | Tandem Repeat expansions | |
| | 2018 | Strelka2 | Mixture-model | PE short read WGS | Single Nucleotide Variants, Indels | |
| | 2018 | Pindel | Split-reads | PE short read WGS | Indels, Structural Variants (small and medium-size) | |
| | 2018 | SvABA | Local assembly | PE short read WGS | Indels, Structural Variants (20–300 bp) | |
| | 2018 | SVE/FusorSV | 8 SV callers combination + Data mining | PE short read WGS | Deletions + Duplications + Inversions 3 | |
| | 2018 | SV2 | Supervised support vector machine classifiers | PE short read WGS | Deletions + Duplications | |
1 All tools take BAM files as input. MToolBox accepts FASTQ files. Strelka2, SV2, SvABA, ExpansionHunter, and Manta also accept CRAM files; SV2 additionally requires SVs to genotype, SNV VCF files, and PED files. SVE/FusorSV accepts FASTQ, BAM, and VCF files. SvABA also accepts SAM files. 2 PE = paired-end; SE = single-end. 3 Other SVs could be explored if they are present in the training dataset.
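Several tools in the table above list "Read Depth" as their detection method. The general idea can be sketched as follows: bin the genome, count reads per bin, and flag runs of bins whose depth is well below or above the sample-wide baseline. This is a toy illustration only and does not reproduce the algorithm of CNVnator, Canvas, or any other listed tool; the function name `call_cnv_bins`, the ratio thresholds, and the example depths are all assumptions made for this sketch.

```python
from statistics import median


def call_cnv_bins(depths, loss_ratio=0.75, gain_ratio=1.25, min_bins=2):
    """Flag runs of bins whose depth deviates from the median depth.

    Hypothetical helper: thresholds are illustrative, not taken from any tool.
    Returns a list of (start_bin, end_bin_exclusive, label) tuples.
    """
    baseline = median(depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")

    # Label each bin by its depth relative to the baseline.
    labels = []
    for depth in depths:
        ratio = depth / baseline
        if ratio <= loss_ratio:
            labels.append("DEL")
        elif ratio >= gain_ratio:
            labels.append("DUP")
        else:
            labels.append(None)

    # Merge consecutive bins with the same label into one candidate call.
    calls = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] is not None and i - start >= min_bins:
                calls.append((start, i, labels[start]))
            start = i
    return calls


# Made-up per-bin read counts: a dip in bins 3-5 and a rise in bins 8-10.
example_depths = [30, 31, 29, 15, 14, 16, 30, 32, 46, 45, 44, 30]
print(call_cnv_bins(example_depths))
```

Production callers additionally correct for GC content and mappability and use statistical segmentation rather than fixed thresholds, which this sketch deliberately omits.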
Figure 3. Populations represented in the gnomAD database. An example of various population exomes/genomes aggregated in the most comprehensive database, gnomAD (European populations are depicted in a spectrum of red colors).