| Literature DB >> 35463004 |
Matt A. Field
Abstract
Precision medicine programs to identify clinically relevant genetic variation have been revolutionized by access to increasingly affordable high-throughput sequencing technologies. A decade of continual drops in per-base sequencing costs means it is now feasible to sequence an individual patient genome and interrogate all classes of genetic variation for < $1,000 USD. However, while advances in these technologies have greatly simplified the ability to obtain patient sequence information, the timely analysis and interpretation of variant information remains a challenge for the rollout of large-scale precision medicine programs. This review will examine the challenges and potential solutions that exist in identifying predictive genetic biomarkers and pharmacogenetic variants in a patient and discuss the larger bioinformatic challenges likely to emerge in the future. It will examine how both software and hardware development are aiming to overcome issues in short-read mapping, variant detection, and variant interpretation. It will discuss the current state of the art for genetic disease and the remaining challenges to overcome for complex disease. Success across all types of disease will require novel statistical models and software to ensure precision medicine programs realize their full potential now and into the future.
Keywords: FPGA—field-programmable gate array; GPU-accelerated; high-throughput sequencing; pathogenic variant; precision medicine; variant detection; variant prioritization
Year: 2022 PMID: 35463004 PMCID: PMC9024231 DOI: 10.3389/fmed.2022.806696
Source DB: PubMed Journal: Front Med (Lausanne) ISSN: 2296-858X
Resources for variant detection in precision medicine programs.

| Resource | Purpose |
|---|---|
| dbSNP | Population-level variation |
| gnomAD | Population-level variation |
| 1000 Genomes Phase 3 | Population-level variation |
| Database of Genomic Variants | Population-level variation |
| Variant Effect Predictor | Variant annotation |
| dbNSFP | Variant annotation |
| ANNOVAR | Variant annotation |
| ClinVar | Clinical annotation |
| LOVD | Clinical annotation |
| PolyPhen-2 | Functional impact |
| SIFT | Functional impact |
| CADD | Functional impact |
| GTEx | Gene expression |
| Multi-symbol checker | Gene naming |
| OMIM | Gene / disease annotation |
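Several of these resources feed directly into variant filtering. As a minimal sketch of how population-level frequencies from a resource such as gnomAD are typically applied, assuming a VCF whose INFO column carries a hypothetical `gnomAD_AF` annotation (the records and field name below are illustrative, not the output of any specific annotation tool):

```python
# Sketch: filtering candidate variants against population-level allele
# frequencies, as done when overlapping databases such as gnomAD or dbSNP.
# The "gnomAD_AF" INFO key and the records are illustrative assumptions.

def parse_info(info):
    """Parse a semicolon-delimited VCF INFO field into a dict."""
    fields = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        fields[key] = value
    return fields

def rare_variants(records, max_af=0.01):
    """Keep variants whose annotated population frequency is below max_af.
    Variants missing the annotation are retained as potentially novel."""
    kept = []
    for chrom, pos, ref, alt, info in records:
        af = parse_info(info).get("gnomAD_AF")
        if af is None or float(af) < max_af:
            kept.append((chrom, pos, ref, alt))
    return kept

variants = [
    ("chr1", 12345, "A", "G", "gnomAD_AF=0.2513;DP=40"),  # common, filtered out
    ("chr7", 55242, "C", "T", "gnomAD_AF=0.0004;DP=61"),  # rare, retained
    ("chrX", 9981,  "G", "A", "DP=35"),                   # unannotated, retained
]
print(rare_variants(variants))
```

In practice the allele-frequency cutoff depends on the assumed disease model; 0.01 is a common starting point for rare-disease filtering, not a fixed standard.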
Figure 1. Software- and hardware-based strategies being employed to address bioinformatic bottlenecks in large-scale precision medicine programs.
Software-based solutions.

| Approach | Advantages | Disadvantages |
|---|---|---|
| Algorithm development | Develop novel approaches; existing suite of tools available for benchmarking | Requires community uptake; challenging to significantly change existing workflows |
| Algorithm optimization | Quicker to improve existing algorithms; simple to benchmark against previous releases | Gains are often minimal if the software was well designed initially; any change in expected output requires verification |
| Job partitioning | Increases parallelization and reduces serial run time | Splitting and combining results adds software complexity |
| Standardized file formats | Standardized formats allow easy algorithm benchmarking | No flexibility for new data types or information |
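The job-partitioning row above is commonly implemented as a scatter-gather pattern: split work by genomic region, process regions in parallel, then merge per-region outputs. A minimal sketch, where `call_region()` is a hypothetical stand-in for invoking a real variant caller (itself a separate process, so a thread pool suffices for dispatch):

```python
# Sketch of the "job partitioning" strategy: scatter work across genomic
# regions, run in parallel, then gather results in a deterministic order.
# call_region() is a hypothetical placeholder, not a real caller interface.

from concurrent.futures import ThreadPoolExecutor

CHROMOSOMES = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]

def call_region(chrom):
    """Placeholder for running a variant caller on a single chromosome."""
    return f"{chrom}.vcf"

def scatter_gather(regions, workers=8):
    """Dispatch call_region over regions in parallel; Executor.map preserves
    the input order, so per-region outputs merge deterministically."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(call_region, regions))

print(scatter_gather(CHROMOSOMES[:3]))
```

The merge step is where the table's noted complexity arises: real per-region VCFs must be concatenated with consistent headers and sorted coordinates, not simply appended.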
Hardware-based solutions.

| Approach | Advantages | Disadvantages |
|---|---|---|
| Compute cluster | Low cost of entry; uses commodity hardware | Controller is a single point of failure; technical expertise required |
| Cloud compute | Highly scalable; no local installation | Data transfer and cost; privacy concerns for sensitive data |
| FPGA | Direct hardware/software link; relatively low cost | Challenging to program/re-program; integration requires technical expertise |
| GPU | Cheaper than CPUs; high parallelization possible | Chipset-specific coding required; higher power usage than FPGAs |
Strategies for variant prioritization.

| Strategy | Advantages | Disadvantages |
|---|---|---|
| Consensus approach running multiple algorithms | Minimizes individual algorithm biases; intersection or union of call sets tunes specificity or sensitivity | Adds computational complexity; longer run time |
| Stratify by impact on gene sets (e.g., missense or splice-site variants) | Prioritizes disease-enriched variants | Changes reported are relevant to a specific version of the gene model; multiple isoforms often available |
| Functional inference prediction software | Prioritizes mutations likely to disrupt protein function | Tools have known high false-positive rates |
| Overlap population-level variant databases | Allows filtering of common population-level variation | Contains errors and incomplete records due to lack of curation |
| Overlap disease-specific databases | Identifies variants or genes previously implicated in disease | Large numbers of non-causal variants often included |
| Pedigree sequencing | Generates pedigree-wide annotation (disease inheritance, compound heterozygosity, etc.) | Obtaining samples for a larger family |
| Paired cancer sequencing | Matched tumor/normal samples can detect somatic variation | Sample purity; tumor heterogeneity |
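The consensus approach in the first row reduces to simple set operations once each caller's output is keyed by (chromosome, position, ref, alt). A minimal sketch with illustrative call sets (not output from any real caller):

```python
# Sketch of the consensus approach: combine call sets from multiple variant
# callers. Intersection keeps calls made by every caller (higher specificity);
# union keeps calls made by any caller (higher sensitivity).

caller_a = {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"), ("chr3", 300, "C", "A")}
caller_b = {("chr1", 100, "A", "T"), ("chr3", 300, "C", "A"), ("chr4", 400, "T", "G")}

def consensus(call_sets, mode="intersection"):
    """Merge call sets keyed by (chrom, pos, ref, alt).
    'intersection' favors fewer false positives; 'union' fewer false negatives."""
    merge = set.intersection if mode == "intersection" else set.union
    return merge(*call_sets)

high_specificity = consensus([caller_a, caller_b], "intersection")
high_sensitivity = consensus([caller_a, caller_b], "union")
print(len(high_specificity), len(high_sensitivity))  # 2 4
```

Real implementations must also normalize representation (left-alignment, multi-allelic splitting) before comparing keys, otherwise identical variants reported differently by two callers will not intersect.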