Yung-Chun Wang, Yuchang Wu, Julie Choi, Garrett Allington, Shujuan Zhao, Mariam Khanfar, Kuangying Yang, Po-Ying Fu, Max Wrubel, Xiaobing Yu, Kedous Y Mekbib, Jack Ocken, Hannah Smith, John Shohfi, Kristopher T Kahle, Qiongshi Lu, Sheng Chih Jin.
Abstract
Rapid methodological advances in statistical and computational genomics have enabled researchers to better identify and interpret both rare and common variants responsible for complex human diseases. As we continue to see an expansion of these advances in the field, it is now imperative for researchers to understand the resources and methodologies available for various data types and study designs. In this review, we provide an overview of recent methods for identifying rare and common variants and understanding their roles in disease etiology. Additionally, we discuss the strategy, challenge, and promise of gene therapy. As computational and statistical approaches continue to improve, we will have an opportunity to translate human genetic findings into personalized health care.
Keywords: bioinformatics; common variant; gene therapy; genomics; precision medicine; rare variant; statistical genetics
Year: 2022 PMID: 35207663 PMCID: PMC8878256 DOI: 10.3390/jpm12020175
Source DB: PubMed Journal: J Pers Med ISSN: 2075-4426
Figure 1. Overview of the base pairs-to-bedside approach. Advances in genomic analysis, precision medicine, and gene therapy allow for the genetic evaluation of sporadic and inherited variants in families and large cohorts. Further elucidation of genetic etiology and disease pathomechanisms through genomic and integrative multi-omics studies then catalyzes the production of new therapeutic options such as gene therapy for patient care.
Statistical approaches for population-based or family-based rare variant analyses.
| Type | Methods | Strengths | Weaknesses | Ref. |
|---|---|---|---|---|
| Rare | Combined Multivariate and Collapsing (CMC) test | More powerful and robust for analyzing a set of rare variants than testing each variant individually | Reduced power when the grouped variants have effects in opposite directions | [ |
| | Variable Threshold (VT) | Makes no assumption about the causal variant’s allele frequency; boosts power using functional annotations that give higher weights to functional variants | Reduced power when the set of variants grouped together have effects in opposite directions; high computational burden for permutation test | [ |
| | Sequence kernel association test (SKAT) | Considers rare variants with opposite effect directions; test statistics have a closed form approximation for their null distribution; computationally efficient; can adjust for covariates | Less powerful when causal variants have the same effect direction | [ |
| | Cohort allelic sums test (CAST) | More powerful and robust for analyzing a set of rare variants than testing each variant individually | Reduced power when the grouped variants have effects in opposite directions | [ |
| | Weighted sum test (WST) | Can account for linkage disequilibrium (LD) between variants | Lower statistical power given few causal variants within a gene | [ |
| | Kernel-based adaptive clustering method (KBAC) | Has higher statistical power in the presence of variant interaction | No closed form null distribution for test statistics; high computational burden | [ |
| | Versatile gene-based association study (VEGAS) | Only uses summary statistics as input; can account for LD between variants | Less powerful for detecting a large gene with many typed non-causal variants; high computational burden | [ |
| | Gene-based association test that uses extended Simes procedure (GATES) | Only uses summary statistics as input; can account for LD between variants; variants can have opposite effect directions; computationally efficient | Designed for genome-wide association studies (GWAS) and has lower power in rare variant analysis | [ |
| | Multivariate Association Analysis using Score Statistics (MAAUSS) | Leverages multiple phenotypes to improve statistical power | High computational burden | [ |
| | Multi-trait analysis of rare-variant associations (MTAR) | Improved statistical power in multi-trait multi-variant association analysis; only uses summary statistics as input | Relies on a concordant common and rare variant genetic correlation between traits | [ |
| De novo variants analysis | DeNovoWEST | Estimates positive predictive values of each DNV being pathogenic; incorporates a gene-based weighting strategy | Limited to exome | [ |
| | Chimpanzee–human divergence model | Estimates the relative locus-specific rates of DNVs | Can only be applied to a selected candidate gene set | [ |
| | denovolyzeR | Adjusts for sequence depth and the divergences based on human–chimp differences; does not require any control samples for comparison | Relies on a pre-computed tabulation of the probability of DNVs arising in each gene; limited to exome | [ |
| Autosomal recessive variant analysis | Resampling-based statistical framework | Leverages trio data to compare the observed number of recessive genotypes with the empirically estimated counts under the null; accounts for confounding due to population stratification and consanguinity | Limited to exome; strong assumption that all subjects’ genotypes are independent | [ |
| | Sampling the observed genotypes and phenotypes by chance | Incorporates the probabilities of sampling the observed genotypes and phenotypes by chance; incorporates the phenotypic similarity of patients with the same recessive candidate gene; corrects for gene-specific levels of autozygosity; takes account of population structure | Limited to exome; requires systematic genotype and phenotype data on a known number of families; difficult to perform when recording of phenotype terms is incomplete and inconsistent | [ |
| | The phased haplotypes-based framework | Uses the phased haplotypes from unaffected parents to estimate the expected number of biallelic genotypes in affected probands; accounts for the fact that some fraction of the variants expected by chance are actually causal | Limited to exome; strong assumption that all subjects’ genotypes are independent; strong assumption of full penetrance of all genotypes | [ |
| Joint analysis of transmitted variants and DNVs | Transmission and de novo association test (TADA), extTADA | TADA is the first method developed to jointly model de novo and transmitted mutations by a hierarchical Bayesian modeling framework; extTADA performs a Markov chain Monte Carlo for the Bayesian analysis | Both are limited to exome; both cannot incorporate recessive genotypes and model across disease traits | [ |
| | TADA-Annotations (TADA-A) | Can combine information on all DNVs in both coding and nearby non-coding regions across studies | Cannot incorporate transmitted variants | [ |
| | TADA-Recessive (TADA-R) | Can integrate signals from DNVs, transmitted dominant, and transmitted recessive variants | Limited to exome | [ |
| | Multi-trait TADA (M-TADA) | Can jointly analyze DNVs from multiple traits | Limited to exome; cannot incorporate transmitted variants; can only perform pair-wise comparison | [ |
| X-linked variant analysis | Various XCI modes integrated statistical approach | Considers all X-linked processes (random, skewed, and escaped XCI); performs a permutation-based procedure to assess the significance with well-controlled type I error rate | Has lower power in the random or escaped XCI test; cannot provide accurate effect size estimate in the escaped XCI model | [ |
| | 1 and 2 degree-of-freedom tests for association | Easy to implement using the contingency table approach | False assumption of equal phenotypic effects between males’ hemizygotes and females’ homozygotes; does not consider nonrandom XCI and escape from XCI | [ |
| | Distinct XCI processes combined using a modified Fisher’s method | Considers all X-linked processes (random, skewed, and escaped XCI); is the most statistically efficient and not sensitive to the unknown biological models | Strong assumption that all subjects’ genotypes are independent; cannot adjust for covariates | [ |
| | Sex-specific burden analyses | Can estimate the fraction of probands attributable to rare X-linked variants | Strong assumption of a monogenic model with full penetrance; wide confidence intervals for several key parameters | [ |
| Digenic variant analysis | The genetic linkage method | Takes account of phenocopies and reduced penetrance; able to deal with allelic heterogeneity; able to identify rare alleles that are present in small numbers of families | Requires pedigrees of related individuals (and parents’ samples); not suitable for common or complex-trait diseases; unable to deal with high dimensional data and non-linear regression tests | [ |
| | The candidate gene approach | Useful as the first step in exploring known pathways in complex diseases; offers high statistical power and is computationally efficient | Subjective in the process of choosing specific candidate genes; lack of replication studies; relies on prior hypotheses about disease mechanisms; unable to deal with high dimensional data and non-linear regression tests | [ |
| | Case-only study design | No need for control recruitment; improved statistical power compared to the case–control design; less multiple-testing correction | Potential increase in type I error rate if the independence assumption is violated; unable to deal with high dimensional data and non-linear regression tests | [ |
| | Random forests | Broad applications in data mining and machine learning; flexible and powerful statistical learning tools for analysis; relatively fast and can handle big GWAS | Sensitive to insufficient training data, confounding effects, reproducibility, and accessibility; potential slow-performing algorithm when dealing with large data set; requires much computational power and resources | [ |
Association analysis methods are ordered and grouped by type of genetic variant. The methods for each variant type are listed in the Methods column. The references are indicated in the last column.
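To make the collapsing idea behind burden-style tests such as CAST more concrete, the following is a minimal sketch, not the published implementation. It assumes a gene-level genotype matrix of minor-allele counts per individual, collapses each individual to a carrier/non-carrier flag, and compares carrier counts between cases and controls with a two-sided Fisher's exact test. The function names (`carrier_flags`, `fisher_exact_two_sided`, `collapsing_test`) and the toy genotypes are hypothetical and chosen only for illustration.

```python
from math import comb


def carrier_flags(genotypes):
    """Collapse each individual's rare-variant genotypes to one carrier flag.

    genotypes: list of rows, one per individual; each entry is the count of
    minor alleles (0, 1, or 2) at one rare variant in the gene.
    """
    return [any(count > 0 for count in row) for row in genotypes]


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def collapsing_test(case_genotypes, control_genotypes):
    """Return the carrier 2x2 table (a, b, c, d) and its Fisher p-value."""
    a = sum(carrier_flags(case_genotypes))      # carrier cases
    b = len(case_genotypes) - a                 # non-carrier cases
    c = sum(carrier_flags(control_genotypes))   # carrier controls
    d = len(control_genotypes) - c              # non-carrier controls
    return (a, b, c, d), fisher_exact_two_sided(a, b, c, d)


# Toy data (made up for illustration): 8 cases, 8 controls, 3 rare variants.
cases = [
    [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0],
    [0, 1, 0], [0, 0, 2], [0, 0, 0], [0, 0, 0],
]
controls = [[0, 0, 0]] * 7 + [[0, 1, 0]]

table, p_value = collapsing_test(cases, controls)
print(table, round(p_value, 4))
```

Because every carrier counts the same regardless of which variant is carried, risk and protective variants in the same gene would offset each other in this kind of test, which is the weakness the table lists for collapsing methods.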
Commercially Available Gene Therapies in the U.S. in Alphabetical Order (2021) [132].
| Name | Manufacturer | Target Disease | Gene of Interest | FDA Approval Date |
|---|---|---|---|---|
| Abecma | Celgene | Relapsed or refractory multiple myeloma | BCMA | March 2021 [ |
| Breyanzi | Juno Therapeutics | Relapsed or refractory large B-cell lymphoma | CD137 (4-1BB TNF- | February 2021 [ |
| Imlygic ( | BioVex | Melanoma (unresectable cutaneous, subcutaneous, and nodal lesions) | GM-CSF (immune stimulatory protein) | October 2015 [ |
| Kymriah | Novartis | Pediatric B-cell precursor acute lymphoblastic | CD137 (4-1BB TNF- | August 2017 [ |
| | | Relapsed or refractory large B-cell lymphoma in adult | CD137 (4-1BB TNF- | May 2018 [ |
| Luxturna | Spark | Retinal dystrophy (biallelic RPE65 mutation- | RPE65 (human retinal pigment epithelial 65 kDa protein) | December 2017 [ |
| Provenge | Dendreon | Asymptomatic or minimally symptomatic metastatic castration-resistant prostate | ACP3 | April 2010 [ |
| Tecartus | Kite Pharma | Relapsed or refractory mantle cell lymphoma (MCL) in adult | CD28 and CD3-zeta | July 2020 [ |
| | | Relapsed or refractory | CD28 and CD3-zeta | October 2021 [ |
| Yescarta | Kite Pharma | Relapsed or refractory large B-cell lymphoma | CD28 and CD3-zeta | October 2017 [ |
| | | Relapsed or refractory | CD28 and CD3-zeta | March 2021 [ |
| Zolgensma ( | Novartis Gene | Spinal muscular atrophy (Type I) | SMN1 (human | May 2019 [ |
Licensed gene therapies in the U.S. approved by the Office of Tissues and Advanced Therapies (OTAT) as of 26 October 2021. Name = trade name (proper name); Manufacturer = name of the licensed pharmaceutical/biotechnology company; Target Disease = FDA-approved indication(s), excluding disease state(s) in ongoing clinical trials; Gene of Interest = biological/therapy target (and encoded protein if applicable); FDA approval date = indication license date based on FDA approval letters.