Predicting target genes of non-coding regulatory variants with IRT.

Zhenqin Wu, Nilah M Ioannidis, James Zou.

Abstract

SUMMARY: Interpreting genetic variants of unknown significance (VUS) is essential in clinical applications of genome sequencing for diagnosis and personalized care. Non-coding variants remain particularly difficult to interpret, despite making up a large majority of trait associations identified in genome-wide association studies (GWAS). Predicting the regulatory effects of non-coding variants on candidate genes is a key step in evaluating their clinical significance. Here, we develop a machine-learning algorithm, Inference of Connected expression quantitative trait loci (eQTLs) (IRT), to predict the regulatory targets of non-coding variants identified in studies of eQTLs. We assemble datasets using eQTL results from the Genotype-Tissue Expression (GTEx) project and learn to separate positive and negative variant-gene pairs based on annotations characterizing the variant, the gene and the intermediate sequence. IRT achieves an area under the receiver operating characteristic curve (ROC-AUC) of 0.799 using random cross-validation, and 0.700 for a more stringent position-based cross-validation. Further evaluation on rare variants and experimentally validated regulatory variants shows that IRT is significantly enriched for identifying the true target genes versus negative controls. In gene-ranking experiments, IRT achieves a top-1 accuracy of 50% and a top-3 accuracy of 90%. Salient features, including GC-content, histone modifications and Hi-C interactions, are further analyzed and visualized to illustrate their influence on predictions. IRT can be applied to any VUS of interest and each candidate nearby gene to output a score reflecting the likelihood of a regulatory effect on that gene's expression level. These scores can be used to prioritize variants and genes to assist in patient diagnosis and GWAS follow-up studies.
AVAILABILITY AND IMPLEMENTATION: Code and data used in this work are available at https://github.com/miaecle/eQTL_Trees.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
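
The abstract names two evaluation ideas that a small example can make concrete: position-based cross-validation and top-k gene-ranking accuracy. The sketch below is illustrative only and is not taken from the authors' repository. The function names, the 1 Mb block size, the round-robin fold assignment and the toy scores are all assumptions made for this example.

```python
def position_folds(variants, n_folds=5, block_size=1_000_000):
    """Assign each (chrom, pos) variant to a fold by genomic block.

    Variants falling in the same block always share a fold, so nearby
    variants never sit on both sides of a train/test split. The block
    size and round-robin assignment are assumptions of this sketch.
    """
    block_to_fold = {}
    folds = []
    for chrom, pos in variants:
        block = (chrom, pos // block_size)
        if block not in block_to_fold:
            # New blocks are handed out to folds in round-robin order.
            block_to_fold[block] = len(block_to_fold) % n_folds
        folds.append(block_to_fold[block])
    return folds


def top_k_accuracy(scores, true_gene, k):
    """Fraction of variants whose true target gene ranks in the top k.

    scores:    {variant_id: {gene_id: predicted_score}}
    true_gene: {variant_id: gene_id of the known target}
    """
    if not scores:
        raise ValueError("scores must contain at least one variant")
    hits = 0
    for variant, gene_scores in scores.items():
        # Rank candidate genes from highest to lowest predicted score.
        ranked = sorted(gene_scores, key=gene_scores.get, reverse=True)
        if true_gene[variant] in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Toy data (hypothetical, for illustration only).
variants = [("chr1", 100_000), ("chr1", 900_000),
            ("chr1", 1_500_000), ("chr2", 100_000)]
folds = position_folds(variants, n_folds=2)

scores = {
    "v1": {"A": 0.9, "B": 0.2, "C": 0.1},
    "v2": {"A": 0.3, "B": 0.6, "C": 0.5},
}
true_gene = {"v1": "A", "v2": "C"}
top1 = top_k_accuracy(scores, true_gene, k=1)
top2 = top_k_accuracy(scores, true_gene, k=2)
```

In this toy run the two chr1 variants within the first megabase land in the same fold. Random cross-validation does not provide that guarantee, which is one plausible reason a position-based split is the more stringent test.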

Year: 2020; PMID: 32330225; PMCID: PMC7575052; DOI: 10.1093/bioinformatics/btaa254

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937


