
Improving the value of public RNA-seq expression data by phenotype prediction.

Shannon E Ellis, Leonardo Collado-Torres, Andrew Jaffe, Jeffrey T Leek.

Abstract

Publicly available genomic data are a valuable resource for studying normal human variation and disease, but these data are often not well labeled or annotated. The lack of phenotype information for public genomic data severely limits their utility for addressing targeted biological questions. We develop an in silico phenotyping approach that predicts critical missing annotation directly from genomic measurements, using well-annotated genomic and phenotypic data produced by consortia such as TCGA and GTEx as training data. We apply in silico phenotyping to a set of 70,000 RNA-seq samples that we recently processed on a common pipeline as part of the recount2 project. We use gene expression data to build and evaluate predictors for both biological phenotypes (sex, tissue, sample source) and experimental conditions (sequencing strategy). We demonstrate how these predictions can be used to study cross-sample properties of public genomic data, select genomic projects with specific characteristics, and perform downstream analyses using predicted phenotypes. The methods for phenotype prediction are available in the phenopredict R package, and the predictions for recount2 are available from the recount R package. With data and phenotype information available for 70,000 human samples, expression data are now available for use on a scale that was not previously feasible.
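The general idea of training a predictor on well-annotated samples and applying it to unlabeled ones can be illustrated with a toy example. The following is a minimal sketch in Python using only the standard library; it is not the phenopredict implementation or its API, and the nearest-centroid classifier is a stand-in chosen here for brevity. The simulated "genes", the labels, and the function names train_centroids, predict, and simulate are all hypothetical.

```python
import random
from statistics import mean


def train_centroids(samples, labels):
    """Compute the mean expression vector (centroid) for each label."""
    by_label = {}
    for vec, lab in zip(samples, labels):
        by_label.setdefault(lab, []).append(vec)
    return {lab: [mean(col) for col in zip(*vecs)]
            for lab, vecs in by_label.items()}


def predict(centroids, vec):
    """Return the label whose centroid is nearest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))
    return min(centroids, key=lambda lab: dist(centroids[lab]))


def simulate(n, rng):
    """Simulate toy data (an assumption, not real expression values).

    Gene 0 is high in 'male' samples, gene 1 is high in 'female' samples,
    and gene 2 is uninformative noise.
    """
    samples, labels = [], []
    for i in range(n):
        lab = "male" if i % 2 == 0 else "female"
        base = [8.0, 1.0] if lab == "male" else [1.0, 8.0]
        samples.append([
            base[0] + rng.gauss(0, 1),
            base[1] + rng.gauss(0, 1),
            5.0 + rng.gauss(0, 1),
        ])
        labels.append(lab)
    return samples, labels


rng = random.Random(0)
# "Well-annotated" training set, standing in for consortium data.
train_x, train_y = simulate(100, rng)
# Held-out set with known labels, used only to evaluate the predictor.
test_x, test_y = simulate(50, rng)

centroids = train_centroids(train_x, train_y)
predicted = [predict(centroids, v) for v in test_x]
accuracy = sum(p == t for p, t in zip(predicted, test_y)) / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

In practice the same pattern applies at larger scale: fit on labeled samples, check accuracy on held-out labeled samples, and only then assign predicted phenotypes to samples whose annotation is missing.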

Year: 2018; PMID: 29514223; PMCID: PMC5961118; DOI: 10.1093/nar/gky102

Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971


