Bias-invariant RNA-sequencing metadata annotation.

Hannes Wartmann, Sven Heins, Karin Kloiber, Stefan Bonn.

Abstract

BACKGROUND: Recent technological advances have resulted in an unprecedented increase in publicly available biomedical data, yet reuse of these data is often precluded by experimental bias and by a lack of annotation depth and consistency. Missing annotations make it impossible for researchers to find datasets specific to their needs.
FINDINGS: Here, we investigate RNA-sequencing metadata prediction based on gene expression values. We present a deep-learning-based domain adaptation algorithm for the automatic annotation of RNA-sequencing metadata. We show, in multiple experiments, that our model integrates heterogeneous training data better than existing linear regression-based approaches, resulting in improved tissue type classification. By using a model architecture similar to Siamese networks, the algorithm can learn biases from datasets with few samples.
CONCLUSION: Using our novel domain adaptation approach, we achieved metadata annotation accuracies up to 15.7% better than a previously published method. Using the best model, we provide a list of >10,000 novel tissue and sex label annotations for 8,495 unique SRA samples. Our approach has the potential to revive idle datasets through automated annotation, making them more searchable.
© The Author(s) 2021. Published by Oxford University Press GigaScience.
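
The abstract describes the model only as "similar to Siamese networks". As a rough illustration of that general idea, and not the authors' implementation, the sketch below passes every sample through one shared mapper and scores it with a triplet loss. The triplet loss, the linear mapper, the three-gene toy samples, and all names (embed, triplet_loss, heart_study_a, and so on) are assumptions made for this example. The loss is low when two samples of the same tissue from different studies lie closer together than two samples of different tissues from the same study.

```python
import math

def embed(weights, expression):
    """Shared linear mapper: the same weights embed every sample (the Siamese idea)."""
    return [sum(w * x for w, x in zip(row, expression)) for row in weights]

def distance(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def triplet_loss(weights, anchor, positive, negative, margin=1.0):
    """Hinge loss that is zero once the anchor is closer to the positive
    than to the negative by at least `margin` in embedding space."""
    ea = embed(weights, anchor)
    ep = embed(weights, positive)
    en = embed(weights, negative)
    return max(0.0, distance(ea, ep) - distance(ea, en) + margin)

# Hypothetical toy data: three "genes"; the third carries only a study-specific bias.
heart_study_a = [5.0, 1.0, 0.0]   # anchor
heart_study_b = [5.0, 1.0, 6.0]   # positive: same tissue, different study
liver_study_a = [1.0, 5.0, 0.0]   # negative: different tissue, same study

# Mapper that keeps all three genes, so the study bias leaks into the embedding.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Mapper that ignores the bias-carrying gene (what training would ideally find).
bias_blind = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

loss_identity = triplet_loss(identity, heart_study_a, heart_study_b, liver_study_a)
loss_bias_blind = triplet_loss(bias_blind, heart_study_a, heart_study_b, liver_study_a)

print(f"loss with bias visible: {loss_identity:.3f}")
print(f"loss with bias ignored: {loss_bias_blind:.3f}")
```

In this toy setting the bias-blind mapper reaches zero loss while the identity mapper does not, which is the sense in which such an objective rewards bias-invariant embeddings. A real model would learn the mapper by gradient descent over many triplets rather than comparing two hand-written weight matrices.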

Keywords:  RNA-seq metadata; automated annotation; bias invariance; deep learning; computational biology; bioinformatics; data reusability; domain adaptation; machine learning

Year:  2021        PMID: 34553213      PMCID: PMC8559615          DOI: 10.1093/gigascience/giab064

Source DB:  PubMed          Journal:  Gigascience        ISSN: 2047-217X            Impact factor:   6.524

