Bias-invariant RNA-sequencing metadata annotation.

Hannes Wartmann¹, Sven Heins¹, Karin Kloiber¹, Stefan Bonn¹.

Abstract

BACKGROUND: Recent technological advances have resulted in an unprecedented increase in publicly available biomedical data, yet the reuse of the data is often precluded by experimental bias and a lack of annotation depth and consistency. Missing annotations make it impossible for researchers to find datasets specific to their needs.
FINDINGS: Here, we investigate RNA-sequencing metadata prediction based on gene expression values. We present a deep-learning-based domain adaptation algorithm for the automatic annotation of RNA-sequencing metadata. We show, in multiple experiments, that our model integrates heterogeneous training data better than existing linear regression-based approaches, resulting in improved tissue type classification. By using a model architecture similar to Siamese networks, the algorithm can learn biases from datasets with few samples.
CONCLUSION: Using our novel domain adaptation approach, we achieved metadata annotation accuracies up to 15.7% better than a previously published method. Using the best model, we provide a list of >10,000 novel tissue and sex label annotations for 8,495 unique SRA samples. Our approach has the potential to revive idle datasets through automated annotation, making them more searchable.
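
ILLUSTRATION: The Siamese-style idea mentioned in the findings (two samples passed through one shared encoder so that their embeddings can be compared) can be sketched roughly as follows. The abstract does not describe the encoder, the loss, or any hyperparameters, so everything here is an assumption made for illustration: the function names, the single tanh layer, the classic contrastive loss, and the margin value are hypothetical and are not the authors' implementation.

```python
import math
import random


def encode(weights, x):
    """Shared encoder (assumed): one linear layer followed by tanh."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def contrastive_pair_loss(weights, x_a, x_b, same_label, margin=1.0):
    """Contrastive loss for one pair embedded by the SAME weights.

    Pairs with the same label (e.g. same tissue, different dataset) are
    pulled together; pairs with different labels are pushed apart until
    they are at least `margin` away from each other.
    """
    d2 = squared_distance(encode(weights, x_a), encode(weights, x_b))
    if same_label:
        return d2
    return max(0.0, margin - math.sqrt(d2)) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    n_genes, n_hidden = 5, 3  # toy sizes, not taken from the paper
    weights = [[rng.uniform(-1, 1) for _ in range(n_genes)]
               for _ in range(n_hidden)]
    sample_a = [rng.random() for _ in range(n_genes)]  # toy expression vector
    sample_b = [rng.random() for _ in range(n_genes)]  # sample from another dataset
    print(contrastive_pair_loss(weights, sample_a, sample_b, same_label=True))
    print(contrastive_pair_loss(weights, sample_a, sample_b, same_label=False))
```

Because both samples go through the same weights, every labelled pair contributes a training signal, which is one plausible reason such architectures are attractive when a dataset contains few samples.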

Entities: Chemical

Keywords: RNA-seq metadata; automated annotation; bias invariance; deep learning; computational biology; bioinformatics; data reusability; domain adaptation; machine learning

Substances: RNA

Year: 2021 PMID: 34553213 PMCID: PMC8559615 DOI: 10.1093/gigascience/giab064

Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
