A probabilistic multi-omics data matching method for detecting sample errors in integrative analysis.

Eunjee Lee, Seungyeul Yoo, Wenhui Wang, Zhidong Tu, Jun Zhu.

Abstract

BACKGROUND: Data errors, including sample swapping and mislabeling, are inevitable in large-scale omics data generation. They need to be identified and corrected before integrative analyses in which different types of data are merged on the basis of their annotated labels. Data with labeling errors dampen true biological signals. More importantly, analysis of data with sample errors could lead to wrong scientific conclusions. We developed a robust probabilistic multi-omics data matching procedure, proMODMatcher, to curate data and to identify and correct annotation errors in large databases.
RESULTS: Application to simulated datasets suggests that proMODMatcher achieved robust statistical power even when the number of cis-associations was small and/or the number of samples was large. Application of our proMODMatcher to multi-omics datasets in The Cancer Genome Atlas and International Cancer Genome Consortium identified sample errors in multiple cancer datasets. Our procedure was not only able to identify sample-labeling errors but also to unambiguously identify the source of the errors. Our results demonstrate that these errors should be identified and corrected before integrative analysis.
CONCLUSIONS: Our results indicate that sample-labeling errors were common in large multi-omics datasets. These errors should be corrected before integrative analysis.
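
The abstract does not spell out the matching procedure, so the following is only a minimal illustrative sketch of the general idea of using cis-associated features to check sample labels across two data types. It is not the proMODMatcher algorithm. All function names, the simulated data, and the assumption that every cis-pair is positively associated are hypothetical choices made for this example.

```python
import numpy as np

def zscore_rows(x):
    """Standardize each feature (row) across samples."""
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    return (x - mu) / np.where(sd == 0, 1.0, sd)

def similarity_matrix(a, b):
    """Pearson correlation between every sample (column) of `a` and every
    sample of `b`, computed over cis-pairs (row k of `a` pairs with row k of `b`).
    Assumes positive cis-associations; this is a simplification."""
    za, zb = zscore_rows(a), zscore_rows(b)
    za = (za - za.mean(axis=0)) / za.std(axis=0)
    zb = (zb - zb.mean(axis=0)) / zb.std(axis=0)
    return za.T @ zb / a.shape[0]

def flag_mismatches(sim):
    """Flag samples whose annotated partner (same column index) is not the
    best match, and propose a partner when the best match is reciprocal."""
    idx = np.arange(sim.shape[0])
    best_for_a = sim.argmax(axis=1)   # best `b` sample for each `a` sample
    best_for_b = sim.argmax(axis=0)   # best `a` sample for each `b` sample
    flagged = best_for_a != idx
    mutual = best_for_b[best_for_a] == idx
    proposed = np.where(mutual, best_for_a, -1)  # -1: no reciprocal match
    return flagged, proposed

# Simulated example: 200 cis-pairs, 30 samples, with samples 3 and 7
# swapped in the second data type.
rng = np.random.default_rng(0)
a = rng.normal(size=(200, 30))
b = 0.8 * a + 0.6 * rng.normal(size=(200, 30))
b[:, [3, 7]] = b[:, [7, 3]]

sim = similarity_matrix(a, b)
flagged, proposed = flag_mismatches(sim)
print("flagged samples:", np.flatnonzero(flagged).tolist())
print("proposed partners:", proposed[flagged].tolist())
```

The sketch uses a hard reciprocal-best-match rule rather than a probabilistic score, so it shows the intuition behind label checking but not the statistical treatment the abstract refers to.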

Keywords: data curation; data error; omics data integration

Year: 2019 PMID: 31289834 PMCID: PMC6615984 DOI: 10.1093/gigascience/giz080

Source DB: PubMed Journal: GigaScience ISSN: 2047-217X Impact factor: 6.524
