Sam Buckberry, Stephen J Bent, Tina Bianco-Miotto, Claire T Roberts.
Abstract
High-throughput gene expression microarrays are currently the most efficient method for transcriptome-wide expression analyses. Consequently, gene expression data available through public repositories have largely been obtained from microarray experiments. However, the metadata associated with many publicly available expression microarray datasets lack sample sex information, which limits the reuse of these data in new analyses or larger meta-analyses where the effect of sex is to be considered. Here, we present the massiR package, which provides a method for researchers to predict the sex of samples in microarray datasets. Using information from microarray probes representing Y chromosome genes, this package implements unsupervised clustering methods to classify samples into male and female groups, providing an efficient way to identify or confirm the sex of samples in mammalian microarray datasets.
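The idea of splitting samples into two groups from Y chromosome probe expression can be illustrated with a minimal Python sketch. This is not the massiR implementation or its API: the function name `predict_sex`, the input layout, and the choice of a simple one-dimensional two-means clustering are all assumptions made for illustration, and the example values are invented.

```python
from statistics import mean


def predict_sex(y_expr, max_iter=100):
    """Cluster samples into two groups from Y-probe expression.

    y_expr: dict mapping sample name -> list of expression values
            (log scale) for probes targeting Y chromosome genes.
    Returns a dict mapping sample name -> "male" or "female".

    Hypothetical helper: a plain 1D two-means clustering, used here
    only as a stand-in for "unsupervised clustering".
    """
    # Summarise each sample by its mean Y-probe expression.
    scores = {sample: mean(values) for sample, values in y_expr.items()}

    # Start the two cluster centres at the extremes of the scores.
    low, high = min(scores.values()), max(scores.values())
    if low == high:
        raise ValueError("no variation in Y-probe expression")

    labels = {}
    for _ in range(max_iter):
        # Assign each sample to the nearer centre; the high-expression
        # cluster is interpreted as male.
        labels = {
            sample: "male" if abs(x - high) < abs(x - low) else "female"
            for sample, x in scores.items()
        }
        new_low = mean(scores[s] for s, lab in labels.items() if lab == "female")
        new_high = mean(scores[s] for s, lab in labels.items() if lab == "male")
        if (new_low, new_high) == (low, high):
            break  # centres stopped moving
        low, high = new_low, new_high
    return labels


# Invented example values: two samples with high and two with low Y-probe signal.
example = {
    "s1": [9.1, 8.7],
    "s2": [3.0, 3.2],
    "s3": [8.9, 9.3],
    "s4": [2.8, 3.1],
}
print(predict_sex(example))
```

A two-cluster approach like this always produces both a male and a female group, so it presupposes that both sexes are present in the dataset.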
Year: 2014 PMID: 24659105 PMCID: PMC4080740 DOI: 10.1093/bioinformatics/btu161
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Sex prediction accuracy of the massiR package using human gene expression datasets with a range of male/female ratios. The correct sex prediction rate is 97.2% (±1.2 SEM) for datasets with >15% and <85% males, which is the area between the vertical dotted lines. Points represent means, and vertical bars show the standard error of the mean. The gray band at the top of the plot shows the 95–100% range. These results are a summary of tests conducted using publicly available expression data from human brain, colorectal, kidney and placenta tissue and peripheral blood mononuclear cells. The data subsets for each were generated by randomly selecting male and female samples for predetermined dataset sizes and sex ratios.
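The subset generation described in the caption could look roughly like the Python sketch below. The function name `sample_subset`, its parameters, and the choice of sampling without replacement are assumptions for illustration; the source does not specify how the random selection was implemented.

```python
import random


def sample_subset(males, females, size, male_fraction, seed=None):
    """Draw a random subset with a fixed size and male/female ratio.

    males, females: lists of sample identifiers of known sex.
    size:           total number of samples in the subset.
    male_fraction:  desired proportion of males, between 0 and 1.
    Hypothetical helper; samples are drawn without replacement.
    """
    rng = random.Random(seed)
    n_male = round(size * male_fraction)
    n_female = size - n_male
    if n_male > len(males) or n_female > len(females):
        raise ValueError("not enough samples for the requested size and ratio")
    return rng.sample(males, n_male) + rng.sample(females, n_female)


# Invented identifiers: a 20-sample subset containing 25% males.
males = [f"m{i}" for i in range(50)]
females = [f"f{i}" for i in range(50)]
subset = sample_subset(males, females, size=20, male_fraction=0.25, seed=1)
print(len(subset), sum(s.startswith("m") for s in subset))
```

Repeating such draws across a grid of sizes and ratios, then comparing predicted with known sex, would yield accuracy estimates of the kind summarised in the figure.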