Detection bias in microarray and sequencing transcriptomic analysis identified by housekeeping genes.

Yijuan Zhang, Oluwafemi S Akintola, Ken J A Liu, Bingyun Sun.

Abstract

This work provides the original data used to identify the gene ontology bias in transcriptomic analyses conducted by microarray and high-throughput sequencing (Zhang et al., 2015) [1]. In that analysis, housekeeping genes were used to compare the detection ability of microarray and sequencing because these genes are probably the most reliably detected. The genes included here were compiled from 15 human housekeeping gene studies. The tables provided here comprise the detailed chromosomal location, detection breadth, normalized expression level, exon count, total exon length, and total intron length of each gene concerned and its related transcripts. We hope this information helps researchers better understand the differences in gene ontology bias that we discussed (Zhang et al., 2015) [1] and encourages further improvement of these two technology platforms.

Keywords:  Housekeeping genes; Microarray; Next-generation sequencing; RNA-seq; Sequencing; Transcriptome

Year:  2015        PMID: 26858976      PMCID: PMC4706559          DOI: 10.1016/j.dib.2015.11.045

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Value of the data

Housekeeping genes are the genes most reliably detected by high-throughput methods and therefore carry the fewest detection errors when differences between analysis platforms are examined.
Detailed values for all factors considered, including chromosomal location, exon count, total exon length, total intron length, normalized expression value, and detection breadth, are provided on a per-gene or per-transcript basis so that the data can be further queried or analyzed.
The information included here should also support further improvement of these two popular technology platforms.

Data

Table S1: chromosomal location of housekeeping (HK) genes detected by microarray (MA) alone, by sequencing alone, or by both.
Table S2: exon count, total exon length, total intron length, and GC content of HK genes detected by MA alone, by sequencing alone, or by both.
Table S3: detection breadth and normalized maximum expression quantity of each HK gene detected by MA alone, by sequencing alone, or by both.

Experimental design, materials and methods

The data included here were downloaded from 15 published human housekeeping studies: Warrington [2], Hsiao [3], Eisenberg_03 [4], Tu [5], Dezso [6], She [7], Chang [8], Shyamsundar [9], Zhu_MA and Zhu_EST [10], Podder [11], Reverter [12], Ramskold [13], Eisenberg_13 [14], and Fagerberg [15]. Nine of these used microarray (MA) analysis (Warrington [2], Hsiao [3], Eisenberg_03 [4], Tu [5], Dezso [6], She [7], Chang [8], Shyamsundar [9], and Zhu_MA [10]); the remaining six used sequencing analysis.

The gene identifiers used in the different studies were first converted to Entrez Gene IDs using the Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.7 (http://david.abcc.ncifcrf.gov/) [16], [17], as detailed in [1], [18]. Chromosomal locations were queried from the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov), and genes with unknown genome locations were removed.

The resulting Entrez gene list was further converted to RefSeq mRNA IDs using DAVID. The exon count, the exon start and end positions, and the coding sequences were obtained by querying the refGene information in the University of California, Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu/index.html) against the latest human genome assembly (GRCh38) [19]. The total intron length was calculated as the total gene length minus the total exon length. The GC content was calculated from the coding sequence only. Again, transcripts that could not be mapped to refGene in the UCSC database, and those lacking exon count, exon start or end positions, or sequence information, were removed from the table.

Expression quantities were collected from Chang [8], Eisenberg_03 [4], She [7], Warrington [2], Shyamsundar [9], and Fagerberg [15]. The raw expression quantities in each list were first normalized against the maximum value in that list to make the lists comparable.
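The calculations described in the paragraph above (total intron length, GC content, and normalization against the list maximum) can be sketched in Python as follows. This is a minimal illustration, not the authors' code. The function names are hypothetical, exon coordinates are assumed to be non-overlapping half-open (start, end) intervals as in UCSC tables, gene length is assumed to be the transcript end minus the transcript start, and GC content is returned as a fraction rather than a percentage.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # assumed half-open (start, end) interval


def total_exon_length(exons: List[Exon]) -> int:
    """Sum the lengths of all exons of one transcript."""
    return sum(end - start for start, end in exons)


def total_intron_length(tx_start: int, tx_end: int, exons: List[Exon]) -> int:
    """Total gene length minus total exon length, as stated in the text."""
    return (tx_end - tx_start) - total_exon_length(exons)


def gc_content(coding_sequence: str) -> float:
    """Fraction of G and C bases, computed from the coding sequence only."""
    seq = coding_sequence.upper()
    if not seq:
        raise ValueError("empty coding sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def normalize_to_max(expression: Dict[str, float]) -> Dict[str, float]:
    """Divide every value in one study's list by that list's maximum."""
    peak = max(expression.values())
    return {gene_id: value / peak for gene_id, value in expression.items()}


# Toy transcript with three exons spanning positions 100 to 700.
exons = [(100, 200), (300, 450), (600, 700)]
print(total_exon_length(exons))              # 350
print(total_intron_length(100, 700, exons))  # 250
print(normalize_to_max({"id_a": 50.0, "id_b": 200.0}))
```

Dividing by the per-list maximum puts every list on a 0 to 1 scale, which is what makes values from different studies comparable in the sense described above.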
For Entrez genes with multiple quantification values in a single list (for example, when a single Entrez Gene ID mapped to several identifiers in a study and each identifier had its own expression value), the maximum normalized expression value was used. The detection breadth (DB) [1], [18] is the number of studies in which an HK gene was identified. For example, a gene detected in 8 of the 9 MA studies has a DB value of 8; similarly, a gene detected in 5 of the 6 sequencing studies has a DB value of 5.
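The two rules in the paragraph above, keeping the maximum normalized value per Entrez gene and counting the studies that report a gene, can be sketched as follows. This is an illustrative sketch under stated assumptions. The function names, study labels, and gene identifiers are hypothetical, and DB is assumed to be computed separately for the MA studies and the sequencing studies, as the worked example in the text suggests.

```python
from typing import Dict, Iterable, Set, Tuple


def collapse_to_gene_max(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Keep the maximum normalized value for each Entrez Gene ID.

    `records` holds (entrez_gene_id, normalized_value) pairs from one list;
    a gene ID appears more than once when several source IDs map to it.
    """
    best: Dict[str, float] = {}
    for gene_id, value in records:
        if gene_id not in best or value > best[gene_id]:
            best[gene_id] = value
    return best


def detection_breadth(study_gene_sets: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count, for each gene, how many studies on one platform list it."""
    breadth: Dict[str, int] = {}
    for genes in study_gene_sets.values():
        for gene_id in genes:
            breadth[gene_id] = breadth.get(gene_id, 0) + 1
    return breadth


# Toy data: three hypothetical MA studies and their housekeeping gene sets.
ma_studies = {
    "study_1": {"g1", "g2"},
    "study_2": {"g1"},
    "study_3": {"g1", "g3"},
}
print(detection_breadth(ma_studies))
print(collapse_to_gene_max([("g1", 0.2), ("g1", 0.7), ("g2", 0.4)]))
```

Passing only the MA gene sets, or only the sequencing gene sets, yields the platform-specific DB values; with the nine MA lists used here, the result would range from 1 to 9.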

Specifications Table

Subject area: Biology
More specific subject area: Transcriptomics
Type of data: Excel table
How data was acquired: Microarray and sequencing
Data format: Downloaded from public domain, compiled and analyzed
Experimental factors: Gene identifier was unified
Experimental features: Analysis of gene chromosomal location, gene structure, and gene expression
Data source location:
Data accessibility: Data is with the article

References (10 of 19 shown)

1.  The human genome browser at UCSC.

Authors:  W James Kent; Charles W Sugnet; Terrence S Furey; Krishna M Roskin; Tom H Pringle; Alan M Zahler; David Haussler
Journal:  Genome Res       Date:  2002-06       Impact factor: 9.043

2.  Comparison of human adult and fetal expression and identification of 535 housekeeping/maintenance genes.

Authors:  J A Warrington; A Nair; M Mahadevappa; M Tsyganskaya
Journal:  Physiol Genomics       Date:  2000-04-27       Impact factor: 3.107

3.  Human housekeeping genes are compact.

Authors:  Eli Eisenberg; Erez Y Levanon
Journal:  Trends Genet       Date:  2003-07       Impact factor: 11.639

4.  Membrane gene ontology bias in sequencing and microarray obtained by housekeeping-gene analysis.

Authors:  Yijuan Zhang; Oluwafemi S Akintola; Ken J A Liu; Bingyun Sun
Journal:  Gene       Date:  2015-09-25       Impact factor: 3.688

5.  Exploring the differences in evolutionary rates between monogenic and polygenic disease genes in human.

Authors:  Soumita Podder; Tapash C Ghosh
Journal:  Mol Biol Evol       Date:  2009-12-02       Impact factor: 16.240

6.  A compendium of gene expression in normal human tissues.

Authors:  L L Hsiao; F Dangond; T Yoshida; R Hong; R V Jensen; J Misra; W Dillon; K F Lee; K E Clark; P Haverty; Z Weng; G L Mutter; M P Frosch; M E MacDonald; E L Milford; C P Crum; R Bueno; R E Pratt; M Mahadevappa; J A Warrington; G Stephanopoulos; G Stephanopoulos; S R Gullans
Journal:  Physiol Genomics       Date:  2001-12-21       Impact factor: 3.107

7.  Human housekeeping genes, revisited.

Authors:  Eli Eisenberg; Erez Y Levanon
Journal:  Trends Genet       Date:  2013-06-27       Impact factor: 11.639

8.  A DNA microarray survey of gene expression in normal human tissues.

Authors:  Radha Shyamsundar; Young H Kim; John P Higgins; Kelli Montgomery; Michelle Jorden; Anand Sethuraman; Matt van de Rijn; David Botstein; Patrick O Brown; Jonathan R Pollack
Journal:  Genome Biol       Date:  2005-02-14       Impact factor: 13.583

9.  Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists.

Authors:  Da Wei Huang; Brad T Sherman; Richard A Lempicki
Journal:  Nucleic Acids Res       Date:  2008-11-25       Impact factor: 16.971

10.  Mining tissue specificity, gene connectivity and disease association to reveal a set of genes that modify the action of disease causing genes.

Authors:  Antonio Reverter; Aaron Ingham; Brian P Dalrymple
Journal:  BioData Min       Date:  2008-09-19       Impact factor: 2.522
