
Reference quantitative transcriptome dataset for adult Caenorhabditis elegans.

Allison Piovesan, Francesca Antonaros, Pierluigi Strippoli, Lorenza Vitale, Maria Chiara Pelleri, Maria Caracausi.

Abstract

Caenorhabditis elegans is a nematode widely used in biology and genomics as a model organism. We provide an integrated, quantitative reference map of the transcriptome of whole, wild type Bristol N2 strain C. elegans worms. The map was obtained by meta-analysis of 110 gene expression profiles available in the Gene Expression Omnibus (GEO) repository, integrated using the computational biology tool Transcriptome Mapper (TRAM). Each probe was assigned to its corresponding locus, and the data underwent intra- and inter-sample normalization (in particular using the scaled quantile method). A consensus mean reference value, along with its standard deviation, is then provided for 45,932 transcripts. All expression values are mapped to genomic coordinates. The map provides easy access to the relationships among the expression values of different genes in this standard condition. It also highlights genomic segments with relatively high or low expression. Finally, it may serve as a reference for testing gene expression variation, for both individual genes and the whole transcriptome, in specific biological conditions (e.g. mutant strains or worms grown under different conditions).


Keywords:  Adult worms; C. elegans; Gene expression; Meta-analysis; Transcriptome map

Year:  2019        PMID: 31440537      PMCID: PMC6700341          DOI: 10.1016/j.dib.2019.104152

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409



Data

Caenorhabditis elegans transcriptome map

Caenorhabditis elegans is a nematode widely used in biology and genomics as a model organism [1], [2]. We provide an integrated, quantitative reference map of the transcriptome of whole, wild type Bristol N2 strain Caenorhabditis elegans worms. The map was obtained by meta-analysis of 110 gene expression profiles available in the Gene Expression Omnibus (GEO) repository (https://www.ncbi.nlm.nih.gov/geo/), integrated using the computational biology tool Transcriptome Mapper (TRAM) [3]. The gene expression profiles were derived from expression microarray experiments and met the inclusion and exclusion criteria described in the Materials and Methods section. Sample identifiers (GEO accession numbers) and main sample features are listed in Supplementary Table 1. Each probe was assigned to its corresponding locus, and the data underwent intra- and inter-sample normalization (in particular using the scaled quantile method). A consensus mean reference value, along with its standard deviation, is then provided for 45,932 transcripts (Supplementary Table 2). All expression values are mapped to genomic coordinates. The over-/under-expressed genomic segments shown in Table 1 were selected using the "Map" mode graphical representation. Detailed results are also distributed with the TRAM software, available at http://apollo11.isto.unibo.it/software/.
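The intra- and inter-sample normalization step can be illustrated with a minimal Python sketch. It relies on two assumptions:

- Intra-sample normalization rescales each sample so that its mean equals 100. The Materials and Methods state that the mean value is set to 100.
- Every sample lists the same loci in the same order.

Under these assumptions, the sketch then applies plain quantile normalization. TRAM's scaled quantile method also handles samples with different numbers of values [8], which is not reproduced here. All function names are hypothetical.

```python
from statistics import mean


def scale_to_mean_100(sample):
    """Intra-sample step (assumed): rescale values so the sample mean is 100."""
    m = mean(sample)
    return [100.0 * v / m for v in sample]


def quantile_normalize(samples):
    """Inter-sample step: plain quantile normalization.

    Assumes all samples list the same loci in the same order. Each value is
    replaced by the mean, across samples, of the values sharing its rank
    (ties are broken by position, a simplification).
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of loci")
    # Reference distribution: mean of the k-th smallest value of every sample.
    reference = [mean(col) for col in zip(*(sorted(s) for s in samples))]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # indices from lowest to highest value
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


def normalize(samples):
    """Hypothetical pipeline: intra-sample scaling followed by quantile normalization."""
    return quantile_normalize([scale_to_mean_100(s) for s in samples])


if __name__ == "__main__":
    # Two toy samples (hypothetical values) on very different scales.
    for row in normalize([[1.0, 2.0, 6.0], [40.0, 20.0, 30.0]]):
        print([round(v, 2) for v in row])
```

After both steps, every sample shares the same value distribution and keeps a mean of 100. Values are therefore directly comparable across profiles before the per-locus means are taken.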
Table 1

The genomic segments significantly over-/under-expressed in the C. elegans transcriptome map. Over-expressed genes are shown in bold; under-expressed genes are shown in bold and marked with an asterisk. "+" and "-" signs indicate a value above or below the genome median, respectively. For simplicity, segments whose over-/under-expressed gene content is fully included in a segment listed here are not shown.

# | Chromosome | Segment start | Segment end | Expression value | q-value | Genes in the segment
Over-expressed segments
1 | chrIV | 11,330,001 | 11,350,000 | 2,308.35 | 0.00003906 | F54E12.2+ his-55 his-56 his-58 his-57 klp-12+
2 | chrIII | 10,970,001 | 10,990,000 | 2,113.98 | 0.00002970 | dhc-4- col-92 col-93 col-94
3 | chrV | 11,070,001 | 11,090,000 | 1,762.53 | 0.00002970 | act-3 act-2 act-1 Y42A5A.1-
4 | chrIV | 11,390,001 | 11,410,000 | 1,588.97 | 0.00010384 | tag-89- dsl-6- his-66 cyp-31A2+ his-63 his-64
5 | chrIV | 7,470,001 | 7,490,000 | 1,482.27 | 0.00000041 | plk-3 F55G1.6+ F55G1.9+ rod-1+ his-61 his-62 his-60 his-59
6 | chrIV | 11,320,001 | 11,340,000 | 1,223.15 | 0.00002952 | B0035.18+ B0035.6+ his-47+ his-48 his-46+ his-45+ Cel.6357- F54E12.2+ his-55 his-56 his-58
7 | chrI | 2,060,001 | 2,080,000 | 1,035.73 | 0.00014806 | Y37E3.1+ rpb-10+ moag-4+ arl-13- rla-1 Y37E3.8 phb-1
8 | chrV | 8,880,001 | 8,900,000 | 1,029.50 | 0.00000060 | K06C4.1- his-28 his-27 his-22 his-20+ his-19+ his-17 his-18 frpr-13-
9 | chrMT | 1 | 20,000 | 975.75 | 0.00000001 | ND1 ATP6+ CYTB COX3 ND4 COX1 COX2 ND5+
10 | chrIII | 7,170,001 | 7,190,000 | 917.83 | 0.00043732 | acs-4+ srb-8- srb-7- rps-14 rpl-36 F37C12.3+ epg-4+ F37C12.1+ F37C12.14- F37C12.10+ rps-21
11 | chrI | 10,550,001 | 10,570,000 | 899.64 | 0.00033432 | F25H2.4+ ndk-1 F25H2.6+ F25H2.7- ubc-25+ F25H2.15+ pas-5+ rla-0 tct-1 F25H2.14+
12 | chrX | 7,300,001 | 7,320,000 | 772.99 | 0.00019922 | sur-5+ his-38 T08A9.6- spp-3 spp-2+ spp-6- spp-4+ spp-5
13 | chrII | 13,810,001 | 13,830,000 | 755.25 | 0.00019922 | ZK131.11+ his-16 his-13 his-12+ his-11+ his-10 his-9+ his-42+
14 | chrV | 2,310,001 | 2,330,000 | 722.00 | 0.00005651 | Y19D10B.1- Y19D10B.6- pud-1.2 pud-2.2 pud-3
15 | chrII | 8,560,001 | 8,580,000 | 634.59 | 0.00025710 | iff-2 stc-1+ F54C9.3+ col-38+ rpl-5 bcs-1+ F54C9.7- puf-5 F54C9.9+
16 | chrIV | 5,050,001 | 5,070,000 | 618.41 | 0.00014806 | msp-55 C09B9.7- msp-57 msp-53 R13H9.5+ R13H9.6+ rmd-6+
17 | chrIV | 8,330,001 | 8,350,000 | 611.61 | 0.00002612 | his-29+ his-30 lys-10- his-31+ his-32 his-34 lgc-6- lgc-5- F17E9.5 F17E9.4+
18 | chrV | 8,530,001 | 8,550,000 | 604.67 | 0.00025710 | otpl-5- his-8 his-7 his-6+ his-5 his-39+ otpl-4* asns-1- stdh-4-
19 | chrIII | 9,740,001 | 9,760,000 | 528.29 | 0.00001188 | T05G5.1+ iff-1 cdk-1 T05G5.4- T05G5.5+ ech-6 vps-53+ rmd-1
Under-expressed segments
1 | chrIV | 5,870,001 | 5,890,000 | 5.03 | 0.00045937 | srv-17* srv-18* srv-19* srv-20- srv-21- srv-22- srv-23* H04M03.11- glf-1-
2 | chrI | 12,380,001 | 12,400,000 | 4.89 | 0.00338533 | gly-16- T15D6.5* nhr-77- glct-3* T15D6.8- T15D6.9- T15D6.10* T15D6.11- T15D6.12-
3 | chrV | 3,060,001 | 3,080,000 | 4.79 | 0.00257570 | srt-15- srt-16- srh-185* str-40* C50H11.13- srt-10* srt-9- srt-5-
4 | chrII | 3,690,001 | 3,710,000 | 4.73 | 0.00448442 | srx-101* srx-100* srx-102* srx-104- srx-105- srx-106- srx-107- srx-108- srx-109- srx-110-
5 | chrIV | 5,860,001 | 5,880,000 | 4.57 | 0.00198431 | spe-27- srv-17* srv-18* srv-19* srv-20- srv-21- srv-22-
6 | chrV | 15,300,001 | 15,320,000 | 4.56 | 0.00257570 | srx-49* srx-48* T26H8.5- srz-10* irld-62- srt-22- nhr-246- ZK1037.13-
7 | chrIV | 9,480,001 | 9,500,000 | 4.35 | 0.00018020 | cng-3* gadr-2- sru-2* sru-1- sru-6- sru-3* sru-4*
8 | chrV | 16,670,001 | 16,690,000 | 4.27 | 0.00338533 | str-61- F14F8.8- srz-2* srz-1- srz-103- srw-44- srw-36* srw-43* srz-102-
9 | chrI | 12,690,001 | 12,710,000 | 4.21 | 0.00001683 | sra-17* F28C12.6- sra-18* sra-19* sra-20- sra-21* sra-22* sra-23- sra-24* T06G6.11- T06G6.3-
10 | chrI | 13,100,001 | 13,120,000 | 3.94 | 0.00134398 | Y26D4A.21- C17H1.2- pals-2* pals-12- pals-4* C17H1.1*
11 | chrV | 2,930,001 | 2,950,000 | 3.82 | 0.00257570 | C31B8.16- C31B8.1- srh-247- srw-141* srw-143- srw-137- srh-87* srw-128*
12 | chrV | 9,820,001 | 9,840,000 | 3.81 | 0.00134398 | sru-32* str-193* str-2- sru-40- sru-38* srsx-21-
13 | chrIV | 14,140,001 | 14,160,000 | 3.57 | 0.00134398 | srz-31- srz-30* oac-37- oac-38* H12I19.8- R05A10.8*
14 | chrV | 2,740,001 | 2,760,000 | 3.54 | 0.00002042 | srg-61* srg-62- srx-25* srx-24* srx-26* srx-27- srx-28* srg-58-
15 | chrV | 16,680,001 | 16,700,000 | 3.35 | 0.00001183 | srw-44- srw-36* srw-43* srz-102- srw-42* srz-101* srw-41*
16 | chrV | 16,460,001 | 16,480,000 | 3.23 | 0.00018020 | srh-142- T08G3.7- sru-44- srh-138* srw-35* srx-126* T08G3.11*
17 | chrV | 2,940,001 | 2,960,000 | 3.21 | 0.00030016 | srw-143- srw-137- srh-87* srw-128* srw-122- srh-248* srw-142* srw-144-
18 | chrV | 2,950,001 | 2,970,000 | 3.15 | 0.00198431 | srw-122- srh-248* srw-142* srw-144- srw-138- srh-88- srw-116*
19 | chrV | 6,800,001 | 6,820,000 | 3.11 | 0.00134398 | dmsr-10* dmsr-11* dmsr-12- T15B7.10- dmsr-14* dmsr-13-
20 | chrIV | 9,280,001 | 9,300,000 | 3.01 | 0.00040976 | nhr-267- nhr-264* F49C12.1* F49C12.2*
21 | chrI | 13,110,001 | 13,130,000 | 2.73 | 0.00023681 | pals-4* C17H1.1* pals-11*

Reference gene search

In the C. elegans transcriptome map, the search for reference genes using the criteria described in the Materials and Methods section retrieved three loci (Table 2). The rpl4 locus, encoding 60S ribosomal protein L4, shows the most favorable combination of high expression level, large number of samples and low standard deviation.
Table 2

List of the best predicted reference genes from the whole adult C. elegans quantitative transcriptome map. Chr = chromosome; SD = standard deviation.

Gene name | Chr | Expression value | Sample number | SD as % of expression | Description
rpl4 | chrI | 2,603.77 | 102 | 19.85 | 60S ribosomal protein L4
riok-3 | chrIII | 165.73 | 61 | 18.99 | Serine/threonine-protein kinase RIO3
Y48G1C.1 | chrI | 149.21 | 55 | 19.88 | hypothetical protein

Experimental design, materials and methods

Database search and selection

Caenorhabditis elegans is a nematode widely used in biology and genomics as a model organism [1], [2]. In November 2018, the GEO gene expression data repository was searched for any available samples listing gene expression values for whole wild type Bristol N2 strain C. elegans worms. The query was: "Caenorhabditis elegans"[Organism] AND "Expression profiling by array"[Filter] AND adult. The search returned 250 datasets. Of these, 50 datasets, taken without further selection as the first 50 in the default display order, were examined further to identify individual, pertinent lists of gene expression values.

The inclusion criteria were:
- RNA extracted from whole Bristol N2 wild type adult (or young adult) worms;
- worms of any age (day 2 to day 15);
- hermaphrodite or male sex.

The exclusion criteria were:
- larval stage;
- treatment with empty or non-empty vectors;
- worms not fed living E. coli, or fasting worms;
- exposure to DMSO (dimethyl sulfoxide) as a vehicle control;
- growth at 25 °C when a 20 °C condition was available.

RNA sequencing (RNA-Seq), the other high-throughput method used to assess gene expression, is considered more sensitive than RNA microarrays and to have a broader dynamic range [4]. Nevertheless, microarrays remain an accurate tool for measuring gene expression levels [5] and offer some specific advantages over RNA-Seq [6]. They thus continue to provide useful data-mining resources.
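As an illustration, the inclusion and exclusion criteria can be expressed as a filter over sample annotations. This minimal Python sketch rests on stated assumptions. The `Sample` fields and identifiers are hypothetical and would have to be curated by hand, since GEO does not expose these exact fields. The 25 °C rule is interpreted here as excluding a 25 °C sample only when an eligible 20 °C sample exists in the same GEO series.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical, hand-curated annotation of one GEO sample."""
    accession: str        # sample identifier
    series: str           # GEO series the sample belongs to
    strain: str           # e.g. "N2"
    stage: str            # "adult", "young adult", "L4", ...
    age_days: int         # adult age in days
    whole_worm: bool      # RNA extracted from whole worms
    vector: bool          # treated with an empty or non-empty vector
    fed_live_ecoli: bool  # fed living E. coli (False also covers fasting)
    dmso: bool            # exposed to DMSO as vehicle control
    temperature_c: float  # growth temperature


def meets_sample_criteria(s):
    """Inclusion criteria plus the per-sample exclusion criteria (any sex accepted)."""
    return (
        s.strain == "N2"
        and s.whole_worm
        and s.stage in {"adult", "young adult"}
        and 2 <= s.age_days <= 15
        and not s.vector
        and s.fed_live_ecoli
        and not s.dmso
    )


def select_samples(samples):
    eligible = [s for s in samples if meets_sample_criteria(s)]
    # Exclude 25 °C samples when the same series offers an eligible 20 °C condition.
    temps = defaultdict(set)
    for s in eligible:
        temps[s.series].add(s.temperature_c)
    return [s for s in eligible
            if not (s.temperature_c == 25 and 20 in temps[s.series])]
```

The temperature rule is applied only after the per-sample filters, so an ineligible 20 °C sample (e.g. one exposed to DMSO) does not cause a 25 °C sample from the same series to be dropped.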

TRAM analysis

TRAM software [3] allows the import of gene expression data from any source (expression microarray, RNA-Seq or proteomic platforms). It integrates all data related to the same biological source by decoding probe set identifiers to gene symbols via UniGene data parsing [7]. It normalizes data from multiple platforms using intra-sample and inter-sample normalization (scaled quantile normalization) [8]. It also creates a graphical representation of gene expression profiles along the chromosomes. Finally, it determines the statistical significance of the differential expression of each chromosomal segment compared with the other segments in the biological condition studied. When two conditions, A and B, are compared, TRAM can calculate the differential expression of each segment between them. To this end, TRAM uses a test based on the hypergeometric distribution, which estimates the probability p that the colocalization of over-/under-expressed genes within the same chromosomal segment is due to chance [3].

We used an updated version of TRAM (TRAM 1.3) [8]. It includes enhanced resolution of gene identifiers through an updated NCBI Gene database, updated platform annotation files and UniGene data parsing. TRAM was set up for C. elegans in November 2018 following the software user guide. The gene expression profiles meeting the inclusion and exclusion criteria were imported as Pool A; Pool B is available for comparisons with a different biological condition. The value for each locus is defined as the mean of all available values for that locus. The genome-wide median gene expression value was used to determine the expression percentile of each gene.

Using the "Map" mode graphical representation, we searched for over-/under-expressed genomic segments with a window size of 20,000 bp (base pairs) and a shift of 10,000 bp. These values were chosen according to the ratio between human and C. elegans mean gene length. Human mean gene length was determined by searching the recent GeneBase database [9], [10]; for worms, an analogous NCBI Gene C. elegans data import was run in GeneBase (data not shown). The expression value of each genomic segment is the mean of the expression values of the loci included in that segment. Loci whose mean value was derived from fewer than five biological samples were not considered. By descriptive statistics, a segment is defined as over-expressed (under-expressed) if two conditions hold (default parameters):
- its expression value falls within the highest (lowest) 2.5% of all genomic segments;
- it contains at least three genes whose expression values fall within the highest (lowest) 2.5% of all genes.

The statistical significance of the over-/under-expression of these segments is then assessed by tests based on the hypergeometric distribution. These tests estimate the probability p that the colocalization of at least three over-/under-expressed genes within the same chromosomal segment is due to chance. Because a genome contains a large number of segments, the p values are corrected for multiple comparisons by controlling the false discovery rate (FDR). A segment was considered statistically significantly over-/under-expressed if q < 0.05 [3], [8].

Apart from gene expression analyses, these data might be used in metabolic network models [11], also using genomic location data [13]. Such models could help validate hypotheses about the relationships among mRNA levels, the corresponding enzymatic proteins and the quantities of their substrates or products measured in metabolome experiments [12]. In addition, these data might be used to calculate the recently described transcriptomic GC content [14], i.e. the guanine-cytosine percentage calculated over the mRNA amount actually expressed in a tissue or cell type. This parameter could then be compared across different biological conditions.
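The segment search described above can be sketched in Python as follows. This is an illustrative reimplementation under explicit assumptions, not TRAM's code:

- Loci are hypothetical `(start, end, value, n_samples)` tuples with 1-based inclusive coordinates.
- A locus is assumed to belong to every window it overlaps.
- The top/bottom 2.5% cut-offs are taken by rank, with ties included.
- The FDR correction is assumed to be the Benjamini-Hochberg procedure, since the text does not name the method.

```python
from math import comb


def windows(length, size=20_000, shift=10_000):
    """1-based inclusive windows: (1, 20000), (10001, 30000), ..."""
    start = 1
    while start <= length:
        yield start, start + size - 1
        start += shift


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N loci, K extreme loci, n loci drawn)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg q-values (assumed FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q, running = [0.0] * m, 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvalues[i] * m / rank)
        q[i] = running
    return q


def rank_cutoff(keys, frac):
    """Smallest key still inside the top `frac` fraction (ties included)."""
    ranked = sorted(keys, reverse=True)
    return ranked[max(1, round(frac * len(ranked))) - 1]


def significant_segments(genome, high=True, frac=0.025, min_genes=3,
                         alpha=0.05, min_samples=5):
    """genome: {chrom: (length, [(start, end, value, n_samples), ...])}.

    Returns (chrom, start, end, segment_value, q) for segments passing the
    descriptive criteria with q < alpha; high=False searches for
    under-expression by flipping the sign of every value.
    """
    sign = 1.0 if high else -1.0
    usable = {c: [l for l in loci if l[3] >= min_samples]  # drop loci from < 5 samples
              for c, (_, loci) in genome.items()}
    all_loci = [l for ls in usable.values() for l in ls]
    gene_cut = rank_cutoff([sign * l[2] for l in all_loci], frac)
    N = len(all_loci)
    K = sum(1 for l in all_loci if sign * l[2] >= gene_cut)

    segments = []
    for chrom, (length, _) in genome.items():
        for start, end in windows(length):
            members = [l for l in usable[chrom] if l[0] <= end and l[1] >= start]
            if not members:
                continue
            value = sum(l[2] for l in members) / len(members)  # segment = mean of its loci
            k = sum(1 for l in members if sign * l[2] >= gene_cut)
            p = hypergeom_sf(k, N, K, len(members))
            segments.append((chrom, start, end, value, k, p))

    seg_cut = rank_cutoff([sign * s[3] for s in segments], frac)
    qvals = benjamini_hochberg([s[5] for s in segments])
    return [(c, st, en, v, q) for (c, st, en, v, k, _), q in zip(segments, qvals)
            if sign * v >= seg_cut and k >= min_genes and q < alpha]
```

The window generator reproduces the coordinates seen in Table 1 (e.g. 11,330,001 to 11,350,000). For the full map, `genome` would hold all loci grouped by chromosome. The quadratic window scan is kept only for readability.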
Sample expression values equal to or lower than 0 are thresholded by TRAM [15], [16] to 95% of the minimum positive value present in that sample. This ensures that dividing "Sample Pool A" values by "Sample Pool B" values yields meaningful numbers. In these cases, the expression level is assumed to be too low to be detected under the experimental conditions used. The transformation nevertheless allows a ratio between Pool A and Pool B values to be obtained, which is useful for highlighting differential gene expression.

The ideal reference, or control, gene for studying gene expression in a given organism should be expressed at a medium-high level for easy detection. It should also be expressed at a constant, stable level across different samples, including samples undergoing different treatments [17]. Reference genes best suited to the study of whole adult C. elegans were searched for in the transcriptome map created as described above, using the following parameters in combination:
- expression value > 100, to select genes expressed above the mean value (which is set to 100) and therefore at an appreciable level;
- number of samples ≥ half the total number of samples in the map (≥ 55), to select commonly expressed genes;
- standard deviation (SD), expressed as a percentage of the mean value, ≤ 20%, to identify genes with very low expression variation among different samples [17].
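The thresholding of non-positive values and the reference gene search can both be sketched in Python. The thresholding function follows the stated rule: values ≤ 0 become 95% of the sample's minimum positive value. The reference gene filter applies the three combined criteria to the Table 2 values, together with two hypothetical loci that fail them. The `Locus` type and the ordering of candidates by decreasing expression are illustrative assumptions; the text does not define a ranking rule.

```python
import math
from dataclasses import dataclass


def threshold_nonpositive(sample):
    """Replace values <= 0 with 95% of the smallest positive value in the sample."""
    floor = 0.95 * min(v for v in sample if v > 0)
    return [v if v > 0 else floor for v in sample]


@dataclass
class Locus:
    name: str
    expression: float   # mean normalized value (genome mean = 100)
    n_samples: int      # samples contributing a value
    sd_percent: float   # SD as a percentage of the mean


def reference_gene_candidates(loci, total_samples):
    min_samples = math.ceil(total_samples / 2)  # 55 when total_samples = 110
    hits = [l for l in loci
            if l.expression > 100 and l.n_samples >= min_samples and l.sd_percent <= 20]
    # Hypothetical ordering: highest expression first.
    return sorted(hits, key=lambda l: l.expression, reverse=True)


if __name__ == "__main__":
    pool_a = threshold_nonpositive([0.0, 12.0, 300.0])
    pool_b = threshold_nonpositive([-4.0, 8.0, 150.0])
    print([round(a / b, 3) for a, b in zip(pool_a, pool_b)])  # every ratio is now defined

    loci = [
        Locus("rpl4", 2603.77, 102, 19.85),                # Table 2
        Locus("riok-3", 165.73, 61, 18.99),                # Table 2
        Locus("Y48G1C.1", 149.21, 55, 19.88),              # Table 2
        Locus("hypothetical-variable", 900.0, 108, 45.0),  # fails the SD criterion
        Locus("hypothetical-rare", 500.0, 30, 10.0),       # fails the sample criterion
    ]
    print([l.name for l in reference_gene_candidates(loci, 110)])
```

Because the minimum sample count is derived from the total, the same filter adapts if the map is rebuilt with a different number of profiles.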

Specifications table

Subject area | Biology
More specific subject area | Genomics, bioinformatics
Type of data | Table
How data was acquired | Microarray data repository: Gene Expression Omnibus (GEO) provided by the National Center for Biotechnology Information (NCBI) at https://www.ncbi.nlm.nih.gov/geo/; elaboration: Transcriptome Mapper (TRAM) software, version 1.3
Data format | Excel table (.xlsx); data analyzed by TRAM software and exported as a spreadsheet
Experimental factors | Database search, dataset selection, TRAM (Transcriptome Mapper) analysis
Experimental features | Meta-analysis of wild type, Bristol N2 strain Caenorhabditis elegans adult worms
Data source location | Data sources are listed in Supplementary Table 1; meta-analysis results were obtained at the DIMES Department, University of Bologna, Bologna, Italy; TRAM software set up for Caenorhabditis elegans with the results obtained in this analysis has been released
Data accessibility | Data sources are available at https://www.ncbi.nlm.nih.gov/geo/; analyzed data are provided with this article
Related research article | L. Lenzi, F. Facchin, F. Piva, M. Giulietti, M.C. Pelleri, F. Frabetti, L. Vitale, R. Casadei, S. Canaider, S. Bortoluzzi et al., TRAM (Transcriptome Mapper): database-driven creation and analysis of transcriptome maps from multiple sources, BMC Genomics 12 (2011) 121

Value of the data

A reference table with a quantitative gene expression value for each of the 45,932 Caenorhabditis elegans transcripts. It allows immediate calculation of the relative expression ratio for any pair of genes, as well as analysis of global expression patterns with any gene expression profiling tool.

A benchmark for identifying variation in the expression of individual genes, by comparison with gene expression profiles derived from worms under different biological conditions. Examples include different developmental stages, feeding conditions or treatments, strains with knockdown of specific genes, and strains with any other type of genetic difference.

The possibility of selecting genes with desired expression features: high or low expression, high or low standard deviation from the mean across a large number of samples, or suitability as a reference gene in gene expression studies.

The possibility of selecting genomic segments with high or low expression values (the mean of the expression values of the genes contained in the segment), which may also help identify open chromatin domains.

The quantitative reference values of enzyme mRNAs might be used in metabolic network models. Such models could help validate hypotheses about the relationships among mRNA levels, the corresponding enzymatic proteins and the quantities of their substrates or products measured in metabolome experiments.
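For example, relative expression ratios and a simple deviation score can be computed directly from the reference values. This minimal Python sketch uses the three Table 2 entries. The z-score against the reference mean and SD is an illustrative choice, not a method prescribed by this dataset. It also assumes that the test profile has been normalized in the same way as the map, for example by importing it into TRAM as Pool B.

```python
# Reference mean expression and SD (as % of the mean), taken from Table 2.
REFERENCE = {
    "rpl4": (2603.77, 19.85),
    "riok-3": (165.73, 18.99),
    "Y48G1C.1": (149.21, 19.88),
}


def relative_ratio(gene_a, gene_b, table=REFERENCE):
    """Expression of gene_a relative to gene_b in the reference condition."""
    return table[gene_a][0] / table[gene_b][0]


def z_score(gene, observed, table=REFERENCE):
    """Number of reference SDs between an identically normalized value and the reference mean."""
    mean, sd_percent = table[gene]
    return (observed - mean) / (mean * sd_percent / 100.0)


if __name__ == "__main__":
    print(round(relative_ratio("rpl4", "riok-3"), 2))
    print(round(z_score("riok-3", 250.0), 2))  # 250.0 is a hypothetical test-condition value
```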

References

Jan Schellenberger, Richard Que, Ronan M T Fleming, Ines Thiele, Jeffrey D Orth, Adam M Feist, Daniel C Zielinski, Aarash Bordbar, Nathan E Lewis, Sorena Rahmanian, Joseph Kang, Daniel R Hyduke, Bernhard Ø Palsson. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nat Protoc. 2011-08-04.

Javier Apfeld, Scott Alper. What Can We Learn About Human Disease from the Nematode C. elegans? Methods Mol Biol. 2018.

Maria Caracausi, Allison Piovesan, Lorenza Vitale, Maria Chiara Pelleri. Integrated Transcriptome Map Highlights Structural and Functional Aspects of the Normal Human Heart. J Cell Physiol. 2016-07-21.

Luca Lenzi, Federica Facchin, Francesco Piva, Matteo Giulietti, Maria Chiara Pelleri, Flavia Frabetti, Lorenza Vitale, Raffaella Casadei, Silvia Canaider, Stefania Bortoluzzi, Alessandro Coppe, Gian Antonio Danieli, Giovanni Principato, Sergio Ferrari, Pierluigi Strippoli. TRAM (Transcriptome Mapper): database-driven creation and analysis of transcriptome maps from multiple sources. BMC Genomics. 2011-02-18.

Maria Caracausi, Veronica Ghini, Chiara Locatelli, Martina Mericio, Allison Piovesan, Francesca Antonaros, Maria Chiara Pelleri, Lorenza Vitale, Rosa Anna Vacca, Federica Bedetti, Maria Chiara Mimmi, Claudio Luchinat, Paola Turano, Pierluigi Strippoli, Guido Cocchi. Plasma and urinary metabolomic profiles of Down syndrome correlate with alteration of mitochondrial metabolism. Sci Rep. 2018-02-14.

Maria Caracausi, Allison Piovesan, Francesca Antonaros, Pierluigi Strippoli, Lorenza Vitale, Maria Chiara Pelleri. Systematic identification of human housekeeping genes possibly useful as references in gene expression studies. Mol Med Rep. 2017-07-06.

Lorenza Vitale, Allison Piovesan, Francesca Antonaros, Pierluigi Strippoli, Maria Chiara Pelleri, Maria Caracausi. A molecular view of the normal human thyroid structure and function reconstructed from its reference transcriptome map. BMC Genomics. 2017-09-18.

James A Timmons, Philip J Atherton, Ola Larsson, Sanjana Sood, Ilya O Blokhin, Robert J Brogan, Claude-Henry Volmar, Andrea R Josse, Cris Slentz, Claes Wahlestedt, Stuart M Phillips, Bethan E Phillips, Iain J Gallagher, William E Kraus. A coding and non-coding transcriptomic perspective on the genomics of human metabolic disease. Nucleic Acids Res. 2018-09-06.

Allison Piovesan, Maria Chiara Pelleri, Francesca Antonaros, Pierluigi Strippoli, Maria Caracausi, Lorenza Vitale. On the length, weight and GC content of the human genome. BMC Res Notes. 2019-02-27.

Allison Piovesan, Maria Caracausi, Marco Ricci, Pierluigi Strippoli, Lorenza Vitale, Maria Chiara Pelleri. Identification of minimal eukaryotic introns through GeneBase, a user-friendly tool for parsing the NCBI Gene databank. DNA Res. 2015-11-17.