Ying Yu, Chen Zhao, Zhenqiang Su, Charles Wang, James C Fuscoe, Weida Tong, Leming Shi.
Abstract
The rat is used extensively by the pharmaceutical, regulatory, and academic communities for safety assessment of drugs and chemicals and for studying human diseases; however, its transcriptome has not been well studied. As part of the SEQC (i.e., MAQC-III) consortium efforts, a comprehensive RNA-Seq data set was constructed using 320 RNA samples isolated from 10 organs (adrenal gland, brain, heart, kidney, liver, lung, muscle, spleen, thymus, and testes or uterus) from both sexes of Fischer 344 rats across four ages (2-, 6-, 21-, and 104-week-old) with four biological replicates for each of the 80 sample groups (organ-sex-age). With the Ribo-Zero rRNA removal and Illumina RNA-Seq protocols, 41 million 50 bp single-end reads were generated per sample, yielding a total of 13.4 billion reads. This data set could be used to identify and validate new rat genes and transcripts, develop a more comprehensive rat transcriptome annotation system, identify novel gene regulatory networks related to tissue-specific gene expression and development, and discover genes responsible for disease and drug toxicity and efficacy.
Year: 2014 PMID: 25977771 PMCID: PMC4381750 DOI: 10.1038/sdata.2014.13
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Figure 1. Schematic overview of rat transcriptomic BodyMap study design.
Diagram of organs harvested from Fischer 344 rats and the selected four ages at which organs were harvested for RNA-Seq transcriptomic profiling. Total number of RNA samples: 10 organs per rat × 32 rats (2 sexes × 4 ages × 4 replicates) = 320 for RNA-Seq. Updated from Yu et al.[8]
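The sample arithmetic in this caption can be checked with a short enumeration. This is a minimal sketch, not part of the original study materials: the constant and function names are hypothetical, and it assumes that each rat contributes the nine organs common to both sexes plus one sex-specific organ (testes or uterus), as the abstract's organ list implies.

```python
from itertools import product

# Organ list taken from the abstract; variable names are illustrative only.
COMMON_ORGANS = [
    "adrenal gland", "brain", "heart", "kidney", "liver",
    "lung", "muscle", "spleen", "thymus",
]
SEX_SPECIFIC_ORGAN = {"female": "uterus", "male": "testes"}
AGES_WEEKS = [2, 6, 21, 104]
REPLICATES = [1, 2, 3, 4]


def enumerate_samples():
    """Return one (organ, sex, age, replicate) tuple per RNA sample."""
    samples = []
    for sex, age, rep in product(SEX_SPECIFIC_ORGAN, AGES_WEEKS, REPLICATES):
        # Assumption: 9 shared organs + 1 sex-specific organ per rat.
        for organ in COMMON_ORGANS + [SEX_SPECIFIC_ORGAN[sex]]:
            samples.append((organ, sex, age, rep))
    return samples


samples = enumerate_samples()
rats = {(sex, age, rep) for _, sex, age, rep in samples}
groups = {(organ, sex, age) for organ, sex, age, _ in samples}
print(len(rats), len(groups), len(samples))
```

Under these assumptions the enumeration gives 32 rats, 80 organ-sex-age groups, and 320 samples, matching the counts stated in the abstract and caption. It also shows why a sex-specific organ has 16 samples while every other organ has 32.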
Figure 2. Quality assessment metrics for RNA-Seq data.
Box plots representing (a) GC content (%) and (b) Phred quality score distribution over all reads across all samples at each base position (i.e., sequencing cycle). The box and horizontal bar represent the interquartile range and median of (a) the GC content and (b) the median Phred quality score over all reads. (c) Box plot representing the percentage of reads (y-axis) that appear N times (x-axis) relative to the number of unique reads in each sequenced sample, across all samples. (d) The percentage of reads mapped (ratio, mean ± s.e.; n=32 for organs present in both sexes, n=16 for the sex-specific organs) to genomic regions, AceView exons, ERCCs, and rRNA in each organ.
Figure 3. Quality assessment based on external RNA spike-in controls.
(a) Heatmap and hierarchical clustering representing Pearson correlation between expression profiles of external RNA spike-in controls (ERCC) in 318 samples. Mix1 (pink) and Mix2 (light blue) are marked in the row side bar. The order of samples is identical in rows and columns. Plots of log2(FPKM) (mean ± s.e., n=159) of ERCCs detected in samples spiked with ERCC Mix1 (b) and Mix2 (c) vs log2(spike-in concentration). The linear fit (blue) and error bars (red) are shown. (d) Kernel density and box plot representing the 3′/5′-end coverage ratio across all samples. The average coverage of the 50 bases at the 3′- and 5′-ends of the 20 ERCCs with the highest spike-in concentrations is used to represent the coverage of the 3′- and 5′-ends in each sample. (e) Box plot of the gene body coverage ratio based on the 20 ERCCs with the highest spike-in concentrations. The four bins on the x-axis represent the average per-base coverage of the 20 ERCC sequences in each sample. The y-axis represents the gene body coverage ratio, i.e., the percentage of bases that are sequenced at least once. (f) Bar plot of the single-nucleotide sequencing error rate (y-axis) at each base along the 5′- to 3′-end of a read (x-axis).
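The dose-response plots in panels (b) and (c) relate log2(FPKM) to log2(spike-in concentration) through a linear fit. The sketch below illustrates one way to compute such a fit. It assumes ordinary least squares, which the caption does not specify; the function names are hypothetical and the input values are made up for illustration, not taken from the data set.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def ercc_dose_response(concentrations, fpkms):
    """Fit log2(FPKM) against log2(concentration).

    Pairs with a non-positive value are skipped because their
    logarithm is undefined (an assumption of this sketch).
    """
    pairs = [
        (math.log2(c), math.log2(f))
        for c, f in zip(concentrations, fpkms)
        if c > 0 and f > 0
    ]
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    return fit_line(xs, ys)


# Made-up example values, NOT real ERCC measurements.
demo_concentrations = [1.0, 4.0, 16.0, 64.0]
demo_fpkms = [0.5, 2.0, 8.0, 32.0]
slope, intercept = ercc_dose_response(demo_concentrations, demo_fpkms)
print(slope, intercept)
```

A slope near 1 on the log-log scale would indicate that measured expression scales proportionally with the amount of spike-in added, which is the property these panels are meant to assess.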
Guide to Supplementary File 1
| Column | Column Header | Explanation |
|---|---|---|
| 1 | Sample_ID | Sample identifier coding organ, sex, age, and replicate # |
| 2 | Libray_Processing_Order | The order in which an RNA sample or library was processed |
| 3 | RNA_Sample_ID | Sample identifier specifying the original serial sample ID |
| 4 | Flowcell | The flowcell number on which the sample was sequenced |
| 5 | Lane | The lane number on which the sample was sequenced |
| 6 | ERCC_Mix | Either ERCC Mix1 or Mix2 was added to the RNA sample |
| 7 | BarCode | The multiplexing barcode for the RNA sample |
| 8 | RNA_A260.A280_Ratio | A260/A280 ratio indicating RNA purity |
| 9 | RNA_RIN | RNA Integrity Number |
| 10 | Library_Con_ng.ul | Library concentration in ng μl−1 |
| 11 | Organ | Full name of the organ from which the RNA was isolated |
| 12 | Organ_Abbr_2chars | Two-character abbreviation for an organ |
| 13 | Organ_Abbr_3chars | Three-character abbreviation for an organ |
| 14 | Age_Week | Age (weeks) of the rat |
| 15 | Sex | Sex of the rat |
| 16 | Replicate | Replicate # |
| 17 | Sample_Name_Alt | Alternate name with two-character code for organ name |
| 18 | Genome_Ratio | The ratio of reads mapped to genome |
| 19 | Aceview_Ratio | The ratio of reads mapped to the AceView database |
| 20 | ERCC_Ratio | The ratio of reads mapped to the ERCCs |
| 21 | rRNA_Ratio | The ratio of reads mapped to rRNA |
| 22 | Total_Reads | The number of total reads collected on an RNA sample |
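The per-organ mapping summaries reported as mean ± s.e. can be derived from the columns described above. The following is a minimal sketch under several assumptions: the file is tab-delimited, the `Organ` and `Genome_Ratio` columns hold the values the table describes, and the standard error is the sample standard deviation divided by the square root of n. The function name and the demo rows are hypothetical.

```python
import csv
import io
import math
import statistics
from collections import defaultdict


def mean_and_se_by_organ(rows, column="Genome_Ratio"):
    """Group rows by Organ and return {organ: (mean, s.e., n)}."""
    by_organ = defaultdict(list)
    for row in rows:
        by_organ[row["Organ"]].append(float(row[column]))
    summary = {}
    for organ, values in by_organ.items():
        n = len(values)
        mean = statistics.mean(values)
        # The s.e. is undefined for a single observation.
        se = statistics.stdev(values) / math.sqrt(n) if n > 1 else float("nan")
        summary[organ] = (mean, se, n)
    return summary


# Made-up rows for illustration; real Sample_ID values follow the
# coding described in the table and are not reproduced here.
demo_tsv = (
    "Sample_ID\tOrgan\tGenome_Ratio\n"
    "demo_1\tLiver\t0.80\n"
    "demo_2\tLiver\t0.90\n"
    "demo_3\tBrain\t0.70\n"
    "demo_4\tBrain\t0.90\n"
)
rows = list(csv.DictReader(io.StringIO(demo_tsv), delimiter="\t"))
summary = mean_and_se_by_organ(rows)
for organ, (mean, se, n) in summary.items():
    print(f"{organ}: {mean:.3f} ± {se:.3f} (n={n})")
```

Passing `column="Aceview_Ratio"`, `"ERCC_Ratio"`, or `"rRNA_Ratio"` would summarize the other mapping categories in the same way, provided the real file uses those headers as listed.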
Guide to Supplementary File 3
| Column | Column Header | Explanation |
|---|---|---|
| 1 | Filenames | A unique filename coding organ, sex, age, replicate #, flowcell, lane, and library pool |
| 2 | baseCov | Per-base coverage for the top 20 ERCC sequences |
| 3 | 5′ normCov | Length normalized per-base coverage of 50 bases in 5′ end across top 20 ERCC sequences |
| 4 | 3′ normCov | Length normalized per-base coverage of 50 bases in 3′ end across top 20 ERCC sequences |
| 5 | 5′/3′ Cov Ratio | 3′ bias based on normCov |
| 6 | Gap% | Percentage of bases that were not sequenced in the top 20 ERCC sequences |
| 7 | Base Mismatch Rate | Sequencing error rate in preprocessed sequencing data |
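To make the coverage columns concrete, the sketch below computes comparable quantities from a per-base coverage vector for a single spike-in sequence. It is an illustration only: the function name is hypothetical, the exact length normalization used for the normCov columns is not described here and may differ, and the ratio is taken as 3′ over 5′ following the Figure 3d caption.

```python
def coverage_metrics(per_base_cov, window=50):
    """Summarize a per-base coverage vector for one sequence.

    Returns the mean per-base coverage, the mean coverage of the
    first and last `window` bases, their 3'/5' ratio, and the
    percentage of bases with zero coverage.
    """
    n = len(per_base_cov)
    if n < 2 * window:
        raise ValueError("sequence is shorter than two end windows")
    five_prime = sum(per_base_cov[:window]) / window
    three_prime = sum(per_base_cov[-window:]) / window
    uncovered = sum(1 for c in per_base_cov if c == 0)
    return {
        "base_cov": sum(per_base_cov) / n,
        "five_prime_cov": five_prime,
        "three_prime_cov": three_prime,
        # Assumption: ratio direction follows Figure 3d (3' over 5').
        "three_to_five_ratio": (
            three_prime / five_prime if five_prime > 0 else float("nan")
        ),
        "gap_pct": 100.0 * uncovered / n,
    }


# Made-up coverage profile: 210 bases with a 10-base uncovered stretch.
demo_cov = [10] * 50 + [20] * 100 + [0] * 10 + [30] * 50
metrics = coverage_metrics(demo_cov)
print(metrics)
```

In this sketch the gene body coverage ratio described for Figure 3e corresponds to `100 - gap_pct`, the percentage of bases sequenced at least once.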