Xinxia Peng, Jean Thierry-Mieg, Danielle Thierry-Mieg, Andrew Nishida, Lenore Pipes, Marjan Bozinoski, Matthew J Thomas, Sara Kelly, Jeffrey M Weiss, Muthuswamy Raveendran, Donna Muzny, Richard A Gibbs, Jeffrey Rogers, Gary P Schroth, Michael G Katze, Christopher E Mason.
Abstract
The non-human primate reference transcriptome resource (NHPRTR, available online at http://nhprtr.org/) aims to generate comprehensive RNA-seq data from a wide variety of non-human primates (NHPs), from lemurs to hominids. In the 2012 Phase I of the NHPRTR project, 19 billion fragments or 3.8 terabases of transcriptome sequences were collected from pools of ∼20 tissues in 15 species and subspecies. Here we describe a major expansion of NHPRTR by adding 10.1 billion fragments of tissue-specific RNA-seq data. For this effort, we selected 11 of the original 15 NHP species and subspecies and constructed total RNA libraries for the same ∼15 tissues in each. The sequence quality is such that 88% of the reads align to human reference sequences, allowing us to compute the full list of expression abundance across all tissues for each species, using the reads mapped to human genes. This update also includes improved transcript annotations derived from RNA-seq data for rhesus and cynomolgus macaques, two of the most commonly used NHP models, and additional RNA-seq data compiled from related projects. Together, these comprehensive reference transcriptomes from multiple primates serve as a valuable community resource for genome annotation, gene dynamics and comparative functional analysis.
Year: 2014 PMID: 25392405 PMCID: PMC4383927 DOI: 10.1093/nar/gku1110
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Summary of tissue-specific raw RNA-seq data (in millions of read pairs) for 11 NHP species and subspecies. All RNA-seq libraries were prepared from total RNA using a strand-specific protocol with ribosomal RNA depletion (100 + 100 nt paired-end reads). Whole blood libraries included an additional globin depletion step (50 + 50 nt paired-end reads). Additional tissue-specific RNA-seq data available from this NHPRTR update but not included in this table are: (i) testis (35.4 million read pairs) and ovary (33.1) from Indian-origin rhesus macaque; (ii) left (50.9) and right (69.5) brain hemispheres from marmoset; (iii) whole blood from ring-tailed lemur (102.4), owl monkey (94.0) and Chinese-origin rhesus macaque (107.3); (iv) liver samples (50-nt single-end reads) with polyA selection from Chinese-origin rhesus macaque (26.2), Mauritian-origin cynomolgus macaque (20) and Chinese-origin cynomolgus macaque (16.9).
Figure 2. Summary of alignment of non-human primate (NHP) tissue-specific RNA-seq reads to human reference sequences. (a) Percentage of all reads from each NHP species that were aligned to human reference sequences using the Magic pipeline. The bar colors indicate: Hominoids (blue), Old World monkeys (red), New World monkeys (green) and Prosimians (yellow); the number of tissues sequenced in each species is indicated in parentheses. (b) Average number of mismatches per kilobase found in the alignments of each species to the human reference sequences. (c) Number of AceView genes with significant expression detected in at least one tissue of each primate species.
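As an illustration of the metric in panel (b), the following minimal Python sketch computes a mismatch rate per kilobase. It assumes the denominator is the total number of aligned bases; the record does not state exactly how the Magic pipeline defines it, and the function name and the counts used here are hypothetical.

```python
def mismatches_per_kb(mismatches: int, aligned_bases: int) -> float:
    """Return the number of mismatches per 1000 aligned bases.

    Assumption: the rate is normalized by aligned bases; the source
    does not spell out the denominator used for Figure 2(b).
    """
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return 1000.0 * mismatches / aligned_bases


# Hypothetical counts, not taken from the NHPRTR data.
rate = mismatches_per_kb(mismatches=1500, aligned_bases=100_000)
print(rate)
```

Under this definition, a species more distant from human would be expected to show a higher value, since its reads differ from the human reference at more positions.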
Figure 3. Examples of genes showing conserved tissue-specific patterns across NHPs. On the heatmap, columns are AceView human gene symbols and rows are individual tissue samples. Samples of the same tissue from different species are grouped together, as indicated by the color bands and labels on the right side of the heatmap. The color on the heatmap indicates the relative gene expression abundance in individual tissue samples. The relative expression abundance was calculated as the log2-transformed normalized sFPKM minus the median of the log2-transformed normalized sFPKMs across all tissue samples for the same gene; a difference of 1 unit therefore corresponds to a 2-fold higher abundance in a tissue sample relative to the median across all samples for that gene.
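The relative abundance described in this caption can be sketched in a few lines of Python. This is only an illustration of median-centering log2 values for one gene, not the authors' code: the function name, the sample labels and the sFPKM values are hypothetical, and the sketch assumes all values are strictly positive, since the record does not say how zero abundances were handled.

```python
import math
from statistics import median


def relative_expression(sfpkm_by_sample: dict[str, float]) -> dict[str, float]:
    """Median-center log2 expression values for a single gene.

    sfpkm_by_sample maps a sample label to its normalized sFPKM.
    Assumption: every value is > 0 (math.log2 raises ValueError otherwise).
    """
    log2_values = {s: math.log2(v) for s, v in sfpkm_by_sample.items()}
    center = median(log2_values.values())
    return {s: lv - center for s, lv in log2_values.items()}


# Hypothetical normalized sFPKM values for one gene across five samples.
gene = {
    "liver_rhesus": 80.0,
    "liver_baboon": 40.0,
    "kidney_rhesus": 20.0,
    "heart_rhesus": 10.0,
    "heart_baboon": 10.0,
}
rel = relative_expression(gene)
for sample, value in rel.items():
    print(f"{sample}\t{value:+.2f}")
```

In this toy example the median sample is `kidney_rhesus` (20), so a sample at 40 scores +1 (2-fold above the median) and a sample at 10 scores -1 (2-fold below).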