Emily L Clark1, Stephen J Bush1, Mary E B McCulloch1, Iseabail L Farquhar1, Rachel Young1, Lucas Lefevre1, Clare Pridans1, Hiu G Tsang1, Chunlei Wu2, Cyrus Afrasiabi2, Mick Watson1, C Bruce Whitelaw1, Tom C Freeman1, Kim M Summers1,3, Alan L Archibald1, David A Hume1,3.
Abstract
Sheep are a key source of meat, milk and fibre for the global livestock sector, and an important biomedical model. Global analysis of gene expression across multiple tissues has aided genome annotation and supported functional annotation of mammalian genes. We present a large-scale RNA-Seq dataset representing all the major organ systems from adult sheep and from several juvenile, neonatal and prenatal developmental time points. The Ovis aries reference genome (Oar v3.1) includes 27,504 genes (20,921 protein coding), of which 25,350 (19,921 protein coding) had detectable expression in at least one tissue in the sheep gene expression atlas dataset. Network-based cluster analysis of this dataset grouped genes according to their expression pattern. The principle of 'guilt by association' was used to infer the function of uncharacterised genes from their co-expression with genes of known function. We describe the overall transcriptional signatures present in the sheep gene expression atlas and assign those signatures, where possible, to specific cell populations or pathways. The findings are related to innate immunity by focusing on clusters with an immune signature, and to the advantages of cross-breeding by examining the patterns of genes exhibiting the greatest expression differences between purebred and crossbred animals. This high-resolution gene expression atlas for sheep is, to our knowledge, the largest transcriptomic dataset from any livestock species to date. It provides a resource to improve the annotation of the current reference genome for sheep, presenting a model transcriptome for ruminants and insight into gene, cell and tissue function at multiple developmental stages.
Year: 2017 PMID: 28915238 PMCID: PMC5626511 DOI: 10.1371/journal.pgen.1006997
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Table 1. Details of the tissues and cell types sequenced to generate the TxBF RNA-Seq dataset for the sheep gene expression atlas.
| Subset | Tissue Type | Library Type | Sequencing Depth | Total Number of Libraries | Number of Individuals |
|---|---|---|---|---|---|
| Core Atlas | Liver, spleen, ovary, testes, hippocampus, kidney medulla, bicep muscle, reticulum, ileum, thymus, left ventricle | Total RNA-Seq | >100 million paired-end reads per sample | 59 (~10 per individual) | 3 adult males and 3 adult females |
| Core Atlas | GI tract, reproductive tract, brain, endocrine, cardiovascular, lymphatic, musculo-skeletal system, immune cells | mRNA-Seq | >25 million paired-end reads per sample | 202 (~34 per individual) | 3 adult males and 3 adult females |
| LPS Time Course | BMDM 0h (-LPS) and 7h (+LPS) | Total RNA-Seq | >100 million paired-end reads per sample | 12 (2 time points per individual) | 3 adult males and 3 adult females |
| LPS Time Course | BMDM 0h, 2h, 4h, 7h, 24h post LPS treatment | mRNA-Seq | >25 million paired-end reads per sample | 30 (5 time points per individual) | 3 adult males and 3 adult females |
| GI Tract Time Series | Gastro-intestinal tract | mRNA-Seq | >25 million paired-end reads per sample | 108 (~12 tissues per individual) | 9 lambs (3 at birth, 3 at one week and 3 at 8 weeks) |
| Early Development | Day 7 blastocysts (3 pools of 8) | Nu Gen Single Cell Ovation Kit | >66 million paired-end reads per sample | 3 | 3 pools |
| Early Development | Day 23 whole embryos | Total RNA-Seq | >100 million paired-end reads per sample | 3 | 3 embryos (all male) |
| Early Development (day 35) | Liver, brain, embryonic fibroblasts | mRNA-Seq | >25 million paired-end reads per sample | 7 | 3 embryos (two female and one male) |
| Early Development (day 100) | Liver, ovary | mRNA-Seq | >25 million paired-end reads per sample | 5 | 3 female (liver) |
| Maternal Reproductive Time Series (days 23, 35 and 100) | Placenta and ovary | mRNA-Seq | >25 million paired-end reads per sample | 12 | 2 females per time point |
Tissues and cells were chosen to cover all major organ systems. All libraries were Illumina 125 bp paired-end stranded libraries. See S2 Table for a detailed list of the tissues and cell types sequenced.
Fig 1. Hierarchical clustering of the samples included in the sheep gene expression atlas dataset.
Samples of each tissue and cell type from each breed and developmental stage were averaged across individuals for ease of visualisation. The tree was constructed from the Euclidean distances between expression vectors using MEGA v7.0.14 [141] with the neighbour-joining method and edited in the graphical viewer FigTree v1.4.3 [142]. Clustering is biologically meaningful and highlights the lack of any significant effect of library type post-correction. Samples are coloured by organ system.
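The distance step described in this legend can be sketched as follows, assuming each sample is represented by a vector of averaged expression values in a fixed gene order. The function name and the toy values are hypothetical, and the tree-building itself (neighbour-joining in MEGA) is not reproduced here.

```python
import math


def euclidean_distance_matrix(profiles):
    """Return pairwise Euclidean distances between sample expression vectors.

    `profiles` maps a sample name to a list of expression values, one per gene,
    with the same gene order for every sample (an assumption of this sketch).
    """
    names = sorted(profiles)
    dist = {a: {b: 0.0 for b in names} for a in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = math.sqrt(sum((x - y) ** 2
                              for x, y in zip(profiles[a], profiles[b])))
            dist[a][b] = dist[b][a] = d  # the matrix is symmetric
    return dist


# Hypothetical averaged expression values for three genes in three samples.
toy_profiles = {
    "liver": [120.0, 3.0, 0.5],
    "fetal_liver": [100.0, 5.0, 0.5],
    "hippocampus": [2.0, 80.0, 40.0],
}
distances = euclidean_distance_matrix(toy_profiles)
```

A matrix of this kind is the usual input to a distance-based tree method; samples with similar expression vectors (here the two liver samples) end up close together.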
Table 2. Number and percentage of Oar v3.1 protein-coding and non-coding genes with an average TPM (across all animals) > 1 in at least one tissue, in the TxBF dataset after the first and second Kallisto passes, and after incorporating the existing Texel dataset.
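| Gene type | No. in reference annotation (Oar v3.1) | Texel x Scottish Blackface libraries only, first-pass Kallisto index: no. of genes of this type expressed | Texel x Scottish Blackface libraries only, first-pass Kallisto index: % genes of this type expressed | Texel x Scottish Blackface libraries only, second-pass (restricted) Kallisto index: no. of genes of this type expressed | Texel x Scottish Blackface libraries only, second-pass (restricted) Kallisto index: % genes of this type expressed | Including the existing Texel dataset, second-pass (restricted) Kallisto index: no. of genes of this type expressed | Including the existing Texel dataset, second-pass (restricted) Kallisto index: % genes of this type expressed |
|---|---|---|---|---|---|---|---|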
| lincRNA | 1858 | 1548 | 83.32 | 0 | 0 | 0 | 0 |
| miRNA | 1305 | 1242 | 95.17 | 0 | 0 | 0 | 0 |
| misc RNA | 361 | 310 | 85.87 | 0 | 0 | 0 | 0 |
| MT rRNA | 2 | 2 | 100 | 0 | 0 | 0 | 0 |
| MT tRNA | 22 | 19 | 86.36 | 0 | 0 | 0 | 0 |
| processed pseudogene | 43 | 31 | 72.09 | 35 | 81.40 | 38 | 88.37 |
| protein-coding | 20921 | 19921 | 95.22 | 20189 | 96.50 | 20359 | 97.31 |
| pseudogene | 247 | 172 | 69.64 | 189 | 76.52 | 201 | 81.38 |
| rRNA | 305 | 272 | 89.18 | 0 | 0 | 0 | 0 |
| snoRNA | 756 | 717 | 94.84 | 0 | 0 | 0 | 0 |
| snRNA | 1234 | 1116 | 90.44 | 0 | 0 | 0 | 0 |
‘TxBF data’ refers to the present study; ‘Texel data’ is obtained from [18]. The ‘first pass’ Kallisto index contains the known Ovis aries v3.1 cDNAs for both protein-coding and non-protein coding transcripts. The ‘second pass’ Kallisto index is a filtered version of the former that (a) restricts the RNA space to protein-coding genes, pseudogenes, and processed pseudogenes (so that expression within an equivalent space will be quantified, irrespective of experimental protocol), (b) omits genes that had no detectable expression across all TxBF samples, and (c) includes novel transcript reconstructions further to the de novo assembly of unmapped reads. Because of restriction (a), all other gene types show zero counts in the second-pass columns.
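The detection criterion used in this table (average TPM > 1 in at least one tissue) and the percentage columns can be illustrated with a short sketch. The function names and the toy TPM values are hypothetical; only the protein-coding counts (19,921 of 20,921) are taken from the table.

```python
def expressed_genes(tpm_by_gene, threshold=1.0):
    """Return genes whose average TPM exceeds `threshold` in at least one tissue.

    `tpm_by_gene` maps gene -> {tissue: average TPM across animals}.
    """
    return sorted(
        gene
        for gene, per_tissue in tpm_by_gene.items()
        if any(value > threshold for value in per_tissue.values())
    )


def percent_expressed(n_expressed, n_annotated):
    """Percentage of annotated genes detected, rounded to two decimals."""
    return round(100.0 * n_expressed / n_annotated, 2)


# Hypothetical average TPM values for three genes in two tissues.
toy_tpm = {
    "GENE_A": {"liver": 35.2, "spleen": 0.4},
    "GENE_B": {"liver": 0.9, "spleen": 1.0},  # never strictly above 1
    "GENE_C": {"liver": 0.0, "spleen": 2.1},
}
detected = expressed_genes(toy_tpm)  # GENE_A and GENE_C pass the threshold

# Protein-coding row, first-pass column of the table: 19,921 of 20,921 genes.
protein_coding_pct = percent_expressed(19921, 20921)  # 95.22
```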
Fig 2. Network visualisation and clustering of the sheep gene expression atlas.
A three-dimensional visualisation of a Pearson correlation gene-to-gene graph of expression levels derived from RNA-Seq data from analysis of sheep tissues and cells. Each node (sphere) in the graph represents a gene and the edges (lines) correspond to correlations between individual measurements above the defined threshold. The graph comprises 15,192 nodes (genes) and 811,213 edges (correlations ≥0.75). Co-expressed genes form highly connected complex clusters within the graph. Genes were assigned to groups according to their level of co-expression using the MCL algorithm.
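As a rough illustration of how such a graph can be assembled, the sketch below computes Pearson correlations between gene expression profiles and keeps an edge wherever r ≥ 0.75. The function names, the handling of constant profiles and the toy data are assumptions made for illustration, not the software used in the study.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        # Constant profile: correlation is undefined; treated as "no edge"
        # in this sketch (an assumption).
        return 0.0
    return cov / (sd_x * sd_y)


def correlation_edges(expr, r_min=0.75):
    """Return (gene1, gene2, r) for every gene pair with r >= r_min."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if r >= r_min:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression profiles across four samples.
toy_expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.5, 8.0],  # rises with GENE_A
    "GENE_C": [4.0, 3.0, 2.0, 1.0],  # falls as GENE_A rises
}
edges = correlation_edges(toy_expr)  # only the GENE_A / GENE_B pair is kept
```

Only positively correlated pairs above the threshold become edges, so anti-correlated genes such as GENE_A and GENE_C remain unconnected.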
Table 3. Tissue/cell/pathway association of the 50 largest network clusters in the sheep gene expression atlas dataset.
| Cluster ID | Number of Transcripts | Profile Description | Class | Sub-Class |
|---|---|---|---|---|
| 1 | 1199 | General | House Keeping | House Keeping (1) |
| 2 | 987 | Brain | Central Nervous System (CNS) | CNS |
| 3 | 658 | Testes–Adult | Reproduction | Gamete Production |
| 4 | 585 | General | House Keeping | House Keeping (2) |
| 5 | 351 | Macrophages | Immune | Macrophages |
| 6 | 350 | Fallopian Tube > Testes | Cilia | Motile Cilia |
| 7 | 284 | Fetal Ovary > Adult Testes | Early Development | Reproduction |
| 8 | 276 | Many Tissues—Highly Variable | Pathway | Cell Cycle |
| 9 | 265 | Fetal Brain > Adult Brain | CNS | CNS |
| 10 | 247 | Skeletal Muscle > Oesophageal Muscle | Musculature | Skeletal Muscle |
| 11 | 219 | Lymph Nodes > Blood > Not Macrophages | Immune | T-Cell and B-Cell |
| 12 | 215 | Thymus > Salivary Gland | Immune | T-Cell |
| 13 | 186 | Fore-Stomachs > Tonsil > Skin | GI Tract | Ruminal Epithelium |
| 14 | 183 | Kidney Cortex > Liver | Renal | Kidney Cortex |
| 15 | 182 | General but not even—highest in muscle | Pathway | Oxidative Phosphorylation |
| 16 | 158 | Epididymis > Vas Deferens | Reproduction | Male |
| 17 | 153 | Liver | Liver | Liver (Hepatocytes) |
| 18 | 145 | Peyer’s Patch, Ileum, Lymph Nodes, Blood | Immune | T-Cell and B-Cell |
| 19 | 134 | Placenta | Gestation | Placental Function |
| 20 | 119 | Epididymis > Testes > Vas Deferens | Reproduction | Male |
| 21 | 115 | General but not even | Pathway | Ribosomal |
| 22 | 102 | Adrenal Gland | Endocrine | Steroid Hormone Biosynthesis |
| 23 | 102 | Placenta | Gestation | Placental Function |
| 24 | 98 | Liver > Small Intestine | Liver | Liver (GI Tract) |
| 25 | 96 | Fetal Liver | Liver | Developing Liver |
| 26 | 92 | Small Intestine > Large Intestine | GI Tract | GI Tract |
| 27 | 90 | Pituitary Gland | Endocrine | Hormone Synthesis |
| 28 | 85 | General highest in reproductive tissues and brain | Cilia | Primary Cilia |
| 29 | 77 | Heart | Musculature | Cardiac Muscle |
| 30 | 75 | Thyroid | Endocrine | Thyroxine Synthesis |
| 31 | 73 | Peyer’s Patch, Ileum, Lymph Nodes, Blood, Macrophages | Immune | T-Cell and B-Cell |
| 32 | 69 | Salivary Gland, Lymph Node, Blood, Small Intestine | Immune | T-Cell and B-Cell |
| 33 | 68 | Fore-Stomachs Adult—Not Neonates > AMs | GI Tract | Immune |
| 34 | 65 | Small Intestine > Large Intestine | GI Tract | GI Tract |
| 35 | 61 | General—highest in Brain | House Keeping | House Keeping (3) |
| 36 | 58 | Heart Valves | Cardiovascular | Extra Cellular Matrix |
| 37 | 58 | General—highest in Blood | House Keeping | House Keeping (4) |
| 38 | 56 | General—highest in Fetal Brain | House Keeping | House Keeping (5) |
| 39 | 50 | GI Tract—highest in Reticulum | GI Tract | GI Tract |
| 40 | 49 | General—highest in Testes | House Keeping | House Keeping (6) |
| 41 | 49 | General—highest in Ovary | House Keeping | House Keeping (7) |
| 42 | 47 | General—highest in Brain | House Keeping | House Keeping (8) |
| 43 | 45 | Fore-Stomachs > Tonsil > Skin | GI Tract | Ruminal Epithelium |
| 44 | 45 | General—highest in testes | House Keeping | House Keeping (9) |
| 45 | 44 | Macrophage (BMDM + LPS) | Immune | Macrophages (LPS Response TNF) |
| 46 | 44 | Blood, Lymph Nodes | Immune | Blood |
| 47 | 42 | Large Intestine | GI Tract | GI Tract |
| 48 | 42 | General | Pathway | Histones |
| 49 | 41 | Reticulum and Rumen—very variable | GI Tract | Reticulum/Rumen |
| 50 | 36 | Adrenal Gland Medulla | Endocrine | Steroid Hormone Biosynthesis |
> indicates decreasing expression profile.
Fig 3. Collapsed node visualisation of the sheep gene expression atlas dataset in two dimensions to illustrate the relative proportion of genes in each cluster.
The graph includes 3104 nodes and 138,407 edges, with a Pearson correlation threshold of r = 0.75 and an MCL inflation (MCLi) value of 2.2. Nodes are coloured by tissue/cell type or, for broader classes, by organ system. The largest clusters are numbered from 1 to 30 (see Table 3 for functional annotation). The largest clusters are dominated by either house-keeping genes (1 & 4) or genes associated with transcriptionally rich tissues or cell types, such as brain (2), testes (3) and macrophages (5).
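To show what the inflation value controls, here is a simplified dense-matrix sketch of the Markov clustering idea: alternate an expansion step (matrix squaring) with an inflation step (element-wise power followed by column normalisation) until the matrix stops changing. It is a teaching sketch with hypothetical function names and a toy graph, not the MCL software used for the atlas.

```python
def _normalise_columns(m):
    """Scale every column of a square matrix so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def _matmul(a, b):
    """Plain-Python product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.2, iterations=50, tol=1e-9):
    """Cluster nodes of an undirected graph given as an adjacency matrix."""
    n = len(adjacency)
    # Add self-loops, then convert to a column-stochastic matrix.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    m = _normalise_columns(m)
    for _ in range(iterations):
        expanded = _matmul(m, m)  # expansion: flow spreads along walks
        inflated = [[v ** inflation for v in row] for row in expanded]
        new = _normalise_columns(inflated)  # inflation: strong flow wins
        delta = max(abs(new[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, v in enumerate(row) if v > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy graph: two triangles with no edges between them.
triangle_pair = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(triangle_pair)  # [[0, 1, 2], [3, 4, 5]]
```

In general, a higher inflation value sharpens the contrast between strong and weak flows and so tends to produce more, smaller clusters.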
Fig 4. Interrogation of the underlying expression profiles allows regions of the graph to be associated with specific tissues or cell types.
(A) A three-dimensional visualisation of a Pearson correlation gene-to-gene network graph (r = 0.75, MCLi = 2.2). (B) Histograms of the expression profile of genes in selected clusters are given on the right; for ease of visualisation, samples of each tissue and cell type from each breed and developmental stage are averaged across individuals. The panels show (i) cluster 5 genes, whose expression is highest in macrophages; (ii) cluster 7 genes, whose expression is highest in fetal ovary and testes; and (iii) cluster 15 genes, which show a broader expression pattern associated with a cellular process, here oxidative phosphorylation. Note that there may be a degree of variation in the expression pattern of individual genes within a cluster which is masked when average profiles are displayed.
Fig 5. Upregulation of genes in the crossbred TxBF relative to the purebred Texel.
(A) A Texel sire was crossed with a Scottish Blackface dam to create the F1 Texel x Scottish Blackface (TxBF) individuals. (B) The top 20 genes showing the greatest up-regulation (as absolute fold change) in the crossbred TxBF relative to the purebred Texel individuals. Genes associated with the function of skeletal muscle and connective tissue are indicated.
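A ranking of this kind can be sketched as below. The legend does not state exactly how fold change was calculated, so the pseudocount, the function name and the toy values are all assumptions made purely for illustration.

```python
def top_upregulated(crossbred_tpm, purebred_tpm, n=20, pseudocount=1.0):
    """Rank genes by fold change (crossbred / purebred), highest first.

    Both arguments map gene -> expression estimate. The pseudocount avoids
    division by zero and is an assumption of this sketch.
    """
    ranked = []
    for gene in crossbred_tpm.keys() & purebred_tpm.keys():
        fold_change = ((crossbred_tpm[gene] + pseudocount)
                       / (purebred_tpm[gene] + pseudocount))
        if fold_change > 1.0:  # keep only genes higher in the crossbred
            ranked.append((gene, fold_change))
    # Sort by descending fold change; break ties alphabetically.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked[:n]


# Hypothetical expression estimates for three genes.
toy_txbf = {"GENE_A": 39.0, "GENE_B": 9.0, "GENE_C": 4.0}
toy_texel = {"GENE_A": 3.0, "GENE_B": 4.0, "GENE_C": 9.0}
top = top_upregulated(toy_txbf, toy_texel, n=2)
# GENE_A: (39 + 1) / (3 + 1) = 10.0; GENE_B: (9 + 1) / (4 + 1) = 2.0
```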
Fig 6. Screenshot of the expression profile of the sheep myostatin (MSTN) gene within the BioGPS online portal.
Expression estimates from the TxBF sheep gene expression atlas dataset are available via the BioGPS database (http://biogps.org/dataset/BDS_00015/sheep-atlas/). This provides a searchable database of genes, with expression profiles across tissues and cells for each gene displayed as histograms via the following link, http://biogps.org/sheepatlas/. The BioGPS platform supports searching for genes with a similar profile, allows access to the raw data, and links to external resources. It also provides the potential for comparative analysis across species, for example with the expression profiles for pig.