Literature DB >> 27377755

Comprehensive RNA-Seq profiling to evaluate lactating sheep mammary gland transcriptome.

Aroa Suárez-Vega1, Beatriz Gutiérrez-Gil1, Christophe Klopp2, Gwenola Tosser-Klopp3,4,5, Juan-José Arranz1.   

Abstract

RNA-Seq enables the generation of extensive transcriptome information providing the capability to characterize transcripts (including alternative isoforms and polymorphism), to quantify expression and to identify differential regulation in a single experiment. Our aim in this study was to take advantage of using RNA-Seq high-throughput technology to provide a comprehensive transcriptome profiling of the sheep lactating mammary gland. Eight ewes of two dairy sheep breeds with differences in milk production traits were used in this experiment (four Churra and four Assaf ewes). Milk samples from these animals were collected on days 10, 50, 120 and 150 after lambing to cover the various physiological stages of the mammary gland across the complete lactation. RNA samples were extracted from milk somatic cells. The RNA-Seq dataset was generated using an Illumina HiSeq 2000 sequencer. The information reported here will be useful to understand the biology of lactation in sheep, providing also an opportunity to characterize their different patterns on milk production aptitude.

Entities:  

Mesh:

Substances:

Year:  2016        PMID: 27377755      PMCID: PMC4932878          DOI: 10.1038/sdata.2016.51

Source DB:  PubMed          Journal:  Sci Data        ISSN: 2052-4463            Impact factor:   6.444


Background & Summary

The development of high-throughput whole-transcriptome sequencing technologies, ie, RNA-Seq, has induced a revolutionary impact on transcriptome analysis. RNA-Seq technology enables the generation of extensive transcriptome information providing advantages over previous microarray analyses due to its wide dynamic range and its capability to exhaustively quantify the studied samples transcripts and not only the targets present on the array[1]. Furthermore, the high sequencing depth and coverage of this technology additionally provides structural information including alternative splice forms and transcriptome single nucleotide polymorphism[2]. In recent years, RNA-Seq technology has been applied to the study of lactating mammary gland in several species[3-8]. The knowledge of the transcriptome profiling of the lactating mammary gland is of special interest since it allows the characterization of the genes implicated in the biology of lactation and the physiological and metabolic changes occurring in the mammary gland during this period. Besides, in dairy livestock the knowledge of the transcripts expressed in lactating mammary gland enhances our understanding on the genes underlying dairy traits, including milk yield and composition, milk technological properties, lactation persistency, etc. The principal aim of this study was to gain a better understanding of the sheep lactating mammary gland and to compare the mammary gland transcriptome of two sheep breeds with different dairy production characteristics, Spanish Churra and Assaf. To that end a total of eight healthy animals were selected to be included in the experiment, four Assaf and four Churra ewes. These two breeds were chosen as they are considered as two of the principal dairy sheep breeds farmed in Spain. Churra is a Spanish autochthonous breed, characterized by its rusticity[9]. Assaf is a more specialized dairy sheep developed as a crossbred between Awassi (5/8) and Milschchaf (3/8) breeds[9]. Lactation is normalized to 120 days in Churra and 150 days in Assaf. The Assaf milk yield (400 kg) is more than double of the milk yield in Churra (117 kg), although Assaf milk has lower fat (6.65 versus 7.01) and protein contents (5.40 versus 5.79) (http://www.magrama.gob.es/es/). In general, milk from Churra sheep shows better characteristics for the manufacturing of mature dairy products[10]. For all the animals included in the present experiment, milk samples were collected on days 10 (D10), 50 (D50), 120 (D120) and 150 (D150) after lambing (Table 1, Fig. 1). These sampling points were established to cover the different physiological stages of the mammary gland across the complete lactation (Fig. 1). All the collected milk samples were later processed to extract RNA from the milk somatic cells (MSCs). MSCs contain heterogeneous populations of cells[11,12]. The proportions of these cellular populations in sheep milk were review by Li et al.[11] Among MSCs, mammary epithelial cells (MECs) are the cells that produce milk and are shed from the mammary epithelium during lactation. In ruminants, this type of cell is often detected below 15%[11]. Specifically, in ewe’s milk, MECs represent a minority of total MSCs content, 2 to 3%; reviewed by Herve et al.[12] Nevertheless, this value should be used as a rough estimation since, for dairy cows, where milk cells subpopulations have been more studied than in sheep, the estimation of MECs concentration in milk has a wide range of variation depending, among other factors, on the counting method used (reviewed by Herve et al.[12]). For one of the studied breeds, Churra, a study on the variation in the total number and proportions of milk cells types according to total cell counts has been reported[13]. In this study, MECs were included in ‘other cells’ and the proportion range between 10 and 18% of total MSCs for hand milking ewes with total somatic cells counts below 200,000 cells ml−1 (ref. 13).
Table 1

Characterization and identification of the samples included in the experimental design described here.

Subjects Source Sample Characteristics
Protocol 1 Protocol 2 Protocol 3 Data
Breed Day of lactation
Sheep_1Sheep milkA03086_D10AssafDay 10Milk somatic cells isolationRNA extractionRNA-seqGSM1936059
Sheep_2Sheep milkA03105_D10AssafDay 10Milk somatic cells isolationRNA extractionRNA-seqGSM1936060
Sheep_3Sheep milkA03927_D10AssafDay 10Milk somatic cells isolationRNA extractionRNA-seqGSM1936061
Sheep_4Sheep milkA74063_D10AssafDay 10Milk somatic cells isolationRNA extractionRNA-seqGSM1936062
Sheep_5Sheep milkC03141_D10ChurraDay 10Milk somatic cells isolationRNA extractionRNA-seqGSM1936063
Sheep_6Sheep milkC04860_D10ChurraDay 10Milk somatic cells isolationRNA extractionRNA-seqGSM1936064
Sheep_7Sheep milkC49537_D10ChurraDay 10Milk somatic cells isolationRNA extractionRNA-seqGSM1936065
Sheep_8Sheep milkC09539_D10ChurraDay 10Milk somatic cells isolationRNA extractionRNA-seqGSM1936066
Sheep_1Sheep milkA03086_D50AssafDay 50Milk somatic cells isolationRNA extractionRNA-seqGSM1936067
Sheep_2Sheep milkA03105_D50AssafDay 50Milk somatic cells isolationRNA extractionRNA-seqGSM1936068
Sheep_3Sheep milkA03927_D50AssafDay 50Milk somatic cells isolationRNA extractionRNA-seqGSM1936069
Sheep_4Sheep milkA74063_D50AssafDay 50Milk somatic cells isolationRNA extractionRNA-seqGSM1936070
Sheep_5Sheep milkC03141_D50ChurraDay 50Milk somatic cells isolationRNA extractionRNA-seqGSM1936071
Sheep_6Sheep milkC04860_D50ChurraDay 50Milk somatic cells isolationRNA extractionRNA-seqGSM1936072
Sheep_7Sheep milkC49537_D50ChurraDay 50Milk somatic cells isolationRNA extractionRNA-seqGSM1936074
Sheep_8Sheep milkC09539_D50ChurraDay 50Milk somatic cells isolationRNA extractionRNA-seqGSM1936073
Sheep_2Sheep milkA03105_D120AssafDay 120Milk somatic cells isolationRNA extractionRNA-seqGSM1936075
Sheep_3Sheep milkA03927_D120AssafDay 120Milk somatic cells isolationRNA extractionRNA-seqGSM1936076
Sheep_4Sheep milkA74063_D120AssafDay 120Milk somatic cells isolationRNA extractionRNA-seqGSM1936077
Sheep_5Sheep milkC03141_D120ChurraDay 120Milk somatic cells isolationRNA extractionRNA-seqGSM1936078
Sheep_7Sheep milkC49537_D120ChurraDay 120Milk somatic cells isolationRNA extractionRNA-seqGSM1936080
Sheep_8Sheep milkC09539_D120ChurraDay 120Milk somatic cells isolationRNA extractionRNA-seqGSM1936079
Sheep_1Sheep milkA03086_D150AssafDay 150Milk somatic cells isolationRNA extractionRNA-seqGSM1936081
Sheep_2Sheep milkA03105_D150AssafDay 150Milk somatic cells isolationRNA extractionRNA-seqGSM1936082
Sheep_3Sheep milkA03927_D150AssafDay 150Milk somatic cells isolationRNA extractionRNA-seqGSM1936083
Sheep_4Sheep milkA74063_D150AssafDay 150Milk somatic cells isolationRNA extractionRNA-seqGSM1936084
Sheep_5Sheep milkC03141_D150ChurraDay 150Milk somatic cells isolationRNA extractionRNA-seqGSM1936085
Sheep_6Sheep milkC04860_D150ChurraDay 150Milk somatic cells isolationRNA extractionRNA-seqGSM1936086
Sheep_7Sheep milkC49537_D150ChurraDay 150Milk somatic cells isolationRNA extractionRNA-seqGSM1936088
Sheep_8Sheep milkC09539_D150ChurraDay 150Milk somatic cells isolationRNA extractionRNA-seqGSM1936087
Figure 1

Overview of the study design.

Considering a generic sheep lactation curve (provided in the top of the figure), the sampling points of this experiment, represented with green arrows in the graphic, were established to cover the different physiological stages of the mammary gland across the lactation curve. Milk samples were collected on days 10 (D10), 50 (D50), 120 (D120) and 150 (D150) after lambing. A total of eight healthy Assaf and Churra sheep were selected to be included in the experiment. At each sampling time point, four Assaf and four Churra ewes were milked to obtain milk somatic cells as source of RNA. Based on the quality scores of the extracted RNA samples for each breed, a total of 30 samples were sequenced: samples from four animals for the time points D10, D50 and D150, whereas three biological replicates were sequenced for D120.

For our study, MSCs cells were selected as RNA source based on cattle studies that have shown MSCs as a representative source of the RNA expressed in the mammary gland tissue[14], showing, for the gene expression levels, high average correlations with mammary gland biopsy (r=0.95) and laser microdissected mammary epithelial cells (r=0.87)[14]. Moreover, MSCs provide a more accessible method compared with invasive approaches, such as mammary gland biopsies. This later point is of relevance when undertaking dynamic studies requiring several sampling time points for the same animal[15]. Regarding the potential variation of MSCs during the lactation cycle, advancing lactation has been associated to an increase of MSCs in milk[16]. This increment is firstly due to a concentration effect as a result of the reduction of milk yield that occurs after the lactation peak. In addition, rises in MSCs have generally been associated to an increase of polymorphonuclear cells[13,17]. However, it has been demonstrated that advancing lactation has also a stimulatory effect on MECs exfoliation process[12], thus, there is also an increase of MECs towards late lactation[12]. The RNA-Seq profiling dataset was generated on high-quality total RNA on an Illumina HiSeq 2000 platform (Table 2). This approach generated a total of 1,116 million paired-end reads from the transcriptome sequencing of the 30 milk samples. All samples had a suitable level of real quality, a high mapping rate (Table 2, Technical validation) and no contamination was found through the alignment against the Escherichia coli genome. The highly expression of genes codifying for major milk proteins in all the stages of lactation analysed supported that the gene expression profile of MSCs are representative from lactating mammary gland. To the best of our knowledge, this dataset (GE) represents the largest public RNA-Seq longitudinal dataset on sheep lactating mammary gland. In the related work published on Scientific Reports we performed an in depth analysis of these data, providing the first integrated overview on sheep milk gene expression across lactation[18]. The dataset reported in this data descriptor may be helpful for future studies examining the biology of sheep lactation.
Table 2

Sample quality and read statistics.

Sample Total RNA QC
Sequencer Read length (bp) Million read-pairs (or reads) Uniquely mapped reads (%)
ng RIN
A03086_D105.768.0illumina HiSeq 20002×7536.7484.5
A03105_D105.048.3illumina HiSeq 20002×7533.7986.7
A03927_D105.747.8illumina HiSeq 20002×7531.8986.1
A74063_D102.657.7illumina HiSeq 20002×7542.0387.4
C03141_D104.697.1illumina HiSeq 20002×7530.6987.8
C04860_D106.587.7illumina HiSeq 20002×7528.7084.9
C49537_D1014.318.5illumina HiSeq 20002×7536.9688.3
C09539_D104.768.0illumina HiSeq 20002×7523.9386.3
A03086_D502.557.5illumina HiSeq 20002×7541.4088.5
A03105_D5020.658.5illumina HiSeq 20002×7543.2988.8
A03927_D503.557.2illumina HiSeq 20002×7530.5984.9
A74063_D505.517.4illumina HiSeq 20002×7535.6989.4
C03141_D504.448.1illumina HiSeq 20002×7529.8886.9
C04860_D509.677.9illumina HiSeq 20002×7546.9590.1
C49537_D503.527.8illumina HiSeq 20002×7543.2588.0
C09539_D504.969illumina HiSeq 20002×7546.8190.3
A03105_D12010.868.2illumina HiSeq 20002×7542.4988.1
A03927_D1203.208.9illumina HiSeq 20002×7539.6887.8
A74063_D1202.748.5illumina HiSeq 20002×7536.8489.6
C03141_D1202.558.8illumina HiSeq 20002×7535.8689.6
C49537_D1204.418.9illumina HiSeq 20002×7534.3587.3
C09539_D1202.658.7illumina HiSeq 20002×7540.5789.5
A03086_D1504.958.7illumina HiSeq 20002×7538.1488.6
A03105_D1504.038.4illumina HiSeq 20002×7531.4687.5
A03927_D1503.038.6illumina HiSeq 20002×7541.5489.3
A74063_D1502.518.1illumina HiSeq 20002×7536.1989.2
C03141_D1505.268.6illumina HiSeq 20002×7532.0689.6
C04860_D1502.938illumina HiSeq 20002×7543.7787.3
C49537_D1504.448.7illumina HiSeq 20002×7544.0590.9
C09539_D1505.958.3illumina HiSeq 20002×7536.6289.9

Methods

Power calculation

The online tool Scotty (http://scotty.genetics.utah.edu/) was used in the design of the RNA-Seq experiment. This tool enables the calculation of the optimal sequencing depth and the number of replicates needed per condition to plan RNA-Seq experiments with adequate power to detect differential expression. The power calculations on Scotty (http://scotty.genetics.utah.edu/) require to upload a prototype dataset and to fix several experimental constraints for power optimization. As prototype dataset, we used our own pilot RNA-Seq data obtained from MSCs from four sheep per breed. To estimate the power based on our pilot dataset we set the following parameters: a cost per replicate of 50 US Dollars (USD), a cost per million reads aligned to genes of 150 USD, an alignment rate of the 85%, a maximum of 10 replicates per condition, a read depth between 10 and 40 millions of reads, a maximum cost of the experiment of 100,000 USD, a 50% of differential expressed genes detected with a fold change of 2 and a P-value of 0.01 and a minimum of 30% of genes with at least 50% of maximum power.

Animals and sampling

This description of the selected animals and the sampling method is extended from descriptions in the related research manuscript[18]. The trial was initiated with thirteen non-related sheep, eight Assaf and five Churra ewes. The animals belong to the commercial farm of the University of León (Spain). These sheep were kept in free stall housing, fed with the same rations and did not endure any water restriction. Animals were machine milked twice a day: at 8 a.m. and 5 p.m. For all these ewes, lambing took place between November 11th, 2012, and December 11th, 2012. All the selected ewes were between their fourth and sixth parities. During the course of the lactation, official monthly test-day records for milk yield, somatic cell count (SCC) and fat, protein and total solids contents were performed by the corresponding breeders´ association. According to the SCC records, animals with high level of SCC (> 250,000 SCC per milliliter[19]), which is associated with subclinical mastitis, were discarded from the experiment (three Assaf and one Churra ewes). Finally, a total of eight healthy sheep were selected to be included in the experiment, four Assaf and four Churra ewes. The lactation phenotypic values of the ewes selected for this study are shown in Table 3.
Table 3

Lactation phenotypic values of the ewes selected for the RNA-Seq analysis described here.

Subjects ID Breed Milk yield (Kg) Somatic Cell Counts Protein (%) Fat (%) Dry extract (%)
Sheep_1A03086Assaf402.142155.335.4314.99
Sheep_2A03105Assaf477.38664.435.8716.78
Sheep_3A03927Assaf213.321175.87.7416.39
Sheep_4A74063Assaf270.951784.795.9816.44
Sheep_5C03141Churra235.051966.137.1519.01
Sheep_6C04860Churra131.211006.37.6819.68
Sheep_7C49537Churra55.8585.795.9617.63
Sheep_8C09539Churra94.731405.767.5018.86
Trying to cover the evolution of the mammary gland transcriptome across lactation, milk samples were collected on days 10 (D10), 50 (D50), 120 (D120) and 150 (D150) after lambing. D10 is the first day of lactation considered to be totally free of colostrum; it is also the day considered as starting point in the normalized lactation for both breeds. D50 is a time point close to the lactation peak in both breeds, although Churra shows an earlier peak (range days 35–45 (ref. 15)) than Assaf sheep (range days 45–55 (ref. 16)). The D120 and D150 sampling points correspond to the end of the normalized lactation in Churra and Assaf, respectively. Hence, whereas for Churra D120 is close to the final lactation point, for Assaf this time point corresponds to a transition stage from the lactation peak to the final lactation point (D150). For each sampled animal and lactation point, at least four milk samples of 50 ml were collected; two of them were obtained on the exact sampling day whereas two additional samples were collected the previous or the following day to ensure RNA source material for each desired sampling. With the aim of maximizing the number of somatic cells present in milk, the sample collection was performed one hour after the 8 a.m. routine milking and ten minutes after the injection of 5 IU of Oxitocine Facilpart (Syva, León, Spain). The time of milk sample collection was chosen based on previous studies that indicate that one hour after milking is the diurnal time point with the highest concentration of MSCs[20]. Oxytocin was just administrated on sampling days to avoid any effect on milk composition and with the aim of stimulating the mechanical effect of myoepithelial contraction and thus the flattering of the alveolar lumen that causes the release of the residual post-milking milk which has a higher concentration of exfoliated MECs[21]. All protocols involving animals were approved by the Animal Welfare Committee of the University of Leon, Spain, following proceedings described in Spanish and EU legislations (Law 32/2007, R.D. 1201/2005, and Council Directive 2010/63/EU). The animals used in this study were handled in strict accordance with good clinical practices and all efforts were made to minimize suffering. To ensure RNA purification of high yield and quality, we used the following protocol during the sampling process. Before sampling, the collection milk containers were cleaned with RNaseZap (Ambion, Austin, TX, USA) and autoclaved. In the farm, udder cleaning was performed with special care: first, the udders were cleaned with water and soap; then, they were disinfected with povidone iodine; and finally the nipples were cleaned with RNAseZap (Ambion, Austin, TX, USA). Milk samples were collected from both mammary glands. A sterile gauze was used to cover the collection container during milk collection to minimize the risk of sample contamination. After collection the milk was transferred to 50 ml RNAse-free tubes. Samples were maintained at 4 °C during their transport from the farm to the laboratory where they were immediately processed.

RNA extraction

This description of RNA extraction is extended from the protocol described in the related research manuscript[18]. Samples of approximately 50 ml of milk were used for the RNA extraction. The pellet of MSCs was obtained as described by Wickramasinghe et al.[3] with some modifications. The cells were pelleted by centrifugation, at 540×g for 10 minutes at 4 °C, and in the presence of a final concentration of 0.5 mM of EDTA to eliminate casein and fat globules. After centrifugation, the supernatant was discarded. During this step, a fatty layer frequently appeared on the top of the tube. To remove it, a sterile pipette tip was introduced to separate this fatty layer from the tube walls. Then, the cell pellet was washed in 10 ml of PBS (pH 7.2) with 0.5 mM EDTA and centrifuged at 540×g in 15 ml RNAse free sterile tubes for 10 min at 4 °C. The last step was repeated until the fatty layer was minimized (usually twice). Once the pellet was clean, it was resuspended in 500 μl of Trizol (Invitrogen, Carlsbad, CA, USA) and homogenized by vortexing. Immediately after that, the following steps were performed: first, the homogenized sample was incubated for 15 min at room temperature to permit the complete dissociation of the nucleoprotein complex. After incubation, 100 μl of chloroform were added. Then, the sample was shaken vigorously by hand for 15 s, incubated 15 min at room temperature and centrifuged at 12,000×g for 15 min at 4 °C. After centrifugation, the upper aqueous phase of the sample was taken and placed in a new tube where 250 μ of isopropanol were added. The sample was then incubated for ten minutes at room temperature and centrifuged at 12 000×g for 15 min at 4 °C. After centrifugation, the supernatant was removed from the tube, leaving only the RNA pellet. The RNA pellet was washed with 0.5 ml of ethanol. Then, the sample was vortexed briefly and the tube was centrifuged at 7,500×g for 5 min at 4 °C. After the ethanol was discarded, the sample was dried for seven minutes at room temperature. To elute the sample 150 μl of DEPC water with DNAse (0.2 μl in 100 μl) was added and then, it was incubated for 10 min at 55 °C. Once diluted, the sample was stored at −80 °C.

RNA sequencing

This description on RNA sequencing is extended from the description presented in the related research manuscript[18]. The Agilent 2100 Bioanalyzer device (Agilent Technologies, Santa Clara, CA, USA) was used to assess the integrity of the RNA. Based on the quality scores of the extracted RNA samples a total of 30 RNA samples were sequenced. For each breed, samples from four animals were sequenced for time points D10, D50 and D150, whereas three biological replicates were sequenced for D120. The RNA integrity value (RIN) of the samples selected to be sequenced ranged between 7.1 and 9 (Table 2). Paired-end libraries with fragments of 300 bp were prepared using the True-Seq RNA-Seq sample preparation Kit v2 (Illumina, San Diego, CA, USA). The fragments were sequenced on an Illumina Hi-Seq 2000 sequencer (Fasteris SA, Plan-les-Ouates, Switzerland), according to the manufacturer’s instructions at CNAG (Centro Nacional de Análisis Genómico, Barcelona, Spain). For each library, between 35–45 million paired-end 75 bp reads were generated during the sequencing run (Table 2). The Fastq files generated were deposited in the Gene Expression Omnibus (GEO) database under the accession number GSE74825.

RNA-Seq data analysis

The read quality of the RNA-seq libraries was evaluated using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Reads were mapped against the ovine genome assembly v.3.1. (Oar_v3.1) using the STAR aligner (v.2.3.1y)[22]. The data was also tested for contamination on the Escherichia coli genome using BWA[23]. Cuffquant and Cuffnorm packages from Cufflinks[24] were used to compare gene expression levels within the same sample. Gene abundances were normalized by library and gene length by calculating Fragments Per Kilobase Of Exon Per Million Fragments Mapped (FPKM) using the Ensembl annotated genes (Oar_v3.1) as a reference. The Cufflinks and Cuffmerge tools from the Cufflinks package[24] were used to create a ‘transcripts.gtf’ file to be used as reference in our assembly. The aim of the assembly was producing a new annotation reference including novel genes and transcripts to be used in the downstream differential expression analyses. The Cufflinks option ‘−g’ followed by the available gtf file from the Oar_v3.1 reference sequence was used to guide the assembly but without excluding new genes. Cuffmerge was used to filter genes with low or no expression from our reference gtf file. To compare the expression levels of genes across samples, raw counts for the genes and transcripts were obtained using SigCufflinks (available at http://www.sigenae.org) using de ‘-G’ option of SigCufflinks to guide the alignment but excluding new genes. SigCufflinks is a modified version of the cufflinks code that provides raw read counts per gene and transcript, by using the sorted bam file from the alignment and the reference gtf file created in the assembly. The output file form Sigcufflinks containing raw counts per gene was deposited in the Gene Expression Omnibus (GEO) database under the accession number GSE74825. Downstream differential expression analyses were performed with edgeR[25] and DESeq2[26] R packages, as indicated in the related research manuscript[18].

Data Records

The raw fastq files for the RNA-seq libraries were deposited at the Gene Expression Omnibus (GEO) database under the accession number GSE74825 (Data Citation 1). The processing of all fastq samples is summarized in Tables 1 and 2. The output file from the quantification of transcripts by Sigcufflinks is also deposited in the Gene Expression Omnibus (GEO) under the same accession number GSE74825. It contains all the genes identified in the assembly and the raw counts per gene for each sample.

Technical Validation

Power calculations

The results for the power estimates achieved in each experiment configuration tested with Scotty (http://scotty.genetics.utah.edu/) are described in Supplementary File 1 and summarized in Fig. 2. The least expensive experiment that has enough power to perform a differential expression analysis according to the settings fixed was sequencing six replicates to a depth of 10 million aligned reads per replicate. The most powerful experiment that matches our criteria was sequencing 10 replicates to a depth of 26.67 million reads to genes. According to these results, the animals available to perform the experiment and the nature of the lactating mammary gland transcriptome (mostly enriched in transcripts codifying for major milk proteins), we finally decided to sequenced the MSCs RNA samples from eight replicates (four Churra and four Assaf) at each of the lactation time-points selected for the study (with the exception of D120 for which only 6 replicates were sequenced) to an average depth of 35 million reads.
Figure 2

Chart showing the power achieved in each of the experimental configurations tested by Scotty (http://scotty.genetics.utah.edu/).

Millions of reads aligned to genes per replicate are represented in the X-axis, whereas the number of replicates in each condition is represented in the Y-axis. The percentage of genes detected with a 2x Fold Change are coloured as indicated in the bar placed at the right of the grid. Experiments that do not conform the fixed constraints are filled as indicated in the legend.

Quality control of RNA

Total RNA integrity was assessed by the RNA Integrity Number (RIN) algorithm calculated by the Agilent Bioanalyzer software. The Agilent Bioanalyzer RIN scores are listed in Table 2. All the total RNA samples used for this RNA-seq study had a RIN score above 7 showing the high integrity of the samples used.

Quality validation and analysis of RNA-seq data

A total of 30 RNA libraries were sequenced to a depth between 23–46 million paired-end reads among which about 88.10% of the reads mapped to unique locations in the ovine genome assembly (Oar_v3.1) (Table 2). No contamination was found in the alignment against the Escherichia coli genome. In order to validate the quality of the RNA-seq libraries as representative from lactating mammary gland, we evaluated the profile of the highly expressed genes identified for our samples. As expected, the genes with the highest FPKM values for both sheep breeds and at the four studied lactation time points are CSN2 (β-casein), CSN3 (κ-casein), ENSOARG00000005099 (LGB, β-lactoglobulin), CSN1S2 (casein-α-S2), CSN1S1 (α-S1-casein) and LALBA (α-lactalbumin) (Fig. 3), accumulating at approximately the 65% of the total gene FPKM reads at each of the analysed time points. These highly expressed genes encode four caseins and two whey proteins, principal components of milk, which encompass the 5.5% of total milk composition in sheep. Thus, although it has been remarked that MECs are a minor proportion of total MSCs in sheep, the highly expression of genes codifying for major milk proteins in all the stages of lactation demonstrated that the MSCs transcriptome is principally dominated for the expression of MECs, probably due to the high transcription activity of these cells during lactation.
Figure 3

Bar graph for the six highly expressed genes identified across lactation in the milk samples of the studied two sheep breeds (Churra and Assaf).

FPKM values are represented in the X-axis, whereas the gene names are indicated in the Y-axis. A colour code is used to represent the four time points studied: day 10 (D10), 50 (D50), 120 (D120) and 150 (D150). (a) Highly expressed genes in Assaf. (b) Highly expressed genes in Churra.

The principal aim of this study was the dynamic analysis of the sheep mammary gland transcriptome through MSCs. For the analysis we selected samples from two sheep breeds, Assaf and Churra. Both are dairy breeds differing on milk production traits, mainly in terms of milk yield and milk composition (explained in Background & Summary). However, it is necessary to clarify that this experimental design does not involve the analysis of extreme phenotypes and therefore completely differs from a case-control study. This would explain the high correlation observed between all the samples analysed (r >0.86). By plotting a heatmap using hierarchical clustering with the genes found as differentially expressed in common with the edgeR[25] (FDR<0.05) and DESeq2 (ref. 26) (padj-value < 0.05) packages between all the time points analysed and between both breeds (Fig. 4), it can be observed that the samples are mainly clustered in two major groups, one corresponding to the D10 and D50 time points (related to the initial stages of lactation for both breeds) and the other corresponding to D120 and D150 time points (associated with the late stages of lactation). These observations confirm that the considered set of samples is highly representative from initial and final stages of lactation in sheep, although some differences have also been found between breeds (see the related research manuscript[18]). As normal samples, with no evidence of disease or particular phenotype, these samples would be a useful complement for other studies focused on the analysis of the sheep mammary gland transcriptome through RNA-Seq.
Figure 4

Heatmap and hierarchical clustering of the differentially expressed genes (DEGs) across lactation and between the Assaf and Churra sheep breeds.

Heatmap display of supervised hierarchical clustering of the DEGs identified across the four considered time points of the sheep lactation (D10, D50, D120, D150) and between both breeds. The genes are displayed in rows and the normalized counts per sample are displayed in columns. Each column represents a sample; sample names indicate the corresponding breed (A=Assaf; C=Churra) and day of sampling (D10, D50, D120, D150). A colour code indicates up-regulated (orange) and down-regulated (blue) expression levels.

Usage Notes

The RNA-Seq fastq files could be aligned using publicly splice-aware software solutions like TopHat2 (ref. 27) or STAR[22]. As reference genome we have used the ovine genome assembly (Oar_v3.1) downloaded from Ensembl database (http://www.ensembl.org/Ovis_aries/Info/Index). Cufflinks package[24] could be used to perform the assembly, quantification and differential expression analysis but also other publicly software combinations could be used for quantification and differential expression analysis: e.g. SigCufflinks (available at http://www.sigenae.org) or HTSeq[28] for quantification, combined with edgeR[25] or DESeq2 (ref. 26) for the differential expression analyses. Based on power estimations (Fig. 2) we recommend to use at least 5 replicates per condition to perform differential expression analysis. Functional analysis of the RNA-Seq differential expressed genes could be performed with several software solutions such as Babelomics[29], WebGestalt[30] or QIAGEN’s Ingenuity Pathway Analysis (IPA, QIAGEN Redwood City, www.qiagen.com/ingenuity)

Additional Information

How to cite: Suárez-Vega, A. et al. Comprehensive RNA-Seq profiling to evaluate lactating sheep mammary gland transcriptome. Sci. Data 3:160051 doi: 10.1038/sdata.2016.51 (2016).
  24 in total

1.  RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays.

Authors:  John C Marioni; Christopher E Mason; Shrikant M Mane; Matthew Stephens; Yoav Gilad
Journal:  Genome Res       Date:  2008-06-11       Impact factor: 9.043

2.  STAR: ultrafast universal RNA-seq aligner.

Authors:  Alexander Dobin; Carrie A Davis; Felix Schlesinger; Jorg Drenkow; Chris Zaleski; Sonali Jha; Philippe Batut; Mark Chaisson; Thomas R Gingeras
Journal:  Bioinformatics       Date:  2012-10-25       Impact factor: 6.937

3.  Transcriptional profiling of bovine milk using RNA sequencing.

Authors:  Saumya Wickramasinghe; Gonzalo Rincon; Alma Islas-Trejo; Juan F Medrano
Journal:  BMC Genomics       Date:  2012-01-25       Impact factor: 3.969

4.  Babelomics 5.0: functional interpretation for new generations of genomic data.

Authors:  Roberto Alonso; Francisco Salavert; Francisco Garcia-Garcia; Jose Carbonell-Caballero; Marta Bleda; Luz Garcia-Alonso; Alba Sanchis-Juan; Daniel Perez-Gil; Pablo Marin-Garcia; Ruben Sanchez; Cankut Cubuk; Marta R Hidalgo; Alicia Amadoz; Rosa D Hernansaiz-Ballesteros; Alejandro Alemán; Joaquin Tarraga; David Montaner; Ignacio Medina; Joaquin Dopazo
Journal:  Nucleic Acids Res       Date:  2015-04-20       Impact factor: 16.971

5.  HTSeq--a Python framework to work with high-throughput sequencing data.

Authors:  Simon Anders; Paul Theodor Pyl; Wolfgang Huber
Journal:  Bioinformatics       Date:  2014-09-25       Impact factor: 6.937

6.  edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.

Authors:  Mark D Robinson; Davis J McCarthy; Gordon K Smyth
Journal:  Bioinformatics       Date:  2009-11-11       Impact factor: 6.937

7.  WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013.

Authors:  Jing Wang; Dexter Duncan; Zhiao Shi; Bing Zhang
Journal:  Nucleic Acids Res       Date:  2013-05-23       Impact factor: 16.971

8.  TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions.

Authors:  Daehwan Kim; Geo Pertea; Cole Trapnell; Harold Pimentel; Ryan Kelley; Steven L Salzberg
Journal:  Genome Biol       Date:  2013-04-25       Impact factor: 13.583

Review 9.  Role of somatic cells on dairy processes and products: a review.

Authors:  N Li; R Richoux; M Boutinaud; P Martin; V Gagnaire
Journal:  Dairy Sci Technol       Date:  2014-07-17

10.  Comparison of five different RNA sources to examine the lactating bovine mammary gland transcriptome using RNA-Sequencing.

Authors:  Angela Cánovas; Gonzalo Rincón; Claudia Bevilacqua; Alma Islas-Trejo; Pauline Brenaut; Russell C Hovey; Marion Boutinaud; Caroline Morgenthaler; Monica K VanKlompenberg; Patrice Martin; Juan F Medrano
Journal:  Sci Rep       Date:  2014-07-08       Impact factor: 4.379

View more
  20 in total

1.  Milk somatic cell derived transcriptome analysis identifies regulatory genes and pathways during lactation in Indian Sahiwal cattle (Bos indicus).

Authors:  Sonika Ahlawat; Ramesh Kumar Vijh; Anju Sharma; Upasna Sharma; Yashila Girdhar; Mandeep Kaur; Pooja Chhabra; Ashish Kumar; Reena Arora
Journal:  Mol Biol Rep       Date:  2020-09-03       Impact factor: 2.316

2.  In vivo response of xanthosine on mammary gene expression of lactating Beetal goat.

Authors:  Ratan K Choudhary; Shanti Choudhary; Ramneek Verma
Journal:  Mol Biol Rep       Date:  2018-05-26       Impact factor: 2.316

3.  Analysis of differential gene expression of the transgenic pig with overexpression of PGC1α in muscle.

Authors:  Hao Gu; Jianan Li; Fei Ying; Bo Zuo; Zaiyan Xu
Journal:  Mol Biol Rep       Date:  2019-04-12       Impact factor: 2.316

4.  Necklace: combining reference and assembled transcriptomes for more comprehensive RNA-Seq analysis.

Authors:  Nadia M Davidson; Alicia Oshlack
Journal:  Gigascience       Date:  2018-05-01       Impact factor: 6.524

5.  Transcriptome Analysis of Three Sheep Intestinal Regions reveals Key Pathways and Hub Regulatory Genes of Large Intestinal Lipid Metabolism.

Authors:  Tianle Chao; Guizhi Wang; Zhibin Ji; Zhaohua Liu; Lei Hou; Jin Wang; Jianmin Wang
Journal:  Sci Rep       Date:  2017-07-13       Impact factor: 4.379

6.  A high resolution atlas of gene expression in the domestic sheep (Ovis aries).

Authors:  Emily L Clark; Stephen J Bush; Mary E B McCulloch; Iseabail L Farquhar; Rachel Young; Lucas Lefevre; Clare Pridans; Hiu G Tsang; Chunlei Wu; Cyrus Afrasiabi; Mick Watson; C Bruce Whitelaw; Tom C Freeman; Kim M Summers; Alan L Archibald; David A Hume
Journal:  PLoS Genet       Date:  2017-09-15       Impact factor: 5.917

7.  The genomic architecture of mastitis resistance in dairy sheep.

Authors:  G Banos; G Bramis; S J Bush; E L Clark; M E B McCulloch; J Smith; G Schulze; G Arsenos; D A Hume; A Psifidi
Journal:  BMC Genomics       Date:  2017-08-16       Impact factor: 3.969

8.  Variant discovery in the sheep milk transcriptome using RNA sequencing.

Authors:  Aroa Suárez-Vega; Beatriz Gutiérrez-Gil; Christophe Klopp; Gwenola Tosser-Klopp; Juan José Arranz
Journal:  BMC Genomics       Date:  2017-02-15       Impact factor: 3.969

9.  Elucidating fish oil-induced milk fat depression in dairy sheep: Milk somatic cell transcriptome analysis.

Authors:  Aroa Suárez-Vega; Pablo G Toral; Beatriz Gutiérrez-Gil; Gonzalo Hervás; Juan José Arranz; Pilar Frutos
Journal:  Sci Rep       Date:  2017-04-05       Impact factor: 4.379

10.  Transcriptional profiling of swine mammary gland during the transition from colostrogenesis to lactogenesis using RNA sequencing.

Authors:  V Palombo; J J Loor; M D'Andrea; M Vailati-Riboni; K Shahzad; U Krogh; P K Theil
Journal:  BMC Genomics       Date:  2018-05-03       Impact factor: 3.969

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.