Scott T Schumacker, Chloe A M Chidester, Raymond A Enke, Matthew R Marcello.
Abstract
Transcriptome analysis using next generation sequencing (NGS) technology provides the capability to understand global changes in gene expression throughout a range of tissue samples. The nematode Caenorhabditis elegans (C. elegans) is a well-established genetic system used for analyzing a number of biological processes. C. elegans is a bacteria-eating soil nematode, and changes in bacterial diet have been shown to cause a number of physiological and molecular changes. Here we used Illumina RNA sequencing (RNA-seq) analysis to characterize the mRNA transcriptome of mixed C. elegans populations fed differing strains of bacteria to further understand dietary changes at the molecular level. Raw FASTQ files for the RNA-seq libraries are deposited in the NCBI Sequence Read Archive (SRA) and have been assigned BioProject accession PRJNA412551.
Year: 2019 PMID: 31223636 PMCID: PMC6565610 DOI: 10.1016/j.dib.2019.104006
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Fig. 1. Bioinformatics pipeline, assessment of read mapping and sample variance. (a) Flowchart overview of the RNA-seq experiment. (b) Per-sample summary of Kallisto pseudo-alignment of RNA-seq reads to the C. elegans WBcel235 reference transcriptome. The number of reads is plotted on the x-axis in millions (M). Additional details about the alignment are listed in Table 1. (c) Principal Component Analysis (PCA) biplot of experimental sample variance. (d) Heat map analysis of experimental sample variance. [Key: Jensen-Shannon divergence (jsd) = similarity between samples; 0 = identical (blue); 1 = no overlap (white)].
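The heat map key in panel (d) can be illustrated with a minimal sketch of the Jensen-Shannon divergence between two abundance vectors. This is not the authors' code: the function name and the toy vectors are hypothetical, and base-2 logarithms are assumed so that the value runs from 0 (identical profiles) to 1 (no overlap). The caption does not state whether the plotted quantity is the divergence itself or its square root (the Jensen-Shannon distance); both share the same 0 and 1 endpoints.

```python
import math


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base-2 logs) between two abundance vectors.

    Inputs are non-negative counts or abundances over the same set of
    transcripts; each vector is normalised to sum to 1 before comparison.
    """
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("each vector must have a positive sum")

    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    # Mixture distribution: the midpoint of the two profiles.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a == 0 contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy vectors (hypothetical, not taken from the dataset).
identical = jensen_shannon_divergence([10, 20, 30], [1, 2, 3])  # same proportions
disjoint = jensen_shannon_divergence([5, 0], [0, 7])            # no shared signal
print(identical, disjoint)
```

Profiles with the same proportions give a value of 0 regardless of sequencing depth, which is why the measure is applied to normalised abundances rather than raw counts.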
Table 1. RNA-seq read statistics.
| Sample name | Sequencer | Read length (bp) | Pseudoaligned reads (millions) | Uniquely mapped reads (%) |
|---|---|---|---|---|
| OP50 1 | Illumina NextSeq 500 | 2 × 75 | 7.6 | 96.8 |
| OP50 2 | Illumina NextSeq 500 | 2 × 75 | 7.7 | 97.2 |
| HB101 1 | Illumina NextSeq 500 | 2 × 75 | 8.7 | 97.7 |
| HB101 2 | Illumina NextSeq 500 | 2 × 75 | 7.3 | 97.6 |
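Per-sample statistics of this kind can be gathered from the run summary that Kallisto writes alongside its quantification output. The sketch below assumes a `run_info.json` file containing the fields `n_processed`, `n_pseudoaligned`, and `n_unique`; the function name, sample label, and read counts are hypothetical and are not values from this dataset. The source does not state which denominator was used for the "Uniquely mapped reads (%)" column, so the percentage here is taken relative to all processed reads as an assumption.

```python
import json


def summarize_run_info(sample_name, run_info_text):
    """Turn the text of a Kallisto run_info.json into one summary row.

    Assumes the JSON contains n_processed, n_pseudoaligned and n_unique.
    """
    info = json.loads(run_info_text)
    n_processed = info["n_processed"]
    n_pseudoaligned = info["n_pseudoaligned"]
    n_unique = info["n_unique"]
    return {
        "sample": sample_name,
        "pseudoaligned_millions": round(n_pseudoaligned / 1e6, 1),
        "pct_pseudoaligned": round(100 * n_pseudoaligned / n_processed, 1),
        # Assumption: unique reads expressed as a share of processed reads.
        "pct_unique_of_processed": round(100 * n_unique / n_processed, 1),
    }


# Hypothetical counts for illustration only.
example = '{"n_processed": 10000000, "n_pseudoaligned": 8000000, "n_unique": 6000000}'
row = summarize_run_info("hypothetical_sample", example)
print(row)
```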
Fig. 2. FastQC and MultiQC quality assessment of unfiltered FASTQ data. MultiQC summary plots of the FastQC analysis show the RNA-seq read distribution of average per-base (a) and per-sequence (b) quality scores for each experimental sample file. (c) MultiQC summary plot of Trimmomatic filtering results (see Code Availability 1–3 for details of FastQC, Trimmomatic, and MultiQC software, respectively).
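The quality-control steps named in this caption could be strung together roughly as follows. This is a sketch under stated assumptions, not the authors' commands: the file names are hypothetical, the `trimmomatic` wrapper script is assumed to be on the PATH (some installations invoke the jar with `java -jar` instead), and the `SLIDINGWINDOW:4:15` and `MINLEN:36` steps are illustrative values rather than the settings used in the study. The function only builds the command lines; it does not run them.

```python
import shlex


def qc_commands(sample, r1, r2, out_dir="qc"):
    """Build FastQC, Trimmomatic (paired-end) and MultiQC command lines.

    Returns a list of argument lists suitable for subprocess.run.
    All trimming parameters are illustrative placeholders.
    """
    fastqc = ["fastqc", "--outdir", out_dir, r1, r2]
    trimmomatic = [
        "trimmomatic", "PE", r1, r2,
        # Paired and unpaired outputs for each mate, in the order Trimmomatic expects.
        f"{out_dir}/{sample}_R1.paired.fastq.gz",
        f"{out_dir}/{sample}_R1.unpaired.fastq.gz",
        f"{out_dir}/{sample}_R2.paired.fastq.gz",
        f"{out_dir}/{sample}_R2.unpaired.fastq.gz",
        "SLIDINGWINDOW:4:15",  # illustrative quality-window setting
        "MINLEN:36",           # illustrative minimum read length
    ]
    # MultiQC scans the directory for reports and writes one combined summary.
    multiqc = ["multiqc", out_dir, "--outdir", out_dir]
    return [fastqc, trimmomatic, multiqc]


# Hypothetical file names for one paired-end sample.
cmds = qc_commands("sample1", "sample1_R1.fastq.gz", "sample1_R2.fastq.gz")
for cmd in cmds:
    print(shlex.join(cmd))
```

Returning argument lists instead of shell strings keeps file names with spaces safe and lets each step be passed directly to `subprocess.run` once the tools are installed.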
Specifications Table
| Subject area | |
|---|---|
| More specific subject area | |
| Type of data | |
| How data was acquired | |
| Data format | |
| Experimental factors | |
| Experimental features | |
| Data source location | New York, United States, Pace University; Cold Spring Harbor, New York, Cold Spring Harbor Laboratory |
| Data accessibility | The nucleotide sequences of raw reads were submitted to NCBI's Sequence Read Archive through the BioProject PRJNA412551 ( |
| Related research article | MacNeil, L. T., Watson, E., Arda, H. E., Zhu, L. J. & Walhout, A. J. M. Diet-induced developmental acceleration independent of TOR and insulin in |
These datasets will be valuable to the
These transcriptome datasets may be used to identify differentially expressed genes after dietary changes in
This bioinformatics analysis pipeline exclusively uses open access tools to ensure sequence quality and robust eukaryotic transcriptome analysis.
This alignment-free bioinformatics pipeline reduces both analysis time and the computing power required, which may benefit some users, particularly in an undergraduate course setting.