Xiaolong Cao, Haobo Jiang.
Abstract
BACKGROUND: Manduca sexta is a large lepidopteran insect widely used as a model for studying the biochemistry of insect physiological processes. As part of its genome project, over 50 cDNA libraries were analyzed to profile gene expression in different tissues and life stages. Although these RNA-seq data have been used to study genes related to cuticle structure, chitin metabolism and immunity, much of the information has not yet been mined to understand the basic molecular biology of this model insect. Even basic features of the data, such as the composition of the RNA-seq reads and lists of library-correlated genes, remain unclear. More broadly, clear-cut spatiotemporal expression data are rare for insects, the largest group of animals, including Drosophila and mosquitoes, mainly because of their small size.
Keywords: Insect genome; Tobacco hornworm; Transcriptome
Year: 2017 PMID: 29041902 PMCID: PMC5645894 DOI: 10.1186/s12864-017-4147-y
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 M. sexta life cycle and the 67 Illumina RNA-seq datasets. Bars in the circle represent the life stages of M. sexta, with lengths proportional to the duration of each stage in insects reared on artificial diet as previously described [1]. Color-coded library identifications (1–67) are placed outside the circle at the corresponding developmental stage. The first part of each library name (on the right) indicates the tissue source: head, fat body, whole body, midgut, Malpighian tubule, muscle, testis, ovary, or antenna. The second part indicates the major stage of the insect, i.e. embryo (E), 1st to 5th instar larvae (L1–L5), pupae (P), and adults (A). In the third part, “D” stands for day, “h” for hour, “preW” for pre-wandering, “W” for wandering, “M” for male, and “F” for female. A final “S” in a library name indicates single-end sequencing; names without a final “S” indicate paired-end sequencing. The cDNA libraries represent the following tissues and stages: head (H) [1. 2nd (instar) L (larvae), D1 (day 1); 2. 3rd L, D1; 3. 4th L, 12 h (hour); 4. 4th L, late; 5. 5th L, D0.5; 6. 5th L, D2; 7. 5th L, preW (pre-wandering); 8. P (pupae), late; 9. A (adults), D1; 10. A, D3; 11. A, D7], fat body [12. 4th L, late; 13. 5th L, D1; 14. 5th L, preW; 15. 5th L, W (wandering); 16. P, D1–3; 17. P, D15–18; 18. A, D1–3; 19. A, D7–9], whole body (W) [20. E (embryo), 3 h; 21. E, late; 22. 1st L; 23. 2nd L; 24. 3rd L], midgut [25. 2nd L; 26. 3rd L; 27. 4th L, 0 h; 28. 4th L, 12 h; 29. 4th L, late; 30. 5th L, 1–3 h; 31. 5th L, 24 h; 32. 5th L, preW; 33–34. 5th L, W; 35. P, D1; 36. P, D15–18; 37. A, D3–5], Malpighian tubule [38. 5th L, preW; 39. A, D1; 40. A, D3], muscle [41. 4th L, late; 42–43. 5th L, 12 h; 44–45. 5th L, preW; 46–47. 5th L, W], testis [48. P, D3; 49. P, D15–18; 50. A, D1–3], ovary [51. P, D15–18; 52. A, D1], head [53–56. A, D1, F (female); 57–60. A, D1, M (male)], antenna [61–63. 5th L; 64–66. A, F; 67. A, M].
Fig. 3 Features of gene transcription revealed by alignment of reads in the cDNA libraries. a Relationship between the number of aligned bases (x-axis) and the percentage of the genome covered by reads aligned using TopHat (y-axis). Each colored symbol represents one library, labeled with its library ID (Fig. 1); squares denote paired-end libraries and circles single-end ones. The dashed lines are linear regressions of the data from the paired- and single-end libraries. b Box-plot of percentages of the mapped genome in library categories P (paired-end, 33), S (single-end, 19), H (head, single-end, 8) and A (antenna, single-end, 7) (Fig. 2). c Heatmap of z-scores for each BPKM group. BPKM values were ranked and sorted into 19 groups: Group 1 contains the 400 highest BPKM values (ranks 1–400), and Groups 2 to 19 correspond to BPKM ranks 401–800, 801–1600, 1601–3200, …, i.e. ranks 400 × 2^n + 1 to 400 × 2^(n+1), where n = 0 to 17. The heatmap is colored by the z-score of the average BPKM in each group. Libraries with black and cyan IDs were sequenced by paired- and single-end sequencing, respectively. d Percentage of each library's total aligned bases that falls in each BPKM group. The library names and their color codes are described in Fig. 1.
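The rank-based binning used for panel c can be checked with a short function. This is a minimal sketch, assuming BPKM values are ranked in descending order with 1-based ranks; `bpkm_group` and `group_bounds` are hypothetical names, not part of the authors' pipeline.

```python
from __future__ import annotations


def bpkm_group(rank: int) -> int:
    """Map a 1-based BPKM rank to its group (1-19) under the Fig. 3c binning.

    Group 1 holds ranks 1-400; group n + 2 holds ranks
    400 * 2**n + 1 to 400 * 2**(n + 1), for n = 0..17.
    """
    if rank < 1:
        raise ValueError("rank must be >= 1")
    if rank <= 400:
        return 1
    n = 0
    # Double the upper bound until it reaches the rank.
    while rank > 400 * 2 ** (n + 1):
        n += 1
    if n > 17:
        raise ValueError("rank lies beyond group 19")
    return n + 2


def group_bounds(group: int) -> tuple[int, int]:
    """Return the inclusive rank range covered by a group."""
    if not 1 <= group <= 19:
        raise ValueError("group must be between 1 and 19")
    if group == 1:
        return (1, 400)
    n = group - 2
    return (400 * 2 ** n + 1, 400 * 2 ** (n + 1))


print(group_bounds(4))   # (1601, 3200)
print(group_bounds(19))  # (52428801, 104857600)
```

Because each group after the first doubles in width, the 19 groups together cover ranks up to 400 × 2^18 = 104,857,600.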
Fig. 5 Pairwise comparison of the 67 cDNA libraries and number of library-correlated genes. a Mapping scores of library pairs. Each cell shows log2(mapping score); a value higher than 4 (i.e. mapping score > 16) indicates that the two libraries are closely similar or related. b Number of correlated genes in each library, with grey bars indicating those with FPKM values > 100.
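The cutoff in panel a is stated on a log2 scale; the sketch below shows that it is equivalent to requiring a mapping score above 16. How the mapping score itself is computed is not described in this caption, so the scores here are made-up inputs, and `closely_related` and `related_pairs` are hypothetical helpers.

```python
from __future__ import annotations

import math

LOG2_CUTOFF = 4.0  # from the caption: log2(mapping score) > 4


def closely_related(mapping_score: float) -> bool:
    """True if a library pair passes the log2(mapping score) > 4 cutoff."""
    if mapping_score <= 0:
        return False  # log2 is undefined for non-positive scores
    return math.log2(mapping_score) > LOG2_CUTOFF


def related_pairs(scores: dict[tuple[int, int], float]) -> list[tuple[int, int]]:
    """Return the library-ID pairs whose scores pass the cutoff, sorted."""
    return sorted(pair for pair, score in scores.items() if closely_related(score))


# Illustrative scores only (not data from the study).
example = {(1, 2): 20.0, (1, 3): 16.0, (2, 3): 3.5}
print(related_pairs(example))  # [(1, 2)]
```

A score of exactly 16 gives log2 = 4 and is excluded, matching the strict inequality in the caption.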
Fig. 7 Library-specific expression of different genes in OGS2.0. Z-scores for highly expressed genes were calculated from FPKM values. Genes were clustered based on z-scores and manually divided into groups according to their expression patterns. Significantly enriched GO terms (p < 0.05) for the clusters are labeled on the right, with GO numbers in red, green and blue representing Biological Process, Cellular Component and Molecular Function, respectively.
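The per-gene z-scores can be sketched as follows. The caption does not state whether FPKM values were log-transformed first or which standard deviation was used; this minimal sketch assumes raw FPKM values and the population standard deviation, and the example values are made up.

```python
from __future__ import annotations

from statistics import mean, pstdev


def row_zscores(fpkm_by_library: list[float]) -> list[float]:
    """Z-score one gene's FPKM values across libraries."""
    mu = mean(fpkm_by_library)
    sd = pstdev(fpkm_by_library)
    if sd == 0:
        # Identical values in every library: no library stands out.
        return [0.0] * len(fpkm_by_library)
    return [(x - mu) / sd for x in fpkm_by_library]


def peak_library(zscores: list[float]) -> int:
    """Index of the library with the highest z-score for this gene."""
    return max(range(len(zscores)), key=zscores.__getitem__)


# Hypothetical FPKM values for one gene in three libraries.
z = row_zscores([10.0, 20.0, 30.0])
print([round(v, 4) for v in z])  # [-1.2247, 0.0, 1.2247]
print(peak_library(z))           # 2
```

Standardizing each gene this way puts genes with very different expression levels on a common scale, so clustering groups them by where they peak rather than by how abundant they are. The clustering step itself is not sketched, since the caption does not name the method.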
Fig. 2 Overview of the 67 cDNA libraries. a Total read numbers in the libraries. As defined in Fig. 1, bar colors represent the tissue sources of libraries 1–67. Black and cyan IDs indicate libraries sequenced by paired- and single-end sequencing, respectively. b Upper boundaries of the bars represent the percentages of total reads remaining after trimming (green) and mapped by STAR (yellow) and TopHat (blue), with the total reads (grey) in each library set at 100%. The library names and their color codes are the same as in Fig. 1. c and d Box-plots of the numbers and percentages of reads surviving trimming in categories P (paired-end, 33 of the first 52 libraries), S (single-end, 19 of libraries 1–52), H (head, single-end, 53–60), and A (antenna, single-end, 61–67). e Percentages of trimming-survived reads mapped to the genome using STAR and TopHat in the four categories. f Percentages of TopHat-mapped reads corresponding to mitochondrial (blue), protein-coding (white), noncoding (green), and rRNA (red) genes. g and h Box-plots of percentages of trimming-survived reads mapped to mitochondrial and rRNA genes in categories P, S, H and A. The first 52 libraries were sequenced as part of the genome project [28], the next 8 were used to detect sex-biased gene expression in the brain [14], and the last 7 were used to study chemosensory receptor gene expression [27].
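The four box-plot categories follow directly from library IDs and sequencing mode. A minimal sketch, assuming the paired-end flag for libraries 1–52 is supplied by the caller (the caption marks it by ID color rather than listing it); `library_category` and `percent_of_total` are hypothetical helpers, and the read counts are made up.

```python
def library_category(library_id: int, paired_end: bool) -> str:
    """Assign library 1-67 to category P, S, H or A as used in Fig. 2c-h."""
    if not 1 <= library_id <= 67:
        raise ValueError("library IDs run from 1 to 67")
    if library_id >= 61:
        return "A"  # antenna libraries 61-67, single-end
    if library_id >= 53:
        return "H"  # head libraries 53-60, single-end
    # Libraries 1-52 from the genome project split by sequencing mode.
    return "P" if paired_end else "S"


def percent_of_total(count: int, total_reads: int) -> float:
    """Express a read count as a percentage of the library's total reads (panel b)."""
    return 100.0 * count / total_reads


print(library_category(53, paired_end=False))  # H
print(library_category(10, paired_end=True))   # P (flag chosen for illustration)
print(percent_of_total(45, 60))                # 75.0
```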
Fig. 4 Features of the unmapped reads with BLASTN hits in the 67 libraries. a Relationship between the ratio of STAR-unmapped reads (x-axis) and the percentage of total unmapped reads with a BLASTN match (y-axis) for all the RNA-seq libraries. Each colored symbol represents one library, labeled with its library ID (Fig. 1); squares denote paired-end libraries and circles single-end ones. b Distribution (left y-axis) of unmapped reads with hits among 7 categories, shown in different colors. The black line shows the total number of unmapped reads (right y-axis) in each library. c Box-plot of percentages of rRNA reads among total unmapped reads with BLASTN hits in library categories P, S, H and A (Fig. 2). The library IDs, names, and color codes are the same as in Fig. 1.
Fig. 6 Expression profiles of 69 highly expressed genes in the 67 cDNA libraries. The non-redundant set of the three most highly expressed genes in each library is listed on the right, with their expression patterns organized according to the results of cluster analysis (left). Their log2(FPKM + 1) values, representing mRNA levels, are shown as a rainbow color gradient in the heatmap. Library names (top), IDs (bottom), and color codes are described in Fig. 1.
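Selecting the genes shown in this figure can be sketched as taking the three highest-FPKM genes per library, merging the lists without duplicates, and transforming values to log2(FPKM + 1) for the heatmap. The nested-dictionary input layout, the helper names `top_genes` and `log_level`, and the example values are assumptions for illustration; ties are broken by insertion order.

```python
from __future__ import annotations

import math


def top_genes(fpkm: dict[str, dict[str, float]], k: int = 3) -> list[str]:
    """Non-redundant union of the k highest-FPKM genes in each library.

    fpkm maps library name -> {gene ID -> FPKM}.
    """
    selected: list[str] = []
    for genes in fpkm.values():
        # sorted() is stable, so tied genes keep their insertion order.
        ranked = sorted(genes, key=genes.get, reverse=True)
        for gene in ranked[:k]:
            if gene not in selected:
                selected.append(gene)
    return selected


def log_level(value: float) -> float:
    """log2(FPKM + 1), the scale used for the heatmap colors."""
    return math.log2(value + 1)


# Made-up FPKM values for two libraries.
example = {
    "lib1": {"g1": 500.0, "g2": 300.0, "g3": 10.0, "g4": 200.0},
    "lib2": {"g1": 400.0, "g5": 900.0, "g2": 5.0, "g6": 100.0},
}
print(top_genes(example))  # ['g1', 'g2', 'g4', 'g5', 'g6']
print(log_level(255.0))    # 8.0
```

With 67 libraries this procedure yields at most 3 × 67 = 201 genes; overlap among the libraries' top genes is what shrinks the list, which in the figure contains 69 genes. Adding 1 before the log keeps FPKM = 0 finite (it maps to 0).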