Qing Liu1,2,3, Feng Jiang1,4, Jie Zhang1, Xiao Li5, Le Kang6,7,8.
Abstract
BACKGROUND: Core promoters have a substantial influence on various steps of transcription, including initiation, elongation, termination, polyadenylation, and finally, translation. The characterization of core promoters is crucial for exploring the regulatory code of transcription initiation. However, the current understanding of insect core promoters is focused on those of Diptera (especially Drosophila) species with small genome sizes.
Keywords: Core promoter; Genome size; Insects; Transcription initiation; Transcriptional start sites
Year: 2021 PMID: 33785021 PMCID: PMC8011201 DOI: 10.1186/s12915-021-01004-5
Source DB: PubMed Journal: BMC Biol ISSN: 1741-7007 Impact factor: 7.431
Fig. 1 Characteristics of TSSs and TSCs in locusts. a Distribution of OTSSs in different genomic regions. b Metaprofile of TSCs across the gene bodies of protein-coding genes in the official gene set of locusts. c Consensus 25-bp sequences surrounding the dominant TSSs in different genomic regions. The symbol height within the stack indicates the relative frequency of each nucleotide at that position. The frequency of each nucleotide at each position was represented using the R package seqLogo. d Sunburst charts summarizing the identified TSCs that are derived from TEs
Fig. 2 Characterization of core promoters in locusts. a Patterns of the GC content in the 2-kb flanking region of transcription start sites. The deviation of the GC content in sliding windows was determined using the GC content normalized to the mean GC content. b CpG occurrence in the 2-kb flanking region of transcription start sites. Normalized CpG contents (CpG observed/expected, CpG o/e) and GC contents were computed in a 50-bp sliding window across 4-kb regions centered on the dominant TSSs. c De novo motif discovery in the 20 to 40 bp region upstream of the dominant OTSSs of core promoters. d TATA-box signals in the upstream regions of PyPu dinucleotides in the core promoters with different promoter shapes. e Density distribution of the PSS values of promoters with ubiquitous and restricted TSC expression. f Shannon index of OTSS diversity in genic core promoters. g Shannon index of OTSS diversity in genic core promoters in the down-sampled data. h Shannon diversity index of OTSSs in the core promoters with different promoter shapes. i Density of SNPs flanking the transcription start sites. The TSCs flanked by repetitive elements were not included in this comparison. The red asterisk indicates P < 2.2e−16 according to the Wilcoxon rank-sum test
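The two quantities named in the Fig. 2 caption, CpG observed/expected in a sliding window (panel b) and the Shannon index of OTSS diversity (panels f to h), can be illustrated with a minimal Python sketch. The caption does not give the exact formulas, so this is an assumption: CpG o/e is taken as (CpG count × window length) / (C count × G count), and the Shannon index uses the natural logarithm over per-OTSS tag counts. The function names are hypothetical and are not the authors' code.

```python
import math


def cpg_oe(window: str) -> float:
    """CpG observed/expected for one window (assumed formula, see lead-in)."""
    w = window.upper()
    c, g = w.count("C"), w.count("G")
    if c == 0 or g == 0:
        return 0.0  # expected CpG count is zero, so report 0 instead of dividing
    return w.count("CG") * len(w) / (c * g)


def sliding_cpg_oe(seq: str, window: int = 50, step: int = 1) -> list[float]:
    """CpG o/e for every full window along seq (50-bp window as in the caption)."""
    return [cpg_oe(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


def shannon_index(counts: list[int]) -> float:
    """Shannon index H = -sum(p * ln p) over tag counts per OTSS (assumed ln base)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)
```

Under these assumptions, a core promoter whose tags all fall on a single OTSS has an index of 0, and the index grows as tags spread evenly over more positions.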
Fig. 3 Distant transcription initiation in locusts and fruit flies. a Density distribution of distances from the annotated start codons of protein-coding genes to their farthest upstream core promoters. b Boxplot showing the length difference of the mRNA leaders (5′-UTRs) with different intron numbers. c Boxplot showing the exon length difference in the mRNA leaders with different intron numbers. The red asterisk indicates P < 2.2e−16 according to the Wilcoxon rank-sum test. N.S., not significant. d Standard deviation of exon lengths in the mRNA leaders with different intron numbers. The red asterisk indicates P < 2.2e−16 according to the Wilcoxon rank-sum test. N.S., not significant
Fig. 4 Comparison of adjacent and distant core promoter genes between locusts and fruit flies. a Scatter plot of the TSC expression quantiles and the distances from the annotated start codons of protein-coding genes to the upstream core promoters. The density of points is shown using the kernel-based density function smoothScatter in R. b Correlation between the distances from the annotated start codons to the upstream core promoters and TSC expression (TPM). c Density distribution of the PSS values of adjacent and distant core promoters. d Relationships between genic TSC expression and the shape dynamics of core promoters. All of the core promoters were sorted by the TPM of each genic TSC and were assigned to expression quantiles for each species. For all core promoters, we used a window size of 200 core promoters with a moving step size of 40 core promoters. The data represent the mean PSS values and mean TSC expression, which are normalized on the basis of the maximum TPM value of each category. The smooth lines were plotted with stat_smooth within the R environment using the ggplot2 package
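The windowing described for Fig. 4d (promoters sorted by TPM, a 200-promoter window moved in steps of 40) can be sketched in Python as follows. This is an illustration only: the input layout of `(tpm, pss)` pairs and the function name are hypothetical, and dividing each window's mean TPM by the largest window mean is just one possible reading of "normalized on the basis of the maximum TPM value of each category".

```python
def windowed_means(promoters, window=200, step=40):
    """Mean TPM and mean PSS per sliding window over TPM-sorted promoters.

    promoters: iterable of (tpm, pss) pairs (hypothetical input layout).
    Returns a list of (normalized_mean_tpm, mean_pss) tuples.
    """
    ranked = sorted(promoters, key=lambda p: p[0])  # sort by expression (TPM)
    rows = []
    for start in range(0, len(ranked) - window + 1, step):
        chunk = ranked[start:start + window]
        mean_tpm = sum(t for t, _ in chunk) / window
        mean_pss = sum(s for _, s in chunk) / window
        rows.append((mean_tpm, mean_pss))
    if not rows:
        return []
    # Assumed normalization: scale by the maximum window mean TPM.
    max_tpm = max(t for t, _ in rows)
    if max_tpm == 0:
        return rows
    return [(t / max_tpm, s) for t, s in rows]
```

Overlapping windows (step smaller than window size) smooth the trend before a curve such as the one drawn by stat_smooth is fitted.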
Fig. 5 Distant core promoter emergence in the context of genome size evolution in insects. a Density distribution of distances from the annotated start codons of protein-coding genes to their farthest upstream core promoters. b Correlation between the genome size and the ratio of the number of distant core promoters to the number of adjacent core promoters. c Insertion of TEs in the upstream region from the start codon to the core promoter. Top panel: average TE coverage and its standard deviation in the upstream region from the start codon to the distant core promoter. Bottom panel: percentage of core promoters that do not contain TEs in the upstream region from the start codon to the distant core promoter. A, adjacent core promoter; D, distant core promoter. d Comparison of TFBS abundance between distant and adjacent core promoters using the number of TFBSs per core promoter per TF. The heatmap was constructed using the log2-transformed ratios of the TFBS abundances between distant and adjacent core promoters. The TFBSs showing significant changes (chi-squared tests with Yates’ correction) in at least one comparison were included in this analysis. Statistical significances were adjusted by Benjamini–Hochberg FDR multiple-testing correction. The asterisk indicates a significant difference in the TFBS abundance between adjacent and distant core promoters at a threshold of FDR < 0.01. e Estimation of TFBS divergence between distant and adjacent core promoters using normalized SE. The asterisk (P < 0.05) indicates a significant difference in the SE value between adjacent and distant core promoters according to the Wilcoxon rank-sum test. Aedes aegypti, AAEGY; Acyrthosiphon pisum, APISU; Bombus terrestris, BTERR; Helicoverpa armigera, HARMI; Laodelphax striatellus, LSTRI; Tribolium castaneum, TCAST; Tetranychus urticae, TURTI
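The statistics named for Fig. 5d (a log2 ratio of TFBS abundances, a chi-squared test with Yates' correction, and Benjamini–Hochberg adjustment) can be illustrated with a standard-library Python sketch. The caption does not state how the contingency tables were built, so the 2×2 layout below is an assumption, and all function names are hypothetical. The p-value uses the identity that the upper tail of a chi-squared distribution with 1 degree of freedom equals erfc(sqrt(x/2)).

```python
import math


def log2_ratio(distant: float, adjacent: float) -> float:
    """log2 of the distant/adjacent TFBS abundance ratio (both must be > 0)."""
    return math.log2(distant / adjacent)


def yates_chi2_p(a: int, b: int, c: int, d: int) -> float:
    """P-value of a Yates-corrected chi-squared test on the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = distant/adjacent, columns = with/without the TFBS.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # a zero margin makes the test undefined; treat as not significant
    diff = max(0.0, abs(a * d - b * c) - n / 2)  # continuity correction
    chi2 = n * diff ** 2 / denom
    return math.erfc(math.sqrt(chi2 / 2))  # chi-squared survival function, 1 d.f.


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With such helpers, a TFBS would be flagged in a given species when its adjusted p-value falls below the 0.01 threshold given in the caption.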