Nilmini Hettiarachchi, Kirill Kryukov, Kenta Sumiyama, Naruya Saitou.
Abstract
Many studies on conserved noncoding sequences (CNSs) have found that CNSs are significantly enriched in regulatory sequence elements. We conducted a whole-genome analysis of plant CNSs to identify lineage-specific CNSs in eudicots, monocots, angiosperms, and vascular plants, based on the premise that lineage-specific CNSs define lineage-specific characters and functions in groups of organisms. We identified 27 eudicot, 204 monocot, 6,536 grass, 19 angiosperm, and 2 vascular plant lineage-specific CNSs (lengths ranging from 16 to 1,517 bp) that presumably originated in their respective common ancestors. A stronger constraint was observed on CNSs located in untranslated regions. The CNSs were often flanked by genes involved in transcription regulation. A drop in A+T content was observed near CNS borders, and CNS regions showed a higher nucleosome occupancy probability. These CNSs are candidate regulatory elements that are expected to define lineage-specific features of various plant groups.
Year: 2014 PMID: 25364802 PMCID: PMC4202324 DOI: 10.1093/gbe/evu188
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Phylogenetic tree with the number of lineage-specific CNSs. The numbers on each branch represent the number of lineage-specific CNSs found in the study. The main plant groups considered in the study are depicted on the right. The phylogenetic tree was constructed using verified divergence times taken from Anderson et al. (2005), D’Hont et al. (2012), Banks et al. (2011), Heckman et al. (2001), and Rensing et al. (2007).
Flowcharts for lineage-specific CNS determination. (A) Flowchart for determining lineage-common CNSs. (B) Flowchart for determining lineage-specific CNSs.
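The flowcharts themselves are not reproduced in this record. As a rough illustration only, the sketch below assumes that a CNS is "lineage-common" when it is detected in every ingroup species and "lineage-specific" when, in addition, it is detected in no outgroup species. The function names, species labels, and segment IDs are hypothetical, and the study's actual alignment thresholds and reference genomes are not captured here.

```python
from functools import reduce

# Hypothetical criteria for illustration only: "common" = detected in every
# ingroup species; "specific" = common AND detected in no outgroup species.

def lineage_common(hits: dict[str, set[str]], ingroup: list[str]) -> set[str]:
    """IDs of conserved segments detected in every ingroup species."""
    return reduce(set.intersection, (hits[sp] for sp in ingroup))

def lineage_specific(hits: dict[str, set[str]], ingroup: list[str],
                     outgroup: list[str]) -> set[str]:
    """Lineage-common segments with no hit in any outgroup species."""
    in_outgroup = set().union(*(hits[sp] for sp in outgroup))
    return lineage_common(hits, ingroup) - in_outgroup

# Toy presence data: species -> IDs of detected conserved segments (hypothetical)
hits = {
    "rice": {"c1", "c2", "c3"},
    "sorghum": {"c1", "c2", "c4"},
    "arabidopsis": {"c2"},
    "moss": set(),
}
print(lineage_specific(hits, ["rice", "sorghum"], ["arabidopsis", "moss"]))  # {'c1'}
```

Expressing the two steps as an intersection followed by a set difference mirrors the split between panels (A) and (B), but the real pipeline may apply additional filters at either step.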
Example schematic of CNS identification for all pairs of species. This five-species example (A–E) shows how pairwise searches are performed to determine the union of CNSs for each pair of species and for separate lineages. In the example, the searches are performed at three levels (1–3). The bars on the right connecting species represent separate searches. Across all species used in the analysis, a total of 105 searches were performed in the same manner to determine the CNSs present in all pairs.
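If each search compares exactly one unordered pair of species (an assumption; the caption only says the searches cover all pairs), the number of searches for n species is n(n − 1)/2, and 105 searches then correspond to n = 15 species. The sketch enumerates pairs with `itertools.combinations`; the labels `sp1`…`sp15` are placeholders, not the study's species.

```python
from itertools import combinations
from math import comb

def pairwise_searches(species: list[str]) -> list[tuple[str, str]]:
    """One search per unordered pair of species (assumed design)."""
    return list(combinations(species, 2))

def species_needed(n_searches: int) -> int:
    """Smallest n such that n * (n - 1) / 2 >= n_searches."""
    n = 2
    while comb(n, 2) < n_searches:
        n += 1
    return n

# Placeholder labels; the actual species are not listed in this excerpt.
labels = [f"sp{i}" for i in range(1, 16)]
print(len(pairwise_searches(labels)))  # 105
print(species_needed(105))             # 15
```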
Summary of Lineage-Specific CNSs: Minimum–Maximum Lengths, Average Lengths, Average Percentage Identity, and Number of CNSs Longer Than 100 bp
| | Eudicot-Specific | Monocot-Specific | Grass-Specific | Angiosperm-Specific | Vascular Plant-Specific |
|---|---|---|---|---|---|
| Number of CNSs | 27 | 204 | 6,536 | 19 | 2 |
| Minimum length (bp) | 22 | 23 | 23 | 16 | 46 |
| Maximum length (bp) | 63 | 186 | 1,517 | 95 | 50 |
| Average length (bp) | 38.5 | 58.5 | 140.7 | 42.8 | 48.0 |
| Average percentage identity (%) | 89.8 | 84.3 | 80.25 | 87.5 | 82.0 |
| CNSs ≥ 100 bp | 0 | 14 | 3,306 | 0 | 0 |
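The table's columns are simple descriptive statistics over each lineage's CNS set. A minimal sketch, assuming that the average percentage identity is an unweighted mean over CNSs (the record does not say how it was averaged) and using made-up toy values rather than the study's data:

```python
from statistics import mean

def summarize_cnss(lengths: list[int], pids: list[float],
                   long_cutoff: int = 100) -> dict[str, float]:
    """Count, min/max/mean length, mean percent identity, and CNSs >= cutoff."""
    return {
        "n": len(lengths),
        "min_len": min(lengths),
        "max_len": max(lengths),
        "mean_len": round(mean(lengths), 1),
        "mean_pid": round(mean(pids), 1),  # unweighted mean (assumption)
        "n_ge_cutoff": sum(1 for length in lengths if length >= long_cutoff),
    }

# Toy values for illustration; these are NOT the study's CNSs.
print(summarize_cnss([22, 40, 63, 151], [91.0, 88.5, 90.0, 80.5]))
```

A length-weighted identity would give different averages for lineages with long CNSs, such as the grass set, so the weighting choice matters when reproducing the table.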
Length distributions of lineage-specific CNSs. (A) Length distribution for eudicot-specific CNSs. (B) Length distribution for monocot-specific CNSs. (C) Length distribution for grass-specific CNSs. (D) Length distribution for angiosperm-specific CNSs.
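A length distribution like those in panels (A) to (D) can be built by counting CNSs in fixed-width bins. In this sketch the 20 bp bin width is an arbitrary assumption (the caption does not give the binning), and the lengths are toy values:

```python
from collections import Counter

def length_histogram(lengths: list[int], bin_width: int = 20) -> dict[int, int]:
    """Count CNSs per fixed-width length bin, keyed by each bin's lower edge."""
    counts = Counter((length // bin_width) * bin_width for length in lengths)
    return dict(sorted(counts.items()))

# Toy lengths; the 20 bp bin width is an arbitrary choice for this sketch.
print(length_histogram([16, 22, 38, 41, 95]))  # {0: 1, 20: 2, 40: 1, 80: 1}
```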
Lineage-specific loss of ancestral CNSs and the number of CNSs found in all pairwise searches. Left panel: lineage-specific loss of ancestral CNSs. The value on each branch is the number of CNSs lost on that branch. The reference genome used for this analysis (Chlamydomonas reinhardtii) is highlighted in green. Right panel: number of CNSs found in all pairwise searches. CNSs were determined between all pairs of species to obtain a comprehensive view of the gain of noncoding conservation. These pairwise analyses consider the union of all CNSs. The number on each node reflects the gain of CNSs obtained through the pairwise searches. These CNSs are common to each group of species and are therefore likely to be found in outgroup species.
Genomic Locations of the Lineage-Specific CNSs
| Region | Rice Noncoding Genome Composition (%) | Grass-Specific, % (n) | Monocot-Specific, % (n) | Arabidopsis Noncoding Genome Composition (%) | Eudicot-Specific, % (n) | Angiosperm-Specific, % (n) |
|---|---|---|---|---|---|---|
| Intergenic | 70.0 | 53.7 (3,503) | 54.9 (112) | 56.8 | 63.0 (17) | 47.4 (9) |
| Intron | 24.2 | 21.0 (1,374) | 22.1 (45) | 35.0 | 7.4 (2) | 31.6 (6) |
| UTR | 5.8 | 25.3 (1,658) | 23.0 (47) | 8.2 | 29.6 (8) | 21.0 (4) |
| 3′-UTR | 3.4 | 19.2 (1,259) | 11.3 (23) | 4.0 | 14.8 (4) | 10.5 (2) |
| 5′-UTR | 2.4 | 6.1 (399) | 11.7 (24) | 4.2 | 14.8 (4) | 10.5 (2) |
Note.—Genomic locations of grass- and monocot-specific CNSs relative to the reference genome Oryza sativa japonica are given as percentages in the third and fourth columns. Rough estimates of the intergenic, intronic, and UTR fractions of the reference genome are given under rice noncoding genome composition in the second column. Genomic locations of eudicot- and angiosperm-specific CNSs relative to the reference genome Arabidopsis thaliana are given as percentages in the sixth and seventh columns. Rough estimates of the intergenic, intronic, and UTR fractions of the reference genome are given under Arabidopsis noncoding genome composition in the fifth column. The exact number of CNSs in each region is given in parentheses.
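One way to read the table is as a ratio of each CNS category's share to the corresponding share of the reference noncoding genome. The sketch below computes this ratio for grass-specific CNSs using the percentages from the table; it is an illustrative calculation, not an analysis reported in the study, and because the genome composition values are rough estimates, the ratios are rough as well.

```python
# Percentages copied from the table (rice reference, grass-specific CNSs).
rice_background = {"Intergenic": 70.0, "Intron": 24.2, "UTR": 5.8}
grass_specific = {"Intergenic": 53.7, "Intron": 21.0, "UTR": 25.3}

def fold_enrichment(observed: dict[str, float],
                    background: dict[str, float]) -> dict[str, float]:
    """Observed share divided by background share for each genomic category."""
    return {k: round(observed[k] / background[k], 2) for k in observed}

print(fold_enrichment(grass_specific, rice_background))
# {'Intergenic': 0.77, 'Intron': 0.87, 'UTR': 4.36}
```

A ratio above 1 means the category holds more CNSs than its genomic share alone would predict. By this measure, UTRs are over-represented roughly fourfold among grass-specific CNSs.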
Gene Enrichment Analysis for the Lineage-Specific CNSs
| Functional Group | Percentage of Genes in the Group | P Value |
|---|---|---|
| Likely target genes of grass-specific CNSs | | |
| Functions related to nucleus | 61.9 | 0.0E-0 |
| Regulation of transcription | 70.5 | 0.0E-0 |
| DNA-binding | 51.5 | 9.6E-309 |
| Transcription | 46.1 | 4.5E-278 |
| Transcription regulator activity | 69.1 | 5.7E-272 |
| Transcription factor activity | 65.6 | 8.6E-269 |
| Regulation of RNA metabolic processes | 41.7 | 4.6E-171 |
| Zinc-finger related | 23.3 | 3.4E-106 |
| Activator | 14.6 | 1.3E-86 |
| Sequence-specific DNA binding | 21.9 | 2.0E-83 |
| Zinc ion binding | 36.6 | 2.8E-72 |
| Metal ion binding | 25.1 | 1.1E-59 |
| Homeodomain related | 12.7 | 2.2E-54 |
| Response to organic substance | 24.5 | 1.5E-51 |
| Myb-type HTH DNA-binding domain | 10.0 | 5.6E-46 |
| Cellular response to hormone stimulus | 13.9 | 1.4E-43 |
| Hormone mediated signaling | 13.9 | 1.4E-43 |
| Myb, DNA binding | 10.3 | 1.8E-43 |
| Pathogenesis-related transcription factor and ERF, DNA binding | 7.7 | 3.9E-39 |
| Transition metal ion binding | 39.3 | 1.2E-38 |
| Likely target genes of monocot-specific CNSs | | |
| Transcription factor activity | 92.3 | 2.4E-51 |
| Transcription regulator activity | 92.3 | 5.8E-48 |
| Regulation of transcription | 90.8 | 1.3E-46 |
| Functions related to nucleus | 72.3 | 5.8E-45 |
| DNA binding | 93.8 | 1.2E-43 |
| Sequence-specific DNA binding | 35.4 | 1.8E-17 |
| Zinc-finger related | 26.2 | 3.0E-12 |
| Homeodomain related | 16.9 | 3.6E-8 |
| Basic-leucine zipper transcription factor | 10.8 | 1.3E-7 |
| Metal binding | 27.7 | 1.3E-7 |
| Activator | 13.8 | 2.1E-7 |
| Transcription factor, GATA, plant | 13.8 | 6.2E-5 |
| Myb-like DNA binding region | 9.2 | 5.5E-6 |
| No apical meristem protein | 9.2 | 2.0E-5 |
| Heat shock factor type, DNA binding | 6.2 | 5.1E-5 |
| Homeobox conserved site | 7.7 | 1.4E-4 |
| Anther development | 6.2 | 1.6E-4 |
| Androecium development | 6.2 | 8.2E-4 |
| Stamen development | 6.2 | 8.2E-4 |
| Postembryonic development | 18.5 | 8.7E-4 |
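This record does not name the tool or statistic behind the P values. As a generic sketch of how a term-enrichment P value can be obtained, the code below computes a one-sided hypergeometric tail: the probability of seeing at least k genes carrying an annotation in a list of n genes drawn from a background of N genes, K of which carry that annotation. All counts in the usage line are hypothetical.

```python
from math import comb

def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn).

    Exact integer arithmetic; math.comb returns 0 when the lower index
    exceeds the upper, so impossible terms drop out automatically.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical counts: 30 of 65 listed genes annotated, vs 1,500 of 20,000 overall.
print(hypergeom_sf(30, 20_000, 1_500, 65))
```

Enrichment tools often apply their own modifications and multiple-testing corrections on top of this basic test, so values from this sketch are not expected to match the table.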
(A) Distribution of A+T content within grass-specific CNSs and in their flanking regions. Black line: average A+T content of the rice genome. Red dots: A+T content inside CNSs (20 bp from the center of each CNS were considered, as described in the methodology), obtained through moving-window analysis. Blue vertical lines: borders of the 5′ and 3′ flanking regions around the CNS. (B) Nucleosome occupancy probability for grass-specific CNSs, including flanking regions. Nucleotide position 0 represents the center of each CNS and also the center of each random sample. Blue, red, and green curves show the nucleosome occupancy probabilities of the CNSs, of random samples with the same A+T content as the CNSs, and of random samples without a specific A+T preference, respectively.
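The two ingredients of this figure, a moving-window A+T profile (A) and a background of random regions matched on A+T content (B), can be sketched as follows. The window size, step, tolerance, and rejection-sampling approach are assumptions made for illustration. The record does not give the study's window parameters or sampling procedure, and the nucleosome occupancy model itself is not sketched here.

```python
import random

def at_fraction(seq: str) -> float:
    """Fraction of A or T bases (other symbols, e.g. N, count as non-A+T)."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)

def at_profile(seq: str, window: int = 20, step: int = 1) -> list[float]:
    """A+T fraction of each window slid along seq (window/step are assumptions)."""
    return [at_fraction(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]

def at_matched_starts(genome: str, length: int, target_at: float,
                      tolerance: float = 0.02, n: int = 100,
                      max_tries: int = 100_000, seed: int = 0) -> list[int]:
    """Start positions of random windows whose A+T fraction is within
    `tolerance` of `target_at` (simple rejection sampling; may return fewer
    than n starts if max_tries is exhausted)."""
    rng = random.Random(seed)
    starts: list[int] = []
    for _ in range(max_tries):
        if len(starts) == n:
            break
        start = rng.randrange(0, len(genome) - length + 1)
        if abs(at_fraction(genome[start:start + length]) - target_at) <= tolerance:
            starts.append(start)
    return starts

print(at_profile("AATTGGCC", window=4, step=2))  # [1.0, 0.5, 0.0]
```

Rejection sampling keeps the matching criterion explicit. For strict tolerances on large genomes, precomputing A+T fractions per position would be faster.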