Dapeng Wang, Jun Yu.
Abstract
Plastids carry their own genetic material, which encodes a variable set of genes that are limited in number but functionally important. Aside from orthology, the lineage-specific order and orientation of these genes are also relevant. Here, we develop a database, Plastid-LCGbase (http://lcgbase.big.ac.cn/plastid-LCGbase/), which focuses on the organizational variability of plastid genes and genomes from diverse taxonomic groups. The current Plastid-LCGbase contains information from 470 plastid genomes and exhibits several unique features. First, through a genome-overview page generated from OrganellarGenomeDRAW, it displays the general arrangement of all plastid genes (circular or linear). Second, it shows the patterns and modes of all paired plastid genes and their physical distances across user-defined lineages, a comparison facilitated by a step-wise stratification of taxonomic groups. Third, it divides the paired genes into three categories (co-directionally-paired genes or CDPGs, convergently-paired genes or CPGs and divergently-paired genes or DPGs) and three patterns (separation, overlap and inclusion) and provides basic statistics for each species. Fourth, the gene pairing scheme is expandable, so that neighboring genes can also be included in species-/lineage-specific comparisons. We hope that Plastid-LCGbase facilitates studies of gene variation (insertion-deletion, translocation and rearrangement) and transcription-level studies of plastid genomes.
Year: 2014 PMID: 25378306 PMCID: PMC4383908 DOI: 10.1093/nar/gku1070
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Screenshots of the major functional modules of Plastid-LCGbase. (A) Phylogenetic trees for 470 species built from whole plastid proteomes. (B) The search page for determining conserved gene pairs and visualizing gene orders. (C) The result page returned by a search. Arrows indicate genes, and arrow orientation indicates their transcription direction. (D) The browse page. (E) The gene list page. (F) The gene pair page. (G) The genome map page. (H) Distributions of TSS distances for the three types of gene pairs. (I) Barplots of the three types and three patterns of gene pairs.
Figure 2. Definition of the categories and patterns of gene pairs. On the basis of relative transcription direction, gene pairs are divided into three categories: (A) CDPG, genes transcribed in the same direction. (B) CPG, genes transcribed in opposite directions, toward each other. (C) DPG, genes transcribed in opposite directions, away from each other. According to the relative positions of neighboring genes, gene pairs are classified into three patterns: (D) ‘Separation’, the genes share no region. (E) ‘Overlap’, the genes share some region but neither covers the other completely. (F) ‘Inclusion’, one gene covers the entire region of the other.
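The definitions in Figure 2 can be illustrated with a minimal Python sketch. It is not the database's implementation: the `Gene` record, the function names, the inclusive coordinate convention and the rule of ordering the two genes by their left coordinate are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are inclusive, start <= end."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def _ordered(a: Gene, b: Gene):
    # Assumption: the gene with the smaller left coordinate is the "left" gene.
    return sorted((a, b), key=lambda g: (g.start, g.end))


def pair_category(a: Gene, b: Gene) -> str:
    """Return 'CDPG', 'CPG' or 'DPG' from the relative transcription direction."""
    left, right = _ordered(a, b)
    if left.strand == right.strand:
        return "CDPG"  # transcribed in the same direction
    # Left gene on '+' and right gene on '-' point toward each other.
    return "CPG" if left.strand == "+" else "DPG"


def pair_pattern(a: Gene, b: Gene) -> str:
    """Return 'separation', 'overlap' or 'inclusion' from relative positions."""
    left, right = _ordered(a, b)
    if right.start > left.end:
        return "separation"  # no shared region
    if right.start == left.start or right.end <= left.end:
        return "inclusion"  # one gene covers the other completely
    return "overlap"  # shared region, neither covers the other


if __name__ == "__main__":
    g1 = Gene("geneA", 1, 100, "+")    # hypothetical genes
    g2 = Gene("geneB", 200, 300, "-")
    print(pair_category(g1, g2), pair_pattern(g1, g2))
```

For overlapping or included genes the notion of "toward each other" is less clear-cut than for separated genes; the sketch simply applies the same left-to-right rule in all cases.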
A basic survey for protein-coding genes of 13 species as examples of Plastid-LCGbase
| Species | Genome length (nt) | Gene number | Strand ratio | CDPGs% | CPGs% | DPGs% | Median transcript length | Median TSS distance |
|---|---|---|---|---|---|---|---|---|
| | 116470 | 129 | 1.67 | 79.7 | 10.2 | 10.2 | 419 | 533 |
| | 35107 | 32 | 33.00 | 100.0 | 0.0 | 0.0 | 581 | 592 |
| | 77717 | 82 | 2.11 | 75.3 | 12.3 | 12.3 | 486.5 | 608 |
| | 105309 | 119 | 1.95 | 73.7 | 13.6 | 12.7 | 416 | 587 |
| | 191028 | 209 | 1.45 | 66.8 | 16.8 | 16.3 | 518 | 580 |
| | 125373 | 67 | 1.65 | 71.2 | 13.6 | 15.2 | 554 | 1405 |
| | 162424 | 86 | 1.84 | 72.9 | 12.9 | 14.1 | 546.5 | 1230 |
| | 139697 | 82 | 1.21 | 79.0 | 9.9 | 11.1 | 510.5 | 1040 |
| | 107122 | 70 | 1.32 | 73.9 | 13.0 | 13.0 | 416 | 1143 |
| | 154168 | 84 | 1.77 | 72.3 | 13.3 | 14.5 | 630.5 | 1009 |
| | 125319 | 75 | 1.41 | 74.3 | 12.2 | 13.5 | 605 | 1125 |
| | 159507 | 86 | 1.84 | 75.3 | 11.8 | 12.9 | 579.5 | 1221 |
| | 159593 | 85 | 1.72 | 70.2 | 14.3 | 15.5 | 605 | 1158.5 |
| Median of 470 genomes | 154425.5 | 85 | 1.69 | 74.7 | 12.0 | 13.1 | 554 | 1087.5 |
Note: Genome length, the length of the whole genome; Gene number, the number of protein-coding genes; Strand ratio, (the number of genes on the dominant strand + 1)/(the number of genes on the other strand + 1); CDPGs%, CPGs% and DPGs%, the percentages of CDPGs, CPGs and DPGs among all gene pairs; Median transcript length and Median TSS distance, the median transcript length and the median distance between neighboring transcription start sites.
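The statistics defined in the note can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names and the `(start, end, strand)` tuple layout are hypothetical, the TSS is assumed to be the 5' end of the annotated gene, and circular genomes are treated as linear.

```python
from statistics import median


def strand_ratio(strands):
    """(genes on dominant strand + 1) / (genes on other strand + 1)."""
    plus = sum(1 for s in strands if s == "+")
    minus = len(strands) - plus
    return (max(plus, minus) + 1) / (min(plus, minus) + 1)


def category_percentages(categories):
    """Percentage of CDPGs, CPGs and DPGs among all gene pairs."""
    total = len(categories)
    if total == 0:
        return {"CDPG": 0.0, "CPG": 0.0, "DPG": 0.0}
    return {c: 100.0 * categories.count(c) / total
            for c in ("CDPG", "CPG", "DPG")}


def tss(start, end, strand):
    # Assumption: transcription starts at the 5' end of the annotated gene.
    return start if strand == "+" else end


def median_tss_distance(genes):
    """Median distance between TSSs of neighboring genes.

    `genes` is a list of (start, end, strand) tuples with at least two
    entries; neighbors are taken in order of the left coordinate.
    """
    ordered = sorted(genes)
    distances = [abs(tss(*b) - tss(*a)) for a, b in zip(ordered, ordered[1:])]
    return median(distances)


if __name__ == "__main__":
    # 32 genes all on one strand, as in the table row with strand ratio 33.00.
    print(strand_ratio(["+"] * 32))
```

The +1 in both numerator and denominator keeps the ratio finite when every gene lies on one strand, which is the case for the 32-gene genome in the table: (32 + 1)/(0 + 1) = 33.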