Haining Lin, Gaurav Moghe, Shu Ouyang, Amy Iezzoni, Shin-Han Shiu, Xun Gu, C Robin Buell.
Abstract
BACKGROUND: The availability of genome and transcriptome sequences for a number of species permits the identification and characterization of conserved as well as divergent genes, such as lineage-specific genes, which have no detectable sequence similarity to genes from other lineages. While genes conserved among taxa provide insight into the core processes shared among species, lineage-specific genes provide insight into evolutionary processes and biological functions that are likely clade- or species-specific.
Year: 2010 PMID: 20152032 PMCID: PMC2829037 DOI: 10.1186/1471-2148-10-41
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Figure 1. Identification of lineage-specific genes in Arabidopsis. The solid boxes represent non-Arabidopsis sequences used in the searches, while the hashed boxes show the Arabidopsis genes.
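The abstract defines lineage-specific genes as genes with no detectable sequence similarity to genes from other lineages, and Figure 1 outlines the searches used to find them. The sketch below illustrates that decision rule for one gene, assuming the searches have already been reduced to a best-hit E-value per searched species. The function `classify_gene`, the 1e-5 cutoff, and the example species are hypothetical illustrations, not the study's thresholds or code.

```python
from typing import Dict, Optional, Set

def classify_gene(
    best_evalues: Dict[str, Optional[float]],
    in_lineage: Set[str],
    evalue_cutoff: float = 1e-5,  # hypothetical cutoff, not the study's value
) -> str:
    """Label one query gene from its best-hit E-value in each searched species.

    best_evalues maps species name -> best E-value (None means no hit at all).
    in_lineage holds the searched species that belong to the lineage of interest.
    """
    def has_hit(species: str) -> bool:
        evalue = best_evalues.get(species)
        return evalue is not None and evalue <= evalue_cutoff

    out_groups = [sp for sp in best_evalues if sp not in in_lineage]
    if any(has_hit(sp) for sp in out_groups):
        return "conserved"  # detectable similarity outside the lineage
    return "lineage-specific"  # no significant hit outside the lineage

# Illustrative input: a strong hit in a close relative, only a weak hit in rice.
hits = {"Arabidopsis lyrata": 1e-40, "Oryza sativa": 0.3}
print(classify_gene(hits, {"Arabidopsis lyrata"}))  # lineage-specific
```

Changing `in_lineage` to a wider or narrower set of species is what separates a species-specific call from a clade-specific one under this simplified rule.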
Genic features of the CBSGs, ALSGs, ECs, and TE-related genes
| Feature | CBSGs mean | CBSGs median | ALSGs mean | ALSGs median | ECs mean | ECs median | TE-related genes mean | TE-related genes median |
|---|---|---|---|---|---|---|---|---|
| Exons/gene | 2.2 | 2 | 1.7 | 1 | 6.0 | 4 | 1.7 | 1 |
| Exon length (bp) | 256 | 182 | 213 | 147 | 280 | 155 | 1,336 | 522 |
| Intron length (bp) | 205 | 109 | 227 | 114 | 163 | 99 | 160 | 96 |
| Gene length (bp) | 827 | 598 | 537 | 261 | 2,315 | 1,998 | 2,420 | 2,072 |
| Protein length (aa) | 148 | 104 | 97 | 66 | 431 | 370 | na | na |
| Exon GC (%) | 41.0 | 40.7 | 42.3 | 42.2 | 42.6 | 42.6 | 42.7 | 42.3 |
| Intron GC (%) | 31.5 | 31.3 | 35.1 | 34.4 | 32.4 | 32.7 | 32.8 | 31.9 |
| Gene GC (%) | 37.8 | 37.8 | 41.0 | 40.9 | 39.6 | 39.3 | 41.5 | 41.4 |
| CDS/ORF GC (%) | 42.2 | 42 | 42.8 | 42.7 | 44.5 | 44.2 | na | na |
| 1st position GC (%) | 45.5 | 45.6 | 45.7 | 45.7 | 50.2 | 50.2 | na | na |
| 2nd position GC (%) | 40.4 | 40 | 40.0 | 40 | 40.5 | 40.1 | na | na |
| 3rd position GC (%) | 40.8 | 40.9 | 42.8 | 42.9 | 42.9 | 42.1 | na | na |
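The table reports GC content separately for the first, second, and third codon positions, summarized as a mean and median per gene class. A minimal sketch of that metric, assuming each coding sequence starts at the first base of a codon and that values are computed per gene and then summarized across genes (the aggregation order is an assumption, not stated in the table); `positional_gc` and `summarize` are hypothetical names.

```python
from statistics import mean, median
from typing import Iterable, Tuple

def positional_gc(cds: str) -> Tuple[float, float, float]:
    """GC% at codon positions 1, 2 and 3 of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon, if any
    gc = [0, 0, 0]
    total = [0, 0, 0]
    for i in range(usable):
        pos = i % 3  # 0, 1, 2 = first, second, third codon position
        total[pos] += 1
        if cds[i] in "GC":
            gc[pos] += 1
    return tuple(100.0 * g / t if t else 0.0 for g, t in zip(gc, total))

def summarize(values: Iterable[float]) -> Tuple[float, float]:
    """Mean and median of per-gene values, the two statistics the table reports."""
    values = list(values)
    return mean(values), median(values)

# Third-position GC for two toy coding sequences, summarized across genes.
genes = ["ATGGCCTAA", "ATGGGGCCCTGA"]
print(summarize(positional_gc(s)[2] for s in genes))
```

Reporting both statistics is useful here because several features (for example gene length) are clearly skewed, so mean and median diverge.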
Functional annotation of CBSGs, ALSGs, and ECs
| Category | CBSGs: No. of genes | CBSGs: Percentage (%)^b | ALSGs: No. of genes | ALSGs: Percentage (%)^b | ECs: No. of genes | ECs: Percentage (%)^b |
|---|---|---|---|---|---|---|
| With no known function | 696 | | 1,250 | | 5,090 | |
| transcript support | 549 | 60.1 | 641 | 48.4 | 4,904 | 19.9 |
| no transcript support | 147 | 16.1 | 609 | 46.0 | 186 | 0.8 |
| With a known function | 218 | | 74 | | 19,534 | |
| transcript support | 152 | 16.6 | 59 | 4.5 | 18,699 | 75.9 |
| no transcript support | 66 | 7.2 | 15 | 1.1 | 835 | 3.4 |
| putative PCP or SCR^a | 68 | 7.4 | 4 | 0.3 | 41 | 0.2 |
| beta-galactosidase | 0 | 0.0 | 13 | 1.0 | 34 | 0.1 |
| other | 150 | 16.4 | 57 | 4.3 | 19,459 | 79.0 |
| Total | 914 | 100.0 | 1,324 | 100.0 | 24,624 | 100.0 |
^a PCP (pollen coat protein) gene family or SCR (S-locus cysteine-rich protein).
^b Percentages in bold represent subtotals of the CBSG, ALSG, or EC set.
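Every percentage in this table appears to be a count divided by the size of the whole gene set (for example, 549 of 914 CBSGs is 60.1%). Assuming that convention, the subtotal shares can be recomputed directly from the counts; the sketch below does so and first checks that the two subtotals add up to each set's total. Recomputing from counts can differ by 0.1 from summing the rounded sub-rows (60.1 + 16.1 = 76.2, whereas 696/914 rounds to 76.1). The dictionary and function names are illustrative.

```python
from typing import Dict

# Counts copied from the functional annotation table above.
TOTALS = {"CBSGs": 914, "ALSGs": 1324, "ECs": 24624}
SUBTOTALS = {
    "With no known function": {"CBSGs": 696, "ALSGs": 1250, "ECs": 5090},
    "With a known function": {"CBSGs": 218, "ALSGs": 74, "ECs": 19534},
}

def percent_of_set(count: int, total: int) -> float:
    """Share of the whole gene set, rounded to one decimal as in the table."""
    return round(100.0 * count / total, 1)

def subtotal_percentages() -> Dict[str, Dict[str, float]]:
    # Sanity check: the two subtotals must add up to each set's total.
    for gene_set, total in TOTALS.items():
        assert sum(row[gene_set] for row in SUBTOTALS.values()) == total
    return {
        label: {s: percent_of_set(n, TOTALS[s]) for s, n in counts.items()}
        for label, counts in SUBTOTALS.items()
    }

for label, row in subtotal_percentages().items():
    print(label, row)
# With no known function {'CBSGs': 76.1, 'ALSGs': 94.4, 'ECs': 20.7}
# With a known function {'CBSGs': 23.9, 'ALSGs': 5.6, 'ECs': 79.3}
```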
Subcellular localization of the CBSGs, ALSGs, ECs, and TAIR8 non-TE protein-coding genes
| Gene set | Localization | No. of genes (%) | No. with known function | No. with transcript support |
|---|---|---|---|---|
| CBSGs | Chloroplast | 37 (4.0) | 7 | 30 |
| CBSGs | Mitochondrion | 70 (7.7) | 11 | 62 |
| CBSGs | Secretory pathway | 412 (45.1) | 107 | 285 |
| CBSGs | Other | 395 (43.2) | 93 | 324 |
| CBSGs | Uncertain | 0 (0.0) | 0 | 0 |
| CBSGs | Total | 914 (100.0) | 218 | 701 |
| ALSGs | Chloroplast | 61 (4.6) | 4 | 45 |
| ALSGs | Mitochondrion | 229 (17.3) | 10 | 109 |
| ALSGs | Secretory pathway | 271 (20.5) | 11 | 130 |
| ALSGs | Other | 763 (57.6) | 49 | 416 |
| ALSGs | Uncertain | 0 (0.0) | 0 | 0 |
| ALSGs | Total | 1,324 (100.0) | 74 | 700 |
| ECs | Chloroplast | 3,909 (15.9) | 3,023 | 3,837 |
| ECs | Mitochondrion | 2,834 (11.5) | 2,149 | 2,773 |
| ECs | Secretory pathway | 4,751 (19.3) | 3,838 | 4,498 |
| ECs | Other | 13,117 (53.3) | 10,515 | 12,482 |
| ECs | Uncertain | 13 (0.1) | 9 | 13 |
| ECs | Total | 24,624 (100.0) | 19,534 | 23,603 |
| TAIR8 non-TE protein-coding genes | Chloroplast | 4,007 (14.9) | 3,034 | 3,912 |
| TAIR8 non-TE protein-coding genes | Mitochondrion | 3,133 (11.7) | 2,170 | 2,944 |
| TAIR8 non-TE protein-coding genes | Secretory pathway | 5,434 (20.2) | 3,956 | 4,913 |
| TAIR8 non-TE protein-coding genes | Other | 14,275 (53.1) | 10,657 | 13,222 |
| TAIR8 non-TE protein-coding genes | Uncertain | 13 (0.0) | 9 | 13 |
| TAIR8 non-TE protein-coding genes | Total | 26,862 (100.0) | 19,826 | 25,004 |
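The (%) values in this table are counts divided by the size of each gene set. The sketch below recomputes them from the tabulated counts, which also confirms that each block sums to its Total row; it is plain arithmetic on the numbers above and makes no claim about how subcellular localization was predicted. Names are illustrative.

```python
from typing import Dict, Tuple

# Counts copied from the subcellular localization table above.
LOCALIZATION: Dict[str, Dict[str, int]] = {
    "CBSGs": {"Chloroplast": 37, "Mitochondrion": 70, "Secretory pathway": 412,
              "Other": 395, "Uncertain": 0},
    "ALSGs": {"Chloroplast": 61, "Mitochondrion": 229, "Secretory pathway": 271,
              "Other": 763, "Uncertain": 0},
    "ECs": {"Chloroplast": 3909, "Mitochondrion": 2834, "Secretory pathway": 4751,
            "Other": 13117, "Uncertain": 13},
    "TAIR8 non-TE": {"Chloroplast": 4007, "Mitochondrion": 3133,
                     "Secretory pathway": 5434, "Other": 14275, "Uncertain": 13},
}

def localization_percentages(counts: Dict[str, int]) -> Tuple[int, Dict[str, float]]:
    """Return the set total and each compartment's share, rounded to one decimal."""
    total = sum(counts.values())
    return total, {loc: round(100.0 * n / total, 1) for loc, n in counts.items()}

for gene_set, counts in LOCALIZATION.items():
    total, pct = localization_percentages(counts)
    print(gene_set, total, pct["Secretory pathway"])
# CBSGs 914 45.1
# ALSGs 1324 20.5
# ECs 24624 19.3
# TAIR8 non-TE 26862 20.2
```

Printing the secretory-pathway share side by side makes the contrast between CBSGs and the other sets easy to see.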
Figure 2. Density of cytosine methylation in the 500 bp upstream, coding, and 500 bp downstream regions of A) ALSGs, CBSGs, and ECs, and B) ALSGs, CBSGs, and ECs predicted to be targeted to the secretory pathway.
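Figure 2 compares cytosine methylation density in three windows around each gene. The sketch below shows one way such a density could be computed, assuming it is defined as methylated cytosines per base pair within each window and that methylated positions are available as a sorted list per chromosome. The function name, the 1-based inclusive coordinate convention, and the density definition are assumptions for illustration, not the study's pipeline; clipping at the chromosome end is omitted because it needs the chromosome length.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List

def methylation_density(
    methyl_sites: List[int],  # sorted 1-based positions of methylated C on one chromosome
    start: int,               # 1-based inclusive start of the coding region
    end: int,                 # 1-based inclusive end of the coding region
    strand: str = "+",
    flank: int = 500,
) -> Dict[str, float]:
    """Methylated cytosines per bp in the upstream, coding and downstream windows."""
    left = (start - flank, start - 1)
    right = (end + 1, end + flank)
    # "Upstream" means 5' of the gene, so the flanks swap on the minus strand.
    upstream, downstream = (left, right) if strand == "+" else (right, left)
    windows = {"upstream": upstream, "coding": (start, end), "downstream": downstream}
    density: Dict[str, float] = {}
    for name, (lo, hi) in windows.items():
        lo = max(lo, 1)  # clip windows that run off the chromosome start
        if hi < lo:
            density[name] = 0.0
            continue
        # Binary search counts the sites falling in [lo, hi].
        n = bisect_right(methyl_sites, hi) - bisect_left(methyl_sites, lo)
        density[name] = n / (hi - lo + 1)
    return density

print(methylation_density([100, 450, 600, 700, 800, 1600], start=501, end=1000))
```

Dividing by window length keeps the short flanks and long coding regions on a comparable per-bp scale.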
Figure 3. Ratio of non-synonymous to synonymous SNPs, and the number of SNPs, non-synonymous SNPs, and synonymous SNPs per 100 bp per gene, within coding regions of A) ALSGs, CBSGs, and ECs, and B) ALSGs, CBSGs, and ECs predicted to be targeted to the secretory pathway. The ratio of non-synonymous to synonymous SNPs is plotted as a solid line; the numbers of SNPs, non-synonymous SNPs, and synonymous SNPs per 100 bp per gene within coding regions are plotted as dotted lines. Lines connect the three classes of genes to aid interpretation.
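Figure 3 summarizes coding-region SNPs as a non-synonymous to synonymous ratio and as counts per 100 bp. A minimal sketch, assuming each SNP is given as a 0-based offset into an in-frame, ACGT-only coding sequence plus its alternative base, and that the ratio is a simple count ratio rather than a per-site dN/dS estimate: a SNP is called synonymous when the standard genetic code translates the reference and alternative codons to the same amino acid. `classify_snp` and `snp_summary` are hypothetical names.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def classify_snp(cds: str, pos: int, alt: str) -> str:
    """Call a SNP at 0-based offset `pos` of an in-frame CDS synonymous or not."""
    cds = cds.upper()
    offset = pos % 3  # position of the SNP within its codon
    codon_start = pos - offset
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "non-synonymous"

def snp_summary(cds: str, snps: List[Tuple[int, str]]) -> Dict[str, Optional[float]]:
    """Per-100-bp SNP densities and the non-synonymous/synonymous count ratio."""
    calls = [classify_snp(cds, pos, alt) for pos, alt in snps]
    n_syn = calls.count("synonymous")
    n_non = calls.count("non-synonymous")
    scale = 100.0 / len(cds)
    return {
        "snps_per_100bp": (n_syn + n_non) * scale,
        "nonsyn_per_100bp": n_non * scale,
        "syn_per_100bp": n_syn * scale,
        # Undefined (None) when a gene has no synonymous SNPs.
        "nonsyn_syn_ratio": n_non / n_syn if n_syn else None,
    }

# GCC -> GCT keeps alanine; ATG -> CTG changes methionine to leucine.
print(snp_summary("ATGGCCTAA", [(5, "T"), (0, "C")]))
```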