Andreas D Zimmer, Daniel Lang, Karol Buchta, Stephane Rombauts, Tomoaki Nishiyama, Mitsuyasu Hasebe, Yves Van de Peer, Stefan A Rensing, Ralf Reski.
Abstract
BACKGROUND: The moss Physcomitrella patens as a model species provides an important reference for early-diverging lineages of plants, and the release of the genome in 2008 opened the doors to genome-wide studies. The usability of a reference genome greatly depends on the quality of the annotation and the availability of centralized community resources. Therefore, in the light of accumulating evidence for missing genes, fragmentary gene structures, false annotations and a low rate of functional annotations on the original release, we decided to improve the moss genome annotation.
Year: 2013 PMID: 23879659 PMCID: PMC3729371 DOI: 10.1186/1471-2164-14-498
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
genome annotation releases
| | ||||
|---|---|---|---|---|
| 480 | 480 | 480 | ||
| 2,106 | 1,995 | 1,985 | ||
| 35,938 | 27,966 (-7,972) | 32,275 (+4,309) | ||
| 12,593 | 19,119 (+6,523) | 26,722 (+7,603) | ||
| 35,938 | 27,966 (-7,972) | 38,357* (+10,391) | ||
| - | - | 3,500 | ||
| 4,517 | 4,515 (-2) | 15,757* (+11,242) | ||
| 8,418 | 8,381 (-37) | 16,010 (+7,629) / 21,464* | ||
| 13.4 | 17.2 | 14.9* | ||
| 4.9 | 5.4 | 5.0* | ||
| 246 | 234 | 275* | ||
| 311 | 277 | 278* | ||
| | | | ||
| | - | 22,307 | ||
| | 27,966 | 1,582 | ||
| | - | 8,387 | ||
| | - | 1,338 | ||
| | | 2,196 | ||
| | | 1,456 | ||
| | 7,972 | 4,077 | ||
| | 99 | 108& | ||
| | 220 | 229& | ||
| | 432 | 432 | ||
| | 798 | 798 | ||
| | 213 | 213 | ||
| 6 | 6 | |||
* including splice variants; & data from the miRBase registry, release 18 [59].
Figure 1. Best hit (BLASTP) coverage changes from V1.2 to V1.6. 34.5% of the protein-coding gene models (V1.6) cover their closest homolog better.
Summary statistics of alternative splicing in genes
| 12,941 | 27.21 | 5,195 | 13.28 | 5,177 | 14.65 | |
| 4,443 | 34.33 | 1,649 | 31.74 | 1,657 | 32.01 | |
| 3,940 | 30.45 | 1,414 | 27.22 | 1,433 | 27.68 | |
| 2,423 | 18.72 | 886 | 17.05 | 908 | 17.54 | |
| 989 | 7.64 | 836 | 16.09 | 836 | 16.15 | |
| 800 | 6.18 | 552 | 10.63 | 574 | 11.09 | |
| 2,692 | 20.8 | 2,055 | 39.56 | 2,082 | 40.22 | |
| 810 | 6.26 | 552 | 10.63 | 575 | 11.11 | |
| 3,523 | 27.22 | 2,055 | 39.56 | 2,080 | 40.18 | |
| 1,162 | 8.98 | 1,010 | 19.44 | 1,033 | 19.95 |
Overview of the alternative splicing events observed in P. patens using the PASA software and ~300,000 ESTs.
Protein-coding gene statistics of selected Viridiplantae
| | | ||||||
|---|---|---|---|---|---|---|---|
| Genes | # | 15,935 | 27,726 | 32,275 | 22,259 | 40,577 | 27,206 |
| Transcripts | # | 15,935 | 27,966 | 38,357 | 22,259 | 50,939 | 35,176 |
| Gene length [bp] | x̄ | 5,363 | 2,499 | 2,369 | 1,699 | 2,816 | 2,190 |
| | x̃ | 4,273 | 1,878 | 1,809 | 1,368 | 2,148 | 1,896 |
| Transcript length [bp] | x̄ | 2,898 | 1,269 | 1,389 | 1,194 | 1,540 | 1,540 |
| | x̃ | 2,284 | 987 | 1,248 | 987 | 1,395 | 1,388 |
| CDS length [bp] | x̄ | 2,043 | 1,131 | 1,062 | 1,145 | 1,079 | 1,234 |
| | x̃ | 1,425 | 867 | 813 | 951 | 879 | 1,053 |
| Exon length [bp] | x̄ | 322 | 234 | 275 | 213 | 313 | 261 |
| | x̃ | 155 | 145 | 155 | 128 | 158 | 147 |
| Intron length [bp] | x̄ | 308 | 277 | 278 | 110 | 415 | 164 |
| | x̃ | 238 | 206 | 213 | 59 | 169 | 100 |
| Exons per gene | x̄ | 9.0 | 5.4 | 5.0 | 5.6 | 4.9 | 5.9 |
| | x̃ | 7 | 4 | 3 | 4 | 3 | 4 |
| Introns per gene | x̄ | 8.0 | 4.4 | 4.0 | 4.6 | 3.9 | 4.9 |
| | x̃ | 6 | 3 | 2 | 3 | 2 | 3 |
| 5'-UTR exon length [bp] | x̄ | 181 | 171 | 211 | 92 | 189 | 119 |
| | x̃ | 138 | 127 | 157 | 50 | 122 | 88 |
| 5'-UTR intron length [bp] | x̄ | 513 | 502 | 520 | 184 | 666 | 315 |
| | x̃ | 274 | 353 | 390 | 70 | 355 | 239 |
| 3'-UTR exon length [bp] | x̄ | 634 | 323 | 338 | 166 | 377 | 217 |
| | x̃ | 537 | 299 | 311 | 110 | 316 | 201 |
| 3'-UTR intron length [bp] | x̄ | 859 | 505 | 268 | 280 | 500 | 204 |
| | x̃ | 470.5 | 215 | 213 | 65 | 180 | 104 |
| 5'-UTR length [bp] | x̄ | 204 | 231 | 307 | 110 | 254 | 152 |
| | x̃ | 158 | 184 | 258 | 54 | 156 | 112 |
| 3'-UTR length [bp] | x̄ | 653 | 352 | 367 | 194 | 464 | 237 |
| | x̃ | 551 | 322 | 334 | 121 | 358 | 210 |
| Multi exon transcript | # | 15,322 | 23,758 | 29,378 | 18,789 | 40,859 | 29,050 |
| | % | 96.1% | 84.9% | 76.6% | 84.4% | 80.2% | 82.6% |
| Single exon transcript | # | 613 | 4,208 | 8,979 | 3,470 | 10,080 | 6,126 |
| | % | 3.9% | 15.1% | 23.4% | 15.6% | 19.8% | 17.4% |
| Transcripts with both 5' and 3'-UTR | # | 15,856 | 4,515 | 15,757 | 2,178 | 31,089 | 26,255 |
| Transcripts with 5'-UTR | # | 15,896 | 5,691 | 18,180 | 2,506 | 31,793 | 27,097 |
| Transcripts with 3'-UTR | # | 15,895 | 7,205 | 19,041 | 3,653 | 33,252 | 28,049 |
| Transcripts without UTR | # | 0 | 19,585 | 16,893 | 18,278 | 16,983 | 6,285 |
| Multi exon 5'-UTR | # | 1,743 | 1,556 | 7,120 | 399 | 7,940 | 6,486 |
| | % | 11.0% | 27.3% | 39.2% | 15.9% | 25.0% | 23.9% |
| Single exon 5'-UTR | # | 14,153 | 4,135 | 11,060 | 2,107 | 23,853 | 20,611 |
| | % | 89.0% | 72.7% | 60.8% | 84.1% | 75.0% | 76.1% |
| Multi exon 3'-UTR | # | 480 | 462 | 1,387 | 522 | 5,010 | 2,027 |
| | % | 3.0% | 6.4% | 7.3% | 14.3% | 15.1% | 7.2% |
| Single exon 3'-UTR | # | 15,415 | 6,743 | 17,654 | 3,131 | 28,242 | 26,022 |
| | % | 97.0% | 93.6% | 92.7% | 85.7% | 84.9% | 92.8% |
Selected properties of structure and organization of protein-coding genes within Viridiplantae. A more detailed list can be found in Additional file 4: Table A3. x̄: average; x̃: median; #: count.
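The percentage rows in this table follow from the count rows directly above them. A minimal sketch of that arithmetic, using the counts from the column that reports 38,357 transcripts; the helper name `share` is illustrative and not taken from the paper:

```python
def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Counts copied from the table column that lists 38,357 transcripts.
transcripts = 38_357
multi_exon_transcripts = 29_378
single_exon_transcripts = 8_979

# The two transcript classes partition the transcript set.
assert multi_exon_transcripts + single_exon_transcripts == transcripts

print(share(multi_exon_transcripts, transcripts))   # 76.6
print(share(single_exon_transcripts, transcripts))  # 23.4

# The 5'-UTR percentages use transcripts with a 5'-UTR as the denominator,
# not all transcripts.
with_5utr = 18_180
multi_exon_5utr = 7_120
single_exon_5utr = 11_060

print(share(multi_exon_5utr, with_5utr))   # 39.2
print(share(single_exon_5utr, with_5utr))  # 60.8
```

The same check can be applied to any other column by substituting its counts.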
Figure 2. Comparison of 5’-UTR intron numbers in Viridiplantae. 5’-UTR intron number frequencies of selected Viridiplantae genomes. The y-axis labels give the number of transcripts without 5’-UTR introns as a percentage of all transcripts with a 5’-UTR.
Figure 3. V1.6 intron length distribution. 5’-UTR, CDS, and 3’-UTR intron lengths in comparison. The percentage of introns longer than 500 bp is much higher in 5’-UTRs than in CDS and 3’-UTR introns.
Figure 4. Distance to translation and transcription start sites of 5’-UTR intron positions. Distribution of 5’-UTR intron positions for P. patens and A. thaliana transcripts in comparison. The closeness of 5’-UTR introns to the initiating ATG is more pronounced in A. thaliana: while ~75% of introns are closer than 65 bp in A. thaliana, only ~50% are in P. patens.
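The "closer than 65 bp" statement is a cumulative fraction over intron-to-ATG distances. A minimal sketch of how such a fraction could be computed, assuming that distance is measured from the 3' end of the intron to the first base of the start codon on the plus strand; that definition, the function names, and the toy coordinates are all assumptions for illustration, not the paper's data or method:

```python
def intron_to_atg_distances(records):
    """Distance in bp from each intron's 3' end to the ATG.

    `records` holds (intron_end, atg_start) pairs in plus-strand coordinates
    (a hypothetical input layout).
    """
    return [atg_start - intron_end for intron_end, atg_start in records]


def fraction_within(distances, threshold):
    """Fraction of distances strictly below `threshold`."""
    if not distances:
        raise ValueError("no distances given")
    return sum(d < threshold for d in distances) / len(distances)


# Toy coordinates, invented for illustration only.
toy_records = [(100, 130), (50, 200), (10, 60), (5, 300)]
toy_distances = intron_to_atg_distances(toy_records)  # [30, 150, 50, 295]

print(fraction_within(toy_distances, 65))  # 0.5
```

For minus-strand genes the subtraction would have to be reversed, which this sketch leaves out.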
Comparison of the Gene Ontology (GO) annotation of V1.2 and V1.6
| 56,935 | 10,681 | 39,894 | 6,360 | 11,586 | 8,449 | 10,408 | 4,774 | 27,966 | |
| (41%) | (30%) | (37%) | (17%) | ||||||
| 66,234 | 15,581 | 24,415 | 26,238 | 18,786 | 10,326 | 13,110 | 14,839 | 32,275 | |
| (58%) | (32%) | (41%) | (46%) |
A general increase in protein-coding genes with an assigned GO term was achieved in V1.6. The number of “molecular function” GO terms was manually corrected for V1.2 (see text in section). BP – “biological process”; MF – “molecular function”; CC – “cellular component”.
Selected GO categories: P. patens in comparison to A. thaliana
| Two component system – histidine kinases and response regulators | 6.91E-26 | GO:0000160 | two-component signal transduction system (phosphorelay) | 124 | 78 |
| | 4.60E-24 | GO:0018106 | peptidyl-histidine phosphorylation | 72 | 3 |
| | 1.03E-28 | GO:0000155 | two-component sensor activity | 95 | 3 |
| | 1.34E-08 | GO:0000156 | two-component response regulator activity | 86 | 35 |
| LHCs | 1.39E-09 | GO:0009765 | photosynthesis, light harvesting | 45 | 21 |
| flagellum | 0.000155148 | GO:0001539 | ciliary or flagellar motility | 10 | - |
| | 3.11E-05 | GO:0019861 | flagellum | 19 | - |
| | 0.000496321 | GO:0030286 | dynein complex | 12 | - |
| PAL | 0.000726755 | GO:0006559 | L-phenylalanine catabolic process | 14 | 3 |
| | 0.010050079 | GO:0016841 | ammonia-lyase activity | 17 | 5 |
| ALDH | 0.000574734 | GO:0004365 | glyceraldehyde-3-phosphate dehydrogenase (phosphorylating) activity | 18 | 4 |
The GO enrichment analysis was performed using topGO with a q-value cut-off <0.05. Depicted are only GO terms overrepresented in P. patens and associated with the gene families reported to be expanded in P. patens.
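As a rough illustration of what an over-representation test with a q-value cut-off involves, the sketch below computes a one-sided hypergeometric p-value per GO term and applies a Benjamini-Hochberg adjustment. This is a simplified stand-in and not topGO's implementation: topGO is an R package that also offers algorithms which take the GO graph structure into account, and the choice of Benjamini-Hochberg as the q-value method is an assumption here. All function names and the example p-values are illustrative.

```python
from math import comb


def hypergeom_upper_tail(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented p-values for four hypothetical GO terms.
pvals = [0.01, 0.04, 0.03, 0.5]
qvals = benjamini_hochberg(pvals)
significant = [q < 0.05 for q in qvals]
print(significant)  # [True, False, False, False]
```

Only terms whose adjusted value falls below the cut-off (0.05 in the caption above) would be reported as enriched.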
Figure 5. How many moss-specific genes are there? BLAST hits of P. patens-only clusters based on our OrthoMCL clustering with selected Viridiplantae genomes against GenPept (rel. 190). P. patens proteins were excluded from GenPept for this analysis.
Figure 6. Gene family sizes in P. patens. Proteins from several Viridiplantae were clustered using OrthoMCL. Depicted are all protein clusters with regard to P. patens, sorted by cluster size. The clusters were subdivided into P. patens-only clusters and clusters with at least one other member. Protein families found to be expanded in P. patens in comparison to A. thaliana are listed.
Number of introns in Viridiplantae
| Number/fraction of intron-less genes | 105 (0.7%) | 941 (3%) | 1,051 (3%) | 1,196 (4%) |
| Number/fraction of genes with fewer introns than the median intron number of other plants | 1,304 (8%) | 4,405 (14%) | 5,719 (14%) | 5,158 (19%) |
Figure 7. Comparison of gene family sizes in conserved clusters. Distribution of genes per cluster and species for clusters common to P. patens and at least one other Viridiplantae (8,208 clusters).