Thomas Bock, Wei-Hua Chen, Alessandro Ori, Nayab Malik, Noella Silva-Martin, Jaime Huerta-Cepas, Sean T Powell, Panagiotis L Kastritis, Georgy Smyshlyaev, Ivana Vonkova, Joanna Kirkpatrick, Tobias Doerks, Leo Nesme, Jochen Baßler, Martin Kos, Ed Hurt, Teresa Carlomagno, Anne-Claude Gavin, Orsolya Barabas, Christoph W Müller, Vera van Noort, Martin Beck, Peer Bork.
Abstract
The thermophilic fungus Chaetomium thermophilum holds great promise for structural biology. To increase the efficiency of its biochemical and structural characterization and to explore its thermophilic properties beyond those of individual proteins, we obtained transcriptomics and proteomics data, and integrated them with computational annotation methods and a multitude of biochemical experiments conducted by the structural biology community. We considerably improved the genome annotation of Chaetomium thermophilum and characterized the transcripts and expression of thousands of genes. We furthermore show that the composition and structure of the expressed proteome of Chaetomium thermophilum is similar to its mesophilic relatives. Data were deposited in a publicly available repository and provide a rich source to the structural biology community.
Year: 2014 PMID: 25398899 PMCID: PMC4267624 DOI: 10.1093/nar/gku1147
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Literature overview of PDB-deposited structures and scientific publications derived from or referring to Chaetomium thermophilum proteins. Structures deposited before initial genome sequencing in 2011 were enabled by access to partial genome information.
Figure 2. Summary and added value of the analyses performed on the experimental transcriptomics and proteomics data, which included: refinement of intron/exon structures, analysis of previously predicted ORFs that contain a stop codon, de novo ORF/peptide analysis and expression analysis of protein termini.
Figure 3. Example of experimental validation of gene sequence reannotation. (A) Reverse transcriptase polymerase chain reaction (RT-PCR) of the Ct IOC4 (CTHT_0009460) gene. The gel band was larger than expected from the annotated gene model (> 1722 nt instead of 1635 nt). Sequencing of the RT-PCR product reveals an additional 87 nt of sequence extending exon 3. The extended sequence partially matches intron 3 of the originally annotated Ct IOC4 sequence. (B) Original Ct IOC4 coding sequence (top) and MS-identified peptides overlaid on the Protter protein sequence view (bottom). The expression of the assumed exon 3 extension initially found by RT-PCR (red box) was verified by MS-based identification of the peptide sequence “.ATSEEDEDVEMEDAPSATETSAK.” (blue bar), which covers the MS-detectable part of the assumed exon 3 extension. The insertion site of the extended exon 3 sequence is indicated (red dot) in the Protter (29) protein sequence image of Ct IOC4, together with all other MS-identified peptides (highlighted in blue; N- and C-terminus and potential tryptic cleavage sites for MS are indicated). An alternative splice variant of Ct IOC4 containing the extended exon 3 sequence is included in the reannotated Ct genome.
Chaetomium thermophilum proteins with direct proteomic evidence of N-terminal and C-terminal expression
| | Termini match | 1st Met omitted match [a] | Sum terminal matches | Match in 25 terminal AA | Fraction of total proteome (%) |
|---|---|---|---|---|---|
| N-terminus | 194 | 822 | 985 | 1738 | 24.0 |
| C-terminus | 665 | - | 665 | 1906 | 26.4 |

[a] Peptides matching the N-terminus without the START codon (Met) are counted as termini match.
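
The categories in this table can be illustrated with a minimal sketch that classifies a single identified peptide against a protein sequence. This is not the authors' pipeline: the function name `classify_terminal_match`, the result keys, and the reading of "Match in 25 terminal AA" as "peptide starts (or ends) within the first (or last) 25 residues" are assumptions made for illustration only.

```python
def classify_terminal_match(protein: str, peptide: str, window: int = 25) -> dict:
    """Classify how a peptide relates to the termini of a protein sequence.

    Hypothetical helper: the category definitions are assumptions, not
    taken from the paper's methods.
    """
    result = {
        "n_exact": False,        # peptide starts at residue 1 (Met included)
        "n_met_omitted": False,  # peptide starts at residue 2, after an initial Met
        "c_exact": False,        # peptide ends at the last residue
        "n_window": False,       # peptide starts within the first `window` residues
        "c_window": False,       # peptide ends within the last `window` residues
    }
    start = protein.find(peptide)
    while start != -1:  # check every occurrence of the peptide
        end = start + len(peptide)
        if start == 0:
            result["n_exact"] = True
        if start == 1 and protein[0] == "M":
            result["n_met_omitted"] = True
        if end == len(protein):
            result["c_exact"] = True
        if start < window:
            result["n_window"] = True
        if end > len(protein) - window:
            result["c_window"] = True
        start = protein.find(peptide, start + 1)
    return result


# Toy sequence (not a real Chaetomium thermophilum protein).
toy_protein = "M" + "ASTK" + "G" * 40 + "LLER"
n_hit = classify_terminal_match(toy_protein, "ASTK")
c_hit = classify_terminal_match(toy_protein, "LLER")
```

In this toy case `ASTK` is counted as an N-terminal match with the first Met omitted, and `LLER` as an exact C-terminal match.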
Genomic repeat analysis in Chaetomium thermophilum
| Repeat type | Length occupied (bp) | Percentage of sequence (%) |
|---|---|---|
| Interspersed repeats | 721 491 | 2.55 |
| Retroelements | 342 067 | 1.21 |
| LINEs | 276 216 | 0.98 |
| | 236 085 | 0.83 |
| LTR elements | 64 479 | 0.23 |
| | 37 960 | 0.13 |
| | 10 775 | 0.04 |
| DNA transposons | 39 456 | 0.14 |
| Unclassified | 339 968 | 1.20 |
| Satellites | 3877 | 0.01 |
| Simple repeats | 9302 | 0.03 |
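
As a consistency check on the table, the three classes retroelements, DNA transposons and unclassified repeats add up exactly to the interspersed-repeat total. The short sketch below reproduces that sum and back-calculates an approximate sequence length from the rounded percentage. The variable and function names are illustrative, and the implied length (roughly 28 Mb) is only an estimate derived from a two-decimal percentage, not a value stated in this record.

```python
# Length occupied (bp) per repeat class, copied from the table above.
repeat_bp = {
    "Retroelements": 342_067,
    "DNA transposons": 39_456,
    "Unclassified": 339_968,
}
interspersed_bp = 721_491
interspersed_pct = 2.55

# The three classes add up to the interspersed-repeat total.
classes_sum_bp = sum(repeat_bp.values())

# Rough sequence length implied by the rounded percentage (an estimate only).
implied_length_bp = interspersed_bp / (interspersed_pct / 100)


def percent_of_sequence(length_bp: int, total_bp: float = implied_length_bp) -> float:
    """Percentage of the (estimated) sequence occupied by a repeat class."""
    return round(100 * length_bp / total_bp, 2)


retro_pct = percent_of_sequence(repeat_bp["Retroelements"])
```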
Figure 4. Abundance of functional protein category levels. (A) Qualitative and quantitative overview of the fraction of the genome, transcriptome and proteome dedicated to defined functional categories. ‘Genome’ refers to the number of genes, ‘Transcriptome’ to the number of identified mRNAs, ‘Proteome’ to the number of identified proteins, ‘Quantitative transcriptome’ to mRNA abundances and ‘Quantitative proteome’ to protein abundances within any given functional category. (B) Comparison of protein abundance changes in selected functional protein categories between the thermophile, Chaetomium thermophilum, and the mesophile, Chaetomium globosum. Functional categories were selected from the three present main categories available in eggNOG. (C) Comparison of protein abundance in selected major protein complexes and proteins of similar function between Chaetomium thermophilum and Chaetomium globosum. Protein abundance is based on “intensity-based absolute quantification” scores (iBAQ).
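
For readers unfamiliar with iBAQ, the score is generally computed as the summed intensity of a protein's identified peptides divided by the number of its theoretically observable peptides. The following is a minimal sketch of that idea, not the software used in the study: the function names are hypothetical, and the digestion rule (cleave after K or R unless followed by P, no missed cleavages) and the 6 to 30 residue length window are commonly used defaults assumed here rather than settings taken from the paper.

```python
def tryptic_peptides(sequence: str, min_len: int = 6, max_len: int = 30) -> list:
    """In-silico tryptic digest: cleave after K/R unless the next residue is P.

    Assumes no missed cleavages; the length window is an assumed default.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return [p for p in peptides if min_len <= len(p) <= max_len]


def ibaq(sequence: str, peptide_intensities: list) -> float:
    """Summed peptide intensity divided by the number of observable peptides."""
    n_observable = len(tryptic_peptides(sequence))
    if n_observable == 0:
        raise ValueError("protein has no theoretically observable peptides")
    return sum(peptide_intensities) / n_observable


# Toy sequence and made-up intensities, for illustration only.
toy_sequence = "AAAAAAKCCCCCCRPDDDDDDKEE"
toy_peptides = tryptic_peptides(toy_sequence)
toy_score = ibaq(toy_sequence, [100.0, 50.0, 50.0])
```

Normalizing by the number of observable peptides keeps long proteins, which yield more peptides, from appearing more abundant than short ones with the same copy number.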