Sarah Middleton, Timothy Song, Sudhir Nayak.
Abstract
The increasing number of annotated genome sequences in public databases has made it possible to study the length distributions and domain composition of proteins at unprecedented resolution. To identify factors that influence protein length in metazoans, we performed an analysis of all domain-annotated proteins from a total of 49 animal species from Ensembl (v.56) or EnsemblMetazoa (v.3). Our results indicate that protein length constraints are not fixed as a linear function of domain count and can vary based on domain content. The presence of repeating domains was associated with relaxation of the constraints that govern protein length. Conversely, for proteins with unique domains, length constraints were generally maintained with increased domain counts. It is clear that mean (and median) protein length and domain composition vary significantly between metazoans and other kingdoms; however, the connections between function, domain content, and length are unclear. We incorporated Gene Ontology (GO) annotation to identify biological processes, cellular components, or molecular functions that favor the incorporation of multi-domain proteins. Using this approach, we identified multiple GO terms that favor the incorporation of multi-domain proteins; interestingly, several of the GO terms with elevated domain counts were not restricted to a single gene family. The findings presented here represent an important step in resolving the complex relationship between protein length, function, and domain content. The comparison of the data presented in this work to data from other kingdoms is likely to reveal additional differences in the regulation of protein length.
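The central comparison in the abstract, protein length as a function of domain count with proteins split by whether they contain repeated domains, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Protein` record, the helper names, and the rule that a protein counts as repeat-containing when any domain identifier occurs more than once are all hypothetical, and the toy data are invented for demonstration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass
class Protein:
    # Hypothetical record: length in amino acids and the domain identifiers
    # annotated on the protein, in sequence order.
    length: int
    domains: list[str]


def is_repeat_containing(p: Protein) -> bool:
    # Assumption: "repeat-containing" means at least one domain identifier
    # occurs more than once in the same protein.
    return any(n > 1 for n in Counter(p.domains).values())


def median_length_by_domain_count(proteins):
    """Group proteins by (domain count, repeat flag); return median lengths."""
    groups = defaultdict(list)
    for p in proteins:
        groups[(len(p.domains), is_repeat_containing(p))].append(p.length)
    return {key: median(lengths) for key, lengths in sorted(groups.items())}


# Toy proteins with hypothetical domain labels (not data from the study).
proteins = [
    Protein(300, ["dom_A"]),
    Protein(350, ["dom_B"]),
    Protein(500, ["dom_A", "dom_C"]),
    Protein(900, ["dom_D", "dom_D"]),
]
example = median_length_by_domain_count(proteins)
print(example)  # {(1, False): 325.0, (2, False): 500, (2, True): 900}
```

Keying the groups on the (domain count, repeat flag) pair keeps repeat-containing and unique-domain proteins separate at every domain count, which is the contrast the abstract draws.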
Year: 2010 PMID: 20975906 PMCID: PMC2951704 DOI: 10.6026/97320630004441
Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
Figure 1 Length and domain distributions for metazoan proteins. Black bars = repeat-containing proteins. “*” = p > 0.0001, Mann-Whitney test.
Length distribution of metazoan proteins. Proteins longer than 2000 amino acids (1.7%) were excluded for illustration.
Domain distribution of metazoan proteins. Proteins with >20 domains (0.5%) were excluded for illustration.
Number of domains versus protein length. Proteins with >30 domains (<0.1%) were excluded for illustration. Trend line: $y = 393.23\,e^{0.0672x}$, where $x$ is the number of domains and $y$ is protein length in amino acids.
Length constraints vary based on domain content. Proteins with >15 domains (1.7%) were excluded for illustration. Grey bars = non-repeat-containing proteins.
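Figure 1 relies on two standard computations: a Mann-Whitney U test comparing length distributions, and an exponential trend line relating domain count to length. The Python sketch below implements both with the standard library only. Assumptions: the two-sided test uses the large-sample normal approximation without tie or continuity correction (adequate at genome scale, rough for tiny samples); the caption does not say how the curve was fitted, so ordinary least squares on ln(length) is used here as one common choice; function names and toy numbers are hypothetical. Only the coefficients 393.23 and 0.0672 come from the caption.

```python
import math


def rank_with_ties(values):
    # Assign 1-based ranks, giving tied values the average of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation). Returns (U, p)."""
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for sample x
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mean_u) / sd_u
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by ordinary least squares on ln(y)."""
    n = len(xs)
    logs = [math.log(v) for v in ys]
    mx, my = sum(xs) / n, sum(logs) / n
    sxx = sum((xi - mx) ** 2 for xi in xs)
    sxy = sum((xi - mx) * (li - my) for xi, li in zip(xs, logs))
    b = sxy / sxx
    return math.exp(my - b * mx), b


def reported_trend(x):
    # Trend line from the Figure 1 caption: length vs. number of domains.
    return 393.23 * math.exp(0.0672 * x)


# Toy lengths (hypothetical numbers, not data from the study).
repeat_lengths = [820, 1100, 950, 1300]
unique_lengths = [410, 520, 480, 600, 450]
u, p = mann_whitney_u(repeat_lengths, unique_lengths)
print(u, p)

# Sanity check: points generated from the caption's curve are recovered.
domain_counts = list(range(1, 11))
a, b = fit_exponential(domain_counts, [reported_trend(d) for d in domain_counts])
print(round(a, 2), round(b, 4))  # 393.23 0.0672
```

Fitting on the log scale turns the exponential into a straight line, so the slope estimates $b$ and the exponentiated intercept estimates $a$; a direct nonlinear fit would weight long proteins differently and could give somewhat different coefficients.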
Figure 2 GO ID domain distribution. A total of 10,636 GO IDs were used: 6,429 in biological process, 964 in cellular component, and 3,243 in molecular function.
GO ID domain distribution within biological process. Black line = domain distribution of all GO IDs in biological process; gray line = domain distribution in GO:0001834; dashed line = domain distribution in GO:0002316.
GO ID domain distribution within cellular component. Black line = domain distribution of all GO IDs in cellular component; gray line = domain distribution in GO:0000235; dashed line = domain distribution in GO:0001527.
GO ID domain distribution within molecular function. Black line = domain distribution of all GO IDs in molecular function; gray line = domain distribution in GO:0000155; dashed line = domain distribution in GO:0004087.
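Figure 2 contrasts the domain-count distribution of proteins annotated with a single GO ID against the distribution for all GO IDs in the same namespace. A minimal Python sketch of that comparison follows. Assumptions: annotations arrive as hypothetical (domain count, GO ID) pairs, one per protein-term annotation; the namespace-wide curve pools every such pair, which is one plausible reading of the caption rather than the authors' stated method; the term labels are placeholders, not real GO IDs.

```python
from collections import Counter


def domain_distribution(domain_counts):
    """Return {domain count: fraction of annotations} for a list of counts."""
    total = len(domain_counts)
    if total == 0:
        return {}
    return {k: n / total for k, n in sorted(Counter(domain_counts).items())}


def go_domain_distributions(annotations, namespace_of, go_id):
    """Compare one GO ID with all GO IDs in its namespace.

    annotations: iterable of (domain_count, go_id) pairs, one per
    protein-term annotation (hypothetical input format).
    namespace_of: dict mapping each GO ID to its namespace.
    Returns (distribution for go_id, pooled distribution for the namespace).
    """
    annotations = list(annotations)  # allow a single-pass iterable as input
    namespace = namespace_of[go_id]
    term_counts = [d for d, g in annotations if g == go_id]
    pooled_counts = [d for d, g in annotations if namespace_of.get(g) == namespace]
    return domain_distribution(term_counts), domain_distribution(pooled_counts)


# Toy annotations with placeholder term labels (not data from the study).
annotations = [(1, "term_A"), (4, "term_A"), (5, "term_A"),
               (1, "term_B"), (2, "term_B"), (1, "term_C")]
namespace_of = {"term_A": "biological_process",
                "term_B": "biological_process",
                "term_C": "molecular_function"}
term_dist, pooled_dist = go_domain_distributions(annotations, namespace_of, "term_A")
print(term_dist)
print(pooled_dist)  # {1: 0.4, 2: 0.2, 4: 0.2, 5: 0.2}
```

In this framing, a term that favors multi-domain proteins shows its mass shifted toward higher domain counts than the pooled namespace curve, as `term_A` does in the toy data.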