Victor Kunin, Ildefonso Cases, Anton J Enright, Victor de Lorenzo, Christos A Ouzounis.
Abstract
From the historical record of genome sequencing, we show that the rate of discovery of new families has remained constant over time, indicating that our knowledge of sequence space is far from complete.
Year: 2003 PMID: 12620116 PMCID: PMC151299 DOI: 10.1186/gb-2003-4-2-401
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. The number of unique protein families accumulated from genome projects. Families were obtained by clustering proteins from complete genomes with the TRIBE-MCL algorithm (inflation value 1.1). Species with the largest contributions are indicated. All data and supplementary information are available at [9].
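The accumulation curve described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes that clustering has already been done upstream (for example with TRIBE-MCL) and that each genome is available as a release date plus the set of family identifiers found in it. The function name `cumulative_new_families` and the toy data are hypothetical and are not taken from the paper.

```python
from datetime import date

def cumulative_new_families(genomes):
    """Return (release_date, families_seen_so_far) in chronological order.

    genomes: list of (release_date, set_of_family_ids) pairs.
    The input layout is an assumption made for this sketch.
    """
    seen = set()
    curve = []
    # Sort on the date only, so ties never fall back to comparing sets.
    for released, families in sorted(genomes, key=lambda g: g[0]):
        seen |= families  # only families not seen before enlarge the set
        curve.append((released, len(seen)))
    return curve

# Hypothetical toy data, not real genome records.
toy_genomes = [
    (date(1997, 9, 1), {"famA", "famC", "famD"}),
    (date(1995, 7, 1), {"famA", "famB"}),
    (date(2000, 3, 1), {"famB", "famE"}),
]
curve = cumulative_new_families(toy_genomes)
```

A roughly straight line through such a curve would correspond to the constant discovery rate reported in the abstract, while a flattening curve would suggest that sequence space is approaching saturation.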
Figure 2. Size distribution of protein families in relation to the time of their discovery. The x-axis represents the time of discovery of the founding member of a family; the y-axis represents frequency (on a logarithmic scale); each circle represents the number of protein families corresponding to the value on the y-axis; and the area of each circle corresponds to family size. It is notable that some of the largest families were founded early, but large families are still being discovered. Recently discovered small families (upper right) are expected to grow with better sampling of protein space.
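The two quantities that Figure 2 relates, the discovery time of a family's founding member and the family's size, can be derived from per-protein records. The sketch below assumes each protein is given as a (family identifier, genome release date) pair. The function name `founding_date_and_size` and the toy records are hypothetical.

```python
from datetime import date

def founding_date_and_size(proteins):
    """Map each family to (date of its earliest member, number of members).

    proteins: iterable of (family_id, release_date) pairs, one per protein.
    The input layout is an assumption made for this sketch.
    """
    summary = {}
    for family, released in proteins:
        if family not in summary:
            summary[family] = (released, 1)
        else:
            first, size = summary[family]
            # Keep the earliest date as the founding date; count the member.
            summary[family] = (min(first, released), size + 1)
    return summary

# Hypothetical toy records, not real data.
toy_proteins = [
    ("famA", date(1997, 9, 1)),
    ("famA", date(1995, 7, 1)),
    ("famB", date(2000, 3, 1)),
]
summary = founding_date_and_size(toy_proteins)
```

Grouping the resulting (founding date, size) pairs and counting families per group would give the frequencies plotted on the y-axis.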