Heng Li, Avril Coghlan, Jue Ruan, Lachlan James Coin, Jean-Karim Hériché, Lara Osmotherly, Ruiqiang Li, Tao Liu, Zhang Zhang, Lars Bolund, Gane Ka-Shu Wong, Weimou Zheng, Paramvir Dehal, Jun Wang, Richard Durbin.
Abstract
TreeFam is a database of phylogenetic trees of gene families found in animals. It aims to develop a curated resource that presents the accurate evolutionary history of all animal gene families, as well as reliable ortholog and paralog assignments. Curated families are being added progressively, based on seed alignments and trees in a similar fashion to Pfam. Release 1.1 of TreeFam contains curated trees for 690 families and automatically generated trees for another 11 646 families. These represent over 128 000 genes from nine fully sequenced animal genomes and over 45 000 other animal proteins from UniProt; approximately 40-85% of proteins encoded in the fully sequenced animal genomes are included in TreeFam. TreeFam is freely available at http://www.treefam.org and http://treefam.genomics.org.cn.
Year: 2006 PMID: 16381935 PMCID: PMC1347480 DOI: 10.1093/nar/gkj118
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Flowcharts of TreeFam pipelines. (A) Overall strategy. The seed families for TreeFam-B are taken from PhIGs clusters. They are expanded by a seed-to-full procedure to form full families. Manual curation converts TreeFam-B families into TreeFam-A families, which can be curated further at a later date. (B) The seed-to-full procedure, which expands seed families into full families. Note that the complete seed-to-full pipeline is only applied when the sequence sets are updated or a whole new genome is added to TreeFam. That is, for a TreeFam-A family created by curation of a TreeFam-B family, the TreeFam-A seed is generated by manual curation, and the full sequences are taken directly from the TreeFam-B family that was curated. (C) Manual curation. Various published resources and in-house tools are used in this process.
The number of orthologs between each pair of fully sequenced animal genomes in TreeFam
| | Mouse | Rat | Chicken | Zebrafish | Pufferfish | Fruitfly | C. elegans | C. briggsae |
|---|---|---|---|---|---|---|---|---|
| Human | 16 424 H | 15 572 H | 12 075 H | 11 203 H | 12 089 H | 7878 H | 7349 H | 6977 H |
| | 17 401 M | 16 088 R | 10 839 C | 12 815 Z | 11 852 P | 4895 F | 4612 Ce | 4312 Cb |
| Mouse | | 17 782 M | 12 550 M | 12 047 M | 12 642 M | 8063 M | 7520 M | 7120 M |
| | | 16 782 R | 10 633 C | 12 593 Z | 11 708 P | 4875 F | 4553 Ce | 4296 Cb |
| Rat | | | 11 784 R | 10 981 R | 11 537 R | 7514 R | 7127 R | 6758 R |
| | | | 10 127 C | 12 000 Z | 11 089 P | 4720 F | 4380 Ce | 4118 Cb |
| Chicken | | | | 10 876 Z | 10 040 P | 5810 C | 5396 C | 5098 C |
| | | | | 8225 C | 9081 C | 4338 F | 4281 Ce | 4013 Cb |
| Zebrafish | | | | | 10 151 P | 7999 Z | 7844 Z | 7247 Z |
| | | | | | 12 249 Z | 4305 F | 4137 Ce | 3887 Cb |
| Pufferfish | | | | | | 7613 P | 7292 P | 6877 P |
| | | | | | | 4781 F | 4519 Ce | 4267 Cb |
| Fruitfly | | | | | | | 4055 F | 3954 F |
| | | | | | | | 4485 Ce | 4223 Cb |
| C. elegans | | | | | | | | 8126 Ce |
| | | | | | | | | 7339 Cb |
For example, 16 424 human genes are orthologous to 17 401 mouse genes. Here H = human, M = mouse, R = rat, C = chicken, Z = zebrafish, P = pufferfish, F = fruitfly, Ce = C. elegans and Cb = C. briggsae.
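When the table is read programmatically, each cell packs a gene count and a species code into one string. The following is a minimal sketch of how such a cell could be split, assuming the cell text follows the "count, then code" pattern described in the note above. The names `SPECIES`, `CELL` and `parse_cell` are hypothetical helpers introduced for illustration and are not part of TreeFam.

```python
import re

# Species codes exactly as defined in the table note.
SPECIES = {
    "H": "human", "M": "mouse", "R": "rat", "C": "chicken",
    "Z": "zebrafish", "P": "pufferfish", "F": "fruitfly",
    "Ce": "C. elegans", "Cb": "C. briggsae",
}

# A count (digits, possibly grouped with spaces) followed by a one- or
# two-letter species code, e.g. "16 424 H" or "4612 Ce".
CELL = re.compile(r"^\s*([\d\s]+?)\s*([A-Z][a-z]?)\s*$")


def parse_cell(cell):
    """Split a table cell into (gene_count, species_name)."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognised cell: {cell!r}")
    digits, code = match.groups()
    if code not in SPECIES:
        raise ValueError(f"unknown species code: {code!r}")
    # Remove the digit-group spacing before converting to an integer.
    return int(re.sub(r"\s+", "", digits)), SPECIES[code]


# The human-mouse example from the table note.
print(parse_cell("16 424 H"))  # (16424, 'human')
print(parse_cell("17 401 M"))  # (17401, 'mouse')
```

Resolving the species from the code in the cell, rather than from the row or column position, keeps the parse independent of the table layout.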
Figure 2. An example TreeFam webpage, for the Cyclin-E family. In the alignment, the positions of introns are indicated by highlighting in red the amino acid to the right of each intron–exon boundary.