Vamsi Veeramachaneni, Wojciech Makalowski.
Abstract
A large database of homologous sequence alignments with good estimates of evolutionary distances can be a valuable resource for molecular evolutionary studies and phylogenetic research in particular. We recently created a database containing 159,921 transcripts from human, mouse, rat, zebrafish and fugu species. Approximately 1,000 homology groups were identified with the help of Ensembl homology evidence. At the macro level, the database allows us to answer queries of the form: 1. What is the average k-distance between 5' untranslated regions of human and mouse? 2. List the 10 groups with the highest Ka/Ks ratio between mouse and rat. 3. List all identical proteins between human and rat. Researchers interested in specific proteins can use a simple web interface to retrieve the homology groups of interest, examine all pairwise distances between members of the group and study the conservation of exon-intron gene structures using a graphical interface. The database is available at http://warta.bio.psu.edu/DED/.
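The three macro-level queries above can be illustrated with a minimal sketch over a hypothetical table of pairwise distances held in an in-memory SQLite database. The table name `pair_distance`, its columns, the fixed species ordering of each pair, and the toy rows are all assumptions made for illustration; the DED's actual schema and query interface are not described in the abstract.

```python
import sqlite3

# Hypothetical schema: one row per pair of homologous transcripts.
# Table and column names are illustrative assumptions, not the DED's real layout.
SCHEMA = """
CREATE TABLE pair_distance (
    group_id   INTEGER,  -- homology group identifier
    species_a  TEXT,     -- each pair assumed stored in one fixed species order
    species_b  TEXT,
    utr5_k     REAL,     -- distance between 5-prime UTRs, NULL if not computed
    ka         REAL,     -- nonsynonymous substitutions per site
    ks         REAL,     -- synonymous substitutions per site
    prot_ident REAL      -- fraction of identical residues, 1.0 = identical
)
"""


def mean_utr5_k(con, a, b):
    """Query 1: average 5' UTR distance between species a and b."""
    (value,) = con.execute(
        "SELECT AVG(utr5_k) FROM pair_distance "
        "WHERE species_a = ? AND species_b = ? AND utr5_k IS NOT NULL",
        (a, b),
    ).fetchone()
    return value


def top_ka_ks(con, a, b, n=10):
    """Query 2: the n groups with the highest Ka/Ks ratio for a species pair."""
    return con.execute(
        "SELECT group_id, MAX(ka / ks) AS ratio FROM pair_distance "
        "WHERE species_a = ? AND species_b = ? AND ks > 0 "  # avoid division by zero
        "GROUP BY group_id ORDER BY ratio DESC LIMIT ?",
        (a, b, n),
    ).fetchall()


def identical_proteins(con, a, b):
    """Query 3: groups containing a pair of identical proteins."""
    rows = con.execute(
        "SELECT DISTINCT group_id FROM pair_distance "
        "WHERE species_a = ? AND species_b = ? AND prot_ident = 1.0 "
        "ORDER BY group_id",
        (a, b),
    )
    return [group_id for (group_id,) in rows]


con = sqlite3.connect(":memory:")
con.execute(SCHEMA)
# Toy rows invented purely to exercise the queries; they are not DED data.
con.executemany(
    "INSERT INTO pair_distance VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "human", "mouse", 0.25, 0.02, 0.50, 0.95),
        (2, "human", "mouse", 0.75, 0.01, 0.40, 1.0),
        (3, "mouse", "rat", None, 0.10, 0.20, 0.90),
        (4, "mouse", "rat", 0.20, 0.05, 0.25, 1.0),
    ],
)
print(mean_utr5_k(con, "human", "mouse"))
print(top_ka_ks(con, "mouse", "rat"))
print(identical_proteins(con, "human", "mouse"))
```

Keeping each query as a single aggregate statement lets the database do the grouping and ranking; a real resource of this size would additionally index `species_a`, `species_b` and `group_id`.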
Year: 2005 PMID: 15608234 PMCID: PMC540048 DOI: 10.1093/nar/gki094
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Number of genes and transcripts stored in the DED (August 2004)
| Species | Number of genes | Number of transcripts |
|---|---|---|
| Human | 21 787 | 29 802 |
| Mouse | 25 307 | 32 281 |
| Rat | 22 159 | 28 545 |
| Fugu | 35 180 | 38 510 |
| Zebrafish | 22 409 | 30 783 |
| Total | 126 842 | 159 921 |
Number of external links present in the DED
| Database | Human | Mouse | Rat | Fugu | Zebrafish | Total |
|---|---|---|---|---|---|---|
| GKB | 526 | 0 | 0 | 0 | 0 | 526 |
| ZFIN_ID | 0 | 0 | 0 | 0 | 1397 | 1397 |
| PDB | 1174 | 351 | 228 | 2033 | 0 | 3786 |
| Sanger_Hver1_3_1 | 4976 | 0 | 0 | 0 | 0 | 4976 |
| UMCU_Hsapiens_19Kv1 | 12 311 | 0 | 0 | 0 | 0 | 12 311 |
| RefSeq | 5255 | 1219 | 1592 | 6197 | 29 | 14 292 |
| HUGO | 11 075 | 0 | 0 | 3510 | 0 | 14 585 |
| MIM | 8738 | 205 | 148 | 5615 | 0 | 14 706 |
| MarkerSymbol | 0 | 17 177 | 0 | 0 | 0 | 17 177 |
| SPTREMBL | 5500 | 2174 | 789 | 6758 | 2123 | 17 344 |
| GO | 13 247 | 0 | 0 | 4171 | 0 | 17 418 |
| SWISS-PROT | 962 | 211 | 3250 | 24 711 | 5 | 29 139 |
| LocusLink | 15 903 | 15 443 | 4252 | 4833 | 1079 | 41 510 |
| Protein_id (at EMBL) | 19 414 | 19 989 | 5223 | 7862 | 3483 | 55 971 |
| EMBL (nucleotide records) | 19 434 | 20 014 | 5255 | 7874 | 3483 | 56 060 |
| Ensembl | 21 787 | 25 307 | 22 159 | 35 180 | 22 409 | 126 842 |
| Total | 140 302 | 102 090 | 42 896 | 108 744 | 34 008 | 428 040 |
Figure 1. The distribution of group sizes.
Figure 2. Sample homology group details. The member section has been truncated. Note that while the proteins are 100% identical, the alignment picture shows that the gene structure is not; there appears to be an intron gain in the rat lineage.
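When proteins are identical, as in Figure 2, intron positions can be compared directly in protein coordinates. The following is a minimal sketch, assuming each gene structure is reduced to a set of hypothetical `(residue_index, phase)` intron positions; the encoding and toy coordinates are invented and are not read from the figure. An intron found in only one species, with the others acting as outgroups, is more parsimoniously explained as a single gain on that lineage than as independent losses in every other lineage.

```python
def lineage_specific_introns(structures):
    """For each species, return intron positions found in no other species.

    structures maps species name -> iterable of (residue_index, phase) tuples,
    where residue_index is a position in the shared (identical) protein and
    phase is 0, 1 or 2. This encoding is an assumption for illustration.
    """
    result = {}
    for species, introns in structures.items():
        # Union of every other species' intron positions.
        others = set().union(*(set(v) for k, v in structures.items() if k != species))
        result[species] = sorted(set(introns) - others)
    return result


# Toy gene structures, invented for illustration (not taken from Figure 2).
toy = {
    "human": {(40, 0), (112, 2)},
    "mouse": {(40, 0), (112, 2)},
    "rat": {(40, 0), (75, 1), (112, 2)},
}
print(lineage_specific_introns(toy))
```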
Figure 3. Group structure and phylogenetic tree for a homology group. Pairwise comparison analysis suggests that the homology relationship between fugu and zebrafish genes can be ignored and the group split into two smaller groups.
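Splitting a group in this way amounts to discarding the unsupported homology link and taking the connected components of what remains. A minimal sketch, assuming a group is represented as gene identifiers plus undirected homology edges; the identifiers and edges are invented, and the DED's actual splitting procedure is not specified in the text.

```python
from collections import defaultdict


def split_group(members, edges, drop):
    """Return connected components of a homology group after removing edges.

    members: gene identifiers in the group.
    edges:   undirected homology links as (gene, gene) pairs.
    drop:    links to discard, each given as a frozenset of two genes.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        if frozenset((a, b)) in drop:
            continue  # ignore the homology link judged unsupported
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, components = set(), []
    for start in sorted(members):
        if start in seen:
            continue
        # Iterative depth-first search collects one component.
        stack, component = [start], set()
        while stack:
            gene = stack.pop()
            if gene in seen:
                continue
            seen.add(gene)
            component.add(gene)
            stack.extend(adjacency[gene] - seen)
        components.append(sorted(component))
    return components


# Invented toy group: two clusters joined by a single fugu-zebrafish link.
members = ["human_a", "mouse_a", "fugu_a", "zebrafish_b", "zebrafish_c"]
edges = [
    ("human_a", "mouse_a"),
    ("human_a", "fugu_a"),
    ("mouse_a", "fugu_a"),
    ("fugu_a", "zebrafish_b"),
    ("zebrafish_b", "zebrafish_c"),
]
drop = {frozenset(("fugu_a", "zebrafish_b"))}
print(split_group(members, edges, drop))
```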
Figure 4. Overall evolutionary statistics with mean and standard deviation shown for all distances. Analysis was restricted to pairs with direct homology evidence. UTR comparisons were made only when UTR size was at least 30 bp. Alignments in which start (or stop) codons were separated by more than 20 columns were ignored.
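The filtering rules in the Figure 4 caption can be sketched as a predicate applied before computing the mean and standard deviation. This is a minimal sketch under stated assumptions: the `PairAlignment` record and its fields are hypothetical, "UTR size of at least 30 bp" is read as applying to both UTRs of a pair, and the sample standard deviation (`statistics.stdev`) is used because the caption does not say which estimator was applied.

```python
from dataclasses import dataclass
from statistics import mean, stdev

MIN_UTR_BP = 30        # UTR comparisons only when UTR size is at least 30 bp
MAX_CODON_OFFSET = 20  # ignore alignments whose start/stop codons are > 20 columns apart


@dataclass
class PairAlignment:
    # Hypothetical fields; the DED's internal representation is not described.
    direct_evidence: bool  # pair has direct homology evidence
    utr_len_a: int         # UTR length (bp) in sequence a
    utr_len_b: int         # UTR length (bp) in sequence b
    start_col_a: int       # alignment column of the start codon in a
    start_col_b: int       # alignment column of the start codon in b
    stop_col_a: int        # alignment column of the stop codon in a
    stop_col_b: int        # alignment column of the stop codon in b
    distance: float        # distance for the region being summarised


def keep(p: PairAlignment, is_utr: bool) -> bool:
    """Apply the caption's filters to one pairwise alignment."""
    if not p.direct_evidence:
        return False
    if abs(p.start_col_a - p.start_col_b) > MAX_CODON_OFFSET:
        return False
    if abs(p.stop_col_a - p.stop_col_b) > MAX_CODON_OFFSET:
        return False
    # Assumption: both UTRs must meet the minimum length.
    if is_utr and min(p.utr_len_a, p.utr_len_b) < MIN_UTR_BP:
        return False
    return True


def summarise(pairs, is_utr=False):
    """Mean and sample standard deviation of distances that pass the filters."""
    values = [p.distance for p in pairs if keep(p, is_utr)]
    if len(values) < 2:
        raise ValueError("need at least two pairs for a standard deviation")
    return mean(values), stdev(values)


# Toy records invented for illustration; they are not DED measurements.
pairs = [
    PairAlignment(True, 50, 40, 10, 12, 900, 905, 0.25),
    PairAlignment(True, 60, 35, 3, 3, 700, 701, 0.75),
    PairAlignment(True, 12, 80, 5, 5, 600, 600, 0.5),   # UTR too short for UTR stats
    PairAlignment(False, 50, 50, 1, 1, 500, 500, 0.9),  # no direct evidence
    PairAlignment(True, 50, 50, 1, 40, 500, 530, 0.9),  # start codons 39 columns apart
]
print(summarise(pairs, is_utr=True))
print(summarise(pairs))
```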