Paul Shafer, David M Lin, Golan Yona.
Abstract
BACKGROUND: EST libraries are used in various biological studies, from microarray experiments to proteomic and genetic screens. These libraries usually contain many uncharacterized ESTs that are typically ignored because they cannot be mapped to known genes. Consequently, new discoveries may be overlooked.
Year: 2006 PMID: 16515706 PMCID: PMC1456965 DOI: 10.1186/1471-2164-7-41
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Left: Partial overview of the Biozon schema. Biozon currently stores extensive information about more than 50,000,000 objects (integrating sequence, structure, protein-protein interactions, pathways, expression data and more), totaling about 100 million documents from more than 20 different databases as well as from in-house computations, and 6.5 billion relations between documents (including explicit relations between objects and derived relations based on different similarity indices). Similarity relations are depicted with dashed lines. The database will be gradually extended to span both new source data types and new computed data. Right: a subgraph of the Biozon data graph.
Figure 2. Biozon's EST2Prot system. An EST is mapped to a protein using one of five possible paths. To enrich the set of functional descriptors associated with each EST, we also utilize similarity relations between proteins.
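The path-based mapping in Figure 2 can be illustrated with a small graph search. This is a minimal sketch, not the EST2Prot implementation: the node identifiers, the relation names (`substring_of`, `encodes`, `similar_to`) and the hop limit are hypothetical, and the toy graph encodes a single path rather than the five paths used by the system. Proteins reached through explicit relations are then enriched with their similarity neighbors, mirroring the enrichment step described in the caption.

```python
from collections import deque

# Hypothetical toy data graph: node -> list of (relation, neighbor).
# Node prefixes and relation labels are illustrative assumptions, not Biozon's schema.
GRAPH = {
    "est:E1": [("substring_of", "dna:D1")],
    "dna:D1": [("encodes", "prot:P1")],
    "prot:P1": [("similar_to", "prot:P2")],
    "prot:P2": [],
}


def map_est_to_proteins(graph, est, max_hops=3):
    """Breadth-first search from an EST node. Returns each protein reachable
    through explicit (non-similarity) relations within max_hops, with its path."""
    found = {}
    queue = deque([(est, [est])])
    seen = {est}
    while queue:
        node, path = queue.popleft()
        if node.startswith("prot:"):
            found.setdefault(node, path)  # stop at the first protein on this path
            continue
        if len(path) - 1 >= max_hops:  # hop budget exhausted
            continue
        for rel, nbr in graph.get(node, []):
            if rel == "similar_to" or nbr in seen:
                continue  # similarity edges are reserved for the enrichment step
            seen.add(nbr)
            queue.append((nbr, path + [nbr]))
    return found


def enrich_with_similar(graph, proteins):
    """Add proteins linked by a 'similar_to' relation to the directly mapped ones."""
    enriched = set(proteins)
    for p in proteins:
        for rel, nbr in graph.get(p, []):
            if rel == "similar_to":
                enriched.add(nbr)
    return enriched


direct = map_est_to_proteins(GRAPH, "est:E1")
print(direct)  # {'prot:P1': ['est:E1', 'dna:D1', 'prot:P1']}
print(sorted(enrich_with_similar(GRAPH, direct)))  # ['prot:P1', 'prot:P2']
```

Keeping similarity edges out of the search and applying them only afterwards separates "which protein does this EST come from" from "which proteins share its likely function", which is one plausible reading of the two stages in the caption.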
Figure 3. Mapping ESTs to proteins through the substring relation. Often, a nucleic acid sequence is a fragment of a longer DNA sequence that contains a coding region. We compared all mouse nucleic acid sequences to each other and studied the distribution of (minimal) distances from coding regions. The vast majority of fragments (250,000) are located at the beginning of a coding region of a longer DNA sequence. In addition, a substantial number of ESTs are located in the proximity of a coding region.
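The distance analysis in Figure 3 can be sketched as follows, assuming the substring relation means an exact, error-free match and that coding regions are given as 0-based, half-open `(start, end)` intervals on the longer sequence. Both assumptions, the helper names, and the toy sequence are illustrative; the original comparison of all mouse sequences may have used a different matching procedure.

```python
def occurrences(fragment, sequence):
    """All 0-based start positions where fragment occurs exactly in sequence,
    including overlapping matches."""
    positions, start = [], sequence.find(fragment)
    while start != -1:
        positions.append(start)
        start = sequence.find(fragment, start + 1)
    return positions


def interval_distance(a_start, a_end, b_start, b_end):
    """Gap between half-open intervals [a_start, a_end) and [b_start, b_end);
    0 if they overlap or touch."""
    if a_end <= b_start:
        return b_start - a_end
    if b_end <= a_start:
        return a_start - b_end
    return 0


def min_distance_to_cds(fragment, sequence, cds_regions):
    """Minimal distance from any exact occurrence of fragment in sequence to any
    coding region (assumed half-open intervals); None if there is no match."""
    hits = occurrences(fragment, sequence)
    if not hits or not cds_regions:
        return None
    return min(
        interval_distance(h, h + len(fragment), s, e)
        for h in hits
        for s, e in cds_regions
    )


# Toy sequence: 5' flank "AAAAACCCCC", then a coding region ATG TTT TAA at [10, 19).
dna = "AAAAACCCCCATGTTTTAA"
print(min_distance_to_cds("AAAA", dna, [(10, 19)]))  # 5 (hits at 0 and 1; the closer one wins)
print(min_distance_to_cds("CATG", dna, [(10, 19)]))  # 0 (overlaps the start codon)
print(min_distance_to_cds("GGG", dna, [(10, 19)]))   # None (not a substring)
```

Taking the minimum over all occurrences and all annotated coding regions is what makes the reported value a "minimal" distance; a fragment overlapping the start codon scores 0, which corresponds to the large group located at the beginning of a coding region.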