Pavel Pospisil, Lakshmanan K Iyer, S James Adelstein, Amin I Kassis.
Abstract
BACKGROUND: We present an effective, rapid, systematic data mining approach for identifying genes or proteins related to a particular interest. A selected combination of programs exploring PubMed abstracts, universal gene/protein databases (UniProt, InterPro, NCBI Entrez), and state-of-the-art pathway knowledge bases (LSGraph and Ingenuity Pathway Analysis) was assembled to distinguish enzymes with hydrolytic activities that are expressed in the extracellular space of cancer cells. Proteins were identified with respect to six types of cancer occurring in the prostate, breast, lung, colon, ovary, and pancreas.
Year: 2006 PMID: 16857057 PMCID: PMC1555615 DOI: 10.1186/1471-2105-7-354
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Data mining tools and strategy. Building an effective data mining strategy, and the sequence of programs used in the proposed strategy.
Number of enzyme hits identified in six common cancer types using the combined data mining strategy.
| | 1791 | 2500 | 4325 | 3030 | 3365 | 3059 |
| | 749 | 941 | 1718 | 1186 | 1427 | 1244 |
| | 917 | 1117 | 1954 | 1408 | 1675 | 1495 |
| | 375 | 456 | 598 | 481 | 515 | 450 |
| | 135 | 170 | 212 | 176 | 184 | 170 |
| | 188 | 220 | 298 | 238 | 262 | 219 |
| | 323 | 390 | 510 | 414 | 446 | 389 |
| | 12 | 14 | 20 | 17 | 16 | 16 |
| | 24 | 25 | 33 | 24 | 30 | 31 |
| | 5 | 9 | 10 | 7 | 8 | 5 |
| Other | 282 | 342 | 447 | 366 | 392 | 337 |
Completed on 11/18/05. In italic: number of abstracts; in normal font: number of entities (genes and proteins). Numbers below correspond to subnetworks of entities designated by IPA-location or IPA-family and are all part of the IPA cancer network. Includes some enzymes, for example human sulfatase 1.
Figure 2. Transfer of entities between two knowledge bases. For LSGraph, PubMed and GO, all extracellular or membrane-bound entities cited in scientific abstracts were exported into IPA using UniProt accession numbers (thick arrow) for each cancer type. In IPA, the imported entities were recognized and enlarged by their functional neighbors (shaded area) from the «Global Analysis Genes» network. All entities with a «High Level Function» corresponding to «cancer» (thin arrows) were exported to a Microsoft Excel workbook as hit lists for further individual examination (see Additional file 1).
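The transfer described in the Figure 2 caption can be read as three steps: select by location, expand by neighbors, then filter by function. The following is only a minimal sketch of that reading on toy data; it does not call LSGraph or IPA. The `Entity` class, the function names, the placeholder accessions (`A1` to `A4`), and the neighbor map are all hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    accession: str                      # stands in for a UniProt accession number
    location: str                       # e.g. "Extracellular space"
    functions: frozenset = frozenset()  # high-level function labels


def select_by_location(entities, allowed=("Extracellular space", "Plasma membrane")):
    """Step 1: keep only extracellular or membrane-bound entities."""
    return [e for e in entities if e.location in allowed]


def expand_with_neighbors(seed, neighbors, catalog):
    """Step 2: enlarge the seed set with its functional neighbors."""
    accessions = {e.accession for e in seed}
    for e in seed:
        accessions.update(neighbors.get(e.accession, ()))
    return [catalog[a] for a in sorted(accessions) if a in catalog]


def cancer_hits(entities):
    """Step 3: keep entities whose functions include 'cancer'."""
    return [e for e in entities if "cancer" in e.functions]


# Toy catalog with placeholder accessions (assumptions, not real data).
catalog = {
    "A1": Entity("A1", "Extracellular space", frozenset({"cancer"})),
    "A2": Entity("A2", "Nucleus", frozenset({"cancer"})),
    "A3": Entity("A3", "Plasma membrane"),
    "A4": Entity("A4", "Cytoplasm", frozenset({"cancer"})),
}
neighbors = {"A1": ["A4"], "A3": ["A2"]}  # hypothetical neighbor map

seed = select_by_location(catalog.values())
expanded = expand_with_neighbors(seed, neighbors, catalog)
hit_accessions = [e.accession for e in cancer_hits(expanded)]
print(hit_accessions)
```

In this sketch the neighbor expansion can bring in entities from other locations (here `A2` and `A4`), which is consistent with the caption's description of the imported set being enlarged before the cancer filter is applied.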
Figure 3. Example of a network of prostate-cancer-related proteins. This network was generated through the use of Ingenuity Pathway Analysis (IPA). Gene products are represented as nodes, and a biological relationship between two nodes as a line. Node shapes symbolize the functional class of the gene product; for example, triangles are phosphatases and diamond-shaped rectangles are peptidases. Proteins are placed in the spaces between lines, which correspond to cellular locations based on IPA-location categories: Extracellular space, Plasma membrane, Cytoplasm, and Nucleus. Some proteins of interest are circled: prostatic acid phosphatase (PAP) is coded as ACPP, prostate-specific antigen (PSA) as KLK3 (kallikrein 3), and various metalloproteinases as MMP.
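The layout convention in the Figure 3 caption (shape by functional class, band by cellular location) can be sketched as a simple grouping. This is a toy illustration only, not IPA's rendering logic. The shape legend covers just the two classes named in the caption, the locations assigned to each node are illustrative assumptions rather than values read from the figure, and `MMP_example` is a hypothetical placeholder for a metalloproteinase.

```python
# Shape legend for the two classes named in the caption; anything else is "other".
SHAPE_BY_CLASS = {"phosphatase": "triangle", "peptidase": "diamond"}

# IPA-location categories listed in the caption, in display order.
LAYERS = ["Extracellular space", "Plasma membrane", "Cytoplasm", "Nucleus"]

# Toy nodes: (symbol, functional class, location).
# The locations are assumptions made for this example.
nodes = [
    ("ACPP", "phosphatase", "Extracellular space"),
    ("KLK3", "peptidase", "Extracellular space"),
    ("MMP_example", "peptidase", "Plasma membrane"),
]


def layout_by_location(nodes):
    """Group nodes into location bands and attach the shape for each class."""
    layers = {name: [] for name in LAYERS}
    for symbol, functional_class, location in nodes:
        shape = SHAPE_BY_CLASS.get(functional_class, "other")
        layers[location].append((symbol, shape))
    return layers


layers = layout_by_location(nodes)
for name in LAYERS:
    print(name, layers[name])
```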