Andrew T Milnthorpe, Mikhail Soloviev.
Abstract
BACKGROUND: The Cancer Genome Anatomy Project (CGAP) xProfiler and cDNA Digital Gene Expression Displayer (DGED) were made available to the scientific community over a decade ago and have since been widely used to find genes that are differentially expressed between cancer and normal tissues. The tissue types are usually chosen according to the ontology hierarchy developed by NCBI. The xProfiler uses an internally available flat file database to determine the presence or absence of genes in the chosen libraries, while cDNA DGED uses the publicly available UniGene Expression and Gene relational databases to count the sequences found for each gene in the presented libraries.
Year: 2011 PMID: 21496233 PMCID: PMC3094240 DOI: 10.1186/1471-2105-12-97
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. CGAP library database entry. Each field entry begins with its heading, shown in capital letters, followed by the value associated with that field, after the colon.
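The record layout described in the Figure 1 caption (a capitalised heading, a colon, then the value) can be read with a few lines of code. The following is a minimal sketch only: the function name `parse_library_entry`, the field headings and the sample values are hypothetical and are not taken from the CGAP flat file itself.

```python
def parse_library_entry(text):
    """Parse 'HEADING: value' lines into a dict (headings in capital letters)."""
    entry = {}
    for line in text.splitlines():
        heading, sep, value = line.partition(":")
        # Keep only lines that contain a colon and whose heading is upper-case.
        if sep and heading.strip().isupper():
            entry[heading.strip()] = value.strip()
    return entry


# Hypothetical record: field names and values are invented for illustration.
sample = """\
NAME: Example library
UNIQUE TISSUE: bone
HISTOLOGY: normal
"""

entry = parse_library_entry(sample)
print(entry["UNIQUE TISSUE"])
```

Splitting on the first colon only (via `str.partition`) keeps any further colons inside the value intact, which seems the safer choice for free-text fields.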
Figure 2. Tissue type origin of libraries reported by CGAP tools after searching for "ear" tissue. All the libraries reported in this search (database access date 13 May 2010) were manually checked for their "unique tissue" annotations, and the percentage of the reported libraries originating from each tissue was calculated.
Error rates for the CGAP library selection tools
| Tissue types available | Percentage of correctly reported libraries | Percentage of incorrectly reported libraries |
|---|---|---|
| Adipose | 100.00 | 0.00 |
| Adrenal cortex | 100.00 | 0.00 |
| Adrenal medulla | 100.00 | 0.00 |
| Bone | 33.04 | 66.96 |
| Bone marrow | 96.43 | 3.57 |
| Brain | 53.29 | 46.71 |
| Breast/Mammary Gland | 99.39 | 0.61 |
| Cartilage | 100.00 | 0.00 |
| Cerebellum | 92.86 | 7.14 |
| Cerebrum | 99.53 | 0.47 |
| Cervix | 100.00 | 0.00 |
| Colon | 98.88 | 1.12 |
| Ear | 5.66 | 94.34 |
| Embryonic tissue | 8.45 | 91.55 |
| Endocrine | 5.62 | 94.38 |
| Eye | 61.11 | 38.89 |
| Gastrointestinal tract | 3.47 | 96.53 |
| Genitourinary system (a) | 0.00 | 0.00 |
| Germ cell | 11.54 | 88.46 |
| Head and neck | 0.42 | 99.58 |
| Heart | 51.76 | 48.24 |
| Kidney | 94.31 | 5.69 |
| Limb | 0.00 | 100.00 |
| Liver | 83.66 | 16.34 |
| Lung | 97.27 | 2.73 |
| Lymph node | 100.00 | 0.00 |
| Lymphoreticular | 14.16 | 85.84 |
| Mammary gland/Breast | 99.39 | 0.61 |
| Muscle | 25.71 | 74.29 |
| Nervous | 0.92 | 99.08 |
| Oesophagus | 95.45 | 4.55 |
| Ovary | 95.92 | 4.08 |
| Pancreas | 67.35 | 32.65 |
| Pancreatic islet | 100.00 | 0.00 |
| Parathyroid | 57.14 | 42.86 |
| Peripheral nervous system | 12.50 | 87.50 |
| Pineal gland | 87.50 | 12.50 |
| Pituitary gland | 93.33 | 6.67 |
| Placenta | 99.48 | 0.52 |
| Pooled tissue (b) | Not available | Not available |
| Prostate | 97.46 | 2.54 |
| Retina | 100.00 | 0.00 |
| Salivary gland | 62.50 | 37.50 |
| Skin | 89.00 | 11.00 |
| Soft tissue | 1.74 | 98.26 |
| Spleen | 78.57 | 21.43 |
| Stem cell | 33.72 | 66.28 |
| Stomach | 94.07 | 5.93 |
| Synovium | 100.00 | 0.00 |
| Testis | 98.67 | 1.33 |
| Thymus | 97.50 | 2.50 |
| Thyroid | 97.57 | 2.43 |
| Uncharacterised tissue | 99.75 | 0.25 |
| Uterus | 99.22 | 0.78 |
| Vascular | 91.89 | 8.11 |
| White Blood Cells (b) | 0.00 | 0.00 |
a No libraries were present in the database for these tissues.
b Pooled tissue was not available in the CGAP tools, which listed these libraries under each of the tissues they were produced from.
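The percentages in the table above can be derived from manually checked annotations along the following lines. This is a sketch under stated assumptions: the function name `error_rates` is hypothetical, the example list of tissues is invented, and treating a library as "correct" only when its annotation exactly matches the queried tissue is an assumption rather than the authors' documented criterion.

```python
def error_rates(reported_tissues, queried_tissue):
    """Return (percent correct, percent incorrect) for one tissue query.

    reported_tissues: the manually checked tissue annotation of every
    library that the selection tool returned for the query.
    """
    total = len(reported_tissues)
    if total == 0:
        return None  # no libraries were reported for this tissue
    wanted = queried_tissue.lower()
    correct = sum(1 for tissue in reported_tissues if tissue.lower() == wanted)
    pct_correct = round(100.0 * correct / total, 2)
    return pct_correct, round(100.0 - pct_correct, 2)


# Hypothetical query result: four libraries reported, one annotated as "ear".
rates = error_rates(["ear", "brain", "skin", "bone"], "ear")
print(rates)
```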
Figure 3. Differences in the number of genes reported by CGAP tools for an identical query. The total number of genes reported to be present when normal adipose libraries (in one pool) are compared with cancerous bone libraries (in the other pool) by xProfiler's gene lists (left circle) and cDNA DGED (right circle). The overlap between the two circles represents the genes reported by both tools.
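The three regions of a two-circle diagram such as Figure 3 correspond to simple set operations on the two gene lists. The sketch below assumes each tool's output has already been reduced to a set of gene identifiers; the function name `venn_counts` and the placeholder identifiers are hypothetical.

```python
def venn_counts(left_genes, right_genes):
    """Return (left only, shared, right only) counts for two gene sets."""
    left, right = set(left_genes), set(right_genes)
    return len(left - right), len(left & right), len(right - left)


# Placeholder identifiers, invented for illustration.
xprofiler_genes = {"GENE_A", "GENE_B", "GENE_C"}
dged_genes = {"GENE_B", "GENE_C", "GENE_D"}

counts = venn_counts(xprofiler_genes, dged_genes)
print(counts)
```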
Number of genes reported to be present in both pools when normal adipose libraries (in one pool) were compared with cancerous adipose tissues (in the other pool), by xProfiler's gene lists and summary table of gene results, and by cDNA DGED.
| Tool and output method used | Number of genes reported |
|---|---|
| cDNA xProfiler results table | 1,688 |
| cDNA xProfiler gene lists | 1,509 |
| cDNA DGED | 1,632 |
Number of genes reported to be present in either or both pools when normal bone libraries (in one pool) are compared with cancerous bone libraries (in the other pool) by xProfiler's gene lists and summary table of gene results, cDNA DGED and using our algorithm.
| Tool and output method used | Number of genes reported |
|---|---|
| cDNA xProfiler results table | 10,108 |
| cDNA DGED | 9,996 |
| Our algorithm | 9,996 |
Changes in the probability value "P" reported by cDNA DGED and by our algorithm when the display cut-off value "F" is changed, exemplified for three genes that appear in the gene list when normal bone libraries are compared with cancerous bone libraries.
| UniGene Cluster ID | Name | Symbol | "P" (a), "F" = 2 | "P" (a), "F" = 3 | "P" (b), "F" = 2 | "P" (b), "F" = 3 |
|---|---|---|---|---|---|---|
| 164226 | Thrombospondin 1, mRNA | THBS1 | 0.001 | 0.049 | 0.978 | 0.978 |
| 369397 | CDNA FLJ53400 complete cds | TGFBI | 0.001 | 0.045 | 0.982 | 0.982 |
| 462998 | In-IGFBP-4 mRNA | IGFBP4 | 0.000 | 0.007 | 0.999 | 0.999 |
a Calculated using on-line tools from CGAP (calculations based on equations (1-3) in this report). The calculated "P" value is close to zero (on a scale of zero to one) if the probability is high that the observed expression difference is genuinely greater than the user-specified "F" value, and is not due to sampling error [31,33].
b Calculated using equation (4) in this report. This produces a "P" value between zero and one but, unlike the CGAP value, it is a decimal fraction expressing the likelihood of there being at least a threefold difference in expression of the transcript in the activated cells.
Total number of sequences reported for normal adipose tissue libraries and for cancerous adipose tissue by cDNA DGED library list and gene list
| Number of sequences reported | Sequences in normal adipose libraries | Sequences in cancerous adipose libraries |
|---|---|---|
| Library list | 2,285 | 1,740 |
| Gene list | 1,799 | 721 |
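The gap between the two outputs can be quantified directly from the totals in the table above. The snippet below is only a worked subtraction on those reported figures; the variable names are illustrative and no interpretation of the cause of the gap is implied.

```python
# Totals copied from the table above (cDNA DGED, adipose comparison).
library_list = {"normal": 2285, "cancerous": 1740}
gene_list = {"normal": 1799, "cancerous": 721}

# Difference between the library-list total and the gene-list total per pool.
difference = {pool: library_list[pool] - gene_list[pool] for pool in library_list}

for pool, diff in difference.items():
    share = 100.0 * diff / library_list[pool]
    print(f"{pool}: {diff} sequences ({share:.2f}% of the library-list total)")
```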
Total number of sequences reported for normal bone libraries and for cancerous bone libraries by the library list and gene list produced by CGAP tools and by our routine.
| Number of sequences reported | Sequences reported for normal bone libraries | Sequences reported for cancerous bone libraries |
|---|---|---|
| Library list from CGAP tools | 19,308 | 18,197 |
| Gene list from cDNA DGED | 17,844 | 16,635 |
| Library list from our new algorithm | 17,844 | 16,635 |
| Gene list from our new algorithm | 17,844 | 16,635 |
Data from UniGene relational database for α-actinin genes reported by CGAP xProfiler and/or cDNA DGED tools for a comparison of a pool containing normal adipose libraries with a pool containing cancerous adipose libraries.
| Tool that reported gene in either or both pools | Gene Symbol | Gene Title | UniGene Cluster ID |
|---|---|---|---|
| cDNA DGED only | ACTN4 | Actinin, alpha 4 | 270291 |
| cDNA DGED and cDNA xProfiler | ACTN1 | Actinin, alpha 1 | 509765 |