Beatriz Serrano-Solano, Antonio Díaz Ramos, Jean-Karim Hériché, Juan A G Ranea.
Abstract
BACKGROUND: Loss-of-function phenotypes are widely used to infer gene function using the principle that similar phenotypes are indicative of similar functions. However, converting phenotypic to functional annotations requires careful interpretation of phenotypic descriptions and assessment of phenotypic similarity. Understanding how functions and phenotypes are linked will be crucial for the development of methods for the automatic conversion of gene loss-of-function phenotypes to gene functional annotations.
Keywords: Biological network; Cellular phenotype; Cluster analysis; Ontology
Year: 2017 PMID: 28183267 PMCID: PMC5304448 DOI: 10.1186/s12859-017-1503-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Set of 36 phenotypes obtained from the listed siRNA experiments, sorted by CMPO identifier
| Experiment | Description | Phenotypes | IDs in CMPO |
|---|---|---|---|
| CellMorph [3] | Genome-wide RNAi screen that examines changes in the morphology of individual HeLa cells within cell populations. | Decreased cell number; cell with projections; elongated cell; more lamellipodia cells; increased number of actin filament; round cell; increased cell size; decreased cell size; bright nuclei; metaphase arrested; increased cell size in population | CMPO:0000052; CMPO:0000071; CMPO:0000077; CMPO:0000083; CMPO:0000105; CMPO:0000118; CMPO:0000128; CMPO:0000129; CMPO:0000154; CMPO:0000305; CMPO:0000340 |
| MitoCheck [2] | Genome-wide RNAi screen for genes required for chromosome segregation in HeLa cells. The screen also reports genes involved in other processes such as cell movement. | Cell death; increased nucleus size; graped micronucleus; abnormal nucleus shape; mitosis delayed; binuclear cell; absence of mitotic chromosome decondensation; increased cell movement speed; increased cell movement distance; proliferating cells; metaphase delayed; abnormal chromosome segregation; prometaphase delayed; increased variability of nuclear shape in population; mitotic metaphase plate congression | CMPO:0000030; CMPO:0000140; CMPO:0000156; CMPO:0000157; CMPO:0000202; CMPO:0000213; CMPO:0000216; CMPO:0000236; CMPO:0000237; CMPO:0000241; CMPO:0000307; CMPO:0000326; CMPO:0000344; CMPO:0000345; CMPO:0000348 |
| EMBL secretion [4] | Genome-wide RNAi screen for interference with ER-to-plasma membrane transport of the secretory cargo protein tsO45G in HeLa cells. | Increased rate of protein secretion; mild decrease in rate of protein secretion; strong decrease in rate of protein secretion; decreased rate of intracellular protein transport | CMPO:0000246; CMPO:0000318; CMPO:0000319; CMPO:0000346 |
| GR00053 [10] | Genome-wide RNAi screen for genes involved in DNA damage responses in HeLa cells. | Increased number of site of double-strand break | CMPO:0000182 |
| GR00290 [9] | Genome-wide RNAi screen for genes regulating centriole formation in HeLa cells. | Increased centriole replication; decreased centriole replication | CMPO:0000361; CMPO:0000362 |
| Copenhagen DNA damage Ubiquitin [8] | RNAi screen of >1300 genes involved in the ubiquitin-proteasome system or encoding zinc-finger proteins looking for modulators of cellular responses to ionizing radiation in HeLa and U2OS cells. | Decreased number of site of double-strand break | CMPO:0000181 |
| EMBL chromosome condensation [7] | RNAi screen of 100 bioinformatically-selected genes for changes in mitotic prophase duration in HeLa cells. | Increased duration of mitotic prophase; decreased duration of mitotic prophase | CMPO:0000328; CMPO:0000329 |
Binary matrix for gene-phenotype association
| Gene | Decreased cell number (CMPO:0000052) | Cell with projections (CMPO:0000071) | … | Mitotic metaphase plate congression (CMPO:0000348) |
|---|---|---|---|---|
| 57147 (SCYL3) | 1 | 0 | … | 0 |
| 2268 (FGR) | 1 | 0 | … | 1 |
| 22875 (ENPP4) | 0 | 1 | … | 0 |
| … | … | … | … | … |
| 5439 (POLR2J) | 1 | 0 | … | 1 |
Presence and absence of a phenotype after inhibition of each gene are represented by the values 1 and 0, respectively
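As an illustration of how such a binary matrix can be assembled, the following minimal sketch starts from a list of (gene, phenotype) pairs. The function name `build_binary_matrix` and the input layout are assumptions made for this example, and only the three phenotype columns visible in the table are used; the elided columns are not filled in.

```python
from collections import defaultdict

def build_binary_matrix(associations, phenotypes):
    """Return {gene: [0/1, ...]} with one entry per phenotype, in the given order."""
    column = {p: i for i, p in enumerate(phenotypes)}
    # Every gene starts with an all-zero profile (phenotype absent).
    matrix = defaultdict(lambda: [0] * len(phenotypes))
    for gene, phenotype in associations:
        matrix[gene][column[phenotype]] = 1  # phenotype observed after inhibition
    return dict(matrix)

# Illustrative input limited to the three columns shown in the table.
phenotypes = ["CMPO:0000052", "CMPO:0000071", "CMPO:0000348"]
associations = [
    ("SCYL3", "CMPO:0000052"),
    ("FGR", "CMPO:0000052"),
    ("FGR", "CMPO:0000348"),
    ("ENPP4", "CMPO:0000071"),
    ("POLR2J", "CMPO:0000052"),
    ("POLR2J", "CMPO:0000348"),
]
matrix = build_binary_matrix(associations, phenotypes)
```

Each row of the resulting dictionary is the phenotypic profile of one gene, restricted to the columns supplied.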
Fig. 1 Distribution of information content (IC) of the terms annotating genes with phenotypes (black) and all the terms in cellular process (grey). For each level of specificity represented by the information content (IC), the curves represent the proportion of genes annotated with terms of this level in all the annotated genes versus the subset of genes with phenotypes
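The information content of an ontology term is commonly defined as the negative logarithm of the probability of the term. The sketch below assumes that probability is estimated as the fraction of annotated genes carrying the term (with counts already propagated to ancestor terms) and that the natural logarithm is used; both choices, and the name `information_content`, are assumptions of this example rather than details confirmed by the source.

```python
import math

def information_content(term_counts, total):
    """IC(t) = -log p(t), with p(t) estimated as count(t) / total."""
    # Terms with a zero count are skipped because their IC would be infinite.
    return {t: -math.log(c / total) for t, c in term_counts.items() if c > 0}
```

Under this definition a term annotating every gene has an IC of zero, and rarer (more specific) terms have a higher IC.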
Similarity measures used in this study
| Name | Formula |
|---|---|
| Euclidean similarity | |
| Correlation similarity | |
| Cosine similarity | |
| Hamming similarity | |
| Jaccard similarity | |
| Cohen’s kappa | |
| TF-IDF similarity | |
| Resnik’s semantic similarity | Defined in terms of the Most Informative Common Ancestor, the information content (IC) of a term, and the probability of a term |
| Lin’s semantic similarity | |
| Schlicker’s semantic similarity | |
| Jiang’s semantic similarity | |
| Pesquita’s semantic similarity | |
G is the full set of genes ($n_G = 4198$) and P is the set of phenotypes ($n_P = 36$). $x_g$ denotes the phenotypic profile of gene $g$, with $x_{g,p} = 1$ if $g$ shows phenotype $p$ and $x_{g,p} = 0$ otherwise
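For orientation, several of the vector-based measures named in the table can be sketched for binary phenotypic profiles as follows. These are common textbook definitions; the exact normalisations used in the study are not recoverable from the text here, so the functions below are assumptions for illustration only.

```python
import math

def jaccard(x, y):
    """Shared phenotypes divided by phenotypes shown by either gene."""
    both = sum(a and b for a, b in zip(x, y))
    either = sum(a or b for a, b in zip(x, y))
    return both / either if either else 0.0

def hamming_similarity(x, y):
    """Fraction of positions where the two profiles agree."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

def cosine(x, y):
    """Cosine of the angle between the two profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    # For 0/1 vectors the squared norm equals the number of ones.
    norm = math.sqrt(sum(x)) * math.sqrt(sum(y))
    return dot / norm if norm else 0.0

def cohen_kappa(x, y):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(x)
    observed = sum(a == b for a, b in zip(x, y)) / n
    p1x, p1y = sum(x) / n, sum(y) / n
    expected = p1x * p1y + (1 - p1x) * (1 - p1y)
    return (observed - expected) / (1 - expected) if expected < 1 else 0.0
```

For example, the profiles `[1, 0, 1, 0]` and `[1, 0, 0, 0]` share one of the two phenotypes shown by either gene and agree at three of four positions.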
Fig. 2 Hierarchical clustering of phenotypic similarity measures based on Pearson correlation distance
Similarity measures sorted by area under the ROC curve (AUC)
| Measure | AUC | Protein interactions | p-value |
|---|---|---|---|
| Resnik in CMPO | 0.56 | 24 | 0.0102 |
| Schlicker in CMPO | 0.56 | 12 | 0.7512 |
| Lin in CMPO | 0.55 | 11 | 0.8332 |
| Cohen’s kappa | 0.54 | 27 | 0.0015 |
| Pesquita in CMPO | 0.54 | 14 | 0.5494 |
| Jiang in CMPO | 0.54 | 11 | 0.8332 |
| TF-IDF | 0.53 | 25 | 0.0055 |
| Euclidean | 0.53 | 16 | 0.3433 |
| Correlation | 0.52 | 22 | 0.0311 |
| Hamming | 0.52 | 21 | 0.0513 |
| Cosine | 0.49 | 13 | 0.6545 |
| Jaccard | 0.49 | 13 | 0.6545 |
| Euclidean (logistic PCA) | 0.46 | 25 | 0.0055 |
| Correlation (logistic PCA) | 0.45 | 19 | 0.1242 |
| Cosine (logistic PCA) | 0.45 | 14 | 0.5494 |
The third column (Protein interactions) gives the number of nearest-neighbour gene pairs that are also protein interaction partners, and the fourth gives the p-values (computed from the hypergeometric distribution) that the observed number of interacting pairs is due to chance
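A hypergeometric p-value of the kind mentioned in the footnote can be computed as an upper-tail probability. The sketch below is a generic implementation; the parameter names and the mapping suggested in the comments (which population and sample sizes were used in the study) are assumptions, since the text does not state them.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when drawing n items without replacement from N items,
    K of which are 'successes'.

    Hypothetical mapping: N = candidate gene pairs, K = pairs that are
    protein interaction partners, n = nearest-neighbour pairs examined,
    k = nearest-neighbour pairs observed to interact.
    """
    total = comb(N, n)
    upper = min(K, n)  # cannot observe more successes than exist or are drawn
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total
```

A small p-value indicates that the observed number of interacting pairs would rarely arise from a random choice of pairs.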
Fig. 3 Distributions of functional and phenotypic similarities. The box represents the upper and lower quartiles and the median is represented by the black line inside the box. a Phenotypic similarity in CMPO versus functional similarity in GO. b Functional similarity in GO versus phenotypic similarity in CMPO
Fig. 4 Average semantic similarity in GO between genes sharing a particular phenotype (black). Randomization of the relationships between phenotypes and genes represents the null model (grey). Phenotypes with genes having high functional similarity (FDR-corrected p-values ≤0.01) are marked with *. Phenotypes are sorted on the X axis by ascending information content in CMPO. CMPO descriptions for the identifiers are in Table 1
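One way to realise a randomization null model of this kind is to compare the average pairwise similarity of the genes sharing a phenotype with that of random gene sets of the same size. The sketch below assumes this particular scheme; the function names, the number of permutations and the similarity callback `sim` are all hypothetical, and the FDR correction mentioned in the caption is not shown.

```python
import random
from itertools import combinations

def mean_pairwise(genes, sim):
    """Average similarity over all unordered gene pairs (needs >= 2 genes)."""
    pairs = list(combinations(genes, 2))
    return sum(sim(a, b) for a, b in pairs) / len(pairs)

def permutation_p_value(genes_with_phenotype, all_genes, sim, n_perm=1000, seed=0):
    """Empirical p-value that random gene sets are at least as similar."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean_pairwise(genes_with_phenotype, sim)
    k = len(genes_with_phenotype)
    hits = sum(
        mean_pairwise(rng.sample(all_genes, k), sim) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Here `all_genes` must be a list, and `sim` would be a functional similarity between two genes, for instance a semantic similarity in GO.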
Fig. 5 Annotation-driven cellular function definition. a Genes (circles) are annotated with cellular process ontology terms (rectangles). After bipartite graph projection, links between terms are weighted according to the number of genes shared (line width). Then, terms are grouped using spectral clustering. b Clusters of functional terms (coloured circles) are linked to phenotypes (triangles) by shared genes
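The bipartite projection step described in the caption can be sketched as follows: each pair of terms receives a weight equal to the number of genes annotated with both. The function name `project_terms` and the dictionary input are assumptions for this example, and the subsequent spectral clustering step is not shown.

```python
from collections import Counter
from itertools import combinations

def project_terms(gene_to_terms):
    """Weight each term pair by the number of genes annotated with both terms."""
    weights = Counter()
    for terms in gene_to_terms.values():
        # Sort so that every pair is stored under one canonical key.
        for a, b in combinations(sorted(set(terms)), 2):
            weights[(a, b)] += 1
    return weights
```

The resulting weights define a term-term graph whose weighted adjacency matrix could then be passed to a clustering method.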
Fig. 6 Average semantic similarity between terms in clusters. Randomization of the assignments of terms to clusters is represented in grey. Clusters are sorted by size (i.e. number of terms). a Average phenotypic similarity in clusters of GO terms. b Average functional similarity in clusters of CMPO terms