Zejun Li1,2, Bo Liao1, Yun Li1, Wenhua Liu2, Min Chen1,2, Lijun Cai1.
Abstract
Gene function annotation, an important part of genome annotation, is a main challenge of the post-genome era. The Human Genome Project produced whole-genome data that provide abundant biological information for the study of gene function annotation. To obtain useful knowledge from such a large amount of data, one potential strategy is to apply machine learning methods to mine the data and predict gene function. In this study, we improved multi-instance hierarchical clustering by using the gene ontology hierarchy to annotate gene function, combining the gene ontology hierarchy with the multi-instance multi-label learning framework. We then used the multi-label support vector machine (MLSVM) and multi-label k-nearest neighbor (MLKNN) algorithms to predict gene function. Finally, we validated our method on four yeast expression datasets. The results of the simulation experiments show that our method is efficient.
This journal is © The Royal Society of Chemistry.
Year: 2018 PMID: 35542493 PMCID: PMC9083914 DOI: 10.1039/c8ra05122d
Source DB: PubMed Journal: RSC Adv ISSN: 2046-2069 Impact factor: 4.036
Fig. 1 Part of the gene ontology hierarchy relationships.
Fig. 2 Four types of machine learning frames.
The MLSVM algorithm
| Input: | |
| Output: | |
| 1 | For training set |
| 2 | For each label |
| 3 | Produce a sub-training set |
| 4 | Train a SVM model |
| 5 | For a test sample |
| 6 | Its labels are obtained by |
| 7 | End |
| 8 | End |
| 9 |
|
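The symbols in the listing above were lost in extraction, so the following is only a minimal sketch of the per-label scheme its steps describe: build one binary sub-training set per label, train one SVM on it, and assign a test sample every label whose model gives a positive decision value. The linear SVM trained by full-batch subgradient descent on the hinge loss, the function names (`train_linear_svm`, `fit_mlsvm`, `predict_mlsvm`) and the hyperparameter values are all assumptions made for illustration; the kernel and solver actually used in the paper are not stated in this section.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=1000):
    """Assumed stand-in SVM: full-batch subgradient descent on the
    L2-regularised hinge loss. Labels y must be +1 or -1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def fit_mlsvm(X, Y, labels):
    """For each label, build a binary sub-training set and train one SVM."""
    models = {}
    for label in labels:
        # +1 if the sample carries the label, -1 otherwise
        y = [1 if label in label_set else -1 for label_set in Y]
        models[label] = train_linear_svm(X, y)
    return models


def predict_mlsvm(models, x):
    """Return every label whose SVM decision value is positive."""
    predicted = set()
    for label, (w, b) in models.items():
        if sum(wj * xj for wj, xj in zip(w, x)) + b > 0:
            predicted.add(label)
    return predicted
```

With a toy training set in which label "A" depends on the first feature and label "B" on the second, `fit_mlsvm` returns one `(w, b)` pair per label and `predict_mlsvm` returns the set of labels for a new sample.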
MLKNN algorithm
| Input: | |
| Output: | |
| 1 | For a test sample |
| 2 | Calculate |
| 3 | The candidate classes of |
| 4 | For each label |
| 5 | Calculate simScore( |
| 6 | Calculate the likelihood score of |
| 7 |
|
| 8 | End |
| 9 | End |
| 10 |
|
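The formulas in the MLKNN listing were also lost, so the sketch below only mirrors the visible step order: find the neighbours of a test sample, collect the candidate classes from those neighbours, then compute a similarity score and a likelihood score for each candidate label. The Euclidean distance, the similarity weight `1 / (1 + distance)`, the normalisation of the score, the threshold, and the function name `mlknn_predict` are all assumptions for illustration and may differ from the paper's definitions of `simScore` and the likelihood score.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mlknn_predict(X_train, Y_train, x, k=3, threshold=0.5):
    """Score candidate labels of x from its k nearest training samples."""
    # Find the k nearest neighbours of the test sample.
    ranked = sorted((euclidean(xi, x), i) for i, xi in enumerate(X_train))
    neighbours = ranked[:k]

    # Candidate classes: every label carried by at least one neighbour.
    candidates = set().union(*(Y_train[i] for _, i in neighbours))

    # Assumed similarity weight: closer neighbours count more.
    sims = {i: 1.0 / (1.0 + dist) for dist, i in neighbours}
    total = sum(sims.values())

    scores = {}
    for label in sorted(candidates):
        # Assumed simScore: summed similarity of neighbours with this label.
        sim_score = sum(s for i, s in sims.items() if label in Y_train[i])
        # Assumed likelihood score: simScore normalised to [0, 1].
        scores[label] = sim_score / total

    predicted = {label for label, s in scores.items() if s >= threshold}
    return predicted, scores
```

The function returns both the predicted label set and the per-label scores, so the threshold can be tuned without recomputing the neighbours.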
Fig. 3 MIHC+ algorithm flowchart.
Fig. 4 The flow chart of gene function prediction.
Results on the cdc28 dataset using MLSVM
| Method | | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 |
|---|---|---|---|---|---|---|---|---|---|---|
| GNC | | 0.559 | 0.606 | 0.631 | 0.644 | 0.656 | 0.671 | 0.670 | 0.679 | 0.703 |
| | | 0.569 | 0.603 | 0.607 | 0.625 | 0.638 | 0.646 | 0.650 | 0.650 | 0.661 |
| | | 0.571 | 0.583 | 0.599 | 0.612 | 0.618 | 0.629 | 0.628 | 0.632 | 0.646 |
| | | 0.529 | 0.543 | 0.552 | 0.567 | 0.577 | 0.574 | 0.578 | 0.589 | 0.602 |
| | | 0.532 | 0.535 | 0.558 | 0.569 | 0.572 | 0.579 | 0.588 | 0.592 | 0.596 |
| GOLC | | 0.594 | 0.617 | 0.634 | 0.635 | 0.648 | 0.643 | 0.641 | 0.651 | 0.653 |
| | | 0.609 | 0.623 | 0.624 | 0.631 | 0.624 | 0.632 | 0.635 | 0.640 | 0.643 |
| | | 0.601 | 0.644 | 0.657 | 0.658 | 0.654 | 0.661 | 0.656 | 0.643 | 0.668 |
| | | 0.601 | 0.638 | 0.647 | 0.651 | 0.654 | 0.663 | 0.658 | 0.656 | 0.662 |
| MIHC | | 0.621 | 0.666 | 0.727 | 0.767 | 0.800 | 0.794 | 0.817 | 0.828 | 0.838 |
| MIHC+ | | 0.644 | 0.665 | 0.735 | 0.740 | 0.796 | 0.811 | 0.822 | 0.831 | 0.840 |
Some GO terms and their genes in the alph dataset
| Environment | Alph | | | | |
|---|---|---|---|---|---|
| Genes | YBR189W | YGL189C | YGR214W | YJR123W | YOL121C |
| | YER025W | YGR094W | YGR285C | YNL178W | |
| | YGL123W | YGR118W | YHR064C | YNL209W | |
| GO terms | GO:0008152 | GO:0009987 | GO:0044237 | GO:0044238 | GO:0071704 |
Some GO terms and their genes in the cdc15 dataset
| Environment | cdc15 | | | | |
|---|---|---|---|---|---|
| Genes | YBR048W | YGL030W | YKL006W | YLR333C | YOL120C |
| | YDL061C | YGL103W | YKR057W | YLR367W | YOL127W |
| | YDL083C | YGR034W | YKR094C | YLR388W | YOR063W |
| | YDR064W | YGR214W | YLR075W | YML073C | YOR167C |
| | YDR418W | YHR203C | YLR167W | YML091C | YPL131W |
| | YER102W | YIL069C | YLR185W | YNL209W | |
| | YFR031C-A | YIL133C | YLR264W | YOL040C | |
| GO terms | GO:0000462 | GO:0006396 | GO:0016072 | GO:0042274 | GO:0071704 |
| | GO:0000469 | GO:0006725 | GO:0022613 | GO:0043170 | GO:0071840 |
| | GO:0000478 | GO:0006807 | GO:0030490 | GO:0044085 | GO:0090304 |
| | GO:0000479 | GO:0008152 | GO:0034470 | GO:0044237 | GO:0090305 |
| | GO:0000480 | GO:0009987 | GO:0034641 | GO:0044238 | GO:0090501 |
| | GO:0006139 | GO:0010467 | GO:0034660 | GO:0044260 | GO:0090502 |
| | GO:0006364 | GO:0016070 | GO:0042254 | GO:0046483 | GO:1901360 |
Some GO terms and their genes in the cdc28 dataset
| Environment | cdc28 | | | | |
|---|---|---|---|---|---|
| Genes | YBL027W | YDL191W | YJR145C | YLR367W | YOR312C |
| | YBL087C | YDR025W | YKL180W | YNL162W | YPL143W |
| | YBR048W | YDR064W | YKR057W | YNL302C | YPL198W |
| | YBR084C-A | YDR447C | YKR094C | YOL120C | YPR132W |
| | YBR181C | YHL001W | YLR185W | YOL121C | |
| | YDL075W | YJL189W | YLR287C-A | YOR234C | |
| GO terms | GO:0009987 | GO:0044699 | GO:0044763 | | |
Some GO terms and their genes in the elution dataset
| Environment | Elution | | | | |
|---|---|---|---|---|---|
| Genes | YBL047C | YEL048C | YJL154C | YLR361C | YNL192W |
| | YBL099W | YER096W | YJR017C | YLR371W | YOR273C |
| | YBR038W | YFL038C | YJR032W | YLR417W | YOR332W |
| | YBR127C | YFR026C | YJR121W | YML034W | YPR156C |
| | YCR069W | YGR106C | YKL002W | YML078W | YPR165W |
| | YDL089W | YGR138C | YKL080W | YMR054W | |
| | YDR304C | YHL006C | YKL203C | YMR089C | |
| | YDR519W | YHR079C | YLR106C | YNL026W | |
| GO terms | GO:0009987 | | | | |