Jingbo Xia, Xing Zhang, Daojun Yuan, Lingling Chen, Jonathan Webster, Alex Chengyu Fang.
Abstract
To assess the likelihood that an unknown rice protein is resistant to Xanthomonas oryzae pv. oryzae, a hybrid strategy is proposed that enhances gene prioritization by combining text mining with a sequence-based approach. The text mining technique of term frequency-inverse document frequency (TF-IDF) is used to measure the importance of distinctive terms that reflect biomedical activity in rice, before candidate genes are screened and vital terms are produced. Afterwards, a built-in classifier based on the chaos game representation algorithm is used to sieve the best candidate gene. Our experimental results show that combining the two methods enhances gene prioritization.
Year: 2013 PMID: 24371834 PMCID: PMC3859262 DOI: 10.1155/2013/853043
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
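The abstract's second stage relies on chaos game representation (CGR), which turns a sequence into a trajectory of points by repeatedly moving part of the way toward a vertex assigned to each symbol. The record does not say which protein CGR variant or which classifier the authors used, so the following is only a minimal sketch under assumptions: one vertex per amino acid on a regular 20-gon, a contraction ratio of 0.5, and hypothetical names (`AMINO_ACIDS`, `VERTICES`, `cgr_points`).

```python
import math

# Assumed alphabet order; the source does not specify a vertex layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# One vertex per amino acid, evenly spaced on the unit circle (assumption).
VERTICES = {
    aa: (math.cos(2 * math.pi * i / 20), math.sin(2 * math.pi * i / 20))
    for i, aa in enumerate(AMINO_ACIDS)
}


def cgr_points(sequence, ratio=0.5):
    """Return the CGR trajectory of a protein sequence as (x, y) points.

    Starting at the origin, each residue moves the current point a
    fraction `ratio` of the way toward that residue's vertex.
    """
    x, y = 0.0, 0.0
    points = []
    for residue in sequence.upper():
        if residue not in VERTICES:
            continue  # skip ambiguous symbols such as X
        vx, vy = VERTICES[residue]
        x += ratio * (vx - x)
        y += ratio * (vy - y)
        points.append((x, y))
    return points
```

Calling `cgr_points("MKV")` yields three points, one per residue. A classifier would then need features derived from such trajectories (for example, point counts per grid cell); that step is not described in this record and is left out here.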
Search strategy for PubMed literature on rice.
| Searching content | PubMed hit |
|---|---|
| Binding | 1428 |
| Catabolism | 47 |
| Expression | 5170 |
| Localization | 816 |
| Phosphorylation | 226 |
| Regulation | 4067 |
| Transcription | 2624 |
| All of the above events | 6810 |
| | 402 |
| (Oryza sativa) or rice | 33349 |
Algorithm 1: Gene prioritization algorithm.
Sample evaluation of vital phrases by TF∗IDF(t_i, d, D).
| Term | d_1 | d_2 | d_3 | d_4 | d_5 | d_6 | d_7 | d_8 | d_9 | d_10 |
|---|---|---|---|---|---|---|---|---|---|---|
| WRKY14 | 0.79 | 0.79 | 0 | 0 | 0.79 | 0 | 0 | 0.79 | 0.79 | 0 |
| RadA | 3.02 | 2.41 | 2.41 | 0 | 2.41 | 2.41 | 0 | 2.41 | 0 | 0 |
| UreD | 0.6 | 0.6 | 0.6 | 0.6 | 0.6 | 0 | 0 | 0.6 | 0 | 0 |
| CC-NBS-LRR | 4.22 | 2.41 | 1.21 | 0 | 2.41 | 0 | 0 | 0.6 | 1.21 | 0 |
| Urease | 18.45 | 1.35 | 0.9 | 0.45 | 0.9 | 0 | 0 | 0.45 | 0.45 | 0 |
| Hd6 | 7.85 | 3.02 | 0 | 0 | 3.02 | 0 | 0.6 | 3.02 | 0.6 | 0 |
| Carboxypeptidase | 15.85 | 8.56 | 0.32 | 0.95 | 6.02 | 0.95 | 0 | 5.07 | 0.63 | 0 |
| EUI | 2.2 | 1.8 | 0.2 | 0.2 | 1.6 | 0.2 | 0 | 1.6 | 0.6 | 0.2 |
| H2A | 1.9 | 1.59 | 0.32 | 0 | 0.95 | 0.32 | 0.32 | 0.32 | 0.95 | 0 |
| Prolin | 34.73 | 22.11 | 2.85 | 0.19 | 20.97 | 2.85 | 0.57 | 16.99 | 16.61 | 0.57 |
| Polypeptide | 36.82 | 18.6 | 5.69 | 0.19 | 14.14 | 2.85 | 1.14 | 8.92 | 7.78 | 0.66 |
| Reductase | 110.37 | 47.45 | 7.21 | 1.23 | 37.3 | 5.31 | 0.76 | 26.19 | 15.75 | 0.66 |
(d_1, d_2, …, d_10 = “rice”, “event”, “binding”, “catabolism”, “expression”, “localization”, “phosphorylation”, “regulation”, “transcript”, and “Xoo”, respectively.)
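The scores tabulated above are TF∗IDF values of a term in each of the ten document sets. The record does not give the exact weighting formula, so the sketch below assumes one common variant: raw term count as TF and a base-10 logarithm of (number of document sets / number of sets containing the term) as IDF. The function name `tf_idf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(term, doc_tokens, corpus):
    """TF*IDF(t, d, D) with raw-count TF and log10 IDF (assumed variant).

    term       -- the term t
    doc_tokens -- list of tokens for one document set d
    corpus     -- list of token lists, one per document set in D
    """
    tf = Counter(doc_tokens)[term]
    df = sum(1 for tokens in corpus if term in tokens)
    if df == 0:
        return 0.0  # term never occurs, so it carries no weight
    return tf * math.log10(len(corpus) / df)


# Toy corpus of four tokenized document sets (illustrative only).
corpus = [
    ["wrky14", "binding", "wrky14"],
    ["binding", "expression"],
    ["urease", "regulation"],
    ["rice", "expression"],
]
score = tf_idf("wrky14", corpus[0], corpus)  # 2 * log10(4 / 1)
```

Under this variant a term that appears in every document set scores zero, which is why generic words drop out and set-specific terms rise.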
Voting results of key phrases with greatest importance.
| Term | d_1 | d_2 | d_3 | d_4 | d_5 | d_6 | d_7 | d_8 | d_9 | d_10 | Vote |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CR4 | 219 | 7 | 13 | 73 | 7 | 2 | 7 | 3 | 1 | 1 | 9 |
| Thioesterase | 106 | 6 | 1 | 63 | 6 | 1 | 6 | 14 | 20 | 8 | 9 |
| WRKY2 | 88 | 62 | 4 | 65 | 74 | 9 | 130 | 91 | 96 | 21 | 9 |
| Exonuclease-1 | 1 | 1 | 14 | 74 | 1 | 20 | 133 | 6 | 6 | 130 | 8 |
| Fibrillarin | 2 | 2 | 15 | 75 | 2 | 21 | 134 | 7 | 7 | 131 | 8 |
| Kinase-like | 204 | 149 | 2 | 64 | 76 | 16 | 40 | 16 | 2 | 79 | 8 |
| WRKY10 | 3 | 3 | 16 | 76 | 3 | 247 | 267 | 10 | 9 | 43 | 8 |
| WRKY30 | 4 | 4 | 17 | 77 | 4 | 248 | 268 | 11 | 10 | 44 | 8 |
| AML1 | 95 | 16 | 42 | 98 | 15 | 254 | 274 | 31 | 29 | 148 | 7 |
| Arginase | 91 | 60 | 19 | 32 | 66 | 292 | 310 | 12 | 11 | 133 | 7 |
| Constans | 96 | 17 | 43 | 99 | 16 | 255 | 275 | 32 | 30 | 149 | 7 |
| Decoy | 206 | 5 | 18 | 78 | 5 | 22 | 135 | 8 | 8 | 132 | 7 |
| Glutaredoxin-like | 6 | 9 | 35 | 94 | 11 | 38 | 149 | 20 | 362 | 376 | 7 |
| Glutathione-S-transferase | 227 | 181 | 32 | 91 | 196 | 17 | 12 | 92 | 3 | 7 | 7 |
| H2A | 103 | 145 | 5 | 66 | 75 | 10 | 20 | 4 | 95 | 211 | 7 |
| Metalloendopeptidase | 54 | 15 | 41 | 97 | 14 | 39 | 150 | 21 | 363 | 377 | 7 |
| PDR20 | 7 | 10 | 36 | 95 | 12 | 252 | 272 | 29 | 27 | 146 | 7 |
| RISBZ5 | 40 | 58 | 84 | 138 | 69 | 76 | 175 | 81 | 86 | 203 | 7 |
| SNF2P | 8 | 11 | 37 | 96 | 13 | 253 | 273 | 30 | 28 | 147 | 7 |
| YY2 | 41 | 59 | 85 | 139 | 70 | 77 | 176 | 82 | 87 | 204 | 7 |
| CIA | 297 | 168 | 33 | 92 | 166 | 4 | 8 | 156 | 22 | 10 | 6 |
| CR9 | 224 | 61 | 86 | 140 | 71 | 78 | 177 | 83 | 88 | 205 | 6 |
| EL3 | 71 | 117 | 20 | 79 | 68 | 294 | 47 | 126 | 14 | 136 | 6 |
| MtN21 | 55 | 85 | 462 | 463 | 24 | 43 | 153 | 24 | 25 | 144 | 6 |
| NPKL1 | 5 | 8 | 445 | 446 | 22 | 41 | 65 | 17 | 360 | 374 | 6 |
| Prohibitin | 202 | 148 | 6 | 67 | 26 | 260 | 92 | 151 | 5 | 20 | 6 |
| Ramy1 | 58 | 88 | 48 | 104 | 99 | 315 | 332 | 37 | 33 | 152 | 6 |
| UreD | 9 | 12 | 38 | 38 | 8 | 249 | 269 | 26 | 365 | 379 | 6 |
| UreF | 10 | 13 | 39 | 39 | 9 | 250 | 270 | 27 | 366 | 380 | 6 |
| UreG | 11 | 14 | 40 | 40 | 10 | 251 | 271 | 28 | 367 | 381 | 6 |
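The voting rule is not stated in this record. Assuming each of the ten numeric columns is the term's rank in one document set, one rule that reproduces the Vote column for the rows shown is to count the sets in which the term ranks within the top 100. The cut-off `top_k=100` is an inference from the table, not a documented parameter, and `count_votes` is a hypothetical name.

```python
def count_votes(ranks, top_k=100):
    """Count the document sets in which a term ranks within the top_k.

    top_k=100 is an assumed cut-off inferred from the tabulated rows.
    """
    return sum(1 for rank in ranks if rank <= top_k)


# Ranks copied from three rows of the table above.
ranks_by_term = {
    "CR4": [219, 7, 13, 73, 7, 2, 7, 3, 1, 1],
    "Exonuclease-1": [1, 1, 14, 74, 1, 20, 133, 6, 6, 130],
    "UreD": [9, 12, 38, 38, 8, 249, 269, 26, 365, 379],
}

votes = {term: count_votes(ranks) for term, ranks in ranks_by_term.items()}

# Highest vote first; ties broken alphabetically.
prioritized = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
```

A vote count is less sensitive to one extreme rank than a mean rank would be: CR4 ranks 219th in one set yet still collects nine votes.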
Sample of retrieved protein sequences.
| NCBI GI | Term | Annotation |
|---|---|---|
| 15721862 | CR4 | >gi 15721862 dbj BAB68389.1 CR4 [Oryza sativa] |
| 56201806 | Thioesterase | >gi 56201806 dbj BAD73256.1 putative acyl-(acyl carrier protein) thioesterase [Oryza sativa Japonica Group] |
| 50843956 | WRKY2 | >gi 50843956 gb AAT84156.1 transcription factor WRKY24 [Oryza sativa Indica Group] |
| 54111120 | Exonuclease-1 | >gi 54111120 dbj BAD60834.1 exonuclease-1 [Oryza sativa Japonica Group] |
| 18071363 | Fibrillarin | >gi 18071363 gb AAL58222.1 AC09088225 putative fibrillarin [Oryza sativa Japonica Group] |
| 1586408 | Kinase-like | >gi 1586408 prf 2203451 A receptor kinase-like protein |
| 50843970 | WRKY10 | >gi 50843970 gb AAT84163.1 transcription factor WRKY100 [Oryza sativa Indica Group] |
| 58042751 | WRKY30 | >gi 58042751 gb AAW63719.1 WRKY30 [Oryza sativa Japonica Group] |
| 52076187 | AML1 | >gi 52076187 dbj BAD46727.1 putative AML1 [Oryza sativa Japonica Group] |
| 301344557 | Arginase | >gi 301344557 gb ADK74000.1 arginase [Oryza sativa Indica Group] |
Sequence distribution for biological process in GO database.
| GO term | #Seq | Score | Parents | Evidence? |
|---|---|---|---|---|
| Cellular response to stimulus | 50 | 30 | Res, Cep | Yes |
| Regulation of biological process | 50 | 18 | Bir, Bip | |
| Response to stress | 44 | 44 | Res | Yes |
| Multicellular organismal development | 41 | 72.4 | Muo, Dep | |
| Response to biotic stimulus | 40 | 40 | Res | Yes |
| Primary metabolic process | 38 | 21.4 | Mep | |
| Response to external stimulus | 37 | 37.8 | Res | Yes |
| Anatomical structure development | 37 | 31.2 | Dep | |
| Cell death | 34 | 34 | Death, Cep | |
| Response to abiotic stimulus | 33 | 33 | Res | Yes |
| Establishment of localization | 33 | 19.8 | Loc, Bip | |
| Catabolic process | 30 | 30 | Mep | |
| Reproductive process | 30 | 6.48 | Bip, Rep | |
| Response to endogenous stimulus | 10 | 10 | Res | Yes |
| Macromolecule metabolic process | 10 | 3.6 | Mep | |
| Cellular metabolic process | 10 | 3.42 | Mep, Cep | |
| Cell cycle | 5 | 5 | Cep | |
| Regulation of biological quality | 4 | 0.88 | Bir | |
| Biosynthetic process | 3 | 3 | Mep | |
| Cell communication | 3 | 3 | Cep | |
| Nitrogen compound metabolic process | 3 | 1.08 | Mep | |
| Cellular homeostasis | 1 | 1 | Hop, Cep | |
Multidomain and superfamily data for the top 10 sequences in the CDD hits.
| Query | Hit type | Short name | Description | Evidence? |
|---|---|---|---|---|
| Q#1->gi∣53793299 | Multidom | PLN00113 | LRR | Yes |
| Q#2->gi∣2586087 | Superfamily | PKc_like superfamily | LRR and kinase | Yes |
| | Superfamily | LRRNT_2 superfamily | | |
| | Multidom | PLN00113 | | |
| | Multidom | PLN03150 | | |
| Q#3->gi∣343466349 | Specific | PKc | LRR and kinase | Yes |
| | Superfamily | PKc_like superfamily | | |
| | Superfamily | LRRNT_2 superfamily | | |
| | Superfamily | LRR_RI superfamily | | |
| | Multidom | PLN00113 | | |
| Q#4->gi∣343466347 | Specific | PKc | LRR and kinase | Yes |
| | Superfamily | PKc_like superfamily | | |
| | Superfamily | LRRNT_2 superfamily | | |
| | Superfamily | LRR_RI superfamily | | |
| | Multidom | PLN00113 | | |
| Q#5->gi∣63098460 | Superfamily | PKc_like superfamily | LRR and kinase | Yes |
| | Multidom | PLN00113 | | |
| Q#6->gi∣63098462 | Superfamily | PKc_like superfamily | LRR and kinase | Yes |
| | Multidom | PLN00113 | | |
| Q#7->gi∣63098474 | Superfamily | PKc_like superfamily | LRR and kinase | Yes |
| | Multidom | PLN00113 | | |
| Q#8->gi∣63098472 | Superfamily | PKc_like superfamily | LRR and kinase | Yes |
| | Multidom | PLN00113 | | |
| Q#9->gi∣63098486 | Superfamily | PKc_like superfamily | LRR and kinase | Yes |
| | Multidom | PLN00113 | | |
| Q#10->gi∣63098454 | Superfamily | | LRR and kinase | Yes |
| | Multidom | PLN00113 | | |
Figure 1: Flowchart of the hybrid strategy.