Zhiqiang Zeng, Hua Shi, Yun Wu, Zhiling Hong.
Abstract
Informatics methods such as text mining and natural language processing are widely used in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we consider how text mining methods are used to search for biological knowledge, retrieve references, and reconstruct databases; for example, protein-protein interactions and gene-disease relationships can be mined from PubMed. Second, we analyze the applications of text mining and natural language processing techniques to bioinformatics problems, including predicting protein structure and function and detecting noncoding RNA. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.
Year: 2015 PMID: 26525745 PMCID: PMC4615216 DOI: 10.1155/2015/674296
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
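The abstract's example of mining protein-protein interactions from PubMed can be illustrated with a very simple sentence-level co-occurrence count. The sketch below is not the authors' method: the protein lexicon, the toy sentences, and the function name `cooccurrence_pairs` are all assumptions made for illustration, and a real pipeline would likely use a curated dictionary or a named-entity recognizer on retrieved abstracts.

```python
import itertools
import re
from collections import Counter

# Hypothetical protein lexicon (assumption); not taken from the article.
PROTEIN_NAMES = {"BRCA1", "BARD1", "TP53", "MDM2"}


def cooccurrence_pairs(abstracts, lexicon=PROTEIN_NAMES):
    """Count how often two lexicon names occur in the same sentence."""
    counts = Counter()
    for text in abstracts:
        # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
            found = sorted(tokens & lexicon)
            # Every unordered pair of names in the sentence is one co-occurrence.
            for pair in itertools.combinations(found, 2):
                counts[pair] += 1
    return counts


# Toy input written for this example (assumption); not real PubMed records.
abstracts = [
    "BRCA1 binds BARD1 to form a heterodimer. TP53 is regulated by MDM2.",
    "MDM2 interacts with TP53 and promotes its degradation.",
]
pairs = cooccurrence_pairs(abstracts)
```

Co-occurrence only suggests candidate pairs; it does not show that the sentence actually asserts an interaction, so counts like these are usually a first filter before more careful relation extraction.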
Figure 1. Problems and methodology relationship between NLP and bioinformatics.
Web servers for protein disorder prediction.
| Problem | Name | Websites | Input format |
|---|---|---|---|
| Protein disorder prediction | DisProt | | FASTA or EMBL sequence format |
| | DisEMBL | | SwissProt ID |
| | DRIPPRED | | Plain sequence only; one sequence at a time; slow |
| | FoldIndex | | Plain sequence only; one sequence at a time |
| | IUPred | | SwissProt ID or plain sequence |
| | PONDR | | FASTA |
| | PSIPRED | | Raw sequence or FASTA format |
| | SCRATCH | | Plain sequence only; one sequence at a time; slow |
| | Spritz | | Raw sequence or FASTA format |
| | RONN | | FASTA, but only one sequence at a time |
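The input formats listed in the table above differ between servers: some take FASTA, some take only a plain sequence, and several accept one sequence at a time. A minimal sketch of converting between these forms is shown below. The helper names (`parse_fasta`, `to_fasta`, `batches`) and the example sequences are assumptions for illustration and are not part of any listed server's interface.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_fasta(header, sequence, width=60):
    """Format one record as FASTA, wrapping the sequence at `width` characters."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines)


def batches(records, size):
    """Split records into consecutive groups of at most `size` items."""
    return [records[i:i + size] for i in range(0, len(records), size)]


# Made-up sequences for illustration only.
fasta_text = ">seq1 example\nMKTAYIAKQR\nQISFVKSHFS\n>seq2\nGSHMLEDPVD\n"
records = parse_fasta(fasta_text)

# Plain sequences, for servers that accept only raw sequence input.
plain_sequences = [seq for _, seq in records]

# One FASTA string per record, for servers that take one sequence at a time.
single_submissions = [to_fasta(h, s) for h, s in records]
```

The `batches` helper covers servers that cap the number of sequences per submission; the cap itself should be taken from the server's own documentation.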
Web servers for protein-protein interaction and interaction site prediction.
| Problem | Name | Websites | Input format |
|---|---|---|---|
| Protein interaction sites prediction | PPISP | | PDB file |
| | Protemot | | PDB ID |
| | SPPIDER | | PDB file or PDB ID |
| | Whiscy | | PDB file |
| Protein-protein interaction prediction | InterPreTS | | FASTA, 40 sequences at most |
| | PIE | | Gene ID or name |
| | PPI | | FASTA |
| | PredHS | | PDB files, 10 files at most |
| | Pred-PPI | | Two FASTA sequences |
| | Prism | | Two PDB IDs or PDB files |
| | Struct2Net | | Gene names or keywords |
Multiple sequence alignment tools.
| Tool | Alignment method | URL |
|---|---|---|
| BLAT | Sequence-based | |
| BLAST | | |
| BWA-SW | | |
| Multilign | Structure-based | |
| FoldalignM | | |
| LocARNA/LocARNA-P | | |
| MASTR | | |
| RAF | | |
| RNASampler | | |
| RNAshapes | | |
| RNAalifold | | |
| StemLoc | | N.A. |
| MAFFT | | |
| MiRAlign | | |
miRNA identification methods.
| Method | URL | Online service | Local service |
|---|---|---|---|
| MiPred | | ✓ | ✓ |
| microPred | | ✓ | |
| TripletSVM | | ✓ | |
| PlantMiRNAPred | | ✓ | ✓ |
| miRNApre | | ✓ | ✓ |
| MIReNA | | ✓ | |
| HuntMi | | ✓ | |
| Mirident | | ✓ | |
| CSHMM | | ✓ | |
| HeteroMirPred | | ✓ | ✓ |
Secondary structure prediction tools.
| Tool | URL |
|---|---|
| RNAfold | |
| RNAstructure | |
| mfold | |
| vsfold | |
| evofold | |
| sfold | |