Literature DB >> 29989106

Special issue on Computational Resources and Methods in Biological Sciences.

Hao Lin1,2, Shaoliang Peng3, Jian Huang1,2.   

Abstract

This special issue covers a wide range of topics in computational biology, such as database construction, sequence analysis and function prediction with machine learning methods, disease-related diagnosis, drug-target and drug discovery, and electronic health record system construction.

Entities:  

Keywords:  Computational Method; Computational Resource

Mesh:

Year:  2018        PMID: 29989106      PMCID: PMC6036761          DOI: 10.7150/ijbs.27554

Source DB:  PubMed          Journal:  Int J Biol Sci        ISSN: 1449-2288            Impact factor:   6.580


Computational biology has become an increasingly important research area in biological sciences in the recent decades. The new-generation technologies have become relatively cheaper and more popular in biolabs, and huge volumes of biological data output have consequently driven the biological field to a type of data science to a certain extent 1. Computational resources and methods have since played a significant role in many biological studies 2. Big data in biological sciences have been considered for curation enhancement, efficient analysis and precise interpretation, and accurate modeling and prediction. The special issue was focused on the recent development of computational resources and methods in biological sciences. In this special issue, 20 studies have been published to elaborate the progresses and results of data collection and database construction, sequence analysis and classification, macromolecular structure analysis, disease-related analysis and diagnosis, and bioinformatics in biological techniques. With the accumulation of biological data, the collection, storage, and management of these data have become the prerequisite of subsequent biological analysis. Oryza sativa, one of the model plants, has attracted much attention because it is not only produced as a major cereal, but also because it is easy to study genetically. Jiang et al. manually collected the yield-related genomics and proteomics data of rice from published databases and the literature, and then constructed a comprehensive database called RicyerDB to store these data 3. The database comprises various yield-related information, such as genes, proteins, chromosomal locations, traits, protein-protein interaction networks, gene ontology, etc. All these data can be freely accessed at http://server.malab.cn/Ricyer/index.html. The database can also provide a relatively more convenient way to conduct yield-related research about rice. In the past ten years, an increasing number of scholars have focused on non-coding RNAs (ncRNAs) because this type of molecule plays an important role in various biological processes. Many databases about ncRNAs are currently built for different research purposes. To provide easy access and understanding of ncRNA resources for plant research, Liao et al. summarized 83 published ncRNA databases 4 and provided suggestions on how to use these resources and subsequently perform meaningful analysis on plant microRNAs (miRNAs) and LncRNAs. Sequence analysis and type classification are classical topics in computational biology. On the basis of the principle that sequence determines structure and structure determines function, eight papers in the special issue have focused on different computational methods that can be used to determine the structures and functions of biological macromolecules based on their sequence information. Membrane proteins play extremely important roles in biological signal transfer and are important candidates for drug-target discovery, but their structures are different to solve experimentally. According to the sequence alignment principle, Wang et al. developed a segment alignment method to predict the fold of alpha-helical transmembrane proteins (αTMP) by referring to topological structures and evolutionary information 5. The proposed method exhibited high alignment accuracy on a non-redundant dataset, and the RNA secondary structure is highly correlated with its function. Consequently, the prediction of the secondary structures of RNA has become a popular topic. However, an RNA sequence seldom provides sufficient information for RNA secondary structure prediction. Zhu et al. investigated the folding diversity of different RNA families and designed an RNA folding rule-based statistical learning model to predict RNA secondary structures 6. Many studies have performed prediction by directly using sequence information 7, 8, 9. Meiotic recombination is an important genome evolution mode, but it is not randomly distributed in a genome. Thus, to identify the region with high DNA recombination frequency, Yang et al. designed a support vector machine-based method to discriminate recombination hotspot from recombination coldspot by optimizing the key hexamer features via binomial distribution 10. A comparison with other published methods demonstrated that the proposed predictor was superior to its counterparts in terms of various evaluation indexes. Based on the method, the authors established a free webserver called iRSpot-Pse6NC, which can be freely accessed from http://lin-group.cn/server/iRSpot-Pse6NC. Several proteins with special functions have also attracted the attention of scholars. Hormone-binding proteins (HBPs) can interact with hormones, but their functions are currently unclear. Thus, a bioinformatics method was designed to correctly identify HBPs and provide vital clues about their functions. Tang et al. collected a high-quality benchmark dataset and developed a powerful predictor named HBPred (http://lin-group.cn/server/HBPred) to assist in the recognition of HBPs by their optimal dipeptide composition features 11. Matrix metalloprotease (MMP) is another kind of function-special protein related to metabolism. Song et al. presented a random forest-based method to identify peptides that bind with MMP by referring to the physicochemical properties of residues 12. As a result, 1300 known peptide motifs that bind with 7 MMPs were identified, and these results provided vital clues and insights into candidate inhibitor drugs. Some proteins with special properties, such as the self-interacting proteins (SIPs), also play important roles in cellular regulation. A predictive method was proposed to recognize SIPs based on sequence evolution information 13, and deep neural network was utilized to perform feature selection. The cross-validations showed that the proposed method can produce highly accurate results. Wang et al. verified the performance of their model by comparing the method with other existing techniques 13. Almost all biological processes are influenced by protein post-translational modifications (PTMs). Accordingly, methods for bioinformatics were developed to predict PTM sites in proteins. Liu et al. developed a powerful webserver called PTM-ssMP to identify PTM sites by using protein sequence information 14. The webserver provides nine kinds of PTM prediction and is freely available at http://bioinformatics.ustc.edu.cn/PTM-ssMP/index/. All users can easily obtain the research results from the published web servers without having to refer to detailed mathematical equations. CRISPR is a family of DNA sequences in bacteria and archaea, and it can be used as defense against phage attacks. At present, CRISPR has become the most popular biological technique for precise genome editing. However, the research community generally lacks user-friendly avenues to derive CRISPR-related information from high-throughput sequencing data. You et al. developed a tool called CRISPRMatch that can automatically process the high-throughput genome-editing data of CRISPR 15. The CRISPRMatch toolkit and the original code can be freely obtained from https://github.com/zhangtaolab/CRISPRMatch. Computational methods can be used for drug-target discovery, drug design, disease diagnosis, and health recording, which altogether can guide disease treatment. Ten studies performed intensive research on these subtopics. Cancer is the most common complex genetic disease. Scholars have exerted considerable effort to investigate the nosogenesis of cancer and design suitable drugs. miRNAs play important roles in multiple biological processes and have great potentials as drug candidates. Based on miRNA data about target genes and tissue specificity of diseases, as well as information from the Food and Drug Administration, Yu et al. constructed a drug-miRNA-disease network and developed a method called miTS to predict the potential treatment of diseases 16. The method was applied on breast cancer cases, and several new potential drugs for breast cancer treatment were found. The method provided a convenient way to determine new drug candidates to treat complex diseases. The identification of differentially expressed genes (DEGs) not only helps in understanding the nosogenesis of cancer, but it also provides vital clues for cancer treatment. However, the detection of weakly differential expression signals is difficult to conduct between two phenotypes with limited sample sizes. Cai et al. improved their previous method RankComp to detect DEGs between two phenotypes without the need of batch effect adjustments 17. The method with breast cancer drug-response data was applied to detect weak differential expression signals. Gastric cancer is one of the most common causes of cancer-related death in the world, and investigating its key genes can provide important clues for drug-target discovery. Zeng et al. constructed a co-expression network of gastric cancers and performed a series of enrichment analysis to detect the key genes 18. Their results found that most of the identified key genes were unique, which suggest that key genes can be regarded as the potential biomarker of a subtype, and each subtype of a gastric cancer may differ by occurrence and developmental mechanism. Gene fusion plays an important role in cancer, but the current methods on tumor DNA composition has low sensitivity. Chen et al. developed a highly efficient tool called GeneFuse to accurately detect and visualize gene fusions 19. The tool can be downloaded from https://github.com/OpenGene/GeneFuse. Cholangiocarcinoma is a rare and often fatal cancer. In the era of big data, translation bioinformatics can be enhanced with new methods and knowledge on cancer research. Qian et al. reviewed the development of various computational methods applied in cholangiocarcinoma, which ranged from biological data classification, knowledge discovery and annotation, and clinical application 20. They also discussed future opportunities and challenges, including data collection and precise diagnosis for cholangiocarcinoma treatment. Autism spectrum disorder (ASD) is a complex neurodevelopmental disease. However, the molecular mechanism of ASD is unclear. Huang et al. analyzed the RNA-sequence data from corpus callosum of six patients with ASD and six normal individuals 21. The co-expression module analysis for ASD-associated genes showed that the detected DEGs and ncRNAs were significantly involved in nervous system, sensory system, immune system, and organ development. These findings can help uncover core dysfunctional modules and understand the molecular mechanism of ASD. Cancers, age-related macular degeneration, choroidal neovascularization, and many other diseases are closely related to pathological angiogenesis, which can be treated with blockers of VEGF signaling pathway. According to previous knowledge, the peptide HRH can target vascular endothelial growth factor receptors and exhibit a dose-dependent angiogenesis-suppressing effect 22. Ning et al. designed a peptibody with an antiangiogenic effect by fusing the HRH peptide to a human IgG1 Fc fragment, and they obtained a new peptibody called PbHRH 23. A series of molecular dynamics simulations analysis was also performed, and they found that PbHRH can prevent the native VEGF-A binding site of VEGFR-1 D2 from directly binding its HRH functional domain. The designed peptibody may indirectly help improve the pharmacokinetic profile and bioavailability of HRH. Microorganisms that reside in the human body can influence human health, and the abnormal distribution of microorganisms can cause various diseases. Wu et al. developed a computational method called PRWHMDA to predict human micro-disease associations 24. The model was also applied to asthma, inflammatory bowel disease, and type 1 diabetes cases. The authors experimentally found and validated the existence of disease-associated microbes, which demonstrates that the method can be used to detect disease-related microbes. Pathophysiology signals provide key clues for disease diagnoses. Sputum sounds can be used to estimate the status of sputum deposition in a respiratory system. Shi et al. designed an artificial neural network-based method to recognize sputum sounds by using wavelet transform, and their method achieved highly accurate results 25. The proposed method can contribute to the efficiency of intensive care unit (ICU) staff in achieving timely clearance of secretion among patients with mechanical ventilation. The era of big data has rendered it highly important for online diagnostic services to store, process, and transmit electronic health records (EHRs) of patients across several healthcare platforms, even among clinical researchers. Yang et al. applied the MicroService Architecture to construct an EHR system that can enable clinical researchers and governmental or non-governmental organizations to conveniently and directly obtain clinical-related data 26. In summary, this special issue covers a wide range of topics in computational biology, such as database construction, sequence analysis and function prediction with machine learning methods, disease-related diagnosis, drug-target and drug discovery, and electronic health record system construction. The databases, methods, tools and discoveries published in this special issue can offer important knowledge and ways for the life sciences. Moreover, this special issue can promote the development of computational biology and bioinformatics-related fields.
  26 in total

1.  PSBinder: A Web Service for Predicting Polystyrene Surface-Binding Peptides.

Authors:  Ning Li; Juanjuan Kang; Lixu Jiang; Bifang He; Hao Lin; Jian Huang
Journal:  Biomed Res Int       Date:  2017-12-27       Impact factor: 3.411

2.  RicyerDB: A Database For Collecting Rice Yield-related Genes with Biological Analysis.

Authors:  Jing Jiang; Fei Xing; Xiangxiang Zeng; Quan Zou
Journal:  Int J Biol Sci       Date:  2018-05-22       Impact factor: 6.580

3.  Identifying differentially expressed genes from cross-site integrated data based on relative expression orderings.

Authors:  Hao Cai; Xiangyu Li; Jing Li; Qirui Liang; Weicheng Zheng; Qingzhou Guan; Zheng Guo; Xianlong Wang
Journal:  Int J Biol Sci       Date:  2018-05-22       Impact factor: 6.580

4.  PTM-ssMP: A Web Server for Predicting Different Types of Post-translational Modification Sites Using Novel Site-specific Modification Profile.

Authors:  Yu Liu; Minghui Wang; Jianing Xi; Fenglin Luo; Ao Li
Journal:  Int J Biol Sci       Date:  2018-05-22       Impact factor: 6.580

5.  MicroShare: Privacy-Preserved Medical Resource Sharing through MicroService Architecture.

Authors:  Yilong Yang; Quan Zu; Peng Liu; Defang Ouyang; Xiaoshan Li
Journal:  Int J Biol Sci       Date:  2018-05-22       Impact factor: 6.580

6.  PRWHMDA: Human Microbe-Disease Association Prediction by Random Walk on the Heterogeneous Network with PSO.

Authors:  Chuanyan Wu; Rui Gao; Daoliang Zhang; Shiyun Han; Yusen Zhang
Journal:  Int J Biol Sci       Date:  2018-05-22       Impact factor: 6.580

7.  Improving Prediction of Self-interacting Proteins Using Stacked Sparse Auto-Encoder with PSSM profiles.

Authors:  Yan-Bin Wang; Zhu-Hong You; Li-Ping Li; De-Shuang Huang; Feng-Feng Zhou; Shan Yang
Journal:  Int J Biol Sci       Date:  2018-05-23       Impact factor: 6.580

8.  GeneFuse: detection and visualization of target gene fusions from DNA sequencing data.

Authors:  Shifu Chen; Ming Liu; Tanxiao Huang; Wenting Liao; Mingyan Xu; Jia Gu
Journal:  Int J Biol Sci       Date:  2018-05-22       Impact factor: 6.580

9.  A Sequential Segment Based Alpha-Helical Transmembrane Protein Alignment Method.

Authors:  Han Wang; Jingru Wang; Li Zhang; Pingping Sun; Ning Du; Yanwen Li
Journal:  Int J Biol Sci       Date:  2018-05-22       Impact factor: 6.580

10.  BDB: biopanning data bank.

Authors:  Bifang He; Guoshi Chai; Yaocong Duan; Zhiqiang Yan; Liuyang Qiu; Huixiong Zhang; Zechun Liu; Qiang He; Ke Han; Beibei Ru; Feng-Biao Guo; Hui Ding; Hao Lin; Xianlong Wang; Nini Rao; Peng Zhou; Jian Huang
Journal:  Nucleic Acids Res       Date:  2015-10-25       Impact factor: 16.971

View more
  1 in total

1.  SWPepNovo: An Efficient De Novo Peptide Sequencing Tool for Large-scale MS/MS Spectra Analysis.

Authors:  Chuang Li; Kenli Li; Keqin Li; Xianghui Xie; Feng Lin
Journal:  Int J Biol Sci       Date:  2019-07-03       Impact factor: 6.580

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.