Mu Gao1, Peik Lund-Andersen2, Alex Morehead3, Sajid Mahmud3, Chen Chen3, Xiao Chen3, Nabin Giri3, Raj S Roy3, Farhan Quadir3, T Chad Effler4, Ryan Prout4, Subil Abraham4, Wael Elwasif4, N Quentin Haas4, Jeffrey Skolnick1, Jianlin Cheng3, Ada Sedova4.
Abstract
Computational biology is one of many scientific disciplines ripe for innovation and acceleration with the advent of high-performance computing (HPC). In recent years, the field of machine learning has also seen significant benefits from adopting HPC practices. In this work, we present a novel HPC pipeline that incorporates various machine-learning approaches for structure-based functional annotation of proteins on the scale of whole genomes. Our pipeline makes extensive use of deep learning and provides computational insights into best practices for training advanced deep-learning models for high-throughput data such as proteomics data. We showcase methodologies our pipeline currently supports and detail future tasks for our pipeline to envelop, including large-scale sequence comparison using SAdLSA and prediction of protein tertiary structures using AlphaFold2.
Keywords: computational biology; deep learning; high-performance computing; machine learning; protein sequence alignment; protein structure prediction
Year: 2021; PMID: 35112110; PMCID: PMC8802329; DOI: 10.1109/mlhpc54614.2021.00010
Source DB: PubMed; Journal: Workshop Mach Learn HPC Environ; ISSN: 2768-4237