Lars A Bratholm, Will Gerrard, Brandon Anderson, Shaojie Bai, Sunghwan Choi, Lam Dang, Pavel Hanchar, Addison Howard, Sanghoon Kim, Zico Kolter, Risi Kondor, Mordechai Kornbluth, Youhan Lee, Youngsoo Lee, Jonathan P Mailoa, Thanh Tu Nguyen, Milos Popovic, Goran Rakocevic, Walter Reade, Wonho Song, Luka Stojanovic, Erik H Thiede, Nebojsa Tijanic, Andres Torrubia, Devin Willmott, Craig P Butts, David R Glowacki.
Abstract
The rise of machine learning (ML) has created an explosion in the potential strategies for using data to make scientific predictions. For physical scientists wishing to apply ML strategies to a particular domain, it can be difficult to assess in advance what strategy to adopt within a vast space of possibilities. Here we outline the results of an online community-powered effort to swarm search the space of ML strategies and develop algorithms for predicting atomic-pairwise nuclear magnetic resonance (NMR) properties in molecules. Using an open-source dataset, we worked with Kaggle to design and host a 3-month competition which received 47,800 ML model predictions from 2,700 teams in 84 countries. Within 3 weeks, the Kaggle community produced models with comparable accuracy to our best previously published 'in-house' efforts. A meta-ensemble model constructed as a linear combination of the top predictions has a prediction accuracy which exceeds that of any individual model, 7-19x better than our previous state-of-the-art. The results highlight the potential of transformer architectures for predicting quantum mechanical (QM) molecular properties.
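The idea of a meta-ensemble built as a linear combination of individual model predictions can be illustrated with a minimal sketch. The abstract does not say how the combination weights were obtained, so the ordinary least-squares fit below is an assumption, as are the function names, the synthetic data, and the noise levels of the three hypothetical models.

```python
import numpy as np


def fit_linear_ensemble(preds, target):
    """Fit one weight per model by ordinary least squares (an assumed choice).

    preds  : array of shape (n_samples, n_models), one column per model
    target : array of shape (n_samples,), reference values
    """
    weights, *_ = np.linalg.lstsq(preds, target, rcond=None)
    return weights


def ensemble_predict(preds, weights):
    """Blend the model predictions as a weighted sum."""
    return preds @ weights


# Synthetic stand-in data: three hypothetical models that each predict
# the target with independent Gaussian noise of a different size.
rng = np.random.default_rng(0)
target = rng.normal(size=500)
preds = np.stack(
    [target + rng.normal(scale=s, size=500) for s in (0.3, 0.4, 0.5)],
    axis=1,
)

weights = fit_linear_ensemble(preds, target)
blend = ensemble_predict(preds, weights)

# Mean absolute error of each individual model and of the blend.
mae_individual = np.abs(preds - target[:, None]).mean(axis=0)
mae_blend = np.abs(blend - target).mean()
print("weights:", weights)
print("individual MAE:", mae_individual)
print("blend MAE:", mae_blend)
```

In this toy setting the blend benefits because the models' errors are independent. In practice the weights would be fit on held-out data rather than on the same samples used for evaluation.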
Year: 2021 PMID: 34283864 DOI: 10.1371/journal.pone.0253612
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240