
SwinOCSR: end-to-end optical chemical structure recognition using a Swin Transformer.

Zhanpeng Xu1, Jianhua Li2, Zhaopeng Yang1, Shiliang Li3, Honglin Li3.   

Abstract

Optical chemical structure recognition (OCSR) from scientific publications is essential for rediscovering chemical structures. It is an extremely challenging problem, and current rule-based and deep-learning methods cannot achieve satisfactory recognition rates. Herein, we propose SwinOCSR, an end-to-end model based on a Swin Transformer. The model uses a Swin Transformer as the backbone to extract image features and Transformer models to convert the chemical information in publication images into DeepSMILES strings. A novel chemical structure dataset was constructed to train and validate our method. Our Swin Transformer-based model was extensively compared with the backbones used by existing publicly available deep-learning methods. The experimental results show that our model significantly outperforms the compared methods, demonstrating its effectiveness. Moreover, we used a focal loss to address the token imbalance problem in the text representation of chemical structure diagrams, and our model achieved an accuracy of 98.58%.
© 2022. The Author(s).
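The abstract states that a focal loss was used to counter token imbalance in the DeepSMILES output sequences, but it does not give the loss's exact form or hyperparameters. The sketch below assumes the standard focal loss of Lin et al., FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), applied independently at each decoded token position and averaged. The function names `softmax` and `focal_loss` and the parameters `gamma`, `alpha`, and `pad_id` (with their defaults) are illustrative assumptions, not values or code taken from SwinOCSR.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one token position's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    gamma: float = 2.0,   # assumed focusing parameter, not from the paper
    alpha: float = 1.0,   # assumed uniform class weight, not from the paper
    pad_id: Optional[int] = None,  # hypothetical padding token id to ignore
) -> float:
    """Mean token-level focal loss over one decoded sequence.

    FL(p_t) = -alpha * (1 - p_t) ** gamma * log(p_t), where p_t is the
    predicted probability of the correct token. With gamma = 0 and
    alpha = 1 this reduces to ordinary cross-entropy.
    """
    losses = []
    for row, target in zip(logits, targets):
        if pad_id is not None and target == pad_id:
            continue  # padding positions contribute nothing
        p_t = softmax(row)[target]
        # (1 - p_t) ** gamma shrinks the loss of confidently correct tokens,
        # so frequent, easy tokens dominate the gradient less.
        losses.append(-alpha * (1.0 - p_t) ** gamma * math.log(p_t))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # One "easy" token (high probability on the target) and one "hard" token.
    logits = [[4.0, 0.0, 0.0], [0.5, 0.0, 0.0]]
    targets = [0, 0]
    print("cross-entropy:", focal_loss(logits, targets, gamma=0.0))
    print("focal loss:   ", focal_loss(logits, targets, gamma=2.0))
```

Running the example shows the focal loss below the plain cross-entropy, because the well-classified token's contribution is scaled down far more than the hard token's. This is the mechanism by which a focal loss can keep very common tokens from swamping rare ones during training.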


Keywords:  Chemical Structure Recognition; Deep Learning; End-to-End Model; Swin Transformer

Year:  2022        PMID: 35778754      PMCID: PMC9248127          DOI: 10.1186/s13321-022-00624-5

Source DB:  PubMed          Journal:  J Cheminform        ISSN: 1758-2946            Impact factor:   8.489


