Deep Visual-Semantic Alignments for Generating Image Descriptions.

Andrej Karpathy, Li Fei-Fei.   

Abstract

We present a model that generates natural language descriptions of images and their regions. Our approach leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between language and visual data. Our alignment model is based on a novel combination of Convolutional Neural Networks (CNNs) over image regions, bidirectional Recurrent Neural Networks (RNNs) over sentences, and a structured objective that aligns the two modalities through a multimodal embedding. We then describe a Multimodal Recurrent Neural Network architecture that uses the inferred alignments to learn to generate novel descriptions of image regions. We demonstrate that our alignment model produces state-of-the-art results in retrieval experiments on the Flickr8K, Flickr30K and MSCOCO datasets. We then show that the generated descriptions outperform retrieval baselines on both full images and a new dataset of region-level annotations. Finally, we conduct a large-scale analysis of our RNN language model on the Visual Genome dataset of 4.1 million captions and highlight the differences between image-level and region-level caption statistics.
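
The abstract does not spell out the structured objective, so the following is only a minimal sketch of one common form such an alignment objective can take, not the authors' implementation. It assumes that region and word vectors already live in a shared embedding space, that an image-sentence score is the sum over words of each word's best-matching region, and that training uses a max-margin ranking loss. The names `image_sentence_score` and `ranking_loss`, the margin of 1.0, and the toy vectors are all illustrative assumptions.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def image_sentence_score(regions: List[Vector], words: List[Vector]) -> float:
    """Score one image against one sentence.

    Assumption: every word is aligned to its single best-scoring region,
    and the per-word scores are summed.
    """
    return sum(max(dot(region, word) for region in regions) for word in words)


def ranking_loss(images: List[List[Vector]],
                 sentences: List[List[Vector]],
                 margin: float = 1.0) -> float:
    """Max-margin ranking loss over a batch where images[k] pairs with sentences[k].

    A matching pair should outscore every mismatched pair by at least `margin`.
    """
    n = len(images)
    scores = [[image_sentence_score(images[k], sentences[l]) for l in range(n)]
              for k in range(n)]
    loss = 0.0
    for k in range(n):
        for l in range(n):
            if l == k:
                continue
            # Image k should prefer its own sentence over sentence l.
            loss += max(0.0, scores[k][l] - scores[k][k] + margin)
            # Sentence k should prefer its own image over image l.
            loss += max(0.0, scores[l][k] - scores[k][k] + margin)
    return loss


# Toy batch: two images (two regions each) and two sentences, in 2-D.
images = [[[1.0, 0.0], [0.0, 1.0]], [[-1.0, 0.0], [0.0, -1.0]]]
sentences = [[[1.0, 0.0], [0.0, 1.0]], [[-1.0, 0.0]]]

aligned = ranking_loss(images, sentences)
shuffled = ranking_loss(images, [sentences[1], sentences[0]])
print(aligned, shuffled)
```

On this toy batch the correctly paired ordering incurs no loss, while swapping the sentences is penalized, which is the behavior a ranking objective of this kind is meant to produce.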

Year:  2016        PMID: 27514036     DOI: 10.1109/TPAMI.2016.2598339

Source DB:  PubMed          Journal:  IEEE Trans Pattern Anal Mach Intell        ISSN: 0098-5589            Impact factor:   6.226


  25 in total

Review 1.  [The future of radiology: What can we expect within the next 10 years?].

Authors:  F Nensa; M Forsting; A Wetter
Journal:  Urologe A       Date:  2016-03       Impact factor: 0.639

2.  Demographic-Guided Attention in Recurrent Neural Networks for Modeling Neuropathophysiological Heterogeneity.

Authors:  Nicha C Dvornek; Xiaoxiao Li; Juntang Zhuang; Pamela Ventola; James S Duncan
Journal:  Mach Learn Med Imaging       Date:  2020-09-29

3.  A Survey on Multi-View Clustering.

Authors:  Guoqing Chao; Shiliang Sun; Jinbo Bi
Journal:  IEEE Trans Artif Intell       Date:  2021-04-05

4.  Improving Joint Learning of Chest X-Ray and Radiology Report by Word Region Alignment.

Authors:  Zhanghexuan Ji; Mohammad Abuzar Shaikh; Dana Moukheiber; Sargur N Srihari; Yifan Peng; Mingchen Gao
Journal:  Mach Learn Med Imaging       Date:  2021-09-21

5.  On Interpretability of Artificial Neural Networks: A Survey.

Authors:  Feng-Lei Fan; Jinjun Xiong; Mengzhou Li; Ge Wang
Journal:  IEEE Trans Radiat Plasma Med Sci       Date:  2021-03-17

6.  Bilinear pooling in video-QA: empirical challenges and motivational drift from neurological parallels.

Authors:  Thomas Winterbottom; Sarah Xiao; Alistair McLean; Noura Al Moubayed
Journal:  PeerJ Comput Sci       Date:  2022-06-03

Review 7.  Artificial intelligence in radiology.

Authors:  Ahmed Hosny; Chintan Parmar; John Quackenbush; Lawrence H Schwartz; Hugo J W L Aerts
Journal:  Nat Rev Cancer       Date:  2018-08       Impact factor: 60.716

8.  Improvement diagnostic accuracy of sinusitis recognition in paranasal sinus X-ray using multiple deep learning models.

Authors:  Hyug-Gi Kim; Kyung Mi Lee; Eui Jong Kim; Jin San Lee
Journal:  Quant Imaging Med Surg       Date:  2019-06

Review 9.  Differentiable biology: using deep learning for biophysics-based and data-driven modeling of molecular mechanisms.

Authors:  Mohammed AlQuraishi; Peter K Sorger
Journal:  Nat Methods       Date:  2021-10-04       Impact factor: 28.547

Review 10.  AI musculoskeletal clinical applications: how can AI increase my day-to-day efficiency?

Authors:  YiRang Shin; Sungjun Kim; Young Han Lee
Journal:  Skeletal Radiol       Date:  2021-08-03       Impact factor: 2.199

