VQA
-
Vision-to-Language Tasks Based on Attributes and Attention Mechanism
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
Relation
VQA
-
Leveraging Medical Visual Question Answering with Supporting Facts
arXiv_AI
arXiv_AI
QA
GAN
Transfer_Learning
VQA
-
Why do These Match? Explaining the Behavior of Image Similarity Models
arXiv_CV
arXiv_CV
Salient
Image_Classification
Classification
Deep_Learning
VQA
Recognition
-
Adapting Visual Question Answering Models for Enhancing Multimodal Community Q&A Platforms
arXiv_CV
arXiv_CV
Knowledge
QA
Attention
GAN
Representation_Learning
Classification
VQA
-
Self-Critical Reasoning for Robust Visual Question Answering
arXiv_CV
arXiv_CV
QA
Relation
VQA
-
Beyond Bilinear: Generalized Multimodal Factorized High-order Pooling for Visual Question Answering
arXiv_CV
arXiv_CV
QA
Attention
Prediction
Relation
VQA
-
Towards VQA Models That Can Read
arXiv_CV
arXiv_CV
QA
VQA
-
Quantifying and Alleviating the Language Prior Problem in Visual Question Answering
arXiv_CV
arXiv_CV
Regularization
QA
Attention
Quantitative
VQA
-
TVQA: Localized, Compositional Video Question Answering
arXiv_AI
arXiv_AI
QA
VQA
-
State-of-the-art in 360° Video/Image Processing: Perception, Assessment and Compression
arXiv_CV
arXiv_CV
Review
QA
Attention
Survey
VQA
-
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
arXiv_AI
arXiv_AI
VQA
-
Scene Graph Prediction with Limited Labels
arXiv_CV
arXiv_CV
Sparse
Knowledge
Caption
Transfer_Learning
Prediction
Relation
VQA
-
TVQA+: Spatio-Temporal Grounding for Video Question Answering
arXiv_AI
arXiv_AI
QA
VQA
-
Challenges and Prospects in Vision and Language Research
arXiv_CV
arXiv_CV
Image_Caption
Review
VQA
-
Towards VQA Models that can Read
arXiv_CV
arXiv_CV
QA
VQA
-
Progressive Attention Memory Network for Movie Story Question Answering
arXiv_CV
arXiv_CV
QA
Attention
Inference
Prediction
VQA
-
Question Guided Modular Routing Networks for Visual Question Answering
arXiv_CV
arXiv_CV
Knowledge
QA
Face
VQA
-
Evaluating the Representational Hub of Language and Vision Models
arXiv_CV
arXiv_CV
VQA
-
Factor Graph Attention
arXiv_AI
arXiv_AI
Attention
VQA
-
Text Guided Person Image Synthesis
arXiv_CV
arXiv_CV
QA
VQA
-
GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
arXiv_CV
arXiv_CV
QA
RNN
VQA
-
Multi-Target Embodied Question Answering
arXiv_CV
arXiv_CV
QA
VQA
-
Transfer Learning via Unsupervised Task Discovery for Visual Question Answering
arXiv_CV
arXiv_CV
Knowledge
Transfer_Learning
VQA
Recognition
-
Recursive Visual Attention in Visual Dialog
arXiv_CV
arXiv_CV
QA
Attention
Quantitative
VQA
-
Lucid Explanations Help: Using a Human-AI Image-Guessing Game to Evaluate Machine Explanation Helpfulness
arXiv_CV
arXiv_CV
QA
VQA
-
Actively Seeking and Learning from Live Data
arXiv_CV
arXiv_CV
QA
Face
Caption
VQA
-
MMED: A Multi-domain and Multi-modality Event Dataset
arXiv_CV
arXiv_CV
Knowledge
GAN
VQA
-
Relation-aware Graph Attention Network for Visual Question Answering
arXiv_AI
arXiv_AI
QA
Attention
Relation
VQA
-
Information Maximizing Visual Question Generation
arXiv_CV
arXiv_CV
Quantitative
VQA
-
Dual Recurrent Attention Units for Visual Question Answering
arXiv_CV
arXiv_CV
Knowledge
QA
Attention
CNN
VQA
-
Visual Query Answering by Entity-Attribute Graph Matching and Reasoning
arXiv_CV
arXiv_CV
QA
Inference
VQA
-
Episodic Memory Reader: Learning What to Remember for Question Answering from Streaming Data
arXiv_CL
arXiv_CL
QA
VQA
-
AI2-THOR: An Interactive 3D Environment for Visual AI
arXiv_AI
arXiv_AI
Object_Detection
Segmentation
Reinforcement_Learning
Representation_Learning
Detection
VQA
-
RAVEN: A Dataset for Relational and Analogical Visual rEasoNing
arXiv_AI
arXiv_AI
QA
Tracking
Detection
Relation
VQA
Recognition
-
Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering
arXiv_CV
arXiv_CV
QA
Attention
VQA
-
Learning More with Less: Conditional PGGAN-based Data Augmentation for Brain Metastases Detection Using Highly-Rough Annotation on MR Images
arXiv_CV
arXiv_CV
Adversarial
GAN
CNN
Detection
VQA
-
Answer Them All! Toward Universal Visual Question Answering Models
arXiv_CV
arXiv_CV
Image_Caption
QA
VQA
-
Learning More with Less: Conditional PGGAN-based Data Augmentation for Brain Metastases Detection Using Highly-Rough Annotation on MR images
arXiv_CV
arXiv_CV
Adversarial
GAN
CNN
Detection
VQA
-
GQA: a new dataset for compositional question answering over real-world images
arXiv_AI
arXiv_AI
QA
RNN
VQA
-
MUREL: Multimodal Relational Reasoning for Visual Question Answering
arXiv_AI
arXiv_AI
QA
Attention
Relation
VQA
-
Dual Attention Networks for Visual Reference Resolution in Visual Dialog
arXiv_CV
arXiv_CV
QA
Attention
Quantitative
Relation
VQA
-
Systematic Generalization: What Is Required and Can It Be Learned?
arXiv_CV
arXiv_CV
Knowledge
QA
VQA
-
Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering
arXiv_AI
arXiv_AI
QA
Prediction
VQA
-
Generating Natural Language Explanations for Visual Question Answering using Scene Graphs and Visual Attention
arXiv_AI
arXiv_AI
QA
Attention
Caption
Language_Model
Relation
VQA
-
Cycle-Consistency for Robust Visual Question Answering
arXiv_CV
arXiv_CV
QA
VQA
-
Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded
arXiv_CV
arXiv_CV
Image_Caption
QA
Attention
Caption
Language_Model
Prediction
VQA
-
Rethinking Visual Relationships for High-level Image Understanding
arXiv_CV
arXiv_CV
Image_Caption
Caption
Relation
VQA
-
BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection
arXiv_CV
arXiv_CV
QA
Representation_Learning
Deep_Learning
Detection
Relation
VQA
-
Visual Entailment Task for Visually-Grounded Language Learning
arXiv_CV
arXiv_CV
QA
Inference
VQA
-
Visual Entailment: A Novel Task for Fine-Grained Image Understanding
arXiv_CV
arXiv_CV
Image_Caption
QA
Attention
Inference
VQA
-
Learning a Deep Convolution Network with Turing Test Adversaries for Microscopy Image Super Resolution
arXiv_CV
arXiv_CV
Adversarial
Super_Resolution
CNN
VQA
-
Assessing Visual Quality of Omnidirectional Videos
arXiv_CV
arXiv_CV
Knowledge
QA
VQA
-
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
arXiv_CV
arXiv_CV
Knowledge
QA
Representation_Learning
VQA
Recognition
-
CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions
arXiv_CV
arXiv_CV
Object_Detection
Knowledge
Segmentation
Quantitative
Detection
VQA
-
A Novel Framework for Robustness Analysis of Visual QA Models
arXiv_CV
arXiv_CV
Adversarial
QA
Optimization
VQA
-
Multi-modal Learning with Prior Visual Relation Reasoning
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
QA
Embedding
CNN
Relation
VQA
-
Multi-task Learning of Hierarchical Vision-Language Representation
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Attention
Caption
Prediction
Relation
VQA
-
Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering
arXiv_CV
arXiv_CV
QA
Attention
Prediction
VQA
-
From Known to the Unknown: Transferring Knowledge to Answer Questions about Novel Visual and Semantic Concepts
arXiv_CV
arXiv_CV
Knowledge
QA
Attention
Embedding
Inference
VQA
-
Visual Question Answering as Reading Comprehension
arXiv_CV
arXiv_CV
Knowledge
QA
VQA
-
Explicit Bias Discovery in Visual Question Answering Models
arXiv_CV
arXiv_CV
QA
Attention
VQA
-
Overcoming Language Priors in Visual Question Answering with Adversarial Regularization
arXiv_CV
arXiv_CV
Regularization
Adversarial
QA
Relation
VQA
-
Zero-Shot Transfer VQA Dataset
arXiv_CV
arXiv_CV
Knowledge
QA
GAN
VQA
-
Learning Conditioned Graph Structures for Interpretable Visual Question Answering
arXiv_CV
arXiv_CV
Image_Caption
QA
Relation
VQA
-
TallyQA: Answering Complex Counting Questions
arXiv_CV
arXiv_CV
Object_Detection
QA
Detection
Relation
VQA
-
Gated Hierarchical Attention for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Reinforcement_Learning
Caption
CNN
Prediction
VQA
Recognition
-
Do Explanations make VQA Models more Predictable to a Human?
arXiv_CV
arXiv_CV
QA
VQA
-
Bilinear Attention Networks
arXiv_CV
arXiv_CV
QA
Attention
Quantitative
VQA
-
Knowing Where to Look? Analysis on Attention of Visual Question Answering System
arXiv_CV
arXiv_CV
QA
Attention
VQA
-
Convolutional Neural Networks for Video Quality Assessment
arXiv_CV
arXiv_CV
QA
CNN
Deep_Learning
VQA
-
Textually Enriched Neural Module Networks for Visual Question Answering
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
QA
Attention
Caption
VQA
Recognition
-
The Wisdom of MaSSeS: Majority, Subjectivity, and Semantic Similarity in the Evaluation of VQA
arXiv_CV
arXiv_CV
QA
Quantitative
VQA
-
Faithful Multimodal Explanation for Visual Question Answering
arXiv_CV
arXiv_CV
QA
VQA
-
Visual Coreference Resolution in Visual Dialog using Neural Module Networks
arXiv_CV
arXiv_CV
QA
VQA
-
Interpretable Visual Question Answering by Reasoning on Dependency Trees
arXiv_CV
arXiv_CV
QA
Attention
Relation
VQA
-
From VQA to Multimodal CQA: Adapting Visual QA Models for Community QA Tasks
arXiv_CV
arXiv_CV
Knowledge
QA
Classification
VQA
-
VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions
arXiv_CV
arXiv_CV
QA
Caption
Prediction
Quantitative
VQA
-
NMT-Keras: a Very Flexible Toolkit with a Focus on Interactive NMT and Online Learning
arXiv_CV
arXiv_CV
Video_Caption
Caption
NMT
Classification
Deep_Learning
VQA
-
Multimodal Differential Network for Visual Question Generation
arXiv_CV
arXiv_CV
Caption
Quantitative
VQA
-
Question-Guided Hybrid Convolution for Visual Question Answering
arXiv_CV
arXiv_CV
QA
Attention
Relation
VQA
-
A Joint Sequence Fusion Model for Video Question Answering and Retrieval
arXiv_CV
arXiv_CV
QA
Attention
CNN
VQA
-
Visual Reference Resolution using Attention Memory for Visual Dialog
arXiv_CV
arXiv_CV
QA
Attention
Prediction
VQA
-
Interpretable Visual Question Answering by Visual Grounding from Attention Supervision Mining
arXiv_CV
arXiv_CV
QA
Attention
Relation
VQA
-
Bridge the Gap Between VQA and Human Behavior on Omnidirectional Video: A Large-Scale Dataset and a Deep Learning Model
arXiv_CV
arXiv_CV
QA
Deep_Learning
VQA
-
A user model for JND-based video quality assessment: theory and applications
arXiv_CV
arXiv_CV
QA
Attention
VQA
-
Pythia v0.1: the Winning Entry to the VQA Challenge 2018
arXiv_CV
arXiv_CV
QA
Face
VQA
-
On the Flip Side: Identifying Counterexamples in Visual Question Answering
arXiv_CV
arXiv_CV
QA
Prediction
VQA
-
Question Relevance in Visual Question Answering
arXiv_CV
arXiv_CV
QA
VQA
-
Reciprocal Attention Fusion for Visual Question Answering
arXiv_CV
arXiv_CV
QA
Attention
Relation
VQA
-
A Dataset and Architecture for Visual Reasoning with a Working Memory
arXiv_CV
arXiv_CV
QA
Deep_Learning
VQA
-
R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering
arXiv_CV
arXiv_CV
Object_Detection
Knowledge
QA
Attention
Embedding
Detection
Relation
VQA
-
Question Type Guided Attention in Visual Question Answering
arXiv_CV
arXiv_CV
QA
Attention
VQA
Recognition
-
A JND-based Video Quality Assessment Model and Its Application
arXiv_CV
arXiv_CV
QA
VQA
-
End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features
arXiv_CV
arXiv_CV
QA
Attention
Caption
VQA
-
Learning Visual Knowledge Memory Networks for Visual Question Answering
arXiv_CV
arXiv_CV
Knowledge
QA
Embedding
Relation
Memory_Networks
VQA
-
iParaphrasing: Extracting Visually Grounded Paraphrases via an Image
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
VQA
-
CS-VQA: Visual Question Answering with Compressively Sensed Images
arXiv_CV
arXiv_CV
QA
VQA
Recognition
-
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
arXiv_CV
arXiv_CV
QA
Attention
Relation
VQA
Recognition
-
Robustness Analysis of Visual QA Models by Basic Questions
arXiv_CV
arXiv_CV
QA
Optimization
VQA
-
A Case for Variability-Aware Policies for NISQ-Era Quantum Computers
arXiv_CV
arXiv_CV
QA
VQA
-
Joint Image Captioning and Question Answering
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
QA
Caption
VQA
-
Tree Memory Networks for Modelling Long-term Temporal Dependencies
arXiv_CV
arXiv_CV
RNN
Prediction
Relation
Memory_Networks
VQA
-
VizWiz Grand Challenge: Answering Visual Questions from Blind People
arXiv_CV
arXiv_CV
QA
VQA
-
The Effects of Statistical Multiplicity of Infection on Virus Quantification and Infectivity Assays
arXiv_CV
arXiv_CV
QA
VQA
-
Generalized Hadamard-Product Fusion Operators for Visual Question Answering
arXiv_CV
arXiv_CV
QA
VQA
-
Fooling Vision and Language Models Despite Localization and Attention Mechanism
arXiv_CV
arXiv_CV
Adversarial
QA
Attention
Caption
Language_Model
VQA
-
Visual Question Reasoning on General Dependency Tree
arXiv_CV
arXiv_CV
Salient
Adversarial
Knowledge
QA
Attention
Relation
VQA
-
Two can play this Game: Visual Dialog with Discriminative Question Generation and Answering
arXiv_CV
arXiv_CV
Image_Caption
Caption
VQA
-
DVQA: Understanding Data Visualizations via Question Answering
arXiv_CV
arXiv_CV
QA
Face
VQA
-
Visual Question Answering with Memory-Augmented Networks
arXiv_CV
arXiv_CV
QA
Attention
VQA
-
Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering
arXiv_CV
arXiv_CV
Knowledge
QA
Face
Language_Model
Relation
VQA
-
Attention on Attention: Architectures for Visual Question Answering
arXiv_CV
arXiv_CV
QA
Attention
Deep_Learning
VQA
-
Inverse Visual Question Answering: A New Benchmark and VQA Diagnosis Tool
arXiv_CV
arXiv_CV
QA
Reinforcement_Learning
VQA
-
iVQA: Inverse Visual Question Answering
arXiv_CV
arXiv_CV
QA
Attention
Inference
VQA
-
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
arXiv_CV
arXiv_CV
Image_Caption
Salient
QA
Attention
Caption
VQA
-
ParlAI: A Dialog Research Software Platform
arXiv_CV
arXiv_CV
QA
Reinforcement_Learning
RNN
Memory_Networks
VQA
-
Interpretable Counting for Visual Question Answering
arXiv_CV
arXiv_CV
QA
VQA
-
Learning to Count Objects in Natural Images for Visual Question Answering
arXiv_CV
arXiv_CV
QA
Attention
VQA
-
Generating Triples with Adversarial Networks for Scene Graph Construction
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Object_Detection
Attention
GAN
Caption
Image_Classification
Classification
Deep_Learning
Detection
Relation
VQA
-
Object-based reasoning in VQA
arXiv_CV
arXiv_CV
Object_Detection
QA
Detection
VQA
-
Game of Sketches: Deep Recurrent Models of Pictionary-style Word Guessing
arXiv_CV
arXiv_CV
QA
VQA
-
Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
QA
Attention
Caption
Inference
Detection
VQA
-
Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering
arXiv_CV
arXiv_CV
Image_Caption
QA
Attention
CNN
VQA
-
Co-attending Free-form Regions and Detections with Multi-modal Multiplicative Feature Embedding for Visual Question Answering
arXiv_CV
arXiv_CV
QA
Attention
Embedding
Detection
VQA
-
Learning by Asking Questions
arXiv_CV
arXiv_CV
QA
VQA
-
Incorporating External Knowledge to Answer Open-Domain Visual Questions with Dynamic Memory Networks
arXiv_CV
arXiv_CV
Knowledge_Graph
Knowledge
QA
Dynamic_Memory_Network
Attention
Relation
Memory_Networks
VQA
-
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
arXiv_CV
arXiv_CV
QA
Attention
RNN
VQA
-
Visual Question Answering as a Meta Learning Task
arXiv_CV
arXiv_CV
QA
VQA
-
High-Order Attention Models for Visual Question Answering
arXiv_CV
arXiv_CV
QA
Attention
Relation
VQA
-
Active Learning for Visual Question Answering: An Empirical Study
arXiv_CV
arXiv_CV
QA
VQA
-
Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks
arXiv_CV
arXiv_CV
QA
Embedding
VQA
Recognition
-
Optimally Stopped Variational Quantum Algorithms
arXiv_CV
arXiv_CV
QA
Optimization
VQA
-
It Takes Two to Tango: Towards Theory of AI's Mind
arXiv_CV
arXiv_CV
Knowledge
QA
Attention
Prediction
VQA
-
Survey of Recent Advances in Visual Question Answering
arXiv_CV
arXiv_CV
QA
Survey
VQA
-
Visual Question Generation as Dual Task of Visual Question Answering
arXiv_CV
arXiv_CV
QA
Relation
VQA
-
Exploring Human-like Attention Supervision in Visual Question Answering
arXiv_CV
arXiv_CV
QA
Attention
VQA
-
Speech-Based Visual Question Answering
arXiv_CV
arXiv_CV
QA
Speech_Recognition
VQA
Recognition
-
An Analysis of Visual Question Answering Algorithms
arXiv_CV
arXiv_CV
QA
Attention
GAN
VQA
-
VQABQ: Visual Question Answering by Basic Questions
arXiv_CV
arXiv_CV
QA
Optimization
VQA
-
The Promise of Premise: Harnessing Question Premises in Visual Question Answering
arXiv_CV
arXiv_CV
QA
Prediction
Detection
Relation
VQA
-
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
arXiv_CV
arXiv_CV
QA
Segmentation
Attention
Semantic_Segmentation
Language_Model
VQA
-
Beyond Bilinear: Generalized Multi-modal Factorized High-order Pooling for Visual Question Answering
arXiv_CV
arXiv_CV
QA
Attention
Prediction
Relation
VQA
-
Learning to Disambiguate by Asking Discriminative Questions
arXiv_CV
arXiv_CV
Image_Caption
Weakly_Supervised
Caption
Quantitative
VQA
-
Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge
arXiv_CV
arXiv_CV
QA
Attention
Embedding
VQA
-
FVQA: Fact-based Visual Question Answering
arXiv_CV
arXiv_CV
Knowledge
QA
Attention
Relation
VQA
-
Structured Attentions for Visual Question Answering
arXiv_CV
arXiv_CV
QA
Attention
Inference
Relation
VQA
-
Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering
arXiv_CV
arXiv_CV
QA
Attention
VQA
-
A Simple Loss Function for Improving the Convergence and Accuracy of Visual Question Answering Models
arXiv_CV
arXiv_CV
QA
Attention
Deep_Learning
VQA
-
End-to-End Instance Segmentation with Recurrent Attention
arXiv_CV
arXiv_CV
Image_Caption
Segmentation
Attention
Caption
CNN
Semantic_Segmentation
RNN
Prediction
VQA
-
Effective Approaches to Batch Parallelization for Dynamic Neural Network Architectures
arXiv_CV
arXiv_CV
Sparse
QA
VQA
-
Compact Tensor Pooling for Visual Question Answering
arXiv_CV
arXiv_CV
QA
Embedding
RNN
VQA
-
Visual Question Answering: Datasets, Algorithms, and Future Challenges
arXiv_CV
arXiv_CV
Image_Caption
Review
QA
Deep_Learning
VQA
-
Automatic Generation of Grounded Visual Questions
arXiv_CV
arXiv_CV
Knowledge
Caption
VQA
-
Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
VQA
-
MUTAN: Multimodal Tucker Fusion for Visual Question Answering
arXiv_CV
arXiv_CV
QA
Relation
VQA
-
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
arXiv_CV
arXiv_CV
Image_Caption
QA
VQA
-
Survey of Visual Question Answering: Datasets and Techniques
arXiv_CV
arXiv_CV
QA
Attention
Survey
Deep_Learning
VQA
-
Counting Everyday Objects in Everyday Scenes
arXiv_CV
arXiv_CV
Object_Detection
QA
Detection
VQA
-
DualNet: Domain-Invariant Network for Visual Question Answering
arXiv_CV
arXiv_CV
QA
Attention
Embedding
VQA
-
C-VQA: A Compositional Split of the Visual Question Answering v1.0 Dataset
arXiv_CV
arXiv_CV
QA
Attention
Deep_Learning
Relation
VQA
-
What's in a Question: Using Visual Questions as a Form of Supervision
arXiv_CV
arXiv_CV
QA
Quantitative
VQA
-
Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering
arXiv_CV
arXiv_CV
QA
VQA
-
Graph-Structured Representations for Visual Question Answering
arXiv_CV
arXiv_CV
QA
RNN
VQA
-
Hadamard Product for Low-rank Bilinear Pooling
arXiv_CV
arXiv_CV
QA
Segmentation
Attention
VQA
Recognition
-
Multimodal Compact Bilinear Pooling for Multimodal Neural Machine Translation
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
VQA
-
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
arXiv_CV
arXiv_CV
Adversarial
QA
Attention
Reinforcement_Learning
Caption
CNN
Image_Classification
Classification
Prediction
VQA
-
Dual Attention Networks for Multimodal Reasoning and Matching
arXiv_CV
arXiv_CV
QA
Attention
Inference
VQA
-
Task-driven Visual Saliency and Attention-based Visual Question Answering
arXiv_CV
arXiv_CV
Salient
QA
Attention
RNN
Relation
VQA
-
Grad-CAM: Why did you say that?
arXiv_CV
arXiv_CV
Image_Caption
QA
Caption
CNN
Prediction
Relation
VQA
-
Hierarchical Question-Image Co-Attention for Visual Question Answering
arXiv_CV
arXiv_CV
QA
Attention
CNN
VQA
-
Leveraging Video Descriptions to Learn Video Question Answering
arXiv_CV
arXiv_CV
Video_Caption
QA
VQA
-
Image Captioning and Visual Question Answering Based on Attributes and External Knowledge
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Caption
CNN
RNN
VQA
-
The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions
arXiv_CV
arXiv_CV
QA
Segmentation
Attention
Detection
VQA
-
VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question Answering
arXiv_CV
arXiv_CV
QA
CNN
VQA
-
Ask Your Neurons: A Deep Learning Approach to Visual Question Answering
arXiv_CV
arXiv_CV
Image_Caption
QA
Deep_Learning
VQA
-
Revisiting Visual Question Answering Baselines
arXiv_CV
arXiv_CV
Image_Caption
QA
Attention
Classification
VQA
-
Zero-Shot Visual Question Answering
arXiv_CV
arXiv_CV
QA
Embedding
VQA
-
VQA: Visual Question Answering
arXiv_CV
arXiv_CV
Image_Caption
QA
Caption
VQA
-
Proposing Plausible Answers for Open-ended Visual Question Answering
arXiv_CV
arXiv_CV
QA
VQA
-
Open-Ended Visual Question-Answering
arXiv_CV
arXiv_CV
QA
Embedding
CNN
RNN
Deep_Learning
VQA
-
Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
Quantitative
VQA
-
Tutorial on Answering Questions about Images with Deep Learning
arXiv_CV
arXiv_CV
QA
RNN
Deep_Learning
VQA
-
Training Recurrent Answering Units with Joint Loss Minimization for VQA
arXiv_CV
arXiv_CV
Knowledge
QA
Attention
Inference
Prediction
VQA
-
Analyzing the Behavior of Visual Question Answering Models
arXiv_CV
arXiv_CV
QA
Attention
VQA
-
Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions
arXiv_CV
arXiv_CV
QA
Caption
RNN
VQA
-
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
arXiv_CV
arXiv_CV
QA
Attention
VQA
-
The Color of the Cat is Gray: 1 Million Full-Sentences Visual Question Answering
arXiv_CV
arXiv_CV
QA
Caption
VQA
-
Towards Transparent AI Systems: Interpreting Visual Question Answering Models
arXiv_CV
arXiv_CV
QA
Attention
Quantitative
VQA
-
Leveraging Visual Question Answering for Image-Caption Ranking
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Knowledge
QA
Caption
VQA
-
Measuring Machine Intelligence Through Visual Question Answering
arXiv_CV
arXiv_CV
Image_Caption
Caption
VQA
-
Visual Question: Predicting If a Crowd Will Agree on the Answer
arXiv_CV
arXiv_CV
QA
VQA
-
Solving Visual Madlibs with Multiple Cues
arXiv_CV
arXiv_CV
QA
Classification
Prediction
Relation
VQA
-
Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering
arXiv_CV
arXiv_CV
QA
CNN
Classification
Prediction
Relation
VQA
-
Visual Question Answering: A Survey of Methods and Datasets
arXiv_CV
arXiv_CV
Review
Knowledge
QA
Attention
Face
Survey
CNN
RNN
VQA
-
Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?
arXiv_CV
arXiv_CV
QA
Attention
Face
Quantitative
Relation
VQA
-
Improved Techniques for Training GANs
arXiv_CV
arXiv_CV
Adversarial
GAN
Classification
VQA
-
Generating Natural Questions About an Image
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Caption
Inference
VQA
-
What value do explicit high level concepts have in vision to language problems?
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
RNN
VQA
-
Subjective Assessment of H.264 Compressed Stereoscopic Video
arXiv_CV
arXiv_CV
QA
Attention
Relation
VQA
-
Yin and Yang: Balancing and Answering Binary Visual Questions
arXiv_CV
arXiv_CV
QA
VQA
Recognition
-
Ask Me Anything: Free-form Visual Question Answering Based on Knowledge from External Sources
arXiv_CV
arXiv_CV
Knowledge
QA
VQA
-
A Focused Dynamic Attention Model for Visual Question Answering
arXiv_CV
arXiv_CV
Object_Detection
QA
Attention
RNN
Detection
VQA
Recognition
-
ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering
arXiv_CV
arXiv_CV
QA
Attention
CNN
Deep_Learning
VQA
-
Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection
arXiv_CV
arXiv_CV
Image_Caption
Attention
Speech_Recognition
Caption
Classification
Detection
VQA
Recognition
-
Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering
arXiv_CV
arXiv_CV
Image_Caption
QA
Attention
Caption
CNN
Inference
RNN
Memory_Networks
VQA
-
Dynamic Memory Networks for Visual and Textual Question Answering
arXiv_CV
arXiv_CV
Dynamic_Memory_Network
Attention
Memory_Networks
VQA
-
Where To Look: Focus Regions for Visual Question Answering
arXiv_CV
arXiv_CV
Knowledge
QA
VQA
-
A Restricted Visual Turing Test for Deep Scene and Event Understanding
arXiv_CV
arXiv_CV
Video_Caption
Knowledge
Face
Ontology
Caption
Inference
VQA
-
Simple Baseline for Visual Question Answering
arXiv_CV
arXiv_CV
QA
RNN
VQA
-
Neural Self Talk: Image Understanding via Continuous Questioning and Answering
arXiv_CV
arXiv_CV
Image_Caption
QA
CNN
RNN
VQA
-
Compositional Memory for Visual Question Answering
arXiv_CV
arXiv_CV
QA
Attention
RNN
Deep_Learning
VQA