Caption
-
Lattice-based lightly-supervised acoustic model training
arXiv_CL
arXiv_CL
Speech_Recognition
Caption
Language_Model
Recognition
-
Vision-to-Language Tasks Based on Attributes and Attention Mechanism
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
Relation
VQA
-
On Measuring Gender Bias in Translation of Gender-neutral Pronouns
arXiv_CL
arXiv_CL
Image_Caption
Caption
Detection
Recommendation
-
Bivariate Beta LSTM
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Caption
Image_Classification
RNN
Classification
Relation
-
SuperCaptioning: Image Captioning Using Two-dimensional Word Embedding
arXiv_CL
arXiv_CL
Image_Caption
Text_Classification
Caption
Embedding
Image_Classification
Classification
-
Application of Machine Learning in Fiber Nonlinearity Modeling and Monitoring for Elastic Optical Networks
arXiv_CV
arXiv_CV
Caption
-
Image Captioning based on Deep Learning Methods: A Survey
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Attention
Caption
Survey
Deep_Learning
-
Implications of Computer Vision Driven Assistive Technologies Towards Individuals with Visual Impairment
arXiv_CV
arXiv_CV
Face
Caption
-
Multimodal Transformer with Multi-View Visual Representation for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
CNN
RNN
Quantitative
-
Harvesting Information from Captions for Weakly Supervised Semantic Segmentation
arXiv_CV
arXiv_CV
Image_Caption
Segmentation
Weakly_Supervised
Caption
Embedding
CNN
Semantic_Segmentation
-
VICSOM: VIsual Clues from SOcial Media for psychological assessment
arXiv_CV
arXiv_CV
Caption
CNN
Classification
Language_Model
Recognition
-
Aligning Visual Regions and Textual Concepts: Learning Fine-Grained Image Representations for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Text_Generation
Caption
Relation
-
End to End Recognition System for Recognizing Offline Unconstrained Vietnamese Handwriting
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
RNN
Language_Model
Recognition
-
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
arXiv_CV
arXiv_CV
Image_Caption
Regularization
Caption
CNN
Classification
Detection
-
On Flow Profile Image for Video Representation
arXiv_CV
arXiv_CV
Video_Caption
Caption
Optimization
Video_Classification
Classification
Recognition
-
Exact Adversarial Attack to Image Captioning via Structured Output Learning with Latent Variables
arXiv_AI
arXiv_AI
Image_Caption
Adversarial
Caption
RNN
-
Memory-Attended Recurrent Network for Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Caption
-
Learning Representations for Predicting Future Activities
arXiv_AI
arXiv_AI
Caption
Embedding
Prediction
-
Multimodal Semantic Attention Network for Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
RNN
Classification
-
Deep Blind Video Decaptioning by Temporal Aggregation and Recurrence
arXiv_CV
arXiv_CV
Caption
Deep_Learning
Quantitative
-
Image Captioning with Clause-Focused Metrics in a Multi-Modal Setting for Marketing
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
CNN
Optimization
RNN
Relation
-
PR Product: A Substitute for Inner Product in Neural Networks
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
Image_Classification
RNN
Classification
Deep_Learning
-
Hierarchical Recurrent Neural Network for Video Summarization
arXiv_CV
arXiv_CV
Video_Caption
Summarization
Caption
RNN
Classification
-
UniVSE: Robust Visual Semantic Embeddings via Structured Semantic Representations
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Caption
Embedding
Relation
-
Knowing When to Stop: Evaluation and Verification of Conformity to Output-size Specifications
arXiv_AI
arXiv_AI
Image_Caption
Caption
-
Scene Graph Prediction with Limited Labels
arXiv_CV
arXiv_CV
Sparse
Knowledge
Caption
Transfer_Learning
Prediction
Relation
VQA
-
Pointing Novel Objects in Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Attention
Caption
RNN
Recognition
-
On the Contributions of Visual and Textual Supervision in Low-resource Semantic Speech Retrieval
arXiv_CL
arXiv_CL
Caption
-
nocaps: Novel Object Captioning at Scale
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Caption
Detection
-
Tripping through time: Efficient Localization of Activities in Videos
arXiv_CV
arXiv_CV
Attention
Reinforcement_Learning
Caption
Classification
-
BERTScore: Evaluating Text Generation with BERT
arXiv_CL
arXiv_CL
Image_Caption
Text_Generation
Caption
Embedding
-
Deep Metric Learning Beyond Binary Supervision
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Caption
Relation
-
3G structure for image caption generation
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
Embedding
RNN
Relation
-
Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts
arXiv_CV
arXiv_CV
Image_Caption
Caption
Detection
Relation
-
Knowledge-rich Image Gist Understanding Beyond Literal Meaning
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Caption
Detection
-
Learning to Collocate Neural Modules for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Robust Change Captioning
arXiv_CV
arXiv_CV
Attention
Caption
-
Unsupervised Discovery of Multimodal Links in Multi-Image, Multi-Sentence Documents
arXiv_CV
arXiv_CV
Caption
Relation
-
Natural Language Statistical Features of LSTM-generated Texts
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Quantitative
Relation
-
Self-critical n-step Training for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Sparse
Reinforcement_Learning
Caption
-
Big but Imperceptible Adversarial Perturbations via Semantic Manipulation
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Caption
Image_Classification
Classification
Deep_Learning
-
Intention Oriented Image Captions with Guiding Objects
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
-
Unified Visual-Semantic Embeddings: Bridging Vision and Language with Structured Meaning Representations
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Caption
Embedding
Relation
-
Membership Inference Attacks on Sequence-to-Sequence Models
arXiv_CL
arXiv_CL
Video_Caption
Caption
Inference
-
Fast, Diverse and Accurate Image Captioning Guided By Part-of-Speech
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
GAN
Caption
-
Exploring Uncertainty Measures for Image-Caption Embedding-and-Retrieval Task
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
Classification
Deep_Learning
-
Streamlined Dense Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Reinforcement_Learning
Caption
-
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
arXiv_CV
arXiv_CV
Video_Caption
Caption
-
Unsupervised Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Knowledge
Caption
Detection
-
Measuring scheduling efficiency of RNNs for NLP applications
arXiv_CV
arXiv_CV
Image_Caption
Speech_Recognition
Caption
Optimization
Inference
RNN
Recognition
-
Evaluating Text-to-Image Matching using Binary Image Selection
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Caption
-
Actively Seeking and Learning from Live Data
arXiv_CV
arXiv_CV
QA
Face
Caption
VQA
-
Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction
arXiv_CV
arXiv_CV
Image_Caption
Reinforcement_Learning
Caption
Prediction
-
An End-to-End Baseline for Video Captioning
arXiv_AI
arXiv_AI
Video_Caption
Attention
Caption
Action_Recognition
CNN
RNN
Recognition
-
VideoBERT: A Joint Model for Video and Language Representation Learning
arXiv_CV
arXiv_CV
Video_Caption
Speech_Recognition
Caption
Representation_Learning
Classification
Language_Model
Quantitative
Recognition
-
Good News, Everyone! Context driven entity-aware captioning for news images
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Caption
Relation
-
Context and Attribute Grounded Dense Captioning
arXiv_CV
arXiv_CV
Caption
-
Multi-source weak supervision for saliency detection
arXiv_CV
arXiv_CV
Salient
Attention
Weakly_Supervised
Caption
Classification
Prediction
Detection
-
Object Hallucination in Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
Classification
-
Modeling Acoustic-Prosodic Cues for Word Importance Prediction in Spoken Dialogues
arXiv_CL
arXiv_CL
Speech_Recognition
Caption
Prediction
Recognition
-
Describing like humans: on diversity in image captioning
arXiv_CV
arXiv_CV
Image_Caption
Reinforcement_Learning
Caption
-
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Weakly_Supervised
Caption
-
Learning semantic sentence representations from visually grounded language without lexical knowledge
arXiv_CL
arXiv_CL
Image_Caption
Knowledge
Caption
Embedding
-
AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Style_Transfer
Caption
Detection
-
Unpaired Image Captioning via Scene Graph Alignments
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Caption
-
End-to-End Learning Using Cycle Consistency for Image-to-Caption Transformations
arXiv_CV
arXiv_CV
Caption
-
Scene Understanding for Autonomous Manipulation with Deep Learning
arXiv_CV
arXiv_CV
Object_Detection
Segmentation
Caption
Deep_Learning
Detection
-
Learning to Caption Images through a Lifetime by Asking Questions
arXiv_CV
arXiv_CV
Knowledge
Caption
-
Engaging Image Captioning Via Personality
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Boosted Attention: Leveraging Human Attention for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
-
Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition
arXiv_CV
arXiv_CV
Image_Caption
Attention
Speech_Recognition
Caption
CNN
Language_Model
Recognition
-
A Weighted Multi-Criteria Decision Making Approach for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
-
Show, Translate and Tell
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
-
Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
Prediction
Relation
-
'Hang in There': Lexical and Visual Analysis to Identify Posts Warranting Empathetic Responses
arXiv_CL
arXiv_CL
Sentiment
Caption
-
Generating Diverse and Accurate Visual Captions by Comparative Adversarial Learning
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Caption
-
Image captioning with weakly-supervised attention penalty
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
-
A Synchronized Multi-Modal Attention-Caption Dataset and Analysis
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
CNN
Relation
-
Dixit: Interactive Visual Storytelling via Term Manipulation
arXiv_CL
arXiv_CL
Image_Caption
Caption
RNN
-
M-VAD Names: a Dataset for Video Captioning with Naming
arXiv_CV
arXiv_CV
Video_Caption
Caption
-
COMIC: Towards A Compact Image Captioning Model with Attention
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
Embedding
-
Learning from Multiview Correlations in Open-Domain Videos
arXiv_CV
arXiv_CV
Caption
Representation_Learning
Relation
-
Insertion-based Decoding with automatically Inferred Generation Order
arXiv_CL
arXiv_CL
Image_Caption
Caption
-
Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Object_Detection
Caption
CNN
RNN
Language_Model
Detection
-
SPEECH-COCO: 600k Visually Grounded Spoken Captions Aligned to MSCOCO Data Set
arXiv_CV
arXiv_CV
Caption
-
Using Deep Object Features for Image Descriptions
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
Language_Model
-
Audio Caption: Listen and Tell
arXiv_CL
arXiv_CL
Image_Caption
Caption
Classification
Detection
Relation
-
Deep CNN-based Speech Balloon Detection and Segmentation for Comic Books
arXiv_CV
arXiv_CV
Segmentation
Caption
CNN
Detection
-
Generating Natural Language Explanations for Visual Question Answering using Scene Graphs and Visual Attention
arXiv_AI
arXiv_AI
QA
Attention
Caption
Language_Model
Relation
VQA
-
Contextual Memory Trees
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
Classification
-
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Actions Generation from Captions
arXiv_CV
arXiv_CV
Adversarial
Attention
GAN
Caption
-
Wasserstein Barycenter Model Ensembling
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
Classification
-
Improving Image Captioning with Conditional Generative Adversarial Nets
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Caption
RNN
-
Attend More Times for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
RNN
-
Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded
arXiv_CV
arXiv_CV
Image_Caption
QA
Attention
Caption
Language_Model
Prediction
VQA
-
A sequential guiding network with attention for image captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
CNN
RNN
Deep_Learning
-
Area Attention
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
-
Rethinking Visual Relationships for High-level Image Understanding
arXiv_CV
arXiv_CV
Image_Caption
Caption
Relation
VQA
-
Self-Supervised Visual Representations for Cross-Modal Retrieval
arXiv_CV
arXiv_CV
Object_Detection
Caption
Image_Classification
Classification
Detection
Relation
-
Improving Image Captioning by Leveraging Knowledge Graphs
arXiv_CV
arXiv_CV
Image_Caption
Knowledge_Graph
Knowledge
Caption
-
Face-Cap: Image Captioning using Facial Expression Analysis
arXiv_CV
arXiv_CV
Image_Caption
Face
Caption
Relation
-
Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos
arXiv_CV
arXiv_CV
Reinforcement_Learning
Caption
-
Binary Image Selection : Interpretable Evaluation of Visual Grounding
arXiv_AI
arXiv_AI
Image_Caption
Caption
-
How to Become Instagram Famous: Post Popularity Prediction with Dual-Attention
arXiv_CV
arXiv_CV
Image_Caption
Attention
Face
Caption
Classification
Prediction
-
Improving Sequence-to-Sequence Learning via Optimal Transport
arXiv_CL
arXiv_CL
Image_Caption
Summarization
Caption
-
COCO-CN for Cross-Lingual Image Tagging, Captioning and Retrieval
arXiv_CV
arXiv_CV
Caption
Recommendation
-
Image Based Review Text Generation with Emotional Guidance
arXiv_AI
arXiv_AI
Image_Caption
Review
Text_Generation
Caption
-
Predicting the Mumble of Wireless Channel with Sequence-to-Sequence Models
arXiv_AI
arXiv_AI
Image_Caption
Summarization
Caption
Language_Model
Prediction
-
Image Captioning Based on a Hierarchical Attention Mechanism and Policy Gradient Optimization
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Attention
GAN
Reinforcement_Learning
Caption
Optimization
Inference
RNN
Deep_Learning
-
Viewpoint Invariant Change Captioning
arXiv_AI
arXiv_AI
Caption
-
DeepBase: Deep Inspection of Neural Networks
arXiv_CV
arXiv_CV
Image_Caption
Face
Caption
Optimization
Deep_Learning
Recognition
-
MultiDEC: Multi-Modal Clustering of Image-Caption Pairs
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Amortized Context Vector Inference for Sequence-to-Sequence Networks
arXiv_CV
arXiv_CV
Video_Caption
Attention
Summarization
Caption
Inference
-
Generating Multiple Objects at Spatially Distinct Locations
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
GAN
Caption
-
Transfer learning from language models to image caption generators: Better models may not transfer better
arXiv_CL
arXiv_CL
Image_Caption
Caption
Embedding
CNN
Transfer_Learning
Language_Model
-
End-to-End Video Captioning with Multitask Reinforcement Learning
arXiv_CV
arXiv_CV
Video_Caption
Knowledge
Reinforcement_Learning
Caption
CNN
RNN
-
Not All Words are Equal: Video-specific Information Loss for Video Captioning
arXiv_CV
arXiv_CV
Salient
Video_Caption
Attention
Caption
Relation
Recognition
-
Hierarchical LSTMs with Adaptive Attention for Visual Captioning
arXiv_CV
arXiv_CV
Image_Caption
Video_Caption
Attention
Caption
RNN
Language_Model
-
Multilevel Language and Vision Integration for Text-to-Clip Retrieval
arXiv_CV
arXiv_CV
Caption
-
Joint Event Detection and Description in Continuous Video Streams
arXiv_CV
arXiv_CV
Video_Caption
Caption
CNN
Detection
Relation
-
Symbolic inductive bias for visually grounded learning of spoken language
arXiv_CV
arXiv_CV
Caption
Embedding
-
nocaps: novel object captioning at scale
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Caption
Detection
-
Generating Diverse and Meaningful Captions
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Caption
-
Multi Instance Learning For Unbalanced Data
arXiv_CV
arXiv_CV
Caption
Classification
-
Feature Fusion Effects of Tensor Product Representation on Compositional Network for Caption Generation for Images
arXiv_CV
arXiv_CV
Image_Caption
Caption
Language_Model
Relation
-
Grounded Video Description
arXiv_CV
arXiv_CV
Image_Caption
Caption
Recognition
-
Adversarial Inference for Multi-Sentence Video Description
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Video_Caption
GAN
Caption
Inference
-
CNN Fixations: An unraveling approach to visualize the discriminative image regions
arXiv_CV
arXiv_CV
Caption
CNN
Classification
Prediction
Detection
Recognition
-
Auto-Encoding Scene Graphs for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
Relation
-
Weakly Supervised Dense Event Captioning in Videos
arXiv_CV
arXiv_CV
Weakly_Supervised
Caption
-
An Attempt towards Interpretable Audio-Visual Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Caption
CNN
-
Conditional Video Generation Using Action-Appearance Captions
arXiv_CV
arXiv_CV
Adversarial
GAN
Caption
Quantitative
-
Multi-task Learning of Hierarchical Vision-Language Representation
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Attention
Caption
Prediction
Relation
VQA
-
Turbo Learning for Captionbot and Drawingbot
arXiv_CV
arXiv_CV
Image_Caption
Text_Generation
Caption
-
On the effectiveness of task granularity for transfer learning
arXiv_CV
arXiv_CV
Caption
Transfer_Learning
Video_Classification
Classification
-
Towards Task Understanding in Visual Settings
arXiv_CV
arXiv_CV
Image_Caption
Ontology
Text_Generation
Caption
CNN
-
Partially-Supervised Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Caption
RNN
Detection
-
Learning to discover and localize visual objects with open vocabulary
arXiv_CV
arXiv_CV
Object_Detection
Weakly_Supervised
Caption
Detection
Relation
-
Senti-Attend: Image Captioning using Sentiment and Attention
arXiv_CV
arXiv_CV
Image_Caption
Sentiment
Attention
Caption
-
Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Knowledge
Caption
Embedding
-
Data Augmentation using Random Image Cropping and Patching for Deep CNNs
arXiv_CV
arXiv_CV
Image_Caption
Regularization
Caption
CNN
Classification
-
An Interpretable Model for Scene Graph Generation
arXiv_CV
arXiv_CV
Image_Caption
QA
Caption
Detection
Relation
-
Scene Graph Generation via Conditional Random Fields
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Object_Detection
QA
Segmentation
Caption
Detection
Relation
-
A task in a suit and a tie: paraphrase generation with semantic augmentation
arXiv_CV
arXiv_CV
Caption
RNN
-
Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Caption
Action_Recognition
CNN
Classification
Deep_Learning
Prediction
Recognition
-
AttS2S-VC: Sequence-to-Sequence Voice Conversion with Attention and Context Preservation Mechanisms
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
Recognition
-
Entity-aware Image Caption Generation
arXiv_CV
arXiv_CV
Image_Caption
Knowledge_Graph
Knowledge
Caption
CNN
Inference
RNN
Memory_Networks
-
Y^2Seq2Seq: Cross-Modal Representation Learning for 3D Shape and Text by Joint Reconstruction and Prediction of View and Word Sequences
arXiv_CV
arXiv_CV
Caption
Representation_Learning
Prediction
-
To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression
arXiv_CV
arXiv_CV
Attention
Caption
RNN
-
A Corpus for Reasoning About Natural Language Grounded in Photographs
arXiv_CV
arXiv_CV
Caption
Relation
-
Attentive Tensor Product Learning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
RNN
Deep_Learning
-
Semantic speech retrieval with a visually grounded model of untranscribed speech
arXiv_CV
arXiv_CV
Caption
-
Gated Hierarchical Attention for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Reinforcement_Learning
Caption
CNN
Prediction
VQA
Recognition
-
Middle-Out Decoding
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
-
A Neural Compositional Paradigm for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Image-to-Video Person Re-Identification by Reusing Cross-modal Embeddings
arXiv_CV
arXiv_CV
Image_Caption
Re-identification
Video_Caption
Person_Re-identification
Caption
Embedding
RNN
-
Cross-Modal and Hierarchical Modeling of Video and Text
arXiv_CV
arXiv_CV
Video_Caption
Caption
Action_Recognition
Embedding
Recognition
-
Image Captioning as Neural Machine Translation Task in SOCKEYE
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
CNN
RNN
-
Bringing back simplicity and lightliness into neural image captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
-
UMONS Submission for WMT18 Multimodal Translation Task
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
A Comprehensive Survey of Deep Learning for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Review
Caption
Survey
Deep_Learning
Relation
-
Quantifying the amount of visual information used by neural caption generators
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Pre-gen metrics: Predicting caption quality metrics without generating captions
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
CanvasGAN: A simple baseline for text to image generation by incrementally patching a canvas
arXiv_CV
arXiv_CV
Attention
GAN
Image_Generation
Caption
Embedding
-
Unseen Action Recognition with Multimodal Learning
arXiv_CV
arXiv_CV
Adversarial
Caption
Action_Recognition
Embedding
Classification
Recognition
-
Vector Learning for Cross Domain Representations
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Video_Caption
GAN
Caption
-
Semantically Invariant Text-to-Image Generation
arXiv_CV
arXiv_CV
Image_Caption
Caption
Quantitative
-
Batch-normalized Recurrent Highway Networks
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
-
A Neural-Symbolic Approach to Design of CAPTCHA
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Deep_Learning
-
Fast and Simple Mixture of Softmaxes with BPE and Hybrid-LightRNN for Language Generation
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
-
Textually Enriched Neural Module Networks for Visual Question Answering
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
QA
Attention
Caption
VQA
Recognition
-
Multimodal Dual Attention Memory for Video Story Question Answering
arXiv_CV
arXiv_CV
QA
Attention
Caption
Inference
-
Lessons learned in multilingual grounded language learning
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
-
C4Synth: Cross-Caption Cycle-Consistent Text-to-Image Synthesis
arXiv_CV
arXiv_CV
Caption
Quantitative
-
Towards Accountable AI: Hybrid Human-Machine Analyses for Characterizing System Failure
arXiv_CV
arXiv_CV
Image_Caption
Caption
Relation
-
MTLE: A Multitask Learning Encoder of Visual Feature Representations for Video and Movie Description
arXiv_CV
arXiv_CV
Video_Caption
Knowledge
Caption
RNN
-
Exploring Visual Relationship for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
CNN
RNN
Relation
-
Improving Reinforcement Learning Based Image Captioning with Natural Language Prior
arXiv_CV
arXiv_CV
Image_Caption
Reinforcement_Learning
Caption
Quantitative
-
Image Captioning based on Deep Reinforcement Learning
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Reinforcement_Learning
Caption
RNN
-
End-to-end Image Captioning Exploits Multimodal Distributional Similarity
arXiv_CV
arXiv_CV
Image_Caption
Text_Generation
Caption
RNN
-
Evaluating Multimodal Representations on Sentence Similarity: vSTS, Visual Semantic Textual Similarity Dataset
arXiv_CV
arXiv_CV
Caption
Quantitative
-
SPASS: Scientific Prominence Active Search System with Deep Image Captioning Network
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
BFGAN: Backward and Forward Generative Adversarial Networks for Lexically Constrained Sentence Generation
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Knowledge
GAN
Caption
RNN
-
Neural Network Interpretation via Fine Grained Textual Summarization
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Summarization
Caption
Inference
Classification
Prediction
-
Hierarchical Video Understanding
arXiv_CV
arXiv_CV
Video_Caption
Caption
-
Diverse and Coherent Paragraph Generation from Images
arXiv_CV
arXiv_CV
Image_Caption
Summarization
Caption
-
Chittron: An Automatic Bangla Image Captioning System
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
RNN
Language_Model
-
Approximate Distribution Matching for Sequence-to-Sequence Learning
arXiv_CV
arXiv_CV
Image_Caption
Summarization
Caption
Optimization
RNN
Prediction
-
When to Finish? Optimal Beam Search for Neural Text Generation
arXiv_CV
arXiv_CV
Image_Caption
Summarization
Text_Generation
Caption
-
LUCSS: Language-based User-customized Colourization of Scene Sketches
arXiv_CV
arXiv_CV
Segmentation
Caption
Relation
-
Hard Non-Monotonic Attention for Character-Level Transduction
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
-
Multi-Reference Training with Pseudo-References for Neural Translation and Text Generation
arXiv_CV
arXiv_CV
Image_Caption
Summarization
Text_Generation
Caption
-
Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
A neural attention model for speech command recognition
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
CNN
Recognition
-
simNet: Stepwise Image-Topic Merging Network for Generating Detailed and Comprehensive Image Captions
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
-
VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions
arXiv_CV
arXiv_CV
QA
Caption
Prediction
Quantitative
VQA
-
The ActivityNet Large-Scale Activity Recognition Challenge 2018 Summary
arXiv_CV
arXiv_CV
GAN
Caption
Recognition
-
Context-Aware Visual Policy Network for Sequence-Level Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Reinforcement_Learning
Caption
Prediction
Relation
-
Fully-Convolutional Point Networks for Large-Scale Point Clouds
arXiv_CV
arXiv_CV
Segmentation
GAN
Caption
CNN
Prediction
Relation
-
Deep Multimodal Image-Repurposing Detection
arXiv_CV
arXiv_CV
Knowledge
GAN
Caption
Detection
-
Learning from #Barcelona Instagram data what Locals and Tourists post about its Neighbourhoods
arXiv_CV
arXiv_CV
Caption
Relation
-
NMT-Keras: a Very Flexible Toolkit with a Focus on Interactive NMT and Online Learning
arXiv_CV
arXiv_CV
Video_Caption
Caption
NMT
Classification
Deep_Learning
VQA
-
Text-to-Image-to-Text Translation using Cycle Consistent Adversarial Networks
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
GAN
Caption
-
DenseRAN for Offline Handwritten Chinese Character Recognition
arXiv_CV
arXiv_CV
Attention
Caption
RNN
Deep_Learning
Recognition
-
Multimodal Differential Network for Visual Question Generation
arXiv_CV
arXiv_CV
Caption
Quantitative
VQA
-
Decoupled Novel Object Captioner
arXiv_CV
arXiv_CV
Image_Caption
Caption
Detection
-
Dropout during inference as a model for neurological degeneration in an image captioning network
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
-
Choose Your Neuron: Incorporating Domain Knowledge through Neuron-Importance
arXiv_CV
arXiv_CV
Knowledge
Caption
CNN
Classification
Prediction
-
SketchyScene: Richly-Annotated Scene Sketches
arXiv_CV
arXiv_CV
Image_Retrieval
Segmentation
Caption
Semantic_Segmentation
-
Recurrent Fusion Network for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
RNN
-
Doubly Attentive Transformer Machine Translation
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
CNN
NMT
-
ADVISE: Symbolism and External Knowledge for Decoding Advertisements
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
GAN
Caption
Recognition
-
'Factual' or 'Emotional': Stylized Image Captioning with Adaptive Learning and Attention
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Attention
Caption
RNN
-
VSE++: Improving Visual-Semantic Embeddings with Hard Negatives
arXiv_CV
arXiv_CV
Image_Retrieval
Caption
Embedding
Prediction
-
Move Forward and Tell: A Progressive Generator of Video Descriptions
arXiv_CV
arXiv_CV
Video_Caption
Caption
Embedding
-
Rethinking the Form of Latent States in Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
-
Distinctive-attribute Extraction for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
RNN
-
Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Equal But Not The Same: Understanding the Implicit Relationship Between Persuasive Images and Text
arXiv_CV
arXiv_CV
Image_Caption
GAN
Caption
Relation
-
Inductive Visual Localisation: Factorised Training for Superior Generalisation
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Recognition
-
What is not where: the challenge of integrating spatial representations into deep learning architectures
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Knowledge
Caption
Deep_Learning
Language_Model
Detection
Relation
-
Deep Reinforcement Learning For Sequence to Sequence Models
arXiv_CV
arXiv_CV
Image_Caption
Attention
Summarization
Reinforcement_Learning
Caption
Survey
-
Unpaired Image Captioning by Language Pivoting
arXiv_CV
arXiv_CV
Image_Caption
Caption
Quantitative
-
Predicting Visual Features from Text for Image and Video Caption Retrieval
arXiv_CV
arXiv_CV
Video_Caption
Caption
Embedding
CNN
-
Topic-Guided Attention for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
Quantitative
-
No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling
arXiv_CV
arXiv_CV
Adversarial
Face
Reinforcement_Learning
Caption
-
Learning The Sequential Temporal Information with Recurrent Neural Networks
arXiv_CV
arXiv_CV
Image_Caption
Review
Speech_Recognition
Tracking
Caption
Object_Tracking
RNN
Language_Model
Prediction
Recognition
-
Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
RNN
Language_Model
Prediction
-
Women also Snowboard: Overcoming Bias in Captioning Models
arXiv_CV
arXiv_CV
Image_Caption
Caption
Prediction
-
End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features
arXiv_CV
arXiv_CV
QA
Attention
Caption
VQA
-
YH Technologies at ActivityNet Challenge 2018
arXiv_CV
arXiv_CV
Caption
Action_Recognition
Recognition
-
Multimedia Semantic Integrity Assessment Using Joint Embedding Of Images And Text
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
Representation_Learning
Deep_Learning
Quantitative
-
Learning Visually-Grounded Semantics from Contrastive Adversarial Samples
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Knowledge
Caption
Embedding
Quantitative
-
Best Vision Technologies Submission to ActivityNet Challenge 2018-Task: Dense-Captioning Events in Videos
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
Inference
RNN
-
RUC+CMU: System Report for Dense Captioning Events in Videos
arXiv_CV
arXiv_CV
Video_Caption
Caption
-
Learning to Evaluate Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Face
Caption
Relation
-
iParaphrasing: Extracting Visually Grounded Paraphrases via an Image
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
VQA
-
Discriminability objective for training descriptive captions
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Dank Learning: Generating Memes Using Deep Neural Networks
arXiv_CV
arXiv_CV
Attention
Caption
Embedding
RNN
-
Learn from Your Neighbor: Learning Multi-modal Mappings from Sparse Annotations
arXiv_CV
arXiv_CV
Sparse
Caption
Classification
Prediction
-
Learning Visually Grounded Sentence Representations
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Caption
Embedding
-
Improved Image Captioning with Adversarial Semantic Alignment
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
GAN
Caption
RNN
Relation
-
Grow and Prune Compact, Fast, and Accurate LSTMs
arXiv_CV
arXiv_CV
Image_Caption
Speech_Recognition
Caption
RNN
Recognition
-
Extracting Scientific Figures with Distantly Supervised Neural Networks
arXiv_CV
arXiv_CV
Caption
Detection
-
Neural Joking Machine: Humorous image captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
-
Quantifying the visual concreteness of words and topics in multimodal datasets
arXiv_CV
arXiv_CV
Image_Caption
Caption
Relation
Recommendation
-
CNN+CNN: Convolutional Decoders for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
CNN
RNN
-
Joint Image Captioning and Question Answering
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
QA
Caption
VQA
-
Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Caption
CNN
RNN
-
Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention
arXiv_CV
arXiv_CV
Image_Caption
Salient
Attention
Caption
CNN
RNN
Prediction
Quantitative
-
SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text
arXiv_CV
arXiv_CV
Image_Caption
Caption
Language_Model
-
Defoiling Foiled Image Captions
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Generating Continuous Representations of Medical Texts
arXiv_CV
arXiv_CV
Adversarial
Caption
RNN
Quantitative
-
Token-level and sequence-level loss smoothing for RNN language models
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Language_Model
Prediction
-
Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition
arXiv_CV
arXiv_CV
Image_Caption
Caption
Action_Recognition
RNN
Prediction
Recognition
-
Pragmatically Informative Image Captioning with Character-Level Inference
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
-
ECO: Efficient Convolutional Network for Online Video Understanding
arXiv_CV
arXiv_CV
Video_Caption
Caption
CNN
Classification
Relation
-
Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos
arXiv_CV
arXiv_CV
Video_Caption
Caption
Video_Classification
Classification
-
Object Counts! Bringing Explicit Detections Back into Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Caption
Embedding
Language_Model
Detection
-
Jointly Localizing and Describing Events for Dense Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Caption
Optimization
Detection
-
To Create What You Tell: Generating Videos from Captions
arXiv_CV
arXiv_CV
Adversarial
GAN
Caption
Embedding
Deep_Learning
Quantitative
-
Delete, Retrieve, Generate: A Simple Approach to Sentiment and Style Transfer
arXiv_CV
arXiv_CV
Image_Caption
Sentiment
Review
Adversarial
Style_Transfer
Caption
-
Learning to Color from Language
arXiv_CV
arXiv_CV
Caption
-
Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
-
Imagine This! Scripts to Compositions to Videos
arXiv_CV
arXiv_CV
Knowledge
Caption
-
Discovery and usage of joint attention in images
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
Detection
-
Vision as an Interlingua: Learning Multilingual Semantic Embeddings of Untranscribed Speech
arXiv_CV
arXiv_CV
Speech_Recognition
Caption
Embedding
Recognition
-
Learning a Text-Video Embedding from Incomplete and Heterogeneous Data
arXiv_CV
arXiv_CV
Face
Caption
Embedding
-
Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
RNN
-
Finding beans in burgers: Deep semantic-visual embedding with localization
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
Relation
-
Fooling Vision and Language Models Despite Localization and Attention Mechanism
arXiv_CV
arXiv_CV
Adversarial
QA
Attention
Caption
Language_Model
VQA
-
Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input
arXiv_CV
arXiv_CV
Object_Detection
Segmentation
Caption
Detection
-
FlipDial: A Generative Model for Two-Way Visual Dialogue
arXiv_CV
arXiv_CV
Caption
CNN
-
Learning to Guide Decoding for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Caption
Prediction
-
End-to-End Dense Video Captioning with Masked Transformer
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
-
Guide Me: Interacting with Deep Networks
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
Inference
-
Reconstruction Network for Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Caption
-
Two can play this Game: Visual Dialog with Discriminative Question Generation and Answering
arXiv_CV
arXiv_CV
Image_Caption
Caption
VQA
-
Video Captioning via Hierarchical Reinforcement Learning
arXiv_CV
arXiv_CV
Video_Caption
Reinforcement_Learning
Caption
-
Radical analysis network for zero-shot learning in printed Chinese character recognition
arXiv_CV
arXiv_CV
Attention
Caption
CNN
RNN
Recognition
-
Fraternal Dropout
arXiv_CV
arXiv_CV
Image_Caption
Regularization
Caption
Inference
RNN
Language_Model
Prediction
-
COCO-Stuff: Thing and Stuff Classes in Context
arXiv_CV
arXiv_CV
Image_Caption
Segmentation
Attention
Face
Caption
Semantic_Segmentation
Classification
Detection
Relation
-
Neural Baby Talk
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Caption
Detection
-
The Effect of Pets on Happiness: A Large-scale Multi-Factor Analysis using Social Multimedia
arXiv_CV
arXiv_CV
Sentiment
Face
Caption
Transfer_Learning
Classification
Deep_Learning
Detection
Face_Detection
Relation
Recognition
-
Bayesian Recurrent Neural Networks
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Language_Model
-
Attend and Interact: Higher-Order Object Interactions for Video Understanding
arXiv_CV
arXiv_CV
Video_Caption
Knowledge
Caption
Action_Recognition
Detection
Relation
Recognition
-
Object Captioning and Retrieval with Natural Language
arXiv_CV
arXiv_CV
Caption
Inference
RNN
-
Approximate Query Matching for Image Retrieval
arXiv_CV
arXiv_CV
Image_Retrieval
Segmentation
Caption
Semantic_Segmentation
Relation
Recognition
-
Stack-Captioning: Coarse-to-Fine Learning for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Reinforcement_Learning
Caption
Inference
Prediction
-
Where to put the Image in an Image Caption Generator
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Language_Model
-
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
arXiv_CV
arXiv_CV
Image_Caption
Salient
QA
Attention
Caption
VQA
-
Improved Image Captioning via Policy Gradient optimization of SPIDEr
arXiv_CV
arXiv_CV
Image_Caption
Caption
Optimization
-
Excitation Backprop for RNNs
arXiv_CV
arXiv_CV
Salient
Video_Caption
Caption
Action_Recognition
RNN
Classification
Prediction
Recognition
-
Decoupled Spatial Neural Attention for Weakly Supervised Semantic Segmentation
arXiv_CV
arXiv_CV
Image_Caption
Segmentation
Attention
Weakly_Supervised
Caption
Semantic_Segmentation
-
Less Is More: Picking Informative Frames for Video Captioning
arXiv_CV
arXiv_CV
Salient
Video_Caption
Attention
Caption
-
Contextually Customized Video Summaries via Natural Language
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
-
Twin Networks: Matching the Future for Sequence Generation
arXiv_CV
arXiv_CV
Speech_Recognition
Caption
Inference
RNN
Recognition
-
ChatPainter: Improving Text to Image Generation using Dialogue
arXiv_CV
arXiv_CV
Caption
-
Multimodal Named Entity Recognition for Short Social Media Posts
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
RNN
Recognition
-
Disjoint Multi-task Learning between Heterogeneous Human-centric Tasks
arXiv_CV
arXiv_CV
Caption
Optimization
Classification
-
Human Action Adverb Recognition: ADHA Dataset and A Three-Stream Hybrid Model
arXiv_CV
arXiv_CV
Image_Caption
Caption
Action_Recognition
Recognition
-
A Corpus for Modeling Word Importance in Spoken Dialogue Transcripts
arXiv_CV
arXiv_CV
Speech_Recognition
Caption
Classification
Relation
Recognition
-
Zero-Resource Neural Machine Translation with Multi-Agent Communication Game
arXiv_CV
arXiv_CV
Image_Caption
Caption
NMT
-
Generating Triples with Adversarial Networks for Scene Graph Construction
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Object_Detection
Attention
GAN
Caption
Image_Classification
Classification
Deep_Learning
Detection
Relation
VQA
-
Multimodal Image Captioning for Marketing Analysis
arXiv_CV
arXiv_CV
Image_Caption
Caption
Classification
Relation
-
Attention-Based Models for Text-Dependent Speaker Verification
arXiv_CV
arXiv_CV
Image_Caption
Attention
Summarization
Speech_Recognition
Caption
RNN
Recognition
-
Netizen-Style Commenting on Fashion Photos: Dataset and Diversity Measures
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Image Captioning at Will: A Versatile Scheme for Effectively Injecting Sentiments into Image Descriptions
arXiv_CV
arXiv_CV
Image_Caption
Sentiment
Attention
Caption
-
Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
QA
Attention
Caption
Inference
Detection
VQA
-
Trajectory-based Radical Analysis Network for Online Handwritten Chinese Character Recognition
arXiv_CV
arXiv_CV
Attention
Caption
RNN
Deep_Learning
Recognition
-
Erratum: 'Determining neutron star masses and radii using energy-resolved waveforms of X-ray burst oscillations'
arXiv_CV
arXiv_CV
Face
Caption
-
Describing Semantic Representations of Brain Activity Evoked by Visual Stimuli
arXiv_CV
arXiv_CV
Image_Caption
Caption
Deep_Learning
Quantitative
Relation
-
Image Captioning using Deep Neural Architectures
arXiv_CV
arXiv_CV
Image_Caption
Caption
Recognition
-
DeepSeek: Content Based Image Search & Retrieval
arXiv_CV
arXiv_CV
Image_Caption
Face
Caption
Deep_Learning
Language_Model
-
Approximate FPGA-based LSTMs under Computation Time Constraints
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Quantitative
-
GeoSeq2Seq: Information Geometric Sequence-to-Sequence Networks
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
RNN
-
Consensus-based Sequence Training for Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Reinforcement_Learning
Caption
-
Exploring Models and Data for Remote Sensing Image Caption Generation
arXiv_CV
arXiv_CV
Image_Caption
Review
Attention
Caption
Classification
Detection
-
Order-Free RNN with Visual Attention for Multi-Label Classification
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Attention
Caption
Inference
RNN
Classification
Prediction
-
Synthesizing Novel Pairs of Image and Text
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
GAN
Caption
-
Visually Grounded Word Embeddings and Richer Visual Features for Improving Multimodal Neural Machine Translation
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Caption
Embedding
CNN
NMT
Detection
-
Tensor Product Generation Networks for Deep NLP Modeling
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Deep_Learning
-
OSU Multimodal Machine Translation System Report
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Predicting Yelp Star Reviews Based on Network Structure with Deep Learning
arXiv_CV
arXiv_CV
Review
Caption
Image_Classification
Classification
Deep_Learning
-
Integrating both Visual and Audio Cues for Enhanced Video Caption
arXiv_CV
arXiv_CV
Video_Caption
Caption
Inference
-
Long Text Generation via Adversarial Training with Leaked Information
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
GAN
Text_Generation
Reinforcement_Learning
Caption
-
Evaluating the Usability of Automatically Generated Captions for People who are Deaf or Hard of Hearing
arXiv_CV
arXiv_CV
Speech_Recognition
Caption
Relation
Recognition
-
Actor-Critic Sequence Training for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Reinforcement_Learning
Caption
-
Convolutional Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
RNN
-
Towards Automatic Learning of Procedures from Web Instructional Videos
arXiv_CV
arXiv_CV
Video_Caption
Segmentation
Caption
-
Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
-
A framework for Multi-A/B testing with online FDR control
arXiv_CV
arXiv_CV
Caption
-
AI Challenger: A Large-scale Dataset for Going Deeper in Image Understanding
arXiv_CV
arXiv_CV
Image_Caption
Caption
Classification
Detection
-
Adaptive Feature Abstraction for Translating Video to Text
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
CNN
Quantitative
-
Grounded Objects and Interactions for Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Caption
-
Deep Matching Autoencoders
arXiv_CV
arXiv_CV
Image_Caption
GAN
Caption
Represenation_Learning
-
Self-critical Sequence Training for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Reinforcement_Learning
Caption
Optimization
Inference
-
DataVizard: Recommending Visual Presentations for Structured Data
arXiv_CV
arXiv_CV
Image_Caption
Caption
Survey
-
Phrase-based Image Captioning with Hierarchical LSTM Model
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
RNN
-
Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Caption
-
Learning Hard Alignments with Variational Inference
arXiv_CV
arXiv_CV
Attention
Speech_Recognition
Caption
Inference
Recognition
-
Evaluation of Automatic Video Captioning Using Direct Assessment
arXiv_CV
arXiv_CV
Video_Caption
Caption
-
Softmax Q-Distribution Estimation for Structured Prediction: A Theoretical Interpretation for RAML
arXiv_CV
arXiv_CV
Image_Caption
Caption
Classification
Prediction
Relation
Recognition
-
Automated Audio Captioning with Recurrent Neural Networks
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Classification
-
A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
Classification
-
Attentive Semantic Video Generation using Captions
arXiv_CV
arXiv_CV
Style_Transfer
Caption
Action_Recognition
Recognition
-
Sync-DRAW: Automatic Video Generation using Deep Recurrent Attentive Architectures
arXiv_CV
arXiv_CV
Knowledge
Attention
Caption
-
From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Caption
RNN
-
Describing Natural Images Containing Novel Objects with Knowledge Guided Assitance
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Attention
Caption
Inference
Recognition
-
Cold-Start Reinforcement Learning with Softmax Policy Gradient
arXiv_CV
arXiv_CV
Image_Caption
Summarization
Reinforcement_Learning
Caption
Prediction
-
Bollywood Movie Corpus for Text, Images and Videos
arXiv_CV
arXiv_CV
Face
Caption
Relation
-
Protein identification with deep learning: from abc to xyz
arXiv_CV
arXiv_CV
GAN
Caption
CNN
RNN
Deep_Learning
-
Contrastive Learning for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Using Human Brain Activity to Guide Machine Learning
arXiv_CV
arXiv_CV
Caption
CNN
Recognition
-
Scene Graph Generation from Objects, Phrases and Region Captions
arXiv_CV
arXiv_CV
Object_Detection
Caption
Detection
Relation
-
Self-Guiding Multimodal LSTM - when we do not have a perfect training dataset for image captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
-
A KL-LUCB Bandit Algorithm for Large-Scale Crowdsourcing
arXiv_CV
arXiv_CV
Caption
-
Neural Extractive Summarization with Side Information
arXiv_CV
arXiv_CV
Image_Caption
Attention
Summarization
Caption
-
Learning the Enigma with Recurrent Neural Networks
arXiv_CV
arXiv_CV
Image_Caption
Speech_Recognition
Caption
RNN
Recognition
-
SketchParse: Towards Rich Descriptions for Poorly Drawn Sketches using Multi-Task Hierarchical Deep Networks
arXiv_CV
arXiv_CV
Image_Retrieval
Caption
CNN
Inference
Prediction
-
Generating Video Descriptions with Topic Guidance
arXiv_CV
arXiv_CV
Image_Caption
Video_Caption
Caption
Prediction
-
Video Captioning with Guidance of Multimodal Latent Topics
arXiv_CV
arXiv_CV
Video_Caption
Caption
Prediction
-
What is the Role of Recurrent Neural Networks in an Image Caption Generator?
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
-
Areas of Attention for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Attention
Caption
CNN
RNN
Language_Model
Detection
-
Cold Fusion: Training Seq2Seq Models Together with Language Models
arXiv_CV
arXiv_CV
Image_Caption
Attention
Speech_Recognition
Caption
Language_Model
Recognition
-
Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
RNN
Recognition
-
ConvNet Architecture Search for Spatiotemporal Feature Learning
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Segmentation
NAS
Caption
Semantic_Segmentation
Inference
Detection
-
Fluency-Guided Cross-Lingual Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Caption
Inference
-
Towards Diverse and Natural Image Descriptions via a Conditional GAN
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
GAN
Reinforcement_Learning
Caption
RNN
-
MAT: A Multimodal Attentive Translator for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
CNN
RNN
-
Learning to Disambiguate by Asking Discriminative Questions
arXiv_CV
arXiv_CV
Image_Caption
Weakly_Supervised
Caption
Quantitative
VQA
-
Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues
arXiv_CV
arXiv_CV
Attention
Caption
Inference
Detection
Relation
-
Multi-Task Video Captioning with Video and Entailment Generation
arXiv_CV
arXiv_CV
Video_Caption
Knowledge
Caption
Prediction
-
Mining fine-grained opinions on closed captions of YouTube videos with an attention-RNN
arXiv_CV
arXiv_CV
Sentiment
Review
Attention
Sentiment_Classification
Caption
RNN
Classification
-
Dense Captioning with Joint Inference and Visual Context
arXiv_CV
arXiv_CV
Caption
Inference
-
Reinforced Video Captioning with Entailment Rewards
arXiv_CV
arXiv_CV
Video_Caption
Reinforcement_Learning
Caption
-
Learning Visual N-Grams from Web Data
arXiv_CV
arXiv_CV
Image_Retrieval
Caption
CNN
Language_Model
Prediction
Recognition
-
Recurrent Models for Situation Recognition
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Prediction
Recognition
-
Paying Attention to Descriptions Generated by Image Captioning Models
arXiv_CV
arXiv_CV
Image_Caption
Salient
Attention
Caption
Language_Model
-
An Empirical Study of Language CNN for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Language_Model
-
Generative Semantic Manipulation with Contrasting GAN
arXiv_CV
arXiv_CV
Adversarial
GAN
Style_Transfer
Caption
Quantitative
-
Context-aware Captions from Context-agnostic Supervision
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
Language_Model
-
Deep Interactive Region Segmentation and Captioning
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Knowledge
Segmentation
Caption
CNN
Deep_Learning
Detection
-
OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Attention
Caption
RNN
Language_Model
Detection
Relation
-
Captioning Images with Diverse Objects
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Caption
Embedding
Recognition
-
Guided Open Vocabulary Image Captioning with Constrained Beam Search
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
Prediction
-
Supervising Neural Attention Models for Video Captioning by Human Gaze Data
arXiv_CV
arXiv_CV
Video_Caption
Attention
Tracking
Caption
Prediction
-
CUNI System for the WMT17 Multimodal Translation Task
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
End-to-End Instance Segmentation with Recurrent Attention
arXiv_CV
arXiv_CV
Image_Caption
Segmentation
Attention
Caption
CNN
Semantic_Segmentation
RNN
Prediction
VQA
-
Where to Play: Retrieval of Video Segments using Natural-Language Queries
arXiv_CV
arXiv_CV
Image_Caption
Tracking
Caption
Quantitative
Relation
-
Learning from Ambiguously Labeled Face Images
arXiv_CV
arXiv_CV
Knowledge
Face
Caption
-
archivist: An R Package for Managing, Recording and Restoring Data Analysis Results
arXiv_CV
arXiv_CV
Tracking
Caption
Relation
-
A Semi-supervised Framework for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Salient
Review
Attention
Caption
Embedding
-
Using Artificial Tokens to Control Languages for Multilingual Image Caption Generation
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
VideoMCC: a New Benchmark for Video Comprehension
arXiv_CV
arXiv_CV
Video_Caption
GAN
Caption
Quantitative
-
One Model To Learn Them All
arXiv_CV
arXiv_CV
Image_Caption
Sparse
Attention
Speech_Recognition
Caption
CNN
Image_Classification
Classification
Deep_Learning
Recognition
-
The 'something something' video database for learning and evaluating visual common sense
arXiv_CV
arXiv_CV
Knowledge
Caption
Classification
Prediction
-
Image Captioning with Object Detection and Localization
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Attention
Caption
RNN
Detection
Relation
-
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
Language_Model
-
Teaching Machines to Describe Images via Natural Language Feedback
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
RNN
Language_Model
-
I2T2I: Learning Text to Image Synthesis with Textual Data Augmentation
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
GAN
Caption
Transfer_Learning
RNN
-
Order embeddings and character-level convolutions for multimodal alignment
arXiv_CV
arXiv_CV
Caption
Embedding
CNN
RNN
Recognition
-
Automatic Generation of Grounded Visual Questions
arXiv_CV
arXiv_CV
Knowledge
Caption
VQA
-
Visually grounded learning of keyword prediction from untranscribed speech
arXiv_CV
arXiv_CV
Caption
Prediction
-
Deep image representations using caption generators
arXiv_CV
arXiv_CV
Image_Caption
Caption
Transfer_Learning
Deep_Learning
Recognition
-
Learning Word-Like Units from Joint Audio-Visual Analysis
arXiv_CV
arXiv_CV
Speech_Recognition
Caption
Recognition
-
Attention-based Natural Language Person Retrieval
arXiv_CV
arXiv_CV
Segmentation
Attention
Caption
CNN
Image_Classification
RNN
Classification
Deep_Learning
-
Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
VQA
-
CHAM: action recognition using convolutional hierarchical attention model
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
Action_Recognition
CNN
RNN
Recognition
-
Weakly-supervised Visual Grounding of Phrases with Linguistic Structures
arXiv_CV
arXiv_CV
Attention
Caption
-
FOIL it! Find One mismatch between Image and Language caption
arXiv_CV
arXiv_CV
Caption
Classification
Detection
Relation
-
STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
-
Attend to You: Personalized Image Captioning with Context Sequence Memory Networks
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Caption
Prediction
Quantitative
Memory_Networks
-
Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries
arXiv_CV
arXiv_CV
Caption
Language_Model
Detection
-
Temporal Tessellation: A Unified Approach for Video Analysis
arXiv_CV
arXiv_CV
Video_Caption
Summarization
Caption
Prediction
Detection
-
Spatial Memory for Context Reasoning in Object Detection
arXiv_CV
arXiv_CV
Object_Detection
Caption
Detection
Relation
-
Top-down Visual Saliency Guided by Captions
arXiv_CV
arXiv_CV
Salient
Video_Caption
Attention
Caption
Classification
-
Deep Reinforcement Learning-based Image Captioning with Embedding Reward
arXiv_CV
arXiv_CV
Image_Caption
Reinforcement_Learning
Caption
Embedding
Prediction
-
Learning a Deep Embedding Model for Zero-Shot Learning
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
-
Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space
arXiv_CV
arXiv_CV
Image_Caption
Face
Caption
Classification
-
SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
CNN
Prediction
-
A Hierarchical Approach for Generating Descriptive Image Paragraphs
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
ViP-CNN: Visual Phrase Guided Convolutional Neural Network
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Attention
Caption
CNN
Detection
Relation
Recognition
-
Hierarchical Boundary-Aware Neural Encoder for Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
RNN
-
Weakly Supervised Dense Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Weakly_Supervised
Caption
CNN
Language_Model
-
AMC: Attention guided Multi-modal Correlation Learning for Image Search
arXiv_CV
arXiv_CV
Attention
Caption
Relation
-
Improving Interpretability of Deep Neural Networks with Semantic Information
arXiv_CV
arXiv_CV
Video_Caption
Caption
Action_Recognition
Prediction
Recognition
-
Semantic Compositional Networks for Visual Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Quantitative
-
Multimodal Compact Bilinear Pooling for Multimodal Neural Machine Translation
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
VQA
-
Recurrent Memory Addressing for describing videos
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
Embedding
Memory_Networks
-
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
arXiv_CV
arXiv_CV
Adversarial
QA
Attention
Reinforcement_Learning
Caption
CNN
Image_Classification
Classification
Prediction
VQA
-
Can Active Memory Replace Attention?
arXiv_CV
arXiv_CV
Image_Caption
Attention
Speech_Recognition
Caption
Image_Classification
Classification
Deep_Learning
Recognition
-
Cats and Captions vs. Creators and the Clock: Comparing Multimodal Content to Context in Predicting Relative Popularity
arXiv_CV
arXiv_CV
Attention
Caption
-
Evolving Deep Neural Networks
arXiv_CV
arXiv_CV
Image_Caption
Caption
Deep_Learning
Language_Model
Recognition
-
An Actor-Critic Algorithm for Sequence Prediction
arXiv_CV
arXiv_CV
Reinforcement_Learning
Caption
Prediction
-
MIML-FCN+: Multi-instance Multi-label Learning via Fully Convolutional Networks with Privileged Information
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
Deep_Learning
Relation
Recognition
-
DSD: Dense-Sparse-Dense Training for Deep Neural Networks
arXiv_CV
arXiv_CV
Sparse
Speech_Recognition
Caption
Image_Classification
Optimization
Inference
RNN
Classification
Recognition
-
Soft + Hardwired Attention: An LSTM Framework for Human Trajectory Prediction and Abnormal Event Detection
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
RNN
Prediction
Detection
-
Grad-CAM: Why did you say that?
arXiv_CV
arXiv_CV
Image_Caption
QA
Caption
CNN
Prediction
Relation
VQA
-
Deep Network Guided Proof Search
arXiv_CV
arXiv_CV
Image_Caption
Speech_Recognition
Caption
Deep_Learning
Detection
Recognition
-
Towards Music Captioning: Generating Music Playlist Descriptions
arXiv_CV
arXiv_CV
Caption
Recommendation
-
Comprehension-guided referring expressions
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Re-evaluating Automatic Metrics for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
Relation
-
Web-based Semantic Similarity for Emotion Recognition in Web Objects
arXiv_CV
arXiv_CV
Sentiment
Caption
Quantitative
Relation
Recognition
-
Image Captioning and Visual Question Answering Based on Attributes and External Knowledge
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Caption
CNN
RNN
VQA
-
Beyond Holistic Object Recognition: Enriching Image Understanding with Part States
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
Recognition
-
Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering
arXiv_CV
arXiv_CV
Image_Caption
Salient
Attention
Caption
Inference
RNN
-
Spatial Pyramid Convolutional Neural Network for Social Event Detection in Static Image
arXiv_CV
arXiv_CV
Image_Caption
GAN
Caption
CNN
Detection
Recommendation
-
Text-guided Attention Model for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
-
Annotation Order Matters: Recurrent Image Annotator for Arbitrary Length Image Tagging
arXiv_CV
arXiv_CV
Image_Caption
Caption
Relation
-
Video Captioning with Multi-Faceted Attention
arXiv_CV
arXiv_CV
Salient
Video_Caption
Attention
Face
Caption
RNN
-
Bidirectional Multirate Reconstruction for Temporal Modeling in Videos
arXiv_CV
arXiv_CV
Video_Caption
Caption
Detection
-
On Human Intellect and Machine Failures: Troubleshooting Integrative Machine Learning Systems
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Watch What You Just Said: Image Captioning with Text-Conditional Attention
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
Embedding
RNN
Language_Model
Quantitative
-
Video Captioning with Transferred Semantic Attributes
arXiv_CV
arXiv_CV
Video_Caption
Caption
CNN
RNN
-
Attention Correctness in Neural Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
Deep_Learning
Quantitative
-
Multimodal Memory Modelling for Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
CNN
RNN
Deep_Learning
-
Semantic Regularisation for Recurrent Image Annotation
arXiv_CV
arXiv_CV
Image_Caption
Face
Caption
Embedding
RNN
Classification
Relation
-
Sort Story: Sorting Jumbled Images and Captions into Stories
arXiv_CV
arXiv_CV
Image_Caption
QA
Summarization
Caption
Prediction
-
Boosting Image Captioning with Attributes
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
RNN
Relation
-
Review Networks for Caption Generation
arXiv_CV
arXiv_CV
Image_Caption
Review
Attention
Caption
RNN
-
VQA: Visual Question Answering
arXiv_CV
arXiv_CV
Image_Caption
QA
Caption
VQA
-
Generating captions without looking beyond objects
arXiv_CV
arXiv_CV
Image_Caption
Caption
Language_Model
-
Spatio-Temporal Attention Models for Grounded Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
Image_Classification
Classification
Recognition
-
Learning What and Where to Draw
arXiv_CV
arXiv_CV
Adversarial
GAN
Face
Caption
-
Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
Quantitative
VQA
-
Variational Autoencoder for Deep Learning of Images, Labels and Captions
arXiv_CV
arXiv_CV
Caption
CNN
Deep_Learning
-
Learning Language-Visual Embedding for Movie Understanding with Natural-Language
arXiv_CV
arXiv_CV
Knowledge
Caption
Embedding
Language_Model
-
Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions
arXiv_CV
arXiv_CV
QA
Caption
RNN
VQA
-
Reasoning About Pragmatics with Neural Listeners and Speakers
arXiv_CV
arXiv_CV
Caption
Inference
-
Resolving Language and Vision Ambiguities Together: Joint Segmentation & Prepositional Attachment Resolution in Captioned Scenes
arXiv_CV
arXiv_CV
Segmentation
Caption
Semantic_Segmentation
-
Deep Learning for Video Classification and Captioning
arXiv_CV
arXiv_CV
Review
Video_Caption
Caption
Video_Classification
Classification
Deep_Learning
-
The Color of the Cat is Gray: 1 Million Full-Sentences Visual Question Answering
arXiv_CV
arXiv_CV
QA
Caption
VQA
-
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Caption
Embedding
Detection
-
Oracle performance for visual captioning
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
Language_Model
-
Multimodal Attention for Neural Machine Translation
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
NMT
-
Title Generation for User Generated Videos
arXiv_CV
arXiv_CV
Salient
Video_Caption
Object_Detection
Attention
Caption
Prediction
Detection
-
Leveraging Visual Question Answering for Image-Caption Ranking
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Knowledge
QA
Caption
VQA
-
Measuring Machine Intelligence Through Visual Question Answering
arXiv_CV
arXiv_CV
Image_Caption
Caption
VQA
-
Utilizing Large Scale Vision and Text Datasets for Image Segmentation from Referring Expressions
arXiv_CV
arXiv_CV
Image_Caption
Segmentation
Caption
Language_Model
-
Learning to generalize to new compositions in image understanding
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Prediction
-
Scan, Attend and Read: End-to-End Handwritten Paragraph Recognition with MDLSTM Attention
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Segmentation
Attention
Speech_Recognition
Caption
RNN
Recognition
-
Seeing with Humans: Gaze-Assisted Neural Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Salient
Attention
Caption
Relation
Recognition
-
Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation
arXiv_CV
arXiv_CV
Video_Caption
Caption
-
DeepDiary: Automatic Caption Generation for Lifelogging Image Streams
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
GAN
Caption
Deep_Learning
Quantitative
-
SPICE: Semantic Propositional Image Caption Evaluation
arXiv_CV
arXiv_CV
Image_Caption
Caption
Relation
-
MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Face
Caption
Classification
Recognition
Face_Recognition
-
Image Captioning with Deep Bidirectional LSTMs
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Attention
Caption
Embedding
CNN
RNN
Detection
-
Domain Adaptation for Neural Networks by Parameter Augmentation
arXiv_CV
arXiv_CV
Caption
RNN
-
Bridge Correlational Neural Networks for Multilingual Multimodal Representation Learning
arXiv_CV
arXiv_CV
Caption
Transfer_Learning
Represenation_Learning
Classification
Relation
-
Is a Picture Worth Ten Thousand Words in a Review Dataset?
arXiv_CV
arXiv_CV
Review
Caption
Deep_Learning
Quantitative
Recommendation
-
A Correlational Encoder Decoder Architecture for Pivot Based Sequence Generation
arXiv_CV
arXiv_CV
GAN
Caption
Relation
-
Bidirectional Long-Short Term Memory for Video Description
arXiv_CV
arXiv_CV
Video_Caption
Sparse
Knowledge
Attention
Caption
CNN
RNN
Language_Model
-
Multimodal Pivots for Image Caption Translation
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Caption
CNN
-
Generating Natural Questions About an Image
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Caption
Inference
VQA
-
Automated Image Captioning for Rapid Prototyping and Resource Constrained Environments
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
Deep_Learning
Detection
-
Storytelling of Photo Stream with Bidirectional Multi-thread Recurrent Neural Network
arXiv_CV
arXiv_CV
Video_Caption
Caption
RNN
-
Beyond Caption To Narrative: Video Captioning With Multiple Sentences
arXiv_CV
arXiv_CV
Image_Caption
Video_Caption
Caption
-
Improving Image Captioning by Concept-based Sentence Reranking
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Language_Model
Detection
-
What value do explicit high level concepts have in vision to language problems?
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
RNN
VQA
-
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data
arXiv_CV
arXiv_CV
Image_Caption
Video_Caption
Knowledge
Caption
Recognition
-
Visual Storytelling
arXiv_CV
arXiv_CV
Caption
-
Seeing through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels
arXiv_CV
arXiv_CV
Image_Caption
Caption
Image_Classification
Classification
-
TGIF: A New Dataset and Benchmark on Animated GIF Description
arXiv_CV
arXiv_CV
Caption
RNN
-
Natural Language Object Retrieval
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Knowledge
Caption
-
Generation and Comprehension of Unambiguous Object Descriptions
arXiv_CV
arXiv_CV
Image_Caption
Caption
Deep_Learning
-
Fusing Audio, Textual and Visual Features for Sentiment Analysis of News Videos
arXiv_CV
arXiv_CV
Sentiment
Caption
Classification
Recognition
-
Automatic Annotation of Structured Facts in Images
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
Embedding
RNN
-
Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection
arXiv_CV
arXiv_CV
Image_Caption
Attention
Speech_Recognition
Caption
Classification
Detection
VQA
Recognition
-
Rich Image Captioning in the Wild
arXiv_CV
arXiv_CV
Image_Caption
Caption
Recognition
-
Dense Image Representation with Spatial Pyramid VLAD Coding of CNN for Locally Robust Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
RNN
Classification
-
Generating Visual Explanations
arXiv_CV
arXiv_CV
Reinforcement_Learning
Caption
Classification
Language_Model
Prediction
Recognition
-
Learning to Read Chest X-Rays: Recurrent Neural Cascade Model for Automated Image Annotation
arXiv_CV
arXiv_CV
Image_Caption
Regularization
GAN
Caption
CNN
RNN
Deep_Learning
-
BreakingNews: Article Annotation by Image and Text Processing
arXiv_CV
arXiv_CV
Image_Retrieval
Caption
Transfer_Learning
Deep_Learning
Prediction
Detection
Relation
-
Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering
arXiv_CV
arXiv_CV
Image_Caption
QA
Attention
Caption
CNN
Inference
RNN
Memory_Networks
VQA
-
Delving Deeper into Convolutional Networks for Learning Video Representations
arXiv_CV
arXiv_CV
Video_Caption
Sparse
Caption
Action_Recognition
CNN
Recognition
-
Multi-task Sequence to Sequence Learning
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Order-Embeddings of Images and Language
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
Prediction
-
Generating Images from Captions with Attention
arXiv_CV
arXiv_CV
Attention
Image_Generation
Caption
-
Beyond Temporal Pooling: Recurrence and Temporal Convolutions for Gesture Recognition in Video
arXiv_CV
arXiv_CV
Image_Caption
Speech_Recognition
Caption
Video_Classification
RNN
Classification
Recognition
-
Implicit Distortion and Fertility Models for Attention-based Encoder-Decoder NMT Model
arXiv_CV
arXiv_CV
Image_Caption
Attention
Speech_Recognition
Caption
NMT
Recognition
-
Event Specific Multimodal Pattern Mining with Image-Caption Pairs
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Weakly_Supervised
Caption
-
A Restricted Visual Turing Test for Deep Scene and Event Understanding
arXiv_CV
arXiv_CV
Video_Caption
Knowledge
Face
Ontology
Caption
Inference
VQA
-
SentiCap: Generating Image Descriptions with Sentiments
arXiv_CV
arXiv_CV
Image_Caption
Regularization
Sentiment
Caption
Language_Model
Relation
Recognition
-
Video captioning with recurrent networks based on frame- and video-level features and visual content classification
arXiv_CV
arXiv_CV
Image_Caption
Video_Caption
Caption
RNN
Classification
Language_Model
-
TennisVid2Text: Fine-grained Descriptions for Domain Specific Videos
arXiv_CV
arXiv_CV
Caption
-
Spoken Language Translation for Polish
arXiv_CV
arXiv_CV
Speech_Recognition
Caption
RNN
Language_Model
Recognition
-
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
arXiv_CV
arXiv_CV
Image_Caption
Salient
Object_Detection
Caption
CNN
Optimization
Language_Model
Detection
-
Mapping Images to Sentiment Adjective Noun Pairs with Factorized Neural Nets
arXiv_CV
arXiv_CV
Image_Caption
Sentiment
Caption
Classification
-
How to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary?
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Knowledge
Caption
Deep_Learning
-
Deep Multimodal Semantic Embeddings for Speech and Images
arXiv_CV
arXiv_CV
Caption
Embedding
CNN
-
Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning
arXiv_CV
arXiv_CV
Video_Caption
Caption
CNN
Image_Classification
Inference
Classification
Deep_Learning
-
From Images to Sentences through Scene Description Graphs using Commonsense Reasoning and Knowledge
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Caption
Detection
-
Learning Visual Features from Large Weakly Supervised Data
arXiv_CV
arXiv_CV
Weakly_Supervised
Caption
CNN
-
Sequence to Sequence -- Video to Text
arXiv_CV
arXiv_CV
Image_Caption
Video_Caption
Caption
RNN
Language_Model
-
A Critical Review of Recurrent Neural Networks for Sequence Learning
arXiv_CV
arXiv_CV
Image_Caption
Review
Caption
Survey
Optimization
RNN
Prediction
Recognition
-
Language Models for Image Captioning: The Quirks and What Works
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
RNN
Language_Model
-
Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
-
Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
RNN
Prediction
-
Learning Wake-Sleep Recurrent Attention Models
arXiv_CV
arXiv_CV
Attention
Caption
CNN
Image_Classification
Inference
Classification
-
Guiding Long-Short Term Memory for Image Caption Generation
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
-
SentenceRacer: A Game with a Purpose for Image Sentence Annotation
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
A large annotated corpus for learning natural language inference
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
-
Sequence-to-Sequence Neural Net Models for Grapheme-to-Phoneme Conversion
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Language_Model
-
Image Representations and New Domains in Neural Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
Language_Model
-
Scalable Bayesian Optimization Using Deep Neural Networks
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
Optimization
Language_Model
Recognition
-
Describing Multimedia Content using Attention-based Encoder--Decoder Networks
arXiv_CV
arXiv_CV
Image_Caption
Attention
Speech_Recognition
Caption
CNN
RNN
Classification
Recognition
-
Humor in Collective Discourse: Unsupervised Funniness Detection in the New Yorker Cartoon Caption Contest
arXiv_CV
arXiv_CV
Sentiment
Caption
Detection
-
Attention-Based Models for Speech Recognition
arXiv_CV
arXiv_CV
Image_Caption
Attention
Speech_Recognition
Caption
Recognition
-
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books
arXiv_CV
arXiv_CV
Caption
Embedding
Quantitative
-
Aligning where to see and what to tell: image caption with region-based attention and scene factorization
arXiv_CV
arXiv_CV
Image_Caption
Salient
Attention
Caption
Language_Model
-
Technical Report: Image Captioning with Semantically Similar Images
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
CNN
-
Deep Captioning with Multimodal Recurrent Neural Networks
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
RNN
-
The Long-Short Story of Movie Description
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
-
Understanding Image Virality
arXiv_CV
arXiv_CV
Caption
Prediction
-
Exploring Nearest Neighbor Approaches for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Translating Videos to Natural Language Using Deep Recurrent Neural Networks
arXiv_CV
arXiv_CV
Knowledge
Caption
CNN
RNN
Deep_Learning
Prediction
-
Joint Learning of Distributed Representations for Images and Texts
arXiv_CV
arXiv_CV
Caption
-
Image Specificity
arXiv_CV
arXiv_CV
Image_Retrieval
Caption
-
From Captions to Visual Concepts and Back
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Caption
Language_Model
Detection
-
Simple Image Description Generator via a Linear Phrase-Based Approach
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
Language_Model
-
Phrase-based Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
Language_Model
-
Microsoft COCO Captions: Data Collection and Evaluation Server
arXiv_CV
arXiv_CV
Caption
-
Recurrent Neural Network Regularization
arXiv_CV
arXiv_CV
Image_Caption
Regularization
Speech_Recognition
Caption
RNN
Language_Model
Recognition
-
Learning a Recurrent Visual Representation for Image Caption Generation
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Caption
Embedding
-
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
arXiv_CV
arXiv_CV
Object_Detection
Caption
Embedding
CNN
RNN
Language_Model
Detection
-
Reconstruction of vertical and L-shaped ancient Egyptian sundials and methods for measuring time
arXiv_CV
arXiv_CV
Caption
-
First result of the experimental search for the 2K-capture of Xe-124 with the copper proportional counter
arXiv_CV
arXiv_CV
Caption
-
Says who? Automatic Text-Based Content Analysis of Television News
arXiv_CV
arXiv_CV
Caption
-
About the mechanism of matter transfer along cosmic string
arXiv_CV
arXiv_CV
Caption
-
Mining Associated Text and Images with Dual-Wing Harmoniums
arXiv_CV
arXiv_CV
Caption
Inference
Classification
-
Fast and Exact Top-k Search for Random Walk with Restart
arXiv_CV
arXiv_CV
Image_Caption
Sparse
Caption
Prediction
Recommendation
-
Video OCR for Video Indexing
arXiv_CV
arXiv_CV
OCR
Video_Indexing
Caption
Recognition
-
Effectively Searching Maps in Web Documents
arXiv_CV
arXiv_CV
Review
Caption
-
Reply to the Comment of M. V. Cheremisin
arXiv_CV
arXiv_CV
Caption
-
Bounds on Leptoquark and Supersymmetric, R-parity violating Interactions from Meson Decays
arXiv_CV
arXiv_CV
Caption
-
Retrieval from Captioned Image Databases Using Natural Language Processing
arXiv_CV
arXiv_CV
Caption
Relation
-
Explanation-based Learning for Machine Translation
arXiv_CV
arXiv_CV
Caption
-
A Lexicalist Approach to the Translation of Colloquial Text
arXiv_CV
arXiv_CV
Caption
-
Statistical versus symbolic parsing for captioned-information retrieval
arXiv_CV
arXiv_CV
Caption
Image_Caption
-
AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures
arXiv_CV
arXiv_CV
Image_Caption
Video_Caption
CNN
-
Vision-to-Language Tasks Based on Attributes and Attention Mechanism
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
Relation
VQA
-
On Measuring Gender Bias in Translation of Gender-neutral Pronouns
arXiv_CL
arXiv_CL
Image_Caption
Caption
Detection
Recommendation
-
Union Visual Translation Embedding for Visual Relationship Detection and Scene Graph Generation
arXiv_CV
arXiv_CV
Image_Caption
Embedding
Detection
Relation
-
Beyond Visual Semantics: Exploring the Role of Scene Text in Image Understanding
arXiv_CV
arXiv_CV
Image_Caption
Attention
Embedding
Detection
-
Bivariate Beta LSTM
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Caption
Image_Classification
RNN
Classification
Relation
-
SuperCaptioning: Image Captioning Using Two-dimensional Word Embedding
arXiv_CL
arXiv_CL
Image_Caption
Text_Classification
Caption
Embedding
Image_Classification
Classification
-
Image Captioning based on Deep Learning Methods: A Survey
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Attention
Caption
Survey
Deep_Learning
-
Multimodal Transformer with Multi-View Visual Representation for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
CNN
RNN
Quantitative
-
Latent Variable Model for Multi-modal Translation
arXiv_CL
arXiv_CL
Image_Caption
Embedding
-
Harvesting Information from Captions for Weakly Supervised Semantic Segmentation
arXiv_CV
arXiv_CV
Image_Caption
Segmentation
Weakly_Supervised
Caption
Embedding
CNN
Semantic_Segmentation
-
Aligning Visual Regions and Textual Concepts: Learning Fine-Grained Image Representations for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Text_Generation
Caption
Relation
-
End to End Recognition System for Recognizing Offline Unconstrained Vietnamese Handwriting
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
RNN
Language_Model
Recognition
-
Detect-to-Retrieve: Efficient Regional Aggregation for Image Search
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Object_Detection
Detection
-
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
arXiv_CV
arXiv_CV
Image_Caption
Regularization
Caption
CNN
Classification
Detection
-
Exact Adversarial Attack to Image Captioning via Structured Output Learning with Latent Variables
arXiv_AI
arXiv_AI
Image_Caption
Adversarial
Caption
RNN
-
Automatic 4D Facial Expression Recognition via Collaborative Cross-domain Dynamic Image Network
arXiv_CV
arXiv_CV
Image_Caption
Face
Classification
Deep_Learning
Prediction
Recognition
-
Image Captioning with Clause-Focused Metrics in a Multi-Modal Setting for Marketing
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Conversational Group Detection With Deep Convolutional Networks
arXiv_CV
arXiv_CV
Image_Caption
CNN
Detection
-
A Joint Convolutional Neural Networks and Context Transfer for Street Scenes Labeling
arXiv_CV
arXiv_CV
Image_Caption
CNN
Inference
-
Detecting Visual Relationships Using Box Attention
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Attention
Prediction
Quantitative
Detection
Relation
-
Accurate Visual Localization for Automotive Applications
arXiv_AI
arXiv_AI
Image_Caption
-
PR Product: A Substitute for Inner Product in Neural Networks
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
Image_Classification
RNN
Classification
Deep_Learning
-
Attribute Guided Unpaired Image-to-Image Translation with Semi-supervised Learning
arXiv_CV
arXiv_CV
Image_Caption
-
UniVSE: Robust Visual Semantic Embeddings via Structured Semantic Representations
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Caption
Embedding
Relation
-
Knowing When to Stop: Evaluation and Verification of Conformity to Output-size Specifications
arXiv_AI
arXiv_AI
Image_Caption
Caption
-
Pointing Novel Objects in Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Attention
Caption
RNN
Recognition
-
Bridging the Domain Gap for Ground-to-Aerial Image Matching
arXiv_CV
arXiv_CV
Image_Caption
GAN
-
nocaps: Novel Object Captioning at Scale
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Caption
Detection
-
BERTScore: Evaluating Text Generation with BERT
arXiv_CL
arXiv_CL
Image_Caption
Text_Generation
Caption
Embedding
-
Deep Metric Learning Beyond Binary Supervision
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Caption
Relation
-
3G structure for image caption generation
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
Embedding
RNN
Relation
-
Multi-modal gated recurrent units for image description
arXiv_CV
arXiv_CV
Image_Caption
Embedding
CNN
Relation
-
Challenges and Prospects in Vision and Language Research
arXiv_CV
arXiv_CV
Image_Caption
Review
VQA
-
Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts
arXiv_CV
arXiv_CV
Image_Caption
Caption
Detection
Relation
-
Knowledge-rich Image Gist Understanding Beyond Literal Meaning
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Caption
Detection
-
Learning to Collocate Neural Modules for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Visual Relationship Detection with Language prior and Softmax
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Detection
Relation
-
Single Pixel Reconstruction for One-stage Instance Segmentation
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Segmentation
Inference
Prediction
Detection
-
SIMCO: SIMilarity-based object COunting
arXiv_CV
arXiv_CV
Image_Caption
Embedding
-
Natural Language Statistical Features of LSTM-generated Texts
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Quantitative
Relation
-
Self-critical n-step Training for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Sparse
Reinforcement_Learning
Caption
-
Localizing Discriminative Visual Landmarks for Place Recognition
arXiv_CV
arXiv_CV
Image_Caption
CNN
Recognition
-
Big but Imperceptible Adversarial Perturbations via Semantic Manipulation
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Caption
Image_Classification
Classification
Deep_Learning
-
TAFE-Net: Task-Aware Feature Embeddings for Low Shot Learning
arXiv_AI
arXiv_AI
Image_Caption
Embedding
Prediction
-
Intention Oriented Image Captions with Guiding Objects
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
-
Unified Visual-Semantic Embeddings: Bridging Vision and Language with Structured Meaning Representations
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Caption
Embedding
Relation
-
Mitigating Information Leakage in Image Representations: A Maximum Entropy Approach
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Embedding
Recognition
-
On the Intrinsic Dimensionality of Image Representations
arXiv_CV
arXiv_CV
Image_Caption
Face
-
Fast, Diverse and Accurate Image Captioning Guided By Part-of-Speech
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
GAN
Caption
-
Learning Non-Metric Visual Similarity for Image Retrieval
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
-
Self-Supervised GANs via Auxiliary Rotation Loss
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
GAN
Represenation_Learning
-
Exploring Uncertainty Measures for Image-Caption Embedding-and-Retrieval Task
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
Classification
Deep_Learning
-
UG$^{2+}$ Track 2: A Collective Benchmark Effort for Evaluating and Advancing Image Understanding in Poor Visibility Environments
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Face
Detection
Face_Detection
Recognition
-
ContextDesc: Local Descriptor Augmentation with Cross-Modality Context
arXiv_CV
arXiv_CV
Image_Caption
Relation
-
Unsupervised Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Knowledge
Caption
Detection
-
Measuring scheduling efficiency of RNNs for NLP applications
arXiv_CV
arXiv_CV
Image_Caption
Speech_Recognition
Caption
Optimization
Inference
RNN
Recognition
-
Hypernetwork functional image representation
arXiv_CV
arXiv_CV
Image_Caption
Super_Resolution
-
Evaluating Text-to-Image Matching using Binary Image Selection
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Caption
-
Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction
arXiv_CV
arXiv_CV
Image_Caption
Reinforcement_Learning
Caption
Prediction
-
Soft Rasterizer: A Differentiable Renderer for Image-based 3D Reasoning
arXiv_CV
arXiv_CV
Image_Caption
Quantitative
-
Good News, Everyone! Context driven entity-aware captioning for news images
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Caption
Relation
-
Scene Graph Generation with External Knowledge and Image Reconstruction
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Knowledge
Attention
Prediction
Detection
Relation
-
Pedestrian re-identification based on Tree branch network with local and global learning
arXiv_CV
arXiv_CV
Image_Caption
Re-identification
Person_Re-identification
-
ImageGCN: Multi-Relational Image Graph Convolutional Networks for Disease Identification with Chest X-rays
arXiv_AI
arXiv_AI
Image_Caption
Object_Detection
Weakly_Supervised
CNN
Detection
Relation
-
Object Hallucination in Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
Classification
-
Describing like humans: on diversity in image captioning
arXiv_CV
arXiv_CV
Image_Caption
Reinforcement_Learning
Caption
-
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Weakly_Supervised
Caption
-
Learning semantic sentence representations from visually grounded language without lexical knowledge
arXiv_CL
arXiv_CL
Image_Caption
Knowledge
Caption
Embedding
-
Differentiable Scene Graphs
arXiv_CV
arXiv_CV
Image_Caption
Optimization
Relation
-
AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Style_Transfer
Caption
Detection
-
Unpaired Image Captioning via Scene Graph Alignments
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Caption
-
On the use of Deep Autoencoders for Efficient Embedded Reinforcement Learning
arXiv_AI
arXiv_AI
Image_Caption
Reinforcement_Learning
CNN
-
Vector of Locally-Aggregated Word Embeddings: A Novel Document-level Representation
arXiv_CL
arXiv_CL
Image_Caption
Review
Text_Classification
Embedding
Classification
-
Joint Learning of Discriminative Low-dimensional Image Representations Based on Dictionary Learning and Two-layer Orthogonal Projections
arXiv_CV
arXiv_CV
Image_Caption
Sparse
CNN
Image_Classification
Optimization
Classification
Deep_Learning
Gradient_Descent
-
Semantic Comparison of State-of-the-Art Deep Learning Methods for Image Multi-Label Classification
arXiv_CV
arXiv_CV
Image_Caption
Face
Classification
Deep_Learning
Recognition
-
Engaging Image Captioning Via Personality
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
HWNet v2: An Efficient Word Image Representation for Handwritten Documents
arXiv_CV
arXiv_CV
Image_Caption
CNN
Transfer_Learning
Classification
-
Learning to Augment Synthetic Images for Sim2Real Policy Transfer
arXiv_CV
arXiv_CV
Image_Caption
-
Boosted Attention: Leveraging Human Attention for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
-
Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition
arXiv_CV
arXiv_CV
Image_Caption
Attention
Speech_Recognition
Caption
CNN
Language_Model
Recognition
-
A Weighted Multi-Criteria Decision Making Approach for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
-
Show, Translate and Tell
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
-
Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
Prediction
Relation
-
MirrorGAN: Learning Text-to-image Generation by Redescription
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Attention
GAN
Embedding
-
Neural Scene Decomposition for Multi-Person Motion Capture
arXiv_CV
arXiv_CV
Image_Caption
Segmentation
Pose_Estimation
-
Unsupervised Discovery of Parts, Structure, and Dynamics
arXiv_AI
arXiv_AI
Image_Caption
-
Generating superpixels using deep image representations
arXiv_CV
arXiv_CV
Image_Caption
Segmentation
Tracking
Object_Tracking
Semantic_Segmentation
Classification
-
A Unified Formulation for Visual Odometry
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Tracking
Optimization
-
Generating Diverse and Accurate Visual Captions by Comparative Adversarial Learning
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Caption
-
Ultrasound Image Representation Learning by Modeling Sonographer Visual Attention
arXiv_CV
arXiv_CV
Image_Caption
Salient
Attention
Tracking
CNN
Transfer_Learning
Represenation_Learning
Prediction
Detection
-
Hierarchical Autoregressive Image Models with Auxiliary Decoders
arXiv_CV
arXiv_CV
Image_Caption
-
Image captioning with weakly-supervised attention penalty
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
-
A Synchronized Multi-Modal Attention-Caption Dataset and Analysis
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
CNN
Relation
-
Dixit: Interactive Visual Storytelling via Term Manipulation
arXiv_CL
arXiv_CL
Image_Caption
Caption
RNN
-
From Selective Deep Convolutional Features to Compact Binary Representations for Image Retrieval
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Embedding
CNN
-
COMIC: Towards A Compact Image Captioning Model with Attention
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
Embedding
-
Let's Transfer Transformations of Shared Semantic Representations
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Embedding
-
Extreme Channel Prior Embedded Network for Dynamic Scene Deblurring
arXiv_CV
arXiv_CV
Image_Caption
Regularization
Sparse
CNN
Quantitative
-
Answer Them All! Toward Universal Visual Question Answering Models
arXiv_CV
arXiv_CV
Image_Caption
QA
VQA
-
Towards Automatic Construction of Diverse, High-quality Image Dataset
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Weakly_Supervised
Image_Classification
Classification
Detection
-
Insertion-based Decoding with automatically Inferred Generation Order
arXiv_CL
arXiv_CL
Image_Caption
Caption
-
Using Deep Object Features for Image Descriptions
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
Language_Model
-
End-to-end Hand Mesh Recovery from a Monocular RGB Image
arXiv_CV
arXiv_CV
Image_Caption
Pose_Estimation
-
Audio Caption: Listen and Tell
arXiv_CL
arXiv_CL
Image_Caption
Caption
Classification
Detection
Relation
-
Vector of Locally-Aggregated Word Embeddings: A novel document-level embedding
arXiv_CL
arXiv_CL
Image_Caption
Review
Text_Classification
Embedding
Classification
-
Deep Decoder: Concise Image Representations from Untrained Non-convolutional Networks
arXiv_CV
arXiv_CV
Image_Caption
CNN
-
Image Aesthetics Assessment Using Composite Features from off-the-Shelf Deep Models
arXiv_CV
arXiv_CV
Image_Caption
CNN
Image_Classification
Classification
Deep_Learning
Recognition
-
FreeLabel: A Publicly Available Annotation Tool based on Freehand Traces
arXiv_CV
arXiv_CV
Image_Caption
Segmentation
Face
Deep_Learning
Quantitative
-
Object Recognition under Multifarious Conditions: A Reliability Analysis and A Feature Similarity-based Performance Estimation
arXiv_CV
arXiv_CV
Image_Caption
Deep_Learning
Relation
Recognition
-
Deep Convolutional Sum-Product Networks for Probabilistic Image Representations
arXiv_CV
arXiv_CV
Image_Caption
Regularization
CNN
Inference
Relation
-
BigEarthNet: A Large-Scale Benchmark Archive For Remote Sensing Image Understanding
arXiv_CV
arXiv_CV
Image_Caption
CNN
Classification
Deep_Learning
-
Contextual Memory Trees
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
Classification
-
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Graph-RISE: Graph-Regularized Image Semantic Embedding
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Embedding
Image_Classification
Classification
-
Wasserstein Barycenter Model Ensembling
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
Classification
-
Improving Image Captioning with Conditional Generative Adversarial Nets
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Caption
RNN
-
Attend More Times for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
RNN
-
Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded
arXiv_CV
arXiv_CV
Image_Caption
QA
Attention
Caption
Language_Model
Prediction
VQA
-
A sequential guiding network with attention for image captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
CNN
RNN
Deep_Learning
-
Asynchronous Spatial Image Convolutions for Event Cameras
arXiv_CV
arXiv_CV
Image_Caption
Tracking
Detection
-
Area Attention
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
-
Rethinking Visual Relationships for High-level Image Understanding
arXiv_CV
arXiv_CV
Image_Caption
Caption
Relation
VQA
-
TGAN: Deep Tensor Generative Adversarial Nets for Large Image Generation
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Super_Resolution
GAN
CNN
-
Fast and Efficient Lenslet Image Compression
arXiv_CV
arXiv_CV
Image_Caption
GAN
Prediction
-
Improving Image Captioning by Leveraging Knowledge Graphs
arXiv_CV
arXiv_CV
Image_Caption
Knowledge_Graph
Knowledge
Caption
-
Face-Cap: Image Captioning using Facial Expression Analysis
arXiv_CV
arXiv_CV
Image_Caption
Face
Caption
Relation
-
Deep Learning on Attributed Graphs: A Journey from Graphs to Their Embeddings and Back
arXiv_CV
arXiv_CV
Image_Caption
Embedding
Deep_Learning
Prediction
Relation
-
Visual Entailment: A Novel Task for Fine-Grained Image Understanding
arXiv_CV
arXiv_CV
Image_Caption
QA
Attention
Inference
VQA
-
Binary Image Selection: Interpretable Evaluation of Visual Grounding
arXiv_AI
arXiv_AI
Image_Caption
Caption
-
How to Become Instagram Famous: Post Popularity Prediction with Dual-Attention
arXiv_CV
arXiv_CV
Image_Caption
Attention
Face
Caption
Classification
Prediction
-
Deep Representation Learning Characterized by Inter-class Separation for Image Clustering
arXiv_CV
arXiv_CV
Image_Caption
Represenation_Learning
-
Improving Sequence-to-Sequence Learning via Optimal Transport
arXiv_CL
arXiv_CL
Image_Caption
Summarization
Caption
-
Image Based Review Text Generation with Emotional Guidance
arXiv_AI
arXiv_AI
Image_Caption
Review
Text_Generation
Caption
-
Predicting the Mumble of Wireless Channel with Sequence-to-Sequence Models
arXiv_AI
arXiv_AI
Image_Caption
Summarization
Caption
Language_Model
Prediction
-
Image Captioning Based on a Hierarchical Attention Mechanism and Policy Gradient Optimization
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Attention
GAN
Reinforcement_Learning
Caption
Optimization
Inference
RNN
Deep_Learning
-
DeepBase: Deep Inspection of Neural Networks
arXiv_CV
arXiv_CV
Image_Caption
Face
Caption
Optimization
Deep_Learning
Recognition
-
MultiDEC: Multi-Modal Clustering of Image-Caption Pairs
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
A Hierarchical Grocery Store Image Dataset with Visual and Semantic Labels
arXiv_CV
arXiv_CV
Image_Caption
CNN
Image_Classification
Classification
Prediction
-
Generating Multiple Objects at Spatially Distinct Locations
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
GAN
Caption
-
Transfer learning from language models to image caption generators: Better models may not transfer better
arXiv_CL
arXiv_CL
Image_Caption
Caption
Embedding
CNN
Transfer_Learning
Language_Model
-
Hierarchical LSTMs with Adaptive Attention for Visual Captioning
arXiv_CV
arXiv_CV
Image_Caption
Video_Caption
Attention
Caption
RNN
Language_Model
-
Multi-modal Learning with Prior Visual Relation Reasoning
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
QA
Embedding
CNN
Relation
VQA
-
nocaps: novel object captioning at scale
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Caption
Detection
-
Generating Diverse and Meaningful Captions
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Caption
-
Feature Fusion Effects of Tensor Product Representation on Compositional Network for Caption Generation for Images
arXiv_CV
arXiv_CV
Image_Caption
Caption
Language_Model
Relation
-
Grounded Video Description
arXiv_CV
arXiv_CV
Image_Caption
Caption
Recognition
-
Adversarial Inference for Multi-Sentence Video Description
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Video_Caption
GAN
Caption
Inference
-
Auto-Encoding Scene Graphs for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
Relation
-
Multi-task Learning of Hierarchical Vision-Language Representation
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Attention
Caption
Prediction
Relation
VQA
-
Turbo Learning for Captionbot and Drawingbot
arXiv_CV
arXiv_CV
Image_Caption
Text_Generation
Caption
-
Towards Task Understanding in Visual Settings
arXiv_CV
arXiv_CV
Image_Caption
Ontology
Text_Generation
Caption
CNN
-
Partially-Supervised Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Caption
RNN
Detection
-
Senti-Attend: Image Captioning using Sentiment and Attention
arXiv_CV
arXiv_CV
Image_Caption
Sentiment
Attention
Caption
-
Data Augmentation using Random Image Cropping and Patching for Deep CNNs
arXiv_CV
arXiv_CV
Image_Caption
Regularization
Caption
CNN
Classification
-
An Interpretable Model for Scene Graph Generation
arXiv_CV
arXiv_CV
Image_Caption
QA
Caption
Detection
Relation
-
Scene Graph Generation via Conditional Random Fields
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Object_Detection
QA
Segmentation
Caption
Detection
Relation
-
Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Caption
Action_Recognition
CNN
Classification
Deep_Learning
Prediction
Recognition
-
AttS2S-VC: Sequence-to-Sequence Voice Conversion with Attention and Context Preservation Mechanisms
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
Recognition
-
Entity-aware Image Caption Generation
arXiv_CV
arXiv_CV
Image_Caption
Knowledge_Graph
Knowledge
Caption
CNN
Inference
RNN
Memory_Networks
-
Learning Conditioned Graph Structures for Interpretable Visual Question Answering
arXiv_CV
arXiv_CV
Image_Caption
QA
Relation
VQA
-
Attentive Tensor Product Learning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
RNN
Deep_Learning
-
Gated Hierarchical Attention for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Reinforcement_Learning
Caption
CNN
Prediction
VQA
Recognition
-
A Neural Compositional Paradigm for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Image-to-Video Person Re-Identification by Reusing Cross-modal Embeddings
arXiv_CV
arXiv_CV
Image_Caption
Re-identification
Video_Caption
Person_Re-identification
Caption
Embedding
RNN
-
Image Captioning as Neural Machine Translation Task in SOCKEYE
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
CNN
RNN
-
Bringing back simplicity and lightliness into neural image captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
-
UMONS Submission for WMT18 Multimodal Translation Task
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
A Comprehensive Survey of Deep Learning for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Review
Caption
Survey
Deep_Learning
Relation
-
Quantifying the amount of visual information used by neural caption generators
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Pre-gen metrics: Predicting caption quality metrics without generating captions
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Vector Learning for Cross Domain Representations
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Video_Caption
GAN
Caption
-
Semantically Invariant Text-to-Image Generation
arXiv_CV
arXiv_CV
Image_Caption
Caption
Quantitative
-
Batch-normalized Recurrent Highway Networks
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
-
A Neural-Symbolic Approach to Design of CAPTCHA
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Deep_Learning
-
Fast and Simple Mixture of Softmaxes with BPE and Hybrid-LightRNN for Language Generation
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
-
Textually Enriched Neural Module Networks for Visual Question Answering
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
QA
Attention
Caption
VQA
Recognition
-
Lessons learned in multilingual grounded language learning
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
-
Towards Accountable AI: Hybrid Human-Machine Analyses for Characterizing System Failure
arXiv_CV
arXiv_CV
Image_Caption
Caption
Relation
-
Exploring Visual Relationship for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
CNN
RNN
Relation
-
Improving Reinforcement Learning Based Image Captioning with Natural Language Prior
arXiv_CV
arXiv_CV
Image_Caption
Reinforcement_Learning
Caption
Quantitative
-
Image Captioning based on Deep Reinforcement Learning
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Reinforcement_Learning
Caption
RNN
-
End-to-end Image Captioning Exploits Multimodal Distributional Similarity
arXiv_CV
arXiv_CV
Image_Caption
Text_Generation
Caption
RNN
-
SPASS: Scientific Prominence Active Search System with Deep Image Captioning Network
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
BFGAN: Backward and Forward Generative Adversarial Networks for Lexically Constrained Sentence Generation
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Knowledge
GAN
Caption
RNN
-
Neural Network Interpretation via Fine Grained Textual Summarization
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Summarization
Caption
Inference
Classification
Prediction
-
Diverse and Coherent Paragraph Generation from Images
arXiv_CV
arXiv_CV
Image_Caption
Summarization
Caption
-
Chittron: An Automatic Bangla Image Captioning System
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
RNN
Language_Model
-
Approximate Distribution Matching for Sequence-to-Sequence Learning
arXiv_CV
arXiv_CV
Image_Caption
Summarization
Caption
Optimization
RNN
Prediction
-
When to Finish? Optimal Beam Search for Neural Text Generation
arXiv_CV
arXiv_CV
Image_Caption
Summarization
Text_Generation
Caption
-
Hard Non-Monotonic Attention for Character-Level Transduction
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
-
Multi-Reference Training with Pseudo-References for Neural Translation and Text Generation
arXiv_CV
arXiv_CV
Image_Caption
Summarization
Text_Generation
Caption
-
Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
A neural attention model for speech command recognition
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
CNN
Recognition
-
simNet: Stepwise Image-Topic Merging Network for Generating Detailed and Comprehensive Image Captions
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
-
Context-Aware Visual Policy Network for Sequence-Level Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Reinforcement_Learning
Caption
Prediction
Relation
-
Text-to-Image-to-Text Translation using Cycle Consistent Adversarial Networks
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
GAN
Caption
-
Decoupled Novel Object Captioner
arXiv_CV
arXiv_CV
Image_Caption
Caption
Detection
-
Dropout during inference as a model for neurological degeneration in an image captioning network
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
-
Online Illumination Invariant Moving Object Detection by Generative Neural Network
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Optimization
Detection
Gradient_Descent
-
Recurrent Fusion Network for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
RNN
-
Doubly Attentive Transformer Machine Translation
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
CNN
NMT
-
Emulating malware authors for proactive protection using GANs over a distributed image visualization of dynamic file behavior
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
GAN
CNN
-
ADVISE: Symbolism and External Knowledge for Decoding Advertisements
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
GAN
Caption
Recognition
-
'Factual' or 'Emotional': Stylized Image Captioning with Adaptive Learning and Attention
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Attention
Caption
RNN
-
Rethinking the Form of Latent States in Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
-
Image Generation from Sketch Constraint Using Contextual GAN
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
GAN
-
Distinctive-attribute Extraction for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
RNN
-
Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Equal But Not The Same: Understanding the Implicit Relationship Between Persuasive Images and Text
arXiv_CV
arXiv_CV
Image_Caption
GAN
Caption
Relation
-
Inductive Visual Localisation: Factorised Training for Superior Generalisation
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Recognition
-
What is not where: the challenge of integrating spatial representations into deep learning architectures
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Knowledge
Caption
Deep_Learning
Language_Model
Detection
Relation
-
Deep Reinforcement Learning For Sequence to Sequence Models
arXiv_CV
arXiv_CV
Image_Caption
Attention
Summarization
Reinforcement_Learning
Caption
Survey
-
Unpaired Image Captioning by Language Pivoting
arXiv_CV
arXiv_CV
Image_Caption
Caption
Quantitative
-
Object Relation Detection Based on One-shot Learning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Deep_Learning
Detection
Relation
Recognition
-
Object Detection with Deep Learning: A Review
arXiv_CV
arXiv_CV
Image_Caption
Salient
Review
Object_Detection
Attention
Face
Survey
CNN
Optimization
Deep_Learning
Detection
Face_Detection
Relation
-
Topic-Guided Attention for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
Quantitative
-
Learning The Sequential Temporal Information with Recurrent Neural Networks
arXiv_CV
arXiv_CV
Image_Caption
Review
Speech_Recognition
Tracking
Caption
Object_Tracking
RNN
Language_Model
Prediction
Recognition
-
Women also Snowboard: Overcoming Bias in Captioning Models
arXiv_CV
arXiv_CV
Image_Caption
Caption
Prediction
-
Multimedia Semantic Integrity Assessment Using Joint Embedding Of Images And Text
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
Represenation_Learning
Deep_Learning
Quantitative
-
Learning Visually-Grounded Semantics from Contrastive Adversarial Samples
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Knowledge
Caption
Embedding
Quantitative
-
Learning to Evaluate Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Face
Caption
Relation
-
iParaphrasing: Extracting Visually Grounded Paraphrases via an Image
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
VQA
-
Discriminability objective for training descriptive captions
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Learning Visually Grounded Sentence Representations
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Caption
Embedding
-
Improved Image Captioning with Adversarial Semantic Alignment
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
GAN
Caption
RNN
Relation
-
Grow and Prune Compact, Fast, and Accurate LSTMs
arXiv_CV
arXiv_CV
Image_Caption
Speech_Recognition
Caption
RNN
Recognition
-
Neural Joking Machine : Humorous image captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
-
Quantifying the visual concreteness of words and topics in multimodal datasets
arXiv_CV
arXiv_CV
Image_Caption
Caption
Relation
Recommendation
-
CNN+CNN: Convolutional Decoders for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
CNN
RNN
-
Joint Image Captioning and Question Answering
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
QA
Caption
VQA
-
Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Caption
CNN
RNN
-
Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention
arXiv_CV
arXiv_CV
Image_Caption
Salient
Attention
Caption
CNN
RNN
Prediction
Quantitative
-
SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text
arXiv_CV
arXiv_CV
Image_Caption
Caption
Language_Model
-
Defoiling Foiled Image Captions
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Token-level and sequence-level loss smoothing for RNN language models
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Language_Model
Prediction
-
Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition
arXiv_CV
arXiv_CV
Image_Caption
Caption
Action_Recognition
RNN
Prediction
Recognition
-
Pragmatically Informative Image Captioning with Character-Level Inference
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
-
Mobile Multi-View Object Image Search
arXiv_CV
arXiv_CV
Image_Caption
-
Object Counts! Bringing Explicit Detections Back into Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Caption
Embedding
Language_Model
Detection
-
Deep Semantic Hashing with Generative Adversarial Networks
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Adversarial
GAN
CNN
Classification
-
Delete, Retrieve, Generate: A Simple Approach to Sentiment and Style Transfer
arXiv_CV
arXiv_CV
Image_Caption
Sentiment
Review
Adversarial
Style_Transfer
Caption
-
Discovery and usage of joint attention in images
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
Detection
-
Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
RNN
-
Finding beans in burgers: Deep semantic-visual embedding with localization
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
Relation
-
Learning to Guide Decoding for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Guide Me: Interacting with Deep Networks
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
Inference
-
Two can play this Game: Visual Dialog with Discriminative Question Generation and Answering
arXiv_CV
arXiv_CV
Image_Caption
Caption
VQA
-
Fraternal Dropout
arXiv_CV
arXiv_CV
Image_Caption
Regularization
Caption
Inference
RNN
Language_Model
Prediction
-
COCO-Stuff: Thing and Stuff Classes in Context
arXiv_CV
arXiv_CV
Image_Caption
Segmentation
Attention
Face
Caption
Semantic_Segmentation
Classification
Detection
Relation
-
Neural Baby Talk
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Caption
Detection
-
Object Detection for Comics using Manga109 Annotations
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
CNN
Detection
-
Bayesian Recurrent Neural Networks
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Language_Model
-
Stack-Captioning: Coarse-to-Fine Learning for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Reinforcement_Learning
Caption
Inference
Prediction
-
Where to put the Image in an Image Caption Generator
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Language_Model
-
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
arXiv_CV
arXiv_CV
Image_Caption
Salient
QA
Attention
Caption
VQA
-
Improved Image Captioning via Policy Gradient optimization of SPIDEr
arXiv_CV
arXiv_CV
Image_Caption
Caption
Optimization
-
Decoupled Spatial Neural Attention for Weakly Supervised Semantic Segmentation
arXiv_CV
arXiv_CV
Image_Caption
Segmentation
Attention
Weakly_Supervised
Caption
Semantic_Segmentation
-
Contextually Customized Video Summaries via Natural Language
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
-
Multimodal Named Entity Recognition for Short Social Media Posts
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
RNN
Recognition
-
Human Action Adverb Recognition: ADHA Dataset and A Three-Stream Hybrid Model
arXiv_CV
arXiv_CV
Image_Caption
Caption
Action_Recognition
Recognition
-
Zero-Resource Neural Machine Translation with Multi-Agent Communication Game
arXiv_CV
arXiv_CV
Image_Caption
Caption
NMT
-
Generating Triples with Adversarial Networks for Scene Graph Construction
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Object_Detection
Attention
GAN
Caption
Image_Classification
Classification
Deep_Learning
Detection
Relation
VQA
-
Multimodal Image Captioning for Marketing Analysis
arXiv_CV
arXiv_CV
Image_Caption
Caption
Classification
Relation
-
Attention-Based Models for Text-Dependent Speaker Verification
arXiv_CV
arXiv_CV
Image_Caption
Attention
Summarization
Speech_Recognition
Caption
RNN
Recognition
-
Netizen-Style Commenting on Fashion Photos: Dataset and Diversity Measures
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Image Captioning at Will: A Versatile Scheme for Effectively Injecting Sentiments into Image Descriptions
arXiv_CV
arXiv_CV
Image_Caption
Sentiment
Attention
Caption
-
Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
QA
Attention
Caption
Inference
Detection
VQA
-
Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering
arXiv_CV
arXiv_CV
Image_Caption
QA
Attention
CNN
VQA
-
Describing Semantic Representations of Brain Activity Evoked by Visual Stimuli
arXiv_CV
arXiv_CV
Image_Caption
Caption
Deep_Learning
Quantitative
Relation
-
Image Captioning using Deep Neural Architectures
arXiv_CV
arXiv_CV
Image_Caption
Caption
Recognition
-
DeepSeek: Content Based Image Search & Retrieval
arXiv_CV
arXiv_CV
Image_Caption
Face
Caption
Deep_Learning
Language_Model
-
Approximate FPGA-based LSTMs under Computation Time Constraints
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Quantitative
-
GeoSeq2Seq: Information Geometric Sequence-to-Sequence Networks
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
RNN
-
Exploring Models and Data for Remote Sensing Image Caption Generation
arXiv_CV
arXiv_CV
Image_Caption
Review
Attention
Caption
Classification
Detection
-
Order-Free RNN with Visual Attention for Multi-Label Classification
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Attention
Caption
Inference
RNN
Classification
Prediction
-
Synthesizing Novel Pairs of Image and Text
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
GAN
Caption
-
Visually Grounded Word Embeddings and Richer Visual Features for Improving Multimodal Neural Machine Translation
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Caption
Embedding
CNN
NMT
Detection
-
Tensor Product Generation Networks for Deep NLP Modeling
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Deep_Learning
-
OSU Multimodal Machine Translation System Report
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Long Text Generation via Adversarial Training with Leaked Information
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
GAN
Text_Generation
Reinforcement_Learning
Caption
-
Actor-Critic Sequence Training for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Reinforcement_Learning
Caption
-
Convolutional Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
RNN
-
Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
-
AI Challenger : A Large-scale Dataset for Going Deeper in Image Understanding
arXiv_CV
arXiv_CV
Image_Caption
Caption
Classification
Detection
-
Deep Matching Autoencoders
arXiv_CV
arXiv_CV
Image_Caption
GAN
Caption
Represenation_Learning
-
Self-critical Sequence Training for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Reinforcement_Learning
Caption
Optimization
Inference
-
DataVizard: Recommending Visual Presentations for Structured Data
arXiv_CV
arXiv_CV
Image_Caption
Caption
Survey
-
Phrase-based Image Captioning with Hierarchical LSTM Model
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
RNN
-
Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Caption
-
Softmax Q-Distribution Estimation for Structured Prediction: A Theoretical Interpretation for RAML
arXiv_CV
arXiv_CV
Image_Caption
Caption
Classification
Prediction
Relation
Recognition
-
Automated Audio Captioning with Recurrent Neural Networks
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Classification
-
A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
Classification
-
Improved Search in Hamming Space using Deep Multi-Index Hashing
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
-
Describing Natural Images Containing Novel Objects with Knowledge Guided Assitance
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Attention
Caption
Inference
Recognition
-
Cold-Start Reinforcement Learning with Softmax Policy Gradient
arXiv_CV
arXiv_CV
Image_Caption
Summarization
Reinforcement_Learning
Caption
Prediction
-
Contrastive Learning for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Self-Guiding Multimodal LSTM - when we do not have a perfect training dataset for image captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
-
Neural Extractive Summarization with Side Information
arXiv_CV
arXiv_CV
Image_Caption
Attention
Summarization
Caption
-
Learning the Enigma with Recurrent Neural Networks
arXiv_CV
arXiv_CV
Image_Caption
Speech_Recognition
Caption
RNN
Recognition
-
Generating Video Descriptions with Topic Guidance
arXiv_CV
arXiv_CV
Image_Caption
Video_Caption
Caption
Prediction
-
What is the Role of Recurrent Neural Networks in an Image Caption Generator?
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
-
Areas of Attention for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Attention
Caption
CNN
RNN
Language_Model
Detection
-
Cold Fusion: Training Seq2Seq Models Together with Language Models
arXiv_CV
arXiv_CV
Image_Caption
Attention
Speech_Recognition
Caption
Language_Model
Recognition
-
Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
RNN
Recognition
-
ConvNet Architecture Search for Spatiotemporal Feature Learning
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Segmentation
NAS
Caption
Semantic_Segmentation
Inference
Detection
-
Fluency-Guided Cross-Lingual Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Caption
Inference
-
Towards Diverse and Natural Image Descriptions via a Conditional GAN
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
GAN
Reinforcement_Learning
Caption
RNN
-
MAT: A Multimodal Attentive Translator for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
CNN
RNN
-
Learning to Disambiguate by Asking Discriminative Questions
arXiv_CV
arXiv_CV
Image_Caption
Weakly_Supervised
Caption
Quantitative
VQA
-
Deep Binaries: Encoding Semantic-Rich Cues for Efficient Textual-Visual Cross Retrieval
arXiv_CV
arXiv_CV
Image_Caption
CNN
-
Recurrent Models for Situation Recognition
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Prediction
Recognition
-
Paying Attention to Descriptions Generated by Image Captioning Models
arXiv_CV
arXiv_CV
Image_Caption
Salient
Attention
Caption
Language_Model
-
An Empirical Study of Language CNN for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Language_Model
-
Context-aware Captions from Context-agnostic Supervision
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
Language_Model
-
Deep Interactive Region Segmentation and Captioning
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Knowledge
Segmentation
Caption
CNN
Deep_Learning
Detection
-
OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Attention
Caption
RNN
Language_Model
Detection
Relation
-
Captioning Images with Diverse Objects
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Caption
Embedding
Recognition
-
Guided Open Vocabulary Image Captioning with Constrained Beam Search
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
Prediction
-
CUNI System for the WMT17 Multimodal Translation Task
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
End-to-End Instance Segmentation with Recurrent Attention
arXiv_CV
arXiv_CV
Image_Caption
Segmentation
Attention
Caption
CNN
Semantic_Segmentation
RNN
Prediction
VQA
-
Where to Play: Retrieval of Video Segments using Natural-Language Queries
arXiv_CV
arXiv_CV
Image_Caption
Tracking
Caption
Quantitative
Relation
-
A Semi-supervised Framework for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Salient
Review
Attention
Caption
Embedding
-
Using Artificial Tokens to Control Languages for Multilingual Image Caption Generation
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
One Model To Learn Them All
arXiv_CV
arXiv_CV
Image_Caption
Sparse
Attention
Speech_Recognition
Caption
CNN
Image_Classification
Classification
Deep_Learning
Recognition
-
Visual Question Answering: Datasets, Algorithms, and Future Challenges
arXiv_CV
arXiv_CV
Image_Caption
Review
QA
Deep_Learning
VQA
-
Image Captioning with Object Detection and Localization
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Attention
Caption
RNN
Detection
Relation
-
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
Language_Model
-
Teaching Machines to Describe Images via Natural Language Feedback
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
I2T2I: Learning Text to Image Synthesis with Textual Data Augmentation
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
GAN
Caption
Transfer_Learning
RNN
-
Deep image representations using caption generators
arXiv_CV
arXiv_CV
Image_Caption
Caption
Transfer_Learning
Deep_Learning
Recognition
-
Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
VQA
-
CHAM: action recognition using convolutional hierarchical attention model
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
Action_Recognition
CNN
RNN
Recognition
-
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
arXiv_CV
arXiv_CV
Image_Caption
QA
VQA
-
STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
-
Attend to You: Personalized Image Captioning with Context Sequence Memory Networks
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Caption
Prediction
Quantitative
Memory_Networks
-
Deep Reinforcement Learning-based Image Captioning with Embedding Reward
arXiv_CV
arXiv_CV
Image_Caption
Reinforcement_Learning
Caption
Embedding
Prediction
-
Learning a Deep Embedding Model for Zero-Shot Learning
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
-
Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space
arXiv_CV
arXiv_CV
Image_Caption
Face
Caption
Classification
-
SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
CNN
Prediction
-
A Hierarchical Approach for Generating Descriptive Image Paragraphs
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
ViP-CNN: Visual Phrase Guided Convolutional Neural Network
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Attention
Caption
CNN
Detection
Relation
Recognition
-
Efficient Privacy Preserving Viola-Jones Type Object Detection via Random Base Image Representation
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Detection
-
Semantic Compositional Networks for Visual Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Quantitative
-
Recurrent Topic-Transition GAN for Visual Paragraph Generation
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Knowledge
Attention
GAN
Quantitative
-
Multimodal Compact Bilinear Pooling for Multimodal Neural Machine Translation
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
VQA
-
Can Active Memory Replace Attention?
arXiv_CV
arXiv_CV
Image_Caption
Attention
Speech_Recognition
Caption
Image_Classification
Classification
Deep_Learning
Recognition
-
Evolving Deep Neural Networks
arXiv_CV
arXiv_CV
Image_Caption
Caption
Deep_Learning
Language_Model
Recognition
-
MIML-FCN+: Multi-instance Multi-label Learning via Fully Convolutional Networks with Privileged Information
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
Deep_Learning
Relation
Recognition
-
Correlation Hashing Network for Efficient Cross-Modal Retrieval
arXiv_CV
arXiv_CV
Image_Caption
CNN
Relation
-
Soft + Hardwired Attention: An LSTM Framework for Human Trajectory Prediction and Abnormal Event Detection
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
RNN
Prediction
Detection
-
Supervised Learning of Semantics-Preserving Hash via Deep Convolutional Neural Networks
arXiv_CV
arXiv_CV
Image_Caption
CNN
Classification
-
Grad-CAM: Why did you say that?
arXiv_CV
arXiv_CV
Image_Caption
QA
Caption
CNN
Prediction
Relation
VQA
-
Deep Network Guided Proof Search
arXiv_CV
arXiv_CV
Image_Caption
Speech_Recognition
Caption
Deep_Learning
Detection
Recognition
-
Comprehension-guided referring expressions
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Re-evaluating Automatic Metrics for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
Relation
-
Image Captioning and Visual Question Answering Based on Attributes and External Knowledge
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Caption
CNN
RNN
VQA
-
Beyond Holistic Object Recognition: Enriching Image Understanding with Part States
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
Recognition
-
Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering
arXiv_CV
arXiv_CV
Image_Caption
Salient
Attention
Caption
Inference
RNN
-
Spatial Pyramid Convolutional Neural Network for Social Event Detection in Static Image
arXiv_CV
arXiv_CV
Image_Caption
GAN
Caption
CNN
Detection
Recommendation
-
Text-guided Attention Model for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
-
Annotation Order Matters: Recurrent Image Annotator for Arbitrary Length Image Tagging
arXiv_CV
arXiv_CV
Image_Caption
Caption
Relation
-
On Human Intellect and Machine Failures: Troubleshooting Integrative Machine Learning Systems
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Ask Your Neurons: A Deep Learning Approach to Visual Question Answering
arXiv_CV
arXiv_CV
Image_Caption
QA
Deep_Learning
VQA
-
Watch What You Just Said: Image Captioning with Text-Conditional Attention
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
Embedding
RNN
Language_Model
Quantitative
-
Attention Correctness in Neural Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
Deep_Learning
Quantitative
-
Revisiting Visual Question Answering Baselines
arXiv_CV
arXiv_CV
Image_Caption
QA
Attention
Classification
VQA
-
Semantic Regularisation for Recurrent Image Annotation
arXiv_CV
arXiv_CV
Image_Caption
Face
Caption
Embedding
RNN
Classification
Relation
-
Sort Story: Sorting Jumbled Images and Captions into Stories
arXiv_CV
arXiv_CV
Image_Caption
QA
Summarization
Caption
Prediction
-
Boosting Image Captioning with Attributes
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
RNN
Relation
-
Review Networks for Caption Generation
arXiv_CV
arXiv_CV
Image_Caption
Review
Attention
Caption
RNN
-
VQA: Visual Question Answering
arXiv_CV
arXiv_CV
Image_Caption
QA
Caption
VQA
-
Generating captions without looking beyond objects
arXiv_CV
arXiv_CV
Image_Caption
Caption
Language_Model
-
Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
Quantitative
VQA
-
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Caption
Embedding
Detection
-
Multimodal Attention for Neural Machine Translation
arXiv_CV
arXiv_CV
Image_Caption
Attention
Caption
NMT
-
Leveraging Visual Question Answering for Image-Caption Ranking
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Knowledge
QA
Caption
VQA
-
Measuring Machine Intelligence Through Visual Question Answering
arXiv_CV
arXiv_CV
Image_Caption
Caption
VQA
-
Utilizing Large Scale Vision and Text Datasets for Image Segmentation from Referring Expressions
arXiv_CV
arXiv_CV
Image_Caption
Segmentation
Caption
Language_Model
-
Learning to generalize to new compositions in image understanding
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Prediction
-
Scan, Attend and Read: End-to-End Handwritten Paragraph Recognition with MDLSTM Attention
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Segmentation
Attention
Speech_Recognition
Caption
RNN
Recognition
-
Seeing with Humans: Gaze-Assisted Neural Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Salient
Attention
Caption
Relation
Recognition
-
DeepDiary: Automatic Caption Generation for Lifelogging Image Streams
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
GAN
Caption
Deep_Learning
Quantitative
-
Compressive Change Retrieval for Moving Object Detection
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Detection
Recognition
-
SPICE: Semantic Propositional Image Caption Evaluation
arXiv_CV
arXiv_CV
Image_Caption
Caption
Relation
-
MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Face
Caption
Classification
Recognition
Face_Recognition
-
Image Captioning with Deep Bidirectional LSTMs
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Attention
Caption
Embedding
CNN
RNN
Detection
-
Multimodal Pivots for Image Caption Translation
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Caption
CNN
-
Generating Natural Questions About an Image
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Caption
Inference
VQA
-
Automated Image Captioning for Rapid Prototyping and Resource Constrained Environments
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
Deep_Learning
Detection
-
Beyond Caption To Narrative: Video Captioning With Multiple Sentences
arXiv_CV
arXiv_CV
Image_Caption
Video_Caption
Caption
-
Improving Image Captioning by Concept-based Sentence Reranking
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Language_Model
Detection
-
What value do explicit high level concepts have in vision to language problems?
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
RNN
VQA
-
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data
arXiv_CV
arXiv_CV
Image_Caption
Video_Caption
Knowledge
Caption
Recognition
-
Seeing through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels
arXiv_CV
arXiv_CV
Image_Caption
Caption
Image_Classification
Classification
-
Natural Language Object Retrieval
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Knowledge
Caption
-
Generation and Comprehension of Unambiguous Object Descriptions
arXiv_CV
arXiv_CV
Image_Caption
Caption
Deep_Learning
-
Automatic Annotation of Structured Facts in Images
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection
arXiv_CV
arXiv_CV
Image_Caption
Attention
Speech_Recognition
Caption
Classification
Detection
VQA
Recognition
-
Rich Image Captioning in the Wild
arXiv_CV
arXiv_CV
Image_Caption
Caption
Recognition
-
Dense Image Representation with Spatial Pyramid VLAD Coding of CNN for Locally Robust Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
RNN
Classification
-
Learning to Read Chest X-Rays: Recurrent Neural Cascade Model for Automated Image Annotation
arXiv_CV
arXiv_CV
Image_Caption
Regularization
GAN
Caption
CNN
RNN
Deep_Learning
-
Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering
arXiv_CV
arXiv_CV
Image_Caption
QA
Attention
Caption
CNN
Inference
RNN
Memory_Networks
VQA
-
Instance-Aware Hashing for Multi-Label Image Retrieval
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
GAN
-
Multi-task Sequence to Sequence Learning
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
Order-Embeddings of Images and Language
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
Prediction
-
Beyond Temporal Pooling: Recurrence and Temporal Convolutions for Gesture Recognition in Video
arXiv_CV
arXiv_CV
Image_Caption
Speech_Recognition
Caption
Video_Classification
RNN
Classification
Recognition
-
Implicit Distortion and Fertility Models for Attention-based Encoder-Decoder NMT Model
arXiv_CV
arXiv_CV
Image_Caption
Attention
Speech_Recognition
Caption
NMT
Recognition
-
Event Specific Multimodal Pattern Mining with Image-Caption Pairs
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Weakly_Supervised
Caption
-
SentiCap: Generating Image Descriptions with Sentiments
arXiv_CV
arXiv_CV
Image_Caption
Regularization
Sentiment
Caption
Language_Model
Relation
Recognition
-
Neural Self Talk: Image Understanding via Continuous Questioning and Answering
arXiv_CV
arXiv_CV
Image_Caption
QA
CNN
RNN
VQA
-
Video captioning with recurrent networks based on frame- and video-level features and visual content classification
arXiv_CV
arXiv_CV
Image_Caption
Video_Caption
Caption
RNN
Classification
Language_Model
-
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
arXiv_CV
arXiv_CV
Image_Caption
Salient
Object_Detection
Caption
CNN
Optimization
Language_Model
Detection
-
Mapping Images to Sentiment Adjective Noun Pairs with Factorized Neural Nets
arXiv_CV
arXiv_CV
Image_Caption
Sentiment
Caption
Classification
-
How to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary?
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Knowledge
Caption
Deep_Learning
-
From Images to Sentences through Scene Description Graphs using Commonsense Reasoning and Knowledge
arXiv_CV
arXiv_CV
Image_Caption
Knowledge
Caption
Detection
-
Sequence to Sequence -- Video to Text
arXiv_CV
arXiv_CV
Image_Caption
Video_Caption
Caption
RNN
Language_Model
-
A Critical Review of Recurrent Neural Networks for Sequence Learning
arXiv_CV
arXiv_CV
Image_Caption
Review
Caption
Survey
Optimization
RNN
Prediction
Recognition
-
Language Models for Image Captioning: The Quirks and What Works
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
RNN
Language_Model
-
Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
-
Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
RNN
Prediction
-
Guiding Long-Short Term Memory for Image Caption Generation
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
-
SentenceRacer: A Game with a Purpose for Image Sentence Annotation
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
A large annotated corpus for learning natural language inference
arXiv_CV
arXiv_CV
Image_Caption
Caption
Inference
-
Sequence-to-Sequence Neural Net Models for Grapheme-to-Phoneme Conversion
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
Language_Model
-
Image Representations and New Domains in Neural Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
Language_Model
-
Scalable Bayesian Optimization Using Deep Neural Networks
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
Optimization
Language_Model
Recognition
-
Describing Multimedia Content using Attention-based Encoder--Decoder Networks
arXiv_CV
arXiv_CV
Image_Caption
Attention
Speech_Recognition
Caption
CNN
RNN
Classification
Recognition
-
Attention-Based Models for Speech Recognition
arXiv_CV
arXiv_CV
Image_Caption
Attention
Speech_Recognition
Caption
Recognition
-
Aligning where to see and what to tell: image caption with region-based attention and scene factorization
arXiv_CV
arXiv_CV
Image_Caption
Salient
Attention
Caption
Language_Model
-
Technical Report: Image Captioning with Semantically Similar Images
arXiv_CV
arXiv_CV
Image_Caption
Caption
Embedding
CNN
-
Deep Captioning with Multimodal Recurrent Neural Networks
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
RNN
-
The Long-Short Story of Movie Description
arXiv_CV
arXiv_CV
Image_Caption
Caption
RNN
-
Exploring Nearest Neighbor Approaches for Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
-
From Captions to Visual Concepts and Back
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Caption
Language_Model
Detection
-
Simple Image Description Generator via a Linear Phrase-Based Approach
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
Language_Model
-
Phrase-based Image Captioning
arXiv_CV
arXiv_CV
Image_Caption
Caption
CNN
Language_Model
-
3D Object Class Detection in the Wild
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Detection
-
Recurrent Neural Network Regularization
arXiv_CV
arXiv_CV
Image_Caption
Regularization
Speech_Recognition
Caption
RNN
Language_Model
Recognition
-
Learning a Recurrent Visual Representation for Image Caption Generation
arXiv_CV
arXiv_CV
Image_Caption
Image_Retrieval
Caption
Embedding
-
Detection Bank: An Object Detection Based Video Representation for Multimedia Event Recognition
arXiv_CV
arXiv_CV
Image_Caption
Object_Detection
Classification
Detection
Recognition
-
Fast and Exact Top-k Search for Random Walk with Restart
arXiv_CV
arXiv_CV
Image_Caption
Sparse
Caption
Prediction
Recommendation
Video_Caption
-
AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures
arXiv_CV
arXiv_CV
Image_Caption
Video_Caption
CNN
-
Exploring Temporal Information for Improved Video Understanding
arXiv_CV
arXiv_CV
Video_Caption
Segmentation
Face
Action_Recognition
Semantic_Segmentation
Video_Classification
Inference
Classification
Prediction
Recognition
-
Lightweight Network Architecture for Real-Time Action Recognition
arXiv_AI
arXiv_AI
Video_Caption
Action_Recognition
Inference
Recognition
-
Neural Message Passing on Hybrid Spatio-Temporal Visual and Symbolic Graphs for Video Understanding
arXiv_CV
arXiv_CV
Video_Caption
Segmentation
Classification
Deep_Learning
Detection
Relation
-
Video Instance Segmentation
arXiv_CV
arXiv_CV
Video_Caption
Segmentation
Tracking
Detection
-
On Flow Profile Image for Video Representation
arXiv_CV
arXiv_CV
Video_Caption
Caption
Optimization
Video_Classification
Classification
Recognition
-
Memory-Attended Recurrent Network for Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Caption
-
Multimodal Semantic Attention Network for Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
RNN
Classification
-
Dynamic Graph Modules for Modeling Object-Object Interactions in Activity Recognition
arXiv_CV
arXiv_CV
Video_Caption
Attention
Action_Recognition
Relation
Recognition
-
Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
CNN
Optimization
RNN
Relation
-
Hierarchical Recurrent Neural Network for Video Summarization
arXiv_CV
arXiv_CV
Video_Caption
Summarization
Caption
RNN
Classification
-
Holistic Large Scale Video Understanding
arXiv_CV
arXiv_CV
Video_Caption
GAN
Action_Recognition
Recognition
-
Long-Term Feature Banks for Detailed Video Understanding
arXiv_CV
arXiv_CV
Video_Caption
CNN
-
Recurrent Space-time Graphs for Video Understanding
arXiv_CV
arXiv_CV
Video_Caption
-
Membership Inference Attacks on Sequence-to-Sequence Models
arXiv_CL
arXiv_CL
Video_Caption
Caption
Inference
-
Streamlined Dense Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Reinforcement_Learning
Caption
-
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
arXiv_CV
arXiv_CV
Video_Caption
Caption
-
An End-to-End Baseline for Video Captioning
arXiv_AI
arXiv_AI
Video_Caption
Attention
Caption
Action_Recognition
CNN
RNN
Recognition
-
Self-Supervised Spatiotemporal Feature Learning via Video Rotation Prediction
arXiv_CV
arXiv_CV
Video_Caption
Action_Recognition
Prediction
Recognition
-
VideoBERT: A Joint Model for Video and Language Representation Learning
arXiv_CV
arXiv_CV
Video_Caption
Speech_Recognition
Caption
Representation_Learning
Classification
Language_Model
Quantitative
Recognition
-
Constructing Hierarchical Q&A Datasets for Video Story Understanding
arXiv_AI
arXiv_AI
Video_Caption
Knowledge
-
TSM: Temporal Shift Module for Efficient Video Understanding
arXiv_CV
arXiv_CV
Video_Caption
Relation
Recognition
-
M-VAD Names: a Dataset for Video Captioning with Naming
arXiv_CV
arXiv_CV
Video_Caption
Caption
-
Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Object_Detection
Caption
CNN
RNN
Language_Model
Detection
-
4D Generic Video Object Proposals
arXiv_CV
arXiv_CV
Video_Caption
Segmentation
-
DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition
arXiv_CV
arXiv_CV
Adversarial
Video_Caption
Action_Recognition
Inference
Classification
Recognition
-
Amortized Context Vector Inference for Sequence-to-Sequence Networks
arXiv_CV
arXiv_CV
Video_Caption
Attention
Summarization
Caption
Inference
-
End-to-End Video Captioning with Multitask Reinforcement Learning
arXiv_CV
arXiv_CV
Video_Caption
Knowledge
Reinforcement_Learning
Caption
CNN
RNN
-
Not All Words are Equal: Video-specific Information Loss for Video Captioning
arXiv_CV
arXiv_CV
Salient
Video_Caption
Attention
Caption
Relation
Recognition
-
Hierarchical LSTMs with Adaptive Attention for Visual Captioning
arXiv_CV
arXiv_CV
Image_Caption
Video_Caption
Attention
Caption
RNN
Language_Model
-
Joint Event Detection and Description in Continuous Video Streams
arXiv_CV
arXiv_CV
Video_Caption
Caption
CNN
Detection
Relation
-
Adversarial Inference for Multi-Sentence Video Description
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Video_Caption
GAN
Caption
Inference
-
An Attempt towards Interpretable Audio-Visual Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Caption
CNN
-
Integrated Object Detection and Tracking with Tracklet-Conditioned Detection
arXiv_CV
arXiv_CV
Video_Caption
Object_Detection
Tracking
Detection
-
Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Knowledge
Caption
Embedding
-
Middle-Out Decoding
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
-
Image-to-Video Person Re-Identification by Reusing Cross-modal Embeddings
arXiv_CV
arXiv_CV
Image_Caption
Re-identification
Video_Caption
Person_Re-identification
Caption
Embedding
RNN
-
Cross-Modal and Hierarchical Modeling of Video and Text
arXiv_CV
arXiv_CV
Video_Caption
Caption
Action_Recognition
Embedding
Recognition
-
Vector Learning for Cross Domain Representations
arXiv_CV
arXiv_CV
Image_Caption
Adversarial
Video_Caption
GAN
Caption
-
MTLE: A Multitask Learning Encoder of Visual Feature Representations for Video and Movie Description
arXiv_CV
arXiv_CV
Video_Caption
Knowledge
Caption
RNN
-
Hierarchical Video Understanding
arXiv_CV
arXiv_CV
Video_Caption
Caption
-
NMT-Keras: a Very Flexible Toolkit with a Focus on Interactive NMT and Online Learning
arXiv_CV
arXiv_CV
Video_Caption
Caption
NMT
Classification
Deep_Learning
VQA
-
Move Forward and Tell: A Progressive Generator of Video Descriptions
arXiv_CV
arXiv_CV
Video_Caption
Caption
Embedding
-
Predicting Visual Features from Text for Image and Video Caption Retrieval
arXiv_CV
arXiv_CV
Video_Caption
Caption
Embedding
CNN
-
Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
RNN
Language_Model
Prediction
-
Best Vision Technologies Submission to ActivityNet Challenge 2018-Task: Dense-Captioning Events in Videos
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
Inference
RNN
-
RUC+CMU: System Report for Dense Captioning Events in Videos
arXiv_CV
arXiv_CV
Video_Caption
Caption
-
ECO: Efficient Convolutional Network for Online Video Understanding
arXiv_CV
arXiv_CV
Video_Caption
Caption
CNN
Classification
Relation
-
Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos
arXiv_CV
arXiv_CV
Video_Caption
Caption
Video_Classification
Classification
-
Jointly Localizing and Describing Events for Dense Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Caption
Optimization
Detection
-
Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
-
Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Caption
Prediction
-
End-to-End Dense Video Captioning with Masked Transformer
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
-
Reconstruction Network for Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Caption
-
Video Captioning via Hierarchical Reinforcement Learning
arXiv_CV
arXiv_CV
Video_Caption
Reinforcement_Learning
Caption
-
Attend and Interact: Higher-Order Object Interactions for Video Understanding
arXiv_CV
arXiv_CV
Video_Caption
Knowledge
Caption
Action_Recognition
Detection
Relation
Recognition
-
Excitation Backprop for RNNs
arXiv_CV
arXiv_CV
Salient
Video_Caption
Caption
Action_Recognition
RNN
Classification
Prediction
Recognition
-
Less Is More: Picking Informative Frames for Video Captioning
arXiv_CV
arXiv_CV
Salient
Video_Caption
Attention
Caption
-
Consensus-based Sequence Training for Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Reinforcement_Learning
Caption
-
Kill Two Birds With One Stone: Boosting Both Object Detection Accuracy and Speed With adaptive Patch-of-Interest Composition
arXiv_CV
arXiv_CV
Video_Caption
Object_Detection
Detection
-
Integrating both Visual and Audio Cues for Enhanced Video Caption
arXiv_CV
arXiv_CV
Video_Caption
Caption
Inference
-
Towards Automatic Learning of Procedures from Web Instructional Videos
arXiv_CV
arXiv_CV
Video_Caption
Segmentation
Caption
-
Adaptive Feature Abstraction for Translating Video to Text
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
CNN
Quantitative
-
Grounded Objects and Interactions for Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Caption
-
Evaluation of Automatic Video Captioning Using Direct Assessment
arXiv_CV
arXiv_CV
Video_Caption
Caption
-
From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Caption
RNN
-
Generating Video Descriptions with Topic Guidance
arXiv_CV
arXiv_CV
Image_Caption
Video_Caption
Caption
Prediction
-
Video Captioning with Guidance of Multimodal Latent Topics
arXiv_CV
arXiv_CV
Video_Caption
Caption
Prediction
-
Multi-Task Video Captioning with Video and Entailment Generation
arXiv_CV
arXiv_CV
Video_Caption
Knowledge
Caption
Prediction
-
Reinforced Video Captioning with Entailment Rewards
arXiv_CV
arXiv_CV
Video_Caption
Reinforcement_Learning
Caption
-
Supervising Neural Attention Models for Video Captioning by Human Gaze Data
arXiv_CV
arXiv_CV
Video_Caption
Attention
Tracking
Caption
Prediction
-
VideoMCC: a New Benchmark for Video Comprehension
arXiv_CV
arXiv_CV
Video_Caption
GAN
Caption
Quantitative
-
Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
RNN
Language_Model
-
Temporal Tessellation: A Unified Approach for Video Analysis
arXiv_CV
arXiv_CV
Video_Caption
Summarization
Caption
Prediction
Detection
-
Top-down Visual Saliency Guided by Captions
arXiv_CV
arXiv_CV
Salient
Video_Caption
Attention
Caption
Classification
-
Hierarchical Boundary-Aware Neural Encoder for Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
RNN
-
Weakly Supervised Dense Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Weakly_Supervised
Caption
CNN
Language_Model
-
Improving Interpretability of Deep Neural Networks with Semantic Information
arXiv_CV
arXiv_CV
Video_Caption
Caption
Action_Recognition
Prediction
Recognition
-
Recurrent Memory Addressing for describing videos
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
Embedding
Memory_Networks
-
Leveraging Video Descriptions to Learn Video Question Answering
arXiv_CV
arXiv_CV
Video_Caption
QA
VQA
-
Video Captioning with Multi-Faceted Attention
arXiv_CV
arXiv_CV
Salient
Video_Caption
Attention
Face
Caption
RNN
-
Bidirectional Multirate Reconstruction for Temporal Modeling in Videos
arXiv_CV
arXiv_CV
Video_Caption
Caption
Detection
-
Video Captioning with Transferred Semantic Attributes
arXiv_CV
arXiv_CV
Video_Caption
Caption
CNN
RNN
-
Multimodal Memory Modelling for Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
CNN
RNN
Deep_Learning
-
Spatio-Temporal Attention Models for Grounded Video Captioning
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
Image_Classification
Classification
Recognition
-
Deep Learning for Video Classification and Captioning
arXiv_CV
arXiv_CV
Review
Video_Caption
Caption
Video_Classification
Classification
Deep_Learning
-
Oracle performance for visual captioning
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
Language_Model
-
Title Generation for User Generated Videos
arXiv_CV
arXiv_CV
Salient
Video_Caption
Object_Detection
Attention
Caption
Prediction
Detection
-
Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation
arXiv_CV
arXiv_CV
Video_Caption
Caption
-
Bidirectional Long-Short Term Memory for Video Description
arXiv_CV
arXiv_CV
Video_Caption
Sparse
Knowledge
Attention
Caption
CNN
RNN
Language_Model
-
Storytelling of Photo Stream with Bidirectional Multi-thread Recurrent Neural Network
arXiv_CV
arXiv_CV
Video_Caption
Caption
RNN
-
Beyond Caption To Narrative: Video Captioning With Multiple Sentences
arXiv_CV
arXiv_CV
Image_Caption
Video_Caption
Caption
-
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data
arXiv_CV
arXiv_CV
Image_Caption
Video_Caption
Knowledge
Caption
Recognition
-
Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks
arXiv_CV
arXiv_CV
Video_Caption
Attention
Caption
Embedding
RNN
-
Delving Deeper into Convolutional Networks for Learning Video Representations
arXiv_CV
arXiv_CV
Video_Caption
Sparse
Caption
Action_Recognition
CNN
Recognition
-
A Restricted Visual Turing Test for Deep Scene and Event Understanding
arXiv_CV
arXiv_CV
Video_Caption
Knowledge
Face
Ontology
Caption
Inference
VQA
-
Video captioning with recurrent networks based on frame- and video-level features and visual content classification
arXiv_CV
arXiv_CV
Image_Caption
Video_Caption
Caption
RNN
Classification
Language_Model
-
Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning
arXiv_CV
arXiv_CV
Video_Caption
Caption
CNN
Image_Classification
Inference
Classification
Deep_Learning
-
Sequence to Sequence -- Video to Text
arXiv_CV
arXiv_CV
Image_Caption
Video_Caption
Caption
RNN
Language_Model