Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

Fast Neural Architecture Search of Compact Semantic Segmentation Models via Auxiliary Cells

2018-11-29

Vladimir Nekrasov, Hao Chen, Chunhua Shen, Ian Reid

arXiv_CV

arXiv_CV Knowledge Segmentation Pose_Estimation NAS Reinforcement_Learning CNN Image_Classification Semantic_Segmentation Classification Language_Model Prediction Quantitative
Abstract

Automated design of neural network architectures tailored for a specific task is an extremely promising, albeit inherently difficult, avenue to explore. While most results in this domain have been achieved on image classification and language modelling problems, here we concentrate on dense per-pixel tasks, in particular, semantic image segmentation using fully convolutional networks. In contrast to the aforementioned areas, the design choices of a fully convolutional network require several changes, ranging from the sort of operations that need to be used - e.g., dilated convolutions - to solving a more difficult optimisation problem. In this work, we are particularly interested in searching for high-performance compact segmentation architectures, able to run in real-time using limited resources. To achieve that, we intentionally over-parameterise the architecture during training via a set of auxiliary cells that provide an intermediate supervisory signal and can be omitted during the evaluation phase. The design of the auxiliary cell is emitted by a controller, a neural network with a fixed structure trained using reinforcement learning. More crucially, we demonstrate how to efficiently search for these architectures within limited time and computational budgets. In particular, we rely on a progressive strategy that stops non-promising architectures from being trained further, and on Polyak averaging coupled with knowledge distillation to speed up convergence. Quantitatively, in 8 GPU-days our approach discovers a set of architectures performing on par with the state of the art among compact models on the semantic segmentation, pose estimation and depth prediction tasks.
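
As a rough illustration of the Polyak averaging mentioned in this abstract, here is a minimal sketch in its exponential-moving-average form, a common variant. The abstract does not say which form the authors use, so the function name `polyak_update`, the `decay` value and the toy parameter lists are all assumptions made for this example.

```python
def polyak_update(averaged, current, decay=0.9):
    """Blend the running average with the latest parameters (EMA form).

    `averaged` and `current` are plain lists of floats standing in for
    model weights; `decay` is a hypothetical smoothing factor.
    """
    return [decay * a + (1.0 - decay) * c for a, c in zip(averaged, current)]


# Toy run: the "trained" parameters stay at [1.0, 2.0] for three steps,
# and the averaged copy moves towards them from zero.
averaged = [0.0, 0.0]
for step_params in ([1.0, 2.0], [1.0, 2.0], [1.0, 2.0]):
    averaged = polyak_update(averaged, step_params, decay=0.5)

print(averaged)  # [0.875, 1.75]
```

The averaged copy changes more smoothly than the raw parameters, which is the usual reason for evaluating with it.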

URL

https://arxiv.org/abs/1810.10804

PDF

https://arxiv.org/pdf/1810.10804
Visual Question Answering as Reading Comprehension

2018-11-29

Hui Li, Peng Wang, Chunhua Shen, Anton van den Hengel

arXiv_CV

arXiv_CV Knowledge QA VQA
Abstract

Visual question answering (VQA) demands simultaneous comprehension of both the visual content of the image and the natural language question. In some cases, the reasoning needs the help of common sense or general knowledge, which usually appears in the form of text. Current methods jointly embed both the visual information and the textual features into the same space. However, modelling the complex interactions between the two modalities is not an easy task. Rather than struggling with multimodal feature fusion, in this paper we propose to unify all the input information in natural language so as to convert VQA into a machine reading comprehension problem. With this transformation, our method not only can tackle VQA datasets that focus on observation-based questions, but can also be naturally extended to handle knowledge-based VQA, which requires exploring a large-scale external knowledge base. It is a step towards being able to exploit large volumes of text and natural language processing techniques to address the VQA problem. Two types of models are proposed to deal with open-ended VQA and multiple-choice VQA respectively. We evaluate our models on three VQA benchmarks. Performance comparable with the state of the art demonstrates the effectiveness of the proposed method.

URL

https://arxiv.org/abs/1811.11903

PDF

https://arxiv.org/pdf/1811.11903
The Breakthrough Listen Search for Intelligent Life: Wide-bandwidth Digital Instrumentation for the CSIRO Parkes 64-m Telescope

2018-11-28

Danny C. Price, David H. E. MacMahon, Matt Lebofsky, Steve Croft, David DeBoer, J. Emilio Enriquez, Griffin S. Foster, Vishal Gajjar, Nectaria Gizani, Greg Hellbourg, Howard Isaacson, Andrew P. V. Siemion, Dan Werthimer, James A. Green, Shaun Amy, Lewis Ball, Douglas C.-J. Bock, Dan Craig, Philip G. Edwards, Andrew Jameson, Stacy Mader, Brett Preisig, Mal Smith, John Reynolds, John Sarkissian

arXiv_CV

arXiv_CV
Abstract

Breakthrough Listen is a ten-year initiative to search for signatures of technologies created by extraterrestrial civilizations at radio and optical wavelengths. Here, we detail the digital data recording system deployed for Breakthrough Listen observations at the 64-m aperture CSIRO Parkes Telescope in New South Wales, Australia. The recording system currently implements two recording modes: a dual-polarization, 1.125 GHz bandwidth mode for single beam observations, and a 26-input, 308-MHz bandwidth mode for the 21-cm multibeam receiver. The system is also designed to support a 3 GHz single-beam mode for the forthcoming Parkes ultra-wideband feed. In this paper, we present details of the system architecture, provide an overview of hardware and software, and present initial performance results.

URL

https://arxiv.org/abs/1804.04571

PDF

https://arxiv.org/pdf/1804.04571
Towards Task Understanding in Visual Settings

2018-11-28

Sebastin Santy, Wazeer Zulfikar, Rishabh Mehrotra, Emine Yilmaz

arXiv_CV

arXiv_CV Image_Caption Ontology Text_Generation Caption CNN
Abstract

We consider the problem of understanding real world tasks depicted in visual images. While most existing image captioning methods excel in producing natural language descriptions of visual scenes involving human tasks, there is often the need for an understanding of the exact task being undertaken rather than a literal description of the scene. We leverage insights from real world task understanding systems, and propose a framework composed of convolutional neural networks and an external hierarchical task ontology to produce task descriptions from input images. Detailed experiments highlight the efficacy of the extracted descriptions, which could potentially find their way into many applications, including image alt text generation.

URL

https://arxiv.org/abs/1811.11833

PDF

https://arxiv.org/pdf/1811.11833
Partially-Supervised Image Captioning

2018-11-28

Peter Anderson, Stephen Gould, Mark Johnson

arXiv_CV

arXiv_CV Image_Caption Object_Detection Caption RNN Detection
Abstract

Image captioning models are becoming increasingly successful at describing the content of images in restricted domains. However, if these models are to function in the wild - for example, as assistants for people with impaired vision - a much larger number and variety of visual concepts must be understood. To address this problem, we teach image captioning models new visual concepts from labeled images and object detection datasets. Since image labels and object classes can be interpreted as partial captions, we formulate this problem as learning from partially-specified sequence data. We then propose a novel algorithm for training sequence models, such as recurrent neural networks, on partially-specified sequences which we represent using finite state automata. In the context of image captioning, our method lifts the restriction that previously required image captioning models to be trained on paired image-sentence corpora only, or otherwise required specialized model architectures to take advantage of alternative data modalities. Applying our approach to an existing neural captioning model, we achieve state of the art results on the novel object captioning task using the COCO dataset. We further show that we can train a captioning model to describe new visual concepts from the Open Images dataset while maintaining competitive COCO evaluation scores.

URL

https://arxiv.org/abs/1806.06004

PDF

https://arxiv.org/pdf/1806.06004
Large Scale Audio-Visual Video Analytics Platform for Forensic Investigations of Terroristic Attacks

2018-11-28

Alexander Schindler, Martin Boyer, Andrew Lindley, David Schreiber, Thomas Philipp

arXiv_CV

arXiv_CV Object_Detection Tracking Detection
Abstract

The forensic investigation of a terrorist attack poses a huge challenge to the investigative authorities, as several thousand hours of video footage need to be reviewed. To assist law enforcement agencies (LEA) in identifying suspects and securing evidence, we present a platform which fuses information from surveillance cameras and video uploads from eyewitnesses. The platform integrates analytical modules for different input modalities on a scalable architecture. Videos are analyzed according to their acoustic and visual content. Specifically, Audio Event Detection is applied to index the content according to attack-specific acoustic concepts. Audio similarity search is utilized to identify similar video sequences recorded from different perspectives. Visual object detection and tracking are used to index the content according to relevant concepts. The heterogeneous results of the analytical modules are fused into a distributed index of visual and acoustic concepts to facilitate a rapid start of investigations, following trails and investigating witness reports.

URL

https://arxiv.org/abs/1811.11623

PDF

https://arxiv.org/pdf/1811.11623
Semi-supervised learning with Bidirectional GANs

2018-11-28

Maciej Zamorski, Maciej Zięba

arXiv_CV

arXiv_CV Image_Retrieval Adversarial GAN Embedding Classification
Abstract

In this work we introduce a novel approach to train a Bidirectional Generative Adversarial Model (BiGAN) in a semi-supervised manner. The presented method utilizes a triplet loss function as an additional component of the objective function used to train a discriminative data representation in the latent space of the BiGAN model. This representation can be further used as a seed for generating artificial images, but also as a good feature embedding for classification and image retrieval tasks. We evaluate the quality of the proposed method on these two challenging tasks using two benchmark datasets: CIFAR10 and SVHN.
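
To make the triplet loss in this abstract concrete, here is a minimal sketch of the standard hinge form, max(0, d(a, p) - d(a, n) + margin), using Euclidean distance. The abstract does not specify the distance, the margin, or how triplets are chosen, so the `margin` default and the toy embeddings below are assumptions; some formulations use squared distances instead.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on a single (anchor, positive, negative) triple.

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


# Hypothetical 2-D latent codes.
anchor = [0.0, 0.0]
far_point = [3.0, 4.0]   # distance 5 from the anchor
near_point = [0.0, 1.0]  # distance 1 from the anchor

print(triplet_loss(anchor, far_point, near_point))  # 5.0 (positive is too far)
print(triplet_loss(anchor, near_point, far_point))  # 0.0 (constraint satisfied)
```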

URL

https://arxiv.org/abs/1811.11426

PDF

https://arxiv.org/pdf/1811.11426
General-to-Detailed GAN for Infrequent Class Medical Images

2018-11-28

Tatsuki Koga, Naoki Nonaka, Jun Sakuma, Jun Seita

arXiv_CV

arXiv_CV Adversarial GAN Deep_Learning
Abstract

Deep learning has significant potential for medical imaging. However, since the incidence rate of each disease varies widely, the frequency of classes in a medical image dataset is imbalanced, leading to poor accuracy for infrequent classes. One possible solution is data augmentation of infrequent classes using synthesized images created by Generative Adversarial Networks (GANs), but conventional GANs also require a certain number of images to learn. To overcome this limitation, here we propose General-to-detailed GAN (GDGAN), two serially connected GANs, one for general labels and the other for detailed labels. GDGAN produced diverse medical images, and the network trained with an augmented dataset outperformed other networks using existing methods with respect to the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve.

URL

https://arxiv.org/abs/1812.01690

PDF

https://arxiv.org/pdf/1812.01690
Relational dynamic memory networks

2018-11-28

Trang Pham, Truyen Tran, Svetha Venkatesh

arXiv_CV

arXiv_CV Dynamic_Memory_Network Prediction Relation Memory_Networks
Abstract

Neural networks excel in detecting regular patterns but are less successful in representing and manipulating complex data structures, possibly due to the lack of an external memory. This has led to the recent development of a new line of architectures known as Memory-Augmented Neural Networks (MANNs), each of which consists of a neural network that interacts with an external memory matrix. However, this RAM-like memory matrix is unstructured and thus does not naturally encode structured objects. Here we design a new MANN dubbed Relational Dynamic Memory Network (RMDN) to bridge the gap. Like existing MANNs, RMDN has a neural controller but its memory is structured as multi-relational graphs. RMDN uses the memory to represent and manipulate graph-structured data in response to query; and as a neural network, RMDN is trainable from labeled data. Thus RMDN learns to answer queries about a set of graph-structured objects without explicit programming. We evaluate the capability of RMDN on several important prediction problems, including software vulnerability, molecular bioactivity and chemical-chemical interaction. Results demonstrate the efficacy of the proposed model.

URL

https://arxiv.org/abs/1808.04247

PDF

https://arxiv.org/pdf/1808.04247
Deep Regionlets: Blended Representation and Deep Learning for Generic Object Detection

2018-11-28

Hongyu Xu, Xutao Lv, Xiaoyu Wang, Zhou Ren, Rama Chellappa

arXiv_CV

arXiv_CV Object_Detection Segmentation Deep_Learning Detection
Abstract

In this paper, we propose a novel object detection algorithm named “Deep Regionlets” by integrating deep neural networks and conventional detection schema for accurate generic object detection. Motivated by the advantages of regionlets on modeling object deformation and multiple aspect ratios, we incorporate regionlets into an end-to-end trainable deep learning framework. The deep regionlets framework consists of a region selection network and a deep regionlet learning module. Specifically, given a detection bounding box proposal, the region selection network provides guidance on where to select regions from which features can be learned. The regionlet learning module focuses on local feature selection and transformation to alleviate the effects of appearance variations. To this end, we first realize non-rectangular region selection within the detection framework to accommodate variations in object appearance. Moreover, we design a “gating network” within the regionlet learning module to enable soft regionlet selection and pooling. The Deep Regionlets framework is trained end-to-end without additional effort. We present the results of ablation studies and extensive experiments on PASCAL VOC and Microsoft COCO datasets. The proposed algorithm outperforms state-of-the-art algorithms, such as RetinaNet and Mask R-CNN, even without additional segmentation labels.

URL

https://arxiv.org/abs/1811.11318

PDF

https://arxiv.org/pdf/1811.11318
Intra-class Variation Isolation in Conditional GANs

2018-11-27

Richard T. Marriott, Sami Romdhani, Liming Chen

arXiv_CV

arXiv_CV Adversarial GAN Face Quantitative
Abstract

Current state-of-the-art conditional generative adversarial networks (C-GANs) require strong supervision via labeled datasets in order to generate images with continuously adjustable, disentangled semantics. In this paper we introduce a new formulation of the C-GAN that is able to learn realistic models with continuous, semantically meaningful input parameters and which has the advantage of requiring only the weak supervision of binary attribute labels. We coin the method intra-class variation isolation (IVI) and the resulting network the IVI-GAN. The method allows continuous control over the attributes in synthesised images where precise labels are not readily available. For example, given only labels found using a simple classifier of ambient / non-ambient lighting in images, IVI has enabled us to learn a generative face-image model with controllable lighting that is disentangled from other factors in the synthesised images, such as the identity. We evaluate IVI-GAN on the CelebA and CelebA-HQ datasets, learning to disentangle attributes such as lighting, pose, expression and age, and provide a quantitative comparison of IVI-GAN with a classical continuous C-GAN.

URL

https://arxiv.org/abs/1811.11296

PDF

https://arxiv.org/pdf/1811.11296
Integrated Object Detection and Tracking with Tracklet-Conditioned Detection

2018-11-27

Zheng Zhang, Dazhi Cheng, Xizhou Zhu, Stephen Lin, Jifeng Dai

arXiv_CV

arXiv_CV Video_Caption Object_Detection Tracking Detection
Abstract

Accurate detection and tracking of objects is vital for effective video understanding. In previous work, the two tasks have been combined in a way that tracking is based heavily on detection, but the detection benefits marginally from the tracking. To increase synergy, we propose to more tightly integrate the tasks by conditioning the object detection in the current frame on tracklets computed in prior frames. With this approach, the object detection results not only have high detection responses, but also improved coherence with the existing tracklets. This greater coherence leads to estimated object trajectories that are smoother and more stable than the jittered paths obtained without tracklet-conditioned detection. Over extensive experiments, this approach is shown to achieve state-of-the-art performance in terms of both detection and tracking accuracy, as well as noticeable improvements in tracking stability.

URL

https://arxiv.org/abs/1811.11167

PDF

https://arxiv.org/pdf/1811.11167
Class-Distinct and Class-Mutual Image Generation with GANs

2018-11-27

Takuhiro Kaneko, Yoshitaka Ushiku, Tatsuya Harada

arXiv_CV

arXiv_CV Adversarial GAN
Abstract

We describe a new problem called class-distinct and class-mutual (DM) image generation. Typically in class-conditional image generation, it is assumed that there are no intersections between classes, and a generative model is optimized to fit discrete class labels. However, in real-world scenarios, it is often required to handle data in which class boundaries are ambiguous or unclear. For example, data crawled from the web tend to contain mislabeled data resulting from confusion. Given such data, our goal is to construct a generative model that can be controlled for class specificity, which we employ to selectively generate class-distinct and class-mutual images in a controllable manner. To achieve this, we propose novel families of generative adversarial networks (GANs) called class-mixture GAN (CMGAN) and class-posterior GAN (CPGAN). In these new networks, we redesign the generator prior and the objective function in auxiliary classifier GAN (AC-GAN), then extend these to class-mixture and arbitrary class-overlapping settings. In addition to an analysis from an information theory perspective, we empirically demonstrate the effectiveness of our proposed models for various class-overlapping settings (including synthetic to real-world settings) and tasks (i.e., image generation and image-to-image translation).

URL

https://arxiv.org/abs/1811.11163

PDF

https://arxiv.org/pdf/1811.11163
Effect of Tensile Strain in GaN Layer on the Band Offsets and 2DEG Density in AlGaN/GaN Heterostructures

2018-11-27

Mihir Date, Sudipta Mukherjee, Joydeep Ghosh, Dipankar Saha, Swaroop Ganguly, Apurba Laha

arXiv_CV

arXiv_CV GAN Face
Abstract

We have addressed the existing ambiguity regarding the effect of process-induced strain in the underlying GaN layer on AlGaN/GaN heterostructure properties. The bandgaps and offsets for AlGaN on strained GaN are first computed using a cubic interpolation scheme within an empirical tight-binding framework. These are then used to calculate the polarization charge and two-dimensional electron gas density. Our bandstructure calculations show that it is not possible to induce any significant change in band offsets through strain in the GaN layer. The charge-density calculations indicate that such strain can, however, modulate the polarization charge and thereby enhance the 2DEG density at the AlGaN/GaN hetero-interface substantially, by as much as 25% for low Al mole fraction.

URL

https://arxiv.org/abs/1712.03419

PDF

https://arxiv.org/pdf/1712.03419
Sampling Techniques for Large-Scale Object Detection from Sparsely Annotated Objects

2018-11-27

Yusuke Niitani, Takuya Akiba, Tommi Kerola, Toru Ogawa, Shotaro Sano, Shuji Suzuki

arXiv_CV

arXiv_CV Object_Detection Sparse Detection Relation
Abstract

Efficient and reliable methods for training of object detectors are in higher demand than ever, and more and more data relevant to the field is becoming available. However, large datasets like Open Images Dataset v4 (OID) are sparsely annotated, and some measure must be taken in order to ensure the training of a reliable detector. In order to take the incompleteness of these datasets into account, one possibility is to use pretrained models to detect the presence of the unverified objects. However, the performance of such a strategy depends largely on the power of the pretrained model. In this study, we propose part-aware sampling, a method that uses human intuition for the hierarchical relation between objects. In terse terms, our method works by making assumptions like “a bounding box for a car should contain a bounding box for a tire”. We demonstrate the power of our method on OID and compare the performance against a method based on a pretrained model. Our method also won the first and second place on the public and private test sets of the Google AI Open Images Competition 2018.
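
The containment assumption quoted in this abstract ("a bounding box for a car should contain a bounding box for a tire") can be sketched as a simple geometric test. This is only an illustration of that assumption: the abstract does not describe how the authors use it during training, and the box format `(x_min, y_min, x_max, y_max)` and both function names are hypothetical.

```python
def contains(outer, inner):
    """True if box `inner` lies entirely within box `outer`.

    Boxes are (x_min, y_min, x_max, y_max) tuples in pixel coordinates.
    """
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def proposals_inside_wholes(whole_boxes, proposals):
    """Keep the proposals that fall inside at least one annotated 'whole' box.

    These are the regions where an unannotated part (such as a tire)
    could plausibly be present.
    """
    return [p for p in proposals if any(contains(w, p) for w in whole_boxes)]


car_boxes = [(10, 10, 110, 60)]
candidate_boxes = [(20, 40, 40, 60), (200, 40, 220, 60)]

print(proposals_inside_wholes(car_boxes, candidate_boxes))  # [(20, 40, 40, 60)]
```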

URL

https://arxiv.org/abs/1811.10862

PDF

https://arxiv.org/pdf/1811.10862
Feature-Fused SSD: Fast Detection for Small Objects

2018-11-27

Guimei Cao, Xuemei Xie, Wenzhe Yang, Quan Liao, Guangming Shi, Jinjian Wu

arXiv_CV

arXiv_CV Object_Detection CNN Detection
Abstract

Small object detection is a challenging task in computer vision because small objects offer limited resolution and information. To solve this problem, the majority of existing methods sacrifice speed for improvement in accuracy. In this paper, we aim to detect small objects at a fast speed, using Single Shot Multibox Detector (SSD), the best object detector with respect to the accuracy-vs-speed trade-off, as the base architecture. We propose a multi-level feature fusion method for introducing contextual information in SSD, in order to improve the accuracy for small objects. For the fusion operation, we design two feature fusion modules, a concatenation module and an element-sum module, which differ in the way they add contextual information. Experimental results show that these two fusion modules obtain higher mAP on PASCAL VOC 2007 than baseline SSD by 1.6 and 1.7 points respectively, with 2-3 points improvement on some small-object categories. Their testing speeds are 43 and 40 FPS respectively, exceeding the state-of-the-art Deconvolutional Single Shot Detector (DSSD) by 29.4 and 26.4 FPS. Code is available at this https URL. Keywords: small object detection, feature fusion, real-time, single shot multi-box detector
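
The difference between the two fusion styles named in this abstract can be shown on toy data. This is a sketch of the generic operations only, assuming each feature is a flat list of channel values at one spatial location; the actual modules work on feature maps and their layer details are not given in the abstract. Both function names are hypothetical.

```python
def concat_fusion(shallow, deep):
    """Concatenate channels: the output has len(shallow) + len(deep) channels."""
    return list(shallow) + list(deep)


def element_sum_fusion(shallow, deep):
    """Add channels element-wise: the output keeps the input channel count."""
    if len(shallow) != len(deep):
        raise ValueError("element-wise sum needs matching channel counts")
    return [s + d for s, d in zip(shallow, deep)]


shallow_feature = [1.0, 2.0]  # hypothetical fine-resolution feature
deep_feature = [0.5, 0.5]     # hypothetical contextual feature

print(concat_fusion(shallow_feature, deep_feature))       # [1.0, 2.0, 0.5, 0.5]
print(element_sum_fusion(shallow_feature, deep_feature))  # [1.5, 2.5]
```

Concatenation leaves the weighting of context to later layers at the cost of more channels, while the element-wise sum keeps the channel count fixed.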

URL

https://arxiv.org/abs/1709.05054

PDF

https://arxiv.org/pdf/1709.05054
GANsfer Learning: Combining labelled and unlabelled data for GAN based data augmentation

2018-11-26

Christopher Bowles, Roger Gunn, Alexander Hammers, Daniel Rueckert

arXiv_CV

arXiv_CV Segmentation GAN
Abstract

Medical imaging is a domain which suffers from a paucity of manually annotated data for the training of learning algorithms. Manually delineating pathological regions at a pixel level is a time consuming process, especially in 3D images, and often requires the time of a trained expert. As a result, supervised machine learning solutions must make do with small amounts of labelled data, despite there often being additional unlabelled data available. Whilst of less value than labelled images, these unlabelled images can contain potentially useful information. In this paper we propose combining both labelled and unlabelled data within a GAN framework, before using the resulting network to produce images for use when training a segmentation network. We explore the task of deep grey matter multi-class segmentation in an AD dataset and show that the proposed method leads to a significant improvement in segmentation results, particularly in cases where the amount of labelled data is restricted. We show that this improvement is largely driven by a greater ability to segment the structures known to be the most affected by AD, thereby demonstrating the benefits of exposing the system to more examples of pathological anatomical variation. We also show how a shift in domain of the training data from young and healthy towards older and more pathological examples leads to better segmentations of the latter cases, and that this leads to a significant improvement in the ability for the computed segmentations to stratify cases of AD.

URL

https://arxiv.org/abs/1811.10669

PDF

https://arxiv.org/pdf/1811.10669
GP-CNAS: Convolutional Neural Network Architecture Search with Genetic Programming

2018-11-26

Yiheng Zhu, Yichen Yao, Zili Wu, Yujie Chen, Guozheng Li, Haoyuan Hu, Yinghui Xu

arXiv_CV

arXiv_CV Speech_Recognition NAS CNN Recognition
Abstract

Convolutional neural networks (CNNs) are effective at solving difficult problems like visual recognition, speech recognition and natural language processing. However, performance gain comes at the cost of laborious trial-and-error in designing deeper CNN architectures. In this paper, a genetic programming (GP) framework for convolutional neural network architecture search, abbreviated as GP-CNAS, is proposed to automatically search for optimal CNN architectures. GP-CNAS encodes CNNs as trees where leaf nodes (GP terminals) are selected residual blocks and non-leaf nodes (GP functions) specify the block assembling procedure. Our tree-based representation enables easy design and flexible implementation of genetic operators. Specifically, we design a dynamic crossover operator that strikes a balance between exploration and exploitation, which emphasizes CNN complexity at an early stage and CNN diversity at a later stage. Therefore, the desired CNN architecture with balanced depth and width can be found within limited trials. Moreover, our GP-CNAS framework is highly compatible with other manually-designed and NAS-generated block types as well. Experimental results on the CIFAR-10 dataset show that GP-CNAS is competitive among the state-of-the-art automatic and semi-automatic NAS algorithms.
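
The tree encoding described in this abstract (leaves are residual blocks, inner nodes say how to assemble them) can be pictured with a small data structure. This is a sketch under assumptions: the `Node` class, the `stack` assembly label and the block names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A tree node: a leaf names a block, an inner node names an assembly rule."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def count_blocks(node: Node) -> int:
    """Number of leaf nodes, i.e. blocks the encoded network would contain."""
    if node.is_leaf():
        return 1
    return sum(count_blocks(child) for child in node.children)


def tree_depth(node: Node) -> int:
    """Depth of the encoding tree, counting a lone leaf as depth 1."""
    if node.is_leaf():
        return 1
    return 1 + max(tree_depth(child) for child in node.children)


# Hypothetical encoding: one block stacked on a pair of stacked blocks.
tree = Node("stack", [
    Node("res_block_a"),
    Node("stack", [Node("res_block_b"), Node("res_block_c")]),
])

print(count_blocks(tree), tree_depth(tree))  # 3 3
```

With a representation like this, a crossover operator would amount to swapping subtrees between two parent trees.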

URL

https://arxiv.org/abs/1812.07611

PDF

https://arxiv.org/pdf/1812.07611
Robust Cross-View Gait Identification with Evidence: A Discriminant Gait GAN Approach on 10000 People

2018-11-26

BingZhang Hu, Yan Gao, Yu Guan, Yang Long, Nicholas Lane, Thomas Ploetz

arXiv_CV

arXiv_CV Adversarial GAN Face Deep_Learning Recognition
Abstract

Gait is an important biometric trait for surveillance and forensic applications, which can be used to identify individuals at a large distance through CCTV cameras. However, it is very difficult to develop robust automated gait recognition systems, since gait may be affected by many covariate factors such as clothing, walking surface, walking speed, camera view angle, etc. Of these, a large view angle is deemed the most challenging factor since it may alter the overall gait appearance substantially. Recently, some deep learning approaches (such as CNNs) have been employed to extract view-invariant features, and achieved encouraging results on small datasets. However, they do not scale well to large datasets, and their performance decreases significantly as the number of subjects grows, which makes them impractical for large-scale surveillance applications. To address this issue, in this work we propose a Discriminant Gait Generative Adversarial Network (DiGGAN) framework, which not only can learn view-invariant gait features for cross-view gait recognition tasks, but also can be used to reconstruct the gait templates in all views, serving as important evidence for forensic applications. We evaluated our DiGGAN framework on the world’s largest multi-view OU-MVLP dataset (which includes more than 10000 subjects), and our method outperforms state-of-the-art algorithms significantly on various cross-view gait identification scenarios (e.g., cooperative/uncooperative mode). Our DiGGAN framework also has the best results on the popular CASIA-B dataset, and it shows great generalisation capability across different datasets.

URL

https://arxiv.org/abs/1811.10493

PDF

https://arxiv.org/pdf/1811.10493
FutureGAN: Anticipating the Future Frames of Video Sequences using Spatio-Temporal 3d Convolutions in Progressively Growing GANs

2018-11-26

Sandra Aigner, Marco Körner

arXiv_CV

arXiv_CV GAN Prediction
Abstract

We introduce a new encoder-decoder GAN model, FutureGAN, that predicts future frames of a video sequence conditioned on a sequence of past frames. During training, the networks solely receive the raw pixel values as an input, without relying on additional constraints or dataset-specific conditions. To capture both the spatial and temporal components of a video sequence, spatio-temporal 3d convolutions are used in all encoder and decoder modules. Further, we utilize concepts of the existing progressively growing GAN (PGGAN) that achieves high-quality results on generating high-resolution single images. The FutureGAN model extends this concept to the complex task of video prediction. We conducted experiments on three different datasets, MovingMNIST, KTH Action, and Cityscapes. Our results show that the model learned representations to transform the information of an input sequence into a plausible future sequence effectively for all three datasets. The main advantage of the FutureGAN framework is that it is applicable to various datasets without additional changes, whilst achieving stable results that are competitive with the state of the art in video prediction. Our code is available at this https URL.

URL

https://arxiv.org/abs/1810.01325

PDF

https://arxiv.org/pdf/1810.01325
A Survey on Joint Object Detection and Pose Estimation using Monocular Vision

2018-11-26

Aniruddha V Patil, Pankaj Rabha

arXiv_CV

arXiv_CV Object_Detection Face Pose_Estimation Survey Deep_Learning Detection
Abstract

In this survey we present a complete landscape of joint object detection and pose estimation methods that use monocular vision. Descriptions of traditional approaches that involve descriptors or models and various estimation methods have been provided. These descriptors or models include chordiograms, shape-aware deformable parts model, bag of boundaries, distance transform templates, natural 3D markers and facet features whereas the estimation methods include iterative clustering estimation, probabilistic networks and iterative genetic matching. Hybrid approaches that use handcrafted feature extraction followed by estimation by deep learning methods have been outlined. We have investigated and compared, wherever possible, pure deep learning based approaches (single stage and multi stage) for this problem. Comprehensive details of the various accuracy measures and metrics have been illustrated. For the purpose of giving a clear overview, the characteristics of relevant datasets are discussed. The trends that prevailed from the infancy of this problem until now have also been highlighted.

URL

https://arxiv.org/abs/1811.10216

PDF

https://arxiv.org/pdf/1811.10216
Learning to discover and localize visual objects with open vocabulary

2018-11-25

Keren Ye, Mingda Zhang, Wei Li, Danfeng Qin, Adriana Kovashka, Jesse Berent

arXiv_CV

arXiv_CV Object_Detection Weakly_Supervised Caption Detection Relation
Abstract

To alleviate the cost of obtaining accurate bounding boxes for training today’s state-of-the-art object detection models, recent weakly supervised detection work has proposed techniques to learn from image-level labels. However, requiring discrete image-level labels is both restrictive and suboptimal. Real-world “supervision” usually consists of more unstructured text, such as captions. In this work we learn association maps between images and captions. We then use a novel objectness criterion to rank the resulting candidate boxes, such that high-ranking boxes have strong gradients along all edges. Thus, we can detect objects beyond a fixed object category vocabulary, if those objects are frequent and distinctive enough. We show that our objectness criterion improves the proposed bounding boxes in relation to prior weakly supervised detection methods. Further, we show encouraging results on object detection from image-level captions only.

URL

https://arxiv.org/abs/1811.10080

PDF

https://arxiv.org/pdf/1811.10080
Read All
Dissimilarity Coefficient based Weakly Supervised Object Detection

2018-11-25

Aditya Arun, C.V. Jawahar, M. Pawan Kumar

arXiv_CV

arXiv_CV Object_Detection Weakly_Supervised Optimization Deep_Learning Prediction Detection
Abstract

We consider the problem of weakly supervised object detection, where the training samples are annotated using only image-level labels that indicate the presence or absence of an object category. In order to model the uncertainty in the location of the objects, we employ a probabilistic learning objective based on the dissimilarity coefficient. The learning objective minimizes the difference between an annotation-agnostic prediction distribution and an annotation-aware conditional distribution. The main computational challenge is the complex nature of the conditional distribution, which consists of terms over hundreds or thousands of variables. The complexity of the conditional distribution rules out the possibility of explicitly modeling it. Instead, we exploit the fact that deep learning frameworks rely on stochastic optimization. This allows us to use a state-of-the-art discrete generative model that can provide annotation-consistent samples from the conditional distribution. Extensive experiments on the PASCAL VOC 2007 and 2012 data sets demonstrate the efficacy of our proposed approach.

URL

https://arxiv.org/abs/1811.10016

PDF

https://arxiv.org/pdf/1811.10016
Read All
A Framework of Transfer Learning in Object Detection for Embedded Systems

2018-11-24

Ioannis Athanasiadis, Panagiotis Mousouliotis, Loukas Petrou

arXiv_CV

arXiv_CV Object_Detection Transfer_Learning Optimization Detection Recognition
Abstract

Transfer learning is one of the subjects undergoing intense study in the area of machine learning. In object recognition and object detection there are known experiments on the transferability of parameters, but not for neural networks suitable for object detection in real-time embedded applications, such as the SqueezeDet neural network. We use transfer learning to accelerate the training of SqueezeDet on a new group of classes. Experiments are also conducted to study the transferability and co-adaptation phenomena introduced by the transfer learning process. To accelerate training, we propose a new implementation of SqueezeDet training that provides a faster data-processing pipeline and achieves a 1.8x speedup over the initial implementation. Finally, we created a mechanism for automatic hyperparameter optimization using an empirical method.

URL

https://arxiv.org/abs/1811.04863

PDF

https://arxiv.org/pdf/1811.04863
Read All
Object Detection based Deep Unsupervised Hashing

2018-11-24

Rong-Cheng Tu, Xian-Ling Mao, Bo-Si Feng, Bing-Bing Bian, Yu-shu Ying

arXiv_CV

arXiv_CV Image_Retrieval Object_Detection Detection
Abstract

Recently, similarity-preserving hashing methods have been extensively studied for large-scale image retrieval. Compared with unsupervised hashing, supervised hashing methods for labeled data usually perform better because they utilize semantic label information. Intuitively, for unlabeled data, the performance of unsupervised hashing methods should improve if we first mine some supervised semantic ‘label information’ from the unlabeled data and then incorporate this ‘label information’ into the training process. Thus, in this paper, we propose a novel Object Detection based Deep Unsupervised Hashing method (ODDUH). Specifically, a pre-trained object detection model is used to mine supervised ‘label information’, which guides the learning process to generate high-quality hash codes. Extensive experiments on two public datasets demonstrate that the proposed method outperforms the state-of-the-art unsupervised hashing methods in the image retrieval task.

URL

https://arxiv.org/abs/1811.09822

PDF

https://arxiv.org/pdf/1811.09822
Read All
Senti-Attend: Image Captioning using Sentiment and Attention

2018-11-24

Omid Mohamad Nezami, Mark Dras, Stephen Wan, Cecile Paris

arXiv_CV

arXiv_CV Image_Caption Sentiment Attention Caption
Abstract

There has been much recent work on image captioning models that describe the factual aspects of an image. Recently, some models have incorporated non-factual aspects into the captions, such as sentiment or style. However, such models typically have difficulty in balancing the semantic aspects of the image and the non-factual dimensions of the caption; in addition, it can be observed that humans may focus on different aspects of an image depending on the chosen sentiment or style of the caption. To address this, we design an attention-based model to better add sentiment to image captions. The model embeds and learns sentiment with respect to image-caption data, and uses both high-level and word-level sentiment information during the learning process. The model outperforms the state-of-the-art work in image captioning with sentiment using standard evaluation metrics. An analysis of generated captions also shows that our model does this by a better selection of the sentiment-bearing adjectives and adjective-noun pairs.

URL

https://arxiv.org/abs/1811.09789

PDF

https://arxiv.org/pdf/1811.09789
Read All
On Hallucinating Context and Background Pixels from a Face Mask using Multi-scale GANs

2018-11-24

Sandipan Banerjee, Walter J. Scheirer, Kevin W. Bowyer, Patrick J. Flynn

arXiv_CV

arXiv_CV Adversarial GAN Face Recognition
Abstract

We propose a multi-scale GAN model to hallucinate realistic context (forehead, hair, neck, clothes) and background pixels automatically from a single input face mask. Instead of swapping a face onto an existing picture, our model directly generates realistic context and background pixels based on the features of the provided face mask. Unlike face inpainting algorithms, it can generate realistic hallucinations even for a large number of missing pixels. Our model is composed of a cascaded network of GAN blocks, each tasked with hallucinating missing pixels at a particular resolution while guiding the synthesis process of the next GAN block. The hallucinated full face image is made photo-realistic by using a combination of reconstruction, perceptual, adversarial and identity-preserving losses at each block of the network. With a set of extensive experiments, we demonstrate the effectiveness of our model in hallucinating context and background pixels from face masks varying in facial pose, expression and lighting, collected from multiple datasets that are subject-disjoint from our training data. We also compare our method with two popular face swapping and face completion methods in terms of visual quality and recognition performance. Additionally, we analyze our cascaded pipeline and compare it with the recently proposed progressive growing of GANs.

URL

https://arxiv.org/abs/1811.07104

PDF

https://arxiv.org/pdf/1811.07104
Read All
Survey on Secure Search Over Encrypted Data on the Cloud

2018-11-24

Hoang Pham, Jason Woodworth, Mohsen Amini Salehi

arXiv_CV

arXiv_CV Survey
Abstract

Cloud computing has become a potential resource for businesses and individuals to outsource their data to remote but highly accessible servers. However, the potential of cloud services has not been fully unleashed due to users’ concerns about the security and privacy of their data in the cloud. User-side encryption techniques can be employed to mitigate the security concerns. Nonetheless, once the data is encrypted, no processing (e.g., searching) can be performed on the outsourced data. Searchable Encryption (SE) techniques have been widely studied to enable searching on the data while it is encrypted. These techniques enable various types of search on the encrypted data and offer different levels of security. In addition, although these techniques enable different search types and vary in details, they share similarities in their components and architectures. In this paper, we provide a comprehensive survey of different secure search techniques, a high-level architecture for these systems, and an analysis of their performance and security level.

URL

https://arxiv.org/abs/1811.09767

PDF

https://arxiv.org/pdf/1811.09767
Read All
A Multi-variable Stacked Long-Short Term Memory Network for Wind Speed Forecasting

2018-11-24

Sisheng Liang, Long Nguyen, Fang Jin

arXiv_CV

arXiv_CV RNN Prediction
Abstract

Precisely forecasting wind speed is essential for wind power producers and grid operators. However, this task is challenging due to the stochasticity of wind speed. To accurately predict short-term wind speed under uncertainties, this paper proposes a multi-variable stacked LSTM model (MSLSTM). The proposed method utilizes multiple historical meteorological variables, such as wind speed, temperature, humidity, pressure, dew point and solar radiation, to accurately predict wind speeds. The prediction performance is extensively assessed using real data collected in West Texas, USA. The experimental results show that the proposed MSLSTM can capture and learn uncertainties well while delivering competitive performance.

URL

https://arxiv.org/abs/1811.09735

PDF

https://arxiv.org/pdf/1811.09735
Read All
Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning

2018-11-23

Xin Wang, Jiawei Wu, Da Zhang, Yu Su, William Yang Wang

arXiv_CV

arXiv_CV Video_Caption Knowledge Caption Embedding
Abstract

Although promising results have been achieved in video captioning, existing models are limited to the fixed inventory of activities in the training corpus, and do not generalize to open vocabulary scenarios. Here we introduce a novel task, zero-shot video captioning, that aims at describing out-of-domain videos of unseen activities. Videos of different activities usually require different captioning strategies in many aspects, e.g., word selection, semantic construction, and style expression, which makes it very challenging to depict novel activities without paired training data. Meanwhile, similar activities share some of those aspects in common. Therefore, we propose a principled Topic-Aware Mixture of Experts (TAMoE) model for zero-shot video captioning, which learns to compose different experts based on different topic embeddings, implicitly transferring the knowledge learned from seen activities to unseen ones. In addition, we leverage an external topic-related text corpus to construct the topic embedding for each activity, which embodies the most relevant semantic vectors within the topic. Empirical results not only validate the effectiveness of our method in utilizing semantic knowledge for video captioning, but also show its strong generalization ability when describing novel activities.

URL

https://arxiv.org/abs/1811.02765

PDF

https://arxiv.org/pdf/1811.02765
Read All
A Hierarchical Neural Network for Sequence-to-Sequences Learning

2018-11-23

Si Zuo, Zhimin Xu

arXiv_CL

arXiv_CL Attention NMT
Abstract

In recent years, sequence-to-sequence neural networks with attention mechanisms have achieved great progress. However, there are still challenges, especially for Neural Machine Translation (NMT), such as lower translation quality on long sentences. In this paper, we present a hierarchical deep neural network architecture to improve the quality of long-sentence translation. The proposed network embeds sequence-to-sequence neural networks into a two-level category hierarchy by following the coarse-to-fine paradigm. Long sentences are split into shorter sequences, which the coarse category network can process well because a sequence-to-sequence network is able to handle the long-distance dependencies of short sentences. The resulting outputs are then concatenated and corrected by the fine category network. The experiments show that our method can achieve superior results, with higher BLEU (Bilingual Evaluation Understudy) scores, lower perplexity and better performance in imitating expression style and word usage than traditional networks.

URL

https://arxiv.org/abs/1811.09575

PDF

https://arxiv.org/pdf/1811.09575
Read All
Joint Neural Architecture Search and Quantization

2018-11-23

Yukang Chen, Gaofeng Meng, Qian Zhang, Xinbang Zhang, Liangchen Song, Shiming Xiang, Chunhong Pan

arXiv_CV

arXiv_CV NAS Deep_Learning
Abstract

Designing neural architectures is a fundamental step in deep learning applications. As a partner technique, model compression for neural networks has been widely investigated to meet the need to run deep learning algorithms with the limited computation resources of mobile devices. Currently, both architecture design and model compression require expert tricks and tedious trials. In this paper, we integrate these two tasks into one unified framework, which enables the joint search of architectures and quantization (compression) policies for neural networks. This method is named JASQ. Our goal is to automatically find a compact, high-performance neural network model that is suitable for mobile devices. Technically, a multi-objective evolutionary search algorithm is introduced to search for models that balance model size and accuracy. In experiments, we find that our approach outperforms methods that search only for architectures or only for quantization policies. 1) Specifically, given existing networks, our approach can provide them with learning-based quantization policies, and outperforms their 2-bit, 4-bit, 8-bit, and 16-bit counterparts. It can yield higher accuracies than the float models, for example, over 1.02% higher accuracy on MobileNet-v1. 2) Moreover, balancing model size and accuracy, the joint search of architectures and quantization policies yields two models: a high-accuracy model, JASQNet, and a small model, JASQNet-Small, which achieves a 2.97% error rate with 0.9 MB on CIFAR-10.
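One common way to express a balance between model size and accuracy in a multi-objective search is to keep only the non-dominated (Pareto-optimal) candidates. The sketch below illustrates that generic idea only; it is not taken from the paper, and the function name `pareto_front`, the tuple layout, and the candidate values are all hypothetical.

```python
def pareto_front(candidates):
    """Return the candidates that no other candidate dominates.

    Each candidate is a (name, size_mb, accuracy) tuple. Candidate A dominates
    B when A is no larger and no less accurate than B, and strictly better in
    at least one of the two objectives.
    """
    front = []
    for name, size, acc in candidates:
        dominated = any(
            (s <= size and a >= acc) and (s < size or a > acc)
            for _, s, a in candidates
        )
        if not dominated:
            front.append((name, size, acc))
    return front


# Illustrative, made-up candidates: (name, model size in MB, accuracy).
models = [
    ("a", 1.0, 0.90),
    ("b", 2.0, 0.95),
    ("c", 2.5, 0.93),  # larger and less accurate than "b"
    ("d", 0.9, 0.85),
]
front = pareto_front(models)
```

In an evolutionary loop, a filter of this kind would typically decide which candidates survive to the next generation; how JASQ actually ranks candidates is not stated in the abstract.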

URL

https://arxiv.org/abs/1811.09426

PDF

https://arxiv.org/pdf/1811.09426
Read All
Query-Efficient GAN Based Black-Box Attack Against Sequence Based Machine and Deep Learning Classifiers

2018-11-23

Ishai Rosenberg, Asaf Shabtai, Yuval Elovici, Lior Rokach

arXiv_CV

arXiv_CV Adversarial Knowledge GAN Deep_Learning
Abstract

In this paper we present an efficient and generic black-box attack demonstrated against API call based machine learning malware classifiers. We generate adversarial examples combining sequences (API call sequences) and other features (e.g., printable strings) that will be misclassified by the classifier without affecting the malware functionality. In contrast to previous studies, our attack minimizes the number of target classifier queries and only requires access to the predicted label of the attacked model (without the confidence level). We evaluate the attack’s effectiveness against a variety of classifiers, including recurrent neural network variants, deep neural networks, support vector machines, and gradient-boosted decision trees. We show that the attack requires fewer queries and less knowledge about the attacked model’s architecture than other existing black-box attacks, making it practical for attacking cloud-based models at a minimal cost. We also implement a software framework that can be used to recraft any malware binary so it will not be detected by classifiers, without access to the malware source code. Finally, we discuss the robustness of this attack to existing defense mechanisms.

URL

https://arxiv.org/abs/1804.08778

PDF

https://arxiv.org/pdf/1804.08778
Read All
A Blended Human-Robot Shared Control Framework to Handle Drift and Latency

2018-11-23

Anas Abou Allaban, Velin Dimitrov, Taşkın Padır

arXiv_CV

arXiv_CV
Abstract

Maximizing the utility of human-robot teams in disaster response and search and rescue (SAR) missions remains a challenging problem. This is due to the dynamic, uncertain nature of the environment and the variability in cognitive performance of the human operators. By having an autonomous agent share control with the operator, we can achieve near-optimal performance by augmenting the operator’s input and compensating for the factors that degrade performance. What this solution does not consider, though, is the human input latency and the errors caused by potential hardware failures that can occur during task completion in disaster response and SAR scenarios. In this paper, we propose the use of a blended shared control (BSC) architecture to address these issues and investigate the architecture’s performance in constrained, dynamic environments with a differential drive robot that has input latency and erroneous odometry feedback. We conduct a validation study (n=12) for our control architecture and then a user study (n=14) in 2 different environments that are unknown to both the human operator and the autonomous agent. The results demonstrate that the BSC architecture can prevent collisions and enhance operator performance without the need for a complete transfer of control between the human operator and the autonomous agent.

URL

https://arxiv.org/abs/1811.09382

PDF

https://arxiv.org/pdf/1811.09382
Read All
Transferable Interactiveness Prior for Human-Object Interaction Detection

2018-11-23

Yong-Lu Li, Siyuan Zhou, Xijie Huang, Liang Xu, Ze Ma, Hao-Shu Fang, Yan-Feng Wang, Cewu Lu

arXiv_CV

arXiv_CV Knowledge Inference Classification Detection
Abstract

Human-Object Interaction (HOI) Detection is an important problem for understanding how humans interact with objects. In this paper, we explore the interactiveness prior, which indicates whether a human and an object interact with each other or not. We found that the interactiveness prior can be learned across HOI datasets, regardless of HOI category settings. Our core idea is to exploit an Interactiveness Network to learn the general interactiveness prior from multiple HOI datasets and perform Non-Interaction Suppression before HOI classification in inference. Because the interactiveness prior generalizes, the interactiveness network is a transferable knowledge learner and can cooperate with any HOI detection model to achieve desirable results. We extensively evaluate the proposed method on the HICO-DET and V-COCO datasets. Our framework outperforms state-of-the-art HOI detection results by a great margin, verifying its efficacy and flexibility. Source code and models will be made publicly available.
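As a rough illustration of what suppressing non-interactive pairs before HOI classification could look like, the sketch below filters human-object pairs by an interactiveness score. The abstract does not specify the rule, so the fixed threshold, the function name `non_interaction_suppression`, and the example scores are assumptions made purely for illustration.

```python
def non_interaction_suppression(pairs, scores, threshold=0.5):
    """Keep only the human-object pairs whose interactiveness score is at
    least `threshold` (an assumed cut-off), so a downstream HOI classifier
    sees fewer non-interacting pairs.
    """
    if len(pairs) != len(scores):
        raise ValueError("pairs and scores must have the same length")
    return [pair for pair, score in zip(pairs, scores) if score >= threshold]


# Hypothetical detections and interactiveness scores.
pairs = [("human_0", "cup"), ("human_0", "car"), ("human_1", "bike")]
scores = [0.92, 0.08, 0.61]
kept = non_interaction_suppression(pairs, scores)
```

Only the surviving pairs would then be passed on to the HOI classifier.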

URL

https://arxiv.org/abs/1811.08264

PDF

https://arxiv.org/pdf/1811.08264
Read All
ChainGAN: A sequential approach to GANs

2018-11-22

Safwan Hossain, Kiarash Jamali, Yuchen Li, Frank Rudzicz

arXiv_CV

arXiv_CV Adversarial GAN
Abstract

We propose a new architecture and training methodology for generative adversarial networks. Current approaches attempt to learn the transformation from a noise sample to a generated data sample in one shot. Our proposed generator architecture, called ChainGAN, uses a two-step process. It first attempts to transform a noise vector into a crude sample, similar to a traditional generator. Next, a chain of networks, called editors, attempts to sequentially enhance this sample. We train each of these units independently, instead of with end-to-end backpropagation on the entire chain. Our model is robust, efficient, and flexible, as we can apply it to various network architectures. We provide rationale for our choices and experimentally evaluate our model, achieving competitive results on several datasets.

URL

https://arxiv.org/abs/1811.08081

PDF

https://arxiv.org/pdf/1811.08081
Read All
Super Diffusion for Salient Object Detection

2018-11-22

Peng Jiang, Zhiyi Pan, Nuno Vasconcelos, Baoquan Chen, Jingliang Peng

arXiv_CV

arXiv_CV Salient Object_Detection Detection Relation
Abstract

One major branch of salient object detection methods is diffusion-based: these methods construct a graph model on a given image and diffuse seed saliency values to the whole graph via a diffusion matrix. While their performance is sensitive to the specific feature spaces and scales used to define the diffusion matrix, little work has been published to systematically improve the robustness and accuracy of salient object detection under the generic mechanism of diffusion. In this work, we first present a novel view of the working mechanism of the diffusion process based on mathematical analysis, which reveals that the diffusion process is actually computing the similarity of nodes with respect to the seeds based on diffusion maps. Following this analysis, we propose super diffusion, a novel inclusive learning-based framework for salient object detection, which achieves optimal and robust performance by integrating a large pool of feature spaces, scales and even features originally computed for non-diffusion-based salient object detection. A closed-form solution for the optimal integration parameters is determined through supervised learning. At the local level, we propose to improve each individual diffusion before the integration. Our mathematical analysis reveals the close relationship between saliency diffusion and spectral clustering. Based on this, we propose to re-synthesize each individual diffusion matrix from the most discriminative eigenvectors and the constant eigenvector (for saliency normalization). The proposed framework is implemented and evaluated on widely used benchmark datasets, consistently leading to state-of-the-art performance.

URL

https://arxiv.org/abs/1811.09038

PDF

https://arxiv.org/pdf/1811.09038
Read All
Data Augmentation using Random Image Cropping and Patching for Deep CNNs

2018-11-22

Ryo Takahashi, Takashi Matsubara, Kuniaki Uehara

arXiv_CV

arXiv_CV Image_Caption Regularization Caption CNN Classification
Abstract

Deep convolutional neural networks (CNNs) have achieved remarkable results in image processing tasks. However, their high expression ability risks overfitting. Consequently, data augmentation techniques have been proposed to prevent overfitting while enriching datasets. Recent CNN architectures with more parameters are rendering traditional data augmentation techniques insufficient. In this study, we propose a new data augmentation technique called random image cropping and patching (RICAP) which randomly crops four images and patches them to create a new training image. Moreover, RICAP mixes the class labels of the four images, resulting in an advantage similar to label smoothing. We evaluated RICAP with current state-of-the-art CNNs (e.g., the shake-shake regularization model) by comparison with competitive data augmentation techniques such as cutout and mixup. RICAP achieves a new state-of-the-art test error of 2.19% on CIFAR-10. We also confirmed that deep CNNs with RICAP achieve better results on classification tasks using CIFAR-100 and ImageNet and an image-caption retrieval task using Microsoft COCO.
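The cropping-and-patching idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: images are nested lists, and two details the abstract does not state are assumptions here, namely drawing the patch boundary from a Beta distribution (parameter `beta`) and weighting each label by the area its crop occupies.

```python
import random


def ricap(images, labels, num_classes, beta=0.3, rng=random):
    """Patch random crops of four equally sized images into one image and
    mix their class labels in proportion to crop area (assumed weighting).

    images: four H x W nested lists; labels: four integer class indices.
    Returns (patched_image, soft_label).
    """
    assert len(images) == 4 and len(labels) == 4
    H, W = len(images[0]), len(images[0][0])
    # Boundary position splitting the output into four regions
    # (assumption: sampled from a Beta(beta, beta) distribution).
    w = int(round(W * rng.betavariate(beta, beta)))
    h = int(round(H * rng.betavariate(beta, beta)))
    # Crop (height, width) for top-left, top-right, bottom-left, bottom-right.
    sizes = [(h, w), (h, W - w), (H - h, w), (H - h, W - w)]
    # Where each crop is placed in the output image.
    offsets = [(0, 0), (0, w), (h, 0), (h, w)]
    out = [[None] * W for _ in range(H)]
    soft_label = [0.0] * num_classes
    for img, lab, (ch, cw), (oy, ox) in zip(images, labels, sizes, offsets):
        # Random top-left corner of the crop inside the source image.
        y0 = rng.randint(0, H - ch)
        x0 = rng.randint(0, W - cw)
        for dy in range(ch):
            for dx in range(cw):
                out[oy + dy][ox + dx] = img[y0 + dy][x0 + dx]
        soft_label[lab] += (ch * cw) / (H * W)
    return out, soft_label


# Four constant 4 x 6 "images", one per class, to make the patches visible.
demo_images = [[[k] * 6 for _ in range(4)] for k in range(4)]
patched, mixed = ricap(demo_images, [0, 1, 2, 3], num_classes=4)
```

Because the four crop areas always sum to the full image area, the mixed label sums to 1, which is what makes it behave like a soft target in the spirit of label smoothing.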

URL

https://arxiv.org/abs/1811.09030

PDF

https://arxiv.org/pdf/1811.09030
Read All
A Subsampling Line-Search Method with Second-Order Results

2018-11-21

El-houcine Bergou, Youssef Diouane, Vyacheslav Kungurtsev, Clément W. Royer

arXiv_CV

arXiv_CV Optimization Deep_Learning
Abstract

In many contemporary optimization problems, such as hyperparameter tuning for deep learning architectures, it is computationally challenging or even infeasible to evaluate an entire function or its derivatives. This necessitates the use of stochastic algorithms that sample problem data, which can jeopardize the guarantees classically obtained through globalization techniques via a trust region or a line search. Using subsampled function values is particularly challenging for the latter strategy, which relies upon multiple evaluations. On top of all that, there has been increasing interest in nonconvex formulations of data-related problems. For such instances, one aims at developing methods that converge to second-order stationary points, which is particularly delicate to ensure when one only accesses subsampled approximations of the objective and its derivatives. This paper contributes to this rapidly expanding field by presenting a stochastic algorithm based on negative curvature and Newton-type directions, computed for a subsampling model of the objective. A line-search technique is used to enforce suitable decrease for this model, and for a sufficiently large sample, a similar amount of reduction holds for the true objective. By using probabilistic reasoning, we can then obtain worst-case complexity guarantees for our framework, leading us to discuss appropriate notions of stationarity in a subsampling context. Our analysis, which we illustrate through real data experiments, encompasses the fully sampled regime as a special case: it thus provides an insightful generalization of second-order line-search paradigms to subsampled settings.

URL

https://arxiv.org/abs/1810.07211

PDF

https://arxiv.org/pdf/1810.07211
Read All
Resource Mention Extraction for MOOC Discussion Forums

2018-11-21

Ya-Hui An, Liangming Pan, Min-Yen Kan, Qiang Dong, Yan Fu

arXiv_CV

arXiv_CV Attention RNN
Abstract

In discussions hosted on discussion forums for MOOCs, references to online learning resources are often of central importance. They contextualize the discussion, anchoring the discussion participants’ presentation of the issues and their understanding. However, they are usually mentioned in free text, without appropriate hyperlinking to their associated resource. Automated learning resource mention hyperlinking and categorization will facilitate discussion and searching within MOOC forums, and also benefit the contextualization of such resources across disparate views. We propose the novel problem of learning resource mention identification in MOOC forums. As this is a novel task with no publicly available data, we first contribute a large-scale labeled dataset, dubbed the Forum Resource Mention (FoRM) dataset, to facilitate our current research and future research on this task. We then formulate this task as a sequence tagging problem and investigate solution architectures to address the problem. Importantly, we identify two major challenges that hinder the application of sequence tagging models to the task: (1) the diversity of resource mention expression, and (2) long-range contextual dependencies. We address these challenges by incorporating character-level and thread context information into an LSTM-CRF model. First, we incorporate a character encoder to address the out-of-vocabulary problem caused by the diversity of mention expressions. Second, to address the context dependency challenge, we encode thread contexts using an RNN-based context encoder, and apply the attention mechanism to selectively leverage useful context information during sequence tagging. Experiments on FoRM show that the proposed method notably improves on the baseline deep sequence tagging models, significantly bettering performance on instances that exemplify the two challenges.

URL

https://arxiv.org/abs/1811.08853

PDF

https://arxiv.org/pdf/1811.08853
Read All
An Interpretable Model for Scene Graph Generation

2018-11-21

Ji Zhang, Kevin Shih, Andrew Tao, Bryan Catanzaro, Ahmed Elgammal

arXiv_CV

arXiv_CV Image_Caption QA Caption Detection Relation
Abstract

We propose an efficient and interpretable scene graph generator. We consider three types of features: visual, spatial and semantic, and we use a late fusion strategy such that each feature’s contribution can be explicitly investigated. We study the key factors about these features that have the most impact on the performance, and also visualize the learned visual features for relationships and investigate the efficacy of our model. We won the OpenImages Visual Relationship Detection Challenge on Kaggle, outperforming the second-place entry by 5% (20% relative). We believe an accurate scene graph generator is a fundamental stepping stone for higher-level vision-language tasks such as image captioning and visual QA, since it provides a semantic, structured comprehension of an image that is beyond pixels and objects.

URL

https://arxiv.org/abs/1811.09543

PDF

https://arxiv.org/pdf/1811.09543
Read All
Neural Machine Translation based Word Transduction Mechanisms for Low-Resource Languages

2018-11-21

Saurav Jha, Akhilesh Sudhakar, Anil Kumar Singh

arXiv_CL

arXiv_CL Knowledge Embedding NMT
Abstract

Out-Of-Vocabulary (OOV) words can pose serious challenges for machine translation (MT) tasks, and in particular, for Low-Resource Languages (LRLs). This paper adapts variants of seq2seq models to perform transduction of such words from Hindi to Bhojpuri (an LRL instance), learning from a set of cognate pairs built upon a bilingual dictionary of Hindi-Bhojpuri words. We demonstrate that our models can effectively be used for languages that have a limited amount of parallel corpora, by working at the character level to grasp phonetic and orthographic similarities across multiple types of word adaptations, whether synchronic or diachronic, loan words or cognates. We provide a comprehensive overview of the training aspects of character-level NMT systems adapted to this task, combined with a detailed analysis of their respective error cases. Using our method, we achieve an improvement of over 6 BLEU on the Hindi-to-Bhojpuri translation task. Further, we show that such transductions generalize well to other languages by applying them successfully to Hindi-Bangla cognate pairs. Our work can be seen as an important step in the process of: (i) resolving the OOV words problem arising in MT tasks, (ii) creating effective parallel corpora for resource-constrained languages, and (iii) leveraging the enhanced semantic knowledge captured by word-level embeddings onto character-level tasks.

URL

https://arxiv.org/abs/1811.08816

PDF

https://arxiv.org/pdf/1811.08816
Read All
Retina U-Net: Embarrassingly Simple Exploitation of Segmentation Supervision for Medical Object Detection

2018-11-21

Paul F. Jaeger, Simon A. A. Kohl, Sebastian Bickelhaupt, Fabian Isensee, Tristan Anselm Kuder, Heinz-Peter Schlemmer, Klaus H. Maier-Hein

arXiv_CV

arXiv_CV Object_Detection Segmentation Semantic_Segmentation Detection
Abstract

The task of localizing and categorizing objects in medical images often remains formulated as a semantic segmentation problem. This approach, however, only indirectly solves the coarse localization task by predicting pixel-level scores, requiring ad-hoc heuristics when mapping back to object-level scores. State-of-the-art object detectors, on the other hand, allow for individual object scoring in an end-to-end fashion, while ironically trading in the ability to exploit the full pixel-wise supervision signal. This can be particularly disadvantageous in the setting of medical image analysis, where data sets are notoriously small. In this paper, we propose Retina U-Net, a simple architecture that naturally fuses the Retina Net one-stage detector with the U-Net architecture widely used for semantic segmentation in medical images. The proposed architecture recaptures discarded supervision signals by complementing object detection with an auxiliary task in the form of semantic segmentation, without introducing the additional complexity of previously proposed two-stage detectors. We evaluate the importance of full segmentation supervision on two medical data sets, provide an in-depth analysis on a series of toy experiments and show how the corresponding performance gain grows in the limit of small data sets. Retina U-Net yields strong detection performance only reached by its more complex two-staged counterparts. Our framework including all methods implemented for operation on 2D and 3D images is available at github.com/pfjaeger/medicaldetectiontoolkit.

URL

https://arxiv.org/abs/1811.08661

PDF

https://arxiv.org/pdf/1811.08661
Effect of Polarization Field on the Photocurrent Characteristics of GaN/AlN Multi-Quantum-Well Avalanche Photodiode

2018-11-21

Wangping Wang, Qian Li, Xingzhao Wu, Jianbin Kang, Yi Luo, Mo Li, Lai Wang

arXiv_CV

arXiv_CV GAN
Abstract

Polarization is an important property of the GaN/AlN multi-quantum-well (MQW) avalanche diode (MAPD), but it has been ignored in recent analyses of MAPDs to simplify the Monte Carlo simulation. Here, the photocurrent characteristics of the GaN/AlN MAPD are investigated to understand the role of the polarization field in the MQW structure. Carrier multiplication in the AlN/GaN MAPD is found to result from interfacial impact ionization, which is helped little by the external field but is instead considerably aided by the polarization field. In addition, the movement of ionized electrons out of the quantum well is proved to be a Fowler-Nordheim tunneling process assisted by the polarization field in the AlN barrier. Furthermore, the transport of ionized electrons through the MQW structure is found to be influenced by the reverse polarization field in the GaN layer, which could be suppressed by the external electric field. With all three effects, the quick photocurrent increase of the GaN/AlN MAPD is ascribed to the efficient step-by-step transport of interfacially ionized electrons out of the MQW at high voltages, which differs greatly from the conventional APD, where the increase is due to avalanche breakdown.

URL

https://arxiv.org/abs/1808.10582

PDF

https://arxiv.org/pdf/1808.10582
Neural Machine Translation with Adequacy-Oriented Learning

2018-11-21

Xiang Kong, Zhaopeng Tu, Shuming Shi, Eduard Hovy, Tong Zhang

arXiv_CL

arXiv_CL Adversarial Attention Face Reinforcement_Learning NMT Quantitative
Abstract

Although Neural Machine Translation (NMT) models have advanced state-of-the-art performance in machine translation, they face problems such as inadequate translation. We attribute this to the fact that standard Maximum Likelihood Estimation (MLE) cannot judge the real translation quality due to its several limitations. In this work, we propose an adequacy-oriented learning mechanism for NMT by casting translation as a stochastic policy in Reinforcement Learning (RL), where the reward is estimated by explicitly measuring translation adequacy. Benefiting from the sequence-level training of the RL strategy and a more accurate reward designed specifically for translation, our model outperforms multiple strong baselines, including (1) standard and coverage-augmented attention models with MLE-based training, and (2) advanced reinforcement and adversarial training strategies with rewards based on both word-level BLEU and character-level chrF3. Quantitative and qualitative analyses on different language pairs and NMT architectures demonstrate the effectiveness and universality of the proposed approach.

URL

https://arxiv.org/abs/1811.08541

PDF

https://arxiv.org/pdf/1811.08541
Orthographic Feature Transform for Monocular 3D Object Detection

2018-11-20

Thomas Roddick, Alex Kendall, Roberto Cipolla

arXiv_CV

arXiv_CV Object_Detection Deep_Learning Detection
Abstract

3D object detection from monocular images has proven to be an enormously challenging task, with the performance of leading systems not yet achieving even 10% of that of LiDAR-based counterparts. One explanation for this performance gap is that existing systems are entirely at the mercy of the perspective image-based representation, in which the appearance and scale of objects varies drastically with depth and meaningful distances are difficult to infer. In this work we argue that the ability to reason about the world in 3D is an essential element of the 3D object detection task. To this end, we introduce the orthographic feature transform, which enables us to escape the image domain by mapping image-based features into an orthographic 3D space. This allows us to reason holistically about the spatial configuration of the scene in a domain where scale is consistent and distances between objects are meaningful. We apply this transformation as part of an end-to-end deep learning architecture and achieve state-of-the-art performance on the KITTI 3D object benchmark. We will release full source code and pretrained models upon acceptance of this manuscript for publication.

URL

https://arxiv.org/abs/1811.08188

PDF

https://arxiv.org/pdf/1811.08188
DarwinML: A Graph-based Evolutionary Algorithm for Automated Machine Learning

2018-11-20

Fei Qi, Zhaohui Xia, Gaoyang Tang, Hang Yang, Yu Song, Guangrui Qian, Xiong An, Chunhuan Lin, Guangming Shi

arXiv_CV

arXiv_CV Optimization
Abstract

As an emerging field, Automated Machine Learning (AutoML) aims to reduce or eliminate manual operations that require expertise in machine learning. In this paper, a graph-based architecture is employed to represent flexible combinations of ML models, which provides a larger search space than tree-based and stacking-based architectures. Based on this, an evolutionary algorithm is proposed to search for the best architecture, where the mutation and heredity operators are the key to architecture evolution. With Bayesian hyper-parameter optimization, the proposed approach can automate the workflow of machine learning. On the PMLB dataset, the proposed approach shows state-of-the-art performance compared with TPOT, Autostacker, and auto-sklearn. Some of the optimized models have complex structures that are difficult to obtain by manual design.

URL

https://arxiv.org/abs/1901.08013

PDF

https://arxiv.org/pdf/1901.08013
Scene Graph Generation via Conditional Random Fields

2018-11-20

Weilin Cong, William Wang, Wang-Chien Lee

arXiv_CV

arXiv_CV Image_Caption Image_Retrieval Object_Detection QA Segmentation Caption Detection Relation
Abstract

Despite the great success object detection and segmentation models have achieved in recognizing individual objects in images, performance on cognitive tasks such as image captioning, semantic image retrieval, and visual QA is far from satisfactory. To achieve better performance on these cognitive tasks, merely recognizing individual object instances is insufficient. Instead, the interactions between object instances need to be captured in order to facilitate reasoning and understanding of the visual scenes in an image. A scene graph, a graph representation of an image that captures object instances and their relationships, offers a comprehensive understanding of an image. However, existing techniques for scene graph generation fail to distinguish subjects and objects in the visual scenes of images and thus do not perform well on real-world datasets where ambiguous object instances exist. In this work, we propose a novel scene graph generation model for predicting object instances and their corresponding relationships in an image. Our model, SG-CRF, efficiently learns the sequential order of subject and object in a relationship triplet, and the semantic compatibility of object instance nodes and relationship nodes in a scene graph. Experiments empirically show that SG-CRF outperforms the state-of-the-art methods on three different datasets, i.e., CLEVR, VRD, and Visual Genome, raising the Recall@100 from 24.99% to 49.95%, from 41.92% to 50.47%, and from 54.69% to 54.77%, respectively.
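
The three Recall@100 improvements quoted above differ a lot in size, which is easier to see when they are converted to absolute and relative gains. The short sketch below does only that arithmetic on the figures from the abstract; the helper name `gains` is hypothetical.

```python
# Recall@100 figures (before, after) quoted in the abstract, in percent.
reported = {
    "CLEVR": (24.99, 49.95),
    "VRD": (41.92, 50.47),
    "Visual Genome": (54.69, 54.77),
}


def gains(before, after):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return absolute, relative


for name, (before, after) in reported.items():
    absolute, relative = gains(before, after)
    print(f"{name}: +{absolute:.2f} points ({relative:.1f}% relative)")
```

Run as written, this shows that the gain is largest on CLEVR (about 25 points) and smallest on Visual Genome (under a tenth of a point).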

URL

https://arxiv.org/abs/1811.08075

PDF

https://arxiv.org/pdf/1811.08075
Stackelberg GAN: Towards Provable Minimax Equilibrium via Multi-Generator Architectures

2018-11-19

Hongyang Zhang, Susu Xu, Jiantao Jiao, Pengtao Xie, Ruslan Salakhutdinov, Eric P. Xing

arXiv_CV

arXiv_CV GAN Optimization Gradient_Descent
Abstract

We study the problem of alleviating the instability issue in the GAN training procedure via new architecture design. The discrepancy between the minimax and maximin objective values could serve as a proxy for the difficulties that the alternating gradient descent encounters in the optimization of GANs. In this work, we give new results on the benefits of the multi-generator architecture of GANs. We show that the minimax gap shrinks to $\epsilon$ as the number of generators increases with rate $\widetilde{O}(1/\epsilon)$. This improves over the best-known result of $\widetilde{O}(1/\epsilon^2)$. At the core of our techniques is a novel application of the Shapley-Folkman lemma to the generic minimax problem, where in the literature the technique was only known to work when the objective function is restricted to the Lagrangian function of a constrained optimization problem. Our proposed Stackelberg GAN performs well experimentally on both synthetic and real-world datasets, improving Fréchet Inception Distance by $14.61\%$ over the previous multi-generator GANs on the benchmark datasets.
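
To get a feel for the difference between the two rates, the sketch below compares how the required number of generators scales with the target gap under $1/\epsilon$ and $1/\epsilon^2$. It is only an order-of-magnitude illustration: it assumes unit constants and ignores the logarithmic factors hidden by the $\widetilde{O}$ notation, and the function name `generators_needed` is hypothetical.

```python
def generators_needed(eps, exponent):
    """Order-of-magnitude generator count for a target gap eps: (1/eps)**exponent.

    Constants and log factors hidden by the O-tilde notation are ignored,
    so these are scaling illustrations, not numbers from the paper.
    """
    return round((1.0 / eps) ** exponent)


for eps in (0.5, 0.1, 0.01):
    linear = generators_needed(eps, 1)     # scaling of the new rate
    quadratic = generators_needed(eps, 2)  # scaling of the earlier rate
    print(f"eps={eps}: ~{linear} vs ~{quadratic} generators")
```

Under these simplifying assumptions, tightening the target gap by a factor of ten multiplies the generator count by ten under the new rate and by a hundred under the earlier one.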

URL

https://arxiv.org/abs/1811.08010

PDF

https://arxiv.org/pdf/1811.08010
End-to-End Retrieval in Continuous Space

2018-11-19

Daniel Gillick, Alessandro Presta, Gaurav Singh Tomar

arXiv_CV

arXiv_CV Embedding
Abstract

Most text-based information retrieval (IR) systems index objects by words or phrases. These discrete systems have been augmented by models that use embeddings to measure similarity in continuous space. But continuous-space models are typically used just to re-rank the top candidates. We consider the problem of end-to-end continuous retrieval, where standard approximate nearest neighbor (ANN) search replaces the usual discrete inverted index, and rely entirely on distances between learned embeddings. By training simple models specifically for retrieval, with an appropriate model architecture, we improve on a discrete baseline by 8% and 26% (MAP) on two similar-question retrieval tasks. We also discuss the problem of evaluation for retrieval systems, and show how to modify existing pairwise similarity datasets for this purpose.
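
As a rough illustration of retrieval that relies only on distances between embeddings, the sketch below ranks a handful of indexed items by cosine similarity to a query vector. It is a minimal sketch, not the authors' system: it uses exact brute-force search where the abstract describes approximate nearest neighbor search, and the function names, the item ids, and the toy 3-dimensional vectors are all hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def retrieve(query, index, k=2):
    """Return the ids of the k indexed items most similar to the query.

    Exact brute-force search over every item; a real system would use an
    ANN structure instead. `index` maps a hypothetical item id to its embedding.
    """
    q = normalize(query)
    scored = []
    for item_id, emb in index.items():
        e = normalize(emb)
        score = sum(a * b for a, b in zip(q, e))
        scored.append((score, item_id))
    scored.sort(reverse=True)  # highest cosine similarity first
    return [item_id for _, item_id in scored[:k]]


# Toy embeddings, made up for illustration only.
index = {
    "q1": [1.0, 0.0, 0.0],
    "q2": [0.9, 0.1, 0.0],
    "q3": [0.0, 1.0, 0.0],
}
top = retrieve([1.0, 0.05, 0.0], index, k=2)
```

No inverted index or term matching is involved here; the ranking depends only on the geometry of the embedding space, which is the property the abstract emphasizes.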

URL

https://arxiv.org/abs/1811.08008

PDF

https://arxiv.org/pdf/1811.08008