Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

Behavioural Repertoire via Generative Adversarial Policy Networks

2019-03-06

Marija Jegorova, Stéphane Doncieux, Timothy Hospedales

arXiv_AI

arXiv_AI Adversarial Face
Abstract

Learning algorithms are enabling robots to solve increasingly challenging real-world tasks. These approaches often rely on demonstrations and reproduce the behavior shown. Unexpected changes in the environment may require using different behaviors to achieve the same effect, for instance to reach and grasp an object in changing clutter. An emerging paradigm addressing this robustness issue is to learn a diverse set of successful behaviors for a given task, from which a robot can select the most suitable policy when faced with a new environment. In this paper, we explore a novel realization of this vision by learning a generative model over policies. Rather than learning a single policy, or a small fixed repertoire, our generative model for policies compactly encodes an unbounded number of policies and allows novel controller variants to be sampled. Leveraging our generative policy network, a robot can sample novel behaviors until it finds one that works for a new environment. We demonstrate this idea with an application of robust ball-throwing in the presence of obstacles. We show that this approach achieves a greater diversity of behaviors than an existing evolutionary approach, while maintaining good efficacy of sampled behaviors, allowing a Baxter robot to hit targets more often when ball throwing in the presence of obstacles.
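The sample-until-it-works loop described in the abstract can be sketched in a few lines. This is a toy illustration only: `sample_policy`, `works_in_environment`, and the blocked-angle model of obstacles are hypothetical stand-ins, not the paper's generative policy network or its Baxter setup.

```python
import random

def sample_policy(rng):
    """Stand-in for sampling from a generative model over policies.
    Here a 'policy' is just a throwing angle in degrees (assumption)."""
    return rng.uniform(0.0, 90.0)

def works_in_environment(angle, blocked):
    """Hypothetical success test: a throw works if its angle avoids
    every blocked interval (a toy model of obstacles)."""
    return all(not (lo <= angle <= hi) for lo, hi in blocked)

def find_working_policy(blocked, max_tries=1000, seed=0):
    """Keep sampling novel policies until one succeeds or tries run out."""
    rng = random.Random(seed)
    for attempt in range(1, max_tries + 1):
        angle = sample_policy(rng)
        if works_in_environment(angle, blocked):
            return angle, attempt
    return None, max_tries

# New environment: obstacles block low and high throws.
angle, attempts = find_working_policy([(0.0, 30.0), (50.0, 90.0)])
print(angle, attempts)
```

The more diverse the sampled policies, the fewer draws such a loop should need when the environment changes, which is the motivation the abstract gives for learning a distribution rather than a single policy.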

URL

http://arxiv.org/abs/1811.02945

PDF

http://arxiv.org/pdf/1811.02945
A Lane-Change Path Planner and its application with a monocular camera

2019-03-06

Yunlong Huang

arXiv_RO

arXiv_RO
Abstract

Human drivers use visual cues from the road to perform fundamental driving tasks, e.g. lane keeping and lane change, as part of complex driving maneuvers. Lane keeping and lane change can be generalized as one task, because both drive a vehicle onto a target lane. In this paper, we first design a lane-change path planner based on an HD (High-Definition) map for autonomous driving systems using control theory. Then, applying a similar idea, we design a lane-change controller that uses a monocular camera.

URL

http://arxiv.org/abs/1903.02552

PDF

http://arxiv.org/pdf/1903.02552
Hybrid LSTM and Encoder-Decoder Architecture for Detection of Image Forgeries

2019-03-06

Jawadul H. Bappy, Cody Simons, Lakshmanan Nataraj, B.S. Manjunath, Amit K. Roy-Chowdhury

arXiv_CV

arXiv_CV RNN Prediction Detection Relation
Abstract

With advanced image journaling tools, one can easily alter the semantic meaning of an image by exploiting certain manipulation techniques such as copy-clone, object splicing, and removal, which mislead the viewers. In contrast, the identification of these manipulations becomes a very challenging task as manipulated regions are not visually apparent. This paper proposes a high-confidence manipulation localization architecture which utilizes resampling features, Long Short-Term Memory (LSTM) cells, and an encoder-decoder network to segment out manipulated regions from non-manipulated ones. Resampling features are used to capture artifacts like JPEG quality loss, upsampling, downsampling, rotation, and shearing. The proposed network exploits larger receptive fields (spatial maps) and frequency domain correlation to analyze the discriminative characteristics between manipulated and non-manipulated regions by incorporating an encoder and an LSTM network. Finally, the decoder network learns the mapping from low-resolution feature maps to pixel-wise predictions for image tamper localization. With the predicted mask provided by the final layer (softmax) of the proposed architecture, end-to-end training is performed to learn the network parameters through back-propagation using ground-truth masks. Furthermore, a large image splicing dataset is introduced to guide the training process. The proposed method is capable of localizing image manipulations at pixel level with high precision, which is demonstrated through rigorous experimentation on three diverse datasets.

URL

http://arxiv.org/abs/1903.02495

PDF

http://arxiv.org/pdf/1903.02495
Object Counting and Instance Segmentation with Image-level Supervision

2019-03-06

Hisham Cholakkal, Guolei Sun (equal contribution), Fahad Shahbaz Khan, Ling Shao

arXiv_CV

arXiv_CV Knowledge Segmentation
Abstract

Common object counting in a natural scene is a challenging problem in computer vision with numerous real-world applications. Existing image-level supervised common object counting approaches only predict the global object count and rely on additional instance-level supervision to also determine object locations. We propose an image-level supervised approach that provides both the global object count and the spatial distribution of object instances by constructing an object category density map. Motivated by psychological studies, we further reduce image-level supervision by using only limited object count information (counts up to four). To the best of our knowledge, we are the first to propose image-level supervised density map estimation for common object counting and demonstrate its effectiveness in image-level supervised instance segmentation. Comprehensive experiments are performed on the PASCAL VOC and COCO datasets. Our approach outperforms existing methods, including those using instance-level supervision, on both datasets for common object counting. Moreover, our approach improves state-of-the-art image-level supervised instance segmentation with a relative gain of 17.8% in terms of average best overlap, on the PASCAL VOC 2012 dataset.
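As a rough illustration of why a density map gives both a global count and a spatial distribution: summing the whole map yields the count, and summing a sub-region yields the count inside that region. The map values below are made up for the example and the function names are hypothetical; this is not the paper's network or training procedure.

```python
def count_from_density(density_map):
    """Global object count: the sum of all density values."""
    return sum(sum(row) for row in density_map)

def count_in_region(density_map, row_start, row_end, col_start, col_end):
    """Count inside a rectangular region (end indices are exclusive)."""
    return sum(
        sum(row[col_start:col_end])
        for row in density_map[row_start:row_end]
    )

# Toy 4x4 density map for one object category: two blobs, each summing to 1.
density = [
    [0.25, 0.25, 0.0, 0.0],
    [0.25, 0.25, 0.0, 0.0],
    [0.0,  0.0,  0.5, 0.5],
    [0.0,  0.0,  0.0, 0.0],
]

print(count_from_density(density))          # global count
print(count_in_region(density, 0, 2, 0, 2)) # count in the top-left quadrant
```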

URL

http://arxiv.org/abs/1903.02494

PDF

http://arxiv.org/pdf/1903.02494
SemEval 2019 Task 1: Cross-lingual Semantic Parsing with UCCA

2019-03-06

Daniel Hershcovich, Zohar Aizenbud, Leshem Choshen, Elior Sulem, Ari Rappoport, Omri Abend

arXiv_CL

arXiv_CL
Abstract

We present the SemEval 2019 shared task on UCCA parsing in English, German and French, and discuss the participating systems and results. UCCA is a cross-linguistically applicable framework for semantic representation, which builds on extensive typological work and supports rapid annotation. UCCA poses a challenge for existing parsing techniques, as it exhibits reentrancy (resulting in DAG structures), discontinuous structures and non-terminal nodes corresponding to complex semantic units. The shared task has yielded improvements over the state-of-the-art baseline in all languages and settings. Full results can be found on the task’s website: https://competitions.codalab.org/competitions/19160

URL

http://arxiv.org/abs/1903.02953

PDF

http://arxiv.org/pdf/1903.02953
GQ-STN: Optimizing One-Shot Grasp Detection based on Robustness Classifier

2019-03-06

Alexandre Gariépy, Jean-Christophe Ruel, Brahim Chaib-draa, Philippe Giguère

arXiv_CV

arXiv_CV Sparse Detection
Abstract

Grasping is a fundamental robotic task needed for the deployment of household robots or furthering warehouse automation. However, few approaches are able to perform grasp detection in real time (frame rate). To this effect, we present Grasp Quality Spatial Transformer Network (GQ-STN), a one-shot grasp detection network. Being based on the Spatial Transformer Network (STN), it produces not only a grasp configuration, but also directly outputs a depth image centered at this configuration. By connecting our architecture to an externally-trained grasp robustness evaluation network, we can train efficiently to satisfy a robustness metric via the backpropagation of the gradient emanating from the evaluation network. This removes the difficulty of training detection networks on sparsely annotated databases, a common issue in grasping. We further propose to use this robustness classifier to compare approaches, being more reliable than the traditional rectangle metric. Our GQ-STN is able to detect robust grasps on the depth images of the Dex-Net 2.0 dataset with 92.4% accuracy in a single pass of the network. We finally demonstrate in a physical benchmark that our method can propose robust grasps more often than previous sampling-based methods, while being more than 60 times faster.

URL

http://arxiv.org/abs/1903.02489

PDF

http://arxiv.org/pdf/1903.02489
Superframes, A Temporal Video Segmentation

2019-03-06

Hajar Sadeghi Sokeh, Vasileios Argyriou, Dorothy Monekosso, Paolo Remagnino

arXiv_CV

arXiv_CV Segmentation
Abstract

The goal of video segmentation is to turn video data into a set of concrete motion clusters that can be easily interpreted as building blocks of the video. There is some work on similar topics, such as detecting scene cuts in a video, but little research specifically on clustering video data into a desired number of compact segments. It would be more intuitive, and more efficient, to work with perceptually meaningful entities obtained from a low-level grouping process, which we call superframes. This paper presents a new simple and efficient technique to detect superframes of similar content patterns in videos. We calculate the similarity of content-motion to obtain the strength of change between consecutive frames. With the help of an existing optical flow technique based on deep models, the proposed method is able to perform more accurate motion estimation efficiently. We also propose two criteria for measuring and comparing the performance of different algorithms on various databases. Experimental results on videos from benchmark databases demonstrate the effectiveness of the proposed method.
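The idea of cutting a video where the change between consecutive frames is strongest can be sketched as follows. This is a simplified stand-in under stated assumptions: each frame is represented by a small made-up feature vector, change strength is a plain L1 distance, and boundaries are placed at the largest changes. The paper's actual content-motion similarity and optical flow model are not reproduced here.

```python
def change_strength(frame_a, frame_b):
    """L1 distance between two per-frame feature vectors (an assumption,
    standing in for the content-motion similarity in the abstract)."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b))

def segment(frames, num_segments):
    """Split frame indices into num_segments contiguous groups by cutting
    after the frames with the largest change to their successor."""
    changes = [
        change_strength(frames[i], frames[i + 1])
        for i in range(len(frames) - 1)
    ]
    # Indices i such that a boundary is placed between frame i and i + 1.
    strongest = sorted(range(len(changes)), key=lambda i: changes[i], reverse=True)
    cut_after = sorted(strongest[: num_segments - 1])

    segments, start = [], 0
    for i in cut_after:
        segments.append(list(range(start, i + 1)))
        start = i + 1
    segments.append(list(range(start, len(frames))))
    return segments

# Five toy frames; the big jumps are between frames 1-2 and 3-4.
frames = [[0, 0], [0, 1], [10, 10], [10, 11], [30, 30]]
print(segment(frames, 3))
```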

URL

http://arxiv.org/abs/1804.06642

PDF

http://arxiv.org/pdf/1804.06642
EMG-Controlled Non-Anthropomorphic Hand Teleoperation Using a Continuous Teleoperation Subspace

2019-03-06

Cassie Meeker, Matei Ciocarlie

arXiv_RO

arXiv_RO
Abstract

We present a method for EMG-driven teleoperation of non-anthropomorphic robot hands. EMG sensors are appealing as a wearable, inexpensive, and unobtrusive way to gather information about the teleoperator’s hand pose. However, mapping from EMG signals to the pose space of a non-anthropomorphic hand presents multiple challenges. We present a method that first projects from forearm EMG into a subspace relevant to teleoperation. To increase robustness, we use a model which combines continuous and discrete predictors along different dimensions of this subspace. We then project from the teleoperation subspace into the pose space of the robot hand. Our method is effective and intuitive, as it enables novice users to teleoperate pick and place tasks faster and more robustly than state-of-the-art EMG teleoperation methods when applied to a non-anthropomorphic, multi-DOF robot hand.

URL

http://arxiv.org/abs/1809.09730

PDF

http://arxiv.org/pdf/1809.09730
Development of SAM: cable-Suspended Aerial Manipulator

2019-03-06

Yuri S. Sarkisov, Min Jun Kim, Davide Bicego, Dzmitry Tsetserukou, Christian Ott, Antonio Franchi, Konstantin Kondak

arXiv_RO

arXiv_RO
Abstract

The high risk of collision between rotor blades and obstacles in a complex environment imposes restrictions on aerial manipulators. To solve this issue, a novel system, the cable-Suspended Aerial Manipulator (SAM), is presented in this paper. Instead of attaching a robotic manipulator directly to an aerial carrier, it is mounted on an active platform which is suspended from the carrier by means of a cable. As a result, higher safety can be achieved because the aerial carrier can keep a distance from the obstacles. For self-stabilization, the SAM is equipped with two actuation systems: winches and propulsion units. This paper presents an overview of the SAM including the concept behind it, hardware realization, control strategy, and the first experimental results.

URL

http://arxiv.org/abs/1903.02426

PDF

http://arxiv.org/pdf/1903.02426
KBQA: Learning Question Answering over QA Corpora and Knowledge Bases

2019-03-06

Wanyun Cui, Yanghua Xiao, Haixun Wang, Yangqiu Song, Seung-won Hwang, Wei Wang

arXiv_CL

arXiv_CL Knowledge QA
Abstract

Question answering (QA) has become a popular way for humans to access billion-scale knowledge bases. Unlike web search, QA over a knowledge base gives accurate and concise results, provided that natural language questions can be understood and mapped precisely to structured queries over the knowledge base. The challenge, however, is that a human can ask one question in many different ways. Previous approaches have natural limits due to their representations: rule-based approaches only understand a small set of “canned” questions, while keyword-based or synonym-based approaches cannot fully understand the questions. In this paper, we design a new kind of question representation: templates, learned over a billion-scale knowledge base and million-scale QA corpora. For example, for questions about a city’s population, we learn templates such as What’s the population of $city?, How many people are there in $city?. We learned 27 million templates for 2782 intents. Based on these templates, our QA system KBQA effectively supports binary factoid questions, as well as complex questions which are composed of a series of binary factoid questions. Furthermore, we expand predicates in the RDF knowledge base, which boosts the coverage of the knowledge base by 57 times. Our QA system beats all other state-of-the-art works on both effectiveness and efficiency over QALD benchmarks.
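A minimal sketch of how a template such as "What's the population of $city?" maps differently worded questions to the same intent. Everything here is hypothetical toy data: the two templates come from the abstract, but the entity list, the knowledge-base entry, and the lookup functions are assumptions made for illustration, not KBQA's learned templates or its RDF backend.

```python
# Templates from the abstract, lowercased, both mapped to one intent.
TEMPLATES = {
    "what's the population of $city?": "population",
    "how many people are there in $city?": "population",
}

# Toy entity dictionary and knowledge base (made-up values).
ENTITIES = {"springfield": "city"}
KNOWLEDGE = {("springfield", "population"): "30000"}

def to_template(question):
    """Replace a known entity mention with its category placeholder."""
    lowered = question.lower()
    for entity, category in ENTITIES.items():
        if entity in lowered:
            return lowered.replace(entity, "$" + category), entity
    return None, None

def answer(question):
    """Map question -> template -> intent -> knowledge-base value."""
    template, entity = to_template(question)
    intent = TEMPLATES.get(template)
    if intent is None:
        return None
    return KNOWLEDGE.get((entity, intent))

print(answer("What's the population of Springfield?"))
print(answer("How many people are there in Springfield?"))
```

Both phrasings reduce to templates with the same intent, so they retrieve the same fact; a question with no matching template returns None.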

URL

http://arxiv.org/abs/1903.02419

PDF

http://arxiv.org/pdf/1903.02419
Compressing complex convolutional neural network based on an improved deep compression algorithm

2019-03-06

Jiasong Wu, Hongshan Ren, Youyong Kong, Chunfeng Yang, Lotfi Senhadji, Huazhong Shu

arXiv_CV

arXiv_CV Knowledge CNN
Abstract

Although convolutional neural networks (CNNs) have made great progress, their large number of redundant parameters restricts deployment on embedded devices, especially mobile devices. Recent compression work has focused on real-valued convolutional neural networks (Real CNNs); to our knowledge, however, there has been no attempt to compress complex-valued convolutional neural networks (Complex CNNs). Compared with the real-valued network, the complex-valued neural network is easier to optimize, generalizes better, and has better learning potential. This paper extends the commonly used deep compression algorithm from the real domain to the complex domain and proposes an improved deep compression algorithm for the compression of Complex CNNs. The proposed algorithm compresses the network about 8 times on the CIFAR-10 dataset with less than 3% accuracy loss. On the ImageNet dataset, our method compresses the model about 16 times and the accuracy loss is about 2% without retraining.
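Pruning is one stage of deep compression, and a natural way to carry it over to complex weights is to prune by modulus. The sketch below assumes that criterion; the abstract does not say which criterion the authors use, and the quantization and coding stages are not shown. The weights and function names are made up for illustration.

```python
def prune_by_modulus(weights, keep_ratio):
    """Zero out complex weights whose modulus falls below the threshold
    that keeps roughly keep_ratio of them (ties may keep a few more)."""
    keep = max(1, int(len(weights) * keep_ratio))
    threshold = sorted((abs(w) for w in weights), reverse=True)[keep - 1]
    return [w if abs(w) >= threshold else 0j for w in weights]

def compression_ratio(pruned):
    """Total weights divided by surviving non-zero weights."""
    nonzero = sum(1 for w in pruned if w != 0)
    return len(pruned) / nonzero

weights = [3 + 4j, 0.1 + 0.1j, 1j, -2 + 0j]
pruned = prune_by_modulus(weights, keep_ratio=0.5)
print(pruned, compression_ratio(pruned))
```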

URL

http://arxiv.org/abs/1903.02358

PDF

http://arxiv.org/pdf/1903.02358
CANet: Class-Agnostic Segmentation Networks with Iterative Refinement and Attentive Few-Shot Learning

2019-03-06

Chi Zhang, Guosheng Lin, Fayao Liu, Rui Yao, Chunhua Shen

arXiv_CV

arXiv_CV Segmentation Attention CNN Semantic_Segmentation Optimization Prediction
Abstract

Recent progress in semantic segmentation is driven by deep Convolutional Neural Networks and large-scale labeled image datasets. However, data labeling for pixel-wise segmentation is tedious and costly. Moreover, a trained model can only make predictions within a set of pre-defined classes. In this paper, we present CANet, a class-agnostic segmentation network that performs few-shot segmentation on new classes with only a few annotated images available. Our network consists of a two-branch dense comparison module which performs multi-level feature comparison between the support image and the query image, and an iterative optimization module which iteratively refines the predicted results. Furthermore, we introduce an attention mechanism to effectively fuse information from multiple support examples under the setting of k-shot learning. Experiments on PASCAL VOC 2012 show that our method achieves a mean Intersection-over-Union score of 55.4% for 1-shot segmentation and 57.1% for 5-shot segmentation, outperforming state-of-the-art methods by a large margin of 14.6% and 13.2%, respectively.
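For readers unfamiliar with the metric quoted above, Intersection-over-Union on binary masks and its mean can be computed as below. This is a generic sketch with made-up masks; the benchmark's exact averaging protocol (for example, per class over folds) is not reproduced here.

```python
def iou(pred, target):
    """Intersection-over-Union of two binary masks given as lists of rows."""
    intersection = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly; treat that case as IoU = 1.
    return intersection / union if union else 1.0

def mean_iou(pairs):
    """Average IoU over (prediction, ground truth) mask pairs."""
    scores = [iou(pred, target) for pred, target in pairs]
    return sum(scores) / len(scores)

predicted = [[1, 1], [0, 0]]
ground_truth = [[1, 0], [0, 0]]
print(iou(predicted, ground_truth))
print(mean_iou([(predicted, ground_truth), (ground_truth, ground_truth)]))
```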

URL

http://arxiv.org/abs/1903.02351

PDF

http://arxiv.org/pdf/1903.02351
Understanding the Artificial Intelligence Clinician and optimal treatment strategies for sepsis in intensive care

2019-03-06

Matthieu Komorowski, Leo A. Celi, Omar Badawi, Anthony C. Gordon, A. Aldo Faisal

arXiv_AI

arXiv_AI Review Reinforcement_Learning Recommendation
Abstract

In this document, we explore in more detail our published work (Komorowski, Celi, Badawi, Gordon, & Faisal, 2018) for the benefit of the AI in Healthcare research community. In the above paper, we developed the AI Clinician system, which demonstrated how reinforcement learning could be used to make useful recommendations towards optimal treatment decisions from intensive care data. Since publication a number of authors have reviewed our work (e.g. Abbasi, 2018; Bos, Azoulay, & Martin-Loeches, 2019; Saria, 2018). Given the difference of our framework to previous work, the fact that we are bridging two very different academic communities (intensive care and machine learning) and that our work has impact on a number of other areas with more traditional computer-based approaches (biosignal processing and control, biomedical engineering), we are providing here additional details on our recent publication.

URL

http://arxiv.org/abs/1903.02345

PDF

http://arxiv.org/pdf/1903.02345
A Capsule Network-based Embedding Model for Search Personalization

2019-03-06

Dai Quoc Nguyen, Thanh Vu, Tu Dinh Nguyen, Dinh Phung

arXiv_CL

arXiv_CL Embedding Deep_Learning Relation
Abstract

Search personalization aims to tailor search results to each specific user based on the user’s personal interests and preferences (i.e., the user profile). Recent research approaches search personalization by modelling the potential 3-way relationship between the submitted query, the user and the search results (i.e., documents). That relationship is then used to personalize the search results for that user. In this paper, we introduce a novel embedding model based on the capsule network, a recent breakthrough in deep learning, to model the 3-way relationships for search personalization. In the model, each user, submitted query and returned document is embedded as a vector in the same vector space. The 3-way relationship is described as a triple of (query, user, document), which is then modeled as a 3-column matrix containing the three embedding vectors. After that, the 3-column matrix is fed into a deep learning architecture to re-rank the search results returned by a basis ranker. Experimental results on query logs from a commercial web search engine show that our model achieves better performance than the basis ranker as well as strong search personalization baselines.
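The (query, user, document) triple and its 3-column matrix can be illustrated with tiny made-up embeddings. Note the scoring function below is a simple element-wise product sum used as a stand-in; it is not a capsule network, and the embedding values and function names are assumptions for this sketch.

```python
def triple_matrix(query, user, document):
    """Stack three k-dimensional embeddings as the columns of a k x 3 matrix."""
    return [list(row) for row in zip(query, user, document)]

def score(matrix):
    """Stand-in scorer: sum over rows of the product of the three entries.
    The paper feeds the matrix to a capsule network instead."""
    return sum(q * u * d for q, u, d in matrix)

def rerank(query, user, documents):
    """Order candidate documents (name -> embedding) by descending score."""
    return sorted(
        documents,
        key=lambda name: score(triple_matrix(query, user, documents[name])),
        reverse=True,
    )

query = [1.0, 0.0]
user = [1.0, 1.0]
documents = {"a": [0.5, 2.0], "b": [2.0, 0.0]}  # as returned by a basis ranker

print(triple_matrix(query, user, documents["a"]))
print(rerank(query, user, documents))
```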

URL

http://arxiv.org/abs/1804.04266

PDF

http://arxiv.org/e-print/1804.04266
Gaze-based, Context-aware Robotic System for Assisted Reaching and Grasping

2019-03-06

Ali Shafti, Pavel Orlov, A. Aldo Faisal

arXiv_RO

arXiv_RO
Abstract

Assistive robotic systems endeavour to support those with movement disabilities, enabling them to move again and regain functionality. The main issue with these systems is the complexity of their low-level control, and how to translate this to simpler, higher-level commands that are easy and intuitive for a human user to interact with. We have created a multi-modal system, consisting of different sensing, decision making and actuating modalities, leading to intuitive, human-in-the-loop assistive robotics. The system takes its cue from the user’s gaze, to decode their intentions and implement low-level motion actions to achieve high-level tasks. This results in the user simply having to look at the objects of interest, for the robotic system to assist them in reaching for those objects, grasping them, and using them to interact with other objects. We present our method for 3D gaze estimation, and a grammar-based implementation of sequences of actions with the robotic system. The 3D gaze estimation is evaluated with 8 subjects, showing an overall accuracy of 4.68 ± 0.14 cm. The full system is tested with 5 subjects, showing successful implementation of 100% of reach-to-gaze-point actions and full implementation of pick and place tasks in 96%, and pick and pour tasks in 76% of cases. Finally we present a discussion on our results and what future work is needed to improve the system.

URL

http://arxiv.org/abs/1809.08095

PDF

http://arxiv.org/pdf/1809.08095
Self-Supervised Learning of 3D Human Pose using Multi-view Geometry

2019-03-06

Muhammed Kocabas, Salih Karagoz, Emre Akbas

arXiv_CV

arXiv_CV Pose_Estimation
Abstract

Training accurate 3D human pose estimators requires a large amount of 3D ground-truth data, which is costly to collect. Various weakly or self-supervised pose estimation methods have been proposed due to the lack of 3D data. Nevertheless, these methods, in addition to 2D ground-truth poses, require either additional supervision in various forms (e.g. unpaired 3D ground-truth data, a small subset of labels) or the camera parameters in multi-view settings. To address these problems, we present EpipolarPose, a self-supervised learning method for 3D human pose estimation, which does not need any 3D ground-truth data or camera extrinsics. During training, EpipolarPose estimates 2D poses from multi-view images and then utilizes epipolar geometry to obtain a 3D pose and camera geometry, which are subsequently used to train a 3D pose estimator. We demonstrate the effectiveness of our approach on standard benchmark datasets, i.e. Human3.6M and MPI-INF-3DHP, where we set the new state-of-the-art among weakly/self-supervised methods. Furthermore, we propose a new performance measure, Pose Structure Score (PSS), which is a scale-invariant, structure-aware measure to evaluate the structural plausibility of a pose with respect to its ground truth. Code and pretrained models are available at https://github.com/mkocabas/EpipolarPose

URL

http://arxiv.org/abs/1903.02330

PDF

http://arxiv.org/pdf/1903.02330
Towards Learning Abstract Representations for Locomotion Planning in High-dimensional State Spaces

2019-03-06

Tobias Klamt, Sven Behnke

arXiv_RO

arXiv_RO
Abstract

Ground robots which are able to navigate a variety of terrains are needed in many domains. One of the key aspects is the capability to adapt to the ground structure, which can be realized through movable body parts coming along with additional degrees of freedom (DoF). However, planning respective locomotion is challenging since suitable representations result in large state spaces. Employing an additional abstract representation—which is coarser, lower-dimensional, and semantically enriched—can support the planning. While a desired robot representation and action set of such an abstract representation can be easily defined, the cost function requires large tuning efforts. We propose a method to represent the cost function as a CNN. Training of the network is done on generated artificial data, while it generalizes well to the abstraction of real world scenes. We further apply our method to the problem of search-based planning of hybrid driving-stepping locomotion. The abstract representation is used as a powerful informed heuristic which accelerates planning by multiple orders of magnitude.

URL

http://arxiv.org/abs/1903.02308

PDF

http://arxiv.org/pdf/1903.02308
Video-based surgical skill assessment using 3D convolutional neural networks

2019-03-06

Isabel Funke, Sören Torge Mees, Jürgen Weitz, Stefanie Speidel

arXiv_CV

arXiv_CV Tracking CNN Video_Classification Classification Deep_Learning
Abstract

Purpose: A profound education of novice surgeons is crucial to ensure that surgical interventions are effective and safe. One important aspect is the teaching of technical skills for minimally invasive or robot-assisted procedures. This includes the objective and preferably automatic assessment of surgical skill. Recent studies presented good results for automatic, objective skill evaluation by collecting and analyzing motion data such as trajectories of surgical instruments. However, obtaining the motion data generally requires additional equipment for instrument tracking or the availability of a robotic surgery system to capture kinematic data. In contrast, we investigate a method for automatic, objective skill assessment that requires video data only. This has the advantage that video can be collected effortlessly during minimally invasive and robot-assisted training scenarios. Methods: Our method builds on recent advances in deep learning-based video classification. Specifically, we propose to use an inflated 3D ConvNet to classify snippets of optical flow extracted from surgical video. The network is extended into a Temporal Segment Network during training. Results: On the publicly available JIGSAWS dataset, our approach achieves high skill classification accuracies ranging from 95.1% to 100.0%. Conclusions: Our results demonstrate the feasibility of deep learning-based assessment of technical skill from surgical video. The 3D ConvNet is able to learn meaningful patterns directly from the data, alleviating the need for manual feature engineering. Further evaluation will require more annotated data for training and testing.

URL

http://arxiv.org/abs/1903.02306

PDF

http://arxiv.org/pdf/1903.02306
Lambda-Field: A Continuous Counterpart of the Bayesian Occupancy Grid for Risk Assessment

2019-03-06

Johann Laconte, Christophe Debain, Roland Chapuis, François Pomerleau, Romuald Aufrère

arXiv_RO

arXiv_RO
Abstract

In the context of autonomous robots, one of the most important tasks is to ensure the safety of the robot and its surroundings. Most of the time, the risk of navigation is simply taken to be the probability of collision. This notion of risk is not well defined in the literature, especially when dealing with occupancy grids. The Bayesian occupancy grid is the most widely used method for dealing with complex environments. However, because of its discrete nature, it is not suited to computing the risk along a path, and hence gives poor results. In this article, we present a new way to store the occupancy of the environment that allows the computation of risk for a given path. We then define the risk as the force of collision that would occur for a given obstacle. Using this framework, we are able to generate navigation paths ensuring the safety of the robot.

URL

http://arxiv.org/abs/1903.02285

PDF

http://arxiv.org/pdf/1903.02285
Multiple configurations for puncturing robot positioning

2019-03-06

Omar Abdelaziz, Minzhou Luo, Guanwu Jiang, Saixuan Chen

arXiv_RO

arXiv_RO
Abstract

The paper presents the derivation of closed-form Inverse Kinematics (IK) for the UR robot using a combination of analytical and geometric techniques. The work is applied to the precise positioning of a puncture robotics system. The end effector is a puncture needle guide tube, which needs precise positioning over the puncture insertion point. The closed-form IK yields up to 8 solutions, representing 8 different robot joint configurations. These multiple solutions are helpful in the puncture robotics system because they allow doctors to choose the most suitable configuration during the operation. The workspace therefore becomes better suited to the coexistence of human and robot. Moreover, closed-form IK solutions are more precise in positioning for medical puncture surgery than numerical methods. We include a performance evaluation of the IK obtained by both the closed-form solution and a numerical method.
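Closed-form IK returning several joint configurations for one target can be seen on a much smaller example than the UR robot. The sketch below solves a planar two-link arm, which has two solutions (elbow-up and elbow-down) rather than the eight mentioned in the abstract; the link lengths and target are arbitrary, and this is not the paper's derivation.

```python
import math

def ik_two_link(x, y, l1, l2):
    """Closed-form IK for a planar 2-link arm.
    Returns a list of (theta1, theta2) pairs in radians, empty if unreachable."""
    cos_t2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(cos_t2) > 1.0:
        return []  # target outside the reachable workspace
    sin_mag = math.sqrt(1.0 - cos_t2 * cos_t2)
    solutions = []
    for sin_t2 in (sin_mag, -sin_mag):  # the two elbow configurations
        theta2 = math.atan2(sin_t2, cos_t2)
        theta1 = math.atan2(y, x) - math.atan2(l2 * sin_t2, l1 + l2 * cos_t2)
        solutions.append((theta1, theta2))
    return solutions

def fk_two_link(theta1, theta2, l1, l2):
    """Forward kinematics, used here to check each IK solution."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

for theta1, theta2 in ik_two_link(1.0, 1.0, 1.0, 1.0):
    print(theta1, theta2, fk_two_link(theta1, theta2, 1.0, 1.0))
```

Both configurations place the end effector at the same point, which is the property that lets an operator pick whichever arm posture suits the workspace.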

URL

http://arxiv.org/abs/1903.02281

PDF

http://arxiv.org/pdf/1903.02281
Latent Space Autoregression for Novelty Detection

2019-03-06

Davide Abati, Angelo Porrello, Simone Calderara, Rita Cucchiara

arXiv_CV

arXiv_CV Object_Detection Detection
Abstract

Novelty detection is commonly referred to as the discrimination of observations that do not conform to a learned model of regularity. Despite its importance in different application settings, designing a novelty detector is utterly complex due to the unpredictable nature of novelties and their inaccessibility during the training procedure, factors which expose the unsupervised nature of the problem. In our proposal, we design a general framework where we equip a deep autoencoder with a parametric density estimator that learns the probability distribution underlying its latent representations through an autoregressive procedure. We show that a maximum likelihood objective, optimized in conjunction with the reconstruction of normal samples, effectively acts as a regularizer for the task at hand, by minimizing the differential entropy of the distribution spanned by latent vectors. In addition to providing a very general formulation, extensive experiments with our model on publicly available datasets deliver on-par or superior performance compared to state-of-the-art methods in one-class and video anomaly detection settings. Unlike prior works, our proposal does not make any assumption about the nature of the novelties, making our work readily applicable to diverse contexts.

URL

http://arxiv.org/abs/1807.01653

PDF

http://arxiv.org/pdf/1807.01653
High-Fidelity Image Generation With Fewer Labels

2019-03-06

Mario Lucic, Michael Tschannen, Marvin Ritter, Xiaohua Zhai, Olivier Bachem, Sylvain Gelly

arXiv_CV

arXiv_CV Adversarial GAN
Abstract

Deep generative models are becoming a cornerstone of modern machine learning. Recent work on conditional generative adversarial networks has shown that learning complex, high-dimensional distributions over natural images is within reach. While the latest models are able to generate high-fidelity, diverse natural images at high resolution, they rely on a vast quantity of labeled data. In this work we demonstrate how one can benefit from recent work on self- and semi-supervised learning to outperform the state-of-the-art (SOTA) both on unsupervised ImageNet synthesis and in the conditional setting. In particular, the proposed approach is able to match the sample quality (as measured by FID) of the current state-of-the-art conditional model BigGAN on ImageNet using only 10% of the labels and outperform it using 20% of the labels.

URL

http://arxiv.org/abs/1903.02271

PDF

http://arxiv.org/pdf/1903.02271
Visual Discourse Parsing

2019-03-06

Arjun R Akula, Song-Chun Zhu

arXiv_CV

arXiv_CV Relation
Abstract

Text-level discourse parsing aims to unmask how two segments (or sentences) in the text are related to each other. We propose the task of Visual Discourse Parsing, which requires understanding discourse relations among scenes in a video. Here we use the term scene to refer to a subset of video frames that can better summarize the video. In order to collect a dataset for learning discourse cues from videos, one needs to manually identify the scenes from a large pool of video frames and then annotate the discourse relations between them. This is clearly a time consuming, expensive and tedious task. In this work, we propose an approach to identify discourse cues from the videos without the need to explicitly identify and annotate the scenes. We also present a novel dataset containing 310 videos and the corresponding discourse cues to evaluate our approach. We believe that many of the multi-discipline Artificial Intelligence problems such as Visual Dialog and Visual Storytelling would greatly benefit from the use of visual discourse cues.

URL

http://arxiv.org/abs/1903.02252

PDF

http://arxiv.org/pdf/1903.02252
Read All
SilhoNet: An RGB Method for 6D Object Pose Estimation

2019-03-06

Gideon Billings, Matthew Johnson-Roberson

arXiv_CV

arXiv_CV Pose_Estimation CNN
Abstract

Autonomous robot manipulation involves estimating the pose of the object to be manipulated. Methods using RGB-D data have shown great success in solving this problem. However, there are situations where cost constraints or the working environment may limit the use of RGB-D sensors. When limited to monocular camera data only, the problem of object pose estimation is very challenging. In this work, we introduce a novel method called SilhoNet that predicts 6D object pose from monocular images. We use a Convolutional Neural Network (CNN) pipeline that takes in region-of-interest proposals to simultaneously predict an intermediate silhouette representation for objects with an associated occlusion mask and a 3D translation vector. The 3D orientation is then regressed from the predicted silhouettes. We show that our method achieves better overall performance than the state-of-the-art PoseCNN network for 6D pose estimation.

URL

http://arxiv.org/abs/1809.06893

PDF

http://arxiv.org/pdf/1809.06893
Read All
Photo-realistic Image Super-resolution with Fast and Lightweight Cascading Residual Network

2019-03-06

Namhyuk Ahn, Byungkon Kang, Kyung-Ah Sohn

arXiv_CV

arXiv_CV Adversarial Super_Resolution Deep_Learning
Abstract

Recent progress in deep learning-based models has improved single-image super-resolution significantly. However, despite their powerful performance, many models are difficult to apply to real-world applications because of their heavy computational requirements. To facilitate the use of a deep learning model under such constraints, we focus on keeping the model fast and lightweight while maintaining its accuracy. In detail, we design an architecture that implements a cascading mechanism on a residual network to boost the performance with limited resources via multi-level feature fusion. Moreover, we adopt group convolution and weight-tying for our proposed model in order to achieve extreme efficiency. In addition to the traditional super-resolution task, we apply our methods to the photo-realistic super-resolution field using the adversarial learning paradigm and a multi-scale discriminator approach. By doing so, we show that the proposed models outperform recent methods of similar complexity on both traditional pixel-based and perception-based tasks.

URL

http://arxiv.org/abs/1903.02240

PDF

http://arxiv.org/pdf/1903.02240
Read All
Characterizing Human Behaviours Using Statistical Motion Descriptor

2019-03-06

Eissa Jaber Alreshidi, Mohammad Bilal

arXiv_CV

arXiv_CV
Abstract

Identifying human behaviors is a challenging research problem due to the complexity and variation of appearances and postures, and the variation of camera settings and view angles. In this paper, we try to address the problem of human behavior identification by introducing a novel motion descriptor based on statistical features. The method first divides the video into N temporal segments. Then, for each segment, we compute dense optical flow, which provides instantaneous velocity information for all the pixels. We then compute a Histogram of Optical Flow (HOOF) weighted by the norm and quantized into 32 bins. We then compute statistical features from the obtained HOOF, forming a descriptor vector of 192 dimensions. We then train a non-linear multi-class SVM that classifies different human behaviors with an accuracy of 72.1%. We evaluate our method using a publicly available human action data set. Experimental results show that our proposed method outperforms state-of-the-art methods.
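
The histogram step described above can be sketched roughly as follows. This is an illustrative approximation, not the authors' implementation: the function name `hoof`, the plain angular binning over the full circle, and the normalisation to unit sum are all assumptions, and the flow field is passed in as a simple list of (dx, dy) vectors instead of being computed from video frames.

```python
import math

def hoof(flow, n_bins=32):
    """Magnitude-weighted histogram of flow directions, normalised to sum to 1.

    `flow` is an iterable of (dx, dy) flow vectors (hypothetical input format).
    """
    hist = [0.0] * n_bins
    for dx, dy in flow:
        magnitude = math.hypot(dx, dy)
        if magnitude == 0.0:
            continue  # a zero vector has no direction to vote for
        angle = math.atan2(dy, dx) % (2.0 * math.pi)
        # Clamp guards against the angle rounding up to exactly 2*pi.
        bin_index = min(int(angle / (2.0 * math.pi) * n_bins), n_bins - 1)
        hist[bin_index] += magnitude  # each vector votes with its norm
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist

# Toy flow field: right, up (twice as fast), left, and one static pixel.
example_flow = [(1.0, 0.0), (0.0, 2.0), (-1.0, 0.0), (0.0, 0.0)]
descriptor = hoof(example_flow)
```

Weighting by the norm means fast-moving pixels dominate the histogram, while static pixels contribute nothing.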

URL

http://arxiv.org/abs/1903.02236

PDF

http://arxiv.org/pdf/1903.02236
Read All
Robust Video Background Identification by Dominant Rigid Motion Estimation

2019-03-06

Kaimo Lin, Nianjuan Jiang, Loong Fah Cheong, Jiangbo Lu, Xun Xu

arXiv_CV

arXiv_CV Segmentation Face Quantitative
Abstract

The ability to identify the static background in videos captured by a moving camera is an important prerequisite for many video applications (e.g. video stabilization, stitching, and segmentation). Existing methods usually face difficulties when the foreground objects occupy a larger area than the background in the image. Many methods also cannot scale up to handle densely sampled feature trajectories. In this paper, we propose an efficient local-to-global method to identify background, based on the assumption that as long as there is sufficient camera motion, the cumulative background features will have the largest number of trajectories. Our motion model at the two-frame level is based on epipolar geometry, so there is no over-segmentation problem, another issue that plagues the 2D motion segmentation approach. Foreground objects erroneously labelled due to intermittent motions are also taken care of by checking their global consistency with the final estimated background motion. Lastly, by virtue of its efficiency, our method can deal with densely sampled trajectories. It outperforms several state-of-the-art motion segmentation methods on public datasets, both quantitatively and qualitatively.

URL

http://arxiv.org/abs/1903.02232

PDF

http://arxiv.org/pdf/1903.02232
Read All
Dixit: Interactive Visual Storytelling via Term Manipulation

2019-03-06

Chao-Chun Hsu, Yu-Hua Chen, Zi-Yuan Chen, Hsin-Yu Lin, Ting-Hao (Kenneth) Huang, Lun-Wei Ku

arXiv_CL

arXiv_CL Image_Caption Caption RNN
Abstract

In this paper, we introduce Dixit, an interactive visual storytelling system that the user interacts with iteratively to compose a short story for a photo sequence. The user initiates the process by uploading a sequence of photos. Dixit first extracts text terms from each photo which describe the objects (e.g., boy, bike) or actions (e.g., sleep) in the photo, and then allows the user to add new terms or remove existing terms. Dixit then generates a short story based on these terms. Behind the scenes, Dixit uses an LSTM-based model trained on image caption data and FrameNet to distill terms from each image and utilizes a transformer decoder to compose a context-coherent story. Users change images or terms iteratively with Dixit to create the most ideal story. Dixit also allows users to manually edit and rate stories. The proposed procedure opens up possibilities for interpretable and controllable visual storytelling, allowing users to understand the story formation rationale and to intervene in the generation process.

URL

http://arxiv.org/abs/1903.02230

PDF

http://arxiv.org/pdf/1903.02230
Read All
DepthwiseGANs: Fast Training Generative Adversarial Networks for Realistic Image Synthesis

2019-03-06

Mkhuseli Ngxande, Jules-Raymond Tapamo, Michael Burke

arXiv_CV

arXiv_CV Adversarial Super_Resolution GAN CNN
Abstract

Recent work has shown significant progress in the direction of synthetic data generation using Generative Adversarial Networks (GANs). GANs have been applied in many fields of computer vision including text-to-image conversion, domain transfer, super-resolution, and image-to-video applications. In computer vision, traditional GANs are based on deep convolutional neural networks. However, deep convolutional neural networks can require extensive computational resources because they are based on multiple operations performed by convolutional layers, which can consist of millions of trainable parameters. Training a GAN model can be difficult and it takes a significant amount of time to reach an equilibrium point. In this paper, we investigate the use of depthwise separable convolutions to reduce training time while maintaining data generation performance. Our results show that a DepthwiseGAN architecture can generate realistic images in shorter training periods when compared to a StarGAN architecture, but that model capacity still plays a significant role in generative modelling. In addition, we show that depthwise separable convolutions perform best when only applied to the generator. For quality evaluation of generated images, we use the Fréchet Inception Distance (FID), which compares the similarity between the generated image distribution and that of the training dataset.
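
To give a feel for why depthwise separable convolutions reduce the parameter count, here is a small back-of-the-envelope sketch. It uses the standard weight counts for a k x k convolution (biases ignored); the layer sizes are hypothetical and are not taken from the paper.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out

def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise conv followed by a 1x1 pointwise conv (no bias)."""
    depthwise = kernel * kernel * c_in  # one spatial filter per input channel
    pointwise = c_in * c_out            # 1x1 conv that mixes channels
    return depthwise + pointwise

# Hypothetical layer: 3x3 kernel, 128 input channels, 256 output channels.
standard = standard_conv_params(3, 128, 256)         # 294912
separable = depthwise_separable_params(3, 128, 256)  # 33920
print(f"reduction factor: {standard / separable:.2f}")
```

For this hypothetical layer the separable version needs roughly 8.7 times fewer weights, which is consistent with the abstract's caveat that model capacity still matters.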

URL

http://arxiv.org/abs/1903.02225

PDF

http://arxiv.org/pdf/1903.02225
Read All
Training in Task Space to Speed Up and Guide Reinforcement Learning

2019-03-06

Guillaume Bellegarda, Katie Byl

arXiv_RO

arXiv_RO Reinforcement_Learning
Abstract

Recent breakthroughs in the reinforcement learning (RL) community have made significant advances towards learning and deploying policies on real world robotic systems. However, even with the current state-of-the-art algorithms and computational resources, these algorithms are still plagued with high sample complexity, and thus long training times, especially for high degree of freedom (DOF) systems. There are also concerns arising from lack of perceived stability or robustness guarantees from emerging policies. This paper aims at mitigating these drawbacks by: (1) modeling a complex, high DOF system with a representative simple one, (2) making explicit use of forward and inverse kinematics without forcing the RL algorithm to “learn” them on its own, and (3) learning locomotion policies in Cartesian space instead of joint space. In this paper these methods are applied to JPL’s Robosimian, but can be readily used on any system with a base and end effector(s). These locomotion policies can be produced in just a few minutes, trained on a single laptop. We compare the robustness of the resulting learned policies to those of other control methods. An accompanying video for this paper can be found at https://youtu.be/xDxxSw5ahnc .
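
As a rough illustration of what "making explicit use of forward and inverse kinematics" can look like, the sketch below uses a planar two-link arm. This toy model is an assumption chosen for brevity and is unrelated to Robosimian's actual kinematics; the function names and unit link lengths are hypothetical. A policy acting in Cartesian space would output a target (x, y), and the inverse kinematics would turn it into joint angles.

```python
import math

def forward_kinematics(theta1, theta2, l1=1.0, l2=1.0):
    """End-effector position of a planar two-link arm."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

def inverse_kinematics(x, y, l1=1.0, l2=1.0):
    """Joint angles reaching (x, y); returns the solution with theta2 >= 0."""
    cos_t2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(cos_t2) > 1.0 + 1e-12:
        raise ValueError("target is outside the reachable workspace")
    cos_t2 = max(-1.0, min(1.0, cos_t2))  # absorb rounding at the boundary
    theta2 = math.acos(cos_t2)
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                           l1 + l2 * math.cos(theta2))
    return theta1, theta2

# A Cartesian-space action (hypothetical target) mapped to joint angles.
joint_angles = inverse_kinematics(1.2, 0.5)
reached = forward_kinematics(*joint_angles)
```

With this split, the learning algorithm only has to choose where the end effector should go, not how each joint must move to get it there.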

URL

http://arxiv.org/abs/1903.02219

PDF

http://arxiv.org/pdf/1903.02219
Read All
Optimal Dexterity for a Snake-like Surgical Manipulator using Patient-specific Task-space Constraints in a Computational Design Algorithm

2019-03-06

Andrew Razjigaev, Ajay K. Pandey, Jonathan Roberts, Liao Wu

arXiv_RO

arXiv_RO
Abstract

Tendon-driven snake-like arms have been used to create highly dexterous continuum robots so that they can bend around anatomical obstacles to access clinical targets. In this paper, we propose a design algorithm for developing patient-specific surgical continuum manipulators optimized for oriental dexterity constrained by task-space obstacles. The algorithm uses a sampling-based approach to finding the dexterity distribution in the workspace discretized by voxels. The oriental dexterity measured in the region of interest in the task-space formed a fitness function to be optimized through differential evolution. This was implemented in the design of a tendon-driven manipulator for knee arthroscopy. The results showed a feasible design that achieves significantly better dexterity than a rigid tool. This highlights the potential of the proposed method to be used in the process of designing dexterous surgical manipulators in the field.

URL

http://arxiv.org/abs/1903.02217

PDF

http://arxiv.org/pdf/1903.02217
Read All
RINS-W: Robust Inertial Navigation System on Wheels

2019-03-06

Martin Brossard (CAOR), Axel Barrau, Silvere Bonnabel (CAOR)

arXiv_RO

arXiv_RO Object_Detection Knowledge Deep_Learning Detection
Abstract

This paper proposes a real-time approach for long-term inertial navigation based only on an Inertial Measurement Unit (IMU) for self-localizing wheeled robots. The approach builds upon two components: 1) a robust detector that uses recurrent deep neural networks to dynamically detect a variety of situations of interest, such as zero velocity or no lateral slip; and 2) a state-of-the-art Kalman filter which incorporates this knowledge as pseudo-measurements for localization. Evaluations on a publicly available car dataset demonstrate that the proposed scheme may achieve a final precision of 20 m for a 21 km long trajectory of a vehicle driving for over an hour, equipped with an IMU of moderate precision (the gyro drift rate is 10 deg/h). To our knowledge, this is the first paper which combines sophisticated deep learning techniques with state-of-the-art filtering methods for pure inertial navigation on wheeled vehicles, and as such it opens the way for novel data-driven inertial navigation techniques. Moreover, albeit tailored for IMU-only based localization, our method may be used as a component for self-localization of wheeled robots equipped with a more complete sensor suite.
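
The idea of feeding a detected situation into a Kalman filter as a pseudo-measurement can be illustrated with a deliberately simplified scalar example. This is a toy sketch under stated assumptions, not the filter used in the paper: the state is a single velocity estimate with a variance, and when a (hypothetical) detector reports "zero velocity", the filter is given a measurement of 0 with a small noise variance.

```python
def zero_velocity_update(velocity, variance, meas_noise):
    """Scalar Kalman update with the pseudo-measurement z = 0.

    velocity, variance: current estimate and its uncertainty.
    meas_noise: variance assigned to the pseudo-measurement (an assumption).
    """
    gain = variance / (variance + meas_noise)           # Kalman gain
    new_velocity = velocity + gain * (0.0 - velocity)   # pull estimate towards 0
    new_variance = (1.0 - gain) * variance              # uncertainty shrinks
    return new_velocity, new_variance

# A drifting estimate of 0.4 m/s is corrected once the detector fires.
v, p = zero_velocity_update(velocity=0.4, variance=1.0, meas_noise=0.01)
```

The smaller the assumed measurement noise, the harder the update pulls the drifting estimate back to zero, which is how such pseudo-measurements limit the drift of IMU-only integration.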

URL

http://arxiv.org/abs/1903.02210

PDF

http://arxiv.org/pdf/1903.02210
Read All
Transfer feature generating networks with semantic classes structure for zero-shot learning

2019-03-06

Guangfeng Lin, Wanjun Chen, Kaiyang Liao, Xiaobing Kang, Caixia Fan

arXiv_CV

arXiv_CV Adversarial Classification Relation
Abstract

Because the features generated by a model trained on seen classes do not consistently follow the distribution of unseen classes, most existing feature generating networks struggle to obtain satisfactory performance on the challenging generalized zero-shot learning (GZSL) task when they adversarially learn the distribution of semantic classes. To alleviate the negative influence of this inconsistency on zero-shot learning (ZSL), transfer feature generating networks with semantic classes structure (TFGNSCS) are proposed to construct a network model that improves the performance of ZSL and GZSL. TFGNSCS not only considers the semantic structure relationship between seen and unseen classes but also learns the difference between generated features by balancing the transfer of information between seen and unseen classes in the networks. The proposed method integrates a Wasserstein generative adversarial network with a classification loss and a transfer loss to generate sufficient CNN features, on which softmax classifiers are trained for ZSL and GZSL. Experiments demonstrate that TFGNSCS outperforms the state of the art in GZSL on four challenging datasets: CUB, FLO, SUN, and AWA.

URL

http://arxiv.org/abs/1903.02204

PDF

http://arxiv.org/pdf/1903.02204
Read All
Real-world Underwater Enhancement: Challenges, Benchmarks, and Solutions

2019-03-06

Risheng Liu, Xin Fan, Ming Zhu, Minjun Hou, Zhongxuan Luo

arXiv_CV

arXiv_CV Object_Detection Image_Enhancement Classification Detection
Abstract

Underwater image enhancement is such an important low-level vision task with many applications that numerous algorithms have been proposed in recent years. These algorithms, developed upon various assumptions, demonstrate successes in various aspects using different data sets and different metrics. In this work, we set up an undersea image capturing system and construct a large-scale Real-world Underwater Image Enhancement (RUIE) data set divided into three subsets. The three subsets target three challenging aspects of enhancement, i.e., image visibility quality, color casts, and higher-level detection/classification, respectively. We conduct extensive and systematic experiments on RUIE to evaluate the effectiveness and limitations of various algorithms to enhance visibility and correct color casts on images with hierarchical categories of degradation. Moreover, underwater image enhancement in practice usually serves as a preprocessing step for mid-level and high-level vision tasks. We thus exploit the object detection performance on enhanced images as a brand new task-specific evaluation criterion. The findings from these evaluations not only confirm what is commonly believed, but also suggest promising solutions and new directions for visibility enhancement, color correction, and object detection on real-world underwater images.

URL

http://arxiv.org/abs/1901.05320

PDF

http://arxiv.org/pdf/1901.05320
Read All
Towards Better Human Robot Collaboration with Robust Plan Recognition and Trajectory Prediction

2019-03-06

Yujiao Cheng, Liting Sun, Masayoshi Tomizuka

arXiv_RO

arXiv_RO Prediction Relation Recognition
Abstract

Human robot collaboration (HRC) is becoming increasingly important as the paradigm of manufacturing is shifting from mass production to mass customization. The introduction of HRC can significantly improve the flexibility and intelligence of automation. However, due to the stochastic and time-varying nature of human collaborators, it is challenging for the robot to efficiently and accurately identify the human’s plan and respond in a safe manner. To address this challenge, we propose an integrated human robot collaboration framework in this paper which includes both plan recognition and trajectory prediction. Such a framework enables the robot to perceive and predict the human’s plan, adapt its actions accordingly, and intelligently avoid collisions with the human based on the predicted human trajectory. Moreover, by explicitly leveraging the hierarchical relationship between the plan and trajectories, more robust plan recognition performance can be achieved. Experiments conducted on an industrial robot verify the proposed framework: they show that it not only assures safe HRC but also improves the time efficiency of the HRC team, and that the plan recognition module is not sensitive to noise.

URL

http://arxiv.org/abs/1903.02199

PDF

http://arxiv.org/pdf/1903.02199
Read All
Deep Transfer Learning for Multiple Class Novelty Detection

2019-03-06

Pramuditha Perera, Vishal M. Patel

arXiv_CV

arXiv_CV Knowledge Transfer_Learning Classification Detection
Abstract

We propose a transfer learning-based solution for the problem of multiple class novelty detection. In particular, we propose an end-to-end deep-learning based approach in which we investigate how the knowledge contained in an external, out-of-distributional dataset can be used to improve the performance of a deep network for visual novelty detection. Our solution differs from the standard deep classification networks on two accounts. First, we use a novel loss function, membership loss, in addition to the classical cross-entropy loss for training networks. Secondly, we use the knowledge from the external dataset more effectively to learn globally negative filters, filters that respond to generic objects outside the known class set. We show that thresholding the maximal activation of the proposed network can be used to identify novel objects effectively. Extensive experiments on four publicly available novelty detection datasets show that the proposed method achieves significant improvements over the state-of-the-art methods.
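
The decision rule mentioned above, thresholding the maximal activation, can be sketched in a few lines. This is only an illustration of the rule itself: the activation values and the threshold are made up, and the function name `is_novel` is hypothetical. How the network is trained so that this rule works well (the membership loss and globally negative filters) is not shown.

```python
def is_novel(activations, threshold):
    """Flag an input as novel when no known class responds strongly enough."""
    return max(activations) < threshold

# Hypothetical final-layer activations for two inputs over three known classes.
known_object = [0.05, 0.91, 0.02]    # one class fires strongly
unknown_object = [0.10, 0.20, 0.15]  # nothing fires strongly

print(is_novel(known_object, threshold=0.5))    # False
print(is_novel(unknown_object, threshold=0.5))  # True
```

In practice the threshold would be chosen on held-out data, trading missed novel objects against false alarms on known classes.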

URL

http://arxiv.org/abs/1903.02196

PDF

http://arxiv.org/pdf/1903.02196
Read All
Robust Lane Detection from Continuous Driving Scenes Using Deep Neural Networks

2019-03-06

Qin Zou, Hanwen Jiang, Qiyu Dai, Yuanhao Yue, Long Chen, Qian Wang

arXiv_CV

arXiv_CV CNN RNN Prediction Detection
Abstract

Lane detection in driving scenes is an important module for autonomous vehicles and advanced driver assistance systems. In recent years, many sophisticated lane detection methods have been proposed. However, most methods focus on detecting the lane from a single image, and often perform poorly in extremely bad situations such as heavy shadow, severe mark degradation, serious vehicle occlusion, and so on. In fact, lanes are continuous line structures on the road. Consequently, a lane that cannot be accurately detected in the current frame may potentially be inferred by incorporating information from previous frames. To this end, we investigate lane detection by using multiple frames of a continuous driving scene, and propose a hybrid deep architecture by combining the convolutional neural network (CNN) and the recurrent neural network (RNN). Specifically, information of each frame is abstracted by a CNN block, and the CNN features of multiple continuous frames, holding the property of time-series, are then fed into the RNN block for feature learning and lane prediction. Extensive experiments on two large-scale datasets demonstrate that the proposed method outperforms the competing methods in lane detection, especially in handling difficult situations.

URL

http://arxiv.org/abs/1903.02193

PDF

http://arxiv.org/pdf/1903.02193
Read All
Real-Time Monocular Object-Model Aware Sparse SLAM

2019-03-06

Mehdi Hosseinzadeh, Kejie Li, Yasir Latif, Ian Reid

arXiv_CV

arXiv_CV Object_Detection Sparse Detection SLAM
Abstract

Simultaneous Localization And Mapping (SLAM) is a fundamental problem in mobile robotics. While sparse point-based SLAM methods provide accurate camera localization, the generated maps lack semantic information. On the other hand, state-of-the-art object detection methods provide rich information about entities present in the scene from a single image. This work incorporates a real-time deep-learned object detector into the monocular SLAM framework, representing generic objects as quadrics that permit detections to be seamlessly integrated while preserving real-time performance. A finer reconstruction of an object, learned by a CNN, is also incorporated and provides a shape prior for the quadric, leading to further refinement. To capture the dominant structure of the scene, additional planar landmarks are detected by a CNN-based plane detector and modeled as independent landmarks in the map. Extensive experiments support our proposed inclusion of semantic objects and planar structures directly in the bundle adjustment of SLAM (Semantic SLAM), which enriches the reconstructed map semantically while significantly improving the camera localization. The performance of our SLAM system is demonstrated in https://youtu.be/UMWXd4sHONw and https://youtu.be/QPQqVrvP0dE .

URL

http://arxiv.org/abs/1809.09149

PDF

http://arxiv.org/pdf/1809.09149
Read All
Bidirectional Attentive Memory Networks for Question Answering over Knowledge Bases

2019-03-06

Yu Chen, Lingfei Wu, Mohammed J. Zaki

arXiv_CL

arXiv_CL Knowledge QA Attention Embedding Relation Memory_Networks
Abstract

When answering natural language questions over knowledge bases (KB), different question components and KB aspects play different roles. However, most existing embedding-based methods for knowledge base question answering (KBQA) ignore the subtle inter-relationships between the question and the KB (e.g., entity types, relation paths and context). In this work, we propose to directly model the two-way flow of interactions between the questions and the underlying KB via a novel two-layered bidirectional attention network, called BAMnet. Without requiring any external resources or hand-crafted features, on the WebQuestions benchmark, our method significantly outperforms existing information-retrieval based methods, and remains competitive with (hand-crafted) semantic parsing based methods. Also, since we use attention mechanisms, our method offers better interpretability compared to other baselines.

URL

http://arxiv.org/abs/1903.02188

PDF

http://arxiv.org/pdf/1903.02188
Read All
Synthesizing Chemical Plant Operation Procedures using Knowledge, Dynamic Simulation and Deep Reinforcement Learning

2019-03-06

Shumpei Kubosawa, Takashi Onishi, Yoshimasa Tsuruoka

arXiv_AI

arXiv_AI Knowledge Reinforcement_Learning
Abstract

Chemical plants are complex and dynamical systems consisting of many components for manipulation and sensing, whose state transitions depend on various factors such as time, disturbance, and operation procedures. For the purpose of supporting human operators of chemical plants, we are developing an AI system that can semi-automatically synthesize operation procedures for efficient and stable operation. Our system can provide not only appropriate operation procedures but also reasons why the procedures are considered to be valid. This is achieved by integrating automated reasoning and deep reinforcement learning technologies with a chemical plant simulator and external knowledge. Our preliminary experimental results demonstrate that it can synthesize a procedure that achieves a much faster recovery from a malfunction compared to standard PID control.

URL

http://arxiv.org/abs/1903.02183

PDF

http://arxiv.org/pdf/1903.02183
Read All
Representative Task Self-selection for Flexible Clustered Lifelong Learning

2019-03-06

Gan Sun, Yang Cong, Qianqian Wang, Bineng Zhong, Yun Fu

arXiv_AI

arXiv_AI Knowledge Optimization
Abstract

Consider the lifelong learning paradigm, whose objective is to learn a sequence of tasks depending on previous experiences, e.g., a knowledge library or deep network weights. However, the knowledge libraries or deep networks of most recent lifelong learning models have a prescribed size, and their performance can degrade on both learned tasks and upcoming ones when they face a new task environment (cluster). To address this challenge, we propose a novel incremental clustered lifelong learning framework with two knowledge libraries, a feature learning library and a model knowledge library, called Flexible Clustered Lifelong Learning (FCL3). Specifically, the feature learning library, modeled by an autoencoder architecture, maintains a set of representations common across all the observed tasks, and the model knowledge library can be self-selected by identifying and adding new representative models (clusters). When a new task arrives, our proposed FCL3 model first transfers knowledge from these libraries to encode the new task, i.e., effectively and selectively soft-assigning this new task to multiple representative models over the feature learning library. Then, 1) a new task with a higher outlier probability is judged to be a new representative and is used to redefine both the feature learning library and the representative models over time; or 2) a new task with a lower outlier probability only refines the feature learning library. For model optimization, we cast this lifelong learning problem as an alternating direction minimization problem as a new task comes. Finally, we evaluate the proposed framework by analyzing several multi-task datasets, and the experimental results demonstrate that our FCL3 model can achieve better performance than most lifelong learning frameworks, and even batch clustered multi-task learning models.

URL

http://arxiv.org/abs/1903.02173

PDF

http://arxiv.org/pdf/1903.02173
Read All
AAAI-2019 Workshop on Games and Simulations for Artificial Intelligence

2019-03-06

Marwan Mattar, Roozbeh Mottaghi, Julian Togelius, Danny Lange

arXiv_AI

arXiv_AI
Abstract

This volume represents the accepted submissions from the AAAI-2019 Workshop on Games and Simulations for Artificial Intelligence held on January 29, 2019 in Honolulu, Hawaii, USA. https://www.gamesim.ai

URL

http://arxiv.org/abs/1903.02172

PDF

http://arxiv.org/html/1903.02172
Read All
Autonomy, Authenticity, Authorship and Intention in computer generated art

2019-03-06

Jon McCormack, Toby Gifford, Patrick Hutchings

arXiv_AI

arXiv_AI Adversarial GAN Deep_Learning
Abstract

This paper examines five key questions surrounding computer generated art. Driven by the recent public auction of a work of ‘AI Art’ we selectively summarise many decades of research and commentary around topics of autonomy, authenticity, authorship and intention in computer generated art, and use this research to answer contemporary questions often asked about art made by computers that concern these topics. We additionally reflect on whether current techniques in deep learning and Generative Adversarial Networks significantly change the answers provided by many decades of prior research.

URL

http://arxiv.org/abs/1903.02166

PDF

http://arxiv.org/pdf/1903.02166
Read All
Camera Obscurer: Generative Art for Design Inspiration

2019-03-06

Dilpreet Singh, Nina Rajcic, Simon Colton, Jon McCormack

arXiv_CV

arXiv_CV Image_Retrieval
Abstract

We investigate using generated decorative art as a source of inspiration for design tasks. Using a visual similarity search for image retrieval, the Camera Obscurer app enables rapid searching of tens of thousands of generated abstract images of various types. The seed for a visual similarity search is a given image, and the retrieved generated images share some visual similarity with the seed. Implemented in a hand-held device, the app empowers users to use photos of their surroundings to search through the archive of generated images and other image archives. Being abstract in nature, the retrieved images supplement the seed image rather than replace it, providing different visual stimuli including shapes, colours, textures and juxtapositions, in addition to affording their own interpretations. This approach can therefore be used to provide inspiration for a design task, with the abstract images suggesting new ideas that might give direction to a graphic design project. We describe a crowdsourcing experiment with the app to estimate user confidence in retrieved images, and we describe a pilot study where Camera Obscurer provided inspiration for a design task. These experiments have enabled us to describe future improvements, and to begin to understand sources of visual inspiration for design tasks.

URL

http://arxiv.org/abs/1903.02165

PDF

http://arxiv.org/pdf/1903.02165
Read All
SNU_IDS at SemEval-2019 Task 3: Addressing Training-Test Class Distribution Mismatch in Conversational Classification

2019-03-06

Sanghwan Bae, Jihun Choi, Sang-goo Lee

arXiv_AI

arXiv_AI Classification Prediction Detection
Abstract

We present several techniques to tackle the mismatch in class distributions between training and test data in the Contextual Emotion Detection task of SemEval 2019, by extending existing methods for the class imbalance problem. By reducing the distance between the distribution of predictions and the ground truth, they consistently show positive effects on performance. We also propose a novel neural architecture which utilizes a representation of the overall context as well as of each utterance. The combination of the methods and the models achieved a micro F1 score of about 0.766 on the final evaluation.
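
For readers unfamiliar with the metric, a micro-averaged F1 score pools true positives, false positives, and false negatives over a chosen set of classes before computing precision and recall. The sketch below is a generic illustration: the label names and the choice of which classes to score are assumptions made for the example, not details taken from the abstract.

```python
def micro_f1(y_true, y_pred, classes):
    """Micro-averaged F1 over `classes`; other labels count only as errors."""
    tp = fp = fn = 0
    for c in classes:
        for truth, pred in zip(y_true, y_pred):
            if pred == c and truth == c:
                tp += 1
            elif pred == c:
                fp += 1  # predicted c, but the truth is something else
            elif truth == c:
                fn += 1  # truth is c, but something else was predicted
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative labels; only the three emotion classes are scored here.
y_true = ["happy", "sad", "others", "angry", "others"]
y_pred = ["happy", "others", "others", "angry", "sad"]
score = micro_f1(y_true, y_pred, classes=["happy", "sad", "angry"])
```

Because counts are pooled before averaging, frequent classes weigh more heavily than rare ones, which is one reason a shift in class distribution between training and test data affects this score directly.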

URL

http://arxiv.org/abs/1903.02163

PDF

http://arxiv.org/pdf/1903.02163
Read All
Persona-Aware Tips Generation

2019-03-06

Piji Li, Zihao Wang, Lidong Bing, Wai Lam

arXiv_AI

arXiv_AI Sentiment Review Adversarial Attention Embedding RNN Prediction
Abstract

Tips, as a compact and concise form of reviews, have received less attention from researchers. In this paper, we investigate the task of tips generation by considering the ‘persona’ information which captures the intrinsic language style of the users or the different characteristics of the product items. In order to exploit the persona information, we propose a framework based on adversarial variational auto-encoders (aVAE) for persona modeling from the historical tips and reviews of users and items. The latent variables from aVAE are regarded as persona embeddings. Besides representing persona using the latent embeddings, we design a persona memory for storing the persona-related words for users and items. A Pointer Network is used to retrieve persona wordings from the memory when generating tips. Moreover, the persona embeddings are used as latent factors by a rating prediction component to predict the sentiment of a user over an item. Finally, the persona embeddings and the sentiment information are incorporated into a recurrent neural network based tips generation component. Extensive experimental results are reported and discussed to elaborate the peculiarities of our framework.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.02156

PDF

http://arxiv.org/pdf/1903.02156
Read All
Semantic Adversarial Network with Multi-scale Pyramid Attention for Video Classification

2019-03-06

De Xie, Cheng Deng, Hao Wang, Chao Li, Dapeng Tao

arXiv_CV

arXiv_CV Adversarial Attention CNN Video_Classification Classification
Abstract

Two-stream architectures have shown strong performance in video classification. The key idea is to learn spatio-temporal features by fusing convolutional networks spatially and temporally. However, such architectures have several problems. First, they rely on optical flow to model temporal information, which is often expensive to compute and store. Second, they have limited ability to capture details and local context information in video data. Third, they lack explicit semantic guidance, which greatly decreases classification performance. In this paper, we propose a new two-stream-based deep framework for video classification that discovers spatial and temporal information from RGB frames only. Moreover, a multi-scale pyramid attention (MPA) layer and a semantic adversarial learning (SAL) module are introduced and integrated into our framework. The MPA enables the network to capture global and local features to generate a comprehensive representation of the video, and the SAL makes this representation gradually approximate the real video semantics in an adversarial manner. Experimental results on two public benchmarks demonstrate that our proposed method achieves state-of-the-art results on standard video datasets.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.02155

PDF

http://arxiv.org/pdf/1903.02155
Read All
Safeguarded Dynamic Label Regression for Generalized Noisy Supervision

2019-03-06

Jiangchao Yao, Ya Zhang, Ivor W. Tsang, Jun Sun

arXiv_CV

arXiv_CV
Abstract

Learning with noisy labels, which aims to reduce the expensive labor of accurate annotation, has become imperative in the Big Data era. Previous noise-transition-based methods have achieved promising results and presented a theoretical guarantee on performance in the case of class-conditional noise. However, this type of approach critically depends on an accurate pre-estimation of the noise transition, which is usually impractical. A subsequent improvement adapts the pre-estimation along with the training progress via a Softmax layer. However, the parameters in the Softmax layer must be heavily tuned, and performance remains fragile, because of the ill-posed stochastic approximation. To address these issues, we propose a Latent Class-Conditional Noise model (LCCN) that naturally embeds the noise transition under a Bayesian framework. By projecting the noise transition into a Dirichlet-distributed space, the learning is constrained on a simplex based on the whole dataset, instead of some ad-hoc parametric space. We then deduce a dynamic label regression method for LCCN to iteratively infer the latent labels, stochastically train the classifier, and model the noise. Our approach safeguards a bounded update of the noise transition, avoiding the arbitrary tuning via a batch of samples used previously. We further generalize LCCN to open-set noisy labels and the semi-supervised setting. We perform extensive experiments with the controllable-noise data sets CIFAR-10 and CIFAR-100, and the agnostic-noise data sets Clothing1M and WebVision17. The experimental results demonstrate that the proposed model outperforms several state-of-the-art methods.
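
To make the two ingredients named in the abstract concrete, here is a minimal sketch under stated assumptions: a class-conditional noise transition applied to clean-class probabilities, and a Dirichlet posterior mean over transition rows computed from label counts. It is not the authors' LCCN algorithm; the function names, the symmetric prior `alpha`, and the numbers are hypothetical.

```python
def noisy_label_probs(clean_probs, transition):
    """p(noisy = j | x) = sum_i p(clean = i | x) * transition[i][j]."""
    k = len(transition)
    return [
        sum(clean_probs[i] * transition[i][j] for i in range(k))
        for j in range(k)
    ]


def dirichlet_posterior_mean(counts, alpha=1.0):
    """Posterior mean of each transition row under a symmetric Dirichlet prior.

    counts[i][j] is the number of samples whose (inferred) clean label is i
    and whose observed noisy label is j. Every row stays on the simplex.
    """
    rows = []
    for row in counts:
        total = sum(row) + alpha * len(row)
        rows.append([(c + alpha) / total for c in row])
    return rows


if __name__ == "__main__":
    transition = [[0.8, 0.2], [0.3, 0.7]]  # hypothetical noise rates
    print(noisy_label_probs([1.0, 0.0], transition))
    print(dirichlet_posterior_mean([[8, 2], [0, 0]]))
```

Because each row is estimated from counts over the whole dataset plus a prior, a single batch can move the estimate only by a bounded amount, which is the intuition behind constraining the transition to a simplex.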

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.02152

PDF

http://arxiv.org/pdf/1903.02152
Read All
FIESTA: Fast Incremental Euclidean Distance Fields for Online Motion Planning of Aerial Robots

2019-03-06

Luxin Han, Fei Gao, Boyu Zhou, Shaojie Shen

arXiv_RO

arXiv_RO
Abstract

A Euclidean Signed Distance Field (ESDF) is useful for online motion planning of aerial robots because distance and gradient information with respect to obstacles can be queried from it easily. Building the ESDF map quickly and incrementally is the bottleneck for real-time motion planning. In this paper, we investigate this problem and propose a mapping system called FIESTA to build a global ESDF map incrementally. By introducing two independent updating queues for inserting and deleting obstacles separately, and using Indexing Data Structures and Doubly Linked Lists for map maintenance, our algorithm updates as few nodes as possible using a BFS framework. Our ESDF map has high computational performance and produces near-optimal results. We show that our method outperforms other up-to-date methods in terms of performance and accuracy, both in theory and in experiments. We integrate FIESTA into a complete quadrotor system and validate it in both simulation and onboard experiments. We release our method as open-source software for the community.
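
As a rough picture of BFS-style distance propagation, the following sketch builds an unsigned Euclidean distance field on a small 2D grid in one batch. It is a simplified illustration, not FIESTA: it has no incremental updates, no deletion queue, and no sign, and the function name `distance_field` and the grid layout are assumptions.

```python
from collections import deque
import math


def distance_field(width, height, obstacles):
    """Propagate closest-obstacle coordinates outward from obstacle cells.

    Returns dist[y][x], the Euclidean distance from each cell to the
    obstacle that the propagation assigned to it (approximate in general).
    """
    dist = [[math.inf] * width for _ in range(height)]
    closest = [[None] * width for _ in range(height)]
    queue = deque()
    for x, y in obstacles:
        dist[y][x] = 0.0
        closest[y][x] = (x, y)
        queue.append((x, y))
    neighbors = [
        (dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)
    ]
    while queue:
        x, y = queue.popleft()
        ox, oy = closest[y][x]
        for dx, dy in neighbors:
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                d = math.hypot(nx - ox, ny - oy)
                # Update a neighbor only if this obstacle is strictly closer.
                if d < dist[ny][nx]:
                    dist[ny][nx] = d
                    closest[ny][nx] = (ox, oy)
                    queue.append((nx, ny))
    return dist


if __name__ == "__main__":
    for row in distance_field(4, 3, [(0, 0), (3, 2)]):
        print([round(v, 2) for v in row])
```

Storing the closest obstacle per cell, rather than only a distance, lets each neighbor recompute a true Euclidean distance instead of accumulating step lengths along the grid.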

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.02144

PDF

http://arxiv.org/pdf/1903.02144
Read All
Why Learning of Large-Scale Neural Networks Behaves Like Convex Optimization

2019-03-06

Hui Jiang

arXiv_AI

arXiv_AI Optimization Gradient_Descent
Abstract

In this paper, we present some theoretical work to explain why simple gradient descent methods are so successful in solving non-convex optimization problems in learning large-scale neural networks (NN). After introducing a mathematical tool called canonical space, we prove that the objective functions in learning NNs are convex in the canonical model space. We further show that the gradients in the original NN model space and in the canonical space are related by a pointwise linear transformation, which is represented by the so-called disparity matrix. Furthermore, we prove that gradient descent methods surely converge to a global minimum of zero loss provided that the disparity matrices maintain full rank. If this full-rank condition holds, the learning of NNs behaves in the same way as normal convex optimization. Finally, we show that the chance of having singular disparity matrices is extremely slim in large NNs. In particular, when over-parameterized NNs are randomly initialized, the gradient descent algorithms converge to a global minimum of zero loss in probability.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.02140

PDF

http://arxiv.org/pdf/1903.02140
Read All
