We investigate image recognition of multiple food items in a single photo, focusing on a buffet restaurant application where the menu changes at every meal and only a few images per class are available. After detecting food areas, we perform hierarchical recognition. We evaluate our results by comparing them to two baseline methods.
http://arxiv.org/abs/1903.00858
Accurate and robust detection of multi-class objects in optical remote sensing images is essential to many real-world applications such as urban planning, traffic control, search and rescue, etc. However, state-of-the-art object detection techniques designed for images captured using ground-level sensors usually experience a sharp performance drop when directly applied to remote sensing images, largely due to differences in object appearance in remote sensing images in terms of sparse texture, low contrast, arbitrary orientations, large scale variations, etc. This paper presents a novel object detection network (CAD-Net) that exploits attention-modulated features as well as global and local contexts to address the new challenges in detecting objects from remote sensing images. The proposed CAD-Net learns global and local contexts of objects by capturing their correlations with the global scene (at scene-level) and the local neighboring objects or features (at object-level), respectively. In addition, it designs a spatial-and-scale-aware attention module that guides the network to focus on more informative regions and features as well as more appropriate feature scales. Experiments over two publicly available object detection datasets for remote sensing images demonstrate that the proposed CAD-Net achieves superior detection performance. The implementation code will be made publicly available to facilitate future research.
http://arxiv.org/abs/1903.00857
Crowd counting has recently attracted increasing interest in computer vision but remains a challenging problem. In this paper, we propose a trellis encoder-decoder network (TEDnet) for crowd counting, which focuses on generating high-quality density estimation maps. The major contributions are four-fold. First, we develop a new trellis architecture that incorporates multiple decoding paths to hierarchically aggregate features at different encoding stages, which can handle large variations of objects. Second, we design dense skip connections interleaved across paths to facilitate sufficient multi-scale feature fusions and to absorb the supervision information. Third, we propose a new combinatorial loss to enforce local coherence and spatial correlation in density maps. By distributedly imposing this combinatorial loss on intermediate outputs, gradient vanishing can be largely alleviated for better back-propagation and faster convergence. Finally, our TEDnet achieves new state-of-the-art performance on four benchmarks, with an improvement of up to 14% in terms of MAE.
http://arxiv.org/abs/1903.00853
Anticipating possible behaviors of traffic participants is an essential capability of autonomous vehicles. Many behavior detection and maneuver recognition methods only have a very limited prediction horizon, which leaves inadequate time and space for planning. To avoid unsatisfactory reactive decisions, it is essential to account for long-term future rewards in planning, which requires extending the prediction horizon. In this paper, we uncover that clues to vehicle behaviors over an extended horizon can be found in vehicle interaction, which makes it possible to anticipate the likelihood of a certain behavior even in the absence of any clear maneuver pattern. We adopt a recurrent neural network (RNN) for observation encoding and, based on that, propose a novel vehicle behavior interaction network (VBIN) to capture the vehicle interaction from the hidden states and connection features of each interaction pair. The output of our method is a probabilistic likelihood of multiple behavior classes, which matches the multimodal and uncertain nature of the distant future. A systematic comparison of our method against two state-of-the-art methods and another two baseline methods on a publicly available real highway dataset is provided, showing that our method has superior accuracy and advanced capability for interaction modeling.
http://arxiv.org/abs/1903.00848
In this paper, we present an online two-level vehicle trajectory prediction framework for urban autonomous driving where there are complex contextual factors, such as lane geometries, road constructions, traffic regulations and moving agents. Our method combines high-level policy anticipation with low-level context reasoning. We leverage a long short-term memory (LSTM) network to anticipate the vehicle’s driving policy (e.g., forward, yield, turn left, turn right, etc.) using its sequential history observations. The policy is then used to guide a low-level optimization-based context reasoning process. We show that it is essential to incorporate the prior policy anticipation due to the multimodal nature of the future trajectory. Moreover, contrary to existing regression-based trajectory prediction methods, our optimization-based reasoning process can cope with complex contextual factors. The final output of the two-level reasoning process is a continuous trajectory that automatically adapts to different traffic configurations and accurately predicts future vehicle motions. The performance of the proposed framework is analyzed and validated in an emerging autonomous driving simulation platform (CARLA).
http://arxiv.org/abs/1903.00847
Learning a generative model from partial data (data with missingness) is a challenging area of machine learning research. We study a specific implementation of the Auto-Encoding Variational Bayes (AEVB) algorithm, referred to in this paper as the Variational Auto-Decoder (VAD). VAD is a generic framework which uses Variational Bayes and Markov Chain Monte Carlo (MCMC) methods to learn a generative model from partial data. The main distinction between VAD and the Variational Auto-Encoder (VAE) is the encoder component, as VAD does not have one. Using a proposed efficient inference method from a multivariate Gaussian approximate posterior, VAD models allow inference to be performed via simple gradient ascent rather than MCMC sampling from a probabilistic decoder. This technique reduces the computational cost of inference, allows for the use of more complex optimization techniques during latent-space inference (which are shown to be crucial due to the high degree of freedom in the VAD latent space), and keeps the framework simple to implement. Through extensive experiments over several datasets and different missing ratios, we show that encoders cannot efficiently marginalize the input volatility caused by imputed missing values. We study multimodal datasets in this paper, which are a particular area of impact for VAD models.
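To make the decoder-only inference concrete, here is a minimal sketch, assuming a Gaussian likelihood and a stand-in linear decoder (both illustrative, not the paper's architecture), of fitting a latent code to partially observed data by gradient ascent:

import numpy as np

rng = np.random.default_rng(0)
D, K = 10, 3                              # data and latent dimensions
W = rng.normal(size=(D, K)) / np.sqrt(D)  # stand-in for a trained decoder

def decode(z):
    return W @ z                          # mean of p(x | z)

x = rng.normal(size=D)
mask = rng.random(D) > 0.4                # True where an entry is observed

z = np.zeros(K)                           # latent initialized at the prior mean
for _ in range(500):
    resid = (x - decode(z)) * mask        # missing entries contribute nothing
    grad = W.T @ resid - z                # d/dz [log p(x_obs | z) + log p(z)]
    z += 0.1 * grad                       # plain gradient ascent on the latent

print("inferred latent:", z)

Because the objective touches only observed dimensions, no imputation of missing values is needed, which is the property the abstract contrasts against encoder-based inference.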
http://arxiv.org/abs/1903.00840
Referring expression grounding aims at locating certain objects or persons in an image with a referring expression, where the key challenge is to comprehend and align various types of information from the visual and textual domains, such as visual attributes, location and interactions with surrounding regions. Although the attention mechanism has been successfully applied for cross-modal alignment, previous attention models focus on only the most dominant features of both modalities, and neglect the fact that there could be multiple comprehensive textual-visual correspondences between images and referring expressions. To tackle this issue, we design a novel cross-modal attention-guided erasing approach, where we discard the most dominant information from either the textual or visual domain to generate difficult training samples online, and to drive the model to discover complementary textual-visual correspondences. Extensive experiments demonstrate the effectiveness of our proposed method, which achieves state-of-the-art performance on three referring expression grounding datasets.
http://arxiv.org/abs/1903.00839
Due to the significant information loss in low-resolution (LR) images, it has become extremely challenging to further advance the state-of-the-art of single image super-resolution (SISR). Reference-based super-resolution (RefSR), on the other hand, has proven to be promising in recovering high-resolution (HR) details when a reference (Ref) image with content similar to that of the LR input is given. However, the quality of RefSR can degrade severely when the Ref is less similar. This paper aims to unleash the potential of RefSR by leveraging more texture details from Ref images with stronger robustness even when irrelevant Ref images are provided. Inspired by the recent work on image stylization, we formulate the RefSR problem as neural texture transfer. We design an end-to-end deep model which enriches HR details by adaptively transferring the texture from Ref images according to their textural similarity. Instead of matching content in the raw pixel space as done by previous methods, our key contribution is a multi-level matching conducted in the neural space. This matching scheme facilitates multi-scale neural transfer that allows the model to benefit more from those semantically related Ref patches, and gracefully degrade to SISR performance on the least relevant Ref inputs. We build a benchmark dataset for the general research of RefSR, which contains Ref images paired with LR inputs with varying levels of similarity. Both quantitative and qualitative evaluations demonstrate the superiority of our method over the state of the art.
http://arxiv.org/abs/1903.00834
The pancreas is characterized by small size and irregular shape, which makes accurate segmentation challenging. Traditional 2D pancreas segmentation networks operate on independent 2D image slices, which often leads to spatial discontinuity problems. How to utilize spatial context information is therefore the key to improving segmentation quality. In this paper, we propose a divide-and-conquer strategy that divides the abdominal CT scans into several isometric blocks, and we design a multi-channel convolutional neural network, called SCU-Net, to learn local spatial context characteristics from the blocks. SCU-Net embodies a partial-3D segmentation idea, transforming overall pancreas segmentation into a combination of multiple local segmentation results. To improve the segmentation accuracy for each slice, we also propose a new loss function for inter-slice constraint and regularization. Thereafter, we introduce a BiCLSTM network to stimulate interaction between the bidirectional segmentation sequences, compensating for boundary defects and discontinuities in the segmentation results. We train the SCU-Net and SCU-Net+BiCLSTM networks respectively, and evaluate the segmentation results on the NIH dataset. Keywords: Pancreas Segmentation, Convolutional Neural Networks, Recurrent Neural Networks, Deep Learning, Inter-slice Regularization
http://arxiv.org/abs/1903.00832
We introduce the task of algorithm class prediction for programming word problems. A programming word problem is a problem written in natural language, which can be solved using an algorithm or a program. We define classes of various programming word problems which correspond to the class of algorithms required to solve the problem. We present four new datasets for this task, two multiclass datasets with 550 and 1159 problems each and two multilabel datasets having 3737 and 3960 problems each. We pose the problem as a text classification problem and train neural network and non-neural network-based models on this task. Our best performing classifier gets an accuracy of 62.7 percent for the multiclass case on the five class classification dataset, Codeforces Multiclass-5 (CFMC5). We also do some human-level analysis and compare human performance with that of our text classification models. Our best classifier has an accuracy only 9 percent lower than that of a human on this task. To the best of our knowledge, these are the first reported results on such a task. We make our code and datasets publicly available.
http://arxiv.org/abs/1903.00830
In this work, we first analyze the memory behavior in three recurrent neural network (RNN) cells, namely the simple RNN (SRN), the long short-term memory (LSTM) and the gated recurrent unit (GRU), where memory is defined as a function that maps previous elements in a sequence to the current output. Our study shows that all three of them suffer from rapid memory decay. Then, to alleviate this effect, we introduce trainable scaling factors that act like an attention mechanism to adjust memory decay adaptively. The new design is called the extended LSTM (ELSTM). Finally, to design a system that is robust to previous erroneous predictions, we propose a dependent bidirectional recurrent neural network (DBRNN). Extensive experiments are conducted on different language tasks to demonstrate the superiority of the proposed ELSTM and DBRNN solutions. The ELSTM has achieved up to a 30% increase in the labeled attachment score (LAS) as compared to LSTM and GRU in the dependency parsing (DP) task. Our models also outperform other state-of-the-art models such as bi-attention and convolutional sequence-to-sequence (convseq2seq) by close to 10% in the LAS.
https://arxiv.org/abs/1803.01686
End-to-end visual-based imitation learning has been widely applied in autonomous driving. When deploying a trained visual-based driving policy, a deterministic command is usually directly applied without considering the uncertainty of the input data. Such policies may cause dramatic damage when applied in the real world. In this paper, we follow the recent real-to-sim pipeline by translating the testing world image back to the training domain when using the trained policy. In the translation process, a stochastic generator is used to generate various images stylized under the training domain, randomly or directionally. Based on those translated images, the trained uncertainty-aware imitation learning policy outputs both the predicted action and the data uncertainty, motivated by the aleatoric loss function. Through the uncertainty-aware imitation learning policy, we can easily choose the safest option, the one with the lowest uncertainty among the generated images. Experiments on the Carla navigation benchmark show that our strategy outperforms previous methods, especially in dynamic environments.
http://arxiv.org/abs/1903.00821
In this paper, we propose a method to determine the 3D relative pose of pairs of communicating robots by using human pose-based key-points as correspondences. We adopt a 'leader-follower' framework where the leader robot detects and triangulates the key-points in its own frame of reference. Afterwards, the follower robots match the corresponding 2D projections on their respective calibrated cameras and find their relative poses by solving the perspective-n-point (PnP) problem. In the proposed method, we use the state-of-the-art pose detector named OpenPose for extracting the pose-based key-points pertaining to humans in the scene. Additionally, we design an efficient model for person re-identification and present an iterative optimization algorithm to refine the key-point correspondences based on their local structural similarities in the image space. We evaluate the performance of the proposed relative pose estimation method through a number of experiments conducted in terrestrial and underwater environments. Finally, we discuss the relevant operational challenges of this approach and analyze its feasibility for multi-robot cooperative systems in human-dominated social settings and in feature-deprived environments such as underwater.
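As a concrete illustration of the follower's step, here is a minimal PnP sketch with OpenCV, using synthetic key-points and a ground-truth pose that exists only to fabricate the 2D observations (in the actual pipeline these come from OpenPose detections and the refined correspondences):

import numpy as np
import cv2

# Synthetic 3D key-points in the leader's frame (e.g., triangulated joints).
pts_3d = np.array([[0.0, 0.0, 2.0], [0.3, 0.1, 2.2], [-0.2, 0.4, 1.9],
                   [0.1, -0.3, 2.5], [0.25, 0.35, 2.1], [-0.3, -0.1, 2.4]])

K = np.array([[600.0, 0.0, 320.0],      # follower's calibrated intrinsics
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])

# Pose used here only to synthesize what the follower's camera would see.
rvec_gt = np.array([0.05, -0.1, 0.02])
tvec_gt = np.array([0.4, -0.1, 0.3])
pts_2d, _ = cv2.projectPoints(pts_3d, rvec_gt, tvec_gt, K, None)

# The follower recovers its pose relative to the leader by solving PnP.
ok, rvec, tvec = cv2.solvePnP(pts_3d, pts_2d, K, None,
                              flags=cv2.SOLVEPNP_ITERATIVE)
print(ok, rvec.ravel(), tvec.ravel())   # should match rvec_gt, tvec_gt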
http://arxiv.org/abs/1903.00820
Generative Adversarial Networks (GANs), though powerful, are hard to train. Several recent works (Brock et al., 2016; Miyato et al., 2018) suggest that controlling the spectra of weight matrices in the discriminator can significantly improve the training of GANs. Motivated by their discovery, we propose a new framework for training GANs, which allows more flexible spectrum control (e.g., making the weight matrices of the discriminator have slow singular value decays). Specifically, we propose a new reparameterization approach for the weight matrices of the discriminator in GANs, which allows us to directly manipulate their spectra through various regularizers and constraints, without intensively computing singular value decompositions. Theoretically, we further show that spectrum control improves the generalization ability of GANs. Our experiments on the CIFAR-10, STL-10, and ImageNet datasets confirm that, compared to other methods, our proposed method is capable of generating images with competitive quality by utilizing spectral normalization and encouraging slow singular value decay.
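For reference, the SVD-free spectral normalization of Miyato et al. (2018), which this framework generalizes, can be sketched with power iteration; this is the prior technique, not the paper's new reparameterization:

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(128, 64))          # a discriminator weight matrix
u = rng.normal(size=128)                # persistent power-iteration vector

def spectral_normalize(W, u, n_iter=5):
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v) + 1e-12
        u = W @ v
        u /= np.linalg.norm(u) + 1e-12
    sigma = u @ W @ v                   # estimate of the top singular value
    return W / sigma, u

W_sn, u = spectral_normalize(W, u)
print(np.linalg.svd(W_sn, compute_uv=False)[0])  # largest singular value ~ 1.0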
https://arxiv.org/abs/1812.10912
A key aspect of controlling and reducing the effects invasive insect species have on agriculture is obtaining knowledge about their migration patterns. Current state-of-the-art methods of studying these migration patterns involve a mark-release-recapture technique, in which insects are released after being marked and researchers attempt to recapture them later. However, this approach involves a human researcher manually searching for these insects in large fields and results in very low recapture rates. In this paper, we propose an automated system for detecting released insects using an unmanned aerial vehicle. This system utilizes ultraviolet lighting technology, digital cameras, and lightweight computer vision algorithms to more quickly and accurately detect insects compared to the current state of the art. The efficiency and accuracy that this system provides will allow for a more comprehensive understanding of invasive insect species' migration patterns. Our experimental results demonstrate that our system can detect real target insects in field conditions with high precision and recall rates.
http://arxiv.org/abs/1903.00815
This work addresses a novel and challenging problem of estimating the full 3D hand shape and pose from a single RGB image. Most current methods in 3D hand analysis from monocular RGB images only focus on estimating the 3D locations of hand keypoints, which cannot fully express the 3D shape of hand. In contrast, we propose a Graph Convolutional Neural Network (Graph CNN) based method to reconstruct a full 3D mesh of hand surface that contains richer information of both 3D hand shape and pose. To train networks with full supervision, we create a large-scale synthetic dataset containing both ground truth 3D meshes and 3D poses. When fine-tuning the networks on real-world datasets without 3D ground truth, we propose a weakly-supervised approach by leveraging the depth map as a weak supervision in training. Through extensive evaluations on our proposed new datasets and two public datasets, we show that our proposed method can produce accurate and reasonable 3D hand mesh, and can achieve superior 3D hand pose estimation accuracy when compared with state-of-the-art methods.
http://arxiv.org/abs/1903.00812
Tissue manipulation is a frequently used, fundamental subtask of surgical procedures, and in some cases it may require the involvement of a surgeon's assistant. The complex dynamics of soft tissue, as an unstructured environment, is one of the main challenges in any attempt to automate its manipulation via a surgical robotic system. Two learning-based model predictive control algorithms using vision strategies are proposed and studied: (1) reinforcement learning and (2) learning from demonstration. A comparison of the performance of these algorithms in a simulation setting indicated that the learning-from-demonstration algorithm can boost the learning policy by initializing the predicted dynamics with the given demonstrations. Furthermore, the learning-from-demonstration algorithm was implemented on a Raven IV surgical robotic system, and its feasibility was successfully demonstrated experimentally. This study is part of a broader vision in which the role of the surgeon is redefined as a pure decision maker, while the vast majority of the manipulation is conducted autonomously by a surgical robotic system. A supplementary video can be found at: this http URL
http://arxiv.org/abs/1902.01459
The ability to segment unknown objects in depth images has potential to enhance robot skills in grasping and object tracking. Recent computer vision research has demonstrated that Mask R-CNN can be trained to segment specific categories of objects in RGB images when massive hand-labeled datasets are available. As generating these datasets is time consuming, we instead train with synthetic depth images. Many robots now use depth sensors, and recent results suggest training on synthetic depth data can transfer successfully to the real world. We present a method for automated dataset generation and rapidly generate a synthetic training dataset of 50,000 depth images and 320,000 object masks using simulated heaps of 3D CAD models. We train a variant of Mask R-CNN with domain randomization on the generated dataset to perform category-agnostic instance segmentation without any hand-labeled data and we evaluate the trained network, which we refer to as Synthetic Depth (SD) Mask R-CNN, on a set of real, high-resolution depth images of challenging, densely-cluttered bins containing objects with highly-varied geometry. SD Mask R-CNN outperforms point cloud clustering baselines by an absolute 15% in Average Precision and 20% in Average Recall on COCO benchmarks, and achieves performance levels similar to a Mask R-CNN trained on a massive, hand-labeled RGB dataset and fine-tuned on real images from the experimental setup. We deploy the model in an instance-specific grasping pipeline to demonstrate its usefulness in a robotics application. Code, the synthetic training dataset, and supplementary material are available at https://bit.ly/2letCuE.
http://arxiv.org/abs/1809.05825
We study the calibration of several state-of-the-art neural machine translation (NMT) systems built on attention-based encoder-decoder models. For structured outputs like those in NMT, calibration is important not just for reliable confidence in predictions, but also for the proper functioning of beam-search inference. We show that most modern NMT models are surprisingly miscalibrated even when conditioned on the true previous tokens. Our investigation points to two main causes: severe miscalibration of EOS (the end-of-sequence marker) and suppression of attention uncertainty. We design recalibration methods based on these signals and demonstrate improved accuracy, better sequence-level calibration, and more intuitive results from beam-search.
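A standard way to quantify such miscalibration is the expected calibration error (ECE); a minimal sketch, assuming the common equal-width 10-bin recipe rather than the paper's exact measure:

import numpy as np

def ece(confidences, correct, n_bins=10):
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    total, err = len(confidences), 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            err += (in_bin.sum() / total) * gap   # bin-weighted |acc - conf|
    return err

rng = np.random.default_rng(0)
conf = rng.random(100000)
corr = rng.random(100000) < conf ** 1.5   # a systematically overconfident model
print(ece(conf, corr))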
https://arxiv.org/abs/1903.00802
A major focus of current research on place recognition is visual localization for autonomous driving, which must be robust against significant appearance change. This work makes three contributions towards solving visual localization under appearance change: i) We present G2D, a software tool that enables capturing videos from Grand Theft Auto V, a popular role playing game set in an expansive virtual city. The target users of our software are robotic vision researchers who wish to collect hyper-realistic computer-generated imagery of a city from the street level, under controlled 6 DoF camera poses and different environmental conditions; ii) Using G2D, we construct a synthetic dataset simulating a realistic setting, i.e., multiple vehicles traversing a road network in an urban area under different environmental conditions; iii) Based on image retrieval using local features and an encoding technique, a novel Monte Carlo localization algorithm is proposed. The experimental results show that our proposed method achieves better results than state-of-the-art approaches for the task of visual localization under significant appearance change. The dataset will be available online upon acceptance. G2D is made available at: https://github.com/dadung/G2D
http://arxiv.org/abs/1811.08063
This paper proposes an approach for fusing direct radiometric data from a thermal camera with inertial measurements to extend the capabilities of aerial robots navigating GPS-denied and visually degraded environments, including darkness and the presence of airborne obscurants such as dust, fog and smoke. An optimization-based approach is developed that jointly minimizes the re-projection error of 3D landmarks and inertial measurement errors. The developed solution is extensively verified both against ground-truth in an indoor laboratory setting and inside an underground mine under severely visually degraded conditions.
http://arxiv.org/abs/1903.00798
With a good image understanding capability, can we manipulate an image's high-level semantic representation? Such a transformation operation can be used to generate or retrieve similar images but with a desired modification (for example, changing a beach background to a street background); similar abilities have been demonstrated in zero-shot learning, attribute composition, and attribute manipulation image search. In this work we show how one can learn transformations with no training examples by learning them in another domain and then transferring them to the target domain. This is feasible if, first, transformation training data is more accessible in the other domain and, second, both domains share similar semantics such that one can learn transformations in a shared embedding space. We demonstrate this on an image retrieval task where the search query is an image plus an additional transformation specification (for example: search for images similar to this one, but with a street background instead of a beach). In one experiment, we transfer transformations from synthesized 2D blob images to 3D rendered images, and in the other, we transfer from the text domain to the natural image domain.
http://arxiv.org/abs/1903.00793
Image repurposing is a commonly used method for spreading misinformation on social media and online forums, which involves publishing untampered images with modified metadata to create rumors and further propaganda. While manual verification is possible, given vast amounts of verified knowledge available on the internet, the increasing prevalence and ease of this form of semantic manipulation call for the development of robust automatic ways of assessing the semantic integrity of multimedia data. In this paper, we present a novel method for image repurposing detection that is based on the real-world adversarial interplay between a bad actor who repurposes images with counterfeit metadata and a watchdog who verifies the semantic consistency between images and their accompanying metadata, where both players have access to a reference dataset of verified content, which they can use to achieve their goals. The proposed method exhibits state-of-the-art performance on location-identity, subject-identity and painting-artist verification, showing its efficacy across a diverse set of scenarios.
http://arxiv.org/abs/1903.00788
For robotic inspection tasks in known environments, fiducial markers provide a reliable and low-cost solution for robot localization. However, detection of such markers relies on the quality of RGB camera data, which degrades significantly in the presence of visual obscurants such as fog and smoke. The ability to navigate known environments in the presence of obscurants can be critical for inspection tasks, especially in the aftermath of a disaster. Addressing such a scenario, this work proposes a method for the design of fiducial markers to be used with thermal cameras for the pose estimation of aerial robots. Our low-cost markers are designed to work in the long-wave infrared spectrum, which is not affected by the presence of obscurants, and can be affixed to any object that has a measurable temperature difference with respect to its surroundings. Furthermore, the estimated pose from the fiducial markers is fused with inertial measurements in an extended Kalman filter to remove high-frequency noise and error present in the fiducial pose estimates. The proposed markers and the pose estimation method are experimentally evaluated in an obscurant-filled environment using an aerial robot carrying a thermal camera.
http://arxiv.org/abs/1903.00782
Recommender systems are one of the most pervasive applications of machine learning in industry, with many services using them to match users to products or information. As such it is important to ask: what are the possible fairness risks, how can we quantify them, and how should we address them? In this paper we offer a set of novel metrics for evaluating algorithmic fairness concerns in recommender systems. In particular we show how measuring fairness based on pairwise comparisons from randomized experiments provides a tractable means to reason about fairness in rankings from recommender systems. Building on this metric, we offer a new regularizer to encourage improving this metric during model training and thus improve fairness in the resulting rankings. We apply this pairwise regularization to a large-scale, production recommender system and show that we are able to significantly improve the system’s pairwise fairness.
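A minimal sketch of the pairwise idea, on assumed synthetic scores (the paper's exact pairing, weighting, and regularizer may differ): per item group, measure how often the ranker orders an engaged item above a non-engaged one, and compare the two groups.

import numpy as np

def pairwise_accuracy(scores_pos, scores_neg):
    # Fraction of (engaged, non-engaged) pairs ordered correctly.
    return np.mean(scores_pos[:, None] > scores_neg[None, :])

rng = np.random.default_rng(0)
for group, lift in [("group A", 1.0), ("group B", 0.4)]:
    pos = rng.normal(loc=lift, size=500)   # scores of engaged items
    neg = rng.normal(loc=0.0, size=500)    # scores of non-engaged items
    print(group, round(pairwise_accuracy(pos, neg), 3))

A gap between the two printed numbers is the kind of pairwise unfairness the proposed regularizer is designed to shrink during training.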
http://arxiv.org/abs/1903.00780
A correspondence between database tuples as causes for query answers in databases and tuple-based repairs of inconsistent databases with respect to denial constraints has already been established. In this work, answer-set programs that specify repairs of databases are used as a basis for solving computational and reasoning problems about causes. Here, causes are also introduced at the attribute level by appealing to a repair semantics that is both null-based and attribute-based. The corresponding repair programs are presented, and they are used as a basis for computation and reasoning about attribute-level causes. They are extended to deal with the case of causality under integrity constraints. Several examples with the DLV system are shown.
http://arxiv.org/abs/1712.01001
Plant phenology studies rely on long-term monitoring of the life cycles of plants. High-resolution unmanned aerial vehicles (UAVs) and near-surface technologies have been used for plant monitoring, demanding the creation of methods capable of locating and identifying plant species through time and space. However, this is a challenging task given the high volume of data, the constant missing data in temporal datasets, the heterogeneity of temporal profiles, the variety of plant visual patterns, and the unclear definition of individuals' boundaries in plant communities. In this letter, we propose a novel method, suitable for phenological monitoring, based on Convolutional Networks (ConvNets) to perform spatio-temporal vegetation pixel-classification on high-resolution images. We conducted a systematic evaluation using high-resolution vegetation image datasets associated with the Brazilian Cerrado biome. Experimental results show that the proposed approach is effective, outperforming other spatio-temporal pixel-classification strategies.
http://arxiv.org/abs/1903.00774
Audio tagging is the task of predicting the presence or absence of sound classes within an audio clip. Previous work in audio classification focused on relatively small datasets limited to recognising a small number of sound classes. We investigate audio tagging on AudioSet, which is a dataset consisting of over 2 million audio clips and 527 classes. AudioSet is weakly labelled, in that only the presence or absence of sound classes is known for each clip, while the onset and offset times are unknown. To address the weakly-labelled audio classification problem, we propose attention neural networks as a way to attend to the most salient parts of an audio clip. We bridge the connection between attention neural networks and multiple instance learning (MIL) methods, and propose decision-level and feature-level attention neural networks for audio tagging. We investigate attention neural networks modelled by different functions, depths and widths. Experiments on AudioSet show that the feature-level attention neural network achieves a state-of-the-art mean average precision (mAP) of 0.369, outperforming the best multiple instance learning (MIL) method of 0.317 and Google's deep neural network baseline of 0.314. In addition, we discover that the audio tagging performance on AudioSet embedding features has a weak correlation with the number of training examples and the quality of labels of each sound class.
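A minimal NumPy sketch of decision-level attention pooling for one weakly-labelled clip (the shapes and the two linear branches are illustrative, not the paper's exact architecture):

import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
T, F, C = 8, 64, 527                 # time segments, feature dim, classes
feats = rng.normal(size=(T, F))      # per-segment embeddings of one clip

W_cls = rng.normal(size=(F, C))      # classification branch
W_att = rng.normal(size=(F, C))      # attention branch

probs = 1 / (1 + np.exp(-feats @ W_cls))   # per-segment class probabilities
att = softmax(feats @ W_att, axis=0)       # attention normalized over time
clip_probs = (att * probs).sum(axis=0)     # decision-level pooling
print(clip_probs.shape)                    # (527,) clip-level predictions

Only the clip-level output is supervised, so the attention learns which segments carry each sound class; this is the bridge to multiple instance learning mentioned above.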
http://arxiv.org/abs/1903.00765
Recent years have witnessed the significant progress on convolutional neural networks (CNNs) in dynamic scene deblurring. While CNN models are generally learned by the reconstruction loss defined on training data, incorporating suitable image priors as well as regularization terms into the network architecture could boost the deblurring performance. In this work, we propose an Extreme Channel Prior embedded Network (ECPeNet) to plug the extreme channel priors (i.e., priors on dark and bright channels) into a network architecture for effective dynamic scene deblurring. A novel trainable extreme channel prior embedded layer (ECPeL) is developed to aggregate both extreme channel and blurry image representations, and sparse regularization is introduced to regularize the ECPeNet model learning. Furthermore, we present an effective multi-scale network architecture that works in both coarse-to-fine and fine-to-coarse manners for better exploiting information flow across scales. Experimental results on GoPro and Kohler datasets show that our proposed ECPeNet performs favorably against state-of-the-art deep image deblurring methods in terms of both quantitative metrics and visual quality.
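For reference, the extreme channels themselves are cheap to compute; a minimal sketch with an illustrative window size (the paper embeds these priors as a trainable layer rather than computing them post hoc):

import numpy as np

def extreme_channels(img, patch=15):
    # img: H x W x 3 array in [0, 1]; returns dark and bright channels.
    h, w, _ = img.shape
    pad = patch // 2
    mn = np.pad(img.min(axis=2), pad, mode="edge")  # per-pixel min over RGB
    mx = np.pad(img.max(axis=2), pad, mode="edge")  # per-pixel max over RGB
    dark = np.empty((h, w))
    bright = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            dark[i, j] = mn[i:i + patch, j:j + patch].min()
            bright[i, j] = mx[i:i + patch, j:j + patch].max()
    return dark, bright

img = np.random.default_rng(0).random((32, 32, 3))
dark, bright = extreme_channels(img)
print(dark.mean(), bright.mean())   # blur pushes these away from 0 and 1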
http://arxiv.org/abs/1903.00763
In this paper we study the problem of convergence and generalization error bounds of stochastic momentum for deep learning from the perspective of regularization. To do so, we first interpret momentum as solving an $\ell_2$-regularized minimization problem to learn the offsets between any two successive model parameters. We call this time-delay momentum because the model parameter is updated after a few iterations towards finding the minimizer. We then propose our learning algorithm, i.e., stochastic gradient descent (SGD) with time-delay momentum. We show that our algorithm can be interpreted as solving a sequence of strongly convex optimization problems using SGD. We prove that under mild conditions our algorithm can converge to a stationary point with rate of $O(\frac{1}{\sqrt{K}})$ and generalization error bound of $O(\frac{1}{\sqrt{n\delta}})$ with probability at least $1-\delta$, where $K,n$ are the numbers of model updates and training samples, respectively. We demonstrate the empirical superiority of our algorithm in deep learning in comparison with the state-of-the-art deep learning solvers.
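For orientation, the standard heavy-ball recursion and one $\ell_2$-regularized reading of it, consistent with the description above though not necessarily the paper's exact formulation:

$$v_{k+1} = \mu v_k - \eta \nabla f(\theta_k), \qquad \theta_{k+1} = \theta_k + v_{k+1},$$

$$v_{k+1} = \arg\min_{v} \; \eta \langle \nabla f(\theta_k), v \rangle + \tfrac{1}{2} \lVert v - \mu v_k \rVert_2^2,$$

so each momentum step can be read as learning an offset $v$ between successive parameters that stays close, in the $\ell_2$ sense, to a scaled copy of the previous offset (setting the gradient of the subproblem to zero recovers the first recursion).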
http://arxiv.org/abs/1903.00760
We propose a novel Equilibrated Recurrent Neural Network (ERNN) to combat the issues of inaccuracy and instability in conventional RNNs. Drawing upon the concept of autapse in neuroscience, we propose augmenting an RNN with a time-delayed self-feedback loop. Our sole purpose is to modify the dynamics of each internal RNN state and, at any time, enforce it to evolve close to the equilibrium point associated with the input signal at that time. We show that such self-feedback helps stabilize the hidden state transitions leading to fast convergence during training while efficiently learning discriminative latent features that result in state-of-the-art results on several benchmark datasets at test-time. We propose a novel inexact Newton method to solve fixed-point conditions given model parameters for generating the latent features at each hidden state. We prove that our inexact Newton method converges locally with linear rate (under mild conditions). We leverage this result for efficient training of ERNNs based on backpropagation.
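To illustrate the equilibrium view, here is a minimal sketch that drives a vanilla recurrent state to the fixed point $h = \tanh(Wh + b)$ (with $b$ standing in for the input term $Ux$) using exact Newton steps; the paper's cell, its self-feedback term, and its inexact Newton variant differ in detail:

import numpy as np

rng = np.random.default_rng(0)
n = 16
W = 0.5 * rng.normal(size=(n, n)) / np.sqrt(n)  # mildly contractive recurrence
b = rng.normal(size=n)                          # stands in for U x at this step

h = np.zeros(n)
for _ in range(20):
    a = W @ h + b
    g = h - np.tanh(a)                          # fixed-point residual
    J = np.eye(n) - (1 - np.tanh(a) ** 2)[:, None] * W  # Jacobian of the residual
    h -= np.linalg.solve(J, g)                  # Newton step
    if np.linalg.norm(g) < 1e-10:
        break

print(np.linalg.norm(h - np.tanh(W @ h + b)))   # ~0: h is an equilibrium point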
http://arxiv.org/abs/1903.00755
We introduce a rich model for multi-objective clustering with a lexicographic ordering over objectives and a slack. The slack denotes the allowed multiplicative deviation from the optimal objective value of the higher-priority objective, to facilitate improvement in lower-priority objectives. We then propose an algorithm called Zeus to solve this class of problems, which is characterized by a makeshift function. The makeshift fine-tunes the clusters formed by the processed objectives so as to improve the clustering with respect to the unprocessed objectives, given the slack. We present makeshifts for solving three different classes of objectives and analyze their solution guarantees. Finally, we empirically demonstrate the effectiveness of our approach on three applications using real-world data.
http://arxiv.org/abs/1903.00750
We study robot construction problems where multiple autonomous robots rearrange stacks of prefabricated blocks to build stable structures. These problems are challenging due to ramifications of actions, true concurrency, and requirements of supportedness of blocks by other blocks and stability of the structure at all times. We propose a formal hybrid planning framework to solve a wide range of robot construction problems, based on Answer Set Programming. This framework not only decides for a stable final configuration of the structure, but also computes the order of manipulation tasks for multiple autonomous robots to build the structure from an initial configuration, while simultaneously ensuring the stability, supportedness and other desired properties of the partial construction at each step of the plan. We prove the soundness and completeness of our formal method with respect to these properties. We introduce a set of challenging robot construction benchmark instances, including bridge building and stack overhanging scenarios, discuss the usefulness of our framework over these instances, and demonstrate the applicability of our method using a bimanual Baxter robot.
http://arxiv.org/abs/1903.00745
We identify a phenomenon, which we refer to as multi-model forgetting, that occurs when sequentially training multiple deep networks with partially-shared parameters; the performance of previously-trained models degrades as one optimizes a subsequent one, due to the overwriting of shared parameters. To overcome this, we introduce a statistically-justified weight plasticity loss that regularizes the learning of a model’s shared parameters according to their importance for the previous models, and demonstrate its effectiveness when training two models sequentially and for neural architecture search. Adding weight plasticity in neural architecture search preserves the best models to the end of the search and yields improved results in both natural language processing and computer vision tasks.
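A minimal sketch of an importance-weighted penalty in this spirit; the Fisher-style importance estimate and the quadratic form below are assumptions for illustration, not the paper's exact statistically-justified loss:

import numpy as np

def plasticity_penalty(theta_shared, theta_prev, importance, lam=1.0):
    # Pull shared weights toward their values under the previous model,
    # more strongly where they mattered more to that model.
    return 0.5 * lam * np.sum(importance * (theta_shared - theta_prev) ** 2)

rng = np.random.default_rng(0)
prev = rng.normal(size=100)              # shared weights after training model 1
cur = prev + 0.1 * rng.normal(size=100)  # shared weights while training model 2
fisher = rng.random(100)                 # e.g., Fisher-information estimates
print(plasticity_penalty(cur, prev, fisher))  # added to model 2's training loss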
https://arxiv.org/abs/1902.08232
Building a good predictive model requires an array of activities such as data imputation, feature transformations, estimator selection, hyper-parameter search and ensemble construction. Given the large, complex and heterogeneous space of options, off-the-shelf optimization methods are infeasible for realistic response times. In practice, much of the predictive modeling process is conducted by experienced data scientists, who selectively make use of available tools. Over time, they develop an understanding of the behavior of operators and perform serial decision making under uncertainty, colloquially referred to as educated guesswork. With an unprecedented demand for applications of supervised machine learning, there is a call for solutions that automatically search for a good combination of parameters across these tasks to minimize the modeling error. We introduce a novel system called APRL (Autonomous Predictive modeler via Reinforcement Learning), which uses past experience through reinforcement learning to optimize such sequential decision making from within a set of diverse actions, under a time constraint, on a previously unseen predictive learning problem. APRL's actions are taken to optimize the performance of a final ensemble. This is in contrast to other systems, which maximize individual model accuracy first and create ensembles as a disconnected post-processing step. As a result, APRL is able to reduce classification error by up to 71% on average over a wide variety of problems.
http://arxiv.org/abs/1903.00743
Evolution has produced a multi-scale mosaic of interacting adaptive units. Innovations arise when perturbations push parts of the system away from stable equilibria into new regimes where previously well-adapted solutions no longer work. Here we explore the hypothesis that multi-agent systems sometimes display intrinsic dynamics arising from competition and cooperation that provide a naturally emergent curriculum, which we term an autocurriculum. The solution of one social task often begets new social tasks, continually generating novel challenges, and thereby promoting innovation. Under certain conditions these challenges may become increasingly complex over time, demanding that agents accumulate ever more innovations.
http://arxiv.org/abs/1903.00742
We propose a novel way to handle out-of-vocabulary (OOV) words in downstream natural language processing (NLP) tasks. We implement a network that predicts useful embeddings for OOV words based on their morphology and on the context in which they appear. Our model also incorporates an attention mechanism indicating the focus allocated to the left context words, the right context words, or the word's characters, making the prediction more interpretable. The model is a "drop-in" module that is jointly trained with the downstream task's neural network, thus producing embeddings specialized for the task at hand. When the task is mostly syntactic, we observe that our model directs most of its attention to surface-form characters. On the other hand, for more semantic tasks, the network allocates more attention to the surrounding words. In all our tests, the module helps the network achieve better performance in comparison to the use of simple random embeddings.
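A minimal sketch of the three-way attention, with the three encoders stubbed out as random feature vectors (all names illustrative):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d = 50
left, right, chars = rng.normal(size=(3, d))  # left-context, right-context,
                                              # and character-level summaries
w = rng.normal(size=(3, d))                   # scoring vectors, trained jointly

att = softmax(np.array([w[0] @ left, w[1] @ right, w[2] @ chars]))
oov_embedding = att[0] * left + att[1] * right + att[2] * chars
print(att)  # inspecting the three weights is what makes the prediction interpretable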
http://arxiv.org/abs/1903.00724
Grasp detection that considers the affiliations between grasps and their owners in object-overlapping scenes is a necessary and challenging task for the practical use of robotic grasping approaches. In this paper, a robotic grasp detection algorithm named ROI-GD is proposed to provide a feasible solution to this problem based on Regions of Interest (ROIs), which are the region proposals for objects. ROI-GD uses features from ROIs to detect grasps instead of the whole scene. It has two stages: the first stage provides ROIs in the input image and the second stage is a grasp detector based on ROI features. We also contribute a multi-object grasp dataset, which is much larger than the Cornell Grasp Dataset, by labeling the Visual Manipulation Relationship Dataset. Experimental results demonstrate that ROI-GD performs much better in object-overlapping scenes and, at the same time, remains comparable with state-of-the-art grasp detection algorithms on the Cornell Grasp Dataset and the Jacquard Dataset. Robotic experiments demonstrate that ROI-GD can help robots grasp the target in single-object and multi-object scenes with overall success rates of 92.5% and 83.8%, respectively.
http://arxiv.org/abs/1808.10313
A central vision of the Internet of Things is the representation of the physical world in a consistent virtual environment. Especially in the context of smart factories, the connection of different, heterogeneous production modules through a digital shop floor promises faster conversion rates, data-driven maintenance, and automated machine configurations for use cases that were not known at design time. Nevertheless, these scenarios demand IoT representations of all participating machines and components, which requires high installation effort and hardware adjustments. We propose an incremental process for bringing the shop floor closer to the IoT vision. Currently, the majority of systems, components, and parts are not yet connected to the internet and might not even offer the possibility of being technically equipped with sensors. However, these could be essential parts of a realistic digital shop floor representation. We therefore propose Virtual Representations, which are capable of independently calculating a physical object's condition by dynamically collecting and interpreting already available data through RESTful Web APIs. The internal logic of such a Virtual Representation is further adjustable at runtime, since changes to its respective physical object, its environment, or updates to the resource itself should not cause any downtime.
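A minimal sketch of such a Virtual Representation, with entirely hypothetical endpoints and an illustrative wear rule, showing the pattern of deriving an unconnected object's condition from its neighbors' REST data:

import requests

def conveyor_belt_condition(base="http://shopfloor.local"):
    # The belt itself has no sensors; pull data from adjacent, connected machines.
    motor = requests.get(f"{base}/motor-3/telemetry").json()
    line = requests.get(f"{base}/line-1/throughput").json()
    # Interpretation logic; in the proposed approach this is swappable at runtime.
    wear = motor["operating_hours"] * 1e-3 + line["items_total"] * 1e-6
    return {"wear_estimate": wear, "needs_service": wear > 0.8}

Replacing the body of the function, or the endpoints it reads, updates the representation without taking the resource offline, which is the runtime adjustability described above.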
http://arxiv.org/abs/1903.00718
StarCraft II provides an extremely challenging platform for reinforcement learning due to its huge state space and game length. The previously fastest method requires days to train a full-length game policy on a single commercial machine. In this paper, we introduce the mind-game, an abstract task model, to facilitate reinforcement learning. With the mind-game, the policy is first trained quickly in the mind-game and is then mapped to the real game for a second phase of training. In our experiments, the trained agent achieves a 100% win-rate on the map Simple64 against the most difficult non-cheating built-in bot (level 7), and training is 100 times faster than previous methods under the same computational resources. To test the generalization performance of the agent, a Golden-level StarCraft II Ladder human player competed with the agent. With a restricted strategy, the agent won 4 out of 5 games against the human player. The mind-game approach may shed some light on further studies of efficient reinforcement learning. The code is publicly available (https://github.com/mindgameSC2/mind-SC2).
http://arxiv.org/abs/1903.00715
Resource balancing within complex transportation networks is one of the most important problems in the real logistics domain. Traditional solutions to these problems leverage combinatorial optimization with demand and supply forecasting. However, the high complexity of transportation routes, severe uncertainty of future demand and supply, together with non-convex business constraints, make these problems extremely challenging for traditional resource management approaches. In this paper, we propose a novel, sophisticated multi-agent reinforcement learning approach to address these challenges. In particular, inspired by externalities, especially the interactions among resource agents, we introduce an innovative cooperative mechanism for state and reward design, resulting in more effective and efficient transportation. Extensive experiments on a simulated ocean transportation service demonstrate that our new approach can stimulate cooperation among agents and lead to much better performance. Compared with traditional solutions based on combinatorial optimization, our approach gives rise to a significant improvement in terms of both performance and stability.
http://arxiv.org/abs/1903.00714
Widespread applications of deep learning have led to a plethora of pre-trained neural network models for common tasks. Such models are often adapted from other models via transfer learning. The models may have varying training sets, training algorithms, network architectures, and hyper-parameters. For a given application, what is the most suitable model in a model repository? This is a critical question for practical deployments, but it has not received much attention. This paper introduces the novel problem of searching and ranking models based on their suitability relative to a target dataset, and proposes a ranking algorithm called neuralRank. The key idea behind this algorithm is to base model suitability on the discriminating power of a model, using a novel metric to measure it. With experimental results on the MNIST, Fashion, and CIFAR10 datasets, we demonstrate that (1) neuralRank is independent of the domain, the training set, or the network architecture and (2) models ranked highly by neuralRank tend to have higher accuracy in practice.
http://arxiv.org/abs/1903.00711
Deep learning approaches to 3D shape segmentation are typically formulated as a multi-class labeling problem. Existing models are trained for a fixed set of labels, which greatly limits their flexibility and adaptivity. We opt for top-down recursive decomposition and develop the first deep learning model for hierarchical segmentation of 3D shapes, based on recursive neural networks. Starting from a full shape represented as a point cloud, our model performs recursive binary decomposition, where the decomposition networks at all nodes in the hierarchy share weights. At each node, a node classifier is trained to determine the type (adjacency or symmetry) and stopping criteria of its decomposition. The features extracted in higher-level nodes are recursively propagated to lower-level ones. Thus, the meaningful decompositions in higher levels provide strong contextual cues constraining the segmentations in lower levels. Meanwhile, to increase the segmentation accuracy at each node, we enhance the recursive contextual feature with the shape feature extracted for the corresponding part. Our method segments a 3D shape represented as a point cloud into an unfixed number of parts, depending on the shape complexity, showing strong generality and flexibility. It achieves state-of-the-art performance, both for fine-grained and semantic segmentation, on the public benchmark and a new benchmark of fine-grained segmentation proposed in this work. We also demonstrate its application for fine-grained part refinement in image-to-shape reconstruction.
http://arxiv.org/abs/1903.00709
We introduce a new type of homotopy relation for digitally continuous functions which we call "strong homotopy." Both digital homotopy and strong homotopy are natural digitizations of classical topological homotopy: the difference between them is analogous to the difference between digital 4-adjacency and 8-adjacency in the plane.
We explore basic properties of strong homotopy, and give some equivalent characterizations. In particular we show that strong homotopy is related to "punctuated homotopy," in which the function changes by only one point in each homotopy time step.
We also show that strongly homotopic maps always have the same induced homomorphisms in digital homology theory. This is not generally true for digitally homotopic maps, though we do show that it is true for any homotopic self-maps on the digital cycle $C_n$ with $n\ge 4$.
We also define and consider strong homotopy equivalence of digital images. Using some computer assistance, we produce a catalog of all small digital images up to strong homotopy equivalence. We also briefly consider pointed strong homotopy equivalence, and give an example of a pointed contractible image which is not pointed strongly contractible.
http://arxiv.org/abs/1903.00706
In this paper, we propose a novel quadratic optimized model based on a deep convolutional neural network (QODCNN) for full-reference and no-reference screen content image (SCI) quality assessment. Unlike traditional CNN methods, which take all image patches as training data and use average quality pooling, our model is optimized in three steps to obtain a more effective model. In the first step, an end-to-end deep CNN is trained to preliminarily predict image visual quality, with batch normalization (BN) layers and l2 regularization employed to improve the speed and performance of network fitting. In the second step, the pretrained model is fine-tuned to achieve better performance through analysis of the raw training data. In the third step, an adaptive weighting method is proposed to fuse local quality, inspired by the perceptual property of the human visual system (HVS) that it is sensitive to image patches containing texture and edge information. The novelty of our algorithm can be summarized as follows: 1) considering the correlation between local quality and the subjective differential mean opinion score (DMOS), the Euclidean distance is utilized to measure the effectiveness of image patches, and the pretrained model is fine-tuned with more effective training data; 2) an adaptive pooling approach is employed to fuse the patch quality of textual and pictorial regions, whose features, extracted only from distorted images, are strongly noise-robust and effective for both FR and NR IQA; 3) considering the characteristics of SCIs, a deep and valid network architecture is designed for both NR and FR visual quality evaluation of SCIs. Experimental results verify that our model outperforms both current no-reference and full-reference image quality assessment methods on the benchmark screen content image quality assessment database (SIQAD).
http://arxiv.org/abs/1903.00705
Human joint dynamic stiffness plays an important role in the stability of performance-augmentation exoskeletons. In this paper, we consider a new frequency-domain model of the human joint dynamics which features a complex-valued stiffness. This complex stiffness consists of a real stiffness and a hysteretic damping. We use it to explain the dynamic behaviors of the human connected to the exoskeleton, in particular the observed non-zero low-frequency phase shift and the near-constant damping ratio of the resonance as stiffness and inertia vary. We validate this concept by experimenting with an elbow-joint exoskeleton testbed on a subject while modifying joint stiffness behavior, exoskeleton inertia, and strength augmentation gains. We compare three different models of elbow-joint dynamic stiffness: a model with real stiffness, viscous damping, and inertia; a model with complex stiffness and inertia; and a model combining the previous two. Our results show, using a statistical F-test, that the hysteretic damping term improves modeling accuracy. Moreover, this improvement is statistically more significant than using a classical viscous damping term. In addition, we experimentally observe a linear relationship between the hysteretic damping and the real part of the stiffness, which allows us to simplify the complex stiffness model to a 1-parameter system. Ultimately, we design a fractional-order controller to demonstrate how human hysteretic damping behavior can be exploited to improve strength amplification performance while maintaining stability.
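For orientation, hysteretic (structural) damping in its common frequency-domain form, which the complex stiffness described above resembles (the paper's identified model may differ in detail):

$$T(j\omega) = \left[ k (1 + j\eta) - J \omega^{2} \right] \Theta(j\omega),$$

where $k$ is the real stiffness, $k\eta$ the hysteretic damping, and $J$ the inertia. For this model the equivalent damping ratio at resonance is approximately $\eta/2$, independent of $k$ and $J$, consistent with the near-constant damping ratio noted above.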
http://arxiv.org/abs/1903.00704
As Android has become increasingly popular, so has malware targeting it, thus pushing the research community to propose different detection techniques. However, the constant evolution of the Android ecosystem, and of malware itself, makes it hard to design robust tools that can operate for long periods of time without the need for modifications or costly re-training. Aiming to address this issue, we set to detect malware from a behavioral point of view, modeled as the sequence of abstracted API calls. We introduce MaMaDroid, a static-analysis based system that abstracts the API calls performed by an app to their class, package, or family, and builds a model from their sequences obtained from the call graph of an app as Markov chains. This ensures that the model is more resilient to API changes and the features set is of manageable size. We evaluate MaMaDroid using a dataset of 8.5K benign and 35.5K malicious apps collected over a period of six years, showing that it effectively detects malware (with up to 0.99 F-measure) and keeps its detection capabilities for long periods of time (up to 0.87 F-measure two years after training). We also show that MaMaDroid remarkably outperforms DroidAPIMiner, a state-of-the-art detection system that relies on the frequency of (raw) API calls. Aiming to assess whether MaMaDroid’s effectiveness mainly stems from the API abstraction or from the sequencing modeling, we also evaluate a variant of it that uses frequency (instead of sequences), of abstracted API calls. We find that it is not as accurate, failing to capture maliciousness when trained on malware samples that include API calls that are equally or more frequently used by benign apps.
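The Markov-chain feature step can be sketched in a few lines: a sequence of abstracted calls (package-mode labels here, chosen for illustration) becomes a transition-probability matrix whose flattened entries feed the classifier:

import numpy as np

def markov_features(call_seq, states):
    idx = {s: i for i, s in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    for a, b in zip(call_seq, call_seq[1:]):    # consecutive abstracted calls
        counts[idx[a], idx[b]] += 1
    rows = counts.sum(axis=1, keepdims=True)
    probs = np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)
    return probs.ravel()                        # feature vector for the classifier

states = ["java.lang", "android.telephony", "self-defined", "obfuscated"]
seq = ["java.lang", "android.telephony", "java.lang", "self-defined",
       "java.lang", "android.telephony"]
print(markov_features(seq, states).reshape(4, 4))

Because transitions are between abstracted classes rather than raw API names, the feature space stays small and new or renamed APIs map onto existing states, which is the resilience property claimed above.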
http://arxiv.org/abs/1711.07477
Human motion capture data has been widely used in data-driven character animation. In order to generate realistic, natural-looking motions, most data-driven approaches require considerable efforts of pre-processing, including motion segmentation and annotation. Existing (semi-) automatic solutions either require hand-crafted features for motion segmentation or do not produce the semantic annotations required for motion synthesis and building large-scale motion databases. In addition, human-labeled annotation data suffers from inter- and intra-labeler inconsistencies by design. We propose a semi-automatic framework for semantic segmentation of motion capture data based on supervised machine learning techniques. It first transforms a motion capture sequence into a "motion image" and applies a convolutional neural network for image segmentation. Dilated temporal convolutions enable the extraction of temporal information from a large receptive field. Our model outperforms two state-of-the-art models for action segmentation, as well as a popular network for sequence modeling. Most of all, our method is very robust under noisy and inaccurate training labels and thus can handle human errors during the labeling process.
http://arxiv.org/abs/1903.00695
Refining raw disparity maps from different algorithms to exploit their complementary advantages is still challenging. Uncertainty estimation and complex disparity relationships among pixels limit the accuracy and robustness of existing methods and there is no standard method for fusion of different kinds of depth data. In this paper, we introduce a new method to fuse disparity maps from different sources, while incorporating supplementary information (intensity, gradient, etc.) into a refiner network to better refine raw disparity inputs. A discriminator network classifies disparities at different receptive fields and scales. Assuming a Markov Random Field for the refined disparity map produces better estimates of the true disparity distribution. Both fully supervised and semi-supervised versions of the algorithm are proposed. The approach includes a more robust loss function to inpaint invalid disparity values and requires much less labeled data to train in the semi-supervised learning mode. The algorithm can be generalized to fuse depths from different kinds of depth sources. Experiments explored different fusion opportunities: stereo-monocular fusion, stereo-ToF fusion and stereo-stereo fusion. The experiments show the superiority of the proposed algorithm compared with the most recent algorithms on public synthetic datasets (Scene Flow, SYNTH3, our synthetic garden dataset) and real datasets (Kitti2015 dataset and Trimbot2020 Garden dataset).
http://arxiv.org/abs/1803.06657
Concept lattice drawings are an important tool for visualizing complex relations in data in a manner that is simple for human readers. Many attempts have been made to transfer classical graph drawing approaches to order diagrams. Although those methods are satisfactory for some lattices, they unfortunately perform poorly in general. In this work we present a novel tool to draw concept lattices that is purely motivated by the order structure.
http://arxiv.org/abs/1903.00686