In this article we revisit the definition of Precision-Recall (PR) curves for generative models proposed by Sajjadi et al. (arXiv:1806.00035). Rather than providing a scalar for generative quality, PR curves distinguish mode collapse (poor recall) from bad quality (poor precision). We first generalize their formulation to arbitrary measures, removing any restriction to finite support. We also expose a bridge between PR curves and the type I and type II error rates of likelihood ratio classifiers on the task of discriminating between samples of the two distributions. Building upon this new perspective, we propose a novel algorithm for approximating precision-recall curves that shares some interesting methodological properties with the hypothesis testing technique of Lopez-Paz et al. (arXiv:1610.06545). We demonstrate the advantages of the proposed formulation over the original approach on controlled multi-modal datasets.
https://arxiv.org/abs/1905.05441
Visual localization is the attractive problem of estimating the camera pose of a query image with respect to a database of images. It is a crucial task for various applications, such as autonomous vehicles, assistive navigation and augmented reality. The challenge lies in the various appearance variations between query and database images, including illumination, season, dynamic-object and viewpoint variations. To tackle these challenges, this paper proposes the Panoramic Annular Localizer, which incorporates a panoramic annular lens and robust deep image descriptors. The panoramic annular images captured by a single camera are processed and fed into the NetVLAD network to form the active deep descriptor, and sequential matching is utilized to generate the localization result. Experiments carried out on public datasets and in the field demonstrate the validity of the proposed system.
https://arxiv.org/abs/1905.05425
In cross-lingual transfer, NLP models trained on one or more source languages are applied to a low-resource target language. While most prior work has used a single source model or a few carefully selected models, here we consider a massive setting with many such models. This setting raises the problem of poor transfer, particularly from distant languages. We propose two techniques for modulating the transfer, suitable for zero-shot or few-shot learning, respectively. Evaluating on named entity recognition, we show that our techniques are much more effective than strong baselines, including standard ensembling, and that our unsupervised method rivals oracle selection of the single best individual model.
http://arxiv.org/abs/1902.00193
In the context of Industry 4.0, data management is a key point for decision aid approaches. Large amounts of manufacturing digital data are collected on the shop floor. Their analysis can then require a large amount of computing power. The Big Data issue can be solved by aggregation, generating smart and meaningful data. This paper presents a new knowledge-based multi-level aggregation strategy to support decision making. Manufacturing knowledge is used at each level to design the monitoring criteria or aggregation operators. The proposed approach has been implemented as a demonstrator and successfully applied to a real machining database from the aeronautic industry. Keywords: decision making; machining; knowledge-based systems.
http://arxiv.org/abs/1905.06413
Understanding human actions is a crucial problem for service robots. However, the general trend in action recognition is to develop and test these systems on structured datasets. This work therefore presents a practical skeleton-based action recognition framework that can be used in realistic scenarios. Our results show that although non-augmented and non-normalized data may yield comparable results on the test split of a dataset, such models are far from useful on a different, manually collected dataset.
https://arxiv.org/abs/1905.05420
In this paper, we focus on the facial expression translation task and propose a novel Expression Conditional GAN (ECGAN) which can learn the mapping from one image domain to another based on an additional expression attribute. The proposed ECGAN is a generic framework and is applicable to different expression generation tasks, where a specific facial expression can be easily controlled by the conditional attribute label. Besides, we introduce a novel face mask loss to reduce the influence of background changes. Moreover, we propose an entire framework for facial expression generation and recognition in the wild, which consists of two modules, i.e., generation and recognition. Finally, we evaluate our framework on several public face datasets in which the subjects differ in race, illumination, occlusion, pose, color, content and background conditions. Even though these datasets are very diverse, both the qualitative and quantitative results demonstrate that our approach is able to generate facial expressions accurately and robustly.
https://arxiv.org/abs/1905.05416
We explore value-based solutions for multi-agent reinforcement learning (MARL) tasks in the recently popularized regime of centralized training with decentralized execution (CTDE). VDN and QMIX are representative examples that factorize the joint action-value function into individual ones for decentralized execution, but they address only a fraction of factorizable MARL tasks due to structural constraints in their factorization, such as additivity and monotonicity. In this paper, we propose a new factorization method for MARL, QTRAN, which is free from such structural constraints and takes a new approach: transforming the original joint action-value function into an easily factorizable one with the same optimal actions. QTRAN guarantees more general factorization than VDN or QMIX, thus covering a much wider class of MARL tasks than previous methods do. Our experiments on multi-domain Gaussian-squeeze and modified predator-prey tasks demonstrate QTRAN's superior performance, with especially large margins in games whose payoffs penalize non-cooperative behavior more aggressively.
https://arxiv.org/abs/1905.05408
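As context for the structural constraints this abstract discusses, here is a minimal PyTorch sketch (not the authors' code) of the two factorizations QTRAN relaxes: VDN's additive joint value and a QMIX-style monotonic mixer. The mixer is simplified here; QMIX actually generates the mixing weights with state-conditioned hypernetworks.

```python
import torch
import torch.nn as nn

# VDN: joint Q is the sum of per-agent utilities (additivity constraint).
def vdn_joint_q(per_agent_qs):
    # per_agent_qs: list of [batch] tensors, Q_i(tau_i, a_i)
    return torch.stack(per_agent_qs, dim=0).sum(dim=0)

# QMIX-style: mixing weights forced non-negative, which makes Q_tot
# monotone in each Q_i (monotonicity constraint).
class MonotonicMixer(nn.Module):
    def __init__(self, n_agents, hidden=32):
        super().__init__()
        self.w1 = nn.Parameter(torch.rand(n_agents, hidden))
        self.w2 = nn.Parameter(torch.rand(hidden, 1))

    def forward(self, qs):                    # qs: [batch, n_agents]
        h = torch.relu(qs @ self.w1.abs())    # abs() enforces monotonicity
        return h @ self.w2.abs()              # Q_tot: [batch, 1]
```

QTRAN instead learns a transformed joint action-value function free of both constraints; its exact architecture is given in the paper.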
The paper proves a closed-form formula for the number of k-skip-n-grams in a corpus of size $L$, expressed in terms of $k' = \min(L - n + 1, k)$.
https://arxiv.org/abs/1905.05407
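Since the closed-form expression itself is not reproduced in this abstract, a brute-force counter is a useful reference point. The sketch below assumes the common definition (Guthrie et al.) under which a k-skip-n-gram is a selection of n positions skipping at most k tokens in total; the paper's definition may differ.

```python
from itertools import combinations

def count_k_skip_n_grams(L, n, k):
    """Brute-force count of k-skip-n-grams for a corpus of length L,
    assuming a skip-gram is an ordered selection of n positions that
    skips at most k tokens in total."""
    count = 0
    for idx in combinations(range(L), n):
        if (idx[-1] - idx[0]) - (n - 1) <= k:  # total tokens skipped
            count += 1
    return count

print(count_k_skip_n_grams(L=10, n=3, k=2))
```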
Plug-and-play (PnP) is a non-convex framework that integrates modern denoising priors, such as BM3D or deep learning-based denoisers, into ADMM or other proximal algorithms. An advantage of PnP is that one can use pre-trained denoisers when there is not sufficient data for end-to-end training. Although PnP has been recently studied extensively with great empirical success, theoretical analysis addressing even the most basic question of convergence has been insufficient. In this paper, we theoretically establish convergence of PnP-FBS and PnP-ADMM, without using diminishing stepsizes, under a certain Lipschitz condition on the denoisers. We then propose real spectral normalization, a technique for training deep learning-based denoisers to satisfy the proposed Lipschitz condition. Finally, we present experimental results validating the theory.
https://arxiv.org/abs/1905.05406
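Illustrative of the Lipschitz-constraint idea, though not the paper's "real spectral normalization" itself: PyTorch's built-in power-iteration spectral norm can be applied to each layer of a small denoiser. Note that the built-in version normalizes the reshaped weight matrix, which only approximates the true operator norm of a convolution; closing that gap is precisely what motivates the paper's refinement.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# A toy DnCNN-like denoiser whose conv layers are spectrally normalized
# via power iteration, bounding each layer's (approximate) operator norm.
def make_denoiser(channels=64, depth=5):
    layers = [spectral_norm(nn.Conv2d(1, channels, 3, padding=1)), nn.ReLU()]
    for _ in range(depth - 2):
        layers += [spectral_norm(nn.Conv2d(channels, channels, 3, padding=1)),
                   nn.ReLU()]
    layers += [spectral_norm(nn.Conv2d(channels, 1, 3, padding=1))]
    return nn.Sequential(*layers)
```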
Automatically removing rain effects from an image has many applications, such as autonomous driving, drone piloting and photo editing, and remains an active research area. Traditional methods handcraft various priors to remove or separate rain effects from an image. Recently, end-to-end deep-learning-based deraining methods have been proposed to offer more flexibility and effectiveness. However, they tend to produce poor visual results on images with heavy rain. Heavy rain brings not only rain streaks but also a haze-like effect caused by the accumulation of tiny raindrops. Unlike previous deraining methods, in this paper we model rainy images with a new rain model to remove not only rain streaks but also the haze-like effect. Guided by our model, we design a two-branch network to learn its parameters. An SPP structure is then jointly trained to refine the results of our model and flexibly control the degree to which the haze-like effect is removed. In addition, a subnetwork that localizes rainy pixels is proposed to guide the training of our network. Extensive experiments on several datasets show that our method outperforms the state of the art in both objective assessments and visual quality.
https://arxiv.org/abs/1905.05404
We introduce a novel unsupervised domain adaptation approach for object detection. We aim to simultaneously alleviate the imperfect translation problem of pixel-level adaptations and the source-biased discriminativity problem of feature-level adaptations. Our approach is composed of two stages, i.e., Domain Diversification (DD) and Multi-domain-invariant Representation Learning (MRL). At the DD stage, we diversify the distribution of the labeled data by generating various distinctive shifted domains from the source domain. At the MRL stage, we apply adversarial learning with a multi-domain discriminator to encourage features to be indistinguishable among the domains. DD addresses the source-biased discriminativity, while MRL mitigates the imperfect image translation. We construct a structured domain adaptation framework for our learning paradigm and introduce a practical way of implementing DD. Our method outperforms the state-of-the-art methods by a large margin of 3%-11% in terms of mean average precision (mAP) on various datasets.
https://arxiv.org/abs/1905.05396
We present a survey on multilingual neural machine translation (MNMT), which has gained a lot of traction in recent years. MNMT has been useful in improving translation quality as a result of knowledge transfer. MNMT is more promising and interesting than its statistical machine translation counterpart because end-to-end modeling and distributed representations open new avenues. Many approaches have been proposed to exploit multilingual parallel corpora for improving translation quality. However, the lack of a comprehensive survey makes it difficult to determine which approaches are promising and hence deserve further exploration. In this paper, we present an in-depth survey of the existing literature on MNMT. We categorize various approaches based on the resource scenarios as well as the underlying modeling principles. We hope this paper will serve as a starting point for researchers and engineers interested in MNMT.
https://arxiv.org/abs/1905.05395
A key challenge in leveraging data augmentation for neural network training is choosing an effective augmentation policy from a large search space of candidate operations. Properly chosen augmentation policies can lead to significant generalization improvements; however, state-of-the-art approaches such as AutoAugment are computationally infeasible to run for the ordinary user. In this paper, we introduce a new data augmentation algorithm, Population Based Augmentation (PBA), which generates nonstationary augmentation policy schedules instead of a fixed augmentation policy. We show that PBA can match the performance of AutoAugment on CIFAR-10, CIFAR-100, and SVHN, with three orders of magnitude less overall compute. On CIFAR-10 we achieve a mean test error of 1.46%, which is a slight improvement upon the current state-of-the-art. The code for PBA is open source and is available at this https URL.
https://arxiv.org/abs/1905.05393
We discuss a data market technique based on the intrinsic value (relevance and uniqueness) as well as the extrinsic value (influenced by supply and demand) of data. For intrinsic value, we explain how to value data in absolute terms (i.e., just by itself), in relative terms (i.e., in comparison to multiple datasets), or in conditional terms (i.e., valuing new data given currently existing data).
http://arxiv.org/abs/1905.06462
Unsupervised domain adaptation in person re-identification resorts to labeled source data to promote model training on the target domain, facing dilemmas caused by large domain shift and large camera variations. The non-overlapping labels challenge, i.e., that the source and target domains contain entirely different persons, further increases the re-identification difficulty. In this paper, we propose a novel algorithm to narrow such domain gaps. We derive a camera style adaptation framework to learn style-based mappings between different camera views, from the target domain to the source domain, so that the identity-based distribution can be transferred from the source domain to the target domain at the camera level. To overcome the non-overlapping labels challenge and guide the person re-identification model to narrow the gap further, an efficient and effective soft-labeling method is proposed to mine the intrinsic local structure of the target domain by building a connection between the GAN-translated source domain and the target domain. Experiments conducted on real benchmark datasets indicate that our method achieves state-of-the-art results.
https://arxiv.org/abs/1905.05382
Inspired by recent successes in neural machine translation and image caption generation, we present an attention-based encoder-decoder model (AED) to recognize Vietnamese handwritten text. The model consists of two parts: a DenseNet for extracting invariant features, and a Long Short-Term Memory network (LSTM) decoder with an incorporated attention model for generating the output text; the CNN part feeds into the attention model. The input of the CNN part is a handwritten text image, and the target of the LSTM decoder is the corresponding text of the input image. Since all parts are differentiable, our model is trained end-to-end to predict the text from a given input image. In the experiments, we evaluate our proposed AED model on the VNOnDB-Word and VNOnDB-Line datasets to verify its efficiency. The experimental results show that our model achieves a word error rate of 12.30% without using any language model, which is competitive with the handwriting recognition system provided by Google in the Vietnamese Online Handwritten Text Recognition competition.
https://arxiv.org/abs/1905.05381
Pap smear testing has been widely used for detecting cervical cancers based on the morphological properties of cell nuclei in microscopic images. Accurate nuclei segmentation could thus improve the success rate of cervical cancer screening. In this work, a method for automated cervical nuclei segmentation using a Deformable Multipath Ensemble Model (D-MEM) is proposed. The approach adopts a U-shaped convolutional network as a backbone, in which dense blocks are used to transfer feature information more effectively. To increase the flexibility of the model, we then use deformable convolution to deal with nuclei of irregular shapes and sizes. To reduce predictive bias, we further construct multiple networks with different settings, which form an ensemble model. The proposed segmentation framework achieves state-of-the-art accuracy on the Herlev dataset with a Zijdenbos similarity index (ZSI) of 0.933, and has the potential to be extended to other medical image segmentation tasks.
http://arxiv.org/abs/1812.00527
With the thriving of deep learning, 3D convolutional neural networks have become a popular choice in volumetric image analysis due to their impressive ability to mine 3D contexts. However, 3D convolutional kernels introduce a significant increase in the number of trainable parameters. Since training data is often limited in biomedical tasks, a tradeoff has to be made between model size and representational power. To address this concern, we propose a novel 3D Dense Separated Convolution (3D-DSC) module to replace the original 3D convolutional kernels. The 3D-DSC module is constructed from a series of densely connected 1D filters. Decomposing the 3D kernel into 1D filters reduces the risk of overfitting by removing the redundancy of 3D kernels in a topologically constrained manner, while providing the infrastructure for deepening the network. By further introducing nonlinear layers and dense connections between the 1D filters, the network's representational power can be significantly improved while maintaining a compact architecture. We demonstrate the superiority of 3D-DSC on volumetric image classification and segmentation, two challenging tasks often encountered in biomedical image computing.
http://arxiv.org/abs/1905.08608
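A rough sketch of the idea, with the exact dense wiring and activation placement as assumptions since the abstract does not specify them: a k x k x k kernel is replaced by three 1D convolutions along depth, height and width, joined by nonlinearities and concatenative (dense) connections.

```python
import torch
import torch.nn as nn

class DSC3D(nn.Module):
    """Sketch: a k*k*k conv replaced by three 1D convs (along D, H, W),
    with nonlinearities and dense (concatenative) connections."""
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        p = k // 2
        self.conv_d = nn.Conv3d(c_in, c_out, (k, 1, 1), padding=(p, 0, 0))
        self.conv_h = nn.Conv3d(c_in + c_out, c_out, (1, k, 1), padding=(0, p, 0))
        self.conv_w = nn.Conv3d(c_in + 2 * c_out, c_out, (1, 1, k), padding=(0, 0, p))

    def forward(self, x):
        d = torch.relu(self.conv_d(x))
        h = torch.relu(self.conv_h(torch.cat([x, d], dim=1)))
        return torch.relu(self.conv_w(torch.cat([x, d, h], dim=1)))
```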
Recognition of historical documents is challenging due to noisy, damaged characters and backgrounds. Japanese historical documents not only suffer from these problems; pre-modern Japanese characters were also written in cursive and are connected, so character-segmentation-based methods do not work well. This leads to the idea of creating a new recognition system. In this paper, we propose a human-inspired document reading system to recognize multiple lines of pre-modern Japanese historical documents. When reading, people use eye movements to determine the start of a text line; they then move their eyes from the current character/word to the next. They can also determine the end of a line or skip a figure to move to the next line. These eye movements integrate with visual processing to carry out the reading process in the brain. We employ an attention-based encoder-decoder to implement this recognition system. First, the system detects where a text line starts. Second, it scans and recognizes character by character until the text line is completed. It then detects the start of the next text line, and this process is repeated until the whole document is read. We tested our human-inspired recognition system on the pre-modern Japanese historical documents provided by the PRMU Kuzushiji competition. The experiments demonstrate the superiority and effectiveness of our proposed system, which achieves Sequence Error Rates of 9.87% and 53.81% on levels 2 and 3 of the dataset, respectively, outperforming all other systems that participated in the competition.
https://arxiv.org/abs/1905.05377
Spatial audio is an essential medium for 3D visual and auditory experiences. However, the recording devices and techniques are expensive or inaccessible to the general public. In this work, we propose a self-supervised audio spatialization network that can generate spatial audio given the corresponding video and monaural audio. To enhance spatialization performance, we use an auxiliary classifier to distinguish ground-truth videos from those whose audio has the left and right channels swapped. We collect a large-scale video dataset with spatial audio to validate the proposed method. Experimental results demonstrate the effectiveness of the proposed model on the audio spatialization task.
https://arxiv.org/abs/1905.05375
Traditional metrics for evaluating the efficacy of image processing techniques do not lend themselves to understanding the capabilities and limitations of modern image processing methods, particularly those enabled by deep learning. When applying image processing in engineering solutions, a scientist or engineer needs to justify design decisions with clear metrics. By applying the Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE), Structural SIMilarity (SSIM) index scores, and peak signal-to-noise ratio (PSNR) to images before and after image processing, we can quantify quality improvements in a meaningful way and determine the lowest recoverable image quality for a given method.
https://arxiv.org/abs/1905.05373
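The full-reference half of this recipe is directly reproducible with scikit-image; BRISQUE, being a no-reference metric, needs a separate implementation (e.g., the OpenCV contrib quality module or the piq package). A minimal sketch:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def quality_report(reference, processed):
    """Full-reference scores before/after processing (BRISQUE, being
    no-reference, needs a separate library such as piq or OpenCV contrib)."""
    psnr = peak_signal_noise_ratio(reference, processed, data_range=255)
    ssim = structural_similarity(reference, processed, data_range=255)
    return {"PSNR_dB": psnr, "SSIM": ssim}

# Toy example: a grayscale image and a noisy version of it.
ref = np.random.randint(0, 256, (128, 128), dtype=np.uint8)
noisy = np.clip(ref + np.random.normal(0, 10, ref.shape), 0, 255).astype(np.uint8)
print(quality_report(ref, noisy))
```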
Simulation is a useful tool in situations where training data for machine learning models is costly to annotate or even hard to acquire. In this work, we propose a reinforcement learning-based method for automatically adjusting the parameters of any (non-differentiable) simulator, thereby controlling the distribution of synthesized data in order to maximize the accuracy of a model trained on that data. In contrast to prior art that hand-crafts these simulation parameters or adjusts only parts of the available parameters, our approach fully controls the simulator with the actual underlying goal of maximizing accuracy, rather than mimicking the real data distribution or randomly generating a large volume of data. We find that our approach (i) quickly converges to the optimal simulation parameters in controlled experiments and (ii) can indeed discover good sets of parameters for an image rendering simulator in actual computer vision applications.
http://arxiv.org/abs/1810.02513
Deep learning using neural networks has provided advances in image style transfer, merging the content of one image (e.g., a photo) with the style of another (e.g., a painting). Our research shows this concept can be extended to analyse the design of streetscapes in relation to health and wellbeing outcomes. An Australian population health survey (n=34,000) was used to identify the spatial distribution of health and wellbeing outcomes, including general health and social capital. For each outcome, the most and least desirable locations formed two domains. Streetscape design was sampled using around 80,000 Google Street View images per domain. Generative adversarial networks translated these images from one domain to the other, preserving the main structure of the input image, but transforming the 'style' from locations where self-reported health was bad to locations where it was good. These translations indicate that areas in Melbourne with good general health are characterised by sufficient green space and compactness of the urban environment, whilst streetscape imagery related to high social capital contained more and wider footpaths, fewer fences and more grass. Beyond identifying relationships, the method is a first step towards computer-generated design interventions that have the potential to improve population health and wellbeing.
http://arxiv.org/abs/1905.06464
State-of-the-art forward-facing monocular visual-inertial odometry algorithms are often brittle in practice, especially whilst dealing with initialisation and motion in directions that render the state unobservable. In such cases, having a reliable complementary odometry algorithm enables robust and resilient flight. Using the common local planarity assumption, we present a fast, dense, and direct frame-to-frame visual-inertial odometry algorithm for downward-facing cameras that minimises a joint cost function involving a homography-based photometric cost and an IMU regularisation term. Via extensive evaluation in a variety of scenarios we demonstrate performance superior to existing state-of-the-art downward-facing odometry algorithms for Micro Aerial Vehicles (MAVs).
http://arxiv.org/abs/1810.08704
LiDAR-camera calibration is a precondition for many heterogeneous systems that fuse data from LiDAR and camera. However, the common field-of-view constraint and the requirement for strict time synchronization make the calibration a challenging problem. In this paper, we propose a novel LiDAR-camera calibration method that eliminates these two constraints. Specifically, we capture a scan of the 3D LiDAR while both the environment and the sensors are stationary, then move the camera to reconstruct the 3D environment from the sequentially obtained images. Finally, we align the 3D visual points to the laser scan using a tightly coupled graph optimization method to calculate the extrinsic parameters between the LiDAR and camera. Under this design, the configuration of the two sensors is free from the common field-of-view constraint, owing to the extended view from the moving camera. We also eliminate the requirement for strict time synchronization, as we only use a single scan of laser data captured while the sensors are stationary. We theoretically derive the minimal observability conditions for our method and prove that calibration accuracy improves as more observations are collected from multiple scattered calibration targets. We validate our method on both a simulation platform and real-world datasets. Experiments show that our method achieves higher accuracy than other comparable methods, in accordance with our theoretical analysis. In addition, the proposed method works not only with chessboards based on plane measurement error, but also with calibration targets based on point measurement error, such as boxes and polygonal boards.
http://arxiv.org/abs/1903.06141
Multi-person pose estimation is a fundamental yet challenging task in computer vision. Both rich context information and spatial information are required to precisely locate the keypoints for all persons in an image. In this paper, a novel Context-and-Spatial Aware Network (CSANet), which integrates both a Context Aware Path and Spatial Aware Path, is proposed to obtain effective features involving both context information and spatial information. Specifically, we design a Context Aware Path with structure supervision strategy and spatial pyramid pooling strategy to enhance the context information. Meanwhile, a Spatial Aware Path is proposed to preserve the spatial information, which also shortens the information propagation path from low-level features to high-level features. On top of these two paths, we employ a Heavy Head Path to further combine and enhance the features effectively. Experimentally, our proposed network outperforms state-of-the-art methods on the COCO keypoint benchmark, which verifies the effectiveness of our method and further corroborates the above proposition.
https://arxiv.org/abs/1905.05355
Ranking-based learning with deep neural networks has been widely used for image cropping. However, the performance of ranking-based methods is often poor, mainly for two reasons: 1) image cropping is a listwise ranking task rather than a pairwise comparison; 2) the rescaling caused by pooling layers and the deformation in view generation damage the performance of composition learning. In this paper, we develop a novel model to overcome these problems. To address the first problem, we formulate image cropping as a listwise ranking problem to find the best view composition. For the second problem, a refined view sampling module (called RoIRefine) is proposed to extract refined feature maps for candidate view generation. Given a series of candidate views, the proposed model learns the Top-1 probability distribution over views and picks the best one. By integrating refined sampling and listwise ranking, the proposed network, called LVRN, achieves state-of-the-art performance in both accuracy and speed.
https://arxiv.org/abs/1905.05352
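The abstract does not give the loss; a standard listwise Top-1 objective in the style of ListNet, shown below as an assumed stand-in, matches the description of learning a Top-1 probability distribution over candidate views.

```python
import torch
import torch.nn.functional as F

def top1_listwise_loss(scores, quality):
    """ListNet-style listwise loss: match the predicted Top-1 probability
    over candidate views (scores) to the target Top-1 distribution derived
    from ground-truth quality annotations. Both tensors: [batch, n_views]."""
    log_p_pred = F.log_softmax(scores, dim=1)
    p_target = F.softmax(quality, dim=1)
    return -(p_target * log_p_pred).sum(dim=1).mean()
```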
Pedestrians and vehicles often share the road in complex inner city traffic. This leads to interactions between the vehicle and pedestrians, with each affecting the other’s motion. In order to create robust methods to reason about pedestrian behavior and to design interfaces of communication between self-driving cars and pedestrians we need to better understand such interactions. In this paper, we present a data-driven approach to implicitly model pedestrians’ interactions with vehicles, to better predict pedestrian behavior. We propose a LSTM model that takes as input the past trajectories of the pedestrian and ego-vehicle, and pedestrian head orientation, and predicts the future positions of the pedestrian. Our experiments based on a real-world, inner city dataset captured with vehicle mounted cameras, show that the usage of such cues improve pedestrian prediction when compared to a baseline that purely uses the past trajectory of the pedestrian.
https://arxiv.org/abs/1905.05350
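A minimal sketch of the kind of model described, with the per-step feature layout (pedestrian position, ego-vehicle position, head orientation) and prediction horizon as assumptions:

```python
import torch
import torch.nn as nn

class PedestrianLSTM(nn.Module):
    """Sketch: encode past pedestrian/vehicle positions and head
    orientation per timestep, then regress future pedestrian positions."""
    def __init__(self, hidden=128, horizon=15):
        super().__init__()
        # per-step input: pedestrian (x, y), ego-vehicle (x, y), head angle
        self.encoder = nn.LSTM(input_size=5, hidden_size=hidden, batch_first=True)
        self.decoder = nn.Linear(hidden, horizon * 2)
        self.horizon = horizon

    def forward(self, past):             # past: [batch, T, 5]
        _, (h, _) = self.encoder(past)
        out = self.decoder(h[-1])        # [batch, horizon * 2]
        return out.view(-1, self.horizon, 2)
```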
Numerous methods for human activity recognition have been proposed in the past two decades. Many of these methods are based on sparse representation, which describes the whole video content by a set of local features. Trajectories, being mid-level sparse features, are capable of describing the motion of an interest-point in 2D space. 2D trajectories might be affected by viewpoint changes, potentially decreasing their accuracy. In this paper, we initially propose and compare different 2D trajectory-based algorithms for human activity recognition. Moreover, we propose a new way of fusing disparity information with 2D trajectory information, without the calculation of 3D reconstruction. The obtained results show a 2.76% improvement when using disparity-augmented trajectories, compared to using the classical 2D trajectory information only. Furthermore, we have also tested our method on the challenging Hollywood 3D dataset, and we have obtained competitive results, at a faster speed.
https://arxiv.org/abs/1905.05344
This paper proposes a fractional-order gradient method for the backward propagation of convolutional neural networks. To overcome the problem that fractional-order gradient methods cannot converge to the true extreme point, a simplified fractional-order gradient method is designed based on Caputo's definition. The parameters within layers are updated by the designed gradient method, while the propagation between layers still uses integer-order gradients; thus the complicated derivatives of composite functions are avoided and the chain rule is kept. By connecting every layer in series and adding loss functions, the proposed convolutional neural networks can be trained smoothly according to various tasks. Finally, practical experiments are carried out to demonstrate the effectiveness of the proposed networks.
https://arxiv.org/abs/1905.05336
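The abstract does not state the update rule. For orientation, a commonly used simplified Caputo-based step from the fractional-order gradient literature, which may differ in detail from the paper's rule, is

$$\theta_{k+1} = \theta_k - \mu \, \frac{f'(\theta_k)}{\Gamma(2-\alpha)} \, \left|\theta_k - \theta_{k-1}\right|^{1-\alpha}, \qquad 0 < \alpha < 1,$$

which keeps only the first term of the series expansion of the Caputo derivative taken relative to the previous iterate. This truncation is what restores convergence to the true extreme point, since the full fractional derivative of $f$ generally does not vanish there.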
We are pleased to dedicate this survey on kernelization of the Vertex Cover problem to Professor Juraj Hromkovič on the occasion of his 60th birthday. The Vertex Cover problem is often referred to as the Drosophila of parameterized complexity. It enjoys a long history, and new and worthy perspectives are often first demonstrated with concrete results on it. This survey discusses several research directions in Vertex Cover kernelization, including the barrier degree of Vertex Cover kernelization. There are reduction rules that kernelize vertices of small degree, including new results in this paper that reduce graphs almost to minimum degree five. Can this process go on forever? What is the minimum vertex-degree barrier for polynomial-time kernelization? Assuming the Exponential-Time Hypothesis, there is a minimum degree barrier. The idea of automated kernelization is also discussed: we report the first experimental results of an AI-guided branching algorithm for Vertex Cover whose logic seems amenable to finding reduction rules that kernelize small-degree vertices. The survey highlights a central open problem in parameterized complexity. Happy Birthday, Juraj!
http://arxiv.org/abs/1811.09429
Retrieving object instances among cluttered scenes efficiently requires compact yet comprehensive regional image representations. Intuitively, object semantics can help build the index that focuses on the most relevant regions. However, due to the lack of bounding-box datasets for objects of interest among retrieval benchmarks, most recent work on regional representations has focused on either uniform or class-agnostic region selection. In this paper, we first fill the void by providing a new dataset of landmark bounding boxes, based on the Google Landmarks dataset, that includes $86k$ images with manually curated boxes from $15k$ unique landmarks. Then, we demonstrate how a trained landmark detector, using our new dataset, can be leveraged to index image regions and improve retrieval accuracy while being much more efficient than existing regional methods. In addition, we introduce a novel regional aggregated selective match kernel (R-ASMK) to effectively combine information from detected regions into an improved holistic image representation. R-ASMK boosts image retrieval accuracy substantially with no dimensionality increase, while even outperforming systems that index image regions independently. Our complete image retrieval system improves upon the previous state-of-the-art by significant margins on the Revisited Oxford and Paris datasets. Code and data available at the project webpage: https://github.com/tensorflow/models/tree/master/research/delf.
http://arxiv.org/abs/1812.01584
We investigate diffusive search on planar networks, motivated by tubular networks in cell biology that contain molecules searching for reaction partners and binding sites. Exact calculation of the diffusive mean first-passage time on a spatial network is used to characterize the typical search time as a function of network connectivity. We find that global structural properties — the total edge length and number of loops — are sufficient to largely determine network exploration times for both synthetic planar networks and for organelle morphologies extracted from living cells. This suggests that network architecture can be designed for efficient search without controlling the precise arrangement of connections. Specifically, increasing the number of loops substantially decreases search times, pointing to a potential physical mechanism for regulating reaction rates within organelle network structures.
https://arxiv.org/abs/1905.05320
Studies have shown that a dominant class of questions asked by visually impaired users on images of their surroundings involves reading text in the image. But today's VQA models cannot read! Our paper takes a first step towards addressing this problem. First, we introduce a new "TextVQA" dataset to facilitate progress on this important problem. Existing datasets either have a small proportion of questions about text (e.g., the VQA dataset) or are too small (e.g., the VizWiz dataset). TextVQA contains 45,336 questions on 28,408 images that require reasoning about text to answer. Second, we introduce a novel model architecture that reads text in the image, reasons about it in the context of the image and the question, and predicts an answer which might be a deduction based on the text and the image or composed of the strings found in the image. Consequently, we call our approach Look, Read, Reason & Answer (LoRRA). We show that LoRRA outperforms existing state-of-the-art VQA models on our TextVQA dataset. We find that the gap between human performance and machine performance is significantly larger on TextVQA than on VQA 2.0, suggesting that TextVQA is well-suited to benchmark progress along directions complementary to VQA 2.0.
https://arxiv.org/abs/1904.08920
In this study, we propose the Affine Variational Autoencoder (AVAE), a variant of Variational Autoencoder (VAE) designed to improve robustness by overcoming the inability of VAEs to generalize to distributional shifts in the form of affine perturbations. By optimizing an affine transform to maximize ELBO, the proposed AVAE transforms an input to the training distribution without the need to increase model complexity to model the full distribution of affine transforms. In addition, we introduce a training procedure to create an efficient model by learning a subset of the training distribution, and using the AVAE to improve generalization and robustness to distributional shift at test time. Experiments on affine perturbations demonstrate that the proposed AVAE significantly improves generalization and robustness to distributional shift in the form of affine perturbations without an increase in model complexity.
https://arxiv.org/abs/1905.05300
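A minimal sketch of the test-time correction the abstract describes: optimize a 2x3 affine transform so that the transformed input maximizes the ELBO of a frozen, pre-trained VAE. The `vae.elbo` method and the optimizer settings are assumptions, not the paper's API.

```python
import torch
import torch.nn.functional as F

def avae_correct(vae, x, steps=50, lr=0.05):
    """Sketch of the AVAE idea for a single image x: [1, C, H, W].
    An affine transform is optimized to map a shifted input back toward
    the training distribution by maximizing the ELBO of a frozen VAE."""
    theta = torch.tensor([[1., 0., 0.], [0., 1., 0.]],
                         requires_grad=True)            # identity init
    opt = torch.optim.Adam([theta], lr=lr)
    for _ in range(steps):
        grid = F.affine_grid(theta.unsqueeze(0), x.size(), align_corners=False)
        x_t = F.grid_sample(x, grid, align_corners=False)
        loss = -vae.elbo(x_t)                           # maximize ELBO
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x_t.detach()
```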
Recent work in neural generation has attracted significant interest in controlling the form of text, such as style, persona, and politeness. However, there has been less work on controlling neural text generation for content. This paper introduces the notion of Content Transfer for long-form text generation, where the task is to generate a next sentence in a document that both fits its context and is grounded in a content-rich external textual source such as a news story. Our experiments on Wikipedia data show significant improvements against competitive baselines. As another contribution of this paper, we release a benchmark dataset of 640k Wikipedia referenced sentences paired with the source articles to encourage exploration of this new task.
https://arxiv.org/abs/1905.05293
Skin cancer is one of the most common cancers in the United States. As technological advancements are made, algorithmic diagnosis of skin lesions is becoming more important. In this paper, we develop algorithms for segmenting the actual diseased area of skin in a given image of a skin lesion, and for classifying different types of skin lesions pictured in a given image. The cores of the algorithms are based on persistent homology, an algebraic topology technique that is part of the rising field of Topological Data Analysis (TDA). The segmentation algorithm utilizes a concept similar to persistent homology that captures the robustness of segmented regions. For classification, we design two families of topological features from persistence diagrams, which we refer to as persistence statistics (PS) and persistence curves (PC), and use linear support vector machines as classifiers. We also combine these topological features, PS and PC, with a ResNet-101 model, which we call TopoResNet-101; the results show that PS and PC are effective in two ways: improving classification performance and stabilizing the training process. Although convolutional features are the most important learning targets in CNN models, global information about images may be lost in the training process. Because topological features are extracted globally, our results show that their global nature provides additional information to machine learning models.
http://arxiv.org/abs/1905.08607
Efficient exploration is an unsolved problem in Reinforcement Learning which is usually addressed by reactively rewarding the agent for fortuitously encountering novel situations. This paper introduces an efficient active exploration algorithm, Model-Based Active eXploration (MAX), which uses an ensemble of forward models to plan to observe novel events. This is carried out by optimizing agent behaviour with respect to a measure of novelty derived from the Bayesian perspective of exploration, which is estimated using the disagreement between the futures predicted by the ensemble members. We show empirically that in semi-random discrete environments where directed exploration is critical to make progress, MAX is at least an order of magnitude more efficient than strong baselines. MAX scales to high-dimensional continuous environments where it builds task-agnostic models that can be used for any downstream task.
http://arxiv.org/abs/1810.12162
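As a toy illustration of novelty from ensemble disagreement (the paper derives a principled Bayesian measure from the divergence between ensemble members' predicted futures; the plain predictive variance below is only a crude stand-in):

```python
import numpy as np

def novelty_from_disagreement(ensemble_preds):
    """Sketch: score a transition by the disagreement among an ensemble
    of forward models. ensemble_preds: [n_models, state_dim] array of
    predicted next states; novelty here is the trace of their covariance,
    a simple proxy for the paper's information-theoretic measure."""
    return np.trace(np.cov(ensemble_preds, rowvar=False))

preds = np.random.randn(8, 4)   # 8 models, 4-dim state prediction
print(novelty_from_disagreement(preds))
```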
An important task that domestic robots need to achieve is the recognition of states of food ingredients so they can continue their cooking actions. This project focuses on a fine-tuning algorithm for the VGG (Visual Geometry Group) architecture of deep convolutional neural networks (CNN) for object recognition. The algorithm aims to identify eleven different ingredient cooking states for an image dataset. The original VGG model was adjusted and trained to properly classify the food states. The model was initialized with Imagenet weights. Different experiments were carried out in order to find the model parameters that provided the best performance. The accuracy achieved for the validation set was 76.7% and for the test set 76.6% after changing several parameters of the VGG model.
http://arxiv.org/abs/1905.08606
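A Keras sketch of the described setup: VGG16 initialized with ImageNet weights and a new classification head for the eleven cooking states. The head size, dropout and optimizer settings here are assumptions, not the paper's tuned values.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# VGG16 base initialized with ImageNet weights; top removed so a new
# 11-class head for the cooking states can be attached.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                  # freeze the convolutional base first

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(11, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
```

A typical fine-tuning schedule would then unfreeze some of the later convolutional blocks and continue training at a lower learning rate.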
We present a navigation system that combines ideas from hierarchical planning and machine learning. The system uses a traditional global planner to compute optimal paths towards a goal, and a deep local trajectory planner and velocity controller to compute motion commands. The latter components of the system adjust the behavior of the robot through attention mechanisms such that it moves towards the goal, avoids obstacles, and respects the space of nearby pedestrians. Both the structure of the proposed deep models and the use of attention mechanisms make the system’s execution interpretable. Our simulation experiments suggest that the proposed architecture outperforms baselines that try to map global plan information and sensor data directly to velocity commands. In comparison to a hand-designed traditional navigation system, the proposed approach showed more consistent performance.
https://arxiv.org/abs/1905.05279
Robotic research is often built on approaches that are motivated by insights from self-examination of how we interface with the world. However, given current theories about human cognition and sensory processing, it is reasonable to assume that the internal workings of the brain are separate from how we interface with the world and ourselves. To amend some of these misconceptions arising from self-examination, this article reviews human visual understanding for cognition and action, specifically manipulation. Our focus is on identifying overarching principles such as the separation into visual processing for action and cognition, hierarchical processing of visual input, and the contextual and anticipatory nature of visual processing for action. We also provide a rudimentary exposition of previous theories about visual understanding that shows how self-examination can lead down the wrong path. Our hope is that the article will provide insights for robotic researchers that help them navigate the path of self-examination, give them an overview of current theories about human visual processing, and provide a source for further relevant reading.
https://arxiv.org/abs/1905.05272
Autonomous vehicles may make wrong decisions due to inaccurate detection and recognition. An intelligent vehicle can therefore combine its own data with that of other vehicles to enhance perceptive ability, and thus improve detection accuracy and driving safety. However, multi-vehicle cooperative perception requires the integration of real-world scenes, and the traffic from raw sensor data exchange far exceeds the bandwidth of existing vehicular networks. To the best of our knowledge, we are the first to study raw-data-level cooperative perception for enhancing the detection ability of self-driving systems. In this work, relying on LiDAR 3D point clouds, we fuse the sensor data collected from different positions and angles of connected vehicles. A point-cloud-based 3D object detection method is proposed to work on a diversity of aligned point clouds. Experimental results on KITTI and our collected dataset show that the proposed system outperforms single-vehicle perception by extending the sensing area, improving detection accuracy and producing augmented results. Most importantly, we demonstrate that it is possible to transmit point cloud data for cooperative perception via existing vehicular network technologies.
https://arxiv.org/abs/1905.05265
In distributional reinforcement learning (RL), the estimated distribution of the value function models both the parametric and intrinsic uncertainties. We propose a novel and efficient exploration method for deep RL that has two components. The first is a decaying schedule to suppress the intrinsic uncertainty. The second is an exploration bonus calculated from the upper quantiles of the learned distribution. In Atari 2600 games, our method outperforms QR-DQN in 12 out of 14 hard games (achieving a 483% average gain in cumulative rewards over QR-DQN across 49 games, with a big win in Venture). We also compared our algorithm with QR-DQN in a challenging 3D driving simulator (CARLA). Results show that our algorithm achieves near-optimal safety rewards twice as fast as QR-DQN.
http://arxiv.org/abs/1905.06125
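A schematic of the two components as action selection, with the exact schedule and bonus forms as illustrative assumptions (the paper defines them precisely):

```python
import numpy as np

def act(quantiles, t, c=1.0, decay=0.99):
    """Sketch: pick an action from estimated return quantiles using a
    decaying schedule plus an optimism bonus from the upper quantiles.
    quantiles: [n_actions, n_quantiles], sorted per action."""
    mean_q = quantiles.mean(axis=1)
    upper = quantiles[:, quantiles.shape[1] // 2:].mean(axis=1)  # upper half
    bonus = upper - mean_q            # width of the optimistic tail
    return int(np.argmax(mean_q + c * (decay ** t) * bonus))
```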
Face obscuration is often needed by law enforcement or mass media outlets to provide privacy protection. Sharing sensitive content where the obscuration or redaction technique may have failed to completely remove all identifiable traces can lead to life-threatening consequences. Hence, it is critical to be able to systematically measure the face obscuration performance of a given technique. In this paper we propose to measure the effectiveness of three obscuration techniques: Gaussian blurring, median blurring, and pixelation. We do so by identifying the redacted faces under two scenarios: classifying an obscured face into a group of identities and comparing the similarity of an obscured face with a clear face. Threat modeling is also considered to provide a vulnerability analysis for each studied obscuration technique. Based on our evaluation, we show that pixelation-based face obscuration approaches are the most effective.
https://arxiv.org/abs/1905.05243
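The three studied techniques are straightforward to reproduce with OpenCV; a sketch applied to a cropped face region (parameter values here are arbitrary examples, not the paper's settings):

```python
import cv2

def obscure(face, method="pixelate", k=15, blocks=8):
    """Apply one of the three obscuration techniques evaluated in the
    paper to a cropped face region (BGR image); k must be odd."""
    if method == "gaussian":
        return cv2.GaussianBlur(face, (k, k), 0)
    if method == "median":
        return cv2.medianBlur(face, k)
    # pixelation: downsample to a blocks x blocks grid, then upsample
    h, w = face.shape[:2]
    small = cv2.resize(face, (blocks, blocks), interpolation=cv2.INTER_LINEAR)
    return cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)
```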
Garbage and waste disposal is one of the biggest challenges currently faced by mankind. Proper waste disposal and recycling is a must in any sustainable community, and in many coastal areas there is significant water pollution in the form of floating or submerged garbage. This is called marine debris. Submerged marine debris threatens marine life, and in shallow coastal areas it can also threaten fishing vessels [Iñiguez et al. 2016, Renewable and Sustainable Energy Reviews]. Submerged marine debris typically stays in the environment for a long time (20+ years), and consists of materials that can be recycled, such as metals, plastics, glass, etc. Many of these items should not be disposed of in water bodies, as this negatively affects the environment and human health. This thesis performs a comprehensive evaluation of the use of DNNs for the problem of marine debris detection in FLS images, as well as related problems such as image classification, matching, and detection proposals. We do this on a dataset of 2069 FLS images that we captured with an ARIS Explorer 3000 sensor on marine debris objects lying on the floor of a small water tank. The objects we used to produce this dataset comprise typical household marine debris and distractor marine objects (tires, hooks, valves, etc.), divided into 10 classes plus a background class. Our results show that for the evaluated tasks, DNNs are superior to the corresponding state of the art, with particularly large gains for the matching and detection proposal tasks. We also study the effect of sample complexity and object size on many tasks, which is valuable information for practitioners. We expect that our results will advance the objective of using Autonomous Underwater Vehicles to automatically survey, detect and collect marine debris from underwater environments.
https://arxiv.org/abs/1905.05241
Convolutional neural networks (CNNs) have emerged as the state of the art in multiple vision tasks, including depth estimation. However, memory and computing power requirements remain challenges to be tackled in these models. Monocular depth estimation has significant use in robotics and virtual reality, where deployment on low-end devices is required. Training a small model from scratch results in a significant drop in accuracy, and such a model does not benefit from pre-trained large models. Motivated by the model pruning literature, we propose a lightweight monocular depth model obtained from a large trained model. This is achieved by removing the least important features with a novel joint end-to-end filter pruning scheme. We propose to learn a binary mask for each filter to decide whether to drop the filter. These masks are trained jointly to exploit relations between filters at different layers as well as redundancy within the same layer. We show that we can achieve around a 5x compression rate with a small drop in accuracy on the KITTI driving dataset. We also show that masking can improve accuracy over the baseline with fewer parameters, even without enforcing a compression loss.
https://arxiv.org/abs/1905.05212
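A sketch of the per-filter binary-mask idea, using a straight-through estimator so the hard 0/1 mask stays trainable; the exact relaxation and compression loss in the paper may differ.

```python
import torch
import torch.nn as nn

class MaskedConv(nn.Module):
    """Sketch of joint filter pruning via a learnable per-filter mask.
    A straight-through estimator makes the hard 0/1 mask trainable;
    a sparsity penalty on the soft mask acts as a compression loss."""
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, padding=k // 2)
        self.logits = nn.Parameter(torch.zeros(c_out))

    def forward(self, x):
        soft = torch.sigmoid(self.logits)
        hard = (soft > 0.5).float()
        mask = hard + soft - soft.detach()   # straight-through gradient
        return self.conv(x) * mask.view(1, -1, 1, 1)

    def sparsity_loss(self):
        return torch.sigmoid(self.logits).mean()
```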
We introduce Pixel-aligned Implicit Function (PIFu), a highly effective implicit representation that locally aligns pixels of 2D images with the global context of their corresponding 3D object. Using PIFu, we propose an end-to-end deep learning method for digitizing highly detailed clothed humans that can infer both 3D surface and texture from a single image, and optionally, multiple input images. Highly intricate shapes, such as hairstyles and clothing, as well as their variations and deformations, can be digitized in a unified way. Compared to existing representations used for 3D deep learning, PIFu can produce high-resolution surfaces including largely unseen regions such as the back of a person. In particular, it is memory efficient, unlike the voxel representation; can handle arbitrary topology; and produces a surface spatially aligned with the input image. Furthermore, while previous techniques are designed to process either a single image or multiple views, PIFu extends naturally to an arbitrary number of views. We demonstrate high-resolution and robust reconstructions on real-world images from the DeepFashion dataset, which contains a variety of challenging clothing types. Our method achieves state-of-the-art performance on a public benchmark and outperforms the prior work for clothed human digitization from a single image.
http://arxiv.org/abs/1905.05172
This paper shows that when applying machine learning to digital zoom for photography, it is beneficial to use real, RAW sensor data for training. Existing learning-based super-resolution methods do not use real sensor data, instead operating on RGB images. In practice, these approaches result in loss of detail and accuracy in their digitally zoomed output when zooming in on distant image regions. We also show that synthesizing sensor data by resampling high-resolution RGB images is an oversimplified approximation of real sensor data and noise, resulting in worse image quality. The key barrier to using real sensor data for training is that ground truth high-resolution imagery is missing. We show how to obtain the ground-truth data with optically zoomed images and contribute a dataset, SR-RAW, for real-world computational zoom. We use SR-RAW to train a deep network with a novel contextual bilateral loss (CoBi) that delivers critical robustness to mild misalignment in input-output image pairs. The trained network achieves state-of-the-art performance in 4X and 8X computational zoom.
http://arxiv.org/abs/1905.05169
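For reference, the contextual bilateral loss as described in the paper pairs each source feature $p_i$ with its best match among target features $q_j$ under a combined feature and spatial distance (reconstructed here from the published description; details may differ):

$$\mathrm{CoBi}(P, Q) = \frac{1}{N} \sum_{i=1}^{N} \min_{j} \big( \mathbb{D}(p_i, q_j) + w_s \, \mathbb{D}'(p_i, q_j) \big),$$

where $\mathbb{D}$ is a feature distance (e.g., cosine), $\mathbb{D}'$ penalizes the spatial offset between the two features, and $w_s$ trades feature similarity against spatial alignment. The spatial term is what provides the tolerance to mild misalignment in the input-output pairs.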
We consider the problem of online adaptation of a neural network designed to represent vehicle dynamics. The neural network model is intended to be used by an MPC control law to autonomously control the vehicle. This problem is challenging because both the input and target distributions are non-stationary, and naive approaches to online adaptation result in catastrophic forgetting, which can in turn lead to controller failures. We present a novel online learning method, which combines the pseudo-rehearsal method with locally weighted projection regression. We demonstrate the effectiveness of the resulting Locally Weighted Projection Regression Pseudo-Rehearsal (LW-PR$^2$) method in simulation and on a large real world dataset collected with a 1/5 scale autonomous vehicle.
http://arxiv.org/abs/1905.05162
This paper presents the algorithms and system architecture of an autonomous racecar. The introduced vehicle is powered by a software stack designed for robustness, reliability, and extensibility. In order to autonomously race around a previously unknown track, the proposed solution combines state-of-the-art techniques from different fields of robotics. Specifically, perception, estimation, and control are incorporated into one high-performance autonomous racecar. This complex robotic system, developed by AMZ Driverless and ETH Zurich, finished 1st overall at each competition we attended: Formula Student Germany 2017, Formula Student Italy 2018 and Formula Student Germany 2018. We discuss the findings and learnings from these competitions and present an experimental evaluation of each module of our solution.
http://arxiv.org/abs/1905.05150