Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

KnowBias: A Novel AI Method to Detect Polarity in Online Content

2019-05-02

Aditya Saligrama

arXiv_CL

arXiv_CL Text_Classification Embedding Classification Relation
Abstract

We introduce KnowBias, a system for detecting the degree of political bias in textual content such as social media posts and news articles. In the space of scalable text classification, a common problem is domain mismatch, where easily accessible training data (i.e., tweets) does not correspond in format to the desired testing domain (i.e., longer form article content). While universal text encoders such as word or sentence embeddings could be leveraged to train target agnostic classifiers, such schemes result in poor performance on long-form articles. Our key insight is that long-form articles are a mix of neutral and political sentences, while tweets are concentrated with opinion. We propose a two-step classification system that first automatically filters out neutral sentences from the input text document at evaluation time, and then the resulting text is input into a polarity classifier. We evaluate our two-step approach using a variety of test suites, including a set of tweets and long-form articles where annotations were crowd-sourced to decrease label noise, measuring accuracy and Spearman-rho rank correlation. In practice, KnowBias achieves a high accuracy of 86% (rho = 0.65) on these tweets and 75% (rho = 0.69) on long-form articles.
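
The two-step idea in this abstract (filter out neutral sentences, then classify the polarity of what remains) can be sketched as below. This is only an illustration: the keyword sets and the functions `is_political` and `polarity` are hypothetical stand-ins for the learned filter and polarity classifier, which the abstract does not specify.

```python
import re

# Hypothetical keyword lists; the paper uses trained classifiers, not keywords.
POLITICAL_TERMS = {"tax", "immigration", "senate", "policy", "election"}
LEFT_TERMS = {"progressive", "union"}
RIGHT_TERMS = {"conservative", "deregulation"}


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    """Lowercased set of alphabetic tokens in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def is_political(sentence):
    """Step 1 (stand-in): keep a sentence only if it mentions a political term."""
    return bool(tokens(sentence) & POLITICAL_TERMS)


def polarity(sentences):
    """Step 2 (stand-in): score in [-1, 1]; negative = left, positive = right."""
    left = sum(len(tokens(s) & LEFT_TERMS) for s in sentences)
    right = sum(len(tokens(s) & RIGHT_TERMS) for s in sentences)
    total = left + right
    return 0.0 if total == 0 else (right - left) / total


def two_step_score(document):
    """Filter neutral sentences at evaluation time, then score the remainder."""
    kept = [s for s in split_sentences(document) if is_political(s)]
    return polarity(kept), kept
```

The point of the design is that the polarity model only ever sees opinion-dense text, which is closer in form to the tweets it was trained on.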

URL

http://arxiv.org/abs/1905.00724

PDF

http://arxiv.org/pdf/1905.00724
A multi-agent system approach in evaluating human spatio-temporal vulnerability to seismic risk using social attachment

2019-05-02

Julius Bañgate, Julie Dugdale (LIG Laboratoire d'Informatique de Grenoble), Elise Beck (IV), Carole Adam (LIG, LIG Laboratoire d'Informatique de Grenoble)

arXiv_AI

arXiv_AI Face Recommendation
Abstract

Social attachment theory states that individuals seek the proximity of attachment figures (e.g. family members, friends, colleagues, familiar places or objects) when faced with threat. During disasters, this means that family members may seek each other before evacuating, gather personal property before heading to familiar exits and places, or follow groups/crowds, etc. This hard-wired human tendency should be considered in the assessment of risk and the creation of disaster management plans. Doing so may result in more realistic evacuation procedures and may minimise the number of casualties and injuries. In this context, a dynamic spatio-temporal analysis of seismic risk is presented using SOLACE, a multi-agent model of pedestrian behaviour based on social attachment theory implemented using the Belief-Desire-Intention approach. The model focuses on the influence of human, social, physical and temporal factors on successful evacuation. Human factors considered include perception and mobility defined by age. Social factors are defined by attachment bonds, social groups, population distribution, and cultural norms. Physical factors refer to the location of the epicentre of the earthquake, spatial distribution/layout and attributes of environmental objects such as buildings, roads, barriers (cars), placement of safe areas, evacuation routes, and the resulting debris/damage from the earthquake. Experiments tested the influence of time of the day, presence of disabled persons and earthquake intensity. Initial results show that factors that influence arrivals in safe areas include (a) human factors (age, disability, speed), (b) pre-evacuation behaviours, (c) perception distance (social attachment, time of day), (d) social interaction during evacuation, and (e) physical and spatial aspects, such as limitations imposed by debris (damage), and the distance to safe areas. 
To validate the results, scenarios will be designed with stakeholders, who will also take part in the definition of a serious game. The recommendation of this research is that both social and physical aspects should be considered when defining vulnerability in the analysis of risk.

URL

http://arxiv.org/abs/1905.01365

PDF

http://arxiv.org/pdf/1905.01365
From Specifications to Behavior: Maneuver Verification in a Semantic State Space

2019-05-02

Klemens Esterle, Vincent Aravantinos, Alois Knoll

arXiv_RO

arXiv_RO
Abstract

To realize a market entry of autonomous vehicles in the foreseeable future, the behavior planning system will need to abide by the same rules that humans follow. Product liability cannot be enforced without a proper solution to the approval trap. In this paper, we define a semantic abstraction of the continuous space and formalize traffic rules in linear temporal logic (LTL). Sequences in the semantic state space represent maneuvers a high-level planner could choose to execute. We check these maneuvers against the formalized traffic rules using runtime verification. By using the standard model checker NuSMV, we demonstrate the effectiveness of our approach and provide runtime properties for the maneuver verification. We show that high-level behavior can be verified in a semantic state space to fulfill a set of formalized rules, which could serve as a step towards safety of the intended functionality.
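
Checking a maneuver (a sequence of semantic states) against temporal rules can be illustrated with a small hand-written checker. This sketch does not use NuSMV and evaluates the operators on finite traces, which is a simplification of standard LTL semantics; the proposition names (`right_lane`, `left_lane`, `shoulder`) and both rules are hypothetical examples, not the paper's formalized traffic rules.

```python
def globally(trace, pred):
    """G pred: pred holds in every state of the finite trace."""
    return all(pred(state) for state in trace)


def eventually(trace, pred):
    """F pred: pred holds in at least one state of the finite trace."""
    return any(pred(state) for state in trace)


def globally_implies_eventually(trace, trigger, goal):
    """G (trigger -> F goal): each triggering state is followed, at or after it, by goal."""
    return all(
        eventually(trace[i:], goal)
        for i, state in enumerate(trace)
        if trigger(state)
    )


def check_maneuver(trace):
    """Evaluate two hypothetical rules on a trace of proposition sets."""
    return {
        # Never drive on the shoulder.
        "no_shoulder": globally(trace, lambda s: "shoulder" not in s),
        # Whenever on the left lane, eventually return to the right lane.
        "return_right": globally_implies_eventually(
            trace,
            lambda s: "left_lane" in s,
            lambda s: "right_lane" in s,
        ),
    }
```

Each state is a set of atomic propositions, so a maneuver such as an overtake is simply a list of such sets that a planner could submit for checking.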

URL

http://arxiv.org/abs/1905.00708

PDF

http://arxiv.org/pdf/1905.00708
Face Identification using Local Ternary Tree Pattern based Spatial Structural Components

2019-05-02

Rinku Datta Rakshit, Dakshina Ranjan Kisku, Massimo Tistarelli, Phalguni Gupta

arXiv_CV

arXiv_CV Face
Abstract

This paper reports groundbreaking results of a face identification system which makes use of a novel local descriptor called Local Ternary Tree Pattern (LTTP). Devising effective and feasible local descriptors for a face image plays an important role in the face identification task when the system must handle a wide variety of face images, including constrained, unconstrained and plastic surgery images. LTTP has been proposed to extract robust and discriminatory spatial features from a face image, as this descriptor can be used to describe the various structural components of a face. To extract the most useful features, a ternary tree is formed for each pixel with its eight neighbors. An LTTP pattern can be generated in four ways: LTTP Left Depth, LTTP Left Breadth, LTTP Right Depth and LTTP Right Breadth. The encoding schemes for generating these four patterns are simple and efficient in terms of computational and time complexity. The proposed face identification system is tested on six face databases, namely UMIST, JAFFE, Extended Yale Face B, Plastic Surgery, LFW and UFI. The experimental evaluation demonstrates outstanding results which will have a long-term impact on the design of face identification systems that consider a variety of faces captured under different environments.

URL

http://arxiv.org/abs/1905.00693

PDF

http://arxiv.org/pdf/1905.00693
Approximate LSTMs for Time-Constrained Inference: Enabling Fast Reaction in Self-Driving Cars

2019-05-02

Alexandros Kouris, Stylianos I. Venieris, Michail Rizakis, Christos-Savvas Bouganis

arXiv_RO

arXiv_RO Inference RNN
Abstract

The need to recognise long-term dependencies in sequential data such as video streams has made LSTMs a prominent AI model for many emerging applications. However, the high computational and memory demands of LSTMs introduce challenges in their deployment on latency-critical systems such as self-driving cars which are equipped with limited computational resources on-board. In this paper, we introduce an approximate computing scheme combining model pruning and computation restructuring to obtain a high-accuracy approximation of the result in early stages of the computation. Our experiments demonstrate that using the proposed methodology, mission-critical systems responsible for autonomous navigation and collision avoidance are able to make informed decisions based on approximate calculations within the available time budget, meeting their specifications on safety and robustness.

URL

http://arxiv.org/abs/1905.00689

PDF

http://arxiv.org/pdf/1905.00689
DS-VIO: Robust and Efficient Stereo Visual Inertial Odometry based on Dual Stage EKF

2019-05-02

Xiaogang Xiong, Wenqing Chen, Zhichao Liu, Qiang Shen

arXiv_CV

arXiv_CV
Abstract

This paper presents a dual stage EKF (Extended Kalman Filter)-based algorithm for real-time and robust stereo VIO (visual inertial odometry). The first stage of this EKF-based algorithm performs the fusion of accelerometer and gyroscope while the second performs the fusion of stereo camera and IMU. Due to the sufficient complementary characteristics between accelerometer and gyroscope as well as stereo camera and IMU, the dual stage EKF-based algorithm can achieve a high precision of odometry estimations. At the same time, because of the low dimension of the state vector in this algorithm, its computational efficiency is comparable to previous filter-based approaches. We call our approach DS-VIO (dual stage EKF-based stereo visual inertial odometry) and evaluate our DS-VIO algorithm by comparing it with state-of-the-art approaches including OKVIS, ROVIO, VINS-MONO and S-MSCKF on the EuRoC dataset. Results show that our algorithm can achieve comparable or even better performance in terms of the RMS error.

URL

http://arxiv.org/abs/1905.00684

PDF

http://arxiv.org/pdf/1905.00684
k-NN Graph Construction: a Generic Online Approach

2019-05-02

Wan-Lei Zhao

arXiv_CV

arXiv_CV
Abstract

Nearest neighbor search and k-nearest neighbor graph construction are two fundamental issues that arise in many disciplines such as information retrieval, data mining and machine learning. Despite continuous efforts over the last several decades, these two issues remain challenging. They have become increasingly pressing given the emergence of big data in various fields in recent years. In this paper, a simple but effective solution for both k-nearest neighbor search and k-nearest neighbor graph construction is presented. These two issues are addressed jointly in our solution. On the one hand, k-nearest neighbor graph construction is treated as a search task. Each sample, along with its k-nearest neighbors, is joined into the k-nearest neighbor graph by performing the nearest neighbor search sequentially on the graph under construction. On the other hand, the built k-nearest neighbor graph is used to support k-nearest neighbor search. Since the graph is built online, dynamic updates of the graph, which are not readily available in most existing solutions, are supported. This solution is feasible for various distance measures. Its effectiveness both as a k-nearest neighbor graph construction approach and as a k-nearest neighbor search approach is verified across datasets of different scales and dimensions and under different metrics.
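
The idea of building the graph by searching the graph under construction can be sketched as follows. This is a simplified illustration and not the paper's algorithm: the best-first `search`, the choice of the first few samples as entry points, and the neighbor-list update rule are all assumptions made for the sketch.

```python
import heapq
import math


def search(graph, points, query, k, seeds):
    """Best-first search over the current graph; returns up to k (distance, index) pairs."""
    visited = set(seeds)
    frontier = [(math.dist(points[s], query), s) for s in seeds]
    heapq.heapify(frontier)
    best = sorted(frontier)[:k]
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(best) == k and d > best[-1][0]:
            break  # the closest unexpanded node is already worse than the k-th result
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                dn = math.dist(points[nb], query)
                heapq.heappush(frontier, (dn, nb))
                best = sorted(best + [(dn, nb)])[:k]
    return best


def build_knn_graph(points, k):
    """Insert samples one by one, using the partial graph to find their neighbors."""
    graph = {}
    for i, p in enumerate(points):
        if i == 0:
            graph[0] = []
            continue
        seeds = list(range(min(i, k)))  # assumption: earliest samples act as entry points
        found = search(graph, points, p, k, seeds)
        graph[i] = [j for _, j in found]
        # Online update: the new sample may replace a farther neighbor of existing nodes.
        for d, j in found:
            cand = [(math.dist(points[j], points[n]), n) for n in graph[j]] + [(d, i)]
            graph[j] = [n for _, n in sorted(cand)[:k]]
    return graph
```

Because every insertion is itself a query, the same `search` routine can serve nearest neighbor queries once the graph is built. The result is approximate in general, since the search only explores nodes reachable from the seeds.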

URL

http://arxiv.org/abs/1804.03032

PDF

http://arxiv.org/pdf/1804.03032
Properties of biclustering algorithms and a novel biclustering technique based on relative density

2019-05-02

Namita Jain, Susmita Ghosh, C. A. Murthy

arXiv_CV

arXiv_CV Relation
Abstract

Biclustering is found to be useful in areas like data mining and bioinformatics. The term biclustering involves searching subsets of observations and features forming coherent structure. This can be interpreted in different ways like spatial closeness, relation between features for selected observations etc. This article discusses different properties, objectives and approaches of biclustering algorithms. We also present an algorithm which detects feature relation based biclusters using density based techniques. Here we use relative density of regions to identify biclusters embedded in the data. Properties of this algorithm are discussed and demonstrated using artificial datasets. The proposed method is seen to provide better results on both artificial and real datasets. Paired right tailed t test is used for artificial datasets.

URL

http://arxiv.org/abs/1811.04661

PDF

http://arxiv.org/pdf/1811.04661
Impact of Argument Type and Concerns in Argumentation with a Chatbot

2019-05-02

Lisa A. Chalaguine, Anthony Hunter, Fiona L. Hamilton, Henry W. W. Potts

arXiv_AI

arXiv_AI QA
Abstract

Conversational agents, also known as chatbots, are versatile tools that have the potential of being used in dialogical argumentation. They could possibly be deployed in tasks such as persuasion for behaviour change (e.g. persuading people to eat more fruit, to take regular exercise, etc.). However, to achieve this, there is a need to develop methods for acquiring appropriate arguments and counterarguments that reflect both sides of the discussion. For instance, to persuade someone to do regular exercise, the chatbot needs to know counterarguments that the user might have for not doing exercise. To address this need, we present methods for acquiring arguments and counterarguments, and importantly, meta-level information that can be useful for deciding when arguments can be used during an argumentation dialogue. We evaluate these methods in studies with participants and show how harnessing these methods in a chatbot can make it more persuasive.

URL

http://arxiv.org/abs/1905.00646

PDF

http://arxiv.org/pdf/1905.00646
Quality Evaluation of GANs Using Cross Local Intrinsic Dimensionality

2019-05-02

Sukarna Barua, Xingjun Ma, Sarah Monazam Erfani, Michael E. Houle, James Bailey

arXiv_CV

arXiv_CV Adversarial GAN
Abstract

Generative Adversarial Networks (GANs) are an elegant mechanism for data generation. However, a key challenge when using GANs is how to best measure their ability to generate realistic data. In this paper, we demonstrate that an intrinsic dimensional characterization of the data space learned by a GAN model leads to an effective evaluation metric for GAN quality. In particular, we propose a new evaluation measure, CrossLID, that assesses the local intrinsic dimensionality (LID) of real-world data with respect to neighborhoods found in GAN-generated samples. Intuitively, CrossLID measures the degree to which manifolds of two data distributions coincide with each other. In experiments on 4 benchmark image datasets, we compare our proposed measure to several state-of-the-art evaluation metrics. Our experiments show that CrossLID is strongly correlated with the progress of GAN training, is sensitive to mode collapse, is robust to small-scale noise and image transformations, and robust to sample size. Furthermore, we show how CrossLID can be used within the GAN training process to improve generation quality.
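
A rough sketch of the measure described above: estimate the local intrinsic dimensionality of each real sample from its distances to its nearest generated samples, then average. The estimator used here is the commonly cited maximum-likelihood form, LID = -1 / mean(log(r_i / r_k)); whether it matches the paper's exact estimator, batching and preprocessing is an assumption, and the function names are hypothetical.

```python
import math


def lid_mle(query, reference, k):
    """Maximum-likelihood LID estimate of `query` w.r.t. its k nearest points in `reference`."""
    # Zero distances (exact duplicates) are dropped so that log() stays defined.
    dists = sorted(d for d in (math.dist(query, r) for r in reference) if d > 0)[:k]
    if len(dists) < k:
        raise ValueError("need at least k reference points at non-zero distance")
    r_max = dists[-1]
    mean_log = sum(math.log(d / r_max) for d in dists) / k
    # All neighbors equidistant: the estimate diverges.
    return math.inf if mean_log == 0 else -1.0 / mean_log


def cross_lid(real, generated, k):
    """Average LID of real samples, measured in neighborhoods of generated samples."""
    return sum(lid_mle(x, generated, k) for x in real) / len(real)
```

Intuitively, if generated samples lie on the same manifold as the real data, real points find close generated neighbors and the estimate stays near the data's intrinsic dimension; for points spread evenly along a line the estimate is close to 1.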

URL

https://arxiv.org/abs/1905.00643

PDF

https://arxiv.org/pdf/1905.00643
RetinaFace: Single-stage Dense Face Localisation in the Wild

2019-05-02

Jiankang Deng, Jia Guo, Yuxiang Zhou, Jinke Yu, Irene Kotsia, Stefanos Zafeiriou

arXiv_CV

arXiv_CV Object_Detection Face Detection Face_Detection
Abstract

Though tremendous strides have been made in uncontrolled face detection, accurate and efficient face localisation in the wild remains an open challenge. This paper presents a robust single-stage face detector, named RetinaFace, which performs pixel-wise face localisation on various scales of faces by taking advantage of joint extra-supervised and self-supervised multi-task learning. Specifically, we make contributions in the following five aspects: (1) We manually annotate five facial landmarks on the WIDER FACE dataset and observe significant improvement in hard face detection with the assistance of this extra supervision signal. (2) We further add a self-supervised mesh decoder branch for predicting pixel-wise 3D face shape information in parallel with the existing supervised branches. (3) On the WIDER FACE hard test set, RetinaFace outperforms the state-of-the-art average precision (AP) by 1.1% (achieving an AP of 91.4%). (4) On the IJB-C test set, RetinaFace enables state-of-the-art methods (ArcFace) to improve their results in face verification (TAR=89.59% for FAR=1e-6). (5) By employing light-weight backbone networks, RetinaFace can run in real time on a single CPU core for a VGA-resolution image. Extra annotations and code will be released to facilitate future research.

URL

http://arxiv.org/abs/1905.00641

PDF

http://arxiv.org/pdf/1905.00641
LivDet in Action - Fingerprint Liveness Detection Competition 2019

2019-05-02

Giulia Orrù, Roberto Casula, Pierluigi Tuveri, Carlotta Bazzoni, Giovanna Dessalvi, Marco Micheletto, Luca Ghiani, Gian Luca Marcialis

arXiv_CV

arXiv_CV Knowledge Detection
Abstract

The International Fingerprint Liveness Detection Competition (LivDet) is an open and well-acknowledged meeting point of academia and private companies that deal with the problem of distinguishing images of fingerprint reproductions made of artificial materials from images of real fingerprints. In this edition of LivDet we invited the competitors to propose algorithms integrated with matching systems. The goal was to investigate to what extent this integration impacts overall performance. Twelve algorithms were submitted to the competition, eight of which worked on integrated systems.

URL

http://arxiv.org/abs/1905.00639

PDF

http://arxiv.org/pdf/1905.00639
Inverse Halftoning Through Structure-Aware Deep Convolutional Neural Networks

2019-05-02

Chang-Hwan Son

arXiv_CV

arXiv_CV CNN Prediction Gradient_Descent
Abstract

The primary issue in inverse halftoning is removing noisy dots on flat areas and restoring image structures (e.g., lines, patterns) on textured areas. Hence, a new structure-aware deep convolutional neural network that incorporates two subnetworks is proposed in this paper. One subnetwork is for image structure prediction while the other is for continuous-tone image reconstruction. First, to predict image structures, patch pairs comprising continuous-tone patches and the corresponding halftoned patches generated through digital halftoning are trained. Subsequently, gradient patches are generated by convolving gradient filters with the continuous-tone patches. The subnetwork for the image structure prediction is trained using the mini-batch gradient descent algorithm given the halftoned patches and gradient patches, which are fed into the input and loss layers of the subnetwork, respectively. Next, the predicted map including the image structures is stacked on the top of the input halftoned image through a fusion layer and fed into the image reconstruction subnetwork such that the entire network is trained adaptively to the image structures. The experimental results confirm that the proposed structure-aware network can remove noisy dot-patterns well on flat areas and restore details clearly on textured areas. Furthermore, it is demonstrated that the proposed method surpasses the conventional state-of-the-art methods based on deep convolutional neural networks and locally learned dictionaries.

URL

http://arxiv.org/abs/1905.00637

PDF

http://arxiv.org/pdf/1905.00637
Truth Discovery via Proxy Voting

2019-05-02

Reshef Meir, Ofra Amir, Gal Cohensius, Omer Ben-Porat, Lirong Xia

arXiv_AI

arXiv_AI Face
Abstract

Truth discovery is a general name for a broad range of statistical methods aimed at extracting the correct answers to questions, based on multiple answers coming from noisy sources, for example, workers in a crowdsourcing platform. In this paper, we design simple truth discovery methods inspired by proxy voting, that give higher weight to workers whose answers are close to those of other workers. We prove that under standard statistical assumptions, proxy-based truth discovery (PTD) allows us to estimate the true competence of each worker, whether workers face questions whose answers are real-valued, categorical, or rankings. We then demonstrate through extensive empirical study on synthetic and real data that PTD is substantially better than unweighted aggregation, and competes well with other truth discovery methods, in all of the above domains.
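
The core intuition, giving more weight to workers whose answers are close to those of other workers, can be sketched for real-valued answers as below. The inverse-average-distance weighting is an assumption chosen for illustration; the abstract does not state the exact weight formula, and the function names are hypothetical.

```python
def pairwise_distance(a, b):
    """Mean squared difference between two workers' answer vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def proxy_weights(answers, eps=1e-9):
    """Weight each worker by the inverse of their average distance to all other workers."""
    n = len(answers)
    weights = []
    for i in range(n):
        avg = sum(
            pairwise_distance(answers[i], answers[j]) for j in range(n) if j != i
        ) / (n - 1)
        weights.append(1.0 / (avg + eps))  # eps avoids division by zero (illustrative choice)
    return weights


def weighted_answers(answers, weights):
    """Aggregate each question as the weighted mean of the workers' answers."""
    total = sum(weights)
    n_questions = len(answers[0])
    return [
        sum(w * a[q] for w, a in zip(weights, answers)) / total
        for q in range(n_questions)
    ]
```

With three workers who roughly agree and one outlier, the outlier receives the smallest weight, so the aggregate moves toward the consensus compared with a plain average.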

URL

http://arxiv.org/abs/1905.00629

PDF

http://arxiv.org/pdf/1905.00629
Psychoacoustically Motivated Declipping Based on Weighted l1 Minimization

2019-05-02

Pavel Záviška, Pavel Rajmic, Jiří Schimmel

arXiv_SD

arXiv_SD
Abstract

A novel method for audio declipping based on sparsity is presented. The method incorporates psychoacoustic information by weighting the transform coefficients in the $\ell_1$ minimization. Weighting leads to an improved quality of restoration while retaining a low complexity of the algorithm. Three possible constructions of the weights are proposed, based on the absolute threshold of hearing, the global masking threshold and a quadratic curve. Experiments compare the restoration quality according to the signal-to-distortion ratio (SDR) and the PEMO-Q objective difference grade (ODG) and indicate that with correctly chosen weights, the presented method is able to compete with, or even outperform, the current state of the art.
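
Weighting the coefficients in an $\ell_1$ penalty changes the shrinkage step that many sparse solvers apply: each coefficient gets its own threshold. The sketch below shows that standard weighted soft-thresholding step (the proximal operator of a weighted $\ell_1$ norm). Whether the paper's solver uses exactly this step is an assumption, and the weights in the example are arbitrary rather than psychoacoustically derived.

```python
def weighted_soft_threshold(coeffs, weights, lam):
    """Shrink each coefficient toward zero by its own threshold lam * weight."""
    out = []
    for c, w in zip(coeffs, weights):
        mag = abs(c) - lam * w
        if mag <= 0:
            out.append(0.0)  # coefficient falls below its threshold and is zeroed
        else:
            out.append(mag if c > 0 else -mag)  # keep the sign, reduce the magnitude
    return out
```

A larger weight penalizes a coefficient more strongly, so it is more likely to be set to zero, while a small weight lets the coefficient survive almost unchanged.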

URL

http://arxiv.org/abs/1905.00628

PDF

http://arxiv.org/pdf/1905.00628
Attention Based Fully Convolutional Network for Speech Emotion Recognition

2019-05-02

Yuanyuan Zhang, Jun Du, Zirui Wang, Jianshu Zhang

arXiv_SD

arXiv_SD Segmentation Attention CNN Transfer_Learning Recognition
Abstract

Speech emotion recognition is a challenging task for three main reasons: 1) human emotion is abstract, which means it is hard to distinguish; 2) in general, human emotion can only be detected in some specific moments during a long utterance; 3) speech data with emotional labeling is usually limited. In this paper, we present a novel attention based fully convolutional network for speech emotion recognition. We employ fully convolutional network as it is able to handle variable-length speech, free of the demand of segmentation to keep critical information not lost. The proposed attention mechanism can make our model be aware of which time-frequency region of speech spectrogram is more emotion-relevant. Considering limited data, the transfer learning is also adapted to improve the accuracy. Especially, it’s interesting to observe obvious improvement obtained with natural scene image based pre-trained model. Validated on the publicly available IEMOCAP corpus, the proposed model outperformed the state-of-the-art methods with a weighted accuracy of 70.4% and an unweighted accuracy of 63.9% respectively.

URL

http://arxiv.org/abs/1806.01506

PDF

http://arxiv.org/pdf/1806.01506
The Right Perspective: Improving the Understanding of Road Scenes Using Boosted Inverse Perspective Mapping

2019-05-02

Tom Bruls, Horia Porav, Lars Kunze, Paul Newman

arXiv_CV

arXiv_CV Adversarial Tracking Object_Tracking Detection
Abstract

Many tasks performed by autonomous vehicles such as road marking detection, object tracking, and path planning are simpler in bird’s-eye view. Hence, Inverse Perspective Mapping (IPM) is often applied to remove the perspective effect from a vehicle’s front-facing camera and to remap its images into a 2D domain, resulting in a top-down view. Unfortunately, however, this leads to unnatural blurring and stretching of objects at further distance, due to the resolution of the camera, limiting applicability. In this paper, we present an adversarial learning approach for generating a significantly improved IPM from a single camera image in real time. The generated bird’s-eye-view images contain sharper features (e.g. road markings) and a more homogeneous illumination, while (dynamic) objects are automatically removed from the scene, thus revealing the underlying road layout in an improved fashion. We demonstrate our framework using real-world data from the Oxford RobotCar Dataset and show that scene understanding tasks directly benefit from our boosted IPM approach.

URL

http://arxiv.org/abs/1812.00913

PDF

http://arxiv.org/pdf/1812.00913
Investigation of F0 conditioning and Fully Convolutional Networks in Variational Autoencoder based Voice Conversion

2019-05-02

Wen-Chin Huang, Yi-Chiao Wu, Chen-Chou Lo, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang

arXiv_SD

arXiv_SD CNN Relation
Abstract

In this work, we investigate the effectiveness of two techniques for improving variational autoencoder (VAE) based voice conversion (VC). First, we reconsider the relationship between vocoder features extracted using the high quality vocoders adopted in conventional VC systems, and hypothesize that the spectral features are in fact F0 dependent. Such a hypothesis implies that during the conversion phase, the latent codes and the converted features in VAE based VC are in fact source F0 dependent. To this end, we propose to utilize the F0 as an additional input of the decoder. The model can learn to disentangle the latent code from the F0 and thus generate converted features that depend on the converted F0. Second, to better capture temporal dependencies of the spectral features and the F0 pattern, we replace the frame-wise conversion structure in the original VAE based VC framework with a fully convolutional network structure. Our experiments demonstrate that the degree of disentanglement as well as the naturalness of the converted speech are indeed improved.

URL

http://arxiv.org/abs/1905.00615

PDF

http://arxiv.org/pdf/1905.00615
Alternative Techniques for Mapping Paths to HLAI

2019-05-02

Ross Gruetzemacher, David Paradice

arXiv_AI

arXiv_AI Knowledge GAN
Abstract

The only systematic mapping of the HLAI technical landscape was conducted at a workshop in 2009 [Adams et al., 2012]. However, the results from it were not what organizers had hoped for [Goertzel 2014, 2016], merely just a series of milestones, up to 50% of which could be argued to have been completed already. We consider two more recent articles outlining paths to human-like intelligence [Mikolov et al., 2016; Lake et al., 2017]. These offer technical and more refined assessments of the requirements for HLAI rather than just milestones. While useful, they also have limitations. To address these limitations we propose the use of alternative techniques for an updated systematic mapping of the paths to HLAI. The newly proposed alternative techniques can model complex paths of future technologies using intricate directed graphs. Specifically, there are two classes of alternative techniques that we consider: scenario mapping methods and techniques for eliciting expert opinion through digital platforms and crowdsourcing. We assess the viability and utility of both the previous and alternative techniques, finding that the proposed alternative techniques could be very beneficial in advancing the existing body of knowledge on the plausible frameworks for creating HLAI. In conclusion, we encourage discussion and debate to initiate efforts to use these proposed techniques for mapping paths to HLAI.

URL

http://arxiv.org/abs/1905.00614

PDF

http://arxiv.org/pdf/1905.00614
A knowledge-based intelligence system for control of dirt recognition process in the smart washing machines

2019-05-02

Mohsen Annabestani, Alireza Rowhanimanesh, Akram Rezaei, Ladan Avazpour, Fatemeh Sheikhhasani

arXiv_AI

arXiv_AI Knowledge Recognition
Abstract

In this paper, we propose an intelligent approach based on fuzzy logic for modeling human intelligence in washing clothes. First, an intelligent feedback loop is designed for perception-based sensing of dirt, inspired by human color understanding. Then, when color stains leak out of some colored clothes, human probabilistic decision making is computationally modeled to detect this stain leakage, so that the problem of distinguishing dirt from stain can be addressed in the washing process. Finally, we discuss the fuzzy control of washing clothes and design and simulate a smart controller based on the fuzzy intelligent feedback loop.

URL

http://arxiv.org/abs/1905.00607

PDF

http://arxiv.org/pdf/1905.00607
On Self Modulation for Generative Adversarial Networks

2019-05-02

Ting Chen, Mario Lucic, Neil Houlsby, Sylvain Gelly

arXiv_CV

arXiv_CV Adversarial GAN
Abstract

Training Generative Adversarial Networks (GANs) is notoriously challenging. We propose and study an architectural modification, self-modulation, which improves GAN performance across different data sets, architectures, losses, regularizers, and hyperparameter settings. Intuitively, self-modulation allows the intermediate feature maps of a generator to change as a function of the input noise vector. While reminiscent of other conditioning techniques, it requires no labeled data. In a large-scale empirical study we observe a relative decrease of 5%-35% in FID. Furthermore, all else being equal, adding this modification to the generator leads to improved performance in 124/144 (86%) of the studied settings. Self-modulation is a simple architectural change that requires no additional parameter tuning, which suggests that it can be applied readily to any GAN.
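
One way to picture feature maps that change as a function of the noise vector is a per-channel scale and shift computed from z. The sketch below is only an illustration under assumptions: it uses plain affine maps for the scale and shift and omits any normalization of the features, and the parameter names (`gamma_w`, `beta_w`, and so on) are hypothetical rather than taken from the paper.

```python
def linear(z, weights, bias):
    """Affine map from the noise vector z to one value per channel."""
    return [
        sum(w * zi for w, zi in zip(row, z)) + b
        for row, b in zip(weights, bias)
    ]


def self_modulate(features, z, gamma_w, gamma_b, beta_w, beta_b):
    """Scale and shift every channel of `features` by functions of z.

    features: list of channels, each a flat list of activations.
    """
    gamma = linear(z, gamma_w, gamma_b)  # per-channel scale derived from z
    beta = linear(z, beta_w, beta_b)     # per-channel shift derived from z
    return [
        [g * h + b for h in channel]
        for channel, g, b in zip(features, gamma, beta)
    ]
```

Since the modulation depends only on the generator's own input noise, no labels or extra conditioning inputs are needed, which is the property the abstract highlights.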

URL

http://arxiv.org/abs/1810.01365

PDF

http://arxiv.org/pdf/1810.01365
Human Activity Recognition Using Visual Object Detection

2019-05-02

Schalk Wilhelm Pienaar, Reza Malekian

arXiv_CV

arXiv_CV Object_Detection Tracking Detection Recognition
Abstract

Visual Human Activity Recognition (HAR) and data fusion with other sensors can help us track the behavior and activity of underground miners with little obstruction. Existing models, such as the Single Shot Detector (SSD) trained on the Common Objects in Context (COCO) dataset, are used in this paper to detect the current state of a miner, such as an injured miner vs. a non-injured miner. TensorFlow is used as the abstraction layer for implementing machine learning algorithms, and although it uses Python to deal with nodes and tensors, the actual algorithms run on C++ libraries, providing a good balance between performance and speed of development. The paper further discusses evaluation methods for determining the accuracy of the machine learning model and an approach to increase the accuracy of the detected activity/state of people in a mining environment by means of data fusion.

URL

http://arxiv.org/abs/1905.03707

PDF

http://arxiv.org/pdf/1905.03707
Directing DNNs Attention for Facial Attribution Classification using Gradient-weighted Class Activation Mapping

2019-05-02

Xi Yang, Bojian Wu, Issei Sato, Takeo Igarashi

arXiv_CV

arXiv_CV Attention Image_Classification Classification
Abstract

Deep neural networks (DNNs) achieve high accuracy on image classification tasks. However, DNNs trained on a dataset with co-occurrence bias may rely on the wrong features when making classification decisions, which greatly affects the transferability of pre-trained DNNs. In this paper, we propose an interactive method that directs classifiers to pay attention to regions manually specified by users, in order to mitigate the influence of co-occurrence bias. We test on the CelebA dataset: a pre-trained AlexNet is fine-tuned to focus on specific facial attributes based on the results of Grad-CAM.

URL

http://arxiv.org/abs/1905.00593

PDF

http://arxiv.org/pdf/1905.00593
Read All
High quality, lightweight and adaptable TTS using LPCNet

2019-05-02

Zvi Kons, Slava Shechtman, Alex Sorin, Carmel Rabinovitz, Ron Hoory

arXiv_SD

arXiv_SD Prediction
Abstract

We present a lightweight adaptable neural TTS system with high quality output. The system is composed of three separate neural network blocks: prosody prediction, acoustic feature prediction and Linear Prediction Coding Net as a neural vocoder. This system can synthesize speech with close to natural quality while running 3 times faster than real-time on a standard CPU. The modular setup of the system allows for simple adaptation to new voices with a small amount of data. We first demonstrate the ability of the system to produce high quality speech when trained on large, high quality datasets. Following that, we demonstrate its adaptability by mimicking unseen voices using datasets 5 to 20 minutes long with lower recording quality. Large scale Mean Opinion Score quality and similarity tests are presented, showing that the system can adapt to unseen voices with a quality gap of 0.12 and a similarity gap of 3% compared to natural speech for male voices, and a quality gap of 0.35 and a similarity gap of 9% for female voices.

URL

http://arxiv.org/abs/1905.00590

PDF

http://arxiv.org/pdf/1905.00590
Read All
Coordination and Trajectory Prediction for Vehicle Interactions via Bayesian Generative Modeling

2019-05-02

Jiachen Li, Hengbo Ma, Wei Zhan, Masayoshi Tomizuka

arXiv_AI

arXiv_AI Deep_Learning Prediction Recognition
Abstract

Coordination recognition and subtle pattern prediction of future trajectories play a significant role when modeling interactive behaviors of multiple agents. Due to the essential property of uncertainty in the future evolution, deterministic predictors are not sufficiently safe and robust. In order to tackle the task of probabilistic prediction for multiple, interactive entities, we propose a coordination and trajectory prediction system (CTPS), which has a hierarchical structure including a macro-level coordination recognition module and a micro-level subtle pattern prediction module which solves a probabilistic generation task. We illustrate two types of representation of the coordination variable: categorized and real-valued, and compare their effects and advantages based on empirical studies. We also bring the ideas of Bayesian deep learning into deep generative models to generate diversified prediction hypotheses. The proposed system is tested on multiple driving datasets in various traffic scenarios, which achieves better performance than baseline approaches in terms of a set of evaluation metrics. The results also show that using categorized coordination can better capture multi-modality and generate more diversified samples than the real-valued coordination, while the latter can generate prediction hypotheses with smaller errors with a sacrifice of sample diversity. Moreover, employing neural networks with weight uncertainty is able to generate samples with larger variance and diversity.

URL

http://arxiv.org/abs/1905.00587

PDF

http://arxiv.org/pdf/1905.00587
Read All
Incremental Learning in Deep Convolutional Neural Networks Using Partial Network Sharing

2019-05-02

Syed Shakib Sarwar, Aayush Ankit, Kaushik Roy

arXiv_CV

arXiv_CV CNN Image_Classification Transfer_Learning Classification Recognition
Abstract

Deep convolutional neural network (DCNN) based supervised learning is a widely practiced approach for large-scale image classification. However, retraining these large networks to accommodate new, previously unseen data demands high computational time and energy requirements. Also, previously seen training samples may not be available at the time of retraining. We propose an efficient training methodology and incrementally growing DCNN to learn new tasks while sharing part of the base network. Our proposed methodology is inspired by transfer learning techniques, although it does not forget previously learned tasks. An updated network for learning a new set of classes is formed using previously learned convolutional layers (shared from the initial part of the base network) with the addition of a few new convolutional kernels in the later layers of the network. We employed a 'clone-and-branch' technique which allows the network to learn new tasks one after another without any performance loss in old tasks. We evaluated the proposed scheme on several recognition applications. The classification accuracy achieved by our approach is comparable to the regular incremental learning approach (where networks are updated with new training samples only, without any network sharing), while achieving energy efficiency, reduction in storage requirements, memory access and training time.

URL

http://arxiv.org/abs/1712.02719

PDF

http://arxiv.org/pdf/1712.02719
Read All
Recurrent-Convolution Approach to DeepFake Detection - State-Of-Art Results on FaceForensics++

2019-05-02

Ekraam Sabir, Jiaxin Cheng, Ayush Jaiswal, Wael AbdAlmageed, Iacopo Masi, Prem Natarajan

arXiv_CV

arXiv_CV Face Detection
Abstract

The spread of misinformation has become a significant problem, raising the importance of relevant detection methods. While there are different manifestations of misinformation, in this work we focus on detecting face manipulations in videos. Specifically, we attempt to detect Deepfake, Face2Face and FaceSwap manipulations in videos. We exploit the temporal dynamics of videos with a recurrent approach. Evaluation is done on the FaceForensics++ dataset, and our method improves upon the previous state of the art by up to 4.55%.

URL

http://arxiv.org/abs/1905.00582

PDF

http://arxiv.org/pdf/1905.00582
Read All
Agnostic Lane Detection

2019-05-02

Yuenan Hou

arXiv_AI

arXiv_AI Segmentation Semantic_Segmentation Detection
Abstract

Lane detection is an important yet challenging task in autonomous driving, which is affected by many factors, e.g., light conditions, occlusions caused by other vehicles, irrelevant markings on the road and the inherent long and thin property of lanes. Conventional methods typically treat lane detection as a semantic segmentation task, which assigns a class label to each pixel of the image. This formulation heavily depends on the assumption that the number of lanes is pre-defined and fixed and no lane changing occurs, which does not always hold. To make the lane detection model applicable to an arbitrary number of lanes and lane changing scenarios, we adopt an instance segmentation approach, which first differentiates lanes from background and then classifies each lane pixel into a lane instance. In addition, a multi-task learning paradigm is utilized to better exploit the structural information, and the feature pyramid architecture is used to detect extremely thin lanes. Three popular lane detection benchmarks, i.e., TuSimple, CULane and BDD100K, are used to validate the effectiveness of our proposed algorithm.

URL

http://arxiv.org/abs/1905.03704

PDF

http://arxiv.org/pdf/1905.03704
Read All
Deterministic Leader Election in Programmable Matter

2019-05-02

Yuval Emek, Shay Kutten, Ron Lavi, William K. Moses Jr

arXiv_RO

arXiv_RO
Abstract

Addressing a fundamental problem in programmable matter, we present the first deterministic algorithm to elect a unique leader in a system of connected amoebots assuming only that amoebots are initially contracted. Previous algorithms either used randomization, made various assumptions (shapes with no holes, or known shared chirality), or elected several co-leaders in some cases. Some of the building blocks we introduce in constructing the algorithm are of interest by themselves, especially the procedure we present for reaching common chirality among the amoebots. Given the leader election and the chirality agreement building block, it is known that various tasks in programmable matter can be performed or improved. The main idea of the new algorithm is the usage of the ability of the amoebots to move, which previous leader election algorithms have not used.

URL

http://arxiv.org/abs/1905.00580

PDF

http://arxiv.org/pdf/1905.00580
Read All
Argument Identification in Public Comments from eRulemaking

2019-05-02

Vlad Eidelman, Brian Grom

arXiv_CL

arXiv_CL Classification
Abstract

Administrative agencies in the United States receive millions of comments each year concerning proposed agency actions during the eRulemaking process. These comments represent a diversity of arguments in support of and in opposition to the proposals. While agencies are required to identify and respond to substantive comments, they have struggled to keep pace with the volume of information. In this work we address the tasks of identifying argumentative text, classifying the type of argument claims employed, and determining the stance of the comment. First, we propose a taxonomy of argument claims based on an analysis of thousands of rules and millions of comments. Second, we collect and semi-automatically bootstrap annotations to create a dataset of millions of sentences with argument claim type annotation at the sentence level. Third, we build a system for automatically determining argumentative spans and claim type using our proposed taxonomy in a hierarchical classification model.

URL

http://arxiv.org/abs/1905.00572

PDF

http://arxiv.org/pdf/1905.00572
Read All
26ms Inference Time for ResNet-50: Towards Real-Time Execution of all DNNs on Smartphone

2019-05-02

Wei Niu, Xiaolong Ma, Yanzhi Wang, Bin Ren

arXiv_CV

arXiv_CV Optimization Inference
Abstract

With the rapid emergence of a spectrum of high-end mobile devices, many applications that formerly required desktop-level computation capability can now run on these devices without any problem. However, without careful optimization, executing Deep Neural Networks (a key building block of the real-time video stream processing that is the foundation of many popular applications) is still challenging, specifically if extremely low latency or high accuracy inference is needed. This work presents CADNN, a programming framework to efficiently execute DNNs on mobile devices with the help of advanced model compression (sparsity) and a set of thorough architecture-aware optimizations. The evaluation result demonstrates that CADNN outperforms all the state-of-the-art dense DNN execution frameworks like TensorFlow Lite and TVM.

URL

http://arxiv.org/abs/1905.00571

PDF

http://arxiv.org/pdf/1905.00571
Read All
Investigating Robustness and Interpretability of Link Prediction via Adversarial Modifications

2019-05-02

Pouya Pezeshkpour, Yifan Tian, Sameer Singh

arXiv_CL

arXiv_CL Adversarial Knowledge_Graph Knowledge Embedding Optimization Prediction Relation
Abstract

Representing entities and relations in an embedding space is a well-studied approach for machine learning on relational data. Existing approaches, however, primarily focus on improving accuracy and overlook other aspects such as robustness and interpretability. In this paper, we propose adversarial modifications for link prediction models: identifying the fact to add into or remove from the knowledge graph that changes the prediction for a target fact after the model is retrained. Using these single modifications of the graph, we identify the most influential fact for a predicted link and evaluate the sensitivity of the model to the addition of fake facts. We introduce an efficient approach to estimate the effect of such modifications by approximating the change in the embeddings when the knowledge graph changes. To avoid the combinatorial search over all possible facts, we train a network to decode embeddings to their corresponding graph components, allowing the use of gradient-based optimization to identify the adversarial modification. We use these techniques to evaluate the robustness of link prediction models (by measuring sensitivity to additional facts), study interpretability through the facts most responsible for predictions (by identifying the most influential neighbors), and detect incorrect facts in the knowledge base.

URL

http://arxiv.org/abs/1905.00563

PDF

http://arxiv.org/pdf/1905.00563
Read All
Large-scale weakly-supervised pre-training for video action recognition

2019-05-02

Deepti Ghadiyaram, Matt Feiszli, Du Tran, Xueting Yan, Heng Wang, Dhruv Mahajan

arXiv_CV

arXiv_CV Action_Recognition Transfer_Learning Recognition
Abstract

Current fully-supervised video datasets consist of only a few hundred thousand videos and fewer than a thousand domain-specific labels. This hinders the progress towards advanced video architectures. This paper presents an in-depth study of using large volumes of web videos for pre-training video models for the task of action recognition. Our primary empirical finding is that pre-training at a very large scale (over 65 million videos), despite relying on noisy social-media videos and hashtags, substantially improves the state-of-the-art on three challenging public action recognition datasets. Further, we examine three questions in the construction of weakly-supervised video action datasets. First, given that actions involve interactions with objects, how should one construct a verb-object pre-training label space to benefit transfer learning the most? Second, frame-based models perform quite well on action recognition; is pre-training for good image features sufficient or is pre-training for spatio-temporal features valuable for optimal transfer learning? Finally, actions are generally less well-localized in long videos vs. short videos; since action labels are provided at a video level, how should one choose video clips for best performance, given some fixed budget of number or minutes of videos?

URL

http://arxiv.org/abs/1905.00561

PDF

http://arxiv.org/pdf/1905.00561
Read All
Adaptive Intelligent Secondary Control of Microgrids Using a Biologically-Inspired Reinforcement Learning

2019-05-02

Mohammad Jafari, Vahid Sarfi, Amir Ghasemkhani, Hanif Livani, Lei Yang, Hao Xu

arXiv_AI

arXiv_AI Tracking Reinforcement_Learning
Abstract

In this paper, a biologically-inspired adaptive intelligent secondary controller is developed for microgrids to tackle system dynamics uncertainties, faults, and/or disturbances. The developed adaptive biologically-inspired controller adopts a novel computational model of emotional learning in the mammalian limbic system. The learning capability of the proposed biologically-inspired intelligent controller makes it a promising approach for dealing with the non-linear and volatile dynamics of the power system without increasing the controller complexity, and for maintaining voltage and frequency stability by using an efficient reference tracking mechanism. The performance of the proposed intelligent secondary controller is validated in terms of the voltage and frequency absolute errors in the simulated microgrid. Simulation results highlight the efficiency and robustness of the proposed intelligent controller under fault conditions and different system uncertainties compared to other benchmark controllers.

URL

http://arxiv.org/abs/1905.00557

PDF

http://arxiv.org/pdf/1905.00557
Read All
TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer

2019-05-02

Sicong Huang, Qiyang Li, Cem Anil, Xuchan Bao, Sageev Oore, Roger B. Grosse

arXiv_SD

arXiv_SD Style_Transfer CNN
Abstract

In this work, we address the problem of musical timbre transfer, where the goal is to manipulate the timbre of a sound sample from one instrument to match another instrument while preserving other musical content, such as pitch, rhythm, and loudness. In principle, one could apply image-based style transfer techniques to a time-frequency representation of an audio signal, but this depends on having a representation that allows independent manipulation of timbre as well as high-quality waveform generation. We introduce TimbreTron, a method for musical timbre transfer which applies “image” domain style transfer to a time-frequency representation of the audio signal, and then produces a high-quality waveform using a conditional WaveNet synthesizer. We show that the Constant Q Transform (CQT) representation is particularly well-suited to convolutional architectures due to its approximate pitch equivariance. Based on human perceptual evaluations, we confirmed that TimbreTron recognizably transferred the timbre while otherwise preserving the musical content, for both monophonic and polyphonic samples.

URL

http://arxiv.org/abs/1811.09620

PDF

http://arxiv.org/pdf/1811.09620
Read All
Billion-scale semi-supervised learning for image classification

2019-05-02

I. Zeki Yalniz, Hervé Jégou, Kan Chen, Manohar Paluri, Dhruv Mahajan

arXiv_CV

arXiv_CV CNN Image_Classification Classification Recommendation
Abstract

This paper presents a study of semi-supervised learning with large convolutional networks. We propose a pipeline, based on a teacher/student paradigm, that leverages a large collection of unlabelled images (up to 1 billion). Our main goal is to improve the performance for a given target architecture, like ResNet-50 or ResNext. We provide an extensive analysis of the success factors of our approach, which leads us to formulate some recommendations to produce high-accuracy models for image classification with semi-supervised learning. As a result, our approach brings important gains to standard architectures for image, video and fine-grained classification. For instance, by leveraging one billion unlabelled images, our learned vanilla ResNet-50 achieves 81.2% top-1 accuracy on the ImageNet benchmark.

URL

http://arxiv.org/abs/1905.00546

PDF

http://arxiv.org/pdf/1905.00546
Read All
Conditioning LSTM Decoder and Bi-directional Attention Based Question Answering System

2019-05-02

Heguang Liu

arXiv_AI

arXiv_AI Attention RNN Prediction
Abstract

Applying neural networks to Question Answering has gained increasing popularity in recent years. In this paper, I implement a model with a bi-directional attention flow layer, connected to a multi-layer LSTM encoder, which is in turn connected to one start-index decoder and one conditioning end-index decoder. I introduce a new end-index decoder layer that conditions on the start-index output. The experiment shows this increases model performance by 15.16%. For prediction, I propose a new smart-span equation, rewarding both short answer length and high probability in start-index and end-index, which further improves the prediction accuracy. The best single model achieves an F1 score of 73.97% and an EM score of 64.95% on the test set.
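The abstract does not state the smart-span equation itself, so the following is only a hypothetical sketch of the idea it describes: score every candidate span by the product of its start and end probabilities, and divide by a term that grows with span length so that shorter answers are favored. The function name `best_span`, the `length_penalty` parameter, and the exact form of the score are all assumptions made for illustration.

```python
def best_span(p_start, p_end, max_len=15, length_penalty=0.1):
    """Pick (start, end) maximizing a length-penalized probability score.

    Hypothetical scoring rule, not the paper's equation:
        score(i, j) = p_start[i] * p_end[j] / (1 + length_penalty * (j - i))
    """
    best, best_score = (0, 0), -1.0
    for i, ps in enumerate(p_start):
        # Only spans with end >= start and at most max_len tokens.
        for j in range(i, min(i + max_len, len(p_end))):
            score = ps * p_end[j] / (1.0 + length_penalty * (j - i))
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score


p_start = [0.1, 0.6, 0.3]
p_end = [0.2, 0.3, 0.5]
print(best_span(p_start, p_end))                      # mild penalty
print(best_span(p_start, p_end, length_penalty=1.0))  # strong penalty
```

With a mild penalty the two-token span wins because its end probability is higher; with a strong penalty the single-token span is preferred, which is the trade-off between answer length and probability that the abstract mentions.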

URL

http://arxiv.org/abs/1905.02019

PDF

http://arxiv.org/pdf/1905.02019
Read All
DPSNet: End-to-end Deep Plane Sweep Stereo

2019-05-02

Sunghoon Im, Hae-Gon Jeon, Stephen Lin, In So Kweon

arXiv_CV

arXiv_CV CNN Deep_Learning
Abstract

Multiview stereo aims to reconstruct scene depth from images acquired by a camera under arbitrary motion. Recent methods address this problem through deep learning, which can utilize semantic cues to deal with challenges such as textureless and reflective regions. In this paper, we present a convolutional neural network called DPSNet (Deep Plane Sweep Network) whose design is inspired by best practices of traditional geometry-based approaches for dense depth reconstruction. Rather than directly estimating depth and/or optical flow correspondence from image pairs as done in many previous deep learning methods, DPSNet takes a plane sweep approach that involves building a cost volume from deep features using the plane sweep algorithm, regularizing the cost volume via a context-aware cost aggregation, and regressing the dense depth map from the cost volume. The cost volume is constructed using a differentiable warping process that allows for end-to-end training of the network. Through the effective incorporation of conventional multiview stereo concepts within a deep learning framework, DPSNet achieves state-of-the-art reconstruction results on a variety of challenging datasets.

URL

http://arxiv.org/abs/1905.00538

PDF

http://arxiv.org/pdf/1905.00538
Read All
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

2019-05-02

Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman

arXiv_AI

arXiv_AI Transfer_Learning
Abstract

In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced one year ago, offers a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently come close to the level of non-expert humans, suggesting limited headroom for further research. This paper recaps lessons learned from the GLUE benchmark and presents SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, improved resources, and a new public leaderboard. SuperGLUE will be available soon at super.gluebenchmark.com.

URL

http://arxiv.org/abs/1905.00537

PDF

http://arxiv.org/pdf/1905.00537
Read All
An Efficient Reachability-Based Framework for Provably Safe Autonomous Navigation in Unknown Environments

2019-05-01

Andrea Bajcsy, Somil Bansal, Eli Bronstein, Varun Tolani, Claire J. Tomlin

arXiv_RO

arXiv_RO Face
Abstract

Real-world autonomous vehicles often operate in a priori unknown environments. Since most of these systems are safety-critical, it is important to ensure they operate safely in the face of environment uncertainty, such as unseen obstacles. Current safety analysis tools enable autonomous systems to reason about safety given full information about the state of the environment a priori. However, these tools do not scale well to scenarios where the environment is being sensed in real time, such as during navigation tasks. In this work, we propose a novel, real-time safety analysis method based on Hamilton-Jacobi reachability that provides strong safety guarantees despite environment uncertainty. Our safety method is planner-agnostic and provides guarantees for a variety of mapping sensors. We demonstrate our approach in simulation and in hardware to provide safety guarantees around a state-of-the-art vision-based, learning-based planner.

URL

http://arxiv.org/abs/1905.00532

PDF

http://arxiv.org/pdf/1905.00532
Read All
Dynamic Transfer Learning for Named Entity Recognition

2019-05-01

Parminder Bhatia, Kristjan Arumae, Busra Celikkaya

arXiv_CL

arXiv_CL Transfer_Learning Optimization Recognition
Abstract

State-of-the-art named entity recognition (NER) systems have been improving continuously using neural architectures over the past several years. However, many tasks including NER require large sets of annotated data to achieve such performance. In particular, we focus on NER from clinical notes, which is one of the most fundamental and critical problems for medical text analysis. Our work centers on effectively adapting these neural architectures towards low-resource settings using parameter transfer methods. We complement a standard hierarchical NER model with a general transfer learning framework consisting of parameter sharing between the source and target tasks, and showcase scores significantly above the baseline architecture. These sharing schemes require an exponential search over tied parameter sets to generate an optimal configuration. To mitigate the problem of exhaustively searching for model optimization, we propose the Dynamic Transfer Networks (DTN), a gated architecture which learns the appropriate parameter sharing scheme between source and target datasets. DTN achieves the improvements of the optimized transfer learning framework with just a single training setting, effectively removing the need for exponential search.

URL

http://arxiv.org/abs/1812.05288

PDF

http://arxiv.org/pdf/1812.05288
Read All
AI-Powered Text Generation for Harmonious Human-Machine Interaction: Current State and Future Directions

2019-05-01

Qiuyun Zhang, Bin Guo, Hao Wang, Yunji Liang, Shaoyang Hao, Zhiwen Yu

arXiv_AI

arXiv_AI Text_Generation Survey Deep_Learning
Abstract

In the last two decades, the landscape of text generation has undergone tremendous changes and is being reshaped by the success of deep learning. New technologies for text generation, ranging from template-based methods to neural network-based methods, have emerged. Meanwhile, the research objectives have also changed from generating smooth and coherent sentences to infusing personalized traits to enrich the diversity of newly generated content. With the rapid development of text generation solutions, a comprehensive survey is urgently needed to summarize the achievements and track the state of the art. In this survey paper, we present a general systematic framework, illustrate the widely utilized models and summarize the classic applications of text generation.

URL

http://arxiv.org/abs/1905.01984

PDF

http://arxiv.org/pdf/1905.01984
Read All
Smoothed Dilated Convolutions for Improved Dense Prediction

2019-05-01

Zhengyang Wang, Shuiwang Ji

arXiv_CV

arXiv_CV CNN Prediction
Abstract

Dilated convolutions, also known as atrous convolutions, have been widely explored in deep convolutional neural networks (DCNNs) for various dense prediction tasks. However, dilated convolutions suffer from gridding artifacts, which hamper performance. In this work, we propose two simple yet effective degridding methods by studying a decomposition of dilated convolutions. Unlike existing models, which explore solutions by focusing on a block of cascaded dilated convolutional layers, our methods address the gridding artifacts by smoothing the dilated convolution itself. In addition, we point out that the two degridding approaches are intrinsically related and define separable and shared (SS) operations, which generalize the proposed methods. We further explore SS operations in view of operations on graphs and propose the SS output layer, which is able to smooth the entire DCNN by only replacing the output layer. We evaluate our degridding methods and the SS output layer thoroughly, and visualize the smoothing effect through effective receptive field analysis. Results show that our degridding methods yield consistent improvements in the performance of dense prediction tasks, while adding negligible amounts of extra training parameters. The SS output layer also improves performance significantly and is very efficient in terms of the number of training parameters.
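To make the decomposition idea concrete, here is a minimal one-dimensional sketch, written under the assumption of "valid" (no padding) cross-correlation. It shows that a dilated convolution with rate r gives the same result as splitting the input into r interleaved subsequences, running an ordinary convolution on each, and interleaving the outputs again. Because each output then depends on only one subsequence, neighboring outputs share no inputs, which is the source of gridding. The helper names are made up for this example, and the paper's smoothing methods are not reproduced here.

```python
def conv1d(x, w, rate=1):
    """Valid 1-D cross-correlation with dilation `rate`."""
    span = (len(w) - 1) * rate
    return [sum(w[k] * x[i + k * rate] for k in range(len(w)))
            for i in range(len(x) - span)]


def dilated_via_decomposition(x, w, rate):
    """Same result as conv1d(x, w, rate), computed phase by phase."""
    out_len = max(0, len(x) - (len(w) - 1) * rate)
    y = [0] * out_len
    for p in range(rate):
        # Subsample every `rate`-th element starting at offset p,
        # then apply an ordinary (rate 1) convolution to that phase.
        y_p = conv1d(x[p::rate], w)
        # Interleave the phase outputs back into the full output.
        for j, v in enumerate(y_p):
            y[p + j * rate] = v
    return y


x = [1, 4, 2, 8, 5, 7, 3, 6, 9, 0]
w = [1, -2, 3]
print(conv1d(x, w, rate=2))
print(dilated_via_decomposition(x, w, rate=2))
```

Both calls print the same list, which confirms the equivalence for this input.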

URL

http://arxiv.org/abs/1808.08931

PDF

http://arxiv.org/pdf/1808.08931
Read All
RRPN: Radar Region Proposal Network for Object Detection in Autonomous Vehicles

2019-05-01

Ramin Nabati, Hairong Qi

arXiv_CV

arXiv_CV Object_Detection Detection
Abstract

Region proposal algorithms play an important role in most state-of-the-art two-stage object detection networks by hypothesizing object locations in each image. Nonetheless, region proposal generators are known to be the bottleneck in these two-stage object detection networks, making them slow and not suitable for real-time applications such as autonomous vehicles. In this paper we introduce a Radar-based real-time region proposal algorithm for object detection in autonomous vehicles. The proposed Regions of Interest (RoI) are generated by mapping Radar detections to the image coordinate system and generating pre-defined anchor boxes as object proposals at each mapped Radar point. We then perform transformation and scaling operations on the generated anchors based on objects’ distance to provide a better fit for the detected objects. We evaluate our method on the newly released NuScenes dataset using the Fast R-CNN object detection network. Compared to the Selective Search object proposal algorithm, our model operates more than 100x faster while achieving higher detection precision and recall. Code has been made publicly available at https://github.com/mrnabati/RRPN.
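The proposal step described above can be sketched roughly as follows. This is not the released RRPN code: it assumes Radar points already expressed in the camera frame, a simple pinhole projection with hypothetical intrinsics `fx, fy, cx, cy`, and an anchor scale inversely proportional to depth controlled by an assumed constant `alpha`. The abstract only says anchors are scaled based on distance, so the exact rule here is an illustration.

```python
def radar_proposals(points, fx, fy, cx, cy, anchor_sizes, alpha=10.0):
    """Generate (x1, y1, x2, y2) boxes centered on projected radar points.

    points: iterable of (x, y, z) in the camera frame, z = depth in meters.
    anchor_sizes: list of (width, height) in pixels at unit scale.
    alpha: assumed constant; box scale is alpha / z (closer = larger).
    """
    proposals = []
    for x, y, z in points:
        if z <= 0:
            continue  # point is behind the camera, cannot be projected
        # Pinhole projection into image coordinates.
        u = fx * x / z + cx
        v = fy * y / z + cy
        scale = alpha / z
        for w, h in anchor_sizes:
            sw, sh = w * scale, h * scale
            proposals.append((u - sw / 2, v - sh / 2, u + sw / 2, v + sh / 2))
    return proposals


boxes = radar_proposals([(1.0, 0.0, 10.0)], 100.0, 100.0, 50.0, 50.0,
                        [(20.0, 10.0)])
print(boxes)
```

Since the number of proposals is tied to the number of Radar detections rather than to an exhaustive image search, a scheme like this produces far fewer candidates per frame, which is consistent with the speed advantage the abstract reports.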

URL

http://arxiv.org/abs/1905.00526

PDF

http://arxiv.org/pdf/1905.00526
Read All
3D BAT: A Semi-Automatic, Web-based 3D Annotation Toolbox for Full-Surround, Multi-Modal Data Streams

2019-05-01

Walter Zimmer, Akshay Rangesh, Mohan Trivedi

arXiv_CV

arXiv_CV Tracking Prediction
Abstract

In this paper, we focus on obtaining 2D and 3D labels, as well as track IDs for objects on the road with the help of a novel 3D Bounding Box Annotation Toolbox (3D BAT). Our open source, web-based 3D BAT incorporates several smart features to improve usability and efficiency. For instance, this annotation toolbox supports semi-automatic labeling of tracks using interpolation, which is vital for downstream tasks like tracking, motion planning and motion prediction. Moreover, annotations for all camera images are automatically obtained by projecting annotations from 3D space into the image domain. In addition to the raw image and point cloud feeds, a Masterview consisting of the top view (bird’s-eye-view), side view and front views is made available to observe objects of interest from different perspectives. Comparisons of our method with other publicly available annotation tools reveal that 3D annotations can be obtained faster and more efficiently by using our toolbox.
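Semi-automatic track labeling by interpolation can be pictured with a small sketch: an annotator places a box on a few keyframes and the frames in between are filled in linearly. The function below is a hypothetical illustration, not code from 3D BAT; it interpolates only the box center `(x, y, z)` and ignores size and heading.

```python
def interpolate_track(keyframes):
    """Linearly interpolate box centers between annotated keyframes.

    keyframes: dict mapping frame index -> (x, y, z) box center.
    Returns a dict with an entry for every frame between the first
    and last keyframe, inclusive.
    """
    frames = sorted(keyframes)
    if not frames:
        return {}
    track = {}
    for f0, f1 in zip(frames, frames[1:]):
        b0, b1 = keyframes[f0], keyframes[f1]
        for f in range(f0, f1):
            t = (f - f0) / (f1 - f0)  # 0 at f0, approaches 1 near f1
            track[f] = tuple(a + t * (b - a) for a, b in zip(b0, b1))
    # The last keyframe is not covered by the loop above.
    track[frames[-1]] = tuple(keyframes[frames[-1]])
    return track


track = interpolate_track({0: (0.0, 0.0, 0.0), 4: (4.0, 2.0, 0.0)})
print(track[2])
```

In this example two manual annotations yield five labeled frames, which shows where the labeling time is saved.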

URL

http://arxiv.org/abs/1905.00525

PDF

http://arxiv.org/pdf/1905.00525
Read All
Optimal Multi-view Correction of Local Affine Frames

2019-05-01

Ivan Eichhardt, Daniel Barath

arXiv_CV

arXiv_CV Object_Detection Face Pose_Estimation Detection
Abstract

The technique requires the epipolar geometry to be pre-estimated between each image pair. It exploits the constraints which the camera movement implies, in order to apply a closed-form correction to the parameters of the input affinities. Also, it is shown that the rotations and scales obtained by partially affine-covariant detectors, e.g., AKAZE or SIFT, can be completed to be full affine frames by the proposed algorithm. It is validated both in synthetic experiments and on publicly available real-world datasets that the method always improves the output of the evaluated affine-covariant feature detectors. As a by-product, these detectors are compared and the ones obtaining the most accurate affine frames are reported. For demonstrating the applicability, we show that the proposed technique as a pre-processing step improves the accuracy of pose estimation for a camera rig, surface normal and homography estimation.

URL

http://arxiv.org/abs/1905.00519

PDF

http://arxiv.org/pdf/1905.00519
Read All
From Abstractions to 'Natural Languages' for Planning Agents

2019-05-01

Yu Zhang, Li Wang

arXiv_AI

arXiv_AI
Abstract

Despite our unique ability to use natural languages, we know little about their origins, such as how they were created and how they evolved. The answer lies deep in the evolution of our cognitive and social abilities over a very long period of time, which is beyond our scrutiny. Existing studies on the origin of languages often focus on the emergence of specific language features (such as recursion) without supporting a comprehensive view. Investigation of restricted language representations, such as temporal logic, unfortunately does not reveal much about the impetus underlying language formation and evolution, since much of their construction is based on natural languages themselves. In this paper, we investigate the origin of “natural languages” in a restricted setting involving only planning agents. Similar to a common view that considers languages as a tool for grounding symbols to semantic meanings, we take the view that a language for planning agents is a tool for grounding symbols to physical configurations. From this perspective, a language is used by the agents to coordinate their behaviors during planning. With a few assumptions, we show that language is closely connected to a type of domain abstraction, based on which a language can be constructed. We study how such abstractions can be identified and discuss how to use them during planning. We apply our method to several domains and discuss the results and the relaxation of the assumptions made.

URL

http://arxiv.org/abs/1905.00517

PDF

http://arxiv.org/pdf/1905.00517
Read All
Land Use and Land Cover Classification Using Deep Learning Techniques

2019-05-01

Nagesh Kumar Uba

arXiv_CV

arXiv_CV CNN Classification Deep_Learning
Abstract

Large datasets of sub-meter aerial imagery represented as orthophoto mosaics are widely available today, and these datasets may hold a great deal of untapped information. This imagery has the potential to locate several types of features, for example forests, parking lots, airports, residential areas, or freeways. However, the appearance of these features varies with many factors, including the time at which the image is captured, the sensor settings, the processing done to rectify the image, and the geographical and cultural context of the region captured by the image. This thesis explores the use of deep convolutional neural networks to classify land use from very high spatial resolution (VHR), orthorectified, visible band multispectral imagery. Recent technological and commercial applications have driven the collection of a massive amount of VHR images in the visible red, green, blue (RGB) spectral bands. This work explores the potential for deep learning algorithms to exploit this imagery for automatic land use/land cover (LULC) classification.

URL

http://arxiv.org/abs/1905.00510

PDF

http://arxiv.org/pdf/1905.00510
Read All
Functional Object-Oriented Network: Considering Robot's Capability in Human-Robot Collaboration

2019-05-01

David Paulius, Kelvin Sheng Pei Dong, Yu Sun

arXiv_AI

arXiv_AI Knowledge
Abstract

In this work, we explore human-robot collaborative planning using the functional object-oriented network (FOON), a graphical knowledge representation for manipulations that can be performed by domestic robots. The knowledge retrieval procedure, used for acquiring the necessary steps (as a task tree) to solve a given problem, is modified to account for weights that reflect the difficulty of performing motions in a universal FOON. These weights are given as success rates, which describe the likelihood of a robot successfully completing the action(s) on its own. However, certain manipulations may be too difficult for the robot to perform on its own because of its physical limitations. To make the task easier for the robot, actions with low success rates are identified and assigned to a human, who assists only to the minimal extent needed to bring the activity to completion. Our experiments show that tasks can be executed successfully with the aid of the assistant. Our results show that the best task tree can be found with an adequate chance of success in completing three activities while minimizing the effort needed from the human assistant.

URL

http://arxiv.org/abs/1905.00502

PDF

http://arxiv.org/pdf/1905.00502
Read All
Hierarchically Consistent Motion Primitives for Quadrotor Coordination

2019-05-01

Marijan Vukosavljev, Angela P. Schoellig, Mireille E. Broucke

arXiv_RO

arXiv_RO
Abstract

We present a hierarchical framework for motion planning of a large collection of agents. The proposed framework starts from low-level motion primitives over a gridded workspace and provides a set of rules for constructing higher-level motion primitives. Our hierarchical approach is highly scalable and robust, making it an ideal tool for planning for multi-agent systems. Results are demonstrated experimentally on a collection of quadrotors that must navigate a cluttered environment while maintaining a formation.

URL

http://arxiv.org/abs/1905.00500

PDF

http://arxiv.org/pdf/1905.00500
Read All
