Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

Synchronous Bidirectional Neural Machine Translation

2019-05-13

Long Zhou, Jiajun Zhang, Chengqing Zong

arXiv_AI

arXiv_AI NMT
Abstract

Existing approaches to neural machine translation (NMT) generate the target language sequence token by token from left to right. However, this kind of unidirectional decoding framework cannot make full use of the target-side future contexts which can be produced in a right-to-left decoding direction, and thus suffers from the issue of unbalanced outputs. In this paper, we introduce a synchronous bidirectional neural machine translation (SB-NMT) that predicts its outputs using left-to-right and right-to-left decoding simultaneously and interactively, in order to leverage both of the history and future information at the same time. Specifically, we first propose a new algorithm that enables synchronous bidirectional decoding in a single model. Then, we present an interactive decoding model in which left-to-right (right-to-left) generation does not only depend on its previously generated outputs, but also relies on future contexts predicted by right-to-left (left-to-right) decoding. We extensively evaluate the proposed SB-NMT model on large-scale NIST Chinese-English, WMT14 English-German, and WMT18 Russian-English translation tasks. Experimental results demonstrate that our model achieves significant improvements over the strong Transformer model by 3.92, 1.49 and 1.04 BLEU points respectively, and obtains the state-of-the-art performance on Chinese-English and English-German translation tasks.

URL

http://arxiv.org/abs/1905.04847

PDF

http://arxiv.org/pdf/1905.04847
Transfer Learning for Scientific Data Chain Extraction in Small Chemical Corpus with BERT-CRF Model

2019-05-13

Na Pang, Li Qian, Weimin Lyu, Jin-Dong Yang

arXiv_CL

arXiv_CL Knowledge Transfer_Learning Relation
Abstract

Computational chemistry has developed rapidly in recent years due to the rapid growth and breakthroughs in AI. Thanks to the progress in natural language processing, researchers can extract more fine-grained knowledge from publications to stimulate the development of computational chemistry. While existing work and corpora in chemical entity extraction have been restricted to the biomedicine or life science fields rather than the chemistry field, we build a new corpus in the chemical bond field annotated for 7 types of entities: compound, solvent, method, bond, reaction, pKa and pKa value. This paper presents a novel BERT-CRF model to build scientific chemical data chains by extracting 7 chemical entities and relations from publications. We also propose a joint model to extract the entities and relations simultaneously. Experimental results on our Chemical Special Corpus demonstrate that we achieve state-of-the-art and competitive NER performance.

URL

https://arxiv.org/abs/1905.05615

PDF

https://arxiv.org/pdf/1905.05615
Learning and Exploiting Multiple Subgoals for Fast Exploration in Hierarchical Reinforcement Learning

2019-05-13

Libo Xing

arXiv_AI

arXiv_AI Sparse Reinforcement_Learning
Abstract

Hierarchical Reinforcement Learning (HRL) exploits temporally extended actions, or options, to make decisions from a higher-dimensional perspective to alleviate the sparse reward problem, one of the most challenging problems in reinforcement learning. The majority of existing HRL algorithms require either significant manual design with respect to the specific environment or enormous exploration to automatically learn options from data. To achieve fast exploration without using manual design, we devise a multi-goal HRL algorithm, consisting of a high-level policy Manager and a low-level policy Worker. The Manager provides the Worker multiple subgoals at each time step. Each subgoal corresponds to an option to control the environment. Although the agent may show some confusion at the beginning of training since it is guided by three diverse subgoals, the agent’s behavior policy will quickly learn how to respond to multiple subgoals from the high-level controller on different occasions. By exploiting multiple subgoals, the exploration efficiency is significantly improved. We conduct experiments in Atari’s Montezuma’s Revenge environment, a well-known sparse reward environment, and in doing so achieve the same performance as state-of-the-art HRL methods with substantially reduced training time cost.

URL

https://arxiv.org/abs/1905.05180

PDF

https://arxiv.org/pdf/1905.05180
Extending Policy from One-Shot Learning through Coaching

2019-05-13

Mythra V. Balakuntala, Vishnunandan L. N. Venkatesh, Jyothsna Padmakumar Bindu, Richard M. Voyles, Juan Wachs

arXiv_RO

arXiv_RO Reinforcement_Learning
Abstract

Humans generally teach their fellow collaborators to perform tasks through a small number of demonstrations. The learnt task is corrected or extended to meet specific task goals by means of coaching. Adopting a similar framework for teaching robots through demonstrations and coaching makes teaching tasks highly intuitive. Unlike traditional Learning from Demonstration (LfD) approaches which require multiple demonstrations, we present a one-shot learning from demonstration approach to learn tasks. The learnt task is corrected and generalized using two layers of evaluation/modification. First, the robot self-evaluates its performance and corrects the performance to be closer to the demonstrated task. Then, coaching is used as a means to extend the policy learnt to be adaptable to varying task goals. Both the self-evaluation and coaching are implemented using reinforcement learning (RL) methods. Coaching is achieved through human feedback on desired goal and action modification to generalize to specified task goals. The proposed approach is evaluated with a scooping task, by presenting a single demonstration. The self-evaluation framework aims to reduce the resistance to scooping in the media. To reduce the search space for RL, we bootstrap the search using least resistance path obtained using resistive force theory. Coaching is used to generalize the learnt task policy to transfer the desired quantity of material. Thus, the proposed method provides a framework for learning tasks from one demonstration and generalizing it using human feedback through coaching.

URL

http://arxiv.org/abs/1905.04841

PDF

http://arxiv.org/pdf/1905.04841
Evidence Propagation and Consensus Formation in Noisy Environments

2019-05-13

Michael Crosscombe, Jonathan Lawry

arXiv_AI

arXiv_AI
Abstract

We study the effectiveness of consensus formation in multi-agent systems where there is both belief updating based on direct evidence and also belief combination between agents. In particular, we consider the scenario in which a population of agents collaborate on the best-of-n problem where the aim is to reach a consensus about which is the best (alternatively, true) state from amongst a set of states, each with a different quality value (or level of evidence). Agents’ beliefs are represented within Dempster-Shafer theory by mass functions and we investigate the macro-level properties of four well-known belief combination operators for this multi-agent consensus formation problem: Dempster’s rule, Yager’s rule, Dubois & Prade’s operator and the averaging operator. The convergence properties of the operators are considered and simulation experiments are conducted for different evidence rates and noise levels. Results show that a combination of updating from direct evidence and belief combination between agents results in better consensus to the best state than does evidence updating alone. We also find that in this framework the operators are robust to noise. Broadly, Dubois & Prade’s operator results in better convergence to the best state. Finally, we consider how well the Dempster-Shafer approach to the best-of-n problem scales to large numbers of states.
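
As a rough illustration of the belief combination step (not the authors' code), the sketch below implements two of the four operators named in the abstract, Dempster's rule and the averaging operator, for mass functions stored as Python dictionaries. The dictionary representation, the function names `dempster_combine` and `average_combine`, and the two-state example are assumptions made for this note.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of states (a focal set) to its mass.
    Mass assigned to empty intersections is treated as conflict and the
    remaining mass is renormalised.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


def average_combine(m1, m2):
    """Averaging operator: the mean of the two masses on every focal set."""
    focal_sets = set(m1) | set(m2)
    return {f: 0.5 * (m1.get(f, 0.0) + m2.get(f, 0.0)) for f in focal_sets}


# Hypothetical example with two states, "A" and "B".
m1 = {frozenset({"A"}): 0.6, frozenset({"A", "B"}): 0.4}
m2 = {frozenset({"B"}): 0.5, frozenset({"A", "B"}): 0.5}
fused = dempster_combine(m1, m2)
averaged = average_combine(m1, m2)
```

In this example the pair ({A}, {B}) has an empty intersection, so its product 0.3 counts as conflict and the remaining masses are divided by 0.7. The averaging operator never discards mass, which is one reason the operators can behave differently under noise.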

URL

http://arxiv.org/abs/1905.04840

PDF

http://arxiv.org/pdf/1905.04840
Challenges in Building Intelligent Open-domain Dialog Systems

2019-05-13

Minlie Huang, Xiaoyan Zhu, Jianfeng Gao

arXiv_AI

arXiv_AI Review
Abstract

There is a resurgent interest in developing intelligent open-domain dialog systems due to the availability of large amounts of conversational data and the recent progress on neural approaches to conversational AI. Unlike traditional task-oriented bots, an open-domain dialog system aims to establish long-term connections with users by satisfying the human need for communication, affection, and social belonging. This paper reviews the recent works on neural approaches that are devoted to addressing three challenges in developing such systems: semantics, consistency, and interactiveness. Semantics requires a dialog system to not only understand the content of the dialog but also identify users’ social needs during the conversation. Consistency requires the system to demonstrate a consistent personality to win users’ trust and gain their long-term confidence. Interactiveness refers to the system’s ability to generate interpersonal responses to achieve particular social goals such as entertainment, conforming, and task completion. The works we select to present here are based on our unique views and are by no means complete. Nevertheless, we hope that the discussion will inspire new research in developing more intelligent dialog systems.

URL

https://arxiv.org/abs/1905.05709

PDF

https://arxiv.org/pdf/1905.05709
A feature agnostic approach for glaucoma detection in OCT volumes

2019-05-13

Stefan Maetschke, Bhavna Antony, Hiroshi Ishikawa, Gadi Wollstein, Joel S. Schuman, Rahil Garnavi

arXiv_CV

arXiv_CV Segmentation GAN CNN Classification Deep_Learning Detection
Abstract

Optical coherence tomography (OCT) based measurements of retinal layer thickness, such as the retinal nerve fibre layer (RNFL) and the ganglion cell with inner plexiform layer (GCIPL) are commonly used for the diagnosis and monitoring of glaucoma. Previously, machine learning techniques have utilized segmentation-based imaging features such as the peripapillary RNFL thickness and the cup-to-disc ratio. Here, we propose a deep learning technique that classifies eyes as healthy or glaucomatous directly from raw, unsegmented OCT volumes of the optic nerve head (ONH) using a 3D Convolutional Neural Network (CNN). We compared the accuracy of this technique with various feature-based machine learning algorithms and demonstrated the superiority of the proposed deep learning based method. Logistic regression was found to be the best performing classical machine learning technique with an AUC of 0.89. In direct comparison, the deep learning approach achieved a substantially higher AUC of 0.94 with the additional advantage of providing insight into which regions of an OCT volume are important for glaucoma detection. Computing Class Activation Maps (CAM), we found that the CNN identified neuroretinal rim and optic disc cupping as well as the lamina cribrosa (LC) and its surrounding areas as the regions significantly associated with the glaucoma classification. These regions anatomically correspond to the well established and commonly used clinical markers for glaucoma diagnosis such as increased cup volume, cup diameter, and neuroretinal rim thinning at the superior and inferior segments.

URL

http://arxiv.org/abs/1807.04855

PDF

http://arxiv.org/pdf/1807.04855
Multi-Agent Image Classification via Reinforcement Learning

2019-05-13

Hossein K. Mousavi, Mohammadreza Nazari, Martin Takáč, Nader Motee

arXiv_CV

arXiv_CV Reinforcement_Learning Image_Classification Classification
Abstract

We investigate a classification problem using multiple mobile agents that are capable of collecting (partial) pose-dependent observations of an unknown environment. The objective is to classify an image (e.g., a map of a large area) over a finite time horizon. We propose a network architecture that specifies how agents should form a local belief, take local actions, and extract relevant features and specification from their raw partial observations. Agents are allowed to exchange information with their neighboring agents and run a decentralized consensus protocol to update their own beliefs. It is shown how reinforcement learning techniques can be utilized to achieve decentralized implementation of the classification problem. Our experimental results on the MNIST handwritten digit dataset demonstrate the effectiveness of our proposed framework.
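
The abstract does not say which consensus protocol the agents run. As a generic illustration only, the sketch below uses plain neighbor averaging: in each round every agent replaces its class belief with the mean of its own belief and those of its neighbors. The function name `consensus_step`, the three-agent line graph, and the initial beliefs are all hypothetical.

```python
def consensus_step(beliefs, neighbors):
    """One synchronous round: each agent averages its belief with its neighbors'.

    beliefs   -- list of per-agent probability vectors over the classes
    neighbors -- dict mapping an agent index to the indices it can talk to
    """
    updated = []
    for i, belief in enumerate(beliefs):
        group = [belief] + [beliefs[j] for j in neighbors[i]]
        updated.append([sum(column) / len(group) for column in zip(*group)])
    return updated


# Three agents on a line graph (0 - 1 - 2) with two candidate classes.
beliefs = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
for _ in range(50):
    beliefs = consensus_step(beliefs, neighbors)
```

On a connected graph this update drives all agents to a common belief. The agreed value is a weighted average of the initial beliefs (better-connected agents weigh more), not necessarily their arithmetic mean.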

URL

http://arxiv.org/abs/1905.04835

PDF

http://arxiv.org/pdf/1905.04835
Learning and Planning in Feature Deception Games

2019-05-13

Zheyuan Ryan Shi, Ariel D. Procaccia, Kevin S. Chan, Sridhar Venkatesan, Noam Ben-Asher, Nandi O. Leslie, Charles Kamhoua, Fei Fang

arXiv_AI

arXiv_AI Adversarial
Abstract

Today’s high-stakes adversarial interactions feature attackers who constantly breach the ever-improving security measures. Deception mitigates the defender’s loss by misleading the attacker to make suboptimal decisions. In order to formally reason about deception, we introduce the feature deception game (FDG), a domain-independent game-theoretic model and present a learning and planning framework. We make the following contributions. (1) We show that we can uniformly learn the adversary’s preferences using data from a modest number of deception strategies. (2) We propose an approximation algorithm for finding the optimal deception strategy and show that the problem is NP-hard. (3) We perform extensive experiments to empirically validate our methods and results.

URL

http://arxiv.org/abs/1905.04833

PDF

http://arxiv.org/pdf/1905.04833
A High-Efficiency Framework for Constructing Large-Scale Face Parsing Benchmark

2019-05-13

Yinglu Liu, Hailin Shi, Yue Si, Hao Shen, Xiaobo Wang, Tao Mei

arXiv_CV

arXiv_CV Knowledge Face Detection Face_Detection Recognition Face_Recognition
Abstract

Face parsing, which assigns a semantic label to each pixel in face images, has recently attracted increasing interest due to its huge application potential. Although many face-related fields (e.g., face recognition and face detection) have been well studied for many years, the existing datasets for face parsing are still severely limited in scale and quality; e.g., the widely used Helen dataset contains only 2,330 images. This is mainly because pixel-level annotation is costly and time-consuming, especially for facial parts without clear boundaries. The lack of accurately annotated datasets has become a major obstacle to progress in the face parsing task. A feasible way is to utilize dense facial landmarks to guide the parsing annotation. However, annotating dense landmarks on human faces encounters the same issues as the parsing annotation. To overcome the above problems, in this paper, we develop a high-efficiency framework for face parsing annotation, which considerably simplifies and speeds up the parsing annotation by two consecutive modules. Benefiting from the proposed framework, we construct a new Dense Landmark Guided Face Parsing (LaPa) benchmark. It consists of 22,000 face images with large variations in expression, pose, occlusion, etc. Each image is provided with accurate annotation of an 11-category pixel-level label map along with coordinates of 106-point landmarks. To the best of our knowledge, it is currently the largest public dataset for face parsing. To make full use of our LaPa dataset with abundant face shape and boundary priors, we propose a simple yet effective Boundary-Sensitive Parsing Network (BSPNet). Our network is taken as a baseline model on the proposed LaPa dataset, and meanwhile, it achieves the state-of-the-art performance on the Helen dataset without resorting to extra face alignment.

URL

http://arxiv.org/abs/1905.04830

PDF

http://arxiv.org/pdf/1905.04830
Leveraging synthetic imagery for collision-at-sea avoidance

2019-05-13

Chris M. Ward, Josh Harguess, Alexander G. Corelli

arXiv_CV

arXiv_CV CNN
Abstract

Maritime collisions involving multiple ships are considered rare, but in 2017 several United States Navy vessels were involved in fatal at-sea collisions that resulted in the death of seventeen American Servicemembers. The experimentation introduced in this paper is a direct response to these incidents. We propose a shipboard Collision-At-Sea avoidance system, based on video image processing, that will help ensure the safe stationing and navigation of maritime vessels. Our system leverages a convolutional neural network trained on synthetic maritime imagery in order to detect nearby vessels within a scene, perform heading analysis of detected vessels, and provide an alert in the presence of an inbound vessel. Additionally, we present the Navigational Hazards - Synthetic (NAVHAZ-Synthetic) dataset. This dataset comprises one million annotated images of ten vessel classes observed from virtual vessel-mounted cameras, as well as a human “Topside Lookout” perspective. NAVHAZ-Synthetic includes imagery displaying varying sea-states, lighting conditions, and optical degradations such as fog, sea-spray, and salt-accumulation. We present our results on the use of synthetic imagery in a computer vision based collision-at-sea warning system with promising performance.

URL

http://arxiv.org/abs/1905.04828

PDF

http://arxiv.org/pdf/1905.04828
Draining the Water Hole: Mitigating Social Engineering Attacks

2019-05-13

Zheyuan Ryan Shi, Aaron Schlenker, Brian Hay, Fei Fang

arXiv_AI

arXiv_AI Salient GAN
Abstract

Cyber adversaries have increasingly leveraged social engineering attacks to breach large organizations and threaten the well-being of today’s online users. One clever technique, the “watering hole” attack, compromises a legitimate website to execute drive-by download attacks by redirecting users to another malicious domain. We introduce a game-theoretic model that captures the salient aspects for an organization protecting itself from a watering hole attack by altering the environment information in web traffic so as to deceive the attackers. Our main contributions are (1) a novel Social Engineering Deception (SED) game model that features a continuous action set for the attacker, (2) an in-depth analysis of the SED model to identify computationally feasible real-world cases, and (3) an iterative algorithm which solves for the optimal protection policy using (i) a characterization of websites that may be compromised, (ii) an LP-relaxation with optimality condition, and (iii) the column generation method. A Chrome extension is being built to field our algorithms in the real world.

URL

http://arxiv.org/abs/1901.00586

PDF

http://arxiv.org/pdf/1901.00586
Task-Agnostic Dynamics Priors for Deep Reinforcement Learning

2019-05-13

Yilun Du, Karthik Narasimhan

arXiv_AI

arXiv_AI Reinforcement_Learning Prediction
Abstract

While model-based deep reinforcement learning (RL) holds great promise for sample efficiency and generalization, learning an accurate dynamics model is often challenging and requires substantial interaction with the environment. A wide variety of domains have dynamics that share common foundations like the laws of classical mechanics, which are rarely exploited by existing algorithms. In fact, humans continuously acquire and use such dynamics priors to easily adapt to operating in new environments. In this work, we propose an approach to learn task-agnostic dynamics priors from videos and incorporate them into an RL agent. Our method involves pre-training a frame predictor on task-agnostic physics videos to initialize dynamics models (and fine-tune them) for unseen target environments. Our frame prediction architecture, SpatialNet, is designed specifically to capture localized physical phenomena and interactions. Our approach allows for both faster policy learning and convergence to better policies, outperforming competitive approaches on several different environments. We also demonstrate that incorporating this prior allows for more effective transfer between environments.

URL

http://arxiv.org/abs/1905.04819

PDF

http://arxiv.org/pdf/1905.04819
Programmable Spectrometry -- Per-pixel Classification of Materials using Learned Spectral Filters

2019-05-13

Vishwanath Saragadam, Aswin C. Sankaranarayanan

arXiv_CV

arXiv_CV Classification
Abstract

Many materials have distinct spectral profiles. This facilitates estimation of the material composition of a scene at each pixel by first acquiring its hyperspectral image, and subsequently filtering it using a bank of spectral profiles. This process is inherently wasteful since only a set of linear projections of the acquired measurements contribute to the classification task. We propose a novel programmable camera that is capable of producing images of a scene with an arbitrary spectral filter. We use this camera to optically implement the spectral filtering of the scene’s hyperspectral image with the bank of spectral profiles needed to perform per-pixel material classification. This provides gains both in terms of acquisition speed — since only the relevant measurements are acquired — and in signal-to-noise ratio — since we invariably avoid narrowband filters that are light inefficient. Given training data, we use a range of classical and modern techniques including SVMs and neural networks to identify the bank of spectral profiles that facilitate material classification. We verify the method in simulations on standard datasets as well as real data using a lab prototype of the camera.
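
The observation that only a few linear projections of each pixel's spectrum contribute to classification can be illustrated in software. This is a toy sketch, not the authors' optical pipeline: `project` computes one inner product per spectral filter, and `classify_pixel` picks the class with the highest score. The four-band spectra, the signed filter weights, and the zero biases are invented for illustration.

```python
def project(spectrum, filters):
    """Return one inner product per spectral filter (one value per class)."""
    return [sum(s * f for s, f in zip(spectrum, filt)) for filt in filters]


def classify_pixel(spectrum, filters, biases):
    """Linear per-pixel classifier: index of the largest projection plus bias."""
    scores = [p + b for p, b in zip(project(spectrum, filters), biases)]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical bank of two filters over four spectral bands.
filters = [
    [1.0, 1.0, -1.0, -1.0],   # responds to energy in the first two bands
    [-1.0, -1.0, 1.0, 1.0],   # responds to energy in the last two bands
]
biases = [0.0, 0.0]

label_a = classify_pixel([0.8, 0.7, 0.1, 0.2], filters, biases)
label_b = classify_pixel([0.1, 0.2, 0.9, 0.6], filters, biases)
```

Here the classifier needs only two numbers per pixel, one per filter, rather than the full four-band spectrum. That is the saving the abstract attributes to measuring the projections directly.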

URL

http://arxiv.org/abs/1905.04815

PDF

http://arxiv.org/pdf/1905.04815
Video Instance Segmentation

2019-05-12

Linjie Yang, Yuchen Fan, Ning Xu

arXiv_CV

arXiv_CV Video_Caption Segmentation Tracking Detection
Abstract

In this paper we present a new computer vision task, named video instance segmentation. The goal of this new task is simultaneous detection, segmentation and tracking of instances in videos. In other words, it is the first time that the image instance segmentation problem is extended to the video domain. To facilitate research on this new task, we propose a large-scale benchmark called YouTube-VIS, which consists of 2883 high-resolution YouTube videos, a 40-category label set and 131k high-quality instance masks. In addition, we propose a novel algorithm called MaskTrack R-CNN for this task. Our new method introduces a new tracking branch to Mask R-CNN to jointly perform the detection, segmentation and tracking tasks. Finally, we evaluate the proposed method and several strong baselines on our new dataset. Experimental results clearly demonstrate the advantages of the proposed algorithm and reveal insight for future improvement. We believe the video instance segmentation task will motivate the community along the line of research for video understanding.

URL

http://arxiv.org/abs/1905.04804

PDF

http://arxiv.org/pdf/1905.04804
The Secret Lives of Names? Name Embeddings from Social Media

2019-05-12

Junting Ye, Steven Skiena

arXiv_CL

arXiv_CL Embedding
Abstract

Your name tells a lot about you: your gender, ethnicity and so on. It has been shown that name embeddings are more effective in representing names than traditional substring features. However, our previous name embedding model was trained on private email data and is not publicly accessible. In this paper, we explore learning name embeddings from public Twitter data. We argue that Twitter embeddings have two key advantages: (i) they can and will be publicly released to support the research community; (ii) even with a smaller training corpus, Twitter embeddings achieve similar performance on multiple tasks compared to email embeddings. As a test case to show the power of name embeddings, we investigate the modeling of lifespans. We find it interesting that adding name embeddings can further improve the performance of models using demographic features, which are traditionally used for lifespan modeling. Through residual analysis, we observe that fine-grained groups (potentially reflecting socioeconomic status) are the latent contributing factors encoded in name embeddings. These were previously hidden to demographic models, and may help to enhance the predictive power of a wide class of research studies.

URL

http://arxiv.org/abs/1905.04799

PDF

http://arxiv.org/pdf/1905.04799
DeepIlluminance: Contextual Illuminance Estimation via Deep Neural Networks

2019-05-12

Jun Zhang, Tong Zheng, Shengping Zhang, Meng Wang

arXiv_AI

arXiv_AI CNN Prediction
Abstract

Computational color constancy refers to the estimation of the scene illumination and makes the perceived color relatively stable under varying illumination. In the past few years, deep Convolutional Neural Networks (CNNs) have delivered superior performance in illuminant estimation. Several representative methods formulate it as a multi-label prediction problem by learning the local appearance of image patches using CNNs. However, these approaches inevitably make incorrect estimations for the ambiguous patches affected by their neighborhood contexts. Inaccurate local estimates are likely to degrade performance when combined into a global prediction. To address the above issues, we propose a contextual deep network for patch-based illuminant estimation equipped with refinement. First, the contextual net with a center-surround architecture extracts local contextual features from image patches, and generates initial illuminant estimates and the corresponding color corrected patches. The patches are sampled based on the observation that pixels with large color differences describe the illumination well. Then, the refinement net integrates the input patches with the corrected patches in conjunction with the use of intermediate features to improve the performance. To train such a network with numerous parameters, we propose a stage-wise training strategy, in which the features and the predicted illuminant from previous stages are provided to the next learning stage so that finer estimates are recovered. Experiments show that our approach obtains competitive performance on two illuminant estimation benchmarks.

URL

http://arxiv.org/abs/1905.04791

PDF

http://arxiv.org/pdf/1905.04791
Structure from Articulated Motion: An Accurate and Stable Monocular 3D Reconstruction Approach without Training Data

2019-05-12

Onorina Kovalenko, Vladislav Golyanik, Jameel Malik, Ahmed Elhayek, Didier Stricker

arXiv_CV

arXiv_CV
Abstract

Recovery of articulated 3D structure from 2D observations is a challenging computer vision problem with many applications. Current learning-based approaches achieve state-of-the-art performance on public benchmarks but are limited to the specific types of objects and motions covered by the training datasets. Model-based approaches do not rely on training data but show lower accuracy on public benchmarks. In this paper, we introduce a new model-based method called Structure from Articulated Motion (SfAM). SfAM includes a new articulated structure term which ensures consistency of bone lengths throughout the whole image sequence and recovers a scene-specific configuration of the articulated structure. The proposed approach is highly robust to noisy 2D annotations, generalizes to arbitrary objects and motion types and does not rely on training data. It achieves state-of-the-art accuracy and scales across different scenarios which is shown in extensive experiments on public benchmarks and real video sequences.

URL

http://arxiv.org/abs/1905.04789

PDF

http://arxiv.org/pdf/1905.04789
Failure-Tolerant Connectivity Maintenance for Robot Swarms

2019-05-12

Vivek Shankar Varadharajan, Bram Adams, Giovanni Beltrame

arXiv_RO

arXiv_RO
Abstract

Connectivity maintenance plays a key role in achieving a desired global behavior among a swarm of robots. However, connectivity maintenance in realistic environments is hampered by lack of computation resources, low communication bandwidth, robot failures, and unstable links. In this paper, we propose a novel decentralized connectivity-preserving algorithm that can be deployed on top of other behaviors to enforce connectivity constraints. The algorithm takes a set of targets to be reached while keeping a minimum number of redundant links between robots, with the goal of guaranteeing bandwidth and reliability. Robots then incrementally build and maintain a communication backbone with the specified number of links. We empirically study the performance of the algorithm, analyzing its time to convergence, as well as robustness to faults injected into the backbone robots. Our results statistically demonstrate the algorithm’s ability to preserve the desired connectivity constraints and to reach the targets with up to 70 percent of individual robot failures in the communication backbone.

URL

http://arxiv.org/abs/1905.04771

PDF

http://arxiv.org/pdf/1905.04771
Real-Time Kinodynamic Motion Planning for Omnidirectional Mobile Robot Soccer using Rapidly-Exploring Random Tree in Dynamic Environment with Moving Obstacles

2019-05-12

Fahri Ali Rahman, Igi Ardiyanto, Adha Imam Cahyadi

arXiv_RO

arXiv_RO Adversarial Tracking
Abstract

RoboCup Middle Size League (RoboCup MSL) provides a standardized testbed for research on mobile robot navigation, multi-robot cooperation, communication and integration via robot soccer competition in which the environment is highly dynamic and adversarial. One important research topic in this area is kinodynamic motion planning, which plans the trajectory of the robot while avoiding obstacles and obeying its dynamics. This work presents kinodynamic motion planning for an omnidirectional robot based on the kinodynamic-RRT* method. Trajectory tracking control to execute the planned trajectory is also considered. Robot motion planning in the translational and rotational directions is decoupled. We implement kinodynamic-RRT* with a double integrator model to plan the translational trajectory. The rotational trajectory is generated using a minimum-time trajectory generator satisfying velocity and acceleration constraints. The planned trajectory is then tracked using PI control. To address the changing environment, we developed a concurrent software module for motion planning and trajectory tracking. The resulting system was applied and tested using a RoboCup simulation system based on the Robot Operating System (ROS). The simulation results show that the motion planning system is able to generate a collision-free trajectory and the trajectory tracking system is able to follow the generated trajectory. It is also shown that in a highly dynamic environment the online scheme is able to re-plan the trajectory.
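
The abstract gives no controller details, so the following is only a minimal sketch of the PI tracking idea. It runs a discrete-time proportional-integral controller on a one-dimensional, velocity-commanded point that must reach a fixed reference position. The class `PIController`, the helper `track`, the gains, the time step, and the simplified plant are assumptions made for this note; the paper tracks a full planned trajectory on an omnidirectional robot.

```python
class PIController:
    """Discrete-time proportional-integral controller."""

    def __init__(self, kp, ki, dt):
        self.kp = kp
        self.ki = ki
        self.dt = dt
        self.integral = 0.0

    def update(self, reference, measurement):
        """Return the control command for the current tracking error."""
        error = reference - measurement
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral


def track(reference, steps=400, dt=0.05):
    """Drive a 1-D velocity-commanded point toward a fixed reference position."""
    controller = PIController(kp=2.0, ki=0.5, dt=dt)  # hypothetical gains
    position = 0.0
    for _ in range(steps):
        velocity_command = controller.update(reference, position)
        position += velocity_command * dt  # simple kinematic plant: x' = u
    return position


final_position = track(reference=1.0)
```

The proportional term reacts to the current error, while the integral term accumulates past error. In a real system the integral term is what removes steady-state offsets caused by disturbances or unmodelled dynamics.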

URL

http://arxiv.org/abs/1905.04762

PDF

http://arxiv.org/pdf/1905.04762
NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding

2019-05-12

Jun Liu, Amir Shahroudy, Mauricio Perez, Gang Wang, Ling-Yu Duan, Alex C. Kot

arXiv_CV

arXiv_CV Action_Recognition Deep_Learning Recognition
Abstract

Research on depth-based human activity analysis achieved outstanding performance and demonstrated the effectiveness of 3D representation for action recognition. The existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of large-scale training samples, realistic number of distinct class categories, diversity in camera views, varied environmental conditions, and variety of human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames. This dataset contains 120 different action classes including daily, mutual, and health-related activities. We evaluate the performance of a series of existing 3D activity analysis methods on this dataset, and show the advantage of applying deep learning methods for 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset, and a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework is proposed for this task, which yields promising results for recognition of the novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding. [The dataset is available at: this http URL]

URL

http://arxiv.org/abs/1905.04757

PDF

http://arxiv.org/pdf/1905.04757
Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints

2019-05-12

Mengtian Li, Ersin Yumer, Deva Ramanan

arXiv_CV

arXiv_CV Object_Detection Segmentation NAS Image_Classification Semantic_Segmentation Optimization Video_Classification Classification Detection
Abstract

In most practical settings and theoretical analysis, one assumes that a model can be trained until convergence. However, the growing complexity of machine learning datasets and models may violate such assumptions. Moreover, current approaches for hyper-parameter tuning and neural architecture search tend to be limited by practical resource constraints. Therefore, we introduce a formal setting for studying training under the non-asymptotic, resource-constrained regime, i.e. budgeted training. We analyze the following problem: “given a dataset, algorithm, and resource budget, what is the best achievable performance?” We focus on the number of optimization iterations as the representative resource. Under such a setting, we show that it is critical to adjust the learning rate schedule according to the given budget. Among budget-aware learning schedules, we find simple linear decay to be both robust and high-performing. We support our claim through extensive experiments with state-of-the-art models on ImageNet (image classification), Cityscapes (semantic segmentation), MS COCO (object detection and instance segmentation), and Kinetics (video classification). We also analyze our results and find that the key to a good schedule is budgeted convergence, a phenomenon whereby the gradient vanishes at the end of each allowed budget. We also revisit existing approaches for fast convergence, and show that budget-aware learning schedules readily outperform such approaches under (the practical but under-explored) budgeted setting.
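
The budget-aware linear decay described in the abstract can be sketched as a function of the current step and the iteration budget. This is a minimal illustration, assuming the rate falls linearly from a base value to zero exactly at the end of the budget; the function name `linear_decay_lr` and the sample values are hypothetical.

```python
def linear_decay_lr(base_lr: float, step: int, budget: int) -> float:
    """Learning rate at `step` when training is limited to `budget` iterations.

    Assumption: the rate decays linearly from base_lr at step 0 to zero
    at step == budget, so the schedule always fits the given budget.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    progress = min(max(step, 0), budget) / budget
    return base_lr * (1.0 - progress)


if __name__ == "__main__":
    # The same base rate stretched over two different budgets.
    for budget in (100, 25):
        samples = [linear_decay_lr(0.1, s, budget) for s in (0, budget // 2, budget)]
        print(budget, samples)
```

The point of tying the schedule to the budget is that a shorter run still ends at a near-zero rate, rather than being a truncated prefix of a schedule tuned for a longer run.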

URL

http://arxiv.org/abs/1905.04753

PDF

http://arxiv.org/pdf/1905.04753
A Benchmark Study on Machine Learning Methods for Fake News Detection

2019-05-12

Junaed Younus Khan, Md. Tawkat Islam Khondaker, Anindya Iqbal, Sadia Afroz

arXiv_CL

arXiv_CL Deep_Learning Detection
Abstract

The proliferation of fake news and its propagation on social media have become a major concern due to its ability to create devastating impacts. Different machine learning approaches have been attempted to detect it. However, most of those focused on a special type of news (such as political) and did not apply many advanced techniques. In this research, we conduct a benchmark study to assess the performance of different applicable approaches on three different datasets where the largest and most diversified one was developed by us. We also implemented some advanced deep learning models that have shown promising results.

URL

http://arxiv.org/abs/1905.04749

PDF

http://arxiv.org/pdf/1905.04749
Approximated Oracle Filter Pruning for Destructive CNN Width Optimization

2019-05-12

Xiaohan Ding, Guiguang Ding, Yuchen Guo, Jungong Han, Chenggang Yan

arXiv_CV

arXiv_CV Knowledge CNN Optimization Inference
Abstract

It is not easy to design and run Convolutional Neural Networks (CNNs) due to: 1) finding the optimal number of filters (i.e., the width) at each layer is tricky, given an architecture; and 2) the computational intensity of CNNs impedes the deployment on computationally limited devices. Oracle Pruning is designed to remove the unimportant filters from a well-trained CNN, which estimates the filters’ importance by ablating them in turn and evaluating the model, thus delivers high accuracy but suffers from intolerable time complexity, and requires a given resulting width but cannot automatically find it. To address these problems, we propose Approximated Oracle Filter Pruning (AOFP), which keeps searching for the least important filters in a binary search manner, makes pruning attempts by masking out filters randomly, accumulates the resulting errors, and finetunes the model via a multi-path framework. As AOFP enables simultaneous pruning on multiple layers, we can prune an existing very deep CNN with acceptable time cost, negligible accuracy drop, and no heuristic knowledge, or re-design a model which exerts higher accuracy and faster inference.

URL

http://arxiv.org/abs/1905.04748

PDF

http://arxiv.org/pdf/1905.04748
Low-Rank Tensor Modeling for Hyperspectral Unmixing Accounting for Spectral Variability

2019-05-12

Tales Imbiriba, Ricardo Augusto Borsoi, José Carlos Moreira Bermudez

arXiv_CV

arXiv_CV Regularization
Abstract

Traditional hyperspectral unmixing methods neglect the underlying variability of spectral signatures often observed in typical hyperspectral images, propagating these mismodeling errors throughout the whole unmixing process. Attempts to model material spectra as members of sets or as random variables tend to lead to severely ill-posed unmixing problems. To overcome this drawback, endmember variability has been handled through generalizations of the mixing model, combined with spatial regularization over the abundance and endmember estimations. Recently, tensor-based strategies considered low-rank decompositions of hyperspectral images as an alternative to impose low-dimensional structures on the solutions of standard and multitemporal unmixing problems. These strategies, however, present two main drawbacks: 1) they confine the solutions to low-rank tensors, which often cannot represent the complexity of real-world scenarios; and 2) they lack guarantees that endmembers and abundances will be correctly factorized in their respective tensors. In this work, we propose a more flexible approach, called ULTRA-V, that imposes low-rank structures through regularizations whose strictness is controlled by scalar parameters. Simulations attest to the superior accuracy of the method when compared with state-of-the-art unmixing algorithms that account for spectral variability.

URL

http://arxiv.org/abs/1811.02413

PDF

http://arxiv.org/pdf/1811.02413
Object Detection in Specific Traffic Scenes using YOLOv2

2019-05-12

Shouyu Wang, Weitao Tang

arXiv_CV

arXiv_CV Object_Detection Detection
Abstract

Object detection frameworks play a crucial role in autonomous driving. In this paper, we introduce the real-time object detection framework called You Only Look Once (YOLOv1) and the related improvements of YOLOv2. We further explore the capability of YOLOv2 by applying its pre-trained model to object detection tasks in some specific traffic scenes. The four artificially designed traffic scenes include single-car, single-person, frontperson-rearcar and frontcar-rearperson.

URL

http://arxiv.org/abs/1905.04740

PDF

http://arxiv.org/pdf/1905.04740
Social Relation Recognition in Egocentric Photostreams

2019-05-12

Emanuel Sanchez Aimar, Petia Radeva, Mariella Dimiccoli

arXiv_CV

arXiv_CV Deep_Learning Relation Recognition
Abstract

This paper proposes an approach to automatically categorize the social interactions of a user wearing a photo-camera 2fpm, by relying solely on what the camera is seeing. The problem is challenging due to the overwhelming complexity of social life and the extreme intra-class variability of social interactions captured under unconstrained conditions. We adopt the formalization proposed in Bugental’s social theory, that groups human relations into five social domains with related categories. Our method is a new deep learning architecture that exploits the hierarchical structure of the label space and relies on a set of social attributes estimated at frame level to provide a semantic representation of social interactions. Experimental results on the new EgoSocialRelation dataset demonstrate the effectiveness of our proposal.

URL

http://arxiv.org/abs/1905.04734

PDF

http://arxiv.org/pdf/1905.04734
Flat Metric Minimization with Applications in Generative Modeling

2019-05-12

Thomas Möllenhoff, Daniel Cremers

arXiv_CV

arXiv_CV Adversarial GAN
Abstract

We take the novel perspective to view data not as a probability distribution but rather as a current. Primarily studied in the field of geometric measure theory, $k$-currents are continuous linear functionals acting on compactly supported smooth differential forms and can be understood as a generalized notion of oriented $k$-dimensional manifold. By moving from distributions (which are $0$-currents) to $k$-currents, we can explicitly orient the data by attaching a $k$-dimensional tangent plane to each sample point. Based on the flat metric which is a fundamental distance between currents, we derive FlatGAN, a formulation in the spirit of generative adversarial networks but generalized to $k$-currents. In our theoretical contribution we prove that the flat metric between a parametrized current and a reference current is Lipschitz continuous in the parameters. In experiments, we show that the proposed shift to $k>0$ leads to interpretable and disentangled latent representations which behave equivariantly to the specified oriented tangent planes.

URL

http://arxiv.org/abs/1905.04730

PDF

http://arxiv.org/pdf/1905.04730
One-Shot Image-to-Image Translation via Part-Global Learning with a Multi-adversarial Framework

2019-05-12

Ziqiang Zheng, Zhibin Yu, Haiyong Zheng, Yang Yang, Heng Tao Shen

arXiv_CV

arXiv_CV Adversarial
Abstract

It is well known that humans can learn and recognize objects effectively from several limited image samples. However, learning from just a few images is still a tremendous challenge for existing main-stream deep neural networks. Inspired by analogical reasoning in the human mind, a feasible strategy is to translate the abundant images of a rich source domain to enrich the relevant yet different target domain with insufficient image data. To achieve this goal, we propose a novel, effective multi-adversarial framework (MA) based on part-global learning, which accomplishes one-shot cross-domain image-to-image translation. In specific, we first devise a part-global adversarial training scheme to provide an efficient way for feature extraction and prevent discriminators being over-fitted. Then, a multi-adversarial mechanism is employed to enhance the image-to-image translation ability to unearth the high-level semantic representation. Moreover, a balanced adversarial loss function is presented, which aims to balance the training data and stabilize the training process. Extensive experiments demonstrate that the proposed approach can obtain impressive results on various datasets between two extremely imbalanced image domains and outperform state-of-the-art methods on one-shot image-to-image translation.

URL

http://arxiv.org/abs/1905.04729

PDF

http://arxiv.org/pdf/1905.04729
A Comparison of Techniques for Sentiment Classification of Film Reviews

2019-05-12

Milan Gritta

arXiv_CL

arXiv_CL Sentiment Review Sentiment_Classification Classification Relation
Abstract

We undertake the task of comparing lexicon-based sentiment classification of film reviews with machine learning approaches. We look at existing methodologies and attempt to emulate and improve on them using a ‘given’ lexicon and a bag-of-words approach. We also utilise syntactical information such as part-of-speech and dependency relations. We show that a simple lexicon-based classification achieves good results; however, machine learning techniques prove to be the superior tool. We also show that more features do not necessarily deliver better performance, and we elaborate on three further enhancements not tested in this article.

URL

http://arxiv.org/abs/1905.04727

PDF

http://arxiv.org/pdf/1905.04727
Learning Phase Competition for Traffic Signal Control

2019-05-12

Guanjie Zheng, Yuanhao Xiong, Xinshi Zang, Jie Feng, Hua Wei, Huichu Zhang, Yong Li, Kai Xu, Zhenhui Li

arXiv_AI

arXiv_AI Knowledge Reinforcement_Learning
Abstract

Increasingly available city data and advanced learning techniques have empowered people to improve the efficiency of our city functions. Among them, improving the urban transportation efficiency is one of the most prominent topics. Recent studies have proposed to use reinforcement learning (RL) for traffic signal control. Different from traditional transportation approaches which rely heavily on prior knowledge, RL can learn directly from the feedback. On the other side, without a careful model design, existing RL methods typically take a long time to converge and the learned models may not be able to adapt to new scenarios. For example, a model that is trained well for morning traffic may not work for the afternoon traffic because the traffic flow could be reversed, resulting in a very different state representation. In this paper, we propose a novel design called FRAP, which is based on the intuitive principle of phase competition in traffic signal control: when two traffic signals conflict, priority should be given to one with larger traffic movement (i.e., higher demand). Through the phase competition modeling, our model achieves invariance to symmetrical cases such as flipping and rotation in traffic flow. By conducting comprehensive experiments, we demonstrate that our model finds better solutions than existing RL methods in the complicated all-phase selection problem, converges much faster during training, and achieves superior generalizability for different road structures and traffic conditions.

URL

http://arxiv.org/abs/1905.04722

PDF

http://arxiv.org/pdf/1905.04722
Some Research Problems in Biometrics: The Future Beckons

2019-05-12

Arun Ross, Sudipta Banerjee, Cunjian Chen, Anurag Chowdhury, Vahid Mirjalili, Renu Sharma, Thomas Swearingen, Shivangi Yadav

arXiv_CV

arXiv_CV Face Tracking
Abstract

The need for reliably determining the identity of a person is critical in a number of different domains ranging from personal smartphones to border security; from autonomous vehicles to e-voting; from tracking child vaccinations to preventing human trafficking; from crime scene investigation to personalization of customer service. Biometrics, which entails the use of biological attributes such as face, fingerprints and voice for recognizing a person, is being increasingly used in several such applications. While biometric technology has made rapid strides over the past decade, there are several fundamental issues that are yet to be satisfactorily resolved. In this article, we will discuss some of these issues and enumerate some of the exciting challenges in this field.

URL

http://arxiv.org/abs/1905.04717

PDF

http://arxiv.org/pdf/1905.04717
Egocentric Vision-based Traffic Accident Detection using Future Object Localization

2019-05-12

Yu Yao, Mingze Xu, Yuchen Wang, David J. Crandall, Ella M. Atkins

arXiv_CV

arXiv_CV Classification Prediction Detection
Abstract

Recognizing abnormal events such as traffic violations and accidents in natural driving scenes is essential for successful autonomous and advanced driver assistance systems. However, most work on video anomaly detection suffers from one of two crucial drawbacks. First, it assumes cameras are fixed and videos have a static background, which is reasonable for surveillance applications but not for vehicle-mounted cameras. Second, it poses the problem as one-class classification, which relies on arduous human annotation and only recognizes categories of anomalies that have been explicitly trained. In this paper, we propose an unsupervised approach for traffic accident detection in first-person videos. Our major novelty is to detect anomalies by predicting the future locations of traffic participants and then monitoring the prediction accuracy and consistency metrics with three different strategies. To evaluate our approach, we introduce a new dataset of diverse traffic accidents, AnAn Accident Detection (A3D), as well as another publicly-available dataset. Experimental results show that our approach outperforms the state-of-the-art.

URL

http://arxiv.org/abs/1903.00618

PDF

http://arxiv.org/pdf/1903.00618
Diagnosing Reinforcement Learning for Traffic Signal Control

2019-05-12

Guanjie Zheng, Xinshi Zang, Nan Xu, Hua Wei, Zhengyao Yu, Vikash Gayah, Kai Xu, Zhenhui Li

arXiv_AI

arXiv_AI Reinforcement_Learning
Abstract

With the increasing availability of traffic data and advance of deep reinforcement learning techniques, there is an emerging trend of employing reinforcement learning (RL) for traffic signal control. A key question for applying RL to traffic signal control is how to define the reward and state. The ultimate objective in traffic signal control is to minimize the travel time, which is difficult to reach directly. Hence, existing studies often define reward as an ad-hoc weighted linear combination of several traffic measures. However, there is no guarantee that the travel time will be optimized with the reward. In addition, recent RL approaches use more complicated state (e.g., image) in order to describe the full traffic situation. However, none of the existing studies has discussed whether such a complex state representation is necessary. This extra complexity may lead to significantly slower learning process but may not necessarily bring significant performance gain. In this paper, we propose to re-examine the RL approaches through the lens of classic transportation theory. We ask the following questions: (1) How should we design the reward so that one can guarantee to minimize the travel time? (2) How to design a state representation which is concise yet sufficient to obtain the optimal solution? Our proposed method LIT is theoretically supported by the classic traffic signal control methods in transportation field. LIT has a very simple state and reward design, thus can serve as a building block for future RL approaches to traffic signal control. Extensive experiments on both synthetic and real datasets show that our method significantly outperforms the state-of-the-art traffic signal control methods.

URL

http://arxiv.org/abs/1905.04716

PDF

http://arxiv.org/pdf/1905.04716
Style transfer based data augmentation in material microscopic image processing

2019-05-12

Boyuan Ma, Xiaoyan Wei, Chuni Liu, Xiaojuan Ban, Haiyou Huan, Hao Wang, Weihua Xue

arXiv_CV

arXiv_CV Segmentation Style_Transfer Semantic_Segmentation
Abstract

Recent progress in material microscopic image semantic segmentation has been driven by high-capacity models trained on large datasets. However, collecting microscopic images with pixel-level labels has been extremely costly due to the amount of human effort required. In this paper, we present an approach to rapidly creating microscopic images with pixel-level labels from material 3D simulated models. Usually images extracted directly from those 3D simulated models are not realistic enough. It is easy to get semantic labels, though. We introduce a style transfer technique to make simulated image data more similar to real microscopic data. We validate the presented approach by using real image data from experiment and simulated image data from Monte Carlo Potts Models, which simulate the growth of polycrystals. Experiments show that using the acquired simulated image data and style transfer technique to supplement real images of polycrystalline iron significantly improves the mean precision of image processing. Besides, models trained with simulated image data and just 1/3 of the real data outperform models trained on the complete real image data. In the study of such polycrystalline materials, this approach can reduce the burden of acquiring and labeling images from microscopes. Also, it can be applied to a number of other material images.

URL

http://arxiv.org/abs/1905.04711

PDF

http://arxiv.org/pdf/1905.04711
Deep Vocoder: Low Bit Rate Speech Compression of Speech with Deep Autoencoder

2019-05-12

Gang Min, Changqing Zhang, Xiongwei Zhang, Wei Tan

arXiv_SD

arXiv_SD
Abstract

Inspired by the success of deep neural networks (DNNs) in speech processing, this paper presents Deep Vocoder, a direct end-to-end low bit rate speech compression method with deep autoencoder (DAE). In Deep Vocoder, DAE is used for extracting the latent representing features (LRFs) of speech, which are then efficiently quantized by an analysis-by-synthesis vector quantization (AbS VQ) method. AbS VQ aims to minimize the perceptual spectral reconstruction distortion rather than the distortion of LRFs vector itself. Also, a suboptimal codebook searching technique is proposed to further reduce the computational complexity. Experimental results demonstrate that Deep Vocoder yields substantial improvements in terms of frequency-weighted segmental SNR, STOI and PESQ score when compared to the output of the conventional SQ- or VQ-based codec. The yielded PESQ score over the TIMIT corpus is 3.34 and 3.08 for speech coding at 2400 bit/s and 1200 bit/s, respectively.
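
The difference between ordinary vector quantization and the analysis-by-synthesis idea in the abstract can be sketched with a toy codebook search. This is an illustration under loud assumptions, not the paper's method: plain squared error stands in for the perceptual spectral distortion, the decoder is a made-up function rather than a trained autoencoder, and all names (`plain_vq_index`, `abs_vq_index`, `toy_decode`) are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (a stand-in for a perceptual distortion)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def plain_vq_index(latent: Vector, codebook: List[Vector]) -> int:
    """Pick the codeword closest to the latent vector itself."""
    return min(range(len(codebook)), key=lambda k: sq_dist(latent, codebook[k]))


def abs_vq_index(target: Vector, codebook: List[Vector],
                 decode: Callable[[Vector], Vector]) -> int:
    """Pick the codeword whose decoded output is closest to the target."""
    return min(range(len(codebook)),
               key=lambda k: sq_dist(target, decode(codebook[k])))


def toy_decode(z: Vector) -> List[float]:
    """Hypothetical decoder that is far more sensitive to the first latent dimension."""
    return [10.0 * z[0], z[1]]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [0.5, 0.5]]
    latent = [0.4, 0.0]
    target = toy_decode(latent)
    print("plain VQ picks:", plain_vq_index(latent, codebook))
    print("AbS VQ picks:", abs_vq_index(target, codebook, toy_decode))
```

In this toy setup the two searches disagree, because the codeword nearest in latent space is not the one that reconstructs best once the decoder's sensitivity is taken into account.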

URL

http://arxiv.org/abs/1905.04709

PDF

http://arxiv.org/pdf/1905.04709
Integrating Objects into Monocular SLAM: Line Based Category Specific Models

2019-05-12

Nayan Joshi, Yogesh Sharma, Parv Parkhiya, Rishabh Khawad, K Madhava Krishna, Brojeshwar Bhowmick

arXiv_RO

arXiv_RO SLAM
Abstract

We propose a novel Line based parameterization for category specific CAD models. The proposed parameterization associates 3D category-specific CAD model and object under consideration using a dictionary based RANSAC method that uses object Viewpoints as prior and edges detected in the respective intensity image of the scene. The association problem is posed as a classical Geometry problem rather than being dataset driven, thus saving the time and labour that one invests in annotating dataset to train Keypoint Network for different category objects. Besides eliminating the need of dataset preparation, the approach also speeds up the entire process as this method processes the image only once for all objects, thus eliminating the need of invoking the network for every object in an image across all images. A 3D-2D edge association module followed by a resection algorithm for lines is used to recover object poses. The formulation optimizes for shape and pose of the object, thus aiding in recovering object 3D structure more accurately. Finally, a Factor Graph formulation is used to combine object poses with camera odometry to formulate a SLAM problem.

URL

http://arxiv.org/abs/1905.04698

PDF

http://arxiv.org/pdf/1905.04698
Ensemble Super-Resolution with A Reference Dataset

2019-05-12

Junjun Jiang, Yi Yu, Zheng Wang, Suhua Tang, Ruimin Hu, Jiayi Ma

arXiv_CV

arXiv_CV Super_Resolution Knowledge Optimization Inference Deep_Learning
Abstract

By developing sophisticated image priors or designing deep(er) architectures, a variety of image Super-Resolution (SR) approaches have been proposed recently and achieved very promising performance. A natural question that arises is whether these methods can be reformulated into a unifying framework and whether this framework assists in SR reconstruction. In this paper, we present a simple but effective single image SR method based on ensemble learning, which can produce better performance than could be obtained from any of the SR methods to be ensembled (called component super-resolvers). Based on the assumption that a better component super-resolver should have a larger ensemble weight when performing SR reconstruction, we present a Maximum A Posteriori (MAP) estimation framework for the inference of optimal ensemble weights. Specifically, we introduce a reference dataset, which is composed of High-Resolution (HR) and Low-Resolution (LR) image pairs, to measure the super-resolution abilities (prior knowledge) of different component super-resolvers. To obtain the optimal ensemble weights, we propose to incorporate the reconstruction constraint, which states that the degenerated HR image should be equal to the LR observation, as well as the prior knowledge of ensemble weights into the MAP estimation framework. Moreover, the proposed optimization problem can be solved analytically. We study the performance of the proposed method by comparing it with different competitive approaches, including four state-of-the-art non-deep learning based methods, four recent deep learning based methods and one ensemble learning based method, and demonstrate its effectiveness and superiority on three public datasets.

URL

http://arxiv.org/abs/1905.04696

PDF

http://arxiv.org/pdf/1905.04696
The relational processing limits of classic and contemporary neural network models of language processing

2019-05-12

Guillermo Puebla, Andrea E. Martin, Leonidas A. A. Doumas

arXiv_AI

arXiv_AI Knowledge Attention Deep_Learning Relation
Abstract

The ability of neural networks to capture relational knowledge is a matter of long-standing controversy. Recently, some researchers on the PDP side of the debate have argued that (1) classic PDP models can handle relational structure (Rogers & McClelland, 2008, 2014) and (2) the success of deep learning approaches to text processing suggests that structured representations are unnecessary to capture the gist of human language (Rabovsky et al., 2018). In the present study we tested the Story Gestalt model (St. John, 1992), a classic PDP model of text comprehension, and a Sequence-to-Sequence with Attention model (Bahdanau et al., 2015), a contemporary deep learning architecture for text processing. Both models were trained to answer questions about stories based on the thematic roles that several concepts played in the stories. In three critical tests we varied the statistical structure of new stories while keeping their relational structure constant with respect to the training data. Each model was susceptible to each statistical structure manipulation to a different degree, with their performance falling below chance under at least one manipulation. We argue that the failures of both models are due to the fact that they cannot perform dynamic binding of independent roles and fillers. Ultimately, these results cast doubt on the suitability of traditional neural network models for explaining phenomena based on relational reasoning, including language processing.

URL

https://arxiv.org/abs/1905.05708

PDF

https://arxiv.org/pdf/1905.05708
Adaptive Composition GAN towards Realistic Image Synthesis

2019-05-12

Fangneng Zhan, Jiaxing Huang, Shijian Lu

arXiv_CV

arXiv_CV Adversarial Attention GAN Quantitative
Abstract

Despite the rapid progress of generative adversarial networks (GANs) in image synthesis in recent years, current approaches work in either geometry domain or appearance domain which tend to introduce various synthesis artifacts. This paper presents an innovative Adaptive Composition GAN (AC-GAN) that incorporates image synthesis in geometry and appearance domains into an end-to-end trainable network and achieves synthesis realism in both domains simultaneously. An innovative hierarchical synthesis mechanism is designed which is capable of generating realistic geometry and composition when multiple foreground objects with or without occlusions are involved in synthesis. In addition, a novel attention mask is introduced to guide the appearance adaptation to the embedded foreground objects which helps preserve image details and resolution and also provide better reference for synthesis in geometry domain. Extensive experiments on scene text image synthesis, automated portrait editing and indoor rendering tasks show that the proposed AC-GAN achieves superior synthesis performance qualitatively and quantitatively.

URL

http://arxiv.org/abs/1905.04693

PDF

http://arxiv.org/pdf/1905.04693
Rough Contact in General Rough Mereology

2019-05-12

A. Mani

arXiv_AI

arXiv_AI Relation
Abstract

Theories of rough mereology have originated from diverse semantic considerations from contexts relating to study of databases, to human reasoning. These ideas of origin, especially in the latter context, are intensely complex. In this research, concepts of rough contact relations are introduced and rough mereologies are situated in relation to general spatial mereology by the present author. These considerations are restricted to her rough mereologies that seek to avoid contamination.

URL

http://arxiv.org/abs/1905.04689

PDF

http://arxiv.org/pdf/1905.04689
On Flow Profile Image for Video Representation

2019-05-12

Mohammadreza Babaee, David Full, Gerhard Rigoll

arXiv_CV

arXiv_CV Video_Caption Caption Optimization Video_Classification Classification Recognition
Abstract

Video representation is a key challenge in many computer vision applications such as video classification, video captioning, and video surveillance. In this paper, we propose a novel approach for video representation that captures meaningful information including motion and appearance from a sequence of video frames and compacts it into a single image. To this end, we compute the optical flow and use it in a least squares optimization to find a new image, the so-called Flow Profile Image (FPI). This image encodes motions as well as foreground appearance information while background information is removed. The quality of this image is validated in activity recognition experiments and the results are compared with other video representation techniques such as dynamic images [1] and eigen images [2]. The experimental results as well as visual quality confirm that FPIs can be successfully used in video processing applications.

URL

http://arxiv.org/abs/1905.04668

PDF

http://arxiv.org/pdf/1905.04668
Learning to Convolve: A Generalized Weight-Tying Approach

2019-05-12

Nichita Diaconu, Daniel E Worrall

arXiv_CV

arXiv_CV
Abstract

Recent work (Cohen & Welling, 2016) has shown that generalizations of convolutions, based on group theory, provide powerful inductive biases for learning. In these generalizations, filters are not only translated but can also be rotated, flipped, etc. However, coming up with exact models of how to rotate a 3 x 3 filter on a square pixel-grid is difficult. In this paper, we learn how to transform filters for use in the group convolution, focussing on roto-translation. For this, we learn a filter basis and all rotated versions of that filter basis. Filters are then encoded by a set of rotation invariant coefficients. To rotate a filter, we switch the basis. We demonstrate we can produce feature maps with low sensitivity to input rotations, while achieving high performance on MNIST and CIFAR-10.

URL

http://arxiv.org/abs/1905.04663

PDF

http://arxiv.org/pdf/1905.04663
Training Multi-Task Adversarial Network for Extracting Noise-Robust Speaker Embedding

2019-05-12

Jianfeng Zhou, Tao Jiang, Lin Li, Qingyang Hong, Zhe Wang, Bingyin Xia

arXiv_SD

arXiv_SD Adversarial Embedding Optimization Recognition
Abstract

Achieving robust speaker recognition performance in noisy environments is still a challenging task. Motivated by the promising performance of multi-task training in a variety of image processing tasks, we explore the potential of multi-task adversarial training for learning a noise-robust speaker embedding. In this paper we present a novel framework which consists of three components: an encoder that extracts a noise-robust speaker embedding; a classifier that classifies the speakers; and a discriminator that discriminates the noise type of the speaker embedding. In addition, we propose a training strategy that uses the training accuracy as an indicator to stabilize the multi-class adversarial optimization process. We conduct our experiments on English and Mandarin corpora, and the experimental results demonstrate that our proposed multi-task adversarial training method greatly outperforms the other methods without adversarial training in noisy environments. Furthermore, experiments indicate that our method is also able to improve the speaker verification performance in the clean condition.

URL

http://arxiv.org/abs/1811.09355

PDF

http://arxiv.org/pdf/1811.09355
Metareasoning in Modular Software Systems: On-the-Fly Configuration using Reinforcement Learning with Rich Contextual Representations

2019-05-12

Aditya Modi, Debadeepta Dey, Alekh Agarwal, Adith Swaminathan, Besmira Nushi, Sean Andrist, Eric Horvitz

arXiv_AI

arXiv_AI Reinforcement_Learning Optimization
Abstract

Assemblies of modular subsystems are being pressed into service to perform sensing, reasoning, and decision making in high-stakes, time-critical tasks in such areas as transportation, healthcare, and industrial automation. We address the opportunity to maximize the utility of an overall computing system by employing reinforcement learning to guide the configuration of the set of interacting modules that comprise the system. The challenge of doing system-wide optimization is a combinatorial problem. Local attempts to boost the performance of a specific module by modifying its configuration often lead to losses in the overall utility of the system’s performance, as the distribution of inputs to downstream modules changes drastically. We present metareasoning techniques which consider a rich representation of the input, monitor the state of the entire pipeline, and adjust the configuration of modules on-the-fly so as to maximize the utility of a system’s operation. We show significant improvement in both real-world and synthetic pipelines across a variety of reinforcement learning techniques.

URL

https://arxiv.org/abs/1905.05179

PDF

https://arxiv.org/pdf/1905.05179
Improving Natural Language Interaction with Robots Using Advice

2019-05-12

Nikhil Mehta, Dan Goldwasser

arXiv_AI

arXiv_AI Prediction
Abstract

Over the last few years, there has been growing interest in learning models for physically grounded language understanding tasks, such as the popular blocks world domain. These works typically view this problem as a single-step process, in which a human operator gives an instruction and an automated agent is evaluated on its ability to execute it. In this paper we take the first step towards increasing the bandwidth of this interaction, and suggest a protocol for including advice, high-level observations about the task, which can help constrain the agent’s prediction. We evaluate our approach on the blocks world task, and show that even simple advice can help lead to significant performance improvements. To help reduce the effort involved in supplying the advice, we also explore model self-generated advice which can still improve results.

URL

http://arxiv.org/abs/1905.04655

PDF

http://arxiv.org/pdf/1905.04655
Implementation of Fuzzy C-Means and Possibilistic C-Means Clustering Algorithms, Cluster Tendency Analysis and Cluster Validation

2019-05-12

Md. Abu Bakr Siddique, Rezoana Bente Arif, Mohammad Mahmudur Rahman Khan, Zahidun Ashrafi

arXiv_CV

arXiv_CV
Abstract

In this paper, several two-dimensional clustering scenarios are given, and soft partitioning clustering algorithms (Fuzzy C-means (FCM) and Possibilistic C-means (PCM)) are applied to them. Afterward, VAT is used to investigate the clustering tendency visually, and three types of indices (PC, DI, and DBI) are used to check cluster validation. The results show that each algorithm has its limitations; however, PCM is more robust to noise than FCM, because FCM forces every noise point to be treated as a member of some cluster.
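
The noise sensitivity of FCM follows from its membership rule, which can be sketched as below. This is a minimal illustration of the standard FCM membership update with fuzzifier `m`, not the authors' implementation; the function name, the sample points, and the fixed centers are assumptions chosen for the example.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Return rows u[j][i]: the FCM membership of point j in cluster i."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # Point sits exactly on a center: give it full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            memberships.append([h / sum(hits) for h in hits])
            continue
        # Standard update: u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)).
        row = [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]
        memberships.append(row)
    return memberships


# Two tight groups plus one far-away noise point (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 50.0)]
centers = [(0.0, 0.5), (10.0, 0.5)]
u = fcm_memberships(points, centers)
noise_row = u[-1]
```

Each row of `u` sums to 1, so the distant point at (5, 50) still receives membership 0.5 in each cluster and would pull both centers in a center update. PCM drops this sum-to-one constraint, which is why a noise point can receive low typicality in every cluster.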

URL

http://arxiv.org/abs/1809.08417

PDF

http://arxiv.org/pdf/1809.08417
Predictive Ensemble Learning with Application to Scene Text Detection

2019-05-12

Danlu Chen, Xu-Yao Zhang, Wei Zhang, Yao Lu, Xiuli Li, Tao Me

arXiv_CV

arXiv_CV Object_Detection Segmentation Classification Deep_Learning Detection
Abstract

Deep learning based approaches have achieved significant progress in different tasks such as classification, detection, and segmentation. Ensemble learning is widely known to further improve performance by combining multiple complementary models. It is easy to apply ensemble learning to classification tasks, for example by averaging or voting. However, for other tasks (like object detection) where the outputs vary in quantity and cannot be simply compared, building an ensemble of multiple models becomes difficult. In this paper, we propose a new method called Predictive Ensemble Learning (PEL), based on the powerful predictive ability of deep neural networks, to directly predict the best performing model among a pool of base models for each test example, thus transforming ensemble learning into a traditional classification task. Taking scene text detection as the application, where no suitable ensemble learning strategy exists, PEL can significantly improve the performance compared to either individual state-of-the-art models or the fusion of multiple models by non-maximum suppression. Experimental results show the possibility and potential of PEL in predicting different models’ performance based only on a query example, which can be extended for ensemble learning in many other complex tasks.
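
The reduction from ensemble learning to classification can be sketched as follows. This is only an illustration of the idea: the paper uses a deep network as the selector, while this example swaps in a nearest-centroid classifier, and all function names, features, and scores are hypothetical.

```python
def best_model_labels(scores_per_example):
    """Label each training example with the index of its best-scoring base model."""
    return [max(range(len(s)), key=s.__getitem__) for s in scores_per_example]


def fit_centroids(features, labels):
    """Average the feature vectors of the examples assigned to each model index."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, value in enumerate(x):
            acc[d] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def select_model(centroids, x):
    """Predict which base model to run on a query example x."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)),
    )


# Hypothetical one-dimensional features and per-model scores for two base models.
features = [[0.1], [0.2], [0.9], [0.8]]
scores = [[0.9, 0.5], [0.8, 0.6], [0.3, 0.7], [0.4, 0.9]]

labels = best_model_labels(scores)
centroids = fit_centroids(features, labels)
chosen = select_model(centroids, [0.95])
```

At test time only the selected base model's output is used, so outputs that vary in quantity, such as detection boxes, never need to be merged.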

URL

http://arxiv.org/abs/1905.04641

PDF

http://arxiv.org/pdf/1905.04641
Mega-Reward: Achieving Human-Level Play without Extrinsic Rewards

2019-05-12

Yuhang Song, Jianyi Wang, Thomas Lukasiewicz, Zhenghua Xu, Shangtong Zhang, Mai Xu

arXiv_AI

arXiv_AI Knowledge Relation
Abstract

Intrinsic rewards are introduced to simulate how human intelligence works, and they are usually evaluated by intrinsically-motivated play, i.e., playing games without extrinsic rewards but evaluated with extrinsic rewards. However, none of the existing intrinsic reward approaches can achieve human-level performance under this very challenging setting of intrinsically-motivated play. In this work, we propose a novel megalomania-driven intrinsic reward (mega-reward) which, to our knowledge, is the first approach that achieves comparable human-level performance in intrinsically-motivated play. The intuition of mega-reward comes from the observation that infants’ intelligence develops when they try to gain more control over entities in an environment; therefore, mega-reward aims to maximize the control capabilities of agents over given entities in a given environment. To formalize mega-reward, a relational transition model is proposed to bridge the gaps between direct and latent control. Experimental studies show that mega-reward can (i) greatly outperform all state-of-the-art intrinsic reward approaches, (ii) generally achieve the same level of performance as Ex-PPO and professional human-level scores, and (iii) achieve superior performance when it is combined with extrinsic reward.

URL

http://arxiv.org/abs/1905.04640

PDF

http://arxiv.org/pdf/1905.04640
Ceiling Effects for Hybrid Aerial-Surface Locomotion of Small Rotorcraft

2019-05-12

Yi Hsuan Hsiao, Pakpong Chirarattananon

arXiv_RO

arXiv_RO Face
Abstract

As platform size is reduced, the flight of aerial robots becomes increasingly energetically expensive. Limitations on payload and endurance of these small robots have prompted researchers to explore the use of bimodal aerial-surface locomotion as a strategy to prolong operation time while retaining a high vantage point. In this work, we propose the use of “ceiling effects” as a power conserving strategy for small rotorcraft to perch on an overhang. In the vicinity of a ceiling, spinning propellers generate markedly higher thrust. To understand the observed aerodynamic phenomena, the momentum theory and the blade element method are employed to describe the thrust, power, and rotational rate of spinning propellers in terms of propeller-to-ceiling distance. The models, which take into account the influence of neighboring propellers as present in multirotor vehicles, are verified using two propeller types (23-mm and 50-mm radii) in various configurations on a benchtop setup. The results are consistent with the proposed models. In proximity to the ceiling, power consumption of propellers with 23-mm radius arranged in a quadrotor configuration was found to reduce by a factor of three. To this end, we present a conceptual prototype that demonstrates the use of ceiling effects for perching maneuvers. Overall, the promising outcomes highlight possible uses of ceiling effects for efficient bimodal locomotion in small multirotor vehicles.

URL

http://arxiv.org/abs/1905.04632

PDF

http://arxiv.org/pdf/1905.04632