Legged locomotion is a challenging task for learning algorithms, especially when the task requires a diverse set of primitive behaviors. To solve these problems, we introduce a hierarchical framework to automatically decompose complex locomotion tasks. A high-level policy issues commands in a latent space and also selects for how long the low-level policy will execute the latent command. Concurrently, the low-level policy uses the latent command and only the robot’s on-board sensors to control the robot’s actuators. Our approach allows the high-level policy to run at a lower frequency than the low-level one. We test our framework on a path-following task for a dynamic quadruped robot and we show that steering behaviors automatically emerge in the latent command space as low-level skills are needed for this task. We then show efficient adaptation of the trained policy to a different task by transfer of the trained low-level policy. Finally, we validate the policies on a real quadruped robot. To the best of our knowledge, this is the first application of end-to-end hierarchical learning to a real robotic locomotion task.
http://arxiv.org/abs/1905.08926
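A minimal sketch of the two-timescale control loop described above; the policy bodies, the environment interface, the latent dimension, and the actuator count are all placeholders of ours, not details from the paper:

    import numpy as np

    LATENT_DIM = 8   # illustrative size of the latent command space

    def high_level_policy(task_obs):
        # placeholder network: returns a latent command and how long to hold it
        z = np.tanh(np.random.randn(LATENT_DIM))
        duration = 10              # low-level steps before the next command
        return z, duration

    def low_level_policy(z, proprio):
        # maps (latent command, on-board sensor readings) to 12 motor targets
        return np.tanh(np.random.randn(12))

    def run_episode(env, steps=1000):
        # `env` is a hypothetical interface exposing task and proprioceptive obs
        obs = env.reset()
        t = 0
        while t < steps:
            z, duration = high_level_policy(obs["task"])   # slow timescale
            for _ in range(duration):                      # fast timescale
                obs = env.step(low_level_policy(z, obs["proprio"]))
                t += 1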
Domain adaptation aims to assist the modeling tasks of the target domain with knowledge of the source domain. The two domains often lie in different feature spaces due to diverse data collection methods, which leads to the more challenging task of heterogeneous domain adaptation (HDA). A core issue of HDA is how to preserve the information of the original data during adaptation. In this paper, we propose a joint information preservation method to deal with the problem. The method preserves the information of the original data from two aspects. On the one hand, although paired samples often exist between the two domains in HDA, current algorithms do not utilize this information sufficiently. The proposed method preserves the paired information by maximizing the correlation of the paired samples in the shared subspace. On the other hand, the proposed method improves the strategy of preserving the structural information of the original data, preserving the local and global structural information simultaneously. Finally, the joint information preservation is integrated by distribution matching. Experimental results show the superiority of the proposed method over state-of-the-art HDA algorithms.
http://arxiv.org/abs/1905.08924
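One way to read the paired-information term above is as a CCA-style objective: project both domains into a shared subspace and maximize the correlation of paired samples there. A hedged sketch with scikit-learn's CCA; the paper's actual objective and solver may differ:

    import numpy as np
    from sklearn.cross_decomposition import CCA

    # Xs, Xt: paired samples from source and target feature spaces
    rng = np.random.default_rng(0)
    Xs = rng.normal(size=(100, 50))   # 100 pairs, 50-dim source features
    Xt = rng.normal(size=(100, 30))   # same 100 pairs, 30-dim target features

    cca = CCA(n_components=10)        # 10-dim shared subspace (illustrative)
    Zs, Zt = cca.fit_transform(Xs, Xt)

    # correlation of each shared dimension across the paired samples
    corrs = [np.corrcoef(Zs[:, k], Zt[:, k])[0, 1] for k in range(10)]
    print(corrs)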
Universal unitary photonic devices can apply arbitrary unitary transformations to a vector of input modes and provide a promising hardware platform for fast and energy-efficient machine learning using light. We simulate the gradient-based optimization of random unitary matrices on universal photonic devices composed of imperfect tunable interferometers. If device components are initialized uniform-randomly, the locally-interacting nature of the mesh components biases the optimization search space towards banded unitary matrices, limiting convergence to random unitary matrices. We detail a procedure for initializing the device by sampling from the distribution of random unitary matrices and show that this greatly improves convergence speed. We also explore mesh architecture improvements such as adding extra tunable beamsplitters or permuting waveguide layers to further improve the training speed and scalability of these devices.
https://arxiv.org/abs/1808.00458
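The initialization step above amounts to drawing from the Haar measure on U(N) before decomposing into phase-shifter settings. Sampling a Haar-random unitary is standard; a sketch via QR with phase correction (scipy also ships scipy.stats.unitary_group):

    import numpy as np

    def haar_random_unitary(n, rng=None):
        """Sample an n x n unitary from the Haar measure (Mezzadri's method)."""
        rng = rng or np.random.default_rng()
        z = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)
        q, r = np.linalg.qr(z)
        # correct column phases so the distribution is exactly Haar, not QR-biased
        d = np.diag(r)
        return q * (d / np.abs(d))

    U = haar_random_unitary(8)
    assert np.allclose(U @ U.conj().T, np.eye(8), atol=1e-10)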
We follow the idea of formulating vision as inverse graphics and propose a new type of element for this task, a neural-symbolic capsule. It is capable of de-rendering a scene into semantic information feed-forward, as well as rendering it feed-backward. An initial set of capsules for graphical primitives is obtained from a generative grammar and connected into a full capsule network. Lifelong meta-learning continuously improves this network’s detection capabilities by adding capsules for new and more complex objects it detects in a scene using few-shot learning. Preliminary results demonstrate the potential of our novel approach.
http://arxiv.org/abs/1905.08910
In Reasoning about Action and Planning, one synthesizes the agent plan by taking advantage of assumptions on how the environment works (that is, one exploits the environment's effects, its fairness, its trajectory constraints). In this paper we study this form of synthesis in detail. We consider assumptions as constraints on the possible strategies that the environment can have in order to respond to the agent's actions. Such constraints may be given in the form of a planning domain (or action theory), as linear-time formulas over infinite or finite runs, or as a combination of the two. We argue, though, that not all assumption specifications are meaningful: they need to be consistent, which means that there must exist an environment strategy fulfilling the assumption in spite of the agent's actions. For such assumptions, we study how to do synthesis/planning for agent goals, ranging from classical reachability to goals on traces specified in \LTL and \LTLf/\LDLf, characterizing the problem both mathematically and algorithmically.
http://arxiv.org/abs/1807.06777
In this paper, we introduce a portable eye imaging device, denoted as lab-on-a-headset, which can automatically perform a swinging flashlight test. We utilized this device in a clinical study to obtain high-resolution recordings of eyes while they are exposed to varying light stimuli. Half of the participants had relative afferent pupillary defect (RAPD) while the other half formed a control group. In cases of positive RAPD, patients' pupils constrict less or do not constrict when the light stimulus swings from the unaffected eye to the affected eye. To automatically diagnose RAPD, we propose an algorithm based on pupil localization, pupil size measurement, and pupil size comparison of the right and left eye during the light reflex test. We validate the algorithmic performance over a dataset obtained from 22 subjects and show that the proposed algorithm can achieve a sensitivity of 93.8% and a specificity of 87.5%.
http://arxiv.org/abs/1905.08886
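A toy version of the comparison step above; the sizes, units, and asymmetry threshold are illustrative, not values from the study:

    def constriction_ratio(size_before, size_during):
        # fractional pupil constriction when the light swings onto the eye
        return (size_before - size_during) / size_before

    def rapd_screen(right, left, asymmetry_threshold=0.15):
        """right/left: (size_before, size_during) pupil measurements.
        Flags RAPD if one eye constricts markedly less than the other."""
        r = constriction_ratio(*right)
        l = constriction_ratio(*left)
        return abs(r - l) > asymmetry_threshold

    print(rapd_screen((6.0, 4.0), (6.0, 5.6)))   # True: left barely constricts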
Many experiments have been performed that use evolutionary algorithms for learning the topology and connection weights of a neural network that controls a robot or virtual agent. These experiments are performed not only to better understand basic biological principles, but also with the hope that, with further progress of the methods, they will become competitive for automatically creating robot behaviors of interest. However, current methods are limited with respect to the (Kolmogorov) complexity of evolved behavior. Using the evolution of robot trajectories as an example, we show that by adding four features to standard methods for the evolution of neural networks, namely (1) freezing of previously evolved structure, (2) temporal scaffolding, (3) a homogeneous transfer function for output nodes, and (4) mutations that create new pathways to outputs, we can achieve an approximately linear growth of the complexity of behavior over thousands of generations. Overall, evolved complexity is up to two orders of magnitude higher than that achieved by standard methods in the experiments reported here, with the major limiting factor for further growth being the available run time. Thus, the set of methods proposed here promises to be a useful addition to various current neuroevolution methods.
http://arxiv.org/abs/1905.08885
Explanations given by automation are often used to promote automation adoption. However, it remains unclear whether explanations promote acceptance of automated vehicles (AVs). In this study, we conducted a within-subject experiment in a driving simulator with 32 participants under four conditions: (1) no explanation, (2) an explanation given before the AV acted, (3) an explanation given after the AV acted, and (4) the option for the driver to approve or disapprove the AV's action after hearing an explanation. We examined four AV outcomes: trust, preference for the AV, anxiety, and mental workload. Results suggest that explanations provided before an AV acted were associated with higher trust in and preference for the AV, but there was no difference in anxiety and workload. These results have important implications for the adoption of AVs.
http://arxiv.org/abs/1905.08878
Gender bias has been found in existing coreference resolvers. In order to eliminate gender bias, a gender-balanced dataset, Gendered Ambiguous Pronouns (GAP), has been released, on which the best baseline model achieves only 66.9% F1. Bidirectional Encoder Representations from Transformers (BERT) has broken several NLP task records and can be used on the GAP dataset. However, fine-tuning BERT on a specific task is computationally expensive. In this paper, we propose an end-to-end resolver by combining pre-trained BERT with a Relational Graph Convolutional Network (R-GCN). The R-GCN is used for digesting structural syntactic information and learning better task-specific embeddings. Empirical results demonstrate that, under explicit syntactic supervision and without the need to fine-tune BERT, the R-GCN's embeddings outperform the original BERT embeddings on the coreference task. Our work obtains state-of-the-art results on the GAP dataset, significantly improving the snippet-context baseline F1 score from 66.9% to 80.3%. We participated in the 2019 GAP Coreference Shared Task, and our code is available online.
http://arxiv.org/abs/1905.08868
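A minimal R-GCN layer of the kind combined with BERT above, operating on frozen token embeddings; the dimensions, relation count, and graph construction are placeholders, not the paper's:

    import torch
    import torch.nn as nn

    class RGCNLayer(nn.Module):
        """h_i' = ReLU(W0 h_i + sum_r sum_{j in N_r(i)} (1/c_{i,r}) W_r h_j)"""
        def __init__(self, dim, num_relations):
            super().__init__()
            self.self_loop = nn.Linear(dim, dim)
            self.rel = nn.ModuleList(
                [nn.Linear(dim, dim, bias=False) for _ in range(num_relations)])

        def forward(self, h, adj):
            # h: (N, dim) node features; adj: (R, N, N) row-normalized adjacency
            out = self.self_loop(h)
            for r, w in enumerate(self.rel):
                out = out + adj[r] @ w(h)
            return torch.relu(out)

    # e.g. 768-dim frozen BERT embeddings, 3 syntactic relation types
    layer = RGCNLayer(768, 3)
    h = torch.randn(12, 768)            # 12 tokens
    adj = torch.rand(3, 12, 12)         # toy adjacency (normalize in practice)
    print(layer(h, adj).shape)          # torch.Size([12, 768])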
Inferring relational behavior between road users, as well as between road users and their surrounding physical space, is an important step toward effective modeling and prediction of the navigation strategies adopted by participants in road scenes. To this end, we propose a relation-aware framework for future trajectory forecast. Our system aims to infer relational information from the interactions of road users with each other and with the environment. The first module involves visual encoding of spatio-temporal features, which captures human-human and human-space interactions over time. The following module explicitly constructs pair-wise relations from spatio-temporal interactions and identifies the more descriptive relations that highly influence the future motion of the target road user by considering its past trajectory. The resulting relational features are used to forecast future locations of the target in the form of heatmaps, with additional guidance from spatial dependencies and consideration of uncertainty. Extensive evaluations on a public benchmark dataset demonstrate the robustness and efficacy of the proposed framework, which outperforms state-of-the-art methods.
http://arxiv.org/abs/1905.08855
We propose a novel approach to reconstructing RGB-D indoor scenes based on plane primitives. Our approach takes as input an RGB-D sequence and a dense coarse mesh reconstructed from it, and generates a lightweight, low-polygon mesh with clear face textures and sharp features, without losing the geometric details of the original scene. Compared to existing methods, which only cover large planar regions in the scene, our method builds the entire scene with adaptive planes, preserving both geometric details and sharp features in the mesh. Experiments show that our method generates textured meshes from RGB-D data more efficiently than state-of-the-art methods.
http://arxiv.org/abs/1905.08853
Recently, Convolutional Neural Networks (CNNs) have been shown to be considerably vulnerable to adversarial attacks: they can be easily misled by adversarial perturbations. With more aggressive methods being proposed, adversarial attacks can also be applied in the physical world, causing practical issues for various CNN-powered applications. Most existing defense works for physical adversarial attacks only focus on eliminating explicit perturbation patterns from inputs, ignoring the interpretation of, and a solution to, the CNN's intrinsic vulnerability. Therefore, most of them incur considerable data processing costs and lack the expected versatility across different attacks. In this paper, we propose DoPa, a fast and comprehensive CNN defense methodology against physical adversarial attacks. By interpreting the CNN's vulnerability, we find that non-semantic adversarial perturbations can activate the CNN with significantly abnormal activations, which can even overwhelm the activations of semantic input patterns. We improve the CNN recognition process by adding a self-verification stage to analyze the semantics of distinguished activation patterns, with only one CNN inference involved. Based on the detection result, we further propose a data recovery methodology to defend against physical adversarial attacks. We apply this detection and defense methodology to both image and audio CNN recognition processes. Experiments show that our methodology achieves an average 90% success rate for attack detection and 81% accuracy recovery for image physical adversarial attacks. The proposed defense method also achieves a 92% detection success rate and 77.5% accuracy recovery for audio recognition applications. Moreover, the proposed defense methods are at most 2.3x faster than state-of-the-art defense methods, making them feasible for resource-constrained platforms such as mobile devices.
http://arxiv.org/abs/1905.08790
While semi-supervised learning (SSL) algorithms provide an efficient way to make use of both labelled and unlabelled data, they generally struggle when the number of annotated samples is very small. In this work, we consider the problem of SSL multi-class classification with very few labelled instances. We introduce two key ideas. The first is simple but effective: we leverage the power of transfer learning among different tasks and self-supervision to initialize a good representation of the data without making use of any label. The second is a new algorithm for SSL that can exploit such a pre-trained representation well. The algorithm works by alternating two phases, one fitting the labelled points and one fitting the unlabelled ones, with carefully controlled information flow between them. The benefits are a great reduction in overfitting of the labelled data and the avoidance of issues with balancing labelled and unlabelled losses during training. We show empirically that this method can successfully train competitive models with as few as 10 labelled data points per class. More generally, we show that the idea of bootstrapping features using self-supervised learning always improves SSL on standard benchmarks. We also show that our algorithm works increasingly well compared to other methods when refining from other tasks or datasets.
http://arxiv.org/abs/1905.08845
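The alternating scheme above can be sketched as two interleaved optimization phases over a shared pre-trained backbone. Confidence-thresholded pseudo-labels stand in here for the unlabelled-phase loss; the paper's exact losses and information-flow control differ:

    import torch
    import torch.nn.functional as F

    def train_ssl(model, opt, labelled, unlabelled, epochs=10):
        for _ in range(epochs):
            # Phase 1: fit the few labelled points
            for x, y in labelled:
                opt.zero_grad()
                F.cross_entropy(model(x), y).backward()
                opt.step()
            # Phase 2: fit the unlabelled points via confident pseudo-labels
            for x in unlabelled:
                with torch.no_grad():
                    probs = F.softmax(model(x), dim=1)
                    conf, pseudo = probs.max(dim=1)
                mask = conf > 0.9              # keep only confident samples
                if mask.any():
                    opt.zero_grad()
                    F.cross_entropy(model(x[mask]), pseudo[mask]).backward()
                    opt.step()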
When proving theorems from large sets of logical assertions, it can be helpful to restrict the search for a proof to those assertions that are relevant, that is, closely related to the theorem in some sense. For example, in the Watson system, a large knowledge base must rapidly be searched for relevant facts. Various formal concepts of relevance for propositional and first-order logic have been defined for this purpose, and some have yielded good results on large problems. We consider here in particular a concept based on alternating paths. We present efficient graph-based methods for computing alternating path relevance and give some results indicating its effectiveness. We also propose an alternating path based extension of this relevance method to DPLL with an improved time bound, and give other extensions to alternating path relevance intended to improve its performance.
http://arxiv.org/abs/1905.08842
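A sketch of one plausible reading of alternating-path relevance: clauses are linked through complementary literals, a path may not leave a clause via the literal it entered on, and relevance is BFS distance from the goal clause. The representation and the alternation test are our illustration, not the paper's algorithm:

    from collections import deque

    def relevance_distances(clauses, goal_idx):
        """BFS distance from the goal clause along alternating paths.
        clauses: list of sets of nonzero ints (negative = negated atom)."""
        dist = {goal_idx: 0}
        # state: (clause index, literal used to enter it, depth)
        frontier = deque([(goal_idx, None, 0)])
        seen = {(goal_idx, None)}
        while frontier:
            i, entered, d = frontier.popleft()
            for lit in clauses[i]:
                if lit == entered:
                    continue      # alternation: leave via a different literal
                for j, c in enumerate(clauses):
                    if -lit in c and (j, -lit) not in seen:
                        seen.add((j, -lit))
                        dist[j] = min(dist.get(j, d + 1), d + 1)
                        frontier.append((j, -lit, d + 1))
        return dist

    cs = [{1, 2}, {-1, 3}, {-3}, {-2}, {4}]
    print(relevance_distances(cs, 0))  # {0: 0, 1: 1, 3: 1, 2: 2}; clause 4 irrelevant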
Language model (LM) pre-training has resulted in impressive performance and sample efficiency on a variety of language understanding tasks. However, it remains unclear how to best use pre-trained LMs for generation tasks such as abstractive summarization, particularly to enhance sample efficiency. In these sequence-to-sequence settings, prior work has experimented with loading pre-trained weights into the encoder and/or decoder networks, but used non-pre-trained encoder-decoder attention weights. We instead use a pre-trained decoder-only network, where the same Transformer LM both encodes the source and generates the summary. This ensures that all parameters in the network, including those governing attention over source states, have been pre-trained before the fine-tuning step. Experiments on the CNN/Daily Mail dataset show that our pre-trained Transformer LM substantially improves over pre-trained Transformer encoder-decoder networks in limited-data settings. For instance, it achieves 13.1 ROUGE-2 using only 1% of the training data (~3000 examples), while pre-trained encoder-decoder models score 2.3 ROUGE-2.
http://arxiv.org/abs/1905.08836
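In the decoder-only setup above, source and summary share one sequence and one Transformer. A hedged sketch with Hugging Face GPT-2 standing in for the paper's pre-trained LM; the separator choice and loss masking are illustrative:

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    source = "The quick brown fox jumps over the lazy dog. " * 3
    summary = "A fox jumps over a dog."
    sep = tok.eos_token   # illustrative separator between source and summary

    ids = tok(source + sep + summary + tok.eos_token, return_tensors="pt").input_ids
    n_src = len(tok(source + sep).input_ids)

    labels = ids.clone()
    labels[:, :n_src] = -100   # train only on summary tokens; source is context
    loss = model(ids, labels=labels).loss
    loss.backward()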
Humans prove theorems by relying on substantial high-level reasoning and problem-specific insights. Proof assistants offer a formalism that resembles human mathematical reasoning, representing theorems in higher-order logic and proofs as high-level tactics. However, human experts have to construct proofs manually by entering tactics into the proof assistant. In this paper, we study the problem of using machine learning to automate the interaction with proof assistants. We construct CoqGym, a large-scale dataset and learning environment containing 71K human-written proofs from 123 projects developed with the Coq proof assistant. We develop ASTactic, a deep learning-based model that generates tactics as programs in the form of abstract syntax trees (ASTs). Experiments show that ASTactic trained on CoqGym can generate effective tactics and can be used to prove new theorems not previously provable by automated methods. Code is available at https://github.com/princeton-vl/CoqGym.
http://arxiv.org/abs/1905.09381
Author Profiling (AP) aims at predicting specific characteristics of a group of authors by analyzing their written documents. Much research has focused on determining suitable features for modeling authors' writing patterns. Reported results indicate that content-based features continue to be the most relevant and discriminative features for solving this task. Thus, in this paper, we present a thorough analysis of the appropriateness of different distributional term representations (DTRs) for the AP task. To this end, we introduce a novel framework for supervised AP using these representations and, building on it, carry out a comparative analysis of representations such as DOR, TCOR, SSR, and word2vec on the AP problem. We also compare the performance of the DTRs against classic approaches, including popular topic-based methods. The obtained results indicate that DTRs are suitable for solving the AP task in social media domains, as they achieve competitive results while providing meaningful interpretability.
http://arxiv.org/abs/1905.08780
We present a system for learning full-body neural avatars, i.e. deep networks that produce full-body renderings of a person for varying body pose and camera position. Our system takes the middle path between the classical graphics pipeline and the recent deep learning approaches that generate images of humans using image-to-image translation. In particular, our system estimates an explicit two-dimensional texture map of the model surface. At the same time, it abstains from explicit shape modeling in 3D. Instead, at test time, the system uses a fully-convolutional network to directly map the configuration of body feature points w.r.t. the camera to the 2D texture coordinates of individual pixels in the image frame. We show that such a system is capable of learning to generate realistic renderings while being trained on videos annotated with 3D poses and foreground masks. We also demonstrate that maintaining an explicit texture representation helps our system to achieve better generalization compared to systems that use direct image-to-image translation.
http://arxiv.org/abs/1905.08776
Image-to-image translation, which translates input images to a different domain with a learned one-to-one mapping, has achieved impressive success in recent years. The success of translation mainly relies on the network architecture preserving the structural information while modifying the appearance slightly at the pixel level through adversarial training. Although these networks are able to learn the mapping, the translated images are predictable without exception. It is more desirable to diversify image-to-image translation by introducing uncertainties, i.e., the generated images hold potential for variations in colors and textures in addition to the general similarity to the input images, and this happens in both the target and source domains. To this end, we propose a novel generative adversarial network (GAN) based model, InjectionGAN, to learn a many-to-many mapping. In this model, the input image is combined with latent variables, which comprise a domain-specific attribute and unspecific random variations. The domain-specific attribute indicates the target domain of the translation, while the unspecific random variations introduce uncertainty into the model. A unified framework is proposed to regroup these two parts and obtain diverse generations in each domain. Extensive experiments demonstrate that the diverse generations are of high quality for challenging image-to-image translation tasks where no pairing information of the training dataset exists. Both quantitative and qualitative results prove the superior performance of InjectionGAN over state-of-the-art approaches.
http://arxiv.org/abs/1905.08766
Stochastic finite automata arise naturally in many language and speech processing tasks. They include stochastic acceptors, which represent certain probability distributions over random strings. We consider the problem of efficient sampling: drawing random string variates from the probability distribution represented by stochastic automata and transformations of those. We show that path-sampling is effective and can be efficient if the epsilon-graph of a finite automaton is acyclic. We provide an algorithm that ensures this by conflating epsilon-cycles within strongly connected components. Sampling is also effective in the presence of non-injective transformations of strings. We illustrate this in the context of decoding for Connectionist Temporal Classification (CTC), where the predictive probabilities yield auxiliary sequences which are transformed into shorter labeling strings. We can sample efficiently from the transformed labeling distribution and use this in two different strategies for finding the most probable CTC labeling.
http://arxiv.org/abs/1905.08760
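Path sampling from a stochastic acceptor, as above, is a weighted random walk: at each state, draw an outgoing arc (or the stop event) proportional to its probability. A toy sketch over a dict-encoded, epsilon-free automaton (cycles on labeled arcs are fine; only the epsilon-graph must be acyclic):

    import random

    # state -> list of (symbol_or_None_for_stop, next_state, probability)
    automaton = {
        0: [("a", 1, 0.6), ("b", 2, 0.4)],
        1: [("a", 1, 0.3), (None, None, 0.7)],   # None = accept / stop
        2: [(None, None, 1.0)],
    }

    def sample_string(start=0, rng=random):
        state, out = start, []
        while True:
            arcs = automaton[state]
            sym, nxt, _ = rng.choices(arcs, weights=[p for *_, p in arcs])[0]
            if sym is None:
                return "".join(out)
            out.append(sym)
            state = nxt

    print([sample_string() for _ in range(5)])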
The University of Toronto is one of eight teams competing in the SAE AutoDrive Challenge – a competition to develop a self-driving car by 2020. After placing first at the Year 1 challenge, we are headed to MCity in June 2019 for the second challenge. There, we will interact with pedestrians, cyclists, and cars. For safe operation, it is critical to have an accurate estimate of the position of all objects surrounding the vehicle. The contributions of this work are twofold: First, we present a new object detection and tracking dataset (UofTPed50), which uses GPS to ground truth the position and velocity of a pedestrian. To our knowledge, a dataset of this type for pedestrians has not been shown in the literature before. Second, we present a lightweight object detection and tracking system (aUToTrack) that uses vision, LIDAR, and GPS/IMU positioning to achieve state-of-the-art performance on the KITTI Object Tracking benchmark. We show that aUToTrack accurately estimates the position and velocity of pedestrians, in real-time, using CPUs only. aUToTrack has been tested in closed-loop experiments on a real self-driving car, and we demonstrate its performance on our dataset.
http://arxiv.org/abs/1905.08758
This paper proposes RIU-Net (for Range-Image U-Net), an adaptation of a popular semantic segmentation network to the semantic segmentation of 3D LiDAR point clouds. The point cloud is turned into a 2D range image by exploiting the topology of the sensor. This image is then used as input to a U-Net, an architecture that has already proved its efficiency for the semantic segmentation of medical images. We demonstrate how it can also be used for the accurate semantic segmentation of 3D LiDAR point clouds. Our model is trained on range images built from the KITTI 3D object detection dataset. Experiments show that RIU-Net, despite being very simple, outperforms the state of the art among range-image-based methods. Finally, we demonstrate that this architecture is able to operate at 90 fps on a single GPU, which enables deployment on low computational power systems such as robots.
http://arxiv.org/abs/1905.08748
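The point-cloud-to-range-image step above is a spherical projection: each 3D point maps to an (azimuth, elevation) pixel and stores its range. A sketch with a vertical field of view typical of a rotating LiDAR; the paper's exact image size and input channels may differ:

    import numpy as np

    def to_range_image(points, H=64, W=512, fov_up=3.0, fov_down=-25.0):
        """points: (N, 3) x,y,z in the sensor frame -> (H, W) range image."""
        x, y, z = points[:, 0], points[:, 1], points[:, 2]
        r = np.linalg.norm(points, axis=1)
        yaw = np.arctan2(y, x)                      # azimuth in [-pi, pi]
        pitch = np.arcsin(np.clip(z / r, -1, 1))    # elevation
        fu, fd = np.radians(fov_up), np.radians(fov_down)
        u = ((yaw / np.pi + 1) / 2 * W).astype(int) % W
        v = ((fu - pitch) / (fu - fd) * H).astype(int).clip(0, H - 1)
        img = np.zeros((H, W))
        # keep the closest return when several points hit the same pixel
        order = np.argsort(-r)
        img[v[order], u[order]] = r[order]
        return img

    img = to_range_image(np.random.randn(1000, 3) * 10)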
Over-dependence on domain ontology and lack of knowledge sharing across domains are two practical and yet less studied problems of dialogue state tracking. Existing approaches generally fall short in tracking unknown slot values during inference and often have difficulties in adapting to new domains. In this paper, we propose a Transferable Dialogue State Generator (TRADE) that generates dialogue states from utterances using a copy mechanism, facilitating knowledge transfer when predicting (domain, slot, value) triplets not encountered during training. Our model is composed of an utterance encoder, a slot gate, and a state generator, which are shared across domains. Empirical results demonstrate that TRADE achieves state-of-the-art joint goal accuracy of 48.62% for the five domains of MultiWOZ, a human-human dialogue dataset. In addition, we show its transferring ability by simulating zero-shot and few-shot dialogue state tracking for unseen domains. TRADE achieves 60.58% joint goal accuracy in one of the zero-shot domains, and is able to adapt to few-shot cases without forgetting already trained domains.
http://arxiv.org/abs/1905.08743
A capsule of a capsule network represents an object or part of an object in a parse tree. An output vector of a capsule encodes instantiation parameters such as position, size, or orientation. The most widely used algorithm to route vectors from lower-level layers to upper-level layers is the routing-by-agreement algorithm. This algorithm is thought to activate capsules in the network such that all active capsules form a parse tree, whose structure should represent a hierarchical composition of objects built out of smaller objects. In this paper we introduce different metrics to evaluate the parse tree structure of capsule networks and show that the commonly used routing-by-agreement algorithm does not ensure the emergence of a parse tree. We therefore introduce a new routing algorithm, named scaled-distance-agreement routing, that calculates agreements with distances rather than the dot product. We show experimentally, for different network architectures and datasets, that this new calculation of the agreement ensures a parse tree structure. The novel routing algorithm is also much more robust against whitebox adversarial attacks than the original routing algorithm.
http://arxiv.org/abs/1812.09707
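The core change above is in how agreement between a vote and an output capsule is scored. A simplified routing iteration contrasting the two agreement measures; shapes and details are illustrative, not the paper's full algorithm:

    import torch
    import torch.nn.functional as F

    def squash(s, dim=-1):
        n2 = (s ** 2).sum(dim=dim, keepdim=True)
        return n2 / (1 + n2) * s / (n2.sqrt() + 1e-8)

    def route(votes, iters=3, use_distance=True):
        # votes: (num_in, num_out, dim) prediction vectors u_hat
        logits = torch.zeros(votes.shape[:2])
        for _ in range(iters):
            c = F.softmax(logits, dim=1)                      # coupling coeffs
            v = squash((c.unsqueeze(-1) * votes).sum(dim=0))  # (num_out, dim)
            if use_distance:
                # distance-based agreement: closer vote => larger agreement
                agree = -torch.norm(votes - v.unsqueeze(0), dim=-1)
            else:
                agree = (votes * v.unsqueeze(0)).sum(dim=-1)  # dot product
            logits = logits + agree
        return v

    print(route(torch.randn(32, 10, 16)).shape)   # torch.Size([10, 16])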
A realistic Chinese word segmentation tool must adapt to textual variations with minimal training input and yet be robust enough to yield reliable segmentation results for all variants. Various lexicon-driven approaches to Chinese segmentation, e.g. [1,16], achieve high f-scores yet require massive training for any variation. Text-driven approaches, e.g. [12], can be easily adapted for domain and genre changes yet have difficulty matching the high f-scores of the lexicon-driven approaches. In this paper, we refine and implement an innovative text-driven word boundary decision (WBD) segmentation model proposed in [15]. The WBD model treats word segmentation simply and efficiently as a binary decision on whether to realize the natural textual break between two adjacent characters as a word boundary. The WBD model allows simple and quick training data preparation, converting characters into contextual vectors for learning the word boundary decision. Machine learning experiments with four different classifiers show that training with 1,000 vectors and with 1 million vectors achieves comparable and reliable results. In addition, when applied to SigHAN Bakeoff 3 competition data, the WBD model produces OOV recall rates that are higher than all published results. Unlike all previous work, our OOV recall rate is comparable to our own F-score. Both experiments support the claim that the WBD model is a realistic model for Chinese word segmentation, as it can be easily adapted for new variants with robust results. We conclude by discussing linguistic ramifications as well as future implications of the WBD approach.
http://arxiv.org/abs/1905.08732
Semantic segmentation is essential to biomedical image analysis. Many recent works mainly focus on integrating the Fully Convolutional Network (FCN) architecture with sophisticated convolution implementations and deep supervision. In this paper, we propose to decompose the single segmentation task into three subsequent sub-tasks: (1) pixel-wise image segmentation, (2) prediction of the class labels of the objects within the image, and (3) classification of the scene the image belongs to. While these three sub-tasks are trained to optimize their individual loss functions at different perceptual levels, we propose to let them interact through a task-task context ensemble. Moreover, we propose a novel sync-regularization to penalize the deviation between the outputs of the pixel-wise segmentation and the class prediction tasks. These effective regularizations help the FCN utilize context information comprehensively and attain accurate semantic segmentation, even when the number of training images is limited, as in many biomedical applications. We have successfully applied our framework to three diverse 2D/3D medical image datasets, including the Robotic Scene Segmentation Challenge 18 (ROBOT18), the Brain Tumor Segmentation Challenge 18 (BRATS18), and the Retinal Fundus Glaucoma Challenge (REFUGE18), and achieved top-tier performance in all three challenges.
http://arxiv.org/abs/1905.08720
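The sync-regularization above can be sketched as a consistency term between the classes the segmentation head actually paints and the classes the classification head predicts; this is our simplification of the paper's formulation:

    import torch
    import torch.nn.functional as F

    def sync_regularization(seg_logits, cls_logits):
        """seg_logits: (B, C, H, W); cls_logits: (B, C) multi-label scores.
        Penalizes disagreement between class presence implied by the
        segmentation and the class-prediction output."""
        seg_probs = F.softmax(seg_logits, dim=1)
        # soft presence of each class in the predicted segmentation
        presence = seg_probs.flatten(2).max(dim=2).values     # (B, C)
        return F.mse_loss(torch.sigmoid(cls_logits), presence)

    seg = torch.randn(2, 5, 64, 64)
    cls = torch.randn(2, 5)
    loss = F.cross_entropy(seg, torch.randint(5, (2, 64, 64))) \
           + 0.1 * sync_regularization(seg, cls)   # 0.1: illustrative weight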
In this work we present a new, efficient approach to Human Action Recognition called Video Transformer Network (VTN). It leverages the latest advances in Computer Vision and Natural Language Processing and applies them to video understanding. The proposed method allows us to create lightweight CNN models that achieve high accuracy and real-time speed using just an RGB mono camera and a general-purpose CPU. Furthermore, we explain how to improve accuracy by distilling multiple models with different modalities into a single model. We conduct a comparison with state-of-the-art methods and show that our approach performs on par with most of them on well-known Action Recognition datasets. We benchmark the inference time of the models using a modern inference framework and argue that our approach compares favorably with other methods in terms of the speed/accuracy trade-off, running at 56 FPS on CPU. The models and the training code are available.
http://arxiv.org/abs/1905.08711
This paper discusses technology and opportunities to embrace artificial intelligence (AI) in the design of autonomous wireless systems. We aim to provide readers with motivation and general AI methodology of autonomous agents in the context of self-organization in real time by unifying knowledge management with sensing, reasoning and active learning. We highlight differences between training-based methods for matching problems and training-free methods for environment-specific problems. Finally, we conceptually introduce the functions of an autonomous agent with knowledge management.
http://arxiv.org/abs/1806.10518
Exploiting fine-grained semantic features on point clouds is still challenging due to their irregular and sparse structure in a non-Euclidean space. Among existing studies, PointNet provides an efficient and promising approach to learning shape features directly on unordered 3D point clouds and has achieved competitive performance. However, local features that help towards better contextual learning are not considered. Meanwhile, attention mechanisms have proven effective in capturing node representations on graph-based data by attending over neighboring nodes. In this paper, we propose a novel neural network for point clouds, dubbed GAPNet, to learn local geometric representations by embedding a graph attention mechanism within stacked Multi-Layer-Perceptron (MLP) layers. Firstly, we introduce a GAPLayer to learn attention features for each point by assigning different attention weights to its neighborhood. Secondly, in order to exploit sufficient features, a multi-head mechanism is employed to allow the GAPLayer to aggregate different features from independent heads. Thirdly, we propose an attention pooling layer over neighbors to capture a local signature aimed at enhancing network robustness. Finally, GAPNet applies stacked MLP layers to the attention features and local signature to fully extract local geometric structures. The proposed GAPNet architecture is tested on the ModelNet40 and ShapeNet part datasets, and achieves state-of-the-art performance in both shape classification and part segmentation tasks.
http://arxiv.org/abs/1905.08705
Conversational context information, higher-level knowledge that spans across sentences, can help to recognize a long conversation. However, existing speech recognition models are typically built at a sentence level, and thus it may not capture important conversational context information. The recent progress in end-to-end speech recognition enables integrating context with other available information (e.g., acoustic, linguistic resources) and directly recognizing words from speech. In this work, we present a direct acoustic-to-word, end-to-end speech recognition model capable of utilizing the conversational context to better process long conversations. We evaluate our proposed approach on the Switchboard conversational speech corpus and show that our system outperforms a standard end-to-end speech recognition system.
http://arxiv.org/abs/1905.08796
We propose an attention-based model that treats AMR parsing as sequence-to-graph transduction. Unlike most AMR parsers that rely on pre-trained aligners, external semantic resources, or data augmentation, our proposed parser is aligner-free and can be effectively trained with limited amounts of labeled AMR data. Our parser outperforms all previously reported SMATCH scores, on both AMR 2.0 (76.3% F1 on LDC2017T10) and AMR 1.0 (70.2% F1 on LDC2014T12).
http://arxiv.org/abs/1905.08704
With an increasing need for elderly and disability care, there is an increasing opportunity for intelligent and mobile devices such as robots to provide care and support solutions. In order to naturally assist and interact with humans, a robot must possess effective conversational capabilities. Gestures accompanying spoken sentences are an important factor in human-to-human conversational communication. Humanoid robots must also use gestures if they are to be capable of the rich interactions implied and afforded by their humanlike appearance. However, present systems for gesture generation do not dynamically provide realistic physical gestures that are naturally understood by humans. A method for humanoid robots to generate gestures along with spoken sentences is proposed herein. We emphasize that our gesture-generating architecture can be applied to any type of humanoid robot through the use of labanotation, an existing system for notating human dance movements. Labanotation's gesture symbols can be computationally transformed to be compatible across a range of robots with differing physical characteristics. This paper describes the solution as an integrated system for conversational robots whose speech and gestures can supplement each other in human-robot interaction.
http://arxiv.org/abs/1905.08702
The impressive performance of Convolutional Neural Networks (CNNs) when solving different vision problems is shadowed by their black-box nature and our consequent lack of understanding of the representations they build and how these representations are organized. To help understand these issues, we propose to describe the activity of individual neurons by their Neuron Feature visualization and to quantify their inherent selectivity with two specific properties. We explore selectivity indexes for an image feature (color) and for an image label (class membership). Our contribution is a framework to seek or classify neurons by indexing on these selectivity properties. It helps to find color-selective neurons, such as a red-mushroom neuron in layer Conv4, or class-selective neurons, such as dog-face neurons in layer Conv5 of VGG-M, and establishes a methodology for deriving other selectivity properties. Indexing on neuron selectivity can statistically reveal how features and classes are represented across layers, at a moment when the size of trained nets is growing and automatic tools to index neurons can be helpful.
http://arxiv.org/abs/1702.00382
Weighted finite automata (WFA) are often used to represent probabilistic models, such as $n$-gram language models, since they are efficient for recognition tasks in time and space. The probabilistic source to be represented as a WFA, however, may come in many forms. Given a generic probabilistic model over sequences, we propose an algorithm to approximate it as a weighted finite automaton such that the Kullback-Leibler divergence between the source model and the WFA target model is minimized. The proposed algorithm involves a counting step and a difference-of-convex optimization, both of which can be performed efficiently. We demonstrate the usefulness of our approach on various tasks, including distilling $n$-gram models from neural models, building compact language models, and building open-vocabulary character models.
http://arxiv.org/abs/1905.08701
The expected low market penetration of connected vehicles (CVs) in the near future could be a constraint on estimating traffic flow parameters, such as the average travel speed of a roadway segment and the average space headway between vehicles, from CV-broadcast data. Traffic flow parameters estimated from a low penetration of CVs are noisy compared to those estimated at 100 percent penetration, and such noise reduces the real-time prediction accuracy of a machine learning model, such as a long short-term memory (LSTM) model, for predicting traffic flow parameters. Accurate prediction of these parameters is important for assessing future traffic conditions. To improve the prediction accuracy using noisy traffic flow parameters, which is constrained by limited CV market penetration and limited CV data, we developed a real-time traffic data prediction model that combines an LSTM with a Kalman filter-based Rauch-Tung-Striebel (RTS) noise-reduction model. We conducted a case study using the Enhanced Next Generation Simulation (NGSIM) dataset, which contains vehicle trajectory data at every one-tenth of a second, to evaluate the performance of this prediction model. Compared to a baseline LSTM model, at only 5 percent penetration of CVs, the analyses revealed that the combined LSTM and RTS model reduced the mean absolute percentage error (MAPE) from 19 percent to 5 percent for speed prediction and from 27 percent to 9 percent for space-headway prediction. A statistical significance test with a 95 percent confidence interval confirmed no significant difference in the predicted average speed and average space headway when using this LSTM and RTS combination with only a 5 percent CV penetration rate.
https://arxiv.org/abs/1811.03562
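The RTS step above runs a Kalman filter forward and then smooths backward over the stored estimates. A minimal 1D constant-velocity sketch in numpy with illustrative noise settings; the study's state model is not reproduced here:

    import numpy as np

    def kalman_rts(z, dt=0.1, q=1.0, r=4.0):
        """z: noisy speed measurements -> RTS-smoothed speeds."""
        F = np.array([[1, dt], [0, 1]])          # constant-velocity model
        H = np.array([[1.0, 0.0]])
        Q = q * np.array([[dt**3/3, dt**2/2], [dt**2/2, dt]])
        R = np.array([[r]])
        n = len(z)
        xs, Ps, xps, Pps = [], [], [], []
        x, P = np.array([z[0], 0.0]), np.eye(2)
        for k in range(n):                        # forward Kalman filter
            xp, Pp = F @ x, F @ P @ F.T + Q       # predict
            K = Pp @ H.T @ np.linalg.inv(H @ Pp @ H.T + R)
            x = xp + (K @ (z[k] - H @ xp)).ravel()
            P = (np.eye(2) - K @ H) @ Pp
            xs.append(x); Ps.append(P); xps.append(xp); Pps.append(Pp)
        for k in range(n - 2, -1, -1):            # backward RTS smoothing
            C = Ps[k] @ F.T @ np.linalg.inv(Pps[k + 1])
            xs[k] = xs[k] + C @ (xs[k + 1] - xps[k + 1])
            Ps[k] = Ps[k] + C @ (Ps[k + 1] - Pps[k + 1]) @ C.T
        return np.array([s[0] for s in xs])

    smoothed = kalman_rts(20 + np.random.randn(100) * 2)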
One of the key requirements to facilitate the semantic analytics of information regarding contemporary and historical events on the Web, in the news and in social media is the availability of reference knowledge repositories containing comprehensive representations of events, entities and temporal relations. Existing knowledge graphs, with popular examples including DBpedia, YAGO and Wikidata, focus mostly on entity-centric information and are insufficient in terms of their coverage and completeness with respect to events and temporal relations. In this article we address this limitation, formalise the concept of a temporal knowledge graph and present its instantiation - EventKG. EventKG is a multilingual event-centric temporal knowledge graph that incorporates over 690 thousand events and over 2.3 million temporal relations obtained from several large-scale knowledge graphs and semi-structured sources and makes them available through a canonical RDF representation. Whereas popular entities often possess hundreds of relations within a temporal knowledge graph such as EventKG, generating a concise overview of the most important temporal relations for a given entity is a challenging task. In this article we demonstrate an application of EventKG to biographical timeline generation, where we adopt a distant supervision method to identify relations most relevant for an entity biography. Our evaluation results provide insights on the characteristics of EventKG and demonstrate the effectiveness of the proposed biographical timeline generation method.
http://arxiv.org/abs/1905.08794
This paper proposes a proprioceptive collision detection algorithm based on Gaussian Process Regression. Compared to sensor-based collision detection and other proprioceptive algorithms, the proposed approach has minimal sensing requirements, since only the currents and the joint configurations are needed. The algorithm extends the standard Gaussian Process models adopted in learning the robot inverse dynamics, using a richer set of input locations and an ad-hoc kernel structure to model the complex and non-linear behaviors due to friction in quasi-static configurations. Tests performed on a Universal Robots UR10 show the effectiveness of the proposed algorithm in detecting when a collision has occurred.
http://arxiv.org/abs/1905.08689
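The detection logic above can be sketched as GP regression of joint currents followed by a residual test: flag a collision when a measured current leaves the model's confidence band. A toy one-joint version with scikit-learn; the paper's kernel structure and inputs are richer:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    q_train = rng.uniform(-np.pi, np.pi, size=(200, 1))           # joint positions
    i_train = np.sin(q_train).ravel() + rng.normal(0, 0.05, 200)  # toy currents

    gp = GaussianProcessRegressor(RBF() + WhiteKernel()).fit(q_train, i_train)

    def collision(q, i_measured, n_sigma=3.0):
        mu, std = gp.predict(np.atleast_2d(q), return_std=True)
        return abs(i_measured - mu[0]) > n_sigma * std[0]

    print(collision(0.5, np.sin(0.5)))        # False: consistent with the model
    print(collision(0.5, np.sin(0.5) + 2.0))  # True: unexpected extra torque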
Gesture interaction is a natural way of communicating with a robot as an alternative to speech. Gesture recognition methods leverage optical flow in order to understand human motion. However, while accurate (i.e., traditional) optical flow estimation methods are costly in terms of runtime, the accuracy of fast (i.e., deep learning) estimation methods can still be improved. In this paper, we present a pipeline for gesture-based human-robot interaction that uses a novel optical flow estimation method in order to achieve an improved speed-accuracy trade-off. Our optical flow estimation method introduces four improvements over previous deep learning-based methods: strong feature extractors, attention to contours, midway features, and a combination of these three. This results in a better understanding of motion and a finer representation of silhouettes. In order to evaluate our pipeline, we generated our own dataset, MIBURI, which contains gestures to command a house service robot. In our experiments, we show how our method improves not only optical flow estimation, but also gesture recognition, offering a speed-accuracy trade-off more realistic for practical robot applications.
http://arxiv.org/abs/1905.08685
The most common method for DNN pruning is hard thresholding of network weights, followed by retraining to recover any lost accuracy. Recently developed smart pruning algorithms use the DNN response over the training set with a variety of cost functions to determine redundant network weights, leading to less accuracy degradation and possibly less retraining time. In experiments on the total pruning time (pruning time + retraining time), we show that hard thresholding followed by retraining remains the most efficient way of reducing the number of network parameters. However, smart pruning algorithms still have advantages when retraining is not possible. In this context, we propose a novel smart pruning algorithm based on difference-of-convex-functions optimisation and show that it is often orders of magnitude faster than competing approaches while achieving the lowest classification accuracy degradation. Furthermore, we investigate theoretically the effect of hard thresholding on DNN accuracy. We show that accuracy degradation increases with remaining network depth from the pruned layer. We also discover a link between the latent dimensionality of the training data manifold and network robustness to hard thresholding.
http://arxiv.org/abs/1905.08793
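Hard thresholding itself is essentially one line per layer: zero every weight whose magnitude falls below a cutoff, then retrain the survivors. A PyTorch sketch with an illustrative sparsity level:

    import torch

    def hard_threshold(model, sparsity=0.9):
        """Zero the smallest-magnitude weights so `sparsity` of them are pruned."""
        masks = {}
        for name, p in model.named_parameters():
            if p.dim() < 2:                 # skip biases / norm parameters
                continue
            k = int(p.numel() * sparsity)
            if k < 1:
                continue
            cutoff = p.abs().flatten().kthvalue(k).values
            masks[name] = (p.abs() > cutoff).float()
            p.data *= masks[name]
        return masks                         # reapply after each retraining step

    model = torch.nn.Linear(100, 10)
    masks = hard_threshold(model, 0.9)
    # the retraining loop would multiply weights (or gradients) by masks[name]
    # after every optimizer step to keep the pruned entries at zero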
In this article we address the problem of text passage alignment across interlingual article pairs in Wikipedia. We develop methods that enable the identification and interlinking of text passages written in different languages and containing overlapping information. Interlingual text passage alignment can enable Wikipedia editors and readers to better understand language-specific context of entities, provide valuable insights in cultural differences and build a basis for qualitative analysis of the articles. An important challenge in this context is the trade-off between the granularity of the extracted text passages and the precision of the alignment. Whereas short text passages can result in more precise alignment, longer text passages can facilitate a better overview of the differences in an article pair. To better understand these aspects from the user perspective, we conduct a user study at the example of the German, Russian and the English Wikipedia and collect a user-annotated benchmark. Then we propose MultiWiki – a method that adopts an integrated approach to the text passage alignment using semantic similarity measures and greedy algorithms and achieves precise results with respect to the user-defined alignment. MultiWiki demonstration is publicly available and currently supports four language pairs.
http://arxiv.org/abs/1905.08675
Segmentation for tracking surgical instruments plays an important role in robot-assisted surgery, as it contributes to capturing accurate spatial information for tracking. In this paper, a novel network, the Refined Attention Segmentation Network, is proposed to simultaneously segment surgical instruments and identify their categories. The U-shaped network, which is popular in segmentation, is used. Different from previous work, an attention module is adopted to help the network focus on key regions, which improves segmentation accuracy. To solve the class imbalance problem, the weighted sum of the cross-entropy loss and the logarithm of the Jaccard index is used as the loss function. Furthermore, transfer learning is adopted in our network: the encoder is pre-trained on ImageNet. The dataset from the MICCAI EndoVis Challenge 2017 is used to evaluate our network, on which it achieves state-of-the-art performance of 94.65% mean Dice and 90.33% mean IOU.
http://arxiv.org/abs/1905.08663
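The loss above combines pixel-wise cross entropy with the logarithm of a soft Jaccard index. A hedged sketch; the weighting and smoothing constants are illustrative:

    import torch
    import torch.nn.functional as F

    def ce_log_jaccard(logits, target, w=0.7, eps=1e-6):
        """logits: (B, C, H, W); target: (B, H, W) integer labels."""
        ce = F.cross_entropy(logits, target)
        probs = F.softmax(logits, dim=1)
        onehot = F.one_hot(target, logits.shape[1]).permute(0, 3, 1, 2).float()
        inter = (probs * onehot).sum(dim=(0, 2, 3))
        union = (probs + onehot - probs * onehot).sum(dim=(0, 2, 3))
        jaccard = ((inter + eps) / (union + eps)).mean()
        return w * ce - (1 - w) * torch.log(jaccard)

    logits = torch.randn(2, 8, 64, 64, requires_grad=True)
    target = torch.randint(8, (2, 64, 64))
    ce_log_jaccard(logits, target).backward()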
Multi-feature data analysis (e.g., on Facebook, LinkedIn) is challenging, especially if one wants to do it efficiently while retaining the flexibility of choosing features of interest for analysis. Features (e.g., age, gender, relationship, political view, etc.) can be given explicitly in datasets, but can also be derived from content (e.g., political view based on Facebook posts). Analysis from multiple perspectives is needed to understand the datasets (or subsets of them) and to infer meaningful knowledge. For example, the influence of age, location, and marital status on political views may need to be inferred separately (or in combination). In this paper, we adapt multilayer network (MLN) analysis, a nontraditional approach, to model Facebook datasets, integrate content analysis, and conduct analysis driven by a list of desired application-based queries. Our experimental analysis shows the flexibility and efficiency of the proposed approach when modeling and analyzing datasets with multiple features.
http://arxiv.org/abs/1905.08635
In this paper, a method for automatically deriving energy-preserving numerical methods for the Euler-Lagrange equation and the Hamilton equation is proposed. The derived energy-preserving scheme is based on the discrete gradient method. In the proposed approach, the discrete gradient, which is a key tool for designing the scheme, is automatically computed by a similar algorithm to the automatic differentiation. Besides, the discrete gradient coincides with the usual gradient if the two arguments required to define the discrete gradient are the same. Hence the proposed method is an extension of the automatic differentiation in the sense that the proposed method derives not only the discrete gradient but also the usual gradient. Due to this feature, both energy-preserving integrators and variational (and hence symplectic) integrators can be implemented in the same programming code simultaneously. This allows users to freely switch between the energy-preserving numerical method and the symplectic numerical method in accordance with the problem-setting and other requirements. As applications, an energy-preserving numerical scheme for a nonlinear wave equation and a training algorithm of artificial neural networks derived from an energy-dissipative numerical scheme are shown.
http://arxiv.org/abs/1905.08604
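For a concrete instance of a discrete gradient, the Gonzalez (midpoint) form applied to a Hamiltonian H yields an exactly energy-preserving scheme. A small numpy sketch for the pendulum, solved by fixed-point iteration; the paper derives such gradients automatically rather than by hand as done here:

    import numpy as np

    def H(x):                      # pendulum: x = (q, p)
        return 0.5 * x[1]**2 - np.cos(x[0])

    def gradH(x):
        return np.array([np.sin(x[0]), x[1]])

    def discrete_gradient(x, y):
        """Gonzalez midpoint discrete gradient: satisfies
        dg(x, y) . (y - x) = H(y) - H(x) exactly."""
        mid, d = gradH((x + y) / 2), y - x
        if d @ d < 1e-14:
            return mid
        return mid + (H(y) - H(x) - mid @ d) / (d @ d) * d

    S = np.array([[0.0, 1.0], [-1.0, 0.0]])   # canonical symplectic structure

    def step(x, h=0.1, iters=50):
        y = x.copy()
        for _ in range(iters):     # fixed-point solve of the implicit scheme
            y = x + h * S @ discrete_gradient(x, y)
        return y

    x = np.array([1.0, 0.0])
    for _ in range(1000):
        x = step(x)
    print(H(np.array([1.0, 0.0])), H(x))   # energies agree to solver tolerance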
With the aim of constructing a biologically plausible model of machine listening, we study the representation of a multicomponent stationary signal by a wavelet scattering network. First, we show that renormalizing second-order nodes by their first-order parents gives a simple numerical criterion to establish whether two neighboring components will interfere psychoacoustically. Secondly, we generalize the `one or two components’ framework to three sine waves or more, and show that a network of depth $M = \log_2 N$ suffices to characterize the relative amplitudes of the first $N$ terms in a Fourier series, while enjoying properties of invariance to frequency transposition and component-wise phase shifts.
http://arxiv.org/abs/1905.08601
We introduce SharpNet, a method that predicts an accurate depth map for an input color image, with a particular attention to the reconstruction of occluding contours: Occluding contours are an important cue for object recognition, and for realistic integration of virtual objects in Augmented Reality, but they are also notoriously difficult to reconstruct accurately. For example, they are a challenge for stereo-based reconstruction methods, as points around an occluding contour are visible in only one image. Inspired by recent methods that introduce normal estimation to improve depth prediction, we introduce a novel term that constrains depth and occluding contours predictions. Since ground truth depth is difficult to obtain with pixel-perfect accuracy along occluding contours, we use synthetic images for training, followed by fine-tuning on real data. We demonstrate our approach on the challenging NYUv2-Depth dataset, and show that our method outperforms the state-of-the-art along occluding contours, while performing on par with the best recent methods for the rest of the images. Its accuracy along the occluding contours is actually better than the `ground truth’ acquired by a depth camera based on structured light. We show this by introducing a new benchmark based on NYUv2-Depth for evaluating occluding contours in monocular reconstruction, which is our second contribution.
http://arxiv.org/abs/1905.08598
In weakly-supervised temporal action localization, previous works have failed to locate dense and integral regions for each entire action due to the overestimation of the most salient regions. To alleviate this issue, we propose a marginalized average attentional network (MAAN) to suppress the dominant response of the most salient regions in a principled manner. The MAAN employs a novel marginalized average aggregation (MAA) module and learns a set of latent discriminative probabilities in an end-to-end fashion. MAA samples multiple subsets from the video snippet features according to a set of latent discriminative probabilities and takes the expectation over all the averaged subset features. Theoretically, we prove that the MAA module with learned latent discriminative probabilities successfully reduces the difference in responses between the most salient regions and the others. Therefore, MAAN is able to generate better class activation sequences and identify dense and integral action regions in the videos. Moreover, we propose a fast algorithm to reduce the complexity of constructing MAA from O($2^T$) to O($T^2$). Extensive experiments on two large-scale video datasets show that our MAAN achieves superior performance on weakly-supervised temporal action localization.
http://arxiv.org/abs/1905.08586
In this paper, we demonstrate a data-driven methodology for modelling the local similarity measures of various attributes in a dataset. We analyse the spread in the numerical attributes and estimate their distribution using a polynomial function, showcasing an approach for deriving strong initial value ranges for numerical attributes, and use a non-overlapping distribution for categorical attributes such that the entire similarity range [0,1] is utilized. We use an open source dataset to demonstrate the modelling and development of the similarity measures and present a case-based reasoning (CBR) system that can be used to search for the most relevant similar cases.
http://arxiv.org/abs/1905.08581
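The measures described above can be sketched as per-attribute local similarity functions mapping into [0, 1], combined by a weighted average; the ranges, table entries, and weights below are illustrative:

    def numeric_similarity(a, b, lo, hi):
        """Linear local similarity over a value range derived from the data."""
        return max(0.0, 1.0 - abs(a - b) / (hi - lo))

    # non-overlapping similarity table for a categorical attribute
    FUEL_SIM = {("petrol", "petrol"): 1.0, ("petrol", "diesel"): 0.4,
                ("diesel", "diesel"): 1.0, ("petrol", "electric"): 0.0,
                ("diesel", "electric"): 0.0, ("electric", "electric"): 1.0}

    def categorical_similarity(a, b):
        return FUEL_SIM.get((a, b), FUEL_SIM.get((b, a), 0.0))

    # global similarity of a query case to a stored case: weighted average
    sim = 0.6 * numeric_similarity(2011, 2008, 1990, 2020) \
        + 0.4 * categorical_similarity("petrol", "diesel")
    print(sim)   # 0.6*0.9 + 0.4*0.4 = 0.7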
Can we improve detection in the thermal domain by borrowing features from rich domains like visual RGB? In this paper, we propose a pseudo-multimodal object detector trained on natural image domain data to help improve the performance of object detection in thermal images. We assume access to a large-scale dataset in the visual RGB domain and a relatively smaller dataset (in terms of instances) in the thermal domain, as is common today. We propose the use of well-known image-to-image translation frameworks to generate pseudo-RGB equivalents of a given thermal image and then use a multi-modal architecture for object detection in the thermal image. We show that our framework outperforms existing benchmarks without the explicit need for paired training examples from the two domains. We also show that our framework can learn with less data from the thermal domain when using this approach.
http://arxiv.org/abs/1905.08789
Online Signature Verification (OSV) is a widely used biometric attribute for user behavioral characteristic verification in digital forensics. In this manuscript, owing to large intra-individual variability, a novel method for OSV based on an interval symbolic representation and a fuzzy similarity measure grounded in writer-specific parameter selection is proposed. The two parameters, namely the writer-specific acceptance threshold and the optimal feature set to be used for authenticating the writer, are selected based on the minimum equal error rate (EER) attained during the parameter fixation phase using the training signature samples. This is in contrast to current techniques for OSV, which are primarily writer-independent and choose a common set of features and a common acceptance threshold. To prove the robustness of our system, we have exhaustively assessed it with four standard datasets, i.e., MCYT-100 (DB1), MCYT-330 (DB2), the SUSIG-Visual corpus, and SVC-2004-Task2. Experimental outcomes confirm the effectiveness of fuzzy similarity metric-based writer-dependent parameter selection for OSV, achieving a lower error rate compared to many recent and state-of-the-art OSV models.
http://arxiv.org/abs/1905.08574