Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

Strategyproof Peer Selection using Randomization, Partitioning, and Apportionment

2019-04-25

Haris Aziz, Omer Lev, Nicholas Mattei, Jeffrey S. Rosenschein, Toby Walsh

arXiv_AI

arXiv_AI Review
Abstract

Peer review, evaluation, and selection is a fundamental aspect of modern science. Funding bodies the world over employ experts to review and select the best proposals of those submitted for funding. The problem of peer selection, however, is much more general: a professional society may want to give a subset of its members awards based on the opinions of all members; an instructor for a MOOC or online course may want to crowdsource grading; or a marketing company may select ideas from group brainstorming sessions based on peer evaluation. We make three fundamental contributions to the study of procedures or mechanisms for peer selection, a specific type of group decision-making problem, studied in computer science, economics, and political science. First, we propose a novel mechanism that is strategyproof, i.e., agents cannot benefit by reporting insincere valuations. Second, we demonstrate the effectiveness of our mechanism by a comprehensive simulation-based comparison with a suite of mechanisms found in the literature. Finally, our mechanism employs a randomized rounding technique that is of independent interest, as it solves the apportionment problem that arises in various settings where discrete resources such as parliamentary representation slots need to be divided proportionally.
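To make the apportionment problem mentioned above concrete, here is a minimal sketch of the classical largest-remainder (Hamilton) method. It is a deterministic textbook rule, not the randomized rounding technique proposed in the paper, and the function name and example shares are hypothetical.

```python
from fractions import Fraction


def largest_remainder(shares, seats):
    """Split `seats` indivisible slots proportionally to integer `shares`.

    Classical Hamilton rule, shown for illustration only; it is NOT the
    randomized mechanism described in the abstract.
    """
    total = sum(shares)
    # Exact fractional quota of each party.
    quotas = [Fraction(s * seats, total) for s in shares]
    # Everyone first receives the integer part of their quota.
    alloc = [q.numerator // q.denominator for q in quotas]
    leftover = seats - sum(alloc)
    # Remaining slots go to the largest fractional remainders.
    order = sorted(range(len(shares)),
                   key=lambda i: quotas[i] - alloc[i],
                   reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


# Hypothetical example: three groups with 45, 35 and 20 votes share 7 slots.
allocation = largest_remainder([45, 35, 20], 7)
```

The quotas here are 3.15, 2.45 and 1.4, so the single leftover slot goes to the second group. Deterministic rules like this one can treat near-ties unevenly, which is one reason a randomized rounding scheme can be attractive.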

URL

http://arxiv.org/abs/1604.03632

PDF

http://arxiv.org/pdf/1604.03632
Unsupervised Deep Learning by Neighbourhood Discovery

2019-04-25

Jiabo Huang, Qi Dong, Shaogang Gong, Xiatian Zhu

arXiv_CV

arXiv_CV CNN Image_Classification Classification Deep_Learning
Abstract

Deep convolutional neural networks (CNNs) have demonstrated remarkable success in computer vision by learning strong visual feature representations under supervision. However, training CNNs relies heavily on the availability of exhaustive training data annotations, significantly limiting their deployment and scalability in many application scenarios. In this work, we introduce a generic unsupervised deep learning approach to training deep models without the need for any manual label supervision. Specifically, we progressively discover sample anchored/centred neighbourhoods to reason about and learn the underlying class decision boundaries iteratively and accumulatively. Every single neighbourhood is specially formulated so that all the member samples share the same unseen class label with high probability, facilitating the extraction of class discriminative feature representations during training. Experiments on image classification show the performance advantages of the proposed method over the state-of-the-art unsupervised learning models on six benchmarks including both coarse-grained and fine-grained object image categorisation.

URL

http://arxiv.org/abs/1904.11567

PDF

http://arxiv.org/pdf/1904.11567
Pedestrian Collision Avoidance System for Scenarios with Occlusions

2019-04-25

Markus Schratter, Maxime Bouton, Mykel J. Kochenderfer, Daniel Watzenig

arXiv_AI

arXiv_AI
Abstract

Safe autonomous driving in urban areas requires robust algorithms to avoid collisions with other traffic participants with limited perception ability. Currently deployed approaches relying on Autonomous Emergency Braking (AEB) systems are often overly conservative. In this work, we formulate the problem as a partially observable Markov decision process (POMDP) to derive a policy robust to uncertainty in the pedestrian location. We investigate how to integrate such a policy with an AEB system that operates only when a collision is unavoidable. In addition, we propose a rigorous evaluation methodology on a set of well-defined scenarios. We show that combining the two approaches provides a robust autonomous braking system that reduces unnecessary braking caused by using the AEB system on its own.

URL

http://arxiv.org/abs/1904.11566

PDF

http://arxiv.org/pdf/1904.11566
Neural Text Generation from Rich Semantic Representations

2019-04-25

Valerie Hajdik, Jan Buys, Michael W. Goodman, Emily M. Bender

arXiv_CL

arXiv_CL Text_Generation
Abstract

We propose neural models to generate high-quality text from structured representations based on Minimal Recursion Semantics (MRS). MRS is a rich semantic representation that encodes more precise semantic detail than other representations such as Abstract Meaning Representation (AMR). We show that a sequence-to-sequence model that maps a linearization of Dependency MRS, a graph-based representation of MRS, to English text can achieve a BLEU score of 66.11 when trained on gold data. The performance can be improved further using a high-precision, broad coverage grammar-based parser to generate a large silver training corpus, achieving a final BLEU score of 77.17 on the full test set, and 83.37 on the subset of test data most closely matching the silver data domain. Our results suggest that MRS-based representations are a good choice for applications that need both structured semantics and the ability to produce natural language text as output.

URL

http://arxiv.org/abs/1904.11564

PDF

http://arxiv.org/pdf/1904.11564
Machine Learning For Distributed Acoustic Sensors, Classic versus Image and Deep Neural Networks Approach

2019-04-25

Mugdim Bublin

arXiv_SD

arXiv_SD Deep_Learning Detection
Abstract

Distributed Acoustic Sensing (DAS) using fiber optic cables is a promising new technology for pipeline monitoring and protection. In this work, we applied and compared two approaches for event detection using DAS: a classic machine learning approach and an approach based on image processing and deep learning. Although both approaches can achieve acceptable performance, the preliminary results show that image-based deep learning is the more promising approach, offering six times lower event detection delay and twelve times lower execution time.

URL

http://arxiv.org/abs/1904.11546

PDF

http://arxiv.org/pdf/1904.11546
Disentangling Latent Hands for Image Synthesis and Pose Estimation

2019-04-25

Linlin Yang, Angela Yao

arXiv_CV

arXiv_CV Pose_Estimation Inference
Abstract

Hand image synthesis and pose estimation from RGB images are both highly challenging tasks due to the large discrepancy between factors of variation ranging from image background content to camera viewpoint. To better analyze these factors of variation, we propose the use of disentangled representations and a disentangled variational autoencoder (dVAE) that allows for specific sampling and inference of these factors. The derived objective from the variational lower bound as well as the proposed training strategy are highly flexible, allowing us to handle cross-modal encoders and decoders as well as semi-supervised learning scenarios. Experiments show that our dVAE can synthesize highly realistic images of the hand specifiable by both pose and image background content and also estimate 3D hand poses from RGB images with accuracy competitive with state-of-the-art on two public benchmarks.

URL

http://arxiv.org/abs/1812.01002

PDF

http://arxiv.org/pdf/1812.01002
Probing What Different NLP Tasks Teach Machines about Function Word Comprehension

2019-04-25

Najoung Kim, Roma Patel, Adam Poliak, Alex Wang, Patrick Xia, R. Thomas McCoy, Ian Tenney, Alexis Ross, Tal Linzen, Benjamin Van Durme, Samuel R. Bowman, Ellie Pavlick

arXiv_CL

arXiv_CL Knowledge Inference Language_Model
Abstract

We introduce a set of nine challenge tasks that test for the understanding of function words. These tasks are created by structurally mutating sentences from existing datasets to target the comprehension of specific types of function words (e.g., prepositions, wh-words). Using these probing tasks, we explore the effects of various pretraining objectives for sentence encoders (e.g., language modeling, CCG supertagging and natural language inference (NLI)) on the learned representations. Our results show that pretraining on CCG—our most syntactic objective—performs the best on average across our probing tasks, suggesting that syntactic knowledge helps function word comprehension. Language modeling also shows strong performance, supporting its widespread use for pretraining state-of-the-art NLP models. Overall, no pretraining objective dominates across the board, and our function word probing tasks highlight several intuitive differences between pretraining objectives, e.g., that NLI helps the comprehension of negation.

URL

http://arxiv.org/abs/1904.11544

PDF

http://arxiv.org/pdf/1904.11544
Face Video Generation from a Single Image and Landmarks

2019-04-25

Kritaphat Songsri-in, Stefanos Zafeiriou

arXiv_CV

arXiv_CV Sparse GAN Face CNN
Abstract

In this paper we are concerned with the challenging problem of producing a full image sequence of a deformable face given only an image and generic facial motions encoded by a set of sparse landmarks. To this end we build upon recent breakthroughs in image-to-image translation such as pix2pix, CycleGAN and StarGAN, which train Deep Convolutional Neural Networks (DCNNs) to map aligned pairs or images between different domains (i.e., having different labels), and propose a new architecture which is no longer driven by labels but by spatial maps (facial landmarks). In particular, we propose the MotionGAN, which transforms an input face image into a new one according to a heatmap of target landmarks. We show that it is possible to create very realistic face videos using a single image and a set of target landmarks. Furthermore, our method can be used to edit a facial image with arbitrary motions according to landmarks (e.g., expression, speech, etc.). This provides much more flexibility for face editing, expression transfer, facial video creation, etc. than models based on discrete expressions, audio or action units.

URL

http://arxiv.org/abs/1904.11521

PDF

http://arxiv.org/pdf/1904.11521
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond

2019-04-25

Yue Cao, Jiarui Xu, Stephen Lin, Fangyun Wei, Han Hu

arXiv_AI

arXiv_AI Recognition
Abstract

The Non-Local Network (NLNet) presents a pioneering approach for capturing long-range dependencies, via aggregating query-specific global context to each query position. However, through a rigorous empirical analysis, we have found that the global contexts modeled by non-local network are almost the same for different query positions within an image. In this paper, we take advantage of this finding to create a simplified network based on a query-independent formulation, which maintains the accuracy of NLNet but with significantly less computation. We further observe that this simplified design shares similar structure with Squeeze-Excitation Network (SENet). Hence we unify them into a three-step general framework for global context modeling. Within the general framework, we design a better instantiation, called the global context (GC) block, which is lightweight and can effectively model the global context. The lightweight property allows us to apply it for multiple layers in a backbone network to construct a global context network (GCNet), which generally outperforms both simplified NLNet and SENet on major benchmarks for various recognition tasks. The code and configurations are released at https://github.com/xvjiarui/GCNet.

URL

http://arxiv.org/abs/1904.11492

PDF

http://arxiv.org/pdf/1904.11492
Local Relation Networks for Image Recognition

2019-04-25

Han Hu, Zheng Zhang, Zhenda Xie, Stephen Lin

arXiv_AI

arXiv_AI Inference Classification Relation Recognition
Abstract

The convolution layer has been the dominant feature extractor in computer vision for years. However, the spatial aggregation in convolution is basically a pattern matching process that applies fixed filters which are inefficient at modeling visual elements with varying spatial distributions. This paper presents a new image feature extractor, called the local relation layer, that adaptively determines aggregation weights based on the compositional relationship of local pixel pairs. With this relational approach, it can composite visual elements into higher-level entities in a more efficient manner that benefits semantic inference. A network built with local relation layers, called the Local Relation Network (LR-Net), is found to provide greater modeling capacity than its counterpart built with regular convolution on large-scale recognition tasks such as ImageNet classification.

URL

http://arxiv.org/abs/1904.11491

PDF

http://arxiv.org/pdf/1904.11491
RepPoints: Point Set Representation for Object Detection

2019-04-25

Ze Yang, Shaohui Liu, Han Hu, Liwei Wang, Stephen Lin

arXiv_CV

arXiv_CV Object_Detection Prediction Detection Recognition
Abstract

Modern object detectors rely heavily on rectangular bounding boxes, such as anchors, proposals and the final predictions, to represent objects at various recognition stages. The bounding box is convenient to use but provides only a coarse localization of objects and leads to a correspondingly coarse extraction of object features. In this paper, we present RepPoints (representative points), a new finer representation of objects as a set of sample points useful for both localization and recognition. Given ground truth localization and recognition targets for training, RepPoints learn to automatically arrange themselves in a manner that bounds the spatial extent of an object and indicates semantically significant local areas. They furthermore do not require the use of anchors to sample a space of bounding boxes. We show that an anchor-free object detector based on RepPoints, implemented without multi-scale training and testing, can be as effective as state-of-the-art anchor-based detection methods, with 42.8 AP and 65.0 AP50 on the COCO test-dev detection benchmark.

URL

http://arxiv.org/abs/1904.11490

PDF

http://arxiv.org/pdf/1904.11490
Spatial-Temporal Relation Networks for Multi-Object Tracking

2019-04-25

Jiarui Xu, Yue Cao, Zheng Zhang, Han Hu

arXiv_CV

arXiv_CV Tracking Object_Tracking Detection Relation
Abstract

Recent progress in multiple object tracking (MOT) has shown that a robust similarity score is key to the success of trackers. A good similarity score is expected to reflect multiple cues, e.g. appearance, location, and topology, over a long period of time. However, these cues are heterogeneous, making them hard to combine in a unified network. As a result, existing methods usually encode them in separate networks or require a complex training approach. In this paper, we present a unified framework for similarity measurement which can simultaneously encode various cues and perform reasoning across both spatial and temporal domains. We also study the feature representation of a tracklet-object pair in depth, showing that a proper design of the pair features can substantially help the trackers. The resulting approach is named spatial-temporal relation networks (STRN). It runs in a feed-forward way and can be trained in an end-to-end manner. State-of-the-art accuracy was achieved on all of the MOT15-17 benchmarks using public detection and online settings.

URL

http://arxiv.org/abs/1904.11489

PDF

http://arxiv.org/pdf/1904.11489
Blurring the Line Between Structure and Learning to Optimize and Adapt Receptive Fields

2019-04-25

Evan Shelhamer, Dequan Wang, Trevor Darrell

arXiv_CV

arXiv_CV Segmentation Semantic_Segmentation Inference
Abstract

The visual world is vast and varied, but its variations divide into structured and unstructured factors. We compose free-form filters and structured Gaussian filters, optimized end-to-end, to factorize deep representations and learn both local features and their degree of locality. Our semi-structured composition is strictly more expressive than free-form filtering, and changes in its structured parameters would require changes in free-form architecture. In effect this optimizes over receptive field size and shape, tuning locality to the data and task. Dynamic inference, in which the Gaussian structure varies with the input, adapts receptive field size to compensate for local scale variation. Optimizing receptive field size improves semantic segmentation accuracy on Cityscapes by 1-2 points for strong dilated and skip architectures and by up to 10 points for suboptimal designs. Adapting receptive fields by dynamic Gaussian structure further improves results, equaling the accuracy of free-form deformation while improving efficiency.

URL

http://arxiv.org/abs/1904.11487

PDF

http://arxiv.org/pdf/1904.11487
Making Convolutional Networks Shift-Invariant Again

2019-04-25

Richard Zhang

arXiv_CV

arXiv_CV Regularization CNN Image_Classification Classification
Abstract

Modern convolutional networks are not shift-invariant, as small input shifts or translations can cause drastic changes in the output. Commonly used downsampling methods, such as max-pooling, strided-convolution, and average-pooling, ignore the sampling theorem. The well-known signal processing fix is anti-aliasing by low-pass filtering before downsampling. However, simply inserting this module into deep networks leads to performance degradation; as a result, it is seldom used today. We show that when integrated correctly, it is compatible with existing architectural components, such as max-pooling. The technique is general and can be incorporated across layer types and applications, such as image classification and conditional image generation. In addition to increased shift-invariance, we also observe, surprisingly, that anti-aliasing boosts accuracy in ImageNet classification, across several commonly-used architectures. This indicates that anti-aliasing serves as effective regularization. Our results demonstrate that this classical signal processing technique has been undeservedly overlooked in modern deep networks. Code and anti-aliased versions of popular networks will be made available at https://richzhang.github.io/antialiased-cnns/.
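The idea of low-pass filtering before downsampling can be illustrated on a 1D signal. The sketch below is a toy version under stated assumptions (circular padding, a window of 2, a fixed [0.25, 0.5, 0.25] blur kernel, hypothetical function names); it is not the released implementation. It compares plain stride-2 max-pooling with a dense max followed by a blur and then subsampling.

```python
def max_dense(x, k=2):
    """Sliding-window max with stride 1 and circular padding."""
    n = len(x)
    return [max(x[(i + j) % n] for j in range(k)) for i in range(n)]


def blur(x, kernel=(0.25, 0.5, 0.25)):
    """Low-pass filter with circular padding (kernel is an assumed choice)."""
    n = len(x)
    r = len(kernel) // 2
    return [sum(w * x[(i + j - r) % n] for j, w in enumerate(kernel))
            for i in range(n)]


def max_pool(x, stride=2):
    """Ordinary max-pooling: dense max, then naive subsampling."""
    return max_dense(x)[::stride]


def blur_pool(x, stride=2):
    """Anti-aliased variant: dense max, then blur, then subsampling."""
    return blur(max_dense(x))[::stride]


def gap(a, b):
    """Largest elementwise difference between two outputs."""
    return max(abs(p - q) for p, q in zip(a, b))


signal = [0, 0, 1, 1, 0, 0, 1, 1]
shifted = signal[-1:] + signal[:-1]  # circular shift by one sample

max_gap = gap(max_pool(signal), max_pool(shifted))
blur_gap = gap(blur_pool(signal), blur_pool(shifted))
```

On this toy signal a one-sample shift changes the plain max-pooled output by as much as 1, while the blurred version changes by at most 0.25, which is the kind of reduced shift sensitivity the abstract describes.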

URL

http://arxiv.org/abs/1904.11486

PDF

http://arxiv.org/pdf/1904.11486
Safe Reinforcement Learning with Scene Decomposition for Navigating Complex Urban Environments

2019-04-25

Maxime Bouton, Alireza Nakhaei, Kikuo Fujimura, Mykel J. Kochenderfer

arXiv_AI

arXiv_AI Reinforcement_Learning
Abstract

Navigating urban environments represents a complex task for automated vehicles. They must reach their goal safely and efficiently while considering a multitude of traffic participants. We propose a modular decision making algorithm to autonomously navigate intersections, addressing challenges of existing rule-based and reinforcement learning (RL) approaches. We first present a safe RL algorithm relying on a model-checker to ensure safety guarantees. To make the decision strategy robust to perception errors and occlusions, we introduce a belief update technique using a learning based approach. Finally, we use a scene decomposition approach to scale our algorithm to environments with multiple traffic participants. We empirically demonstrate that our algorithm outperforms rule-based methods and reinforcement learning techniques on a complex intersection scenario.

URL

http://arxiv.org/abs/1904.11483

PDF

http://arxiv.org/pdf/1904.11483
Radar-only ego-motion estimation in difficult settings via graph matching

2019-04-25

Sarah H. Cen, Paul Newman

arXiv_CV

arXiv_CV Optimization
Abstract

Radar detects stable, long-range objects under variable weather and lighting conditions, making it a reliable and versatile sensor well suited for ego-motion estimation. In this work, we propose a radar-only odometry pipeline that is highly robust to radar artifacts (e.g., speckle noise and false positives) and requires only one input parameter. We demonstrate its ability to adapt across diverse settings, from urban UK to off-road Iceland, achieving a scan matching accuracy of approximately 5.20 cm and 0.0929 deg when using GPS as ground truth (compared to visual odometry’s 5.77 cm and 0.1032 deg). We present algorithms for keypoint extraction and data association, framing the latter as a graph matching optimization problem, and provide an in-depth system analysis.

URL

http://arxiv.org/abs/1904.11476

PDF

http://arxiv.org/pdf/1904.11476
Importance of Copying Mechanism for News Headline Generation

2019-04-25

Ilya Gusev

arXiv_AI

arXiv_AI Summarization
Abstract

News headline generation is an essential problem of text summarization because it is constrained, well-defined, and still hard to solve. Models with a limited vocabulary cannot solve it well, as new named entities appear regularly in the news and these entities often should be in the headline. News articles in morphologically rich languages such as Russian require model modifications due to a large number of possible word forms. This study aims to validate that models able to copy words from the original article perform better than models without such an option. The proposed model achieves a mean ROUGE score of 23 on the provided test dataset, which is 8 points greater than the result of a similar model without a copying mechanism. Moreover, the resulting model performs better than any known model on the new dataset of Russian news.

URL

http://arxiv.org/abs/1904.11475

PDF

http://arxiv.org/pdf/1904.11475
Terminologies augmented recurrent neural network model for clinical named entity recognition

2019-04-25

Ivan Lerner, Nicolas Paris, Xavier Tannier

arXiv_CL

arXiv_CL Prediction Recognition
Abstract

We aimed to enhance the performance of a supervised model for clinical named-entity recognition (NER) using medical terminologies. In order to evaluate our system in French, we built a corpus for 5 types of clinical entities. We used a terminology-based system as baseline, built upon UMLS and SNOMED. Then, we evaluated a biGRU-CRF, and a hybrid system using the prediction of the terminology-based system as a feature for the biGRU-CRF. In English, we evaluated the NER systems on the i2b2-2009 Medication Challenge for drug name recognition, which contained 8,573 entities for 268 documents. In French, we built APcNER, a corpus of 147 documents annotated for 5 entities (drug name, sign or symptom, disease or disorder, diagnostic procedure or lab test, and therapeutic procedure). We evaluated each NER system using exact and partial match definitions of F-measure for NER. APcNER contains 4,837 entities, which took 28 hours to annotate; the inter-annotator agreement was acceptable for drug name in exact match (85%) and acceptable for other entity types in non-exact match (>70%). For drug name recognition on both i2b2-2009 and APcNER, the biGRU-CRF performed better than the terminology-based system, with an exact-match F-measure of 91.1% versus 73% and 81.9% versus 75% respectively. Moreover, the hybrid system outperformed the biGRU-CRF, with an exact-match F-measure of 92.2% versus 91.1% (i2b2-2009) and 88.4% versus 81.9% (APcNER). On the APcNER corpus, the micro-average F-measure of the hybrid system on the 5 entities was 69.5% in exact match, and 84.1% in non-exact match. APcNER is a French corpus for clinical NER of five types of entities which covers a large variety of document types. Extending the supervised model with terminology allowed for an easy performance gain, especially in low regimes of entities, and established state-of-the-art results on the i2b2-2009 corpus.
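The difference between exact-match and partial-match F-measure can be sketched as follows. This is an illustrative definition only: entities are assumed to be (start, end, label) spans, a partial match is assumed to mean same label plus any character overlap, and the example spans are invented. The paper may define partial matching differently.

```python
def exact(a, b):
    """Spans match only if boundaries and label are identical."""
    return a == b


def overlap(a, b):
    """Assumed partial-match rule: same label and overlapping offsets."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def f_measure(gold, pred, match):
    """F1 from span lists under a given matching rule."""
    hit_pred = sum(any(match(p, g) for g in gold) for p in pred)
    hit_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = hit_pred / len(pred) if pred else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations: (start offset, end offset, entity type).
gold = [(0, 7, "drug"), (12, 20, "disease")]
pred = [(0, 7, "drug"), (14, 20, "disease"), (30, 35, "drug")]

f_exact = f_measure(gold, pred, exact)      # boundary errors count as misses
f_partial = f_measure(gold, pred, overlap)  # boundary errors are forgiven
```

With these invented spans, the prediction with a shifted boundary counts as an error under exact match but as a hit under partial match, which is why the non-exact scores reported above are higher than the exact ones.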

URL

http://arxiv.org/abs/1904.11473

PDF

http://arxiv.org/pdf/1904.11473
The Mutex Watershed and its Objective: Efficient, Parameter-Free Image Partitioning

2019-04-25

Steffen Wolf, Alberto Bailoni, Constantin Pape, Nasim Rahaman, Anna Kreshuk, Ullrich Köthe, Fred A. Hamprecht

arXiv_CV

arXiv_CV Segmentation Relation
Abstract

Image partitioning, or segmentation without semantics, is the task of decomposing an image into distinct segments or, equivalently, of detecting closed contours. Most prior work either requires seeds, one per segment; or a threshold; or formulates the task as multicut / correlation clustering, an NP-hard problem. Here, we propose a greedy algorithm for signed graph partitioning, the “Mutex Watershed”. Unlike seeded watershed, the algorithm can accommodate not only attractive but also repulsive cues, allowing it to find a previously unspecified number of segments without the need for explicit seeds or a tunable threshold. We also prove that this simple algorithm solves to global optimality an objective function that is intimately related to the multicut / correlation clustering integer linear programming formulation. The algorithm is deterministic, very simple to implement, and has empirically linearithmic complexity. When presented with short-range attractive and long-range repulsive cues from a deep neural network, the Mutex Watershed gives the best results currently known for the competitive ISBI 2012 EM segmentation benchmark.
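A greedy signed-graph partitioning of this kind can be sketched with a union-find structure plus mutual-exclusion (mutex) constraints. The version below is a minimal illustration under stated assumptions: edges are visited in order of decreasing weight, an attractive edge merges two clusters unless a mutex forbids it, and a repulsive edge records a mutex between clusters that are still separate. Function and variable names are hypothetical, and the details may differ from the authors' implementation.

```python
def mutex_watershed(n_nodes, attractive, repulsive):
    """Greedy partitioning sketch; edges are (u, v, weight) triples."""
    parent = list(range(n_nodes))
    # mutex[r] holds the cluster roots that root r must never merge with.
    mutex = [set() for _ in range(n_nodes)]

    def find(u):
        # Union-find lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    edges = [(w, u, v, True) for u, v, w in attractive]
    edges += [(w, u, v, False) for u, v, w in repulsive]
    edges.sort(key=lambda e: e[0], reverse=True)  # strongest cue first

    for _, u, v, is_attractive in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # already in the same cluster
        if is_attractive:
            if rv in mutex[ru]:
                continue  # merge forbidden by an earlier repulsive edge
            parent[rv] = ru
            # The merged cluster inherits all constraints of rv.
            for other in mutex[rv]:
                mutex[other].discard(rv)
                mutex[other].add(ru)
                mutex[ru].add(other)
            mutex[rv].clear()
        else:
            mutex[ru].add(rv)
            mutex[rv].add(ru)
    return [find(i) for i in range(n_nodes)]


# Hypothetical 4-node chain with one long-range repulsive edge.
labels = mutex_watershed(
    4,
    attractive=[(0, 1, 0.9), (1, 2, 0.4), (2, 3, 0.8)],
    repulsive=[(0, 3, 0.6)],
)
```

In this toy graph the repulsive edge between nodes 0 and 3 is stronger than the weak attractive edge between 1 and 2, so the chain splits into two segments without any seed or threshold being supplied.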

URL

http://arxiv.org/abs/1904.12654

PDF

http://arxiv.org/pdf/1904.12654
The Zero Resource Speech Challenge 2019: TTS without T

2019-04-25

Ewan Dunbar, Robin Algayres, Julien Karadayi, Mathieu Bernard, Juan Benjumea, Xuan-Nga Cao, Lucie Miskic, Charlotte Dugrain, Lucas Ondel, Alan W. Black, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux

arXiv_CL

arXiv_CL
Abstract

We present the Zero Resource Speech Challenge 2019, which proposes to build a speech synthesizer without any text or phonetic labels: hence, TTS without T (text-to-speech without text). We provide raw audio for a target voice in an unknown language (the Voice dataset), but no alignment, text or labels. Participants must discover subword units in an unsupervised way (using the Unit Discovery dataset) and align them to the voice recordings in a way that works best for the purpose of synthesizing novel utterances from novel speakers, similar to the target speaker’s voice. We describe the metrics used for evaluation, a baseline system consisting of unsupervised subword unit discovery plus a standard TTS system, and a topline TTS using gold phoneme transcriptions. We present an overview of the 19 submitted systems from 11 teams and discuss the main results.

URL

http://arxiv.org/abs/1904.11469

PDF

http://arxiv.org/pdf/1904.11469
Autonomous Driving in Reality with Reinforcement Learning and Image Translation

2019-04-25

Nayun Xu, Bowen Tan, Bingyu Kong

arXiv_CV

arXiv_CV Segmentation Reinforcement_Learning Semantic_Segmentation
Abstract

Supervised learning is widely used in training autonomous driving vehicles. However, it requires a large amount of labeled data. Reinforcement learning can be trained without abundant labeled data, but we cannot train it in reality because it would involve many unpredictable accidents. Training an agent to perform well in a virtual environment is much easier, but because of the huge difference between virtual and real environments, bridging the gap between them is challenging. In this paper, we propose a novel framework of reinforcement learning with an image semantic segmentation network to make the whole model adaptable to reality. The agent is trained in TORCS, a car racing simulator.

URL

http://arxiv.org/abs/1801.05299

PDF

http://arxiv.org/pdf/1801.05299
Sensor Fusion for Joint 3D Object Detection and Semantic Segmentation

2019-04-25

Gregory P. Meyer, Jake Charland, Darshan Hegde, Ankit Laddha, Carlos Vallespi-Gonzalez

arXiv_CV

arXiv_CV Object_Detection Segmentation Semantic_Segmentation Detection
Abstract

In this paper, we present an extension to LaserNet, an efficient and state-of-the-art LiDAR based 3D object detector. We propose a method for fusing image data with the LiDAR data and show that this sensor fusion method improves the detection performance of the model especially at long ranges. The addition of image data is straightforward and does not require image labels. Furthermore, we expand the capabilities of the model to perform 3D semantic segmentation in addition to 3D object detection. On a large benchmark dataset, we demonstrate our approach achieves state-of-the-art performance on both object detection and semantic segmentation while maintaining a low runtime.

URL

http://arxiv.org/abs/1904.11466

PDF

http://arxiv.org/pdf/1904.11466
Ray Interference: a Source of Plateaus in Deep Reinforcement Learning

2019-04-25

Tom Schaul, Diana Borsa, Joseph Modayil, Razvan Pascanu

arXiv_AI

arXiv_AI Reinforcement_Learning Relation
Abstract

Rather than proposing a new method, this paper investigates an issue present in existing learning algorithms. We study the learning dynamics of reinforcement learning (RL), specifically a characteristic coupling between learning and data generation that arises because RL agents control their future data distribution. In the presence of function approximation, this coupling can lead to a problematic type of ‘ray interference’, characterized by learning dynamics that sequentially traverse a number of performance plateaus, effectively constraining the agent to learn one thing at a time even when learning in parallel is better. We establish the conditions under which ray interference occurs, show its relation to saddle points and obtain the exact learning dynamics in a restricted setting. We characterize a number of its properties and discuss possible remedies.

URL

http://arxiv.org/abs/1904.11455

PDF

http://arxiv.org/pdf/1904.11455
Reward-Based Deception with Cognitive Bias

2019-04-25

Bo Wu, Murat Cubuktepe, Suda Bharadwaj, Ufuk Topcu

arXiv_AI

arXiv_AI Adversarial
Abstract

Deception plays a key role in adversarial or strategic interactions for the purpose of self-defence and survival. This paper introduces a general framework and solution to address deception. Most existing approaches for deception consider obfuscating crucial information to rational adversaries with abundant memory and computation resources. In this paper, we consider deceiving adversaries with bounded rationality and in terms of expected rewards. This problem is commonly encountered in many applications, especially those involving human adversaries. Leveraging the cognitive bias of humans in reward evaluation under stochastic outcomes, we introduce a framework to optimally assign limited resources to defend against human adversaries. Modeling such cognitive biases follows the so-called prospect theory from the behavioral psychology literature. Then we formulate the resource allocation problem as a signomial program to minimize the defender’s cost in an environment modeled as a Markov decision process. We use police patrol hour assignment as an illustrative example and provide detailed simulation results based on real-world data.
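For readers unfamiliar with prospect theory, the sketch below shows the standard value and probability-weighting functions with the parameter estimates commonly cited from Tversky and Kahneman's 1992 work (0.88 curvature, 2.25 loss aversion, 0.61 weighting exponent for gains). These defaults and the function names are assumptions for illustration; the abstract does not state which functional forms or parameters the paper uses.

```python
def value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Subjective value of a gain or loss x relative to a reference point."""
    if x >= 0:
        return x ** alpha           # diminishing sensitivity to gains
    return -lam * (-x) ** beta      # losses loom larger than gains


def weight(p, gamma=0.61):
    """Subjective weight attached to an objective probability p in [0, 1]."""
    num = p ** gamma
    return num / (num + (1 - p) ** gamma) ** (1 / gamma)


def perceived_reward(outcomes):
    """Simplified perceived value of a list of (probability, payoff) pairs.

    This separable form ignores the cumulative (rank-dependent) weighting
    of the full theory and is meant only as an illustration.
    """
    return sum(weight(p) * value(x) for p, x in outcomes)


# Hypothetical gamble: 1% chance of winning 100, otherwise losing 1.
gamble = [(0.01, 100.0), (0.99, -1.0)]
subjective = perceived_reward(gamble)
```

Under these assumed parameters small probabilities are overweighted and large ones underweighted, so a boundedly rational adversary may rate a gamble differently from its expected value. This is the kind of bias a defender could exploit when allocating resources.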

URL

http://arxiv.org/abs/1904.11454

PDF

http://arxiv.org/pdf/1904.11454
Holistic Large Scale Video Understanding

2019-04-25

Ali Diba, Mohsen Fayyaz, Vivek Sharma, Manohar Paluri, Jurgen Gall, Rainer Stiefelhagen, Luc Van Gool

arXiv_CV

arXiv_CV Video_Caption GAN Action_Recognition Recognition
Abstract

Action recognition has been advanced in recent years by benchmarks with rich annotations. However, research is still mainly limited to human action or sports recognition - focusing on a highly specific video understanding task and thus leaving a significant gap towards describing the overall content of a video. We fill in this gap by presenting a large-scale “Holistic Video Understanding Dataset” (HVU). HVU is organized hierarchically in a semantic taxonomy that focuses on multi-label and multi-task video understanding as a comprehensive problem that encompasses the recognition of multiple semantic aspects in the dynamic scene. HVU contains approx. 577k videos in total with 13M annotations for training and validation set spanning over 4378 classes. HVU encompasses semantic aspects defined on categories of scenes, objects, actions, events, attributes and concepts, which naturally captures the real-world scenarios. Further, we introduce a new spatio-temporal deep neural network architecture called “Holistic Appearance and Temporal Network” (HATNet) that builds on fusing 2D and 3D architectures into one by combining intermediate representations of appearance and temporal cues. HATNet focuses on the multi-label and multi-task learning problem and is trained in an end-to-end manner. The experiments show that HATNet trained on HVU outperforms current state-of-the-art methods on challenging human action datasets: HMDB51, UCF101, and Kinetics. The dataset and codes will be made publicly available.

URL

http://arxiv.org/abs/1904.11451

PDF

http://arxiv.org/pdf/1904.11451
Faster and More Accurate Learning with Meta Trace Adaptation

2019-04-25

Mingde Zhao, Ian Porada

arXiv_AI

arXiv_AI Reinforcement_Learning
Abstract

Learning speed and accuracy are of universal interest for reinforcement learning problems. In this paper, we investigate meta-learning approaches for adaptation of the trace decay parameter λ used in TD(λ), from the perspective of optimizing a bias-variance tradeoff. We propose an off-policy applicable method of meta-learning the λ parameters via optimizing a meta-objective with efficient incremental updates. The proposed trust-region style algorithm, under proper assumptions, is shown to be equivalent to optimizing the bias-variance tradeoff for the overall target for all states. In experiments, we validate the effectiveness of the proposed method, MTA, showing significantly faster and more accurate learning than the compared methods and baselines.
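
For readers unfamiliar with the parameter being adapted, here is a minimal sketch of standard tabular TD(λ) with accumulating eligibility traces and a fixed λ. It shows only where λ enters the update; it is not the MTA meta-learning procedure. The function name, the episode encoding, and the default step size are illustrative assumptions.

```python
def td_lambda_episode(values, episode, alpha=0.1, gamma=1.0, lam=0.8):
    """Run one episode of tabular TD(lambda), updating `values` in place.

    values:  dict mapping state -> current value estimate
    episode: list of (state, reward, next_state); next_state is None at termination
    """
    traces = {s: 0.0 for s in values}
    for state, reward, next_state in episode:
        next_value = 0.0 if next_state is None else values[next_state]
        td_error = reward + gamma * next_value - values[state]
        traces[state] += 1.0  # accumulating trace for the visited state
        for s in values:
            values[s] += alpha * td_error * traces[s]
            traces[s] *= gamma * lam  # lambda controls how fast credit decays
    return values


episode = [("A", 0.0, "B"), ("B", 1.0, None)]
one_step = td_lambda_episode({"A": 0.0, "B": 0.0}, episode, lam=0.0)
monte_carlo_like = td_lambda_episode({"A": 0.0, "B": 0.0}, episode, lam=1.0)
```

With λ = 0 only the state that received the reward is updated (lower variance, more bias from bootstrapping). With λ = 1 the earlier state also gets credit, which is the tradeoff the abstract refers to.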

URL

http://arxiv.org/abs/1904.11439

PDF

http://arxiv.org/pdf/1904.11439
A pressure field model for fast, robust approximation of net contact force and moment between nominally rigid objects

2019-04-25

Ryan Elandt, Evan Drumwright, Michael Sherman, Andy Ruina

arXiv_RO

arXiv_RO Face
Abstract

We introduce an approximate model for predicting the net contact wrench between nominally rigid objects for use in simulation, control, and state estimation. The model combines and generalizes two ideas: a bed of springs (an “elastic foundation”) and hydrostatic pressure. In this model, continuous pressure fields are computed offline for the interior of each nominally rigid object. Unlike hydrostatics or elastic foundations, the pressure fields need not satisfy mechanical equilibrium conditions. When two objects nominally overlap, a contact surface is defined where the two pressure fields are equal. This static pressure is supplemented with a dissipative rate-dependent pressure and friction to determine tractions on the contact surface. The contact wrench between pairs of objects is an integral of traction contributions over this surface. The model evaluates much faster than elasticity-theory models, while showing the essential trends of force, moment, and stiffness increase with contact load. It yields continuous wrenches even for non-convex objects and coarse meshes. The method shows promise as sufficiently fast, accurate, and robust for design-in-simulation of robot controllers.

URL

http://arxiv.org/abs/1904.11433

PDF

http://arxiv.org/pdf/1904.11433
Assistive System in Conversational Agent for Health Coaching: The CoachAI Approach

2019-04-25

Ahmed Fadhil

arXiv_AI

arXiv_AI QA Recommendation
Abstract

With physicians’ workload and patients’ needs for care both increasing, there is a need for technology that facilitates physicians’ work and performs continuous follow-up with patients. Existing approaches focus merely on improving the patient’s condition, and none have considered managing the physician’s workload. This paper presents an initial evaluation of a conversational-agent-assisted coaching platform intended to manage physicians’ fatigue and provide continuous follow-up to patients. We highlight the approach adopted to build the chatbot dialogue and the coaching platform. We particularly discuss the activity recommender algorithms used to suggest insights about patients’ condition and activities based on previously collected data. The paper makes three contributions: (1) it presents the conversational agent as an assistive virtual coach, (2) it decreases physicians’ workload and provides continuous follow-up with patients by handling some repetitive physician tasks and performing initial follow-up with the patient, and (3) it presents the activity recommender, which tracks previous activities and patient information and provides the coach with useful insights about possible activity and patient matches. Future work focuses on integrating the recommender model with the CoachAI platform and testing the prototype with patients in collaboration with an ambulatory clinic.

URL

http://arxiv.org/abs/1904.11412

PDF

http://arxiv.org/pdf/1904.11412
Task-based End-to-end Model Learning in Stochastic Optimization

2019-04-25

Priya L. Donti, Brandon Amos, J. Zico Kolter

arXiv_AI

arXiv_AI Optimization Prediction
Abstract

With the increasing popularity of machine learning techniques, it has become common to see prediction algorithms operating within some larger process. However, the criteria by which we train these algorithms often differ from the ultimate criteria on which we evaluate them. This paper proposes an end-to-end approach for learning probabilistic machine learning models in a manner that directly captures the ultimate task-based objective for which they will be used, within the context of stochastic programming. We present three experimental evaluations of the proposed approach: a classical inventory stock problem, a real-world electrical grid scheduling task, and a real-world energy storage arbitrage task. We show that the proposed approach can outperform both traditional modeling and purely black-box policy optimization approaches in these applications.

URL

http://arxiv.org/abs/1703.04529

PDF

http://arxiv.org/pdf/1703.04529
DynamoNet: Dynamic Action and Motion Network

2019-04-25

Ali Diba, Vivek Sharma, Luc Van Gool, Rainer Stiefelhagen

arXiv_CV

arXiv_CV Action_Recognition CNN Video_Classification Classification Prediction Recognition
Abstract

In this paper, we are interested in self-supervised learning of the motion cues in videos using dynamic motion filters, to obtain a better motion representation and ultimately boost human action recognition in particular. Thus far, the vision community has focused on spatio-temporal approaches using standard filters; here we instead propose dynamic filters that adaptively learn the video-specific internal motion representation by predicting the short-term future frames. We name this new motion representation the dynamic motion representation (DMR). It is embedded inside a 3D convolutional network as a new layer, which captures the visual appearance and motion dynamics throughout the entire video clip via end-to-end network learning. Simultaneously, we utilize this motion representation to enrich video classification. We have designed the frame prediction task as an auxiliary task to empower the classification problem. With these overall objectives, we introduce a novel unified spatio-temporal 3D-CNN architecture (DynamoNet) that jointly optimizes video classification and motion representation learning by predicting future frames, as a multi-task learning problem. We conduct experiments on challenging human action datasets: Kinetics 400, UCF101, HMDB51. The experiments using the proposed DynamoNet show promising results on all the datasets.

URL

http://arxiv.org/abs/1904.11407

PDF

http://arxiv.org/pdf/1904.11407
Deep Constrained Dominant Sets for Person Re-identification

2019-04-25

Leulseged Tesfaye Alemu, Mubarak Shah, Marcello Pelillo

arXiv_CV

arXiv_CV Image_Retrieval Re-identification Knowledge Person_Re-identification Optimization
Abstract

In this work, we propose an end-to-end constrained clustering scheme to tackle the person re-identification (re-id) problem. Deep neural networks (DNN) have recently proven to be effective on the person re-identification task. In particular, rather than leveraging solely a probe-gallery similarity, diffusing the similarities among the gallery images in an end-to-end manner has proven to be effective in yielding a robust probe-gallery affinity. However, existing methods do not apply the probe image as a constraint, and are prone to noise propagation during the similarity diffusion process. To overcome this, we propose an intriguing scheme which treats the person-image retrieval problem as a constrained clustering optimization problem, called deep constrained dominant sets (DCDS). Given a probe and gallery images, we re-formulate the person re-id problem as finding a constrained cluster, where the probe image is taken as a constraint (seed) and each cluster corresponds to a set of images corresponding to the same person. By optimizing the constrained clustering in an end-to-end manner, we naturally leverage the contextual knowledge of a set of images corresponding to the given person-images. We further enhance the performance by integrating an auxiliary net alongside DCDS, which employs a multi-scale Resnet. To validate the effectiveness of our method we present experiments on several benchmark datasets and show that the proposed method can outperform state-of-the-art methods.

URL

http://arxiv.org/abs/1904.11397

PDF

http://arxiv.org/pdf/1904.11397
Fully Dense UNet for 2D Sparse Photoacoustic Tomography Artifact Removal

2019-04-25

Steven Guan, Amir Khan, Siddhartha Sikdar, Parag V. Chitnis

arXiv_CV

arXiv_CV Object_Detection Sparse Face CNN Detection
Abstract

Photoacoustic imaging is an emerging imaging modality that is based upon the photoacoustic effect. In photoacoustic tomography (PAT), the induced acoustic pressure waves are measured by an array of detectors and used to reconstruct an image of the initial pressure distribution. A common challenge faced in PAT is that the measured acoustic waves can only be sparsely sampled. Reconstructing sparsely sampled data using standard methods results in severe artifacts that obscure information within the image. We propose a modified convolutional neural network (CNN) architecture termed Fully Dense UNet (FD-UNet) for removing artifacts from 2D PAT images reconstructed from sparse data and compare the proposed CNN with the standard UNet in terms of reconstructed image quality.

URL

http://arxiv.org/abs/1808.10848

PDF

http://arxiv.org/pdf/1808.10848
Dual-Arm In-Hand Manipulation and Regrasping Using Dexterous Manipulation Graphs

2019-04-25

Silvia Cruciani, Kaiyu Hang, Christian Smith, Danica Kragic

arXiv_RO

arXiv_RO Face
Abstract

This work focuses on the problem of in-hand manipulation and regrasping of objects with parallel grippers. We propose Dexterous Manipulation Graph (DMG) as a representation on which we define planning for in-hand manipulation and regrasping. The DMG is a disconnected undirected graph that represents the possible motions of a finger along the object’s surface. We formulate the in-hand manipulation and regrasping problem as a graph search problem from the initial to the final configuration. The resulting plan is a sequence of coordinated in-hand pushing and regrasping movements. We propose a dual-arm system for the execution of the sequence where both hands are used interchangeably. We demonstrate our approach on an ABB Yumi robot tasked with different grasp reconfigurations.
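
To make the graph-search formulation concrete, the sketch below runs a breadth-first search over a small undirected graph with two disconnected components. The node names and the graph itself are hypothetical, and reading "no path found" as "a regrasp would be needed" is an assumption made for illustration. The actual DMG construction and planner are described in the paper.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node sequence from start to goal, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None  # goal lies in a different connected component


# Hypothetical finger-contact regions on an object's surface; two components.
toy_graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b"],
    "d": ["e"],
    "e": ["d"],
}

in_hand_plan = shortest_path(toy_graph, "a", "c")
needs_regrasp = shortest_path(toy_graph, "a", "e") is None
```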

URL

http://arxiv.org/abs/1904.11382

PDF

http://arxiv.org/pdf/1904.11382
Decentralized Multi-Task Learning Based on Extreme Learning Machines

2019-04-25

Yu Ye, Ming Xiao, Mikael Skoglund

arXiv_AI

arXiv_AI Optimization
Abstract

In multi-task learning (MTL), related tasks learn jointly to improve generalization performance. To exploit the high learning speed of extreme learning machines (ELMs), we apply the ELM framework to the MTL problem, where the output weights of ELMs for all the tasks are learned collaboratively. We first present the ELM based MTL problem in the centralized setting, which is solved by the proposed MTL-ELM algorithm. Because many data sets of different tasks are geo-distributed, decentralized machine learning is studied. We formulate the decentralized MTL problem based on ELM as majorized multi-block optimization with coupled bi-convex objective functions. To solve the problem, we propose the DMTL-ELM algorithm, which is a hybrid Jacobian and Gauss-Seidel Proximal multi-block alternating direction method of multipliers (ADMM). Further, to reduce the computation load of DMTL-ELM, DMTL-ELM with first-order approximation (FO-DMTL-ELM) is presented. Theoretical analysis shows that the convergence to the stationary point of DMTL-ELM and FO-DMTL-ELM can be guaranteed conditionally. Through simulations, we demonstrate the convergence of the proposed MTL-ELM, DMTL-ELM, and FO-DMTL-ELM algorithms, and also show that they can outperform existing MTL methods. Moreover, by adjusting the dimension of the hidden feature space, there exists a trade-off between communication load and learning accuracy for DMTL-ELM.

URL

http://arxiv.org/abs/1904.11366

PDF

http://arxiv.org/pdf/1904.11366
Breast Cancer Classification with Ultrasound Images Based on SLIC

2019-04-25

Zhihao Fang, Wanyi Zhang, He Ma

arXiv_CV

arXiv_CV Classification
Abstract

Ultrasound image diagnosis of breast tumors has been widely used in recent years. However, it suffers from some problems, for instance poor quality, intense noise and uneven echo distribution, which create a huge obstacle to diagnosis. To overcome these problems, we propose a novel method, breast cancer classification with ultrasound images based on SLIC (BCCUI). We first extract the Region of Interest (ROI) at the super-pixel level using the Simple Linear Iterative Clustering (SLIC) algorithm and a region growing algorithm. Next, the features of the ROI are extracted. Finally, a Support Vector Machine (SVM) classifier is applied. The results show that the accuracy of this segmentation algorithm is up to 88.00% and the sensitivity of the algorithm is up to 92.05%, which indicates that the classifier presented in this paper has research significance and practical value.

URL

http://arxiv.org/abs/1904.11322

PDF

http://arxiv.org/pdf/1904.11322
Unsupervised deep learning for Bayesian brain MRI segmentation

2019-04-25

Adrian V. Dalca, Evan Yu, Polina Golland, Bruce Fischl, Mert R. Sabuncu, Juan Eugenio Iglesias

arXiv_CV

arXiv_CV Segmentation Deep_Learning
Abstract

Probabilistic atlas priors have been commonly used to derive adaptive and robust brain MRI segmentation algorithms. Widely-used neuroimage analysis pipelines rely heavily on these techniques, which are often computationally expensive. In contrast, there has been a recent surge of approaches that leverage deep learning to implement segmentation tools that are computationally efficient at test time. However, most of these strategies rely on learning from manually annotated images. These supervised deep learning methods are therefore sensitive to the intensity profiles in the training dataset. To develop a deep learning-based segmentation model for a new image dataset (e.g., of different contrast), one usually needs to create a new labeled training dataset, which can be prohibitively expensive, or rely on suboptimal ad hoc adaptation or augmentation approaches. In this paper, we propose an alternative strategy that combines a conventional probabilistic atlas-based segmentation with deep learning, enabling one to train a segmentation model for new MRI scans without the need for any manually segmented images. Our experiments include thousands of brain MRI scans and demonstrate that the proposed method achieves good accuracy for a brain MRI segmentation task for different MRI contrasts, requiring only approximately 15 seconds at test time on a GPU. The code is freely available at this http URL

URL

http://arxiv.org/abs/1904.11319

PDF

http://arxiv.org/pdf/1904.11319
JPEG XT Image Compression with Hue Compensation for Two-Layer HDR Coding

2019-04-25

Hiroyuki Kobayashi, Hitoshi Kiya

arXiv_CV

arXiv_CV
Abstract

We propose a novel JPEG XT image compression with hue compensation for two-layer HDR coding. LDR images produced from JPEG XT bitstreams have some distortion in hue due to tone mapping operations. In order to suppress the color distortion, we apply a novel hue compensation method based on the maximally saturated colors. Moreover, the bitstreams generated by using the proposed method are fully compatible with the JPEG XT standard. In an experiment, the proposed method is demonstrated not only to produce images with small hue degradation but also to maintain well-mapped luminance, in terms of three kinds of criteria: TMQI, hue value in CIEDE2000, and the maximally saturated color on the constant-hue plane.

URL

http://arxiv.org/abs/1904.11315

PDF

http://arxiv.org/pdf/1904.11315
Multi-scale Cross-form Pyramid Network for Stereo Matching

2019-04-25

Zhidong Zhu, Mingyi He, Yuchao Dai, Zhibo Rao, Bo Li

arXiv_CV

arXiv_CV Deep_Learning
Abstract

Stereo matching plays an indispensable part in autonomous driving, robotics and 3D scene reconstruction. We propose a novel deep learning architecture called CFP-Net, a Cross-Form Pyramid stereo matching network for regressing disparity from a rectified pair of stereo images. The network consists of three modules: a Multi-Scale 2D local feature extraction module, a Cross-form spatial pyramid module and a Multi-Scale 3D Feature Matching and Fusion module. The Multi-Scale 2D local feature extraction module can extract enough multi-scale features. The Cross-form spatial pyramid module aggregates the context information in different scales and locations to form a cost volume. Moreover, it is shown to be more effective than SPP and ASPP in ill-posed regions. The Multi-Scale 3D feature matching and fusion module is shown to regularize the cost volume using two parallel 3D deconvolution structures with two different receptive fields. Our proposed method has been evaluated on the Scene Flow and KITTI datasets. It achieves state-of-the-art performance on the KITTI 2012 and 2015 benchmarks.

URL

http://arxiv.org/abs/1904.11309

PDF

http://arxiv.org/pdf/1904.11309
SeFM: A Sequential Feature Point Matching Algorithm for Object 3D Reconstruction

2019-04-25

Zhihao Fang, He Ma, Xuemin Zhu, Xutao Guo, Ruixin Zhou

arXiv_CV

arXiv_CV
Abstract

3D reconstruction is a fundamental issue in many applications, and feature point matching is a key step in reconstructing target objects. Conventional algorithms can only find a small number of feature points from two images, which is quite insufficient for reconstruction. To overcome this problem, we propose SeFM, a sequential feature point matching algorithm. We first utilize the epipolar geometry to find the epipole of each image. Rotating along the epipole, we generate a set of epipolar lines and reserve those intersecting with the input image. Next, a rough matching phase, followed by a dense matching phase, is applied to find the matching dot-pairs using dynamic programming. Furthermore, we also remove wrong matching dot-pairs by calculating the validity. Experimental results illustrate that SeFM can achieve around 1,000 to 10,000 times as many matching dot-pairs as conventional algorithms, depending on the individual image, and that the object reconstruction with only two images is semantically visible. Moreover, it outperforms conventional algorithms, such as SIFT and SURF, regarding precision and recall.

URL

http://arxiv.org/abs/1812.02925

PDF

http://arxiv.org/pdf/1812.02925
Arabic Text Diacritization Using Deep Neural Networks

2019-04-25

Ali Fadel, Ibraheem Tuffaha, Bara' Al-Jawarneh, Mahmoud Al-Ayyoub

arXiv_CL

arXiv_CL Review
Abstract

Diacritization of Arabic text is both an interesting and a challenging problem, with various applications ranging from speech synthesis to helping students learn the Arabic language. As with many other tasks or problems in Arabic language processing, the limited effort invested in this problem and the lack of available (open-source) resources hinder progress towards solving it. This work provides a critical review of the currently existing systems, measures and resources for Arabic text diacritization. Moreover, it introduces a much-needed free-for-all cleaned dataset that can be easily used to benchmark any work on Arabic diacritization. Extracted from the Tashkeela Corpus, the dataset consists of 55K lines containing about 2.3M words. After constructing the dataset, existing tools and systems are tested on it. The results of the experiments show that the neural Shakkala system significantly outperforms traditional rule-based approaches and other closed-source tools with a Diacritic Error Rate (DER) of 2.88% compared with 13.78%, which is the best DER for the non-neural approaches (obtained by the Mishkal tool).

URL

http://arxiv.org/abs/1905.01965

PDF

http://arxiv.org/pdf/1905.01965
Reducing Anomaly Detection in Images to Detection in Noise

2019-04-25

Axel Davy, Thibaud Ehret, Jean-Michel Morel, Mauricio Delbracio

arXiv_CV

arXiv_CV Object_Detection Detection
Abstract

Anomaly detectors address the difficult problem of automatically detecting exceptions in an arbitrary background image. Detection methods have been proposed by the thousands because each problem requires a different background model. By analyzing the existing approaches, we show that the problem can be reduced to detecting anomalies in residual images (extracted from the target image) in which noise and anomalies prevail. Hence, the general and impossible background modeling problem is replaced by simpler noise modeling, which allows the calculation of rigorous thresholds based on the a contrario detection theory. Our approach is therefore unsupervised and works on arbitrary images.

URL

http://arxiv.org/abs/1904.11276

PDF

http://arxiv.org/pdf/1904.11276
LADN: Local Adversarial Disentangling Network for Facial Makeup and De-Makeup

2019-04-25

Qiao Gu, Guanzhi Wang, Mang Tik Chiu, Yu-Wing Tai, Chi-Keung Tang

arXiv_AI

arXiv_AI Adversarial Style_Transfer
Abstract

We propose a local adversarial disentangling network (LADN) for facial makeup and de-makeup. Central to our method are multiple and overlapping local adversarial discriminators in a content-style disentangling network for achieving local detail transfer between facial images, with the use of asymmetric loss functions for dramatic makeup styles with high-frequency details. Existing techniques do not demonstrate or fail to transfer high-frequency details in a global adversarial setting, or train a single local discriminator only to ensure image structure consistency and thus work only for relatively simple styles. Unlike others, our proposed local adversarial discriminators can distinguish whether the generated local image details are consistent with the corresponding regions in the given reference image in cross-image style transfer in an unsupervised setting. Incorporating these technical contributions, we achieve not only state-of-the-art results on conventional styles but also novel results involving complex and dramatic styles with high-frequency details covering large areas across multiple facial features. A carefully designed dataset of unpaired before and after makeup images will be released.

URL

http://arxiv.org/abs/1904.11272

PDF

http://arxiv.org/pdf/1904.11272
ExpandNet: Training Compact Networks by Linear Expansion

2019-04-25

Shuxuan Guo, Jose M. Alvarez, Mathieu Salzmann

arXiv_CV

arXiv_CV Object_Detection Segmentation Image_Classification Semantic_Segmentation Classification Detection
Abstract

While very deep networks can achieve great performance, they are ill-suited to applications in resource-constrained environments. In this paper, we introduce a novel approach to training a given compact network from scratch. We propose to expand each linear layer of the compact network into multiple linear layers, without adding any nonlinearity. As such, the resulting expanded network can be compressed back to the compact one algebraically, but, as evidenced by our experiments, consistently outperforms it. In this context, we introduce several expansion strategies, together with an initialization scheme, and demonstrate the benefits of our ExpandNets on several tasks, including image classification on ImageNet, object detection on PASCAL VOC, and semantic segmentation on Cityscapes.
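
The claim that an expanded network "can be compressed back to the compact one algebraically" follows from the fact that a stack of linear maps with no nonlinearity in between is itself a single linear map. A minimal sketch with plain lists follows; the weight values and layer sizes are arbitrary illustrative choices, and biases are omitted for brevity.

```python
def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_linear(weight, x):
    """Apply a linear layer (no bias) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


# Expanded form: one 2 -> 3 layer replaced by 2 -> 4 -> 3 with no nonlinearity.
w1 = [[1.0, 2.0], [0.0, 1.0], [-1.0, 0.5], [2.0, -1.0]]                   # 4 x 2
w2 = [[1.0, 0.0, 2.0, 1.0], [0.5, 1.0, 0.0, -1.0], [0.0, 2.0, 1.0, 0.0]]  # 3 x 4

# Compression back to the compact layer is just the matrix product.
w_compact = matmul(w2, w1)                                                # 3 x 2

x = [3.0, -2.0]
expanded_out = apply_linear(w2, apply_linear(w1, x))
compact_out = apply_linear(w_compact, x)
```

Both outputs agree up to floating-point error, so the expansion changes only the training-time parameterization and not the function the compact network can represent at inference time.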

URL

http://arxiv.org/abs/1811.10495

PDF

http://arxiv.org/pdf/1811.10495
On guiding video object segmentation

2019-04-25

Diego Ortego, Kevin McGuinness, Juan C. SanMiguel, Eric Arazo, José M. Martínez, Noel E. O'Connor

arXiv_CV

arXiv_CV Segmentation Attention CNN
Abstract

This paper presents a novel approach for segmenting moving objects in unconstrained environments using guided convolutional neural networks. This guiding process relies on foreground masks from independent algorithms (i.e. state-of-the-art algorithms) to implement an attention mechanism that incorporates the spatial location of foreground and background to compute their separated representations. Our approach initially extracts two kinds of features for each frame using colour and optical flow information. Such features are combined following a multiplicative scheme to benefit from their complementarity. These unified colour and motion features are later processed to obtain the separated foreground and background representations. Then, both independent representations are concatenated and decoded to perform foreground segmentation. Experiments conducted on the challenging DAVIS 2016 dataset demonstrate that our guided representations not only outperform non-guided, but also recent and top-performing video object segmentation algorithms.

URL

http://arxiv.org/abs/1904.11256

PDF

http://arxiv.org/pdf/1904.11256
Pointing Novel Objects in Image Captioning

2019-04-25

Yehao Li, Ting Yao, Yingwei Pan, Hongyang Chao, Tao Mei

arXiv_CV

arXiv_CV Image_Caption Knowledge Attention Caption RNN Recognition
Abstract

Image captioning has received significant attention, with remarkable improvements in recent advances. Nevertheless, images in the wild encapsulate rich knowledge and cannot be sufficiently described with models built on image-caption pairs containing only in-domain objects. In this paper, we propose to address the problem by augmenting standard deep captioning architectures with object learners. Specifically, we present Long Short-Term Memory with Pointing (LSTM-P), a new architecture that facilitates vocabulary expansion and produces novel objects via a pointing mechanism. Technically, object learners are initially pre-trained on available object recognition data. Pointing in LSTM-P then balances the probability between generating a word through LSTM and copying a word from the recognized objects at each time step in the decoder stage. Furthermore, our captioning encourages global coverage of objects in the sentence. Extensive experiments are conducted on both held-out COCO image captioning and ImageNet datasets for describing novel objects, and superior results are reported when comparing to state-of-the-art approaches. More remarkably, we obtain an average of 60.9% in F1 score on the held-out COCO dataset.
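
One common way to "balance" generating and copying is a gated convex combination of the two word distributions. The sketch below assumes that form; the function name, the scalar gate, and the toy distributions are hypothetical, and the abstract does not confirm that LSTM-P uses exactly this formulation.

```python
def mix_distributions(p_generate, p_copy, copy_gate):
    """Blend a decoder vocabulary distribution with a copy distribution.

    p_generate: dict word -> probability from the language decoder
    p_copy:     dict word -> probability over recognized objects
    copy_gate:  value in [0, 1]; higher means more weight on copying
    """
    if not 0.0 <= copy_gate <= 1.0:
        raise ValueError("copy_gate must lie in [0, 1]")
    words = set(p_generate) | set(p_copy)
    return {
        w: (1.0 - copy_gate) * p_generate.get(w, 0.0) + copy_gate * p_copy.get(w, 0.0)
        for w in words
    }


# "zebra" is absent from the decoder vocabulary but present among recognized objects.
p_generate = {"a": 0.5, "dog": 0.3, "sits": 0.2}
p_copy = {"zebra": 0.9, "dog": 0.1}
mixed = mix_distributions(p_generate, p_copy, copy_gate=0.25)
```

Because both inputs are probability distributions and the gate is in [0, 1], the result is again a distribution. A word the decoder has never seen, such as "zebra" here, receives non-zero probability through the copy path.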

URL

http://arxiv.org/abs/1904.11251

PDF

http://arxiv.org/pdf/1904.11251
MSDC-Net: Multi-Scale Dense and Contextual Networks for Automated Disparity Map for Stereo Matching

2019-04-25

Zhibo Rao, Mingyi He, Yuchao Dai, Zhidong Zhu, Bo Li, Renjie He

arXiv_CV

arXiv_CV Object_Detection Deep_Learning Prediction Detection
Abstract

Disparity prediction from stereo images is essential to computer vision applications including autonomous driving, 3D model reconstruction, and object detection. To predict an accurate disparity map, we propose a novel deep learning architecture, called MSDC-Net, for detecting the disparity map from a rectified pair of stereo images. Our MSDC-Net contains two modules: a multi-scale fusion 2D convolution module and a multi-scale residual 3D convolution module. The multi-scale fusion 2D convolution module exploits the potential multi-scale features, extracting and fusing the different scale features by Dense-Net. The multi-scale residual 3D convolution module learns the different scale geometry context from the cost volume, which is aggregated by the multi-scale fusion 2D convolution module. Experimental results on Scene Flow and KITTI datasets demonstrate that our MSDC-Net significantly outperforms other approaches in the non-occluded region.

URL

http://arxiv.org/abs/1904.12658

PDF

http://arxiv.org/pdf/1904.12658
Exploring Object Relation in Mean Teacher for Cross-Domain Detection

2019-04-25

Qi Cai, Yingwei Pan, Chong-Wah Ngo, Xinmei Tian, Lingyu Duan, Ting Yao

arXiv_CV

arXiv_CV Regularization Attention Prediction Detection Relation Recognition
Abstract

Rendering synthetic data (e.g., 3D CAD-rendered images) to generate annotations for learning deep models in vision tasks has attracted increasing attention in recent years. However, simply applying the models learnt on synthetic images may lead to high generalization error on real images due to domain shift. To address this issue, recent progress in cross-domain recognition has featured the Mean Teacher, which directly simulates unsupervised domain adaptation as semi-supervised learning. The domain gap is thus naturally bridged with consistency regularization in a teacher-student scheme. In this work, we advance this Mean Teacher paradigm to be applicable for cross-domain detection. Specifically, we present Mean Teacher with Object Relations (MTOR) that novelly remolds Mean Teacher under the backbone of Faster R-CNN by integrating the object relations into the measure of consistency cost between teacher and student modules. Technically, MTOR firstly learns relational graphs that capture similarities between pairs of regions for teacher and student respectively. The whole architecture is then optimized with three consistency regularizations: 1) region-level consistency to align the region-level predictions between teacher and student, 2) inter-graph consistency for matching the graph structures between teacher and student, and 3) intra-graph consistency to enhance the similarity between regions of same class within the graph of student. Extensive experiments are conducted on the transfers across Cityscapes, Foggy Cityscapes, and SIM10k, and superior results are reported when comparing to state-of-the-art approaches. More remarkably, we obtain a new record of single model: 22.8% of mAP on Syn2Real detection dataset.
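
The abstract does not spell out how the teacher is maintained. In the original Mean Teacher formulation, the teacher's weights are an exponential moving average of the student's, and a consistency cost penalizes disagreement between the two. The sketch below illustrates only that generic mechanism on plain lists. The function names, the decay value, and the squared-error cost are illustrative assumptions, and none of this reproduces MTOR's region-level or graph-based terms.

```python
def ema_update(teacher_weights, student_weights, decay=0.99):
    """Move each teacher weight a small step toward the corresponding student weight."""
    return [
        decay * t + (1.0 - decay) * s
        for t, s in zip(teacher_weights, student_weights)
    ]


def consistency_cost(teacher_probs, student_probs):
    """Mean squared difference between teacher and student predictions."""
    return sum(
        (t - s) ** 2 for t, s in zip(teacher_probs, student_probs)
    ) / len(teacher_probs)


teacher = [1.0, 0.0]
student = [0.0, 1.0]
teacher = ema_update(teacher, student, decay=0.9)

cost_same = consistency_cost([0.5, 0.5], [0.5, 0.5])
cost_diff = consistency_cost([0.9, 0.1], [0.6, 0.4])
```

The teacher is never trained by gradient descent in this scheme; it only tracks the student, which tends to make its predictions smoother targets for the consistency term.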

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.11245

PDF

http://arxiv.org/pdf/1904.11245
Read All
NeuroPod: a real-time neuromorphic spiking CPG applied to robotics

2019-04-25

Daniel Gutierrez-Galan, Juan Pedro Dominguez-Morales, Fernando Perez-Pena, Alejandro Linares-Barranco

arXiv_RO

arXiv_RO Face
Abstract

Initially, robots were developed with the aim of making our lives easier by carrying out tasks that are repetitive or dangerous for humans. Although they were able to perform these tasks, the latest generation of robots is being designed to go a step further, performing more complex tasks that until now have been carried out only by intelligent animals or humans. To this end, inspiration needs to be taken from biological examples. For instance, insects are able to optimally solve complex environment navigation problems, and many researchers have started to mimic how these insects behave. Recent interest in neuromorphic engineering has motivated us to present a real-time, neuromorphic, spike-based Central Pattern Generator for application in neurorobotics, using an arthropod-like robot. A Spiking Neural Network was designed and implemented on SpiNNaker. The network models a complex Central Pattern Generator, capable of online changes, which generates three gaits for hexapod robot locomotion. Reconfigurable hardware was used to manage both the motors of the robot and the real-time communication interface with the Spiking Neural Network. Real-time measurements confirm the simulation results, and locomotion tests show that NeuroPod can perform the gaits without any balance loss or added delay.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.11243

PDF

http://arxiv.org/pdf/1904.11243
Read All
Unsupervised label noise modeling and loss correction

2019-04-25

Eric Arazo, Diego Ortego, Paul Albert, Noel E. O'Connor, Kevin McGuinness

arXiv_CV

arXiv_CV CNN Prediction
Abstract

Despite being robust to small amounts of label noise, convolutional neural networks trained with stochastic gradient methods have been shown to easily fit random labels. When there is a mixture of correct and mislabelled targets, networks tend to fit the former before the latter. This suggests using a suitable two-component mixture model as an unsupervised generative model of sample loss values during training, allowing online estimation of the probability that a sample is mislabelled. Specifically, we propose a beta mixture to estimate this probability and correct the loss by relying on the network prediction (the so-called bootstrapping loss). We further adapt mixup augmentation to drive our approach a step further. Experiments on CIFAR-10/100 and TinyImageNet demonstrate a robustness to label noise that substantially outperforms recent state-of-the-art methods. Source code is available at https://git.io/fjsvE
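
The two ideas in the abstract, a two-component beta mixture over per-sample losses and a bootstrapping loss weighted by the resulting probability, can be sketched as follows. This is an illustrative sketch, not the authors' code: the mixture parameters are hypothetical placeholders that would normally be fitted during training, the losses are assumed to be normalised to [0, 1], and the function names are invented for the example.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, with x strictly inside (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x)
                    + (b - 1.0) * math.log(1.0 - x))

def prob_mislabelled(loss, clean, noisy, eps=1e-4):
    """Posterior that a normalised loss came from the noisy component.

    clean and noisy are (mixing weight, a, b) tuples. They are hypothetical
    here; in practice they would be fitted to the observed losses.
    """
    x = min(max(loss, eps), 1.0 - eps)  # keep x away from the endpoints
    p_clean = clean[0] * beta_pdf(x, clean[1], clean[2])
    p_noisy = noisy[0] * beta_pdf(x, noisy[1], noisy[2])
    return p_noisy / (p_clean + p_noisy)

def bootstrapped_loss(probs, label, w):
    """Cross-entropy against a target that mixes the given label (weight 1 - w)
    with the network's own prediction (weight w)."""
    target = [w * p for p in probs]
    target[label] += 1.0 - w
    return -sum(t * math.log(max(p, 1e-12)) for t, p in zip(target, probs))
```

With `w = 0` the loss reduces to ordinary cross-entropy on the given label. As `w` approaches 1, which in this sketch happens for samples whose loss falls under the high-loss component, the target leans on the network's own prediction, so a confidently contradicted label contributes less to the loss.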

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.11238

PDF

http://arxiv.org/pdf/1904.11238
Read All
Dynamic Spatio-temporal Graph-based CNNs for Traffic Prediction

2019-04-25

Menglin Wang, Baisheng Lai, Zhongming Jin, Yufeng Lin, Xiaojin Gong, Jianqiang Huang, Xiansheng Hua

arXiv_CV

arXiv_CV CNN Prediction Relation
Abstract

Accurate traffic forecasting is a challenging problem due to the large problem scale, as well as the complex and dynamic nature of the spatio-temporal dependencies of traffic flow. Most existing graph-based CNNs attempt to capture static relations while largely neglecting the dynamics underlying sequential data. In this paper, we present dynamic spatio-temporal graph-based CNNs (DST-GCNNs), which learn expressive features to represent spatio-temporal structures and predict future traffic from historical traffic flow. In particular, DST-GCNN is a two-stream network. In the flow prediction stream, we present a novel graph-based spatio-temporal convolutional layer to extract features from a graph representation of traffic flow. Several such layers are then stacked together to predict future traffic over time. Meanwhile, the proximity relations between nodes in the graph are often time-variant, as the traffic condition changes over time. To capture the graph dynamics, we use the graph prediction stream to predict the dynamic graph structures, and the predicted structures are fed into the flow prediction stream. Experiments on real traffic datasets demonstrate that the proposed model achieves competitive performance compared with other state-of-the-art methods.
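
To make the idea of feeding a time-varying graph into a flow predictor concrete, here is a deliberately simplified sketch. It is not the DST-GCNN layer from the paper: it uses a plain row-normalised neighbourhood average with a single scalar weight, and the adjacency matrices, which the paper's graph prediction stream would predict, are simply passed in. All names are assumptions made for the example.

```python
def row_normalise(adj):
    """Add self-loops, then scale each row to sum to 1, so that every node
    averages over itself and its neighbours."""
    n = len(adj)
    with_loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
    return [[v / sum(row) for v in row] for row in with_loops]

def graph_conv(adj, flows, weight=1.0):
    """One graph convolution: a weighted neighbourhood average of node flows."""
    norm = row_normalise(adj)
    return [weight * sum(a * x for a, x in zip(row, flows)) for row in norm]

def roll_out(adjs, flows, weight=1.0):
    """Apply one graph convolution per time step, each with that step's graph."""
    for adj in adjs:
        flows = graph_conv(adj, flows, weight)
    return flows

# Two road segments that are disconnected at step 0 and connected at step 1.
disconnected = [[0.0, 0.0], [0.0, 0.0]]
connected = [[0.0, 1.0], [1.0, 0.0]]
print(roll_out([disconnected, connected], [10.0, 20.0]))
```

The point of the sketch is only that the same flow values propagate differently depending on which graph is supplied at each step, which is why a separate stream that predicts the graph can change the flow forecast.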

Abstract (translated by Google)

URL

http://arxiv.org/abs/1812.02019

PDF

http://arxiv.org/pdf/1812.02019
Read All
