Deep learning techniques have become the go-to models for most vision-related tasks on 2D images. However, their power has not been fully realised on several tasks in 3D space, e.g., 3D scene understanding. In this work, we jointly address the problems of semantic and instance segmentation of 3D point clouds. Specifically, we develop a multi-task pointwise network that simultaneously performs two tasks: predicting the semantic classes of 3D points and embedding the points into high-dimensional vectors so that points of the same object instance are represented by similar embeddings. We then propose a multi-value conditional random field model to incorporate the semantic and instance labels and formulate the problem of semantic and instance segmentation as jointly optimising labels in the field model. The proposed method is thoroughly evaluated and compared with existing methods on different indoor scene datasets including S3DIS and SceneNN. Experimental results show the robustness of the proposed joint semantic-instance segmentation scheme over its single components. Our method also achieves state-of-the-art performance on semantic segmentation.
http://arxiv.org/abs/1904.00699
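The abstract above does not spell out its embedding objective; a common formulation for such instance embeddings is a pull/push (discriminative) loss, sketched below in PyTorch under that assumption (names, margins, and the exact form are illustrative, not the paper's).

```python
import torch

def discriminative_loss(embeddings, instance_ids, delta_v=0.5, delta_d=1.5):
    """Hypothetical pull/push loss: points of the same instance are pulled
    toward their instance mean; instance means are pushed apart.
    embeddings: (N, D) per-point embeddings, instance_ids: (N,) integer labels.
    Assumes at least one instance is present."""
    means, pull = [], 0.0
    for inst in instance_ids.unique():
        e = embeddings[instance_ids == inst]          # points of one instance
        mu = e.mean(dim=0)
        means.append(mu)
        # pull term: hinge on distance to the instance mean
        pull += ((e - mu).norm(dim=1) - delta_v).clamp(min=0).pow(2).mean()
    means = torch.stack(means)                         # (K, D) instance means
    K = means.size(0)
    # push term: hinge on pairwise distances between instance means
    dist = torch.cdist(means, means)                   # (K, K)
    off_diag = dist[~torch.eye(K, dtype=torch.bool)]
    push = (2 * delta_d - off_diag).clamp(min=0).pow(2).mean() if K > 1 else 0.0
    return pull / K + push
```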
The goal of this paper is to detect the spatio-temporal extent of an action. The two-stream detection network based on RGB and flow provides state-of-the-art accuracy at the expense of a large model-size and heavy computation. We propose to embed RGB and optical-flow into a single two-in-one stream network with new layers. A motion condition layer extracts motion information from flow images, which is leveraged by the motion modulation layer to generate transformation parameters for modulating the low-level RGB features. The method is easily embedded in existing appearance- or two-stream action detection networks, and trained end-to-end. Experiments demonstrate that leveraging the motion condition to modulate RGB features improves detection accuracy. With only half the computation and parameters of the state-of-the-art two-stream methods, our two-in-one stream still achieves impressive results on UCF101-24, UCFSports and J-HMDB.
http://arxiv.org/abs/1904.00696
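As a rough illustration of the motion condition and motion modulation layers described above, the sketch below assumes a FiLM-style per-channel scale-and-shift computed from flow features; the layer sizes and structure are illustrative, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MotionModulation(nn.Module):
    """Sketch: condition low-level RGB features on motion extracted from flow."""
    def __init__(self, flow_channels, rgb_channels):
        super().__init__()
        # motion condition: summarize flow into a compact descriptor
        self.condition = nn.Sequential(
            nn.Conv2d(flow_channels, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        # motion modulation: predict per-channel scale (gamma) and shift (beta)
        self.to_gamma = nn.Linear(64, rgb_channels)
        self.to_beta = nn.Linear(64, rgb_channels)

    def forward(self, rgb_feat, flow):
        c = self.condition(flow).flatten(1)            # (B, 64)
        gamma = self.to_gamma(c)[:, :, None, None]     # (B, C, 1, 1)
        beta = self.to_beta(c)[:, :, None, None]
        return rgb_feat * (1 + gamma) + beta           # modulated RGB features
```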
Summaries are important when it comes to processing huge amounts of information. Their most important benefit is saving time, which is increasingly scarce nowadays. Therefore, a summary must be short, representative and readable. Generating summaries automatically can be beneficial for humans, since it saves time and helps in selecting relevant documents. Automatic summarization and, in particular, automatic text summarization (ATS) is not a new research field; it has been studied since the 1950s. Since then, researchers have been actively searching for the perfect summarization method. In this article, we discuss different works in automatic summarization, especially recent ones. We present some problems and limits which prevent this line of work from moving forward. Most of these challenges are closely related to the nature of the languages being processed. These challenges are of interest to academics and developers alike, as a path to follow in this field.
http://arxiv.org/abs/1904.00688
Quantification of cerebral white matter hyperintensities (WMH) of presumed vascular origin is of key importance in many neurological research studies. Currently, measurements are often still obtained from manual segmentations on brain MR images, which is a laborious procedure. Automatic WMH segmentation methods exist, but a standardized comparison of the performance of such methods is lacking. We organized a scientific challenge, in which developers could evaluate their method on a standardized multi-center/-scanner image dataset, giving an objective comparison: the WMH Segmentation Challenge (https://wmh.isi.uu.nl/). Sixty T1+FLAIR images from three MR scanners were released with manual WMH segmentations for training. A test set of 110 images from five MR scanners was used for evaluation. Segmentation methods had to be containerized and submitted to the challenge organizers. Five evaluation metrics were used to rank the methods: (1) Dice similarity coefficient, (2) modified Hausdorff distance (95th percentile), (3) absolute log-transformed volume difference, (4) sensitivity for detecting individual lesions, and (5) F1-score for individual lesions. Additionally, methods were ranked on their inter-scanner robustness. Twenty participants submitted their method for evaluation. This paper provides a detailed analysis of the results. In brief, there is a cluster of four methods that rank significantly better than the other methods, with one clear winner. The inter-scanner robustness ranking shows that not all methods generalize to unseen scanners. The challenge remains open for future submissions and provides a public platform for method evaluation.
http://arxiv.org/abs/1904.00682
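Two of the five ranking metrics above are easy to make concrete. The sketch below computes the Dice similarity coefficient and a simplified per-lesion F1 (connected components with an any-overlap detection rule, which approximates but may not exactly match the challenge's matching criterion).

```python
import numpy as np
from scipy.ndimage import label

def dice(pred, gt):
    """Dice similarity coefficient for binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / max(pred.sum() + gt.sum(), 1)

def lesion_f1(pred, gt):
    """Simplified per-lesion F1: a lesion counts as detected if any voxel
    of the other mask overlaps its connected component."""
    gt_cc, n_gt = label(gt)
    pred_cc, n_pred = label(pred)
    detected = sum(1 for i in range(1, n_gt + 1) if pred[gt_cc == i].any())
    true_pred = sum(1 for j in range(1, n_pred + 1) if gt[pred_cc == j].any())
    recall = detected / max(n_gt, 1)
    precision = true_pred / max(n_pred, 1)
    return 2 * precision * recall / max(precision + recall, 1e-8)
```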
Time-lapse videos usually contain visually appealing content but are often difficult and costly to create. In this paper, we present an end-to-end solution to synthesize a time-lapse video from a single outdoor image using deep neural networks. Our key idea is to train a conditional generative adversarial network based on existing datasets of time-lapse videos and image sequences. We propose a multi-frame joint conditional generation framework to effectively learn the correlation between the illumination change of an outdoor scene and the time of the day. We further present a multi-domain training scheme for robust training of our generative models from two datasets with different distributions and missing timestamp labels. Compared to alternative time-lapse video synthesis algorithms, our method uses the timestamp as the control variable and does not require a reference video to guide the synthesis of the final output. We conduct ablation studies to validate our algorithm and compare with state-of-the-art techniques both qualitatively and quantitatively.
http://arxiv.org/abs/1904.00680
To make machines better understand sentiments, research needs to move from polarity identification to understanding the reasons that underlie the expression of sentiment. Categorizing the goals or needs of humans is one way to explain the expression of sentiment in text. Humans are good at understanding situations described in natural language and can easily connect them to the character’s psychological needs using commonsense knowledge. We present a novel method to extract, rank, filter and select multi-hop relation paths from a commonsense knowledge resource to interpret the expression of sentiment in terms of their underlying human needs. We efficiently integrate the acquired knowledge paths in a neural model that interfaces context representations with knowledge using a gated attention mechanism. We assess the model’s performance on a recently published dataset for categorizing human needs. Selectively integrating knowledge paths boosts performance and establishes a new state-of-the-art. Our model offers interpretability through the learned attention map over commonsense knowledge paths. Human evaluation highlights the relevance of the encoded knowledge.
http://arxiv.org/abs/1904.00676
In this paper, we attempt to address the challenging problem of counting built structures in satellite imagery. Building density is a more accurate estimate of population density, urban area expansion and its impact on the environment than built-up area segmentation. However, variance in building shapes, overlapping boundaries, and varying densities make this a complex task. To tackle this difficult problem, we propose a deep learning based regression technique for counting built structures in satellite imagery. Our proposed framework intelligently combines features from different regions of a satellite image using attention based re-weighting techniques. Multiple parallel convolutional networks are designed to capture information at different granularities. These features are combined by the FusionNet, which is trained to weigh features from different granularities differently, allowing us to predict a precise building count. To train and evaluate the proposed method, we put forward a new large-scale and challenging built-structure-count dataset. Our dataset is constructed by collecting satellite imagery from diverse geographical areas (plains, urban centers, deserts, etc.) across the globe (Asia, Europe, North America, and Africa) and captures a wide range of built-structure densities. Detailed experimental results and analysis validate the proposed technique. FusionNet has a Mean Absolute Error of 3.65 and an R-squared measure of 88% over the testing data. Finally, we test on a 274.3 × 10^3 m^2 unseen region, with an error of 19 buildings out of the 656 buildings in that area.
http://arxiv.org/abs/1904.00674
Nearest neighbors in word embedding models are commonly observed to be semantically similar, but the relations between them can vary greatly. We investigate the extent to which word embedding models preserve syntactic interchangeability, as reflected by distances between word vectors, and the effect of hyper-parameters—context window size in particular. We use part of speech (POS) as a proxy for syntactic interchangeability, as generally speaking, words with the same POS are syntactically valid in the same contexts. We also investigate the relationship between interchangeability and similarity as judged by commonly-used word similarity benchmarks, and correlate the result with the performance of word embedding models on these benchmarks. Our results will inform future research and applications in the selection of word embedding models, suggesting a principle for appropriately selecting the context window size parameter depending on the use case.
http://arxiv.org/abs/1904.00669
Learning-based lossy image compression usually involves the joint optimization of rate-distortion performance. Most existing methods adopt spatially invariant bit length allocation and incorporate discrete entropy approximation to constrain the compression rate. Nonetheless, the information content is spatially variant: regions with complex and salient structures are generally more essential to image compression. Taking the spatial variation of image content into account, this paper presents a content-weighted encoder-decoder model, which involves an importance map subnet to produce the importance mask for locally adaptive bit rate allocation. The summation of the importance mask can thus be utilized as an alternative to entropy estimation for compression rate control. Furthermore, the quantized representations of the learned code and importance map are still spatially dependent, and can be losslessly compressed using arithmetic coding. To compress the codes effectively and efficiently, we propose a trimmed convolutional network to predict the conditional probability of quantized codes. Experiments show that the proposed method produces visually much better results, and performs favorably in comparison with deep and traditional lossy image compression approaches.
http://arxiv.org/abs/1904.00664
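The role of the importance mask in rate control described above can be pictured as follows: each spatial location is allotted a bit budget proportional to its importance, so the mask sum approximates the total code length. The sketch below is a hypothetical training-time rate penalty built on that idea; the hinge form and parameter names are assumptions, not the paper's exact formulation.

```python
import torch

def rate_penalty(importance_map, target_bpp):
    """Hypothetical rate term: treat the importance-map sum as a proxy for
    code length and hinge it against a bit budget.
    importance_map: (B, 1, H, W), values in [0, 1]."""
    est_bits = importance_map.sum(dim=(1, 2, 3))        # per-image length proxy
    budget = target_bpp * importance_map.shape[-1] * importance_map.shape[-2]
    return (est_bits - budget).clamp(min=0).mean()      # penalize overspending
```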
Automatic detection and recognition of traffic signs plays a crucial role in the management of the traffic-sign inventory. It provides an accurate and timely way to manage the traffic-sign inventory with minimal human effort. In the computer vision community the recognition and detection of traffic signs is a well-researched problem. A vast majority of existing approaches perform well on the traffic signs needed for advanced driver-assistance and autonomous systems. However, this represents a relatively small number of all traffic signs (around 50 categories out of several hundred), and performance on the remaining set of traffic signs, which are required to eliminate the manual labor in traffic-sign inventory management, remains an open question. In this paper, we address the issue of detecting and recognizing a large number of traffic-sign categories suitable for automating traffic-sign inventory management. We adopt a convolutional neural network (CNN) approach, the Mask R-CNN, to address the full pipeline of detection and recognition with automatic end-to-end learning. We propose several improvements that are evaluated on the detection of traffic signs and result in an improved overall performance. This approach is applied to the detection of 200 traffic-sign categories represented in our novel dataset. Results are reported on highly challenging traffic-sign categories that have not yet been considered in previous works. We provide a comprehensive analysis of the deep learning method for the detection of traffic signs with large intra-category appearance variation and show below 3% error rates with the proposed approach, which is sufficient for deployment in practical applications of traffic-sign inventory management.
http://arxiv.org/abs/1904.00649
Recognizing Musical Entities is important for Music Information Retrieval (MIR) since it can improve the performance of several tasks such as music recommendation, genre classification or artist similarity. However, most entity recognition systems in the music domain have concentrated on formal texts (e.g. artists’ biographies, encyclopedic articles, etc.), ignoring rich and noisy user-generated content. In this work, we present a novel method to recognize musical entities in Twitter content generated by users following a classical music radio channel. Our approach takes advantage of both the formal radio schedule and users’ tweets to improve entity recognition. We instantiate several machine learning algorithms to perform entity recognition combining task-specific and corpus-based features. We also show how to improve recognition results by jointly considering formal and user-generated content.
http://arxiv.org/abs/1904.00648
The collection of internet images has been growing at an astonishing speed. These images undoubtedly contain rich visual information that can be useful in many applications, such as visual media creation and data-driven image synthesis. In this paper, we focus on the methodologies for building a visual object database from a collection of internet images. Such a database is built to contain a large number of high-quality visual objects that can help with various data-driven image applications. Our method is based on dense proposal generation and objectness-based re-ranking. A novel deep convolutional neural network is designed for the inference of proposal objectness, the probability of a proposal containing an optimally-located foreground object. In our work, objectness is quantitatively measured in terms of completeness and fullness, reflecting two complementary features of an optimal proposal: a complete foreground and relatively small background. Our experiments indicate that object proposals re-ranked according to the output of our network generally achieve higher performance than those produced by other state-of-the-art methods. As a concrete example, a database of over 1.2 million visual objects has been built using the proposed method, and has been successfully used in various data-driven image applications.
http://arxiv.org/abs/1904.00641
Multimodal machine translation is an attractive application of neural machine translation (NMT). It helps computers to deeply understand visual objects and their relations with natural languages. However, multimodal NMT systems suffer from a shortage of available training data, resulting in poor performance for translating rare words. In NMT, pretrained word embeddings have been shown to improve NMT of low-resource domains, and a search-based approach is proposed to address the rare word problem. In this study, we effectively combine these two approaches in the context of multimodal NMT and explore how we can take full advantage of pretrained word embeddings to better translate rare words. We report overall performance improvements of 1.24 METEOR and 2.49 BLEU and achieve an improvement of 7.67 F-score for rare word translation.
http://arxiv.org/abs/1904.00639
Removing undesirable reflections from a single image captured through a glass window is of practical importance to visual computing systems. Although state-of-the-art methods can obtain decent results in certain situations, performance declines significantly when tackling more general real-world cases. These failures stem from the intrinsic difficulty of single image reflection removal – the fundamental ill-posedness of the problem, and the insufficiency of densely-labeled training data needed for resolving this ambiguity within learning-based neural network pipelines. In this paper, we address these issues by exploiting targeted network enhancements and the novel use of misaligned data. For the former, we augment a baseline network architecture by embedding context encoding modules that are capable of leveraging high-level contextual clues to reduce indeterminacy within areas containing strong reflections. For the latter, we introduce an alignment-invariant loss function that facilitates exploiting misaligned real-world training data that is much easier to collect. Experimental results collectively show that our method outperforms the state-of-the-art with aligned data, and that significant improvements are possible when using additional misaligned data.
http://arxiv.org/abs/1904.00637
Deep learning methods have achieved great progress in image restoration under specific metrics (e.g., PSNR, SSIM). However, the perceptual quality of the restored image is relatively subjective, and it is necessary for users to control the reconstruction result according to personal preferences or image characteristics, which cannot be done using existing deterministic networks. This motivates us to carefully design a unified interactive framework for general image restoration tasks. Under this framework, users can control the continuous transition between different objectives, e.g., the perception-distortion trade-off of image super-resolution, or the trade-off between noise reduction and detail preservation. We achieve this goal by controlling the latent features of the designed network. To be specific, our proposed framework, named Controllable Feature Space Network (CFSNet), couples two branches based on different objectives. Our model can adaptively learn the coupling coefficients of different layers and channels, which provides finer control of the restored image quality. Experiments on several typical image restoration tasks fully validate the benefits of the proposed method.
http://arxiv.org/abs/1904.00634
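The continuous control described above can be pictured as interpolating the latent features of two branches. The sketch below assumes a single learned scalar coupling per block; the paper's coupling module is likely richer (per-layer and per-channel coefficients), so treat this purely as an illustration.

```python
import torch
import torch.nn as nn

class CoupledBlock(nn.Module):
    """Sketch of coupling two task branches with a learned coefficient."""
    def __init__(self, main_block, tuning_block):
        super().__init__()
        self.main = main_block        # trained for, e.g., distortion
        self.tuning = tuning_block    # trained for, e.g., perceptual quality
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, x, control):
        # 'control' in [0, 1] lets the user slide between the two objectives
        a = self.alpha * control
        return a * self.tuning(x) + (1 - a) * self.main(x)
```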
Owing to its low cost, portability, and freedom from radiation, echocardiography is a widely used imaging modality for left ventricle (LV) function quantification. However, automatic LV segmentation and motion tracking is still a challenging task. In addition to fuzzy border definition, low contrast, and abundant artifacts in typical ultrasound images, the shape and size of the LV change significantly over a cardiac cycle. In this work, we propose a temporal affine network (TAN) to perform image analysis in a warped image space, where the shape and size variations due to cardiac motion as well as other artifacts are largely compensated. Furthermore, we perform three frequent echocardiogram interpretation tasks simultaneously: standard cardiac plane recognition, LV landmark detection, and LV segmentation. Instead of using three networks with one dedicated to each task, we use a single multi-task network to perform all three tasks simultaneously. Since the three tasks share the same encoder, the compact network improves segmentation accuracy with more supervision. The network is further fine-tuned with optical-flow-adjusted annotations to enhance motion coherence in the segmentation result. Experiments on 1,714 2D echocardiographic sequences demonstrate that the proposed method achieves state-of-the-art segmentation accuracy with real-time efficiency.
http://arxiv.org/abs/1904.00631
The performance of deep learning is significantly affected by the volume of training data. Models pre-trained on massive datasets such as ImageNet have become a powerful weapon for speeding up training convergence and improving accuracy. Similarly, models based on large datasets are important for the development of deep learning in 3D medical images. However, it is extremely challenging to build a sufficiently large dataset due to the difficulty of data acquisition and annotation in 3D medical imaging. We aggregate data from several medical challenges to build the 3DSeg-8 dataset with diverse modalities, target organs, and pathologies. To extract general medical three-dimensional (3D) features, we design a heterogeneous 3D network called Med3D to co-train on the multi-domain 3DSeg-8 dataset and produce a series of pre-trained models. We transfer the Med3D pre-trained models to lung segmentation on the LIDC dataset, pulmonary nodule classification on the LIDC dataset and liver segmentation on the LiTS challenge. Experiments show that Med3D can accelerate the training convergence speed of target 3D medical tasks by 2 times compared with models pre-trained on the Kinetics dataset, and by 10 times compared with training from scratch, as well as improve accuracy by 3% to 20%. Transferring our Med3D model to the state-of-the-art DenseASPP segmentation network, we achieve, with a single model, a 94.6% Dice coefficient, which approaches the results of top-ranked algorithms on the LiTS challenge.
http://arxiv.org/abs/1904.00625
Video understanding is emerging as a new paradigm for studying human-like AI. Question-and-Answering (Q&A) is used as a general benchmark to measure the level of intelligence for video understanding. While several previous studies have proposed datasets for video Q&A tasks, they did not really incorporate story-level understanding, resulting in highly biased datasets that lack variance in question difficulty. In this paper, we propose a hierarchical method for building Q&A datasets, i.e. with hierarchical difficulty levels. We introduce three criteria for video story understanding, i.e. memory capacity, logical complexity, and the DIKW (Data-Information-Knowledge-Wisdom) pyramid. We discuss how a three-dimensional map constructed from these criteria can be used as a metric for evaluating the levels of intelligence relating to video story understanding.
http://arxiv.org/abs/1904.00623
The field of geometric automated theorem provers has a long and rich history, from the early AI approaches of the 1960s, synthetic provers, to today's algebraic and synthetic provers. The geometry automated deduction area differs from other areas by the strong connection between the axiomatic theories and their standard models. In many cases geometric constructions are used to establish the theorems' statements; in some provers they are used to conduct the proof, or as counter-examples to close some branches of the automatic proof. Synthetic geometry proofs are done using geometric properties, proofs that can have a visual counterpart in the supporting geometric construction. With the growing use of geometry automated deduction tools in other areas, e.g. in education, the need to evaluate them using different criteria is increasingly felt. Establishing a ranking among geometric automated theorem provers would be useful for improving the current methods/implementations. Improvements could concern wider scope, better efficiency, proof readability and proof reliability. To be able to compare geometric automated theorem provers, a common test bench is needed: a common language to describe the geometric problems, a comprehensive repository of geometric problems, and a set of quality measures.
http://arxiv.org/abs/1904.00619
We introduce a novel transition system for discontinuous constituency parsing. Instead of storing subtrees in a stack –i.e. a data structure with linear-time sequential access– the proposed system uses a set of parsing items, with constant-time random access. This change makes it possible to construct any discontinuous constituency tree in exactly $4n - 2$ transitions for a sentence of length $n$. At each parsing step, the parser considers every item in the set to be combined with a focus item and to construct a new constituent in a bottom-up fashion. The parsing strategy is based on the assumption that most syntactic structures can be parsed incrementally and that the set –the memory of the parser– remains reasonably small on average. Moreover, we introduce a provably correct dynamic oracle for the new transition system, and present the first experiments in discontinuous constituency parsing using a dynamic oracle. Our parser obtains state-of-the-art results on three English and German discontinuous treebanks.
http://arxiv.org/abs/1904.00615
We propose a novel solution for semi-supervised video object segmentation. By the nature of the problem, available cues (e.g. video frame(s) with object masks) become richer with the intermediate predictions. However, the existing methods are unable to fully exploit this rich source of information. We resolve the issue by leveraging memory networks and learning to read relevant information from all available sources. In our framework, the past frames with object masks form an external memory, and the current frame, as the query, is segmented using the mask information in the memory. Specifically, the query and the memory are densely matched in the feature space, covering all the space-time pixel locations in a feed-forward fashion. In contrast to previous approaches, the abundant use of the guidance information allows us to better handle challenges such as appearance changes and occlusions. We validate our method on the latest benchmark sets and achieve state-of-the-art performance (overall score of 79.4 on the Youtube-VOS val set, J of 88.7 and 79.2 on the DAVIS 2016/2017 val sets respectively) while having a fast runtime (0.16 second/frame on the DAVIS 2016 val set).
http://arxiv.org/abs/1904.00607
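The dense space-time memory read described above is essentially key-value attention over all memory locations. A minimal sketch, with illustrative shapes (flattened over time and space) rather than the paper's exact implementation:

```python
import torch

def memory_read(mem_key, mem_val, query_key):
    """Sketch: every query location attends to every memory location.
    mem_key: (T*H*W, Ck), mem_val: (T*H*W, Cv), query_key: (H*W, Ck)."""
    sim = query_key @ mem_key.t()            # (HW, THW) dot-product similarity
    weights = torch.softmax(sim, dim=1)      # normalize over memory locations
    read = weights @ mem_val                 # (HW, Cv) retrieved mask features
    return read
```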
As Deep Neural Networks (DNNs) have demonstrated superhuman performance in many computer vision tasks, there is increasing interest in revealing the complex internal mechanisms of DNNs. In this paper, we propose Relative Attributing Propagation (RAP), which decomposes the output predictions of DNNs from a new perspective that precisely separates positive and negative attributions. By identifying the fundamental causes of activation and the proper inversion of relevance, RAP allows each neuron to be assigned an actual contribution to the output. Furthermore, we devise pragmatic methods to properly handle the effect of bias and batch normalization in the attributing procedure. Our method therefore makes it possible to interpret various kinds of very deep neural network models with clear and attentive visualizations of positive and negative attributions. By utilizing the region perturbation method and comparing the distribution of attributions for a quantitative evaluation, we verify that the positive and negative attributions produced by RAP correctly account for their respective meanings. The positive and negative attributions propagated by RAP show vulnerability and robustness, respectively, to distortion of the corresponding pixels. We apply RAP to the VGG-16, ResNet-50 and Inception-V3 DNN models, demonstrating that it generates more intuitive and informative interpretations than existing attribution methods.
http://arxiv.org/abs/1904.00605
In this paper, we develop a multi-agent reinforcement learning (MARL) framework to obtain online power control policies for a large energy harvesting (EH) multiple access channel, when only causal information about the EH process and wireless channel is available. In the proposed framework, we model the online power control problem as a discrete-time mean-field game (MFG), and leverage deep reinforcement learning to learn the stationary solution of the game in a distributed fashion. We analytically show that the proposed procedure converges to the unique stationary solution of the MFG. Using the proposed framework, the power control policies are learned in a completely distributed fashion. In order to benchmark the performance of the distributed policies, we also develop deep neural network (DNN) based centralized as well as distributed online power control schemes. Our simulation results show the efficacy of the proposed power control policies. In particular, the DNN based centralized power control policies provide very good performance for large EH networks, for which the design of optimal policies is intractable using conventional methods such as Markov decision processes. Further, the performance of both distributed policies is close to the throughput achieved by the centralized policies.
http://arxiv.org/abs/1904.00601
Event cameras are bio-inspired vision sensors that mimic retinas to asynchronously report per-pixel intensity changes rather than outputting an actual intensity image at regular intervals. This new paradigm of image sensor offers significant potential advantages, namely a sparse and non-redundant data representation. Unfortunately, however, most existing artificial neural network architectures, such as CNNs, require dense synchronous input data and therefore cannot make use of the sparseness of the data. We propose EventNet, a neural network designed for real-time processing of asynchronous event streams in a recursive and event-wise manner. EventNet models the dependence of the output on tens of thousands of causal events recursively using a novel temporal coding scheme. As a result, at inference time, our network operates in an event-wise manner that is realized with very few sum-of-the-product operations—look-up table and temporal feature aggregation—which enables processing of one million or more events per second on a standard CPU. In experiments using real data, we demonstrate the real-time performance and robustness of our framework.
http://arxiv.org/abs/1812.07045
Graph matching refers to finding node correspondences between graphs, such that the affinity of corresponding nodes and edges is maximized. In addition to its NP-complete nature, another important challenge is the effective modeling of node-wise and structure-wise affinities across graphs, and of the resulting objective, to guide the matching procedure toward the true matching in the presence of noise. To this end, this paper devises an end-to-end differentiable deep network pipeline to learn the affinity for graph matching. It involves a supervised permutation loss with respect to node correspondence to capture the combinatorial nature of graph matching. Meanwhile, deep graph embedding models are adopted to parameterize both intra-graph and cross-graph affinity functions, instead of traditional shallow and simple parametric forms, e.g. a Gaussian kernel. The embedding can also effectively capture higher-order structure beyond second-order edges. The permutation loss model is agnostic to the number of nodes, and the embedding model is shared among nodes such that the network allows for varying numbers of nodes in graphs for training and inference. Moreover, our network is class-agnostic, with some generalization capability across different categories. All these features are welcome for real-world applications. Experiments show its superiority against state-of-the-art graph matching learning methods.
http://arxiv.org/abs/1904.00597
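The supervised permutation loss above can be made concrete as a cross-entropy between a doubly-stochastic soft assignment and the ground-truth permutation matrix. The sketch below uses plain Sinkhorn normalization to produce the soft assignment; the iteration count and the exact normalization (log-domain is omitted for brevity) are assumptions.

```python
import torch

def sinkhorn(scores, n_iters=10):
    """Sketch: make a score matrix approximately doubly stochastic by
    alternating row and column normalization."""
    s = torch.exp(scores)
    for _ in range(n_iters):
        s = s / s.sum(dim=1, keepdim=True)   # row normalization
        s = s / s.sum(dim=0, keepdim=True)   # column normalization
    return s

def permutation_loss(scores, gt_perm, eps=1e-8):
    """Cross-entropy between the predicted soft assignment and the
    ground-truth permutation matrix; both are (N, N)."""
    p = sinkhorn(scores)
    return -(gt_perm * torch.log(p + eps)
             + (1 - gt_perm) * torch.log(1 - p + eps)).mean()
```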
Scene understanding of high resolution aerial images is of great importance for the task of automated monitoring in various remote sensing applications. Due to the large within-class and small between-class variance in pixel values of objects of interest, this remains a challenging task. In recent years, deep convolutional neural networks have started being used in remote sensing applications and demonstrate state-of-the-art performance for pixel level classification of objects. Here we present a novel deep learning architecture, ResUNet-a, that combines ideas from various state-of-the-art modules used in computer vision for semantic segmentation tasks. We analyse the performance of several flavours of the Generalized Dice loss for semantic segmentation, and we introduce a novel variant loss function for semantic segmentation of objects that has better convergence properties and behaves well even in the presence of highly imbalanced classes. The performance of our modelling framework is evaluated on the ISPRS 2D Potsdam dataset. Results show state-of-the-art performance with an average F1 score of 92.1% over all classes for our best model.
http://arxiv.org/abs/1904.00592
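As one concrete flavour of the Dice-style segmentation losses analysed above, the sketch below implements a Tanimoto variant over per-class probability maps (the paper's proposed loss additionally involves the complement masks, which is omitted here for brevity).

```python
import torch

def tanimoto_loss(probs, labels, eps=1e-8):
    """Sketch of a Tanimoto (Dice-like) loss.
    probs: (B, C, H, W) softmax outputs, labels: (B, C, H, W) one-hot."""
    inter = (probs * labels).sum(dim=(0, 2, 3))
    denom = (probs ** 2 + labels ** 2).sum(dim=(0, 2, 3)) - inter
    t = (inter + eps) / (denom + eps)        # per-class Tanimoto coefficient
    return 1 - t.mean()                      # minimize 1 - mean coefficient
```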
Word vectors and Language Models (LMs) pretrained on a large amount of unlabelled data can dramatically improve various Natural Language Processing (NLP) tasks. However, the measure and impact of similarity between pretraining data and target task data are left to intuition. We propose three cost-effective measures to quantify different aspects of similarity between source pretraining and target task data. We demonstrate that these measures are good predictors of the usefulness of pretrained models for Named Entity Recognition (NER) over 30 data pairs. Results also suggest that pretrained LMs are more effective and more predictable than pretrained word vectors, but pretrained word vectors are better when pretraining data is dissimilar.
http://arxiv.org/abs/1904.00585
When a feed-forward neural network (FNN) is trained for source ranging in an ocean waveguide, it is difficult to evaluate the range accuracy of the FNN on unlabeled test data. A fitting-based early stopping (FEAST) method is introduced to evaluate the range error of the FNN on test data where the source distance is unknown. With FEAST, training is stopped when the evaluated range error of the FNN reaches its minimum on the test data, which helps to improve the ranging accuracy of the FNN on the test data. FEAST is demonstrated on simulated and experimental data.
http://arxiv.org/abs/1904.00583
Minutia-based palmprint recognition systems have attracted significant interest in the last two decades. Due to the large number of minutiae in a palmprint, approximately 1000, the matching process is time-consuming, which makes it impractical for real-time applications. One way to address this issue is to align all palmprint images to a reference image, bringing them into the same coordinate system, which results in fewer computations during minutia matching. In this paper, using a convolutional neural network (CNN) and the generalized Hough transform (GHT), we propose a new method to register palmprint images accurately. This method finds the rotation and displacement (in both the x and y directions) between the palmprint and a reference image. Exact palmprint registration can enhance the speed and accuracy of the matching process. The proposed method is capable of distinguishing between left and right palmprints automatically, which helps to speed up the matching process. Furthermore, the CNN designed for the registration stage also segments the palmprint from the background, which is a pre-processing step for minutia extraction. The proposed registration method followed by the minutia cylinder-code (MCC) matching algorithm has been evaluated on the THUPALMLAB database, and the results show the superiority of our algorithm over most state-of-the-art algorithms.
http://arxiv.org/abs/1904.00579
This paper proposes a novel fault diagnosis approach based on generative adversarial networks (GAN) for imbalanced industrial time series, where normal samples greatly outnumber failure cases. We combine a well-designed feature extractor with a GAN to help train the whole network. Aiming to capture the data distribution and hidden patterns in both the original distinguishing features and the latent space, an encoder-decoder-encoder three-sub-network is employed in the GAN, based on Deep Convolutional Generative Adversarial Networks (DCGAN) but without the Tanh activation layer and trained only on normal samples. In order to verify the validity and feasibility of our approach, we test it on rolling bearing data from Case Western Reserve University and further verify it on data collected in our laboratory. The results show that our proposed approach achieves excellent fault-detection performance by outputting much larger evaluation scores for faulty samples.
https://arxiv.org/abs/1904.00575
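The encoder-decoder-encoder design above suggests a simple anomaly score: since the network is trained on normal samples only, the latent code of an input and the latent code of its reconstruction should diverge on faulty data. A minimal sketch, with the three sub-networks passed in as assumptions:

```python
import torch

def anomaly_score(x, encoder1, decoder, encoder2):
    """Sketch of an encoder-decoder-encoder evaluation score.
    Assumes encoders map (B, ...) inputs to (B, D) latent codes."""
    z1 = encoder1(x)                 # latent code of the input
    x_hat = decoder(z1)              # reconstruction from the normal manifold
    z2 = encoder2(x_hat)             # latent code of the reconstruction
    return (z1 - z2).norm(dim=1)     # large score suggests a fault
```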
It is crucial to reduce natural gas methane emissions, which can potentially offset the climate benefits of replacing coal with gas. Optical gas imaging (OGI) is a widely-used method to detect methane leaks, but it is labor-intensive and cannot provide leak detection results without operators’ judgment. In this paper, we develop a computer vision approach to OGI-based leak detection using convolutional neural networks (CNN) trained on methane leak images to enable automatic detection. First, we collect ~1 M frames of labeled video of methane leaks from different leaking equipment for building the CNN model, covering a wide range of leak sizes (5.3-2051.6 gCH4/h) and imaging distances (4.6-15.6 m). Second, we examine different background subtraction methods to extract the methane plume in the foreground. Third, we test three CNN model variants, collectively called GasNet, to detect plumes in videos taken at other pieces of leaking equipment. We assess the ability of GasNet to perform leak detection by comparing it to a baseline method that uses an optical-flow-based change detection algorithm. We explore the sensitivity of results to the CNN structure, with a moderate-complexity variant performing best across distances. We find that the detection accuracy can reach as high as 99%, and that the overall detection accuracy can exceed 95% across all leak sizes and imaging distances. Binary detection accuracy exceeds 97% for large leaks (~710 gCH4/h) imaged closely (~5-7 m). At closer imaging distances (~5-10 m), CNN-based models have greater than 94% accuracy across all leak sizes. At the farthest distances (~13-16 m), performance degrades rapidly, but the models still achieve above 95% accuracy in detecting large leaks (>950 gCH4/h). The GasNet-based computer vision approach could be deployed in OGI surveys to allow automatic vigilance of methane leak detection with high detection accuracy in the real world.
http://arxiv.org/abs/1904.08500
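The plume-extraction step examined above can be prototyped with OpenCV's off-the-shelf subtractors; the sketch below uses MOG2, one plausible choice among the background subtraction methods the paper compares, not necessarily the one it selected (the file name is illustrative).

```python
import cv2

# Sketch: extract foreground (moving plume) masks from an OGI video.
cap = cv2.VideoCapture("leak_video.mp4")  # hypothetical input file
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

masks = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)   # foreground mask = candidate methane plume
    masks.append(mask)               # masks would then be fed to the CNN
cap.release()
```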
Zero-shot object detection is an emerging research topic that aims to recognize and localize previously ‘unseen’ objects. This setting gives rise to several unique challenges, e.g., a highly imbalanced positive vs. negative instance ratio, ambiguity between background and unseen classes, and the proper alignment between visual and semantic concepts. Here, we propose an end-to-end deep learning framework underpinned by a novel loss function that seeks to properly align the visual and semantic cues for improved zero-shot learning. We call our objective the ‘Polarity loss’ because it explicitly maximizes the gap between positive and negative predictions. Such a margin-maximizing formulation is not only important for visual-semantic alignment but also resolves the ambiguity between background and unseen objects. Our approach is inspired by embodiment theories in cognitive science, which claim that human semantic understanding is grounded in past experiences (seen objects), related linguistic concepts (word dictionary) and the perception of the physical world (visual imagery). To this end, we learn to attend to a dictionary of related semantic concepts that eventually refines the noisy semantic embeddings and helps establish a better synergy between the visual and semantic domains. Our extensive results on the MS-COCO and Pascal VOC datasets show up to a 14x mAP improvement over the state of the art.
https://arxiv.org/abs/1811.08982
The high cost of pixel-level annotations makes it appealing to train saliency detection models with weak supervision. However, a single weak supervision source usually does not contain enough information to train a well-performing model. To this end, we propose a unified framework to train saliency detection models with diverse weak supervision sources. In this paper, we use category labels, captions, and unlabelled data for training, yet other supervision sources can also be plugged into this flexible framework. We design a classification network (CNet) and a caption generation network (PNet), which learn to predict object categories and generate captions, respectively, while highlighting the most important regions for the corresponding tasks. An attention transfer loss is designed to transmit supervision signals between the networks, such that a network trained with one supervision source can benefit from another. An attention coherence loss is defined on unlabelled data to encourage the networks to detect generally salient regions instead of task-specific regions. We use CNet and PNet to generate pixel-level pseudo labels to train a saliency prediction network (SNet). During testing, we only need SNet to predict saliency maps. Experiments demonstrate that the performance of our method compares favourably against unsupervised and weakly supervised methods, and even some supervised methods.
http://arxiv.org/abs/1904.00566
With the development of information technology, we have witnessed an age of data explosion that produces a large variety of data filled with redundant information. Because dimension reduction is an essential tool that embeds high-dimensional data into a lower-dimensional subspace to avoid redundant information, it has attracted interest from researchers all over the world. However, faced with features from multiple views, most dimension reduction methods struggle to fully exploit multi-view features and to integrate their compatible and complementary information to construct a low-dimensional subspace directly. Furthermore, most multi-view dimension reduction methods cannot handle features from nonlinear spaces with high dimensions. Therefore, how to construct a multi-view dimension reduction method that can deal with multi-view features from a high-dimensional nonlinear space is of vital importance but challenging. In order to address this problem, we propose a novel method named Co-regularized Multi-view Sparse Reconstruction Embedding (CMSRE) in this paper. By exploiting the correlations of sparse reconstructions from multiple views, CMSRE is able to learn local sparse structures of nonlinear manifolds from multiple views and construct meaningful low-dimensional representations for them. Due to the proposed co-regularized scheme, the correlations of sparse reconstructions from multiple views are preserved by CMSRE as much as possible. Furthermore, sparse representation produces more meaningful correlations between features from each single view, which helps CMSRE achieve better performance. Evaluations on document classification, face recognition and image retrieval demonstrate the effectiveness of the proposed approach to multi-view dimension reduction.
http://arxiv.org/abs/1904.08499
We address the problem of speech act recognition (SAR) in asynchronous conversations (forums, emails). Unlike synchronous conversations (e.g., meetings, phone), asynchronous domains lack large labeled datasets to train an effective SAR model. In this paper, we propose methods to effectively leverage abundant unlabeled conversational data and the available labeled data from synchronous domains. We carry out our research in three main steps. First, we introduce a neural architecture based on hierarchical LSTMs and conditional random fields (CRF) for SAR, and show that our method outperforms existing methods when trained on in-domain data only. Second, we improve our initial SAR models by semi-supervised learning in the form of pretrained word embeddings learned from a large unlabeled conversational corpus. Finally, we employ adversarial training to improve the results further by leveraging the labeled data from synchronous domains and by explicitly modeling the distributional shift in two domains.
http://arxiv.org/abs/1904.04021
Scene graph generation has received growing attention with the advancements in image understanding tasks such as object detection, attribute and relationship prediction, etc. However, existing datasets are biased in terms of object and relationship labels, or often come with noisy and missing annotations, which makes the development of a reliable scene graph prediction model very challenging. In this paper, we propose a novel scene graph generation algorithm with external knowledge and an image reconstruction loss to overcome these dataset issues. In particular, we extract commonsense knowledge from an external knowledge base to refine object and phrase features for improving generalizability in scene graph generation. To address the bias of noisy object annotations, we introduce an auxiliary image reconstruction path to regularize the scene graph generation network. Extensive experiments show that our framework can generate better scene graphs, achieving state-of-the-art performance on two benchmark datasets: the Visual Relationship Detection and Visual Genome datasets.
http://arxiv.org/abs/1904.00560
Three-dimensional (3D) reconstruction and scene depth estimation from 2-dimensional (2D) images are major tasks in computer vision. However, using conventional 3D reconstruction techniques gets challenging in participating media such as murky water, fog, or smoke. We have developed a method that uses a time-of-flight (ToF) camera to estimate an object region and depth in participating media simultaneously. The scattering component is saturated, so it does not depend on the scene depth, and received signals bouncing off distant points are negligible due to light attenuation in the participating media, so the observation of such a point contains only a scattering component. These phenomena enable us to estimate the scattering component in an object region from a background that only contains the scattering component. The problem is formulated as robust estimation where the object region is regarded as outliers, and it enables the simultaneous estimation of an object region and depth on the basis of an iteratively reweighted least squares (IRLS) optimization scheme. We demonstrate the effectiveness of the proposed method using captured images from a Kinect v2 in real foggy scenes and evaluate the applicability with synthesized data.
http://arxiv.org/abs/1904.00558
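The robust-estimation step above can be illustrated with a generic IRLS loop: observations with large residuals against the scattering-background model are progressively down-weighted, effectively flagging the object region as outliers. A minimal NumPy sketch, with an L1-style reweighting chosen for illustration:

```python
import numpy as np

def irls(A, b, n_iters=20, eps=1e-6):
    """Sketch of iteratively reweighted least squares for fitting the
    scattering background. A: (M, P) design matrix, b: (M,) observations."""
    w = np.ones(len(b))
    x = None
    for _ in range(n_iters):
        W = np.diag(w)
        # weighted normal equations: x = (A^T W A)^{-1} A^T W b
        x = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)
        r = np.abs(b - A @ x)                # residuals of the current fit
        w = 1.0 / np.maximum(r, eps)         # L1-style reweighting
    return x, w                              # low weight suggests object pixel
```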
Weakly supervised object detection aims at learning precise object detectors given only image category labels. In recent prevailing works, this problem is generally formulated as a multiple instance learning module guided by an image classification loss. The object bounding box is assumed to be the one contributing most to the classification among all proposals. However, the region contributing most is also likely to be a crucial part or the supporting context of an object. To obtain a more accurate detector, in this work we propose a novel end-to-end weakly supervised detection approach, where a newly introduced generative adversarial segmentation module interacts with the conventional detection module in a collaborative loop. The collaboration mechanism takes full advantage of the complementary interpretations of the weakly supervised localization task, namely the detection and segmentation tasks, forming a more comprehensive solution. Consequently, our method obtains more precise object bounding boxes, rather than parts or irrelevant surroundings. As expected, the proposed method achieves an accuracy of 51.0% on the PASCAL VOC 2007 dataset, outperforming the state of the art and demonstrating its superiority for weakly supervised object detection.
http://arxiv.org/abs/1904.00551
This paper introduces a newly collected and novel dataset (StereoMSI) for example-based single and colour-guided spectral image super-resolution. The dataset was first released and promoted during the PIRM2018 spectral image super-resolution challenge. To the best of our knowledge, the dataset is the first of its kind, comprising 350 registered colour-spectral image pairs. The dataset has been used for the two tracks of the challenge and, for each of these, we have provided a split into training, validation and testing. This arrangement is a result of the challenge structure and phases, with the first track focusing on example-based spectral image super-resolution and the second one aiming at exploiting the registered stereo colour imagery to improve the resolution of the spectral images. Each of the tracks and splits has been selected to be consistent across a number of image quality metrics. The dataset is quite general in nature and can be used for a wide variety of applications in addition to the development of spectral image super-resolution methods.
http://arxiv.org/abs/1904.00540
This paper considers a realistic problem in the person re-identification (re-ID) task, i.e., partial re-ID. Under the partial re-ID scenario, the images may contain only a partial observation of a pedestrian. If we directly compare a partial pedestrian image with a holistic one, the extreme spatial misalignment significantly compromises the discriminative ability of the learned representation. We propose a Visibility-aware Part Model (VPM), which learns to perceive the visibility of regions through self-supervision. The visibility awareness allows VPM to extract region-level features and compare two images with a focus on their shared regions (which are visible in both images). VPM gains a two-fold benefit toward higher accuracy for partial re-ID. On the one hand, compared with learning a global feature, VPM learns region-level features and benefits from fine-grained information. On the other hand, with visibility awareness, VPM is capable of estimating the shared regions between two images and thus suppresses the spatial misalignment. Experimental results confirm that our method significantly improves the learned representation and the achieved accuracy is on par with the state of the art.
http://arxiv.org/abs/1904.00537
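The shared-region comparison described above reduces to weighting per-region distances by joint visibility. A minimal sketch, with the shapes and the aggregation rule as assumptions rather than the paper's exact definitions:

```python
import torch

def vpm_distance(feats_a, vis_a, feats_b, vis_b):
    """Sketch: compare two images only on their shared visible regions.
    feats_*: (P, D) region features, vis_*: (P,) visibility scores in [0, 1]."""
    shared = vis_a * vis_b                          # (P,) joint visibility
    d = (feats_a - feats_b).norm(dim=1)             # per-region distance
    # visibility-weighted average: invisible regions contribute nothing
    return (shared * d).sum() / shared.sum().clamp(min=1e-8)
```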
Recent progress on salient object detection mainly aims at exploiting how to effectively integrate convolutional side-output features in convolutional neural networks (CNN). Based on this, most of the existing state-of-the-art saliency detectors design complex network structures to fuse the side-output features of the backbone feature extraction networks. However, should the fusion strategies become ever more complex for accurate salient object detection? In this paper, we observe that the contexts of a natural image can be well expressed by a high-to-low self-learning of side-output convolutional features. As we know, the contexts of an image usually refer to the global structures, and the top layers of a CNN usually learn to convey global information. On the other hand, it is difficult for the intermediate side-output features to express contextual information. Here, we design an hourglass network with intermediate supervision to learn contextual features in a high-to-low manner. The learned hierarchical contexts are aggregated to generate the hybrid contextual expression for an input image. Finally, the hybrid contextual features can be used for accurate saliency estimation. We extensively evaluate our method on six challenging saliency datasets, and our simple method achieves state-of-the-art performance under various evaluation metrics. Code will be released upon paper acceptance.
https://arxiv.org/abs/1812.10956
Most of the existing learning-based single image super-resolution (SISR) methods are trained and evaluated on simulated datasets, where the low-resolution (LR) images are generated by applying a simple and uniform degradation (i.e., bicubic downsampling) to their high-resolution (HR) counterparts. However, the degradations in real-world LR images are far more complicated. As a consequence, the SISR models trained on simulated data become less effective when applied to practical scenarios. In this paper, we build a real-world super-resolution (RealSR) dataset where paired LR-HR images of the same scene are captured by adjusting the focal length of a digital camera. An image registration algorithm is developed to progressively align the image pairs at different resolutions. Considering that the degradation kernels are naturally non-uniform in our dataset, we present a Laplacian pyramid based kernel prediction network (LP-KPN), which efficiently learns per-pixel kernels to recover the HR image. Our extensive experiments demonstrate that SISR models trained on our RealSR dataset deliver better visual quality, with sharper edges and finer textures, on real-world scenes than those trained on simulated datasets. Though our RealSR dataset is built using only two cameras (Canon 5D3 and Nikon D810), the trained model generalizes well to other camera devices such as the Sony a7II and mobile phones.
http://arxiv.org/abs/1904.00523
Age synthesis is a challenging task due to the complicated and non-linear transformations in the human aging process. Aging information is usually reflected in local facial parts, such as wrinkles at the eye corners. However, these local facial parts contribute less in previous GAN based methods for age synthesis. To address this issue, we propose a Wavelet-domain Global and Local Consistent Age Generative Adversarial Network (WaveletGLCA-GAN), in which one global specific network and three local specific networks are integrated together to capture both the global topology information and the local texture details of human faces. Different from most existing methods that model age synthesis in the image domain, we adopt the wavelet transform to depict textural information in the frequency domain. Moreover, to achieve accurate age generation while preserving identity information, an age estimation network and a face verification network are employed. Five types of losses are adopted: 1) an adversarial loss aims to generate realistic wavelets; 2) an identity preserving loss aims to better preserve identity information; 3) an age preserving loss aims to enhance the accuracy of age synthesis; 4) a pixel-wise loss aims to preserve the background information of the input face; 5) a total variation regularization aims to remove ghosting artifacts. Our method is evaluated on three face aging datasets, including CACD2000, Morph and FG-NET. Qualitative and quantitative experiments show the superiority of the proposed method over other state-of-the-art methods.
http://arxiv.org/abs/1809.07764
Air taxis are poised to be an additional mode of transportation in major cities suffering from ground transportation congestion. Among several potential applications of air taxis, we focus on their use within a city to transport passengers to nearby airports. Specifically, we consider the problem of determining optimal locations for skyports (enabling pick-up of passengers to airport) within a city. Our approach is inspired from hub location problems, and our proposed method optimizes for aggregate travel time to multiple airports while satisfying the demand (trips to airports) either via (i) ground transportation to skyport followed by an air taxi to the airport, or (ii) direct ground transportation to the airport. The number of skyports is a constraint, and the decision to go via the skyport versus direct ground transportation is a variable in the optimization problem. Extensive experiments on publicly available airport trips data from New York City (NYC) show the efficacy of our optimization method implemented using Gurobi. In addition, we share insightful results based on the NYC data set on how ground transportation congestion can impact the demand and service efficiency in such skyports; this emerges as yet another factor in deciding the optimal number of skyports and their locations for a given city.
http://arxiv.org/abs/1904.01497
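The optimization described above resembles a classic hub-location MILP. The sketch below is a stripped-down Gurobi formulation under that reading: open at most K skyports, serve each demand zone either via some open skyport or directly by ground, and minimize demand-weighted travel time. All data structures and names are illustrative, not the paper's exact model.

```python
import gurobipy as gp
from gurobipy import GRB

def solve_skyports(zones, sites, t_ground, t_via, demand, K):
    """Sketch of a skyport-location MILP.
    t_ground[i]: direct ground time from zone i to the airport.
    t_via[i, j]: ground time zone i -> site j plus air time to the airport."""
    m = gp.Model("skyports")
    y = m.addVars(sites, vtype=GRB.BINARY, name="open")          # open site j?
    x = m.addVars(zones, sites, vtype=GRB.BINARY, name="route")  # zone i via j
    d = m.addVars(zones, vtype=GRB.BINARY, name="direct")        # direct trip
    m.addConstr(y.sum() <= K)                                    # skyport budget
    for i in zones:
        m.addConstr(x.sum(i, "*") + d[i] == 1)                   # serve each zone
        for j in sites:
            m.addConstr(x[i, j] <= y[j])                         # only open sites
    m.setObjective(
        gp.quicksum(demand[i] * t_via[i, j] * x[i, j]
                    for i in zones for j in sites)
        + gp.quicksum(demand[i] * t_ground[i] * d[i] for i in zones),
        GRB.MINIMIZE)
    m.optimize()
    return [j for j in sites if y[j].X > 0.5]                    # chosen sites
```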
Designing an incentive compatible auction that maximizes expected revenue is an intricate task. The single-item case was resolved in a seminal piece of work by Myerson in 1981. Even after 30-40 years of intense research the problem remains unsolved for seemingly simple multi-bidder, multi-item settings. In this work, we initiate the exploration of the use of tools from deep learning for the automated design of optimal auctions. We model an auction as a multi-layer neural network, frame optimal auction design as a constrained learning problem, and show how it can be solved using standard pipelines. We prove generalization bounds and present extensive experiments, recovering essentially all known analytical solutions for multi-item settings, and obtaining novel mechanisms for settings in which the optimal mechanism is unknown.
http://arxiv.org/abs/1706.03459
We extend the probabilistic action language pBC+ with the notion of utility as in decision theory. The semantics of the extended pBC+ can be defined as a shorthand notation for a decision-theoretic extension of the probabilistic answer set programming language LPMLN. Alternatively, the semantics of pBC+ can also be defined in terms of a Markov Decision Process (MDP), which in turn allows for representing MDPs in a succinct and elaboration tolerant way, as well as for leveraging an MDP solver to compute pBC+. This idea led to the design of the system pbcplus2mdp, which can find an optimal policy of a pBC+ action description using an MDP solver.
http://arxiv.org/abs/1904.00512
Deep reinforcement learning has recently made significant progress in solving computer games and robotic control tasks. A known problem, though, is that policies overfit to the training environment and may not avoid rare, catastrophic events such as automotive accidents. A classical technique for improving the robustness of reinforcement learning algorithms is to train on a set of randomized environments, but this approach only guards against common situations. Recently, robust adversarial reinforcement learning (RARL) was developed, which allows efficient applications of random and systematic perturbations by a trained adversary. A limitation of RARL is that only the expected control objective is optimized; there is no explicit modeling or optimization of risk. Thus the agents do not consider the probability of catastrophic events (i.e., those inducing abnormally large negative reward), except through their effect on the expected objective. In this paper we introduce risk-averse robust adversarial reinforcement learning (RARARL), using a risk-averse protagonist and a risk-seeking adversary. We test our approach on a self-driving vehicle controller. We use an ensemble of policy networks to model risk as the variance of value functions. We show through experiments that a risk-averse agent is better equipped to handle a risk-seeking adversary, and experiences substantially fewer crashes compared to agents trained without an adversary.
http://arxiv.org/abs/1904.00511
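The risk model described above can be made concrete: with an ensemble of value networks, the variance of their estimates serves as the risk term that a risk-averse protagonist penalizes (and a risk-seeking adversary would reward). A minimal sketch, with shapes and the scalar trade-off as assumptions:

```python
import torch

def risk_adjusted_values(q_ensemble, risk_coef):
    """Sketch: q_ensemble is (K, A), K ensemble members' Q-values over A
    actions. Variance across members is the risk proxy."""
    mean_q = q_ensemble.mean(dim=0)       # (A,) expected value per action
    var_q = q_ensemble.var(dim=0)         # (A,) disagreement as risk proxy
    return mean_q - risk_coef * var_q     # risk-averse action scores
```

Flipping the sign of risk_coef would give the risk-seeking adversary's scores under the same reading.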
The current generation of surgeons requires extensive training in teleoperation to develop specific dexterous skills, which are independent of medical knowledge. Training curricula progress from manipulation tasks to simulated surgical tasks but are limited in time. To tackle this, we propose to integrate surgical robotic training with Haptic Feedback (HF) to improve skill acquisition. This paper presents the initial but promising results of our haptic device designed to support the training of surgical gestures. Our ongoing work involves integrating the HF into the RAVEN II platform.
http://arxiv.org/abs/1904.00510
Detecting manipulated images has become a significant emerging challenge. The advent of image sharing platforms and the easy availability of advanced photo editing software have resulted in large quantities of manipulated images being shared on the internet. While the intent behind such manipulations varies widely, concerns about the spread of fake news and misinformation are growing. Current state-of-the-art methods for detecting these manipulated images suffer from a lack of training data due to the laborious labeling process. We address this problem in this paper by introducing a manipulated-image generation process that creates true positives using currently available datasets. Drawing from traditional work on image blending, we propose a novel generator for creating such examples. In addition, we also propose to further create examples that force the algorithm to focus on boundary artifacts during training. Strong experimental results validate our proposal.
http://arxiv.org/abs/1811.09729
Cross-lingual word vectors are typically obtained by fitting an orthogonal matrix that maps the entries of a bilingual dictionary from a source to a target vector space. Word vectors, however, are most commonly used for sentence or document-level representations that are calculated as the weighted average of word embeddings. In this paper, we propose an alternative to word-level mapping that better reflects sentence-level cross-lingual similarity. We incorporate context in the transformation matrix by directly mapping the averaged embeddings of aligned sentences in a parallel corpus. We also implement cross-lingual mapping of deep contextualized word embeddings using parallel sentences with word alignments. In our experiments, both approaches resulted in cross-lingual sentence embeddings that outperformed context-independent word mapping in sentence translation retrieval. Furthermore, the sentence-level transformation could be used for word-level mapping without loss in word translation quality.
http://arxiv.org/abs/1903.03243
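The sentence-level mapping described above can be sketched as an orthogonal Procrustes fit on averaged embeddings of aligned sentences rather than on dictionary word pairs:

```python
import numpy as np

def fit_orthogonal_map(src_sents, tgt_sents):
    """Sketch: fit an orthogonal map W minimizing ||src W - tgt||_F via SVD.
    src_sents, tgt_sents: (N, D) sentence-averaged word embeddings of
    aligned sentence pairs from a parallel corpus."""
    u, _, vt = np.linalg.svd(src_sents.T @ tgt_sents)
    return u @ vt

# Usage: the fitted W maps any source-language vector (a word embedding or
# an averaged sentence embedding) into the target space via `vec @ W`.
```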