We develop a unified model, known as MgNet, that simultaneously recovers some convolutional neural networks (CNNs) for image classification and multigrid (MG) methods for solving discretized partial differential equations (PDEs). This model is based on close connections that we have observed and uncovered between the CNN and MG methodologies. For example, the pooling operation and feature extraction in CNNs correspond directly to the restriction operation and iterative smoothers in MG, respectively. As the solution space is often the dual of the data space in PDEs, the analogous concept of feature space and data space (which are dual to each other) is introduced for CNNs. With these connections and this new concept in the unified model, the function of the various convolution and pooling operations used in CNNs can be better understood. As a result, modified CNN models (with fewer weights and hyperparameters) are developed that exhibit competitive and sometimes better performance than existing CNN models when applied to both the CIFAR-10 and CIFAR-100 datasets.
http://arxiv.org/abs/1901.10415
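The CNN-MG correspondence described in the MgNet abstract above can be made concrete in a few lines. The following is a minimal, hypothetical PyTorch sketch (not the authors' released code; the layer shapes, the residual-style smoother, and the shared restriction operator are illustrative assumptions), in which the iterative smoother becomes an iterated convolution updating a feature tensor u from a data tensor f, and the MG restriction becomes a strided convolution:

import torch.nn as nn

class MgBlock(nn.Module):
    """One grid level: a few smoothing iterations, then restriction.
    Hypothetical sketch of the CNN<->MG correspondence; using a single
    shared restriction operator for u and f is a simplification."""
    def __init__(self, ch, smoothing_steps=2):
        super().__init__()
        self.A = nn.Conv2d(ch, ch, 3, padding=1)  # maps feature space to data space
        self.B = nn.Conv2d(ch, ch, 3, padding=1)  # smoother: residual -> feature update
        self.restrict = nn.Conv2d(ch, ch, 3, stride=2, padding=1)  # restriction ~ pooling
        self.steps = smoothing_steps
        self.act = nn.ReLU()

    def forward(self, u, f):
        for _ in range(self.steps):                # smoother: u <- u + B sigma(f - A u)
            u = u + self.B(self.act(f - self.A(u)))
        return self.restrict(u), self.restrict(f)  # move both to the coarser grid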
Understanding the details of human multimodal interaction can elucidate many aspects of the information processing that machines must perform to interact with humans. This article gives an overview of recent findings from Linguistics regarding the organization of conversation into turns, adjacency pairs, (dis)preferred responses, (self-)repairs, etc. In addition, we describe how multiple modalities of signs interact with each other to modify meanings. We then propose an abstract algorithm that describes how a machine can implement a double-feedback system that reproduces human-like face-to-face interaction by processing various signs, such as verbal, prosodic, facial expressions, gestures, etc. Multimodal face-to-face interactions enrich the exchange of information between agents, mainly because these agents are active all the time, emitting and interpreting signs simultaneously. This article is not about an untested new computational model. Instead, it translates findings from Linguistics into guidelines for the design of multimodal man-machine interfaces. The algorithm presented here, drawn from Linguistics, describes how human face-to-face interactions work. The linguistic findings reported here are first steps towards the integration of multimodal communication. Some developers involved in interface design continue to work on isolated models for interpreting text, grammar, gestures and facial expressions, neglecting the interweaving of these signs. In contrast, for linguists working on state-of-the-art multimodal integration, the interpretation of separated modalities leads to an incomplete interpretation, if not a miscomprehension, of information. The algorithm proposed herein is intended to guide man-machine interface designers who want to integrate multimodal components in face-to-face interactions that are as close as possible to those performed between humans.
http://arxiv.org/abs/1901.10408
Problems arise when using reward functions to capture dependencies between sequential time-constrained goal states, because the state space must be prohibitively expanded to accommodate a history of successfully achieved sub-goals. Policies and value functions derived under stationarity assumptions are not readily decomposable, leading to a tension between reward maximization and task generalization. We demonstrate a logic-compatible approach that uses model-based knowledge of environment dynamics and deadline information to directly infer non-stationary policies composed of reusable stationary policies. The policies are constructed to maximize the probability of satisfying time-sensitive goals while respecting time-varying obstacles. Our approach explicitly maintains two different spaces: a high-level logical task specification and the low-level state space of a Markov decision process onto which the task variables are grounded. Computing satisfiability at the task level is made possible by a Bellman-like equation that operates on a tensor linking the temporal relationship between the two spaces; the equation solves for a value function that can be explicitly interpreted as the probability of sub-goal satisfaction under the synthesized non-stationary policy, an approach we term Constraint Satisfaction Propagation (CSP).
http://arxiv.org/abs/1901.10405
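As a hedged illustration of the value function described in the abstract above (the paper's exact tensor formulation is not reproduced here), a Bellman-like recursion for the probability of reaching a goal set $G$ by a deadline $T$ can be written as
$$ V_T(s) = \mathbb{1}[s \in G], \qquad V_t(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\Big(\mathbb{1}[s' \in G] + \mathbb{1}[s' \notin G]\, V_{t+1}(s')\Big), $$
so that $V_t(s)$ is interpretable as the probability of satisfying the sub-goal from state $s$ with $T - t$ steps remaining, and the maximizing actions define a non-stationary policy.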
The increased availability of X-ray image archives (e.g. the ChestX-ray14 dataset from the NIH Clinical Center) has triggered a growing interest in deep learning techniques. To provide better insight into the different approaches and their applications to chest X-ray classification, we investigate a powerful network architecture in detail: the ResNet-50. Building on prior work in this domain, we consider transfer learning with and without fine-tuning as well as the training of a dedicated X-ray network from scratch. To leverage the high spatial resolution of X-ray data, we also include an extended ResNet-50 architecture and a network integrating non-image data (patient age, gender and acquisition type) in the classification process. In a concluding experiment, we also investigate multiple ResNet depths (i.e. ResNet-38 and ResNet-101). In a systematic evaluation, using 5-fold re-sampling and a multi-label loss function, we compare the performance of the different approaches for pathology classification by ROC statistics and analyze differences between the classifiers using rank correlation. Overall, we observe a considerable spread in the achieved performance and conclude that the X-ray-specific ResNet-38 integrating non-image data yields the best overall results. Furthermore, class activation maps are used to understand the classification process, and a detailed analysis of the impact of non-image features is provided.
http://arxiv.org/abs/1803.02315
This paper strives for the detection of real-world anomalies such as burglaries and assaults in surveillance videos. Although anomalies are generally local, as they happen in a limited portion of the frame, none of the previous works on the subject has studied the contribution of locality. In this work, we explore the impact of considering spatiotemporal tubes instead of whole-frame video segments. For this purpose, we enrich existing surveillance videos with spatial and temporal annotations, yielding the first anomaly detection dataset with bounding-box supervision in both its training and test sets. Our experiments show that a network trained with spatiotemporal tubes performs better than its analogous model trained with whole-frame videos. In addition, we discover that the locality is robust to different kinds of errors in the tube extraction phase at test time. Finally, we demonstrate that our network can provide spatiotemporal proposals for unseen surveillance videos leveraging only video-level labels. By doing so, we enlarge our spatiotemporal anomaly dataset without the need for further human labeling.
http://arxiv.org/abs/1901.10364
Based on extensive laboratory characterization of presolar nanodiamonds extracted from meteorites, we have proposed a novel approach to detect nanodiamonds in astrophysical objects using the 7370 Å emission band arising from lattice defects. Details of laboratory spectroscopic studies and preliminary results of observations are presented.
https://arxiv.org/abs/1901.10360
In this paper, we propose the first practical algorithm for stochastic composite optimization over compact convex sets. This template allows for affine constraints and therefore covers stochastic semidefinite programs (SDPs), which are widely applicable in both machine learning and statistics. In this setup, stochastic algorithms with convergence guarantees are either not known or not tractable. We tackle this general problem and propose a convergent, easy-to-implement and tractable algorithm. We prove an $\mathcal{O}(k^{-1/3})$ convergence rate in expectation on the objective residual and $\mathcal{O}(k^{-5/12})$ in expectation on the feasibility gap. These rates are achieved without increasing the batch size, which can be as small as a single sample. We present extensive empirical evidence demonstrating the superiority of our algorithm on a broad range of applications, including optimization of stochastic SDPs.
http://arxiv.org/abs/1901.10348
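The rates quoted above are characteristic of conditional-gradient-type methods, which handle compact convex sets through a linear minimization oracle rather than a projection. As a hedged sketch of this family (not the authors' exact algorithm; the step and averaging schedules below are common choices for such methods, not necessarily the paper's):

import numpy as np

def stochastic_frank_wolfe(x0, stoch_grad, lmo, iters=1000):
    """stoch_grad(x): unbiased single-sample gradient estimate at x.
    lmo(d): argmin over the compact convex set K of <d, s>.
    The averaged direction d reduces the variance of the estimates."""
    x, d = x0.copy(), np.zeros_like(x0)
    for k in range(1, iters + 1):
        rho = 4.0 / (k + 3) ** (2.0 / 3)         # gradient-averaging weight
        d = (1 - rho) * d + rho * stoch_grad(x)  # running gradient estimate
        s = lmo(d)                               # linear minimization oracle call
        gamma = 2.0 / (k + 1)                    # step size
        x = x + gamma * (s - x)                  # convex combination stays in K
    return x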
To decommission deactivated gaseous diffusion enrichment facilities, miles of contaminated pipe must be measured. The current method requires thousands of manual measurements, repeated manual data transcription, and months of manual analysis. The Pipe Crawling Activity Measurement System (PCAMS), developed by Carnegie Mellon University and in commissioning for use at the DOE Portsmouth Gaseous Diffusion Enrichment Facility, uses a robot to measure Uranium-235 from inside pipes and automatically log the data. Radiation measurements, as well as imagery, geometric modeling, and precise measurement positioning data, are digitally transferred to the PCAMS server. On the server, data can be automatically processed in minutes and summarized for analyst review. Measurement reports are auto-generated with the push of a button. A database specially configured to hold heterogeneous data such as spectra, images, and robot trajectories serves as the archive. This paper outlines the features and design of the PCAMS Post-Processing Software, currently in commissioning for use at the Portsmouth Gaseous Diffusion Enrichment Facility. The analysis process, the analyst interface to the system, and the content of the auto-generated reports are each described. Example pipe-interior geometric surface models, illustrations of how key report features apply in operational runs, and user feedback are discussed.
http://arxiv.org/abs/1901.10795
The performance of automatic speaker verification (ASV) systems degrades when the amount of speech used for enrollment and verification is reduced. Combining multiple systems based on different features and classifiers considerably reduces the speaker verification error rate with short utterances. This work attempts to incorporate supplementary information during the system combination process. We use the quality of the estimated model parameters as supplementary information. We introduce a class of novel quality measures formulated using the zero-order sufficient statistics computed during the i-vector extraction process. We use the proposed quality measures as side information for combining ASV systems based on the Gaussian mixture model-universal background model (GMM-UBM) and i-vectors. The proposed system yields considerable improvement in performance metrics on NIST SRE corpora in short-duration conditions, and we observe improvement over a state-of-the-art i-vector system.
http://arxiv.org/abs/1901.10345
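The zero-order sufficient statistics mentioned above are standard quantities in i-vector extraction: for each UBM component $c$, $N_c = \sum_t \gamma_t(c)$, the sum of the frame-level component posteriors. A minimal numpy sketch (the paper's quality-measure formulas built on these counts are its contribution and are not reproduced here):

import numpy as np

def zero_order_stats(posteriors):
    """posteriors: (T, C) array of frame-level component posteriors
    gamma_t(c) from the GMM-UBM for an utterance of T frames.
    Returns N_c = sum_t gamma_t(c). Short utterances yield small,
    unevenly spread counts, which is what quality measures built on
    these statistics can exploit as side information."""
    return posteriors.sum(axis=0)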
Miles of contaminated pipe must be measured, foot by foot, as part of the decommissioning effort at deactivated gaseous diffusion enrichment facilities. The current method requires cutting away asbestos-lined thermal enclosures and performing repeated, elevated operations to manually measure pipe from the outside. The RadPiper robot, part of the Pipe Crawling Activity Measurement System (PCAMS) developed by Carnegie Mellon University and commissioned for use at the DOE Portsmouth Gaseous Diffusion Enrichment Facility, automatically measures U-235 in pipes from the inside. This improves certainty, increases safety, and greatly reduces measurement time. The heart of the RadPiper robot is a sodium iodide scintillation detector in an innovative disc-collimated assembly. By measuring from inside pipes, the robot significantly increases its count rate relative to external through-pipe measurements. The robot also provides imagery, models interior pipe geometry, and precisely measures distance in order to localize radiation measurements. Data collected by this system provides insight into pipe interiors that is simply not possible from exterior measurements, all while keeping operators safer. This paper describes the technical details of the PCAMS RadPiper robot. Key features for this robot include precision distance measurement, in-pipe obstacle detection, ability to transform for two pipe sizes, and robustness in autonomous operation. Test results demonstrating the robot’s functionality are presented, including deployment tolerance tests, safeguarding tests, and localization tests. Integrated robot tests are also shown.
http://arxiv.org/abs/1901.10341
An adversarial query is an image that has been modified to disrupt content-based image retrieval (CBIR), while appearing nearly untouched to the human eye. This paper presents an analysis of adversarial queries for CBIR based on neural, local, and global features. We introduce an innovative neural image perturbation approach, called Perturbations for Image Retrieval Error (PIRE), that is capable of blocking neural-feature-based CBIR. To our knowledge, PIRE is the first approach to creating neural adversarial examples for CBIR. PIRE differs significantly from existing approaches that create images adversarial with respect to CNN classifiers because it is unsupervised, i.e., it needs no labeled data from the data set to which it is applied. Our experimental analysis demonstrates the surprising effectiveness of PIRE in blocking CBIR, and also covers aspects of PIRE that must be taken into account in practical settings: saving images, image quality, image editing, and leaking adversarial queries into the background collection. Our experiments also compare PIRE (a neural approach) with existing keypoint removal and injection approaches (which modify local features). Finally, we discuss the challenges that face multimedia researchers in the future study of adversarial queries.
http://arxiv.org/abs/1901.10332
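As a hedged sketch of the general idea behind PIRE, unsupervised perturbation in feature space rather than the authors' exact procedure, one can simply ascend the distance between the CNN features of the perturbed and the original image (feat_net, the step count, and the perturbation budget eps are illustrative assumptions):

import torch

def feature_perturbation(img, feat_net, steps=100, lr=0.01, eps=8/255):
    """Find a small perturbation that displaces the image's neural
    features, degrading feature-based retrieval. No labels are needed:
    the objective is the feature displacement itself."""
    target = feat_net(img).detach()              # features of the clean image
    delta = torch.zeros_like(img, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        loss = -(feat_net(img + delta) - target).norm()  # maximize displacement
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)              # keep the change visually small
    return (img + delta).clamp(0, 1).detach()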
Real-time recognition of dynamic hand gestures from video streams is a challenging task since (i) there is no indication when a gesture starts and ends in the video, (ii) performed gestures should only be recognized once, and (iii) the entire architecture must be designed with the memory and power budget in mind. In this work, we address these challenges by proposing a hierarchical structure that enables offline-working convolutional neural network (CNN) architectures to operate online efficiently using a sliding-window approach. The proposed architecture consists of two models: (1) a detector, a lightweight CNN architecture to detect gestures, and (2) a classifier, a deep CNN to classify the detected gestures. In order to evaluate the single-time activations of the detected gestures, we propose to use the Levenshtein distance as an evaluation metric since it can measure misclassifications, multiple detections, and missing detections at the same time. We evaluate our architecture on two publicly available datasets - EgoGesture and NVIDIA Dynamic Hand Gesture Datasets - which require temporal detection and classification of the performed hand gestures. The ResNeXt-101 model, used as the classifier, achieves state-of-the-art offline classification accuracies of 94.04% and 83.82% for the depth modality on the EgoGesture and NVIDIA benchmarks, respectively. In real-time detection and classification, we obtain considerably early detections while achieving performance close to offline operation. The code and pretrained models used in this work are publicly available.
http://arxiv.org/abs/1901.10323
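The Levenshtein distance used above as the single-time activation metric is the classical edit distance between the predicted and ground-truth gesture label sequences; a standard dynamic-programming implementation:

def levenshtein(pred, truth):
    """Minimum number of insertions, deletions and substitutions that
    turn the predicted label sequence into the ground truth, so multiple
    detections, missing detections and misclassifications are all
    penalized by one metric."""
    m, n = len(pred), len(truth)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == truth[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n]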
Model-free reinforcement learning relies heavily on a safe yet exploratory policy search. Proximal policy optimization (PPO) is a prominent algorithm for the safe-search problem, exploiting a heuristic clipping mechanism motivated by theoretically justified “trust region” guidance. However, we find that the clipping mechanism of PPO can cause a lack of exploration. Based on this finding, we improve the original PPO with an adaptive clipping mechanism guided by a “trust region” criterion. Our method, termed Trust Region-Guided PPO (TRPPO), improves PPO with more exploration and better sample efficiency while maintaining the safe-search property and design simplicity of PPO. On several benchmark tasks, TRPPO significantly outperforms the original PPO and is competitive with several state-of-the-art methods.
http://arxiv.org/abs/1901.10314
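For reference, the fixed-range clipped surrogate that TRPPO modifies (the adaptive, trust-region-guided clipping range is the paper's contribution and is not shown here) can be computed as follows:

import torch

def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
    """Standard PPO clipped surrogate, returned as a loss to minimize.
    TRPPO replaces the fixed eps with an adaptive clipping range."""
    ratio = torch.exp(logp_new - logp_old)            # importance ratio r_t
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * adv
    return -torch.min(ratio * adv, clipped).mean()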
In this paper we evaluate mobile active authentication based on an ensemble of biometrics and behavior-based profiling signals. We consider seven different data channels and their combinations. Touch dynamics (touch gestures and keystrokes), accelerometer, gyroscope, WiFi, GPS location and app usage are all collected during human-mobile interaction to authenticate the users. We evaluate two approaches: one-time authentication and active authentication. In one-time authentication, we employ the information of all channels available during one session. For active authentication, we take advantage of mobile user behavior across multiple sessions by updating a confidence value of the authentication score. Our experiments are conducted on the semi-uncontrolled UMDAA-02 database, which comprises smartphone sensor signals acquired during natural human-mobile interaction. Our results show that the different traits can be complementary and that multimodal systems clearly increase performance, with accuracies ranging from 82.2% to 97.1% depending on the authentication scenario.
http://arxiv.org/abs/1901.10312
The increasing interest in user privacy is leading to new privacy-preserving machine learning paradigms. In the Federated Learning paradigm, a master machine learning model is distributed to user clients; the clients use their locally stored data and model both for inference and for calculating model updates. The model updates are sent back and aggregated on the server to update the master model, which is then redistributed to the clients. In this paradigm, user data never leaves the client, greatly enhancing the user’s privacy, in contrast to the traditional paradigm of collecting, storing and processing user data on a backend server beyond the user’s control. In this paper we introduce, as far as we are aware, the first federated implementation of a collaborative filter. The federated updates to the model are based on a stochastic gradient approach. As a classical case study in machine learning, we explore a personalized recommendation system based on users’ implicit feedback and demonstrate the method’s applicability to both the MovieLens and an in-house dataset. Empirical validation confirms that a collaborative filter can be federated without loss of accuracy compared to a standard implementation, enhancing the user’s privacy in a widely used recommender application while maintaining recommender performance.
http://arxiv.org/abs/1901.09888
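A hedged sketch of the core update pattern in such a federated collaborative filter (function names and the exact aggregation rule are illustrative assumptions, not the authors' protocol): each client keeps its user factors on the device, and only item-factor gradients travel to the server:

import numpy as np

def client_update(user_vec, item_mat, interactions, lr=0.05, reg=0.01):
    """One client's step of implicit-feedback matrix factorization.
    interactions: dict {item_id: confidence}. The user vector is updated
    locally and never leaves the device; only item gradients are returned."""
    item_grads = {}
    for i, c in interactions.items():
        err = c - user_vec @ item_mat[i]                   # prediction error
        item_grads[i] = -err * user_vec + reg * item_mat[i]
        user_vec += lr * (err * item_mat[i] - reg * user_vec)
    return item_grads                                      # sent to the server

def server_aggregate(item_mat, all_client_grads, lr=0.05):
    """Average the received item gradients and update the master model,
    which is then redistributed to the clients."""
    for grads in all_client_grads:
        for i, g in grads.items():
            item_mat[i] -= lr * g / len(all_client_grads)
    return item_mat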
We describe techniques for training high-quality image denoising models that require only single instances of corrupted images as training data. Inspired by a recent technique that removes the need for supervision through image pairs by employing networks with a “blind spot” in the receptive field, we address two of its shortcomings: inefficient training and somewhat disappointing final denoising performance. This is achieved through a novel blind-spot convolutional network architecture that allows efficient self-supervised training, as well as application of Bayesian distribution prediction on output colors. Together, they bring the self-supervised model on par with fully supervised deep learning techniques in terms of both quality and training speed in the case of i.i.d. Gaussian noise.
http://arxiv.org/abs/1901.10277
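The blind-spot idea referred to above can be illustrated with a centrally masked convolution, in which the output at each pixel never sees that pixel's own noisy input value. This is a simplified stand-in: the paper obtains its blind spot differently, via an architecture built from shifted receptive fields, which is what makes its training efficient.

import torch
import torch.nn as nn

class CenterMaskedConv2d(nn.Conv2d):
    """Illustrative blind-spot building block: the kernel's center weight
    is zeroed, so each output pixel is predicted only from its neighbors
    and the network cannot learn the identity mapping on noisy inputs."""
    def forward(self, x):
        mask = torch.ones_like(self.weight)
        mask[:, :, self.kernel_size[0] // 2, self.kernel_size[1] // 2] = 0
        return self._conv_forward(x, self.weight * mask, self.bias)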
Near Earth Asteroids (NEAs) are discovered daily, mainly by a few major surveys; nevertheless, many of them remain unobserved for years, even decades. Even so, there is room for new discoveries, including those submitted by smaller projects and amateur astronomers. Besides the well-known surveys that have their own automated systems for asteroid detection, there are only a few software solutions designed to help amateurs and mini-surveys discover NEAs. Some of these obtain their results with the blink method, in which a set of reduced images is shown one after another and the astronomer has to visually detect real moving objects in the series. This technique becomes harder as the size of CCD cameras increases. Aiming to replace manual detection, we propose an automated pipeline prototype for asteroid detection, written in Python under Linux, which calls several third-party astrophysics libraries.
http://arxiv.org/abs/1901.10469
In this paper we study the problem of steering a team of Unmanned Aerial Vehicles (UAVs) toward a static configuration that maximizes the visibility of a 3D environment. The UAVs are assumed to be equipped with visual sensors constrained by a maximum sensing range, and the prior knowledge of the environment is considered to be very sparse. To solve this problem on-line, derivative-free measurement-based optimization algorithms can be adopted, even though they are strongly limited by local optimality. To overcome this limitation, we propose to exploit the partial initial knowledge of the environment to find suitable initial configurations from which the agents start the local optimization. In particular, a constrained centroidal Voronoi tessellation on a coarse approximation of the surface to cover is proposed. The agents' behavior is thus based on a two-step optimization approach, in which a stochastic optimization algorithm based on on-line acquired information follows the geometry-based initialization. The algorithm's performance is evaluated in simulation, and in particular the improvement that the Voronoi tessellation brings to the solution, compared with other initializations, is analyzed.
http://arxiv.org/abs/1901.10272
While the major white matter tracts are of great interest to numerous studies in neuroscience and medicine, their manual dissection in larger cohorts from diffusion MRI tractograms is time-consuming, requires expert knowledge and is hard to reproduce. In previous work we presented tract orientation mapping (TOM) as a novel concept for bundle-specific tractography. It is based on a learned mapping from the original fiber orientation distribution function (fODF) peaks to tract-specific peaks, called tract orientation maps. Each tract orientation map represents the voxel-wise principal orientation of one tract. Here, we present an extension of this approach that combines TOM with accurate segmentations of the tract outline and its start and end regions. We also introduce a custom probabilistic tracking algorithm that samples from a Gaussian distribution with fixed standard deviation centered on each peak, thus enabling more complete tracking on the tract orientation maps than deterministic tracking. These extensions enable the automatic creation of bundle-specific tractograms with previously unseen accuracy. We show for 72 different bundles on high-quality, low-quality and phantom data that our approach runs faster and produces more accurate bundle-specific tractograms than seven state-of-the-art benchmark methods while avoiding cumbersome processing steps like whole-brain tractography, non-linear registration, clustering or manual dissection. Moreover, we show on 17 datasets that our approach generalizes well to datasets acquired with different scanners and settings, as well as with pathologies. The code of our method is openly available at www.github.com/MIC-DKFZ/TractSeg.
http://arxiv.org/abs/1901.10271
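The probabilistic tracking step described above is simple to state: at each position, the next streamline direction is drawn from a Gaussian centered on the local peak of the tract orientation map. A minimal numpy sketch (the standard deviation value here is an arbitrary placeholder, not the paper's setting):

import numpy as np

def sample_direction(peak, sigma=0.15, rng=np.random.default_rng()):
    """Draw a propagation direction from an isotropic Gaussian centered
    on the voxel's TOM peak and renormalize to unit length. Sampling,
    rather than always following the peak, lets tracking fill out the
    full spatial extent of the bundle."""
    d = peak / np.linalg.norm(peak)           # unit peak direction
    d = d + rng.normal(0.0, sigma, size=3)    # perturb with fixed std-dev
    return d / np.linalg.norm(d)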
Case studies, such as Kay et al., 2015, have shown that in image summarization, such as with Google Image Search, the people shown in the results for occupations are more imbalanced with respect to sensitive attributes such as gender and ethnicity than the ground truth. Most existing approaches to correcting this problem in image summarization assume that the images are labelled and use the labels for training the model and correcting for biases. However, these labels may not always be present. Furthermore, it is often not possible (nor even desirable) to automatically classify images by sensitive attributes such as gender or race. Moreover, balancing according to the labels does not guarantee that the diversity will be visibly apparent, arguably the only criterion that matters when selecting diverse images. We develop a novel approach that takes as input a visibly diverse control set of images and uses this set to produce, in response to a query, a set of images that is similarly visibly diverse. We implement this approach using pre-trained and modified Convolutional Neural Networks like VGG-16, and evaluate it empirically on the image dataset compiled and used by Kay et al., 2015. We compare our results with the Google Image Search results from Kay et al., 2015 and with natural baselines, and observe that our algorithm produces images that are accurate with respect to their similarity to the query images (on par with the Google Image Search results) but significantly outperforms the alternatives with respect to visible diversity, as measured by similarity to our diverse control set.
http://arxiv.org/abs/1901.10265
Taxonomies are important building blocks of structured knowledge bases, and their construction from text sources and Wikipedia has received much attention. In this paper we focus on the construction of taxonomies for fictional domains, using noisy category systems from fan wikis or text extraction as input. Such fictional domains are archetypes of entity universes that are poorly covered by Wikipedia, as are enterprise-specific knowledge bases or highly specialized verticals. Our fiction-targeted approach, called TiFi, consists of three phases: (i) category cleaning, by identifying candidate categories that truly represent classes in the domain of interest, (ii) edge cleaning, by selecting subcategory relationships that correspond to class subsumption, and (iii) top-level construction, by mapping classes onto a subset of high-level WordNet categories. A comprehensive evaluation shows that TiFi is able to construct taxonomies for a diverse range of fictional domains such as Lord of the Rings, The Simpsons or Greek Mythology with very high precision, and that it outperforms state-of-the-art baselines for taxonomy induction by a substantial margin.
http://arxiv.org/abs/1901.10263
Multi-model fitting has been extensively studied from the random sampling and clustering perspectives. Most existing works assume that only a single type/class of model is present, and their generalizations to fitting multiple types of models/structures simultaneously are non-trivial. The inherent challenges include the choice of types and numbers of models, sampling imbalance and parameter tuning, all of which render conventional approaches ineffective. In this work, we formulate the multi-model multi-type fitting problem as one of learning a deep feature embedding that is clustering-friendly. In other words, points of the same clusters are embedded closer together by the network. For inference, we apply K-means to cluster the data in the embedded feature space, and model selection is enabled by analyzing the K-means residuals. Experiments are carried out on both synthetic and real-world multi-type fitting datasets, producing state-of-the-art results. Comparisons are also made on single-type multi-model fitting tasks, with promising results as well.
http://arxiv.org/abs/1901.10254
We present a novel method to explicitly incorporate topological prior knowledge into deep learning based segmentation, which is, to our knowledge, the first work to do so. Our method uses the concept of persistent homology, a tool from topological data analysis, to capture high-level topological characteristics of segmentation results in a way which is differentiable with respect to the pixelwise probability of being assigned to a given class. The topological prior knowledge consists of the sequence of desired Betti numbers of the segmentation. As a proof-of-concept we demonstrate our approach by applying it to the problem of left-ventricle segmentation of cardiac MR images of 500 subjects from the UK Biobank dataset, where we show that it improves segmentation performance in terms of topological correctness without sacrificing pixelwise accuracy.
http://arxiv.org/abs/1901.10244
Style transfer is a technique for combining two images based on the activations and feature statistics in a deep learning neural network architecture. This paper studies the analogous task in the audio domain and takes a critical look at the problems that arise when adapting the original vision-based framework to handle spectrogram representations. We conclude that CNN architectures with features based on 2D representations and convolutions are better suited for visual images than for time-frequency representations of audio. Despite the awkward fit, experiments show that the Gram-matrix-determined “style” for audio is more closely aligned with timbral signatures without temporal structure, whereas the network layer activity determining audio “content” seems to capture more of the pitch and rhythmic structure. We shed light on several reasons for the domain differences with illustrative examples. We motivate the use of several types of one-dimensional CNNs that generate results better aligned with intuitive notions of audio texture than those based on existing architectures built for images. These ideas also prompt an exploration of audio texture synthesis with architectural variants for extensions to infinite textures, multi-textures, parametric control of receptive fields and the constant-Q transform as an alternative frequency scaling for the spectrogram.
http://arxiv.org/abs/1901.10240
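The Gram-matrix "style" statistic discussed above is the channel-by-channel correlation of a layer's activations; because the time axis is summed out, it retains timbral signatures but no temporal structure, matching the observation in the abstract. For a spectrogram-shaped feature map (the normalization below is one common convention):

import numpy as np

def gram_matrix(features):
    """features: (channels, time) activation map from one network layer.
    Returns the (channels, channels) Gram matrix of channel correlations;
    all temporal ordering is discarded by the sum over time."""
    c, t = features.shape
    return features @ features.T / (c * t)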
Bone age assessment provides evidence for analyzing children's growth status and the relationship between chronological and biological age. All previous works consider left-hand X-ray images of a child. In this paper, we carry out a study on estimating human age using whole-body bone CT images and a novel convolutional neural network. Our model, with additional connections, shows an effective way to generate a large number of vital features while reducing the influence of overfitting on the small training data typical of medical image analysis research. A dataset and a comparison with common deep architectures are provided for future research in this field.
http://arxiv.org/abs/1901.10237
We propose a novel deep learning architecture for three-dimensional porous media structure reconstruction from two-dimensional slices. The high-level idea is to fit a distribution over all possible three-dimensional structures of a specific type, based on the given dataset of samples. Then, given partial information (central slices), we recover the three-dimensional structure built around those slices. Technically, it is implemented as a deep neural network with encoder, generator and discriminator modules. Numerical experiments show that this method gives a good reconstruction in terms of Minkowski functionals.
http://arxiv.org/abs/1901.10233
There is a need for automatic diagnosis of certain diseases from medical images that could help medical practitioners in further assessment towards treating the illness. Alzheimer's disease (hereafter referred to as AD) is a good example of a disease that is often misdiagnosed. AD is caused by atrophy of certain brain regions and by brain cell death, and is the leading cause of dementia and memory loss [1]. MRI scans reveal this information, but the atrophied regions differ between individuals, which makes the diagnosis trickier and often leads to misdiagnosis [1, 13]. We believe that our approach to this particular problem would improve the assessment quality by pre-flagging the images that are more likely to show AD. We propose two solutions: one with transfer learning [9] and the other with BellCNN [14], a custom-made Convolutional Neural Network (hereafter referred to as CNN). Advantages and disadvantages of each approach are discussed in their respective sections. The dataset used for this project is provided by the Open Access Series of Imaging Studies (hereafter referred to as OASIS) [2, 3, 4], which contains over 400 subjects, 100 of whom have mild to severe dementia. The dataset labels these subjects by two standards of diagnosis: the Mini-Mental State Examination (hereafter referred to as MMSE) and the Clinical Dementia Rating (hereafter referred to as CDR). General tools and concepts that are prerequisites to our solution include CNNs [5, 6], Neural Networks [10] (hereafter referred to as NNs), the Anaconda bundle for Python, regression, and TensorFlow [7]. Keywords: Alzheimer's Disease, Convolutional Neural Network, BellCNN, Image Recognition, Machine Learning, MRI, OASIS, TensorFlow
http://arxiv.org/abs/1901.10231
The advancement of biomedical named entity recognition (BNER) and biomedical relation extraction (BRE) research promotes the development of text mining in biological domains. As a cornerstone of BRE, a robust BNER system is required to identify the mentioned NEs in plain text for the subsequent relation extraction stage. However, the current BNER corpora, which play important roles in these tasks, pay less attention to the criteria required for the BRE task. In this study, we present the Revised JNLPBA corpus, a revision of the JNLPBA corpus, to broaden the applicability of an NER corpus from the BNER task to the BRE task. We preserve the original entity types, including protein, DNA, RNA, cell line and cell type, while all the abstracts in the JNLPBA corpus are manually re-curated by domain experts based on a new annotation guideline focusing on specific NEs instead of general terms. At the same time, several imperfections in JNLPBA are pointed out and corrected in the new corpus. To compare the adaptability of different NER systems on the Revised JNLPBA and JNLPBA corpora, the F1-measure was evaluated for three open-source NER systems: BANNER, Gimli and NERSuite. Under the same circumstances, all the systems perform on average 10% better on Revised JNLPBA than on JNLPBA. Moreover, a cross-validation test is carried out in which we train the NER systems on the JNLPBA/Revised JNLPBA corpora and assess the performance on both protein-protein interaction extraction (PPIE) and biomedical event extraction (BEE) corpora, confirming that the newly refined Revised JNLPBA is a competent NER corpus for biomedical relation applications. The Revised JNLPBA corpus is freely available at iasl-btm.iis.sinica.edu.tw/BNER/Content/Revised_JNLPBA.zip.
http://arxiv.org/abs/1901.10219
We propose a new layer in Convolutional Neural Networks (CNNs) to increase their robustness to several types of noise perturbations of the input images. We call this a push-pull layer and compute its response as the combination of two half-wave rectified convolutions, with kernels of opposite polarity. It is based on a biologically-motivated non-linear model of certain neurons in the visual system that exhibit a response suppression phenomenon, known as push-pull inhibition. We validate our method by substituting the first convolutional layer of the LeNet-5 and WideResNet architectures with our push-pull layer. We train the networks on nonperturbed training images from the MNIST, CIFAR-10 and CIFAR-100 data sets, and test on images perturbed by noise that is unseen by the training process. We demonstrate that our push-pull layers contribute to a considerable improvement in robustness of classification of images perturbed by noise, while maintaining state-of-the-art performance on the original image classification task.
http://arxiv.org/abs/1901.10208
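A hedged sketch of the push-pull response described above: two half-wave rectified convolutions with kernels of opposite polarity, with the pull response subtracted from the push response. This is simplified relative to the published layer (which, for instance, also treats the pull kernel at a different scale); alpha weights the inhibition:

import torch.nn as nn
import torch.nn.functional as F

class PushPull2d(nn.Module):
    """Simplified push-pull layer sketch:
    response = relu(conv(x, k)) - alpha * relu(conv(x, -k))."""
    def __init__(self, in_ch, out_ch, k=5, alpha=1.0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.alpha = alpha

    def forward(self, x):
        push = F.relu(self.conv(x))                    # excitatory response
        pull = F.relu(F.conv2d(x, -self.conv.weight,   # same kernel, opposite polarity
                               padding=self.conv.padding[0]))
        return push - self.alpha * pull                # push-pull inhibition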
We propose a task to generate a complex sentence from a simple sentence in order to amplify various kinds of responses in the database. We first divide a complex sentence into a main clause and a subordinate clause to learn a generator model of modifiers, and then use the model to generate a modifier clause to create a complex sentence from a simple sentence. We present an automatic evaluation metric to estimate the quality of the models and show that a pipeline model outperforms an end-to-end model.
http://arxiv.org/abs/1901.10196
In this paper we present a dependency treebank of travel domain sentences in Modern Standard Arabic. The text comes from a translation of the English equivalent sentences in the Basic Traveling Expressions Corpus. The treebank dependency representation is in the style of the Columbia Arabic Treebank. The paper motivates the effort and discusses the construction process and guidelines. We also present parsing results and discuss the effect of domain and genre difference on parsing.
http://arxiv.org/abs/1901.10188
This paper presents an iterative learning control scheme to improve the position tracking performance for a soft robotic arm during aggressive maneuvers. Two antagonistically arranged, inflatable bellows actuate the robotic arm and provide high compliance while enabling fast actuation. The pressure dynamics of the actuator are derived from first principles and a model of the arm dynamics is determined from system identification. A norm optimal iterative learning control scheme is presented and applied in parallel with a feedback controller. The learning scheme provides monotonic convergence guarantees for the tracking error and is experimentally evaluated on an aggressive trajectory involving set point shifts of 60 degrees within 0.2 seconds. The effectiveness of the learning approach is demonstrated by a reduction of the root-mean-square tracking error from 14 degrees to less than 2 degrees after applying the learning scheme for less than 20 iterations. Finally, a method to reduce the sensitivity of the learning approach to non-repetitive disturbances is presented.
http://arxiv.org/abs/1901.10187
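For reference, the generic norm-optimal ILC update (the paper's specific weighting matrices and plant model are not reproduced here) solves, at each iteration $j$,
$$ u_{j+1} = \arg\min_{u} \; \| e_{j+1} \|_{W_e}^2 + \| u - u_j \|_{W_{\Delta u}}^2 , $$
which, for a lifted linear plant model with $e_{j+1} = e_j - G (u_{j+1} - u_j)$, has the closed-form learning update
$$ u_{j+1} = u_j + \left( G^\top W_e G + W_{\Delta u} \right)^{-1} G^\top W_e \, e_j . $$
The input-change penalty $W_{\Delta u}$ is one ingredient behind the monotonic convergence guarantees mentioned above.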
Geometrical and appearance quality requirements set the limits of current industrial performance in injection molding. To guarantee the product’s quality, it is necessary to adjust the process settings in a closed loop. Those adjustments cannot rely on the final quality because a part takes days to become geometrically stable. Thus, the final part geometry must be predicted from measurements on hot parts. In this paper, we build on the recent success of Generative Adversarial Networks (GANs) with the pix2pix network architecture to predict the final part geometry, using only thermographic images of hot parts measured right after production. Our dataset is very small, and the GAN learns to translate thermography to geometry. We first study prediction performance using different image similarity comparison algorithms. Moreover, we introduce the innovative use of the Discrete Modal Decomposition (DMD) to analyze network predictions. The DMD is a geometrical parameterization technique that uses a modal space projection to geometrically describe surfaces. We study the GAN’s performance in retrieving the geometrical parameterization of surfaces.
http://arxiv.org/abs/1901.10178
Person re-identification (Re-ID) aims to match identities across non-overlapping camera views. Researchers have proposed many supervised Re-ID models, which require quantities of cross-view pairwise labelled data. This limits their scalability to the many applications where a large amount of data from multiple disjoint camera views is available but unlabelled. Although some unsupervised Re-ID models have been proposed to address the scalability problem, they often suffer from the view-specific bias problem, which is caused by dramatic variances across different camera views, e.g., different illumination, viewpoints and occlusion. These dramatic variances induce specific feature distortions in different camera views, which can be very disturbing when mining cross-view discriminative information for Re-ID in unsupervised scenarios, since no label information is available to help alleviate the bias. We propose to explicitly address this problem by learning an unsupervised asymmetric distance metric based on cross-view clustering. The asymmetric distance metric allows specific feature transformations for each camera view to tackle the specific feature distortions. We then design a novel unsupervised loss function to embed the asymmetric metric into a deep neural network, and thereby develop a novel unsupervised deep framework named DEep Clustering-based Asymmetric MEtric Learning (DECAMEL). In this way, DECAMEL jointly learns the feature representation and the unsupervised asymmetric metric. DECAMEL learns a compact cross-view cluster structure of Re-ID data, which helps alleviate the view-specific bias and facilitates mining the potential cross-view discriminative information for unsupervised Re-ID. Extensive experiments on seven benchmark datasets whose sizes span several orders of magnitude show the effectiveness of our framework.
http://arxiv.org/abs/1901.10177
In this paper, we present a two-stream multi-task network for fashion recognition. This task is challenging because fashion clothing always contains multiple attributes, which need to be predicted simultaneously for real-time industrial systems. To handle these challenges, we formulate fashion recognition as a multi-task learning problem, including landmark detection, category and attribute classification, and solve it with the proposed deep convolutional neural network. We design two knowledge sharing strategies that enable information transfer between tasks and improve the overall performance. The proposed model achieves state-of-the-art results on a large-scale fashion dataset compared with existing methods, which demonstrates its effectiveness and superiority for fashion recognition.
http://arxiv.org/abs/1901.10172
Nuclei segmentation is both an important and in some ways ideal task for modern computer vision methods, e.g. convolutional neural networks. While recent developments in theory and open-source software have made these tools easier to implement, expert knowledge is still required to choose the right model architecture and training setup. We compare two popular segmentation frameworks, U-Net and Mask-RCNN in the nuclei segmentation task and find that they have different strengths and failures. To get the best of both worlds, we develop an ensemble model to combine their predictions that can outperform both models by a significant margin and should be considered when aiming for best nuclei segmentation performance.
http://arxiv.org/abs/1901.10170
In this paper, we propose a data-driven visual rhythm prediction method, which overcomes the deficiency of previous works in which predictions are made primarily by human-crafted hard rules. In our approach, we first extract features including the original frames and their residuals, optical flow, scene changes, and body pose. These visual features are then fed into an end-to-end neural network as inputs. Here we observe slight misalignments between features along the timeline and assume that this is due to differences in how the features are computed. To solve this problem, the extracted features are aligned by an elaborately designed layer, which can also be applied to other models suffering from mismatched features, boosting performance. These aligned features are then fed into sequence labeling layers implemented with a BiLSTM and a CRF to predict the onsets. Due to the lack of an existing public training and evaluation set, we experiment on a dataset we constructed from professionally edited Music Videos (MVs); the F1 score of our approach reaches 79.6.
http://arxiv.org/abs/1901.10163
We present a new loss function for the validation of image landmarks detected via Convolutional Neural Networks (CNNs). The network learns to estimate how accurate its landmark estimation is. This loss function is applicable to all regression-based location estimation and allows the exclusion of unreliable landmarks from further processing. In addition, we formulate a novel batch balancing approach that weights the importance of samples based on their produced loss. This is done by mapping sample losses to a probability distribution over an interval, from which samples are selected by uniform random selection. We conducted several experiments on the 300W facial landmark data. In the first experiment, the influence of our batch balancing approach is evaluated by comparing it against uniform sampling. Afterwards, we compare two networks with the state of the art and demonstrate the usage and practical importance of our landmark validation signal. The effectiveness of our validation signal is further confirmed by a correlation analysis over all landmarks. Finally, we show a study on head pose estimation of truck drivers on German highways and compare our network to a commercial multi-camera system.
http://arxiv.org/abs/1901.10143
For dynamic manipulation of flexible objects, we propose a method for acquiring a motion equation model of a flexible object using a deep neural network, together with a control method that realizes a target state by calculating an optimized time-series joint torque command. With the proposed method, no physics model of the target object is needed, and the object can be controlled as intended. We applied this method to manipulations of a rigid object, a flexible object with and without environmental contact, and a cloth, and verified its effectiveness.
http://arxiv.org/abs/1901.10142
It is an easy task for humans to learn and generalize a problem, perhaps due to their ability to visualize and imagine unseen objects and concepts. The power of imagination comes in handy especially when interpolating learnt experience (such as seen examples) over new classes of a problem. For a machine learning system, acquiring such powers of imagination is still a hard task. We present a novel approach to low-shot learning that uses the idea of imagination over unseen classes in a classification problem setting. We combine a classifier with a `visionary’ (i.e., a GAN model) that teaches the classifier to generalize over new and unseen classes. This approach can be incorporated into a variety of problem settings where we need a classifier to learn and generalize to new and unseen classes. We compare the performance of classifiers with and without the visionary GAN model helping them.
https://arxiv.org/abs/1901.10139
Depth estimation is a traditional computer vision task, which plays a crucial role in understanding 3D scene geometry. Recently, methods based on deep convolutional neural networks have achieved promising results in monocular depth estimation. Specifically, frameworks that combine the multi-scale features extracted by dilated-convolution-based blocks (atrous spatial pyramid pooling, ASPP) have gained significant improvement in dense labeling tasks. However, the discretized and predefined dilation rates cannot capture the continuous context information that differs across scenes, and they easily introduce grid artifacts in depth estimation. In this paper, we propose an attention-based context aggregation network (ACAN) to tackle these difficulties. Based on the self-attention model, ACAN adaptively learns the task-specific similarities between pixels to model the context information. First, we recast monocular depth estimation as a dense multi-class classification problem. Then we propose a soft ordinal inference that transforms the predicted probabilities into continuous depth values, which reduces the discretization error (about a 1% decrease in RMSE). Second, the proposed ACAN aggregates both image-level and pixel-level context information for depth estimation, where the former expresses the statistical characteristics of the whole image and the latter extracts the long-range spatial dependencies for each pixel. Third, to further reduce the inconsistency between the RGB image and the depth map, we construct an attention loss to minimize their information entropy. We evaluate on public monocular depth-estimation benchmark datasets (including NYU Depth V2 and KITTI). The experiments demonstrate the superiority of our proposed ACAN, which achieves results competitive with the state of the art.
http://arxiv.org/abs/1901.10137
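The soft ordinal inference mentioned above replaces hard arg-max decoding of the per-pixel class probabilities with an expectation over the discretized depth bins; a minimal sketch (the paper's exact decoding formula may differ, e.g. in how bin centers are spaced):

import numpy as np

def soft_ordinal_depth(probs, bin_centers):
    """probs: (H, W, K) per-pixel softmax over K discretized depth bins.
    bin_centers: (K,) representative depth value of each bin. Taking the
    probability-weighted average instead of the arg-max bin yields a
    continuous depth map and reduces discretization error."""
    return probs @ bin_centers    # (H, W) expected depth per pixel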
Segmenting an unordered text document into different sections is a very useful task in many text processing applications like multiple document summarization, question answering, etc. This paper proposes structuring of an unordered text document based on the keywords in the document. We test our approach on Wikipedia documents using both statistical and predictive methods such as the TextRank algorithm and Google’s USE (Universal Sentence Encoder). From our experimental results, we show that the proposed model can effectively structure an unordered document into sections.
http://arxiv.org/abs/1901.10133
Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows. With the progress in machine learning, extracting valuable information from biomedical literature has gained popularity among researchers, and deep learning has boosted the development of effective biomedical text mining models. However, as deep learning models require a large amount of training data, applying deep learning to biomedical text mining is often unsuccessful due to the lack of training data in biomedical fields. Recent research on training contextualized language representation models on text corpora sheds light on the possibility of leveraging a large number of unannotated biomedical text corpora. We introduce BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), a domain-specific language representation model pre-trained on large-scale biomedical corpora. Based on the BERT architecture, BioBERT effectively transfers knowledge from a large amount of biomedical text to biomedical text mining models with minimal task-specific architecture modifications. While BERT shows performance competitive with previous state-of-the-art models, BioBERT significantly outperforms them on the following three representative biomedical text mining tasks: biomedical named entity recognition (1.86% absolute improvement), biomedical relation extraction (3.33% absolute improvement), and biomedical question answering (9.61% absolute improvement). We make the pre-trained weights of BioBERT freely available at this https URL, and the source code for fine-tuning BioBERT available at this https URL.
https://arxiv.org/abs/1901.08746
It is intuitive that NLP tasks for logographic languages like Chinese should benefit from the use of the glyph information in those languages. However, due to the lack of rich pictographic evidence in glyphs and the weak generalization ability of standard computer vision models on character data, an effective way to utilize the glyph information remains to be found. In this paper, we address this gap by presenting Glyce, glyph-vectors for Chinese character representations. We make three major innovations: (1) we use historical Chinese scripts (e.g., bronzeware script, seal script, traditional Chinese, etc.) to enrich the pictographic evidence in characters; (2) we design CNN structures tailored to Chinese character image processing; and (3) we use image classification as an auxiliary task in a multi-task learning setup to increase the model’s ability to generalize. For the first time, we show that glyph-based models are able to consistently outperform word/char ID-based models in a wide range of Chinese NLP tasks. Using Glyce, we achieve state-of-the-art performance on 13 (almost all) Chinese NLP tasks, including (1) character-level language modeling, (2) word-level language modeling, (3) Chinese word segmentation, (4) named entity recognition, (5) part-of-speech tagging, (6) dependency parsing, (7) semantic role labeling, (8) sentence semantic similarity, (9) sentence intention identification, (10) Chinese-English machine translation, (11) sentiment analysis, (12) document classification and (13) discourse parsing.
http://arxiv.org/abs/1901.10125
Citizen engagement and technology usage are two emerging trends driven by smart city initiatives. Governments around the world are adopting technology for faster resolution of civic issues. Typically, citizens report issues, such as broken roads, garbage dumps, etc., through web portals and mobile apps, in order for the government authorities to take appropriate actions. Several media – text, image, audio, video – are used to report these issues. Through a user study with 13 citizens and 3 authorities, we found that images are the most preferred medium for reporting civic issues. However, analyzing civic-issue-related images is challenging for the authorities as it requires manual effort. Moreover, previous works have been limited to identifying a specific set of issues from images. In this work, given an image, we propose to generate a Civic Issue Graph consisting of a set of objects and the semantic relations between them, which are representative of the underlying civic issue. We also release two multi-modal (text and image) datasets that can help in further analysis of civic issues from images. We present a novel approach for adversarial training of existing scene graph models that enables the use of scene graphs for new applications in the absence of any labelled training data. We conduct several experiments to analyze the efficacy of our approach, and using human evaluation, we establish the appropriateness of our model at representing different civic issues.
http://arxiv.org/abs/1901.10124
We present a system of equations between Clifford circuits, all derivable in the ZX-calculus, and formalised as rewrite rules in the Quantomatic proof assistant. By combining these rules with some non-trivial simplification procedures defined in the Quantomatic tactic language, we demonstrate the use of Quantomatic as a circuit optimisation tool. We prove that the system always reduces Clifford circuits of one or two qubits to their minimal form, and give numerical results demonstrating its performance on larger Clifford circuits.
http://arxiv.org/abs/1901.10114
Image classification is a challenging problem that aims to identify the category of an object in an image. In recent years, deep Convolutional Neural Networks (CNNs) have been applied to this task, and impressive improvements have been achieved. However, some research has shown that the output of CNNs can be easily altered by adding relatively small perturbations to the input image, such as modifying a few pixels. Recently, Capsule Networks (CapsNets) have been proposed, which can help eliminate this limitation. Experiments on the MNIST dataset revealed that capsules can characterize the features of objects better than CNNs. But it is hard to find a suitable quantitative method to compare the generalization ability of CNNs and CapsNets. In this paper, we propose a new image classification task called Top-2 classification to evaluate the generalization ability of CNNs and CapsNets. The models are trained on single-label image samples, as in the traditional image classification task. But in the test stage, we randomly concatenate two test image samples that contain different labels, and then use the trained models to predict the top-2 labels on the unseen, newly created two-label image samples. This task provides precise quantitative results for comparing the generalization ability of CNNs and CapsNets. Returning to the CapsNet, because it uses a Full Connectivity (FC) mechanism among all capsules, it requires many parameters. To reduce the number of parameters, we introduce a Parameter-Sharing (PS) mechanism between capsules. Experiments on five widely used benchmark image datasets demonstrate that the method significantly reduces the number of parameters without losing the effectiveness of feature extraction. Further, on the Top-2 classification task, the proposed PS CapsNets achieve impressively higher accuracy than traditional CNNs and FC CapsNets by a large margin.
http://arxiv.org/abs/1901.10112
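The Top-2 evaluation protocol described above is straightforward to implement; a hedged sketch (the image layout and the scoring convention are assumptions):

import numpy as np

def top2_accuracy(model_predict, images, labels, trials=1000,
                  rng=np.random.default_rng(0)):
    """Randomly concatenate pairs of differently labeled test images and
    check whether the model's two highest-scoring classes match the two
    true labels. model_predict(img) returns a vector of class scores."""
    correct = 0
    for _ in range(trials):
        while True:
            i, j = rng.choice(len(images), 2, replace=False)
            if labels[i] != labels[j]:          # pairs must differ in label
                break
        pair = np.concatenate([images[i], images[j]], axis=1)  # side by side
        top2 = np.argsort(model_predict(pair))[-2:]
        correct += set(int(k) for k in top2) == {labels[i], labels[j]}
    return correct / trials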
One can substitute each neuron in any neural network with a kernel machine and obtain a counterpart powered by kernel machines. The new network inherits the expressive power and architecture of the original but works in a more intuitive way since each node enjoys the simple interpretation as a hyperplane (in a reproducing kernel Hilbert space). Further, using the kernel multilayer perceptron as an example, we prove that in classification and under certain losses, an optimal representation that minimizes the risk of the network can be characterized for each hidden layer. This result removes the need of backpropagation in learning the model and can be generalized to any feedforward kernel network. Moreover, unlike backpropagation, which turns models into black boxes, the optimal hidden representation enjoys an intuitive geometric interpretation, making the dynamics of learning in a deep kernel network transparent. Empirical results are provided to complement our theory.
http://arxiv.org/abs/1802.03774
Modeling the underlying person structure for person re-identification (re-ID) is difficult due to diverse deformable poses, changeable camera views and imperfect person detectors. How to exploit underlying person structure information without extra annotations to improve the performance of person re-ID remains largely unexplored. To address this problem, we propose a novel Relative Local Distance (RLD) method that integrates a relative local distance constraint into convolutional neural networks (CNNs) in an end-to-end way. To the best of our knowledge, this is the first time a relative local constraint has been proposed to guide global feature representation learning. Specifically, a relative local distance matrix is computed from the feature maps and then regarded as a regularizer that guides the CNN to learn a structure-aware feature representation. With the discovered underlying person structure, the RLD method builds a bridge between the global and local feature representations and thus improves the capacity of the feature representation for person re-ID. Furthermore, RLD also significantly accelerates deep network training compared with conventional methods. The experimental results show the effectiveness of RLD on the CUHK03, Market-1501, and DukeMTMC-reID datasets. Code is available at \url{https://github.com/Wanggcong/RLD_codes}.
http://arxiv.org/abs/1901.10100
A general framework of least squares support vector machines with low-rank kernels, referred to as LR-LSSVM, is introduced in this paper. The special structure of low-rank kernels with a controlled model size brings sparsity as well as computational efficiency to the proposed model. Meanwhile, a two-step optimization algorithm with three different criteria is proposed, and various experiments are carried out using the example of the so-called robust RBF kernel to validate the model. The experimental results show that the performance of the proposed algorithm is comparable or superior to several existing kernel machines.
http://arxiv.org/abs/1901.10098