This paper describes a novel, ergonomics-driven approach to human-robot interaction. With a clear focus on optimising ergonomics, the proposed approach continuously observes a human user's posture and, by invoking appropriate cooperative robot movements, brings the user's posture back to an ergonomic optimum whenever required. Effectively, the new protocol optimises the human-robot relative position and orientation as a function of human ergonomics. An RGB-D camera is used to calculate and monitor human joint angles in real time and to determine the current ergonomic state. A total of six main causes of poor ergonomic states are identified, leading to six universal robot responses that allow the human to return to an optimal ergonomic state. The algorithmic framework identifies these six causes and controls the cooperating robot to always adapt the environment (e.g. change the pose of the workpiece) in the way that is ergonomically most comfortable for the interacting user. Hence, human-robot interaction is continuously re-evaluated to optimise the ergonomic state. The approach is validated through an experimental study based on established ergonomic methods and their adaptation for real-time application. The study confirms improved ergonomics when using the new approach.
http://arxiv.org/abs/1805.06270
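As an illustration of the real-time joint-angle monitoring step, the sketch below computes the angle at a single joint from three 3D keypoints delivered by an RGB-D skeleton tracker. This is a minimal sketch only; the keypoint names, units, and example coordinates are assumptions, and the paper's own ergonomic scoring is not reproduced.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle (degrees) at joint b formed by 3D keypoints a-b-c."""
    v1 = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    v2 = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Hypothetical example: elbow angle from shoulder, elbow and wrist positions
# (metres, camera frame) as returned by an RGB-D skeleton tracker.
shoulder, elbow, wrist = [0.10, 0.40, 1.20], [0.15, 0.15, 1.25], [0.35, 0.10, 1.10]
print(joint_angle(shoulder, elbow, wrist))
```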
In the big data era, the impetus to digitize the vast reservoirs of data trapped in unstructured scanned documents such as invoices, bank documents and courier receipts has gained fresh momentum. The scanning process often introduces artifacts such as background noise, blur due to camera motion, watermarks, coffee stains, or faded text. These artifacts pose many readability challenges to current text recognition algorithms and significantly degrade their performance. Existing learning-based denoising techniques require a dataset comprising noisy documents paired with cleaned versions. In such scenarios, a model can be trained to generate clean documents from noisy versions. However, very often in the real world such a paired dataset is not available, and all we have for training our denoising model are unpaired sets of noisy and clean images. This paper explores the use of GANs to generate denoised versions of noisy documents. In particular, where paired information is available, we formulate the problem as an image-to-image translation task, i.e., translating a document from the noisy domain (background noise, blurred, faded, watermarked) to a target clean document using Generative Adversarial Networks (GANs). However, in the absence of paired images for training, we employ CycleGAN, which learns a mapping between the distributions of noisy and denoised images from unpaired data, to achieve image-to-image translation for cleaning the noisy documents. We compare the performance of CycleGAN for document cleaning using unpaired images with a conditional GAN trained on paired data from the same dataset. Experiments were performed on a public document dataset on which different types of noise were artificially induced; the results demonstrate that CycleGAN learns a more robust mapping from the space of noisy to clean documents.
http://arxiv.org/abs/1901.11382
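The following sketch shows the generator objective that CycleGAN optimises in the unpaired noisy-to-clean setting: two adversarial terms plus a cycle-consistency term. It is a hedged illustration; the network definitions (G, F, D_clean, D_noisy), the least-squares adversarial loss, and the weight lam=10.0 are standard CycleGAN choices assumed here, not details taken from this paper.

```python
import torch
import torch.nn as nn

l1 = nn.L1Loss()
mse = nn.MSELoss()  # least-squares GAN loss, as in the original CycleGAN

def cyclegan_generator_loss(G, F, D_clean, D_noisy, noisy, clean, lam=10.0):
    """G: noisy -> clean, F: clean -> noisy; D_* are the two discriminators."""
    fake_clean, fake_noisy = G(noisy), F(clean)
    pred_c, pred_n = D_clean(fake_clean), D_noisy(fake_noisy)
    # Adversarial terms: make the generated images look real to each discriminator.
    adv = mse(pred_c, torch.ones_like(pred_c)) + mse(pred_n, torch.ones_like(pred_n))
    # Cycle consistency: noisy -> clean -> noisy and clean -> noisy -> clean.
    cyc = l1(F(fake_clean), noisy) + l1(G(fake_noisy), clean)
    return adv + lam * cyc
```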
One of the most common modes of representing engineering schematics is the Piping and Instrumentation Diagram (P&ID), which describes the layout of an engineering process flow along with the interconnected process equipment. Over the years, P&ID diagrams have been manually generated, scanned and stored as image files. These files need to be digitized for inventory management and updating, and for easy reference to different components of the schematics. There are several challenging vision problems associated with digitizing real-world P&ID diagrams. Real-world P&IDs come in several different resolutions and often contain noisy textual information. Extraction of instrumentation information from these diagrams involves accurate detection of symbols that frequently have minute visual differences between them. Identification of pipelines that may converge and diverge at different points in the image is a further cause for concern. For these reasons, to the best of our knowledge, no system has been proposed for end-to-end data extraction from P&ID diagrams. However, with the advent of deep learning and the spectacular successes it has achieved in vision, we hypothesized that it is now possible to re-examine this problem armed with the latest deep learning models. To that end, we present a novel pipeline for information extraction from P&ID sheets via a combination of traditional vision techniques and state-of-the-art deep learning models to identify and isolate pipeline codes, pipelines, inlets and outlets, and to detect symbols. This is followed by association of the detected components with the appropriate pipeline. The extracted pipeline information is used to populate a tree-like data structure that captures the structure of the piping schematics. We evaluated the proposed method on a real-world dataset of P&ID sheets obtained from an oil firm and obtained promising results.
http://arxiv.org/abs/1901.11383
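To make the tree-like data structure concrete, here is a minimal, hypothetical sketch of how detected pipeline components could be organised; the field names and example codes are assumptions for illustration, not the authors' schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PipelineNode:
    """One pipeline segment with its code, attached symbols and branches."""
    code: str
    symbols: List[str] = field(default_factory=list)   # detected instrumentation symbols
    inlets: List[str] = field(default_factory=list)
    outlets: List[str] = field(default_factory=list)
    branches: List["PipelineNode"] = field(default_factory=list)

    def add_branch(self, child: "PipelineNode") -> None:
        self.branches.append(child)

# Hypothetical example: a line that splits into two branches.
root = PipelineNode(code="10-P-1001", inlets=["V-101"], symbols=["PT", "FT"])
root.add_branch(PipelineNode(code="10-P-1002", outlets=["E-201"]))
root.add_branch(PipelineNode(code="10-P-1003", outlets=["T-301"]))
```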
In this paper, we propose a quality enhancement network for Versatile Video Coding (VVC) compressed videos that jointly exploits spatial details and temporal structure (SDTS). The network consists of a temporal structure prediction subnet and a spatial detail enhancement subnet. The former estimates and compensates the temporal motion across frames, while the latter reduces the compression artifacts and enhances the reconstruction quality of the VVC compressed video. Experimental results demonstrate the effectiveness of our SDTS-based approach: it offers over 7.82$\%$ BD-rate saving on the common test video sequences and achieves state-of-the-art performance.
http://arxiv.org/abs/1901.09575
Current state-of-the-art deep learning segmentation methods have not yet made a broad entrance into the clinical setting in spite of high demand for such automatic methods. One important reason is the lack of reliability caused by models that fail unnoticed and often locally produce anatomically implausible results, errors that a medical expert would never make. This paper presents an automatic image segmentation method based on (Bayesian) dilated convolutional networks (DCNN) that generates segmentation masks and spatial uncertainty maps for the input image at hand. The method was trained and evaluated using segmentation of the left ventricle (LV) cavity, right ventricle (RV) endocardium and myocardium (Myo) at end-diastole (ED) and end-systole (ES) in 100 cardiac 2D MR scans from the MICCAI 2017 Challenge (ACDC). Combining segmentations and uncertainty maps and employing a human-in-the-loop setting, we provide evidence that image areas indicated as highly uncertain regarding the obtained segmentation almost entirely cover regions of incorrect segmentations. The fused information can be harnessed to increase segmentation performance. Our results reveal that we can obtain valuable spatial uncertainty maps with low computational effort using DCNNs.
http://arxiv.org/abs/1809.10430
We propose a new method to evaluate GANs, namely EvalGAN. EvalGAN relies on a test set to directly measure reconstruction quality in the original sample space (no auxiliary networks are necessary), and it also computes the (log-)likelihood of the reconstructed samples in the test set. Further, EvalGAN is agnostic to the GAN algorithm and to the dataset. We demonstrate it on three state-of-the-art GANs over the well-known CIFAR-10 and CelebA datasets.
https://arxiv.org/abs/1901.09557
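One common way to measure reconstruction quality directly in sample space is to search the latent space for the code whose generated image best matches a test image; the hedged sketch below does this by gradient descent on z. The optimiser, step count, and squared-error criterion are assumptions; EvalGAN's exact search and likelihood computation may differ.

```python
import torch

def reconstruct(generator, x, latent_dim, steps=500, lr=0.05):
    """Find a latent code whose generated sample is closest to x (sample-space error)."""
    z = torch.randn(x.shape[0], latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((generator(z) - x) ** 2).mean()   # reconstruction error in image space
        loss.backward()
        opt.step()
    return z.detach(), loss.item()
```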
Accurate segmentation of the left ventricle myocardium in cardiac CT angiography (CCTA) is essential for e.g. the assessment of myocardial perfusion. Automatic deep learning methods for segmentation in CCTA might suffer from differences in contrast-agent attenuation between training and test data due to non-standardized contrast administration protocols and varying cardiac output. We propose augmentation of the training data with virtual mono-energetic reconstructions from a spectral CT scanner which show different attenuation levels of the contrast agent. We compare this to an augmentation by linear scaling of all intensity values, and combine both types of augmentation. We train a 3D fully convolutional network (FCN) with 10 conventional CCTA images and corresponding virtual mono-energetic reconstructions acquired on a spectral CT scanner, and evaluate on 40 CCTA scans acquired on a conventional CT scanner. We show that training with data augmentation using virtual mono-energetic images improves upon training with only conventional images (Dice similarity coefficient (DSC) 0.895 $\pm$ 0.039 vs. 0.846 $\pm$ 0.125). In comparison, training with data augmentation using linear scaling improves the DSC to 0.890 $\pm$ 0.039. Moreover, combining the results of both augmentation methods leads to a DSC of 0.901 $\pm$ 0.036, showing that both augmentations lead to different local improvements of the segmentations. Our results indicate that virtual mono-energetic images improve the generalization of an FCN used for myocardium segmentation in CCTA images.
http://arxiv.org/abs/1810.03968
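The linear intensity-scaling augmentation used as a baseline can be sketched in a few lines; the scale range below is an assumption for illustration, not the value used in the paper.

```python
import numpy as np

def augment_intensity(ct_volume, scale_range=(0.7, 1.3), rng=None):
    """Linearly rescale voxel intensities to mimic varying contrast-agent attenuation."""
    rng = rng if rng is not None else np.random.default_rng()
    scale = rng.uniform(*scale_range)   # one random scale factor per augmented copy
    return ct_volume * scale
```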
Missing data recovery is an important yet challenging problem in imaging and data science. Successful models often adopt certain carefully chosen regularization. Recently, the low dimension manifold model (LDMM) was introduced by S. Osher et al. and shown to be effective in image inpainting. They observed that enforcing low dimensionality on the image patch manifold serves as a good image regularizer. In this paper, we observe that the low dimension manifold regularization alone is sometimes not enough, and that smoothness is needed as well. For that, we introduce a new regularization that combines the low dimension manifold regularization with a higher-order curvature regularization, which we call CURE for short. The key step in solving CURE is to solve a biharmonic equation on a manifold. We further introduce a weighted version of CURE, called WeCURE, in a manner similar to the weighted nonlocal Laplacian (WNLL) method. Numerical experiments on image inpainting and semi-supervised learning show that the proposed CURE and WeCURE significantly outperform LDMM and WNLL, respectively.
http://arxiv.org/abs/1901.09548
This paper considers the scenario in which multiple data owners wish to apply a machine learning method over their combined dataset to obtain the best possible learning output, but do not want to share their local datasets owing to privacy concerns. We design systems for the case in which the stochastic gradient descent (SGD) algorithm is used as the machine learning method, because SGD (or its variants) is at the heart of recent deep learning techniques over neural networks. Our systems differ from existing systems in the following features: {\bf (1)} any activation function can be used, meaning that no privacy-preserving-friendly approximation is required; {\bf (2)} gradients computed by SGD are not shared, but the weight parameters are shared instead; and {\bf (3)} robustness against colluding parties holds even in the extreme case where only one honest party exists. We prove that our systems, while privacy-preserving, achieve the same learning accuracy as SGD and hence retain the merit of deep learning with respect to accuracy. Finally, we conduct several experiments using benchmark datasets and show that our systems outperform previous systems in terms of learning accuracy.
http://arxiv.org/abs/1809.03272
It has been shown that image descriptors extracted by convolutional neural networks (CNNs) achieve remarkable results for retrieval problems. In this paper, we apply an attention mechanism to a CNN, which aims at enhancing the relevant features that correspond to important keypoints in the input image. The generated attention-aware features are then aggregated by the previous state-of-the-art generalized-mean (GeM) pooling followed by normalization to produce a compact global descriptor, which can be efficiently compared to other image descriptors by the dot product. An extensive comparison of our proposed approach with state-of-the-art methods is performed on the new challenging ROxford5k and RParis6k retrieval benchmarks. Results indicate significant improvement over previous work. In particular, our attention-aware GeM (AGeM) descriptor outperforms the state-of-the-art method on ROxford5k under the `Hard' evaluation protocol.
http://arxiv.org/abs/1811.00202
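A hedged sketch of attention-weighted generalized-mean (GeM) pooling is shown below: a 1x1-convolution attention head reweights spatial locations before the p-norm pooling and L2 normalisation. The attention-head design and the p=3 initialisation are assumptions; the paper's AGeM block may be defined differently.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGeM(nn.Module):
    """Generalized-mean pooling where an attention map reweights spatial locations."""
    def __init__(self, channels, p=3.0, eps=1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(p))          # learnable pooling exponent
        self.eps = eps
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)  # simple 1x1-conv attention head

    def forward(self, x):                                # x: (B, C, H, W)
        a = torch.softmax(self.attn(x).flatten(2), dim=-1)   # (B, 1, H*W) attention weights
        xp = x.clamp(min=self.eps).pow(self.p).flatten(2)    # (B, C, H*W)
        pooled = (xp * a).sum(dim=-1).pow(1.0 / self.p)      # attention-weighted GeM
        return F.normalize(pooled, dim=-1)                   # L2-normalised global descriptor
```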
Intrinsic limits to temperature-dependent substrate loss for GaN-on-Si technology, due to the change in resistivity of the substrate with temperature, are evaluated using an experimentally validated device simulation framework. The effect of room-temperature substrate resistivity on temperature-dependent CPW line loss in various operating frequency bands is then presented. CPW lines for GaN on high-resistivity Si are shown to have a pronounced temperature dependence above 150°C and lower substrate losses for frequencies above the X-band. On the other hand, GaN on low-resistivity Si is shown to be more temperature-insensitive and to have lower substrate losses than even HR-Si at lower operating frequencies. The effect of various CPW geometries on substrate loss is also presented to generalize the discussion. These results are expected to act as a benchmark for temperature-dependent substrate loss in GaN-on-Si RF technology.
https://arxiv.org/abs/1901.09521
Probabilistic context-free grammars (PCFGs) have been at the core of probabilistic-reasoning-based parsers for several years, especially in the context of NLP. Multi-Entity Bayesian Networks (MEBN), a first-order-logic probabilistic reasoning methodology, are a widely adopted method for uncertainty reasoning. Further, an upper ontology such as the Probabilistic Ontology Web Language (PR-OWL), built using MEBN, handles probabilistic ontologies that model and capture the uncertainties inherent in a domain's semantic information. This paper attempts to establish a link between probabilistic reasoning in PCFG and MEBN by proposing a formal description of PCFG driven by MEBN, leading to the use of PR-OWL-modeled ontologies in PCFG parsers. Furthermore, the paper outlines an approach to resolving prepositional phrase (PP) attachment ambiguity using the proposed mapping between PCFG and MEBN.
http://arxiv.org/abs/1809.07607
Underwater robots are subject to position drift due to the effect of ocean currents and the lack of accurate localisation while submerged. We are interested in exploiting such position drift to estimate the ocean current in the surrounding area, thereby assisting navigation and planning. We present a Gaussian process~(GP)-based expectation-maximisation~(EM) algorithm that estimates the underlying ocean current using sparse GPS data obtained on the surface and dead-reckoned position estimates. We first develop a specialised GP regression scheme that exploits the incompressibility of ocean currents to counteract the underdetermined nature of the problem. We then use the proposed regression scheme in an EM algorithm that estimates the best-fitting ocean current in between each GPS fix. The proposed algorithm is validated in simulation and on a real dataset, and is shown to be capable of reconstructing the underlying ocean current field. We expect to use this algorithm to close the loop between planning and estimation for underwater navigation in unknown ocean currents.
http://arxiv.org/abs/1901.09513
Recent research has revealed that the output of Deep Neural Networks (DNNs) can be easily altered by adding relatively small perturbations to the input vector. In this paper, we analyze an attack in an extremely limited scenario where only one pixel can be modified. For that we propose a novel method for generating one-pixel adversarial perturbations based on differential evolution (DE). It requires less adversarial information (a black-box attack) and can fool more types of networks due to the inherent features of DE. The results show that 68.36% of the natural images in the CIFAR-10 test dataset and 41.22% of the ImageNet (ILSVRC 2012) validation images can be perturbed to at least one target class by modifying just one pixel, with 73.22% and 5.52% confidence on average. Thus, the proposed attack explores a different take on adversarial machine learning in an extremely limited scenario, showing that current DNNs are also vulnerable to such low-dimension attacks. Besides, we also illustrate an important application of DE (or, broadly speaking, evolutionary computation) in the domain of adversarial machine learning: creating tools that can effectively generate low-cost adversarial attacks against neural networks for evaluating robustness. The code is available at: https://github.com/Carina02/One-Pixel-Attack
http://arxiv.org/abs/1710.08864
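The core of the attack can be sketched with an off-the-shelf differential-evolution optimiser searching over a single pixel's coordinates and RGB value. This is a simplified illustration; the population size, iteration budget, and the use of scipy's DE (rather than the authors' own implementation with early stopping) are assumptions.

```python
import numpy as np
from scipy.optimize import differential_evolution

def one_pixel_attack(model_predict, image, target_class):
    """Search for one pixel (x, y, r, g, b) that maximises the target-class probability.
    model_predict: callable mapping an HxWx3 uint8 image to a class-probability vector."""
    h, w, _ = image.shape
    bounds = [(0, h - 1), (0, w - 1), (0, 255), (0, 255), (0, 255)]

    def apply(p):
        x, y, r, g, b = np.round(p).astype(int)
        perturbed = image.copy()
        perturbed[x, y] = (r, g, b)
        return perturbed

    # DE minimises, so return the negative probability of the target class.
    objective = lambda p: -model_predict(apply(p))[target_class]
    result = differential_evolution(objective, bounds, maxiter=75, popsize=40, tol=1e-5)
    return apply(result.x), -result.fun   # adversarial image and achieved confidence
```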
Motion planning for underwater vehicles must consider the effect of ocean currents. We present an efficient method to compute reachability and cost between sample points in sampling-based motion planning that supports long-range planning over hundreds of kilometres in complicated flows. The idea is to search a reduced space of control inputs that consists of stream functions whose level sets, or streamlines, optimally connect two given points. Such stream functions are generated by superimposing a control input onto the underlying current flow. A streamline represents the resulting path that a vehicle would follow as it is carried along by the current given that control input. We provide rigorous analysis that shows how our method avoids exhaustive search of the control space, and demonstrate simulated examples in complicated flows including a traversal along the east coast of Australia, using actual current predictions, between Sydney and Brisbane.
http://arxiv.org/abs/1901.09512
Controlled generation of text is of high practical use. Recent efforts have made impressive progress in generating or editing sentences with given textual attributes (e.g., sentiment). This work studies a new practical setting of text content manipulation. Given a structured record, such as `(PLAYER: Lebron, POINTS: 20, ASSISTS: 10)', and a reference sentence, such as `Kobe easily dropped 30 points', we aim to generate a sentence that accurately describes the full content in the record, with the same writing style (e.g., wording, transitions) as the reference. The problem is unsupervised due to the lack of parallel data in practice, and it is challenging to minimally yet effectively manipulate the text (by rewriting/adding/deleting text portions) to ensure fidelity to the structured content. We derive a dataset from a basketball game report corpus as our testbed, and develop a neural method with unsupervised competing objectives and explicit content coverage constraints. Automatic and human evaluations show the superiority of our approach over competitive methods, including a strong rule-based baseline and prior approaches designed for style transfer.
http://arxiv.org/abs/1901.09501
Conventional solutions to automatic related work summarization rely heavily on human-engineered features. In this paper, we develop a neural data-driven summarizer by leveraging the seq2seq paradigm, in which a joint context-driven attention mechanism is proposed to measure the contextual relevance within full texts and a heterogeneous bibliography graph simultaneously. Our motivation is to maintain the topic coherency between a related work section and its target document, where both the textual and graphic contexts play a big role in characterizing the relationship among scientific publications accurately. Experimental results on a large dataset show that our approach achieves a considerable improvement over a typical seq2seq summarizer and five classical summarization baselines.
http://arxiv.org/abs/1901.09492
Many approaches to semantic image hashing have been formulated as supervised learning problems that utilize images and label information to learn the binary hash codes. However, large-scale labeled image data is expensive to obtain, thus imposing a restriction on the usage of such algorithms. On the other hand, unlabelled image data is abundant due to the existence of many Web image repositories. Such Web images may often come with image tags that contain useful information, although raw tags, in general, do not readily lead to semantic labels. Motivated by this scenario, we formulate the problem of semantic image hashing as a weakly-supervised learning problem. We utilize the information contained in the user-generated tags associated with the images to learn the hash codes. More specifically, we extract the word2vec semantic embeddings of the tags and use the information contained in them to constrain the learning. Accordingly, we name our model Weakly Supervised Deep Hashing using Tag Embeddings (WDHT). WDHT is tested on the task of semantic image retrieval and compared against several state-of-the-art models. Results show that our approach sets a new state of the art in the area of weakly supervised image hashing.
http://arxiv.org/abs/1806.05804
This paper presents a detailed model of a three-link robotic finger actuated by nylon artificial muscles, together with a Simulink model that can be used for numerical study of the finger. The robotic hand prototype was demonstrated in a recent publication: Wu, L., Jung de Andrade, M., Saharan, L., Rome, R., Baughman, R., and Tadesse, Y., 2017, "Compact and Low-cost Humanoid Hand Powered by Nylon Artificial Muscles," Bioinspiration & Biomimetics, 12(2). The robotic hand is a 3D-printed, lightweight and compact hand actuated by silver-coated nylon muscles, often called twisted and coiled polymer (TCP) muscles. TCP muscles are thermal actuators that contract when heated, and they are attracting attention for applications in robotics. The purpose of this paper is to present the modeling equations, derived with the Euler-Lagrange approach, in a form suitable for implementation in a Simulink model.
http://arxiv.org/abs/1901.09486
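For orientation, the standard Euler-Lagrange (manipulator) form that such a derivation produces is shown below; the paper's specific inertia, Coriolis, gravity, and muscle-actuation terms for the three-link finger are not reproduced here.

```latex
% Generic Euler-Lagrange dynamics of an n-link finger:
% q = vector of joint angles, tau = joint torques produced by the TCP muscles.
M(q)\,\ddot{q} + C(q,\dot{q})\,\dot{q} + G(q) = \tau
```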
Colorectal liver metastasis is one of the most aggressive liver malignancies. While the definition of lesion type based on CT images determines the diagnosis and therapeutic strategy, the discrimination between cancerous and non-cancerous lesions is critical and requires highly skilled expertise, experience and time. In the present work we introduce an end-to-end deep learning approach to assist in the discrimination between liver metastases from colorectal cancer and benign cysts in abdominal CT images of the liver. Our approach incorporates the efficient feature extraction of InceptionV3 combined with residual connections and pre-trained weights from ImageNet. The architecture also includes fully connected classification layers to generate a probabilistic output of lesion type. We use an in-house clinical biobank with 230 liver lesions originating from 63 patients. With an accuracy of 0.96 and an F1-score of 0.92, the results obtained with the proposed approach surpass state-of-the-art methods. Our work provides the basis for incorporating machine learning tools into specialized radiology software to assist physicians in the early detection and treatment of liver lesions.
http://arxiv.org/abs/1901.09483
What is the current state of the art for image restoration and enhancement applied to degraded images acquired under less than ideal circumstances? Can such algorithms be applied as a pre-processing step to improve image interpretability for manual analysis or the performance of automatic visual recognition in classifying scene content? While there have been important advances in computational photography to restore or enhance the visual quality of an image, the capabilities of such techniques have not always translated in a useful way to visual recognition tasks. Consequently, there is a pressing need for algorithms designed for the joint problem of improving visual appearance and recognition, which will be an enabling factor for the deployment of visual recognition tools in many real-world scenarios. To address this, we introduce the UG^2 dataset as a large-scale benchmark composed of video imagery captured under challenging conditions, and two enhancement tasks designed to test algorithmic impact on visual quality and automatic object recognition. Furthermore, we propose a set of metrics to evaluate the joint improvement of such tasks as well as individual algorithmic advances, including a novel psychophysics-based evaluation regime for human assessment and a realistic set of quantitative measures for object recognition performance. We introduce six new algorithms for image restoration or enhancement, which were created as part of the IARPA-sponsored UG^2 Challenge workshop held at CVPR 2018. Under the proposed evaluation regime, we present an in-depth analysis of these algorithms and a host of deep learning-based and classic baseline approaches. From the observed results, it is evident that we are in the early days of building a bridge between computational photography and visual recognition, leaving many opportunities for innovation in this area.
http://arxiv.org/abs/1901.09482
We propose a few fundamental techniques to obtain effective watermark features of images in the image search index, and utilize the resulting signals in a commercial search engine to improve image search quality. We collect a diverse and large set (about 1M) of images with human labels indicating whether each image contains a visible watermark. We train several deep convolutional neural networks to extract watermark information from the raw images, and we also analyze the images based on their domains to obtain watermark information from a domain-based watermark classifier. The deep CNN classifiers we trained achieve high accuracy on the watermark dataset. We demonstrate that using these signals in the Bing image search ranker, powered by LambdaMART, can effectively reduce the watermark rate during online image ranking.
http://arxiv.org/abs/1901.09473
Specifying complex task behaviours while ensuring good robot performance may be difficult for untrained users. We study a framework in which users specify rules for acceptable behaviour in a shared environment such as an industrial facility. As non-expert users might have little intuition about how their specification impacts the robot's performance, we design a learning system that interacts with the user to find an optimal solution. Using active preference learning, we iteratively show, on an interface, alternative paths that the robot could take. From the user feedback ranking the alternatives, we learn the weights that users place on each part of their specification. We extend the user model from our previous work to a discrete Bayesian learning model and introduce a greedy algorithm for proposing alternatives that operates on the notion of equivalence regions of user weights. We prove that with this algorithm the active preference learning process converges on the user-optimal path. In simulations on realistic industrial environments, we demonstrate the convergence and robustness of our approach.
http://arxiv.org/abs/1901.09470
Determining the localization of specific proteins in human cells is important for understanding cellular functions and the biological processes underlying diseases. Among imaging techniques, high-throughput fluorescence microscopy imaging is an efficient biotechnology for staining the protein of interest in a cell. In this work, we present a novel classification model, Twin U-Net (TUNet), for processing the Atlas images and classifying the localization of the proteins they contain. Several notable deep learning models, including GoogleNet and ResNet, are employed for comparison. Results show that our system obtains competitive performance.
http://arxiv.org/abs/1901.11379
We deconstruct the performance of GANs into three components:
http://arxiv.org/abs/1901.09465
Reconstructing the 3D model of a physical object typically requires us to align the depth scans obtained from different camera poses into the same coordinate system. Solutions to this global alignment problem usually proceed in two steps. The first step estimates relative transformations between pairs of scans using an off-the-shelf technique. Due to limited information presented between pairs of scans, the resulting relative transformations are generally noisy. The second step then jointly optimizes the relative transformations among all input depth scans. A natural constraint used in this step is the cycle-consistency constraint, which allows us to prune incorrect relative transformations by detecting inconsistent cycles. The performance of such approaches, however, heavily relies on the quality of the input relative transformations. Instead of merely using the relative transformations as the input to perform transformation synchronization, we propose to use a neural network to learn the weights associated with each relative transformation. Our approach alternates between transformation synchronization using weighted relative transformations and predicting new weights of the input relative transformations using a neural network. We demonstrate the usefulness of this approach across a wide range of datasets.
http://arxiv.org/abs/1901.09458
Due to the ability of deep neural nets to learn rich representations, recent advances in unsupervised domain adaptation have focused on learning domain-invariant features that achieve a small error on the source domain. The hope is that the learnt representation, together with the hypothesis learnt from the source domain, can generalize to the target domain. In this paper, we first construct a simple counterexample showing that, contrary to common belief, the above conditions are not sufficient to guarantee successful domain adaptation. In particular, the counterexample (Fig. 1) exhibits \emph{conditional shift}: the class-conditional distributions of input features change between source and target domains. To give a sufficient condition for domain adaptation, we propose a natural and interpretable generalization upper bound that explicitly takes into account the aforementioned shift. Moreover, we shed new light on the problem by proving an information-theoretic lower bound on the joint error of \emph{any} domain adaptation method that attempts to learn invariant representations. Our result characterizes a fundamental tradeoff between learning invariant representations and achieving small joint error on both domains when the marginal label distributions differ from source to target. Finally, we conduct experiments on real-world datasets that corroborate our theoretical findings. We believe these insights are helpful in guiding the future design of domain adaptation and representation learning algorithms.
http://arxiv.org/abs/1901.09453
Biometrics-related research has been accelerated significantly by deep learning technology. However, there are limited open-source resources to help researchers evaluate their deep learning-based biometrics algorithms efficiently, especially for the face recognition tasks. In this work, we design and implement a light-weight, maintainable, scalable, generalizable, and extendable face recognition evaluation toolbox named FaRE that supports both online and offline evaluation to provide feedback to algorithm development and accelerate biometrics-related research. FaRE consists of a set of evaluation metric functions and provides various APIs for commonly-used face recognition datasets including LFW, CFP, UHDB31, and IJB-series datasets, which can be easily extended to include other customized datasets. The package and the pre-trained baseline models will be released for public academic research use after obtaining university approval.
http://arxiv.org/abs/1901.09447
We present our work on Track 2 of the Dialog System Technology Challenges 7 (DSTC7). DSTC7 Track 2 aims to evaluate the response generation of fully data-driven conversation models in knowledge-grounded settings, which provide contextually relevant factual texts. Sequence-to-Sequence models have been widely used for end-to-end generative conversation modelling and have achieved impressive results. However, they tend to output dull and repeated responses in previous studies. Our work aims to promote diversity in end-to-end conversation response generation and follows a two-stage pipeline: 1) Generate multiple responses. At this stage, two different models are proposed, i.e., a variational generative (VariGen) model and a retrieval-based (Retrieval) model. 2) Rank and return the most related response by training a topic coherence discrimination (TCD) model for the ranking process. According to the official evaluation results, our proposed Retrieval and VariGen systems ranked first and second, respectively, on objective diversity metrics, i.e., Entropy, among all participant systems. The VariGen system also ranked second on the NIST and METEOR metrics.
http://arxiv.org/abs/1901.09444
While many image colorization algorithms have recently shown the capability of producing plausible color versions from gray-scale photographs, they still suffer from limited semantic understanding. To address this shortcoming, we propose to exploit pixelated object semantics to guide image colorization. The rationale is that human beings perceive and distinguish colors based on the semantic categories of objects. Starting from an autoregressive model, we generate image color distributions, from which diverse colored results are sampled. We propose two ways to incorporate object semantics into the colorization model: through a pixelated semantic embedding and a pixelated semantic generator. Specifically, the proposed convolutional neural network includes two branches. One branch learns what the object is, while the other branch learns the object colors. The network jointly optimizes a color embedding loss, a semantic segmentation loss and a color generation loss, in an end-to-end fashion. Experiments on PASCAL VOC2012 and COCO-stuff reveal that our network, when trained with semantic segmentation labels, produces more realistic and finer results compared to the colorization state-of-the-art.
http://arxiv.org/abs/1901.10889
Document image binarization is the initial and a crucial step in many document analysis and recognition schemes. In fact, it is still a relevant research subject and a fundamental challenge due to its importance and influence. This paper provides an original multi-phase system that hybridizes various efficient image thresholding methods in order to get the best binarization output. First, to improve contrast in particularly defective images, the application of the CLAHE algorithm is suggested and justified. We then use a cooperative technique to segment the image into two separate classes. Finally, a special transformation is applied to remove scattered noise and to correct character forms. Experiments demonstrate the precision and robustness of our framework, applied to degraded historical document images from three benchmarks, compared with other noted methods.
http://arxiv.org/abs/1901.09425
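A minimal sketch of the first phase, CLAHE contrast enhancement followed by a simple global threshold, is given below using OpenCV. The clip limit, tile size, median filtering, and Otsu thresholding are stand-in assumptions; the paper's cooperative segmentation and final character-correction transformation are more elaborate.

```python
import cv2

def binarize(gray):
    """CLAHE contrast enhancement followed by global Otsu thresholding."""
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(gray)               # boost contrast in degraded regions
    enhanced = cv2.medianBlur(enhanced, 3)     # suppress isolated noise
    _, binary = cv2.threshold(enhanced, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary

# Example: gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE); out = binarize(gray)
```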
Malignant and benign breast tumors present differently in shape and size on sonography. The morphological information provided by tumor contours is important in clinical diagnosis. However, ultrasound images contain noise and tissue texture, so clinical diagnosis depends highly on the experience of physicians. Manually sketching three-dimensional (3D) contours of a breast tumor is a time-consuming and complicated task, whereas automatic contouring that provides a precise breast tumor contour could assist physicians in making an accurate diagnosis. This study presents an efficient method for automatically contouring breast tumors in 3D sonography. The proposed method utilizes an efficient segmentation procedure, the level-set method (LSM), to automatically detect the contours of breast tumors. This study evaluates 20 cases comprising ten benign and ten malignant tumors. The results of computer simulation reveal that the proposed 3D segmentation method provides robust contouring of breast tumors in ultrasound images. This approach consistently obtains contours similar to those obtained by manual contouring and can save much of the time required to sketch precise contours.
http://arxiv.org/abs/1901.09407
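As a rough, hedged stand-in for the level-set step, the sketch below applies scikit-image's morphological Chan-Vese active contour to a single 2D ultrasound slice; the paper operates on 3D sonography, and its LSM formulation, initialisation, and parameters are not reproduced here.

```python
import numpy as np
from skimage.segmentation import morphological_chan_vese

def contour_slice(us_slice, iterations=150):
    """Active-contour (Chan-Vese style level set) segmentation of one ultrasound slice."""
    # Normalise intensities to [0, 1] before evolving the contour.
    img = (us_slice - us_slice.min()) / (us_slice.max() - us_slice.min() + 1e-8)
    mask = morphological_chan_vese(img, iterations,
                                   init_level_set="checkerboard", smoothing=3)
    return mask.astype(np.uint8)
```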
The task of action recognition or action detection involves analyzing videos and determining what action or motion is being performed. The primary subjects of these videos are predominantly humans performing some action. However, this requirement can be relaxed to generalize over other subjects such as animals or robots. Applications range from human-computer interaction to automated video editing proposals. When we consider spatiotemporal action recognition, we deal with action localization. This task involves determining not only what action is being performed but also when and where it is being performed in the video. This paper aims to survey the plethora of approaches and algorithms that attempt to solve this task, give a comprehensive comparison between them, explore the various datasets available for the problem, and determine the most promising approaches.
http://arxiv.org/abs/1901.09403
Monocular depth estimation is often described as an ill-posed and inherently ambiguous problem. Estimating depth from 2D images is a crucial step in scene reconstruction, 3D object recognition, segmentation, and detection. The problem can be framed as: given a single RGB image as input, predict a dense depth map for each pixel. This problem is worsened by the fact that most scenes have large texture and structural variations, object occlusions, and rich geometric detailing. All these factors contribute to the difficulty of accurate depth estimation. In this paper, we review five papers that attempt to solve the depth estimation problem with various techniques, including supervised, weakly-supervised, and unsupervised learning techniques. We then compare these papers and examine the improvements made over one another. Finally, we explore potential improvements that can help to better solve this problem.
http://arxiv.org/abs/1901.09402
Most algorithms that rely on deep learning-based approaches to generate 3D point sets can only produce clouds containing a fixed number of points. Furthermore, they typically require large networks parameterized by many weights, which makes them hard to train. In this paper, we propose an auto-encoder architecture that can both encode and decode clouds of arbitrary size, and demonstrate its effectiveness at upsampling sparse point clouds. Interestingly, we can do so using less than half as many parameters as state-of-the-art architectures while still delivering better performance. We will make our code base fully available.
http://arxiv.org/abs/1901.09394
Imitation learning (IL) aims to learn an optimal policy from demonstrations. However, such demonstrations are often imperfect since collecting optimal ones is costly. To effectively learn from imperfect demonstrations, we propose a novel approach that utilizes confidence scores, which describe the quality of demonstrations. More specifically, we propose two confidence-based IL methods, namely two-step importance weighting IL (2IWIL) and generative adversarial IL with imperfect demonstration and confidence (IC-GAIL). We show that confidence scores given only to a small portion of sub-optimal demonstrations significantly improve the performance of IL both theoretically and empirically.
http://arxiv.org/abs/1901.09387
Multi-choice reading comprehension is a challenging task that requires a complex reasoning procedure. Given a passage and a question, a correct answer needs to be selected from a set of candidate answers. In this paper, we propose the \textbf{D}ual \textbf{C}o-\textbf{M}atching \textbf{N}etwork (\textbf{DCMN}), which models the relationship among passage, question and answer bidirectionally. Different from existing approaches that only calculate a question-aware or option-aware passage representation, we calculate a passage-aware question representation and a passage-aware answer representation at the same time. To demonstrate the effectiveness of our model, we evaluate it on a large-scale multiple-choice machine reading comprehension dataset ({\em i.e.,} RACE). Experimental results show that our proposed model achieves new state-of-the-art results.
http://arxiv.org/abs/1901.09381
This paper shows how Graph Neural Networks can be used for learning distributed coordination mechanisms in connected teams of robots. We capture the relational aspect of robot coordination by modeling the robot team as a graph, where each robot is a node, and edges represent communication links. During training, robots learn how to pass messages and update internal states, so that a target behavior is reached. As a proxy for more complex problems, this short paper considers the problem where each robot must locally estimate the algebraic connectivity of the team’s network topology.
http://arxiv.org/abs/1805.03737
In this paper, we present a simple but powerful method for estimating the 6D pose of objects from a single RGB image. Our system trains a novel convolutional neural network to regress the unit quaternion, which represents the 3D rotation, from the partial image inside the bounding box returned by 2D detection systems. We then propose an algorithm we call the Bounding Box Equation to efficiently and accurately obtain the 3D translation from the 3D rotation and the 2D bounding box. Considering that the sum of squares of the quaternion's four elements equals one, we add a normalization layer to keep the network's output on the unit sphere and put forward a special loss function for unit quaternion regression. We evaluate our method on the LineMod dataset, and experiments show that our approach outperforms the baseline and some state-of-the-art methods.
http://arxiv.org/abs/1901.09366
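A hedged sketch of the two pieces described above, a normalisation layer that keeps the regressed quaternion on the unit sphere and a unit-quaternion regression loss, is given below; the loss shown (one minus the absolute dot product, which respects the q/-q ambiguity) is a common choice and may differ from the paper's special loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuaternionHead(nn.Module):
    """Regression head that keeps its 4-D output on the unit sphere."""
    def __init__(self, in_features):
        super().__init__()
        self.fc = nn.Linear(in_features, 4)

    def forward(self, feat):
        q = self.fc(feat)
        return F.normalize(q, p=2, dim=-1)   # normalization layer: unit quaternion

def quaternion_loss(q_pred, q_gt):
    """Distance between unit quaternions; q and -q encode the same rotation."""
    dot = torch.sum(q_pred * q_gt, dim=-1).abs()
    return (1.0 - dot).mean()
```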
Light field imaging is characterized by capturing the brightness, color, and directional information of light rays in a scene. This leads to image representations with a huge amount of data that require efficient coding schemes. In this paper, lenslet images are rendered into sub-aperture images. These images are organized as a pseudo-sequence input for the HEVC video codec. To better exploit redundancy among the neighboring sub-aperture images and consequently decrease the distances between a sub-aperture image and its references used for prediction, sub-aperture images are divided into four smaller groups that are scanned in a serpentine order. The most central sub-aperture image, which has the highest similarity to all the other images, is used as the initial reference image for each of the four regions. Furthermore, a structure is defined that selects spatially adjacent sub-aperture images as prediction references with the highest similarity to the current image. In this way, encoding efficiency increases, and it also leads to higher similarity among the co-located Coding Tree Units (CTUs). The similarities among the co-located CTUs are exploited to predict Coding Unit depths. Moreover, independent encoding of each group enables parallel processing, which, along with the proposed coding unit depth prediction, decreases the encoding execution time by almost 80% on average. Simulation results show that the rate-distortion performance of the proposed method achieves a higher compression gain than other state-of-the-art lenslet compression methods, with lower computational complexity.
http://arxiv.org/abs/1901.11396
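The serpentine scan itself is easy to make concrete; the sketch below generates such an order for one group of sub-aperture images. The grid size is an assumption, and the paper's division into four groups around a central reference image is not reproduced.

```python
def serpentine_order(rows, cols):
    """Visit a grid of sub-aperture images row by row, reversing direction on alternate rows."""
    order = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in cs)
    return order

# Example: one 7x7 quadrant of a hypothetical 13x13 sub-aperture grid.
print(serpentine_order(7, 7)[:10])
```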
Incremental (online) structure from motion pipelines seek to recover the camera matrix associated with an image $I_n$ given $n-1$ images, $I_1,…,I_{n-1}$, whose camera matrices have already been recovered. In this paper, we introduce a novel solution to the six-point online algorithm to recover the exterior parameters associated with $I_n$. Our algorithm uses just six corresponding pairs of 2D points, extracted each from $I_n$ and from \textit{any} of the preceding $n-1$ images, allowing the recovery of the full six degrees of freedom of the $n$’th camera, and unlike common methods, does not require tracking feature points in three or more images. Our novel solution is based on constructing a Dixon resultant, yielding a solution method that is both efficient and accurate compared to existing solutions. We further use Bernstein’s theorem to prove a tight bound on the number of complex solutions. Our experiments demonstrate the utility of our approach.
http://arxiv.org/abs/1901.09364
In this paper we propose a Bayesian method for estimating architectural parameters of neural networks, namely layer size and network depth. We do this by learning concrete distributions over these parameters. Our results show that regular networks with a learnt structure can generalise better on small datasets, while fully stochastic networks can be more robust to parameter initialisation. The proposed method relies on standard neural variational learning and, unlike randomised architecture search, does not require a retraining of the model, thus keeping the computational overhead at minimum.
https://arxiv.org/abs/1901.04436
Convolutional neural networks are sensitive to unknown noisy conditions in the test phase, and their performance therefore degrades on noisy data classification tasks, including noisy speech recognition. In this research, a new convolutional neural network (CNN) model with data uncertainty handling, referred to as NCNN (Neutrosophic Convolutional Neural Network), is proposed for the classification task. Here, speech signals are used as input data and their noise is modeled as uncertainty. Using the speech spectrogram, a definition of uncertainty is proposed in the neutrosophic (NS) domain. Uncertainty is computed for each time-frequency point of the speech spectrogram, treated like a pixel, so an uncertainty matrix with the same size as the spectrogram is created in the NS domain. In the next step, a two-parallel-path CNN classification model is proposed. The speech spectrogram is used as the input of the first path and the uncertainty matrix as the input of the second path. The outputs of the two paths are combined to compute the final output of the classifier. To show the effectiveness of the proposed method, it has been compared with a conventional CNN on the isolated words of the Aurora2 dataset. The proposed method achieves an average accuracy of 85.96 on noisy training data. It is more robust against Car, Airport and Subway noises, with accuracies of 90, 88 and 81 on test sets A, B and C, respectively. Results show that the proposed method outperforms the conventional CNN with improvements of 6, 5 and 2 percentage points on test sets A, B and C, respectively, meaning that it is more robust against noisy data and handles such data effectively.
http://arxiv.org/abs/1901.10629
Background: The trend towards large-scale studies including population imaging poses new challenges in terms of quality control (QC). This is a particular issue when automatic processing tools, e.g. image segmentation methods, are employed to derive quantitative measures or biomarkers for later analyses. Manual inspection and visual QC of each segmentation is not feasible at large scale. However, it is important to be able to automatically detect when a segmentation method fails, so as to avoid the inclusion of wrong measurements in subsequent analyses, which could lead to incorrect conclusions. Methods: To overcome this challenge, we explore an approach for predicting segmentation quality based on Reverse Classification Accuracy (RCA), which enables us to discriminate between successful and failed segmentations on a per-case basis. We validate this approach on a new, large-scale, manually annotated set of 4,800 cardiac magnetic resonance scans. We then apply our method to a large cohort of 7,250 cardiac MRI scans on which we have performed manual QC. Results: We report results for predicting segmentation quality metrics including the Dice Similarity Coefficient (DSC) and surface-distance measures. As initial validation, we present data for 400 scans demonstrating 99% accuracy for classifying low- and high-quality segmentations using predicted DSC scores. As further validation, we show high correlation between real and predicted scores and 95% classification accuracy on 4,800 scans for which manual segmentations were available. We mimic real-world application of the method on 7,250 cardiac MRI scans, where we show good agreement between predicted quality metrics and manual visual QC scores. Conclusions: We show that RCA has the potential for accurate and fully automatic segmentation QC on a per-case basis in the context of large-scale population imaging, as in the UK Biobank Imaging Study.
http://arxiv.org/abs/1901.09351
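For reference, the Dice similarity coefficient that the method predicts can be computed per label as follows; a minimal sketch, with the empty-structure convention (DSC = 1 when both masks are empty) as an assumption.

```python
import numpy as np

def dice(seg, gt, label):
    """Dice similarity coefficient for one label between a segmentation and a reference mask."""
    a, b = (seg == label), (gt == label)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom
```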
Recognition of grocery products on store shelves poses peculiar challenges. Firstly, the task mandates the recognition of an extremely high number of different items, on the order of several thousand for medium-small shops, many of which feature small inter- and intra-class variability. Then, available product databases usually include just one or a few studio-quality images per product (referred to herein as reference images), whilst at test time recognition is performed on pictures displaying a portion of a shelf containing several products and taken in the store by cheap cameras (referred to as query images). Moreover, as the items on sale in a store, as well as their appearance, change frequently over time, a practical recognition system should handle new products and packages seamlessly. Inspired by recent advances in object detection and image retrieval, we propose to leverage state-of-the-art object detectors based on deep learning to obtain an initial product-agnostic item detection. Then, we pursue product recognition through a similarity search between global descriptors computed on reference and cropped query images. To maximize performance, we learn an ad-hoc global descriptor with a CNN trained on reference images using an image embedding loss. Our system is computationally expensive at training time but can perform recognition rapidly and accurately at test time.
http://arxiv.org/abs/1810.01733
Normalization layers are a staple in state-of-the-art deep neural network architectures. They are widely believed to stabilize training, enable higher learning rate, accelerate convergence and improve generalization, though the reason for their effectiveness is still an active research topic. In this work, we challenge the commonly-held beliefs by showing that none of the perceived benefits is unique to normalization. Specifically, we propose fixed-update initialization (Fixup), an initialization motivated by solving the exploding and vanishing gradient problem at the beginning of training via properly rescaling a standard initialization. We find training residual networks with Fixup to be as stable as training with normalization – even for networks with 10,000 layers. Furthermore, with proper regularization, Fixup enables residual networks without normalization to achieve state-of-the-art performance in image classification and machine translation.
http://arxiv.org/abs/1901.09321
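A hedged sketch of the central rescaling idea is given below: zero-initialise the last layer of each residual branch and scale the remaining branch layers by L^{-1/(2m-2)}, where L is the number of residual branches and m the number of layers per branch. The helper is illustrative only; the full Fixup recipe (scalar multipliers, bias terms, classification-layer initialisation) is in the paper.

```python
import torch.nn as nn

def fixup_like_init(residual_branches, layers_per_branch=2):
    """Sketch of a Fixup-style initialisation.
    residual_branches: list of branches, each a list of Conv2d/Linear layers (m >= 2)."""
    L, m = len(residual_branches), layers_per_branch
    scale = L ** (-1.0 / (2 * m - 2))                   # depth-dependent rescaling factor
    for branch in residual_branches:
        for layer in branch[:-1]:                       # rescale all but the last layer
            nn.init.kaiming_normal_(layer.weight)
            layer.weight.data.mul_(scale)
        nn.init.zeros_(branch[-1].weight)               # last layer of each branch starts at zero
        for layer in branch:
            if layer.bias is not None:
                nn.init.zeros_(layer.bias)
```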
Complex environments and tasks pose a difficult problem for holistic end-to-end learning approaches. Decomposing an environment into interacting controllable and non-controllable objects allows supervised learning for the non-controllable objects and universal value function approximator learning for the controllable objects. Such a decomposition should lead to shorter learning time and better generalisation capability. Here, we consider arcade-game environments as sets of interacting objects (controllable and non-controllable) and propose a set of functional modules that specialize in mastering different types of interactions in a broad range of environments. The modules utilize regression, supervised learning, and reinforcement learning algorithms. Results of this case study in different Atari games suggest that human-level performance can be achieved by a learning agent within a human amount of game experience (10-15 minutes of game time) when a proper decomposition of an environment or a task is provided. However, automating such decomposition remains a challenging problem. This case study shows how a model of the causal structure underlying an environment or task can benefit the learning time and generalization capability of the agent, and argues in favor of exploiting modular structure over pure end-to-end learning approaches.
http://arxiv.org/abs/1901.09895
We devise a distributional variant of gradient temporal-difference (TD) learning. Distributional reinforcement learning has been demonstrated to outperform its non-distributional counterpart in a recent study \citep{bellemare2017distributional}. In the policy evaluation setting, we design two new algorithms, called distributional GTD2 and distributional TDC, using the Cram\'er distance on the distributional version of the Bellman error objective function, which inherit the advantages of both the nonlinear gradient TD algorithms and the distributional RL approach. In the control setting, we propose distributional Greedy-GQ using a similar derivation. We prove the asymptotic almost-sure convergence of distributional GTD2 and TDC to a local optimal solution for general smooth function approximators, which include the neural networks widely used in recent studies to solve real-life RL problems. In each step, the computational complexity of each of the above three algorithms is linear w.r.t.\ the number of parameters of the function approximator, so they can be implemented efficiently for neural networks.
http://arxiv.org/abs/1805.07732
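For reference, the Cramér distance between two distributions P and Q with CDFs F_P and F_Q, used here on the distributional Bellman error, is:

```latex
% Cramer distance between distributions P and Q with CDFs F_P and F_Q.
\ell_2(P, Q) \;=\; \left( \int_{-\infty}^{\infty} \bigl( F_P(x) - F_Q(x) \bigr)^{2} \, dx \right)^{1/2}
```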
We present a new theoretical perspective of data noising in recurrent neural network language models (Xie et al., 2017). We show that each variant of data noising is an instance of Bayesian recurrent neural networks with a particular variational distribution (i.e., a mixture of Gaussians whose weights depend on statistics derived from the corpus such as the unigram distribution). We use this insight to propose a more principled method to apply at prediction time and propose natural extensions to data noising under the variational framework. In particular, we propose variational smoothing with tied input and output embedding matrices and an element-wise variational smoothing method. We empirically verify our analysis on two benchmark language modeling datasets and demonstrate performance improvements over existing data noising methods.
http://arxiv.org/abs/1901.09296
Web Service Composition (WSC) is a particularly promising application of Web services, in which multiple individual services with specific functionalities are composed to accomplish a more complex task, simultaneously fulfilling functional requirements and optimising Quality of Service (QoS) attributes. Additionally, the large quantities of data produced by technological advances need to be exchanged between services. Data-intensive Web services, which manipulate and deal with these data, are of great interest for implementing data-intensive processes, such as distributed Data-intensive Web Service Composition (DWSC). Researchers have proposed fully automated Evolutionary Computing (EC) WSC techniques that address all of the above factors. Some of these works employ Memetic Algorithms (MAs) to enhance the performance of EC by increasing its ability to exploit the neighbourhood of a solution. However, those works are not efficient or effective. This paper proposes an MA-based approach to solving the distributed DWSC problem effectively and efficiently. In particular, we develop an MA that hybridises EC with a flexible local search technique incorporating the distance between services. An evaluation using benchmark datasets is carried out, comparing against existing state-of-the-art methods. Results show that our proposed method achieves the highest quality and an acceptable execution time overall.
http://arxiv.org/abs/1901.09894