Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

Video from Stills: Lensless Imaging with Rolling Shutter

2019-05-30

Nick Antipa, Patrick Oare, Emrah Bostan, Ren Ng, Laura Waller

arXiv_CV

arXiv_CV Sparse
Abstract

Because image sensor chips have a finite bandwidth with which to read out pixels, recording video typically requires a trade-off between frame rate and pixel count. Compressed sensing techniques can circumvent this trade-off by assuming that the image is compressible. Here, we propose using multiplexing optics to spatially compress the scene, enabling information about the whole scene to be sampled from a row of sensor pixels, which can be read off quickly via a rolling shutter CMOS sensor. Conveniently, such multiplexing can be achieved with a simple lensless, diffuser-based imaging system. Using sparse recovery methods, we are able to recover 140 video frames at over 4,500 frames per second, all from a single captured image with a rolling shutter sensor. Our proof-of-concept system uses easily-fabricated diffusers paired with an off-the-shelf sensor. The resulting prototype enables compressive encoding of high frame rate video into a single rolling shutter exposure, and exceeds the sampling-limited performance of an equivalent global shutter system for sufficiently sparse objects.

URL

http://arxiv.org/abs/1905.13221

PDF

http://arxiv.org/pdf/1905.13221
On Network Design Spaces for Visual Recognition

2019-05-30

Ilija Radosavovic, Justin Johnson, Saining Xie, Wan-Yen Lo, Piotr Dollár

arXiv_CV

arXiv_CV NAS Recognition
Abstract

Over the past several years progress in designing better neural network architectures for visual recognition has been substantial. To help sustain this rate of progress, in this work we propose to reexamine the methodology for comparing network architectures. In particular, we introduce a new comparison paradigm of distribution estimates, in which network design spaces are compared by applying statistical techniques to populations of sampled models, while controlling for confounding factors like network complexity. Compared to current methodologies of comparing point and curve estimates of model families, distribution estimates paint a more complete picture of the entire design landscape. As a case study, we examine design spaces used in neural architecture search (NAS). We find significant statistical differences between recent NAS design space variants that have been largely overlooked. Furthermore, our analysis reveals that the design spaces for standard model families like ResNeXt can be comparable to the more complex ones used in recent NAS work. We hope these insights into distribution analysis will enable more robust progress toward discovering better networks for visual recognition.

URL

http://arxiv.org/abs/1905.13214

PDF

http://arxiv.org/pdf/1905.13214
What Can Neural Networks Reason About?

2019-05-30

Keyulu Xu, Jingling Li, Mozhi Zhang, Simon S. Du, Ken-ichi Kawarabayashi, Stefanie Jegelka

arXiv_AI

arXiv_AI Relation
Abstract

Neural networks have successfully been applied to solving reasoning tasks, ranging from learning simple concepts like “close to”, to intricate questions whose reasoning procedures resemble algorithms. Empirically, not all network structures work equally well for reasoning. For example, Graph Neural Networks have achieved impressive empirical results, while less structured neural networks may fail to learn to reason. Theoretically, there is currently limited understanding of the interplay between reasoning tasks and network learning. In this paper, we develop a framework to characterize which tasks a neural network can learn well, by studying how well its structure aligns with the algorithmic structure of the relevant reasoning procedure. This suggests that Graph Neural Networks can learn dynamic programming, a powerful algorithmic strategy that solves a broad class of reasoning problems, such as relational question answering, sorting, intuitive physics, and shortest paths. Our perspective also implies strategies to design neural architectures for complex reasoning. On several abstract reasoning tasks, we see empirically that our theory aligns well with practice.
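As a concrete picture of the dynamic programming that the abstract mentions for shortest paths, here is a minimal sketch of the textbook Bellman-Ford recurrence. It is not code from the paper; the function name, the edge-list format, and the toy graph are assumptions made for illustration. Each round updates every node from its incoming neighbours with a min aggregation, which is the kind of step-by-step structure the abstract says a Graph Neural Network can align with.

```python
import math


def bellman_ford(num_nodes, edges, source):
    """Shortest-path distances from `source` via dynamic programming.

    edges: list of (u, v, weight) directed edges (hypothetical input format).
    """
    dist = [math.inf] * num_nodes
    dist[source] = 0.0
    # One round is one DP step:
    #   d_k[v] = min(d_{k-1}[v], min over edges (u, v) of d_{k-1}[u] + w(u, v))
    for _ in range(num_nodes - 1):
        updated = list(dist)
        for u, v, w in edges:
            if dist[u] + w < updated[v]:
                updated[v] = dist[u] + w
        if updated == dist:  # converged early, no distance changed
            break
        dist = updated
    return dist


# Toy graph, invented for this example.
edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 5.0)]
distances = bellman_ford(4, edges, source=0)
```

On this toy graph the direct edge 0 to 1 (cost 4) is beaten by the detour through node 2 (cost 3), so the update for node 3 only settles in the third round.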

URL

http://arxiv.org/abs/1905.13211

PDF

http://arxiv.org/pdf/1905.13211
AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures

2019-05-30

Michael S. Ryoo, AJ Piergiovanni, Mingxing Tan, Anelia Angelova

arXiv_CV

arXiv_CV Image_Caption Video_Caption CNN
Abstract

Learning to represent videos is a very challenging task both algorithmically and computationally. Standard video CNN architectures have been designed by directly extending architectures devised for image understanding to a third dimension (using a limited number of space-time modules such as 3D convolutions) or by introducing a handcrafted two-stream design to capture both appearance and motion in videos. We interpret a video CNN as a collection of multi-stream space-time convolutional blocks connected to each other, and propose the approach of automatically finding neural architectures with better connectivity for video understanding. This is done by evolving a population of overly-connected architectures guided by connection weight learning. Architectures combining representations that abstract different input types (i.e., RGB and optical flow) at multiple temporal resolutions are searched for, allowing different types or sources of information to interact with each other. Our method, referred to as AssembleNet, outperforms prior approaches on public video datasets, in some cases by a great margin.

URL

http://arxiv.org/abs/1905.13209

PDF

http://arxiv.org/pdf/1905.13209
An attention-based multi-resolution model for prostate whole slide image classification and localization

2019-05-30

Jiayun Li, Wenyuan Li, Arkadiusz Gertych, Beatrice S. Knudsen, William Speier, Corey W. Arnold

arXiv_CV

arXiv_CV Salient Review Attention Classification Detection
Abstract

Histology review is often used as the ‘gold standard’ for disease diagnosis. Computer aided diagnosis tools can potentially help improve current pathology workflows by reducing examination time and interobserver variability. Previous work in cancer grading has focused mainly on classifying pre-defined regions of interest (ROIs), or relied on large amounts of fine-grained labels. In this paper, we propose a two-stage attention-based multiple instance learning model for slide-level cancer grading and weakly-supervised ROI detection and demonstrate its use in prostate cancer. Compared with existing Gleason classification models, our model goes a step further by utilizing visualized saliency maps to select informative tiles for fine-grained grade classification. The model was primarily developed on a large-scale whole slide dataset consisting of 3,521 prostate biopsy slides with only slide-level labels from 718 patients. The model achieved state-of-the-art performance for prostate cancer grading with an accuracy of 85.11% for classifying benign, low-grade (Gleason grade 3+3 or 3+4), and high-grade (Gleason grade 4+3 or higher) slides on an independent test set.
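To make the idea of attention-based multiple instance learning more tangible, the following is a generic sketch of attention pooling over tiles, followed by picking the most attended tiles. It is an illustration under stated assumptions, not the authors' two-stage model: the function names, the pre-computed per-tile scores, and the toy features are all hypothetical, and in a real model the scores would come from a learned attention network.

```python
import math


def attention_pool(tile_features, tile_scores):
    """Softmax-weight tile features into a single slide-level vector."""
    max_score = max(tile_scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in tile_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(tile_features[0])
    slide_vector = [
        sum(w * feat[d] for w, feat in zip(weights, tile_features))
        for d in range(dim)
    ]
    return slide_vector, weights


def top_k_tiles(weights, k):
    """Indices of the k tiles with the highest attention weight."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return order[:k]


# Toy data, invented for this example: three tiles with 2-D features.
tile_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tile_scores = [0.1, 2.0, -1.0]
slide_vector, weights = attention_pool(tile_features, tile_scores)
informative = top_k_tiles(weights, k=2)
```

Only slide-level labels are needed to train such a pooling layer, and the resulting weights can then be reused to select tiles for a finer-grained second stage, which is the general pattern the abstract describes.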

URL

http://arxiv.org/abs/1905.13208

PDF

http://arxiv.org/pdf/1905.13208
Modeling Uncertainty by Learning a Hierarchy of Deep Neural Connections

2019-05-30

Raanan Y. Rohekar, Yaniv Gurwicz, Shami Nisimov, Gal Novik

arXiv_AI

arXiv_AI Detection Relation
Abstract

Quantifying and measuring uncertainty in deep neural networks, despite recent important advances, is still an open problem. Bayesian neural networks are a powerful solution, where the prior over network weights is a design choice, often a normal distribution or other distribution encouraging sparsity. However, this prior is agnostic to the generative process of the input data, which might lead to unwarranted generalization for out-of-distribution tested data. We suggest treating the generative process of the input data as a confounder for the relation between the input and the discriminative function, thereby conditioning the prior of the network weights on the distribution of the input. We propose an algorithm for modeling this confounder through neural connectivity patterns. This approach is ultimately translated into a new deep architecture—a compact hierarchy of networks. We demonstrate that sampling networks from this hierarchy, proportionally to their posterior, is efficient and enables estimating various types of uncertainties. Empirical evaluations of our method demonstrate significant improvement compared to state-of-the-art calibration and out-of-distribution detection methods.

URL

http://arxiv.org/abs/1905.13195

PDF

http://arxiv.org/pdf/1905.13195
Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels

2019-05-30

Simon S. Du, Kangcheng Hou, Barnabás Póczos, Ruslan Salakhutdinov, Ruosong Wang, Keyulu Xu

arXiv_AI

arXiv_AI Classification Gradient_Descent
Abstract

While graph kernels (GKs) are easy to train and enjoy provable theoretical guarantees, their practical performances are limited by their expressive power, as the kernel function often depends on hand-crafted combinatorial features of graphs. Compared to graph kernels, graph neural networks (GNNs) usually achieve better practical performance, as GNNs use multi-layer architectures and non-linear activation functions to extract high-order information of graphs as features. However, due to the large number of hyper-parameters and the non-convex nature of the training procedure, GNNs are harder to train. Theoretical guarantees of GNNs are also not well-understood. Furthermore, the expressive power of GNNs scales with the number of parameters, and thus it is hard to exploit the full power of GNNs when computing resources are limited. The current paper presents a new class of graph kernels, Graph Neural Tangent Kernels (GNTKs), which correspond to infinitely wide multi-layer GNNs trained by gradient descent. GNTKs enjoy the full expressive power of GNNs and inherit advantages of GKs. Theoretically, we show GNTKs provably learn a class of smooth functions on graphs. Empirically, we test GNTKs on graph classification datasets and show they achieve strong performance.

URL

http://arxiv.org/abs/1905.13192

PDF

http://arxiv.org/pdf/1905.13192
Ridesharing with Driver Location Preferences

2019-05-30

Duncan Rheingans-Yoo, Scott Duke Kominers, Hongyao Ma, David C. Parkes

arXiv_AI

arXiv_AI
Abstract

We study revenue-optimal pricing and driver compensation in ridesharing platforms when drivers have heterogeneous preferences over locations. If a platform ignores drivers’ location preferences, it may make inefficient trip dispatches; moreover, drivers may strategize so as to route towards their preferred locations. In a model with stationary and continuous demand and supply, we present a mechanism that incentivizes drivers to both (i) report their location preferences truthfully and (ii) always provide service. In settings with unconstrained driver supply or symmetric demand patterns, our mechanism achieves (full-information) first-best revenue. Under supply constraints and unbalanced demand, we show via simulation that our mechanism improves over existing mechanisms and has performance close to the first-best.

URL

http://arxiv.org/abs/1905.13191

PDF

http://arxiv.org/pdf/1905.13191
Better Future through AI: Avoiding Pitfalls and Guiding AI Towards its Full Potential

2019-05-30

Risto Miikkulainen, Bret Greenstein, Babak Hodjat, Jerry Smith

arXiv_AI

arXiv_AI
Abstract

Artificial Intelligence (AI) technology is rapidly changing many areas of society. While there is tremendous potential in this transition, there are several pitfalls as well. Using the history of computing and the world-wide web as a guide, in this article we identify those pitfalls and actions that lead AI development to its full potential. If done right, AI will be instrumental in achieving the goals we set for economy, society, and the world in general.

URL

http://arxiv.org/abs/1905.13178

PDF

http://arxiv.org/pdf/1905.13178
General Support-Effective Decomposition for Multi-Directional 3D Printing

2019-05-30

Chenming Wu, Chengkai Dai, Guoxin Fang, Yong-Jin Liu, Charlie C.L. Wang

arXiv_RO

arXiv_RO
Abstract

We present a method to fabricate general models by multi-directional 3D printing systems, in which different regions of a model are printed along different directions. The core of our method is a support-effective volume decomposition algorithm that aims to minimize the area of the regions with large overhangs. Optimal volume decomposition represented by a sequence of clipping planes is determined by a beam-guided searching algorithm according to manufacturing constraints. Unlike existing approaches that need to manually assemble 3D printed components into a final model, regions decomposed by our algorithm can be automatically fabricated on a multi-directional 3D printing system. Our approach is general and can be applied to models with loops and handles. For those models in which supporting structures for large overhangs cannot be completely eliminated, an algorithm is developed to generate special supporting structures for multi-directional 3D printing. We developed two different hardware systems to physically verify the effectiveness of our method: a Cartesian-motion based system and an angular-motion based system. A variety of 3D models have been successfully fabricated on these systems.

URL

http://arxiv.org/abs/1812.00606

PDF

http://arxiv.org/pdf/1812.00606
Hierarchical Transformers for Multi-Document Summarization

2019-05-30

Yang Liu, Mirella Lapata

arXiv_AI

arXiv_AI Attention Summarization Relation
Abstract

In this paper, we develop a neural summarization model which can effectively process multiple input documents and distill abstractive summaries. Our model augments a previously proposed Transformer architecture with the ability to encode documents in a hierarchical manner. We represent cross-document relationships via an attention mechanism which allows information to be shared, as opposed to simply concatenating text spans and processing them as a flat sequence. Our model learns latent dependencies among textual units, but can also take advantage of explicit graph representations focusing on similarity or discourse relations. Empirical results on the WikiSum dataset demonstrate that the proposed architecture brings substantial improvements over several strong baselines.

URL

http://arxiv.org/abs/1905.13164

PDF

http://arxiv.org/pdf/1905.13164
Deep Adversarial Social Recommendation

2019-05-30

Wenqi Fan, Tyler Derr, Yao Ma, Jianping Wang, Jiliang Tang, Qing Li

arXiv_AI

arXiv_AI Adversarial Represenation_Learning Optimization Recommendation
Abstract

Recent years have witnessed rapid developments in social recommendation techniques for improving the performance of recommender systems, due to the growing influence of social networks on our daily life. The majority of existing social recommendation methods unify user representation for the user-item interactions (item domain) and user-user connections (social domain). However, this may restrain user representation learning in each respective domain, since users behave and interact differently in the two domains, which makes their representations heterogeneous. In addition, most traditional recommender systems cannot efficiently optimize these objectives, since they utilize a negative sampling technique which is unable to provide sufficiently informative guidance for training during the optimization process. In this paper, to address the aforementioned challenges, we propose a novel deep adversarial social recommendation framework DASO. It adopts a bidirectional mapping method to transfer users’ information between social domain and item domain using adversarial learning. Comprehensive experiments on two real-world datasets show the effectiveness of the proposed framework.

URL

http://arxiv.org/abs/1905.13160

PDF

http://arxiv.org/pdf/1905.13160
Grounding Language Attributes to Objects using Bayesian Eigenobjects

2019-05-30

Vanya Cohen, Benjamin Burchfiel, Thao Nguyen, Nakul Gopalan, Stefanie Tellex, George Konidaris

arXiv_CV

arXiv_CV
Abstract

We develop a system to disambiguate objects based on simple physical descriptions. The system takes as input a natural language phrase and a depth image containing a segmented object and predicts how similar the observed object is to the described object. Our system is designed to learn from only a small amount of human-labeled language data and generalize to viewpoints not represented in the language-annotated depth-image training set. By decoupling 3D shape representation from language representation, our method is able to ground language to novel objects using a small amount of language-annotated depth-data and a larger corpus of unlabeled 3D object meshes, even when these objects are partially observed from unusual viewpoints. Our system is able to disambiguate between novel objects, observed via depth-images, based on natural language descriptions. Our method also enables view-point transfer; trained on human-annotated data on a small set of depth-images captured from frontal viewpoints, our system successfully predicted object attributes from rear views despite having no such depth images in its training set. Finally, we demonstrate our system on a Baxter robot, enabling it to pick specific objects based on human-provided natural language descriptions.

URL

http://arxiv.org/abs/1905.13153

PDF

http://arxiv.org/pdf/1905.13153
Lattice-based lightly-supervised acoustic model training

2019-05-30

Joachim Fainberg, Ondřej Klejch, Steve Renals, Peter Bell

arXiv_CL

arXiv_CL Speech_Recognition Caption Language_Model Recognition
Abstract

In the broadcast domain there is an abundance of related text data and partial transcriptions, such as closed captions and subtitles. This text data can be used for lightly supervised training, in which text matching the audio is selected using an existing speech recognition model. Current approaches to light supervision typically filter the data based on matching error rates between the transcriptions and biased decoding hypotheses. In contrast, semi-supervised training does not require matching text data, instead generating a hypothesis using a background language model. State-of-the-art semi-supervised training uses lattice-based supervision with the lattice-free MMI (LF-MMI) objective function. We propose a technique to combine inaccurate transcriptions with the lattices generated for semi-supervised training, thus preserving uncertainty in the lattice where appropriate. We demonstrate that this combined approach reduces the expected error rates over the lattices, and reduces the word error rate (WER) on a broadcast task.

URL

http://arxiv.org/abs/1905.13150

PDF

http://arxiv.org/pdf/1905.13150
Prostate Cancer Detection using Deep Convolutional Neural Networks

2019-05-30

Sunghwan Yoo, Isha Gujrathi, Masoom A. Haider, Farzad Khalvati

arXiv_CV

arXiv_CV Object_Detection Segmentation CNN Detection
Abstract

Prostate cancer is one of the most common forms of cancer and the third leading cause of cancer death in North America. As an integrated part of computer-aided detection (CAD) tools, diffusion-weighted magnetic resonance imaging (DWI) has been intensively studied for accurate detection of prostate cancer. With the significant success of deep convolutional neural networks (CNNs) in computer vision tasks such as object detection and segmentation, different CNN architectures are increasingly investigated in the medical imaging research community as promising solutions for designing more accurate CAD tools for cancer detection. In this work, we developed and implemented an automated CNN-based pipeline for detection of clinically significant prostate cancer (PCa) for a given axial DWI image and for each patient. DWI images of 427 patients were used as the dataset, which contained 175 patients with PCa and 252 healthy patients. To measure the performance of the proposed pipeline, a test set of 108 (out of 427) patients was set aside and not used in the training phase. The proposed pipeline achieved an area under the receiver operating characteristic curve (AUC) of 0.87 (95% confidence interval (CI): 0.84-0.90) and 0.84 (95% CI: 0.76-0.91) at slice level and patient level, respectively.
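For readers unfamiliar with the AUC metric reported above, here is a minimal sketch of how an area under the ROC curve can be computed from labels and scores using the pairwise-ranking definition. The function name and the four-sample toy data are assumptions for illustration only; they have nothing to do with the paper's dataset or its evaluation code.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    labels: 1 for positive (e.g. cancer), 0 for negative.
    scores: model outputs, higher meaning more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy data, invented for this example.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
auc = roc_auc(labels, scores)
```

In this toy case three of the four positive-negative pairs are ranked correctly, so the AUC is 0.75. The quadratic loop is fine for a sketch; a sort-based version would be preferable for large test sets.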

URL

http://arxiv.org/abs/1905.13145

PDF

http://arxiv.org/pdf/1905.13145
Semantics-Aligned Representation Learning for Person Re-identification

2019-05-30

Xin Jin, Cuiling Lan, Wenjun Zeng, Guoqiang Wei, Zhibo Chen

arXiv_CV

arXiv_CV Re-identification Person_Re-identification Represenation_Learning Inference
Abstract

Person re-identification (reID) aims to match person images to retrieve the ones with the same identity. This is a challenging task, as the images to be matched are generally semantically misaligned due to the diversity of human poses and capture viewpoints, incompleteness of the visible bodies (due to occlusion), etc. In this paper, we propose a framework that drives the reID network to learn semantics-aligned feature representation through delicate supervision designs. Specifically, we build a Semantics Aligning Network (SAN) which consists of a base network as encoder (SA-Enc) for re-ID, and a decoder (SA-Dec) for reconstructing/regressing the densely semantics aligned full texture image. We jointly train the SAN under the supervisions of person re-identification and aligned texture generation. Moreover, at the decoder, besides the reconstruction loss, we add triplet reID constraints/losses over the feature maps as the perceptual losses. The decoder is discarded in the inference/test and thus our scheme is computationally efficient. Ablation studies demonstrate the effectiveness of our design. We achieve the state-of-the-art performances on the benchmark datasets CUHK03, Market1501, MSMT17, and the partial person reID dataset Partial REID.

URL

http://arxiv.org/abs/1905.13143

PDF

http://arxiv.org/pdf/1905.13143
Improved Fourier Mellin Invariant for Robust Rotation Estimation with Omni-cameras

2019-05-30

Qingwen Xu, Arturo Gomez Chavez, Heiko Bülow, Andreas Birk, Sören Schwertfeger

arXiv_CV

arXiv_CV Pose_Estimation
Abstract

Spectral methods such as the improved Fourier Mellin Invariant (iFMI) transform have proved faster, more robust and more accurate than feature based methods on image registration. However, iFMI is restricted to work only when the camera moves in 2D space and has not been applied to omni-camera images so far. In this work, we extend the iFMI method and apply a motion model to estimate an omni-camera’s pose when it moves in 3D space. This is particularly useful in field robotics applications to get a rapid and comprehensive view of unstructured environments, and to robustly estimate the robot pose. In the experiment section, we compared the extended iFMI method against ORB and AKAZE feature based approaches on three datasets showing different types of environments: office, lawn and urban scenery (MPI-omni dataset). The results show that our method boosts the accuracy of the robot pose estimation two to four times with respect to the feature registration techniques, while offering lower processing times. Furthermore, the iFMI approach presents the best performance against motion blur typically present in mobile robotics.

URL

http://arxiv.org/abs/1811.05306

PDF

http://arxiv.org/pdf/1811.05306
Standing on the Shoulders of Giants: AI-driven Calibration of Localisation Technologies

2019-05-30

Aftab Khan, Tim Farnham, Roget Kou, Usman Raza, Thajanee Premalal, Aleksandar Stanoev, William Thompson

arXiv_AI

arXiv_AI Tracking
Abstract

High accuracy localisation technologies exist but are prohibitively expensive to deploy for large indoor spaces such as warehouses, factories, and supermarkets to track assets and people. However, these technologies can be used to lend their highly accurate localisation capabilities to low-cost, commodity, and less-accurate technologies. In this paper, we bridge this link by proposing a technology-agnostic calibration framework based on artificial intelligence to assist such low-cost technologies through highly accurate localisation systems. A single-layer neural network is used to calibrate a less accurate technology using a more accurate one, such as BLE using UWB and UWB using a professional motion tracking system. On a real indoor testbed, we demonstrate an increase in accuracy of approximately 70% for BLE and 50% for UWB. Not only does the proposed approach require a very short measurement campaign, but the low complexity of the single-layer neural network also makes it ideal for deployment on constrained devices typically used for localisation purposes.

URL

http://arxiv.org/abs/1905.13118

PDF

http://arxiv.org/pdf/1905.13118
Towards Finding Longer Proofs

2019-05-30

Zsolt Zombori, Adrián Csiszárik, Henryk Michalewski, Cezary Kaliszyk, Josef Urban

arXiv_AI

arXiv_AI Reinforcement_Learning
Abstract

We present a reinforcement learning (RL) based guidance system for automated theorem proving geared towards Finding Longer Proofs (FLoP). FLoP focuses on generalizing from short proofs to longer ones of similar structure. To achieve that, FLoP uses state-of-the-art RL approaches that were previously not applied in theorem proving. In particular, we show that curriculum learning significantly outperforms previous learning-based proof guidance on a synthetic dataset of increasingly difficult arithmetic problems.

URL

http://arxiv.org/abs/1905.13100

PDF

http://arxiv.org/pdf/1905.13100
A Hierarchical Probabilistic U-Net for Modeling Multi-Scale Ambiguities

2019-05-30

Simon A. A. Kohl, Bernardino Romera-Paredes, Klaus H. Maier-Hein, Danilo Jimenez Rezende, S. M. Ali Eslami, Pushmeet Kohli, Andrew Zisserman, Olaf Ronneberger

arXiv_CV

arXiv_CV Segmentation Semantic_Segmentation Prediction
Abstract

Medical imaging only indirectly measures the molecular identity of the tissue within each voxel, which often produces only ambiguous image evidence for target measures of interest, like semantic segmentation. This diversity and the variations of plausible interpretations are often specific to given image regions and may thus manifest on various scales, spanning all the way from the pixel to the image level. In order to learn a flexible distribution that can account for multiple scales of variations, we propose the Hierarchical Probabilistic U-Net, a segmentation network with a conditional variational auto-encoder (cVAE) that uses a hierarchical latent space decomposition. We show that this model formulation enables sampling and reconstruction of segmentations with high fidelity, i.e. with finely resolved detail, while providing the flexibility to learn complex structured distributions across scales. We demonstrate these abilities on the task of segmenting ambiguous medical scans as well as on instance segmentation of neurobiological and natural images. Our model automatically separates independent factors across scales, an inductive bias that we deem beneficial in structured output prediction tasks beyond segmentation.

URL

http://arxiv.org/abs/1905.13077

PDF

http://arxiv.org/pdf/1905.13077
Robust Sparse Regularization: Simultaneously Optimizing Neural Network Robustness and Compactness

2019-05-30

Adnan Siraj Rakin, Zhezhi He, Li Yang, Yanzhi Wang, Liqiang Wang, Deliang Fan

arXiv_CV

arXiv_CV Regularization Adversarial Sparse Gradient_Descent
Abstract

Deep Neural Networks (DNNs) trained by the gradient descent method are known to be vulnerable to maliciously perturbed adversarial inputs, a.k.a. adversarial attacks. As one of the countermeasures against adversarial attacks, increasing the model capacity for DNN robustness enhancement was discussed and reported as an effective approach by many recent works. In this work, we show that shrinking the model size through proper weight pruning can even be helpful to improve the DNN robustness under adversarial attack. For obtaining a simultaneously robust and compact DNN model, we propose a multi-objective training method called Robust Sparse Regularization (RSR), through the fusion of various regularization techniques, including channel-wise noise injection, lasso weight penalty, and adversarial training. We conduct extensive experiments across popular ResNet-20, ResNet-18 and VGG-16 DNN architectures to demonstrate the effectiveness of RSR against popular white-box (i.e., PGD and FGSM) and black-box attacks. Thanks to RSR, 85% of the weight connections of ResNet-18 can be pruned while still achieving 0.68% and 8.72% improvement in clean- and perturbed-data accuracy respectively on the CIFAR-10 dataset, in comparison to its PGD adversarial training baseline.
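Two of the ingredients named in the abstract, a lasso (L1) weight penalty and weight pruning, can be sketched in a few lines. This is a generic illustration, not the RSR training procedure: the function names, the flat weight list, the penalty strength, and the magnitude-based pruning rule are all assumptions, and the noise injection and adversarial training components are omitted entirely.

```python
def lasso_penalty(weights, strength):
    """L1 regularization term: strength times the sum of absolute weights."""
    return strength * sum(abs(w) for w in weights)


def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    num_pruned = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:num_pruned]:
        pruned[i] = 0.0
    return pruned


# Toy weights, invented for this example.
weights = [0.5, -0.01, 0.2, 0.03]
penalty = lasso_penalty(weights, strength=0.1)
sparse_weights = prune_by_magnitude(weights, sparsity=0.5)
```

The L1 term pushes many weights towards zero during training, which is why pruning the smallest ones afterwards tends to cost little accuracy. Here the two smallest weights are removed and the two largest are kept unchanged.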

URL

http://arxiv.org/abs/1905.13074

PDF

http://arxiv.org/pdf/1905.13074
Weakly Aggregative Modal Logic: Characterization and Interpolation

2019-05-30

Jixin Liu, Yanjing Wang, Yifeng Ding

arXiv_AI

arXiv_AI
Abstract

Weakly Aggregative Modal Logic (WAML) is a collection of disguised polyadic modal logics with n-ary modalities whose arguments are all the same. WAML has some interesting applications on epistemic logic and logic of games, so we study some basic model theoretical aspects of WAML in this paper. Specifically, we give a van Benthem-Rosen characterization theorem of WAML based on an intuitive notion of bisimulation and show that each basic WAML system K_n lacks Craig Interpolation.

URL

http://arxiv.org/abs/1803.10953

PDF

http://arxiv.org/pdf/1803.10953
Unbabel's Submission to the WMT2019 APE Shared Task: BERT-based Encoder-Decoder for Automatic Post-Editing

2019-05-30

António V. Lopes, M. Amin Farajian, Gonçalo M. Correia, Jonay Trenous, André F. T. Martins

arXiv_CL

arXiv_CL
Abstract

This paper describes Unbabel’s submission to the WMT2019 APE Shared Task for the English-German language pair. Following the recent rise of large, powerful, pre-trained models, we adapt the BERT pretrained model to perform Automatic Post-Editing in an encoder-decoder framework. Analogously to dual-encoder architectures we develop a BERT-based encoder-decoder (BED) model in which a single pretrained BERT encoder receives both the source src and machine translation tgt strings. Furthermore, we explore a conservativeness factor to constrain the APE system to perform fewer edits. As the official results show, when trained on a weighted combination of in-domain and artificial training data, our BED system with the conservativeness penalty significantly improves the translations of a strong Neural Machine Translation system by -0.78 and +1.23 in terms of TER and BLEU, respectively.

URL

http://arxiv.org/abs/1905.13068

PDF

http://arxiv.org/pdf/1905.13068
Align-and-Attend Network for Globally and Locally Coherent Video Inpainting

2019-05-30

Sanghyun Woo, Dahun Kim, KwanYong Park, Joon-Young Lee, In So Kweon

arXiv_CV

arXiv_CV Attention Relation
Abstract

We propose a novel feed-forward network for video inpainting. We use a set of sampled video frames as references from which visible content is taken to fill the hole in a target frame. Our video inpainting network consists of two stages. The first stage is an alignment module that uses computed homographies between the reference frames and the target frame. The visible patches are then aggregated based on frame similarity to roughly fill in the target holes. The second stage is a non-local attention module that matches the generated patches with known reference patches (in space and time) to refine the result of the global alignment stage. Both stages use a large spatial-temporal window for the reference and thus enable modeling of long-range correlations between distant information and the hole regions. Therefore, even challenging scenes with large or slowly moving holes, which existing flow-based approaches have hardly modeled, can be handled. Our network is also designed with a recurrent propagation stream to encourage temporal consistency in the video results. Experiments on video object removal demonstrate that our method inpaints the holes with globally and locally coherent content.

URL

http://arxiv.org/abs/1905.13066

PDF

http://arxiv.org/pdf/1905.13066
Read All
Neural Consciousness Flow

2019-05-30

Xiaoran Xu, Wei Feng, Zhiqing Sun, Zhi-Hong Deng

arXiv_AI

arXiv_AI Knowledge_Graph Knowledge Attention Reinforcement_Learning Embedding Deep_Learning
Abstract

The ability to reason beyond data fitting is essential for deep learning systems to make a leap forward towards artificial general intelligence. Much effort has gone into modeling neural-based reasoning as an iterative decision-making process based on recurrent networks and reinforcement learning. Instead, inspired by the consciousness prior proposed by Yoshua Bengio, we explore reasoning with the notion of attentive awareness from a cognitive perspective, and formulate it in the form of attentive message passing on graphs, called neural consciousness flow (NeuCFlow). Aiming to bridge the gap between deep learning systems and reasoning, we propose an attentive computation framework with a three-layer architecture, which consists of an unconsciousness flow layer, a consciousness flow layer, and an attention flow layer. We implement the NeuCFlow model with graph neural networks (GNNs) and conditional transition matrices. Our attentive computation greatly reduces the complexity of vanilla GNN-based methods, making it capable of running on large-scale graphs. We validate our model for knowledge graph reasoning by solving a series of knowledge base completion (KBC) tasks. The experimental results show that NeuCFlow significantly outperforms previous state-of-the-art KBC methods, including embedding-based and path-based ones. The reproducible code can be found at the link below.

URL

http://arxiv.org/abs/1905.13049

PDF

http://arxiv.org/pdf/1905.13049
Read All
Memory-efficient and fast implementation of local adaptive binarization methods

2019-05-30

Chungkwong Chan

arXiv_CV

arXiv_CV Segmentation Recognition
Abstract

Binarization is widely used as an image preprocessing step to separate objects, especially text, from the background before recognition. For noisy images with uneven illumination, threshold values should be computed pixel by pixel to obtain a good segmentation. Since local threshold values typically depend on moment-based statistics such as the mean and variance of gray levels inside rectangular windows, integral images are commonly used to accelerate the calculation. However, integral images are memory-consuming. For Sauvola’s method, the two integral images occupy $16HW$ bytes given an $H\times W$ input image. By using a recursive technique to avoid integral images, the memory usage of intermediate data structures can be reduced significantly to $6\min\{H,W\}$ bytes, while the time complexity remains $O(HW)$, independent of window size. Therefore, the proposed implementation enables various local adaptive binarization methods to be applied in real-time use cases on devices with limited resources.
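
For context, the integral-image baseline that the abstract sets out to avoid can be sketched as follows. This is the conventional approach, not the paper's recursive technique: it builds two summed-area tables (values and squared values) and applies the standard Sauvola threshold $T = m\,(1 + k\,(s/R - 1))$, where $m$ and $s$ are the window mean and standard deviation. The parameter defaults (`radius`, `k`, `dynamic_range`) and the border handling are assumptions chosen for illustration.

```python
import math


def integral_image(img, power=1):
    """Return an (H+1) x (W+1) summed-area table of img[y][x] ** power."""
    h, w = len(img), len(img[0])
    table = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x] ** power
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def window_sum(table, y0, x0, y1, x1):
    """Sum over rows y0..y1-1 and columns x0..x1-1 in constant time."""
    return table[y1][x1] - table[y0][x1] - table[y1][x0] + table[y0][x0]


def sauvola_binarize(img, radius=1, k=0.5, dynamic_range=128.0):
    """Binarize with the Sauvola threshold; windows are clipped at the border."""
    h, w = len(img), len(img[0])
    sums = integral_image(img, 1)      # first table: sum of gray levels
    squares = integral_image(img, 2)   # second table: sum of squared gray levels
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            n = (y1 - y0) * (x1 - x0)
            mean = window_sum(sums, y0, x0, y1, x1) / n
            var = window_sum(squares, y0, x0, y1, x1) / n - mean * mean
            std = math.sqrt(max(var, 0.0))
            threshold = mean * (1.0 + k * (std / dynamic_range - 1.0))
            out[y][x] = 1 if img[y][x] > threshold else 0  # 1 = background
    return out


# A single dark pixel on a bright background.
image = [[200, 200, 200], [200, 20, 200], [200, 200, 200]]
result = sauvola_binarize(image)
```

The per-pixel cost is independent of the window size, but both tables are as large as the image, which is the memory overhead the abstract refers to.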

URL

http://arxiv.org/abs/1905.13038

PDF

http://arxiv.org/pdf/1905.13038
Read All
Multiple Character Embeddings for Chinese Word Segmentation

2019-05-30

Jingkang Wang, Jianing Zhou, Jie Zhou, Gongshen Liu

arXiv_CL

arXiv_CL Segmentation Embedding RNN
Abstract

Chinese word segmentation (CWS) is often regarded as a character-based sequence labeling task in most current works, which have achieved great success with the help of powerful neural networks. However, these works neglect an important clue: Chinese characters incorporate both semantic and phonetic meanings. In this paper, we introduce multiple character embeddings including Pinyin Romanization and Wubi Input, both of which are easily accessible and effective in depicting the semantics of characters. We propose a novel shared Bi-LSTM-CRF model to fuse linguistic features efficiently by sharing the LSTM network during the training procedure. Extensive experiments on five corpora show that extra embeddings help obtain a significant improvement in labeling accuracy. Specifically, we achieve state-of-the-art performance on the AS and CityU corpora with F1 scores of 96.9 and 97.3, respectively, without leveraging any external lexical resources.

URL

http://arxiv.org/abs/1808.04963

PDF

http://arxiv.org/pdf/1808.04963
Read All
Generating Material Maps to Map Informal Settlements

2019-05-30

Patrick Helber, Bradley Gram-Hansen, Indhu Varatharajan, Faiza Azam, Alejandro Coca-Castro, Veronika Kopackova, Piotr Bilinski

arXiv_AI

arXiv_AI Knowledge GAN
Abstract

Detecting and mapping informal settlements encompasses several of the United Nations sustainable development goals. This is because informal settlements are home to the most socially and economically vulnerable people on the planet. Thus, understanding where these settlements are is of paramount importance to both government and non-government organizations (NGOs), such as the United Nations Children’s Fund (UNICEF), who can use this information to deliver effective social and economic aid. We propose a method that detects and maps the locations of informal settlements using only freely available, Sentinel-2 low-resolution satellite spectral data and socio-economic data. This is in contrast to previous studies that only use costly very-high resolution (VHR) satellite and aerial imagery. We show how we can detect informal settlements by combining both domain knowledge and machine learning techniques, to build a classifier that looks for known roofing materials used in informal settlements. Please find additional material at https://frontierdevelopmentlab.github.io/informal-settlements/.

URL

http://arxiv.org/abs/1812.00786

PDF

http://arxiv.org/pdf/1812.00786
Read All
Generalized Separable Nonnegative Matrix Factorization

2019-05-30

Junjun Pan, Nicolas Gillis

arXiv_CV

arXiv_CV Optimization
Abstract

Nonnegative matrix factorization (NMF) is a linear dimensionality reduction technique for nonnegative data with applications such as image analysis, text mining, audio source separation and hyperspectral unmixing. Given a data matrix $M$ and a factorization rank $r$, NMF looks for a nonnegative matrix $W$ with $r$ columns and a nonnegative matrix $H$ with $r$ rows such that $M \approx WH$. NMF is NP-hard to solve in general. However, it can be computed efficiently under the separability assumption which requires that the basis vectors appear as data points, that is, that there exists an index set $\mathcal{K}$ such that $W = M(:,\mathcal{K})$. In this paper, we generalize the separability assumption: We only require that for each rank-one factor $W(:,k)H(k,:)$ for $k=1,2,\dots,r$, either $W(:,k) = M(:,j)$ for some $j$ or $H(k,:) = M(i,:)$ for some $i$. We refer to the corresponding problem as generalized separable NMF (GS-NMF). We discuss some properties of GS-NMF and propose a convex optimization model which we solve using a fast gradient method. We also propose a heuristic algorithm inspired by the successive projection algorithm. To verify the effectiveness of our methods, we compare them with several state-of-the-art separable NMF algorithms on synthetic, document and image data sets.
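
To make the separability assumption concrete, here is a minimal sketch of the classical successive projection algorithm (SPA) that the abstract names as the inspiration for its heuristic. It is plain SPA for ordinary separable NMF, not the GS-NMF method: it repeatedly picks the column with the largest norm and projects all columns onto the orthogonal complement of the pick. The column-list data layout and the toy matrix are assumptions made for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def successive_projection(columns, r):
    """Return the indices of r columns chosen by SPA.

    columns: list of column vectors of the data matrix M (each a list of floats).
    """
    residual = [list(col) for col in columns]
    selected = []
    for _ in range(r):
        norms = [dot(col, col) for col in residual]
        j = max(range(len(residual)), key=norms.__getitem__)
        if norms[j] == 0.0:
            break  # nothing left to extract
        selected.append(j)
        u, u_norm_sq = residual[j], norms[j]
        # Project every column onto the orthogonal complement of u.
        residual = [
            [c - (dot(col, u) / u_norm_sq) * ui for c, ui in zip(col, u)]
            for col in residual
        ]
    return selected


# Columns 1, 3 and 4 are the basis vectors; columns 0 and 2 are convex mixtures.
M_columns = [
    [2.0, 1.5, 0.0],
    [4.0, 0.0, 0.0],
    [4.0 / 3, 1.0, 2.0 / 3],
    [0.0, 3.0, 0.0],
    [0.0, 0.0, 2.0],
]
K = successive_projection(M_columns, r=3)
```

On this toy input the selected index set `K` recovers the columns that play the role of $W = M(:,\mathcal{K})$.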

URL

http://arxiv.org/abs/1905.12995

PDF

http://arxiv.org/pdf/1905.12995
Read All
Expressing Linear Orders Requires Exponential-Size DNNFs

2019-05-30

Ronald de Haan

arXiv_AI

arXiv_AI
Abstract

We show that any DNNF circuit that expresses the set of linear orders over a set of $n$ candidates must be of size $2^{\Omega(n)}$. Moreover, we show that there exist DNNF circuits of size $2^{O(n)}$ expressing linear orders over $n$ candidates.

URL

http://arxiv.org/abs/1807.06397

PDF

http://arxiv.org/pdf/1807.06397
Read All
Data Complexity and Rewritability of Ontology-Mediated Queries in Metric Temporal Logic under the Event-Based Semantics

2019-05-30

Vladislav Ryzhikov, Przemyslaw Andrzej Walega, Michael Zakharyaschev

arXiv_AI

arXiv_AI Ontology
Abstract

We investigate the data complexity of answering queries mediated by metric temporal logic ontologies under the event-based semantics assuming that data instances are finite timed words timestamped with binary fractions. We identify classes of ontology-mediated queries answering which can be done in AC0, NC1, L, NL, P, and coNP for data complexity, provide their rewritings to first-order logic and its extensions with primitive recursion, transitive closure or datalog, and establish lower complexity bounds.

URL

http://arxiv.org/abs/1905.12990

PDF

http://arxiv.org/pdf/1905.12990
Read All
3D Reconstruction of Whole Stomach from Endoscope Video Using Structure-from-Motion

2019-05-30

Aji Resindra Widya, Yusuke Monno, Kosuke Imahori, Masatoshi Okutomi, Sho Suzuki, Takuji Gotoda, Kenji Miki

arXiv_CV

arXiv_CV GAN Face
Abstract

Gastric endoscopy is a common clinical practice that enables medical doctors to diagnose the stomach from inside the body. In order to identify the location of a gastric lesion, such as early gastric cancer, within the stomach, this work addresses reconstructing the 3D shape of the whole stomach, with color texture information, from a standard monocular endoscope video. Previous works have tried to reconstruct the 3D structures of various organs from endoscope images. However, they mainly focused on a partial surface. In this work, we investigated how to enable structure-from-motion (SfM) to reconstruct the whole shape of a stomach from a standard endoscope video. We specifically investigated the combined effect of chromo-endoscopy and color channel selection on SfM. Our study found that 3D reconstruction of the whole stomach can be achieved by using red channel images captured under chromo-endoscopy by spreading indigo carmine (IC) dye on the stomach surface.

URL

http://arxiv.org/abs/1905.12988

PDF

http://arxiv.org/pdf/1905.12988
Read All
Partial Computing Offloading Assisted Cloud Point Registration in Multi-robot SLAM

2019-05-30

Biwei Li, Zhenqiang Mi, Yu Guo, Yang Yang, Mohammad S. Obaidat

arXiv_RO

arXiv_RO SLAM
Abstract

A multi-robot visual simultaneous localization and mapping (SLAM) system normally consists of multiple mobile robots equipped with cameras and/or other visual sensors. The networked robots work independently or cooperatively in an unknown scene in order to solve the autonomous localization and mapping problem. One of the most critical issues in multi-robot visual SLAM is the intensive computation that is normally required yet overwhelming for inexpensive mobile robots with limited on-board resources. To address this problem, a novel task offloading strategy and a dense point cloud map construction method are proposed in this paper. First, we develop a novel strategy to remotely offload computation-intensive tasks to a cloud center, so that tasks that could not originally be completed locally on the resource-limited robot systems become possible. Second, a modified iterative closest point (ICP) algorithm, named the fitness score hierarchical ICP algorithm (FS-HICP), is developed to accelerate point cloud registration. The correctness, efficiency, and scalability of the proposed strategy are evaluated with both theoretical analysis and experimental simulations. The results show that the proposed method can effectively reduce energy consumption while increasing the computation capability and speed of the multi-robot visual SLAM system, especially in indoor environments.
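
As background for the registration step, the following is a minimal sketch of plain point-to-point ICP in 2D, not the FS-HICP variant the abstract proposes. Each iteration pairs every source point with its nearest target point, solves for the least-squares rigid transform in closed form, and stops once a fitness score falls below a tolerance. Defining the fitness score as the mean squared nearest-neighbor distance is an assumption made here, as are all function names and the toy point sets.

```python
import math


def nearest(point, cloud):
    """Brute-force nearest neighbor of point in cloud."""
    return min(cloud, key=lambda q: (q[0] - point[0]) ** 2 + (q[1] - point[1]) ** 2)


def fitness_score(source, target):
    """Mean squared distance from each source point to its nearest target point."""
    total = 0.0
    for p in source:
        q = nearest(p, target)
        total += (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
    return total / len(source)


def best_rigid_transform(src, dst):
    """Least-squares rotation angle and translation mapping paired src onto dst."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay, bx, by = px - sx, py - sy, qx - dx, qy - dy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def apply_transform(points, theta, tx, ty):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def icp(source, target, max_iters=50, tol=1e-12):
    """Iteratively align source to target; returns the moved source points."""
    current = list(source)
    for _ in range(max_iters):
        matches = [nearest(p, target) for p in current]
        theta, tx, ty = best_rigid_transform(current, matches)
        current = apply_transform(current, theta, tx, ty)
        if fitness_score(current, target) < tol:
            break
    return current


source = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
target = apply_transform(source, 0.05, 0.1, -0.05)  # small known misalignment
aligned = icp(source, target)
```

The brute-force nearest-neighbor search is quadratic in the number of points, which hints at why registration of dense clouds is a natural candidate for offloading.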

URL

http://arxiv.org/abs/1905.12973

PDF

http://arxiv.org/pdf/1905.12973
Read All
Quantifying consensus of rankings based on q-support patterns

2019-05-30

Zhengui Xue, Zhiwei Lin, Hui Wang, Sally McClean

arXiv_AI

arXiv_AI Quantitative Relation
Abstract

Rankings, representing preferences over a set of candidates, are widely used in many information systems, e.g., group decision making. It is of great importance to evaluate the consensus of the rankings obtained from multiple agents. There is often no ground truth available for a ranking task. An overall measure of the degree of consensus gives us a clear understanding of the ranking data. Moreover, it could provide a quantitative indicator for consensus comparison between groups and for further improvement of a ranking system. In this paper, a novel consensus quantifying approach, without the need for any correlation or distance functions, is proposed based on the concept of q-support patterns of rankings. The q-support patterns represent the commonality embedded in a set of rankings. A method for detecting outliers in a set of rankings is naturally derived from the proposed consensus quantifying approach. Experimental studies are conducted to demonstrate the effectiveness of the proposed approach.

URL

http://arxiv.org/abs/1905.12966

PDF

http://arxiv.org/pdf/1905.12966
Read All
Handling robot constraints within a Set-Based Multi-Task Priority Inverse Kinematics Framework

2019-05-30

Paolo Di Lillo, Stefano Chiaverini, Gianluca Antonelli

arXiv_RO

arXiv_RO Optimization
Abstract

Set-Based Multi-Task Priority is a recent framework to handle inverse kinematics for redundant structures. Both equality tasks, i.e., control objectives to be driven to a desired value, and set-based tasks, i.e., control objectives to be satisfied within a set/range of values, can be addressed in a rigorous manner within a priority framework. In addition, optimization tasks, driven by the gradient of a proper function, may be considered as well, usually as lower-priority tasks. In this paper, the proper design of the tasks, their priority, and the use of a Set-Based Multi-Task Priority framework are proposed in order to handle several constraints simultaneously in real time. It is shown that safety-related tasks, e.g., joint limits or kinematic singularities, may be properly handled by considering them both at a higher priority as set-based tasks and at a lower priority within a proper optimization functional. Experimental results on a 7DOF Jaco$^2$

URL

http://arxiv.org/abs/1905.12945

PDF

http://arxiv.org/pdf/1905.12945
Read All
A Realtime Autonomous Robot Navigation Framework for Human like High-level Interaction and Task Planning in Global Dynamic Environment

2019-05-30

Sung-Hyeon Joo, Sumaira Manzoor, Yuri Goncalves Rocha, Hyun-Uk Lee, Tae-Yong Kuc

arXiv_RO

arXiv_RO Knowledge SLAM
Abstract

In this paper, we present a framework for real-time autonomous robot navigation based on cloud and on-demand databases to address two major issues, human-like robot interaction and task planning, in a global dynamic environment that is not known a priori. Our framework contributes a human-like brain GPS mapping system for robots using spatial information and performs 3D visual semantic SLAM for independent robot navigation. We accomplish this by separating the robot’s memory system into Long-Term Memory (LTM) and Short-Term Memory (STM). We also form the robot’s behavior and knowledge system by linking these memories to an Autonomous Navigation Module (ANM), a Learning Module (LM), and a Behavior Planner Module (BPM). The proposed framework is assessed through simulation using a ROS-based, Gazebo-simulated mobile robot with an RGB-D camera (3D sensor) and a laser range finder (2D sensor) in a 3D model of a realistic indoor environment. Simulation corroborates the substantial practical merit of our proposed framework.

URL

http://arxiv.org/abs/1905.12942

PDF

http://arxiv.org/pdf/1905.12942
Read All
Learning Compositional Neural Programs with Recursive Tree Search and Planning

2019-05-30

Thomas Pierrot, Guillaume Ligner, Scott Reed, Olivier Sigaud, Nicolas Perrin, Alexandre Laterre, David Kas, Karim Beguir, Nando de Freitas

arXiv_AI

arXiv_AI Sparse Reinforcement_Learning
Abstract

We propose a novel reinforcement learning algorithm, AlphaNPI, that incorporates the strengths of Neural Programmer-Interpreters (NPI) and AlphaZero. NPI contributes structural biases in the form of modularity, hierarchy and recursion, which are helpful to reduce sample complexity, improve generalization and increase interpretability. AlphaZero contributes powerful neural network guided search algorithms, which we augment with recursion. AlphaNPI only assumes a hierarchical program specification with sparse rewards: 1 when the program execution satisfies the specification, and 0 otherwise. Using this specification, AlphaNPI is able to train NPI models effectively with RL for the first time, completely eliminating the need for strong supervision in the form of execution traces. The experiments show that AlphaNPI can sort as well as previous strongly supervised NPI variants. The AlphaNPI agent is also trained on a Tower of Hanoi puzzle with two disks and is shown to generalize to puzzles with an arbitrary number of disk
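
The role of recursion and of the sparse 0/1 reward can be illustrated with a hand-written Tower of Hanoi solver. This is not AlphaNPI and nothing here is learned: the recursive function simply shows why a solution expressed in terms of smaller subproblems carries over to any number of disks, and the reward function shows a specification that only reports success at the end of execution. Both function names and the peg numbering are hypothetical.

```python
def hanoi_moves(n, source=0, target=2, spare=1):
    """Recursively list the (from_peg, to_peg) moves that transfer n disks."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, source, spare, target)
        + [(source, target)]
        + hanoi_moves(n - 1, spare, target, source)
    )


def sparse_reward(n, moves, target=2):
    """Return 1 if the moves are legal and leave all n disks on target, else 0."""
    pegs = [list(range(n, 0, -1)), [], []]  # largest disk at the bottom of peg 0
    for src, dst in moves:
        if not pegs[src]:
            return 0  # tried to move from an empty peg
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return 0  # tried to place a larger disk on a smaller one
        pegs[dst].append(pegs[src].pop())
    return 1 if len(pegs[target]) == n else 0


plan = hanoi_moves(5)
reward = sparse_reward(5, plan)
```

A plan that is legal but incomplete, such as the two-disk plan applied to three disks, earns a reward of 0, which is what makes the learning signal sparse.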

URL

http://arxiv.org/abs/1905.12941

PDF

http://arxiv.org/pdf/1905.12941
Read All
A Hippocampus Model for Online One-Shot Storage of Pattern Sequences

2019-05-30

Jan Melchior, Mehdi Bayati, Amir Azizi, Sen Cheng, Laurenz Wiskott

arXiv_AI

arXiv_AI Knowledge
Abstract

We present a computational model based on the CRISP theory (Content Representation, Intrinsic Sequences, and Pattern completion) of the hippocampus that allows pattern sequences to be stored continuously online in a one-shot fashion. Rather than storing a sequence in CA3, CA3 provides a pre-trained sequence that is hetero-associated with the input sequence, which allows the system to perform one-shot learning. Plasticity on a short time scale therefore only happens in the incoming and outgoing connections of CA3. Stored sequences can later be recalled from a single cue pattern. We identify the pattern separation performed by subregion DG to be necessary for storing sequences that contain correlated patterns. A design principle of the model is that we use a single learning rule named Hebbian-descent to train all parts of the system. Hebbian-descent has an inherent forgetting mechanism that allows the system to continuously memorize new patterns while forgetting early stored ones. The model shows plausible behavior when noisy and new patterns are presented and has a rather high capacity of about 40% in terms of the number of neurons in CA3. One notable property of our model is that it is capable of 'boot-strapping' (improving) itself without external input in a process we refer to as 'dreaming'. Besides artificially generated input sequences, we also show that the model works with sequences of encoded handwritten digits or natural images. To our knowledge, this is the first model of the hippocampus that allows correlated pattern sequences to be stored online in a one-shot fashion without a consolidation process and recalled instantaneously later.

URL

http://arxiv.org/abs/1905.12937

PDF

http://arxiv.org/pdf/1905.12937
Read All
Assistive robot operated via P300-based Brain Computer Interface

2019-05-30

Filippo Arrichiello, Paolo Di Lillo, Daniele Di Vito, Gianluca Antonelli, Stefano Chiaverini

arXiv_RO

arXiv_RO Face
Abstract

In this paper we present an architecture for the operation of an assistive robot ultimately aimed at allowing users with severe motion disabilities to perform manipulation tasks that may help in daily-life operations. The robotic system, based on a lightweight robot manipulator, receives high-level commands from the user through a Brain-Computer Interface based on the P300 paradigm. The motion of the manipulator is controlled by a closed-loop inverse kinematic algorithm that simultaneously manages multiple set-based and equality-based tasks. The software architecture is built on widely used frameworks for operating BCIs and robots (namely, BCI2000 for the operation of the BCI and ROS for the control of the manipulator), integrating control, perception and communication modules developed for the application at hand. Preliminary experiments have been conducted to show the potential of the developed architecture.

URL

http://arxiv.org/abs/1905.12927

PDF

http://arxiv.org/pdf/1905.12927
Read All
Controllable Unsupervised Text Attribute Transfer via Editing Entangled Latent Representation

2019-05-30

Ke Wang, Hang Hua, Xiaojun Wan

arXiv_CL

arXiv_CL Sentiment Optimization
Abstract

Unsupervised text attribute transfer automatically transforms a text to alter a specific attribute (e.g. sentiment) without using any parallel data, while simultaneously preserving its attribute-independent content. The dominant approaches try to model the content-independent attribute separately, e.g., by learning different attributes’ representations or using multiple attribute-specific decoders. However, this may lead to inflexibility when controlling the degree of transfer or transferring over multiple aspects at the same time. To address the above problems, we propose a more flexible unsupervised text attribute transfer framework which replaces the process of modeling the attribute with minimal editing of latent representations based on an attribute classifier. Specifically, we first propose a Transformer-based autoencoder to learn an entangled latent representation for a discrete text, then we transform the attribute transfer task into an optimization problem and propose the Fast-Gradient-Iterative-Modification algorithm to edit the latent representation until it conforms to the target attribute. Extensive experimental results demonstrate that our model achieves very competitive performance on three public data sets. Furthermore, we also show that our model can not only control the degree of transfer freely but also transfer over multiple aspects at the same time.
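
The idea of editing a latent representation with classifier gradients until it conforms to a target attribute can be sketched on a toy problem. The example below is not the paper's Fast-Gradient-Iterative-Modification algorithm: it replaces the Transformer autoencoder and attribute classifier with a hypothetical fixed logistic classifier over a two-dimensional latent vector, and it uses a constant step size, which is an assumption. It only shows the loop structure: measure the classifier output, stop when it is close enough to the target, otherwise step against the loss gradient.

```python
import math


def classify(z, weights, bias):
    """Probability of the positive attribute under a logistic classifier."""
    logit = sum(w * zi for w, zi in zip(weights, z)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def edit_latent(z, target, weights, bias, step=0.5, threshold=0.1, max_steps=100):
    """Move z along the negative loss gradient until the classifier agrees with target."""
    z = list(z)
    for _ in range(max_steps):
        p = classify(z, weights, bias)
        if abs(target - p) < threshold:
            break
        # Gradient of binary cross-entropy with respect to z for a logistic model.
        grad = [(p - target) * w for w in weights]
        z = [zi - step * g for zi, g in zip(z, grad)]
    return z


weights, bias = [1.0, -2.0], 0.0   # hypothetical attribute classifier
z0 = [-1.0, 1.0]                   # latent code classified as negative
z_edited = edit_latent(z0, target=1.0, weights=weights, bias=bias)
p_before = classify(z0, weights, bias)
p_after = classify(z_edited, weights, bias)
```

Loosening or tightening `threshold` changes how far the latent code is pushed, which is one way to picture control over the degree of transfer.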

URL

http://arxiv.org/abs/1905.12926

PDF

http://arxiv.org/pdf/1905.12926
Read All
Bayesian Grasp: Robotic visual stable grasp based on prior tactile knowledge

2019-05-30

Teng Xue, Wenhai Liu, Mingshuo Han, Zhenyu Pan, Jin Ma, Quanquan Shao, Weiming Wang

arXiv_RO

arXiv_RO Knowledge Detection
Abstract

Robotic grasp detection is a fundamental capability for intelligent manipulation in unstructured environments. Previous work mainly employed visual and tactile fusion to achieve a stable grasp, but the whole process depends heavily on regrasping, which wastes much time on regulation and evaluation. We propose a novel way to improve robotic grasping: by using learned tactile knowledge, a robot can achieve a stable grasp from an image. First, we construct a prior tactile knowledge learning framework with a novel grasp quality metric, which is determined by measuring the grasp's resistance to external perturbations. Second, we propose a multi-phase Bayesian Grasp architecture to generate stable grasp configurations from a single RGB image based on prior tactile knowledge. Results show that this framework can classify the outcome of grasps with an average accuracy of 86% on known objects and 79% on novel objects. The prior tactile knowledge improves the success rate by 55% over traditional vision-based strategies.

URL

http://arxiv.org/abs/1905.12920

PDF

http://arxiv.org/pdf/1905.12920
Read All
A Survey Of Cross-lingual Word Embedding Models

2019-05-30

Sebastian Ruder, Ivan Vulić, Anders Søgaard

arXiv_CL

arXiv_CL Survey Embedding Optimization
Abstract

Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models for low-resource languages. In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions. The recurring theme of the survey is that many of the models presented in the literature optimize for the same objectives, and that seemingly different models are often equivalent modulo optimization strategies, hyper-parameters, and such. We also discuss the different ways cross-lingual word embeddings are evaluated, as well as future challenges and research horizons.

URL

http://arxiv.org/abs/1706.04902

PDF

http://arxiv.org/pdf/1706.04902
Read All
Learning Semantics-aware Distance Map with Semantics Layering Network for Amodal Instance Segmentation

2019-05-30

Ziheng Zhang, Anpei Chen, Ling Xie, Jingyi Yu, Shenghua Gao

arXiv_CV

arXiv_CV Segmentation CNN
Abstract

In this work, we demonstrate yet another approach to tackle the amodal segmentation problem. Specifically, we first introduce a new representation, namely a semantics-aware distance map (sem-dist map), to serve as our target for amodal segmentation instead of the commonly used masks and heatmaps. The sem-dist map is a kind of level-set representation in which the different regions of an object are placed into different levels on the map according to their visibility. It is a natural extension of masks and heatmaps, in which modal and amodal segmentation, as well as depth order information, are all well described. We then introduce a novel convolutional neural network (CNN) architecture, which we refer to as the semantic layering network, to estimate sem-dist maps layer by layer, from the global level to the instance level, for all objects in an image. Extensive experiments on the COCOA and D2SA datasets demonstrate that our framework can predict amodal segmentation, occlusion and depth order with state-of-the-art performance.

URL

http://arxiv.org/abs/1905.12898

PDF

http://arxiv.org/pdf/1905.12898
Read All
A Compare-Aggregate Model with Latent Clustering for Answer Selection

2019-05-30

Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Kyomin Jung

arXiv_AI

arXiv_AI QA Transfer_Learning Language_Model
Abstract

In this paper, we propose a novel method for the sentence-level answer-selection task, which is one of the fundamental problems in natural language processing. First, we explore the effect of additional information by adopting a pretrained language model to compute the vector representation of the input text and by applying transfer learning from a large-scale corpus. Second, we enhance the compare-aggregate model by proposing a novel latent clustering method to compute additional information within the target corpus and by changing the objective function from listwise to pointwise. To evaluate the performance of the proposed approaches, experiments are performed with the WikiQA and TREC-QA datasets. The empirical results demonstrate the superiority of our proposed approach, which achieves state-of-the-art performance on both datasets.
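
The difference between a listwise and a pointwise objective can be made concrete with one common formulation of each. The sketch below is an assumption about the general shape of the two losses, not the exact objective functions used in the paper: the listwise loss compares the softmax over all candidate scores for a question with the normalized label distribution, while the pointwise loss scores each candidate independently with a sigmoid and binary cross-entropy.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_loss(scores, labels):
    """KL divergence from normalized labels to softmax(scores) over one candidate list.

    Assumes at least one candidate is labeled correct.
    """
    probs = softmax(scores)
    total = sum(labels)
    targets = [y / total for y in labels]
    return sum(t * math.log(t / p) for t, p in zip(targets, probs) if t > 0)


def pointwise_loss(scores, labels):
    """Mean binary cross-entropy, treating every candidate independently."""
    loss = 0.0
    for s, y in zip(scores, labels):
        p = 1.0 / (1.0 + math.exp(-s))
        loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return loss / len(scores)


labels = [1, 0, 0]                      # first candidate is the correct answer
good_scores = [2.0, 0.0, -1.0]
bad_scores = [-1.0, 0.0, 2.0]
listwise_gap = listwise_loss(bad_scores, labels) - listwise_loss(good_scores, labels)
pointwise_gap = pointwise_loss(bad_scores, labels) - pointwise_loss(good_scores, labels)
```

A practical difference is that the listwise form needs all candidates of a question together, whereas the pointwise form can be computed one question-answer pair at a time.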

URL

http://arxiv.org/abs/1905.12897

PDF

http://arxiv.org/pdf/1905.12897
Read All
Imitation Learning as $f$-Divergence Minimization

2019-05-30

Liyiming Ke, Matt Barnes, Wen Sun, Gilwoo Lee, Sanjiban Choudhury, Siddhartha Srinivasa

arXiv_RO

arXiv_RO
Abstract

We address the problem of imitation learning with multi-modal demonstrations. Instead of attempting to learn all modes, we argue that in many tasks it is sufficient to imitate any one of them. We show that the state-of-the-art methods such as GAIL and behavior cloning, due to their choice of loss function, often incorrectly interpolate between such modes. Our key insight is to minimize the right divergence between the learner and the expert state-action distributions, namely the reverse KL divergence or I-projection. We propose a general imitation learning framework for estimating and minimizing any f-Divergence. By plugging in different divergences, we are able to recover existing algorithms such as Behavior Cloning (Kullback-Leibler), GAIL (Jensen Shannon) and Dagger (Total Variation). Empirical results show that our approximate I-projection technique is able to imitate multi-modal behaviors more reliably than GAIL and behavior cloning.
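
The divergences named in the abstract are all instances of the $f$-divergence $D_f(P\,\|\,Q) = \sum_x q(x)\, f\!\left(p(x)/q(x)\right)$ for a convex generator $f$ with $f(1) = 0$. The sketch below evaluates this sum for discrete distributions with strictly positive probabilities. The generator table uses standard textbook forms; which of the learner and expert distributions plays the role of $P$ is not specified here, and the toy distributions are made up.

```python
import math

# Generator functions f(u); each satisfies f(1) = 0.
GENERATORS = {
    "kl": lambda u: u * math.log(u),                      # KL(P || Q)
    "reverse_kl": lambda u: -math.log(u),                 # KL(Q || P)
    "jensen_shannon": lambda u: 0.5 * (
        u * math.log(u) - (u + 1) * math.log((u + 1) / 2)
    ),
    "total_variation": lambda u: 0.5 * abs(u - 1),
}


def f_divergence(p, q, name):
    """D_f(P || Q) for discrete distributions with strictly positive entries."""
    f = GENERATORS[name]
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
values = {name: f_divergence(p, q, name) for name in GENERATORS}
```

Swapping the generator is the only change needed to move between divergences, which mirrors the plug-in view described in the abstract.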

URL

http://arxiv.org/abs/1905.12888

PDF

http://arxiv.org/pdf/1905.12888
Read All
Does computer vision matter for action?

2019-05-30

Brady Zhou, Philipp Krähenbühl, Vladlen Koltun

arXiv_AI

arXiv_AI
Abstract

Computer vision produces representations of scene content. Much computer vision research is predicated on the assumption that these intermediate representations are useful for action. Recent work at the intersection of machine learning and robotics calls this assumption into question by training sensorimotor systems directly for the task at hand, from pixels to actions, with no explicit intermediate representations. Thus the central question of our work: Does computer vision matter for action? We probe this question and its offshoots via immersive simulation, which allows us to conduct controlled reproducible experiments at scale. We instrument immersive three-dimensional environments to simulate challenges such as urban driving, off-road trail traversal, and battle. Our main finding is that computer vision does matter. Models equipped with intermediate representations train faster, achieve higher task performance, and generalize better to previously unseen environments. A video that summarizes the work and illustrates the results can be found at https://youtu.be/4MfWa2yZ0Jc

URL

http://arxiv.org/abs/1905.12887

PDF

http://arxiv.org/pdf/1905.12887
Read All
iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images

2019-05-30

Syed Waqas Zamir, Aditya Arora, Akshita Gupta, Salman Khan, Guolei Sun, Fahad Shahbaz Khan, Fan Zhu, Ling Shao, Gui-Song Xia, Xiang Bai

arXiv_CV

arXiv_CV Object_Detection Segmentation Semantic_Segmentation Detection
Abstract

Existing Earth Vision datasets are either suitable for semantic segmentation or object detection. In this work, we introduce the first benchmark dataset for instance segmentation in aerial imagery that combines instance-level object detection and pixel-level segmentation tasks. In comparison to instance segmentation in natural scenes, aerial images present unique challenges, e.g., a huge number of instances per image, large object-scale variations and abundant tiny objects. Our large-scale and densely annotated Instance Segmentation in Aerial Images Dataset (iSAID) comes with 655,451 object instances for 15 categories across 2,806 high-resolution images. Such precise per-pixel annotations for each instance ensure accurate localization that is essential for detailed scene analysis. Compared to existing small-scale aerial image based instance segmentation datasets, iSAID contains 15$\times$ the number of object categories and 5$\times$ the number of instances. We benchmark our dataset using two popular instance segmentation approaches for natural images, namely Mask R-CNN and PANet. In our experiments we show that direct application of off-the-shelf Mask R-CNN and PANet to aerial images provides suboptimal instance segmentation results, thus requiring specialized solutions from the research community.

URL

http://arxiv.org/abs/1905.12886

PDF

http://arxiv.org/pdf/1905.12886
M-GWAP: An Online and Multimodal Game With A Purpose in WordPress for Mental States Annotation

2019-05-30

Fabio Paolizzo

arXiv_CL

arXiv_CL
Abstract

M-GWAP is a multimodal game with a purpose that leverages the wisdom-of-crowds phenomenon for the annotation of multimedia data in terms of mental states. This game with a purpose is developed in WordPress to allow users to implement the game without programming skills. The game adopts motivational strategies to keep the player engaged, such as a score system, text motivators while playing, a ranking system to foster competition and mechanics for identity building. The current version of the game was deployed after alpha and beta testing helped refine the game.

URL

http://arxiv.org/abs/1905.12884

PDF

http://arxiv.org/pdf/1905.12884
P3SGD: Patient Privacy Preserving SGD for Regularizing Deep CNNs in Pathological Image Classification

2019-05-30

Bingzhe Wu, Shiwan Zhao, Guangyu Sun, Xiaolu Zhang, Zhong Su, Caihong Zeng, Zhihong Liu

arXiv_CV

arXiv_CV CNN Image_Classification Classification Quantitative Gradient_Descent
Abstract

Recently, deep convolutional neural networks (CNNs) have achieved great success in pathological image classification. However, due to the limited number of labeled pathological images, there are still two challenges to be addressed: (1) overfitting: the performance of a CNN model is undermined by overfitting due to its huge number of parameters and the insufficiency of labeled training data; (2) privacy leakage: a model trained using a conventional method may involuntarily reveal private information of the patients in the training dataset. The smaller the dataset, the worse the privacy leakage. To tackle these two challenges, we introduce a novel stochastic gradient descent (SGD) scheme, named patient privacy preserving SGD (P3SGD), which performs the SGD model update at the patient level via a large-step update built upon each patient’s data. Specifically, to protect privacy and regularize the CNN model, we propose to inject well-designed noise into the updates. Moreover, we equip P3SGD with an elaborate strategy to adaptively control the scale of the injected noise. To validate the effectiveness of P3SGD, we perform extensive experiments on a real-world clinical dataset and quantitatively demonstrate the superior ability of P3SGD in reducing the risk of overfitting. We also provide a rigorous analysis of the privacy cost under differential privacy. Additionally, we find that models trained with P3SGD are more resistant to the model-inversion attack than those trained using non-private SGD.
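
The patient-level update described above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy linear least-squares model in place of a CNN, norm clipping of each patient's update, and fixed-scale Gaussian noise; all of these are common differential-privacy ingredients that the abstract does not confirm. The adaptive noise-scale strategy is not modeled, and every function and parameter name is hypothetical.

```python
import random


def local_update(weights, records, lr=0.1, steps=5):
    """Run a few SGD passes on ONE patient's records and return the weight delta.

    Toy stand-in for the 'large-step update built upon each patient's data':
    a linear model trained with squared error (assumption, not from the paper).
    """
    w = list(weights)
    for _ in range(steps):
        for x, y in records:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return [wi - w0 for wi, w0 in zip(w, weights)]


def clip(delta, max_norm):
    """Scale delta so its L2 norm is at most max_norm (assumed, standard DP step)."""
    norm = sum(d * d for d in delta) ** 0.5
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [d * scale for d in delta]


def patient_level_step(weights, patients, max_norm=1.0, noise_std=0.1, rng=None):
    """One update: average clipped per-patient deltas, then add Gaussian noise."""
    rng = rng or random.Random(0)
    deltas = [clip(local_update(weights, recs), max_norm) for recs in patients]
    n = len(deltas)
    avg = [sum(d[i] for d in deltas) / n for i in range(len(weights))]
    # Fixed noise scale here; the paper describes an adaptive scale instead.
    noisy = [a + rng.gauss(0.0, noise_std * max_norm / n) for a in avg]
    return [w + d for w, d in zip(weights, noisy)]


# Two hypothetical patients, each with (features, label) records where y = 2 * x.
patients = [[([1.0], 2.0)], [([0.5], 1.0), ([1.5], 3.0)]]
weights = patient_level_step([0.0], patients)
```

The point of the design is that each patient contributes exactly one bounded update per step, so the added noise masks the presence or absence of a whole patient rather than of a single image.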

URL

http://arxiv.org/abs/1905.12883

PDF

http://arxiv.org/pdf/1905.12883
Using Restart Heuristics to Improve Agent Performance in Angry Birds

2019-05-30

Tommy Liu, Jochen Renz, Peng Zhang, Matthew Stephenson

arXiv_AI

arXiv_AI
Abstract

Over the past few years the Angry Birds AI competition has been held in an attempt to develop intelligent agents that can successfully and efficiently solve levels of the video game Angry Birds. Many different agents and strategies have been developed to solve the complex and challenging physical reasoning problems associated with such a game. However, none of these agents attempts one of the key strategies that humans employ to solve Angry Birds levels: restarting levels. Restarting is important in Angry Birds because sometimes the level is no longer solvable, or a given shot contributes little or nothing towards the ultimate goal of the game. This paper proposes a framework and experimental evaluation for when to restart levels in Angry Birds. We demonstrate that restarting is a viable strategy to improve agent performance in many cases.

URL

http://arxiv.org/abs/1905.12877

PDF

http://arxiv.org/pdf/1905.12877
