We consider the feedback design for stabilizing a rigid body system by making and breaking multiple contacts with the environment without prespecifying the timing or the number of occurrence of the contacts. We model such a system as a discrete-time hybrid polynomial system, where the state-input space is partitioned into several polytopic regions with each region associated with a different polynomial dynamics equation. Based on the notion of occupation measures, we present a novel controller synthesis approach that solves finite-dimensional semidefinite programs as approximations to an infinite-dimensional linear program to stabilize the system. The optimization formulation is simple and convex, and for any fixed degree of approximations the computational complexity is polynomial in the state and control input dimensions. We illustrate our approach on some robotics examples.
http://arxiv.org/abs/1809.06715
Like medicine, psychology, or education, data science is fundamentally an applied discipline, with most students who receive advanced degrees in the field going on to work on practical problems. Unlike these disciplines, however, data science education remains heavily focused on theory and methods, and practical coursework typically revolves around cleaned or simplified data sets that have little analog in professional applications. We believe that the environment in which new data scientists are trained should more accurately reflect that in which they will eventually practice and propose here a data science master’s degree program that takes inspiration from the residency model used in medicine. Students in the suggested program would spend three years working on a practical problem with an industry, government, or nonprofit partner, supplemented with coursework in data science methods and theory. We also discuss how this program can also be implemented in shorter formats to augment existing professional masters programs in different disciplines. This approach to learning by doing is designed to fill gaps in our current approach to data science education and ensure that students develop the skills they need to practice data science in a professional context and under the many constraints imposed by that context.
http://arxiv.org/abs/1905.06875
Various power saving and contrast enhancement (PSCE) techniques have been applied to an organic light emitting diode (OLED) display for reducing the power demands of the display while preserving the image quality. In this paper, we propose a new deep learning-based PSCE scheme that can save power consumed by the OLED display while enhancing the contrast of the displayed image. In the proposed method, the power consumption is saved by simply reducing the brightness a certain ratio, whereas the perceived visual quality is preserved as much as possible by enhancing the contrast of the image using a convolutional neural network (CNN). Furthermore, our CNN can learn the PSCE technique without a reference image by unsupervised learning. Experimental results show that the proposed method is superior to conventional ones in terms of image quality assessment metrics such as a visual saliency-induced index (VSI) and a measure of enhancement (EME).
http://arxiv.org/abs/1905.05916
To bridge the gap between Machine Reading Comprehension (MRC) models and human beings, which is mainly reflected in the hunger for data and the robustness to noise, in this paper, we explore how to integrate the neural networks of MRC models with the general knowledge of human beings. On the one hand, we propose a data enrichment method, which uses WordNet to extract inter-word semantic connections as general knowledge from each given passage-question pair. On the other hand, we propose a new MRC model named as Knowledge Aided Reader (KAR), which explicitly utilizes the above extracted general knowledge in its attention mechanisms. Based on the data enrichment method, KAR is comparable in performance with the state-of-the-art MRC models and significantly more robust to noise than them. Besides, when only a subset (20% - 80%) of the training examples are available, KAR outperforms the state-of-the-art MRC models by a large margin and is still fairly robust to noise.
http://arxiv.org/abs/1809.03449
In this paper, we propose a \textit{weak supervision} framework for neural ranking tasks based on the data programming paradigm \citep{Ratner2016}, which enables us to leverage multiple weak supervision signals from different sources. Empirically, we consider two sources of weak supervision signals, unsupervised ranking functions and semantic feature similarities. We train a BERT-based passage-ranking model (which achieves new state-of-the-art performances on two benchmark datasets with full supervision) in our weak supervision framework. Without using ground-truth training labels, BERT-PR models outperform BM25 baseline by a large margin on all three datasets and even beat the previous state-of-the-art results with full supervision on two of the datasets.
http://arxiv.org/abs/1905.05910
Branch-and-bound (BnB) algorithms are widely used to solve combinatorial problems, and the performance crucially depends on its branching heuristic.In this work, we consider a typical problem of maximum common subgraph (MCS), and propose a branching heuristic inspired from reinforcement learning with a goal of reaching a tree leaf as early as possible to greatly reduce the search tree size.Extensive experiments show that our method is beneficial and outperforms current best BnB algorithm for the MCS.
http://arxiv.org/abs/1905.05840
One of the hallmarks of human intelligence is the ability to compose learned knowledge into novel concepts which can be recognized without a single training example. In contrast, current state-of-the-art methods require hundreds of training examples for each possible category to build reliable and accurate classifiers. To alleviate this striking difference in efficiency, we propose a task-driven modular architecture for compositional reasoning and sample efficient learning. Our architecture consists of a set of neural network modules, which are small fully connected layers operating in semantic concept space. These modules are configured through a gating function conditioned on the task to produce features representing the compatibility between the input image and the concept under consideration. This enables us to express tasks as a combination of sub-tasks and to generalize to unseen categories by reweighting a set of small modules. Furthermore, the network can be trained efficiently as it is fully differentiable and its modules operate on small sub-spaces. We focus our study on the problem of compositional zero-shot classification of object-attribute categories. We show in our experiments that current evaluation metrics are flawed as they only consider unseen object-attribute pairs. When extending the evaluation to the generalized setting which accounts also for pairs seen during training, we discover that naive baseline methods perform similarly or better than current approaches. However, our modular network is able to outperform all existing approaches on two widely-used benchmark datasets.
http://arxiv.org/abs/1905.05908
Speech-driven visual speech synthesis involves mapping features extracted from acoustic speech to the corresponding lip animation controls for a face model. This mapping can take many forms, but a powerful approach is to use deep neural networks (DNNs). However, a limitation is the lack of synchronized audio, video, and depth data required to reliably train the DNNs, especially for speaker-independent models. In this paper, we investigate adapting an automatic speech recognition (ASR) acoustic model (AM) for the visual speech synthesis problem. We train the AM on ten thousand hours of audio-only data. The AM is then adapted to the visual speech synthesis domain using ninety hours of synchronized audio-visual speech. Using a subjective assessment test, we compared the performance of the AM-initialized DNN to one with a random initialization. The results show that viewers significantly prefer animations generated from the AM-initialized DNN than the ones generated using the randomly initialized model. We conclude that visual speech synthesis can significantly benefit from the powerful representation of speech in the ASR acoustic models.
http://arxiv.org/abs/1905.06860
In most machine learning training paradigms a fixed, often handcrafted, loss function is assumed to be a good proxy for an underlying evaluation metric. In this work we assess this assumption by meta-learning an adaptive loss function to directly optimize the evaluation metric. We propose a sample efficient reinforcement learning approach for adapting the loss dynamically during training. We empirically show how this formulation improves performance by simultaneously optimizing the evaluation metric and smoothing the loss landscape. We verify our method in metric learning and classification scenarios, showing considerable improvements over the state-of-the-art on a diverse set of tasks. Importantly, our method is applicable to a wide range of loss functions and evaluation metrics. Furthermore, the learned policies are transferable across tasks and data, demonstrating the versatility of the method.
http://arxiv.org/abs/1905.05895
Crowd density estimation is an important task for crowd monitoring. Many efforts have been done to automate the process of estimating crowd density from images and videos. Despite series of efforts, it remains a challenging task. In this paper, we proposes a new texture feature-based approach for the estimation of crowd density based on Completed Local Binary Pattern (CLBP). We first divide the image into blocks and then re-divide the blocks into cells. For each cell, we compute CLBP and then concatenate them to describe the texture of the corresponding block. We then train a multi-class Support Vector Machine (SVM) classifier, which classifies each block of image into one of four categories, i.e. Very Low, Low, Medium, and High. We evaluate our technique on the PETS 2009 dataset, and from the experiments, we show to achieve 95% accuracy for the proposed descriptor. We also compare other state-of-the-art texture descriptors and from the experimental results, we show that our proposed method outperforms other state-of-the-art methods.
http://arxiv.org/abs/1905.05891
In this paper, we propose a Deep Active Ray Network (DARNet) for automatic building segmentation. Taking an image as input, it first exploits a deep convolutional neural network (CNN) as the backbone to predict energy maps, which are further utilized to construct an energy function. A polygon-based contour is then evolved via minimizing the energy function, of which the minimum defines the final segmentation. Instead of parameterizing the contour using Euclidean coordinates, we adopt polar coordinates, i.e., rays, which not only prevents self-intersection but also simplifies the design of the energy function. Moreover, we propose a loss function that directly encourages the contours to match building boundaries. Our DARNet is trained end-to-end by back-propagating through the energy minimization and the backbone CNN, which makes the CNN adapt to the dynamics of the contour evolution. Experiments on three building instance segmentation datasets demonstrate our DARNet achieves either state-of-the-art or comparable performances to other competitors.
http://arxiv.org/abs/1905.05889
We present a novel application of robust control and online learning for the balancing of a n Degree of Freedom (DoF), Wheeled Inverted Pendulum (WIP) humanoid robot. Our technique condenses the inaccuracies of a mass model into a Center of Mass (CoM) error, balances despite this error, and uses online learning to update the mass model for a better CoM estimate. Using a simulated model of our robot, we meta-learn a set of excitory joint poses that makes our gradient descent algorithm quickly converge to an accurate (CoM) estimate. This simulated pipeline executes in a fully online fashion, using active disturbance rejection to address the mass errors that result from a steadily evolving mass model. Experiments were performed on a 19 DoF WIP, in which we manually acquired the data for the learned set of poses and show that the mass model produced by a gradient descent produces a CoM estimate that improves overall control and efficiency. This work contributes to a greater corpus of whole body control on the Golem Krang humanoid robot.
http://arxiv.org/abs/1810.03076
We introduce the Chronicle Challenge as an optional addition to the Settlement Generation Challenge in Minecraft. One of the foci of the overall competition is adaptive procedural content generation (PCG), an arguably under-explored problem in computational creativity. In the base challenge, participants must generate new settlements that respond to and ideally interact with existing content in the world, such as the landscape or climate. The goal is to understand the underlying creative process, and to design better PCG systems. The Chronicle Challenge in particular focuses on the generation of a narrative based on the history of a generated settlement, expressed in natural language. We discuss the unique features of the Chronicle Challenge in comparison to other competitions, clarify the characteristics of a chronicle eligible for submission and describe the evaluation criteria. We furthermore draw on simulation-based approaches in computational storytelling as examples to how this challenge could be approached.
http://arxiv.org/abs/1905.05888
We propose a novel procedure which adds “content-addressability” to any given unconditional implicit model e.g., a generative adversarial network (GAN). The procedure allows users to control the generative process by specifying a set (arbitrary size) of desired examples based on which similar samples are generated from the model. The proposed approach, based on kernel mean matching, is applicable to any generative models which transform latent vectors to samples, and does not require retraining of the model. Experiments on various high-dimensional image generation problems (CelebA-HQ, LSUN bedroom, bridge, tower) show that our approach is able to generate images which are consistent with the input set, while retaining the image quality of the original model. To our knowledge, this is the first work that attempts to construct, at test time, a content-addressable generative model from a trained marginal model.
http://arxiv.org/abs/1905.05882
Methods that move towards less supervised scenarios are key for image segmentation, as dense labels demand significant human intervention. Generally, the annotation burden is mitigated by labeling datasets with weaker forms of supervision, e.g. image-level labels or bounding boxes. Another option are semi-supervised settings, that commonly leverage a few strong annotations and a huge number of unlabeled/weakly-labeled data. In this paper, we revisit semi-supervised segmentation schemes and narrow down significantly the annotation budget (in terms of total labeling time of the training set) compared to previous approaches. With a very simple pipeline, we demonstrate that at low annotation budgets, semi-supervised methods outperform by a wide margin weakly-supervised ones for both semantic and instance segmentation. Our approach also outperforms previous semi-supervised works at a much reduced labeling cost. We present results for the Pascal VOC benchmark and unify weakly and semi-supervised approaches by considering the total annotation budget, thus allowing a fairer comparison between methods.
http://arxiv.org/abs/1905.05880
Non-parallel many-to-many voice conversion, as well as zero-shot voice conversion, remain under-explored areas. Deep style transfer algorithms, such as generative adversarial networks (GAN) and conditional variational autoencoder (CVAE), are being applied as new solutions in this field. However, GAN training is sophisticated and difficult, and there is no strong evidence that its generated speech is of good perceptual quality. On the other hands, CVAE training is simple but does not come with the distribution-matching property as in GAN. In this paper, we propose a new style transfer scheme that involves only an autoencoder with a carefully designed bottleneck. We formally show that this scheme can achieve distribution-matching style transfer by training only on a self-reconstruction loss. Based on this scheme, we proposed AUTOVC, which achieves state-of-the-art results in many-to-many voice conversion with non-parallel data, and which is the first to perform zero-shot voice conversion.
http://arxiv.org/abs/1905.05879
Communication of follow-up recommendations when abnormalities are identified on imaging studies is prone to error. In this paper, we present a natural language processing approach based on deep learning to automatically identify clinically important recommendations in radiology reports. Our approach first identifies the recommendation sentences and then extracts reason, test, and time frame of the identified recommendations. To train our extraction models, we created a corpus of 567 radiology reports annotated for recommendation information. Our extraction models achieved 0.92 f-score for recommendation sentence, 0.65 f-score for reason, 0.73 f-score for test, and 0.84 f-score for time frame. We applied the extraction models to a set of over 3.3 million radiology reports and analyzed the adherence of follow-up recommendations.
http://arxiv.org/abs/1905.05877
We address a problem of estimating pose of a person’s head from its RGB image. The employment of CNNs for the problem has contributed to significant improvement in accuracy in recent works. However, we show that the following two methods, despite their simplicity, can attain further improvement: (i) proper adjustment of the margin of bounding box of a detected face, and (ii) choice of loss functions. We show that the integration of these two methods achieve the new state-of-the-art on standard benchmark datasets for in-the-wild head pose estimation.
http://arxiv.org/abs/1905.08609
Deep neural networks have achieved remarkable success in challenging tasks. However, the black-box approach of training and testing of such networks is not acceptable to critical applications. In particular, the existence of adversarial examples and their overgeneralization to irrelevant inputs makes it difficult, if not impossible, to explain decisions by commonly used neural networks. In this paper, we analyze the underlying mechanism of generalization of deep neural networks and propose an ($n$, $k$) consensus algorithm to be insensitive to adversarial examples and at the same time be able to reject irrelevant samples. Furthermore, the consensus algorithm is able to improve classification accuracy by using multiple trained deep neural networks. To handle the complexity of deep neural networks, we cluster linear approximations and use cluster means to capture feature importance. Due to weight symmetry, a small number of clusters are sufficient to produce a robust interpretation. Experimental results on a health dataset show the effectiveness of our algorithm in enhancing the prediction accuracy and interpretability of deep neural network models on one-year patient mortality prediction.
http://arxiv.org/abs/1905.05849
An accurate and timely detection of diseases and pests in rice plants can help farmers in applying timely treatment on the plants and thereby can reduce the economic losses substantially. Recent developments in deep learning based convolutional neural networks (CNN) have greatly improved image classification accuracy. In this paper, we present deep learning based approaches to detect diseases and pests in rice plants using images captured in real life scenario. We have experimented with various state-of-the-art CNN architectures on our large dataset of rice diseases and pests collected manually from the field, which contain both inter-class and intra-class variations and have nine classes in total. The results show that we can effectively detect and recognize rice diseases and pests using CNN with the best accuracy of 99.53% on test set using CNN architecture, VGG16. Though the accuracy of CNN models built on VGG16 or other similar architectures is impressive, these models are not suitable for mobile devices due to their large size having a huge number of parameters. To solve this problem, we propose a new CNN architecture, namely stacked CNN, that exploits two stage training to reduce the size of the model significantly while at the same time maintaining high classification accuracy. Our experimental results show that we achieve 95% test accuracy with stacked CNN, while reducing the model size by 98% compared to VGG16. This kind of memory efficient CNN architectures can contribute in rice disease detection and identification based mobile application development.
http://arxiv.org/abs/1812.01043
Visual localization, i.e., determining the position and orientation of a vehicle with respect to a map, is a key problem in autonomous driving. We present a multicamera visual inertial localization algorithm for large scale environments. To efficiently and effectively match features against a pre-built global 3D map, we propose a prioritized feature matching scheme for multi-camera systems. In contrast to existing works, designed for monocular cameras, we (1) tailor the prioritization function to the multi-camera setup and (2) run feature matching and pose estimation in parallel. This significantly accelerates the matching and pose estimation stages and allows us to dynamically adapt the matching efforts based on the surrounding environment. In addition, we show how pose priors can be integrated into the localization system to increase efficiency and robustness. Finally, we extend our algorithm by fusing the absolute pose estimates with motion estimates from a multi-camera visual inertial odometry pipeline (VIO). This results in a system that provides reliable and drift-less pose estimation. Extensive experiments show that our localization runs fast and robust under varying conditions, and that our extended algorithm enables reliable real-time pose estimation.
http://arxiv.org/abs/1809.06445
Motivated by the advances in 3D sensing technology and the spreading of low-cost robotic platforms, 3D object reconstruction has become a common task in many areas. Nevertheless, the selection of the optimal sensor pose that maximizes the reconstructed surface is a problem that remains open. It is known in the literature as the next-best-view planning problem. In this paper, we propose a novel next-best-view planning scheme based on supervised deep learning. The scheme contains an algorithm for automatic generation of datasets and an original three-dimensional convolutional neural network (3D-CNN) used to learn the next-best-view. Unlike previous work where the problem is addressed as a search, the trained 3D-CNN directly predicts the sensor pose. We present a comparison of the proposed network against a similar net, and we present several experiments of the reconstruction of unknown objects validating the effectiveness of the proposed scheme.
http://arxiv.org/abs/1905.05833
Predicting future clinical events helps physicians guide appropriate intervention. Machine learning has tremendous promise to assist physicians with predictions based on the discovery of complex patterns from historical data, such as large, longitudinal electronic health records (EHR). This study is a first attempt to demonstrate such capabilities using raw echocardiographic videos of the heart. We show that a large dataset of 723,754 clinically-acquired echocardiographic videos (~45 million images) linked to longitudinal follow-up data in 27,028 patients can be used to train a deep neural network to predict 1-year mortality with good accuracy (area under the curve (AUC) in an independent test set = 0.839). Prediction accuracy was further improved by adding EHR data (AUC = 0.858). Finally, we demonstrate that the trained neural network was more accurate in mortality prediction than two expert cardiologists. These results highlight the potential of neural networks to add new power to clinical predictions.
http://arxiv.org/abs/1811.10553
It is widely accepted that optimization of imaging system performance should be guided by task-based measures of image quality (IQ). It has been advocated that imaging hardware or data-acquisition designs should be optimized by use of an ideal observer (IO) that exploits full statistical knowledge of the measurement noise and class of objects to be imaged, without consideration of the reconstruction method. In practice, accurate and tractable models of the complete object statistics are often difficult to determine. Moreover, in imaging systems that employ compressive sensing concepts, imaging hardware and sparse image reconstruction are innately coupled technologies. In this work, a sparsity-driven observer (SDO) that can be employed to optimize hardware by use of a stochastic object model describing object sparsity is described and investigated. The SDO and sparse reconstruction method can therefore be “matched” in the sense that they both utilize the same statistical information regarding the class of objects to be imaged. To efficiently compute the SDO test statistic, computational tools developed recently for variational Bayesian inference with sparse linear models are adopted. The use of the SDO to rank data-acquisition designs in a stylized example as motivated by magnetic resonance imaging (MRI) is demonstrated. This study reveals that the SDO can produce rankings that are consistent with visual assessments of the reconstructed images but different from those produced by use of the traditionally employed Hotelling observer (HO).
http://arxiv.org/abs/1905.05820
Automatically generating accurate summaries from clinical reports could save a clinician’s time, improve summary coverage, and reduce errors. We propose a sequence-to-sequence abstractive summarization model augmented with domain-specific ontological information to enhance content selection and summary generation. We apply our method to a dataset of radiology reports and show that it significantly outperforms the current state-of-the-art on this task in terms of rouge scores. Extensive human evaluation conducted by a radiologist further indicates that this approach yields summaries that are less likely to omit important details, without sacrificing readability or accuracy.
http://arxiv.org/abs/1905.05818
We introduce a curriculum learning approach to adapt generic neural machine translation models to a specific domain. Samples are grouped by their similarities to the domain of interest and each group is fed to the training algorithm with a particular schedule. This approach is simple to implement on top of any neural framework or architecture, and consistently outperforms both unadapted and adapted baselines in experiments with two distinct domains and two language pairs.
http://arxiv.org/abs/1905.05816
Distributed data-parallel algorithms aim to accelerate the training of deep neural networks by parallelizing the computation of large mini-batch gradient updates across multiple nodes. Approaches that synchronize nodes using exact distributed averaging (e.g., via AllReduce) are sensitive to stragglers and communication delays. The PushSum gossip algorithm is robust to these issues, but only performs approximate distributed averaging. This paper studies Stochastic Gradient Push (SGP), which combines PushSum with stochastic gradient updates. We prove that SGP converges to a stationary point of smooth, non-convex objectives at the same sub-linear rate as SGD, and that all nodes achieve consensus. We empirically validate the performance of SGP on image classification (ResNet-50, ImageNet) and machine translation (Transformer, WMT’16 En-De) workloads. Our code will be made publicly available.
http://arxiv.org/abs/1811.10792
Related tasks often have inter-dependence on each other and perform better when solved in a joint framework. In this paper, we present a deep multi-task learning framework that jointly performs sentiment and emotion analysis both. The multi-modal inputs (i.e., text, acoustic and visual frames) of a video convey diverse and distinctive information, and usually do not have equal contribution in the decision making. We propose a context-level inter-modal attention framework for simultaneously predicting the sentiment and expressed emotions of an utterance. We evaluate our proposed approach on CMU-MOSEI dataset for multi-modal sentiment and emotion analysis. Evaluation results suggest that multi-task learning framework offers improvement over the single-task framework. The proposed approach reports new state-of-the-art performance for both sentiment analysis and emotion analysis.
http://arxiv.org/abs/1905.05812
In recent years, state-of-the-art game-playing agents often involve policies that are trained in self-playing processes where Monte Carlo tree search (MCTS) algorithms and trained policies iteratively improve each other. The strongest results have been obtained when policies are trained to mimic the search behaviour of MCTS by minimising a cross-entropy loss. Because MCTS, by design, includes an element of exploration, policies trained in this manner are also likely to exhibit a similar extent of exploration. In this paper, we are interested in learning policies for a project with future goals including the extraction of interpretable strategies, rather than state-of-the-art game-playing performance. For these goals, we argue that such an extent of exploration is undesirable, and we propose a novel objective function for training policies that are not exploratory. We derive a policy gradient expression for maximising this objective function, which can be estimated using MCTS value estimates, rather than MCTS visit counts. We empirically evaluate various properties of resulting policies, in a variety of board games.
http://arxiv.org/abs/1905.05809
Spherical convolutional networks have been introduced recently as tools to learn powerful feature representations of 3D shapes. Spherical CNNs are equivariant to 3D rotations making them ideally suited to applications where 3D data may be observed in arbitrary orientations. In this paper we learn 2D image embeddings with a similar equivariant structure: embedding the image of a 3D object should commute with rotations of the object. We introduce a cross-domain embedding from 2D images into a spherical CNN latent space. This embedding encodes images with 3D shape properties and is equivariant to 3D rotations of the observed object. The model is supervised only by target embeddings obtained from a spherical CNN pretrained for 3D shape classification. We show that learning a rich embedding for images with appropriate geometric structure is sufficient for tackling varied applications, such as relative pose estimation and novel view synthesis, without requiring additional task-specific supervision.
http://arxiv.org/abs/1812.02716
Melanoma is one of the ten most common cancers in the US. Early detection is crucial for survival, but often the cancer is diagnosed in the fatal stage. Deep learning has the potential to improve cancer detection rates, but its applicability to melanoma detection is compromised by the limitations of the available skin lesion databases, which are small, heavily imbalanced, and contain images with occlusions. We build deep-learning-based tools for data purification and augmentation to counter-act these limitations. The developed tools can be utilized in a deep learning system for lesion classification and we show how to build such a system. The system heavily relies on the processing unit for removing image occlusions and the data generation unit, based on generative adversarial networks, for populating scarce lesion classes, or equivalently creating virtual patients with pre-defined types of lesions. We empirically verify our approach and show that incorporating these two units into melanoma detection system results in the superior performance over common baselines.
http://arxiv.org/abs/1902.06061
We define two algorithms for propagating information in classification problems with pairwise relationships. The algorithms are based on contraction maps and are related to non-linear diffusion and random walks on graphs. The approach is also related to message passing algorithms, including belief propagation and mean field methods. The algorithms we describe are guaranteed to converge on graphs with arbitrary topology. Moreover they always converge to a unique fixed point, independent of initialization. We prove that the fixed points of the algorithms under consideration define lower-bounds on the energy function and the max-marginals of a Markov random field. The theoretical results also illustrate a relationship between message passing algorithms and value iteration for an infinite horizon Markov decision process. We illustrate the practical application of the algorithms under study with numerical experiments in image restoration, stereo depth estimation and binary classification on a grid.
http://arxiv.org/abs/1505.06072
Recent work establishes dataset difficulty and removes annotation artifacts via partial-input baselines (e.g., hypothesis-only or image-only models). While the success of a partial-input baseline indicates a dataset is cheatable, our work cautions the converse is not necessarily true. Using artificial datasets, we illustrate how the failure of a partial-input baseline might shadow more trivial patterns that are only visible in the full input. We also identify such artifacts in real natural language inference datasets. Our work provides an alternative view on the use of partial-input baselines in future dataset creation.
http://arxiv.org/abs/1905.05778
Deep neural networks provide best-in-class performance for a number of computer vision problems. However, training these networks is computationally intensive and requires fine-tuning various hyperparameters. In addition, performance swings widely as the network converges making it hard to decide when to stop training. In this paper, we introduce a trio of techniques (PSWA, PWALKS, and PSWM) centered around periodic sampling of model weights that provide consistent and more robust convergence on a variety of vision problems (classification, detection, segmentation) and gradient update methods (vanilla SGD, Momentum, Adam) with marginal additional computation time. Our techniques use existing optimal training policies but converge in a less volatile fashion with performance improvements that are approximately monotonic. Our analysis of the loss surface shows that these techniques also produce minima that are deeper and wider than those found by SGD.
http://arxiv.org/abs/1905.05774
We present two novel solutions for multi-view 3D human pose estimation based on new learnable triangulation methods that combine 3D information from multiple 2D views. The first (baseline) solution is a basic differentiable algebraic triangulation with an addition of confidence weights estimated from the input images. The second solution is based on a novel method of volumetric aggregation from intermediate 2D backbone feature maps. The aggregated volume is then refined via 3D convolutions that produce final 3D joint heatmaps and allow modelling a human pose prior. Crucially, both approaches are end-to-end differentiable, which allows us to directly optimize the target metric. We demonstrate transferability of the solutions across datasets and considerably improve the multi-view state of the art on the Human3.6M dataset. Video demonstration, annotations and additional materials will be posted on our project page (this https URL).
https://arxiv.org/abs/1905.05754
Quantum algorithms have the potential to be very powerful. However, to exploit quantum parallelism, some quantum algorithms require an embedding of large classical data into quantum states. This embedding can cost a lot of resources, for instance by implementing quantum random-access memory (QRAM). An important instance of this is in quantum-enhanced machine learning algorithms. We propose a new way of circumventing this requirement by using a classical-quantum hybrid architecture where the input data can remain classical, which differs from other hybrid models. We apply this to a fundamental computational problem called Boolean oracle identification, which offers a useful primitive for quantum machine learning algorithms. Its aim is to identify an unknown oracle amongst a list of candidates while minimising the number of queries to the oracle. In our scheme, we replace the classical oracle with our hybrid oracle. We demonstrate both theoretically and numerically that the success rates of the oracle query can be improved in the presence of noise and also enables us to explore a larger search space. This also makes the model suitable for realisation in the current era of noisy intermediate-scale quantum (NISQ) devices. Furthermore, we can show our scheme can lead to a reduction in the learning sample complexity. This means that for certain sizes of learning samples, our classical-quantum hybrid learner can complete the learning task faithfully whereas a classical learner cannot.
https://arxiv.org/abs/1905.05751
The calibration of a reservoir model with observed transient data of fluid pressures and rates is a key task in obtaining a predictive model of the flow and transport behaviour of the earth’s subsurface. The model calibration task, commonly referred to as “history matching”, can be formalised as an ill-posed inverse problem where we aim to find the underlying spatial distribution of petrophysical properties that explain the observed dynamic data. We use a generative adversarial network pretrained on geostatistical object-based models to represent the distribution of rock properties for a synthetic model of a hydrocarbon reservoir. The dynamic behaviour of the reservoir fluids is modelled using a transient two-phase incompressible Darcy formulation. We invert for the underlying reservoir properties by first modeling property distributions using the pre-trained generative model then using the adjoint equations of the forward problem to perform gradient descent on the latent variables that control the output of the generative model. In addition to the dynamic observation data, we include well rock-type constraints by introducing an additional objective function. Our contribution shows that for a synthetic test case, we are able to obtain solutions to the inverse problem by optimising in the latent variable space of a deep generative model, given a set of transient observations of a non-linear forward problem.
https://arxiv.org/abs/1905.05749
We propose a novel Bayesian nonparametric method to learn translation-invariant relationships on non-Euclidean domains. The resulting graph convolutional Gaussian processes can be applied to problems in machine learning for which the input observations are functions with domains on general graphs. The structure of these models allows for high dimensional inputs while retaining expressibility, as is the case with convolutional neural networks. We present applications of graph convolutional Gaussian processes to images and triangular meshes, demonstrating their versatility and effectiveness, comparing favorably to existing methods, despite being relatively simple models.
https://arxiv.org/abs/1905.05739
This paper introduces a new framework for open-domain question answering in which the retriever and the reader iteratively interact with each other. The framework is agnostic to the architecture of the machine reading model, only requiring access to the token-level hidden representations of the reader. The retriever uses fast nearest neighbor search to scale to corpora containing millions of paragraphs. A gated recurrent unit updates the query at each step conditioned on the state of the reader and the reformulated query is used to re-rank the paragraphs by the retriever. We conduct analysis and show that iterative interaction helps in retrieving informative paragraphs from the corpus. Finally, we show that our multi-step-reasoning framework brings consistent improvement when applied to two widely used reader architectures DrQA and BiDAF on various large open-domain datasets — TriviaQA-unfiltered, QuasarT, SearchQA, and SQuAD-Open.
https://arxiv.org/abs/1905.05733
We explore models for translating abstract musical ideas (scores, rhythms) into expressive performances using Seq2Seq and recurrent Variational Information Bottleneck (VIB) models. Though Seq2Seq models usually require painstakingly aligned corpora, we show that it is possible to adapt an approach from the Generative Adversarial Network (GAN) literature (e.g. Pix2Pix (Isola et al., 2017) and Vid2Vid (Wang et al. 2018a)) to sequences, creating large volumes of paired data by performing simple transformations and training generative models to plausibly invert these transformations. Music, and drumming in particular, provides a strong test case for this approach because many common transformations (quantization, removing voices) have clear semantics, and models for learning to invert them have real-world applications. Focusing on the case of drum set players, we create and release a new dataset for this purpose, containing over 13 hours of recordings by professional drummers aligned with fine-grained timing and dynamics information. We also explore some of the creative potential of these models, including demonstrating improvements on state-of-the-art methods for Humanization (instantiating a performance from a musical score).
http://arxiv.org/abs/1905.06118
The options framework in reinforcement learning models the notion of a skill or a temporally extended sequence of actions. The discovery of a reusable set of skills has typically entailed building options, that navigate to bottleneck states. This work adopts a complementary approach, where we attempt to discover options that navigate to landmark states. These states are prototypical representatives of well-connected regions and can hence access the associated region with relative ease. In this work, we propose Successor Options, which leverages Successor Representations to build a model of the state space. The intra-option policies are learnt using a novel pseudo-reward and the model scales to high-dimensional spaces easily. Additionally, we also propose an Incremental Successor Options model that iterates between constructing Successor Representations and building options, which is useful when robust Successor Representations cannot be built solely from primitive actions. We demonstrate the efficacy of our approach on a collection of grid-worlds, and on the high-dimensional robotic control environment of Fetch.
https://arxiv.org/abs/1905.05731
Learning effective representations of sentences is one of the core missions of natural language understanding. Existing models either train on a vast amount of text, or require costly, manually curated sentence relation datasets. We show that with dependency parsing and rule-based rubrics, we can curate a high quality sentence relation task by leveraging explicit discourse relations. We show that our curated dataset provides an excellent signal for learning vector representations of sentence meaning, representing relations that can only be determined when the meanings of two sentences are combined. We demonstrate that the automatically curated corpus allows a bidirectional LSTM sentence encoder to yield high quality sentence embeddings and can serve as a supervised fine-tuning dataset for larger models such as BERT. We evaluate our sentence embeddings on a variety of transfer tasks, including SentEval. We achieve state-of-the-art result on Penn Discourse Treebank implicit relation prediction task.
http://arxiv.org/abs/1710.04334
Bayesian optimization (BO) is an effective method of finding the global optima of black-box functions. Recently BO has been applied to neural architecture search and shows better performance than pure evolutionary strategies. All these methods adopt Gaussian processes (GPs) as surrogate function, with the handcraft similarity metrics as input. In this work, we propose a Bayesian graph neural network as a new surrogate, which can automatically extract features from deep neural architectures, and use such learned features to fit and characterize black-box objectives and their uncertainty. Based on the new surrogate, we then develop a graph Bayesian optimization framework to address the challenging task of deep neural architecture search. Experiment results show our method significantly outperforms the comparative methods on benchmark tasks.
https://arxiv.org/abs/1905.06159
Automated Planning is one of the main research field of Artificial Intelligence since its beginnings. Research in Automated Planning aims at developing general reasoners (i.e., planners) capable of automatically solve complex problems. Broadly speaking, planners rely on a general model characterizing the possible states of the world and the actions that can be performed in order to change the status of the world. Given a model and an initial known state, the objective of a planner is to synthesize a set of actions needed to achieve a particular goal state. The classical approach to planning roughly corresponds to the description given above. The timeline-based approach is a particular planning paradigm capable of integrating causal and temporal reasoning within a unified solving process. This approach has been successfully applied in many real-world scenarios although a common interpretation of the related planning concepts is missing. Indeed, there are significant differences among the existing frameworks that apply this technique. Each framework relies on its own interpretation of timeline-based planning and therefore it is not easy to compare these systems. Thus, the objective of this work is to investigate the timeline-based approach to planning by addressing several aspects ranging from the semantics of the related planning concepts to the modeling and solving techniques. Specifically, the main contributions of this PhD work consist of: (i) the proposal of a formal characterization of the timeline-based approach capable of dealing with temporal uncertainty; (ii) the proposal of a hierarchical modeling and solving approach; (iii) the development of a general purpose framework for planning and execution with timelines; (iv) the validation†of this approach in real-world manufacturing scenarios.
https://arxiv.org/abs/1905.05713
Spaced repetition is among the most studied learning strategies in the cognitive science literature. It consists in temporally distributing exposure to an information so as to improve long-term memorization. Providing students with an adaptive and personalized distributed practice schedule would benefit more than just a generic scheduler. However, the applicability of such adaptive schedulers seems to be limited to pure memorization, e.g. flashcards or foreign language learning. In this article, we first frame the research problem of optimizing an adaptive and personalized spaced repetition scheduler when memorization concerns the application of underlying multiple skills. To this end, we choose to rely on a student model for inferring knowledge state and memory dynamics on any skill or combination of skills. We argue that no knowledge tracing model takes both memory decay and multiple skill tagging into account for predicting student performance. As a consequence, we propose a new student learning and forgetting model suited to our research problem: DAS3H builds on the additive factor models and includes a representation of the temporal distribution of past practice on the skills involved by an item. In particular, DAS3H allows the learning and forgetting curves to differ from one skill to another. Finally, we provide empirical evidence on three real-world educational datasets that DAS3H outperforms other state-of-the-art EDM models. These results suggest that incorporating both item-skill relationships and forgetting effect improves over student models that consider one or the other.
http://arxiv.org/abs/1905.06873
Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks. However, these methods are also data-inefficient, afflicted with high variance gradient estimates, and frequently get stuck in local optima. This work addresses these weaknesses by combining recent improvements in the reuse of off-policy data and exploration in parameter space with deterministic behavioral policies. The resulting objective is amenable to standard neural network optimization strategies like stochastic gradient descent or stochastic gradient Hamiltonian Monte Carlo. Incorporation of previous rollouts via importance sampling greatly improves data-efficiency, whilst stochastic optimization schemes facilitate the escape from local optima. We evaluate the proposed approach on a series of continuous control benchmark tasks. The results show that the proposed algorithm is able to successfully and reliably learn solutions using fewer system interactions than standard policy gradient methods.
https://arxiv.org/abs/1905.05710
Sequence-to-sequence models are a powerful workhorse of NLP. Most variants employ a softmax transformation in both their attention mechanism and output layer, leading to dense alignments and strictly positive output probabilities. This density is wasteful, making models less interpretable and assigning probability mass to many implausible outputs. In this paper, we propose sparse sequence-to-sequence models, rooted in a new family of $\alpha$-entmax transformations, which includes softmax and sparsemax as particular cases, and is sparse for any $\alpha > 1$. We provide fast algorithms to evaluate these transformations and their gradients, which scale well for large vocabulary sizes. Our models are able to produce sparse alignments and to assign nonzero probability to a short list of plausible outputs, sometimes rendering beam search exact. Experiments on morphological inflection and machine translation reveal consistent gains over dense models.
https://arxiv.org/abs/1905.05702
To improve the training efficiency of hierarchical recurrent models without compromising their performance, we propose a strategy named as `the lower the simpler’, which is to simplify the baseline models by making the lower layers simpler than the upper layers. We carry out this strategy to simplify two typical hierarchical recurrent models, namely Hierarchical Recurrent Encoder-Decoder (HRED) and R-NET, whose basic building block is GRU. Specifically, we propose Scalar Gated Unit (SGU), which is a simplified variant of GRU, and use it to replace the GRUs at the middle layers of HRED and R-NET. Besides, we also use Fixed-size Ordinally-Forgetting Encoding (FOFE), which is an efficient encoding method without any trainable parameter, to replace the GRUs at the bottom layers of HRED and R-NET. The experimental results show that the simplified HRED and the simplified R-NET contain significantly less trainable parameters, consume significantly less training time, and achieve slightly better performance than their baseline models.
http://arxiv.org/abs/1809.02790
We propose an efficient neural framework for sentence-level discourse analysis in accordance with Rhetorical Structure Theory (RST). Our framework comprises a discourse segmenter to identify the elementary discourse units (EDU) in a text, and a discourse parser that constructs a discourse tree in a top-down fashion. Both the segmenter and the parser are based on Pointer Networks and operate in linear time. Our segmenter yields an $F_1$ score of 95.4, and our parser achieves an $F_1$ score of 81.7 on the aggregated labeled (relation) metric, surpassing previous approaches by a good margin and approaching human agreement on both tasks (98.3 and 83.0 $F_1$).
https://arxiv.org/abs/1905.05682
In this article, we tackle the issue of the limited quantity of manually sense annotated corpora for the task of word sense disambiguation, by exploiting the semantic relationships between senses such as synonymy, hypernymy and hyponymy, in order to compress the sense vocabulary of Princeton WordNet, and thus reduce the number of different sense tags that must be observed to disambiguate all words of the lexical database. We propose two different methods that greatly reduces the size of neural WSD models, with the benefit of improving their coverage without additional training data, and without impacting their precision. In addition to our method, we present a new WSD system which relies on pre-trained BERT word vectors in order to achieve results that significantly outperform the state of the art on all WSD evaluation tasks.
https://arxiv.org/abs/1905.05677