Optimizing compilers, as well as other translator systems, often work by rewriting expressions according to equivalence-preserving rules. Given an input expression and its optimized form, finding the sequence of rules that were applied is a non-trivial task. Most of the time, these tools provide no proof of any kind that the original expression and its optimized form are equivalent. In this work, we propose to reconstruct proofs of equivalence of simple mathematical expressions, after the fact, by finding paths of equivalence-preserving transformations between expressions. We find those sequences of transformations using a search algorithm guided by a neural-network heuristic. Using a Tree-LSTM recursive neural network, we learn a distributed representation of expressions in which the Manhattan distance between vectors approximately corresponds to the rewrite distance between expressions. We then show how the neural network can be used efficiently to search for transformation paths, leading to substantial gains in speed compared to an uninformed exhaustive search. In one of our experiments, our neural-network-guided search algorithm solves more instances with a 2-second timeout per instance than breadth-first search does with a 5-minute timeout per instance.
http://arxiv.org/abs/1902.02194
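As a concrete illustration of the idea in the abstract above (arXiv:1902.02194), the sketch below uses the Manhattan distance between learned expression embeddings as the heuristic of a greedy best-first search over rewrites. It is a minimal sketch, not the authors' implementation: embed() and rewrites() are hypothetical stand-ins for the Tree-LSTM encoder and the rewrite-rule engine, and expressions are assumed to be hashable, comparable objects such as strings.

    import heapq
    import numpy as np

    def manhattan(u, v):
        # The embeddings are trained so that this roughly tracks rewrite distance.
        return float(np.abs(u - v).sum())

    def guided_search(start, goal, embed, rewrites, max_expansions=10000):
        """Greedy best-first search from `start` to `goal`.
        embed(expr) -> np.ndarray       (hypothetical Tree-LSTM encoder)
        rewrites(expr) -> iterable of (rule_name, new_expr)  (hypothetical rule engine)"""
        goal_vec = embed(goal)
        frontier = [(manhattan(embed(start), goal_vec), start, [])]
        seen = {start}
        for _ in range(max_expansions):
            if not frontier:
                break
            _, expr, path = heapq.heappop(frontier)
            if expr == goal:
                return path                      # sequence of applied rules = the proof
            for rule, nxt in rewrites(expr):
                if nxt not in seen:
                    seen.add(nxt)
                    h = manhattan(embed(nxt), goal_vec)
                    heapq.heappush(frontier, (h, nxt, path + [rule]))
        return None                              # no proof found within the budget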
The transfer of knowledge from one policy to another is an important tool in Deep Reinforcement Learning. This process, referred to as distillation, has been used to great success, for example, by enhancing the optimisation of agents, leading to stronger performance faster on harder domains [26, 32, 5, 8]. Despite the widespread use and conceptual simplicity of distillation, many different formulations are used in practice, and the subtle variations between them can drastically change the performance and the resulting objective that is being optimised. In this work, we rigorously explore the entire landscape of policy distillation, comparing the motivations and strengths of each variant through theoretical and empirical analysis. Our results point to three distillation techniques that are preferred depending on the specifics of the task. In particular, a newly proposed expected entropy regularised distillation allows for quicker learning in a wide range of situations, while still guaranteeing convergence.
http://arxiv.org/abs/1902.02186
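For readers unfamiliar with the objectives being compared above, the sketch below shows one common form of policy distillation: the KL divergence between teacher and student action distributions, plus an entropy bonus on the student. This is a generic illustration of the family of losses, not necessarily the paper's "expected entropy regularised" variant; the batch shapes and coefficient are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def distill_loss(student_logits, teacher_logits, entropy_coef=0.01):
        """Generic policy-distillation objective over a batch of states:
        KL(teacher || student) minus an entropy bonus on the student policy."""
        t_logp = F.log_softmax(teacher_logits, dim=-1)
        s_logp = F.log_softmax(student_logits, dim=-1)
        t_p = t_logp.exp()
        kl = (t_p * (t_logp - s_logp)).sum(dim=-1)          # KL(teacher || student)
        entropy = -(s_logp.exp() * s_logp).sum(dim=-1)      # student policy entropy
        return (kl - entropy_coef * entropy).mean()

    # Usage with random logits standing in for the two networks' outputs.
    student_logits = torch.randn(32, 6, requires_grad=True)
    teacher_logits = torch.randn(32, 6)
    loss = distill_loss(student_logits, teacher_logits)
    loss.backward()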
This paper presents a novel method, MaskMVS, to solve depth estimation for unstructured multi-view image-pose pairs. In the plane-sweep procedure, the depth planes are sampled by histogram matching that ensures covering the depth range of interest. Unlike other plane-sweep methods, we do not rely on a cost metric to explicitly build the cost volume, but instead infer a multiplane mask representation which regularizes the learning. Compared to many previous approaches, we show that our method is lightweight and generalizes well without requiring excessive training. We outperform the current state-of-the-art and show results on the sun3d, scenes11, MVS, and RGBD test data sets.
http://arxiv.org/abs/1902.02166
Human action recognition refers to automatically recognizing human actions from a video clip. In reality, there often exist multiple human actions in a video stream. Such a video stream is often weakly annotated with a set of relevant human action labels at a global level rather than assigning each label to a specific video episode corresponding to a single action, which leads to a multi-label learning problem. Furthermore, there are many meaningful human actions in reality, but it would be extremely difficult to collect/annotate video clips covering all of the various human actions, which leads to a zero-shot learning scenario. To the best of our knowledge, no previous work has addressed all of the above issues together in human action recognition. In this paper, we formulate a real-world human action recognition task as a multi-label zero-shot learning problem and propose a framework to tackle this problem in a holistic way. Our framework holistically tackles the issue of unknown temporal boundaries between different actions for multi-label learning and exploits side information regarding the semantic relationship between different human actions for knowledge transfer. Consequently, our framework leads to a joint latent ranking embedding for multi-label zero-shot human action recognition. A novel neural architecture of two component models and an alternate learning algorithm are proposed to carry out the joint latent ranking embedding learning. Thus, multi-label zero-shot recognition is done by measuring relatedness scores of action labels to a test video clip in the joint latent visual and semantic embedding spaces. We evaluate our framework with different settings, including a novel data split scheme designed especially for evaluating multi-label zero-shot learning, on two datasets: Breakfast and Charades. The experimental results demonstrate the effectiveness of our framework.
http://arxiv.org/abs/1709.05107
Anatomical landmark segmentation and pathology localization are important steps in the automated analysis of medical images. They are particularly challenging when the anatomy or pathology is small, as in retinal images and cardiac MRI, or when the image is of low quality due to device acquisition parameters, as in magnetic resonance (MR) scanners. We propose an image super-resolution method using progressive generative adversarial networks (P-GAN) that takes a low-resolution image as input and generates a high-resolution image at a desired scaling factor. The super-resolved images can be used for more accurate detection of landmarks and pathology. Our primary contribution is a multistage model in which the output image quality of one stage is progressively improved in the next stage by using a triplet loss function. The triplet loss enables stepwise image quality improvement by using the output of the previous stage as the baseline. This facilitates the generation of super-resolved images at a high scaling factor while maintaining good image quality. Experimental results for image super-resolution show that our proposed multistage P-GAN outperforms competing methods and the baseline GAN.
http://arxiv.org/abs/1902.02144
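The sketch below shows one way to instantiate the stage-wise triplet loss described above: the ground-truth high-resolution image acts as the anchor, the current stage's output as the positive, and the previous stage's output as the negative/baseline, so each stage must move closer to the target than the stage before it. The pixel-space L2 distance and margin are illustrative assumptions, not the paper's exact configuration.

    import torch
    import torch.nn.functional as F

    def stage_triplet_loss(hr_target, current_out, previous_out, margin=0.3):
        """Triplet loss with the ground truth as anchor, the current stage's
        output as positive, and the previous stage's output as negative, so each
        stage must improve on the baseline set by the one before it."""
        d_pos = F.mse_loss(current_out, hr_target, reduction='none').flatten(1).mean(1)
        d_neg = F.mse_loss(previous_out, hr_target, reduction='none').flatten(1).mean(1)
        return F.relu(d_pos - d_neg + margin).mean()

    hr = torch.rand(4, 3, 128, 128)
    prev = torch.rand(4, 3, 128, 128)
    curr = torch.rand(4, 3, 128, 128, requires_grad=True)
    stage_triplet_loss(hr, curr, prev).backward()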
The fuzzy quantification model FA has been identified as one of the best-behaved quantification models in several reviews of the field of fuzzy quantification. To our knowledge, this model is the only one fulfilling the strict Determiner Fuzzification Scheme axiomatic framework that does not induce the standard min and max operators. The main contribution of this paper is the proof of a convergence result that links this quantification model with Zadeh's model when the size of the input sets tends to infinity. The convergence proof is, in any case, more general than the convergence to Zadeh's model, being applicable to any quantitative quantifier. In addition, recent review papers have raised doubts about the existence of suitable computational implementations to evaluate the FA model in practical applications. To show that this model is not only a theoretical approach, we present exact algorithmic solutions for the most common linguistic quantifiers as well as an approximate implementation by means of Monte Carlo sampling. Additionally, we give a general overview of the main properties fulfilled by the FA model, as a single compendium integrating the whole set of properties it fulfills has not been previously published.
http://arxiv.org/abs/1902.02132
Fresh-water reservoirs are one of the main power resources of Pakistan. These reservoirs take the form of the Tarbela, Mangla, Bhasha, and Warsak dams. To estimate the current power capability of the dams, the statistical information about the water in each dam has to be clear and precise. For water management purposes, a monthly or yearly survey of the dams is required. One important parameter is the water level, which helps in finding the pressure and flow of water in the dams. Existing surveying systems have several problems: they are risky, prone to measurement errors, and sometimes expensive. Our project aims to overcome these flaws and to develop a more economical, safe, and accurate system for measuring the depth of dams and ponds. The key purpose of our project, the Autonomous Surveying Boat, is to have it log water depths along a predefined set of points. The boat floats in water along a predefined path, obtaining its coordinates from a GPS sensor, while its direction is controlled using a magnetometer sensor. It stores its data on an SD card as a text file for later reading. The boat can also be used to find the average capacity of a dam. The average depth is calculated from the measured depth values at different set points of the dam. The actual length of the dam is determined using the magnetometer. Repeated surveys over time can help in estimating the silting ratio of dams. For square dams, the length and width of the dam are measured along with the average depth, and from these three parameters the average capacity of the dam can be estimated. The boat is scalable for further modification if needed.
http://arxiv.org/abs/1408.6271
In this work, we train fully convolutional networks to detect anger in speech. Since training these deep architectures requires large amounts of data and the size of emotion datasets is relatively small, we use transfer learning. However, unlike previous approaches that use speech or emotion-based tasks for the source model, we instead use SoundNet, a fully convolutional neural network trained multimodally on a massive video dataset to classify audio, with ground-truth labels provided by vision-based classifiers. As a result of transfer learning from SoundNet, our trained anger detection model improves performance and generalizes well on a variety of acted, elicited, and natural emotional speech datasets. We also test the cross-lingual effectiveness of our model by evaluating our English-trained model on Mandarin Chinese speech emotion data. Furthermore, our proposed system has low latency suitable for real-time applications, only requiring 1.2 seconds of audio to make a reliable classification.
http://arxiv.org/abs/1902.02120
This paper deals with the application of Compressive Sensing to the face recognition problem. Compressive Sensing is a new approach in signal processing with the single goal of recovering a signal from a small set of available samples. Compressive Sensing finds its use in many real applications, as it lowers memory demand and acquisition time, and therefore allows huge data to be handled in the fastest manner. In this paper, the undersampled signal is recovered using an algorithm based on Total Variation minimization. The theory is verified with experimental results using different percentages of signal samples.
http://arxiv.org/abs/1902.05388
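The sketch below illustrates the flavour of Total Variation based recovery mentioned above: reconstructing an image from a subset of its pixels by gradient descent on a smoothed TV objective, with the observed samples re-imposed after every step. It is a generic, simplified sketch (crude boundary handling, fixed step size), not the authors' algorithm.

    import numpy as np

    def tv_recover(y, mask, n_iters=300, step=0.2, eps=1e-8):
        """Recover an image from the pixels selected by boolean `mask`, using
        projected gradient descent on a smoothed total-variation objective."""
        x = np.where(mask, y, y[mask].mean()).astype(float)
        for _ in range(n_iters):
            dx = np.diff(x, axis=1, append=x[:, -1:])     # forward differences
            dy = np.diff(x, axis=0, append=x[-1:, :])
            mag = np.sqrt(dx ** 2 + dy ** 2 + eps)
            px, py = dx / mag, dy / mag
            # negative TV gradient = divergence of the normalised gradient field
            div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
            x = x + step * div
            x[mask] = y[mask]                             # enforce the observed samples
        return x

    # Toy usage: keep ~30% of the pixels of a random smooth image and recover it.
    rng = np.random.default_rng(0)
    img = np.cumsum(np.cumsum(rng.normal(size=(64, 64)), 0), 1)
    keep = rng.random(img.shape) > 0.7
    rec = tv_recover(np.where(keep, img, 0.0), keep)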
Deep generative models are universal tools for learning data distributions on high dimensional data spaces via a mapping to lower dimensional latent spaces. We provide a study of latent space geometries and extend and build upon previous results on Riemannian metrics. We show how a class of heuristic measures gives more flexibility in finding meaningful, problem-specific distances, and how it can be applied to diverse generator types such as autoregressive generators commonly used in e.g. language and other sequence modeling. We further demonstrate how a diffusion-inspired transformation previously studied in cartography can be used to smooth out latent spaces, stretching them according to a chosen measure. In addition to providing more meaningful distances directly in latent space, this also provides a unique tool for novel kinds of data visualizations. We believe that the proposed methods can be a valuable tool for studying the structure of latent spaces and learned data distributions of generative models.
http://arxiv.org/abs/1902.02113
With the introduction of the variational autoencoder (VAE), probabilistic latent variable models have received renewed attention as powerful generative models. However, their performance in terms of test likelihood and quality of generated samples has been surpassed by autoregressive models without stochastic units. Furthermore, flow-based models have recently been shown to be an attractive alternative that scales well to high-dimensional data. In this paper we close the performance gap by constructing VAE models that can effectively utilize a deep hierarchy of stochastic variables and model complex covariance structures. We introduce the Bidirectional-Inference Variational Autoencoder (BIVA), characterized by a skip-connected generative model and an inference network formed by a bidirectional stochastic inference path. We show that BIVA reaches state-of-the-art test likelihoods, generates sharp and coherent natural images, and uses the hierarchy of latent variables to capture different aspects of the data distribution. We observe that BIVA, in contrast to recent results, can be used for anomaly detection. We attribute this to the hierarchy of latent variables which is able to extract high-level semantic features. Finally, we extend BIVA to semi-supervised classification tasks and show that it performs comparably to state-of-the-art results by generative adversarial networks.
http://arxiv.org/abs/1902.02102
We present a Deep Learning based system for the twin tasks of localization and obstacle avoidance essential to any mobile robot. Our system learns from conventional geometric SLAM, and outputs, using a single camera, the topological pose of the camera in an environment, and the depth map of obstacles around it. We use a CNN to localize in a topological map, and a conditional VAE to output depth for a camera image, conditional on this topological location estimation. We demonstrate the effectiveness of our monocular localization and depth estimation system on simulated and real datasets.
http://arxiv.org/abs/1902.02086
Understanding the physics of small bodies such as asteroids, comets, and planetary moons will help us understand the formation of the solar system and also provide us with resources for a future space economy. For these reasons, missions to small bodies are actively being pursued. However, the surfaces of small bodies contain unpredictable and interesting features such as craters, dust, and granular matter, which need to be observed carefully before a lander mission is even considered. This presents the need for a surveillance spacecraft to observe the surfaces of small bodies where these features exist. While small-body exploration has traditionally been performed by a large monolithic spacecraft, a group of small, low-cost spacecraft can enhance the observational value of the mission. Such a spacecraft swarm has the advantage of providing longer observation time and is also tolerant to single-point failures. In order to optimize spacecraft swarm mission design, we proposed the Integrated Design Engineering & Automation of Swarms (IDEAS) software, which will serve as an end-to-end tool for theoretical swarm mission design. The current work focuses on developing the Automated Swarm Designer module of the IDEAS software by extending its capabilities for exploring surface features on small bodies, with emphasis on the attitude behaviors of the spacecraft in the swarm. We begin by classifying spacecraft swarms into five classes based on the level of coordination. In the current work, we design Class 2 swarms, whose spacecraft operate in a decentralized fashion but coordinate for communication. We demonstrate the Class 2 swarm in two different configurations, based on the roles of the participating spacecraft.
http://arxiv.org/abs/1902.02084
This work considers the problem of fingerprint image recognition when pixels are missing from the original image. The possibility of recovering the missing pixels is tested by applying the Compressive Sensing approach. Namely, different percentages of missing pixels are considered, and the image reconstruction is done by applying a commonly used approach for sparse image reconstruction. The theory is verified by experiments showing successful image reconstruction and subsequent person identification even when up to 90% of the image pixels are missing.
http://arxiv.org/abs/1902.05389
Exploration of Mars has been made possible by a series of landers, rovers, and orbiters. The HiRISE camera on the Mars Reconnaissance Orbiter (MRO) has captured high-resolution images covering large tracts of the surface. However, orbital images lack the depth and rich detail obtained from in-situ exploration. Rovers such as the Mars Science Laboratory and the upcoming Mars 2020 carry state-of-the-art science laboratories to perform in-situ exploration and analysis. However, they can only cover a small area of Mars over the course of their mission. A critical capability gap exists in our ability to image, provide services to, and explore the large tracts of the surface of Mars required for enabling a future human mission. A promising solution is to develop a reconnaissance sailplane that travels tens to hundreds of kilometers per sol. The aircraft would be equipped with imagers that provide that in-situ depth of detail, with coverage comparable to orbital assets such as MRO. A major challenge is that the Martian carbon dioxide atmosphere is thin, with a pressure of 1% of Earth's at sea level. To compensate, the aircraft needs to fly at high velocities and have a sufficiently large wing area to generate the required lift. Inflatable wings are an excellent choice, as they have the lowest mass and can be used to change shape (morph) depending on aerodynamic or control requirements. In this paper, we present our design of an inflatable sailplane capable of deploying from a 12U CubeSat platform. A pneumatic deployment mechanism ensures highly compact stowage volumes and minimizes complexity.
http://arxiv.org/abs/1902.02083
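For intuition on the velocity/wing-area trade-off mentioned above, the textbook lift equation (standard aerodynamics, not a result from the paper) makes the constraint explicit:

    L = \tfrac{1}{2}\, \rho\, v^{2}\, S\, C_{L}

so for the same lift L and lift coefficient C_L, a roughly hundred-fold drop in atmospheric density \rho must be compensated by a comparable increase in the product v^{2} S, i.e., by flying faster and/or enlarging the wing area.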
Word embeddings benefit many information retrieval and natural language processing tasks due to their ability to capture lexical semantics. However, while many such tasks involve or even rely on named entities as central components, popular word embedding models have so far failed to include entities as first-class citizens. While it seems intuitive that annotating named entities in the training corpus should result in more intelligent word features for downstream tasks, performance issues arise when popular embedding approaches are naively applied to entity-annotated corpora. Not only are the resulting entity embeddings less useful than expected, but the performance of the non-entity word embeddings also degrades in comparison to those trained on the raw, unannotated corpus. In this paper, we investigate approaches to jointly train word and entity embeddings on a large corpus with automatically annotated and linked entities. We discuss two distinct approaches to the generation of such embeddings, namely the training of state-of-the-art embeddings on raw text and on annotated versions of the corpus, as well as node embeddings of a co-occurrence graph representation of the annotated corpus. We compare the performance of annotated embeddings and classical word embeddings on a variety of word similarity, analogy, and clustering evaluation tasks, and investigate their performance in entity-specific tasks. Our findings show that it takes more than training popular word embedding models on an annotated corpus to create entity embeddings with acceptable performance on common test cases. Based on these results, we discuss how and when node embeddings of the co-occurrence graph representation of the text can restore the performance.
http://arxiv.org/abs/1902.02078
The science and origins of asteroids are deemed high priority in the Planetary Science Decadal Survey. Major scientific goals for the study of planetesimals are to decipher geological processes in SSSBs not determinable from investigation via in-situ experimentation, and to understand how planetesimals contribute to the formation of planets. Ground-based observations are not sufficient to examine SSSBs, as they can only measure what is on the surface of the body; in-situ analysis, however, allows for further, close-up investigation of the surface characteristics and the inner composition of the body. To this end, the Asteroid Mobile Imager and Geologic Observer (AMIGO), an autonomous semi-inflatable robot, will operate in a swarm to efficiently characterize the surface of an asteroid. The stowed package is 10x10x10 cm (equivalent to a 1U CubeSat) and deploys an inflatable sphere of ~1 m in diameter. Three mobility modes are identified and designed: ballistic hopping, rotation during hops, and up-righting maneuvers. Ballistic hops provide the AMIGO robot the ability to explore a larger portion of the asteroid's surface and sample a larger area than a stationary lander. Rotation during the hop entails attitude control of the robot, utilizing propulsion and reaction wheel actuation. In the event of the robot tipping or not landing upright, a combination of thrusters and reaction wheels corrects the robot's attitude. The AMIGO propulsion system utilizes sublimate-based micro-electromechanical systems (MEMS) technology as a means of lightweight, low-thrust ballistic hopping and coarse attitude control. Each deployed AMIGO will hop across the surface of the asteroid multiple times.
http://arxiv.org/abs/1902.02071
We demonstrate that Non-Maximum Suppression (NMS), which is commonly used in object detection tasks to filter redundant detection results, is no longer secure. NMS has always been an integral part of object detection algorithms. Currently, Fully Convolutional Networks (FCNs) are widely used as the backbone architecture of object detection models. Given an input instance, since an FCN generates end-to-end detection results in a single stage, it outputs a large number of raw detection boxes. These bounding boxes are then filtered by NMS to produce the final detection results. In this paper, we propose an adversarial example attack that triggers malfunctioning of NMS in end-to-end object detection models. Our attack, named Daedalus, manipulates the detection box regression values to compress the dimensions of detection boxes. Consequently, NMS is no longer able to filter redundant detection boxes correctly, and the final detection output contains extremely dense false positives. This can be fatal for many object detection applications such as autonomous vehicles and smart manufacturing. Our attack can be applied to different end-to-end object detection models. Furthermore, we suggest crafting robust adversarial examples by using an ensemble of popular detection models as substitutes. Considering that model reuse is common in real-world object detection scenarios, Daedalus examples crafted from an ensemble of substitutes can launch attacks without knowing the details of the victim models. Our experiments demonstrate that our attack effectively stops NMS from filtering redundant bounding boxes. As the evaluation results show, Daedalus increases the false positive rate in detection results to 99.9% and reduces the mean average precision scores to 0, while maintaining a low cost of distortion on the original inputs.
http://arxiv.org/abs/1902.02067
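For reference, a standard IoU-based NMS routine like the sketch below is the component Daedalus targets: by shrinking the predicted boxes so their pairwise IoU falls under the threshold, almost nothing gets suppressed. This is the textbook greedy algorithm, not the attack itself.

    import numpy as np

    def iou(box, boxes):
        # Boxes are [x1, y1, x2, y2].
        x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
        x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
        return inter / (area(box) + area(boxes) - inter + 1e-9)

    def nms(boxes, scores, iou_thresh=0.5):
        """Greedy non-maximum suppression: keep the highest-scoring box, drop
        every remaining box whose IoU with it exceeds the threshold, repeat."""
        order = np.argsort(scores)[::-1]
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            overlaps = iou(boxes[i], boxes[order[1:]])
            order = order[1:][overlaps <= iou_thresh]
        return keep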
There are thousands of asteroids in near-Earth space and millions in the Main Belt. They are diverse in physical properties and composition and are time capsules of the early solar system. This makes them strategic locations for planetary science, resource mining, planetary defense/security, and as interplanetary depots and communication relays. However, asteroids are a challenging target for surface exploration due to their low but highly nonlinear gravity fields. In such conditions, mobility through ballistic hopping possesses multiple advantages over conventional mobility solutions, and as such, hopping robots have emerged as a promising platform for future exploration of asteroids and comets. They can traverse large distances over rough terrain with the expenditure of minimum energy. In this paper we present ballistic hopping dynamics and motion planning on an asteroid surface with highly nonlinear gravity fields. We do so by solving Lambert's orbital boundary value problem in irregular gravity fields with a shooting method to find the initial velocity required to intercept a target. We then present methods to localize the hopping robot using pose estimation by successive scan matching with a 3D laser scanner. Using the above results, we provide methods for motion planning on the asteroid surface over long distances. The robot needs to perform multiple hops to reach a desired goal from its initial position while avoiding obstacles. The study is then extended to find optimal trajectories that reach a desired goal by visiting multiple waypoints.
http://arxiv.org/abs/1902.02065
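A minimal sketch of the shooting approach to Lambert's boundary value problem mentioned above: propagate a candidate hop, measure how far it misses the target at the time of flight, and root-find on the initial velocity. It uses a simplified point-mass gravity model rather than an irregular field, and the gravitational parameter, positions, time of flight, and initial guess are purely illustrative assumptions.

    import numpy as np
    from scipy.integrate import solve_ivp
    from scipy.optimize import fsolve

    MU = 2.4  # m^3/s^2, hypothetical gravitational parameter of a small asteroid

    def dynamics(t, state):
        r, v = state[:3], state[3:]
        a = -MU * r / np.linalg.norm(r) ** 3   # simplified point-mass gravity
        return np.concatenate([v, a])

    def final_position(v0, r0, tof):
        sol = solve_ivp(dynamics, (0.0, tof), np.concatenate([r0, v0]),
                        rtol=1e-9, atol=1e-9)
        return sol.y[:3, -1]

    def lambert_shooting(r0, r_target, tof, v_guess):
        # Find v0 such that the propagated hop reaches r_target at t = tof.
        residual = lambda v0: final_position(v0, r0, tof) - r_target
        return fsolve(residual, v_guess)

    r0 = np.array([300.0, 0.0, 0.0])        # m, hop start on the surface
    rt = np.array([295.0, 50.0, 0.0])       # m, hop target
    v0 = lambert_shooting(r0, rt, tof=600.0, v_guess=np.array([0.0, 0.08, 0.02]))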
String kernels are attractive data analysis tools for analyzing string data. Among them, alignment kernels are known for their high prediction accuracies in string classification. We experimentally test ESP+SFM on its ability to learn SVMs for large-scale string classification with various massive string data, and we demonstrate the superior performance of our method with respect to prediction accuracy, scalability, and computational efficiency.
http://arxiv.org/abs/1802.06382
Urban flow monitoring systems play important roles in smart city efforts around the world. However, the ubiquitous deployment of monitoring devices, such as CCTVs, induces a long-lasting and enormous cost for maintenance and operation. This suggests the need for a technology that can reduce the number of deployed devices, while preventing the degeneration of data accuracy and granularity. In this paper, we aim to infer the real-time and fine-grained crowd flows throughout a city based on coarse-grained observations. This task is challenging due to two reasons: the spatial correlations between coarse- and fine-grained urban flows, and the complexities of external impacts. To tackle these issues, we develop a method entitled UrbanFM based on deep neural networks. Our model consists of two major parts: 1) an inference network to generate fine-grained flow distributions from coarse-grained inputs by using a feature extraction module and a novel distributional upsampling module; 2) a general fusion subnet to further boost the performance by considering the influences of different external factors. Extensive experiments on two real-world datasets, namely TaxiBJ and HappyValley, validate the effectiveness and efficiency of our method compared to seven baselines, demonstrating the state-of-the-art performance of our approach on the fine-grained urban flow inference problem.
http://arxiv.org/abs/1902.05377
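The sketch below illustrates the structural constraint behind the distributional upsampling idea mentioned above: within each coarse cell, unnormalised fine-grained scores are softmax-normalised so that the inferred fine flows sum exactly to the observed coarse flow. The exact module in UrbanFM may differ; this numpy version only shows the constraint, with illustrative shapes.

    import numpy as np

    def distributional_upsample(coarse, fine_logits, n):
        """coarse: (H, W) observed coarse flows.
        fine_logits: (H*n, W*n) unnormalised scores from some network.
        Returns fine flows whose n-by-n blocks sum to `coarse`."""
        H, W = coarse.shape
        blocks = fine_logits.reshape(H, n, W, n).transpose(0, 2, 1, 3).reshape(H, W, n * n)
        blocks = np.exp(blocks - blocks.max(axis=-1, keepdims=True))
        blocks /= blocks.sum(axis=-1, keepdims=True)        # per-block softmax
        fine = blocks * coarse[..., None]                    # redistribute the coarse mass
        return fine.reshape(H, W, n, n).transpose(0, 2, 1, 3).reshape(H * n, W * n)

    coarse = np.random.rand(8, 8) * 100
    logits = np.random.randn(32, 32)
    fine = distributional_upsample(coarse, logits, n=4)
    assert np.allclose(fine.reshape(8, 4, 8, 4).sum(axis=(1, 3)), coarse)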
Content-based image retrieval (CBIR) has become one of the most important research directions in the domain of digital data management. In this paper, a new feature extraction scheme, including the norm of low-frequency components in the wavelet transform and color features in the RGB and HSV domains, is proposed as the representative feature vector for images in the database, together with an appropriate similarity measure for each feature type. In CBIR systems, retrieval results are highly sensitive to the image features. We address this problem by selecting the most relevant features from the complete feature set using ant colony optimization (ACO)-based feature selection, which minimizes the number of features while maximizing the F-measure of the CBIR system. To evaluate the performance of our proposed CBIR system, it has been compared with three previously proposed systems. Results show that the precision and recall of our proposed system are higher than those of the older systems for the majority of image categories in the Corel database.
http://arxiv.org/abs/1902.02059
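A minimal sketch of the kind of feature vector described above: the norm of the low-frequency wavelet sub-band plus RGB and HSV colour histograms, compared with an L1-style similarity. Bin counts, the Haar wavelet, and the simple concatenation are illustrative assumptions (the ACO feature-selection step is omitted); PyWavelets and matplotlib are assumed available.

    import numpy as np
    import pywt
    from matplotlib.colors import rgb_to_hsv

    def image_features(rgb, bins=8):
        """rgb: float array in [0, 1] of shape (H, W, 3)."""
        gray = rgb.mean(axis=2)
        cA, _ = pywt.dwt2(gray, 'haar')                 # low-frequency sub-band
        wavelet_feat = np.array([np.linalg.norm(cA)])
        hsv = rgb_to_hsv(rgb)
        hists = []
        for img in (rgb, hsv):                          # colour features in both domains
            for c in range(3):
                h, _ = np.histogram(img[..., c], bins=bins, range=(0, 1), density=True)
                hists.append(h)
        return np.concatenate([wavelet_feat] + hists)

    def similarity(f1, f2):
        return -np.abs(f1 - f2).sum()                   # higher is more similar

    query, db_img = np.random.rand(64, 64, 3), np.random.rand(64, 64, 3)
    score = similarity(image_features(query), image_features(db_img))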
We first introduce a deep-learning-based framework named DeepIrisNet2 for visible-spectrum and NIR iris representation. The framework can work without the classical iris normalization step or very accurate iris segmentation, allowing it to work under non-ideal conditions. The framework contains spatial transformer layers to handle deformation and supervision branches after certain intermediate layers to mitigate overfitting. In addition, we present a dual-CNN iris segmentation pipeline comprising an iris/pupil bounding-box detection network and a semantic pixel-wise segmentation network. Furthermore, to obtain compact templates, we present a strategy to generate binary iris codes using DeepIrisNet2. Since no ground-truth datasets are available for CNN training for iris segmentation, we build large-scale hand-labeled datasets and make them public: i) iris and pupil bounding boxes, ii) labeled iris texture. The networks are evaluated on the challenging ND-IRIS-0405, UBIRIS.v2, MICHE-I, and CASIA v4 Interval datasets. The proposed approach significantly improves the state of the art and achieves outstanding performance, surpassing all previous methods.
http://arxiv.org/abs/1902.05390
This paper proposes a class of well-conditioned neural networks in which a unit amount of change in the inputs causes at most a unit amount of change in the outputs or any of the internal layers. We develop the known methodology of controlling Lipschitz constants to realize its full potential in maximizing robustness, with a new regularization scheme for linear layers, new ways to adapt nonlinearities and a new loss function. With MNIST and CIFAR-10 classifiers, we demonstrate a number of advantages. Without needing any adversarial training, the proposed classifiers exceed the state of the art in robustness against white-box L2-bounded adversarial attacks. They generalize better than ordinary networks from noisy data with partially random labels. Their outputs are quantitatively meaningful and indicate levels of confidence and generalization, among other desirable properties.
http://arxiv.org/abs/1802.07896
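The paper above develops its own regularisation scheme for linear layers; as a generic point of comparison, the sketch below bounds each layer's Lipschitz constant with standard spectral normalisation and 1-Lipschitz ReLUs, so a unit change in the input causes at most a unit change in the output. This is a well-known baseline technique, not the paper's method; the layer widths are illustrative.

    import torch.nn as nn
    from torch.nn.utils import spectral_norm

    def lipschitz_mlp(dims):
        """MLP whose linear layers each have spectral norm ~1; composed with
        1-Lipschitz ReLUs, the whole network is (approximately) 1-Lipschitz
        with respect to the L2 norm."""
        layers = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            layers.append(spectral_norm(nn.Linear(d_in, d_out)))
            layers.append(nn.ReLU())
        return nn.Sequential(*layers[:-1])    # drop the trailing activation

    net = lipschitz_mlp([784, 512, 512, 10])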
We ask whether neural network interpretation methods can be fooled via adversarial model manipulation, which we define as a model fine-tuning step that aims to radically alter the explanations without hurting the accuracy of the original model. By incorporating the interpretation results directly into the regularization term of the objective function for fine-tuning, we show that state-of-the-art interpreters, e.g., LRP and Grad-CAM, can be easily fooled with our model manipulation. We propose two types of fooling, passive and active, and demonstrate that such fooling generalizes well to the entire validation set as well as transferring to other interpretation methods. Our results are validated by both visually showing the fooled explanations and reporting quantitative metrics that measure the deviations from the original explanations. We claim that the stability of neural network interpretation methods with respect to our adversarial model manipulation is an important criterion to check when developing robust and reliable neural network interpretation methods.
http://arxiv.org/abs/1902.02041
We consider the problem of inferring the values of an arbitrary set of variables (e.g., risk of diseases) given other observed variables (e.g., symptoms and diagnosed diseases) and high-dimensional signals (e.g., MRI images or EEG). This is a common problem in healthcare since the variables of interest often differ for different patients. Existing methods, including Bayesian networks and structured prediction, either do not incorporate high-dimensional signals or fail to model conditional dependencies among variables. To address these issues, we propose bidirectional inference networks (BIN), which stitch together multiple probabilistic neural networks, each modeling a conditional dependency. Predictions are then made by iteratively updating variables using backpropagation (BP) to maximize the corresponding posterior probability. Furthermore, we extend BIN to composite BIN (CBIN), which brings the iterative prediction process into the training stage and improves both accuracy and computational efficiency by adaptively smoothing the optimization landscape. Experiments on synthetic and real-world datasets (a sleep study and a dermatology dataset) show that CBIN is a single model that can achieve state-of-the-art performance and obtain better accuracy in most inference tasks than multiple models each specifically trained for a different task.
http://arxiv.org/abs/1902.02037
To realize the full spectrum of advantages that the GaN materials system offers, the demonstration of p-GaN-based devices is valuable. The authors report the first p-channel field-effect transistor (pFET) based on an AlGaN/GaN superlattice (SL) grown using MOCVD. Magnesium was used to dope the material in the superlattice. The lowest sheet resistance of 10 kΩ/sq was achieved for a Mg doping of 1.5x10^19 cm^-3 (determined by SIMS). Mobility in the range of 7-10 cm^2/Vs and total sheet charge density in the range of 1x10^13-6x10^13 cm^-2 were measured. The device had a maximum drain-source current (IDS) of 3 mA/mm and an on-resistance (RON) of 3.48 kΩ·mm.
https://arxiv.org/abs/1902.02022
We consider the problem of rational uncertainty about unproven mathematical statements, which Gödel and others have remarked on. Using Bayesian-inspired arguments, we build a normative model of fair bets under deductive uncertainty which draws from both probability theory and the theory of algorithms. We comment on connections to Zeilberger’s notion of “semi-rigorous proofs”, particularly the observation that inherent subjectivity would be present. We also discuss a financial view, with models of arbitrage where traders have limited computational resources.
http://arxiv.org/abs/1708.09032
Understanding learning and generalization of deep architectures has been a major research objective in recent years, with notable theoretical progress. A main focal point of generalization studies stems from the success of excessively large networks, which defy the classical wisdom of uniform convergence and learnability. We study empirically the layer-wise functional structure of over-parameterized deep models. We provide evidence for the heterogeneous characteristics of layers. To do so, we introduce the notion of (post-training) re-initialization and re-randomization robustness. We show that layers can be categorized as either “robust” or “critical”. In contrast to critical layers, resetting the robust layers to their initial values has no negative consequence, and in many cases they barely change throughout training. Our study provides further evidence that mere parameter counting or norm accounting is too coarse in studying generalization of deep models.
http://arxiv.org/abs/1902.01996
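A minimal sketch of the re-initialization robustness probe described above: snapshot the weights at initialization, train, then reset one parameter tensor at a time to its initial values and measure the resulting test accuracy. The model, the train_fn, evaluate, and loader arguments are hypothetical placeholders, not the authors' experimental setup.

    import copy
    import torch

    def reinit_robustness(model, train_fn, evaluate, loader):
        """Returns {name: accuracy after resetting that tensor to its init value}."""
        init_state = copy.deepcopy(model.state_dict())      # snapshot at initialization
        train_fn(model)                                      # train to convergence
        trained_state = copy.deepcopy(model.state_dict())
        results = {"trained": evaluate(model, loader)}
        for name in trained_state:
            probe = copy.deepcopy(trained_state)
            probe[name] = init_state[name]                   # re-initialize one tensor
            model.load_state_dict(probe)
            results[name] = evaluate(model, loader)          # "robust" tensors barely hurt
        model.load_state_dict(trained_state)                 # restore the trained model
        return results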
A semi-supervised learning framework using feedforward-designed convolutional neural networks (FF-CNNs) is proposed for image classification in this work. One unique property of FF-CNNs is that no backpropagation is used to determine the model parameters. Since unlabeled data may not always enhance semi-supervised learning, we define an effective quality score and use it to select a subset of unlabeled data in the training process. We conduct experiments on the MNIST, SVHN, and CIFAR-10 datasets, and show that the proposed semi-supervised FF-CNN solution outperforms the CNN trained by backpropagation (BP-CNN) when the amount of labeled data is reduced. Furthermore, we develop an ensemble system that combines the output decision vectors of different semi-supervised FF-CNNs to boost classification accuracy. The ensemble systems achieve further performance gains on all three benchmark datasets.
http://arxiv.org/abs/1902.01980
Many post-disaster and post-conflict regions do not have sufficient data on their transportation infrastructure assets, hindering both mobility and reconstruction. In particular, as the number of aging and deteriorating bridges increases, it is necessary to quantify their load characteristics in order to inform maintenance and prevent failure. The load-carrying capacity and the design load are considered the main aspects of any civil structure. Human examination can be costly and slow when expertise is lacking in challenging scenarios. In this paper, we propose to employ deep learning as a method to estimate the load-carrying capacity from crowd-sourced images. A new convolutional neural network architecture is trained on data from over 6000 bridges, which will benefit future research and applications. We tackle significant variations in the dataset (e.g., class interval, image completion, image colour) and quantify their impact on prediction accuracy, precision, recall, and F1 score. Finally, practical optimisation is performed by converting the multiclass classification into a binary classification to achieve promising field-use performance.
http://arxiv.org/abs/1902.05391
High-fidelity semantic segmentation of magnetic resonance volumes is critical for estimating tissue morphometry and relaxation parameters in both clinical and research applications. While manual segmentation is accepted as the gold-standard, recent advances in deep learning and convolutional neural networks (CNNs) have shown promise for efficient automatic segmentation of soft tissues. However, due to the stochastic nature of deep learning and the multitude of hyperparameters in training networks, predicting network behavior is challenging. In this paper, we quantify the impact of three factors associated with CNN segmentation performance: network architecture, training loss functions, and training data characteristics. We evaluate the impact of these variations on the segmentation of femoral cartilage and propose potential modifications to CNN architectures and training protocols to train these models with confidence.
http://arxiv.org/abs/1902.01977
In the domain of algorithmic music composition, machine learning-driven systems eliminate the need for carefully hand-crafting rules for composition. In particular, the capability of recurrent neural networks to learn complex temporal patterns lends itself well to the musical domain. Promising results have been observed across a number of recent attempts at music composition using deep RNNs. These approaches generally aim at first training neural networks to reproduce subsequences drawn from existing songs. Subsequently, they are used to compose music either at the audio sample-level or at the note-level. We designed a representation that divides polyphonic music into a small number of monophonic streams. This representation greatly reduces the complexity of the problem and eliminates an exponential number of probably poor compositions. On top of our LSTM neural network that learnt musical sequences in this representation, we built an RL agent that learnt to find combinations of songs whose joint dominance produced pleasant compositions. We present Amadeus, an algorithmic music composition system that composes music that consists of intricate melodies, basic chords, and even occasional contrapuntal sequences.
http://arxiv.org/abs/1902.01973
Climbing soft robots are of tremendous interest in both science and engineering due to their potential applications in intelligent surveillance, inspection, maintenance, and detection in environments away from the ground. The challenge lies in the design of a fast, robust, switchable adhesion actuator that can easily attach to and detach from vertical surfaces. Here, we propose a new design for a pneumatically actuated, bioinspired soft adhesion actuator that works both on the ground and under water. It is composed of extremely soft bilayer structures with an embedded spiral pneumatic channel resting on top of a base layer with a cavity. Rather than the traditional way of directly pumping air out of the cavity for suction, as in hard-polymer-based adhesion actuators, we inflate air into the top spiral channel so that it deforms into a stable 3D domed shape, achieving negative pressure in the cavity. Characterization of the maximum shear adhesion force of the proposed soft adhesion actuator shows strong and rapidly reversible adhesion on multiple types of smooth and semi-smooth surfaces. Based on the switchable adhesion actuator, we design and fabricate a novel load-carrying amphibious climbing soft robot (ACSR) by combining it with a soft bending actuator. We demonstrate that it can operate on a wide range of horizontal and vertical surfaces, including dry, wet, slippery, smooth, and semi-smooth ones, on the ground and also under water, with a certain load-carrying capability. We show that the vertical climbing speed can reach about 286 mm/min (1.6 body lengths/min) while carrying an object of over 200 g (over 5 times the weight of the ACSR itself) during climbing on the ground and under water. This research could largely push the boundaries of soft robot capabilities and multifunctionality in window cleaning and underwater inspection under harsh environments.
http://arxiv.org/abs/1804.08692
In low-light or short-exposure photography, the image is often corrupted by noise. While a longer exposure helps reduce the noise, it can produce blurry results due to object and camera motion. The reconstruction of a noise-free image is an ill-posed problem. Recent approaches to image denoising aim to predict kernels which are convolved with a set of successively taken images (a burst) to obtain a clear image. We propose a deep neural network based approach called Multi-Kernel Prediction Networks (MKPN) for burst image denoising. MKPN predicts kernels of not just one size but of varying sizes and fuses these different kernels, resulting in one kernel per pixel. The advantages of our method are twofold: (a) the different-sized kernels help in extracting different information from the image, which results in better reconstruction, and (b) the kernel fusion ensures that the extracted information is retained while maintaining computational efficiency. Experimental results reveal that MKPN outperforms the state of the art on our synthetic datasets with different noise levels.
http://arxiv.org/abs/1902.05392
In this work we provide a theoretical framework for structured prediction that generalizes the existing theory of surrogate methods for binary and multiclass classification based on estimating conditional probabilities with smooth convex surrogates (e.g., logistic regression). The theory relies on a natural characterization of the structural properties of the task loss and allows us to derive statistical guarantees for many widely used methods in the context of multilabeling, ranking, ordinal regression, and graph matching. In particular, we characterize the smooth convex surrogates compatible with a given task loss in terms of a suitable Bregman divergence composed with a link function. This allows us to derive tight bounds for the calibration function and to obtain novel results on existing surrogate frameworks for structured prediction such as conditional random fields and quadratic surrogates.
http://arxiv.org/abs/1902.01958
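As a familiar special case of the "Bregman divergence composed with a link function" characterization mentioned above (our illustration, not an example taken from the paper), the multiclass logistic surrogate can be written as

    S(v, y) \;=\; D_{\varphi}\big(e_y,\ \mathrm{softmax}(v)\big),
    \qquad \varphi(p) = \sum_k p_k \log p_k,
    \qquad \mathrm{link}(v) = \mathrm{softmax}(v),

since the Bregman divergence generated by the negative entropy \varphi is the KL divergence, and D_{\mathrm{KL}}\big(e_y \,\|\, \mathrm{softmax}(v)\big) = -\log \mathrm{softmax}(v)_y is exactly the cross-entropy (logistic) loss.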
We evaluate attention-based encoder-decoder models along two dimensions: choice of target unit (phoneme, grapheme, and word-piece), and the amount of available training data. We conduct experiments on the LibriSpeech 100hr, 460hr, and 960hr tasks; across all tasks, we find that grapheme or word-piece models consistently outperform phoneme-based models, even though they are evaluated without a lexicon or an external language model. On the 960hr task the word-piece model achieves a word error rate (WER) of 4.7% on the test-clean set and 13.4% on the test-other set, which improves to 3.6% (clean) and 10.3% (other) when decoded with an LSTM LM: the lowest reported numbers using sequence-to-sequence models. We also conduct a detailed analysis of the various models, and investigate their complementarity: we find that we can improve WERs by up to 9% relative by rescoring N-best lists generated from the word-piece model with either the phoneme or the grapheme model. Rescoring an N-best list generated by the phonemic system, however, provides limited improvements. Further analysis shows that the word-piece-based models produce more diverse N-best hypotheses, resulting in lower oracle WERs, than the phonemic system.
http://arxiv.org/abs/1902.01955
How can we learn to do probabilistic inference in a way that generalizes between models? Amortized variational inference learns for a single model, sharing statistical strength across observations. This benefits scalability and model learning, but does not help with generalization to new models. We propose meta-amortized variational inference, a framework that amortizes the cost of inference over a family of generative models. We apply this approach to deep generative models by introducing the MetaVAE: a variational autoencoder that learns to generalize to new distributions and rapidly solve new unsupervised learning problems using only a small number of target examples. Empirically, we validate the approach by showing that the MetaVAE can: (1) capture relevant sufficient statistics for inference, (2) learn useful representations of data for downstream tasks such as clustering, and (3) perform meta-density estimation on unseen synthetic distributions and out-of-sample Omniglot alphabets.
http://arxiv.org/abs/1902.01950
As part of the effort to improve quality and to reduce national healthcare costs, the Centers for Medicare and Medicaid Services (CMS) are responsible for creating and maintaining an array of clinical quality measures (CQMs) for assessing healthcare structure, process, outcome, and patient experience across various conditions, clinical specialties, and settings. The development and maintenance of CQMs involves substantial and ongoing evaluation of the evidence on the measure’s properties: importance, reliability, validity, feasibility, and usability. As such, CMS conducts monthly environmental scans of the published clinical and health service literature. Conducting time consuming, exhaustive evaluations of the ever-changing healthcare literature presents one of the largest challenges to an evidence-based approach to healthcare quality improvement. Thus, it is imperative to leverage automated techniques to aid CMS in the identification of clinical and health services literature relevant to CQMs. Additionally, the estimated labor hours and related cost savings of using CMS Sematrix compared to a traditional literature review are roughly 818 hours and 122,000 dollars for a single monthly environmental scan.
http://arxiv.org/abs/1902.01918
This paper presents a method for testing the decision making systems of autonomous vehicles. Our approach involves perturbing stochastic elements in the vehicle’s environment until the vehicle is involved in a collision. Instead of applying direct Monte Carlo sampling to find collision scenarios, we formulate the problem as a Markov decision process and use reinforcement learning algorithms to find the most likely failure scenarios. This paper presents Monte Carlo Tree Search (MCTS) and Deep Reinforcement Learning (DRL) solutions that can scale to large environments. We show that DRL can find more likely failure scenarios than MCTS with fewer calls to the simulator. A simulation scenario involving a vehicle approaching a crosswalk is used to validate the framework. Our proposed approach is very general and can be easily applied to other scenarios given the appropriate models of the vehicle and the environment.
http://arxiv.org/abs/1902.01909
Extensive-form games are a common model for multiagent interactions with imperfect information. In two-player zero-sum games, the typical solution concept is a Nash equilibrium over the unconstrained strategy set for each player. In many situations, however, we would like to constrain the set of possible strategies. For example, constraints are a natural way to model limited resources, risk mitigation, safety, consistency with past observations of behavior, or other secondary objectives for an agent. In small games, optimal strategies under linear constraints can be found by solving a linear program; however, state-of-the-art algorithms for solving large games cannot handle general constraints. In this work we introduce a generalized form of Counterfactual Regret Minimization that provably finds optimal strategies under any feasible set of convex constraints. We demonstrate the effectiveness of our algorithm for finding strategies that mitigate risk in security games, and for opponent modeling in poker games when given only partial observations of private information.
http://arxiv.org/abs/1809.07893
Within the realm of service robotics, researchers have placed a great amount of effort into learning, understanding, and representing motions as manipulations for task execution by robots. The task of robot learning and problem-solving is very broad, as it integrates a variety of tasks such as object detection, activity recognition, task/motion planning, localization, knowledge representation and retrieval, and the intertwining of perception/vision and machine learning techniques. In this paper, we solely focus on knowledge representations and notably how knowledge is typically gathered, represented, and reproduced to solve problems as done by researchers in the past decades. In accordance with the definition of knowledge representations, we discuss the key distinction between such representations and useful learning models that have extensively been introduced and studied in recent years, such as machine learning, deep learning, probabilistic modelling, and semantic graphical structures. Along with an overview of such tools, we discuss the problems which have existed in robot learning and how they have been built and used as solutions, technologies or developments (if any) which have contributed to solving them. Finally, we discuss key principles that should be considered when designing an effective knowledge representation.
http://arxiv.org/abs/1807.02192
Population Based Training (PBT) is a recent approach that jointly optimizes neural network weights and hyperparameters by periodically copying the weights of the best performers and mutating hyperparameters during training. Previous PBT implementations have been synchronized glass-box systems. We propose a general, black-box PBT framework that distributes many asynchronous “trials” (a small number of training steps with warm-starting) across a cluster, coordinated by the PBT controller. The black-box design does not make assumptions about model architectures, loss functions, or training procedures. Our system supports dynamic hyperparameter schedules to optimize both differentiable and non-differentiable metrics. We apply our system to train a state-of-the-art WaveNet generative model for human voice synthesis. We show that our PBT system achieves better accuracy, lower sensitivity, and faster convergence compared to existing methods, given the same computational resources.
http://arxiv.org/abs/1902.01894
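A minimal, synchronous sketch of the exploit/explore loop at the heart of PBT, for readers unfamiliar with the technique; the asynchronous, distributed black-box controller described above is beyond this sketch. train_steps() and evaluate() are hypothetical placeholders for the warm-started trials, and the perturbation rule assumes numeric hyperparameters.

    import copy
    import random

    def pbt(population, train_steps, evaluate, generations=20, perturb=0.8):
        """population: list of dicts {"weights": ..., "hparams": {...}, "score": None}.
        Each generation: train briefly, evaluate, then the bottom quartile copies
        the weights of a top performer (exploit) and perturbs its hyperparameters
        (explore)."""
        for _ in range(generations):
            for member in population:
                train_steps(member)                      # a few warm-started steps
                member["score"] = evaluate(member)
            population.sort(key=lambda m: m["score"], reverse=True)
            quarter = max(1, len(population) // 4)
            top, bottom = population[:quarter], population[-quarter:]
            for loser in bottom:
                winner = random.choice(top)
                loser["weights"] = copy.deepcopy(winner["weights"])          # exploit
                loser["hparams"] = {k: v * random.choice([perturb, 1 / perturb])
                                    for k, v in winner["hparams"].items()}   # explore
        return max(population, key=lambda m: m["score"])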
Existing attention mechanisms are trained to attend to individual items in a collection (the memory) with a predefined, fixed granularity, e.g., a word token or an image grid. We propose area attention: a way to attend to areas in the memory, where each area contains a group of items that are structurally adjacent, e.g., spatially for a 2D memory such as images, or temporally for a 1D memory such as natural language sentences. Importantly, the shape and the size of an area are dynamically determined via learning, which enables a model to attend to information with varying granularity. Area attention can easily work with existing model architectures such as multi-head attention for simultaneously attending to multiple areas in the memory. We evaluate area attention on two tasks: neural machine translation (both character and token-level) and image captioning, and improve upon strong (state-of-the-art) baselines in all the cases. These improvements are obtainable with a basic form of area attention that is parameter free.
https://arxiv.org/abs/1810.10126
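A minimal numpy sketch of area attention over a 1D memory in a basic, parameter-free form: every contiguous span up to a maximum width becomes an extra attendable item whose key is the mean of its members' keys and whose value is the sum of their values, and ordinary softmax attention runs over this enlarged memory. The pooling choices follow one of the simple variants and may not match the paper's exact formulation.

    import numpy as np

    def area_attention_1d(query, keys, values, max_width=3):
        """query: (d,), keys/values: (n, d). Returns the attended vector."""
        n, d = keys.shape
        area_keys, area_values = [], []
        for width in range(1, max_width + 1):
            for start in range(0, n - width + 1):
                area_keys.append(keys[start:start + width].mean(axis=0))     # mean-pooled key
                area_values.append(values[start:start + width].sum(axis=0))  # summed value
        area_keys = np.stack(area_keys)
        area_values = np.stack(area_values)
        scores = area_keys @ query / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ area_values

    out = area_attention_1d(np.random.randn(16),
                            np.random.randn(10, 16), np.random.randn(10, 16))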
In this paper, we argue that simulation platforms enable a novel type of embodied spatial reasoning, one facilitated by a formal model of object and event semantics that renders the continuous quantitative search space of an open-world, real-time environment tractable. We provide examples for how a semantically-informed AI system can exploit the precise, numerical information provided by a game engine to perform qualitative reasoning about objects and events, facilitate learning novel concepts from data, and communicate with a human to improve its models and demonstrate its understanding. We argue that simulation environments, and game engines in particular, bring together many different notions of “simulation” and many different technologies to provide a highly-effective platform for developing both AI systems and tools to experiment in both machine and human intelligence.
http://arxiv.org/abs/1902.01886
In many finite horizon episodic reinforcement learning (RL) settings, it is desirable to optimize for the undiscounted return - in settings like Atari, for instance, the goal is to collect the most points while staying alive in the long run. Yet, it may be difficult (or even intractable) mathematically to learn with this target. As such, temporal discounting is often applied to optimize over a shorter effective planning horizon. This comes at the cost of potentially biasing the optimization target away from the undiscounted goal. In settings where this bias is unacceptable - where the system must optimize for longer horizons at higher discounts - the target of the value function approximator may increase in variance leading to difficulties in learning. We present an extension of temporal difference (TD) learning, which we call TD($\Delta$), that breaks down a value function into a series of components based on the differences between value functions with smaller discount factors. The separation of a longer horizon value function into these components has useful properties in scalability and performance. We discuss these properties and show theoretic and empirical improvements over standard TD learning in certain settings.
http://arxiv.org/abs/1902.01883
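One way to write the decomposition implied by the abstract above (our reading, with notation chosen for illustration): given discount factors \gamma_0 < \gamma_1 < \dots < \gamma_Z, define the components

    W_0(s) = V_{\gamma_0}(s), \qquad
    W_z(s) = V_{\gamma_z}(s) - V_{\gamma_{z-1}}(s) \ \ (z \ge 1), \qquad
    V_{\gamma_Z}(s) = \sum_{z=0}^{Z} W_z(s).

Subtracting the Bellman equations of V_{\gamma_z} and V_{\gamma_{z-1}} gives a TD-style recursion for each component,

    W_z(s) = \mathbb{E}\big[(\gamma_z - \gamma_{z-1})\, V_{\gamma_{z-1}}(s') + \gamma_z\, W_z(s')\big],

so each W_z can be learned with its own TD update and the full long-horizon value recovered by summing the components.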
Due to the high training costs of deep learning, model developers often rent cloud GPU servers to achieve better efficiency. However, this practice raises privacy concerns. An adversarial party may be interested in 1) personally identifiable information encoded in the training data and the learned models, 2) misusing the sensitive models for its own benefit, or 3) launching model inversion (MIA) and generative adversarial network (GAN) attacks to reconstruct replicas of training data (e.g., sensitive images). Learning from encrypted data seems impractical due to the large training data and expensive learning algorithms, while differential-privacy-based approaches have to make significant trade-offs between privacy and model quality. We investigate the use of image disguising techniques to protect both data and model privacy. Our preliminary results show that, with block-wise permutation and transformations, disguised images surprisingly still yield reasonably well-performing deep neural networks (DNNs). The disguised images are also resilient to the deep-learning-enhanced visual discrimination attack and provide an extra layer of protection from MIA and GAN attacks.
http://arxiv.org/abs/1902.01878
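A minimal numpy sketch of a block-wise permutation-and-transformation disguise of the kind described above: split the image into blocks, apply a key-seeded secret permutation, and rotate/flip each block. The block size and the specific per-block transforms are illustrative assumptions; the paper's full scheme may use additional transformations.

    import numpy as np

    def disguise(img, key, block=8):
        """img: (H, W, C) with H, W divisible by `block`. `key` seeds the secret
        permutation and the per-block flips/rotations."""
        rng = np.random.default_rng(key)
        H, W, C = img.shape
        blocks = (img.reshape(H // block, block, W // block, block, C)
                     .transpose(0, 2, 1, 3, 4)
                     .reshape(-1, block, block, C))
        blocks = blocks[rng.permutation(len(blocks))]      # block-wise permutation
        for i in range(len(blocks)):                        # per-block transformation
            blocks[i] = np.rot90(blocks[i], k=rng.integers(4))
            if rng.random() < 0.5:
                blocks[i] = blocks[i][:, ::-1]
        return (blocks.reshape(H // block, W // block, block, block, C)
                      .transpose(0, 2, 1, 3, 4)
                      .reshape(H, W, C))

    disguised = disguise(np.random.rand(64, 64, 3), key=1234)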
This is an integrative review that addresses the question, “What makes for a good explanation?” with reference to AI systems. The pertinent literatures are vast; thus, this review is necessarily selective. That said, most of the key concepts and issues are expressed in this Report. The Report encapsulates the history of computer science efforts to create systems that explain and instruct (intelligent tutoring systems and expert systems). The Report expresses the explainability issues and challenges in modern AI and presents capsule views of the leading psychological theories of explanation. Certain articles stand out by virtue of their particular relevance to XAI, and their methods, results, and key points are highlighted. It is recommended that AI/XAI researchers be encouraged to include in their research reports fuller details on their empirical or experimental methods, in the fashion of experimental psychology research reports: details on Participants, Instructions, Procedures, Tasks, Dependent Variables (operational definitions of the measures and metrics), Independent Variables (conditions), and Control Conditions.
http://arxiv.org/abs/1902.01876
Automatic liver segmentation plays an important role in computer-aided diagnosis and treatment. Manual segmentation of organs is a difficult and tedious task and thus prone to human error. In this paper, we propose an adaptive 3D region growing with subject-specific conditions. To this end, we use the intensity distribution of the most probable voxels in the prior map along with a location prior. We also incorporate the boundary of the target organs to restrict the region growing. In order to obtain strong edges and high contrast, we propose an effective contrast enhancement algorithm to facilitate more accurate segmentation. In this paper, a Dice score of 92.56% is achieved. We compare our method with hard thresholding on the Deeds prior map and also with majority voting on Deeds registrations of 13 organs.
http://arxiv.org/abs/1802.07794
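For context on the base technique, the sketch below is plain intensity-based region growing from a seed (2D for brevity): neighbours are accepted while their intensity stays within a tolerance of the running region mean. The adaptive, prior-map-conditioned thresholds and boundary constraints described above are the paper's contribution and are not reproduced here.

    import numpy as np
    from collections import deque

    def region_grow(image, seed, tol=10.0):
        """Grow a region from `seed` (row, col), accepting 4-connected neighbours
        whose intensity stays within `tol` of the running region mean."""
        grown = np.zeros(image.shape, dtype=bool)
        grown[seed] = True
        queue = deque([seed])
        region_sum, region_n = float(image[seed]), 1
        while queue:
            r, c = queue.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < image.shape[0] and 0 <= nc < image.shape[1]
                        and not grown[nr, nc]
                        and abs(image[nr, nc] - region_sum / region_n) <= tol):
                    grown[nr, nc] = True
                    region_sum += float(image[nr, nc])
                    region_n += 1
                    queue.append((nr, nc))
        return grown

    mask = region_grow(np.random.randint(0, 255, (128, 128)).astype(float), seed=(64, 64))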
Deep neural networks excel at learning the training data, but often provide incorrect and confident predictions when evaluated on slightly different test examples. This includes distribution shifts, outliers, and adversarial examples. To address these issues, we propose Manifold Mixup, a simple regularizer that encourages neural networks to predict less confidently on interpolations of hidden representations. Manifold Mixup leverages semantic interpolations as an additional training signal, obtaining neural networks with smoother decision boundaries at multiple levels of representation. As a result, neural networks trained with Manifold Mixup learn class representations with fewer directions of variance. We prove theoretically why this flattening happens under ideal conditions, validate it in practical situations, and connect it to previous work on information theory and generalization. Despite incurring no significant computational overhead and being implemented in a few lines of code, Manifold Mixup improves strong baselines in supervised learning, in robustness to single-step adversarial attacks, and in test log-likelihood.
http://arxiv.org/abs/1806.05236
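A minimal PyTorch sketch of the Manifold Mixup regulariser described above: pick a random eligible layer, mix the batch's hidden representations with a shuffled copy of itself using lambda drawn from Beta(alpha, alpha), and mix the targets with the same lambda. The two-layer MLP and the value of alpha are illustrative stand-ins for any network with accessible hidden layers.

    import numpy as np
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MixupMLP(nn.Module):
        def __init__(self, d_in=784, d_hidden=512, n_classes=10):
            super().__init__()
            self.layers = nn.ModuleList([nn.Linear(d_in, d_hidden),
                                         nn.Linear(d_hidden, d_hidden)])
            self.head = nn.Linear(d_hidden, n_classes)

        def forward(self, x, y_onehot=None, alpha=2.0):
            if y_onehot is None:                            # plain inference path
                for layer in self.layers:
                    x = F.relu(layer(x))
                return self.head(x), None
            mix_at = np.random.randint(len(self.layers) + 1)   # input or a hidden layer
            lam = float(np.random.beta(alpha, alpha))
            perm = torch.randperm(x.size(0))
            y_mixed = lam * y_onehot + (1 - lam) * y_onehot[perm]
            for k, layer in enumerate(self.layers):
                if k == mix_at:                              # mix hidden representations
                    x = lam * x + (1 - lam) * x[perm]
                x = F.relu(layer(x))
            if mix_at == len(self.layers):                   # mix the last hidden state
                x = lam * x + (1 - lam) * x[perm]
            return self.head(x), y_mixed

    model = MixupMLP()
    x = torch.rand(32, 784)
    y = F.one_hot(torch.randint(0, 10, (32,)), 10).float()
    logits, y_mix = model(x, y)
    loss = -(y_mix * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    loss.backward()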