Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

Target Based Speech Act Classification in Political Campaign Text

2019-05-20

Shivashankar Subramanian, Trevor Cohn, Timothy Baldwin

arXiv_CL

arXiv_CL Classification
Abstract

We study pragmatics in political campaign text, through analysis of speech acts and the target of each utterance. We propose a new annotation schema incorporating domain-specific speech acts, such as commissive-action, and present a novel annotated corpus of media releases and speech transcripts from the 2016 Australian election cycle. We show how speech acts and target referents can be modeled as sequential classification, and evaluate several techniques, exploiting contextualized word representations, semi-supervised learning, task dependencies and speaker meta-data.

URL

http://arxiv.org/abs/1905.07856

PDF

http://arxiv.org/pdf/1905.07856
Learning Video Representations from Correspondence Proposals

2019-05-20

Xingyu Liu, Joon-Young Lee, Hailin Jin

arXiv_CV

arXiv_CV
Abstract

Correspondences between frames encode rich information about dynamic content in videos. However, they are challenging to capture and learn effectively because of their irregular structure and complex dynamics. In this paper, we propose a novel neural network that learns video representations by aggregating information from potential correspondences. This network, named CPNet, can learn evolving 2D fields with temporal consistency. In particular, it can effectively learn representations for videos by mixing appearance and long-range motion with an RGB-only input. We provide extensive ablation experiments to validate our model. CPNet shows stronger performance than existing methods on Kinetics and achieves state-of-the-art performance on Something-Something and Jester. We provide an analysis of the behavior of our model and show its robustness to errors in proposals.
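
A minimal sketch of the correspondence-proposal idea, assuming proposals are the k nearest neighbours of each point in feature space and that aggregation is a plain element-wise max over feature differences. CPNet itself learns both its features and its aggregation; the function names and toy vectors here are hypothetical and only illustrate the data flow.

```python
def knn_proposals(feats_t, feats_next, k=2):
    """For each feature vector in frame t, return the indices of its k nearest
    neighbours (squared Euclidean distance) among the feature vectors of frame t+1."""
    proposals = []
    for f in feats_t:
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(f, g)), j) for j, g in enumerate(feats_next)
        )
        proposals.append([j for _, j in dists[:k]])
    return proposals


def aggregate(feats_t, feats_next, proposals):
    """Hypothetical aggregation: element-wise max over (neighbour - query) differences,
    standing in for a learned transform followed by max pooling."""
    out = []
    for f, idxs in zip(feats_t, proposals):
        diffs = [[g - a for a, g in zip(f, feats_next[j])] for j in idxs]
        out.append([max(col) for col in zip(*diffs)])
    return out


frame_t = [[0.0, 0.0], [5.0, 5.0]]
frame_next = [[0.1, 0.0], [5.0, 5.2], [10.0, 10.0]]
props = knn_proposals(frame_t, frame_next, k=1)
print(props)  # [[0], [1]]
print(aggregate(frame_t, frame_next, props))
```

The max over proposals makes the aggregated feature independent of the order in which proposals are listed.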

URL

http://arxiv.org/abs/1905.07853

PDF

http://arxiv.org/pdf/1905.07853
Boundary Loss for Remote Sensing Imagery Semantic Segmentation

2019-05-20

Alexey Bokhovkin, Evgeny Burnaev

arXiv_CV

arXiv_CV Segmentation CNN Semantic_Segmentation Detection
Abstract

In response to the growing importance of geospatial data, its analysis, including semantic segmentation, has become an increasingly popular task in computer vision. Convolutional neural networks are powerful visual models that yield hierarchies of features, and practitioners widely use them to process remote sensing data. Remote sensing images often contain multiple instances of one class with precisely defined boundaries, and it is crucial to extract those boundaries accurately, because the accuracy of boundary delineation directly affects the quality of the segmented areas as a whole. However, widely used segmentation loss functions such as BCE, IoU loss or Dice loss do not penalize misalignment of boundaries sufficiently. In this paper, we propose a novel loss function: a differentiable surrogate of a metric that accounts for the accuracy of boundary detection. The loss function can be used with any neural network for binary segmentation. We validated our loss function with various modifications of UNet on a synthetic dataset, as well as on real-world data (ISPRS Potsdam, INRIA AIL). Models trained with the proposed loss function outperform baseline methods in terms of IoU score.
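
To make the idea of a boundary-aware surrogate concrete, here is a pure-Python sketch of a boundary-F1 style loss, assuming boundaries are extracted with max pooling and matched within a pixel tolerance. The names `boundary_f1_loss`, `r` and `tol` are illustrative assumptions, and the paper's exact formulation may differ. Only max pooling, products and sums are used, so the same computation can be written in an autodiff framework.

```python
def max_pool(grid, r):
    """Stride-1, same-size max pooling over a (2r+1) x (2r+1) window clipped at the border."""
    h, w = len(grid), len(grid[0])
    return [[max(grid[a][b]
                 for a in range(max(0, i - r), min(h, i + r + 1))
                 for b in range(max(0, j - r), min(w, j + r + 1)))
             for j in range(w)]
            for i in range(h)]


def boundary(mask, r=1):
    """Soft boundary map: foreground pixels that have background within distance r."""
    inv = [[1.0 - v for v in row] for row in mask]
    pooled = max_pool(inv, r)
    return [[p - v for p, v in zip(prow, vrow)] for prow, vrow in zip(pooled, inv)]


def boundary_f1_loss(pred, gt, r=1, tol=2, eps=1e-7):
    """1 - BF1: a predicted boundary pixel counts if it lies within `tol` pixels of the
    ground-truth boundary (precision), and vice versa (recall)."""
    pb, gb = boundary(pred, r), boundary(gt, r)
    pb_ext, gb_ext = max_pool(pb, tol), max_pool(gb, tol)

    def flat(g):
        return [v for row in g for v in row]

    precision = sum(a * b for a, b in zip(flat(pb), flat(gb_ext))) / (sum(flat(pb)) + eps)
    recall = sum(a * b for a, b in zip(flat(gb), flat(pb_ext))) / (sum(flat(gb)) + eps)
    return 1.0 - 2 * precision * recall / (precision + recall + eps)


def square(top, left, size, n=10):
    """Binary n x n mask containing one filled square."""
    return [[1.0 if top <= i < top + size and left <= j < left + size else 0.0
             for j in range(n)] for i in range(n)]


gt = square(1, 1, 4)
print(boundary_f1_loss(gt, gt))               # close to 0: boundaries coincide
print(boundary_f1_loss(square(5, 5, 4), gt))  # larger: boundaries are misaligned
```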

URL

http://arxiv.org/abs/1905.07852

PDF

http://arxiv.org/pdf/1905.07852
Implications of Computer Vision Driven Assistive Technologies Towards Individuals with Visual Impairment

2019-05-20

Linda Wang, Alexander Wong

arXiv_CV

arXiv_CV Face Caption
Abstract

Computer vision based technology is becoming ubiquitous in society. One application area that has seen increasing use of computer vision is assistive technology, specifically for people with visual impairment. Research has shown that computer vision models can perform tasks such as providing scene captions, detecting objects and recognizing faces. Although assisting individuals with visual impairment in these tasks increases their independence and autonomy, it also raises concerns over bias, privacy and potential usefulness. This paper addresses the positive and negative implications that computer vision based assistive technologies have for individuals with visual impairment, as well as considerations for computer vision researchers and developers to mitigate the negative implications.

URL

http://arxiv.org/abs/1905.07844

PDF

http://arxiv.org/pdf/1905.07844
Multimodal Transformer with Multi-View Visual Representation for Image Captioning

2019-05-20

Jun Yu, Jing Li, Zhou Yu, Qingming Huang

arXiv_CV

arXiv_CV Image_Caption Attention Caption CNN RNN Quantitative
Abstract

Image captioning aims to automatically generate a natural language description of a given image, and most state-of-the-art models have adopted an encoder-decoder framework. The framework consists of a convolutional neural network (CNN)-based image encoder that extracts region-based visual features from the input image, and a recurrent neural network (RNN)-based caption decoder that generates the output caption words from the visual features using an attention mechanism. Despite the success of existing studies, current methods only model the co-attention that characterizes the inter-modal interactions while neglecting the self-attention that characterizes the intra-modal interactions. Inspired by the success of the Transformer model in machine translation, here we extend it to a Multimodal Transformer (MT) model for image captioning. Compared to existing image captioning approaches, the MT model simultaneously captures intra- and inter-modal interactions in a unified attention block. Due to the in-depth modular composition of such attention blocks, the MT model can perform complex multimodal reasoning and output accurate captions. Moreover, to further improve image captioning performance, multi-view visual features are seamlessly introduced into the MT model. We quantitatively and qualitatively evaluate our approach on the benchmark MSCOCO image captioning dataset and conduct extensive ablation studies to investigate the reasons behind its effectiveness. The experimental results show that our method significantly outperforms the previous state-of-the-art methods. With an ensemble of seven models, our solution ranks 1st on the real-time leaderboard of the MSCOCO image captioning challenge at the time of writing.
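
The self-attention versus co-attention distinction can be illustrated with plain scaled dot-product attention. This sketch omits the learned projections, multiple heads and normalization layers of a real Transformer block, and the toy region and word vectors are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: each query returns a softmax-weighted
    average of the value vectors, weighted by query-key similarity."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out


regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy image-region features
words = [[1.0, 0.2]]                             # toy word features

intra = attention(regions, regions, regions)  # self-attention within the image modality
inter = attention(words, regions, regions)    # co-attention: words attend to regions
print(len(intra), len(inter))                 # 3 1
```

Both interaction types use the same function; only the source of the queries changes, which is what lets a single attention block handle both.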

URL

http://arxiv.org/abs/1905.07841

PDF

http://arxiv.org/pdf/1905.07841
Enabling Computer Vision Driven Assistive Devices for the Visually Impaired via Micro-architecture Design Exploration

2019-05-20

Linda Wang, Alexander Wong

arXiv_CV

arXiv_CV Object_Detection Optimization Detection
Abstract

Recent improvements in object detection have shown potential to aid in tasks that previous solutions could not accomplish. One particular area is assistive devices for individuals with visual impairment. While state-of-the-art deep neural networks have been shown to achieve superior object detection performance, their high computational and memory requirements make them cost-prohibitive for on-device operation. Cloud-based operation, on the other hand, raises privacy concerns; neither option is attractive to potential users. To address these challenges, this study investigates creating an efficient object detection network specifically for OLIV, an AI-powered assistant for object localization for the visually impaired, via micro-architecture design exploration. In particular, we formulate the problem of finding an optimal network micro-architecture as a numerical optimization problem, in which we find the set of hyperparameters controlling the MobileNetV2-SSD network micro-architecture that maximizes a modified NetScore objective function for the MSCOCO-OLIV dataset of indoor objects. Experimental results show that such a micro-architecture design exploration strategy leads to a compact deep neural network with a balanced trade-off between accuracy, size, and speed, making it well-suited for enabling on-device computer vision driven assistive devices for the visually impaired.
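
For reference, a sketch of the standard NetScore as we understand it, NetScore = 20 * log10(a^alpha / (p^beta * m^gamma)) with alpha = 2 and beta = gamma = 0.5, where a is accuracy, p the parameter count and m the multiply-accumulate count. The abstract only says a modified NetScore is maximized; that modification is not reproduced here, and the candidate numbers below are hypothetical.

```python
import math


def netscore(accuracy, params, macs, alpha=2.0, beta=0.5, gamma=0.5):
    """Standard NetScore: 20 * log10(a^alpha / (p^beta * m^gamma)).
    Rewards accuracy and penalizes parameter count and multiply-accumulate operations."""
    return 20.0 * math.log10(accuracy ** alpha / (params ** beta * macs ** gamma))


# Hypothetical candidates: (top-1 accuracy %, params in millions, MACs in millions)
candidates = {"wide": (75.0, 6.0, 900.0), "narrow": (72.0, 1.5, 250.0)}
best = max(candidates, key=lambda name: netscore(*candidates[name]))
print(best)  # narrow
```

Changing the units of p or m shifts every score by the same constant, so only comparisons made with one consistent choice of units are meaningful.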

URL

http://arxiv.org/abs/1905.07836

PDF

http://arxiv.org/pdf/1905.07836
Prediction of Construction Cost for Field Canals Improvement Projects in Egypt

2019-05-20

Haytham H. Elmousalami

arXiv_AI

arXiv_AI Prediction Quantitative
Abstract

Field canals improvement projects (FCIPs) are among the ambitious projects constructed to save fresh water. To finance such projects, conceptual cost models are important for accurately predicting preliminary costs at the early stages of a project. The first step is to develop a conceptual cost model that identifies the key cost drivers affecting the project. Input variable selection is therefore an important part of model development, as poor variable selection can decrease model precision. The study identified the most important drivers of FCIPs using both a qualitative and a quantitative approach. Subsequently, it developed a parametric cost model based on machine learning methods such as regression methods, artificial neural networks, fuzzy models and case-based reasoning.
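
As the simplest instance of the regression methods listed, ordinary least squares with a single cost driver looks like this. The driver, the units and the numbers are hypothetical; a real conceptual cost model would use the drivers identified in the study, typically several of them.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: canal length (km) vs. cost (arbitrary units).
lengths = [1.0, 2.0, 3.0, 4.0]
costs = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple_linear(lengths, costs)
print(slope, intercept)         # 2.0 1.0
print(slope * 5.0 + intercept)  # 11.0: predicted cost for a 5 km canal
```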

URL

http://arxiv.org/abs/1905.11804

PDF

http://arxiv.org/pdf/1905.11804
Character-based Surprisal as a Model of Reading Difficulty in the Presence of Error

2019-05-20

Michael Hahn, Frank Keller, Yonatan Bisk, Yonatan Belinkov

arXiv_CL

arXiv_CL Tracking
Abstract

Intuitively, human readers cope easily with errors in text; typos, misspellings, word substitutions, etc. do not unduly disrupt natural reading. Previous work indicates that letter transpositions result in increased reading times, but it is unclear if this effect generalizes to more natural errors. In this paper, we report an eye-tracking study that compares two error types (letter transpositions and naturally occurring misspellings) and two error rates (10% or 50% of all words contain errors). We find that human readers show unimpaired comprehension in spite of these errors, but error words cause more reading difficulty than correct words. Also, transpositions are more difficult than misspellings, and a high error rate increases difficulty for all words, including correct ones. We then present a computational model that uses character-based (rather than traditional word-based) surprisal to account for these results. The model explains why transpositions are harder than misspellings: they contain unexpected letter combinations. It also explains the error rate effect: upcoming words are more difficult to predict when the context is degraded, leading to increased surprisal.
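
Surprisal is -log2 P(next symbol | context). The toy sketch below uses an add-one-smoothed character bigram model, a much simpler stand-in for the paper's character-based language model (an assumption here), to show why a transposition such as "tehre" is more surprising than the correct "there": it creates letter pairs the model has rarely or never seen.

```python
import math
from collections import Counter


def train_char_bigrams(words):
    """Count character bigrams, with '#' marking word boundaries."""
    pairs, contexts, alphabet = Counter(), Counter(), set()
    for w in words:
        chars = "#" + w + "#"
        alphabet.update(chars)
        for a, b in zip(chars, chars[1:]):
            pairs[(a, b)] += 1
            contexts[a] += 1
    return pairs, contexts, len(alphabet)


def surprisal(word, model):
    """Total surprisal in bits, -sum log2 P(c_i | c_{i-1}), with add-one smoothing."""
    pairs, contexts, vocab = model
    chars = "#" + word + "#"
    return sum(-math.log2((pairs[(a, b)] + 1) / (contexts[a] + vocab))
               for a, b in zip(chars, chars[1:]))


model = train_char_bigrams(["the", "then", "there", "these", "other", "three", "they"])
print(surprisal("there", model) < surprisal("tehre", model))  # True
```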

URL

http://arxiv.org/abs/1902.00595

PDF

http://arxiv.org/pdf/1902.00595
Testing Deep Neural Network based Image Classifiers

2019-05-20

Yuchi Tian, Ziyuan Zhong, Vicente Ordonez, Baishakhi Ray

arXiv_CV

arXiv_CV Image_Classification Classification
Abstract

Image classification is an important task in today’s world, with many applications from socio-technical to safety-critical domains. Deep Neural Networks (DNNs) are the key behind such widespread success. However, such wide adoption comes with concerns about the reliability of these systems, as several erroneous behaviors have already been reported in many sensitive and critical circumstances. Thus, it has become crucial to rigorously test image classifiers to ensure high reliability. Many reported erroneous cases in popular neural image classifiers appear because the models often confuse one class with another, or show biases towards some classes over others. These errors usually violate some group properties. Most existing DNN testing and verification techniques focus on per-image violations and thus fail to detect such group-level confusions or biases. In this paper, we design, implement and evaluate DeepInspect, a white-box testing tool for automatically detecting confusion and bias in DNN-driven image classification applications. We evaluate DeepInspect using popular DNN-based image classifiers and detect hundreds of classification mistakes. Some of these cases expose potential biases of the network towards certain populations. DeepInspect further reports many classification errors in state-of-the-art robust models.
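
DeepInspect itself is white-box and works from neuron activations; the sketch below is only a black-box stand-in, assuming labelled test predictions are available, that shows what a group-level check looks like: error rates are aggregated per (true class, predicted class) pair instead of per image. The function name and threshold are hypothetical.

```python
from collections import Counter


def confused_pairs(true_labels, predicted_labels, threshold=0.1):
    """Return (true, predicted, rate) for class pairs whose confusion rate,
    measured over all test items of the true class, is at least `threshold`."""
    totals = Counter(true_labels)
    errors = Counter((t, p) for t, p in zip(true_labels, predicted_labels) if t != p)
    return sorted((t, p, n / totals[t]) for (t, p), n in errors.items()
                  if n / totals[t] >= threshold)


truth = ["cat", "cat", "cat", "cat", "dog", "dog"]
preds = ["cat", "dog", "cat", "cat", "dog", "dog"]
print(confused_pairs(truth, preds))  # [('cat', 'dog', 0.25)]
```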

URL

http://arxiv.org/abs/1905.07831

PDF

http://arxiv.org/pdf/1905.07831
2019-05-31

HellaSwag: Can a Machine Really Finish Your Sentence?

2019-05-19

Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, Yejin Choi

arXiv_CL

arXiv_CL Adversarial Inference
Abstract

Recent work by Zellers et al. (2018) introduced a new task of commonsense natural language inference: given an event description such as “A woman sits at a piano,” a machine must select the most likely followup: “She sets her fingers on the keys.” With the introduction of BERT, near human-level performance was reached. Does this mean that machines can perform human level commonsense inference? In this paper, we show that commonsense inference still proves difficult for even state-of-the-art models, by presenting HellaSwag, a new challenge dataset. Though its questions are trivial for humans (>95% accuracy), state-of-the-art models struggle (<48%). We achieve this via Adversarial Filtering (AF), a data collection paradigm wherein a series of discriminators iteratively select an adversarial set of machine-generated wrong answers. AF proves to be surprisingly robust. The key insight is to scale up the length and complexity of the dataset examples towards a critical ‘Goldilocks’ zone wherein generated text is ridiculous to humans, yet often misclassified by state-of-the-art models. Our construction of HellaSwag, and its resulting difficulty, sheds light on the inner workings of deep pretrained models. More broadly, it suggests a new path forward for NLP research, in which benchmarks co-evolve with the evolving state-of-the-art in an adversarial way, so as to present ever-harder challenges.

URL

http://arxiv.org/abs/1905.07830

PDF

http://arxiv.org/pdf/1905.07830
U-Net Based Multi-instance Video Object Segmentation

2019-05-19

Heguang Liu, Jingle Jiang

arXiv_CV

arXiv_CV Segmentation CNN
Abstract

Multi-instance video object segmentation aims to segment specific instances throughout a video sequence at the pixel level, given only an annotated first frame. In this paper, we implement an effective fully convolutional network with a U-Net-like structure, built on top of an OSVOS fine-tuned layer. We use instance isolation to transform this multi-instance segmentation problem into a binary labeling problem, and use weighted cross entropy loss and Dice coefficient loss as our loss functions. Our best model achieves an F mean of 0.467 and a J mean of 0.424 on the DAVIS dataset, which is comparable to the state-of-the-art approach. However, case analysis shows that this model achieves smoother contours and better instance coverage, making it better suited to recall-focused segmentation scenarios. We also ran experiments with other convolutional neural networks, including SegNet and Mask R-CNN, and provide an insightful comparison and discussion.
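
The two training losses and the J measure mentioned above can be sketched on flattened masks as follows. The foreground weight and the equal weighting of the two loss terms are assumptions, since the abstract does not give them; J is the region similarity (intersection over union) used on DAVIS.

```python
import math


def weighted_bce(pred, target, pos_weight=1.0, eps=1e-7):
    """Binary cross entropy with extra weight on foreground (target == 1) pixels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= pos_weight * t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(pred)


def dice_loss(pred, target, eps=1e-7):
    """1 - Dice coefficient between soft predictions and a binary target."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)


def combined_loss(pred, target, pos_weight=2.0, dice_weight=1.0):
    # Hypothetical weighting; the abstract does not say how the two terms are balanced.
    return weighted_bce(pred, target, pos_weight) + dice_weight * dice_loss(pred, target)


def jaccard(pred_mask, gt_mask):
    """Region similarity J (intersection over union) for binary masks."""
    inter = sum(1 for p, g in zip(pred_mask, gt_mask) if p and g)
    union = sum(1 for p, g in zip(pred_mask, gt_mask) if p or g)
    return inter / union if union else 1.0


print(jaccard([1, 1, 0], [1, 0, 0]))  # 0.5
print(combined_loss([0.9, 0.2, 0.1], [1, 0, 0]))
```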

URL

http://arxiv.org/abs/1905.07826

PDF

http://arxiv.org/pdf/1905.07826
Spatio-Temporal Adversarial Learning for Detecting Unseen Falls

2019-05-19

Shehroz S. Khan, Jacob Nogas, Alex Mihailidis

arXiv_CV

arXiv_CV Adversarial Detection
Abstract

Fall detection is an important problem from both the health and the machine learning perspectives. A fall can lead to severe injuries, long-term impairments or even death in some cases. In terms of machine learning, it presents a severe class imbalance problem, with very little or no training data for falls, because falls occur rarely. In this paper, we take an alternative approach to detecting falls in the absence of their training data: we train the classifier only on normal activities (which are available in abundance) and identify a fall as an anomaly. To realize such a classifier, we use an adversarial learning framework, which comprises a spatio-temporal autoencoder for reconstructing input video frames and a spatio-temporal convolutional network that discriminates them against the original video frames. 3D convolutions are used to learn spatial and temporal features from the input video frames. Adversarial learning enables the spatio-temporal autoencoder to reconstruct normal activities of daily living efficiently, which makes detecting unseen falls plausible within this framework. We tested the performance of the proposed framework on camera sensing modalities that may preserve an individual’s privacy (fully or partially), such as thermal and depth cameras. Our results on three publicly available datasets show that the proposed spatio-temporal adversarial framework performed better than other frame-based (or spatial) adversarial learning methods.
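
Once the autoencoder is trained on normal activities only, a fall is flagged when reconstruction error is unusually high. A minimal sketch, assuming a mean-plus-k-standard-deviations threshold fitted on errors from normal training clips (the paper may use a different scoring or threshold rule); the error values are hypothetical.

```python
import statistics


def fit_threshold(normal_errors, k=3.0):
    """Threshold = mean + k * (population) std of reconstruction errors on normal data."""
    return statistics.mean(normal_errors) + k * statistics.pstdev(normal_errors)


def flag_falls(errors, threshold):
    """True where the reconstruction error is high enough to be treated as a fall."""
    return [e > threshold for e in errors]


normal = [1.0, 1.1, 0.9, 1.0]  # hypothetical per-clip errors on normal activity
threshold = fit_threshold(normal)
print(flag_falls([1.05, 2.5], threshold))  # [False, True]
```

Because no fall data is needed to set the threshold, this matches the one-class setting described above; k trades false alarms against missed falls.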

URL

http://arxiv.org/abs/1905.07817

PDF

http://arxiv.org/pdf/1905.07817
NLP-assisted software testing: a systematic review

2019-05-19

Vahid Garousi, Sara Bauer, Michael Felderer

arXiv_CL

arXiv_CL Review Knowledge Survey Classification
Abstract

To reduce the manual effort of extracting test cases from natural-language requirements, many approaches based on Natural Language Processing (NLP) have been proposed in the literature. Given the large number of approaches in this area, and since many practitioners are eager to utilize such techniques, it is important to synthesize and provide an overview of the state-of-the-art in this area. Our objective is to summarize the state-of-the-art in NLP-assisted software testing, which could help practitioners adopt those NLP-based techniques. Moreover, this can benefit researchers by providing an overview of the research landscape. To address the above need, we conducted a survey in the form of a systematic literature mapping (classification) and a systematic literature review (SLR). After compiling an initial pool of 95 papers, we conducted systematic voting, and our final pool included 67 technical papers. This review paper provides an overview of the contribution types presented in the papers, the types of NLP approaches used to assist software testing, the types of required input requirements, and a review of tool support in this area. Some key results are: (1) only four of the 38 tools (11%) presented in the papers are available for download; (2) a large fraction of the papers (30 of 67) provided only a shallow exposure to the NLP aspects (almost no details). Conclusion: This paper would benefit both practitioners and researchers by serving as an “index” to the body of knowledge in this area. The results could help practitioners utilize the existing NLP-based techniques; this, in turn, reduces the cost of test-case design and decreases the amount of human resources spent on test activities. After sharing this review with some of our industrial collaborators, initial insights show that this review can indeed be useful and beneficial to practitioners.

URL

http://arxiv.org/abs/1806.00696

PDF

http://arxiv.org/pdf/1806.00696
Structured Summarization of Academic Publications

2019-05-19

Alexios Gidiotis, Grigorios Tsoumakas

arXiv_CL

arXiv_CL Summarization
Abstract

We propose SUSIE, a novel summarization method that can work with state-of-the-art summarization models in order to produce structured scientific summaries for academic articles. We also created PMC-SA, a new dataset of academic publications, suitable for the task of structured summarization with neural networks. We apply SUSIE combined with three different summarization models on the new PMC-SA dataset and we show that the proposed method improves the performance of all models by as much as 4 ROUGE points.
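
ROUGE gains such as "4 ROUGE points" are differences in n-gram overlap scores reported on a 0 to 100 scale. A simplified ROUGE-1 (unigram overlap, lowercased whitespace tokens, no stemming) might look like this; which ROUGE variants the paper reports is not stated in the abstract, and the official ROUGE toolkit adds options not shown here.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Unigram-overlap precision, recall and F1 (lowercased, whitespace-tokenized)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped counts of shared unigrams
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return precision, recall, 2 * precision * recall / (precision + recall)


p, r, f = rouge_1("the cat sat", "the cat sat on the mat")
print(p, r, round(f, 3))  # 1.0 0.5 0.667
```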

URL

http://arxiv.org/abs/1905.07695

PDF

http://arxiv.org/pdf/1905.07695
Deep Reinforcement Learning for Autonomous Driving

2019-05-19

Sen Wang, Daoyuan Jia, Xinshuo Weng

arXiv_CV

arXiv_CV Reinforcement_Learning Quantitative
Abstract

Since the resurgence of deep neural networks, reinforcement learning has steadily improved and has outperformed humans in many traditional games. However, this success is not easily transferred to autonomous driving, because real-world state spaces are extremely complex, action spaces are continuous, and fine control is required. Moreover, autonomous vehicles must also maintain functional safety in complex environments. To deal with these challenges, we first adopt the deep deterministic policy gradient (DDPG) algorithm, which has the capacity to handle complex state and action spaces in continuous domains. We then choose The Open Racing Car Simulator (TORCS) as our environment to avoid physical damage. Meanwhile, we select a set of appropriate sensor information from TORCS and design our own reward function. To fit the DDPG algorithm to TORCS, we design our network architecture for both the actor and the critic within the DDPG paradigm. To demonstrate the effectiveness of our model, we evaluate it on different modes in TORCS and show both quantitative and qualitative results.
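
Two pieces of DDPG that any actor-critic design has to fit around are the critic's TD target, computed with target networks, and the soft update of those target networks. A minimal sketch with parameters as flat lists and the networks as plain callables; tau = 0.001 and gamma = 0.99 are the usual DDPG defaults, assumed here rather than taken from this paper.

```python
def soft_update(target_params, online_params, tau=0.001):
    """Polyak averaging: theta_target <- tau * theta + (1 - tau) * theta_target."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target_params, online_params)]


def critic_target(reward, next_state, done, target_actor, target_critic, gamma=0.99):
    """TD target y = r + gamma * Q'(s', mu'(s')), with no bootstrapping at episode end."""
    if done:
        return reward
    return reward + gamma * target_critic(next_state, target_actor(next_state))


y = critic_target(1.0, 2.0, False,
                  target_actor=lambda s: 0.5 * s,       # toy deterministic policy mu'
                  target_critic=lambda s, a: s + a,     # toy action-value function Q'
                  gamma=0.9)
print(y)                                              # about 3.7
print(soft_update([0.0, 0.0], [1.0, 2.0], tau=0.5))  # [0.5, 1.0]
```

A small tau makes the targets trail the online networks slowly, which is what stabilizes the bootstrapped critic updates in continuous-control tasks like TORCS.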

URL

http://arxiv.org/abs/1811.11329

PDF

http://arxiv.org/pdf/1811.11329
Characterizing SLAM Benchmarks and Methods for the Robust Perception Age

2019-05-19

Wenkai Ye, Yipu Zhao, Patricio A. Vela

arXiv_CV

arXiv_CV SLAM
Abstract

The diversity of SLAM benchmarks affords extensive testing of SLAM algorithms to understand their performance, individually or in relative terms. The ad-hoc creation of these benchmarks does not necessarily illuminate the particular weak points of a SLAM algorithm when performance is evaluated. In this paper, we propose to use a decision tree to identify challenging benchmark properties for state-of-the-art SLAM algorithms and important components within the SLAM pipeline regarding their ability to handle these challenges. Establishing which factors of a particular sequence lead to tracking failure or degradation relative to these characteristics is important if we are to arrive at a strong understanding of the core computational needs of a robust SLAM algorithm. Likewise, we argue that it is important to profile the computational performance of the individual SLAM components for use when benchmarking. In particular, we advocate the use of time dilation during ROS bag playback, or what we refer to as slo-mo playback. Using slo-mo to benchmark SLAM instantiations can provide clues to how SLAM implementations should be improved at the computational component level. Three prevalent VO/SLAM algorithms and two low-latency algorithms of our own are tested on selected typical sequences, which are generated from benchmark characterization, to further demonstrate the benefits achieved from computationally efficient components.

URL

http://arxiv.org/abs/1905.07808

PDF

http://arxiv.org/pdf/1905.07808
Good Feature Selection for Least Squares Pose Optimization in VO/VSLAM

2019-05-19

Yipu Zhao, Patricio A. Vela

arXiv_RO

arXiv_RO Pose_Estimation Tracking Optimization SLAM
Abstract

This paper aims to select features that contribute most to pose estimation in VO/VSLAM. Unlike existing feature selection works that focus on efficiency only, our method significantly improves the accuracy of pose tracking while introducing little overhead. By studying the impact of feature selection on least squares pose optimization, we demonstrate the applicability of improving accuracy via good feature selection. To that end, we introduce the Max-logDet metric, which is connected to the conditioning of the least squares pose optimization problem, to guide feature selection. We then describe an efficient algorithm for approximately solving the NP-hard Max-logDet problem. Integrating Max-logDet feature selection into a state-of-the-art visual SLAM system leads to accuracy improvements with low overhead, as demonstrated via evaluation on a public benchmark.
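
Max-logDet selection can be pictured as greedily adding the feature whose Jacobian rows most increase log det of the information matrix H^T H. This is a plain greedy sketch with a small regularizer eps (an assumption to keep the determinant finite), not the paper's efficient approximate algorithm, and it is only practical for tiny dimensions.

```python
import math


def logdet(matrix):
    """log|det| via Gaussian elimination with partial pivoting (small matrices only)."""
    a = [row[:] for row in matrix]
    n = len(a)
    total = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return float("-inf")
        a[col], a[pivot] = a[pivot], a[col]
        total += math.log(abs(a[col][col]))
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return total


def information_matrix(rows, dim, eps=1e-6):
    """eps * I + sum of h h^T over the selected Jacobian rows h."""
    m = [[eps if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for h in rows:
        for i in range(dim):
            for j in range(dim):
                m[i][j] += h[i] * h[j]
    return m


def greedy_max_logdet(features, k, dim):
    """Pick k features, each time adding the one that maximizes logdet of the
    resulting information matrix. `features[i]` is a list of Jacobian rows."""
    chosen = []
    for _ in range(k):
        best, best_val = None, float("-inf")
        for idx in range(len(features)):
            if idx in chosen:
                continue
            rows = [h for i in chosen + [idx] for h in features[i]]
            val = logdet(information_matrix(rows, dim))
            if val > best_val:
                best, best_val = idx, val
        chosen.append(best)
    return chosen


# Toy 2-D example: features 0 and 1 constrain almost the same direction, feature 2 is orthogonal.
features = [[[1.0, 0.0]], [[1.0, 0.01]], [[0.0, 1.0]]]
print(greedy_max_logdet(features, k=2, dim=2))
```

The log-determinant rewards features that constrain directions not yet covered, which is why the orthogonal feature is always picked in the toy example.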

URL

http://arxiv.org/abs/1905.07807

PDF

http://arxiv.org/pdf/1905.07807
Techniques for Interpretable Machine Learning

2019-05-19

Mengnan Du, Ninghao Liu, Xia Hu

arXiv_AI

arXiv_AI Survey
Abstract

Interpretable machine learning tackles the important problem that humans cannot understand the behaviors of complex machine learning models and how these models arrive at a particular decision. Although many approaches have been proposed, a comprehensive understanding of the achievements and challenges is still lacking. We provide a survey covering existing techniques to increase the interpretability of machine learning models. We also discuss crucial issues that the community should consider in future work such as designing user-friendly explanations and developing comprehensive evaluation metrics to further push forward the area of interpretable machine learning.

URL

http://arxiv.org/abs/1808.00033

PDF

http://arxiv.org/pdf/1808.00033
CyLKs: Unsupervised Cycle Lucas-Kanade Network for Landmark Tracking

2019-05-19

Xinshuo Weng

arXiv_CV

arXiv_CV Tracking CNN
Abstract

Most modern learning-based tracking systems need expensive annotations to achieve state-of-the-art performance. In contrast, the Lucas-Kanade (LK) algorithm works well without any annotation. However, LK makes a strong assumption of photometric (brightness) consistency on image intensity and is prone to drift because of large motion, occlusion, and the aperture problem. To relax the assumption and alleviate the drift problem, we propose CyLKs, a data-driven way of training Lucas-Kanade in an unsupervised manner. CyLKs learns a feature transformation through CNNs, transforming the input images to a feature space that is especially favorable to LK tracking. During training, we perform differentiable Lucas-Kanade forward and backward on the convolutional feature maps, and then minimize the re-projection error. During testing, we perform LK tracking on the learned features. We apply our model to the task of landmark tracking and perform experiments on the THUMOS, 300VW, and Mugsy datasets.
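
The core LK update is a Gauss-Newton step on the brightness-constancy residual. A 1-D sketch on raw intensities, assuming a smooth Gaussian signal as a hypothetical example; CyLKs would apply the same update to learned CNN features. Running LK forward and backward and checking that the two shifts cancel illustrates the cycle idea.

```python
import math


def lucas_kanade_1d(template, image, xs, p=0.0, iters=20, h=1e-4):
    """Gauss-Newton estimate of the shift p such that image(x + p) ~= template(x)."""
    for _ in range(iters):
        num = den = 0.0
        for x in xs:
            grad = (image(x + p + h) - image(x + p - h)) / (2 * h)  # dI/dx at x + p
            err = template(x) - image(x + p)                         # brightness residual
            num += grad * err
            den += grad * grad
        if den == 0.0:
            break
        p += num / den
    return p


def template(x):
    return math.exp(-x * x)


def image(x):
    return math.exp(-(x - 0.3) ** 2)  # the template shifted right by 0.3


xs = [i * 0.1 for i in range(-30, 31)]
forward = lucas_kanade_1d(template, image, xs)   # approx 0.3
backward = lucas_kanade_1d(image, template, xs)  # approx -0.3
print(round(forward, 3), round(backward, 3), abs(forward + backward))  # cycle error near 0
```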

URL

http://arxiv.org/abs/1811.11325

PDF

http://arxiv.org/pdf/1811.11325
Image Labeling with Markov Random Fields and Conditional Random Fields

2019-05-19

Shangxuan Wu, Xinshuo Weng

arXiv_CV

arXiv_CV Review Segmentation
Abstract

Most existing methods for object segmentation in computer vision are formulated as a labeling task. In general, this can be cast as a pixel-wise label assignment task, whose structure is quite similar to that of a hidden Markov random field. In a Markov random field, each pixel can be regarded as a state with a transition probability to its neighboring pixels, and the label behind each pixel is a latent variable with an emission probability from its corresponding state. In this paper, we review several modern image labeling methods based on Markov random fields and conditional random fields, and we compare their results with some classical image labeling methods. The experiments demonstrate that the introduction of Markov random fields and conditional random fields makes a big difference in the segmentation results.
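
Labeling with an MRF amounts to minimizing an energy with a data term and a pairwise smoothness term. A minimal sketch using iterated conditional modes (ICM) on a binary Ising-style model; ICM, the 4-neighbourhood and the weights eta and beta are illustrative choices, not necessarily those of the methods reviewed in the paper.

```python
def icm_denoise(noisy, beta=1.0, eta=2.0, sweeps=5):
    """Iterated conditional modes on a binary (+1/-1) MRF with energy
    E(x) = -eta * sum_i x_i y_i - beta * sum_{i~j} x_i x_j (4-neighbourhood)."""
    h, w = len(noisy), len(noisy[0])
    x = [row[:] for row in noisy]
    for _ in range(sweeps):
        for i in range(h):
            for j in range(w):
                nb = sum(x[a][b] for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                         if 0 <= a < h and 0 <= b < w)
                # The label s minimizing -s * (eta * y + beta * nb) is the sign of that term.
                x[i][j] = 1 if eta * noisy[i][j] + beta * nb >= 0 else -1
    return x


noisy = [[1] * 5 for _ in range(5)]
noisy[2][2] = -1                 # an isolated flipped pixel
print(icm_denoise(noisy)[2][2])  # 1: the smoothness prior overrides the lone label
```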

URL

http://arxiv.org/abs/1811.11323

PDF

http://arxiv.org/pdf/1811.11323
Low-latency Visual SLAM with Appearance-Enhanced Local Map Building

2019-05-19

Yipu Zhao, Wenkai Ye, Patricio A. Vela

arXiv_CV

arXiv_CV Pose_Estimation Tracking SLAM
Abstract

A local map module is often implemented in modern VO/VSLAM systems to improve data association and pose estimation. Conventionally, the local map contents are determined by co-visibility. While co-visibility is cheap to establish, it utilizes a relatively weak temporal prior (i.e. seen before, likely to be seen now), and therefore admits more features into the local map than necessary. This paper describes an enhancement to co-visibility local map building that incorporates a strong appearance prior, which leads to a more compact local map and reduced latency in downstream data association. The appearance prior collected from the current image influences the local map contents: only the map features visually similar to the current measurements are potentially useful for data association. To that end, mapped features are indexed and queried with Multi-index Hashing (MIH). An online hash table selection algorithm is developed to further reduce the query overhead of MIH and the local map size. The proposed appearance-based local map building method is integrated into a state-of-the-art VO/VSLAM system. When evaluated on two public benchmarks, both the size of the local map and the latency of real-time pose tracking in VO/VSLAM are significantly reduced. Meanwhile, the mean VO/VSLAM performance is preserved or improved.
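
Multi-index hashing splits each binary code into m substrings and indexes each one separately. If two codes differ in fewer than m bits, then by the pigeonhole principle at least one substring matches exactly, so exact lookups in the m tables retrieve every candidate. The sketch below covers only that radius < m case (full MIH also probes substrings within a small radius) and does not include the paper's online hash table selection; class and function names are illustrative.

```python
from collections import defaultdict


def split_code(code, bits, m):
    """Split a `bits`-bit integer code into m equal-width substrings."""
    width = bits // m
    mask = (1 << width) - 1
    return [(code >> (k * width)) & mask for k in range(m)]


class MultiIndexHash:
    def __init__(self, bits, m):
        assert bits % m == 0, "bits must be divisible by m"
        self.bits, self.m = bits, m
        self.tables = [defaultdict(list) for _ in range(m)]
        self.codes = []

    def add(self, code):
        idx = len(self.codes)
        self.codes.append(code)
        for k, sub in enumerate(split_code(code, self.bits, self.m)):
            self.tables[k][sub].append(idx)
        return idx

    def query(self, code, radius):
        """Ids of stored codes within Hamming distance `radius` (requires radius < m)."""
        assert radius < self.m, "exact substring lookup is only complete for radius < m"
        candidates = set()
        for k, sub in enumerate(split_code(code, self.bits, self.m)):
            candidates.update(self.tables[k].get(sub, []))
        # Verify candidates with the full Hamming distance.
        return sorted(i for i in candidates
                      if bin(self.codes[i] ^ code).count("1") <= radius)


index = MultiIndexHash(bits=16, m=4)
for c in (0x0000, 0x0003, 0xFFFF):
    index.add(c)
print(index.query(0x0000, radius=2))  # [0, 1]
```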

URL

http://arxiv.org/abs/1905.07797

PDF

http://arxiv.org/pdf/1905.07797
Predicting Annotation Difficulty to Improve Task Routing and Model Performance for Biomedical Information Extraction

2019-05-19

Yinfei Yang, Oshin Agarwal, Chris Tar, Byron C. Wallace, Ani Nenkova

arXiv_CL

arXiv_CL Prediction
Abstract

Modern NLP systems require high-quality annotated data. In specialized domains, expert annotations may be prohibitively expensive. An alternative is to rely on crowdsourcing to reduce costs at the risk of introducing noise. In this paper we demonstrate that directly modeling instance difficulty can be used to improve model performance, and to route instances to appropriate annotators. Our difficulty prediction model combines two learned representations: a ‘universal’ encoder trained on out-of-domain data, and a task-specific encoder. Experiments on a complex biomedical information extraction task using expert and lay annotators show that: (i) simply excluding from the training data instances predicted to be difficult yields a small boost in performance; (ii) using difficulty scores to weight instances during training provides further, consistent gains; (iii) assigning instances predicted to be difficult to domain experts is an effective strategy for task routing. Our experiments confirm the expectation that for specialized tasks expert annotations are higher quality than crowd labels, and hence preferable to obtain if practical. Moreover, augmenting small amounts of expert data with a larger set of lay annotations leads to further improvements in model performance.
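
Two uses of predicted difficulty from the abstract, weighting training instances and routing hard instances to experts, can be sketched as follows. The weighting rule (1 - difficulty, with scores in [0, 1]) and the fixed expert budget are assumptions for illustration; the paper's exact scheme may differ.

```python
def weighted_loss(losses, difficulties):
    """Average per-instance losses, down-weighting instances predicted to be difficult.
    Assumes difficulty scores already lie in [0, 1]."""
    weights = [1.0 - d for d in difficulties]
    total = sum(weights)
    return sum(w * loss for w, loss in zip(weights, losses)) / total if total else 0.0


def route_to_experts(ids, difficulties, expert_budget):
    """Send the `expert_budget` instances predicted hardest to experts, the rest to the crowd."""
    ranked = sorted(range(len(ids)), key=lambda i: difficulties[i], reverse=True)
    hard = set(ranked[:expert_budget])
    experts = [ids[i] for i in range(len(ids)) if i in hard]
    crowd = [ids[i] for i in range(len(ids)) if i not in hard]
    return experts, crowd


print(weighted_loss([1.0, 3.0], [0.0, 1.0]))                  # 1.0
print(route_to_experts(["a", "b", "c"], [0.1, 0.9, 0.5], 1))  # (['b'], ['a', 'c'])
```

With a weight of zero for the hardest instances, weighting reduces to the simple exclusion strategy (i) in the abstract.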

URL

http://arxiv.org/abs/1905.07791

PDF

http://arxiv.org/pdf/1905.07791
Correlation Coefficients and Semantic Textual Similarity

2019-05-19

Vitalii Zhelezniak, Aleksandar Savkov, April Shen, Nils Y. Hammerla

arXiv_CL

arXiv_CL Attention Embedding Relation
Abstract

A large body of research into semantic textual similarity has focused on constructing state-of-the-art embeddings using sophisticated modelling, careful choice of learning signals and many clever tricks. By contrast, little attention has been devoted to similarity measures between these embeddings, with cosine similarity being used unquestioningly in the majority of cases. In this work, we illustrate that for all common word vectors, cosine similarity is essentially equivalent to the Pearson correlation coefficient, which provides some justification for its use. We thoroughly characterise cases where Pearson correlation (and thus cosine similarity) is unfit as a similarity measure. Importantly, we show that Pearson correlation is appropriate for some word vectors but not others. When it is not appropriate, we illustrate how common non-parametric rank correlation coefficients can be used instead to significantly improve performance. We support our analysis with a series of evaluations on word-level and sentence-level semantic textual similarity benchmarks. On the latter, we show that even the simplest averaged word vectors compared by rank correlation easily rival the strongest deep representations compared by cosine similarity.
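
The cosine/Pearson equivalence follows because Pearson correlation is the cosine similarity of mean-centred vectors, so the two coincide whenever the vector components have near-zero mean. A pure-Python sketch of cosine, Pearson and Spearman (Pearson on average ranks) correlation; the toy vectors are hypothetical.

```python
import math


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def pearson(x, y):
    """Pearson correlation = cosine similarity of the mean-centred vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])


def ranks(v):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 4.0, 9.0, 100.0]  # monotone in x, but far from linear
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

The outlier-like last component drags Pearson (and hence cosine) down, while the rank-based Spearman coefficient still reports a perfect monotone relationship.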

URL

http://arxiv.org/abs/1905.07790

PDF

http://arxiv.org/pdf/1905.07790
Read All
Revisiting the Softmax Bellman Operator: New Benefits and New Perspective

2019-05-19

Zhao Song, Ronald E. Parr, Lawrence Carin

arXiv_AI

arXiv_AI Reinforcement_Learning
Abstract

The impact of softmax on the value function itself in reinforcement learning (RL) is often viewed as problematic because it leads to sub-optimal value (or Q) functions and interferes with the contraction properties of the Bellman operator. Surprisingly, despite these concerns, and independent of its effect on exploration, the softmax Bellman operator, when combined with Deep Q-learning, leads to Q-functions with superior policies in practice, even outperforming its double Q-learning counterpart. To better understand how and why this occurs, we revisit theoretical properties of the softmax Bellman operator, and prove that $(i)$ it converges to the standard Bellman operator exponentially fast in the inverse temperature parameter, and $(ii)$ the distance of its Q function from the optimal one can be bounded. These alone do not explain its superior performance, so we also show that the softmax operator can reduce the overestimation error, which may give some insight into why a sub-optimal operator leads to better performance in the presence of value function approximation. A comparison among different Bellman operators is then presented, showing the trade-offs when selecting them.
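This sketch shows the softmax operator that replaces the max in the Bellman backup. It takes a softmax-weighted average of the next-state Q-values, sm_β(Q) = Σ_a softmax(βQ)_a Q_a. At β = 0 it is the plain mean, and as the inverse temperature β grows it approaches the max. The function names and the one-step target form are our illustration, not code from the paper.

```python
import math


def softmax_value(q_values, beta):
    """Softmax-weighted value sum_a softmax(beta * q)_a * q_a."""
    m = max(q_values)  # shift by the max for numerical stability (does not change weights)
    w = [math.exp(beta * (q - m)) for q in q_values]
    return sum(wi * q for wi, q in zip(w, q_values)) / sum(w)


def softmax_target(reward, next_q, gamma, beta, done=False):
    """One-step backup target r + gamma * sm_beta(Q(s', .)) used in place of max."""
    return reward if done else reward + gamma * softmax_value(next_q, beta)


q = [1.0, 2.0, 3.0]
for beta in (0.0, 1.0, 5.0, 50.0):
    # The gap to max(q) = 3.0 shrinks quickly as beta grows.
    print(beta, softmax_value(q, beta))
```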

URL

http://arxiv.org/abs/1812.00456

PDF

http://arxiv.org/pdf/1812.00456
Read All
Topic-Enhanced Memory Networks for Personalised Point-of-Interest Recommendation

2019-05-19

Xiao Zhou, Cecilia Mascolo, Zhongxiang Zhao

arXiv_CV

arXiv_CV Attention Relation Memory_Networks Recommendation
Abstract

Point-of-Interest (POI) recommender systems play a vital role in people’s lives by recommending unexplored POIs to users and have drawn extensive attention from both academia and industry. Despite their value, however, they still struggle to capture complicated user preferences and fine-grained user-POI relationships for spatio-temporally sensitive POI recommendation. Existing recommendation algorithms, including both shallow and deep approaches, usually embed the visiting records of a user into a single latent vector to model user preferences, which limits both representational power and interpretability. In this paper, we propose a novel topic-enhanced memory network (TEMN), a deep architecture that integrates the topic model and the memory network, capitalising on the strengths of both the global structure of latent patterns and local neighbourhood-based features in a nonlinear fashion. We further incorporate a geographical module to exploit user-specific spatial preference and POI-specific spatial influence to enhance recommendations. The proposed unified hybrid model is widely applicable to various POI recommendation scenarios. Extensive experiments on real-world WeChat datasets demonstrate its effectiveness (improvement ratios of 3.25% and 29.95% for context-aware and sequential recommendation, respectively). Also, qualitative analysis of the attention weights and topic modeling provides insight into the model’s recommendation process and results.

URL

https://arxiv.org/abs/1905.13127

PDF

https://arxiv.org/pdf/1905.13127
Read All
A type of generalization error induced by initialization in deep neural networks

2019-05-19

Yaoyu Zhang, Zhi-Qin John Xu, Tao Luo, Zheng Ma

arXiv_AI

arXiv_AI Quantitative
Abstract

How different initializations and loss functions affect the learning of a deep neural network (DNN), specifically its generalization error, is an important problem in practice. In this work, focusing on regression problems, we develop a kernel-norm minimization framework for the analysis of DNNs in the kernel regime, in which the number of neurons in each hidden layer is sufficiently large (Jacot et al. 2018, Lee et al. 2019). We find that, in the kernel regime, for any loss in a general class of functions, e.g., any $L^p$ loss for $1 < p < \infty$, the DNN finds the same global minimum, namely the one that is nearest to the initial value in the parameter space, or equivalently, the one that is closest to the initial DNN output in the corresponding reproducing kernel Hilbert space. With this framework, we prove that a non-zero initial output increases the generalization error of the DNN. We further propose an antisymmetrical initialization (ASI) trick that eliminates this type of error and accelerates the training. We also demonstrate experimentally that even for DNNs in the non-kernel regime, our theoretical analysis and the ASI trick remain effective. Overall, our work provides insight into how initialization and loss function quantitatively affect the generalization of DNNs, and also provides guidance for the training of DNNs.
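Here is our reading of the ASI trick, written as a sketch. The model output is the scaled difference of two copies of the same network, (f(x; θ) − f(x; θ′)) / √2, with θ′ initialised equal to θ. The initial output is therefore exactly zero for every input, and both parameter sets are trained afterwards. The 1/√2 factor keeps the combined tangent kernel equal to that of a single copy at initialisation. The tiny tanh network and all names are illustrative assumptions.

```python
import copy
import math
import random


def init_params(n_in, n_hidden, rng):
    """Random weights for a one-hidden-layer tanh network with a scalar output."""
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_hidden)]
    a = [rng.gauss(0.0, 1.0 / math.sqrt(n_hidden)) for _ in range(n_hidden)]
    return W, a


def forward(params, x):
    W, a = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]
    return sum(ai * h for ai, h in zip(a, hidden))


def asi_forward(params, params_prime, x):
    """Antisymmetric combination (f(x; theta) - f(x; theta')) / sqrt(2)."""
    return (forward(params, x) - forward(params_prime, x)) / math.sqrt(2.0)


rng = random.Random(0)
theta = init_params(n_in=3, n_hidden=16, rng=rng)
theta_prime = copy.deepcopy(theta)  # theta' starts as an exact copy of theta

x = [0.3, -1.2, 0.5]
print(forward(theta, x))                   # generally non-zero
print(asi_forward(theta, theta_prime, x))  # 0.0 at initialisation
```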

URL

http://arxiv.org/abs/1905.07777

PDF

http://arxiv.org/pdf/1905.07777
Read All
Online Convex Optimization in Adversarial Markov Decision Processes

2019-05-19

Aviv Rosenberg, Yishay Mansour

arXiv_AI

arXiv_AI Regularization Adversarial Optimization
Abstract

We consider online learning in episodic loop-free Markov decision processes (MDPs), where the loss function can change arbitrarily between episodes, and the transition function is not known to the learner. We show a $\tilde{O}(L|X|\sqrt{|A|T})$ regret bound, where $T$ is the number of episodes, $X$ is the state space, $A$ is the action space, and $L$ is the length of each episode. Our online algorithm is implemented using entropic regularization methodology, which allows us to extend the original adversarial MDP model to handle convex performance criteria (different ways to aggregate the losses of a single episode), as well as improve previous regret bounds.

URL

http://arxiv.org/abs/1905.07773

PDF

http://arxiv.org/pdf/1905.07773
Read All
Leveraging Semantic Embeddings for Safety-Critical Applications

2019-05-19

Thomas Brunner, Frederik Diehl, Michael Truong Le, Alois Knoll

arXiv_CV

arXiv_CV Knowledge Embedding Prediction Detection
Abstract

Semantic Embeddings are a popular way to represent knowledge in the field of zero-shot learning. We observe their interpretability and discuss their potential utility in a safety-critical context. Concretely, we propose to use them to add introspection and error detection capabilities to neural network classifiers. First, we show how to create embeddings from symbolic domain knowledge. We discuss how to use them for interpreting mispredictions and propose a simple error detection scheme. We then introduce the concept of semantic distance: a real-valued score that measures confidence in the semantic space. We evaluate this score on a traffic sign classifier and find that it achieves near state-of-the-art performance, while being significantly faster to compute than other confidence scores. Our approach requires no changes to the original network and is thus applicable to any task for which domain knowledge is available.

URL

http://arxiv.org/abs/1905.07733

PDF

http://arxiv.org/pdf/1905.07733
Read All
Progressive Feature Alignment for Unsupervised Domain Adaptation

2019-05-19

Chaoqi Chen, Weiping Xie, Wenbing Huang, Yu Rong, Xinghao Ding, Yue Huang, Tingyang Xu, Junzhou Huang

arXiv_CV

arXiv_CV Knowledge Classification
Abstract

Unsupervised domain adaptation (UDA) transfers knowledge from a label-rich source domain to a fully-unlabeled target domain. To tackle this task, recent approaches resort to discriminative domain transfer by virtue of pseudo-labels to enforce class-level distribution alignment across the source and target domains. These methods, however, are vulnerable to error accumulation and thus incapable of preserving cross-domain category consistency, as the pseudo-labeling accuracy is not guaranteed explicitly. In this paper, we propose the Progressive Feature Alignment Network (PFAN) to align the discriminative features across domains progressively and effectively, by exploiting the intra-class variation in the target domain. Specifically, we first develop an Easy-to-Hard Transfer Strategy (EHTS) and an Adaptive Prototype Alignment (APA) step to train our model iteratively and alternately. Moreover, upon observing that good domain adaptation usually requires a non-saturated source classifier, we consider a simple yet efficient way to slow the convergence of the source classification loss by introducing a temperature variable into the softmax function. The extensive experimental results reveal that the proposed PFAN exceeds the state-of-the-art performance on three UDA datasets.
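This is a minimal sketch of a temperature-scaled softmax, softmax(z / T). For the same logits, a larger T yields a flatter distribution and a larger cross-entropy on the correct class. That is one way to read "slowing the convergence" of the source classifier. The fixed T and helper names are assumptions, since the abstract does not say how the temperature is set or scheduled.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Numerically stable softmax(z / T); T > 1 flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label, temperature=1.0):
    """Negative log-probability of the gold label under the tempered softmax."""
    return -math.log(softmax_with_temperature(logits, temperature)[label])


logits = [4.0, 1.0, 0.0]
for T in (1.0, 2.0, 4.0):
    # For the same logits, the loss on the correct class (index 0) grows with T.
    print(T, round(cross_entropy(logits, 0, T), 4))
```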

URL

http://arxiv.org/abs/1811.08585

PDF

http://arxiv.org/pdf/1811.08585
Read All
Double Supervised Network with Attention Mechanism for Scene Text Recognition

2019-05-19

Yuting Gao, Zheng Huang, Yuchen Dai

arXiv_AI

arXiv_AI Attention Recognition
Abstract

In this paper, we propose Double Supervised Network with Attention Mechanism (DSAN), a novel end-to-end trainable framework for scene text recognition. It incorporates a text attention module during feature extraction, which forces the model to focus on text regions, and the whole framework is supervised by two branches. One supervision branch comes from context-level modelling and the other comes from an extra supervision enhancement branch, which aims at tackling inexplicit semantic information at the character level. These two supervisions can benefit each other and yield better performance. The proposed approach can recognize text of arbitrary length and does not need any predefined lexicon. Our method outperforms the current state-of-the-art methods on three text recognition benchmarks, IIIT5K, ICDAR2013 and SVT, reaching accuracies of 88.6%, 92.3% and 84.1% respectively, which suggests the effectiveness of the proposed method.

URL

http://arxiv.org/abs/1808.00677

PDF

http://arxiv.org/pdf/1808.00677
Read All
Earlier Attention? Aspect-Aware LSTM for Aspect Sentiment Analysis

2019-05-19

Bowen Xing, Lejian Liao, Dandan Song, Jingang Wang, Fuzheng Zhang, Zhongyuan Wang, Heyan Huang

arXiv_CL

arXiv_CL Sentiment Attention RNN
Abstract

Aspect-based sentiment analysis (ABSA) aims to predict fine-grained sentiments of comments with respect to given aspect terms or categories. In previous ABSA methods, the importance of aspect has been realized and verified. Most existing LSTM-based models take aspect into account via the attention mechanism, where the attention weights are calculated after the context is modeled in the form of contextual vectors. However, aspect-related information may already be discarded and aspect-irrelevant information may be retained in classic LSTM cells during context modeling, which can be improved to generate more effective context representations. This paper proposes a novel variant of LSTM, termed aspect-aware LSTM (AA-LSTM), which incorporates aspect information into LSTM cells in the context modeling stage before the attention mechanism. Therefore, our AA-LSTM can dynamically produce aspect-aware contextual representations. We experiment with several representative LSTM-based models by replacing the classic LSTM cells with AA-LSTM cells. Experimental results on the SemEval-2014 datasets demonstrate the effectiveness of AA-LSTM.

URL

http://arxiv.org/abs/1905.07719

PDF

http://arxiv.org/pdf/1905.07719
Read All
Geometric Pose Affordance: 3D Human Pose with Scene Constraints

2019-05-19

Zhe Wang, Liyan Chen, Shaurya Rathore, Daeyun Shin, Charless Fowlkes

arXiv_CV

arXiv_CV Knowledge Face Pose_Estimation
Abstract

Full 3D estimation of human pose from a single image remains a challenging task despite many recent advances. In this paper, we explore the hypothesis that strong prior information about scene geometry can be used to improve pose estimation accuracy. To tackle this question empirically, we have assembled a novel $\textbf{Geometric Pose Affordance}$ dataset, consisting of multi-view imagery of people interacting with a variety of rich 3D environments. We utilized a commercial motion capture system to collect gold-standard estimates of pose and construct accurate geometric 3D CAD models of the scene itself. To inject prior knowledge of scene constraints into existing frameworks for pose estimation from images, we introduce a novel, view-based representation of scene geometry, a $\textbf{multi-layer depth map}$, which employs multi-hit ray tracing to concisely encode multiple surface entry and exit points along each camera view ray direction. We propose two different mechanisms for integrating multi-layer depth information into pose estimation: first, as encoded ray features used in lifting 2D pose to full 3D, and second, as a differentiable loss that encourages learned models to favor geometrically consistent pose estimates. We show experimentally that these techniques can improve the accuracy of 3D pose estimates, particularly in the presence of occlusion and complex scene geometry.
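To illustrate multi-hit ray tracing, this sketch collects every surface entry and exit depth along one camera ray, sorted near to far, rather than only the first hit. That sorted list is the per-pixel content of a multi-layer depth map. Spheres stand in for the paper's CAD meshes purely for brevity; this substitution and the function names are our assumptions.

```python
import math


def ray_sphere_hits(origin, direction, center, radius):
    """Depths t >= 0 where origin + t * direction crosses a sphere's surface.

    Solves |o + t d - c|^2 = r^2 for unit-length d: t = -b +/- sqrt(b^2 - c),
    with b = d . (o - c) and c = |o - c|^2 - r^2.
    """
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return []  # the ray misses this sphere
    s = math.sqrt(disc)
    return [t for t in (-b - s, -b + s) if t >= 0]


def multi_layer_depth(origin, direction, spheres):
    """All entry/exit depths along one ray, sorted near to far."""
    hits = []
    for center, radius in spheres:
        hits.extend(ray_sphere_hits(origin, direction, center, radius))
    return sorted(hits)


scene = [((0.0, 0.0, 10.0), 2.0), ((0.0, 0.0, 5.0), 1.0)]
print(multi_layer_depth((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), scene))  # [4.0, 6.0, 8.0, 12.0]
```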

URL

http://arxiv.org/abs/1905.07718

PDF

http://arxiv.org/pdf/1905.07718
Read All
A 2D dilated residual U-Net for multi-organ segmentation in thoracic CT

2019-05-19

Sulaiman Vesal, Nishant Ravikumar, Andreas Maier

arXiv_CV

arXiv_CV Segmentation GAN CNN Deep_Learning
Abstract

Automatic segmentation of organs-at-risk (OAR) in computed tomography (CT) is an essential part of planning effective treatment strategies to combat lung and esophageal cancer. Accurate segmentation of organs surrounding tumours helps account for the variation in position and morphology inherent across patients, thereby facilitating adaptive and computer-assisted radiotherapy. Although manual delineation of OARs is still highly prevalent, it is prone to errors due to complex variations in the shape and position of organs across patients, and low soft tissue contrast between neighbouring organs in CT images. Recently, deep convolutional neural networks (CNNs) have gained tremendous traction and achieved state-of-the-art results in medical image segmentation. In this paper, we propose a deep learning framework to segment OARs in thoracic CT images, specifically the heart, esophagus, trachea and aorta. Our approach employs dilated convolutions and aggregated residual connections in the bottleneck of a U-Net styled network, which incorporates global context and dense information. Our method achieved an overall Dice score of 91.57% on 20 unseen test samples from the ISBI 2019 SegTHOR challenge.
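The Dice score reported above measures overlap between predicted and reference masks, 2|P ∩ T| / (|P| + |T|). Below is a minimal sketch for labelled volumes given as flat voxel sequences. Averaging per-organ scores is our assumption, because the abstract does not say how the overall 91.57% was aggregated. Returning 1.0 when an organ is absent from both masks is also a convention we chose. The label ids are hypothetical.

```python
def dice_for_label(pred, target, label):
    """Dice = 2|P ∩ T| / (|P| + |T|) for one organ label.

    `pred` and `target` are flat sequences of voxel labels (a flattened volume).
    """
    p = [v == label for v in pred]
    t = [v == label for v in target]
    intersection = sum(a and b for a, b in zip(p, t))
    denom = sum(p) + sum(t)
    return 1.0 if denom == 0 else 2.0 * intersection / denom


def mean_dice(pred, target, labels):
    """Unweighted mean of per-organ Dice scores."""
    return sum(dice_for_label(pred, target, lab) for lab in labels) / len(labels)


# Hypothetical label ids: 0 background, 1 heart, 2 esophagus, 3 trachea, 4 aorta.
pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 0, 2, 2, 2]
print(mean_dice(pred, target, labels=[1, 2]))
```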

URL

http://arxiv.org/abs/1905.07710

PDF

http://arxiv.org/pdf/1905.07710
Read All
Co-localization with Category-Consistent Features and Geodesic Distance Propagation

2019-05-19

Hieu Le, Chen-Ping Yu, Gregory Zelinsky, Dimitris Samaras

arXiv_CV

arXiv_CV Object_Detection Image_Classification Classification Detection
Abstract

Co-localization is the problem of localizing objects of the same class using only the set of images that contain them. This is a challenging task because the object detector must be built without negative examples that can lead to more informative supervision signals. The main idea of our method is to cluster the feature space of a generically pre-trained CNN, to find a set of CNN features that are consistently and highly activated for an object category, which we call category-consistent CNN features. Then, we propagate their combined activation map using superpixel geodesic distances for co-localization. In our first set of experiments, we show that the proposed method achieves state-of-the-art performance on three related benchmarks: PASCAL 2007, PASCAL 2012, and the Object Discovery dataset. We also show that our method is able to detect and localize truly unseen categories, on six held-out ImageNet categories, with accuracy that is significantly higher than the previous state-of-the-art. Our intuitive approach achieves this success without any region proposals or object detectors and can be based on a CNN that was pre-trained purely on image classification tasks without further fine-tuning.
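Propagating an activation map via geodesic distances can be sketched as a multi-source Dijkstra over a superpixel adjacency graph. Strongly activated superpixels serve as seeds, and every other superpixel scores by its shortest-path distance to them. Several details are our own assumptions rather than the paper's formulation: the edge costs (for example, appearance differences between adjacent superpixels), the seed threshold, and the exp(-d) decay.

```python
import heapq
import math


def geodesic_distances(adjacency, seeds):
    """Multi-source Dijkstra over a superpixel adjacency graph.

    adjacency: {node: [(neighbour, cost), ...]} with non-negative costs,
    listed in both directions for an undirected graph.
    """
    dist = {node: math.inf for node in adjacency}
    heap = [(0.0, s) for s in seeds]
    for s in seeds:
        dist[s] = 0.0
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: u was already settled with a shorter distance
        for v, cost in adjacency[u]:
            nd = d + cost
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def propagate_activation(adjacency, activation, seed_threshold):
    """Seed from strongly activated superpixels; decay scores with geodesic distance."""
    seeds = [n for n, a in activation.items() if a >= seed_threshold]
    dist = geodesic_distances(adjacency, seeds)
    return {n: math.exp(-d) for n, d in dist.items()}  # unreachable nodes get 0.0
```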

URL

http://arxiv.org/abs/1612.03236

PDF

http://arxiv.org/pdf/1612.03236
Read All
An Objective Evaluation Metric for image fusion based on Del Operator

2019-05-19

Ali A. Kiaei, Hassan Khotanlou, Paniz Kiaei, Yasin Bhrouzi, Mahdi Abbasi

arXiv_CV

arXiv_CV
Abstract

In this paper, a novel objective evaluation metric for image fusion is presented. Notable advantages of the proposed metric are that it has no parameters, its result is a probability in the range [0, 1], and it is free from illumination dependence. This metric is easy to implement and the result is computed in four steps: (1) smoothing the images using a Gaussian filter; (2) transforming the images into a vector field using the Del operator; (3) computing the normal distribution parameters ($\mu$, $\sigma$) for each corresponding pixel and converting to the standard normal distribution; (4) computing, as the result, the probability that the fusion method is well-behaved. To judge the quality of the proposed metric, it is compared to thirteen well-known non-reference objective evaluation metrics, where eight fusion methods are employed on seven experiments of multimodal medical images. The experimental results and statistical comparisons show that, in contrast to previous objective evaluation metrics, the proposed one performs better both in agreeing with human visual perception and in evaluating fusion methods that do not perform at the same level.

URL

http://arxiv.org/abs/1905.07709

PDF

http://arxiv.org/pdf/1905.07709
Read All
FORECAST-CLSTM: A New Convolutional LSTM Network for Cloudage Nowcasting

2019-05-19

Chao Tan, Xin Feng, Jianwu Long, Li Geng

arXiv_CV

arXiv_CV CNN RNN Deep_Learning Prediction
Abstract

With the high demand for large-scale, real-time public weather services, refining short-term cloudage prediction has become an essential part of weather forecast production. To provide weather-service-compliant cloudage nowcasting, in this paper we propose a novel deep learning model based on a hierarchical Convolutional Long Short-Term Memory network, which we term FORECAST-CLSTM, with a new Forecaster loss function to predict future satellite cloud images. The model is designed to fuse multi-scale features in the hierarchical network structure to predict the pixel value and the morphological movement of the cloudage simultaneously. We also collect about 40K infrared satellite nephograms and create a large-scale Satellite Cloudage Map Dataset (SCMD). The proposed FORECAST-CLSTM model is shown to achieve better prediction performance than the state-of-the-art ConvLSTM model, and the proposed Forecaster loss function is also demonstrated to retain the uncertainty of real atmospheric conditions better than conventional loss functions.

URL

http://arxiv.org/abs/1905.07700

PDF

http://arxiv.org/pdf/1905.07700
Read All
Is Free Choice Permission Admissible in Classical Deontic Logic?

2019-05-19

Guido Governatori, Antonino Rotolo

arXiv_AI

arXiv_AI
Abstract

In this paper, we explore how, and if, free choice permission (FCP) can be accepted when we consider deontic conflicts between certain types of permissions and obligations. As is well known, FCP can license, under some minimal conditions, the derivation of an indefinite number of permissions. We discuss this and other drawbacks and present six Hilbert-style classical deontic systems admitting a guarded version of FCP. The systems that we present are not too weak from the inferential viewpoint, as far as permission is concerned, and do not commit to weakening any specific logic for obligations.

URL

http://arxiv.org/abs/1905.07696

PDF

http://arxiv.org/pdf/1905.07696
Read All
Human Vocal Sentiment Analysis

2019-05-19

Andrew Huang, Puwei Bao

arXiv_SD

arXiv_SD Sentiment Classification
Abstract

In this paper, we use several techniques with conventional vocal feature extraction (MFCC, STFT), along with deep-learning approaches such as CNN, and also context-level analysis, by providing the textual data, and combining different approaches for improved emotion-level classification. We explore models that have not been tested to gauge the difference in performance and accuracy. We apply hyperparameter sweeps and data augmentation to improve performance. Finally, we see if a real-time approach is feasible, and can be readily integrated into existing systems.

URL

http://arxiv.org/abs/1905.08632

PDF

http://arxiv.org/pdf/1905.08632
Read All
DivGraphPointer: A Graph Pointer Network for Extracting Diverse Keyphrases

2019-05-19

Zhiqing Sun, Jian Tang, Pan Du, Zhi-Hong Deng, Jian-Yun Nie

arXiv_CL

arXiv_CL Summarization CNN
Abstract

Keyphrase extraction from documents is useful to a variety of applications such as information retrieval and document summarization. This paper presents an end-to-end method called DivGraphPointer for extracting a set of diversified keyphrases from a document. DivGraphPointer combines the advantages of traditional graph-based ranking methods and recent neural network-based approaches. Specifically, given a document, a word graph is constructed from the document based on word proximity and is encoded with graph convolutional networks, which effectively capture document-level word salience by modeling long-range dependency between words in the document and aggregating multiple appearances of identical words into one node. Furthermore, we propose a diversified pointer network to generate a set of diverse keyphrases out of the word graph in the decoding process. Experimental results on five benchmark data sets show that our proposed method significantly outperforms the existing state-of-the-art approaches.
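Here is a sketch of building a proximity-based word graph in which repeated occurrences of a word merge into one node. Two choices are our assumptions, since the abstract says only that the graph is "based on word proximity": the edge weight (the sum of 1/distance over all co-occurrences within a window) and the window size.

```python
def build_word_graph(tokens, window=2):
    """Word graph with one node per distinct word.

    Repeated occurrences of a word share a node; the weight between two words
    is the sum of 1/distance over all their co-occurrences within `window`.
    """
    edges = {}
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            u = tokens[j]
            if u == w:
                continue  # no self-loops when a word repeats nearby
            key = tuple(sorted((w, u)))
            edges[key] = edges.get(key, 0.0) + 1.0 / (j - i)
    return sorted(set(tokens)), edges


nodes, edges = build_word_graph(["graph", "neural", "network", "graph", "network"])
print(nodes)  # ['graph', 'network', 'neural']
print(edges)
```

In the method described above, a graph like this would then be encoded with graph convolutional networks. The sketch stops at graph construction.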

URL

http://arxiv.org/abs/1905.07689

PDF

http://arxiv.org/pdf/1905.07689
Read All
Learning to Memorize in Neural Task-Oriented Dialogue Systems

2019-05-19

Chien-Sheng Wu

arXiv_AI

arXiv_AI Knowledge Attention Ontology Tracking
Abstract

In this thesis, we leverage the neural copy mechanism and memory-augmented neural networks (MANNs) to address existing challenges of neural task-oriented dialogue learning. We show the effectiveness of our strategy by achieving good performance in multi-domain dialogue state tracking, retrieval-based dialogue systems, and generation-based dialogue systems. We first propose a transferable dialogue state generator (TRADE) that leverages its copy mechanism to get rid of dialogue ontology and share knowledge between domains. We also evaluate unseen domain dialogue state tracking and show that TRADE enables zero-shot dialogue state tracking and can adapt to new few-shot domains without forgetting the previous domains. Second, we utilize MANNs to improve retrieval-based dialogue learning. They are able to capture dialogue sequential dependencies and memorize long-term information. We also propose a recorded delexicalization copy strategy to replace real entity values with ordered entity types. Our models are shown to surpass other retrieval baselines, especially when the conversation has a large number of turns. Lastly, we tackle generation-based dialogue learning with two proposed models, the memory-to-sequence (Mem2Seq) and global-to-local memory pointer network (GLMP). Mem2Seq is the first model to combine multi-hop memory attention with the idea of the copy mechanism. GLMP further introduces the concept of response sketching and double pointers copying. We show that GLMP achieves the state-of-the-art performance on human evaluation.
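Recorded delexicalization, as described above, replaces real entity values with ordered entity types and keeps a record so the values can be restored. Here is a minimal sketch. Three details are hypothetical: the `@type_n` placeholder format, the single-token entities, and the lookup-table entity typing.

```python
def delexicalize(tokens, entity_types):
    """Replace entity values with ordered type placeholders (@type_1, @type_2, ...).

    Returns the delexicalized tokens and a slot -> value record for restoring them.
    """
    counters = {}
    value_to_slot = {}
    out = []
    for tok in tokens:
        etype = entity_types.get(tok)
        if etype is None:
            out.append(tok)
            continue
        if tok not in value_to_slot:
            # First mention of this value: assign the next index for its type.
            counters[etype] = counters.get(etype, 0) + 1
            value_to_slot[tok] = f"@{etype}_{counters[etype]}"
        out.append(value_to_slot[tok])
    record = {slot: value for value, slot in value_to_slot.items()}
    return out, record


def relexicalize(tokens, record):
    """Map placeholders in a generated or retrieved response back to real values."""
    return [record.get(tok, tok) for tok in tokens]


utterance = "is papa_johns closer than pizza_hut ? papa_johns has parking".split()
types = {"papa_johns": "restaurant", "pizza_hut": "restaurant"}
delex, record = delexicalize(utterance, types)
print(delex)
```

Because the same value always maps to the same ordered slot, a retrieved response written with `@restaurant_1` can be filled back in with the entity from the current dialogue.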

URL

http://arxiv.org/abs/1905.07687

PDF

http://arxiv.org/pdf/1905.07687
Read All
What Do Adversarially Robust Models Look At?

2019-05-19

Takahiro Itazuri, Yoshihiro Fukuhara, Hirokatsu Kataoka, Shigeo Morishima

arXiv_CV

arXiv_CV Adversarial Attention Quantitative
Abstract

In this paper, we address the open question: “What do adversarially robust models look at?” Recently, many works have reported a trade-off between standard accuracy and adversarial robustness. According to prior works, this trade-off is rooted in the fact that adversarially robust models and standard accurate models might depend on very different sets of features. However, what kind of difference actually exists has not been well studied. In this paper, we analyze this difference visually and quantitatively through various experiments. Experimental results show that adversarially robust models look at things at a larger scale than standard models and pay less attention to fine textures. Furthermore, although it has been claimed that adversarially robust features are not compatible with standard accuracy, using them as pre-trained models can even have a positive effect, particularly on low-resolution datasets.

URL

http://arxiv.org/abs/1905.07666

PDF

http://arxiv.org/pdf/1905.07666
Read All
Trajectory Optimization on Manifolds: A Theoretically-Guaranteed Embedded Sequential Convex Programming Approach

2019-05-18

Riccardo Bonalli, Andrew Bylard, Abhishek Cauligi, Thomas Lew, Marco Pavone

arXiv_RO

arXiv_RO Embedding Optimization
Abstract

Sequential Convex Programming (SCP) has recently gained popularity as a tool for trajectory optimization due to its sound theoretical properties and practical performance. Yet, most SCP-based methods for trajectory optimization are restricted to Euclidean settings, which precludes their application to problem instances where one must reason about manifold-type constraints (that is, constraints, such as loop closure, which restrict the motion of a system to a subset of the ambient space). The aim of this paper is to fill this gap by extending SCP-based trajectory optimization methods to a manifold setting. The key insight is to leverage geometric embeddings to lift a manifold-constrained trajectory optimization problem into an equivalent problem defined over a space enjoying a Euclidean structure. This insight allows one to extend existing SCP methods to a manifold setting in a fairly natural way. In particular, we present an SCP algorithm for manifold problems with refined theoretical guarantees that resemble those derived for the Euclidean setting, and demonstrate its practical performance via numerical experiments.

URL

http://arxiv.org/abs/1905.07654

PDF

http://arxiv.org/pdf/1905.07654
Read All
SAWNet: A Spatially Aware Deep Neural Network for 3D Point Cloud Processing

2019-05-18

Chaitanya Kaul, Nick Pears, Suresh Manandhar

arXiv_CV

arXiv_CV Segmentation Embedding Classification
Abstract

Deep neural networks have established themselves as the state-of-the-art methodology in almost all computer vision tasks to date. But their application to processing data lying on non-Euclidean domains is still a very active area of research. One such area is the analysis of point cloud data, which poses a challenge due to its lack of order. Many recent techniques have been proposed, spearheaded by the PointNet architecture. These techniques use either global or local information from the point clouds to extract a latent representation for the points, which is then used for the task at hand (classification/segmentation). In our work, we introduce a neural network layer that combines both global and local information to produce better embeddings of these points. We enhance our architecture with residual connections, to pass information between the layers, which also makes the network easier to train. We achieve state-of-the-art results on the ModelNet40 dataset with our architecture, and our results are also highly competitive with the state-of-the-art on the ShapeNet part segmentation dataset and the indoor scene segmentation dataset. We plan to open source our pre-trained models on GitHub to encourage the research community to test our networks on their data, or simply use them for benchmarking purposes.
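One generic way to combine local and global information per point, in the spirit described above, is to concatenate three things for each point: its own feature, a max-pool over its k nearest neighbours, and a max-pool over the whole cloud. This PointNet/DGCNN-style illustration rests on our own assumptions. It is not SAWNet's exact layer, and a real layer would also apply learned transforms before pooling.

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbours of point i, excluding i itself."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]


def global_local_features(points, feats, k):
    """Per point: [own feature | max over k-NN and itself | max over all points]."""
    dim = len(feats[0])
    global_feat = [max(f[c] for f in feats) for c in range(dim)]  # order-invariant
    out = []
    for i in range(len(points)):
        neighbourhood = knn(points, i, k) + [i]
        local_feat = [max(feats[j][c] for j in neighbourhood) for c in range(dim)]
        out.append(feats[i] + local_feat + global_feat)
    return out


pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
fs = [[1.0], [2.0], [5.0]]
print(global_local_features(pts, fs, k=1))
```

Max-pooling keeps both parts invariant to the ordering of points, which addresses the "lack of order" issue mentioned in the abstract.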

URL

http://arxiv.org/abs/1905.07650

PDF

http://arxiv.org/pdf/1905.07650
Read All
Learning while Competing -- 3D Modeling & Design

2019-05-18

Kalind Karia, Rucmenya Bessariya, Krishna Lala, Kavi Arya

arXiv_RO

arXiv_RO
Abstract

The e-Yantra project at IIT Bombay conducts an online competition, e-Yantra Robotics Competition (eYRC) which uses a Project Based Learning (PBL) methodology to train students to implement a robotics project in a step-by-step manner over a five-month period. Participation is absolutely free. The competition provides all resources - robot, accessories, and a problem statement - to a participating team. If selected for the finals, e-Yantra pays for them to come to the finals at IIT Bombay. This makes the competition accessible to resource-poor student teams. In this paper, we describe the methodology used in the 6th edition of eYRC, eYRC-2017 where we experimented with a Theme (projects abstracted into rulebooks) involving an advanced topic - 3D Designing and interfacing with sensors and actuators. We demonstrate that the learning outcomes are consistent with our previous studies [1]. We infer that even 3D designing to create a working model can be effectively learned in a competition mode through PBL.

URL

http://arxiv.org/abs/1905.07644

PDF

http://arxiv.org/pdf/1905.07644
Read All
An Open-Source System for Vision-Based Micro-Aerial Vehicle Mapping, Planning, and Flight in Cluttered Environments

2019-05-18

Helen Oleynikova, Christian Lanegger, Zachary Taylor, Michael Pantic, Alexander Millane, Roland Siegwart, Juan Nieto

arXiv_RO

arXiv_RO
Abstract

We present an open-source system for Micro-Aerial Vehicle autonomous navigation from vision-based sensing. Our system focuses on dense mapping, safe local planning, and global trajectory generation, especially when using narrow field of view sensors in very cluttered environments. In addition, details about other necessary parts of the system and special considerations for applications in real-world scenarios are presented. We focus our experiments on evaluating global planning, path smoothing, and local planning methods on real maps made on MAVs in realistic search and rescue and industrial inspection scenarios. We also perform thousands of simulations in cluttered synthetic environments, and finally validate the complete system in real-world experiments.

URL

http://arxiv.org/abs/1812.03892

PDF

http://arxiv.org/pdf/1812.03892
Read All
Evolving Rewards to Automate Reinforcement Learning

2019-05-18

Aleksandra Faust, Anthony Francis, Dar Mehta

arXiv_AI

arXiv_AI Reinforcement_Learning Optimization
Abstract

Many continuous control tasks have easily formulated objectives, yet using them directly as a reward in reinforcement learning (RL) leads to suboptimal policies. Therefore, many classical control tasks guide RL training using complex rewards, which require tedious hand-tuning. We automate the reward search with AutoRL, an evolutionary layer over standard RL that treats reward tuning as hyperparameter optimization and trains a population of RL agents to find a reward that maximizes the task objective. AutoRL, evaluated on four Mujoco continuous control tasks over two RL algorithms, shows improvements over baselines, with the biggest uplift for more complex tasks. The video can be found at: https://youtu.be/svdaOFfQyC8.
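AutoRL treats reward tuning as hyperparameter optimization over a population of agents. The loop below is a much simpler stand-in: a (1+λ) evolution strategy over reward-shaping weights, with a toy objective in place of training and evaluating an RL agent. It illustrates only the outer search, not AutoRL's actual algorithm or its population management. All names are hypothetical.

```python
import random


def evolve_reward_weights(evaluate, dim, generations=20, offspring=8, sigma=0.3, seed=0):
    """(1+lambda)-style evolutionary search over reward-shaping weights.

    evaluate(weights) -> task-objective score (higher is better). In AutoRL this
    step would train an RL agent on the shaped reward and measure the true task
    objective; here it is any caller-supplied function.
    """
    rng = random.Random(seed)
    best = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    best_score = evaluate(best)
    for _ in range(generations):
        for _ in range(offspring):
            candidate = [w + rng.gauss(0.0, sigma) for w in best]
            score = evaluate(candidate)
            if score > best_score:  # keep the parent unless a child beats it
                best, best_score = candidate, score
    return best, best_score


def toy_objective(weights, target=(0.5, -0.2)):
    """Stand-in for 'train with these reward weights, then score the task'."""
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


best, score = evolve_reward_weights(toy_objective, dim=2)
print(best, score)
```

With a real RL inner loop, each `evaluate` call is a full training run. That cost is why the search runs over a parallel population of agents rather than sequentially as here.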

Abstract (translated by Google)

URL

http://arxiv.org/abs/1905.07628

PDF

http://arxiv.org/pdf/1905.07628
The New Nitrides: Layered, Ferroelectric, Magnetic, Metallic and Superconducting Nitrides to Boost the GaN Photonics and Electronics Eco-System

2019-05-18

Debdeep Jena, Ryan Page, Joseph Casamento, Phillip Dang, Jashan Singhal, Zexuan Zhang, John Wright, Guru Khalsa, Yongjin Cho, Huili Grace Xing

arXiv_CV

arXiv_CV Review GAN
Abstract

The nitride semiconductor materials GaN, AlN, and InN, and their alloys and heterostructures have been investigated extensively in the last 3 decades, leading to several technologically successful photonic and electronic devices. Just over the past few years, a number of new nitride materials have emerged with exciting photonic, electronic, and magnetic properties. Some examples are 2D and layered hBN and the III-V diamond analog cBN, the transition metal nitrides ScN, YN, and their alloys (e.g. ferroelectric ScAlN), piezomagnetic GaMnN, ferrimagnetic Mn4N, and epitaxial superconductor/semiconductor NbN/GaN heterojunctions. This article reviews the fascinating and emerging physics and science of these new nitride materials. It also discusses their potential applications in future generations of devices that take advantage of the photonic and electronic devices eco-system based on transistors, light-emitting diodes, and lasers that have already been created by the nitride semiconductors.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1905.07627

PDF

https://arxiv.org/pdf/1905.07627
SCALAR: Simultaneous Calibration of 2D Laser and Robot Kinematic Parameters Using Planarity and Distance Constraints

2019-05-18

Teguh Santoso Lembono, Francisco Suárez-Ruiz, Quang-Cuong Pham

arXiv_RO

arXiv_RO Optimization
Abstract

In this paper, we propose SCALAR, a calibration method to simultaneously calibrate the kinematic parameters of a 6-DoF robot and the extrinsic parameters of a 2D Laser Range Finder (LRF) attached to the robot’s flange. The calibration setup requires only a flat plate with two small holes carved into it at a known distance from each other, and a sharp tool-tip attached to the robot’s flange. The calibration is formulated as a nonlinear optimization problem in which the laser and the tool-tip are used to provide planar and distance constraints, and the optimization problem is solved using the Levenberg-Marquardt algorithm. We demonstrate through experiments that SCALAR can reduce the mean and the maximum tool position error from 0.44 mm to 0.19 mm and from 1.41 mm to 0.50 mm, respectively.
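
The abstract names Levenberg-Marquardt (LM) as the solver. The sketch below is a minimal, generic LM loop in Python/NumPy applied to a toy distance-constraint problem: locating a 2D point from its known distances to three fixed anchors. It does not model SCALAR's robot kinematics, laser extrinsics, or planarity constraints. The anchors, distances, damping schedule, and the function names are assumptions chosen for illustration.

```python
import numpy as np


def levenberg_marquardt(residual_fn, jacobian_fn, x0, lam=1e-3, max_iter=100, tol=1e-12):
    """Minimize sum(residual_fn(x)**2) with a basic Levenberg-Marquardt loop."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = residual_fn(x)
        J = jacobian_fn(x)
        A = J.T @ J
        g = J.T @ r
        # Damped normal equations with Marquardt's diagonal scaling.
        step = np.linalg.solve(A + lam * np.diag(np.diag(A)), -g)
        if np.linalg.norm(step) < tol:
            break  # converged: the update has become negligible
        x_new = x + step
        r_new = residual_fn(x_new)
        if r_new @ r_new < r @ r:
            x = x_new
            lam *= 0.1   # step reduced the cost: trust the Gauss-Newton model more
        else:
            lam *= 10.0  # step increased the cost: take a smaller, more gradient-like step
    return x


# Toy problem (hypothetical): find a 2D point from its distances to three anchors.
anchors = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
true_point = np.array([1.0, 1.0])
measured = np.linalg.norm(anchors - true_point, axis=1)


def residuals(p):
    # Predicted distance minus measured distance, one entry per anchor.
    return np.linalg.norm(anchors - p, axis=1) - measured


def jacobian(p):
    # d||p - a_i|| / dp = (p - a_i) / ||p - a_i||
    diff = p - anchors
    return diff / np.linalg.norm(diff, axis=1, keepdims=True)


estimate = levenberg_marquardt(residuals, jacobian, x0=[2.0, 2.0])
print(estimate)
```

Each iteration solves (JᵀJ + λ·diag(JᵀJ)) δ = −Jᵀr. Shrinking λ after a successful step moves LM toward Gauss-Newton. Growing it after a failed step moves it toward scaled gradient descent. This is what makes LM robust to a rough initial guess such as (2, 2) here.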

Abstract (translated by Google)

URL

http://arxiv.org/abs/1905.07625

PDF

http://arxiv.org/pdf/1905.07625
