Convolutional Neural Networks (CNNs) have proven extremely successful at solving computer vision tasks. State-of-the-art methods favor deep network architectures for their accuracy, at the cost of a massive number of parameters and high weight redundancy. Previous works have studied how to prune such CNN weights. In this paper, we go to the other extreme and analyze the performance of a network stacked with a single convolution kernel shared across layers, as well as other weight-sharing techniques. We name it Deep Anchored Convolutional Neural Network (DACNN). Sharing the same kernel weights across layers reduces the model size tremendously; more precisely, the network is compressed in memory by a factor of L, where L is the desired depth of the network, disregarding the fully connected layer used for prediction. The number of parameters in DACNN barely increases as the network grows deeper, which allows us to build deep DACNNs without any concern about memory costs. We also introduce a partially shared-weights network (DACNN-mix) as well as an easy plug-in module, coined regulators, to boost the performance of our architecture. We validate our idea on 3 datasets: CIFAR-10, CIFAR-100 and SVHN. Our results show that our model saves massive amounts of memory while maintaining high accuracy.
http://arxiv.org/abs/1904.09764
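As a rough illustration of the anchored weight-sharing idea above, the sketch below (in PyTorch, with hypothetical layer names and sizes, not the authors' exact DACNN) reuses a single convolution kernel across the depth of the network, so the parameter count stays essentially constant as depth grows.

```python
# A minimal sketch (not the authors' exact DACNN) of sharing one convolution
# kernel across many layers; channel width, depth, and head are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedKernelNet(nn.Module):
    def __init__(self, channels=64, depth=16, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)            # maps RGB to the working width
        self.shared = nn.Conv2d(channels, channels, 3, padding=1)   # the single anchored kernel
        self.depth = depth                                          # reused `depth` times
        self.head = nn.Linear(channels, num_classes)

    def forward(self, x):
        x = F.relu(self.stem(x))
        for _ in range(self.depth):        # same weights at every layer,
            x = F.relu(self.shared(x))     # so parameters do not grow with depth
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)
        return self.head(x)

net = SharedKernelNet()
print(sum(p.numel() for p in net.parameters()))  # roughly constant in `depth`
```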
In this paper, we propose a novel algorithm to rectify the illumination of digitized documents by eliminating shading artifacts. Firstly, a topographic surface of an input digitized document is created using the luminance value of each pixel. Then the shading artifact on the document is estimated by simulating an immersion process. The simulation of the immersion process is modeled using a novel diffusion equation with an iterative update rule. After estimating the shading artifacts, the digitized document is reconstructed using the Lambertian surface model. In order to evaluate the performance of the proposed algorithm, we conduct rigorous experiments on a set of digitized documents generated using smartphones under challenging lighting conditions. The experimental results show that the proposed method produces promising illumination correction results and outperforms state-of-the-art methods.
http://arxiv.org/abs/1904.09763
Multi-face alignment aims to identify the geometric structures of multiple human faces in an image, and its performance is important for many practical tasks, such as face recognition, face tracking and face animation. In this work, we present a fast bottom-up multi-face landmark detection approach, which can simultaneously localize multi-person facial landmarks with high precision. In more detail, unlike previous top-down approaches, our bottom-up architecture maps the landmarks to a high-dimensional space. Then, the discriminative high-dimensional features are aggregated to represent the landmarks. By clustering the features belonging to the same face, our approach can align the multi-person facial landmarks synchronously. Extensive experiments are conducted in this paper, and the experimental results demonstrate that our method achieves high performance in the multi-face landmark alignment task while being extremely fast. Moreover, we propose a new multi-face dataset to compare the speed and precision of bottom-up face alignment methods. Our dataset is publicly available at https://github.com/AISAResearch/FoxNet
http://arxiv.org/abs/1904.09758
This paper proposes a novel Non-Local Attention Optimized Deep Image Compression (NLAIC) framework, built on top of the popular variational auto-encoder (VAE) structure. Our NLAIC framework embeds non-local operations in the encoders and decoders for both the image and the latent feature probability information (known as the hyperprior) to capture both local and global correlations, and applies an attention mechanism to generate masks that weight the features of the image and hyperprior, implicitly adapting bit allocation for different features based on their importance. Furthermore, both hyperpriors and spatial-channel neighbors of the latent features are used to improve entropy coding. The proposed model outperforms existing methods on the Kodak dataset, including learned (e.g., Balle2019, Balle2018) and conventional (e.g., BPG, JPEG2000, JPEG) image compression methods, for both the PSNR and MS-SSIM distortion metrics.
http://arxiv.org/abs/1904.09757
Who did what to whom is a major focus in natural language understanding, and is precisely the aim of semantic role labeling (SRL). Although SRL is naturally essential to text comprehension tasks, it has been surprisingly ignored in previous work. This paper thus makes the first attempt to let SRL enhance text comprehension and inference by specifying verbal arguments and their corresponding semantic roles. In our deep learning models, embeddings are enhanced with semantic role labels for more fine-grained semantics. We show that the salient labels can be conveniently added to existing models and significantly improve deep learning models on challenging text comprehension tasks. Extensive experiments on benchmark machine reading comprehension and inference datasets verify that the proposed semantic learning helps our system reach a new state of the art.
http://arxiv.org/abs/1809.02794
Forward Vehicle Collision Warning (FCW) is one of the most important functions for autonomous vehicles. In this procedure, vehicle detection and distance measurement are core components, requiring accurate localization and estimation. In this paper, we propose a simple but efficient forward vehicle collision warning framework that aggregates monocular distance measurement and precise vehicle detection. To obtain the forward vehicle distance, we use a quick camera calibration method that only needs three physical points to calibrate the related camera parameters. For forward vehicle detection, a multi-scale detection algorithm that uses the calibration result as a distance prior is proposed to improve precision. Intensive experiments are conducted on our real-scene dataset, and the results demonstrate the effectiveness of the proposed framework.
http://arxiv.org/abs/1904.12642
Video-based vehicle detection and tracking is one of the most important components of Intelligent Transportation Systems (ITS). At road junctions, the problem becomes even more difficult due to occlusions and complex interactions among vehicles. In order to obtain precise detection and tracking results, in this work we propose a novel tracking-by-detection framework. In the detection stage, we present a sequential detection model to deal with serious occlusions. In the tracking stage, we model group behavior to handle complex interactions with overlaps and ambiguities. The main contributions of this paper are twofold: 1) A shape prior is exploited in the sequential detection model to tackle occlusions in crowded scenes. 2) A traffic force is defined in the traffic scene to model group behavior, which helps handle complex interactions among vehicles. We evaluate the proposed approach on real surveillance videos at road junctions, and its performance demonstrates the effectiveness of our method.
http://arxiv.org/abs/1904.12641
Despite considerable advancements with deep neural language models, the enigma of neural text degeneration persists when these models are tested as text generators. The counter-intuitive empirical observation is that even though the use of likelihood as training objective leads to high quality models for a broad range of language understanding tasks, using likelihood as a decoding objective leads to text that is bland and strangely repetitive. In this paper, we reveal surprising distributional differences between human text and machine text. In addition, we find that decoding strategies alone can dramatically affect the quality of machine text, even when generated from exactly the same neural language model. Our findings motivate Nucleus Sampling, a simple but effective method to draw the best out of neural generation. By sampling text from the dynamic nucleus of the probability distribution, which allows for diversity while effectively truncating the less reliable tail of the distribution, the resulting text better demonstrates the quality of human text, yielding enhanced diversity without sacrificing fluency and coherence.
http://arxiv.org/abs/1904.09751
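The core of Nucleus Sampling is truncating the next-token distribution to the smallest set of tokens whose cumulative probability reaches a threshold p, then renormalizing and sampling. A minimal sketch in PyTorch, with an illustrative threshold of 0.9 and an assumed GPT-2-sized vocabulary:

```python
# A minimal sketch of nucleus (top-p) sampling from a next-token distribution;
# the threshold and vocabulary size are illustrative.
import torch

def nucleus_sample(logits, p=0.9):
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cum = torch.cumsum(sorted_probs, dim=-1)
    # keep the smallest prefix of tokens whose cumulative mass reaches p
    keep = cum - sorted_probs < p
    keep[0] = True                         # always keep the most likely token
    nucleus_probs = sorted_probs * keep
    nucleus_probs /= nucleus_probs.sum()   # renormalize over the nucleus
    choice = torch.multinomial(nucleus_probs, 1)
    return sorted_idx[choice]

logits = torch.randn(50257)                # e.g. a GPT-2-sized vocabulary
token_id = nucleus_sample(logits, p=0.9)
```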
Graph Neural Networks (GNNs) have been popularly used for analyzing non-Euclidean data such as social network data and biological data. Despite their success, the design of graph neural networks requires a lot of manual work and domain knowledge. In this paper, we propose a Graph Neural Architecture Search method (GraphNAS for short) that enables automatic search of the best graph neural architecture based on reinforcement learning. Specifically, GraphNAS first uses a recurrent network to generate variable-length strings that describe the architectures of graph neural networks, and then trains the recurrent network with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation data set. Extensive experimental results on node classification tasks in both transductive and inductive learning settings demonstrate that GraphNAS achieves consistently better performance on the Cora, Citeseer, and Pubmed citation networks, and a protein-protein interaction network. On node classification tasks, GraphNAS can design a novel network architecture that rivals the best human-invented architectures in terms of test set accuracy.
http://arxiv.org/abs/1904.09981
This paper presents an unsupervised deep-learning framework named Local Deep-Feature Alignment (LDFA) for dimension reduction. We construct a neighbourhood for each data sample and learn a local Stacked Contractive Auto-encoder (SCAE) from the neighbourhood to extract the local deep features. Next, we exploit an affine transformation to align the local deep features of each neighbourhood with the global features. Moreover, we derive an approach from LDFA to explicitly map a new data sample into the learned low-dimensional subspace. The advantage of the LDFA method is that it learns both local and global characteristics of the data sample set: the local SCAEs capture local characteristics contained in the data set, while the global alignment procedures encode the interdependencies between neighbourhoods into the final low-dimensional feature representations. Experimental results on data visualization, clustering and classification show that the LDFA method is competitive with several well-known dimension reduction techniques, and exploiting locality in deep learning is a research topic worth further exploring.
http://arxiv.org/abs/1904.09747
We present a constituency parsing algorithm that maps from word-aligned contextualized feature vectors to parse trees. Our algorithm proceeds strictly left-to-right, processing one word at a time by assigning it a label from a small vocabulary. We show that, with mild assumptions, our inference procedure requires constant computation time per word. Our method gets 95.4 F1 on the WSJ test set.
http://arxiv.org/abs/1904.09745
Large-scale point clouds generated from 3D sensors are more accurate than their image-based counterparts. However, they are seldom used in visual pose estimation due to the difficulty in obtaining 2D-3D image to point cloud correspondences. In this paper, we propose the 2D3D-MatchNet - an end-to-end deep network architecture to jointly learn the descriptors for 2D and 3D keypoints from images and point clouds, respectively. As a result, we are able to directly match and establish 2D-3D correspondences from the query image and 3D point cloud reference map for visual pose estimation. We create our Oxford 2D-3D Patches dataset from the Oxford Robotcar dataset with the ground truth camera poses and 2D-3D image to point cloud correspondences for training and testing the deep network. Experimental results verify the feasibility of our approach.
http://arxiv.org/abs/1904.09742
This paper proposes an automatic subtitle generation and semantic video summarization technique. The importance of automatic video summarization is vast in the present era of big data. Video summarization helps in efficient storage and quick surfing of large collections of videos without losing the important ones. The videos are summarized with the help of subtitles, which are obtained using several text summarization algorithms. The proposed technique generates subtitles for videos with or without subtitles using speech recognition, and then applies NLP-based text summarization algorithms to the subtitles. The performance of subtitle generation and video summarization is boosted through an ensemble method with two approaches: an intersection method and a weight-based learning method. Experimental results show the satisfactory performance of the proposed method.
http://arxiv.org/abs/1904.09740
Normalization methods are essential components in convolutional neural networks (CNNs). They either standardize or whiten data using statistics estimated in predefined sets of pixels. Unlike existing works that design normalization techniques for specific tasks, we propose Switchable Whitening (SW), which provides a general form unifying different whitening methods as well as standardization methods. SW learns to switch among these operations in an end-to-end manner. It has several advantages. First, SW adaptively selects appropriate whitening or standardization statistics for different tasks (see Fig.1), making it well suited for a wide range of tasks without manual design. Second, by integrating benefits of different normalizers, SW shows consistent improvements over its counterparts in various challenging benchmarks. Third, SW serves as a useful tool for understanding the characteristics of whitening and standardization techniques. We show that SW outperforms other alternatives on image classification (CIFAR-10/100, ImageNet), semantic segmentation (ADE20K, Cityscapes), domain adaptation (GTA5, Cityscapes), and image style transfer (COCO). For example, without bells and whistles, we achieve state-of-the-art performance with 45.33% mIoU on the ADE20K dataset. Code and models will be released.
http://arxiv.org/abs/1904.09739
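To give a flavor of the switching mechanism, the sketch below implements a simplified, standardization-only variant in PyTorch: softmax-gated weights choose between instance-wise and layer-wise statistics. The real SW also switches among whitening operations (e.g. batch or instance whitening), which are omitted here, and all names and sizes are illustrative.

```python
# A simplified, standardization-only sketch of the switching idea: learn softmax
# weights over candidate normalizers. The actual SW additionally includes
# whitening; this only illustrates the end-to-end switching mechanism.
import torch
import torch.nn as nn

class SwitchableStandardization(nn.Module):
    def __init__(self, num_channels, eps=1e-5):
        super().__init__()
        self.switch = nn.Parameter(torch.zeros(2))                 # one logit per statistic
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
        self.eps = eps

    def forward(self, x):                                          # x: (N, C, H, W)
        w = torch.softmax(self.switch, dim=0)
        mean_in = x.mean(dim=(2, 3), keepdim=True)                 # instance statistics
        var_in = x.var(dim=(2, 3), keepdim=True, unbiased=False)
        mean_ln = x.mean(dim=(1, 2, 3), keepdim=True)              # layer statistics
        var_ln = x.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
        mean = w[0] * mean_in + w[1] * mean_ln                     # learned mixture of statistics
        var = w[0] * var_in + w[1] * var_ln
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta
```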
With the development of deep learning, the structure of convolutional neural networks is becoming more and more complex and the performance of object recognition is getting better. However, the classification mechanism of convolutional neural networks is still an unsolved core problem. The main difficulty is that convolutional neural networks have too many parameters, which makes them hard to analyze. In this paper, we design and train a convolutional neural network for expression recognition and explore the classification mechanism of the network. Using the deconvolution visualization method, the extremum points of the convolutional neural network are projected back to the pixel space of the original image, and we qualitatively verify that the trained expression recognition network forms detectors for specific facial action units. At the same time, we design a distance function to measure the distance between the presence of a facial action unit and the maximal value of the response on a feature map of the convolutional neural network. The greater the distance, the more sensitive the feature map is to the facial action unit. By comparing the maximum distance over all facial action units for each feature map, the mapping relationship between facial action units and the feature maps of the convolutional neural network is determined. Therefore, we verify that the convolutional neural network forms detectors for facial action units during training in order to realize expression recognition.
http://arxiv.org/abs/1904.09737
As DenseNet conserves intermediate features with diverse receptive fields by aggregating them with dense connections, it shows good performance on the object detection task. Although feature reuse enables DenseNet to produce strong features with a small number of model parameters and FLOPs, detectors with a DenseNet backbone show rather slow speed and low energy efficiency. We find that the input channels, which increase linearly with dense connections, lead to heavy memory access cost, which causes computation overhead and more energy consumption. To solve the inefficiency of DenseNet, we propose an energy- and computation-efficient architecture called VoVNet, comprised of One-Shot Aggregation (OSA). OSA not only adopts the strength of DenseNet, representing diversified features with multiple receptive fields, but also overcomes the inefficiency of dense connections by aggregating all features only once in the last feature map. To validate the effectiveness of VoVNet as a backbone network, we design both lightweight and large-scale VoVNets and apply them to one-stage and two-stage object detectors. Our VoVNet-based detectors outperform DenseNet-based ones with 2x faster speed, and their energy consumption is reduced by 1.6x - 4.1x. In addition to DenseNet, VoVNet also outperforms the widely used ResNet backbone with faster speed and better energy efficiency. In particular, small object detection performance is significantly improved over DenseNet and ResNet.
http://arxiv.org/abs/1904.09730
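A minimal PyTorch sketch of the One-Shot Aggregation idea: every convolution takes a constant-width input, and the input plus all intermediate feature maps are concatenated only once at the end of the block. Layer counts and channel widths are illustrative rather than the paper's exact configuration.

```python
# A minimal sketch of a One-Shot Aggregation (OSA) block: constant-width convs,
# with a single concatenation of all intermediate outputs at the end.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OSABlock(nn.Module):
    def __init__(self, in_ch, stage_ch, out_ch, num_layers=5):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(num_layers):
            self.layers.append(nn.Conv2d(ch, stage_ch, 3, padding=1))
            ch = stage_ch                      # input width stays constant after the first conv
        # single aggregation of the input and all intermediate feature maps
        self.concat_conv = nn.Conv2d(in_ch + num_layers * stage_ch, out_ch, 1)

    def forward(self, x):
        outputs = [x]
        for conv in self.layers:
            x = F.relu(conv(x))
            outputs.append(x)
        return F.relu(self.concat_conv(torch.cat(outputs, dim=1)))

block = OSABlock(in_ch=128, stage_ch=80, out_ch=256)
y = block(torch.randn(2, 128, 32, 32))          # -> (2, 256, 32, 32)
```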
Person identification in the wild is very challenging due to great variation in poses, face quality, clothes, makeup and so on. Traditional research, such as face recognition, person re-identification, and speaker recognition, often focuses on a single modality of information, which is inadequate to handle all the situations in practice. Multi-modal person identification is a more promising way in which we can jointly utilize face, head, body, audio features, and so on. In this paper, we introduce iQIYI-VID, the largest video dataset for multi-modal person identification. It is composed of 600K video clips of 5,000 celebrities. These video clips are extracted from 400K hours of online videos of various types, ranging from movies, variety shows, and TV series, to news broadcasting. All video clips pass through a careful human annotation process, and the error rate of labels is lower than 0.2%. We evaluated the state-of-the-art models of face recognition, person re-identification, and speaker recognition on the iQIYI-VID dataset. Experimental results show that these models are still far from perfect for the task of person identification in the wild. We propose a Multi-modal Attention module to fuse multi-modal features, which improves person identification considerably. We have released the dataset online to promote multi-modal person identification research.
http://arxiv.org/abs/1811.07548
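One simple way to realize attention-based fusion of per-modality features (face, head, body, audio) is shown below; the two-layer scorer and feature dimension are assumptions for illustration, not necessarily the paper's Multi-modal Attention module.

```python
# A minimal sketch of attention-weighted fusion over per-modality feature vectors.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim // 2), nn.Tanh(),
                                   nn.Linear(dim // 2, 1))

    def forward(self, feats):                                # feats: (batch, num_modalities, dim)
        weights = torch.softmax(self.score(feats), dim=1)    # (batch, M, 1) attention over modalities
        return (weights * feats).sum(dim=1)                  # fused (batch, dim) representation

fusion = AttentionFusion(dim=512)
fused = fusion(torch.randn(8, 4, 512))                       # e.g. face/head/body/audio features
```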
We introduce SocialIQa, the first large-scale benchmark for commonsense reasoning about social situations. This resource contains 45,000 multiple choice questions for probing emotional and social intelligence in a variety of everyday situations (e.g., Q: "Skylar went to Jan's birthday party and gave her a gift. What does Skylar need to do before this?" A: "Go shopping"). Through crowdsourcing, we collect commonsense questions along with correct and incorrect answers about social interactions, using a new framework that mitigates stylistic artifacts in incorrect answers by asking workers to provide the right answer to the wrong question. While humans can easily solve these questions (90%), our benchmark is more challenging for existing question-answering (QA) models, such as those based on pretrained language models (77%). Notably, we further establish SocialIQa as a resource for transfer learning of commonsense knowledge, achieving state-of-the-art performance on several commonsense reasoning tasks (Winograd Schemas, COPA).
http://arxiv.org/abs/1904.09728
A non-signalized intersection is a typical and common scenario for connected and automated vehicles (CAVs). How to balance safety and efficiency remains difficult for researchers. To improve the original Responsibility Sensitive Safety (RSS) driving strategy at non-signalized intersections, we propose a new strategy in this paper based on right-of-way assignment (RWA). The performance of the RSS strategy, a cooperative driving strategy, and the RWA-based strategy are tested and compared. Testing results indicate that our strategy yields better traffic efficiency than the RSS strategy, but is not as satisfying as the cooperative driving strategy due to the limited communication range and the lack of long-term planning. However, our new strategy requires much lower communication costs among vehicles.
http://arxiv.org/abs/1905.01150
This paper proposes a robust localization system that employs deep learning for better scene representation and enhances the accuracy of 6-DOF camera pose estimation. Inspired by the fact that global scene structure can be revealed by a wide field of view, we leverage the large overlap of a fisheye camera between adjacent frames and the powerful high-level feature representations of deep learning. Our main contribution is a novel network architecture that extracts both temporal and spatial information using a Recurrent Neural Network. Specifically, we propose a novel pose regularization term combined with an LSTM. This leads to smoother pose estimation, especially for large outdoor scenes. Promising experimental results on three benchmark datasets demonstrate the effectiveness of the proposed approach.
http://arxiv.org/abs/1904.09722
We present two new datasets and a novel attention mechanism for Natural Language Inference (NLI). Existing neural NLI models, even when trained on existing large datasets, do not capture the notion of entities and roles well and often end up making mistakes such as inferring “Peter signed a deal” from “John signed a deal”. The two datasets have been developed to mitigate such issues and make the systems better at understanding the notion of “entities” and “roles”. After training the existing architectures on the new datasets, we observe that they do not perform well on one of the new benchmarks. We then propose a modification to the “word-to-word” attention function which has been uniformly reused across several popular NLI architectures. The resulting architectures perform as well as their unmodified counterparts on the existing benchmarks and perform significantly better on the new benchmark for “roles” and “entities”.
http://arxiv.org/abs/1904.09720
Face anti-spoofing has gained increased attention recently in both academic and industrial fields. With the emergence of various CNN-based solutions, multi-modal (RGB, depth and IR) CNN-based methods have shown better performance than single-modal classifiers. However, there is a need to improve performance and reduce complexity. Therefore, an extremely light network architecture (FeatherNet A/B) is proposed with a streaming module which fixes the weakness of Global Average Pooling and uses fewer parameters. Our single FeatherNet, trained on depth images only, provides a higher baseline with 0.00168 ACER, 0.35M parameters and 83M FLOPS. Furthermore, a novel fusion procedure with an "ensemble + cascade" structure is presented to satisfy performance-preferred use cases. Meanwhile, the MMFD dataset is collected to provide more attacks and diversity for better generalization. We used the fusion method in the Face Anti-spoofing Attack Detection Challenge@CVPR2019 and achieved 0.0013 (ACER), 0.999 (TPR@FPR=10e-2), 0.998 (TPR@FPR=10e-3) and 0.9814 (TPR@FPR=10e-4).
http://arxiv.org/abs/1904.09290
Arbitrary attribute editing can generally be tackled by incorporating an encoder-decoder and generative adversarial networks. However, the bottleneck layer in the encoder-decoder usually gives rise to blurry and low-quality editing results, and adding skip connections improves image quality at the cost of weakened attribute manipulation ability. Moreover, existing methods exploit the target attribute vector to guide flexible translation to the desired target domain. In this work, we suggest addressing these issues from a selective transfer perspective. Considering that a specific editing task is only related to the changed attributes rather than all target attributes, our model selectively takes the difference between the target and source attribute vectors as input. Furthermore, selective transfer units are incorporated into the encoder-decoder to adaptively select and modify encoder features for enhanced attribute editing. Experiments show that our method (i.e., STGAN) simultaneously improves attribute manipulation accuracy as well as perceptual quality, and performs favorably against state-of-the-art methods in arbitrary facial attribute editing and season translation.
http://arxiv.org/abs/1904.09709
Standard methods in deep learning for natural language processing fail to capture the compositional structure of human language that allows for systematic generalization outside of the training distribution. However, human learners readily generalize in this way, e.g. by applying known grammatical rules to novel words. Inspired by work in neuroscience suggesting separate brain systems for syntactic and semantic processing, we implement a modification to standard approaches in neural machine translation, imposing an analogous separation. The novel model, which we call Syntactic Attention, substantially outperforms standard methods in deep learning on the SCAN dataset, a compositional generalization task, without any hand-engineered features or additional supervision. Our work suggests that separating syntactic from semantic learning may be a useful heuristic for capturing compositional structure.
http://arxiv.org/abs/1904.09708
The Winograd Schema Challenge (WSC) was proposed as an AI-hard problem for testing computers' intelligence in commonsense representation and reasoning. This paper presents the new state of the art on WSC, achieving an accuracy of 71.1%. We demonstrate that the leading performance benefits from jointly modelling sentence structures, utilizing knowledge learned from cutting-edge pretraining models, and performing fine-tuning. We conduct detailed analyses, showing that fine-tuning is critical for achieving the performance, but it helps more on the simpler associative problems. Modelling sentence dependency structures, however, consistently helps on the harder non-associative subset of WSC. Analysis also shows that larger fine-tuning datasets yield better performance, suggesting the potential benefit of future work on annotating more Winograd schema sentences.
http://arxiv.org/abs/1904.09705
The use of deep pretrained bidirectional transformers has led to remarkable progress in learning multi-sentence representations for downstream language understanding tasks (Devlin et al., 2018). For tasks that make pairwise comparisons, e.g. matching a given context with a corresponding response, two approaches have permeated the literature. A Cross-encoder performs full self-attention over the pair; a Bi-encoder performs self-attention for each sequence separately, and the final representation is a function of the pair. While Cross-encoders nearly always outperform Bi-encoders on various tasks, both in our work and others’ (Urbanek et al., 2019), they are orders of magnitude slower, which hampers their ability to perform real-time inference. In this work, we develop a new architecture, the Poly-encoder, that is designed to approach the performance of the Cross-encoder while maintaining reasonable computation time. Additionally, we explore two pretraining schemes with different datasets to determine how these affect the performance on our chosen dialogue tasks: ConvAI2 and DSTC7 Track 1. We show that our models achieve state-of-the-art results on both tasks; that the Poly-encoder is a suitable replacement for Bi-encoders and Cross-encoders; and that even better results can be obtained by pretraining on a large dialogue dataset.
http://arxiv.org/abs/1905.01969
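A minimal sketch of the Poly-encoder scoring head: the context is summarized into m learned codes by attending over its token embeddings, the candidate embedding then attends over those codes, and the final score is a dot product. The dimensions, number of codes, and the use of precomputed contextualized token embeddings are assumptions for illustration.

```python
# A minimal sketch of Poly-encoder scoring over precomputed token embeddings.
import torch
import torch.nn as nn

class PolyEncoderHead(nn.Module):
    def __init__(self, dim=768, num_codes=64):
        super().__init__()
        self.codes = nn.Parameter(torch.randn(num_codes, dim) * 0.02)   # m learned context codes

    def forward(self, ctx_tokens, cand_emb):
        # ctx_tokens: (B, T, dim) contextualized context tokens
        # cand_emb:   (B, dim) one candidate vector per batch element
        attn = torch.softmax(self.codes @ ctx_tokens.transpose(1, 2), dim=-1)  # (B, m, T)
        ctx_codes = attn @ ctx_tokens                                           # (B, m, dim)
        w = torch.softmax(ctx_codes @ cand_emb.unsqueeze(-1), dim=1)            # candidate attends over codes
        ctx_vec = (w * ctx_codes).sum(dim=1)                                    # (B, dim)
        return (ctx_vec * cand_emb).sum(dim=-1)                                 # dot-product score

head = PolyEncoderHead()
scores = head(torch.randn(4, 32, 768), torch.randn(4, 768))
```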
Argument mining is generally performed at the sentence level – it is assumed that an entire sentence (not parts of it) corresponds to an argument. In this paper, we introduce the new task of Argument unit Recognition and Classification (ARC). In ARC, an argument is generally a part of a sentence – a more realistic assumption since several different arguments can occur in one sentence and longer sentences often contain a mix of argumentative and non-argumentative parts. Recognizing and classifying the spans that correspond to arguments makes ARC harder than previously defined argument mining tasks. We release ARC-8, a new benchmark for evaluating the ARC task. We show that token-level annotations for argument units can be gathered using scalable methods. ARC-8 contains 25% more arguments than a dataset annotated at the sentence level would. We cast ARC as a sequence labeling task, develop a number of methods for ARC sequence tagging and establish the state of the art for ARC-8. A focus of our work is robustness: both robustness against errors in sentence identification (which are frequent for noisy text) and robustness against divergence in training and test data.
http://arxiv.org/abs/1904.09688
This paper proposes a novel training scheme for fast matching models in Search Ads, which is motivated by real challenges in model training. The first challenge stems from the pursuit of high throughput, which prohibits the deployment of inseparable architectures and hence greatly limits model accuracy. The second problem arises from the heavy dependency on human-provided labels, which are expensive and time-consuming to collect, yet how to leverage unlabeled search log data is rarely studied. The proposed training framework aims to mitigate both issues by treating the stronger but undeployable models as annotators, and learning a deployable model from both human-provided relevance labels and weakly annotated search log data. Specifically, we first construct multiple auxiliary tasks from the enumerated relevance labels and train the annotators by jointly learning from those related tasks. The annotation models are then used to assign scores to both labeled and unlabeled training samples. The deployable model is first learned on the scored unlabeled data and then fine-tuned on the scored labeled data, leveraging both labels and scores by minimizing the proposed label-aware weighted loss. According to our experiments, compared with a baseline that directly learns from relevance labels, training with the proposed framework outperforms it by a large margin and improves data efficiency substantially by dispensing with 80% of labeled samples. The proposed framework allows us to improve the fast matching model by learning from stronger annotators while keeping its architecture unchanged. Meanwhile, our training framework offers a principled manner to leverage search log data in the training phase, which could effectively alleviate our dependency on human-provided labels.
https://arxiv.org/abs/1901.10710
Stochastic video prediction models take in a sequence of image frames, and generate a sequence of consecutive future image frames. These models typically generate future frames in an autoregressive fashion, which is slow and requires the input and output frames to be consecutive. We introduce a model that overcomes these drawbacks by generating a latent representation from an arbitrary set of frames that can then be used to simultaneously and efficiently sample temporally consistent frames at arbitrary time-points. For example, our model can “jump” and directly sample frames at the end of the video, without sampling intermediate frames. Synthetic video evaluations confirm substantial gains in speed and functionality without loss in fidelity. We also apply our framework to a 3D scene reconstruction dataset. Here, our model is conditioned on camera location and can sample consistent sets of images for what an occluded region of a 3D scene might look like, even if there are multiple possibilities for what that region might contain. Reconstructions and videos are available at https://bit.ly/2O4Pc4R.
http://arxiv.org/abs/1807.02033
We present a comprehensive study and evaluation of existing single image dehazing algorithms, using a new large-scale benchmark consisting of both synthetic and real-world hazy images, called REalistic Single Image DEhazing (RESIDE). RESIDE highlights diverse data sources and image contents, and is divided into five subsets, each serving different training or evaluation purposes. We further provide a rich variety of criteria for dehazing algorithm evaluation, ranging from full-reference metrics, to no-reference metrics, to subjective evaluation and the novel task-driven evaluation. Experiments on RESIDE shed light on the comparisons and limitations of state-of-the-art dehazing algorithms, and suggest promising future directions.
http://arxiv.org/abs/1712.04143
Communication structure plays a key role in the learning capability of decentralized systems. Structural self-adaptation, by means of self-organization, changes the order as well as the input information of the agents’ collective decision-making. This paper studies the role of agents’ repositioning on the same communication structure, i.e. a tree, as the means to expand the learning capacity in complex combinatorial optimization problems, for instance, load-balancing power demand to prevent blackouts or efficient utilization of bike sharing stations. The optimality of structural self-adaptations is rigorously studied by constructing a novel large-scale benchmark that consists of 4000 agents with synthetic and real-world data performing 4 million structural self-adaptations during which almost 320 billion learning messages are exchanged. Based on this benchmark dataset, 124 deterministic structural criteria, applied as learning meta-features, are systematically evaluated as well as two online structural self-adaptation strategies designed to expand learning capacity. Experimental evaluation identifies metrics that capture agents with influential information and their optimal positioning. Significant gain in learning performance is observed for the two strategies especially under low-performing initialization. Strikingly, the strategy that triggers structural self-adaptation in a more exploratory fashion is the most cost-effective.
http://arxiv.org/abs/1904.09681
With an ultimate goal of narrowing the gap between human and machine readers in text comprehension, we present the first collection of Challenging Chinese machine reading Comprehension datasets (C^3) collected from language and professional certification exams, which contains 13,924 documents and their associated 23,990 multiple-choice questions. Most of the questions in C^3 cannot be answered merely by surface-form matching against the given text. As a pilot study, we closely analyze the prior knowledge (i.e., linguistic, domain-specific, and general world knowledge) needed in these real world reading comprehension tasks. We further explore how to leverage linguistic knowledge including a lexicon of common idioms and proverbs and domain-specific knowledge such as textbooks to aid machine readers, through fine-tuning a pre-trained language model (Devlin et al.,2019). Our experimental results demonstrate that linguistic knowledge may help improve the performance of the baseline reader in both general and domain-specific tasks. C^3 will be available at this http URL
http://arxiv.org/abs/1904.09679
In this paper, we introduce UniSent, universal sentiment lexica for 1000 languages created using an English sentiment lexicon and a massively parallel corpus in the Bible domain. To the best of our knowledge, UniSent is the largest sentiment resource to date in terms of the number of covered languages, including many low-resource languages. To create UniSent, we propose Adapted Sentiment Pivot, a novel method that combines annotation projection, vocabulary expansion, and unsupervised domain adaptation. We evaluate the quality of UniSent for Macedonian, Czech, German, Spanish, and French and show that it is comparable to manually or semi-manually created sentiment resources. With the publication of this paper, we release the UniSent lexica as well as the code for the Adapted Sentiment Pivot method.
http://arxiv.org/abs/1904.09678
In machine learning, it is very important for a robot to be able to estimate dynamics from sequences of input data. This problem can be solved using a recurrent neural network. In this paper, we will discuss the preprocessing of 10 states of the dataset, then the use of an LSTM recurrent neural network to estimate one output state (dynamics) from the other 9 input states. We will discuss the architecture of the recurrent neural network, the data collection and preprocessing, the loss function, the results on the test data, and changes that could improve the network. The results of this paper will be used for artificial intelligence research and to identify the capabilities of an LSTM recurrent neural network architecture for estimating the dynamics of a system.
http://arxiv.org/abs/1904.09980
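A minimal sketch (with assumed tensor shapes) of the setup described above: an LSTM regresses the single output state from the other nine input states over a sequence, trained with a mean-squared-error loss.

```python
# A minimal sketch of LSTM-based dynamics regression: 9 input states -> 1 output state.
import torch
import torch.nn as nn

class DynamicsLSTM(nn.Module):
    def __init__(self, input_size=9, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, x):                   # x: (batch, time, 9)
        h, _ = self.lstm(x)
        return self.out(h)                  # (batch, time, 1) estimated output state

model = DynamicsLSTM()
pred = model(torch.randn(4, 100, 9))        # 100 time steps of the 9 input states
loss = nn.MSELoss()(pred, torch.randn(4, 100, 1))
```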
In machine learning, it is very important for a robot to know the state of an object and recognize particular desired states. This is an image classification problem that can be solved using a convolutional neural network. In this paper, we will discuss the use of a VGG convolutional neural network to recognize the states of cooking objects. We will discuss the use of activation functions, optimizers, data augmentation, layer additions, and other variations of the architecture. The results of this paper will be used to identify alternatives to the VGG convolutional neural network to improve accuracy.
http://arxiv.org/abs/1904.12613
We propose BERTScore, an automatic evaluation metric for text generation. Analogous to common metrics, BERTScore computes a similarity score for each token in the candidate sentence with each token in the reference. However, instead of looking for exact matches, we compute similarity using contextualized BERT embeddings. We evaluate on several machine translation and image captioning benchmarks, and show that BERTScore correlates better with human judgments than existing metrics, often significantly outperforming even task-specific supervised metrics.
http://arxiv.org/abs/1904.09675
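The matching step of BERTScore can be sketched as follows, assuming contextualized token embeddings for the candidate and reference have already been computed (e.g. with a BERT encoder) and L2-normalized; the idf weighting used in the full metric is omitted here.

```python
# A minimal sketch of BERTScore's greedy matching over cosine similarities.
import torch

def bertscore(cand_emb, ref_emb):
    # cand_emb: (Tc, d), ref_emb: (Tr, d); rows are unit-normalized token embeddings
    sim = cand_emb @ ref_emb.t()               # pairwise cosine similarities
    precision = sim.max(dim=1).values.mean()   # each candidate token matched to its best reference token
    recall = sim.max(dim=0).values.mean()      # each reference token matched to its best candidate token
    f1 = 2 * precision * recall / (precision + recall)
    return precision.item(), recall.item(), f1.item()

cand = torch.nn.functional.normalize(torch.randn(7, 768), dim=-1)
ref = torch.nn.functional.normalize(torch.randn(9, 768), dim=-1)
print(bertscore(cand, ref))
```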
The new demands for high-reliability and ultra-high-capacity wireless communication have led to extensive research into 5G communications. However, the current communication systems, which were designed on the basis of conventional communication theories, significantly restrict further performance improvements and lead to severe limitations. Recently, the emerging deep learning techniques have been recognized as a promising tool for handling complicated communication systems, and their potential for optimizing wireless communications has been demonstrated. In this article, we first review the development of deep learning solutions for 5G communication, and then propose efficient schemes for deep learning-based 5G scenarios. Specifically, the key ideas for several important deep learning-based communication methods are presented along with the research opportunities and challenges. In particular, novel communication frameworks of non-orthogonal multiple access (NOMA), massive multiple-input multiple-output (MIMO), and millimeter wave (mmWave) are investigated, and their superior performances are demonstrated. We envision that the appealing deep learning-based wireless physical layer frameworks will bring a new direction in communication theories and that this work will move us forward along this road.
http://arxiv.org/abs/1904.09673
Can neural networks learn to compare graphs without feature engineering? In this paper, we show that it is possible to learn representations for graph similarity with neither domain knowledge nor supervision (i.e., feature engineering or labeled graphs). We propose Deep Divergence Graph Kernels, an unsupervised method for learning representations over graphs that encodes a relaxed notion of graph isomorphism. Our method consists of three parts. First, we learn an encoder for each anchor graph to capture its structure. Second, for each pair of graphs, we train a cross-graph attention network which uses the node representations of an anchor graph to reconstruct another graph. This approach, which we call isomorphism attention, captures how well the representations of one graph can encode another. We use the attention-augmented encoder’s predictions to define a divergence score for each pair of graphs. Finally, we construct an embedding space for all graphs using these pair-wise divergence scores. Unlike previous work, much of which relies on 1) supervision, 2) domain specific knowledge (e.g. a reliance on Weisfeiler-Lehman kernels), and 3) known node alignment, our unsupervised method jointly learns node representations, graph representations, and an attention-based alignment between graphs. Our experimental results show that Deep Divergence Graph Kernels can learn an unsupervised alignment between graphs, and that the learned representations achieve competitive results when used as features on a number of challenging graph classification tasks. Furthermore, we illustrate how the learned attention allows insight into the alignment of sub-structures across graphs.
http://arxiv.org/abs/1904.09671
Current 3D object detection methods are heavily influenced by 2D detectors. In order to leverage architectures in 2D detectors, they often convert 3D point clouds to regular grids (i.e., to voxel grids or to bird’s eye view images), or rely on detection in 2D images to propose 3D boxes. Few works have attempted to directly detect objects in point clouds. In this work, we return to first principles to construct a 3D detection pipeline for point cloud data that is as generic as possible. However, due to the sparse nature of the data – samples from 2D manifolds in 3D space – we face a major challenge when directly predicting bounding box parameters from scene points: a 3D object centroid can be far from any surface point and thus hard to regress accurately in one step. To address this challenge, we propose VoteNet, an end-to-end 3D object detection network based on a synergy of deep point set networks and Hough voting. Our model achieves state-of-the-art 3D detection on two large datasets of real 3D scans, ScanNet and SUN RGB-D, with a simple design, compact model size and high efficiency. Remarkably, VoteNet outperforms previous methods by using purely geometric information without relying on color images.
http://arxiv.org/abs/1904.09664
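A minimal sketch of the voting step: each seed point regresses an offset (plus a feature residual) so that the resulting votes move toward object centroids, which are then grouped to propose boxes. The MLP sizes below are illustrative, not the paper's exact configuration.

```python
# A minimal sketch of a Hough-voting module: seed points predict offsets toward
# object centroids and residuals for their features.
import torch
import torch.nn as nn

class VotingModule(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                                 nn.Linear(feat_dim, 3 + feat_dim))

    def forward(self, seed_xyz, seed_feat):      # (B, N, 3), (B, N, feat_dim)
        out = self.mlp(seed_feat)
        vote_xyz = seed_xyz + out[..., :3]        # votes pushed toward object centroids
        vote_feat = seed_feat + out[..., 3:]      # updated vote features
        return vote_xyz, vote_feat

voter = VotingModule()
xyz, feat = voter(torch.randn(2, 1024, 3), torch.randn(2, 1024, 256))
```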
The paper presents a new model for low-level interpretation of single-channel images. The image is decomposed into a graph which captures a complete set of structural features. The description allows every edge location and its correct connectivity to be identified accurately. The key features of the method are: vector description of the edges, subpixel precision, and parallelism of the underlying algorithm. The methodology outperforms classical and state-of-the-art edge detectors at both the conceptual and experimental levels. It also enables graph-based algorithms for higher-level feature extraction. Any image processing pipeline can benefit from such results: e.g., controlled denoising, edge-preserving filtering, upsampling, compression, vector- and graph-based pattern matching, and neural network training.
http://arxiv.org/abs/1904.09659
Embedding methods have achieved success in face recognition by comparing facial features in a latent semantic space. However, in a fully unconstrained face setting, the features learned by the embedding model could be ambiguous or may not even be present in the input face, leading to noisy representations. We propose Probabilistic Face Embeddings (PFEs), which represent each face image as a Gaussian distribution in the latent space. The mean of the distribution estimates the most likely feature values while the variance shows the uncertainty in the feature values. Probabilistic solutions can then be naturally derived for matching and fusing PFEs using the uncertainty information. Empirical evaluation on different baseline models, training datasets and benchmarks shows that the proposed method can improve the face recognition performance of deterministic embeddings by converting them into PFEs. The uncertainties estimated by PFEs also serve as good indicators of the potential matching accuracy, which are important for a risk-controlled recognition system.
http://arxiv.org/abs/1904.09658
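Matching two probabilistic embeddings can be sketched with a mutual-likelihood-style score between diagonal Gaussians (mean and per-dimension variance); the formulation below follows the usual PFE-style expression up to a constant and is an assumption for illustration, not copied from the paper.

```python
# A sketch of a mutual-likelihood-style match score between two face embeddings
# modeled as diagonal Gaussians (mu, sigma^2); the constant term is omitted.
import torch

def mutual_likelihood_score(mu1, var1, mu2, var2):
    # all inputs: (d,) mean and per-dimension variance of each face embedding
    var = var1 + var2
    return -0.5 * (((mu1 - mu2) ** 2) / var + torch.log(var)).sum()

mu1, mu2 = torch.randn(512), torch.randn(512)
var1, var2 = torch.rand(512) + 0.1, torch.rand(512) + 0.1
score = mutual_likelihood_score(mu1, var1, mu2, var2)   # higher = more likely the same person
```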
Research has provided evidence that associative classification produces more accurate results than other classification models. Classification Based on Association (CBA) is one of the well-known associative classification algorithms that generates accurate classifiers. However, current associative classification algorithms reside outside of databases, which reduces the flexibility of enterprise analytics systems. This paper implements CBA in the Oracle database using two variant models: hard-coding CBA in the Oracle Data Mining (ODM) package, and integrating the Oracle Apriori model with the Oracle Decision Tree model. We compared the proposed models' performance with Naive Bayes, Support Vector Machines, Random Forests, and Decision Trees over 18 datasets from UCI. Results show that our models outperform the original CBA model by 1 percent and are competitive with the chosen classification models over the benchmark datasets.
http://arxiv.org/abs/1904.09654
Estimating the parameter of a Bernoulli process arises in many applications, including photon-efficient active imaging where each illumination period is regarded as a single Bernoulli trial. Motivated by acquisition efficiency when multiple Bernoulli processes are of interest, we formulate the allocation of trials under a constraint on the mean as an optimal resource allocation problem. An oracle-aided trial allocation demonstrates that there can be a significant advantage from varying the allocation for different processes and inspires a simple trial allocation gain quantity. Motivated by realizing this gain without an oracle, we present a trellis-based framework for representing and optimizing stopping rules. Considering the convenient case of Beta priors, three implementable stopping rules with similar performances are explored, and the simplest of these is shown to asymptotically achieve the oracle-aided trial allocation. These approaches are further extended to estimating functions of a Bernoulli parameter. In simulations inspired by realistic active imaging scenarios, we demonstrate significant mean-squared error improvements: up to 4.36 dB for the estimation of p and up to 1.80 dB for the estimation of log p.
http://arxiv.org/abs/1809.08801
Previous studies have shown that neural machine translation (NMT) models can benefit from modeling translated (Past) and un-translated (Future) source contents as recurrent states (Zheng et al., 2018). However, the recurrent process is less interpretable. In this paper, we propose to model Past and Future by Capsule Network (Hinton et al.,2011), which provides an explicit separation of source words into groups of Past and Future by the process of parts-to-wholes assignment. The assignment is learned with a novel variant of routing-by-agreement mechanism (Sabour et al., 2017), namely Guided Dynamic Routing, in which what to translate at current decoding step guides the routing process to assign each source word to its associated group represented by a capsule, and to refine the representation of the capsule dynamically and iteratively. Experiments on translation tasks of three language pairs show that our model achieves substantial improvements over both RNMT and Transformer. Extensive analysis further verifies that our method does recognize translated and untranslated content as expected, and produces better and more adequate translations.
http://arxiv.org/abs/1904.09646
We use persistent homology along with the eigenfunctions of the Laplacian to study similarity amongst triangulated 2-manifolds. Our method relies on studying the lower-star filtration induced by the eigenfunctions of the Laplacian. This gives us a shape descriptor that inherits the rich information encoded in the eigenfunctions of the Laplacian. Moreover, the similarity between these descriptors can be easily computed using tools that are readily available in Topological Data Analysis. We provide experiments to illustrate the effectiveness of the proposed method.
http://arxiv.org/abs/1904.09639
Deep pre-training and fine-tuning models (like BERT, OpenAI GPT) have demonstrated excellent results in question answering areas. However, due to the sheer number of model parameters, the inference speed of these models is very slow. How to apply these complex models to real business scenarios becomes a challenging but practical problem. Previous works often leverage model compression approaches to resolve this problem. However, these methods usually induce information loss during the model compression procedure, leading to results that are not comparable between the compressed model and the original model. To tackle this challenge, we propose a Multi-task Knowledge Distillation Model (MKDM for short) for a web-scale question answering system, which distills knowledge from multiple teacher models to a light-weight student model. In this way, more generalized knowledge can be transferred. The experimental results show that our method can significantly outperform the baseline methods and even achieve comparable results with the original teacher models, along with a significant speedup of model inference.
http://arxiv.org/abs/1904.09636
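A minimal sketch of distilling several teachers into one student: soft teacher predictions are averaged and combined with the hard-label loss. The temperature, loss weighting, and two-class setup below are illustrative, not the paper's settings.

```python
# A minimal sketch of multi-teacher knowledge distillation: average teacher soft
# labels, then mix a KL distillation term with the usual cross-entropy loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits_list, labels, T=2.0, alpha=0.5):
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_list]).mean(dim=0)
    soft_loss = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                         teacher_probs, reduction="batchmean") * (T * T)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss

student = torch.randn(8, 2)                    # e.g. relevant / not relevant
teachers = [torch.randn(8, 2) for _ in range(3)]
labels = torch.randint(0, 2, (8,))
loss = distillation_loss(student, teachers, labels)
```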
In this study, we propose leveraging interpretability for tasks beyond the purpose of explainability alone. In particular, this study puts forward a novel strategy for leveraging gradient-based interpretability in the realm of adversarial examples, where we use the insights gained to aid adversarial learning. More specifically, we introduce the concept of spatially constrained one-pixel adversarial perturbations, where we guide the learning of such adversarial perturbations towards more susceptible areas identified via gradient-based interpretability. Experimental results using different benchmark datasets show that such a spatially constrained one-pixel adversarial perturbation strategy can noticeably improve the speed of convergence as well as produce successful attacks that are also visually difficult to perceive, thus illustrating an effective use of interpretability methods for tasks beyond pure explainability.
http://arxiv.org/abs/1904.09633
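A minimal sketch of using gradient-based saliency to choose the single pixel at which to constrain the perturbation; the model and label are placeholders, and the full attack (how the chosen pixel is then perturbed) is not reproduced here.

```python
# A minimal sketch of gradient-guided pixel selection for a spatially constrained
# one-pixel perturbation; `model` is any differentiable image classifier.
import torch

def one_pixel_candidate(model, image, label):
    # image: (1, C, H, W); label: (1,) class index; returns the most salient (row, col)
    image = image.clone().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(image), label)
    loss.backward()
    saliency = image.grad.abs().sum(dim=1).squeeze(0)    # aggregate gradient magnitude over channels
    idx = saliency.flatten().argmax()
    return divmod(idx.item(), saliency.shape[1])          # (row, col) to constrain the attack
```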
Metric Learning for visual similarity has mostly adopted binary supervision indicating whether a pair of images are of the same class or not. Such a binary indicator covers only a limited subset of image relations, and is not sufficient to represent semantic similarity between images described by continuous and/or structured labels such as object poses, image captions, and scene graphs. Motivated by this, we present a novel method for deep metric learning using continuous labels. First, we propose a new triplet loss that allows distance ratios in the label space to be preserved in the learned metric space. The proposed loss thus enables our model to learn the degree of similarity rather than just the order. Furthermore, we design a triplet mining strategy adapted to metric learning with continuous labels. We address three different image retrieval tasks with continuous labels in terms of human poses, room layouts and image captions, and demonstrate the superior performance of our approach compared to previous methods.
http://arxiv.org/abs/1904.09626
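One way to preserve label-space distance ratios in the embedding space is a log-ratio objective over triplets, sketched below; this illustrates the idea with squared Euclidean distances and random tensors, and is not necessarily the paper's exact loss or mining strategy.

```python
# A sketch of a ratio-preserving triplet objective for continuous labels: the
# ratio of embedding distances is pushed to match the ratio of label distances.
import torch

def log_ratio_loss(f_a, f_i, f_j, y_a, y_i, y_j, eps=1e-8):
    d_fi = ((f_a - f_i) ** 2).sum(dim=-1) + eps   # embedding-space distances
    d_fj = ((f_a - f_j) ** 2).sum(dim=-1) + eps
    d_yi = ((y_a - y_i) ** 2).sum(dim=-1) + eps   # label-space distances (e.g. pose)
    d_yj = ((y_a - y_j) ** 2).sum(dim=-1) + eps
    return ((torch.log(d_fi / d_fj) - torch.log(d_yi / d_yj)) ** 2).mean()

f = torch.randn(3, 32, 128)                       # anchor / neighbor-i / neighbor-j embeddings
y = torch.rand(3, 32, 4)                          # continuous labels for each
loss = log_ratio_loss(f[0], f[1], f[2], y[0], y[1], y[2])
```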
There is growing interest in social robots being considered for the therapy of children with autism due to their effectiveness in improving outcomes. However, children on the spectrum exhibit challenging behaviors that need to be considered when designing robots for them. A child could involuntarily throw a small social robot during a meltdown, and it could hit another person's head and cause harm (e.g. concussion). In this paper, the application of soft materials is investigated for its potential in attenuating the head's linear acceleration upon impact. The thickness and storage modulus of three different soft materials were considered as the control factors, while the noise factor was the impact velocity. The design of experiments was based on the Taguchi method. A total of 27 experiments were conducted on a developed dummy-head setup that reports the linear acceleration of the head. ANOVA tests were performed to analyze the data. The findings showed that the control factors are not statistically significant in attenuating the response. The optimal values of the control factors were identified using the signal-to-noise (S/N) ratio optimization technique. Confirmation runs at the optimal parameters (i.e. thickness of 3 mm and 5 mm) showed a better response compared to other conditions. Designers of social robots should consider the application of soft materials to their designs, as it helps in reducing the potential harm to the head.
http://arxiv.org/abs/1904.09621
Clinical decision support tools (DST) promise improved healthcare outcomes by offering data-driven insights. While effective in lab settings, almost all DSTs have failed in practice. Empirical research diagnosed poor contextual fit as the cause. This paper describes the design and field evaluation of a radically new form of DST. It automatically generates slides for clinicians’ decision meetings with subtly embedded machine prognostics. This design took inspiration from the notion of “Unremarkable Computing”, that by augmenting the users’ routines technology/AI can have significant importance for the users yet remain unobtrusive. Our field evaluation suggests clinicians are more likely to encounter and embrace such a DST. Drawing on their responses, we discuss the importance and intricacies of finding the right level of unremarkableness in DST design, and share lessons learned in prototyping critical AI systems as a situated experience.
http://arxiv.org/abs/1904.09612