Dynamic subarray architectures achieve a compromise between sum rate and hardware complexity for millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems, in which antenna elements are dynamically partitioned among radio frequency (RF) chains according to the channel state information. However, multi-user hybrid precoding for the dynamic subarray is intractable, as the antenna partitioning would result in user unfairness and multi-user interference (MUI). In this paper, a novel multi-user hybrid precoding framework is proposed for the dynamic subarray architecture. Different from existing schemes, the base station (BS) first selects the multi-user set based on the analog effective channel. The antenna partitioning algorithm then allocates each antenna element to an RF chain according to the maximal increment of the signal-to-interference-plus-noise ratio (SINR). Finally, the hybrid precoding is optimized for the dynamic subarray architecture. By calculating SINRs on the analog effective channels of the selected users, the antenna partitioning greatly reduces the computational complexity and the size of the search space. Moreover, it also guarantees user fairness, since each antenna element is allocated to acquire the maximal SINR increment over all selected users. Extensive simulation results demonstrate that the proposed solution clearly outperforms fixed subarrays in both energy efficiency and sum rate, and achieves higher energy efficiency than the fully-connected architecture with only a slight loss of sum rate.
https://arxiv.org/abs/1902.11023
Planning accurate manipulation for deformable objects requires prediction of their state. The prediction is often complicated by a loss of stability that may result in collapse of the deformable object. In this work, stability of a fabric strip folding performed by a robot is studied. We show that there is a static instability in the folding process. This instability is detected in a physics-based simulation and the position of the instability is verified experimentally by real robotic manipulation. Three state-of-the-art methods for folding are assessed in the presence of static instability. It is shown that one of the existing folding paths is suitable for folding of materials with internal friction such as fabrics. Another folding path that utilizes dynamic motion exists for ideal elastic materials without internal friction. Our results show that instability needs to be considered in planning to obtain accurate manipulation of deformable objects.
http://arxiv.org/abs/1902.11021
In this work we propose a new method for simultaneous object detection and 6DoF pose estimation. Unlike most recent CNN-based techniques for object detection and pose estimation, we do not base our approach on the common 2D detectors, i.e., SSD and YOLO, but propose a new scheme. Instead of regressing 2D or 3D bounding boxes, we output full-sized 2D images containing multi-class object masks and dense 2D-3D correspondences. Given these outputs, a 6D pose is computed for each detected object using the PnP algorithm supplemented with RANSAC. This strategy allows for substantially better pose estimates due to a much higher number of relevant pose correspondences. Furthermore, the method is real-time capable, conceptually simple, and not bound to any particular detection paradigm, such as R-CNN, SSD or YOLO. We test our method for single- and multiple-object pose estimation and compare its performance with former state-of-the-art approaches. Moreover, we demonstrate how to use our pipeline when only synthetic renderings are available. In both cases, we outperform the former state of the art by a large margin.
http://arxiv.org/abs/1902.11020
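As a rough illustration of the pose-recovery step described above, the sketch below feeds dense 2D-3D correspondences into OpenCV's PnP-with-RANSAC solver. The array shapes, thresholds, and iteration count are illustrative assumptions, not the authors' settings.

    import numpy as np
    import cv2

    def pose_from_correspondences(pts3d, pts2d, K):
        """Recover a 6D pose from dense 2D-3D correspondences via PnP + RANSAC.

        pts3d: (N, 3) array of object-model coordinates predicted per pixel.
        pts2d: (N, 2) array of the matching image coordinates.
        K:     (3, 3) camera intrinsic matrix.
        """
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(
            pts3d.astype(np.float32), pts2d.astype(np.float32), K, None,
            reprojectionError=3.0, iterationsCount=150)
        if not ok:
            return None
        R, _ = cv2.Rodrigues(rvec)  # axis-angle -> rotation matrix
        return R, tvec, inliers

Because the network outputs a correspondence per foreground pixel, N is large, which is precisely why the RANSAC-filtered PnP estimate tends to be stable.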
This paper addresses forward motion control for trajectory tracking and mobile formation coordination for a group of non-holonomic vehicles on SE(2). Firstly, by constructing an intermediate attitude variable which involves vehicles’ position information and desired attitude, the translational and rotational control inputs are designed in two stages to solve the trajectory tracking problem. Secondly, the coordination relationships of relative positions and headings are explored thoroughly for a group of non-holonomic vehicles to maintain a mobile formation with rigid body motion constraints. We prove that, except for the cases of parallel formation and translational straight line formation, a mobile formation with strict rigid-body motion can be achieved if and only if the ratios of linear speed to angular speed for each individual vehicle are constants. Motion properties for mobile formation with weak rigid-body motion are also demonstrated. Thereafter, based on the proposed trajectory tracking approach, a distributed mobile formation control law is designed under a directed tree graph. The performance of the proposed controllers is validated by both numerical simulations and experiments.
http://arxiv.org/abs/1902.11015
In this paper we present an efficient implementation of face recognition using the triplet loss. We conduct practical experiments to analyze the factors that influence the training of triplet loss. All models are trained on the CASIA-WebFace dataset and tested on LFW. We analyze the experimental results and give insights to help others balance these factors when applying triplet loss to their own problems, especially face recognition. Code has been released at https://github.com/yule-li/MassFace.
http://arxiv.org/abs/1902.11007
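For readers new to the triplet loss discussed above, a minimal PyTorch version is sketched below; the margin value and the L2 normalization of embeddings are common conventions rather than details taken from the paper.

    import torch
    import torch.nn.functional as F

    def triplet_loss(anchor, positive, negative, margin=0.2):
        """Triplet loss on L2-normalized face embeddings (illustrative sketch).

        Pulls anchor-positive pairs together and pushes anchor-negative
        pairs apart until they are separated by at least `margin`.
        """
        anchor, positive, negative = (F.normalize(x, dim=1)
                                      for x in (anchor, positive, negative))
        d_ap = (anchor - positive).pow(2).sum(dim=1)  # squared distances
        d_an = (anchor - negative).pow(2).sum(dim=1)
        return F.relu(d_ap - d_an + margin).mean()

The factors the paper studies (batch construction, triplet selection, margin) all enter through how the (anchor, positive, negative) batches fed to this loss are mined.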
Most network embedding algorithms consist of measuring co-occurrences of nodes via random walks and then learning the embeddings using Skip-Gram with Negative Sampling (SGNS). While this has proven to be a relevant choice, there are alternatives, such as GloVe, that have not yet been investigated for network embedding. Even though SGNS handles non-co-occurrence better than GloVe, it has a worse time complexity. In this paper, we propose a matrix factorization approach for network embedding, inspired by GloVe, that better handles non-co-occurrence with a competitive time complexity. We also show how to extend this model to networks whose nodes are documents, by simultaneously learning word, node and document representations. Quantitative evaluations show that our model achieves state-of-the-art performance while being less sensitive to the choice of hyper-parameters. Qualitatively, we show how our model helps explore a network of documents by generating complementary network-oriented and content-oriented keywords.
http://arxiv.org/abs/1902.11004
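To make the GloVe-style factorization concrete, here is a minimal NumPy sketch of one pass of plain GloVe updates over a node co-occurrence matrix. Note this baseline only fits observed co-occurrences; the paper's contribution is precisely to also handle non-co-occurring pairs, which this sketch ignores.

    import numpy as np

    def glove_pass(W, C, b, c, X, lr=0.05, x_max=100.0, alpha=0.75):
        """One pass of GloVe-style SGD on a node co-occurrence matrix X.

        W, C: (n, d) target/context embeddings; b, c: (n,) bias vectors.
        Each observed pair contributes a weighted squared error against
        the log co-occurrence count.
        """
        rows, cols = X.nonzero()
        for i, j in zip(rows, cols):
            x = X[i, j]
            f = min(1.0, (x / x_max) ** alpha)            # GloVe weighting function
            err = W[i] @ C[j] + b[i] + c[j] - np.log(x)
            gW, gC = f * err * C[j], f * err * W[i]       # gradients before updates
            W[i] -= lr * gW
            C[j] -= lr * gC
            b[i] -= lr * f * err
            c[j] -= lr * f * err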
Accurate segmentation of heart structures imaged by cardiac MR is key for the quantitative analysis of pathology. High-resolution 3D MR sequences enable whole-heart structural imaging but are time-consuming, expensive to acquire and they often require long breath holds that are not suitable for patients. Consequently, multiplanar breath-hold 2D cine sequences are standard practice but are disadvantaged by lack of whole-heart coverage and low through-plane resolution. To address this, we propose a conditional variational autoencoder architecture able to learn a generative model of 3D high-resolution left ventricular (LV) segmentations which is conditioned on three 2D LV segmentations of one short-axis and two long-axis images. By only employing these three 2D segmentations, our model can efficiently reconstruct the 3D high-resolution LV segmentation of a subject. When evaluated on 400 unseen healthy volunteers, our model yielded an average Dice score of $87.92 \pm 0.15$ and outperformed competing architectures.
http://arxiv.org/abs/1902.11000
Various saliency detection algorithms for color images have been proposed to mimic the eye-fixation or attentive object detection responses of human observers for the same scenes. Meanwhile, developments in hyperspectral imaging systems enable us to obtain rich spectral information about observed scenes from the light reflected by objects. A few studies using low-level features on hyperspectral images have demonstrated that salient object detection can be achieved. In this work, we propose a salient object detection model for hyperspectral images that applies manifold ranking (MR) to self-supervised Convolutional Neural Network (CNN) features (high-level features) from an unsupervised image segmentation task. Self-supervision of the CNN continues until the clustering loss or the saliency maps converge to a defined error between iterations. The final saliency estimate is the saliency map at the last iteration, when the self-supervision procedure terminates with convergence. Experimental evaluations demonstrate that the proposed saliency detection algorithm for hyperspectral images outperforms state-of-the-art hyperspectral saliency models, including the original MR-based saliency model.
http://arxiv.org/abs/1902.10993
We develop hierarchically quantized efficient embedding representations for similarity-based search and show that this representation not only provides state-of-the-art search accuracy but also yields several orders of magnitude of speedup during inference. The idea is to hierarchically quantize the representation so that the quantization granularity is greatly increased while maintaining accuracy and keeping the computational complexity low. We also show that the problem of finding the optimal sparse compound hash code respecting the hierarchical structure can be solved in polynomial time via minimum cost flow in an equivalent flow network. This allows us to train the method end-to-end in a mini-batch stochastic gradient descent setting. Our experiments on the CIFAR-100 and ImageNet datasets show state-of-the-art search accuracy while providing several orders of magnitude of search speedup over exhaustive linear search over the dataset.
http://arxiv.org/abs/1902.10990
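The polynomial-time claim rests on standard min-cost flow machinery. The toy below uses networkx to pick the cheapest k active slots, purely to illustrate the tool; the paper's actual flow network encodes the hierarchical sparsity constraints, which this toy graph does not.

    import networkx as nx

    # Route 2 units of "activation" from a source to hash-code slots;
    # the min-cost solution activates the 2 cheapest slots.
    G = nx.DiGraph()
    G.add_node("s", demand=-2)                 # source emits 2 units
    G.add_node("t", demand=2)                  # sink absorbs 2 units
    for slot, cost in [("a", 1), ("b", 3), ("c", 2)]:
        G.add_edge("s", slot, capacity=1, weight=cost)
        G.add_edge(slot, "t", capacity=1, weight=0)
    flow = nx.min_cost_flow(G)
    print(flow["s"])                           # {'a': 1, 'b': 0, 'c': 1}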
Sequence tagging models for constituent parsing are faster, but less accurate than other types of parsers. In this work, we address the following weaknesses of such constituent parsers: (a) high error rates around closing brackets of long constituents, (b) large label sets, leading to sparsity, and (c) error propagation arising from greedy decoding. To effectively close brackets, we train a model that learns to switch between tagging schemes. To reduce sparsity, we decompose the label set and use multi-task learning to jointly learn to predict sublabels. Finally, we mitigate issues from greedy decoding through auxiliary losses and sentence-level fine-tuning with policy gradient. Combining these techniques, we clearly surpass the performance of sequence tagging constituent parsers on the English and Chinese Penn Treebanks, and reduce their parsing time even further. On the SPMRL datasets, we observe even greater improvements across the board, including a new state of the art on Basque, Hebrew, Polish and Swedish.
http://arxiv.org/abs/1902.10985
For fine-grained visual classification, objects usually share similar geometric structure but vary in local appearance and pose. Therefore, localizing and extracting discriminative local features play a crucial role in accurate category prediction. Existing works either attend to a limited number of object parts or train isolated networks for localization and classification. In this paper, we propose the Weakly Supervised Bilinear Attention Network (WS-BAN) to address these issues. It jointly generates a set of attention maps (region-of-interest maps) to indicate the locations of an object's parts and extracts sequential part features by Bilinear Attention Pooling (BAP). Besides, we propose attention regularization and attention dropout to weakly supervise the generation of attention maps. WS-BAN can be trained end-to-end and achieves state-of-the-art performance on multiple fine-grained classification datasets, including CUB-200-2011, Stanford Cars and FGVC-Aircraft, demonstrating its effectiveness.
http://arxiv.org/abs/1808.02152
We propose an octree guided neural network architecture and spherical convolutional kernel for machine learning from arbitrary 3D point clouds. The network architecture capitalizes on the sparse nature of irregular point clouds, and hierarchically coarsens the data representation with space partitioning. At the same time, the proposed spherical kernels systematically quantize point neighborhoods to identify local geometric structures in the data, while maintaining the properties of translation-invariance and asymmetry. We specify spherical kernels with the help of network neurons that in turn are associated with spatial locations. We exploit this association to avert dynamic kernel generation during network training, which enables efficient learning with high-resolution point clouds. The effectiveness of the proposed technique is established on the benchmark tasks of 3D object classification and segmentation, achieving a new state of the art on the ShapeNet and RueMonge2014 datasets.
http://arxiv.org/abs/1903.00343
In this paper we address the problems of detecting objects of interest in a video and of estimating their locations, solely from the gaze directions of people present in the video. Objects may be located either inside or outside the camera's field of view. We refer to this problem as extended gaze following. The contributions of the paper are as follows. First, we propose a novel spatial representation of gaze directions adopting a top-view perspective. Second, we develop several convolutional encoder/decoder networks to predict object locations and compare them with heuristics and with classical learning-based approaches. Third, in order to train the proposed models, we generate a very large number of synthetic scenarios employing a probabilistic formulation. Finally, our methodology is empirically validated using a publicly available dataset.
http://arxiv.org/abs/1902.10953
Although deeper and larger neural networks achieve better performance, their complex structure and increasing computational cost cannot meet the demands of many resource-constrained applications. An effective way to address this problem is dynamic inference. Existing methods usually choose to execute or skip an entire layer through a switch structure, which can only alter the depth of the network. In this paper, we propose a dynamic inference method called Context-aware Dynamic Block (CDB), which provides more path-selection choices in terms of both network width and depth during inference. The execution of a CDB is determined by a context-aware group controller, which takes into account both historical and object-category information. The proposed method can be easily incorporated into most modern network architectures. Experimental results on ImageNet and CIFAR-100 demonstrate the superiority of our method in both efficiency and overall classification quality. Specifically, we integrate CDB into ResNet-101 and find that our method significantly outperforms its counterparts while saving 45.1% of FLOPs.
http://arxiv.org/abs/1902.10949
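A minimal sketch of conditional block execution in the spirit described above: a tiny controller reads pooled features and gates a convolutional branch. The soft-gate-in-training / hard-gate-at-inference split and the 0.5 threshold are our simplifying assumptions, not the paper's group controller.

    import torch
    import torch.nn as nn

    class GatedBlock(nn.Module):
        """Residual branch that a per-sample controller can switch off."""
        def __init__(self, channels):
            super().__init__()
            self.branch = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
            self.controller = nn.Linear(channels, 1)

        def forward(self, x):
            ctx = x.mean(dim=(2, 3))                      # global context vector
            gate = torch.sigmoid(self.controller(ctx))    # execution probability
            if self.training:
                # Soft gating keeps the controller differentiable.
                return x + gate.view(-1, 1, 1, 1) * self.branch(x)
            keep = gate.view(-1) > 0.5                    # hard decision at inference
            out = x.clone()
            if keep.any():
                out[keep] = x[keep] + self.branch(x[keep])
            return out

Skipped samples pay only the cost of the controller, which is where the FLOP savings come from.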
We applied a Deep Q-Network with a Convolutional Neural Network function approximator, which takes stock chart images as input, to make global stock market predictions. Our model not only yields profit in the stock market of the country where it was trained but also generally yields profit in global stock markets. We trained our model only on the US market and tested it in 31 different countries over 12 years. The portfolios constructed based on our model's output generally yield about 0.1 to 1.0 percent return per transaction before transaction costs in the 31 countries. The results show that there are patterns in stock chart images that tend to predict the same future stock price movements across global stock markets. Moreover, the results show that future stock prices can be predicted even when the training and testing procedures are done in different countries. The training procedure can be carried out in relatively large and liquid markets (e.g., the US) and the model tested in small markets. This result demonstrates that artificial-intelligence-based stock price forecasting models can be used in relatively small markets (emerging countries) even though those markets do not have a sufficient amount of data for training.
https://arxiv.org/abs/1902.10948
One issue with computer-based histopathology image analysis is that the raw images are usually very large. Taking a raw image as input to a deep learning model would be computationally expensive, while resizing it to low resolution would incur information loss. In this paper, we present a novel deep hybrid attention approach to breast cancer classification. It first adaptively selects a sequence of coarse regions from the raw image by a hard visual attention algorithm, and then for each such region it investigates the abnormal parts based on a soft-attention mechanism. A recurrent network is then built to classify the image region and to predict the location of the image region to be investigated at the next time step. As the region selection process is non-differentiable, we optimize the whole network through a reinforcement learning approach to learn an optimal policy for classifying the regions. Based on this novel Look, Investigate and Classify approach, we only need to process a fraction of the pixels in the raw image, resulting in significant savings in computational resources without sacrificing performance. Our approach is evaluated on a public breast cancer histopathology database, where it demonstrates superior performance to state-of-the-art deep learning approaches, achieving around 96% classification accuracy while using only 15% of the raw pixels.
http://arxiv.org/abs/1902.10946
Offline signature verification (OSV) is a challenging pattern recognition task, especially when it is expected to generalize to skilled forgeries that are not available during training. Its challenges also include small training sets and large intra-class variations. Considering these limitations, we suggest a novel transfer learning approach from the Persian handwriting domain to the multi-language OSV domain. We train two residual CNNs on the source domain separately, based on the two different tasks of word classification and writer identification. Since identifying a person's signature resembles identifying their handwriting, it is natural to use handwriting for the feature learning phase. The representation learned on the more varied and plentiful handwriting dataset can compensate for the lack of training data in the original task, i.e., OSV, without sacrificing generalizability. Our proposed OSV system includes two steps: learning representations and verifying the input signature. For the first step, the signature images are fed into the trained residual CNNs. The output representations are then used to train SVMs for verification. We test our OSV system on three different signature datasets: MCYT (a Spanish signature dataset), UTSig (a Persian one) and GPDS-Synthetic (an artificial dataset). On UTSig, we achieved a 9.80% Equal Error Rate (EER), a substantial improvement over the best EER in the literature, 17.45%. Our proposed method surpassed the state of the art by 6% on GPDS-Synthetic, achieving 6.81%. On MCYT, an EER of 3.98% was obtained, which is comparable to the best previously reported results.
http://arxiv.org/abs/1903.06249
It is widely recognized that the image format is crucial to steganography, because each format has its own unique properties. Nowadays, the most common approach to digital image steganography is to combine a well-defined distortion function with efficient practical codes such as STCs, and most research has concentrated on the spatial and JPEG domains. However, in both domains, high payloads (e.g., 0.5 bits per pixel) are not secure enough. In this paper, we propose a novel adaptive steganography scheme based on the 32-bit HDR (high dynamic range) format and the IEEE 754 standard. Experiments show that the steganographic method achieves satisfactory security for payloads from 0.3 bpp to 0.5 bpp.
http://arxiv.org/abs/1902.10943
High dynamic range (HDR) imaging has recently drawn much attention in the multimedia community. In this paper, we propose an HDR image forensics method based on a convolutional neural network (CNN). To the best of our knowledge, this is the first application of deep learning to HDR image forensics. The proposed algorithm uses a CNN to distinguish HDR images generated from multiple low dynamic range (LDR) images from those expanded from a single LDR image using inverse tone mapping (iTM). To do this, we learn the change in statistical characteristics extracted by the proposed CNN architectures and classify the two kinds of HDR images. Comparison with some traditional statistical characteristics shows the efficiency of the proposed method for HDR image source identification.
http://arxiv.org/abs/1902.10938
Forecasting the motion of surrounding dynamic obstacles (vehicles, bicycles, pedestrians, etc.) benefits on-road motion planning for autonomous vehicles. Complex traffic scenes pose great challenges for modeling the traffic patterns of surrounding dynamic obstacles. In this paper, we propose a multi-layer architecture, Interaction-aware Kalman Neural Networks (IaKNN), which involves an interaction layer for resolving high-dimensional traffic environmental observations into interaction-aware accelerations, a motion layer for transforming the accelerations into interaction-aware trajectories, and a filter layer for estimating future trajectories with a Kalman filter. Experiments on the NGSIM dataset demonstrate that IaKNN outperforms state-of-the-art methods in terms of effectiveness for trajectory prediction.
http://arxiv.org/abs/1902.10928
Intent classification and slot filling are two essential tasks for natural language understanding. They often suffer from small-scale human-labeled training data, resulting in poor generalization capability, especially for rare words. Recently a new language representation model, BERT (Bidirectional Encoder Representations from Transformers), facilitates pre-training deep bidirectional representations on large-scale unlabeled corpora, and has created state-of-the-art models for a wide variety of natural language processing tasks after simple fine-tuning. However, there has not been much effort on exploring BERT for natural language understanding. In this work, we propose a joint intent classification and slot filling model based on BERT. Experimental results demonstrate that our proposed model achieves significant improvement on intent classification accuracy, slot filling F1, and sentence-level semantic frame accuracy on several public benchmark datasets, compared to the attention-based recurrent neural network models and slot-gated models.
http://arxiv.org/abs/1902.10909
It is difficult to detect and remove secret images that are hidden in natural images using deep-learning algorithms. Our technique is the first work to effectively disable covert communications and transactions that use deep-learning steganography. We address the problem by exploiting sophisticated pixel distributions and edge areas of images using a deep neural network. Based on the given information, we adaptively remove secret information at the pixel level. We also introduce a new quantitative metric called destruction rate, since the decoding method of deep-learning steganography is approximate (lossy), which differs from conventional steganography. We evaluate our technique using three public benchmarks in comparison with conventional steganalysis methods and show that the decoding rate improves by 10-20%.
http://arxiv.org/abs/1902.10905
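Since the decoding of deep-learning steganography is lossy, a destruction-style metric can be phrased as a bit-level disagreement rate. The sketch below is our reading of such a metric and may differ from the authors' exact definition.

    import numpy as np

    def destruction_rate(original_bits, decoded_bits):
        """Fraction of hidden-message bits that no longer decode correctly
        after the removal step (hedged sketch; not the paper's exact formula)."""
        original_bits = np.asarray(original_bits, dtype=bool)
        decoded_bits = np.asarray(decoded_bits, dtype=bool)
        return float(np.mean(original_bits != decoded_bits))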
Omnidirectional depth sensing has an advantage over conventional stereo systems since it enables us to recognize objects of interest in all directions without any blind regions. In this paper, we propose a novel wide-baseline omnidirectional stereo algorithm which computes dense depth estimates from fisheye images using a deep convolutional neural network. The capture system consists of multiple cameras mounted on a wide-baseline rig with ultra-wide field of view (FOV) lenses, and we present a calibration algorithm for the extrinsic parameters based on bundle adjustment. Instead of estimating depth maps from multiple sets of rectified images and stitching them, our approach directly generates one dense omnidirectional depth map with full 360-degree coverage in the rig's global coordinate system. To this end, the proposed neural network is designed to output the cost volume from the warped images in the sphere-sweeping method, and the final depth map is estimated by taking the minimum-cost indices of the cost volume aggregated by SGM. For training the deep neural network and testing the entire system, realistic synthetic urban datasets are rendered using Blender. Experiments using the synthetic and real-world datasets show that our algorithm outperforms conventional depth estimation methods and generates highly accurate depth maps.
http://arxiv.org/abs/1902.10904
Exploiting multi-scale representations is critical to improving edge detection for objects at different scales. To extract edges at dramatically different scales, we propose a Bi-Directional Cascade Network (BDCN) structure, where an individual layer is supervised by labeled edges at its specific scale, rather than directly applying the same supervision to all CNN outputs. Furthermore, to enrich the multi-scale representations learned by BDCN, we introduce a Scale Enhancement Module (SEM) which utilizes dilated convolution to generate multi-scale features, instead of using deeper CNNs or explicitly fusing multi-scale edge maps. These new approaches encourage the learning of multi-scale representations in different layers and detect edges that are well delineated by their scales. Learning scale-dedicated layers also results in a compact network with a fraction of the parameters. We evaluate our method on three datasets, i.e., BSDS500, NYUDv2, and Multicue, and achieve an ODS F-measure of 0.828, 1.3% higher than the current state of the art on BSDS500. The code is available at https://github.com/pkuCactus/BDCN.
http://arxiv.org/abs/1902.10903
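The idea of growing the receptive field with dilation instead of depth can be sketched as parallel dilated 3x3 convolutions; the dilation rates and the residual fusion below are illustrative choices, not the exact SEM design.

    import torch.nn as nn

    class ScaleEnhancement(nn.Module):
        """Parallel dilated convolutions that enlarge the receptive field
        without adding depth (a sketch in the spirit of the paper's SEM)."""
        def __init__(self, channels, dilations=(1, 2, 4, 8)):
            super().__init__()
            self.paths = nn.ModuleList(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
                for d in dilations)

        def forward(self, x):
            # Each path sees the input at a different effective scale;
            # summing fuses the multi-scale cues.
            return x + sum(path(x) for path in self.paths)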
The emerging utility of 3D point cloud data in critical vision tasks (e.g., ADAS) urges researchers to pay more attention to the robustness of 3D representations and deep networks. To this end, we develop an attack and defense scheme dedicated to 3D point cloud data, both for preventing 3D point clouds from being manipulated and for pursuing noise-tolerant 3D representations. A set of novel 3D point cloud attack operations is proposed via pointwise gradient perturbation and adversarial point attachment/detachment. We then develop a flexible perturbation-measurement scheme for 3D point cloud data to detect potential attack data or noisy sensing data. Extensive experimental results on common point cloud benchmarks demonstrate the validity of the proposed 3D attack and defense framework.
http://arxiv.org/abs/1902.10899
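The pointwise gradient perturbation mentioned above can be illustrated with a one-step, FGSM-style attack on point coordinates; the model interface and the epsilon value are assumptions made for the sketch, not the paper's settings.

    import torch
    import torch.nn.functional as F

    def pointwise_perturbation(model, points, label, epsilon=0.01):
        """One-step pointwise gradient attack on a point cloud (sketch).

        points: (1, N, 3) tensor; model maps point clouds to class logits.
        Each point is nudged along the sign of the loss gradient, the
        simplest instance of pointwise perturbation.
        """
        points = points.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(points), label)
        loss.backward()
        return (points + epsilon * points.grad.sign()).detach()

A perturbation-measurement defense then flags inputs whose local geometry statistics deviate from those of clean scans.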
The effective integration of distributed solar photovoltaic (PV) arrays into existing power grids will require access to high-quality data: the location, power capacity, and energy generation of individual solar PV installations. Unfortunately, existing methods for obtaining this data are limited in their spatial resolution and completeness. We propose a general framework for accurately and cheaply mapping individual PV arrays, and their capacities, over large geographic areas. At the core of this approach is a deep learning algorithm called SolarMapper - which we make publicly available - that can automatically map PV arrays in high-resolution overhead imagery. We estimate the performance of SolarMapper on a large dataset of overhead imagery across three cities in California. We also describe a procedure for deploying SolarMapper to new geographic regions, so that it can be utilized by others. We demonstrate the effectiveness of the proposed deployment procedure by using it to map solar arrays across the entire US state of Connecticut (CT). Using these results, we demonstrate that we achieve highly accurate estimates of total installed PV capacity within each of CT's 168 municipal regions.
http://arxiv.org/abs/1902.10895
In this paper, we propose a thermal-infrared simultaneous localization and mapping (SLAM) system enhanced by sparse depth measurements from Light Detection and Ranging (LiDAR). Thermal-infrared cameras are relatively robust against fog, smoke, and dynamic lighting conditions compared to RGB cameras operating in the visible spectrum. Due to these advantages, exploiting thermal-infrared cameras for motion estimation and mapping is highly appealing. However, operating a thermal-infrared camera directly in existing vision-based methods is difficult because of the modality difference. This paper proposes a method to use sparse depth measurements for 6-DOF motion estimation by directly tracking on the 14-bit raw measurements of the thermal camera. In addition, we perform a refinement to improve the local accuracy and include a loop closure to maintain global consistency. The experimental results demonstrate that the system is not only robust under various lighting conditions such as day and night, but also overcomes the scale problem of monocular cameras. The video is available at https://youtu.be/oO7lT3uAzLc.
http://arxiv.org/abs/1902.10892
This paper presents a simple yet principled approach to boosting the robustness of the residual network (ResNet), motivated by the dynamical systems perspective. Namely, a deep neural network can be interpreted using a partial differential equation, which naturally inspires us to characterize ResNet via an explicit Euler method. Our analytical studies reveal that the step factor h in the Euler method is able to control the robustness of ResNet in both training and generalization. Specifically, we prove that a small step factor h benefits the training robustness of back-propagation; from the view of forward propagation, a small h aids the robustness of model generalization. A comprehensive empirical evaluation on both a vision dataset (CIFAR-10) and a text dataset (AG-NEWS) confirms that a small h aids both training and generalization robustness.
http://arxiv.org/abs/1902.10887
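The abstract's central object, a residual block read as an explicit Euler step $x_{t+1} = x_t + h \cdot f(x_t)$, is easy to write down; the layer composition inside f below is a generic choice, not the paper's exact architecture.

    import torch.nn as nn

    class EulerResidualBlock(nn.Module):
        """Residual block written as an explicit Euler step x + h * f(x).

        Setting h < 1 (e.g., 0.1) damps each update, which the paper argues
        improves both training and generalization robustness."""
        def __init__(self, channels, h=0.1):
            super().__init__()
            self.h = h
            self.f = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.BatchNorm2d(channels))

        def forward(self, x):
            return x + self.h * self.f(x)   # x_{t+1} = x_t + h * f(x_t)

A standard ResNet block is the special case h = 1.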
Face recognition is one of the most studied research areas in pattern recognition and computer vision, along with its major challenges. A few of the challenges in recognizing faces are blur, illumination, and varied expressions. Blur naturally occurs when taking photographs with cameras, mobile phones, etc., and can be uniform or non-uniform; non-uniform blur usually appears in images taken with handheld imaging devices. Distinguishing or handling a blurred image in a face recognition system is generally difficult. Under varying lighting conditions, it is challenging to identify a person correctly. Diverse facial expressions such as happiness, sadness, surprise, fear, and anger change or deform faces relative to neutral images; identifying faces under such expressions is also a challenging task, due to the deformation they cause. To solve these issues, a pre-processing step is carried out, after which the Blur and Illumination-Robust Face Recognition (BIRFR) algorithm is performed. Test and training images with facial expressions are transformed to neutral faces using a facial expression removal (FER) operation. Every training image is transformed based on the optimal transformation spread function (TSF) and illumination coefficients. Local Binary Pattern (LBP) features extracted from the test image and the transformed training images are used for classification.
http://arxiv.org/abs/1902.10885
Deep convolutional neural networks have achieved remarkable progress on a variety of medical image computing tasks. A common problem when applying supervised deep learning methods to medical images is the lack of labeled data, which is very expensive and time-consuming to collect. In this paper, we present a novel semi-supervised method for medical image segmentation, where the network is optimized by a weighted combination of a common supervised loss for labeled inputs only and a regularization loss for both labeled and unlabeled data. To utilize the unlabeled data, our method encourages consistent predictions of the network-in-training for the same input under different regularizations. Aiming at the semi-supervised segmentation problem, we introduce a transformation-consistent strategy, based on rotation and flipping, in our self-ensembling model to enhance the regularization effect for pixel-level predictions. We have extensively validated the proposed semi-supervised method on three typical yet challenging medical image segmentation tasks: (i) skin lesion segmentation from dermoscopy images on the International Skin Imaging Collaboration (ISIC) 2017 dataset, (ii) optic disc segmentation from fundus images on the Retinal Fundus Glaucoma Challenge (REFUGE) dataset, and (iii) liver segmentation from volumetric CT scans on the Liver Tumor Segmentation Challenge (LiTS) dataset. Compared to the state of the art, our proposed method shows superior segmentation performance on challenging 2D/3D medical images, demonstrating the effectiveness of our semi-supervised method for medical image segmentation.
http://arxiv.org/abs/1903.00348
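A compact sketch of the weighted loss described above: a supervised term on labeled inputs plus a transformation-consistency term on labeled and unlabeled inputs alike. A single 90-degree rotation stands in for the paper's rotation/flipping scheme, and the scalar weight w is an illustrative choice.

    import torch
    import torch.nn.functional as F

    def semi_supervised_loss(model, x_lab, y_lab, x_unlab, w=1.0):
        """Supervised loss + transformation-consistency regularization (sketch).

        The consistency term asks that predicting-then-transforming match
        transforming-then-predicting, on all available data.
        """
        sup = F.cross_entropy(model(x_lab), y_lab)     # labeled pixels only
        x_all = torch.cat([x_lab, x_unlab])
        rot = lambda img: torch.rot90(img, 1, dims=(2, 3))  # spatial transform
        cons = F.mse_loss(model(rot(x_all)), rot(model(x_all)))
        # (A self-ensembling variant would stop gradients on one branch.)
        return sup + w * cons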
The da Vinci Research Kit (dVRK) is a teleoperated surgical robotic system. For dynamic simulations and model-based control, a dynamic model of the dVRK with standard dynamic parameters is required. We developed a dynamic model identification package for the dVRK, capable of modeling the parallelograms, springs, counterweight, and tendon couplings that are inherent to the dVRK. A convex-optimization-based method is used to identify the standard dynamic parameters of the dVRK subject to physically feasible constraints. The relative errors between the predicted and measured motor torques, calculated on independent test trajectories, are less than 16.3% and 18.9% for the first three joints and 34.0% and 26.5% for all joints of the master tool manipulator and the patient side manipulator, respectively. We open-source the identification software package. Although this software package was originally developed for the dVRK, it is easy to apply to other robots with characteristics similar to the dVRK through simple configuration.
http://arxiv.org/abs/1902.10875
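Because manipulator dynamics are linear in the standard parameters ($\tau = Y(q, \dot{q}, \ddot{q})\,\theta$), identification reduces to constrained least squares. The cvxpy sketch below uses random stand-in data and a placeholder feasibility constraint, not the dVRK's actual regressor or constraint set.

    import cvxpy as cp
    import numpy as np

    n_params = 10
    Y = np.random.randn(500, n_params)              # stand-in regressor matrix
    tau = Y @ np.ones(n_params) + 0.01 * np.random.randn(500)  # "measured" torques

    theta = cp.Variable(n_params)
    objective = cp.Minimize(cp.sum_squares(Y @ theta - tau))
    constraints = [theta[0] >= 0]                   # e.g., a mass must be positive
    cp.Problem(objective, constraints).solve()
    print(theta.value)                              # identified parameters

Physical-feasibility constraints (positive masses, positive-definite inertia) keep the convex program from fitting the torque data with physically meaningless parameters.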
Autonomous agents need to make decisions sequentially, in partially observable environments, and in consideration of how other agents behave. In critical situations, such decisions need to be made in real time, for example to avoid collisions and recover to safe conditions. We propose a tree-search technique in which a deterministic and pessimistic scenario is used after a specified depth. Because there is no branching under the deterministic scenario, the proposed technique allows us to look far ahead into the future in real time. The effectiveness of the proposed technique is demonstrated in Pommerman, a multi-agent environment used in a NeurIPS 2018 competition, where agents implementing the proposed technique won first and third places.
http://arxiv.org/abs/1902.10870
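The search idea above can be captured in a few lines: branch normally for the first few plies, then follow one deterministic pessimistic scenario to the horizon. The callables passed in (`actions`, `step`, `worst_action`, `evaluate`) are hypothetical environment hooks, not the authors' API.

    def search(state, depth, switch_depth, actions, step, worst_action, evaluate):
        """Tree search that branches for `switch_depth` plies, then rolls out
        a single deterministic, pessimistic scenario (illustrative sketch)."""
        if depth == 0 or not actions(state):
            return evaluate(state)
        if switch_depth == 0:
            # No branching below this depth: assume the worst plausible joint
            # action and follow the one resulting trajectory, so the cost of
            # looking far ahead stays linear in the remaining depth.
            return search(step(state, worst_action(state)), depth - 1, 0,
                          actions, step, worst_action, evaluate)
        return max(search(step(state, a), depth - 1, switch_depth - 1,
                          actions, step, worst_action, evaluate)
                   for a in actions(state))

The pessimistic rollout makes the value estimate a lower bound under the assumed scenario, which suits collision avoidance.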
Multiple description coding (MDC) can stably transmit signals over unreliable and non-prioritized networks, and has been broadly studied for several decades. However, traditional MDC does not leverage an image's contextual features well when generating multiple descriptions. In this paper, we propose a novel standard-compliant convolutional neural network-based MDC framework built on an image's contextual features. First, a multiple description generator network (MDGN) is designed to automatically produce appearance-similar yet feature-different multiple descriptions according to the image's content, which are compressed by a standard codec. Second, we present a multiple description reconstruction network (MDRN), comprising a side reconstruction network (SRN) and a central reconstruction network (CRN). When either of the two lossy descriptions is received at the decoder, the SRN improves the quality of the decoded lossy description by simultaneously removing compression artifacts and up-sampling. Meanwhile, when both lossy descriptions are available, the CRN takes the two decoded descriptions as inputs for better reconstruction. Third, a multiple description virtual codec network (MDVCN) is proposed to bridge the gap between the MDGN and MDRN networks so that the whole MDC framework can be trained end-to-end; two learning algorithms are provided to train it. In addition to a structural similarity loss, the produced descriptions are used as opposing labels in a multiple description distance loss to regularize the training of the MDGN network. These losses guarantee that the generated description images are structurally similar yet finely diverse. Experimental results on a wide range of objective and subjective quality measurements validate the efficiency of the proposed method.
http://arxiv.org/abs/1801.06611
In this paper, we propose a novel effective non-rigid object tracking framework based on the spatial-temporal consistent saliency detection. In contrast to most existing trackers that utilize a bounding box to specify the tracked target, the proposed framework can extract accurate regions of the target as tracking outputs. It achieves a better description of the non-rigid objects and reduces the background pollution for the tracking model. Furthermore, our model has several unique features. First, a tailored fully convolutional neural network (TFCN) is developed to model the local saliency prior for a given image region, which not only provides the pixel-wise outputs but also integrates the semantic information. Second, a novel multi-scale multi-region mechanism is proposed to generate local saliency maps that effectively consider visual perceptions with different spatial layouts and scale variations. Subsequently, local saliency maps are fused via a weighted entropy method, resulting in a final discriminative saliency map. Finally, we present a non-rigid object tracking algorithm based on the predicted saliency maps. By utilizing a spatial-temporal consistent saliency map (STCSM), we conduct target-background classification and use a simple fine-tuning scheme for online updating. Extensive experiments demonstrate that the proposed algorithm achieves competitive performance in both saliency detection and visual tracking, especially outperforming other related trackers on the non-rigid object tracking datasets.
http://arxiv.org/abs/1802.07957
Being accurate, efficient, and compact is essential for a facial landmark detector in practical use. To simultaneously address these three concerns, this paper investigates a neat model with promising detection accuracy under wild environments (e.g., unconstrained pose, expression, lighting, and occlusion conditions) and super real-time speed on a mobile device. More concretely, we customize an end-to-end single-stage network associated with acceleration techniques. During the training phase, for each sample, rotation information is estimated for geometrically regularizing landmark localization; it is then NOT involved in the testing phase. A novel loss is designed to, besides providing the geometric regularization, mitigate the issue of data imbalance by adjusting the weights of samples in different states, such as large pose, extreme lighting, and occlusion, in the training set. Extensive experiments are conducted to demonstrate the efficacy of our design and reveal its superior performance over state-of-the-art alternatives on widely adopted challenging benchmarks, i.e., 300W (including iBUG, LFPW, AFW, HELEN, and XM2VTS) and AFLW. Our model can be merely 2.1 MB in size and reach over 140 fps per face on a mobile phone (Qualcomm ARM 845 processor) with high precision, making it attractive for large-scale and real-time applications. We have made our practical system based on the PFLD 0.25X model publicly available at this http URL to encourage comparisons and improvements from the community.
http://arxiv.org/abs/1902.10859
By considering the spectral signature as a sequence, recurrent neural networks (RNNs) have been successfully used to learn discriminative features from hyperspectral images (HSIs) recently. However, most of these models only input the whole spectral bands into RNNs directly, which may not fully explore the specific properties of HSIs. In this paper, we propose a cascaded RNN model using gated recurrent units (GRUs) to explore the redundant and complementary information of HSIs. It mainly consists of two RNN layers. The first RNN layer is used to eliminate redundant information between adjacent spectral bands, while the second RNN layer aims to learn the complementary information from non-adjacent spectral bands. To improve the discriminative ability of the learned features, we design two strategies for the proposed model. Besides, considering the rich spatial information contained in HSIs, we further extend the proposed model to its spectral-spatial counterpart by incorporating some convolutional layers. To test the effectiveness of our proposed models, we conduct experiments on two widely used HSIs. The experimental results show that our proposed models can achieve better results than the compared models.
http://arxiv.org/abs/1902.10858
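The two-layer cascade can be sketched as one GRU summarizing groups of adjacent (redundant) bands, followed by a second GRU over the group summaries (non-adjacent, complementary bands). The group size, hidden width, and class count below are illustrative, and the spectrum length is assumed divisible by the group size.

    import torch.nn as nn

    class CascadedGRU(nn.Module):
        """Two-stage GRU sketch for classifying one hyperspectral pixel."""
        def __init__(self, group=10, hidden=64, classes=16):
            super().__init__()
            self.group = group
            self.gru1 = nn.GRU(1, hidden, batch_first=True)       # within groups
            self.gru2 = nn.GRU(hidden, hidden, batch_first=True)  # across groups
            self.head = nn.Linear(hidden, classes)

        def forward(self, spectrum):                   # (B, n_bands)
            B, n = spectrum.shape                      # requires n % group == 0
            x = spectrum.reshape(B * (n // self.group), self.group, 1)
            _, h = self.gru1(x)                        # summary per band group
            h = h.reshape(B, n // self.group, -1)      # sequence of group summaries
            _, h2 = self.gru2(h)
            return self.head(h2.squeeze(0))            # (B, classes)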
Image collections, if critical aspects of image content are exposed, can spur research and practical applications in many domains. Supervised machine learning may be the only feasible way to annotate very large collections, but leading approaches rely on large samples of completely and accurately annotated images. For the large forensic collection we aim to annotate, neither complete annotation nor large training samples can feasibly be produced. We therefore investigate ways to assist the manual annotation efforts of forensic experts. We present a method that can propose both images and areas within an image that are likely to contain desired classes. Evaluation of the method with human annotators showed highly accurate classification that was strongly helped by transfer learning. The segmentation precision (mAP) was improved by adding a separate class capturing background, but that did not affect the recall (mAR). Further work is needed both to increase the accuracy of segmentation and to enhance prediction with additional covariates affecting decomposition. We hope this effort will be of help in other domains that require weak segmentation and have limited availability of qualified annotators.
http://arxiv.org/abs/1902.10848
Visual identification of individual animals that bear unique natural body markings is an important task in wildlife conservation. The photo databases of animal markings grow larger and each new observation has to be matched against thousands of images. Existing photo-identification solutions have constraints on image quality and appearance of the pattern of interest in the image. These constraints limit the use of photos from citizen scientists. We present a novel system for visual re-identification based on unique natural markings that is robust to occlusions, viewpoint and illumination changes. We adapt methods developed for face re-identification and implement a deep convolutional neural network (CNN) to learn embeddings for images of natural markings. The distance between the learned embedding points provides a dissimilarity measure between the corresponding input images. The network is optimized using the triplet loss function and the online semi-hard triplet mining strategy. The proposed re-identification method is generic and not species specific. We evaluate the proposed system on image databases of manta ray belly patterns and humpback whale flukes. To be of practical value and adopted by marine biologists, a re-identification system needs to have a top-10 accuracy of at least 95%. The proposed system achieves this performance standard.
http://arxiv.org/abs/1902.10847
This paper introduces a framework to plan grasps with multi-fingered hands. The framework includes a multi-dimensional iterative surface fitting (MDISF) for grasp planning and a grasp trajectory optimization (GTO) for grasp imagination. The MDISF algorithm searches for optimal contact regions and hand configurations by minimizing the collision and surface fitting error, and the GTO algorithm generates optimal finger trajectories to reach the highly ranked grasp configurations and avoid collision with the environment. The proposed grasp planning and imagination framework considers the collision avoidance and the kinematics of the hand-robot system, and is able to plan grasps and trajectories of different categories efficiently with gradient-based methods using the captured point cloud. The found grasps and trajectories are robust to sensing noises and underlying uncertainties. The effectiveness of the proposed framework is verified by both simulations and experiments.
http://arxiv.org/abs/1902.10841
All current non-rigid structure from motion (NRSfM) algorithms are limited with respect to: (i) the number of images, and (ii) the type of shape variability they can handle. This has hampered the practical utility of NRSfM for many applications within vision. In this paper we propose a novel deep neural network to recover camera poses and 3D points solely from an ensemble of 2D image coordinates. The proposed neural network is mathematically interpretable as a multi-layer block sparse dictionary learning problem, and can handle problems of unprecedented scale and shape complexity. Extensive experiments demonstrate the impressive performance of our approach where we exhibit superior precision and robustness against all available state-of-the-art works. The considerable model capacity of our approach affords remarkable generalization to unseen data. We propose a quality measure (based on the network weights) which circumvents the need for 3D ground-truth to ascertain the confidence we have in the reconstruction. Once the network’s weights are estimated (for a non-rigid object) we show how our approach can effectively recover 3D shape from a single image – outperforming comparable methods that rely on direct 3D supervision.
http://arxiv.org/abs/1902.10840
Deep neural networks (DNNs) generalize remarkably well without explicit regularization even in the strongly over-parametrized regime where classical learning theory would instead predict that they would severely overfit. While many proposals for some kind of implicit regularization have been made to rationalise this success, there is no consensus for the fundamental reason why DNNs do not strongly overfit. In this paper, we provide a new explanation. By applying a very general probability-complexity bound recently derived from algorithmic information theory (AIT), we argue that the parameter-function map of many DNNs should be exponentially biased towards simple functions. We then provide clear evidence for this strong simplicity bias in a model DNN for Boolean functions, as well as in much larger fully connected and convolutional networks applied to CIFAR10 and MNIST. As the target functions in many real problems are expected to be highly structured, this intrinsic simplicity bias helps explain why deep networks generalize well on real world problems. This picture also facilitates a novel PAC-Bayes approach where the prior is taken over the DNN input-output function space, rather than the more conventional prior over parameter space. If we assume that the training algorithm samples parameters close to uniformly within the zero-error region then the PAC-Bayes theorem can be used to guarantee good expected generalization for target functions producing high-likelihood training sets. By exploiting recently discovered connections between DNNs and Gaussian processes to estimate the marginal likelihood, we produce relatively tight generalization PAC-Bayes error bounds which correlate well with the true error on realistic datasets such as MNIST and CIFAR10 and for architectures including convolutional and fully connected networks.
http://arxiv.org/abs/1805.08522
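The PAC-Bayes step sketched above can be made concrete. In the realizable setting, a bound of roughly the following form applies (our notation, as a hedged reading of the abstract): with probability at least $1-\delta$ over training sets $U$ of size $m$, the expected error $\epsilon$ of a classifier drawn from the posterior satisfies $\ln\frac{1}{1-\epsilon} \le \frac{\ln\frac{1}{P(U)} + \ln\frac{2m}{\delta}}{m-1}$, where $P(U)$ is the prior probability that the parameter-function map produces a function consistent with $U$ (estimated here via the Gaussian-process marginal likelihood). A strong simplicity bias makes $P(U)$ large for structured data, which is exactly what tightens the bound.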
In this paper we present a theoretical study of a specific class of vibration-driven robots: the brushbots. In a bottom-up fashion, we start by improving the state-of-the-art dynamic models of the brush-based locomotion mechanism. Then, we discuss the range of validity of these models and their applicability to different types of brushbots which can be found in literature. Finally, we present two designs of brushbots: a fully-actuated platform and a differential-drive-like one. These two designs are used to experimentally validate both the developed theoretical models and the devised motion control algorithms.
http://arxiv.org/abs/1902.10830
The “VOiCES from a Distance Challenge 2019” is designed to foster research in the area of speaker recognition and automatic speech recognition (ASR) with the special focus on single channel distant/far-field audio, under noisy conditions. The main objectives of this challenge are to: (i) benchmark state-of-the-art technology in the area of speaker recognition and automatic speech recognition (ASR), (ii) support the development of new ideas and technologies in speaker recognition and ASR, (iii) support new research groups entering the field of distant/far-field speech processing, and (iv) provide a new, publicly available dataset to the community that exhibits realistic distance characteristics.
http://arxiv.org/abs/1902.10828
Microarray gene expression data-based tumor classification is an active and challenging issue. In this paper, an integrated tumor classification framework is presented, which aims to exploit the information in existing available samples and focuses on the small-sample and unbalanced-classification problems. First, an inverse space sparse representation based classification (ISSRC) model is proposed by considering the characteristics of gene-based tumor data, such as sparsity and the small number of training samples. A decision information factor (DIF)-based gene selection method is constructed to enhance the representation ability of ISSRC; notably, the DIF is designed to reduce the clinical misdiagnosis rate and the dimensionality of small-sample data. To further improve the representation ability and classification stability of ISSRC, feature learning is conducted on the selected gene subset. The feature learning method is constructed by combining the advantages of non-negative matrix factorization (NMF) and deep learning. Without confusion, ISSRC combined with gene selection and feature learning is called integrated ISSRC, whose stability, optimization, and corresponding convergence are analyzed. Extensive experiments on six public microarray gene expression datasets show that the integrated ISSRC-based tumor classification framework is superior to classical and state-of-the-art methods, with significant improvements in classification accuracy, specificity and sensitivity, whether in the early diagnosis of a tumor, the identification of the tumor type, or the detection of metastasis after tumor surgery.
http://arxiv.org/abs/1803.03562
This paper proposes a new classification model called logistic circuits. On MNIST and Fashion datasets, our learning algorithm outperforms neural networks that have an order of magnitude more parameters. Yet, logistic circuits have a distinct origin in symbolic AI, forming a discriminative counterpart to probabilistic-logical circuits such as ACs, SPNs, and PSDDs. We show that parameter learning for logistic circuits is convex optimization, and that a simple local search algorithm can induce strong model structures from data.
http://arxiv.org/abs/1902.10798
With millions of images that are shared online on social networking sites, effective methods for image privacy prediction are highly needed. In this paper, we propose an approach for fusing object, scene context, and image tags modalities derived from convolutional neural networks for accurately predicting the privacy of images shared online. Specifically, our approach identifies the set of most competent modalities on the fly, according to each new target image whose privacy has to be predicted. The approach considers three stages to predict the privacy of a target image, wherein we first identify the neighborhood images that are visually similar and/or have similar sensitive content as the target image. Then, we estimate the competence of the modalities based on the neighborhood images. Finally, we fuse the decisions of the most competent modalities and predict the privacy label for the target image. Experimental results show that our approach predicts the sensitive (or private) content more accurately than the models trained on individual modalities (object, scene, and tags) and prior privacy prediction works. Also, our approach outperforms strong baselines, that train meta-classifiers to obtain an optimal combination of modalities.
http://arxiv.org/abs/1902.10796
This work describes the development of a robotic system that acquires knowledge incrementally through human interaction, where new tools and motions are taught on the fly. The robotic system developed was one of the five finalists in the KUKA Innovation Award competition and was demonstrated during the Hannover Messe 2018 in Germany. The main contributions of the system are a) a novel incremental object learning module - a deep learning based localization and recognition system - that allows a human to teach new objects to the robot, b) an intuitive user interface for specifying 3D motion tasks associated with a new object, and c) a hybrid force-vision control module for performing compliant motion on an unstructured surface. This paper describes the implementation and integration of the main modules of the system and summarizes the lessons learned from the competition.
http://arxiv.org/abs/1809.08722
We propose and demonstrate machine learning algorithms to assess the severity of pulmonary edema in chest x-ray images of congestive heart failure patients. Accurate assessment of pulmonary edema in heart failure is critical when making treatment and disposition decisions. Our work is grounded in a large-scale clinical image dataset of over 300,000 x-ray images with associated radiology reports. While edema severity labels can be extracted unambiguously from a small fraction of the radiology reports, accurate annotation is challenging in most cases. To take advantage of the unlabeled images, we develop a generative model that includes an auto-encoder for learning a latent representation from the entire image dataset and a classifier that employs this representation for predicting pulmonary edema severity. We use segmentation to focus the auto-encoder on the lungs, where most pulmonary edema findings are observed. Our experimental results suggest that modeling the distribution of images and providing anatomical information improve the accuracy of pulmonary edema scoring compared to a strictly supervised approach. To the best of our knowledge, this is the first attempt to employ machine learning algorithms to automatically and quantitatively assess the severity of pulmonary edema in chest x-ray images.
http://arxiv.org/abs/1902.10785
Experience-based planning domains (EBPDs) have recently been proposed to improve problem solving by learning from experience. EBPDs provide important concepts for long-term learning and planning in robotics. They rely on acquiring and using task knowledge, i.e., activity schemata, for generating concrete solutions to problem instances in a class of tasks. Using Three-Valued Logic Analysis (TVLA), we extend previous work to generate a set of conditions as the scope of applicability for an activity schema. The inferred scope is a bounded representation of a set of problems of potentially unbounded size, in the form of a 3-valued logical structure, which allows an EBPD system to automatically find an applicable activity schema for solving task problems. We demonstrate the utility of our approach on a set of classes of problems in a simulated domain and on a class of real-world tasks with a fully physically simulated PR2 robot in Gazebo.
http://arxiv.org/abs/1902.10770