Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

Amplifying the Imitation Effect for Reinforcement Learning of UCAV's Mission Execution

2019-01-17

Gyeong Taek Lee, Chang Ouk Kim

arXiv_AI

arXiv_AI Reinforcement_Learning
Abstract

This paper proposes a new reinforcement learning (RL) algorithm that enhances exploration by amplifying the imitation effect (AIE). This algorithm consists of self-imitation learning and random network distillation algorithms. We argue that these two algorithms complement each other and that combining these two algorithms can amplify the imitation effect for exploration. In addition, by adding an intrinsic penalty reward to the state that the RL agent frequently visits and using replay memory for learning the feature state when using an exploration bonus, the proposed approach leads to deep exploration and deviates from the current converged policy. We verified the exploration performance of the algorithm through experiments in a two-dimensional grid environment. In addition, we applied the algorithm to a simulated environment of unmanned combat aerial vehicle (UCAV) mission execution, and the empirical results show that AIE is very effective for finding the UCAV’s shortest flight path to avoid an enemy’s missiles.
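The exploration bonus that random network distillation contributes can be illustrated with a small sketch. It is not the authors' AIE implementation: the class name RNDBonus, the linear "networks", the learning rate, and the one-hot states are all assumptions made for illustration, and the self-imitation and intrinsic-penalty parts of the paper are not reproduced. The idea shown is only that a predictor is trained to match a fixed random target, so its prediction error is small on frequently visited states and large on novel ones.

```python
import random


class RNDBonus:
    """Toy random-network-distillation bonus using linear maps (illustrative only)."""

    def __init__(self, state_dim, feat_dim=4, lr=0.05, seed=0):
        rng = random.Random(seed)
        # Fixed, randomly initialised target map: never trained.
        self.target = [[rng.uniform(-1, 1) for _ in range(state_dim)]
                       for _ in range(feat_dim)]
        # Predictor map: trained to imitate the target on visited states.
        self.predictor = [[0.0] * state_dim for _ in range(feat_dim)]
        self.lr = lr

    @staticmethod
    def _apply(weights, state):
        return [sum(w * s for w, s in zip(row, state)) for row in weights]

    def bonus(self, state):
        """Intrinsic reward: squared error between target and predictor outputs."""
        t = self._apply(self.target, state)
        p = self._apply(self.predictor, state)
        return sum((a - b) ** 2 for a, b in zip(t, p))

    def update(self, state):
        """One gradient step on the squared prediction error for this state."""
        t = self._apply(self.target, state)
        p = self._apply(self.predictor, state)
        for i, row in enumerate(self.predictor):
            err = p[i] - t[i]
            for j in range(len(row)):
                row[j] -= self.lr * 2 * err * state[j]


rnd = RNDBonus(state_dim=2)
visited, novel = [1.0, 0.0], [0.0, 1.0]   # hypothetical one-hot grid states
bonus_before = rnd.bonus(visited)
for _ in range(200):                      # the agent keeps visiting the same state
    rnd.update(visited)
bonus_after = rnd.bonus(visited)
```

After the loop, the bonus for the visited state has shrunk towards zero while the bonus for the unvisited state is unchanged, which is the signal that pushes an agent away from states it already knows.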

URL

https://arxiv.org/abs/1901.05856

PDF

https://arxiv.org/pdf/1901.05856
Learning Long-Range Perception Using Self-Supervision from Short-Range Sensors and Odometry

2019-01-17

Mirko Nava, Jerome Guzzi, R. Omar Chavez-Garcia, Luca M. Gambardella, Alessandro Giusti

arXiv_RO

arXiv_RO CNN Prediction Quantitative
Abstract

We introduce a general self-supervised approach to predict the future outputs of a short-range sensor (such as a proximity sensor) given the current outputs of a long-range sensor (such as a camera); we assume that the former is directly related to some piece of information to be perceived (such as the presence of an obstacle in a given position), whereas the latter is information-rich but hard to interpret directly. We instantiate and implement the approach on a small mobile robot to detect obstacles at various distances using the video stream of the robot’s forward-pointing camera, by training a convolutional neural network on automatically-acquired datasets. We quantitatively evaluate the quality of the predictions on unseen scenarios, qualitatively evaluate robustness to different operating conditions, and demonstrate usage as the sole input of an obstacle-avoidance controller. We additionally instantiate the approach on a different simulated scenario with complementary characteristics, to exemplify the generality of our contribution.

URL

http://arxiv.org/abs/1809.07207

PDF

http://arxiv.org/pdf/1809.07207
Using DNNs to Detect Materials in a Room based on Sound Absorption

2019-01-17

Constantinos Papayiannis, Christine Evers, Patrick A. Naylor

arXiv_SD

arXiv_SD Knowledge Face RNN Detection
Abstract

The materials of surfaces in a room play an important role in shaping the auditory experience within it. Different materials absorb energy at different levels. The level of absorption also varies across frequencies. This paper investigates how cues from a measured impulse response in the room can be exploited by machines to detect the materials present. With this motivation, this paper proposes a method for estimating the probability of presence of 10 material categories, based on their frequency-dependent absorption characteristics. The method is based on a CNN-RNN, trained as a multi-task classifier. The network is trained using a priori knowledge about the absorption characteristics of materials from the literature. In the experiments shown, the network is tested on over 5,00 impulse responses and 167 materials. The F1 score of the detections was 98%, with an even precision and recall. The method finds direct applications in architectural acoustics and in creating more parsimonious models for acoustic reflections.

URL

http://arxiv.org/abs/1901.05852

PDF

http://arxiv.org/pdf/1901.05852
Learning Tractable Probabilistic Models in Open Worlds

2019-01-17

Amelie Levray, Vaishak Belle

arXiv_AI

arXiv_AI Knowledge Inference Relation
Abstract

Large-scale probabilistic representations, including statistical knowledge bases and graphical models, are increasingly in demand. They are built by mining massive sources of structured and unstructured data, the latter often derived from natural language processing techniques. The very nature of the enterprise makes the extracted representations probabilistic. In particular, inducing relations and facts from noisy and incomplete sources via statistical machine learning models means that the labels are either already probabilistic, or that probabilities approximate confidence. While the progress is impressive, extracted representations essentially enforce the closed-world assumption, which means that all facts in the database are accorded the corresponding probability, but all other facts have probability zero. The CWA is deeply problematic in most machine learning contexts. A principled solution is needed for representing incomplete and indeterminate knowledge in such models, imprecise probability models such as credal networks being an example. In this work, we are interested in the foundational problem of learning such open-world probabilistic models. However, since exact inference in probabilistic graphical models is intractable, the paradigm of tractable learning has emerged to learn data structures (such as arithmetic circuits) that support efficient probabilistic querying. We show here how the computational machinery underlying tractable learning has to be generalized for imprecise probabilities. Our empirical evaluations demonstrate that our regime is also effective.

URL

https://arxiv.org/abs/1901.05847

PDF

https://arxiv.org/pdf/1901.05847
Near-infrared intersubband photodetection in GaN/AlN nanowires

2019-01-17

Jonas Lähnemann, Akhil Ajay, Martien I. den Hertog, Eva Monroy

arXiv_CV

arXiv_CV Object_Detection GAN Face Detection
Abstract

Intersubband optoelectronic devices rely on transitions between quantum-confined electron levels in semiconductor heterostructures, which enables infrared (IR) photodetection in the 1-30 $\mu$m wavelength window with picosecond response times. Incorporating nanowires as active media could enable an independent control over the electrical cross-section of the device and the optical absorption cross-section. Furthermore, the three-dimensional carrier confinement in nanowire heterostructures opens new possibilities to tune the carrier relaxation time. However, the generation of structural defects and the surface sensitivity of GaAs nanowires have so far hindered the fabrication of nanowire intersubband devices. Here, we report the first demonstration of intersubband photodetection in a nanowire, using GaN nanowires containing a GaN/AlN superlattice absorbing at 1.55 $\mu$m. The combination of spectral photocurrent measurements with 8-band k$\cdot$p calculations of the electronic structure supports the interpretation of the result as intersubband photodetection in these extremely short-period superlattices. We observe a linear dependence of the photocurrent with the incident illumination power, which confirms the insensitivity of the intersubband process to surface states and highlights how architectures featuring large surface-to-volume ratios are suitable as intersubband photodetectors. Our analysis of the photocurrent characteristics points out routes for an improvement of the device performance. This first nanowire based intersubband photodetector represents a technological breakthrough that paves the way to a powerful device platform with potential for ultrafast, ultrasensitive photodetectors and highly-efficient quantum cascade emitters with improved thermal stability.

URL

https://arxiv.org/abs/1710.00871

PDF

https://arxiv.org/pdf/1710.00871
Robust Chinese Word Segmentation with Contextualized Word Representations

2019-01-17

Yung-Sung Chuang

arXiv_CL

arXiv_CL Segmentation RNN Language_Model
Abstract

In recent years, neural-network-based methods have greatly improved the accuracy of Chinese word segmentation. However, the error rate on out-of-vocabulary (OOV) words remains high. We used a simple bidirectional LSTM architecture and a large-scale pretrained language model to generate high-quality contextualized character representations, which mitigate the ambiguity of individual Chinese characters and hence effectively reduce the OOV error rate. State-of-the-art performance is achieved on many datasets.

URL

https://arxiv.org/abs/1901.05816

PDF

https://arxiv.org/pdf/1901.05816
No reference image quality assessment metric based on regional mutual information among images

2019-01-17

Vinay Kumar, Vivek Singh Bawa

arXiv_CV

arXiv_CV Classification
Abstract

With the inclusion of cameras in daily life, an automatic no-reference image quality evaluation index is required for automatic classification of images. The present manuscript proposes a new No Reference Regional Mutual Information based technique for evaluating the quality of an image. We use regional mutual information on subsets of the complete image. The proposed technique is tested on four benchmark natural image databases and one benchmark synthetic database. A comparative analysis with classical and state-of-the-art methods indicates that the present technique is superior for high-quality images and comparable for the other images of the respective databases.

URL

https://arxiv.org/abs/1901.05811

PDF

https://arxiv.org/pdf/1901.05811
AuxNet: Auxiliary tasks enhanced Semantic Segmentation for Automated Driving

2019-01-17

Sumanth Chennupati, Ganesh Sistu, Senthil Yogamani, Samir Rawashdeh

arXiv_CV

arXiv_CV Segmentation Semantic_Segmentation Classification
Abstract

Decision making in automated driving is highly specific to the environment, and thus semantic segmentation plays a key role in recognizing the objects in the environment around the car. Pixel-level classification, once considered a challenging task, is now becoming mature enough to be productized in a car. However, semantic annotation is time consuming and quite expensive. Synthetic datasets with domain adaptation techniques have been used to alleviate the lack of large annotated datasets. In this work, we explore an alternate approach of leveraging the annotations of other tasks to improve semantic segmentation. Recently, multi-task learning has become a popular paradigm in automated driving, demonstrating that joint learning of multiple tasks improves the overall performance of each task. Motivated by this, we use auxiliary tasks like depth estimation to improve the performance of the semantic segmentation task. We propose adaptive task loss weighting techniques to address scale issues in multi-task loss functions, which become more crucial in auxiliary tasks. We experimented on automotive datasets including SYNTHIA and KITTI and obtained 3% and 5% improvements in accuracy, respectively.

URL

https://arxiv.org/abs/1901.05808

PDF

https://arxiv.org/pdf/1901.05808
Towards Building the Semantic Map from a Monocular Camera with a Multi-task Network

2019-01-17

Lei Fan, Yucai Bai, Ziyu Pan, Long Chen

arXiv_CV

arXiv_CV Knowledge CNN Prediction Relation
Abstract

In many robotic applications, especially autonomous driving, understanding the semantic information and the geometric structure of the surroundings are both essential. Semantic 3D maps, as a carrier of environmental knowledge, have therefore been intensively studied for their abilities and applications. However, it is still challenging to produce a dense outdoor semantic map from a monocular image stream. Motivated by this target, in this paper, we propose a method for large-scale 3D reconstruction from consecutive monocular images. First, exploiting the correlation of underlying information between depth and semantic prediction, a novel multi-task Convolutional Neural Network (CNN) is designed for joint prediction. Given a single image, the network learns low-level information with a shared encoder and predicts separately with decoders containing additional Atrous Spatial Pyramid Pooling (ASPP) layers and residual connections, so that disparity and semantic prediction benefit each other. To overcome the inconsistency of monocular depth prediction for reconstruction, post-processing steps with superpixelization and an effective 3D representation approach are applied to produce the final semantic map. We compare against other methods on both semantic labeling and depth prediction. We also qualitatively demonstrate the map reconstructed from large-scale, difficult monocular image sequences to show the effectiveness and superiority of the approach.

URL

https://arxiv.org/abs/1901.05807

PDF

https://arxiv.org/pdf/1901.05807
Ensemble Feature for Person Re-Identification

2019-01-17

Jiabao Wang, Yang Li, Zhuang Miao

arXiv_CV

arXiv_CV Re-identification Person_Re-identification CNN Deep_Learning Prediction
Abstract

In person re-identification (re-ID), the key task is feature representation, which is used to compute distance or similarity in prediction. Person re-ID has improved greatly since deep learning methods were introduced to tackle this problem. The features extracted by convolutional neural networks (CNNs) are more effective and discriminative than hand-crafted features. However, the deep features extracted by a single CNN are not robust enough at the testing stage. To improve the ability of feature representation, we propose a new ensemble network (EnsembleNet) by dividing a single network into multiple end-to-end branches. The ensemble feature is obtained by concatenating each of the branch features to represent a person. EnsembleNet is designed based on ResNet-50 and its backbone shares most of the parameters to save computation and memory cost. Experimental results show that our EnsembleNet achieves state-of-the-art performance on the public Market1501, DukeMTMC-reID and CUHK03 person re-ID benchmarks.

URL

https://arxiv.org/abs/1901.05798

PDF

https://arxiv.org/pdf/1901.05798
Cone-beam CT to Planning CT synthesis using generative adversarial networks

2019-01-17

S. Kida, S. Kaji, K. Nawa, T. Imae, T. Nakamoto, S. Ozaki, T. Ohta, Y. Nozawa, K. Nakagawa

arXiv_CV

arXiv_CV Adversarial GAN
Abstract

Cone-beam computed tomography (CBCT) offers advantages over conventional fan-beam CT in that it requires a shorter time and less exposure to obtain images. CBCT has found a wide variety of applications in patient positioning for image-guided radiation therapy, extracting radiomic information for designing patient-specific treatment, and computing fractional dose distributions for adaptive radiation therapy. However, CBCT images suffer from low soft-tissue contrast, noise, and artifacts compared to conventional fan-beam CT images. Therefore, it is essential to improve the image quality of CBCT. In this paper, we propose a synthetic approach to translate CBCT images with deep neural networks. Our method requires only unpaired and unaligned CBCT images and planning fan-beam CT (PlanCT) images for training. Once trained, 3D reconstructed CBCT images can be directly translated to high-quality PlanCT-like images. We demonstrate the effectiveness of our method with images obtained from 20 prostate patients, and we provide a statistical and visual comparison. The image quality of the translated images shows substantial improvement in voxel values, spatial uniformity, and artifact suppression compared to those of the original CBCT. The anatomical structures of the original CBCT images were also well preserved in the translated images. Our method enables more accurate adaptive radiation therapy, and opens up new applications for CBCT that hinge on high-quality images.

URL

https://arxiv.org/abs/1901.05773

PDF

https://arxiv.org/pdf/1901.05773
SAFE: Scale Aware Feature Encoder for Scene Text Recognition

2019-01-17

Wei Liu, Chaofeng Chen, Kwan-Yee K. Wong

arXiv_CV

arXiv_CV Attention CNN Recognition
Abstract

In this paper, we address the problem of having characters with different scales in scene text recognition. We propose a novel scale aware feature encoder (SAFE) that is designed specifically for encoding characters with different scales. SAFE is composed of a multi-scale convolutional encoder and a scale attention network. The multi-scale convolutional encoder targets at extracting character features under multiple scales, and the scale attention network is responsible for selecting features from the most relevant scale(s). SAFE has two main advantages over the traditional single-CNN encoder used in current state-of-the-art text recognizers. First, it explicitly tackles the scale problem by extracting scale-invariant features from the characters. This allows the recognizer to put more effort in handling other challenges in scene text recognition, like those caused by view distortion and poor image quality. Second, it can transfer the learning of feature encoding across different character scales. This is particularly important when the training set has a very unbalanced distribution of character scales, as training with such a dataset will make the encoder biased towards extracting features from the predominant scale. To evaluate the effectiveness of SAFE, we design a simple text recognizer named scale-spatial attention network (S-SAN) that employs SAFE as its feature encoder, and carry out experiments on six public benchmarks. Experimental results demonstrate that S-SAN can achieve state-of-the-art (or, in some cases, extremely competitive) performance without any post-processing.

URL

https://arxiv.org/abs/1901.05770

PDF

https://arxiv.org/pdf/1901.05770
Unsupervised Graph-based Rank Aggregation for Improved Retrieval

2019-01-17

Icaro Cavalcante Dourado, Daniel Carlos Guimarães Pedronette, Ricardo da Silva Torres

arXiv_CV

arXiv_CV Relation
Abstract

This paper presents a robust and comprehensive graph-based rank aggregation approach, used to combine results of isolated ranker models in retrieval tasks. The method follows an unsupervised scheme, which is independent of how the isolated ranks are formulated. Our approach is able to combine arbitrary models, defined in terms of different ranking criteria, such as those based on textual, image or hybrid content representations. We reformulate the ad-hoc retrieval problem as a document retrieval of their fusion graph, which we propose as a new unified representation model capable of merging multiple ranks and expressing inter-relationships of retrieval results automatically. By doing so, we claim that the retrieval system can benefit from learning the manifold structure of datasets, thus leading to more effective results. Another contribution is that our graph-based aggregation formulation, unlike existing approaches, allows for encapsulating contextual information encoded from multiple ranks, which can be directly used for ranking, without further computations and processing steps over the graphs. Based on the graphs, a novel similarity retrieval score is formulated using an efficient computation of minimum common subgraphs. Finally, another benefit over existing approaches is the absence of hyperparameters. A comprehensive experimental evaluation was conducted considering diverse well-known public datasets, composed of textual, image, and multimodal documents. Performed experiments demonstrate that our method reaches top performance, yielding better effectiveness scores than state-of-the-art baseline methods and promoting large gains over the rankers being fused, thus showing the successful capability of the proposal in representing queries based on a unified graph-based model of rank fusions.

URL

https://arxiv.org/abs/1901.05743

PDF

https://arxiv.org/pdf/1901.05743
Video-Based Pedestrian Attribute Recognition

2019-01-17

Zhiyuan Chen, Annan Li, Yunhong Wang

arXiv_CV

arXiv_CV Recognition
Abstract

In this paper, we first tackle the problem of pedestrian attribute recognition with a video-based approach. The challenge mainly lies in spatial and temporal modeling and how to integrate them for effective and dynamic pedestrian representation. To solve this problem, a novel deep recurrent neural network with a hybrid pooling strategy is proposed. Since publicly available datasets are rare, a new large-scale video dataset for pedestrian attribute recognition is annotated, on which the effectiveness of both video-based pedestrian attribute recognition and the proposed new network architecture is well demonstrated.

URL

https://arxiv.org/abs/1901.05742

PDF

https://arxiv.org/pdf/1901.05742
Multiple Sclerosis Lesion Synthesis in MRI using an encoder-decoder U-NET

2019-01-17

Mostafa Salem, Sergi Valverde, Mariano Cabezas, Deborah Pareto, Arnau Oliver, Joaquim Salvi, Àlex Rovira, Xavier Lladó

arXiv_CV

arXiv_CV Segmentation CNN Detection
Abstract

In this paper, we propose generating synthetic multiple sclerosis (MS) lesions on MRI images with the final aim of improving the performance of supervised machine learning algorithms, thereby avoiding the problem of the lack of available ground truth. We propose a two-input two-output fully convolutional neural network model for MS lesion synthesis in MRI images. The lesion information is encoded as discrete binary intensity level masks passed to the model and stacked with the input images. The model is trained end-to-end without the need for manually annotating the lesions in the training set. We then perform the generation of synthetic lesions on healthy images via registration of patient images, which are subsequently used for data augmentation to increase the performance of supervised MS lesion detection algorithms. Our pipeline is evaluated on MS patient data from an in-house clinical dataset and the public ISBI2015 challenge dataset. The evaluation is based on measuring the similarities between the real and the synthetic images as well as on lesion detection performance, by segmenting both the original and synthetic images individually using a state-of-the-art segmentation framework. We also demonstrate the usage of synthetic MS lesions generated on healthy images as data augmentation. We analyze a scenario of limited training data (one-image training) to demonstrate the effect of the data augmentation on both datasets. Our results show the effectiveness of using synthetic MS lesion images. For the ISBI2015 challenge, our one-image model, trained using only a single image plus the synthetic data augmentation strategy, showed a performance similar to that of other CNN methods that were fully trained using the entire training set, yielding performance comparable to that of a human expert rater.

URL

https://arxiv.org/abs/1901.05733

PDF

https://arxiv.org/pdf/1901.05733
BioSEAL: In-Memory Biological Sequence Alignment Accelerator for Large-Scale Genomic Data

2019-01-17

Roman Kaplan, Leonid Yavits, Ran Ginosar

arXiv_CV

arXiv_CV
Abstract

Genome sequences contain hundreds of millions of DNA base pairs. Finding the degree of similarity between two genomes requires executing a compute-intensive dynamic programming algorithm, such as Smith-Waterman. Traditional von Neumann architectures have limited parallelism and cannot provide an efficient solution for large-scale genomic data. Approximate heuristic methods (e.g. BLAST) are commonly used. However, they are suboptimal and still compute-intensive. In this work, we present BioSEAL, a Biological SEquence ALignment accelerator. BioSEAL is a massively parallel non-von Neumann processing-in-memory architecture for large-scale DNA and protein sequence alignment. BioSEAL is based on resistive content addressable memory, capable of energy-efficient and high-performance associative processing. We present an associative processing algorithm for entire database sequence alignment on BioSEAL and compare its performance and power consumption with state-of-art solutions. We show that BioSEAL can achieve up to 57x speedup and 156x better energy efficiency, compared with existing solutions for genome sequence alignment and protein sequence database search.
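For readers unfamiliar with the dynamic programming algorithm mentioned above, here is a minimal software sketch of Smith-Waterman local alignment scoring. It is not BioSEAL's in-memory associative algorithm; the scoring values (match=2, mismatch=-1, gap=-1), the linear gap penalty, and the function name are assumptions chosen for illustration. The nested loops make the O(len(a) * len(b)) cost visible, which is the workload the paper aims to parallelise.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the DP table.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally with a match or mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


identical = smith_waterman_score("ACGT", "ACGT")
shared_core = smith_waterman_score("GGACGTCC", "TTACGTAA")
unrelated = smith_waterman_score("AAAA", "TTTT")
```

With these assumed scores, two sequences that share only the core "ACGT" score the same as the identical pair, because local alignment ignores the non-matching flanks.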

URL

https://arxiv.org/abs/1901.05959

PDF

https://arxiv.org/pdf/1901.05959
Evolving embodied intelligence from materials to machines

2019-01-17

David Howard, Agoston E. Eiben, Danielle Frances Kennedy, Jean-Baptiste Mouret, Philip Valencia, Dave Winkler

arXiv_RO

arXiv_RO
Abstract

Natural lifeforms specialise to their environmental niches across many levels; from low-level features such as DNA and proteins, through to higher-level artefacts including eyes, limbs, and overarching body plans. We propose Multi-Level Evolution (MLE), a bottom-up automatic process that designs robots across multiple levels and niches them to tasks and environmental conditions. MLE concurrently explores constituent molecular and material ‘building blocks’, as well as their possible assemblies into specialised morphological and sensorimotor configurations. MLE provides a route to fully harness a recent explosion in available candidate materials and ongoing advances in rapid manufacturing processes. We outline a feasible MLE architecture that realises this vision, highlight the main roadblocks and how they may be overcome, and show robotic applications to which MLE is particularly suited. By forming a research agenda to stimulate discussion between researchers in related fields, we hope to inspire the pursuit of multi-level robotic design all the way from material to machine.

URL

http://arxiv.org/abs/1901.05704

PDF

http://arxiv.org/pdf/1901.05704
Image Enhancement Network Trained by Using HDR images

2019-01-17

Yuma Kinoshita, Hitoshi Kiya

arXiv_CV

arXiv_CV Image_Enhancement
Abstract

In this paper, a novel image enhancement network is proposed, where HDR images are used for generating training data for our network. Most conventional image enhancement methods, including Retinex-based methods, do not take into account restoring lost pixel values caused by clipping and quantizing. In addition, recently proposed CNN-based methods still have a limited scope of application or limited performance, due to their network architectures. In contrast, the proposed method has higher performance and a simpler network architecture than existing CNN-based methods. Moreover, the proposed method enables us to restore lost pixel values. Experimental results show that the proposed method can provide higher-quality images than conventional image enhancement methods, including a CNN-based method, in terms of TMQI and NIQE.

URL

https://arxiv.org/abs/1901.05686

PDF

https://arxiv.org/pdf/1901.05686
Convolutional Neural Networks combined with Runge-Kutta Methods

2019-01-17

Mai Zhu, Bo Chang, Chong Fu

arXiv_CV

arXiv_CV CNN Image_Classification Classification
Abstract

A convolutional neural network for image classification can be constructed mathematically since it can be regarded as a multi-period dynamical system. In this paper, a novel approach is proposed to construct network models from the dynamical systems view. Since a pre-activation residual network can be deemed an approximation of a time-dependent dynamical system using the forward Euler method, higher order Runge-Kutta methods (RK methods) can be utilized to build network models in order to achieve higher accuracy. The model constructed in such a way is referred to as the Runge-Kutta Convolutional Neural Network (RKNet). RK methods also provide an interpretation of Dense Convolutional Networks (DenseNets) and Convolutional Neural Networks with Alternately Updated Clique (CliqueNets) from the dynamical systems view. The proposed methods are evaluated on benchmark datasets: CIFAR-10/100, SVHN and ImageNet. The experimental results are consistent with the theoretical properties of RK methods and support the dynamical systems interpretation. Moreover, the experimental results show that the RKNets are superior to the state-of-the-art network models on CIFAR-10 and on par on CIFAR-100, SVHN and ImageNet.
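The correspondence between a residual update and the forward Euler method can be made concrete with a scalar example. This is only a sketch under stated assumptions: the function f below is a simple stand-in (dx/dt = -x) for the learned convolutional sub-network, the step size and step count are arbitrary, and none of this is the RKNet architecture itself. It compares one update rule of the form x + h*f(x), which has the same shape as a residual block, with the classical fourth-order Runge-Kutta step.

```python
import math


def euler_step(f, x, h):
    """Forward Euler: x_{n+1} = x_n + h * f(x_n), the shape of a residual block."""
    return x + h * f(x)


def rk4_step(f, x, h):
    """Classical fourth-order Runge-Kutta step for an autonomous system."""
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(step, f, x0, h, n):
    """Apply the same step rule n times, like stacking n blocks."""
    x = x0
    for _ in range(n):
        x = step(f, x, h)
    return x


def f(x):
    return -x  # assumed toy dynamics; exact solution is x0 * exp(-t)


exact = math.exp(-1.0)
euler_err = abs(integrate(euler_step, f, 1.0, 0.1, 10) - exact)
rk4_err = abs(integrate(rk4_step, f, 1.0, 0.1, 10) - exact)
```

On this toy problem the RK4 error is several orders of magnitude smaller than the Euler error for the same number of steps, which is the numerical property the abstract appeals to when motivating higher-order blocks.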

URL

http://arxiv.org/abs/1802.08831

PDF

http://arxiv.org/pdf/1802.08831
Multi-agent Reinforcement Learning Embedded Game for the Optimization of Building Energy Control and Power System Planning

2019-01-17

Jun Hao

arXiv_AI

arXiv_AI Reinforcement_Learning Optimization
Abstract

Most of the current game-theoretic demand-side management methods focus primarily on the scheduling of home appliances, and the related numerical experiments are analyzed under various scenarios to achieve the corresponding Nash equilibrium (NE) and optimal results. However, not much work has been conducted for academic or commercial buildings. The methods for optimizing academic buildings are distinct from the optimal methods for home appliances. In this study, we present a novel methodology to control the operation of heating, ventilation, and air conditioning (HVAC) systems. With the development of Artificial Intelligence and computer technologies, reinforcement learning (RL) can be implemented in multiple realistic scenarios and help people to solve thousands of real-world problems. Reinforcement learning, which is considered the art of future AI, builds the bridge between agents and environments through Markov Decision Chain or Neural Network and has seldom been used in power systems. The art of RL is that once the simulator for a specific environment is built, the algorithm can keep learning from the environment. Therefore, RL is capable of dealing with constantly changing simulator inputs such as power demand, the condition of the power system, and outdoor temperature. Compared with the existing distribution power system planning mechanisms and the related game-theoretical methodologies, our proposed algorithm can plan and optimize the hourly energy usage, and can work with even shorter time windows if needed.

URL

http://arxiv.org/abs/1901.07333

PDF

http://arxiv.org/pdf/1901.07333
Background subtraction on depth videos with convolutional neural networks

2019-01-17

Xueying Wang, Lei Liu, Guangli Li, Xiao Dong, Peng Zhao, Xiaobing Feng

arXiv_CV

arXiv_CV Tracking CNN Object_Tracking Detection
Abstract

Background subtraction is a significant component of computer vision systems. It is widely used in video surveillance, object tracking, anomaly detection, etc. A new data source for background subtraction appeared with the emergence of low-cost depth sensors like Microsoft Kinect, Asus Xtion PRO, etc. In this paper, we propose a background subtraction approach on depth videos, which is based on convolutional neural networks (CNNs), called BGSNet-D (BackGround Subtraction neural Networks for Depth videos). The method can be used in scenarios where color is unavailable, such as poor lighting situations, and can also be combined with existing RGB background subtraction methods. A preprocessing strategy is designed to reduce the influence of noise from depth sensors. The experimental results on the SBM-RGBD dataset show that the proposed method outperforms existing methods on depth data.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1901.05676

PDF

https://arxiv.org/pdf/1901.05676
Read All
A model for prohibition and obligation dilemmas generation in virtual environments

2019-01-17

Azzeddine Benabbou (Heudiasyc), Domitile Lourdeaux (Heudiasyc), Dominique Lenne (Heudiasyc)

arXiv_AI

arXiv_AI Knowledge
Abstract

Under the project Maccoy Critical, we would like to train individuals, in virtual environments, to handle critical situations such as dilemmas. The latter refer to situations where there is no “good” solution, in other words, situations that lead to negative consequences whichever choice is made. Our objective is to use Knowledge Models to extract the properties necessary for dilemmas to emerge. To do so, our approach consists in developing a Scenario Orchestration System that generates dilemma situations dynamically without having to write them beforehand. In this paper we present this approach and describe a proof of concept of the generation process.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1901.09790

PDF

http://arxiv.org/pdf/1901.09790
Read All
Certainty-Driven Consistency Loss for Semi-supervised Learning

2019-01-17

Yingting Li, Lu Liu, Robby T. Tan

arXiv_CV

arXiv_CV Deep_Learning Prediction
Abstract

Recently proposed semi-supervised learning methods exploit a consistency loss between different predictions under random perturbations. Typically, a student model is trained to predict consistently with the targets generated by a noisy teacher. However, these methods ignore the fact that not all training data provide meaningful and reliable information in terms of consistency. For misclassified data, blindly minimizing the consistency loss around them can hinder learning. In this paper, we propose a novel certainty-driven consistency loss (CCL) to dynamically select data samples that have relatively low uncertainty. Specifically, we measure the variance or entropy of multiple predictions under random augmentations and dropout as an estimation of uncertainty. Then, we introduce two approaches, i.e., Filtering CCL and Temperature CCL, to guide the student to learn more meaningful and certain/reliable targets, and hence improve the quality of the gradients backpropagated to the student. Experiments demonstrate the advantages of the proposed method over the state-of-the-art semi-supervised deep learning methods on three benchmark datasets: SVHN, CIFAR10, and CIFAR100. Our method also shows robustness to noisy labels.
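
As a rough illustration of the uncertainty estimate mentioned in the abstract, the sketch below computes the entropy of the mean of several stochastic predictions for each input and keeps only inputs below a threshold. The function names, the use of the mean prediction, and the fixed threshold are assumptions made for this example; the abstract does not specify how Filtering CCL selects samples.

```python
import math


def predictive_entropy(prob_samples):
    """Entropy of the mean of several class-probability vectors for one input.

    prob_samples: list of probability vectors, one per stochastic forward pass
    (for example, under different augmentations or dropout masks).
    """
    num_classes = len(prob_samples[0])
    mean = [
        sum(p[c] for p in prob_samples) / len(prob_samples)
        for c in range(num_classes)
    ]
    # Terms with zero probability contribute nothing to the entropy.
    return -sum(q * math.log(q) for q in mean if q > 0.0)


def certain_mask(batch_samples, threshold):
    """Hypothetical filter: True where the entropy is at most `threshold`."""
    return [predictive_entropy(s) <= threshold for s in batch_samples]


# Two inputs, two stochastic passes each, two classes.
confident = [[1.0, 0.0], [1.0, 0.0]]   # passes agree
uncertain = [[1.0, 0.0], [0.0, 1.0]]   # passes disagree
mask = certain_mask([confident, uncertain], threshold=0.5)
```

In a training loop, a mask like this could gate which samples contribute to the consistency term; that wiring is left out because it depends on details not given here.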

Abstract (translated by Google)

URL

https://arxiv.org/abs/1901.05657

PDF

https://arxiv.org/pdf/1901.05657
Read All
Interactive Plan Explicability in Human-Robot Teaming

2019-01-17

Mehrdad Zakershahrak, Yu Zhang

arXiv_RO

arXiv_RO
Abstract

Human-robot teaming is one of the most important applications of artificial intelligence in the fast-growing field of robotics. For effective teaming, a robot must not only maintain a behavioral model of its human teammates to project the team status, but also be aware of its human teammates’ expectation of itself. Being aware of the human teammates’ expectation leads to robot behaviors that better align with human expectation, thus facilitating more efficient and potentially safer teams. Our work addresses the problem of human-robot cooperation with the consideration of such teammate models in sequential domains by leveraging the concept of plan explicability. In plan explicability, however, the human is considered solely as an observer. In this paper, we extend plan explicability to consider interactive settings where human and robot behaviors can influence each other. We term this new measure Interactive Plan Explicability. We compare the joint plan generated with the consideration of this measure using the fast forward planner (FF) with the plan created by FF without such consideration, as well as the plan created with actual human subjects. Results indicate that the explicability score of plans generated by our algorithm is comparable to the human plan, and better than the plan created by FF without considering the measure, implying that the plans created by our algorithms align better with expected joint plans of the human during execution. This can lead to more efficient collaboration in practice.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1901.05642

PDF

http://arxiv.org/pdf/1901.05642
Read All
AutoPruner: An End-to-End Trainable Filter Pruning Method for Efficient Deep Model Inference

2019-01-17

Jian-Hao Luo, Jianxin Wu

arXiv_CV

arXiv_CV Inference
Abstract

Channel pruning is an important family of methods to speed up deep model’s inference. Previous filter pruning algorithms regard channel pruning and model fine-tuning as two independent steps. This paper argues that combining them into a single end-to-end trainable system will lead to better results. We propose an efficient channel selection layer, namely AutoPruner, to find less important filters automatically in a joint training manner. Our AutoPruner takes previous activation responses as an input and generates a true binary index code for pruning. Hence, all the filters corresponding to zero index values can be removed safely after training. We empirically demonstrate that the gradient information of this channel selection layer is also helpful for the whole model training. By gradually erasing several weak filters, we can prevent an excessive drop in model accuracy. Compared with previous state-of-the-art pruning algorithms (including training from scratch), AutoPruner achieves significantly better performance. Furthermore, ablation experiments show that the proposed novel mini-batch pooling and binarization operations are vital for the success of filter pruning.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1805.08941

PDF

http://arxiv.org/pdf/1805.08941
Read All
Enhance the Motion Cues for Face Anti-Spoofing using CNN-LSTM Architecture

2019-01-17

Xiaoguang Tu, Hengsheng Zhang, Mei Xie, Yao Luo, Yuefei Zhang, Zheng Ma

arXiv_CV

arXiv_CV Attention Face CNN RNN
Abstract

Spatio-temporal information is very important for capturing the discriminative cues between genuine and fake faces in video sequences. To explore such temporal features, the fine-grained motions (e.g., eye blinking, mouth movements and head swing) across video frames are critical. In this paper, we propose a joint CNN-LSTM network for face anti-spoofing, focusing on the motion cues across video frames. We first extract highly discriminative features of video frames using a conventional Convolutional Neural Network (CNN). Then we leverage Long Short-Term Memory (LSTM) with the extracted features as inputs to capture the temporal dynamics in videos. To make the fine-grained motions easier to perceive in the training process, Eulerian motion magnification is used as preprocessing to enhance the facial expressions exhibited by individuals, and an attention mechanism is embedded in the LSTM so that the model learns to focus selectively on the dynamic frames across the video clips. Experiments on the Replay Attack and MSU-MFSD databases show that the proposed method yields state-of-the-art performance with better generalization ability compared with several other popular algorithms.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1901.05635

PDF

https://arxiv.org/pdf/1901.05635
Read All
Cognitive Analysis of 360 degree Surround Photos

2019-01-17

Madhawa Vidanapathirana, Lakmal Meegahapola, Indika Perera

arXiv_CV

arXiv_CV Knowledge
Abstract

360 degrees surround photography or photospheres have taken the world by storm as a new medium for content creation, providing viewers a rich, immersive experience compared to conventional photography. With the emergence of Virtual Reality as a mainstream trend, 360 degrees photography is increasingly important in offering the general public a practical approach to capture virtual reality ready content from their mobile phones without explicit tool support or knowledge. Even though the amount of 360-degree surround content being uploaded to the Internet continues to grow, there is no proper way to index it or to process it for further information. This is because of the difficulty of image processing on photospheres due to the distorted nature of the embedded objects. This challenge lies in the way 360-degree panoramic photospheres are saved. This paper presents a unique and innovative technique named Photosphere to Cognition Engine (P2CE), which allows cognitive analysis of 360-degree surround photos using existing image cognitive analysis algorithms and APIs designed for conventional photos. We have optimized the system using a wide variety of indoor and outdoor samples and extensive evaluation approaches. On average, P2CE provides up to 100% growth in accuracy on image cognitive analysis of photospheres over direct use of conventional non-photosphere based Image Cognition Systems.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1901.05634

PDF

https://arxiv.org/pdf/1901.05634
Read All
Deep Transfer Across Domains for Face Anti-spoofing

2019-01-17

Xiaoguang Tu, Hengsheng Zhang, Mei Xie, Yao Luo, Yuefei Zhang, Zheng Ma

arXiv_CV

arXiv_CV Sparse Face Recognition Face_Recognition
Abstract

A practical face recognition system demands not only high recognition performance, but also the capability of detecting spoofing attacks. While emerging approaches to face anti-spoofing have been proposed in recent years, most of them do not generalize well to new databases. The generalization ability of face anti-spoofing methods needs to be significantly improved before they can be adopted by practical application systems. The main reason for the poor generalization of current approaches is the variety of materials among the spoofing devices. As the attacks are produced by putting a spoofing display (e.g., paper, electronic screen, forged mask) in front of a camera, the variety of spoofing materials can make the spoofing attacks quite different. Furthermore, the background/lighting condition of a new environment can make both the real accesses and spoofing attacks different. Another reason for the poor generalization is that limited labeled data is available for training in face anti-spoofing. In this paper, we focus on improving the generalization ability across different kinds of datasets. We propose a CNN framework using sparsely labeled data from the target domain to learn features that are invariant across domains for face anti-spoofing. Experiments on public-domain face spoofing databases show that the proposed method significantly improves the cross-dataset testing performance with only a small number of labeled samples from the target domain.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1901.05633

PDF

https://arxiv.org/pdf/1901.05633
Read All
Hand Sign to Bangla Speech: A Deep Learning in Vision based system for Recognizing Hand Sign Digits and Generating Bangla Speech

2019-01-17

Shahjalal Ahmed, Md. Rafiqul Islam, Jahid Hassan, Minhaz Uddin Ahmed, Bilkis Jamal Ferdosi, Sanjay Saha, Md. Shopon

arXiv_CV

arXiv_CV GAN CNN Classification Deep_Learning Recognition
Abstract

Recent advancements in the field of computer vision with the help of deep neural networks have led us to explore many existing challenges that were once unattended due to the lack of necessary technologies. Hand Sign/Gesture Recognition is one of the significant areas where deep neural networks are making a substantial impact. In the last few years, a large body of research has been conducted to recognize hand signs and hand gestures, which we aim to extend to our mother tongue, Bangla (also known as Bengali). The primary goal of our work is to make an automated tool to aid people who are unable to speak. We developed a system that automatically detects hand sign based digits and speaks out the result in the Bangla language. According to a report of the World Health Organization (WHO), 15% of people in the world live with some kind of disability. Among them, individuals with communication impairments such as speech disabilities experience substantial barriers in social interaction. The proposed system can be invaluable in mitigating such barriers. The core of the system is built with a deep learning model which is based on convolutional neural networks (CNN). The model classifies hand sign based digits with 92% accuracy on validation data, which makes it a highly trustworthy system. Upon classification of the digits, the resulting output is fed to the text-to-speech engine and the translator unit, which eventually generate audio output in the Bangla language. A web application to demonstrate our tool is available at this http URL.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1901.05613

PDF

https://arxiv.org/pdf/1901.05613
Read All
Learning Generalizable and Identity-Discriminative Representations for Face Anti-Spoofing

2019-01-17

Xiaoguang Tu, Jian Zhao, Mei Xie, Guodong Du, Hengsheng Zhang, Jianshu Li, Zheng Ma, Jiashi Feng

arXiv_CV

arXiv_CV Attention Face Detection Recognition Face_Recognition
Abstract

Face anti-spoofing (a.k.a. presentation attack detection) has drawn growing attention due to the high security demands of face authentication systems. Existing CNN-based approaches usually recognize spoofing faces well when training and testing spoofing samples display similar patterns, but their performance drops drastically when testing on spoofing faces from unseen scenes. In this paper, we try to boost the generalizability and applicability of these methods by designing a CNN model with two major novelties. First, we propose a simple yet effective Total Pairwise Confusion (TPC) loss for CNN training, which enhances the generalizability of the learned Presentation Attack (PA) representations. Second, we incorporate a Fast Domain Adaptation (FDA) component into the CNN model to alleviate negative effects brought by domain changes. In addition, our proposed model, which is named Generalizable Face Authentication CNN (GFA-CNN), works in a multi-task manner, performing face anti-spoofing and face recognition simultaneously. Experimental results show that GFA-CNN outperforms previous face anti-spoofing approaches and also preserves the identity information of input face images well.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1901.05602

PDF

https://arxiv.org/pdf/1901.05602
Read All
Virtual-to-Real-World Transfer Learning for Robots on Wilderness Trails

2019-01-17

Michael L. Iuzzolino, Michael E. Walker, Daniel Szafir

arXiv_RO

arXiv_RO Transfer_Learning Classification Deep_Learning
Abstract

Robots hold promise in many scenarios involving outdoor use, such as search-and-rescue, wildlife management, and collecting data to improve environment, climate, and weather forecasting. However, autonomous navigation of outdoor trails remains a challenging problem. Recent work has sought to address this issue using deep learning. Although this approach has achieved state-of-the-art results, the deep learning paradigm may be limited due to a reliance on large amounts of annotated training data. Collecting and curating training datasets may not be feasible or practical in many situations, especially as trail conditions may change due to seasonal weather variations, storms, and natural erosion. In this paper, we explore an approach to address this issue through virtual-to-real-world transfer learning using a variety of deep learning models trained to classify the direction of a trail in an image. Our approach utilizes synthetic data gathered from virtual environments for model training, bypassing the need to collect a large amount of real images of the outdoors. We validate our approach in three main ways. First, we demonstrate that our models achieve classification accuracies upwards of 95% on our synthetic data set. Next, we utilize our classification models in the control system of a simulated robot to demonstrate feasibility. Finally, we evaluate our models on real-world trail data and demonstrate the potential of virtual-to-real-world transfer learning.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1901.05599

PDF

http://arxiv.org/pdf/1901.05599
Read All
Domain Adaptation for Structured Output via Discriminative Patch Representations

2019-01-17

Yi-Hsuan Tsai, Kihyuk Sohn, Samuel Schulter, Manmohan Chandraker

arXiv_CV

arXiv_CV Adversarial Segmentation CNN Semantic_Segmentation
Abstract

Predicting structured outputs such as semantic segmentation relies on expensive per-pixel annotations to learn strong supervised models like convolutional neural networks. However, these models trained on one data domain may not generalize well to other domains unequipped with annotations for model finetuning. To avoid the labor-intensive process of annotation, we develop a domain adaptation method to adapt the source data to the unlabeled target domain. To this end, we propose to learn discriminative feature representations of patches based on label histograms in the source domain, through the construction of a clustered space. With such representations as guidance, we then use an adversarial learning scheme to push the feature representations of target patches closer to the distributions of source ones. In addition, we show that our framework can integrate a global alignment process with the proposed patch-level alignment and achieve state-of-the-art performance on semantic segmentation. Extensive ablation studies and experiments are conducted on numerous benchmark datasets with various settings, such as synthetic-to-real and cross-city scenarios.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1901.05427

PDF

http://arxiv.org/pdf/1901.05427
Read All
Cooking State Recognition from Images Using Inception Architecture

2019-01-17

Md Sirajus Salekin, Ahmad Babaeian Jelodar, Rafsanjany Kushol

arXiv_CV

arXiv_CV Object_Detection Deep_Learning Detection Recognition
Abstract

A kitchen robot needs to properly understand the cooking environment to continue any cooking activities. However, object state detection has not been researched as thoroughly as object detection. In this paper, we propose a deep learning approach to identify different cooking states from images for a kitchen robot. In our research, we investigate in particular the performance of the Inception architecture and propose a modified architecture based on the Inception model to classify different cooking states. The model is analyzed robustly in terms of different layers and optimizers. Experimental results on a cooking dataset demonstrate that the proposed model can be a potential solution to the cooking state recognition problem.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1805.09967

PDF

http://arxiv.org/pdf/1805.09967
Read All
Kinematically-Informed Interactive Perception: Robot-Generated 3D Models for Classification

2019-01-17

Abhishek Venkataraman, Brent Griffin, Jason J. Corso

arXiv_RO

arXiv_RO Classification
Abstract

To be useful in everyday environments, robots must be able to observe and learn about objects. Recent datasets enable progress for classifying data into known object categories; however, it is unclear how to collect reliable object data when operating in cluttered, partially-observable environments. In this paper, we address the problem of building complete 3D models for real-world objects using a robot platform, which can remove objects from clutter for better classification. Furthermore, we are able to learn entirely new object categories as they are encountered, enabling the robot to classify previously unidentifiable objects during future interactions. We build models of grasped objects using simultaneous manipulation and observation, and we guide the processing of visual data using a kinematic description of the robot to combine observations from different view-points and remove background noise. To test our framework, we use a mobile manipulation robot equipped with an RGBD camera to build voxelized representations of unknown objects and then classify them into new categories. We then have the robot remove objects from clutter to manipulate, observe, and classify them in real-time.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1901.05580

PDF

http://arxiv.org/pdf/1901.05580
Read All
Generating Realistic Sequences of Customer-level Transactions for Retail Datasets

2019-01-17

Thang Doan, Neil Veira, Brian Keng

arXiv_AI

arXiv_AI Adversarial GAN Face Embedding RNN Prediction
Abstract

In order to better engage with customers, retailers rely on extensive customer and product databases which allow them to better understand customer behaviour and purchasing patterns. This has long been a challenging task as customer modelling is a multi-faceted, noisy and time-dependent problem. The most common way to tackle this problem is indirectly through task-specific supervised learning prediction problems, with relatively little literature on modelling a customer by directly simulating their future transactions. In this paper we propose a method for generating realistic sequences of baskets that a given customer is likely to purchase over a period of time. Customer embedding representations are learned using a Recurrent Neural Network (RNN) which takes into account the entire sequence of transaction data. Given the customer state at a specific point in time, a Generative Adversarial Network (GAN) is trained to generate a plausible basket of products for the following week. The newly generated basket is then fed back into the RNN to update the customer’s state. The GAN is thus used in tandem with the RNN module in a pipeline alternating between basket generation and customer state updating steps. This allows for sampling over a distribution of a customer’s future sequence of baskets, which then can be used to gain insight into how to service the customer more effectively. The methodology is empirically shown to produce baskets that appear similar to real baskets and share many common properties, including frequencies of different product types, brands, and prices. Furthermore, the generated data is able to replicate most of the strongest sequential patterns that exist between product types in the real data.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1901.05577

PDF

https://arxiv.org/pdf/1901.05577
Read All
Soft Rasterizer: Differentiable Rendering for Unsupervised Single-View Mesh Reconstruction

2019-01-17

Shichen Liu, Weikai Chen, Tianye Li, Hao Li

arXiv_CV

arXiv_CV Quantitative
Abstract

Rendering is the process of generating 2D images from 3D assets, simulated in a virtual environment, typically with a graphics pipeline. By inverting such a renderer, one can think of a learning approach to predict a 3D shape from an input image. However, standard rendering pipelines involve a fundamental discretization step called rasterization, which prevents the rendering process from being differentiable and hence from being suitable for learning. We present the first non-parametric and truly differentiable rasterizer based on silhouettes. Our method enables unsupervised learning for high-quality 3D mesh reconstruction from a single image. We call our framework “soft rasterizer” as it provides an accurate soft approximation of the standard rasterizer. The key idea is to fuse the probabilistic contributions of all mesh triangles with respect to the rendered pixels. When combined with a mesh generator in a deep neural network, our soft rasterizer is able to generate an approximated silhouette of the generated polygon mesh in the forward pass. The rendering loss is back-propagated to supervise the mesh generation without the need for 3D training data. Experimental results demonstrate that our approach significantly outperforms the state-of-the-art unsupervised techniques, both quantitatively and qualitatively. We also show that our soft rasterizer can achieve results comparable to the cutting-edge supervised learning method, and in various cases even better ones, especially for real-world data.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1901.05567

PDF

https://arxiv.org/pdf/1901.05567
Read All
Distance-Guided GA-Based Approach to Distributed Data-Intensive Web Service Composition

2019-01-16

Soheila Sadeghiram, Hui MA, Gang Chen

arXiv_AI

arXiv_AI
Abstract

Distributed computing, which uses Web services as fundamental elements, enables high-speed development of software applications through composing many interoperating, distributed, re-usable, and autonomous services. As a fundamental challenge for service developers, service composition must fulfil functional requirements and optimise Quality of Service (QoS) attributes simultaneously. On the other hand, huge amounts of data have been created by advances in technologies, which may be exchanged between services. Data-intensive Web services are of great interest for implementing data-intensive processes. However, current approaches to Web service composition have omitted either the effect of data or the distribution of services. Evolutionary Computing (EC) techniques allow for the creation of compositions that meet all the above factors. In this paper, we will develop a Genetic Algorithm (GA)-based approach for solving the problem of distributed data-intensive Web service composition (DWSC). In particular, we will introduce two new heuristics, i.e. Longest Common Subsequence (LCS) distance of services, in designing crossover operators. Additionally, a new local search technique incorporating distance of services will be proposed.
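
To make the LCS notion concrete, here is a small sketch of the standard dynamic-programming computation of the longest common subsequence length, plus one possible way to turn it into a distance. The abstract does not say which sequences are compared for two services or how the distance is normalized, so `lcs_distance` and its normalization are purely illustrative assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_distance(a, b):
    """Hypothetical normalized distance: 1 - LCS / max(len(a), len(b))."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # assumption: two empty sequences are identical
    return 1.0 - lcs_length(a, b) / longest


# Sequences could be, for instance, ordered lists of service identifiers.
d_same = lcs_distance(["s1", "s2", "s3"], ["s1", "s2", "s3"])
d_disjoint = lcs_distance(["s1", "s2"], ["s3", "s4"])
```

The rolling-row formulation keeps memory proportional to the shorter dimension of the table, which matters if such a distance is evaluated many times inside a crossover operator.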

Abstract (translated by Google)

URL

https://arxiv.org/abs/1901.05564

PDF

https://arxiv.org/pdf/1901.05564
Read All
Visual Feature Fusion and its Application to Support Unsupervised Clustering Tasks

2019-01-16

Gladys Hilasaca, Fernando Paulovich

arXiv_CV

arXiv_CV Knowledge Quantitative
Abstract

In visual analytics applications, the concept of putting the user in the loop refers to the ability to replace heuristics with user knowledge in machine learning and data mining tasks. In supervised tasks, user engagement occurs via the manipulation of the training data. However, in unsupervised tasks, user involvement is limited to changes in the algorithm parametrization or the input data representation, also known as features. Depending on the application domain, different types of features can be extracted from the raw data. Therefore, the result of unsupervised algorithms heavily depends on the type of feature employed. Since there is no perfect feature extractor, combining different features has been explored in a process called feature fusion. Feature fusion is straightforward when the machine learning or data mining task has a cost function. However, when such a function does not exist, user support for the combination needs to be provided; otherwise the process is impractical. In this paper, we present a novel feature fusion approach that uses small data samples to allow users not only to effortlessly control the combination of different feature sets but also to interpret the attained results. The effectiveness of our approach is confirmed by a comprehensive set of qualitative and quantitative tests, opening up different possibilities of user-guided analytical scenarios not covered yet. The ability of our approach to provide real-time feedback for feature fusion is exploited in the context of unsupervised clustering techniques, where the composed groups reflect the semantics of the feature combination.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1901.05556

PDF

https://arxiv.org/pdf/1901.05556
Read All
Class-Balanced Loss Based on Effective Number of Samples

2019-01-16

Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, Serge Belongie

arXiv_CV

arXiv_CV
Abstract

With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
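
The formula in the abstract is easy to try out. The sketch below computes the effective number of samples for each class and derives per-class weights inversely proportional to it. The function names and the choice to normalize the weights so they sum to the number of classes are assumptions for illustration; the abstract only states that the effective number is used to re-balance the loss.

```python
def effective_number(n, beta):
    """Effective number of samples: (1 - beta**n) / (1 - beta), beta in [0, 1)."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    return (1.0 - beta ** n) / (1.0 - beta)


def class_balanced_weights(counts, beta):
    """Weights proportional to 1 / effective_number(n_c, beta).

    Assumes every class has at least one sample. The weights are rescaled
    to sum to the number of classes (an illustrative normalization).
    """
    raw = [1.0 / effective_number(n, beta) for n in counts]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


# A long-tailed toy example: one frequent class and one rare class.
weights = class_balanced_weights([5000, 50], beta=0.999)
```

With beta = 0 every class has effective number 1, so all weights are equal and there is no re-weighting. As beta approaches 1 the effective number approaches n, so the weights approach inverse class frequency.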

Abstract (translated by Google)

URL

https://arxiv.org/abs/1901.05555

PDF

https://arxiv.org/pdf/1901.05555
Read All
Primitive-based 3D Building Modeling, Sensor Simulation, and Estimation

2019-01-16

Xia Li, Yen-Liang Lin, James Miller, Alex Cheon, Walt Dixon

arXiv_CV

arXiv_CV Segmentation
Abstract

As we begin to consider modeling large, realistic 3D building scenes, it becomes necessary to consider a more compact representation than the polygonal mesh model. Because the large amounts of annotated training data required are costly to obtain, we leverage synthetic data to train our system for the satellite image domain. By utilizing the synthetic data, we formulate building decomposition as an application of instance segmentation and primitive fitting to decompose a building into a set of primitive shapes. Experimental results on the WorldView-3 satellite image dataset demonstrate the effectiveness of our 3D building modeling approach.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1901.05554

PDF

https://arxiv.org/pdf/1901.05554
Read All
Conditional Domain Adaptation GANs for Biomedical Image Segmentation

2019-01-16

Hugo Oliveira, Edemir Ferreira, Jefersson A. dos Santos

arXiv_CV

arXiv_CV Adversarial Segmentation GAN Face Transfer_Learning Classification Quantitative
Abstract

Due to visual differences in biomedical image datasets acquired using distinct digitization techniques, Transfer Learning is an important step for improving the generalization capabilities of Neural Networks in this area. Despite succeeding in classification tasks, most Domain Adaptation strategies face serious limitations in segmentation. Therefore, improving on previous Image Translation networks, we propose a Domain Adaptation method for biomedical image segmentation based on adversarial networks that can learn from both unlabeled and labeled data. Our experimental procedure compares our method using several domains, datasets, segmentation tasks and baselines, performing quantitative and qualitative comparisons of the proposed method with baselines. The proposed method shows consistently better results than the baselines in scarce label scenarios, often achieving Jaccard values greater than 0.9 and adequate segmentation quality in most tasks and datasets.
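
For readers unfamiliar with the metric quoted above, the Jaccard index of two binary masks is the size of their intersection divided by the size of their union. The helper below is a minimal sketch for masks given as nested lists; the function name and the convention of returning 1.0 when both masks are empty are assumptions for this example.

```python
def jaccard_index(pred, target):
    """Jaccard index |A ∩ B| / |A ∪ B| of two equally sized binary masks."""
    intersection = 0
    union = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumption: two empty masks are treated as a perfect match.
    return 1.0 if union == 0 else intersection / union


predicted = [[1, 1], [0, 0]]
reference = [[1, 0], [0, 0]]
score = jaccard_index(predicted, reference)
```

A value above 0.9, as reported in the abstract, means the predicted and reference regions overlap on more than 90% of the pixels that either of them marks.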

Abstract (translated by Google)

URL

https://arxiv.org/abs/1901.05553

PDF

https://arxiv.org/pdf/1901.05553
Read All
Practical Algorithms for Multi-Stage Voting Rules with Parallel Universes Tiebreaking

2019-01-16

Jun Wang, Sujoy Sikdar, Tyler Shepherd, Zhibing Zhao, Chunheng Jiang, Lirong Xia

arXiv_AI

arXiv_AI
Abstract

STV and ranked pairs (RP) are two well-studied voting rules for group decision-making. They proceed in multiple rounds, and are affected by how ties are broken in each round. However, the literature is surprisingly vague about how ties should be broken. We propose the first algorithms for computing the set of alternatives that are winners under some tiebreaking mechanism under STV and RP, which is also known as parallel-universes tiebreaking (PUT). Unfortunately, PUT-winners are NP-complete to compute under STV and RP, and standard search algorithms from AI do not apply. We propose multiple DFS-based algorithms along with pruning strategies, heuristics, sampling and machine learning to prioritize search direction to significantly improve the performance. We also propose novel ILP formulations for PUT-winners under STV and RP, respectively. Experiments on synthetic and real-world data show that our algorithms are overall faster than ILP.
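
The idea of parallel-universes tiebreaking can be illustrated with a naive search for single-winner STV: whenever several alternatives are tied for the lowest plurality score, branch on each possible elimination and collect every alternative that survives in some branch. This is only a plain memoized depth-first search written for this listing, assuming complete strict rankings; it does not include the pruning, heuristics, sampling, or machine-learned prioritization that the abstract describes, and the function name is hypothetical.

```python
def stv_put_winners(profile):
    """Alternatives that win STV under at least one tiebreaking order.

    profile: list of rankings, each a tuple of all alternatives ordered
    from most to least preferred (complete strict rankings are assumed).
    """
    winners = set()
    visited = set()

    def dfs(remaining):
        if remaining in visited:
            return  # this set of survivors was already explored
        visited.add(remaining)
        if len(remaining) == 1:
            winners.update(remaining)
            return
        # Plurality score: each voter supports their top remaining choice.
        scores = {a: 0 for a in remaining}
        for ranking in profile:
            top = next(a for a in ranking if a in remaining)
            scores[top] += 1
        lowest = min(scores.values())
        # Branch on every alternative tied for the lowest score.
        for a in remaining:
            if scores[a] == lowest:
                dfs(remaining - {a})

    dfs(frozenset(profile[0]))
    return winners


# A three-way cycle: every alternative wins in some universe.
cycle = [("a", "b", "c"), ("b", "c", "a"), ("c", "a", "b")]
put = stv_put_winners(cycle)
```

Memoizing on the set of remaining alternatives avoids re-exploring the same state, but the number of states can still grow exponentially, which is consistent with the hardness result stated in the abstract.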

Abstract (translated by Google)

URL

http://arxiv.org/abs/1901.09791

PDF

http://arxiv.org/pdf/1901.09791
Read All
Response to 'Visual Dialogue without Vision or Dialogue'

2019-01-16

Abhishek Das, Devi Parikh, Dhruv Batra

arXiv_CV

arXiv_CV
Abstract

In a recent workshop paper, Massiceti et al. presented a baseline model and subsequent critique of Visual Dialog (Das et al., CVPR 2017) that raises what we believe to be unfounded concerns about the dataset and evaluation. This article intends to rebut the critique and clarify potential confusions for practitioners and future participants in the Visual Dialog challenge.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1901.05531

PDF

https://arxiv.org/pdf/1901.05531
Read All
Manipulating Highly Deformable Materials Using a Visual Feedback Dictionary

2019-01-16

Biao Jia, Zhe Hu, Jia Pan, Dinesh Manocha

arXiv_RO

arXiv_RO
Abstract

The complex physical properties of highly deformable materials such as clothes pose significant challenges for autonomous robotic manipulation systems. We present a novel visual feedback dictionary-based method for manipulating deformable objects towards a desired configuration. Our approach is based on visual servoing and we use an efficient technique to extract key features from the RGB sensor stream in the form of a histogram of deformable model features. These histogram features serve as high-level representations of the state of the deformable material. Next, we collect manipulation data and use a visual feedback dictionary that maps the velocity in the high-dimensional feature space to the velocity of the robotic end-effectors for manipulation. We have evaluated our approach on a set of complex manipulation tasks and human-robot manipulation tasks on different cloth pieces with varying material characteristics.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1710.06947

PDF

http://arxiv.org/pdf/1710.06947
Read All
Cloth Manipulation Using Random-Forest-Based Imitation Learning

2019-01-16

Biao Jia, Zherong Pan, Zhe Hu, Jia Pan, Dinesh Manocha

arXiv_RO

arXiv_RO Optimization Classification Deep_Learning
Abstract

We present a novel approach for robust manipulation of high-DOF deformable objects such as cloth. Our approach uses a random forest-based controller that maps the observed visual features of the cloth to an optimal control action of the manipulator. The topological structure of this random forest-based controller is determined automatically based on the training data consisting of visual features and optimal control actions. This enables us to integrate the overall process of training data classification and controller optimization into an imitation learning (IL) approach. Our approach enables learning of a robust control policy for cloth manipulation with guarantees on convergence. We have evaluated our approach on different multi-task cloth manipulation benchmarks such as flattening, folding and twisting. In practice, our approach works well with different deformable features learned based on the specific task or deep learning. Moreover, our controller outperforms a simple or piecewise linear controller in terms of robustness to noise. In addition, our approach is easy to implement and does not require much parameter tuning.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1802.09661

PDF

http://arxiv.org/pdf/1802.09661
Read All
Survey of Bayesian Networks Applications on Unmanned Intelligent Autonomous Vehicles

2019-01-16

Rocío Díaz de León Torres, Martín Molina, Pascual Campoy

arXiv_AI

arXiv_AI Review Survey
Abstract

This article reviews the applications of Bayesian networks on Unmanned Intelligent Autonomous Vehicles (UIAV) from the decision making point of view, which represents the final step for fully autonomous unmanned vehicles (currently under discussion). Until now, when it comes to making high level decisions for unmanned autonomous vehicles (UAV), humans have the last word. Based on the works presented in this article and current analysis, the modules of a general decision making framework and its variables are inferred. Many efforts have been made in the labs showing Bayesian networks as a promising computer model for decision making. It remains for the future to test Bayesian network models in real situations. Besides the applications, Bayesian network fundamentals are introduced as elements to consider when we try to develop UIAVs with the potential of achieving high level judgements.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1901.05517

PDF

https://arxiv.org/pdf/1901.05517
Read All
Autonomous visual inspection of large-scale infrastructures using aerial robots

2019-01-16

Christoforos Kanellakis, Emil Fresk, Sina Sharif Mansouri, Dariusz Kominiak, George Nikolakopoulos

arXiv_RO

arXiv_RO Pose_Estimation
Abstract

This article presents a novel framework for performing visual inspection around 3D infrastructures, by establishing a team of fully autonomous Micro Aerial Vehicles (MAVs) with robust localization, planning and perception capabilities. The proposed aerial inspection system reaches a high level of autonomy on a large scale, while pushing the boundaries of the real-life deployment of aerial robotics. In the presented approach, the MAVs deployed for the inspection of the structure rely only on their onboard computer and sensory systems. The developed framework envisions a modular system, combining open research challenges in the fields of localization, path planning and mapping, with an overall capability for a fast on-site deployment and a reduced execution time that can repeatably perform the inspection mission according to the operator's needs. The architecture of the established system includes: 1) a geometry-based path planner for coverage of complex structures by multiple MAVs, 2) an accurate yet flexible localization component, which provides an accurate pose estimation for the MAVs by utilizing an Ultra Wideband fused inertial estimation scheme, and 3) a visual data post-processing scheme for the 3D model building. The performance of the proposed framework has been experimentally demonstrated in multiple realistic outdoor field trials, all focusing on the challenging structure of a wind turbine as the main test case. The successful experimental results depict the merits of the proposed autonomous navigation system as the enabling technology towards aerial robotic inspectors.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1901.05510

PDF

http://arxiv.org/pdf/1901.05510
Read All
Multi-Agent Pathfinding with Continuous Time

2019-01-16

Anton Andreychuk, Konstantin Yakovlev, Dor Atzmon, Roni Stern

arXiv_AI

arXiv_AI
Abstract

MAPF is the problem of finding paths for multiple agents such that every agent reaches its goal and the agents do not collide. Most prior work on MAPF was on grids, assumed that all actions cost the same and that agents do not have a volume, and considered discrete time steps. In this work we propose a MAPF algorithm that does not make any of these assumptions, is complete, and provides provably optimal solutions. This algorithm is based on a novel combination of SIPP, a continuous time single agent planning algorithm, and CBS, a state of the art multi-agent pathfinding algorithm. We analyze this algorithm, discuss its pros and cons, and evaluate it experimentally on several standard benchmarks.
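Once agents have a volume and time is continuous, a conflict can no longer be detected by comparing grid cells at integer time steps. As a rough illustration of what a conflict check could look like in that setting, the sketch below finds the first moment at which two disk-shaped agents moving at constant velocity come into contact. This is an assumption-laden example, not the procedure from the paper: the disk shape, the constant-velocity motion, and the name `first_collision_time` are all hypothetical.

```python
import math


def first_collision_time(p1, v1, r1, p2, v2, r2, horizon):
    """Earliest t in [0, horizon] at which two moving disks touch, or None.

    p1, p2: start positions (x, y); v1, v2: constant velocities (vx, vy);
    r1, r2: disk radii. All names and the motion model are illustrative.
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    dvx, dvy = v2[0] - v1[0], v2[1] - v1[1]
    contact = r1 + r2

    # Squared distance over time is a*t^2 + b*t + (c + contact^2).
    a = dvx * dvx + dvy * dvy
    b = 2.0 * (dx * dvx + dy * dvy)
    c = dx * dx + dy * dy - contact * contact

    if c < 0:
        return 0.0  # the disks already overlap at t = 0
    if a == 0:
        return None  # identical velocities: the gap never changes
    disc = b * b - 4.0 * a * c
    if disc <= 0:
        return None  # the disks never get closer than the contact distance
    t = (-b - math.sqrt(disc)) / (2.0 * a)  # smaller root = first contact
    return t if 0.0 <= t <= horizon else None


# Two unit-diameter agents heading towards each other along the x axis.
t_hit = first_collision_time((0, 0), (1, 0), 0.5, (4, 0), (-1, 0), 0.5, 10.0)
```

Here the centres start 4 units apart and close at 2 units per time unit, so contact occurs when the gap has shrunk to 1, at t = 1.5.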

Abstract (translated by Google)

URL

https://arxiv.org/abs/1901.05506

PDF

https://arxiv.org/pdf/1901.05506
Read All
Fundamentals of effective cloud management for the new NASA Astrophysics Data System

2019-01-16

Sergi Blanco-Cuaresma, Alberto Accomazzi, Michael J. Kurtz, Edwin Henneken, Carolyn S. Grant, Donna M. Thompson, Roman Chyla, Stephen McDonald, Golnaz Shapurian, Timothy W. Hostetler, Matthew R. Templeton, Kelly E. Lockhart, Kris Bukovi, Nathan Rapport

arXiv_CV

arXiv_CV
Abstract

The new NASA Astrophysics Data System (ADS) is designed with a service-oriented architecture (SOA) that consists of multiple customized Apache Solr search engine instances plus a collection of microservices, containerized using Docker, and deployed in Amazon Web Services (AWS). For complex systems, like the ADS, this loosely coupled architecture can lead to a more scalable, reliable and resilient system if some fundamental questions are addressed. After having experimented with different AWS environments and deployment methods, we decided in December 2017 to go with Kubernetes as our container orchestration. Defining the best strategy to properly set up Kubernetes has proven to be challenging: automatic scaling services and load balancing traffic can lead to errors whose origin is difficult to identify, monitoring and logging the activity that happens across multiple layers for a single request needs to be carefully addressed, and the best workflow for a Continuous Integration and Delivery (CI/CD) system is not self-evident. We present here how we tackle these challenges and our plans for the future.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1901.05463

PDF

https://arxiv.org/pdf/1901.05463
Read All
