In this work, we explain how to use computational topology for detecting differences in the geometrical distribution of cells forming epithelial tissues. In particular, we extract topological information from images using persistent homology and summarize it with a number called persistent entropy. This method is scale invariant, robust to noise and sensitive to global topological features of the tissue. We have found significant differences between chick neuroepithelium and epithelium of Drosophila wing discs in both, larva and prepupal stages.
http://arxiv.org/abs/1902.06467
GaN films with thicknesses up to 3 mm were grown in two custom-made halide vapor phase epitaxy (HVPE) reactors. V-shaped defects (pits) with densities from 1 cm$^{-2}$ to 100 cm$^{-2}$ were found on the surfaces of the films. Origins of pit formation and the process of pit overgrowth were studied by analyzing the kinematics of pit evolution. Two mechanisms of pit overgrowth were observed. Pits can be overgrown intentionally by varying growth parameters to increase the growth rate of pit facets. Pits can overgrow spontaneously if a fast-growing facet nucleates at their bottom under constant growth conditions.
https://arxiv.org/abs/1902.06465
Sparse representation has recently been successfully applied in visual tracking. It utilizes a set of templates to represent target candidates and find the best one with the minimum reconstruction error as the tracking result. In this paper, we propose a robust deep features-based structured group local sparse tracker (DF-SGLST), which exploits the deep features of local patches inside target candidates and represents them by a set of templates in the particle filter framework. Unlike the conventional local sparse trackers, the proposed optimization model in DF-SGLST employs a group-sparsity regularization term to seamlessly adopt local and spatial information of the target candidates and attain the spatial layout structure among them. To solve the optimization model, we propose an efficient and fast numerical algorithm that consists of two subproblems with the closed-form solutions. Different evaluations in terms of success and precision on the benchmarks of challenging image sequences (e.g., OTB50 and OTB100) demonstrate the superior performance of the proposed tracker against several state-of-the-art trackers.
http://arxiv.org/abs/1902.07668
Generative Adversarial Networks (GANs) are powerful tools for reconstructing Compressed Sensing Magnetic Resonance Imaging (CS-MRI). However most recent works lack exploration of structure information of MRI images that is crucial for clinical diagnosis. To tackle this problem, we propose the Structure-Enhanced GAN (SEGAN) that aims at restoring structure information at both local and global scale. SEGAN defines a new structure regularization called Patch Correlation Regularization (PCR) which allows for efficient extraction of structure information. In addition, to further enhance the ability to uncover structure information, we propose a novel generator SU-Net by incorporating multiple-scale convolution filters into each layer. Besides, we theoretically analyze the convergence of stochastic factors contained in training process. Experimental results show that SEGAN is able to learn target structure information and achieves state-of-the-art performance for CS-MRI reconstruction.
http://arxiv.org/abs/1902.06455
Self-attention network, an attention-based feedforward neural network, has recently shown the potential to replace recurrent neural networks (RNNs) in a variety of NLP tasks. However, it is not clear if the self-attention network could be a good alternative of RNNs in automatic speech recognition (ASR), which processes the longer speech sequences and may have online recognition requirements. In this paper, we present a RNN-free end-to-end model: self-attention aligner (SAA), which applies the self-attention networks to a simplified recurrent neural aligner (RNA) framework. We also propose a chunk-hopping mechanism, which enables the SAA model to encode on segmented frame chunks one after another to support online recognition. Experiments on two Mandarin ASR datasets show the replacement of RNNs by the self-attention networks yields a 8.4%-10.2% relative character error rate (CER) reduction. In addition, the chunk-hopping mechanism allows the SAA to have only a 2.5% relative CER degradation with a 320ms latency. After jointly training with a self-attention network language model, our SAA model obtains further error rate reduction on multiple datasets. Especially, it achieves 24.12% CER on the Mandarin ASR benchmark (HKUST), exceeding the best end-to-end model by over 2% absolute CER.
http://arxiv.org/abs/1902.06450
Vossian Antonomasia is a prolific stylistic device, in use since antiquity. It can compress the introduction or description of a person or another named entity into a terse, poignant formulation and can best be explained by an example: When Norwegian world champion Magnus Carlsen is described as “the Mozart of chess”, it is Vossian Antonomasia we are dealing with. The pattern is simple: A source (Mozart) is used to describe a target (Magnus Carlsen), the transfer of meaning is reached via a modifier (“of chess”). This phenomenon has been discussed before (as ‘metaphorical antonomasia’ or, with special focus on the source object, as ‘paragons’), but no corpus-based approach has been undertaken as yet to explore its breadth and variety. We are looking into a full-text newspaper corpus (The New York Times, 1987-2007) and describe a new method for the automatic extraction of Vossian Antonomasia based on Wikidata entities. Our analysis offers new insights into the occurrence of popular paragons and their distribution.
http://arxiv.org/abs/1902.06428
In mainstream computer vision and machine learning, public datasets such as ImageNet, COCO and KITTI have helped drive enormous improvements by enabling researchers to understand the strengths and limitations of different algorithms via performance comparison. However, this type of approach has had limited translation to problems in robotic assisted surgery as this field has never established the same level of common datasets and benchmarking methods. In 2015 a sub-challenge was introduced at the EndoVis workshop where a set of robotic images were provided with automatically generated annotations from robot forward kinematics. However, there were issues with this dataset due to the limited background variation, lack of complex motion and inaccuracies in the annotation. In this work we present the results of the 2017 challenge on robotic instrument segmentation which involved 10 teams participating in binary, parts and type based segmentation of articulated da Vinci robotic instruments.
http://arxiv.org/abs/1902.06426
Continuous Bag of Words (CBOW) is a powerful text embedding method. Due to its strong capabilities to encode word content, CBOW embeddings perform well on a wide range of downstream tasks while being efficient to compute. However, CBOW is not capable of capturing the word order. The reason is that the computation of CBOW’s word embeddings is commutative, i.e., embeddings of XYZ and ZYX are the same. In order to address this shortcoming, we propose a learning algorithm for the Continuous Matrix Space Model, which we call Continual Multiplication of Words (CMOW). Our algorithm is an adaptation of word2vec, so that it can be trained on large quantities of unlabeled text. We empirically show that CMOW better captures linguistic properties, but it is inferior to CBOW in memorizing word content. Motivated by these findings, we propose a hybrid model that combines the strengths of CBOW and CMOW. Our results show that the hybrid CBOW-CMOW-model retains CBOW’s strong ability to memorize word content while at the same time substantially improving its ability to encode other linguistic information by 8%. As a result, the hybrid also performs better on 8 out of 11 supervised downstream tasks with an average improvement of 1.2%.
http://arxiv.org/abs/1902.06423
Although deep convolutional networks have achieved improved performance in many natural language tasks, they have been treated as black boxes because they are difficult to interpret. Especially, little is known about how they represent language in their intermediate layers. In an attempt to understand the representations of deep convolutional networks trained on language tasks, we show that individual units are selectively responsive to specific morphemes, words, and phrases, rather than responding to arbitrary and uninterpretable patterns. In order to quantitatively analyze such an intriguing phenomenon, we propose a concept alignment method based on how units respond to the replicated text. We conduct analyses with different architectures on multiple datasets for classification and translation tasks and provide new insights into how deep models understand natural language.
http://arxiv.org/abs/1902.07249
Learning based on networks of real neurons, and by extension biologically inspired models of neural networks, has yet to find general learning rules leading to widespread applications. In this paper, we argue for the existence of a principle allowing to steer the dynamics of a biologically inspired neural network. Using carefully timed external stimulation, the network can be driven towards a desired dynamical state. We term this principle “Learning by Stimulation Avoidance” (LSA). We demonstrate through simulation that the minimal sufficient conditions leading to LSA in artificial networks are also sufficient to reproduce learning results similar to those obtained in biological neurons by Shahaf and Marom [1]. We examine the mechanism’s basic dynamics in a reduced network, and demonstrate how it scales up to a network of 100 neurons. We show that LSA has a higher explanatory power than existing hypotheses about the response of biological neural networks to external simulation, and can be used as a learning rule for an embodied application: learning of wall avoidance by a simulated robot. The surge in popularity of artificial neural networks is mostly directed to disembodied models of neurons with biologically irrelevant dynamics: to the authors’ knowledge, this is the first work demonstrating sensory-motor learning with random spiking networks through pure Hebbian learning.
http://arxiv.org/abs/1609.07706
Complex environments provide structured yet variable sensory inputs. To best exploit information from these environments, organisms must evolve the ability to correctly anticipate consequences of unknown stimuli, and act on these predictions. We propose an evolutionary path for neural networks, leading an organism from reactive behavior to simple proactive behavior and from simple proactive behavior to induction-based behavior. Through in-vitro and in-silico experiments, we define the minimal conditions necessary in a network with spike-timing dependent plasticity for the organism to go from reactive to proactive behavior. Our results support the existence of small evolutionary steps and four necessary conditions allowing embodied neural networks to evolve predictive and inductive abilities from an initial reactive strategy. We extend these conditions to more general structures.
http://arxiv.org/abs/1902.06410
Channel-based pruning has achieved significant successes in accelerating deep convolutional neural network, whose pipeline is an iterative three-step procedure: ranking, pruning and fine-tuning. However, this iterative procedure is computationally expensive. In this study, we present a novel computationally efficient channel pruning approach based on the coarse ranking that utilizes the intermediate results during fine-tuning to rank the importance of filters, built upon state-of-the-art works with data-driven ranking criteria. The goal of this work is not to propose a single improved approach built upon a specific channel pruning method, but to introduce a new general framework that works for a series of channel pruning methods. Various benchmark image datasets (CIFAR-10, ImageNet, Birds-200, and Flowers-102) and network architectures (AlexNet and VGG-16) are utilized to evaluate the proposed approach for object classification purpose. Experimental results show that the proposed method can achieve almost identical performance with the corresponding state-of-the-art works (baseline) while our ranking time is negligibly short. In specific, with the proposed method, 75% and 54% of the total computation time for the whole pruning procedure can be reduced for AlexNet on CIFAR-10, and for VGG-16 on ImageNet, respectively. Our approach would significantly facilitate pruning practice, especially on resource-constrained platforms.
http://arxiv.org/abs/1902.06385
In spite of the advancements made in the periocular recognition, the dataset and periocular recognition in the wild remains a challenge. In this paper, we propose a multilayer fusion approach by means of a pair of shared parameters (dual-stream) convolutional neural network where each network accepts RGB data and a novel colour-based texture descriptor, namely Orthogonal Combination-Local Binary Coded Pattern (OC-LBCP) for periocular recognition in the wild. Specifically, two distinct late-fusion layers are introduced in the dual-stream network to aggregate the RGB data and OC-LBCP. Thus, the network beneficial from this new feature of the late-fusion layers for accuracy performance gain. We also introduce and share a new dataset for periocular in the wild, namely Ethnic-ocular dataset for benchmarking. The proposed network has also been assessed on two publicly available datasets, namely CASIA-iris distance and UBIPr. The proposed network outperforms several competing approaches on these datasets.
http://arxiv.org/abs/1902.06383
Channel pruning has been identified as an effective approach to constructing efficient network structures. Its typical pipeline requires iterative pruning and fine-tuning. In this work, we propose a novel single-shot channel pruning approach based on alternating direction methods of multipliers (ADMM), which can eliminate the need for complex iterative pruning and fine-tuning procedure and achieve a target compression ratio with only one run of pruning and fine-tuning. To the best of our knowledge, this is the first study of single-shot channel pruning. The proposed method introduces filter-level sparsity during training and can achieve competitive performance with a simple heuristic pruning criterion (L1-norm). Extensive evaluations have been conducted with various widely-used benchmark architectures and image datasets for object classification purpose. The experimental results on classification accuracy show that the proposed method can outperform state-of-the-art network pruning works under various scenarios.
http://arxiv.org/abs/1902.06382
Recently most popular tracking frameworks focus on 2D image sequences. They seldom track the 3D object in point clouds. In this paper, we propose PointIT, a fast, simple tracking method based on 3D on-road instance segmentation. Firstly, we transform 3D LiDAR data into the spherical image with the size of 64 x 512 x 4 and feed it into instance segment model to get the predicted instance mask for each class. Then we use MobileNet as our primary encoder instead of the original ResNet to reduce the computational complexity. Finally, we extend the Sort algorithm with this instance framework to realize tracking in the 3D LiDAR point cloud data. The model is trained on the spherical images dataset with the corresponding instance label masks which are provided by KITTI 3D Object Track dataset. According to the experiment results, our network can achieve on Average Precision (AP) of 0.617 and the performance of multi-tracking task has also been improved.
http://arxiv.org/abs/1902.06379
Knowledge graph (KG) refinement mainly aims at KG completion and correction (i.e., error detection). However, most conventional KG embedding models only focus on KG completion with an unreasonable assumption that all facts in KG hold without noises, ignoring error detection which also should be significant and essential for KG refinement.In this paper, we propose a novel support-confidence-aware KG embedding framework (SCEF), which implements KG completion and correction simultaneously by learning knowledge representations with both triple support and triple confidence. Specifically, we build model energy function by incorporating conventional translation-based model with support and confidence. To make our triple support-confidence more sufficient and robust, we not only consider the internal structural information in KG, studying the approximate relation entailment as triple confidence constraints, but also the external textual evidence, proposing two kinds of triple supports with entity types and descriptions respectively.Through extensive experiments on real-world datasets, we demonstrate SCEF’s effectiveness.
http://arxiv.org/abs/1902.06377
Web services are basic functions of a software system to support the concept of service-oriented architecture. They are often composed together to provide added values, known as web service composition. Researchers often employ Evolutionary Computation techniques to efficiently construct composite services with near-optimized functional quality (i.e., Quality of Semantic Matchmaking) or non-functional quality (i.e., Quality of Service) or both due to the complexity of this problem. With a significant increase in service composition requests, many composition requests have similar input and output requirements but may vary due to different preferences from different user segments. This problem is often treated as a multi-objective service composition so as to cope with different preferences from different user segments simultaneously. Without taking a multi-objective approach that gives rise to a solution selection challenge, we perceive multiple similar service composition requests as jointly forming an evolutionary multi-tasking problem in this work. We propose an effective permutation-based evolutionary multi-tasking approach that can simultaneously generate a set of solutions, with one for each service request. We also introduce a neighborhood structure over multiple tasks to allow newly evolved solutions to be evaluated on related tasks. Our proposed method can perform better at the cost of only a fraction of time, compared to one state-of-art single-tasking EC-based method. We also found that the use of the proper neighborhood structure can enhance the effectiveness of our approach.
http://arxiv.org/abs/1902.06370
Reliable and automatic segmentation of lung lobes is important for diagnosis, assessment, and quantification of pulmonary diseases. The existing techniques are prohibitively slow, undesirably rely on prior (airway/vessel) segmentation, and/or require user interactions for optimal results. This work presents a reliable, fast, and fully automated lung lobe segmentation based on a progressive dense V-network (PDV-Net). The proposed method can segment lung lobes in one forward pass of the network, with an average runtime of 2 seconds using 1 Nvidia Titan XP GPU, eliminating the need for any prior atlases, lung segmentation or any subsequent user intervention. We evaluated our model using 84 chest CT scans from the LIDC and 154 pathological cases from the LTRC datasets. Our model achieved a Dice score of $0.939 \pm 0.02$ for the LIDC test set and $0.950 \pm 0.01$ for the LTRC test set, significantly outperforming a 2D U-net model and a 3D dense V-net. We further evaluated our model against 55 cases from the LOLA11 challenge, obtaining an average Dice score of 0.935—a performance level competitive to the best performing team with an average score of 0.938. Our extensive robustness analyses also demonstrate that our model can reliably segment both healthy and pathological lung lobes in CT scans from different vendors, and that our model is robust against configurations of CT scan reconstruction.
http://arxiv.org/abs/1902.06362
Our goal is to build systems which write code automatically from the kinds of specifications humans can most easily provide, such as examples and natural language instruction. The key idea of this work is that a flexible combination of pattern recognition and explicit reasoning can be used to solve these complex programming problems. We propose a method for dynamically integrating these types of information. Our novel intermediate representation and training algorithm allow a program synthesis system to learn, without direct supervision, when to rely on pattern recognition and when to perform symbolic search. Our model matches the memorization and generalization performance of neural synthesis and symbolic search, respectively, and achieves state-of-the-art performance on a dataset of simple English description-to-code programming problems.
http://arxiv.org/abs/1902.06349
A femtosecond laser focused inside bulk GaN was used to slice a thin GaN film with an epitaxial device structure from a bulk GaN substrate. The demonstrated Laser Slicing lift-off process did not require any special release layers in the epitaxial structure. GaN film with a thickness of 5 $\mu$m and an InGaN LED epitaxial device structure was lifted off a GaN substrate and transferred onto a copper substrate. The electroluminescence of the LED chip after the Laser Slicing lift-off was demonstrated.
https://arxiv.org/abs/1902.06348
Segmentation is a key stage in dermoscopic image processing, where the accuracy of the border line that defines skin lesions is of utmost importance for subsequent algorithms (e.g., classification) and computer-aided early diagnosis of serious medical conditions. This paper proposes a novel segmentation method based on Local Binary Patterns (LBP), where LBP and K-Means clustering are combined to achieve a detailed delineation in dermoscopic images. In comparison with usual dermatologist-like segmentation (i.e., the available ground-truth), the proposed method is capable of finding more realistic borders of skin lesions, i.e., with much more detail. The results also exhibit reduced variability amongst different performance measures and they are consistent across different images. The proposed method can be applied for cell-based like segmentation adapted to the lesion border growing specificities. Hence, the method is suitable to follow the growth dynamics associated with the lesion border geometry in skin melanocytic images.
http://arxiv.org/abs/1902.06347
We address the problem of automatic American Sign Language fingerspelling recognition from video. Prior work has largely relied on frame-level labels, hand-crafted features, or other constraints, and has been hampered by the scarcity of data for this task. We introduce a model for fingerspelling recognition that addresses these issues. The model consists of an auto-encoder-based feature extractor and an attention-based neural encoder-decoder, which are trained jointly. The model receives a sequence of image frames and outputs the fingerspelled word, without relying on any frame-level training labels or hand-crafted features. In addition, the auto-encoder subcomponent makes it possible to leverage unlabeled data to improve the feature learning. The model achieves 11.6% and 4.4% absolute letter accuracy improvement respectively in signer-independent and signer-adapted fingerspelling recognition over previous approaches that required frame-level training labels.
http://arxiv.org/abs/1710.03255
Many machine learning image classifiers are vulnerable to adversarial attacks, inputs with perturbations designed to intentionally trigger misclassification. Current adversarial methods directly alter pixel colors and evaluate against pixel norm-balls: pixel perturbations smaller than a specified magnitude, according to a measurement norm. This evaluation, however, has limited practical utility since perturbations in the pixel space do not correspond to underlying real-world phenomena of image formation that lead to them and has no security motivation attached. Pixels in natural images are measurements of light that has interacted with the geometry of a physical scene. As such, we propose the direct perturbation of physical parameters that underly image formation: lighting and geometry. As such, we propose a novel evaluation measure, parametric norm-balls, by directly perturbing physical parameters that underly image formation. One enabling contribution we present is a physically-based differentiable renderer that allows us to propagate pixel gradients to the parametric space of lighting and geometry. Our approach enables physically-based adversarial attacks, and our differentiable renderer leverages models from the interactive rendering literature to balance the performance and accuracy trade-offs necessary for a memory-efficient and scalable adversarial data augmentation workflow.
http://arxiv.org/abs/1808.02651
Limited lookahead has been studied for decades in complete-information games. We initiate a new direction via two simultaneous deviation points: generalization to incomplete-information games and a game-theoretic approach. We study how one should act when facing an opponent whose lookahead is limited. We study this for opponents that differ based on their lookahead depth, based on whether they, too, have incomplete information, and based on how they break ties. We characterize the hardness of finding a Nash equilibrium or an optimal commitment strategy for either player, showing that in some of these variations the problem can be solved in polynomial time while in others it is PPAD-hard or NP-hard. We proceed to design algorithms for computing optimal commitment strategies—for when the opponent breaks ties favorably, according to a fixed rule, or adversarially. We then experimentally investigate the impact of limited lookahead. The limited-lookahead player often obtains the value of the game if she knows the expected values of nodes in the game tree for some equilibrium—but we prove this is not sufficient in general. Finally, we study the impact of noise in those estimates and different lookahead depths. This uncovers an incomplete-information game lookahead pathology.
http://arxiv.org/abs/1902.06335
In this paper, we generate and control semantically interpretable filters that are directly learned from natural images in an unsupervised fashion. Each semantic filter learns a visually interpretable local structure in conjunction with other filters. The significance of learning these interpretable filter sets is demonstrated on two contrasting applications. The first application is image recognition under progressive decolorization, in which recognition algorithms should be color-insensitive to achieve a robust performance. The second application is image quality assessment where objective methods should be sensitive to color degradations. In the proposed work, the sensitivity and lack thereof are controlled by weighing the semantic filters based on the local structures they represent. To validate the proposed approach, we utilize the CURE-TSR dataset for image recognition and the TID 2013 dataset for image quality assessment. We show that the proposed semantic filter set achieves state-of-the-art performances in both datasets while maintaining its robustness across progressive distortions.
http://arxiv.org/abs/1902.06334
We provide a complete characterisation of the phenomenon of adversarial examples - inputs intentionally crafted to fool machine learning models. We aim to cover all the important concerns in this field of study: (1) the conjectures on the existence of adversarial examples, (2) the security, safety and robustness implications, (3) the methods used to generate and (4) protect against adversarial examples and (5) the ability of adversarial examples to transfer between different machine learning models. We provide ample background information in an effort to make this document self-contained. Therefore, this document can be used as survey, tutorial or as a catalog of attacks and defences using adversarial examples.
http://arxiv.org/abs/1810.01185
Popular deep domain adaptation methods have mainly focused on learning discriminative and domain-invariant features of different domains. In this work, we present a novel approach inspired by human cognitive processes where receptive fields learned from other vision tasks are recruited to recognize new objects. First, representations of the source and target domains are obtained by the variational auto-encoder (VAE) respectively. Then we construct networks with cross-grafted representation stacks (CGRS). There, it recruits the different level representations learned by sliced receptive field, which projects the self-domain latent encodings to a new association space. Finally, we employ the generative adversarial networks (GAN) to pull the associations from the target to the source, mapped to the known label space. This adaptation process contains three phases, information encoding, association generation, and label alignment. Experiments results demonstrate the CGRS bridges the domain gap well, and the proposed model outperforms the state-of-the-art on a number of unsupervised domain adaptation scenarios.
http://arxiv.org/abs/1902.06328
We address the problem of real-time 3D object detection from point clouds in the context of autonomous driving. Computation speed is critical as detection is a necessary component for safety. Existing approaches are, however, expensive in computation due to high dimensionality of point clouds. We utilize the 3D data more efficiently by representing the scene from the Bird’s Eye View (BEV), and propose PIXOR, a proposal-free, single-stage detector that outputs oriented 3D object estimates decoded from pixel-wise neural network predictions. The input representation, network architecture, and model optimization are especially designed to balance high accuracy and real-time efficiency. We validate PIXOR on two datasets: the KITTI BEV object detection benchmark, and a large-scale 3D vehicle detection benchmark. In both datasets we show that the proposed detector surpasses other state-of-the-art methods notably in terms of Average Precision (AP), while still runs at >28 FPS.
http://arxiv.org/abs/1902.06326
Images with abnormal brain anatomy produce problems for automatic segmentation techniques, and as a result poor ROI detection affects both quantitative measurements and visual assessment of perfusion data. This paper presents a new approach for fully automated and relatively accurate ROI detection from dynamic susceptibility contrast perfusion magnetic resonance and can therefore be applied excellently in the perfusion analysis. In the proposed approach the segmentation output is a binary mask of perfusion ROI that has zero values for air pixels, pixels that represent non-brain tissues, and cerebrospinal fluid pixels. The process of binary mask producing starts with extracting low intensity pixels by thresholding. Optimal low-threshold value is solved by obtaining intensity pixels information from the approximate anatomical brain location. Holes filling algorithm and binary region growing algorithm are used to remove falsely detected regions and produce region of only brain tissues. Further, CSF pixels extraction is provided by thresholding of high intensity pixels from region of only brain tissues. Each time-point image of the perfusion sequence is used for adjustment of CSF pixels location. The segmentation results were compared with the manual segmentation performed by experienced radiologists, considered as the reference standard for evaluation of proposed approach. On average of 120 images the segmentation results have a good agreement with the reference standard. All detected perfusion ROIs were deemed by two experienced radiologists as satisfactory enough for clinical use. The results show that proposed approach is suitable to be used for perfusion ROI detection from DSC head scans. Segmentation tool based on the proposed approach can be implemented as a part of any automatic brain image processing system for clinical use.
http://arxiv.org/abs/1902.06323
Regret minimization is a powerful tool for solving large-scale problems; it was recently used in breakthrough results for large-scale extensive-form game solving. This was achieved by composing simplex regret minimizers into an overall regret-minimization framework for extensive-form game strategy spaces. In this paper we study the general composability of regret minimizers. We derive a calculus for constructing regret minimizers for composite convex sets that are obtained from convexity-preserving operations on simpler convex sets. We show that local regret minimizers for the simpler sets can be combined with additional regret minimizers into an aggregate regret minimizer for the composite set. As one application, we show that the CFR framework can be constructed easily from our framework. We also show ways to include curtailing (constraining) operations into our framework. For one, they enables the construction of CFR generalization for extensive-form games with general convex strategy constraints that can cut across decision points.
http://arxiv.org/abs/1811.02540
We propose a new framework for prototypical learning that bases decision-making on few relevant examples that we call prototypes. Our framework utilizes an attention mechanism that relates the encoded representations to determine the prototypes. This results in a model that: (1) enables interpretability by outputting samples most relevant to the decision-making in addition to outputting the classification results; (2) allows confidence-controlled prediction by quantifying the mismatch across prototype labels; (3) permits detection of distribution mismatch; and (4) improves robustness to label noise. We demonstrate that our model is able to maintain comparable performance to baseline models while enabling all these benefits.
http://arxiv.org/abs/1902.06292
The memory physics induced unknown offset of the channel is a critical and difficult issue to be tackled for many non-volatile memories (NVMs). In this paper, we first propose novel neural network (NN) detectors by using the multilayer perceptron (MLP) network and the recurrent neural network (RNN), which can effectively tackle the unknown offset of the channel. However, compared with the conventional threshold detector, the NN detectors will incur a significant delay of the read latency and more power consumption. Therefore, we further propose a novel dynamic threshold detector (DTD), whose detection threshold can be derived based on the outputs of the proposed NN detectors. In this way, the NN-based detection only needs to be invoked when the error correction code (ECC) decoder fails, or periodically when the system is in the idle state. Thereafter, the threshold detector will still be adopted by using the adjusted detection threshold derived base on the outputs of the NN detector, until a further adjustment of the detection threshold is needed. Simulation results demonstrate that the proposed DTD based on the RNN detection can achieve the error performance of the optimum detector, without the prior knowledge of the channel.
https://arxiv.org/abs/1902.06289
For many applications the collection of labeled data is expensive laborious. Exploitation of unlabeled data during training is thus a long pursued objective of machine learning. Self-supervised learning addresses this by positing an auxiliary task (different, but related to the supervised task) for which data is abundantly available. In this paper, we show how ranking can be used as a proxy task for some regression problems. As another contribution, we propose an efficient backpropagation technique for Siamese networks which prevents the redundant computation introduced by the multi-branch network architecture. We apply our framework to two regression problems: Image Quality Assessment (IQA) and Crowd Counting. For both we show how to automatically generate ranked image sets from unlabeled data. Our results show that networks trained to regress to the ground truth targets for labeled data and to simultaneously learn to rank unlabeled data obtain significantly better, state-of-the-art results for both IQA and crowd counting. In addition, we show that measuring network uncertainty on the self-supervised proxy task is a good measure of informativeness of unlabeled data. This can be used to drive an algorithm for active learning and we show that this reduces labeling effort by up to 50%.
http://arxiv.org/abs/1902.06285
The integration of visual-tactile stimulus is common while humans performing daily tasks. In contrast, using unimodal visual or tactile perception limits the perceivable dimensionality of a subject. However, it remains a challenge to integrate the visual and tactile perception to facilitate robotic tasks. In this paper, we propose a novel framework for the cross-modal sensory data generation for visual and tactile perception. Taking texture perception as an example, we apply conditional generative adversarial networks to generate pseudo visual images or tactile outputs from data of the other modality. Extensive experiments on the ViTac dataset of cloth textures show that the proposed method can produce realistic outputs from other sensory inputs. We adopt the structural similarity index to evaluate similarity of the generated output and real data and results show that realistic data have been generated. Classification evaluation has also been performed to show that the inclusion of generated data can improve the perception performance. The proposed framework has potential to expand datasets for classification tasks, generate sensory outputs that are not easy to access, and also advance integrated visual-tactile perception.
http://arxiv.org/abs/1902.06273
Semantic segmentation has made encouraging progress due to the success of deep convolutional networks in recent years. Meanwhile, depth sensors become prevalent nowadays, so depth maps can be acquired more easily. However, there are few studies that focus on the RGB-D semantic segmentation task. Exploiting the depth information effectiveness to improve performance is a challenge. In this paper, we propose a novel solution named LDFNet, which incorporates Luminance, Depth and Color information by a fusion-based network. It includes a sub-network to process depth maps and employs luminance images to assist the depth information in processes. LDFNet outperforms the other state-of-art systems on the Cityscapes dataset, and its inference speed is faster than most of the existing networks. The experimental results show the effectiveness of the proposed multi-modal fusion network and its potential for practical applications.
http://arxiv.org/abs/1809.09077
The Federal Government of Germany aims to boost the research in the field of Artificial Intelligence (AI). For instance, 100 new professorships are said to be established. However, the white paper of the government does not answer what an AI professorship is at all. In order to give colleagues, politicians, and citizens an idea, we present a view that is often followed when appointing professors for AI at German and international universities. We hope that it will help to establish a guideline with internationally accepted measures and thus make the public debate more informed.
http://arxiv.org/abs/1903.09516
In situations where explicit communication is limited, a human collaborator is typically able to learn to: (i) infer the meaning behind their partner’s actions and (ii) balance between taking actions that are exploitative given their current understanding of the state vs. those that can convey private information about the state to their partner. The first component of this learning process has been well-studied in multi-agent systems, whereas the second — which is equally crucial for a successful collaboration — has not. In this work, we complete the learning process and introduce our novel algorithm, Policy Belief Learning (“PBL”), which mimics both components mentioned above. A belief module models the other agent’s private information by observing their actions, whilst a policy module makes use of the inferred private information to return a distribution over actions. They are mutually reinforced with each other and iteratively learned. We use a novel auxiliary reward to encourage information exchange by actions. We evaluate our approach on the non-competitive bidding problem from contract bridge and show that by self-play agents are able to effectively collaborate with implicit communication, and PBL outperforms several meaningful baselines that have been considered.
http://arxiv.org/abs/1810.04444
Image attribute transfer aims to change an input image to a target one with expected attributes, which has received significant attention in recent years. However, most of the existing methods lack the ability to de-correlate the target attributes and irrelevant information, i.e., the other attributes and background information, thus often suffering from blurs and artifacts. To address these issues, we propose a novel Attribute Manifold Encoding GAN (AME-GAN) for fully-featured attribute transfer, which can modify and adjust every detail in the images. Specifically, our method divides the input image into image attribute part and image background part on manifolds, which are controlled by attribute latent variables and background latent variables respectively. Through enforcing attribute latent variables to Gaussian distributions and background latent variables to uniform distributions respectively, the attribute transfer procedure becomes controllable and image generation is more photo-realistic. Furthermore, we adopt a conditional multi-scale discriminator to render accurate and high-quality target attribute images. Experimental results on three popular datasets demonstrate the superiority of our proposed method in both performances of the attribute transfer and image generation quality.
http://arxiv.org/abs/1902.06258
Three-dimensional (3-D) scene reconstruction is one of the key techniques in Augmented Reality (AR), which is related to the integration of image processing and display systems of complex information. Stereo matching is a computer vision based approach for 3-D scene reconstruction. In this paper, we explore an improved stereo matching network, SLED-Net, in which a Single Long Encoder-Decoder is proposed to replace the stacked hourglass network in PSM-Net for better contextual information learning. We compare SLED-Net to state-of-the-art methods recently published, and demonstrate its superior performance on Scene Flow and KITTI2015 test sets.
http://arxiv.org/abs/1902.06255
Data driven segmentation is an important initial step of shape prior-based segmentation methods since it is assumed that the data term brings a curve to a plausible level so that shape and data terms can then work together to produce better segmentations. When purely data driven segmentation produces poor results, the final segmentation is generally affected adversely. One challenge faced by many existing data terms is due to the fact that they consider only pixel intensities to decide whether to assign a pixel to the foreground or to the background region. When the distributions of the foreground and background pixel intensities have significant overlap, such data terms become ineffective, as they produce uncertain results for many pixels in a test image. In such cases, using prior information about the spatial context of the object to be segmented together with the data term can bring a curve to a plausible stage, which would then serve as a good initial point to launch shape-based segmentation. In this paper, we propose a new segmentation approach that combines nonparametric context priors with a learned-intensity-based data term and nonparametric shape priors. We perform experiments for dendritic spine segmentation in both 2D and 3D 2-photon microscopy images. The experimental results demonstrate that using spatial context priors leads to significant improvements.
http://arxiv.org/abs/1901.02513
Unlike other languages, the Arabic language has a morphological complexity which makes the Arabic sentiment analysis is a challenging task. Moreover, the presence of the dialects in the Arabic texts have made the sentiment analysis task is more challenging, due to the absence of specific rules that govern the writing or speaking system. Generally, one of the problems of sentiment analysis is the high dimensionality of the feature vector. To resolve this problem, many feature selection methods have been proposed. In contrast to the dialectal Arabic language, these selection methods have been investigated widely for the English language. This work investigated the effect of feature selection methods and their combinations on dialectal Arabic sentiment classification. The feature selection methods are Information Gain (IG), Correlation, Support Vector Machine (SVM), Gini Index (GI), and Chi-Square. A number of experiments were carried out on dialectical Jordanian reviews with using an SVM classifier. Furthermore, the effect of different term weighting schemes, stemmers, stop words removal, and feature models on the performance were investigated. The experimental results showed that the best performance of the SVM classifier was obtained after the SVM and correlation feature selection methods had been combined with the uni-gram model.
http://arxiv.org/abs/1902.06242
Potential-based reward shaping (PBRS) is a particular category of machine learning methods which aims to improve the learning speed of a reinforcement learning agent by extracting and utilizing extra knowledge while performing a task. There are two steps in the process of transfer learning: extracting knowledge from previously learned tasks and transferring that knowledge to use it in a target task. The latter step is well discussed in the literature with various methods being proposed for it, while the former has been explored less. With this in mind, the type of knowledge that is transmitted is very important and can lead to considerable improvement. Among the literature of both the transfer learning and the potential-based reward shaping, a subject that has never been addressed is the knowledge gathered during the learning process itself. In this paper, we presented a novel potential-based reward shaping method that attempted to extract knowledge from the learning process. The proposed method extracts knowledge from episodes’ cumulative rewards. The proposed method has been evaluated in the Arcade learning environment and the results indicate an improvement in the learning process in both the single-task and the multi-task reinforcement learner agents.
http://arxiv.org/abs/1902.06239
Image colorization achieves more and more realistic results with the increasing computation power of recent deep learning techniques. It becomes more difficult to identify the fake colorized images by human eyes. In this work, we propose a novel forensic method to distinguish between natural images (NIs) and colorized images (CIs) based on convolutional neural network (CNN). Our method is able to achieve high classification accuracy and cope with the challenging scenario of blind detection, i.e., no training sample is available from “unknown” colorization algorithm that we may encounter during the testing phase. This blind detection performance can be regarded as a generalization performance. First, we design and implement a base network, which can attain better performance in terms of classification accuracy and generalization (in most cases) compared with state-of-the-art methods. Furthermore, we design a new branch, which analyzes smaller regions of extracted features, and insert it into the above base network. Consequently, our network can not only improve the classification accuracy, but also enhance the generalization in the vast majority of cases. To further improve the performance of blind detection, we propose to automatically construct negative samples through linear interpolation of paired natural and colorized images. Then, we progressively insert these negative samples into the original training dataset and continue to train the network. Experimental results demonstrate that our method can achieve stable and high generalization performance when tested against different state-of-the-art colorization algorithms.
http://arxiv.org/abs/1902.06222
For dense sampled light field (LF) reconstruction problem, existing approaches focus on a depth-free framework to achieve non-Lambertian performance. However, they trap in the trade-off “either aliasing or blurring” problem, i.e., pre-filtering the aliasing components (caused by the angular sparsity of the input LF) always leads to a blurry result. In this paper, we intend to solve this challenge by introducing an elaborately designed epipolar plane image (EPI) structure within a learning-based framework. Specifically, we start by analytically showing that decreasing the spatial scale of an EPI shows higher efficiency in addressing the aliasing problem than simply adopting pre-filtering. Accordingly, we design a Laplacian Pyramid EPI (LapEPI) structure that contains both low spatial scale EPI (for aliasing) and high-frequency residuals (for blurring) to solve the trade-off problem. We then propose a novel network architecture for the LapEPI structure, termed as LapEPI-net. To ensure the non-Lambertian performance, we adopt a transfer-learning strategy by first pre-training the network with natural images then fine-tuning it with unstructured LFs. Extensive experiments demonstrate the high performance and robustness of the proposed approach for tackling the aliasing-or-blurring problem as well as the non-Lambertian reconstruction.
http://arxiv.org/abs/1902.06221
Pedestrian detection is a research hotspot and a difficult issue in the computer vision such as the Intelligent Surveillance System (ISS), the Intelligent Transport System (ITS), robotics, and automotive safety. However, the human body’s position, angle, and dress in a video scene are complicated and changeable, which have a great influence on the detection accuracy. In this paper, through the analysis on the pros and cons of Census Transform Histogram (CENTRIST), a novel feature is presented for human detection-Ternary CENTRIST (T-CENTRIST). The T-CENTRIST feature takes the relationship between each pixel and its neighborhood pixels into account. Meanwhile, it also considers the relevancy among these neighborhood pixels. Therefore, the proposed feature description method can reflect the silhouette of pedestrian more adequately and accurately than that of CENTRIST. Second, we propose a fast pedestrian detection framework based on T-CENTRIST, which introduces the idea of extended blocks and the integral image. Finally, experimental results verify the effectiveness of the proposed pedestrian detection method.
http://arxiv.org/abs/1902.06218
With the increasing importance of online communities, discussion forums, and customer reviews, Internet “trolls” have proliferated thereby making it difficult for information seekers to find relevant and correct information. In this paper, we consider the problem of detecting and identifying Internet trolls, almost all of which are human agents. Identifying a human agent among a human population presents significant challenges compared to detecting automated spam or computerized robots. To learn a troll’s behavior, we use contextual anomaly detection to profile each chat user. Using clustering and distance-based methods, we use contextual data such as the group’s current goal, the current time, and the username to classify each point as an anomaly. A user whose features significantly differ from the norm will be classified as a troll. We collected 38 million data points from the viral Internet fad, Twitch Plays Pokemon. Using clustering and distance-based methods, we develop heuristics for identifying trolls. Using MapReduce techniques for preprocessing and user profiling, we are able to classify trolls based on 10 features extracted from a user’s lifetime history.
http://arxiv.org/abs/1902.06208
The diurnal cycle of tropical cyclones (TCs) is a daily cycle in clouds that appears in satellite images and may have implications for TC structure and intensity. The diurnal pattern can be seen in infrared (IR) satellite imagery as cyclical pulses in the cloud field that propagate radially outward from the center of nearly all Atlantic-basin TCs. These diurnal pulses, a distinguishing characteristic of the TC diurnal cycle, begin forming in the storm’s inner core near sunset each day and appear as a region of cooling cloud-top temperatures. The area of cooling takes on a ring-like appearance as cloud-top warming occurs on its inside edge and the cooling moves away from the storm overnight, reaching several hundred kilometers from the circulation center by the following afternoon. The state-of-the-art TC diurnal cycle measurement has a limited ability to analyze the behavior beyond qualitative observations. We present a method for quantifying the TC diurnal cycle using one-dimensional persistent homology, a tool from Topological Data Analysis, by tracking maximum persistence and quantifying the cycle using the discrete Fourier transform. Using Geostationary Operational Environmental Satellite IR imagery data from Hurricane Felix (2007), our method is able to detect an approximate daily cycle.
http://arxiv.org/abs/1902.06202
This paper focuses on automatic guided vehicle (AGV) trajectory planning in the presence of moving obstacles with known but complicated trajectories. In order to achieve good solution precision, optimality and unification, the concerned task should be formulated as an optimal control problem, and then discretized into a nonlinear programming (NLP) problem, which is numerically optimized thereafter. Without a near-feasible or near-optimal initial guess, the NLP-solving process is usually slow. With the purpose of accelerating the NLP solution, a search-based rough planning stage is added to generate appropriate initial guesses. Concretely, a continuous state space is formulated, which consists of Cartesian product of 2D configuration space and a time dimension. The rough trajectory is generated by a graph-search based planner, namely the A* algorithm. Herein, the nodes in the graph are constructed by discretizing the aforementioned continuous spatio-temporal space. Through this first-search-then-optimization framework, optimal solutions to unified trajectory planning problems can be obtained fast. Simulations have been conducted to verify the real-time performance of our proposal.
http://arxiv.org/abs/1902.06201
Previous works for PCB defect detection based on image difference and image processing techniques have already achieved promising performance. However, they sometimes fall short because of the unaccounted defect patterns or over-sensitivity about some hyper-parameters. In this work, we design a deep model that accurately detects PCB defects from an input pair of a detect-free template and a defective tested image. A novel group pyramid pooling module is proposed to efficiently extract features of a large range of resolutions, which are merged by group to predict PCB defect of corresponding scales. To train the deep model, a dataset is established, namely DeepPCB, which contains 1,500 image pairs with annotations including positions of 6 common types of PCB defects. Experiment results validate the effectiveness and efficiency of the proposed model by achieving $98.6\%$ mAP @ 62 FPS on DeepPCB dataset. This dataset is now available at: https://github.com/tangsanli5201/DeepPCB.
http://arxiv.org/abs/1902.06197
Conversational agents are systems with a conversational interface that afford interaction in spoken language. These systems are becoming prevalent and are preferred in various contexts and for many users. Despite their increasing success, the automated testing infrastructure to support the effective and efficient development of such systems compared to traditional software systems is still limited. Automated testing framework for conversational systems can improve the quality of these systems by assisting developers to write, execute, and maintain test cases. In this paper, we introduce our work-in-progress automated testing framework, and its realization in the Python programming language. We discuss some research problems in the development of such an automated testing framework for conversational agents. In particular, we point out the problems of the specification of the expected behavior, known as test oracles, and semantic comparison of utterances.
http://arxiv.org/abs/1902.06193