Projects consist of interconnected dimensions such as objectives, time, resources, and environment. Controlled use of these dimensions and their effective scheduling bring project success. The project scheduling process includes defining project activities and estimating the time and resources required for them. Project resource-scheduling problems attracted increasing attention after the Program Evaluation and Review Technique (PERT) and the Critical Path Method (CPM) were developed. However, the complexity and difficulty of the CPM and PERT processes have led to these techniques being applied through artificial-intelligence methods such as the Genetic Algorithm (GA). In this study, an algorithm is proposed and developed that determines the critical path, the critical activities, and the project completion time using a GA, instead of the CPM and PERT techniques traditionally used for network analysis within project management. GAs were chosen because they are an effective method for solving complex optimization problems. The obtained results therefore support correct decisions for the implemented project activities. Thus, using the model based on the dynamic algorithm, optimal results were obtained in a shorter time than with the CPM and PERT techniques. This study is expected to contribute to the performance (time, speed, low error rate, etc.) of other studies.
http://arxiv.org/abs/1902.00659
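For the GA-based critical-path study above, a minimal sketch is possible without any CPM/PERT machinery: assign each activity a random priority, decode a path through the activity network by always following the highest-priority successor, and let the GA maximize total path duration so that the fittest individual corresponds to the critical path. The toy network, operators, and parameters below are illustrative assumptions, not the paper's algorithm.

```python
import random

# Toy activity-on-node network: activity -> (duration, successors).
network = {
    "start": (0, ["A", "B"]),
    "A": (3, ["C"]),
    "B": (2, ["C", "D"]),
    "C": (4, ["end"]),
    "D": (1, ["end"]),
    "end": (0, []),
}

def decode(priorities):
    """Follow highest-priority successors from start to end; return path and duration."""
    node, path, duration = "start", ["start"], 0
    while network[node][1]:
        node = max(network[node][1], key=lambda n: priorities[n])
        path.append(node)
        duration += network[node][0]
    return path, duration

def genetic_critical_path(pop_size=30, generations=50, mutation=0.1):
    acts = list(network)
    pop = [{a: random.random() for a in acts} for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: decode(ind)[1], reverse=True)   # longest path = fittest
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = random.sample(parents, 2)
            child = {a: random.choice((p1[a], p2[a])) for a in acts}  # uniform crossover
            if random.random() < mutation:
                child[random.choice(acts)] = random.random()          # mutation
            children.append(child)
        pop = parents + children
    return decode(max(pop, key=lambda ind: decode(ind)[1]))

print(genetic_critical_path())   # critical path here: start -> A -> C -> end (duration 7)
```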
The recent proposal of learned index structures opens up a new perspective on how traditional range indexes can be optimized. However, current learned indexes assume that the data distribution is relatively static and the access pattern is uniform, while real-world scenarios consist of skewed query distributions and evolving data. In this paper, we demonstrate that the missing consideration of access patterns and dynamic data distributions notably hinders the applicability of learned indexes. To this end, we propose solutions for learned indexes for dynamic workloads (called Doraemon). To improve the latency for skewed queries, Doraemon augments the training data with access frequencies. To address slow model re-training when the data distribution shifts, Doraemon caches previously trained models and incrementally fine-tunes them for similar access patterns and data distributions. Our preliminary results show that Doraemon improves query latency by 45.1% and reduces the model re-training time to 1/20.
http://arxiv.org/abs/1902.00655
Deep gated convolutional networks have proven to be very effective for single-channel speech separation. However, current state-of-the-art frameworks often train the gated convolutional networks in the time-frequency (TF) domain. Such an approach results in a limited perceptual score, such as the signal-to-distortion ratio (SDR) upper bound of the separated utterances, and also fails to exploit an end-to-end framework. In this paper we present a simple and effective integrated end-to-end approach to monaural speech separation, which consists of a deep gated convolutional neural network (GCNN) that takes the mixed utterance of two speakers and maps it to two separated utterances, where each utterance contains only one speaker's voice. In addition, long short-term memory (LSTM) is employed for long-term temporal modeling. For the objective, we propose to train the network by directly optimizing the utterance-level SDR in a permutation invariant training (PIT) style. Our experiments on the public WSJ0-2mix corpus demonstrate that this new scheme produces more discriminative separated utterances, leading to improved performance on the speaker separation task.
http://arxiv.org/abs/1902.00651
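For the utterance-level PIT objective in the separation paper above, a minimal sketch (using SI-SDR as a stand-in for the paper's SDR criterion) evaluates the loss under every assignment of network outputs to reference speakers and keeps only the best permutation; the variable names and toy signals are illustrative, not the paper's setup.

```python
import numpy as np
from itertools import permutations

def si_sdr(est, ref, eps=1e-8):
    """Scale-invariant SDR (dB) between an estimated and a reference utterance."""
    ref_energy = np.sum(ref ** 2) + eps
    proj = np.sum(est * ref) / ref_energy * ref      # projection of est onto ref
    noise = est - proj
    return 10 * np.log10((np.sum(proj ** 2) + eps) / (np.sum(noise ** 2) + eps))

def pit_loss(estimates, references):
    """Utterance-level PIT: negative SI-SDR under the best output/reference pairing."""
    n = len(references)
    best = -np.inf
    for perm in permutations(range(n)):
        score = np.mean([si_sdr(estimates[i], references[p]) for i, p in enumerate(perm)])
        best = max(best, score)
    return -best  # minimize the negative SI-SDR of the best permutation

# Toy example with two 1-second "utterances".
rng = np.random.default_rng(0)
refs = [rng.standard_normal(8000), rng.standard_normal(8000)]
ests = [refs[1] + 0.1 * rng.standard_normal(8000),  # outputs arrive in swapped order
        refs[0] + 0.1 * rng.standard_normal(8000)]
print(pit_loss(ests, refs))  # low loss: PIT finds the swapped assignment
```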
During human-robot interaction (HRI), we want the robot to understand us, and we want to intuitively understand the robot. In order to communicate with and understand the robot, we can leverage interactions, where the human and robot observe each other’s behavior. However, it is not always clear how the human and robot should interpret these actions: a given interaction might mean several different things. Within today’s state-of-the-art, the robot assigns a single interaction strategy to the human, and learns from or teaches the human according to this fixed strategy. Instead, we here recognize that different users interact in different ways, and so one size does not fit all. Therefore, we argue that the robot should maintain a distribution over the possible human interaction strategies, and then infer how each individual end-user interacts during the task. We formally define learning and teaching when the robot is uncertain about the human’s interaction strategy, and derive solutions to both problems using Bayesian inference. In examples and a benchmark simulation, we show that our personalized approach outperforms standard methods that maintain a fixed interaction strategy.
http://arxiv.org/abs/1902.00646
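In its simplest discrete form, the idea above of maintaining a distribution over possible human interaction strategies is a Bayesian update over a finite strategy set. The likelihood table and action stream below are hypothetical, purely to show how the robot's belief concentrates on the strategy that best explains the observed behavior.

```python
import numpy as np

# Hypothetical strategy set: each row is P(action | strategy) over three discrete actions.
likelihoods = np.array([
    [0.7, 0.2, 0.1],   # "demonstrator": mostly gives full demonstrations
    [0.2, 0.6, 0.2],   # "corrector":    mostly gives small corrections
    [0.1, 0.2, 0.7],   # "observer":     mostly stays passive
])
belief = np.full(len(likelihoods), 1.0 / len(likelihoods))  # uniform prior

def update_belief(belief, action):
    """Bayes rule: posterior over strategies after observing one human action."""
    posterior = belief * likelihoods[:, action]
    return posterior / posterior.sum()

for action in [1, 1, 2, 1]:            # stream of observed human actions
    belief = update_belief(belief, action)
print(belief)                          # mass concentrates on the "corrector" strategy
```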
A hashing method maps similar high-dimensional data to binary hashcodes with smaller Hamming distance, and it has received broad attention due to its low storage cost and fast retrieval speed. Pairwise similarity is easily obtained and widely used for retrieval, and most supervised hashing algorithms are carefully designed for pairwise supervision. As labeling all data pairs is difficult, semi-supervised hashing has been proposed, which aims at learning efficient codes with limited labeled pairs and abundant unlabeled ones. Existing methods build graphs to capture the structure of the dataset, but they do not work well for complex data, as the graph is built from the data representations and determining representations of complex data is difficult. In this paper, we propose a novel teacher-student semi-supervised hashing framework in which the student is trained with the pairwise information produced by the teacher network. The network follows the smoothness assumption and achieves consistent distances for similar data pairs, so that the retrieval results are similar for neighborhood queries. Experiments on large-scale datasets show that the proposed method achieves impressive gains over supervised baselines and is superior to state-of-the-art semi-supervised hashing methods.
http://arxiv.org/abs/1902.00643
A hashing method maps similar data to binary hashcodes with smaller Hamming distance, and has received broad attention due to its low storage cost and fast retrieval speed. With the rapid development of deep learning, deep hashing methods have achieved promising results in efficient information retrieval. Most existing deep hashing methods adopt pairwise or triplet losses to deal with the similarities underlying the data, but training is difficult and less efficient because $O(n^2)$ data pairs and $O(n^3)$ triplets are involved. To address these issues, we propose a novel deep hashing algorithm with a unary loss that can be trained very efficiently. We first introduce a Unary Upper Bound of the traditional triplet loss, thus reducing the complexity to $O(n)$ and bridging the classification-based unary loss and the triplet loss. Second, we propose a novel Semantic Cluster Deep Hashing (SCDH) algorithm by introducing a modified Unary Upper Bound loss, named Semantic Cluster Unary Loss (SCUL). The resultant hashcodes form several compact clusters, which means that hashcodes in the same cluster have similar semantic information. We also demonstrate that the proposed SCDH can easily be extended to semi-supervised settings by incorporating state-of-the-art semi-supervised learning algorithms. Experiments on large-scale datasets show that the proposed method is superior to state-of-the-art hashing algorithms.
http://arxiv.org/abs/1805.08705
The short-time Fourier transform (STFT) is used as the front end of many popular and successful monaural speech separation methods, such as deep clustering (DPCL), permutation invariant training (PIT), and their various variants. However, the frequency resolution of the STFT is linear, while the frequency resolution of the human auditory system is nonlinear. In this work we propose and empirically study an alternative front end, the constant Q transform (CQT), in place of the STFT, to better approximate the frequency resolving power of the human auditory system. The upper bound on the signal-to-distortion ratio (SDR) of ideal speech separation based on the CQT's ideal ratio mask (IRM) is higher than that based on the STFT. In the same experimental setting on the WSJ0-2mix corpus, we examine the performance of the CQT under different backends, including the original DPCL, utterance-level PIT, and some of their variants. We find that all CQT-based methods outperform their STFT-based counterparts, achieving on average 0.4 dB better SDR improvement.
http://arxiv.org/abs/1902.00631
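To make the front-end swap above concrete, the following sketch builds ideal ratio masks on CQT magnitudes with librosa (assuming two single-speaker signals and their mixture are available as NumPy arrays); it mirrors how an STFT-based IRM would be computed, with librosa.cqt replacing librosa.stft. The sample rate, hop length, and bin counts are placeholder choices, not the paper's settings.

```python
import numpy as np
import librosa

sr = 16000
# Placeholder signals; in practice these would be WSJ0-2mix utterances.
s1 = np.random.randn(sr * 2).astype(np.float32)
s2 = np.random.randn(sr * 2).astype(np.float32)
mix = s1 + s2

def cqt_mag(y):
    # Constant-Q transform magnitude; parameters chosen arbitrarily for the sketch.
    return np.abs(librosa.cqt(y, sr=sr, hop_length=256, n_bins=72, bins_per_octave=12))

M1, M2, Mmix = cqt_mag(s1), cqt_mag(s2), cqt_mag(mix)

# Ideal ratio masks on the CQT magnitudes (eps avoids division by zero).
eps = 1e-8
irm1 = M1 / (M1 + M2 + eps)
irm2 = M2 / (M1 + M2 + eps)

# Masked mixture magnitudes approximate the individual sources.
est1, est2 = irm1 * Mmix, irm2 * Mmix
print(irm1.shape, est1.shape)
```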
Congealing is a flexible nonparametric data-driven framework for the joint alignment of data. It has been successfully applied to the joint alignment of binary images of digits, binary images of object silhouettes, grayscale MRI images, color images of cars and faces, and 3D brain volumes. This research enhances congealing to practically and effectively apply it to curve data. We develop a parameterized set of nonlinear transformations that allow us to apply congealing to this type of data. We present positive results on aligning synthetic and real curve data sets and conclude with a discussion on extending this work to simultaneous alignment and clustering.
http://arxiv.org/abs/1902.00626
A question answering system (QA system) was developed that uses graph-pattern association rules on the YAGO knowledge base. The answer, as the output of the system, is provided based on a user question as input. If the answer is missing or unavailable in the database, then graph-pattern association rules are used to obtain the answer. The architecture of this question answering system is as follows: question classification, graph component generation, query generation, and query processing. The question answering system uses association graph patterns in a waterfall model. In this paper, the architecture of the system is described, specifically discussing its reasoning and performance capabilities. The results of this research show that rules with high confidence and correct logic produce correct answers, and vice versa.
http://arxiv.org/abs/1902.00624
Cross-modal similarity search is the problem of designing a search system that supports querying across content modalities, e.g., using an image to search for texts or using a text to search for images. This paper presents a compact coding solution for efficient search, with a focus on the quantization approach, which has already shown superior performance over hashing solutions in single-modal similarity search. We propose a cross-modal quantization approach, which is among the early attempts to introduce quantization into cross-modal search. The major contribution lies in jointly learning the quantizers for both modalities through aligning the quantized representations for each pair of image and text belonging to a document. In addition, our approach simultaneously learns the common space for both modalities in which quantization is conducted to enable efficient and effective search using the Euclidean distance computed in the common space with fast distance table lookup. Experimental results compared with several competitive algorithms over three benchmark datasets demonstrate that the proposed approach achieves state-of-the-art performance.
http://arxiv.org/abs/1902.00623
In this paper, we address the problem of searching for semantically similar images from a large database. We present a compact coding approach, supervised quantization. Our approach simultaneously learns feature selection that linearly transforms the database points into a low-dimensional discriminative subspace, and quantizes the data points in the transformed space. The optimization criterion is that the quantized points not only approximate the transformed points accurately, but also are semantically separable: the points belonging to a class lie in a cluster that does not overlap with the clusters corresponding to other classes, which is formulated as a classification problem. Experiments on several standard datasets show the superiority of our approach over state-of-the-art supervised hashing and unsupervised quantization algorithms.
http://arxiv.org/abs/1902.00617
Multiple automakers have automated driving systems (ADS) in development or in production that offer freeway-pilot functions. This type of ADS is typically limited to restricted-access freeways only; that is, the transition from manual to automated mode takes place only after the ramp merging process is completed manually. One major challenge in extending the automation to ramp merging is that the automated vehicle needs to incorporate and optimize long-term objectives (e.g., a successful and smooth merge) while near-term actions must be safely executed. Moreover, the merging process involves interactions with other vehicles whose behaviors are sometimes hard to predict but may influence the merging vehicle's optimal actions. To tackle such a complicated control problem, we propose to apply Deep Reinforcement Learning (DRL) techniques for finding an optimal driving policy by maximizing the long-term reward in an interactive environment. Specifically, we apply a Long Short-Term Memory (LSTM) architecture to model the interactive environment, from which an internal state containing historical driving information is conveyed to a Deep Q-Network (DQN). The DQN is used to approximate the Q-function, which takes the internal state as input and generates Q-values as output for action selection. With this DRL architecture, the historical impact of the interactive environment on the long-term reward can be captured and taken into account when deciding the optimal control policy. The proposed architecture has the potential to be extended and applied to other autonomous driving scenarios such as driving through a complex intersection or changing lanes under varying traffic flow conditions.
http://arxiv.org/abs/1709.02066
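A minimal PyTorch sketch of the architecture described above: an LSTM summarizes the recent interaction history into an internal state, and a feed-forward Q-network maps that state to Q-values over a discrete set of merging actions. The observation dimension, hidden sizes, and action count are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LSTMDQN(nn.Module):
    """LSTM encoder over the driving history + MLP head producing Q-values."""
    def __init__(self, obs_dim=8, hidden_dim=64, n_actions=5):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.q_head = nn.Sequential(
            nn.Linear(hidden_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),            # one Q-value per discrete action
        )

    def forward(self, history):
        # history: (batch, time, obs_dim) sequence of past observations
        _, (h_n, _) = self.lstm(history)         # h_n: (1, batch, hidden_dim)
        return self.q_head(h_n.squeeze(0))       # (batch, n_actions)

model = LSTMDQN()
obs_history = torch.randn(4, 20, 8)              # 4 episodes, 20 past steps each
q_values = model(obs_history)
greedy_actions = q_values.argmax(dim=1)          # epsilon-greedy would add exploration
print(q_values.shape, greedy_actions)
```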
Over the past two decades, Unmanned Aerial Vehicles (UAVs), more commonly known as drones, have gained a lot of attention and are rapidly becoming ubiquitous because of their diverse applications such as surveillance, disaster management, pollution monitoring, film-making, and military reconnaissance. However, incidents such as fatal system failures, malicious attacks, and disastrous misuses have raised concerns in the recent past, and security and viability concerns in drone-based applications are growing at an alarming rate. Moreover, UAV networks (UAVNets) are distinct from other ad-hoc networks. Therefore, it is necessary to address these issues to ensure the proper functioning of UAVs while keeping their uniqueness in mind. Furthermore, adequate security and functionality require the consideration of many parameters, which may include an accurate cognizance of the working mechanism of the vehicles, geographical and weather conditions, and UAVNet communication. This is achievable by creating a simulator that includes these aspects. Performance evaluation through a relevant drone simulator thus becomes an indispensable procedure for testing features, configurations, and designs, and for demonstrating superiority over comparative schemes and suitability for the intended application. It is therefore of paramount importance to establish the credibility of simulation results by investigating the merits and limitations of each simulator prior to selection. Based on this motivation, we present a comprehensive survey of current drone simulators. In addition, open research issues and challenges are discussed.
http://arxiv.org/abs/1902.00616
With deep learning based image analysis becoming popular in recent years, many multiple-object tracking applications are in demand. Some of these applications (e.g., surveillance cameras, intelligent robotics, and autonomous driving) require the system to run in real time. Though recently proposed methods reach fairly high accuracy, their speed is still slower than real-time application requirements. In order to increase a tracking-by-detection system's speed for real-time tracking, we propose a confidence trigger detection (CTD) approach, which uses the confidence of the tracker to decide when to trigger object detection. Using this approach, the system can safely skip detection on frames in which objects barely move. We studied the influence of different confidence thresholds on three popular detectors separately. Though we found a trade-off between speed and accuracy, our approach reaches higher accuracy at a given speed.
http://arxiv.org/abs/1902.00615
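The control flow of confidence-triggered detection is simple enough to sketch directly: run the expensive detector only when the tracker's confidence drops below a threshold, and otherwise propagate tracks cheaply. The detector and tracker below are dummy stand-ins so the sketch runs; any real detector/tracker pair exposing the same calls could be substituted.

```python
class DummyDetector:
    def detect(self, frame):                       # stand-in for a CNN detector
        return [(10, 10, 50, 50)]

class DummyTracker:
    def __init__(self): self.boxes, self.conf = [], 0.0
    def confidence(self): return self.conf
    def reset(self, frame, detections): self.boxes, self.conf = detections, 1.0
    def update(self, frame): self.conf *= 0.9      # confidence decays as objects move
    def current_boxes(self): return self.boxes

def track_video(frames, detector, tracker, conf_threshold=0.6):
    """Tracking-by-detection that skips the detector on high-confidence frames."""
    results = []
    for i, frame in enumerate(frames):
        if i == 0 or tracker.confidence() < conf_threshold:
            tracker.reset(frame, detector.detect(frame))   # expensive detection pass
        else:
            tracker.update(frame)                           # cheap tracker-only update
        results.append(tracker.current_boxes())
    return results

print(len(track_video(range(30), DummyDetector(), DummyTracker())))
```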
Word embedding is a powerful tool in natural language processing. In this paper we consider the problem of word embedding composition: given vector representations of two words, compute a vector for the entire phrase. We give a generative model that can capture specific syntactic relations between words. Under our model, we prove that the correlations between three words (measured by their PMI) form a tensor that has an approximate low-rank Tucker decomposition. The result of the Tucker decomposition gives the word embeddings as well as a core tensor, which can be used to produce better compositions of the word embeddings. We also complement our theoretical results with experiments that verify our assumptions, and demonstrate the effectiveness of the new composition method.
http://arxiv.org/abs/1902.00613
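A small sketch of the decomposition step above using tensorly: a low-rank Tucker decomposition of a (here toy, random) third-order PMI tensor yields one factor matrix per mode, usable as word embeddings, plus a core tensor that can correct the plain additive composition of two embeddings. The vocabulary size, ranks, and the composition rule shown are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

vocab_size, rank = 50, 5
# Toy stand-in for the 3-way PMI tensor over word triples.
pmi = tl.tensor(np.random.rand(vocab_size, vocab_size, vocab_size))

core, factors = tucker(pmi, rank=[rank, rank, rank])
embeddings = factors[0]                     # rows: rank-dimensional word embeddings
print(core.shape, embeddings.shape)         # (5, 5, 5) (50, 5)

# Composition sketch: contract the core with two word embeddings to obtain a
# correction term on top of the plain additive composition.
def compose(core, u, v):
    correction = np.einsum('abc,a,b->c', tl.to_numpy(core), u, v)
    return u + v + correction

phrase_vec = compose(core, embeddings[3], embeddings[7])
print(phrase_vec.shape)                     # (5,)
```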
Iterative learning control has been successfully used for several decades to improve the performance of control systems that perform a single repeated task. Using information from prior control executions, learning controllers gradually determine open-loop control inputs whose reference tracking performance can exceed that of traditional feedback-feedforward control algorithms. This paper considers iterative learning control for a previously unexplored field: autonomous racing. Racecars are driven multiple laps around the same sequence of turns while operating near the physical limits of tire-road friction, where steering dynamics become highly nonlinear and transient, making accurate path tracking difficult. However, because the vehicle trajectory is identical for each lap in the case of single-car racing, the nonlinear vehicle dynamics and unmodelled road conditions are repeatable and can be accounted for using iterative learning control, provided the tire force limits have not been exceeded. This paper describes the design and application of proportional-derivative (PD) and quadratically optimal (Q-ILC) learning algorithms for multiple-lap path tracking of an autonomous race vehicle. Simulation results are used to tune controller gains and test convergence, and experimental results are presented on an Audi TTS race vehicle driving several laps around Thunderhill Raceway in Willows, CA at lateral accelerations of up to 8 $\mathrm{m/s^2}$. Both control algorithms are able to correct transient path tracking errors and improve the performance provided by a reference feedforward controller.
http://arxiv.org/abs/1902.00611
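The PD-type learning update at the heart of such a controller is compact enough to show directly: after each lap, the feedforward input is corrected by proportional and derivative terms of that lap's tracking error. The toy first-order plant below is a hypothetical stand-in for the vehicle dynamics; it only illustrates how the error shrinks over iterations.

```python
import numpy as np

dt, T = 0.01, 2.0
t = np.arange(0, T, dt)
reference = np.sin(2 * np.pi * t)            # trajectory to track on every "lap"

def run_plant(u):
    """Toy first-order plant y' = -y + u, standing in for the vehicle dynamics."""
    y = np.zeros_like(u)
    for k in range(1, len(u)):
        y[k] = y[k - 1] + dt * (-y[k - 1] + u[k - 1])
    return y

kp, kd = 0.8, 0.1
u = np.zeros_like(t)                         # feedforward input, refined each lap
for lap in range(10):
    e = reference - run_plant(u)             # tracking error for this lap
    de = np.gradient(e, dt)
    u = u + kp * e + kd * de                 # PD-type ILC update
    print(f"lap {lap}: RMS error = {np.sqrt(np.mean(e**2)):.4f}")
```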
The problem of maneuvering a vehicle through a race course in minimum time requires computation of both longitudinal (brake and throttle) and lateral (steering wheel) control inputs. Unfortunately, solving the resulting nonlinear optimal control problem is typically computationally expensive and infeasible for real-time trajectory planning. This paper presents an iterative algorithm that divides the path generation task into two sequential subproblems that are significantly easier to solve. Given an initial path through the race track, the algorithm runs a forward-backward integration scheme to determine the minimum-time longitudinal speed profile, subject to tire friction constraints. With this fixed speed profile, the algorithm updates the vehicle's path by solving a convex optimization problem that minimizes the resulting path curvature while staying within track boundaries and obeying affine, time-varying vehicle dynamics constraints. This two-step process is repeated iteratively until the predicted lap time no longer improves. While providing no guarantees of convergence or a globally optimal solution, the approach performs very well when validated on the Thunderhill Raceway course in Willows, CA. The predicted lap time converges after four to five iterations, with each iteration over the full 4.5 km race course requiring only thirty seconds of computation time on a laptop computer. The resulting trajectory is experimentally driven at the race circuit with an autonomous Audi TTS test vehicle, and the resulting lap time and racing line are comparable to both a nonlinear gradient descent solution and a trajectory recorded from a professional racecar driver. The experimental results indicate that the proposed method is a viable option for online trajectory planning in the near future.
http://arxiv.org/abs/1902.00606
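The forward-backward integration step above can be sketched in a few lines: first a pointwise speed cap from the lateral-friction constraint (v^2 |kappa| <= mu g), then a forward pass bounding acceleration and a backward pass bounding braking. The curvature profile and limits below are illustrative placeholders, not the Thunderhill data.

```python
import numpy as np

mu, g = 0.9, 9.81            # friction coefficient, gravity
a_max, b_max = 3.0, 6.0      # available longitudinal accel / braking (m/s^2)
ds = 1.0                     # path discretization step (m)

s = np.arange(0, 500, ds)                            # 500 m of path
kappa = 0.02 * np.abs(np.sin(2 * np.pi * s / 250))   # toy curvature profile (1/m)

# 1) Pointwise speed cap from lateral friction: v^2 * |kappa| <= mu * g
v = np.sqrt(mu * g / np.maximum(np.abs(kappa), 1e-6))
v = np.minimum(v, 60.0)                              # global cap on straights

# 2) Forward pass: limited acceleration, v_{i+1}^2 <= v_i^2 + 2 a_max ds
for i in range(len(v) - 1):
    v[i + 1] = min(v[i + 1], np.sqrt(v[i] ** 2 + 2 * a_max * ds))

# 3) Backward pass: limited braking, v_i^2 <= v_{i+1}^2 + 2 b_max ds
for i in range(len(v) - 2, -1, -1):
    v[i] = min(v[i], np.sqrt(v[i + 1] ** 2 + 2 * b_max * ds))

segment_time = np.sum(ds / np.maximum(v[:-1], 1e-6))
print(f"predicted segment time: {segment_time:.1f} s")
```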
Generating explanations of its behavior is an essential capability for a robotic teammate. Explanations help human partners better understand the situation and maintain trust in their teammates. Prior work on robots generating explanations focuses on providing the reasoning behind their decision making. These approaches, however, fail to heed the cognitive requirements of understanding an explanation. In other words, while they provide the right explanations from the explainer's perspective, the explainee part of the equation is ignored. In this work, we address an important aspect along this direction that contributes to a better understanding of a given explanation, which we refer to as the progressiveness of explanations. A progressive explanation improves understanding by limiting the cognitive effort required at each step of making the explanation. As a result, such explanations are expected to be smoother and hence easier to understand. A general formulation of progressive explanation is presented. Algorithms are provided based on several alternative quantifications of cognitive effort as an explanation is being made, and they are evaluated in a standard planning competition domain.
http://arxiv.org/abs/1902.00604
As autonomous vehicles (AVs) inch closer to reality, a central requirement for acceptance will be earning the trust of humans in everyday driving situations. In particular, the interaction between AVs and pedestrians is of high importance, as every human is a pedestrian at some point of the day. This paper considers the interaction of a pedestrian and an autonomous vehicle at a mid-block, unsignalized intersection where there is ambiguity over when the pedestrian should cross and when and how the vehicle should yield. By modeling pedestrian behavior through the concept of gap acceptance, the authors show that a hybrid controller with just four distinct modes allows an autonomous vehicle to successfully interact with a pedestrian across a continuous spectrum of possible crosswalk entry behaviors. The controller is validated through extensive simulation and compared to an alternate POMDP solution and experimental results are provided on a research vehicle for a virtual pedestrian.
http://arxiv.org/abs/1902.00597
Intuitively, human readers cope easily with errors in text; typos, misspellings, word substitutions, etc. do not unduly disrupt natural reading. Previous work indicates that letter transpositions result in increased reading times, but it is unclear if this effect generalizes to more natural errors. In this paper, we report an eye-tracking study that compares two error types (letter transpositions and naturally occurring misspellings) and two error rates (10% or 50% of all words contain errors). We find that human readers show unimpaired comprehension in spite of these errors, but error words cause more reading difficulty than correct words. Also, transpositions are more difficult than misspellings, and a high error rate increases difficulty for all words, including correct ones. We then present a computational model that uses character-based (rather than traditional word-based) surprisal to account for these results. The model explains that transpositions are harder than misspellings because they contain unexpected letter combinations. It also explains the error rate effect: upcoming words are more difficult to predict when the context is degraded, leading to increased surprisal.
http://arxiv.org/abs/1902.00595
In this paper, we present a generative retrieval method for a sponsored search engine, which uses neural machine translation (NMT) to generate keywords directly from the query. This method is completely end-to-end, skipping the query rewriting and relevance judging phases of traditional retrieval systems. Different from standard machine translation, the target space in the retrieval setting is a constrained closed set, where only committed keywords should be generated. We present a Trie-based pruning technique in beam search to address this problem. The biggest challenge in deploying this method in a real industrial environment is the latency impact of running the decoder. Self-normalized training coupled with Trie-based dynamic pruning dramatically reduces the inference time, yielding a speedup of more than 20 times. We also devise a mixed online-offline serving architecture to reduce latency and CPU consumption. To encourage the NMT model to generate new keywords not covered by the existing system, training data is carefully selected. This model has been successfully applied in Baidu's commercial search engine as a supplementary retrieval branch, bringing a remarkable revenue improvement of more than 10 percent.
https://arxiv.org/abs/1902.00592
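The Trie-based pruning idea above can be sketched without any NMT machinery: keep a trie over the token sequences of all committed keywords, and at each beam-search step expand only the continuations that exist as children of the current trie node, so every finished hypothesis is a valid keyword. The scoring function below is a stand-in for the decoder's log-probabilities.

```python
import math

def build_trie(keywords):
    """Trie over tokenized keywords; '<eos>' marks a complete keyword."""
    root = {}
    for kw in keywords:
        node = root
        for tok in kw.split() + ["<eos>"]:
            node = node.setdefault(tok, {})
    return root

def beam_search(trie, score_fn, beam_size=3, max_len=5):
    beams = [([], 0.0, trie)]                        # (tokens, log-prob, trie node)
    finished = []
    for _ in range(max_len):
        candidates = []
        for toks, logp, node in beams:
            for tok, child in node.items():          # prune: only trie children survive
                if tok == "<eos>":
                    finished.append((toks, logp))
                else:
                    candidates.append((toks + [tok], logp + score_fn(toks, tok), child))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_size]
        if not beams:
            break
    return sorted(finished, key=lambda b: b[1], reverse=True)

keywords = ["cheap flights", "cheap hotels", "flight tickets", "hotel deals"]
trie = build_trie(keywords)
# Stand-in scorer: prefer tokens related to "flight" for an imagined flight query.
score_fn = lambda prefix, tok: 0.0 if "flight" in tok else math.log(0.5)
for toks, logp in beam_search(trie, score_fn):
    print(" ".join(toks), round(logp, 2))
```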
We learn end-to-end point-to-point and path-following navigation behaviors that avoid moving obstacles. These policies receive noisy lidar observations and output robot linear and angular velocities. The policies are trained in small, static environments with AutoRL, an evolutionary automation layer around Reinforcement Learning (RL) that searches for a deep RL reward and neural network architecture with large-scale hyper-parameter optimization. AutoRL first finds a reward that maximizes task completion, and then finds a neural network architecture that maximizes the cumulative value of that reward. Empirical evaluations, both in simulation and on-robot, show that AutoRL policies do not suffer from the catastrophic forgetfulness that plagues many other deep reinforcement learning algorithms, generalize to new environments and moving obstacles, are robust to sensor, actuator, and localization noise, and can serve as robust building blocks for larger navigation tasks. Our path-following and point-to-point policies are respectively 23% and 26% more successful than comparison methods across new environments. Video at: https://youtu.be/0UwkjpUEcbI
http://arxiv.org/abs/1809.10124
This paper presents Recurrent Dual Attention Network (ReDAN) for visual dialog, using multi-step reasoning to answer a series of questions about an image. In each turn of the dialog, ReDAN infers answers progressively through multiple steps. In each step, a recurrently-updated semantic representation of the (refined) query is used for iterative reasoning over both the image and previous dialog history. Experimental results on VisDial v1.0 dataset show that the proposed ReDAN model outperforms prior state-of-the-art approaches across multiple evaluation metrics. Visualization on the iterative reasoning process further demonstrates that ReDAN can locate context-relevant visual and textual clues leading to the correct answers step-by-step.
http://arxiv.org/abs/1902.00579
Adversarial attacks and the development of (deep) neural networks robust against them are currently two widely researched topics. The robustness of Learning Vector Quantization (LVQ) models against adversarial attacks has, however, not yet been studied to the same extent. We therefore present an extensive evaluation of three LVQ models: Generalized LVQ, Generalized Matrix LVQ and Generalized Tangent LVQ. The evaluation suggests that both Generalized LVQ and Generalized Tangent LVQ have a high base robustness, on par with the current state-of-the-art in robust neural network methods. In contrast to this, Generalized Matrix LVQ shows a high susceptibility to adversarial attacks, scoring consistently behind all other models. Additionally, our numerical evaluation indicates that increasing the number of prototypes per class improves the robustness of the models.
http://arxiv.org/abs/1902.00577
Voice controlled virtual assistants (VAs) are now available in smartphones, cars, and standalone devices in homes. In most cases, the user needs to first "wake up" the VA by saying a particular word or phrase every time he or she wants the VA to do something. Eliminating the need to say the wake-up word for every interaction could improve the user experience. This would require the VA to have the capability to detect the speech that is being directed at it and respond accordingly. In other words, the challenge is to distinguish between system-directed and non-system-directed speech utterances. In this paper, we present a number of neural network architectures for tackling this classification problem based on using only acoustic features. These architectures are based on convolutional, recurrent and feed-forward layers. In addition, we investigate the use of an attention mechanism applied to the output of the convolutional and the recurrent layers. It is shown that incorporating the proposed attention mechanism into the models always leads to significant improvement in classification accuracy. The best model achieved equal error rates of 16.25 and 15.62 percent on two distinct realistic datasets.
http://arxiv.org/abs/1902.00570
We study the efficacy of SHIELD in the face of alternative threat models. We find that SHIELD’s robustness decreases by 65% (accuracy drops from 63% to 22%) against an adaptive adversary (one who knows JPEG compression is being used as a pre-processing step but not necessarily the compression level) in the gray-box threat model (adversary is aware of the model architecture but not necessarily the weights of that model). However, these adversarial examples are, so far, unable to force a targeted prediction. We also find that the robustness of the JPEG-trained models used in SHIELD decreases by 67% (accuracy drops from 57% to 19% on average) against an adaptive adversary in the gray-box threat model. The addition of SLQ pre-processing to these JPEG-trained models is also not a robust defense (accuracy drops to 0.1%) against an adaptive adversary in the gray-box threat model, and an adversary can create adversarial perturbations that force a chosen prediction. We find that neither JPEG-trained models with SLQ pre-processing nor SHIELD are robust against an adaptive adversary in the white-box threat model (accuracy is 0.1%) and the adversary can control the predicted output of their adversarial images. Finally, ensemble-based attacks transfer better (29.8% targeted accuracy) than non-ensemble based attacks (1.4%) against the JPEG-trained models in SHIELD.
http://arxiv.org/abs/1902.00541
We propose the multi-layered cepstrum (MLC) method to estimate multiple fundamental frequencies (MF0) of a signal under challenging contamination such as high-pass filter noise. By applying the cepstrum operations (i.e., Fourier transform, filtering, and nonlinear activation) recursively, the MLC is shown to be an efficient method for enhancing MF0 saliency in a step-by-step manner. Evaluation on a real-world polyphonic music dataset under both normal and low-fidelity conditions demonstrates the potential of the MLC.
http://arxiv.org/abs/1902.00539
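A rough sketch of the recursive operation described above, applying a Fourier transform, filtering, and a nonlinear activation layer after layer; the crude high-pass filter, the log activation, and the number of layers are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np

def mlc_layer(x, cutoff_bins=2):
    """One layer: Fourier transform -> high-pass filtering -> nonlinear activation."""
    spectrum = np.abs(np.fft.rfft(x))
    spectrum[:cutoff_bins] = 0.0            # crude high-pass: drop the lowest bins
    return np.log1p(spectrum)               # nonlinear activation (log compression)

def multi_layered_cepstrum(x, n_layers=3):
    out = x
    for _ in range(n_layers):
        out = mlc_layer(out)
    return out

# Toy polyphonic signal: two fundamentals (220 Hz and 330 Hz) plus harmonics.
sr = 16000
t = np.arange(0, 0.5, 1 / sr)
signal = sum(np.sin(2 * np.pi * f0 * k * t) for f0 in (220.0, 330.0) for k in (1, 2, 3))
salience = multi_layered_cepstrum(signal)
print(salience.shape)                       # enhanced representation used as MF0 saliency
```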
From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making. In recent years, machine learning has made dramatic advances with artificial agents reaching superhuman performance in challenge domains like Go, Atari, and some variants of poker. As with their predecessors of chess, checkers, and backgammon, these game domains have driven research by providing sophisticated yet well-defined challenges for artificial intelligence practitioners. We continue this tradition by proposing the game of Hanabi as a new challenge domain with novel problems that arise from its combination of purely cooperative gameplay and imperfect information in a two to five player setting. In particular, we argue that Hanabi elevates reasoning about the beliefs and intentions of other agents to the foreground. We believe developing novel techniques capable of imbuing artificial agents with such theory of mind will not only be crucial for their success in Hanabi, but also in broader collaborative efforts, and especially those with human partners. To facilitate future research, we introduce the open-source Hanabi Learning Environment, propose an experimental framework for the research community to evaluate algorithmic advances, and assess the performance of current state-of-the-art techniques.
http://arxiv.org/abs/1902.00506
This paper proposes a novel algorithm which learns a formal regular grammar from real-world continuous data, such as videos or other streaming data. Learning latent terminals, non-terminals, and productions rules directly from streaming data allows the construction of a generative model capturing sequential structures with multiple possibilities. Our model is fully differentiable, and provides easily interpretable results which are important in order to understand the learned structures. It outperforms the state-of-the-art on several challenging datasets and is more accurate for forecasting future activities in videos. We plan to open-source the code.
http://arxiv.org/abs/1902.00505
Humans have entered the age of algorithms. Each minute, algorithms shape countless preferences, from suggesting a product to suggesting a potential life partner. In the marketplace, algorithms are trained to learn consumer preferences from customer reviews because user-generated reviews are considered the voice of customers and a valuable source of information to firms. Insights mined from reviews play an indispensable role in several business activities, ranging from product recommendation and targeted advertising to promotions and segmentation. In this research, we question whether reviews might hold stereotypic gender bias that algorithms learn and propagate. Utilizing data from millions of observations and a word embedding approach, GloVe, we show that algorithms designed to learn from human language output also learn gender bias. We also examine why such biases occur: whether the bias is caused by a negative bias against females or a positive bias for males. We examine the impact of gender bias in reviews on choice and conclude with policy implications for female consumers, especially when they are unaware of the bias, and the ethical implications for firms.
http://arxiv.org/abs/1902.00496
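The measurement itself, independent of the review corpus, is a standard embedding-association test: compare how close a target word sits to a male word set versus a female word set in a trained GloVe space. The sketch below uses gensim's downloadable pre-trained GloVe vectors as a stand-in for embeddings trained on reviews, and the word lists are small illustrative assumptions, not the paper's lexicons.

```python
import numpy as np
import gensim.downloader as api

# Pre-trained GloVe vectors (downloaded on first use).
vectors = api.load("glove-wiki-gigaword-100")

male_words = ["he", "man", "his", "male"]
female_words = ["she", "woman", "her", "female"]

def gender_association(word):
    """Positive: closer to the male set; negative: closer to the female set."""
    male_sim = np.mean([vectors.similarity(word, m) for m in male_words])
    female_sim = np.mean([vectors.similarity(word, f) for f in female_words])
    return male_sim - female_sim

for target in ["engineer", "nurse", "doctor", "receptionist"]:
    print(f"{target:>14s}: {gender_association(target):+.3f}")
```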
Recent approaches to English-language sentence compression rely on parallel corpora consisting of sentence-compression pairs. However, a sentence may be shortened in many different ways, which each might be suited to the needs of a particular application. Therefore, in this work, we collect and model crowdsourced judgements of the acceptability of many possible sentence shortenings. We then show how a model of such judgements can be used to support a flexible approach to the compression task. We release our model and dataset for future work.
http://arxiv.org/abs/1902.00489
Predicting the collective motion of a group of pedestrians (a crowd) under the vehicle influence is essential for autonomous vehicles to deal with mixed urban scenarios where interpersonal interaction and vehicle-crowd interaction (VCI) are significant. This usually requires a model that can describe individual pedestrian motion under the influence of nearby pedestrians and the vehicle. This study proposes two pedestrian trajectory datasets, the CITR dataset and the DUT dataset, so that pedestrian motion models can be further calibrated and verified, especially when vehicle influence on pedestrians plays an important role. The CITR dataset consists of experimentally designed fundamental VCI scenarios (front, back, and lateral VCI) and provides a unique ID for each pedestrian, which is suitable for exploring a specific aspect of VCI. The DUT dataset gives two ordinary and natural VCI scenarios on a crowded university campus, which can be used for more general-purpose VCI exploration. The trajectories of pedestrians, as well as vehicles, were extracted by processing video frames from a down-facing camera mounted on a hovering drone as the recording equipment. The final trajectories were refined by a Kalman filter, in which the pedestrian velocity was also estimated. The statistics of the velocity magnitude distribution demonstrate the validity of the proposed datasets. In total, there are approximately 340 pedestrian trajectories in the CITR dataset and 1793 pedestrian trajectories in the DUT dataset. The datasets are available on GitHub.
http://arxiv.org/abs/1902.00487
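The refinement step mentioned above, a Kalman filter over the extracted positions that also estimates velocity, can be sketched with a standard constant-velocity model; the frame rate and noise covariances here are placeholders that would be tuned to the drone footage.

```python
import numpy as np

dt = 1 / 30.0                                  # frame interval (assumed 30 fps)
F = np.array([[1, 0, dt, 0],                   # state: [x, y, vx, vy]
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],                    # only position is measured
              [0, 1, 0, 0]], dtype=float)
Q = 0.05 * np.eye(4)                           # process noise (placeholder)
R = 0.5 * np.eye(2)                            # measurement noise (placeholder)

def kalman_smooth(positions):
    """Filter noisy (x, y) detections; returns filtered positions and velocities."""
    x = np.array([positions[0][0], positions[0][1], 0.0, 0.0])
    P = np.eye(4)
    out = []
    for z in positions:
        x, P = F @ x, F @ P @ F.T + Q                 # predict
        S = H @ P @ H.T + R                           # update
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (np.asarray(z) - H @ x)
        P = (np.eye(4) - K @ H) @ P
        out.append(x.copy())
    return np.array(out)

# Toy pedestrian walking at 1.2 m/s along x with measurement noise.
true_x = np.arange(0, 3, 1.2 * dt)
noisy = np.stack([true_x + 0.1 * np.random.randn(len(true_x)),
                  0.1 * np.random.randn(len(true_x))], axis=1)
states = kalman_smooth(noisy)
print("estimated velocity (m/s):", states[-1, 2:].round(2))
```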
Memorization of data in deep neural networks has become a subject of significant research interest. We prove that over-parameterized single layer fully connected autoencoders memorize training data: they produce outputs in (a non-linear version of) the span of the training examples. In contrast to fully connected autoencoders, we prove that depth is necessary for memorization in convolutional autoencoders. Moreover, we observe that adding nonlinearity to deep convolutional autoencoders results in a stronger form of memorization: instead of outputting points in the span of the training images, deep convolutional autoencoders tend to output individual training images. Since convolutional autoencoder components are building blocks of deep convolutional networks, we envision that our findings will shed light on the important phenomenon of memorization in over-parameterized deep networks.
http://arxiv.org/abs/1810.10333
Straight lines are common features in human-made environments. They are a richer feature than points, since they yield more information about the environment (they are one-dimensional features rather than the zero-dimensional points). Besides, they are easier to detect and track in image sensors. Having a robust estimation of the 3D parameters of a line measured from an image is a must for several control applications, such as visual servoing. In this work, a classical dynamical system that models the apparent motion of lines in a moving camera's image is presented. In order to obtain the 3D structure of lines, a nonlinear observer is proposed. However, in order to guarantee convergence, the dynamical system must be coupled with an algebraic equation. This is achieved by using spherical coordinates to represent the line's moment vector and a change of basis, which allows the algebraic constraint to be introduced directly into the system's dynamics. Finally, a control law that attempts to optimize the convergence behavior of the observer is presented. The approach is validated in simulation and with a real robotic platform with a camera onboard.
http://arxiv.org/abs/1902.00473
Computational simulation of ultrasound (US) echography is essential for training sonographers. Realistic simulation of US interaction with microscopic tissue structures is often modeled by a tissue representation in the form of point scatterers, convolved with a spatially varying point spread function. This yields a realistic US B-mode speckle texture, given that a scatterer representation for a particular tissue type is readily available. This is often not the case and scatterers are nontrivial to determine. In this work we propose to estimate scatterer maps from sample US B-mode images of a tissue, by formulating this inverse mapping problem as image translation, where we learn the mapping with Generative Adversarial Networks, using a US simulation software for training. We demonstrate robust reconstruction results, invariant to US viewing and imaging settings such as imaging direction and center frequency. Our method is shown to generalize beyond the trained imaging settings, demonstrated on in-vivo US data. Our inference runs orders of magnitude faster than optimization-based techniques, enabling future extensions for reconstructing 3D B-mode volumes with only linear computational complexity.
http://arxiv.org/abs/1902.00469
We describe TF-Replicator, a framework for distributed machine learning designed for DeepMind researchers and implemented as an abstraction over TensorFlow. TF-Replicator simplifies writing data-parallel and model-parallel research code. The same models can be effortlessly deployed to different cluster architectures (i.e. one or many machines containing CPUs, GPUs or TPU accelerators) using synchronous or asynchronous training regimes. To demonstrate the generality and scalability of TF-Replicator, we implement and benchmark three very different models: (1) A ResNet-50 for ImageNet classification, (2) a SN-GAN for class-conditional ImageNet image generation, and (3) a D4PG reinforcement learning agent for continuous control. Our results show strong scalability performance without demanding any distributed systems expertise of the user. The TF-Replicator programming model will be open-sourced as part of TensorFlow 2.0 (see https://github.com/tensorflow/community/pull/25).
http://arxiv.org/abs/1902.00465
The recent advent of the Internet of Things (IOT) has increased the demand for enabling AI-based edge computing. This has necessitated the search for efficient implementations of neural networks in terms of both computation and storage. Although extreme quantization has proven to be a powerful tool to achieve significant compression over full-precision networks, it can result in significant degradation in performance. In this work, we propose extremely quantized hybrid network architectures with both binary and full-precision sections to emulate the classification performance of full-precision networks while ensuring significant energy efficiency and memory compression. We explore several hybrid network architectures and analyze the performance of the networks in terms of accuracy, energy efficiency and memory compression. We perform our analysis on ResNet and VGG network architectures. Among the proposed network architectures, we show that the hybrid networks with full-precision residual connections emerge as the optimum by attaining accuracies close to full-precision networks while achieving excellent memory compression, up to 21.8x in the case of VGG-19. This work demonstrates an effective way of hybridizing networks which achieve performance close to full-precision networks while attaining significant compression, furthering the feasibility of using such networks for energy-efficient neural computing in IOT-based edge devices.
https://arxiv.org/abs/1902.00460
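The binary sections of such hybrid networks are typically built from layers whose weights are binarized in the forward pass while gradients flow through a straight-through estimator, with the full-precision sections kept as ordinary layers. The PyTorch sketch below shows one hypothetical binarized convolution inside a small hybrid block; it is not the paper's exact hybridization scheme.

```python
import torch
import torch.nn as nn

class BinarizeSTE(torch.autograd.Function):
    """sign() in the forward pass, straight-through (clipped identity) gradient."""
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        return grad_out * (w.abs() <= 1).float()   # pass gradient only where |w| <= 1

class BinaryConv2d(nn.Conv2d):
    def forward(self, x):
        w_bin = BinarizeSTE.apply(self.weight)      # binarized weights, real activations
        return nn.functional.conv2d(x, w_bin, self.bias, self.stride,
                                    self.padding, self.dilation, self.groups)

# Hybrid block sketch: a binary conv followed by a full-precision 1x1 conv.
block = nn.Sequential(
    BinaryConv2d(16, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=1),               # full-precision section
)
y = block(torch.randn(2, 16, 8, 8))
y.sum().backward()                                  # gradients reach the latent real weights
print(y.shape)
```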
The ability to interpret a scene is an important capability for a robot that is supposed to interact with its environment. The knowledge of what is in front of the robot is, for example, relevant for navigation, manipulation, or planning. Semantic segmentation labels each pixel of an image with a class label and thus provides a detailed semantic annotation of the surroundings to the robot. Convolutional neural networks (CNNs) are popular methods for addressing this type of problem. The available software for training and the integration of CNNs for real robots, however, is quite fragmented and often difficult to use for non-experts, despite the availability of several high-quality open-source frameworks for neural network implementation and training. In this paper, we propose a tool called Bonnet, which addresses this fragmentation problem by building a higher abstraction that is specific for the semantic segmentation task. It provides a modular approach to simplify the training of a semantic segmentation CNN independently of the used dataset and the intended task. Furthermore, we also address the deployment on a real robotic platform. Thus, we do not propose a new CNN approach in this paper. Instead, we provide a stable and easy-to-use tool to make this technology more approachable in the context of autonomous systems. In this sense, we aim at closing a gap between computer vision research and its use in robotics research. We provide an open-source codebase for training and deployment. The training interface is implemented in Python using TensorFlow and the deployment interface provides a C++ library that can be easily integrated in an existing robotics codebase, a ROS node, and two standalone applications for label prediction in images and videos.
http://arxiv.org/abs/1802.08960
The use of background knowledge remains largely unexploited in many text classification tasks. In this work, we explore word taxonomies as a means of constructing new semantic features, which may improve the performance and robustness of the learned classifiers. We propose tax2vec, a parallel algorithm for constructing taxonomy-based features, and demonstrate its use on six short-text classification problems, including gender, age and personality type prediction, drug effectiveness and side effect prediction, and news topic prediction. The experimental results indicate that the interpretable features constructed using tax2vec can notably improve the performance of classifiers; the constructed features, in combination with fast, linear classifiers tested against strong baselines such as hierarchical attention neural networks, achieved comparable or better classification results on short documents. Further, tax2vec can also serve for the extraction of corpus-specific keywords. Finally, we investigate the semantic space of potential features, where we observe a similarity with the well-known Zipf's law.
http://arxiv.org/abs/1902.00438
We find that 3.3% and 10% of the images from the CIFAR-10 and CIFAR-100 test sets, respectively, have duplicates in the training set. This may incur a bias on the comparison of image recognition techniques with respect to their generalization capability on these heavily benchmarked datasets. To eliminate this bias, we provide the “fair CIFAR” (ciFAIR) dataset, where we replaced all duplicates in the test sets with new images sampled from the same domain. The training set remains unchanged, in order not to invalidate pre-trained models. We then re-evaluate the classification performance of various popular state-of-the-art CNN architectures on these new test sets to investigate whether recent research has overfitted to memorizing data instead of learning abstract concepts. Fortunately, this does not seem to be the case yet. The ciFAIR dataset and pre-trained models are available at https://cvjena.github.io/cifair/, where we also maintain a leaderboard.
http://arxiv.org/abs/1902.00423
The fixed parameter tractable (FPT) approach is a powerful tool in tackling computationally hard problems. In this paper, we link FPT results to classic artificial intelligence (AI) techniques to show how they complement each other. Specifically, we consider the workflow satisfiability problem (WSP) which asks whether there exists an assignment of authorised users to the steps in a workflow specification, subject to certain constraints on the assignment. It was shown by Cohen et al. (JAIR 2014) that WSP restricted to the class of user-independent constraints (UI), covering many practical cases, admits FPT algorithms, i.e. can be solved in time exponential only in the number of steps $k$ and polynomial in the number of users $n$. Since usually $k \ll n$ in WSP, such FPT algorithms are of great practical interest. We present a new interpretation of the FPT nature of the WSP with UI constraints giving a decomposition of the problem into two levels. Exploiting this two-level split, we develop a new FPT algorithm that is by many orders of magnitude faster than the previous state-of-the-art WSP algorithm and also has only polynomial-space complexity. We also introduce new pseudo-Boolean (PB) and Constraint Satisfaction (CSP) formulations of the WSP with UI constraints which efficiently exploit this new decomposition of the problem and raise the novel issue of how to use general-purpose solvers to tackle FPT problems in a fashion that meets FPT efficiency expectations. In our computational study, we investigate, for the first time, the phase transition (PT) properties of the WSP, under a model for generation of random instances. We show how PT studies can be extended, in a novel fashion, to support empirical evaluation of scaling of FPT algorithms.
http://arxiv.org/abs/1604.05636
Slow acquisition has been one of the historical problems in dynamic magnetic resonance imaging (dMRI), but the rise of compressed sensing (CS) has brought numerous algorithms that successfully achieve high acceleration rates. While CS proposes random sampling for data acquisition, practical CS applications to dMRI have typically relied on random variable-density (VD) sampling patterns, where masks are drawn from probabilistic models that preferably sample from the center of the Fourier domain. In contrast to this model-driven approach, we propose the first data-driven, scalable framework for optimizing sampling patterns in dMRI. Through a greedy algorithm, this approach allows the data to directly govern the search for a mask that exhibits good empirical performance. The previous greedy approach, designed for static MRI, required very intensive computation, prohibiting its direct application to dMRI; we address this issue by resorting to a stochastic greedy algorithm that uses only a fraction of the resources of the previous approach without sacrificing reconstruction accuracy. A thorough comparison on in vivo datasets shows the inefficiency of model-based approaches in terms of sampling performance and suggests that our data-driven sampling approach could fully enable the potential of CS applied to dMRI.
http://arxiv.org/abs/1902.00386
We propose a method to incrementally learn an embedding space over the domain of network architectures, to enable the careful selection of architectures for evaluation during compressed architecture search. Given a teacher network, we search for a compressed network architecture by using Bayesian Optimization (BO) with a kernel function defined over our proposed embedding space to select architectures for evaluation. We demonstrate that our search algorithm can significantly outperform various baseline methods, such as random search and reinforcement learning (Ashok et al., 2018). The compressed architectures found by our method are also better than the state-of-the-art manually-designed compact architecture ShuffleNet (Zhang et al., 2018). We also demonstrate that the learned embedding space can be transferred to new settings for architecture search, such as a larger teacher network or a teacher network in a different architecture family, without any training.
http://arxiv.org/abs/1902.00383
What is the place of emotion in intelligent robots? In the past two decades, researchers have advocated for the inclusion of some emotion-related components in the general information processing architecture of autonomous agents, say, for better communication with humans, or to instill a sense of urgency to action. The framework advanced here goes beyond these approaches and proposes that emotion and motivation need to be integrated with all aspects of the architecture. Thus, cognitive-emotional integration is a key design principle. Emotion is not an “add on” that endows a robot with “feelings” (for instance, reporting or expressing its internal state). It allows the significance of percepts, plans, and actions to be an integral part of all its computations. It is hypothesized that a sophisticated artificial intelligence cannot be built from separate cognitive and emotional modules. A hypothetical test inspired by the Turing test, called the Dolores test, is proposed to test this assertion.
http://arxiv.org/abs/1902.00363
Convolutional neural networks are state-of-the-art for various segmentation tasks. While these networks are computationally efficient for 2D images, 3D convolutions have huge storage requirements and require long training times. To overcome this issue, we introduce a network structure for volumetric data without 3D convolution layers. The main idea is to integrate projection layers that transform the volumetric data into a sequence of images, where each image contains information about the full data. We then apply 2D convolutions to the projection images, followed by lifting the result back to volumetric data. The proposed network structure can be trained in much less time than any 3D network and still shows accurate performance for a sparse binary segmentation task.
http://arxiv.org/abs/1902.00347
Selecting an optimal set of icons is a crucial step in the pipeline of visual design to structure and navigate through content. However, designing icon sets is usually a difficult task that requires expert knowledge. In this work, to ease the process of icon set selection for users, we propose a similarity metric that captures the properties of style and visual identity. We train a Siamese Neural Network with an online dataset of icons organized in visually coherent collections, which are used to adaptively sample training data and optimize the training process. As the dataset contains noise, we further collect human-rated information on the perceived similarity of icons, which is used for evaluating and testing the proposed model. We present several results and applications based on searches, kernel visualizations, and optimized set proposals that can be helpful for designers and non-expert users while exploring large collections of icons.
http://arxiv.org/abs/1902.05378
This work proposes a new neural network feature representation that helps to leave out sensitive information in the decision-making process of pattern recognition and machine learning algorithms. The aim of this work is to develop a learning method capable of removing certain information from the feature space without a drop in performance on a recognition task based on that feature space. Our work is in part motivated by the new international regulation for personal data protection, which forces data controllers to avoid discriminative hazards while managing sensitive data of users. Our method is based on a generalization of triplet loss learning that introduces a sensitive information removal process. The method is evaluated on face recognition technologies using state-of-the-art algorithms and publicly available benchmarks. In addition, we present a new annotation dataset with a balanced distribution between genders and ethnic origins. The dataset includes more than 120K images from 24K identities with a variety of poses, image qualities, facial expressions, and illumination conditions. The experiments demonstrate that it is possible to reduce sensitive information such as gender or ethnicity in the feature representation while retaining competitive performance in a face recognition task.
http://arxiv.org/abs/1902.00334