Neural machine translation (NMT) systems are usually trained on a large number of bilingual sentence pairs and translate one sentence at a time, ignoring inter-sentence information. This may make the translation of a sentence ambiguous or even inconsistent with the translations of neighboring sentences. To handle this issue, we propose an inter-sentence gate model that uses the same encoder to encode two adjacent sentences and controls the amount of information flowing from the preceding sentence into the translation of the current sentence with an inter-sentence gate. In this way, our proposed model can capture the connection between sentences and fuse recent context from neighboring sentences into neural machine translation. On several NIST Chinese-English translation tasks, our experiments demonstrate that the proposed inter-sentence gate model achieves substantial improvements over the baseline.
https://arxiv.org/abs/1806.04466
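For intuition, here is a minimal numpy sketch of the gating idea: a learned sigmoid gate decides, per dimension, how much of the preceding sentence's summary flows into the current decoder state. Shapes, initialization, and the exact fusion rule are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d = 8                                          # hidden size (illustrative)
rng = np.random.default_rng(0)
W_g = rng.normal(scale=0.1, size=(d, 2 * d))   # gate parameters (hypothetical)

s_prev = rng.normal(size=d)   # encoding summary of the preceding sentence
h_t = rng.normal(size=d)      # current decoder state

# The gate controls, per dimension, how much inter-sentence context gets through.
g = sigmoid(W_g @ np.concatenate([s_prev, h_t]))
fused = h_t + g * s_prev      # context-enriched state used for word prediction
print(fused.shape)            # (8,)
```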
Deep multitask networks, in which one neural network produces multiple predictive outputs, can offer better speed and performance than their single-task counterparts but are challenging to train properly. We present a gradient normalization (GradNorm) algorithm that automatically balances training in deep multitask models by dynamically tuning gradient magnitudes. We show that for various network architectures, for both regression and classification tasks, and on both synthetic and real datasets, GradNorm improves accuracy and reduces overfitting across multiple tasks when compared to single-task networks, static baselines, and other adaptive multitask loss balancing techniques. GradNorm also matches or surpasses the performance of exhaustive grid search methods, despite involving only a single asymmetry hyperparameter $\alpha$. Thus, what was once a tedious search process that incurred exponentially more compute for each task added can now be accomplished within a few training runs, irrespective of the number of tasks. Ultimately, we demonstrate that gradient manipulation affords us great control over the training dynamics of multitask networks and may be one of the keys to unlocking the potential of multitask learning.
https://arxiv.org/abs/1711.02257
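A numpy sketch of the GradNorm weight update may help make the mechanism concrete. Following the abstract, per-task loss weights are nudged so that each task's gradient magnitude tracks a common target set by the average magnitude and the task's relative inverse training rate raised to the power alpha; the target is treated as a constant during the update, and the learning rate and example values here are assumptions.

```python
import numpy as np

def gradnorm_step(w, grad_norms, L_t, L_0, alpha=1.5, lr=0.025):
    """One GradNorm loss-weight update (sketch).

    w          : per-task loss weights, shape (n,)
    grad_norms : norms of each task's loss gradient w.r.t. shared weights, shape (n,)
    L_t, L_0   : current and initial task losses, shape (n,)
    """
    G = w * grad_norms                    # gradient magnitude of each weighted task loss
    loss_ratio = L_t / L_0
    r = loss_ratio / loss_ratio.mean()    # relative inverse training rate
    target = G.mean() * r ** alpha        # common target magnitude (held constant)
    # Gradient of sum_i |G_i - target_i| with respect to w_i:
    w = w - lr * np.sign(G - target) * grad_norms
    w = np.clip(w, 1e-6, None)
    return w * len(w) / w.sum()           # renormalize so weights sum to n

w = np.ones(3)
w = gradnorm_step(w, grad_norms=np.array([1.0, 0.2, 0.5]),
                  L_t=np.array([0.8, 0.1, 0.4]), L_0=np.ones(3))
print(w)  # tasks training "too fast" get down-weighted, slow tasks up-weighted
```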
Real-time object detection and tracking have been shown to be the basis of intelligent production in Industry 4.0 applications. This is a challenging task because of the variety of distorted data in complex industrial settings. The correlation filter (CF) has been used to trade off low-cost computation against high performance. However, the traditional CF training strategy cannot achieve satisfactory performance on varied industrial data, because simple sampling (bagging) during the training process does not find exact solutions in a data space with large diversity. In this paper, we propose Dijkstra-distance based correlation filters (DBCF), which establishes a new learning framework that embeds distribution-related constraints into multi-channel correlation filters (MCCF). DBCF is able to handle the huge variations in industrial data by improving those constraints based on the shortest path among all solutions. To evaluate DBCF, we build a new dataset as a benchmark for Industry 4.0 applications. Extensive experiments demonstrate that DBCF achieves high performance and exceeds the state-of-the-art methods. The dataset and source code can be found at this https URL
https://arxiv.org/abs/1806.03853
A paraphrase is a restatement of the meaning of a text in other words. Paraphrases have been studied to enhance the performance of many natural language processing tasks. In this paper, we propose a novel task, iParaphrasing, to extract visually grounded paraphrases (VGPs), which are different phrasal expressions describing the same visual concept in an image. These extracted VGPs have the potential to improve language and image multimodal tasks such as visual question answering and image captioning. How to model the similarity between VGPs is the key to iParaphrasing. We apply various existing methods as well as propose a novel neural network-based method with image attention, and report the results of the first attempt toward iParaphrasing.
https://arxiv.org/abs/1806.04284
In this paper, we analyze the numerics of common algorithms for training Generative Adversarial Networks (GANs). Using the formalism of smooth two-player games, we analyze the associated gradient vector field of GAN training objectives. Our findings suggest that the convergence of current algorithms suffers due to two factors: i) the presence of eigenvalues of the Jacobian of the gradient vector field with zero real part, and ii) eigenvalues with a large imaginary part. Using these findings, we design a new algorithm that overcomes some of these limitations and has better convergence properties. Experimentally, we demonstrate its superiority in training common GAN architectures and show convergence on GAN architectures that are known to be notoriously hard to train.
https://arxiv.org/abs/1705.10461
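Finding (i) can be reproduced on a toy problem. The bilinear game min_x max_y xy (a common stand-in for GAN training, not the paper's experimental setup) has a gradient vector field whose Jacobian has purely imaginary eigenvalues, so simultaneous gradient steps cycle instead of converging:

```python
import numpy as np

# Toy two-player game f(x, y) = x * y: x minimizes, y maximizes.
# Simultaneous gradient vector field: v(x, y) = (-df/dx, +df/dy) = (-y, x).
def v(z):
    x, y = z
    return np.array([-y, x])

print(v(np.array([1.0, 0.0])))   # the field rotates around the origin

# The Jacobian of v is constant here.
J = np.array([[0.0, -1.0],
              [1.0,  0.0]])
print(np.linalg.eigvals(J))      # eigenvalues +/-1j: zero real part, as in factor (i)
```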
We report on our experiments to train deep neural networks that automatically translate informalized LaTeX-written Mizar texts into the formal Mizar language. To the best of our knowledge, this is the first time neural networks have been applied to the formalization of mathematics. Using Luong et al.'s neural machine translation (NMT) model, we tested our aligned informal-formal corpora under various hyperparameter settings and evaluated the results. Our experiments show that our best-performing model configurations are able to generate correct Mizar statements on 65.73% of the inference data, with the union of all models covering 79.17%. These results indicate that formalization through artificial neural networks is a promising approach for automated formalization of mathematics. We present several case studies to illustrate our results.
https://arxiv.org/abs/1805.06502
In this work, we present an application of domain randomization and generative adversarial networks (GANs) to train a near real-time object detector for industrial electric parts, entirely in a simulated environment. Large-scale availability of labelled real-world data is typically rare and difficult to obtain in many industrial settings. Here, only a few hundred unlabelled real images are used to train a Cyclic-GAN network, in combination with varying degrees of domain randomization. We demonstrate that this enables robust translation of synthetic images to the real-world domain. We show that a combination of the original synthetic (simulation) images and the GAN-translated images, when used to train a Mask-RCNN object detection network, achieves greater than 0.95 mean average precision in detecting and classifying a collection of industrial electric parts. We evaluate the performance across different combinations of training data.
https://arxiv.org/abs/1805.11778
A significant problem with using deep learning techniques is the limited amount of data available for training. Some datasets are available for popular problems such as item recognition and classification or self-driving cars; however, data for the industrial robotics field are very limited. In previous work, we trained a multi-objective Convolutional Neural Network (CNN) to identify the robot body in an image and estimate the 3D positions of the joints from just a 2D image, but it was limited to a range of robots produced by Universal Robots (UR). In this work, we extend our method to a new robot arm, the Kuka LBR iiwa, which has a significantly different appearance and an additional joint. However, instead of collecting large datasets once again, we collect a number of smaller datasets containing a few hundred frames each and use transfer learning on the CNN trained on UR robots to adapt it to the new robot's different shape and visual features. We demonstrate that transfer learning is not only applicable in this field, but also requires smaller, well-prepared training datasets, trains significantly faster, and reaches accuracy similar to the original method, even improving on it in some respects.
https://arxiv.org/abs/1805.11849
A large proportion of sequence-to-sequence (Seq2Seq) models for Neural Machine Translation (NMT) adopt a Recurrent Neural Network (RNN) to generate the translation word by word in sequential order. Since linguistic studies have shown that language is not a linear word sequence but a sequence with complex structure, translation at each step should be conditioned on the whole target-side context. To tackle this problem, we propose a new NMT model that decodes the sequence with the guidance of a structural prediction of the target-side context. Our model generates the translation based on this structural prediction so that the translation is freed from the constraint of sequential order. Experimental results demonstrate that our model is competitive with state-of-the-art methods, and our analysis shows that the model is also robust when translating sentences of different lengths and reduces repetition by drawing on the target-side context during decoding.
https://arxiv.org/abs/1806.03692
We study the problem of privately emulating shared memory in message-passing networks. The system includes clients that store and retrieve replicated information on N servers, out of which e are malicious. When a client accesses a malicious server, the data field of that server's response might differ from the value it originally stored. However, all other control variables in the server's reply and protocol actions follow the server algorithm. For the coded atomic storage (CAS) algorithms by Cadambe et al., we present an enhancement that ensures no information leakage and malicious fault-tolerance. We also consider recovery after the occurrence of transient faults that violate the assumptions according to which the system is to behave. After their last occurrence, transient faults leave the system in an arbitrary state (while the program code stays intact). We present a self-stabilizing algorithm that recovers after the occurrence of transient faults. This addition to Cadambe et al. considers asynchronous settings as long as no transient faults occur. The recovery from transient faults that bring the system counters (close) to their maximal values may include the use of a global reset procedure, which requires the system run to be controlled by a fair scheduler. After the recovery period, the safety properties are provided for asynchronous system runs that are not necessarily controlled by fair schedulers. Since the recovery period is bounded and the occurrence of transient faults is extremely rare, we call this design criterion self-stabilization in the presence of seldom fairness. Our self-stabilizing algorithm uses bounded storage during asynchronous executions (that are not necessarily fair). To the best of our knowledge, we are the first to address privacy and self-stabilization in the context of emulating atomic shared memory in networked systems.
https://arxiv.org/abs/1806.03498
Visual Question Answering (VQA) is a complex semantic task requiring both natural language processing and visual recognition. In this paper, we explore whether VQA is solvable when images are captured in a sub-Nyquist compressive paradigm. We develop a series of deep-network architectures that exploit available compressive data to increasing degrees of accuracy, and show that VQA is indeed solvable in the compressed domain. Our results show that there is nominal degradation in VQA performance when using compressive measurements, but that accuracy can be recovered when VQA pipelines are used in conjunction with state-of-the-art deep neural networks for CS reconstruction. The results presented yield important implications for resource-constrained VQA applications.
https://arxiv.org/abs/1806.03379
In robotic applications, we often face the challenge of discovering new objects while having very little or no labeled training data. In this paper we explore the use of self-supervision provided by a robot traversing an environment to learn representations of encountered objects. Knowledge of ego-motion and depth perception enables the agent to effectively associate multiple object proposals, which serve as training data for learning object representations from unlabeled images. We demonstrate the utility of this representation in two ways. First, we can automatically discover objects by performing clustering in the learned embedding space. Each resulting cluster contains examples of one instance seen from various viewpoints and scales. Second, given a small number of labeled images, we can efficiently learn detectors for these labels. In the few-shot regime, these detectors have a substantially higher mAP of 0.22, compared to 0.12 for off-the-shelf standard detectors trained on the same limited data. Thus, the proposed self-supervision results in effective environment-specific object discovery and detection at little or no human labeling cost.
https://arxiv.org/abs/1806.03370
We show that it is possible to learn meaningful representations of surgical motion, without supervision, by learning to predict the future. An architecture that combines an RNN encoder-decoder and mixture density networks (MDNs) is developed to model the conditional distribution over future motion given past motion. We show that the learned encodings naturally cluster according to high-level activities, and we demonstrate the usefulness of these learned encodings in the context of information retrieval, where a database of surgical motion is searched for suturing activity using a motion-based query. Future prediction with MDNs is found to significantly outperform simpler baselines as well as the best previously-published result for this task, advancing state-of-the-art performance from an F1 score of 0.60 ± 0.14 to 0.77 ± 0.05.
https://arxiv.org/abs/1806.03318
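The MDN head in such an architecture is trained by minimizing the negative log-likelihood of the observed future motion under a predicted Gaussian mixture. A one-dimensional numpy sketch of that loss (the actual model predicts multivariate motion; the parameter values below are made up):

```python
import numpy as np

def mdn_nll(pi, mu, sigma, y):
    """Negative log-likelihood of scalar y under a Gaussian mixture (sketch).

    pi, mu, sigma : mixture weights, means, std devs, each shape (K,)
    """
    log_comp = (np.log(pi) - np.log(sigma) - 0.5 * np.log(2 * np.pi)
                - 0.5 * ((y - mu) / sigma) ** 2)
    m = log_comp.max()
    return -(m + np.log(np.exp(log_comp - m).sum()))  # stable log-sum-exp

# The RNN decoder would emit (pi, mu, sigma) for each future time step.
print(mdn_nll(pi=np.array([0.7, 0.3]), mu=np.array([0.0, 2.0]),
              sigma=np.array([0.5, 1.0]), y=0.1))
```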
One property that remains lacking in image captions generated by contemporary methods is discriminability: being able to tell two images apart given the caption for one of them. We propose a way to improve this aspect of caption generation. By incorporating into the captioning training objective a loss component directly related to the ability (of a machine) to disambiguate image/caption matches, we obtain systems that produce much more discriminative captions, according to human evaluation. Remarkably, our approach leads to improvements in other aspects of the generated captions, as reflected by a battery of standard scores such as BLEU and SPICE. Our approach is modular and can be applied to a variety of model/loss combinations commonly proposed for image captioning.
https://arxiv.org/abs/1803.04376
We report on the demonstration and investigation of Ta2O5 as a high-κ dielectric for InAlN/GaN MOS HEMT-on-Si. Ta2O5 of thickness 24 nm and dielectric constant ~30 was sputter-deposited on an InAlN/GaN HEMT and investigated for different post-deposition anneal (PDA) conditions. The gate leakage was 16 nA/mm at -15 V, which was ~5 orders of magnitude lower compared to the reference HEMT. The two-dimensional electron gas (2DEG) density was found to vary with annealing temperature, suggesting the presence of net charge at the Ta2O5/InAlN interface. Dispersion in the capacitance-voltage (C-V) characteristics was used to estimate the frequency-dependent interface charge, while energy band diagrams under flat-band conditions were investigated to estimate the fixed charge. The optimum anneal condition was found to be 500 °C, which resulted in a flat-band voltage spread (VFB) of 0.4 V and an interface fixed charge (Qf) of 3.98x10^13 cm^-2. X-ray photoelectron spectroscopy (XPS) spectra of as-deposited and annealed Ta2O5 films were analyzed for Ta and O compositions; the sample annealed at 500 °C showed a Ta:O ratio of 0.41. X-ray diffraction (XRD) analysis was performed to check the evolution of poly-crystallization of the Ta2O5 film at higher annealing temperatures.
https://arxiv.org/abs/1806.03291
Multilingual machine translation addresses the task of translating between multiple source and target languages. We propose task-specific attention models, a simple but effective technique for improving the quality of sequence-to-sequence neural multilingual translation. Our approach seeks to retain as much of the parameter sharing generalization of NMT models as possible, while still allowing for language-specific specialization of the attention model to a particular language-pair or task. Our experiments on four languages of the Europarl corpus show that using a target-specific model of attention provides consistent gains in translation quality for all possible translation directions, compared to a model in which all parameters are shared. We observe improved translation quality even in the (extreme) low-resource zero-shot translation directions for which the model never saw explicitly paired parallel data.
https://arxiv.org/abs/1806.03280
In recent years several architectures have been proposed to learn complex self-awareness models for embodied agents. In this paper, dynamic incremental self-awareness (SA) models are proposed that allow the experiences of an agent to be modeled in a hierarchical fashion, starting from simpler situations and moving to more structured ones. Each situation is learned from subsets of the agent's private perception data as a model capable of predicting normal behaviors and detecting abnormalities. Hierarchical SA models have already been proposed using low-dimensional sensory inputs. In this work, a hierarchical model is introduced by means of cross-modal Generative Adversarial Networks (GANs) processing high-dimensional visual data. Different levels of the GANs are detected in a self-supervised manner using the GAN discriminators' decision boundaries. Real experiments on semi-autonomous ground vehicles are presented.
https://arxiv.org/abs/1806.04012
In this paper, we propose a zoom-out-and-in network for generating object proposals. A key observation is that it is difficult to classify anchors of different sizes with the same set of features. Anchors of different sizes should be placed accordingly based on different depths within a network: smaller boxes on high-resolution layers with a smaller stride, and larger boxes on low-resolution counterparts with a larger stride. Inspired by the conv/deconv structure, we fully leverage the low-level local details and high-level regional semantics from two feature map streams, which are complementary to each other, to identify objectness in an image. A map attention decision (MAD) unit is further proposed to aggressively search for neuron activations among the two streams and attend to the most contributive ones for the feature learning of the final loss. The unit serves as a decision-maker that adaptively activates maps along certain channels with the sole purpose of optimizing the overall training loss. One advantage of MAD is that the learned weights enforced on each feature channel are predicted on-the-fly based on the input context, which is more suitable than the fixed enforcement of a convolutional kernel. Experimental results on three datasets, including PASCAL VOC 2007, ImageNet DET, and MS COCO, demonstrate the effectiveness of our proposed algorithm over other state-of-the-art methods, in terms of average recall (AR) for region proposals and average precision (AP) for object detection.
https://arxiv.org/abs/1709.04347
Generating images from word descriptions is a challenging task. Generative adversarial networks (GANs) have been shown to be able to generate realistic images of real-life objects. In this paper, we propose a new neural network architecture, LSTM Conditional Generative Adversarial Networks, to generate images of real-life objects. Our proposed model is trained on the Oxford-102 Flowers and Caltech-UCSD Birds-200-2011 datasets. We demonstrate that our proposed model produces better results, surpassing other state-of-the-art approaches.
https://arxiv.org/abs/1806.03027
Fine-grained vehicle classification is the task of classifying make, model, and year of a vehicle. This is a very challenging task, because vehicles of different types but similar color and viewpoint can often look much more similar than vehicles of the same type but differing color and viewpoint. Vehicle make, model, and year, in combination with vehicle color, are of importance in several applications such as vehicle search, re-identification, tracking, and traffic analysis. In this work we investigate the suitability of several recent landmark convolutional neural network (CNN) architectures, which have shown top results on large scale image classification tasks, for the task of fine-grained classification of vehicles. We compare the performance of the networks VGG16, several ResNets, Inception architectures, the recent DenseNets, and MobileNet. For classification we use the Stanford Cars-196 dataset which features 196 different types of vehicles. We investigate several aspects of CNN training, such as data augmentation and training from scratch vs. fine-tuning. Importantly, we introduce no aspects in the architectures or training process which are specific to vehicle classification. Our final model achieves a state-of-the-art classification accuracy of 94.6% outperforming all related works, even approaches which are specifically tailored for the task, e.g. by including vehicle part detections.
https://arxiv.org/abs/1806.02987
We introduce a novel meme generation system, which given any image can produce a humorous and relevant caption. Furthermore, the system can be conditioned on not only an image but also a user-defined label relating to the meme template, giving a handle to the user on meme content. The system uses a pretrained Inception-v3 network to return an image embedding which is passed to an attention-based deep-layer LSTM model producing the caption - inspired by the widely recognised Show and Tell Model. We implement a modified beam search to encourage diversity in the captions. We evaluate the quality of our model using perplexity and human assessment on both the quality of memes generated and whether they can be differentiated from real ones. Our model produces original memes that cannot on the whole be differentiated from real ones.
https://arxiv.org/abs/1806.04510
Multi-source translation is an approach that exploits multiple inputs (e.g. in two different languages) to increase translation accuracy. In this paper, we examine approaches for multi-source neural machine translation (NMT) using an incomplete multilingual corpus in which some translations are missing. In practice, many multilingual corpora are not complete due to the difficulty of providing translations in all of the relevant languages (for example, in TED talks, most English talks only have subtitles for a small portion of the languages that TED supports). Existing studies on multi-source translation did not explicitly handle such situations. This study focuses on the use of incomplete multilingual corpora in multi-encoder NMT and mixture-of-NMT-experts models, and examines a very simple implementation where missing source translations are replaced by a special symbol.
https://arxiv.org/abs/1806.02525
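The implementation idea is simple enough to sketch directly: every encoder always receives an input, and absent languages are padded with the special symbol. The symbol spelling (<NULL>) and the data layout below are illustrative assumptions.

```python
# Sketch: complete an incomplete multilingual example for multi-encoder NMT.
NULL = "<NULL>"

def complete_sources(example, source_langs):
    """Return one token sequence per encoder, padding missing languages."""
    return [example.get(lang, [NULL]) for lang in source_langs]

example = {"en": ["hello", "world"], "fr": ["bonjour", "le", "monde"]}
# "de" is missing, so its encoder receives only the placeholder symbol.
print(complete_sources(example, ["en", "fr", "de"]))
# [['hello', 'world'], ['bonjour', 'le', 'monde'], ['<NULL>']]
```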
Many structured prediction problems (particularly in vision and language domains) are ambiguous, with multiple outputs being correct for an input - e.g. there are many ways of describing an image, multiple ways of translating a sentence; however, exhaustively annotating the applicability of all possible outputs is intractable due to exponentially large output spaces (e.g. all English sentences). In practice, these problems are cast as multi-class prediction, with the likelihood of only a sparse set of annotations being maximized - unfortunately penalizing the model for placing beliefs on plausible but unannotated outputs. We make and test the following hypothesis - for a given input, the annotations of its neighbors may serve as an additional supervisory signal. Specifically, we propose an objective that transfers supervision from neighboring examples. We first study the properties of our developed method in a controlled toy setup before reporting results on multi-label classification and two image-grounded sequence modeling tasks - captioning and question generation. We evaluate using standard task-specific metrics and measures of output diversity, finding consistent improvements over standard maximum likelihood training and other baselines.
https://arxiv.org/abs/1806.02934
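One simple way such a neighbor-transfer objective could look, as a numpy sketch: the usual cross-entropy against an input's own sparse annotations plus a weighted cross-entropy against a neighbor's annotations. The mixing weight and exact form are assumptions, not the paper's precise loss.

```python
import numpy as np

def xent(p, q, eps=1e-12):
    """Cross-entropy H(p, q) between distributions over the output space."""
    return -np.sum(p * np.log(q + eps))

def neighbor_transfer_loss(pred, own, neighbor, lam=0.5):
    """Maximum-likelihood term plus a neighbor-supervision term (sketch)."""
    return xent(own, pred) + lam * xent(neighbor, pred)

pred = np.array([0.6, 0.3, 0.1])       # model's predicted distribution
own = np.array([1.0, 0.0, 0.0])        # this input's annotation
neighbor = np.array([0.0, 1.0, 0.0])   # a neighboring input's annotation
print(neighbor_transfer_loss(pred, own, neighbor))
```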
Gated recurrent networks such as those composed of Long Short-Term Memory (LSTM) nodes have recently been used to improve the state of the art in many sequential processing tasks such as speech recognition and machine translation. However, the basic structure of the LSTM node is essentially the same as when it was first conceived 25 years ago. Recently, evolutionary and reinforcement learning mechanisms have been employed to create new variations of this structure. This paper proposes a new method, evolution of a tree-based encoding of the gated memory nodes, and shows that it makes it possible to explore new variations more effectively than other methods. The method discovers nodes with multiple recurrent paths and multiple memory cells, which lead to significant improvement in the standard language modeling benchmark task. The paper also shows how the search process can be sped up by training an LSTM network to estimate the performance of candidate structures, and by encouraging exploration of novel solutions. Thus, evolutionary design of complex neural network structures promises to improve the performance of deep learning architectures beyond human ability to do so.
https://arxiv.org/abs/1803.04439
In the recent past, algorithms based on Convolutional Neural Networks (CNNs) have achieved significant milestones in object recognition. With a large number of examples of each object class, standard datasets train well for inter-class variability. However, gathering sufficient data to train for a particular instance of an object within a class is impractical. Furthermore, quantitatively assessing the imaging conditions for each image in a given dataset is not feasible. By generating sufficient images with known imaging conditions, we study to what extent CNNs can cope with hard imaging conditions for instance-level recognition in an active learning regime. Leveraging powerful rendering techniques to achieve instance-level detection, we present results of training three state-of-the-art object detection algorithms, namely Fast R-CNN, Faster R-CNN and YOLO9000, for hard imaging conditions imposed on the scene by rendering. Our extensive experiments produce a mean Average Precision score of 0.92 on synthetic images and 0.83 on real images using the best-performing Faster R-CNN. We show for the first time how well detection algorithms based on deep architectures fare under each hard imaging condition studied.
https://arxiv.org/abs/1806.02850
We introduce a new function-preserving transformation for efficient neural architecture search. This network transformation allows reusing previously trained networks and existing successful architectures, which improves sample efficiency. We aim to address the limitation of current network transformation operations that can only perform layer-level architecture modifications, such as adding (pruning) filters or inserting (removing) a layer, and thus fail to change the topology of connection paths. Our proposed path-level transformation operations enable the meta-controller to modify the path topology of the given network while keeping the merits of reusing weights, and thus allow efficiently designing effective structures with complex path topologies like Inception models. We further propose a bidirectional tree-structured reinforcement learning meta-controller to explore a simple yet highly expressive tree-structured architecture space that can be viewed as a generalization of multi-branch architectures. We experiment on image classification datasets with limited computational resources (about 200 GPU-hours), where we observe improved parameter efficiency and better test results (97.70% test accuracy on CIFAR-10 with 14.3M parameters and 74.6% top-1 accuracy on ImageNet in the mobile setting), demonstrating the effectiveness and transferability of our designed architectures.
https://arxiv.org/abs/1806.02639
In this paper, we present an architecture that executes a complex machine learning model, such as a neural network capturing semantic similarity between a query and a document, deployed in a real-world production system serving 500M+ users. We present the challenges that arise in a real-world system and how we solve them. We demonstrate that our architecture provides competitive modeling capability without any significant performance impact on the system in terms of latency. Our modular solution and insights can be used by other real-world search systems to realize and productionize recent gains in neural networks.
https://arxiv.org/abs/1806.02281
A key attribute that drives the unprecedented success of modern Recurrent Neural Networks (RNNs) on learning tasks which involve sequential data is their ability to model intricate long-term temporal dependencies. However, a well-established measure of RNNs' long-term memory capacity is lacking, and thus formal understanding of the effect of depth on their ability to correlate data throughout time is limited. Specifically, existing depth-efficiency results on convolutional networks do not suffice to account for the success of deep RNNs on data of varying lengths. In order to address this, we introduce a measure of the network's ability to support information flow across time, referred to as the Start-End separation rank, which reflects the distance of the function realized by the recurrent network from modeling no dependency between the beginning and end of the input sequence. We prove that deep recurrent networks support Start-End separation ranks which are combinatorially higher than those supported by their shallow counterparts. Thus, we establish that depth brings forth an overwhelming advantage in the ability of recurrent networks to model long-term dependencies, and provide an exemplar of quantifying this key attribute which may be readily extended to other RNN architectures of interest, e.g. variants of LSTM networks. We obtain our results by considering a class of recurrent networks referred to as Recurrent Arithmetic Circuits, which merge the hidden state with the input via the Multiplicative Integration operation, and empirically demonstrate the discussed phenomena on common RNNs. Finally, we employ the tool of quantum Tensor Networks to gain additional graphic insight regarding the complexity brought forth by depth in recurrent networks.
https://arxiv.org/abs/1710.09431
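For reference, the separation rank underlying the Start-End measure can be written as follows (a standard definition, with S the first half of the time steps and E the second half, per the abstract):

```latex
% Separation rank of f w.r.t. the partition (S, E) of the input indices:
% the minimal number of summands in a product decomposition across S and E.
\operatorname{sep}(f; S, E) = \min\Big\{ R \in \mathbb{N} :
  f(\mathbf{x}) = \sum_{r=1}^{R} g_r(\mathbf{x}_S)\, h_r(\mathbf{x}_E) \Big\}
% sep = 1 means f models no dependency between the start and end of the sequence.
```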
Parallel applications are often unable to take full advantage of emerging parallel architectures due to scaling limitations that arise from inter-process communication. Performance models are used to analyze the sources of communication costs. However, traditional models for point-to-point communication fail to capture the full cost of many irregular operations, such as sparse matrix methods. In this paper, a node-aware model is presented. Furthermore, the model is extended to include communication queue search time as well as an additional parameter estimating network contention. The resulting model is applied to a variety of irregular communication patterns in matrix operations, displaying improved accuracy over traditional models.
https://arxiv.org/abs/1806.02030
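As a hedged sketch of the modeling idea: the classic postal model charges each message T = α + βm, while a node-aware variant prices intra- and inter-node traffic separately, with additional terms for queue search and contention. The symbols and the exact form of the extra terms below are schematic assumptions, not the paper's parameterization.

```latex
% Traditional postal model for a message of m bytes:
T_{\mathrm{p2p}} = \alpha + \beta m
% Node-aware sketch: split traffic into intra-node (\ell) and inter-node (n)
% parts, plus schematic queue-search and contention penalties:
T \approx \alpha_{\ell} + \beta_{\ell} m_{\ell}
        + \alpha_{n} + \beta_{n} m_{n}
        + t_{q}\, q + \gamma\, c
% q: number of queued requests searched, c: estimated network contention.
```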
In this work, we investigate current flaws in identifying network-related errors, and examine how K-Means and Long Short-Term Memory networks solve these problems. We demonstrate that K-Means is able to classify messages, but does not necessarily provide meaningful clusters. However, Long Short-Term Memory networks are able to meet our goal of providing an intelligent clustering of messages by grouping messages that are temporally related. Additionally, Long Short-Term Memory networks can provide the ability to understand and visualize temporal causality, which unlocks the ability to warn about errors before they happen. We show that LSTMs achieve 70% accuracy in classifying network errors, and provide some suggestions for future work.
https://arxiv.org/abs/1806.02000
A basic, and still largely unanswered, question in the context of Generative Adversarial Networks (GANs) is whether they are truly able to capture all the fundamental characteristics of the distributions they are trained on. In particular, evaluating the diversity of GAN distributions is challenging and existing methods provide only a partial understanding of this issue. In this paper, we develop quantitative and scalable tools for assessing the diversity of GAN distributions. Specifically, we take a classification-based perspective and view loss of diversity as a form of covariate shift introduced by GANs. We examine two specific forms of such shift: mode collapse and boundary distortion. In contrast to prior work, our methods need only minimal human supervision and can be readily applied to state-of-the-art GANs on large, canonical datasets. Examining popular GANs using our tools indicates that these GANs have significant problems in reproducing the broader distributional properties of their training dataset.
https://arxiv.org/abs/1711.00970
Big data has had a great share in the success of deep learning in computer vision. Recent works suggest that there is significant further potential to increase object detection performance by utilizing even bigger datasets. In this paper, we introduce the EuroCity Persons dataset, which provides a large number of highly diverse, accurate and detailed annotations of pedestrians, cyclists and other riders in urban traffic scenes. The images for this dataset were collected on-board a moving vehicle in 31 cities of 12 European countries. With over 238200 person instances manually labeled in over 47300 images, EuroCity Persons is nearly one order of magnitude larger than person datasets used previously for benchmarking. The dataset furthermore contains a large number of person orientation annotations (over 211200). We optimize four state-of-the-art deep learning approaches (Faster R-CNN, R-FCN, SSD and YOLOv3) to serve as baselines for the new object detection benchmark. In experiments with previous datasets we analyze the generalization capabilities of these detectors when trained with the new dataset. We furthermore study the effect of the training set size, the dataset diversity (day- vs. night-time, geographical region), the dataset detail (i.e. availability of object orientation information) and the annotation quality on the detector performance. Finally, we analyze error sources and discuss the road ahead.
https://arxiv.org/abs/1805.07193
Image classification datasets are often imbalanced, a characteristic that negatively affects the accuracy of deep-learning classifiers. In this work we propose balancing GAN (BAGAN) as an augmentation tool to restore balance in imbalanced datasets. This is challenging because the few minority-class images may not be enough to train a GAN. We overcome this issue by including all available images of majority and minority classes during adversarial training. The generative model learns useful features from majority classes and uses these to generate images for minority classes. We apply class conditioning in the latent space to drive the generation process towards a target class. The generator in the GAN is initialized with the encoder module of an autoencoder, which enables us to learn accurate class conditioning in the latent space. We compare the proposed methodology with state-of-the-art GANs and demonstrate that BAGAN generates images of superior quality when trained with an imbalanced dataset.
https://arxiv.org/abs/1803.09655
This paper introduces a novel weighted unsupervised learning method for object detection using an RGB-D camera. The technique is feasible for detecting moving objects in noisy environments captured by an RGB-D camera. The main contribution of this paper is a real-time algorithm that detects each object as a separate cluster using weighted clustering. In a preprocessing step, the algorithm calculates the 3D position (X, Y, Z) and RGB color of each data point, and then calculates each data point's normal vector using the point's neighbors. After preprocessing, our algorithm calculates k weights for each data point, where each weight indicates cluster membership, resulting in a clustering of the objects in the scene.
https://arxiv.org/abs/1602.05920
We introduce a variety of models, trained on a supervised image captioning corpus to predict the image features for a given caption, to perform sentence representation grounding. We train a grounded sentence encoder that achieves good performance on COCO caption and image retrieval and subsequently show that this encoder can successfully be transferred to various NLP tasks, with improved performance over text-only models. Lastly, we analyze the contribution of grounding, and show that word embeddings learned by this system outperform non-grounded ones.
https://arxiv.org/abs/1707.06320
Sequence-to-sequence attention-based models, which integrate the acoustic, pronunciation, and language models into a single neural network, have recently shown very promising results on automatic speech recognition (ASR) tasks. Among these models, the Transformer, a new sequence-to-sequence attention-based model relying entirely on self-attention without using RNNs or convolutions, achieves a new single-model state-of-the-art BLEU score on neural machine translation (NMT) tasks. Given the Transformer's outstanding performance, we extend it to speech and adopt it as the basic architecture for sequence-to-sequence attention-based models on Mandarin Chinese ASR tasks. Furthermore, we compare a syllable-based model and a context-independent phoneme (CI-phoneme) based model with the Transformer in Mandarin Chinese. Additionally, a greedy cascading decoder with the Transformer is proposed for mapping CI-phoneme sequences and syllable sequences into word sequences. Experiments on HKUST datasets demonstrate that the syllable-based model with the Transformer performs better than the CI-phoneme based counterpart, achieving a character error rate (CER) of 28.77%, which is competitive with the state-of-the-art CER of 28.0% from a joint CTC-attention based encoder-decoder network.
https://arxiv.org/abs/1804.10752
While Generative Adversarial Networks (GANs) have demonstrated promising performance on multiple vision tasks, their learning dynamics are not yet well understood, both in theory and in practice. To address this issue, we study GAN dynamics in a simple yet rich parametric model that exhibits several of the common problematic convergence behaviors such as vanishing gradients, mode collapse, and diverging or oscillatory behavior. In spite of the non-convex nature of our model, we are able to perform a rigorous theoretical analysis of its convergence behavior. Our analysis reveals an interesting dichotomy: a GAN with an optimal discriminator provably converges, while first order approximations of the discriminator steps lead to unstable GAN dynamics and mode collapse. Our result suggests that using first order discriminator steps (the de-facto standard in most existing GAN setups) might be one of the factors that makes GAN training challenging in practice.
https://arxiv.org/abs/1706.09884
A number of studies have found that today’s Visual Question Answering (VQA) models are heavily driven by superficial correlations in the training data and lack sufficient image grounding. To encourage development of models geared towards the latter, we propose a new setting for VQA where for every question type, train and test sets have different prior distributions of answers. Specifically, we present new splits of the VQA v1 and VQA v2 datasets, which we call Visual Question Answering under Changing Priors (VQA-CP v1 and VQA-CP v2 respectively). First, we evaluate several existing VQA models under this new setting and show that their performance degrades significantly compared to the original VQA setting. Second, we propose a novel Grounded Visual Question Answering model (GVQA) that contains inductive biases and restrictions in the architecture specifically designed to prevent the model from ‘cheating’ by primarily relying on priors in the training data. Specifically, GVQA explicitly disentangles the recognition of visual concepts present in the image from the identification of plausible answer space for a given question, enabling the model to more robustly generalize across different distributions of answers. GVQA is built off an existing VQA model – Stacked Attention Networks (SAN). Our experiments demonstrate that GVQA significantly outperforms SAN on both VQA-CP v1 and VQA-CP v2 datasets. Interestingly, it also outperforms more powerful VQA models such as Multimodal Compact Bilinear Pooling (MCB) in several cases. GVQA offers strengths complementary to SAN when trained and evaluated on the original VQA v1 and VQA v2 datasets. Finally, GVQA is more transparent and interpretable than existing VQA models.
https://arxiv.org/abs/1712.00377
We present a GPU-based Locality Sensitive Hashing (LSH) algorithm to speed up beam search for sequence models. We utilize the winner-take-all (WTA) hash, which is based on the relative ranking order of hidden dimensions and thus resilient to perturbations in numerical values. Our algorithm is designed by fully considering the underlying architecture of CUDA-enabled GPUs (algorithm/architecture co-design): 1) a parallel Cuckoo hash table is applied for LSH code lookup (guaranteed O(1) lookup time); 2) candidate lists are shared across beams to maximize parallelism; 3) top frequent words are merged into candidate lists to improve performance. Experiments on 4 large-scale neural machine translation models demonstrate that our algorithm achieves up to a 4x speedup on the softmax module, and a 2x overall speedup without hurting BLEU, on GPU.
https://arxiv.org/abs/1806.00588
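The WTA hash itself is compact enough to sketch: each code is the argmax position among the first k entries of a fixed random permutation of the vector, so it depends only on ranking order and is stable under small numeric perturbations. Parameter values below are illustrative.

```python
import numpy as np

def wta_hash(x, perms, k=4):
    """Winner-take-all hash codes for vector x (sketch)."""
    return np.array([int(np.argmax(x[p[:k]])) for p in perms])

rng = np.random.default_rng(0)
d, H = 16, 8
perms = [rng.permutation(d) for _ in range(H)]   # fixed, shared permutations

x = rng.normal(size=d)
noisy = x + 0.01 * rng.normal(size=d)            # small numeric perturbation
print(wta_hash(x, perms))
print(wta_hash(noisy, perms))                    # codes (mostly) agree
```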
A semi-recurrent hybrid VAE-GAN model for generating sequential data is introduced. In order to consider the spatial correlation of the data in each frame of the generated sequence, CNNs are utilized in the encoder, generator, and discriminator. The subsequent frames are sampled from the latent distributions obtained by encoding the previous frames. As a result, the dependencies between the frames are maintained. Two testing frameworks for synthesizing a sequence with any number of frames are also proposed. The promising experimental results on piano music generation indicate the potential of the proposed framework in modeling other sequential data such as video.
https://arxiv.org/abs/1806.00509
We study image captioning as conditional GAN training, proposing both a context-aware LSTM captioner and a co-attentive discriminator, which enforces semantic alignment between images and captions. We empirically study the viability of two training methods: Self-critical Sequence Training (SCST) and Gumbel Straight-Through (ST). We show that, surprisingly, SCST (a policy gradient method) shows more stable gradient behavior and improved results over Gumbel ST, even without accessing the discriminator gradients directly. We also address the open question of automatic evaluation for these models, introducing a new semantic score and demonstrating its strong correlation with human judgement. As an evaluation paradigm, we suggest that an important criterion is the ability of a captioner to generalize to compositions of objects that do not usually occur together, for which we introduce a captioned Out of Context (OOC) test set. The OOC dataset combined with our semantic score forms a new benchmark for the captioning community. On this OOC benchmark and the traditional MSCOCO dataset, we show that SCST performs strongly in both semantic score and human evaluation.
https://arxiv.org/abs/1805.00063
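The SCST update the abstract refers to is a policy gradient with the model's own greedy decode as baseline; a numpy sketch of the surrogate loss (reward values and shapes are illustrative):

```python
import numpy as np

def scst_pseudo_loss(sample_logprobs, r_sample, r_greedy):
    """Self-critical sequence training surrogate loss (sketch).

    sample_logprobs : log-probabilities of the sampled caption's tokens, shape (T,)
    r_sample        : reward of the sampled caption (e.g. a semantic score)
    r_greedy        : reward of the greedy-decoded caption (the baseline)
    """
    advantage = r_sample - r_greedy
    # Minimizing this raises the probability of samples that beat the
    # model's own greedy output -- no discriminator gradients required.
    return -advantage * np.sum(sample_logprobs)

print(scst_pseudo_loss(np.log([0.4, 0.5, 0.7]), r_sample=0.9, r_greedy=0.6))
```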
We describe Sockeye (version 1.12), an open-source sequence-to-sequence toolkit for Neural Machine Translation (NMT). Sockeye is a production-ready framework for training and applying models as well as an experimental platform for researchers. Written in Python and built on MXNet, the toolkit offers scalable training and inference for the three most prominent encoder-decoder architectures: attentional recurrent neural networks, self-attentional transformers, and fully convolutional networks. Sockeye also supports a wide range of optimizers, normalization and regularization techniques, and inference improvements from current NMT literature. Users can easily run standard training recipes, explore different model settings, and incorporate new ideas. In this paper, we highlight Sockeye’s features and benchmark it against other NMT toolkits on two language arcs from the 2017 Conference on Machine Translation (WMT): English-German and Latvian-English. We report competitive BLEU scores across all three architectures, including an overall best score for Sockeye’s transformer implementation. To facilitate further comparison, we release all system outputs and training scripts used in our experiments. The Sockeye toolkit is free software released under the Apache 2.0 license.
https://arxiv.org/abs/1712.05690
Neural machine translation (NMT) is a deep learning based approach to machine translation that yields state-of-the-art translation performance in scenarios where large-scale parallel corpora are available. Although high-quality, domain-specific translation is crucial in the real world, domain-specific corpora are usually scarce or nonexistent, and thus vanilla NMT performs poorly in such scenarios. Domain adaptation, which leverages both out-of-domain parallel corpora and monolingual corpora for in-domain translation, is therefore very important for domain-specific translation. In this paper, we give a comprehensive survey of the state-of-the-art domain adaptation techniques for NMT.
https://arxiv.org/abs/1806.00258
In recent years an increasing number of researchers and practitioners have been suggesting algorithms for large-scale neural network architecture search: genetic algorithms, reinforcement learning, learning curve extrapolation, and accuracy predictors. None of them, however, has demonstrated high performance on unseen datasets without training new experiments. We propose a new deep neural network accuracy predictor that estimates classification performance for unseen input datasets in fractions of a second, without training. In contrast to previously proposed approaches, our prediction is calibrated not only on the topological network information but also on a characterization of the dataset difficulty, which allows us to re-tune the prediction without any training. Our predictor achieves a throughput exceeding 100 networks per second on a single GPU, thus creating the opportunity to perform large-scale architecture search within a few minutes. We present results of two searches performed in 400 seconds on a single GPU. Our best discovered networks reach 93.67% accuracy for CIFAR-10 and 81.01% for CIFAR-100, verified by training. These networks are performance-competitive with other automatically discovered state-of-the-art networks, yet we needed only a small fraction of the time to solution and computational resources.
https://arxiv.org/abs/1806.00250
The highly sparse nature of propagation channels and the restricted use of radio frequency (RF) chains at transceivers limit the performance of millimeter wave (mmWave) multiple-input multiple-output (MIMO) systems. Introducing reconfigurable antennas to mmWave can offer an additional degree of freedom in designing mmWave MIMO systems. This paper provides a theoretical framework for studying mmWave MIMO with reconfigurable antennas. We present an architecture of reconfigurable mmWave MIMO with beamspace hybrid analog-digital beamformers and reconfigurable antennas at both the transmitter and the receiver. We show that employing reconfigurable antennas can provide throughput gain for mmWave MIMO. We derive the expression for the average throughput gain of using reconfigurable antennas, and further simplify the expression for the case of a large number of reconfiguration states. In addition, we propose a low-complexity algorithm for reconfiguration state and beam selection, which achieves nearly the same throughput performance as the optimal selection of reconfiguration state and beams by exhaustive search.
https://arxiv.org/abs/1806.00051
The effect of a two-dimensional (2D) graphene layer (GL) on top of the silicon nitride (SiN) passivation layer of AlGaN/GaN metal-insulator-semiconductor high-electron-mobility transistors (MIS-HEMTs) has been systematically analyzed. Results showed that in devices without the GL, the maximum drain current density (I_D,max) and the maximum transconductance (g_m,max) decreased gradually as the mist exposure time increased, by up to 23% and 10%, respectively. Moreover, the gate lag ratio (GLR) increased by around 10% during mist exposure. In contrast, devices with a GL showed robust behavior and no significant changes in the electrical characteristics under both DC and pulsed conditions. The origin of these behaviors is discussed, and the results point to the GL as the key factor in improving the moisture resistance of the SiN passivation layer.
https://arxiv.org/abs/1805.12432
Most artificial intelligence models have limited ability to solve new tasks faster without forgetting previously acquired knowledge. The recently emerging paradigm of continual learning aims to solve this issue, in which the model learns various tasks in a sequential fashion. In this work, a novel approach for continual learning is proposed, which searches for the best neural architecture for each coming task via carefully designed reinforcement learning strategies. We name it Reinforced Continual Learning. Our method not only performs well at preventing catastrophic forgetting but also fits new tasks well. Experiments on sequential classification tasks for variants of the MNIST and CIFAR-100 datasets demonstrate that the proposed approach outperforms existing continual learning alternatives for deep networks.
https://arxiv.org/abs/1805.12369
Long short-term memory (LSTM) has been widely used for sequential data modeling. Researchers have increased LSTM depth by stacking LSTM cells to improve performance. This incurs model redundancy, increases run-time delay, and makes the LSTMs more prone to overfitting. To address these problems, we propose a hidden-layer LSTM (H-LSTM) that adds hidden layers to the LSTM's original one-level non-linear control gates. H-LSTM increases accuracy while employing fewer external stacked layers, thus reducing the number of parameters and run-time latency significantly. We employ grow-and-prune (GP) training to iteratively adjust the hidden layers through gradient-based growth and magnitude-based pruning of connections. This learns both the weights and the compact architecture of H-LSTM control gates. We have GP-trained H-LSTMs for image captioning and speech recognition applications. For the NeuralTalk architecture on the MSCOCO dataset, our three models reduce the number of parameters by 38.7x [floating-point operations (FLOPs) by 45.5x], run-time latency by 4.5x, and improve the CIDEr score by 2.6. For the DeepSpeech2 architecture on the AN4 dataset, our two models reduce the number of parameters by 19.4x (FLOPs by 23.5x), run-time latency by 15.7%, and the word error rate from 12.9% to 8.7%. Thus, GP-trained H-LSTMs can be seen to be compact, fast, and accurate.
https://arxiv.org/abs/1805.11797
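The two GP-training primitives can be sketched in numpy; the actual growth and pruning policies, thresholds, and seed values in the paper differ, so treat this only as an illustration of magnitude-based pruning and gradient-based growth.

```python
import numpy as np

def prune_by_magnitude(W, frac=0.5):
    """Zero out the smallest-magnitude fraction of connections (sketch)."""
    thresh = np.quantile(np.abs(W), frac)
    return W * (np.abs(W) >= thresh)

def grow_by_gradient(W, grad, frac=0.1):
    """Reactivate dormant connections with the largest gradient magnitude."""
    dormant = (W == 0)
    scores = np.abs(grad) * dormant
    if scores.max() == 0:
        return W
    thresh = np.quantile(scores[dormant], 1 - frac)
    revive = dormant & (scores >= thresh) & (scores > 0)
    return np.where(revive, 1e-3 * np.sign(grad), W)   # small seed weights

rng = np.random.default_rng(0)
W = prune_by_magnitude(rng.normal(size=(4, 4)))
W = grow_by_gradient(W, grad=rng.normal(size=(4, 4)))
print(np.count_nonzero(W))   # pruned, then partially regrown connectivity
```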
Reading comprehension is a challenging task, especially when executed over longer documents or across multiple evidence documents in which the answer is likely to reoccur. Existing neural architectures typically do not scale to the entire evidence, and hence resort to selecting a single passage in the document (either via truncation or other means) and carefully searching for the answer within that passage. However, in some cases, this strategy can be suboptimal, since by focusing on a specific passage, it becomes difficult to leverage multiple mentions of the same answer throughout the document. In this work, we take a different approach by constructing lightweight models that are combined in a cascade to find the answer. Each submodel consists only of feed-forward networks equipped with an attention mechanism, making it trivially parallelizable. We show that our approach can scale to evidence documents approximately an order of magnitude larger and can aggregate information at the representation level from multiple mentions of each answer candidate across the document. Empirically, our approach achieves state-of-the-art performance on both the Wikipedia and web domains of the TriviaQA dataset, outperforming more complex, recurrent architectures.
https://arxiv.org/abs/1711.00894
Non-textual components such as charts, diagrams and tables provide key information in many scientific documents, but the lack of large labeled datasets has impeded the development of data-driven methods for scientific figure extraction. In this paper, we induce high-quality training labels for the task of figure extraction in a large number of scientific documents, with no human intervention. To accomplish this we leverage the auxiliary data provided in two large web collections of scientific documents (arXiv and PubMed) to locate figures and their associated captions in the rasterized PDF. We share the resulting dataset of over 5.5 million induced labels—4,000 times larger than the previous largest figure extraction dataset—with an average precision of 96.8%, to enable the development of modern data-driven methods for this task. We use this dataset to train a deep neural network for end-to-end figure detection, yielding a model that can be more easily extended to new domains compared to previous work. The model was successfully deployed in Semantic Scholar, a large-scale academic search engine, and used to extract figures in 13 million scientific documents.
https://arxiv.org/abs/1804.02445