Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

Learning to Compose with Professional Photographs on the Web

2017-07-18

Yi-Ling Chen, Jan Klopp, Min Sun, Shao-Yi Chien, Kwan-Liu Ma

arXiv_CV

arXiv_CV
Abstract

Photo composition is an important factor affecting the aesthetics in photography. However, it is a highly challenging task to model the aesthetic properties of good compositions due to the lack of globally applicable rules to the wide variety of photographic styles. Inspired by the thinking process of photo taking, we formulate the photo composition problem as a view finding process which successively examines pairs of views and determines their aesthetic preferences. We further exploit the rich professional photographs on the web to mine unlimited high-quality ranking samples and demonstrate that an aesthetics-aware deep ranking network can be trained without explicitly modeling any photographic rules. The resulting model is simple and effective in terms of its architectural design and data sampling method. It is also generic since it naturally learns any photographic rules implicitly encoded in professional photographs. The experiments show that the proposed view finding network achieves state-of-the-art performance with sliding window search strategy on two image cropping datasets.
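The view-finding search with a sliding-window strategy can be pictured with a short Python sketch: enumerate candidate crops at several scales and keep the one a scoring model ranks highest. The `score` callable is a hypothetical stand-in for the trained view finding network. The scales and stride are illustrative defaults, not values taken from the paper.

```python
from typing import Callable, List, Tuple

# A crop is (x, y, width, height) in pixel coordinates.
Crop = Tuple[int, int, int, int]


def sliding_window_crops(img_w: int, img_h: int,
                         scales: Tuple[float, ...] = (0.5, 0.6, 0.7, 0.8, 0.9),
                         stride_frac: float = 0.1) -> List[Crop]:
    """Enumerate candidate crops at several scales (hypothetical defaults)."""
    step_x = max(1, int(img_w * stride_frac))
    step_y = max(1, int(img_h * stride_frac))
    crops: List[Crop] = []
    for s in scales:
        w, h = int(img_w * s), int(img_h * s)
        for y in range(0, img_h - h + 1, step_y):
            for x in range(0, img_w - w + 1, step_x):
                crops.append((x, y, w, h))
    return crops


def best_view(img_w: int, img_h: int, score: Callable[[Crop], float],
              **window_kwargs) -> Crop:
    """Return the candidate crop that the (placeholder) aesthetic model scores highest."""
    return max(sliding_window_crops(img_w, img_h, **window_kwargs), key=score)


def toy_score(c: Crop) -> float:
    """Toy stand-in for a learned model: prefer crops centred near x = 72 at the top edge."""
    return -abs(c[0] + c[2] / 2 - 72) - c[1]


print(best_view(100, 100, toy_score, scales=(0.5,)))  # (50, 0, 50, 50)
```

In a real system the scoring network would be evaluated once per candidate. This is why the abstract stresses that the model is simple enough to be used inside an exhaustive search.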

URL

https://arxiv.org/abs/1702.00503

PDF

https://arxiv.org/pdf/1702.00503
Improved Neural Machine Translation with a Syntax-Aware Encoder and Decoder

2017-07-18

Huadong Chen, Shujian Huang, David Chiang, Jiajun Chen

arXiv_CL

arXiv_CL Attention NMT
Abstract

Most neural machine translation (NMT) models are based on the sequential encoder-decoder framework, which makes no use of syntactic information. In this paper, we improve this model by explicitly incorporating source-side syntactic trees. More specifically, we propose (1) a bidirectional tree encoder which learns both sequential and tree structured representations; (2) a tree-coverage model that lets the attention depend on the source-side syntax. Experiments on Chinese-English translation demonstrate that our proposed models outperform the sequential attentional model as well as a stronger baseline with a bottom-up tree encoder and word coverage.

URL

https://arxiv.org/abs/1707.05436

PDF

https://arxiv.org/pdf/1707.05436
Auxiliary Objectives for Neural Error Detection Models

2017-07-17

Marek Rei, Helen Yannakoudakis

arXiv_CV

arXiv_CV Detection
Abstract

We investigate the utility of different auxiliary objectives and training strategies within a neural sequence labeling approach to error detection in learner writing. Auxiliary costs provide the model with additional linguistic information, allowing it to learn general-purpose compositional features that can then be exploited for other objectives. Our experiments show that a joint learning approach trained with parallel labels on in-domain data improves performance over the previous best error detection system. While the resulting model has the same number of parameters, the additional objectives allow it to be optimised more efficiently and achieve better performance.

URL

https://arxiv.org/abs/1707.05227

PDF

https://arxiv.org/pdf/1707.05227
Oscillations in networks of networks stem from adaptive nodes with memory

2017-07-17

Amir Goldental, Herut Uzan, Shira Sardi, Ido Kanter

arXiv_CV

arXiv_CV Quantitative
Abstract

We present an analytical framework that allows the quantitative study of statistical dynamic properties of networks with adaptive nodes that have memory and is used to examine the emergence of oscillations in networks with response failures. The frequency of the oscillations was quantitatively found to increase with the excitability of the nodes and with the average degree of the network and to decrease with delays between nodes. For networks of networks, diverse cluster oscillation modes were found as a function of the topology. Analytical results are in agreement with large-scale simulations and open the horizon for understanding network dynamics composed of finite memory nodes as well as their different phases of activity.

URL

https://arxiv.org/abs/1707.05157

PDF

https://arxiv.org/pdf/1707.05157
Wide-Residual-Inception Networks for Real-time Object Detection

2017-07-17

Youngwan Lee, Byeonghak Yim, Huien Kim, Eunsoo Park, Xuenan Cui, Taekang Woo, Hakil Kim

arXiv_CV

arXiv_CV Object_Detection CNN Classification Detection
Abstract

Since convolutional neural network (CNN) models emerged, several tasks in computer vision have actively deployed CNN models for feature extraction. However, the conventional CNN models have a high computational cost and require high memory capacity, which is impractical and unaffordable for commercial applications such as real-time on-road object detection on embedded boards or mobile platforms. To tackle this limitation of CNN models, this paper proposes a wide-residual-inception (WR-Inception) network, which constructs the architecture based on a residual inception unit that captures objects of various sizes on the same feature map, as well as shallower and wider layers, compared to state-of-the-art networks like ResNet. To verify the proposed networks, this paper conducted two experiments: one is a classification task on CIFAR-10/100 and the other is an on-road object detection task using a Single-Shot Multi-box Detector (SSD) on the KITTI dataset.

URL

https://arxiv.org/abs/1702.01243

PDF

https://arxiv.org/pdf/1702.01243
Query-Focused Video Summarization: Dataset, Evaluation, and A Memory Network Based Approach

2017-07-16

Aidean Sharghi, Jacob S. Laurel, Boqing Gong

arXiv_CV

arXiv_CV Summarization
Abstract

Recent years have witnessed a resurgence of interest in video summarization. However, one of the main obstacles to the research on video summarization is the user subjectivity - users have various preferences over the summaries. The subjectiveness causes at least two problems. First, no single video summarizer fits all users unless it interacts with and adapts to the individual users. Second, it is very challenging to evaluate the performance of a video summarizer. To tackle the first problem, we explore the recently proposed query-focused video summarization which introduces user preferences in the form of text queries about the video into the summarization process. We propose a memory network parameterized sequential determinantal point process in order to attend the user query onto different video frames and shots. To address the second challenge, we contend that a good evaluation metric for video summarization should focus on the semantic information that humans can perceive rather than the visual features or temporal overlaps. To this end, we collect dense per-video-shot concept annotations, compile a new dataset, and suggest an efficient evaluation method defined upon the concept annotations. We conduct extensive experiments contrasting our video summarizer to existing ones and present detailed analyses about the dataset and the new evaluation method.

URL

https://arxiv.org/abs/1707.04960

PDF

https://arxiv.org/pdf/1707.04960
Ensembling Factored Neural Machine Translation Models for Automatic Post-Editing and Quality Estimation

2017-07-15

Chris Hokamp

arXiv_CL

arXiv_CL NMT
Abstract

This work presents a novel approach to Automatic Post-Editing (APE) and Word-Level Quality Estimation (QE) using ensembles of specialized Neural Machine Translation (NMT) systems. Word-level features that have proven effective for QE are included as input factors, expanding the representation of the original source and the machine translation hypothesis, which are used to generate an automatically post-edited hypothesis. We train a suite of NMT models that use different input representations, but share the same output space. These models are then ensembled together, and tuned for both the APE and the QE task. We thus attempt to connect the state-of-the-art approaches to APE and QE within a single framework. Our models achieve state-of-the-art results in both tasks; the only difference between the two setups lies in the tuning step, which learns weights for each component of the ensemble.
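Because the component models share one output space, their per-token scores can be combined directly. Tuning for APE or for QE then amounts to choosing a different weight vector. The sketch assumes a weighted log-linear combination over a toy two-token vocabulary. The abstract does not state the exact combination rule, so this is illustrative only.

```python
import math
from typing import Dict, List

LogProbs = Dict[str, float]  # token -> log-probability over the shared output vocabulary


def ensemble_log_probs(models: List[LogProbs], weights: List[float]) -> LogProbs:
    """Weighted log-linear combination of per-model token scores (assumed rule)."""
    vocab = models[0].keys()
    return {tok: sum(w * lp[tok] for w, lp in zip(weights, models)) for tok in vocab}


def best_token(models: List[LogProbs], weights: List[float]) -> str:
    """Pick the highest-scoring token; renormalizing is unnecessary for an argmax."""
    combined = ensemble_log_probs(models, weights)
    return max(combined, key=combined.get)


m1 = {"a": math.log(0.7), "b": math.log(0.3)}
m2 = {"a": math.log(0.2), "b": math.log(0.8)}
print(best_token([m1, m2], [0.5, 0.5]))  # prints b
print(best_token([m1, m2], [0.9, 0.1]))  # prints a
```

The same two models give different decisions under different weights, which is the lever the tuning step adjusts.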

URL

https://arxiv.org/abs/1706.05083

PDF

https://arxiv.org/pdf/1706.05083
Simplified Long Short-term Memory Recurrent Neural Networks: part III

2017-07-14

Atra Akandeh, Fathi M. Salem

arXiv_CV

arXiv_CV RNN
Abstract

This is part III of a three-part work. In parts I and II, we have presented eight variants for simplified Long Short Term Memory (LSTM) recurrent neural networks (RNNs). Fast computation, especially on constrained computing resources, is an important factor in processing big time-sequence data. In this part III paper, we present and evaluate two new LSTM model variants which dramatically reduce the computational load while retaining comparable performance to the base (standard) LSTM RNNs. In these new variants, we impose (Hadamard) pointwise state multiplications in the cell-memory network in addition to the gating signal networks.
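One reading of "pointwise state multiplications in the cell-memory network" is that the full recurrent matrix U in the cell candidate is replaced by an element-wise weight vector u. This cuts that block from n² to n parameters. The pure-Python sketch illustrates that reading. The function names are hypothetical, and the exact equations of the two new variants are given in the paper, not in the abstract.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(U: Matrix, h: Vector) -> Vector:
    """Full recurrent term U @ h, which needs n * n weights."""
    return [sum(u_ij * h_j for u_ij, h_j in zip(row, h)) for row in U]


def hadamard(u: Vector, h: Vector) -> Vector:
    """Pointwise recurrent term u * h (Hadamard product), which needs only n weights."""
    return [u_i * h_i for u_i, h_i in zip(u, h)]


def diag(u: Vector) -> Matrix:
    """Diagonal matrix with u on its diagonal: the pointwise form is a special case of U @ h."""
    n = len(u)
    return [[u[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def cell_candidate(wx: Vector, u: Vector, h_prev: Vector, b: Vector) -> Vector:
    """tanh(W x + u * h_prev + b), with the input projection W x precomputed."""
    return [math.tanh(a + r + c) for a, r, c in zip(wx, hadamard(u, h_prev), b)]


u, h = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
print(hadamard(u, h))                        # [4.0, 10.0, 18.0]
print(matvec(diag(u), h) == hadamard(u, h))  # True
```

The pointwise version also replaces an O(n²) matrix-vector product with O(n) work per step, which is consistent with the reduced computational load the abstract reports.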

URL

https://arxiv.org/abs/1707.04626

PDF

https://arxiv.org/pdf/1707.04626
Simplified Long Short-term Memory Recurrent Neural Networks: part II

2017-07-14

Atra Akandeh, Fathi M. Salem

arXiv_CV

arXiv_CV RNN
Abstract

This is part II of a three-part work. Here, we present a second set of five inter-related variants of simplified Long Short-term Memory (LSTM) recurrent neural networks by further reducing adaptive parameters. Two of these models have been introduced in part I of this work. We evaluate and verify our model variants on the benchmark MNIST dataset and assert that these models are comparable to the base LSTM model while using progressively fewer parameters. Moreover, we observe that when using the ReLU activation, the test accuracy of the standard LSTM drops after a number of epochs when the learning parameter becomes larger. However, all of the new model variants sustain their performance.

URL

https://arxiv.org/abs/1707.04623

PDF

https://arxiv.org/pdf/1707.04623
Simplified Long Short-term Memory Recurrent Neural Networks: part I

2017-07-14

Atra Akandeh, Fathi M. Salem

arXiv_CV

arXiv_CV RNN
Abstract

We present five variants of the standard Long Short-term Memory (LSTM) recurrent neural networks by uniformly reducing blocks of adaptive parameters in the gating mechanisms. For simplicity, we refer to these models as LSTM1, LSTM2, LSTM3, LSTM4, and LSTM5, respectively. Such parameter-reduced variants enable speeding up data training computations and would be more suitable for implementations onto constrained embedded platforms. We comparatively evaluate and verify our five variant models on the classical MNIST dataset and demonstrate that these variant models are comparable to a standard implementation of the LSTM model while using fewer parameters. Moreover, we observe that in some cases the standard LSTM’s accuracy will drop after a number of epochs when using the ReLU nonlinearity; in contrast, LSTM3, LSTM4 and LSTM5 retain their performance.
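Removing whole parameter blocks from the gates produces savings that are easy to count. This sketch tallies the weights of one LSTM layer under a few hypothetical gate configurations. Which blocks LSTM1 to LSTM5 actually drop is specified in the paper, not in this abstract. The sizes m = 28 and n = 100 are arbitrary.

```python
from typing import Sequence


def lstm_params(m: int, n: int, gate_terms: Sequence[str] = ("W", "U", "b")) -> int:
    """Parameter count of one LSTM layer (input size m, hidden size n).

    The cell-candidate block keeps its input weights W (n*m), recurrent
    weights U (n*n) and bias b (n); each of the three gates (input, forget,
    output) keeps only the blocks named in gate_terms.
    """
    block = {"W": n * m, "U": n * n, "b": n}
    cell = sum(block.values())
    gate = sum(block[t] for t in gate_terms)
    return cell + 3 * gate


for terms in [("W", "U", "b"), ("U", "b"), ("U",), ("b",)]:
    print(terms, lstm_params(28, 100, terms))
# ('W', 'U', 'b') 51600   standard LSTM
# ('U', 'b') 43200        gates ignore the input x
# ('U',) 42900            gates use only the previous state
# ('b',) 13200            gates reduce to learned biases
```

Because the gates account for three of the four blocks, even modest per-gate reductions shrink the layer noticeably.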

URL

https://arxiv.org/abs/1707.04619

PDF

https://arxiv.org/pdf/1707.04619
CUNI System for the WMT17 Multimodal Translation Task

2017-07-14

Jindřich Helcl, Jindřich Libovický

arXiv_CV

arXiv_CV Image_Caption Caption
Abstract

In this paper, we describe our submissions to the WMT17 Multimodal Translation Task. For Task 1 (multimodal translation), our best scoring system is a purely textual neural translation of the source image caption to the target language. The main feature of the system is the use of additional data that was acquired by selecting similar sentences from parallel corpora and by data synthesis with back-translation. For Task 2 (cross-lingual image captioning), our best submitted system generates an English caption which is then translated by the best system used in Task 1. We also present negative results, which are based on ideas that we believe have the potential to bring improvements, but did not prove to be useful in our particular setup.

URL

https://arxiv.org/abs/1707.04550

PDF

https://arxiv.org/pdf/1707.04550
A New Urban Objects Detection Framework Using Weakly Annotated Sets

2017-07-14

Eric Keiji, Gabriel Ferreira, Claudio Silva, Roberto M. Cesar Jr

arXiv_CV

arXiv_CV Object_Detection Detection Recognition
Abstract

Urban informatics explores data science methods to address different urban issues in a data-intensive way. The large variety and quantity of available data should be explored, but this brings important challenges. For instance, although there are powerful computer vision methods that may be explored, they may require large annotated datasets. In this work we propose a novel approach to automatically creating an object recognition system with minimal manual annotation. The basic idea behind the method is to build large input datasets from online cameras available in large cities. An off-the-shelf weak classifier is used to detect an initial set of urban elements of interest (e.g. cars, pedestrians, bikes, etc.). This initial dataset undergoes a quality control procedure and is subsequently used to fine-tune a strong classifier. Quality control and comparative performance assessment are used as part of the pipeline. We evaluate the method for detecting cars based on monitoring cameras. Experimental results using real data show that despite losing generality, the final detector provides better detection rates tailored to the selected cameras. The programmed robot gathered 770 video hours from 24 online city cameras (~300GB), which were fed to the proposed system. Our approach has shown that the method nearly doubled the recall (93%) with respect to state-of-the-art methods using off-the-shelf algorithms.

URL

https://arxiv.org/abs/1706.09308

PDF

https://arxiv.org/pdf/1706.09308
Inner-Scene Similarities as a Contextual Cue for Object Detection

2017-07-14

Noa Arbel, Tamar Avraham, Michael Lindenbaum

arXiv_CV

arXiv_CV Object_Detection Optimization Classification Detection
Abstract

Using image context is an effective approach for improving object detection. Previously proposed methods used contextual cues that rely on semantic or spatial information. In this work, we explore a different kind of contextual information: inner-scene similarity. We present the CISS (Context by Inner Scene Similarity) algorithm, which is based on the observation that two visually similar sub-image patches are likely to share semantic identities, especially when both appear in the same image. CISS uses base-scores provided by a base detector and operates as a post-detection stage. For each candidate sub-image (denoted anchor), the CISS algorithm finds a few similar sub-images (denoted supporters), and, using them, calculates a new enhanced score for the anchor. This is done by utilizing the base-scores of the supporters and a pre-trained dependency model. The new scores are modeled as a linear function of the base scores of the anchor and the supporters and are estimated using a minimum mean square error optimization. This approach results in: (a) improved detection of partly occluded objects (when there are similar non-occluded objects in the scene), and (b) fewer false alarms (when the base detector mistakenly classifies a background patch as an object). This work relates to Duncan and Humphreys’ “similarity theory,” a psychophysical study, which suggested that the human visual system perceptually groups similar image regions and that the classification of one region is affected by the estimated identity of the other. Experimental results demonstrate the enhancement of a base detector’s scores on the PASCAL VOC dataset.
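The rescoring step can be sketched as an ordinary least-squares fit, which is the minimum mean square error estimate for a linear model. As simplifying assumptions, the sketch summarizes the supporters by their mean base score and adds a bias term. The paper's actual features and its pre-trained dependency model are not reproduced here, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, Sequence[float]]  # (anchor base score, supporter base scores)


def solve(A: List[List[float]], y: List[float]) -> List[float]:
    """Solve the square system A w = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def features(anchor: float, supporters: Sequence[float]) -> List[float]:
    """Assumed feature vector: [anchor score, mean supporter score, bias]."""
    return [anchor, sum(supporters) / len(supporters), 1.0]


def fit_mmse(samples: List[Sample], targets: List[float]) -> List[float]:
    """Least-squares fit via the normal equations X^T X w = X^T t."""
    X = [features(a, s) for a, s in samples]
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xtt = [sum(x[i] * t for x, t in zip(X, targets)) for i in range(k)]
    return solve(XtX, Xtt)


def rescore(w: List[float], anchor: float, supporters: Sequence[float]) -> float:
    """Enhanced anchor score as a linear function of the base scores."""
    return sum(wi * fi for wi, fi in zip(w, features(anchor, supporters)))


# Synthetic training data generated from known weights [0.6, 0.3, 0.05].
samples = [(0.9, [0.8, 0.7]), (0.2, [0.1, 0.3]), (0.5, [0.9, 0.9]),
           (0.7, [0.2, 0.2]), (0.1, [0.6, 0.4])]
targets = [0.6 * a + 0.3 * (sum(s) / len(s)) + 0.05 for a, s in samples]
w = fit_mmse(samples, targets)
print([round(wi, 6) for wi in w])
```

A low anchor score with high-scoring supporters gets pulled upward. This is the mechanism the abstract credits for better detection of partly occluded objects.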

URL

https://arxiv.org/abs/1707.04406

PDF

https://arxiv.org/pdf/1707.04406
End-to-End Instance Segmentation with Recurrent Attention

2017-07-13

Mengye Ren, Richard S. Zemel

arXiv_CV

arXiv_CV Image_Caption Segmentation Attention Caption CNN Semantic_Segmentation RNN Prediction VQA
Abstract

While convolutional neural networks have gained impressive success recently in solving structured prediction problems such as semantic segmentation, it remains a challenge to differentiate individual object instances in the scene. Instance segmentation is very important in a variety of applications, such as autonomous driving, image captioning, and visual question answering. Techniques that combine large graphical models with low-level vision have been proposed to address this problem; however, we propose an end-to-end recurrent neural network (RNN) architecture with an attention mechanism to model a human-like counting process, and produce detailed instance segmentations. The network is jointly trained to sequentially produce regions of interest as well as a dominant object segmentation within each region. The proposed model achieves competitive results on the CVPPP, KITTI, and Cityscapes datasets.

URL

https://arxiv.org/abs/1605.09410

PDF

https://arxiv.org/pdf/1605.09410
Well-Founded Operators for Normal Hybrid MKNF Knowledge Bases

2017-07-12

Jianmin Ji, Fangfang Liu, Jia-Huai You

arXiv_CV

arXiv_CV Knowledge Ontology Inference
Abstract

Hybrid MKNF knowledge bases have been considered one of the dominant approaches to combining open world ontology languages with closed world rule-based languages. Currently, the only known inference methods are based on the approach of guess-and-verify, while most modern SAT/ASP solvers are built under the DPLL architecture. The central impediment here is that it is not clear what constitutes a constraint propagator, a key component employed in any DPLL-based solver. In this paper, we address this problem by formulating the notion of unfounded sets for nondisjunctive hybrid MKNF knowledge bases, based on which we propose and study two new well-founded operators. We show that by employing a well-founded operator as a constraint propagator, a sound and complete DPLL search engine can be readily defined. We compare our approach with the operator based on the alternating fixpoint construction by Knorr et al. [2011] and show that, when applied to arbitrary partial partitions, the new well-founded operators not only propagate more truth values but also circumvent the non-converging behavior of the latter. In addition, we study the possibility of simplifying a given hybrid MKNF knowledge base by employing a well-founded operator, and show that, out of the two operators proposed in this paper, the weaker one can be applied for this purpose and the stronger one cannot. These observations are useful in implementing a grounder for hybrid MKNF knowledge bases, which can be applied before the computation of MKNF models. The paper is under consideration for acceptance in TPLP.

URL

https://arxiv.org/abs/1707.01959

PDF

https://arxiv.org/pdf/1707.01959
Capacity, Fidelity, and Noise Tolerance of Associative Spatial-Temporal Memories Based on Memristive Neuromorphic Network

2017-07-12

Dmitri Gavrilov, Dmitri Strukov, Konstantin K. Likharev

arXiv_CV

arXiv_CV Gradient_Descent
Abstract

We have calculated the key characteristics of associative (content-addressable) spatial-temporal memories based on neuromorphic networks with restricted connectivity - “CrossNets”. Such networks may be naturally implemented in nanoelectronic hardware using hybrid CMOS/memristor circuits, which may feature extremely high energy efficiency, approaching that of biological cortical circuits, at much higher operation speed. Our numerical simulations, in some cases confirmed by analytical calculations, have shown that the characteristics depend substantially on the method of information recording into the memory. Of the four methods we have explored, two look especially promising - one based on the quadratic programming, and the other one being a specific discrete version of the gradient descent. The latter method provides a slightly lower memory capacity (at the same fidelity) than the former one, but it allows local recording, which may be more readily implemented in nanoelectronic hardware. Most importantly, at the synchronous retrieval, both methods provide a capacity higher than that of the well-known Ternary Content-Addressable Memories with the same number of nonvolatile memory cells (e.g., memristors), though the input noise immunity of the CrossNet memories is somewhat lower.

URL

https://arxiv.org/abs/1707.03855

PDF

https://arxiv.org/pdf/1707.03855
NO Need to Worry about Adversarial Examples in Object Detection in Autonomous Vehicles

2017-07-12

Jiajun Lu, Hussein Sibai, Evan Fabry, David Forsyth

arXiv_CV

arXiv_CV Adversarial Object_Detection Classification Detection
Abstract

It has been shown that most machine learning algorithms are susceptible to adversarial perturbations. Slightly perturbing an image in a carefully chosen direction in the image space may cause a trained neural network model to misclassify it. Recently, it was shown that physical adversarial examples exist: printing perturbed images then taking pictures of them would still result in misclassification. This raises security and safety concerns. However, these experiments ignore a crucial property of physical objects: the camera can view objects from different distances and at different angles. In this paper, we show experiments that suggest that current constructions of physical adversarial examples do not disrupt object detection from a moving platform. Instead, a trained neural network classifies most of the pictures taken from different distances and angles of a perturbed image correctly. We believe this is because the adversarial property of the perturbation is sensitive to the scale at which the perturbed picture is viewed, so (for example) an autonomous car will misclassify a stop sign only from a small range of distances. Our work raises an important question: can one construct examples that are adversarial for many or most viewing conditions? If so, the construction should offer very significant insights into the internal representation of patterns by deep networks. If not, there is a good prospect that adversarial examples can be reduced to a curiosity with little practical impact.

URL

https://arxiv.org/abs/1707.03501

PDF

https://arxiv.org/pdf/1707.03501
A network model for clonal differentiation and immune memory

2017-07-11

Alexandre de Castro

arXiv_CV

arXiv_CV
Abstract

A bit-string model that uses the technique of multi-spin coding was previously used to study the time evolution of B-cell clone repertoire, in a paper by Lagreca, Almeida and Santos. In this work we extend that simplified model to include independently the role of the populations of antibodies, in the control of the immune response, producing mechanisms of differentiation and regulation in a more complete way. Although the antibodies have the same molecular shape as the B-cell receptors (BCR), they should present a different time evolution and thus should be treated separately. We have also studied a possible model for the network immune memory, suggesting a random memory regeneration, which is self-perpetuating.

URL

https://arxiv.org/abs/1707.09295

PDF

https://arxiv.org/pdf/1707.09295
A step towards procedural terrain generation with GANs

2017-07-11

Christopher Beckham, Christopher Pal

arXiv_CV

arXiv_CV GAN
Abstract

Procedural terrain generation for video games has traditionally been done with smartly designed but handcrafted algorithms that generate heightmaps. We propose a first step toward the learning and synthesis of these heightmaps using recent advances in deep generative modelling with openly available satellite imagery from NASA.

URL

https://arxiv.org/abs/1707.03383

PDF

https://arxiv.org/pdf/1707.03383
Application-Driven Near-Data Processing for Similarity Search

2017-07-10

Vincent T. Lee, Amrita Mazumdar, Carlo C. del Mundo, Armin Alaghi, Luis Ceze, Mark Oskin

arXiv_CV

arXiv_CV Recommendation
Abstract

Similarity search is key to a variety of applications including content-based search for images and video, recommendation systems, data deduplication, natural language processing, computer vision, databases, computational biology, and computer graphics. At its core, similarity search manifests as k-nearest neighbors (kNN), a computationally simple primitive consisting of highly parallel distance calculations and a global top-k sort. However, kNN is poorly supported by today’s architectures because of its high memory bandwidth requirements. This paper proposes an application-driven near-data processing accelerator for similarity search: the Similarity Search Associative Memory (SSAM). By instantiating compute units close to memory, SSAM benefits from the higher memory bandwidth and density exposed by emerging memory technologies. We evaluate the SSAM design down to layout on top of the Micron hybrid memory cube (HMC), and show that SSAM can achieve up to two orders of magnitude improvement in area-normalized throughput and energy efficiency over multicore CPUs; we also show SSAM is faster and more energy efficient than competing GPUs and FPGAs. Finally, we show that SSAM is also useful for other data intensive tasks like kNN index construction, and can be generalized to semantically function as a high capacity content addressable memory.
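The kNN primitive described above (independent distance calculations followed by a global top-k selection) fits in a few lines of Python. This is a plain software sketch of the primitive, not of the SSAM hardware. Euclidean distance is an assumed choice of metric.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def knn(query: Sequence[float], database: List[Sequence[float]],
        k: int) -> List[Tuple[float, int]]:
    """Brute-force k-nearest neighbours, returned as sorted (distance, index) pairs."""
    # Each distance depends only on one database vector, so this step is
    # embarrassingly parallel but must stream the entire database from memory.
    dists = ((math.dist(query, vec), idx) for idx, vec in enumerate(database))
    # Global top-k selection: nsmallest keeps only k candidates while scanning.
    return heapq.nsmallest(k, dists)


db = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [0.5, 0.0]]
print(knn([0.0, 0.0], db, 2))  # [(0.0, 0), (0.5, 3)]
```

Each query touches every stored vector but performs very little arithmetic per byte read. This low compute-to-bandwidth ratio is why the abstract argues for placing compute units next to memory.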

URL

https://arxiv.org/abs/1606.03742

PDF

https://arxiv.org/pdf/1606.03742
The mise en scene of memristive networks: effective memory, dynamics and learning

2017-07-08

Francesco Caravelli

arXiv_CV

arXiv_CV Gradient_Descent
Abstract

We discuss the properties of the dynamics of purely memristive circuits using a recently derived consistent equation for the internal memory variables of the involved memristors. In particular, we show that the number of independent memory states in a memristive circuit is constrained by the circuit conservation laws, and that the dynamics preserves these symmetries by means of a projection on the physical subspace. Moreover, we discuss other symmetries of the dynamics under various transformations of the internal memory, and study the linearized and strongly non-linear regimes of the dynamics. In the strongly non-linear regime, we derive a conservation law for the internal memory variables. We also provide a condition on the reality of the eigenvalues of Lyapunov matrices describing the linearized dynamics close to a fixed point. We show that the eigenvalues can be imaginary only for mixtures of passive and active components. Our last result concerns the weak non-linear regime. We show that the internal memory dynamics can be interpreted as a constrained gradient descent, and provide the functional being minimized. This latter result provides another direct connection between memristors and learning.

URL

https://arxiv.org/abs/1611.02104

PDF

https://arxiv.org/pdf/1611.02104
Effective Approaches to Batch Parallelization for Dynamic Neural Network Architectures

2017-07-08

Joseph Suarez, Clare Zhu

arXiv_CV

arXiv_CV Sparse QA VQA
Abstract

We present a simple dynamic batching approach applicable to a large class of dynamic architectures that consistently yields speedups of over 10x. We provide performance bounds when the architecture is not known a priori and a stronger bound in the special case where the architecture is a predetermined balanced tree. We evaluate our approach on Johnson et al.’s recent visual question answering (VQA) model for the CLEVR dataset, which works by Inferring and Executing Programs (IEP). We also evaluate on sparsely gated mixture-of-experts layers and achieve speedups of up to 1000x over the naive implementation.
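The core idea of dynamic batching can be illustrated with a generic depth-based scheduler. Nodes from many differently shaped trees are grouped by (height, operation), so each group can run as one batched call, and every node runs only after its children. The abstract does not describe the authors' exact algorithm, so this is a common textbook scheme rather than their method. `Node` and `batch_schedule` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    op: str
    children: List["Node"] = field(default_factory=list)


def height(node: Node) -> int:
    """Leaves have height 0; a parent is always strictly higher than its children."""
    return 0 if not node.children else 1 + max(height(c) for c in node.children)


def batch_schedule(roots: List[Node]) -> Dict[Tuple[int, str], List[Node]]:
    """Group nodes of all instances by (height, op); each group is one batched call.

    Executing groups in ascending height guarantees children are computed first.
    """
    groups: Dict[Tuple[int, str], List[Node]] = defaultdict(list)
    stack = list(roots)
    while stack:
        node = stack.pop()
        groups[(height(node), node.op)].append(node)
        stack.extend(node.children)
    return dict(sorted(groups.items(), key=lambda kv: kv[0]))


a = Node("add", [Node("embed"), Node("embed")])
b = Node("add", [Node("embed"), Node("mul", [Node("embed"), Node("embed")])])
sched = batch_schedule([a, b])
print(len(sched), "batched calls instead of", sum(len(g) for g in sched.values()))
# 4 batched calls instead of 8
```

The savings grow with the number of instances, because groups get wider while their count stays bounded by tree height times the number of distinct operations.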

URL

https://arxiv.org/abs/1707.02402

PDF

https://arxiv.org/pdf/1707.02402
Visual Search at eBay

2017-07-07

Fan Yang, Ajinkya Kale, Yury Bubnov, Leon Stein, Qiaosong Wang, Hadi Kiapour, Robinson Piramuthu

arXiv_CV

arXiv_CV Face Optimization Inference Deep_Learning
Abstract

In this paper, we propose a novel end-to-end approach for scalable visual search infrastructure. We discuss the challenges we faced for a massive volatile inventory like at eBay and present our solution to overcome those. We harness the availability of a large image collection of eBay listings and state-of-the-art deep learning techniques to perform visual search at scale. Supervised learning, used both for optimized search limited to the top predicted categories and for compact binary signatures, is key to scaling up without compromising accuracy and precision. Both use a common deep neural network requiring only a single forward inference. The system architecture is presented with in-depth discussions of its basic components and optimizations for a trade-off between search relevance and latency. This solution is currently deployed in a distributed cloud infrastructure and fuels visual search in eBay ShopBot and Close5. We show benchmarks on the ImageNet dataset, on which our approach is faster and more accurate than several unsupervised baselines. We share our learnings with the hope that visual search becomes a first class citizen for all large scale search engines rather than an afterthought.

URL

https://arxiv.org/abs/1706.03154

PDF

https://arxiv.org/pdf/1706.03154
Deep Character-Level Click-Through Rate Prediction for Sponsored Search

2017-07-07

Bora Edizel, Amin Mantrach, Xiao Bai

arXiv_CV

arXiv_CV CNN Language_Model Prediction
Abstract

Predicting the click-through rate of an advertisement is a critical component of online advertising platforms. In sponsored search, the click-through rate estimates the probability that a displayed advertisement is clicked by a user after she submits a query to the search engine. Commercial search engines typically rely on machine learning models trained with a large number of features to make such predictions. This inevitably requires a lot of engineering effort to define, compute, and select the appropriate features. In this paper, we propose two novel approaches (one working at character level and the other working at word level) that use deep convolutional neural networks to predict the click-through rate of a query-advertisement pair. Specifically, the proposed architectures only consider the textual content appearing in a query-advertisement pair as input, and produce as output a click-through rate prediction. By comparing the character-level model with the word-level model, we show that language representation can be learnt from scratch at character level when trained on enough data. Through extensive experiments using billions of query-advertisement pairs of a popular commercial search engine, we demonstrate that both approaches significantly outperform a baseline model built on well-selected text features and a state-of-the-art word2vec-based approach. Finally, by combining the predictions of the deep models introduced in this study with the prediction of the model in production of the same commercial search engine, we significantly improve the accuracy and the calibration of the click-through rate prediction of the production system.

URL

https://arxiv.org/abs/1707.02158

PDF

https://arxiv.org/pdf/1707.02158
Long-Term Memory Networks for Question Answering

2017-07-06

Fenglong Ma, Radha Chitta, Saurabh Kataria, Jing Zhou, Palghat Ramesh, Tong Sun, Jing Gao

arXiv_CV

arXiv_CV Face Inference RNN Memory_Networks
Abstract

Question answering is an important and difficult task in the natural language processing domain, because many basic natural language processing tasks can be cast into a question answering task. Several deep neural network architectures have been developed recently, which employ memory and inference components to memorize and reason over text information, and generate answers to questions. However, a major drawback of many such models is that they can only generate single-word answers. In addition, they require a large amount of training data to generate accurate answers. In this paper, we introduce the Long-Term Memory Network (LTMN), which incorporates both an external memory module and a Long Short-Term Memory (LSTM) module to comprehend the input data and generate multi-word answers. The LTMN model can be trained end-to-end using back-propagation and requires minimal supervision. We test our model on two synthetic data sets (based on Facebook’s bAbI data set) and the real-world Stanford question answering data set, and show that it can achieve state-of-the-art performance.
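Memory-network models of this kind typically read their external memory through soft attention. The question vector scores every memory slot, a softmax turns the scores into weights, and the weighted sum of output memories updates the controller state. The sketch shows one such hop. It assumes precomputed embedding vectors and dot-product scoring in the style of end-to-end memory networks. The LSTM that LTMN uses to decode multi-word answers is not shown, and `memory_hop` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def memory_hop(
    u: Sequence[float], input_mem: List[List[float]], output_mem: List[List[float]]
) -> Tuple[List[float], List[float]]:
    """p_i = softmax(u . m_i); o = sum_i p_i * c_i; returns (p, u + o).

    Assumes output memories have the same dimension as the question vector u.
    """
    p = softmax([dot(u, m) for m in input_mem])
    o = [sum(p_i * c[d] for p_i, c in zip(p, output_mem)) for d in range(len(u))]
    return p, [ui + oi for ui, oi in zip(u, o)]

# Toy example: the question vector aligns with the first memory slot.
p, u_next = memory_hop([1.0, 0.0], [[5.0, 0.0], [0.0, 5.0]], [[1.0, 0.0], [0.0, 1.0]])
```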

URL

https://arxiv.org/abs/1707.01961

PDF

https://arxiv.org/pdf/1707.01961
A Survey on Geographically Distributed Big-Data Processing using MapReduce

2017-07-06

Shlomi Dolev, Patricia Florissi, Ehud Gudes, Shantanu Sharma, Ido Singer

arXiv_CV

arXiv_CV Face Survey
Abstract

Hadoop and Spark are widely used distributed processing frameworks for large-scale data processing in an efficient and fault-tolerant manner on private or public clouds. These big-data processing systems are extensively used by many industries, e.g., Google, Facebook, and Amazon, for solving a large class of problems, e.g., search, clustering, log analysis, different types of join operations, matrix multiplication, pattern matching, and social network analysis. However, all these popular systems have a major drawback: their computations are distributed only locally, which prevents them from implementing geographically distributed data processing. The increasing amount of geographically distributed massive data is pushing industries and academia to rethink the current big-data processing systems. Novel frameworks, going beyond the state-of-the-art architectures and technologies of current systems, are expected to process geographically distributed data at their locations without moving entire raw datasets to a single location. In this paper, we investigate and discuss challenges and requirements in designing geographically distributed data processing frameworks and protocols. We classify and study geo-distributed frameworks, models, and algorithms for batch processing (MapReduce-based systems), stream processing (Spark-based systems), and SQL-style processing, together with their overhead issues.
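One common way to process data where it lives is to run the map and combine steps at each site and ship only the partial aggregates to a coordinator. The word-count sketch below shows that pattern. The site names and records are made up, and the survey covers far more elaborate schemes (joins, streaming, SQL-style queries) than this.

```python
from collections import Counter
from typing import Dict, Iterable, List

def local_map_combine(lines: Iterable[str]) -> Counter:
    """Runs at each site: map lines to words and pre-aggregate counts locally."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def global_reduce(partials: Iterable[Counter]) -> Counter:
    """Runs at the coordinator: merge the per-site partial aggregates."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Hypothetical raw data that never leaves its own data center.
sites: Dict[str, List[str]] = {
    "eu-west": ["join map reduce", "map map"],
    "us-east": ["reduce join", "map"],
}
# Only these small per-site Counters cross the wide-area network.
partials = [local_map_combine(lines) for lines in sites.values()]
result = global_reduce(partials)
```

Wide-area traffic then scales with the number of distinct keys per site rather than with the raw data volume. That pays off for aggregations; joins and iterative jobs are harder, which is where the overhead issues discussed in the survey come in.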

URL

https://arxiv.org/abs/1707.01869

PDF

https://arxiv.org/pdf/1707.01869
RON: Reverse Connection with Objectness Prior Networks for Object Detection

2017-07-06

Tao Kong, Fuchun Sun, Anbang Yao, Huaping Liu, Ming Lu, Yurong Chen

arXiv_CV

arXiv_CV Object_Detection CNN Detection
Abstract

We present RON, an efficient and effective framework for generic object detection. Our motivation is to smartly associate the best of the region-based (e.g., Faster R-CNN) and region-free (e.g., SSD) methodologies. Under a fully convolutional architecture, RON mainly focuses on two fundamental problems: (a) multi-scale object localization and (b) negative sample mining. To address (a), we design the reverse connection, which enables the network to detect objects on multiple levels of the CNN. To deal with (b), we propose the objectness prior to significantly reduce the search space of objects. We optimize the reverse connection, objectness prior and object detector jointly by a multi-task loss function, so RON can directly predict final detection results from all locations of various feature maps. Extensive experiments on the challenging PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO benchmarks demonstrate the competitive performance of RON. Specifically, with VGG-16 and a low-resolution 384×384 input size, the network gets 81.3% mAP on PASCAL VOC 2007 and 80.7% mAP on PASCAL VOC 2012. Its superiority increases when datasets become larger and more difficult, as demonstrated by the results on the MS COCO dataset. With 1.5 GB of GPU memory at test time, the network runs at 15 FPS, 3× faster than its Faster R-CNN counterpart.
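The objectness prior shrinks the search space by discarding locations that are unlikely to contain any object before class scores matter. A minimal sketch follows. It assumes the final detection score is objectness multiplied by the best class probability. The 0.03 objectness threshold, the 0.5 score threshold, and the `Candidate`/`detect` names are hypothetical, not values or code from RON.

```python
from typing import Dict, List, NamedTuple, Tuple

class Candidate(NamedTuple):
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2), illustrative pixel coordinates
    objectness: float               # estimated P(object) at this location
    class_probs: Dict[str, float]   # P(class | object)

def detect(cands: List[Candidate], obj_threshold: float = 0.03, score_threshold: float = 0.5):
    """Prune by objectness first, then score survivors as objectness * best class probability."""
    kept = [c for c in cands if c.objectness >= obj_threshold]
    detections = []
    for c in kept:
        label, p = max(c.class_probs.items(), key=lambda kv: kv[1])
        score = round(c.objectness * p, 4)
        if score >= score_threshold:
            detections.append((label, score, c.box))
    detections.sort(key=lambda d: d[1], reverse=True)
    return detections, len(kept)

cands = [
    Candidate((0, 0, 10, 10), 0.9, {"cat": 0.8, "dog": 0.2}),
    Candidate((5, 5, 20, 20), 0.01, {"cat": 0.99}),  # pruned despite a confident class score
    Candidate((1, 1, 9, 9), 0.6, {"dog": 0.7, "cat": 0.3}),
]
dets, kept = detect(cands)
```

In a dense detector, the vast majority of locations are background. Filtering on a single cheap objectness score removes most of them, which also keeps the negative samples used in training manageable.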

URL

https://arxiv.org/abs/1707.01691

PDF

https://arxiv.org/pdf/1707.01691
Toward Computation and Memory Efficient Neural Network Acoustic Models with Binary Weights and Activations

2017-07-04

Liang Lu

arXiv_CV

arXiv_CV Speech_Recognition Inference Recognition
Abstract

Neural network acoustic models have significantly advanced the state of the art in speech recognition over the past few years. However, they are usually computationally expensive due to the large number of matrix-vector multiplications and nonlinearity operations. Neural network models also require significant amounts of memory for inference because of the large model size. For these two reasons, it is challenging to deploy neural network based speech recognizers on resource-constrained platforms such as embedded devices. This paper investigates the use of binary weights and activations for computation and memory efficient neural network acoustic models. Compared to real-valued weight matrices, binary weights require far fewer bits for storage, thereby cutting down the memory footprint. Furthermore, with binary weights or activations, the matrix-vector multiplications are turned into addition and subtraction operations, which are computationally much faster and more energy efficient for hardware platforms. In this paper, we study the applications of binary weights and activations for neural network acoustic modeling, reporting encouraging results on the WSJ and AMI corpora.
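To see why binary weights remove multiplications: if every weight is +1 or -1, each term of a dot product is either +x_j or -x_j. The sketch below binarizes each row by sign and keeps one real scale per row (the mean absolute weight). That scaling is a common choice, assumed here rather than taken from the paper; `binarize` and `binary_matvec` are hypothetical names.

```python
from typing import List, Sequence, Tuple

def binarize(row: Sequence[float]) -> Tuple[List[int], float]:
    """Sign-binarize one weight row and keep a single real scale (mean |w|)."""
    signs = [1 if w >= 0 else -1 for w in row]
    alpha = sum(abs(w) for w in row) / len(row)
    return signs, alpha

def binary_matvec(rows: List[Tuple[List[int], float]], x: Sequence[float]) -> List[float]:
    """y_i = alpha_i * sum_j s_ij * x_j; the inner sum uses only additions and subtractions."""
    y = []
    for signs, alpha in rows:
        acc = 0.0
        for s, xj in zip(signs, x):
            acc = acc + xj if s > 0 else acc - xj
        y.append(alpha * acc)  # one multiplication per output, not per weight
    return y

W = [[0.5, -0.25, 0.75], [-1.0, 1.0, -1.0]]
x = [1.0, 2.0, 3.0]
y = binary_matvec([binarize(row) for row in W], x)
```

Storage drops to one bit per weight plus one float per row, compared with 32 bits per weight for single-precision matrices. The price is approximation error whenever the weight magnitudes in a row differ, as in the first row here.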

URL

https://arxiv.org/abs/1706.09453

PDF

https://arxiv.org/pdf/1706.09453
An empirical study on the effectiveness of images in Multimodal Neural Machine Translation

2017-07-04

Jean-Benoit Delbrouck, Stéphane Dupont

arXiv_CL

arXiv_CL Attention NMT
Abstract

In state-of-the-art Neural Machine Translation (NMT), an attention mechanism is used during decoding to enhance the translation. At every step, the decoder uses this mechanism to focus on different parts of the source sentence to gather the most useful information before outputting its target word. Recently, the effectiveness of the attention mechanism has also been explored for multimodal tasks, where it becomes possible to focus both on sentence parts and on the image regions that they describe. In this paper, we compare several attention mechanisms on the multimodal translation task (English, image to German) and evaluate the ability of the model to make use of images to improve translation. Although we surpass state-of-the-art scores on the Multi30k data set, we nevertheless identify and report different misbehaviors of the machine while translating.

URL

https://arxiv.org/abs/1707.00995

PDF

https://arxiv.org/pdf/1707.00995
DeepStory: Video Story QA by Deep Embedded Memory Networks

2017-07-04

Kyung-Min Kim, Min-Oh Heo, Seong-Ho Choi, Byoung-Tak Zhang

arXiv_CV

arXiv_CV QA Attention Embedding RNN Memory_Networks
Abstract

Question-answering (QA) on video contents is a significant challenge for achieving human-level intelligence as it involves both vision and language in real-world settings. Here we demonstrate the possibility of an AI agent performing video story QA by learning from a large amount of cartoon videos. We develop a video-story learning model, Deep Embedded Memory Networks (DEMN), to reconstruct stories from a joint scene-dialogue video stream using a latent embedding space of observed data. The video stories are stored in a long-term memory component. For a given question, an LSTM-based attention model uses the long-term memory to recall the best question-story-answer triplet by focusing on specific words containing key information. We trained the DEMN on a novel QA dataset of children’s cartoon video series, Pororo. The dataset contains 16,066 scene-dialogue pairs from 20.5 hours of video, 27,328 fine-grained sentences for scene description, and 8,913 story-related QA pairs. Our experimental results show that the DEMN outperforms other QA models. This is mainly due to 1) the reconstruction of video stories in a scene-dialogue combined form that utilizes the latent embedding and 2) attention. DEMN also achieved state-of-the-art results on the MovieQA benchmark.

URL

https://arxiv.org/abs/1707.00836

PDF

https://arxiv.org/pdf/1707.00836
Learning to Avoid Errors in GANs by Manipulating Input Spaces

2017-07-03

Alexander B. Jung

arXiv_CV

arXiv_CV GAN
Abstract

Despite recent advances, large-scale visual artifacts are still a common occurrence in images generated by GANs. Previous work has focused on improving the generator’s capability to accurately imitate the data distribution $p_{data}$. In this paper, we instead explore methods that enable GANs to actively avoid errors by manipulating the input space. The core idea is to apply small changes to each noise vector in order to shift them away from areas in the input space that tend to result in errors. We derive three different architectures from that idea. The main one of these consists of a simple residual module that leads to significantly fewer visual artifacts, while only slightly decreasing diversity. The module is trivial to add to existing GANs and costs almost zero computation and memory.
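The core idea is a small change applied to each noise vector before it reaches the generator. The sketch uses a residual form, z' = z + ε·tanh(Wz + b), assuming one dense layer and a fixed ε of 0.1. How W and b are trained to steer away from error-prone regions is the paper's contribution and is not reproduced. The weights below are random placeholders, and `residual_noise_module` is a hypothetical name.

```python
import math
import random
from typing import List, Sequence

def residual_noise_module(
    z: Sequence[float], W: List[List[float]], b: Sequence[float], scale: float = 0.1
) -> List[float]:
    """z' = z + scale * tanh(W z + b): each component moves by at most `scale`."""
    out = []
    for i, zi in enumerate(z):
        pre = sum(W[i][j] * zj for j, zj in enumerate(z)) + b[i]
        out.append(zi + scale * math.tanh(pre))
    return out

rng = random.Random(1)
dim = 8
z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]
b = [0.0] * dim
z_shifted = residual_noise_module(z, W, b)  # fed to the generator instead of z
```

Because tanh is bounded, each shifted vector stays within ε of its original noise vector. This bound is one plausible reason such a module reduces diversity only slightly. The extra cost is a single small matrix-vector product per sample.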

URL

https://arxiv.org/abs/1707.00768

PDF

https://arxiv.org/pdf/1707.00768
Structure Optimization for Deep Multimodal Fusion Networks using Graph-Induced Kernels

2017-07-03

Dhanesh Ramachandram, Michal Lisicki, Timothy J. Shields, Mohamed R. Amer, Graham W. Taylor

arXiv_CV

arXiv_CV Optimization Deep_Learning Recognition
Abstract

A popular testbed for deep learning has been multimodal recognition of human activity or gesture involving diverse inputs such as video, audio, skeletal pose and depth images. Deep learning architectures have excelled on such problems due to their ability to combine modality representations at different levels of nonlinear feature extraction. However, designing an optimal architecture in which to fuse such learned representations has largely been a non-trivial human engineering effort. We treat fusion structure optimization as a hyper-parameter search and cast it as a discrete optimization problem under the Bayesian optimization framework. We propose a novel graph-induced kernel to compute structural similarities in the search space of tree-structured multimodal architectures and demonstrate its effectiveness using two challenging multimodal human activity recognition datasets.

URL

https://arxiv.org/abs/1707.00750

PDF

https://arxiv.org/pdf/1707.00750
Where to Play: Retrieval of Video Segments using Natural-Language Queries

2017-07-02

Sangkuk Lee, Daesik Kim, Myunggi Lee, Jihye Hwang, Nojun Kwak

arXiv_CV

arXiv_CV Image_Caption Tracking Caption Quantitative Relation
Abstract

In this paper, we propose a new approach for retrieval of video segments using natural language queries. Unlike most previous approaches such as concept-based methods or rule-based structured models, the proposed method uses an image captioning model to construct sentential queries for visual information. In detail, our approach exploits multiple captions generated from visual features in each image with ‘Densecap’. Then, the similarities between captions of adjacent images are calculated and used to track semantically similar captions over multiple frames. Besides introducing this novel idea of ‘tracking by captioning’, the proposed method is one of the first approaches that use a language generation model learned by neural networks to construct semantic queries describing the relations and properties of visual information. To evaluate the effectiveness of our approach, we have created a new evaluation dataset, which contains about 348 segments of scenes in 20 movie trailers. Through quantitative and qualitative evaluation, we show that our method is effective for retrieval of video segments using natural language queries.
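In "tracking by captioning", captions of adjacent frames are compared and similar ones are chained into tracks. The abstract does not name the similarity measure. This sketch assumes word-set Jaccard similarity and greedy linking with a hypothetical threshold of 0.5. `jaccard` and `track_captions` are illustrative names, not the paper's code.

```python
from typing import List, Optional, Tuple

def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two captions (an assumed similarity measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def track_captions(frames: List[List[str]], threshold: float = 0.5) -> List[List[Tuple[int, str]]]:
    """Greedily link each caption to the most similar track that ended in the previous frame."""
    tracks: List[List[Tuple[int, str]]] = []
    for t, captions in enumerate(frames):
        # Snapshot each open track together with its caption from frame t - 1.
        open_tracks = [(tr, tr[-1][1]) for tr in tracks if tr[-1][0] == t - 1]
        used = set()
        for cap in captions:
            best: Optional[int] = None
            best_sim = threshold
            for idx, (_, prev_cap) in enumerate(open_tracks):
                if idx in used:
                    continue
                sim = jaccard(prev_cap, cap)
                if sim >= best_sim:
                    best, best_sim = idx, sim
            if best is None:
                tracks.append([(t, cap)])  # nothing similar enough: start a new track
            else:
                used.add(best)
                open_tracks[best][0].append((t, cap))
    return tracks

frames = [
    ["a man in a red shirt", "a white car"],
    ["man in a red shirt", "a parked white car"],
    ["a dog on grass"],
]
tracks = track_captions(frames)
```

Each resulting track is a run of frames sharing a stable description. A natural-language query can then be matched against whole tracks rather than single frames to retrieve a segment.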

URL

https://arxiv.org/abs/1707.00251

PDF

https://arxiv.org/pdf/1707.00251
Learning from Ambiguously Labeled Face Images

2017-07-01

Ching-Hui Chen, Vishal M. Patel, Rama Chellappa

arXiv_CV

arXiv_CV Knowledge Face Caption
Abstract

Learning a classifier from ambiguously labeled face images is challenging since training images are not always explicitly labeled. For instance, face images of two persons in a news photo are not explicitly labeled by their names in the caption. We propose a Matrix Completion for Ambiguity Resolution (MCar) method for predicting the actual labels from ambiguously labeled images. This step is followed by learning a standard supervised classifier from the disambiguated labels to classify new images. To prevent the majority labels from dominating the result of MCar, we generalize MCar to a weighted MCar (WMCar) that handles label imbalance. Since WMCar outputs a soft labeling vector of reduced ambiguity for each instance, we can iteratively refine it by feeding it as the input to WMCar. Nevertheless, such an iterative implementation can be affected by the noisy soft labeling vectors, and thus the performance may degrade. Our proposed Iterative Candidate Elimination (ICE) procedure makes the iterative ambiguity resolution possible by gradually eliminating a portion of the least likely candidates in ambiguously labeled face images. We further extend MCar to incorporate the labeling constraints between instances when such prior knowledge is available. Compared to existing methods, our approach demonstrates improvement on several ambiguously labeled datasets.
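The ICE loop alternates two steps: refine the soft labeling vector, then drop the least likely fraction of candidates so that noisy low-weight labels cannot pull later iterations off course. In the sketch, `resolve` is a placeholder callable standing in for WMCar, and the 0.5 elimination fraction is an assumption. `eliminate` and `iterative_elimination` are hypothetical names.

```python
import math
from typing import Callable, Dict

SoftLabels = Dict[str, float]  # candidate name -> soft label weight

def eliminate(candidates: SoftLabels, fraction: float) -> SoftLabels:
    """Drop the least likely `fraction` of candidates (always keeping one) and renormalize."""
    n_drop = min(len(candidates) - 1, math.floor(fraction * len(candidates)))
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    kept = ranked[: len(ranked) - n_drop]
    total = sum(w for _, w in kept)
    return {label: w / total for label, w in kept}

def iterative_elimination(
    candidates: SoftLabels,
    resolve: Callable[[SoftLabels], SoftLabels],
    fraction: float = 0.5,
    max_iter: int = 10,
) -> str:
    """Alternate ambiguity resolution (`resolve`, a stand-in for WMCar) with elimination."""
    for _ in range(max_iter):
        if len(candidates) == 1:
            break
        candidates = eliminate(resolve(candidates), fraction)
    return max(candidates, key=candidates.get)

# Names from a hypothetical news caption; an identity `resolve` keeps the example self-contained.
names = {"Alice": 0.5, "Bob": 0.3, "Carol": 0.2}
label = iterative_elimination(names, resolve=lambda c: c)
```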

URL

https://arxiv.org/abs/1702.04455

PDF

https://arxiv.org/pdf/1702.04455
Do GANs actually learn the distribution? An empirical study

2017-07-01

Sanjeev Arora, Yi Zhang

arXiv_CV

arXiv_CV Adversarial GAN
Abstract

Do GANs (Generative Adversarial Nets) actually learn the target distribution? The foundational paper of (Goodfellow et al., 2014) suggested they do, if they were given sufficiently large deep nets, sample size, and computation time. A recent theoretical analysis in Arora et al. (to appear at ICML 2017) raised doubts about whether the same holds when the discriminator has finite size. It showed that the training objective can approach its optimum value even if the generated distribution has very low support; in other words, the training objective is unable to prevent mode collapse. The current note reports experiments suggesting that such problems are not merely theoretical. It presents empirical evidence that well-known GAN approaches do learn distributions of fairly low support, and thus presumably are not learning the target distribution. The main technical contribution is a new proposed test, based upon the famous birthday paradox, for estimating the support size of the generated distribution.
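The birthday-paradox test rests on one calculation. If a distribution is roughly uniform over N items, s samples contain a repeat with probability 1 − ∏(1 − i/N) ≈ 1 − exp(−s²/2N). So the batch size at which duplicates appear about half the time is on the order of √N, and squaring it gives a rough support estimate. In the paper, duplicates are near-identical generated images spotted by inspection. Here they are exact matches from a toy sampler, and `estimate_support` is a hypothetical helper.

```python
import random
from typing import Callable, Hashable, Optional

def collision_probability(s: int, n: int) -> float:
    """P(some repeat among s uniform draws from n outcomes) = 1 - prod_{i<s} (1 - i/n)."""
    p_distinct = 1.0
    for i in range(s):
        p_distinct *= 1.0 - i / n
    return 1.0 - p_distinct

def has_duplicate(sample: Callable[[], Hashable], s: int) -> bool:
    seen = set()
    for _ in range(s):
        x = sample()
        if x in seen:
            return True
        seen.add(x)
    return False

def estimate_support(sample: Callable[[], Hashable], max_s: int = 1000, trials: int = 500) -> Optional[int]:
    """Square of the smallest batch size whose batches contain a duplicate at least half the time."""
    for s in range(2, max_s + 1):
        hits = sum(has_duplicate(sample, s) for _ in range(trials))
        if 2 * hits >= trials:
            return s * s
    return None

rng = random.Random(0)
estimate = estimate_support(lambda: rng.randrange(100))  # toy "generator" with support 100
```

`collision_probability(23, 365)` ≈ 0.507 is the classic birthday paradox. For the toy support of 100, the estimate should come out somewhat above 100, since P ≈ 1/2 corresponds to s² ≈ 2N ln 2 ≈ 1.4N. The test only bounds the support to within a constant factor, which is all that is needed to expose a generator whose support is far smaller than that of the training data.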

URL

https://arxiv.org/abs/1706.08224

PDF

https://arxiv.org/pdf/1706.08224
Island dynamics and anisotropy during vapor phase epitaxy of m-plane GaN

2017-06-29

Edith Perret, Dongwei Xu, M. J. Highland, G. B. Stephenson, P. Zapol, P. H. Fuoss, A. Munkholm, Carol Thompson

arXiv_CV

arXiv_CV GAN Face
Abstract

Using in situ grazing-incidence x-ray scattering, we have measured the diffuse scattering from islands that form during layer-by-layer growth of GaN by metal-organic vapor phase epitaxy on the $(10\overline{1}0)$ m-plane surface. The diffuse scattering is extended in the (0001) in-plane direction in reciprocal space, indicating a strong anisotropy with islands elongated along $[1\overline{2}10]$ and closely spaced along [0001]. This is confirmed by atomic force microscopy of a quenched sample. Islands were characterized as a function of growth rate $G$ and temperature. The island spacing along [0001] observed during the growth of the first monolayer obeys a power-law dependence on growth rate, $G^{-n}$, with an exponent $n = 0.25 \pm 0.02$. Results are in agreement with recent kinetic Monte Carlo simulations, indicating that elongated islands result from the dominant anisotropy in step edge energy and not from surface diffusion anisotropy. The observed power-law exponent can be explained using a simple steady-state model, which gives $n = 1/4$.
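An exponent like $n$ is usually read off as the negative slope of log(spacing) against log($G$). The sketch fits that slope by least squares on synthetic, noise-free data generated with $n = 1/4$. The prefactor 40 and the growth-rate values are made-up illustrative numbers, not the paper's measurements, and `power_law_exponent` is a hypothetical name.

```python
import math
from typing import Sequence

def power_law_exponent(growth_rates: Sequence[float], spacings: Sequence[float]) -> float:
    """Least-squares slope of log(spacing) vs log(G); returns n for spacing ~ G^(-n)."""
    xs = [math.log(g) for g in growth_rates]
    ys = [math.log(d) for d in spacings]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return -slope

# Synthetic data following spacing = 40 * G^(-1/4), in arbitrary units.
G = [0.01, 0.03, 0.1, 0.3, 1.0]
d = [40.0 * g ** -0.25 for g in G]
n = power_law_exponent(G, d)
```

With $n = 1/4$, a tenfold increase in growth rate shrinks the island spacing by a factor of $10^{-1/4} \approx 0.56$.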

URL

https://arxiv.org/abs/1706.09955

PDF

https://arxiv.org/pdf/1706.09955
Stronger Baselines for Trustable Results in Neural Machine Translation

2017-06-29

Michael Denkowski, Graham Neubig

arXiv_CL

arXiv_CL NMT
Abstract

Interest in neural machine translation has grown rapidly as its effectiveness has been demonstrated across language and data scenarios. New research regularly introduces architectural and algorithmic improvements that lead to significant gains over “vanilla” NMT implementations. However, these new techniques are rarely evaluated in the context of previously published techniques, specifically those that are widely used in state-of-the-art production and shared-task systems. As a result, it is often difficult to determine whether improvements from research will carry over to systems deployed for real-world use. In this work, we recommend three specific methods that are relatively easy to implement and result in much stronger experimental systems. Beyond reporting significantly higher BLEU scores, we conduct an in-depth analysis of where improvements originate and what inherent weaknesses of basic NMT models are being addressed. We then compare the relative gains afforded by several other techniques proposed in the literature when starting with vanilla systems versus our stronger baselines, showing that experimental conclusions may change depending on the baseline chosen. This indicates that choosing a strong baseline is crucial for reporting reliable experimental results.

URL

https://arxiv.org/abs/1706.09733

PDF

https://arxiv.org/pdf/1706.09733
Autotuning GPU Kernels via Static and Predictive Analysis

2017-06-29

Robert V. Lim, Boyana Norris, Allen D. Malony

arXiv_CV

arXiv_CV
Abstract

Optimizing the performance of GPU kernels is challenging for both human programmers and code generators. For example, CUDA programmers must set thread and block parameters for a kernel, but might not have the intuition to make a good choice. Similarly, compilers can generate working code, but may miss tuning opportunities by not targeting GPU models or performing code transformations. Although empirical autotuning addresses some of these challenges, it requires extensive experimentation and search for optimal code variants. This research presents an approach for tuning CUDA kernels based on static analysis that considers fine-grained code structure and the specific GPU architecture features. Notably, our approach does not require any program runs in order to discover near-optimal parameter settings. We demonstrate the applicability of our approach in enabling code autotuners such as Orio to produce competitive code variants comparable with empirical-based methods, without the high cost of experiments.
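A static approach chooses parameters without running the kernel. One quantity such an analysis can compute from the code and the architecture alone is theoretical occupancy: how many warps per multiprocessor a given block size allows, given per-thread register use and per-block shared memory. The simplified calculator below is not Orio's or the paper's model. The hardware limits are placeholder values, and register and warp allocation granularity are ignored.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class SMLimits:
    # Placeholder per-multiprocessor limits; real values depend on the GPU's compute capability.
    max_threads: int = 2048
    max_blocks: int = 32
    registers: int = 65536
    shared_mem_bytes: int = 49152
    warp_size: int = 32

def occupancy(block_size: int, regs_per_thread: int, smem_per_block: int, lim: SMLimits = SMLimits()) -> float:
    """Active warps / maximum warps per SM (assumes regs_per_thread > 0; no allocation granularity)."""
    warps_per_block = -(-block_size // lim.warp_size)  # ceiling division
    threads_alloc = warps_per_block * lim.warp_size
    by_threads = lim.max_threads // threads_alloc
    by_regs = lim.registers // (regs_per_thread * threads_alloc)
    by_smem = lim.shared_mem_bytes // smem_per_block if smem_per_block else lim.max_blocks
    blocks = min(lim.max_blocks, by_threads, by_regs, by_smem)
    return blocks * warps_per_block / (lim.max_threads // lim.warp_size)

def best_block_size(regs_per_thread: int, smem_per_block: int,
                    candidates: Sequence[int] = (64, 128, 256, 512, 1024)) -> int:
    """Pick the candidate with the highest occupancy; ties go to the smaller block."""
    return max(candidates, key=lambda b: occupancy(b, regs_per_thread, smem_per_block))
```

For example, a kernel using 64 registers per thread is register-limited to 50% occupancy at 256 threads per block. With 16 KB of shared memory per block, only three blocks fit per SM, so larger blocks win. Occupancy alone does not guarantee speed, which is presumably why the approach above also weighs fine-grained code structure.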

URL

https://arxiv.org/abs/1701.08547

PDF

https://arxiv.org/pdf/1701.08547
Co-salient Object Detection Based on Deep Saliency Networks and Seed Propagation over an Integrated Graph

2017-06-29

Dong-ju Jeong, Insung Hwang, Nam Ik Cho

arXiv_CV

arXiv_CV Salient Object_Detection Knowledge Segmentation Weakly_Supervised Detection
Abstract

This paper presents a co-salient object detection method to find common salient regions in a set of images. We utilize deep saliency networks to transfer co-saliency prior knowledge and better capture high-level semantic information, and the resulting initial co-saliency maps are enhanced by seed propagation steps over an integrated graph. The deep saliency networks are trained in a supervised manner to avoid online weakly supervised learning, and we exploit them not only to extract high-level features but also to produce both intra- and inter-image saliency maps. Through a refinement step, the initial co-saliency maps can uniformly highlight co-salient regions and locate accurate object boundaries. To handle input image groups of inconsistent size, we propose to pool multi-regional descriptors including both within-segment and within-group information. In addition, the integrated multilayer graph is constructed to find, by seed propagation with low-level descriptors, the regions that the previous steps may not detect. In this work, we utilize the complementary components of high- and low-level information, and several learning-based steps. Our experiments have demonstrated that the proposed approach outperforms comparable co-saliency detection methods on widely used public databases and can also be directly applied to co-segmentation tasks.
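Seed propagation over a graph spreads initial saliency values from seed regions to regions connected to them by strong affinities. The sketch uses a generic manifold-ranking-style iteration, f ← αSf + (1 − α)y, with symmetric normalization, as a stand-in. The paper's integrated multilayer graph, its descriptors, and its exact update are not reproduced. The affinity matrix and α = 0.9 are hypothetical.

```python
import math
from typing import List, Sequence

def normalize(W: List[List[float]]) -> List[List[float]]:
    """Symmetric normalization S = D^{-1/2} W D^{-1/2} (isolated nodes get zero rows)."""
    deg = [sum(row) for row in W]
    n = len(W)
    return [
        [W[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0 for j in range(n)]
        for i in range(n)
    ]

def propagate(W: List[List[float]], seeds: Sequence[float], alpha: float = 0.9, iters: int = 200) -> List[float]:
    """Iterate f <- alpha * S f + (1 - alpha) * y, spreading seed saliency along strong edges."""
    S = normalize(W)
    n = len(seeds)
    f = list(seeds)
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * seeds[i] for i in range(n)]
    return f

# Toy graph of 4 regions: {0, 1} and {2, 3} are similar pairs joined by one weak edge.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.05, 0.0],
    [0.0, 0.05, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
f = propagate(W, seeds=[1.0, 0.0, 0.0, 0.0])
```

Region 1 was never a seed but ends up salient through its strong edge to region 0. Regions 2 and 3 stay low, which shows how propagation can recover regions that earlier steps missed while suppressing weakly connected ones.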

URL

https://arxiv.org/abs/1706.09650

PDF

https://arxiv.org/pdf/1706.09650
Emission of Linearly Polarized Single Photons from Quantum Dots Contained in Nonpolar, Semipolar, and Polar Sections of Pencil-Like InGaN/GaN Nanowires

2017-06-28

Z. Gacevic, M. Holmes, E. Chernysheva, M. Muller, A. Torres-Pardo, P. Veit, F. Bertram, J. Christen, J. M. Gonzalez-Calbet, Y. Arakawa, E. Calleja, S. Lazic

arXiv_CV

arXiv_CV GAN
Abstract

A pencil-like morphology of homoepitaxially grown GaN nanowires is exploited for the fabrication of thin conformal intrawire InGaN nanoshells which host quantum dots in nonpolar, semipolar and polar crystal regions. All three quantum dot types exhibit single photon emission with narrow emission line widths and high degrees of linear optical polarization. The host crystal region strongly affects both single photon wavelength and emission lifetime, reaching subnanosecond time scales for the non- and semipolar quantum dots. Localization sites in the InGaN potential landscape, most likely induced by indium fluctuations across the InGaN nanoshell, are identified as the driving mechanism for the single photon emission. The pencil-like InGaN nanoshell reported here is the first single nanostructure able to host all three types of single photon sources and is, thus, a promising building block for tunable quantum light devices integrated into future photonic circuits.

URL

https://arxiv.org/abs/1706.03603

PDF

https://arxiv.org/pdf/1706.03603
Energy-Based Sequence GANs for Recommendation and Their Connection to Imitation Learning

2017-06-28

Jaeyoon Yoo, Heonseok Ha, Jihun Yi, Jongha Ryu, Chanju Kim, Jung-Woo Ha, Young-Han Kim, Sungroh Yoon

arXiv_CV

arXiv_CV Adversarial GAN Recommendation
Abstract

Recommender systems aim to find an accurate and efficient mapping from historical data of user-preferred items to a new item that is to be liked by a user. Towards this goal, energy-based sequence generative adversarial nets (EB-SeqGANs) are adopted for recommendation by learning a generative model for the time series of user-preferred items. By recasting the energy function as the feature function, the proposed EB-SeqGANs are interpreted as an instance of maximum-entropy imitation learning.

URL

https://arxiv.org/abs/1706.09200

PDF

https://arxiv.org/pdf/1706.09200
archivist: An R Package for Managing, Recording and Restoring Data Analysis Results

2017-06-27

Przemyslaw Biecek, Marcin Kosinski

arXiv_CV

arXiv_CV Tracking Caption Relation
Abstract

Everything that exists in R is an object [Chambers2016]. This article examines what would be possible if we kept copies of all R objects that have ever been created, together with their properties, meta-data, relations with other objects and information about the context in which they were created. We introduce archivist, an R package designed to improve the management of results of data analysis. Key functionalities of this package include: (i) management of local and remote repositories which contain R objects and their meta-data (objects’ properties and relations between them); (ii) archiving R objects to repositories; (iii) sharing and retrieving objects (and their pedigree) by their unique hooks; (iv) searching for objects with specific properties or relations to other objects; (v) verification of an object’s identity and the context of its creation. The presented archivist package extends, in combination with packages such as knitr and Sweave, the reproducible research paradigm by creating new ways to retrieve and validate previously calculated objects. These new features give a variety of opportunities such as: sharing R objects within reports or articles; adding hooks to R objects in table or figure captions; interactive exploration of object repositories; caching function calls with their results; retrieving an object’s pedigree (information about how the object was created); automated tracking of the performance of considered models; and restoring R libraries to the state in which an object was archived.
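The abstract describes retrieving objects "by their unique hooks" and verifying an object's identity. The sketch illustrates the idea in Python, to match the other examples here. It treats hooks as content hashes (MD5 of the serialized object is an assumption), metadata as tags, and pedigree as tags pointing at a parent's hook. `Repo`, `save`, `load`, and `search` are hypothetical names; this is not archivist's R API.

```python
import hashlib
import pickle
from typing import Any, Dict, Iterable, List, Set

class Repo:
    """Hypothetical in-memory repository: objects stored under a content hash ("hook") plus tags."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}
        self._tags: Dict[str, Set[str]] = {}

    def save(self, obj: Any, tags: Iterable[str] = ()) -> str:
        blob = pickle.dumps(obj)
        hook = hashlib.md5(blob).hexdigest()  # identical serialized objects share a hook
        self._blobs[hook] = blob
        self._tags.setdefault(hook, set()).update(tags)
        return hook

    def load(self, hook: str) -> Any:
        blob = self._blobs[hook]
        # Identity check: the stored bytes must still hash to the hook they are filed under.
        if hashlib.md5(blob).hexdigest() != hook:
            raise ValueError("stored object does not match its hook")
        return pickle.loads(blob)

    def search(self, tag: str) -> List[str]:
        return sorted(h for h, ts in self._tags.items() if tag in ts)

repo = Repo()
data_hook = repo.save([5.1, 4.9, 4.7], tags=["dataset"])
# Pedigree recorded as a tag pointing at the parent object's hook.
model_hook = repo.save({"coef": [0.3, 1.2]}, tags=["model:lm", f"parent:{data_hook}"])
```

A short hook like this is small enough to print in a figure or table caption. A reader can then fetch the exact object behind a result and check that it has not changed.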

URL

https://arxiv.org/abs/1706.08822

PDF

https://arxiv.org/pdf/1706.08822
Auto-Encoder Guided GAN for Chinese Calligraphy Synthesis

2017-06-27

Pengyuan Lyu, Xiang Bai, Cong Yao, Zhen Zhu, Tengteng Huang, Wenyu Liu

arXiv_CV

arXiv_CV GAN
Abstract

In this paper, we investigate the Chinese calligraphy synthesis problem: synthesizing Chinese calligraphy images with a specified style from standard font (e.g., Hei font) images (Fig. 1(a)). Recent works mostly follow a stroke extraction and assembly pipeline, which is complex and limited by the performance of stroke extraction. We treat the calligraphy synthesis problem as an image-to-image translation problem and propose a deep neural network based model which can generate calligraphy images from standard font images directly. Besides, we also construct a large-scale benchmark that contains various styles for Chinese calligraphy synthesis. We evaluate our method as well as some baseline methods on the proposed dataset, and the experimental results demonstrate the effectiveness of our proposed model.

URL

https://arxiv.org/abs/1706.08789

PDF

https://arxiv.org/pdf/1706.08789
Memory-augmented Chinese-Uyghur Neural Machine Translation

2017-06-27

Shiyue Zhang, Gulnigar Mahmut, Dong Wang, Askar Hamdulla

arXiv_CL

arXiv_CL Attention GAN NMT Inference
Abstract

Neural machine translation (NMT) has achieved notable performance recently. However, this approach has not been widely applied to the translation task between Chinese and Uyghur, partly due to the limited parallel data resource and the large proportion of rare words caused by the agglutinative nature of Uyghur. In this paper, we collect ~200,000 sentence pairs and show that with this middle-scale database, an attention-based NMT can perform very well on Chinese-Uyghur/Uyghur-Chinese translation. To tackle rare words, we propose a novel memory structure to assist the NMT inference. Our experiments demonstrated that the memory-augmented NMT (M-NMT) outperforms both the vanilla NMT and the phrase-based statistical machine translation (SMT). Interestingly, the memory structure provides an elegant way for dealing with words that are out of vocabulary.

URL

https://arxiv.org/abs/1706.08683

PDF

https://arxiv.org/pdf/1706.08683
Hidden long evolutionary memory in a model biochemical network

2017-06-26

Md. Zulfikar Ali, Ned S. Wingreen, Ranjan Mukhopadhyay

arXiv_CV

arXiv_CV
Abstract

We introduce a minimal model for the evolution of functional protein-interaction networks using a sequence-based mutational algorithm, and apply the model to study neutral drift in networks that yield oscillatory dynamics. Starting with a functional core module, random evolutionary drift increases network complexity even in the absence of specific selective pressures. Surprisingly, we uncover a hidden order in sequence space that gives rise to long-term evolutionary memory, implying strong constraints on network evolution due to the topology of accessible sequence space.

URL

https://arxiv.org/abs/1706.08499

PDF

https://arxiv.org/pdf/1706.08499
Distributed Coordinate Descent for Generalized Linear Models with Regularization

2017-06-26

Ilya Trofimov, Alexander Genkin

arXiv_CV

arXiv_CV Regularization Sparse Classification
Abstract

Generalized linear models with $L_1$ and $L_2$ regularization are a widely used technique for solving classification, class probability estimation and regression problems. With the numbers of both features and examples growing rapidly in fields like text mining and clickstream data analysis, parallelization and the use of cluster architectures become important. We present a novel algorithm for fitting regularized generalized linear models in the distributed environment. The algorithm splits data between nodes by features, uses coordinate descent on each node and line search to merge results globally. A convergence proof is provided. A modification of the algorithm addresses the slow node problem. For the important particular case of logistic regression, we empirically compare our program with several state-of-the-art approaches that rely on different algorithmic and data splitting methods. Experiments demonstrate that our approach is scalable and superior when training on large and sparse datasets.
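The per-node work is coordinate descent over the node's share of features. For squared-error loss with an $L_1$ + $L_2$ (elastic-net) penalty, each coordinate has a closed-form soft-thresholding update, $\beta_j = S(\rho_j, \lambda_1)/(z_j + \lambda_2)$. The sketch makes several assumptions: squared loss instead of the paper's logistic loss, and feature blocks processed one after another in a single process as stand-ins for nodes. The global line search that merges node updates is omitted, and `cd_elastic_net` is a hypothetical name.

```python
from typing import List, Sequence

def soft_threshold(v: float, t: float) -> float:
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def cd_elastic_net(X: List[List[float]], y: Sequence[float], l1: float, l2: float,
                   feature_blocks: List[List[int]], sweeps: int = 100) -> List[float]:
    """Minimize (1/2n)||y - X b||^2 + l1*||b||_1 + (l2/2)*||b||^2 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual r = y - X beta, kept up to date after every coordinate change
    for _ in range(sweeps):
        for block in feature_blocks:  # in the distributed setting, each block lives on one node
            for j in block:
                col = [X[i][j] for i in range(n)]
                z = sum(c * c for c in col) / n
                if z + l2 == 0.0:
                    continue  # all-zero column with no L2 term: leave the coefficient at zero
                rho = sum(col[i] * (resid[i] + col[i] * beta[j]) for i in range(n)) / n
                new = soft_threshold(rho, l1) / (z + l2)
                if new != beta[j]:
                    delta = new - beta[j]
                    for i in range(n):
                        resid[i] -= col[i] * delta
                    beta[j] = new
    return beta

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
beta = cd_elastic_net(X, y, l1=0.1, l2=0.0, feature_blocks=[[0], [1]])
```

The second feature's correlation with the residual (0.05) is below $\lambda_1 = 0.1$, so its coefficient is set exactly to zero. The first is shrunk from 2.0 to 1.8. That exact-zero behavior is what makes $L_1$-regularized models sparse.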

URL

https://arxiv.org/abs/1611.02101

PDF

https://arxiv.org/pdf/1611.02101
First detection of methanol towards a post-AGB object, HD101584

2017-06-26

H. Olofsson, W.H.T. Vlemmings, P. Bergman, E.M.L. Humphreys, M. Lindqvist, M. Maercker, L. Nyman, S. Ramstedt, D. Tafoya

arXiv_CV

arXiv_CV Detection
Abstract

The circumstellar environments of objects on the asymptotic giant branch and beyond are rich in molecular species. Nevertheless, methanol has never been detected in such an object, and is therefore often taken as a clear signpost for a young stellar object. However, we report the first detection of CH3OH in a post-AGB object, HD101584, using ALMA. Its emission, together with emissions from CO, SiO, SO, CS, and H2CO, comes from two extreme velocity spots on either side of the object where a high-velocity outflow appears to interact with the surrounding medium. We have derived molecular abundances, and propose that the detected molecular species are the effect of a post-shock chemistry where circumstellar grains play a role. We further provide evidence that HD101584 was a low-mass, M-type AGB star.

URL

https://arxiv.org/abs/1706.08254

PDF

https://arxiv.org/pdf/1706.08254
English-Japanese Neural Machine Translation with Encoder-Decoder-Reconstructor

2017-06-26

Yukio Matsumura, Takayuki Sato, Mamoru Komachi

arXiv_CL

arXiv_CL NMT
Abstract

Neural machine translation (NMT) has recently become popular in the field of machine translation. However, NMT suffers from the problem of repeating or missing words in the translation. To address this problem, Tu et al. (2017) proposed an encoder-decoder-reconstructor framework for NMT using back-translation. In this method, they selected the best forward translation model in the same manner as Bahdanau et al. (2015), and then trained a bi-directional translation model as fine-tuning. Their experiments show that it offers significant improvement in BLEU scores in a Chinese-English translation task. We confirm that our re-implementation also shows the same tendency and alleviates the problem of repeating and missing words in the translation on an English-Japanese task as well. In addition, we evaluate the effectiveness of pre-training by comparing it with a jointly-trained model of forward translation and back-translation.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1706.08198

PDF

https://arxiv.org/pdf/1706.08198
Object Boundary Detection and Classification with Image-level Labels

2017-06-25

Jing Yu Koh, Wojciech Samek, Klaus-Robert Müller, Alexander Binder

arXiv_CV

arXiv_CV Classification Prediction Detection
Abstract

Semantic boundary and edge detection aims at simultaneously detecting object edge pixels in images and assigning class labels to them. Systematic training of predictors for this task requires labeling edges in images, which is particularly tedious. We propose a novel strategy for solving this task when pixel-level annotations are not available, performing it in an almost zero-shot manner by relying on conventional whole-image neural network classifiers that were trained using large bounding boxes. Our method performs the following two steps at test time. First, it predicts the class labels by applying the trained whole-image network to the test images. Second, it computes pixel-wise scores from the obtained predictions by applying backpropagated gradients as well as recent visualization algorithms such as deconvolution and layer-wise relevance propagation. We show that high pixel-wise scores are indicative of the location of semantic boundaries, which suggests that the semantic boundary problem can be approached without using edge labels during the training phase.
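The two test-time steps can be illustrated with the plain-gradient variant only. This is a minimal sketch under stated assumptions: the "whole-image classifier" is a hypothetical two-layer ReLU network on a flattened image, written in pure Python with made-up weights, and the function names are ours. Deconvolution and layer-wise relevance propagation, which the authors also use, are not shown. Step 1 takes the arg-max class; step 2 backpropagates that class score by hand to every input pixel and uses the gradient magnitude as the pixel-wise score.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def forward(x: Sequence[float], W1: Matrix, b1: Sequence[float],
            W2: Matrix, b2: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Hypothetical two-layer ReLU network: returns hidden pre-activations and class scores."""
    h_pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    h = [max(0.0, v) for v in h_pre]
    scores = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(W2, b2)]
    return h_pre, scores


def gradient_pixel_scores(x: Sequence[float], W1: Matrix, b1: Sequence[float],
                          W2: Matrix, b2: Sequence[float]) -> Tuple[int, List[float]]:
    """Step 1: predict a class c. Step 2: return |d score_c / d x_i| for every pixel i."""
    h_pre, scores = forward(x, W1, b1, W2, b2)
    c = max(range(len(scores)), key=scores.__getitem__)  # predicted class
    # Backprop through the output layer and the ReLU (gradient is 0 where h_pre <= 0).
    dh = [W2[c][j] if h_pre[j] > 0 else 0.0 for j in range(len(h_pre))]
    # Backprop through the first layer down to the input pixels.
    grad = [sum(dh[j] * W1[j][i] for j in range(len(W1))) for i in range(len(x))]
    return c, [abs(g) for g in grad]


def to_map(flat: Sequence[float], width: int) -> List[List[float]]:
    """Reshape a flattened per-pixel score vector back into image rows."""
    return [list(flat[r:r + width]) for r in range(0, len(flat), width)]


if __name__ == "__main__":
    # A hypothetical 2x2 image and a toy network whose class 1 depends only on pixel 2.
    x = [0.1, 0.2, 0.9, 0.8]
    W1 = [[1.0 if i == j else 0.0 for i in range(4)] for j in range(4)]
    b1 = [0.0] * 4
    W2 = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 5.0, 0.0]]
    b2 = [0.0, 0.0]
    cls, pixel_scores = gradient_pixel_scores(x, W1, b1, W2, b2)
    print(cls, to_map(pixel_scores, width=2))
```

In a real convolutional classifier an autodiff framework would compute the same input gradient; the hand-written backward pass here only makes the two steps explicit.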

Abstract (translated by Google)

URL

https://arxiv.org/abs/1606.09187

PDF

https://arxiv.org/pdf/1606.09187
A Semi-supervised Framework for Image Captioning

2017-06-24

Wenhu Chen, Aurelien Lucchi, Thomas Hofmann

arXiv_CV

arXiv_CV Image_Caption Salient Review Attention Caption Embedding
Abstract

State-of-the-art approaches for image captioning require supervised training data consisting of captions paired with images. These methods are typically unable to use unsupervised data, such as text with no corresponding images, which is far more abundant. Here we propose a novel way of using such textual data by artificially generating the missing visual information. We evaluate this learning approach on a newly designed model that detects visual concepts present in an image and feeds them to a reviewer-decoder architecture with an attention mechanism. Unlike previous approaches that encode visual concepts using word embeddings, we instead suggest using regional image features, which capture more intrinsic information. The main benefit of this architecture is that it synthesizes meaningful thought vectors that capture salient image properties and then applies a soft attentive decoder to decode the thought vectors and generate image captions. We evaluate our model on both the Microsoft COCO and Flickr30K datasets and demonstrate that, combined with our semi-supervised learning method, it can substantially improve performance and help generate more accurate and diverse captions.
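At the level of a single step, soft attention over regional image features can be sketched as below. This is a minimal sketch, not the authors' model: we assume dot-product scoring between a hypothetical decoder (or reviewer) state and each region feature, whereas the abstract does not specify the scoring function; the feature values and function names are illustrative only. The attention weights are a softmax over the scores, and the context vector is the weighted average of the region features.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(query: Sequence[float],
                   regions: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Attend over regional features with (assumed) dot-product scores.

    Returns the attention weights (one per region, summing to 1) and the
    context vector, i.e. the weighted average of the region features.
    """
    scores = [sum(q * r for q, r in zip(query, region)) for region in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    context = [sum(w * region[d] for w, region in zip(weights, regions))
               for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    # Three hypothetical 2-d region features and a hypothetical decoder state.
    regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    state = [2.0, 0.0]
    weights, context = soft_attention(state, regions)
    print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

Additive (MLP) scoring in the style of Bahdanau et al. would slot in by replacing only the `scores` line; the softmax and weighted average stay the same.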

Abstract (translated by Google)

URL

https://arxiv.org/abs/1611.05321

PDF

https://arxiv.org/pdf/1611.05321