GaN and ZnO microcavities have been grown on patterned silicon substrates. Thanks to a common platform, these microcavities share similar photonic properties, with large quality factors and low photonic disorder, which makes it possible to determine the optimal spot diameter and to carry out a complete comparative phase-diagram study. Both systems have been investigated under the same experimental conditions. The experimental results are well reproduced by simulations based on Boltzmann equations. A lower polariton lasing threshold has been measured at low temperature in the ZnO microcavity, as expected from its larger Rabi splitting. However, the threshold is strongly affected by LO phonons through phonon-assisted polariton relaxation. We observe and discuss this effect as a function of temperature and detuning. Finally, the polariton lasing thresholds at room temperature are quite similar in the two microcavities. This study highlights the polariton relaxation mechanisms and their importance for threshold optimization.
https://arxiv.org/abs/1512.05985
This paper presents a restricted visual Turing test (VTT) for story-line based deep understanding in long-term and multi-camera captured videos. Given a set of videos of a scene (such as a multi-room office, a garden, and a parking lot) and a sequence of story-line based queries, the task is to provide answers either simply in binary form “true/false” (to a polar query) or in an accurate natural language description (to a non-polar query). Queries, polar or non-polar, consist of view-based queries, which can be answered from a particular camera view, and scene-centered queries, which involve joint inference across different cameras. The story lines are collected to cover spatial, temporal and causal understanding of the input videos. The data and queries distinguish our VTT from recently proposed visual question answering in images and from video captioning. A vision system is proposed to perform joint video and query parsing, integrating different vision modules, a knowledge base and a query engine. The system provides unified interfaces for the different modules so that individual modules can be reconfigured to test a new method. We provide a benchmark dataset and a toolkit for ontology-guided story-line query generation, consisting of about 93.5 hours of videos captured in four different locations and 3,426 queries split into 127 story lines. We also provide a baseline implementation and result analyses.
https://arxiv.org/abs/1512.01715
We describe a very simple bag-of-words baseline for visual question answering. This baseline concatenates the word features from the question and the CNN features from the image to predict the answer. When evaluated on the challenging VQA dataset [2], it shows comparable performance to many recent approaches using recurrent neural networks. To explore the strengths and weaknesses of the trained model, we also provide an interactive web demo and open-source code.
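The baseline's pipeline (bag-of-words question features concatenated with image features, then a linear classifier over answers) can be sketched in a few lines. Everything here — the toy vocabulary, answer set, image features, and weights — is hypothetical; the paper trains on the VQA dataset with real CNN features.

```python
from collections import Counter

# Toy vocabulary and answer set (hypothetical; stand-ins for the real dataset).
VOCAB = ["what", "color", "is", "the", "cat", "dog"]
ANSWERS = ["black", "white", "brown"]

def bow_features(question):
    """Bag-of-words vector: word counts over a fixed vocabulary."""
    counts = Counter(question.lower().split())
    return [float(counts[w]) for w in VOCAB]

def predict(question, image_features, weights):
    """Concatenate the question's BoW vector with the image features,
    score each answer with a linear classifier, return the argmax answer."""
    x = bow_features(question) + list(image_features)
    scores = [sum(w * xi for w, xi in zip(weights[a], x)) for a in ANSWERS]
    return ANSWERS[scores.index(max(scores))]

# Dummy "CNN" image features and hand-set weights, just to exercise the pipeline.
image_features = [0.2, 0.9]
dim = len(VOCAB) + len(image_features)
weights = {"black": [0.0] * dim, "white": [0.1] * dim, "brown": [0.0] * dim}
print(predict("What color is the cat", image_features, weights))  # -> white
```

In the paper the weights are learned (e.g., with a softmax cross-entropy loss); the point of the baseline is that this single linear layer over concatenated features is already competitive with recurrent models.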
https://arxiv.org/abs/1512.02167
The formation of network structure is mainly influenced by an individual node's activity and its memory, where activity can usually be interpreted as an individual's inherent property and memory can be represented by the interaction strength between nodes. In our study, we define activity through the appearance pattern in the time-aggregated network representation, and quantify memory through the contact pattern of empirical temporal networks. To address the role of activity and memory in epidemics on time-varying networks, we propose temporal-pattern coarsening of activity-driven growing networks with memory. In particular, we focus on the relation between time-scale coarsening and spreading dynamics in the context of dynamic scaling and finite-size scaling. Finally, we discuss the universality issue of spreading dynamics on time-varying networks for various memory-causality tests.
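A minimal sketch of the model class the abstract refers to — an activity-driven network in which active nodes either repeat a past contact (memory) or link to a random new node. The activity values, the `memory_strength` parameter, and the update rule are illustrative assumptions, not the paper's exact coarsening procedure.

```python
import random

def activity_driven_step(activities, contacts, memory_strength=0.5, rng=random):
    """One step of an activity-driven network with memory: each node
    activates with probability equal to its activity; an active node links
    back to a previous contact with probability memory_strength, otherwise
    to a uniformly random other node. Returns the edges created this step."""
    n = len(activities)
    edges = []
    for i, a in enumerate(activities):
        if rng.random() < a:  # node i activates
            if contacts[i] and rng.random() < memory_strength:
                j = rng.choice(sorted(contacts[i]))  # repeat a past contact
            else:
                j = rng.choice([k for k in range(n) if k != i])  # new contact
            edges.append((i, j))
            contacts[i].add(j)
    return edges

# Run a short toy history with a fixed seed.
rng = random.Random(0)
contacts = [set() for _ in range(5)]
for _ in range(10):
    activity_driven_step([0.6] * 5, contacts, rng=rng)
```

Coarsening in time would correspond to aggregating the per-step edge lists over longer windows before measuring spreading dynamics.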
https://arxiv.org/abs/1508.03545
Partially observed control problems are a challenging aspect of reinforcement learning. We extend two related, model-free algorithms for continuous control – deterministic policy gradient and stochastic value gradient – to solve partially observed domains using recurrent neural networks trained with backpropagation through time. We demonstrate that this approach, coupled with long short-term memory, is able to solve a variety of physical control problems exhibiting an assortment of memory requirements. These include the short-term integration of information from noisy sensors and the identification of system parameters, as well as long-term memory problems that require preserving information over many time steps. We also demonstrate success on a combined exploration and memory problem in the form of a simplified version of the well-known Morris water maze task. Finally, we show that our approach can deal with high-dimensional observations by learning directly from pixels. We find that recurrent deterministic and stochastic policies are able to learn similarly good solutions to these tasks, including the water maze where the agent must learn effective search strategies.
https://arxiv.org/abs/1512.04455
It is well known that contextual and multi-scale representations are important for accurate visual recognition. In this paper we present the Inside-Outside Net (ION), an object detector that exploits information both inside and outside the region of interest. Contextual information outside the region of interest is integrated using spatial recurrent neural networks. Inside, we use skip pooling to extract information at multiple scales and levels of abstraction. Through extensive experiments we evaluate the design space and provide readers with an overview of what tricks of the trade are important. ION improves the state of the art on PASCAL VOC 2012 object detection from 73.9% to 76.4% mAP. On the new and more challenging MS COCO dataset, we improve the state of the art from 19.7% to 33.1% mAP. In the 2015 MS COCO Detection Challenge, our ION model won the Best Student Entry and finished 3rd place overall. As intuition suggests, our detection results provide strong evidence that context and multi-scale representations improve small object detection.
https://arxiv.org/abs/1512.04143
The recent progress on image recognition and language modeling is making automatic description of image content a reality. However, stylized, non-factual aspects of the written description are missing from current systems. One such style is descriptions with emotions, which are commonplace in everyday communication and influence decision-making and interpersonal relationships. We design a system to describe an image with emotions, and present a model that automatically generates captions with positive or negative sentiments. We propose a novel switching recurrent neural network with word-level regularization, which is able to produce emotional image captions using only 2000+ training sentences containing sentiments. We evaluate the captions with different automatic and crowd-sourcing metrics. Our model compares favourably in common quality metrics for image captioning. In 84.6% of cases the generated positive captions were judged as being at least as descriptive as the factual captions. Of these positive captions, 88% were confirmed by the crowd-sourced workers as having the appropriate sentiment.
https://arxiv.org/abs/1510.01431
In this paper we consider the problem of continuously discovering image contents by actively asking image-based questions and subsequently answering the questions being asked. The key components include a Visual Question Generation (VQG) module and a Visual Question Answering (VQA) module, in which Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) are used. Given a dataset that contains images, questions and their answers, both modules are trained at the same time, the difference being that VQG uses the images as input and the corresponding questions as output, while VQA uses images and questions as input and the corresponding answers as output. We evaluate the self-talk process subjectively using Amazon Mechanical Turk, which shows the effectiveness of the proposed method.
https://arxiv.org/abs/1512.03460
We study the robustness of the bucket brigade quantum random access memory model introduced by Giovannetti, Lloyd, and Maccone [Phys. Rev. Lett. 100, 160501 (2008)]. Using a result of Regev and Schiff [ICALP ‘08, p. 773], we show that for a class of error models the error rate per gate in the bucket brigade quantum memory has to be of order $o(2^{-n/2})$ (where $N=2^n$ is the size of the memory) whenever the memory is used as an oracle for the quantum searching problem. We conjecture that this is the case for any realistic error model that will be encountered in practice, and that for algorithms with super-polynomially many oracle queries the error rate must be super-polynomially small, which further motivates the need for quantum error correction. By contrast, for algorithms such as matrix inversion [Phys. Rev. Lett. 103, 150502 (2009)] or quantum machine learning [Phys. Rev. Lett. 113, 130503 (2014)] that only require a polynomial number of queries, the error rate only needs to be polynomially small and quantum error correction may not be required. We introduce a circuit model for the quantum bucket brigade architecture and argue that quantum error correction for the circuit causes the quantum bucket brigade architecture to lose its primary advantage of a small number of “active” gates, since all components have to be actively error corrected.
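The order of the required error rate can be seen from a back-of-the-envelope scaling argument (a sketch only; the paper's actual result is proved for a specific class of error models):

```latex
% Grover search over a memory of size N = 2^n needs
\[
  Q = \Theta\!\left(\sqrt{N}\right) = \Theta\!\left(2^{n/2}\right)
\]
% oracle queries. If each faulty query fails with probability \varepsilon,
% a union bound gives a total failure probability of
\[
  p_{\text{fail}} \;\lesssim\; Q\,\varepsilon \;=\; \Theta\!\left(2^{n/2}\right)\varepsilon ,
\]
% so keeping p_fail = o(1) forces the per-query error rate to satisfy
\[
  \varepsilon = o\!\left(2^{-n/2}\right).
\]
```

The same counting explains the contrast drawn in the abstract: an algorithm making only polynomially many queries tolerates a polynomially small $\varepsilon$.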
https://arxiv.org/abs/1502.03450
Object detection is a fundamental task in many computer vision applications; the importance of evaluating the quality of object detection is therefore well acknowledged in this domain. Such evaluation gives insight into how well methods handle environmental changes. In this paper, a new method for object detection is introduced that combines Selective Search and EdgeBoxes. We tested the three methods (the two originals and their combination) under environmental variations. Our experiments demonstrate that the combined method outperforms the others under illumination and viewpoint variations.
https://arxiv.org/abs/1512.03424
This paper studies short-range order (SRO) in the semiconductor alloy (GaN)$_{1-x}$(ZnO)$_x$. Monte Carlo simulations performed on a density functional theory (DFT)-based cluster expansion model show that the heterovalent alloys exhibit strong SRO because of the energetic preference for the valence-matched nearest-neighbor Ga-N and Zn-O pairs. To represent the SRO-related structural correlations, we introduce the concept of the Special Quasi-ordered Structure (SQoS). Subsequent DFT calculations reveal a dramatic influence of SRO on the atomic, electronic and vibrational properties of the (GaN)$_{1-x}$(ZnO)$_x$ alloy. Due to the enhanced statistical presence of the energetically unfavored Zn-N bonds, with their strong Zn 3$d$-N 2$p$ repulsion, the disordered alloys exhibit much larger lattice bowing and band-gap reduction than the short-range ordered alloys. Inclusion of lattice vibrations stabilizes the disordered alloy.
https://arxiv.org/abs/1505.06329
In this paper, we describe a system for generating textual descriptions of short video clips using recurrent neural networks (RNNs), which we used while participating in the Large Scale Movie Description Challenge 2015 at ICCV 2015. Our work builds on static image captioning systems with RNN-based language models and extends this framework to videos, utilizing both static image features and video-specific features. In addition, we study the usefulness of visual content classifiers as a source of additional information for caption generation. Our experimental results show that utilizing keyframe-based features, dense trajectory video features and content classifier outputs together gives better performance than any one of them individually.
https://arxiv.org/abs/1512.02949
Current high-quality object detection approaches use the scheme of salience-based object proposal methods followed by post-classification using deep convolutional features. This spurred recent research in improving object proposal methods. However, domain-agnostic proposal generation has the principal drawback that the proposals come unranked or with very weak ranking, making it hard to trade off quality for running time. This raises the more fundamental question of whether high-quality proposal generation requires careful engineering or can be derived just from data alone. We demonstrate that learning-based proposal methods can effectively match the performance of hand-engineered methods while allowing for very efficient runtime-quality trade-offs. Using the multi-scale convolutional MultiBox (MSC-MultiBox) approach, we substantially advance the state-of-the-art on the ILSVRC 2014 detection challenge data set, with $0.5$ mAP for a single model and $0.52$ mAP for an ensemble of two models. MSC-MultiBox significantly improves the proposal quality over its predecessor MultiBox method: AP increases from $0.42$ to $0.53$ for the ILSVRC detection challenge. Finally, we demonstrate improved bounding-box recall compared to Multiscale Combinatorial Grouping with fewer proposals on the Microsoft COCO data set.
https://arxiv.org/abs/1412.1441
In existing works that learn representations for object detection, the relationship between a candidate window and the ground truth bounding box of an object is simplified by thresholding their overlap. This paper shows the information loss in this simplification and picks up the relative location/size information discarded by thresholding. We propose a representation learning pipeline that uses the relationship as supervision for improving the learned representation in object detection. Such relationships are not limited to objects of the target category, but also include surrounding objects of other categories. We show that image regions with multiple contexts and multiple rotations are effective in capturing such relationships during the representation learning process and in handling the semantic and visual variation caused by different window-object configurations. Experimental results show that the representation learned by our approach can improve the object detection accuracy by 6.4% in mean average precision (mAP) on ILSVRC2014. On the challenging ILSVRC2014 test dataset, 48.6% mAP is achieved by our single model, the best among published results. On PASCAL VOC, it outperforms the state-of-the-art result of Fast RCNN by 3.3% in absolute mAP.
https://arxiv.org/abs/1512.02736
Currently, the potential threats of artificial intelligence (AI) to humans have triggered a large controversy in society; behind this, the nature of the issue is whether an AI system can be evaluated quantitatively. This article analyzes the challenges facing the evaluation of AI development levels, and argues that the evaluation methods for human intelligence tests and for AI systems are not uniform; the key reason is that no existing model can uniformly describe both AI systems and living beings such as humans. To address this problem, a standard intelligent system model is established in this study to describe AI systems and living beings uniformly. Based on this model, the article gives an abstract mathematical description and builds a standard intelligent machine mathematical model; expands the von Neumann architecture and proposes the Liufeng-Shiyong architecture; defines an artificial intelligence IQ and establishes an artificial intelligence scale and evaluation method; and reports tests on 50 search engines and three human subjects of different ages across the world, finally obtaining the 2014 absolute IQ and deviation IQ rankings for artificial intelligence.
https://arxiv.org/abs/1512.00977
The optimal design of a fault-tolerant quantum computer involves finding an appropriate balance between the burden of large-scale integration of noisy components and the load of improving the reliability of hardware technology. This balance can be evaluated by quantitatively modeling the execution of quantum logic operations on realistic quantum hardware containing limited computational resources. In this work, we report a complete performance simulation software tool capable of (1) searching the hardware design space by varying resource architecture and technology parameters, (2) synthesizing and scheduling a fault-tolerant quantum algorithm within the hardware constraints, (3) quantifying performance metrics such as the execution time and the failure probability of the algorithm, and (4) analyzing the breakdown of these metrics to highlight performance bottlenecks, as well as visualizing resource utilization to evaluate the adequacy of the chosen design. Using this tool we investigate a vast design space for implementing key building blocks of Shor's algorithm to factor a 1,024-bit number with a baseline budget of 1.5 million qubits. We show that a trapped-ion quantum computer designed with twice as many qubits and one-tenth of the baseline infidelity of the communication channel can factor a 2,048-bit integer in less than five months.
https://arxiv.org/abs/1512.00796
In this paper we present a detailed analysis of the structural, electronic, and optical properties of an $m$-plane (In,Ga)N/GaN quantum well structure grown by metal organic vapor phase epitaxy. The sample has been structurally characterized by x-ray diffraction, scanning transmission electron microscopy, and 3D atom probe tomography. The optical properties of the sample have been studied by photoluminescence (PL), time-resolved PL spectroscopy, and polarized PL excitation spectroscopy. The PL spectrum consisted of a very broad PL line with a high degree of optical linear polarization. To understand the optical properties we have performed atomistic tight-binding calculations, and based on our initial atom probe tomography data, the model includes the effects of strain and built-in field variations arising from random alloy fluctuations. Furthermore, we included Coulomb effects in the calculations. Our microscopic theoretical description reveals strong hole wave function localization effects due to random alloy fluctuations, resulting in strong variations in ground state energies and consequently the corresponding transition energies. This is consistent with the experimentally observed broad PL peak. Furthermore, when including Coulomb contributions in the calculations we find strong exciton localization effects which explain the form of the PL decay transients. Additionally, the theoretical results confirm the experimentally observed high degree of optical linear polarization. Overall, the theoretical data are in very good agreement with the experimental findings, highlighting the strong impact of the microscopic alloy structure on the optoelectronic properties of these systems.
https://arxiv.org/abs/1509.07099
Automatically describing videos has long been fascinating. In this work, we attempt to describe videos from a specific domain: broadcast videos of lawn tennis matches. Given a video shot from a tennis match, we intend to generate a textual commentary similar to what a human expert would write on a sports website. Unlike many recent works that focus on generating short captions, we are interested in generating semantically richer descriptions. This demands a detailed low-level analysis of the video content, especially the actions and interactions among subjects. We address this by limiting our domain to the game of lawn tennis. Rich descriptions are generated by leveraging a large corpus of human-created descriptions harvested from the Internet. We evaluate our method on a newly created tennis video data set. Extensive analysis demonstrates that our approach addresses both the semantic correctness and the readability aspects involved in the task.
https://arxiv.org/abs/1511.08522
In this paper we explore two ways of using context for object detection. The first model focusses on people and the objects they commonly interact with, such as fashion and sports accessories. The second model considers more general object detection and uses the spatial relationships between objects and between objects and scenes. Our models are able to capture precise spatial relationships between the context and the object of interest, and make effective use of the appearance of the contextual region. On the newly released COCO dataset, our models provide relative improvements of up to 5% over CNN-based state-of-the-art detectors, with the gains concentrated on hard cases such as small objects (10% relative improvement).
https://arxiv.org/abs/1511.08177
We introduce a neural network with a recurrent attention model over a possibly large external memory. The architecture is a form of Memory Network (Weston et al., 2015) but unlike the model in that work, it is trained end-to-end, and hence requires significantly less supervision during training, making it more generally applicable in realistic settings. It can also be seen as an extension of RNNsearch to the case where multiple computational steps (hops) are performed per output symbol. The flexibility of the model allows us to apply it to tasks as diverse as (synthetic) question answering and to language modeling. For the former our approach is competitive with Memory Networks, but with less supervision. For the latter, on the Penn TreeBank and Text8 datasets our approach demonstrates comparable performance to RNNs and LSTMs. In both cases we show that the key concept of multiple computational hops yields improved results.
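A single attention "hop" of the kind described — match the query against input memories, form an attention-weighted sum of output memories, and add it to the query — can be sketched as follows. The embeddings here are assumed precomputed (the real model learns the embedding matrices end-to-end), and the toy vectors are illustrative.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def memory_hop(query, input_memories, output_memories):
    """One attention hop: attend over input memories with the query,
    take the attention-weighted sum of output memories, and add it
    to the query to form the next controller state."""
    p = softmax([dot(query, m) for m in input_memories])
    o = [sum(pi * m[j] for pi, m in zip(p, output_memories))
         for j in range(len(query))]
    return [u + oj for u, oj in zip(query, o)]

# Toy 2-D embeddings: the query matches the first memory slot most strongly.
u = [1.0, 0.0]
m_in = [[1.0, 0.0], [0.0, 1.0]]
m_out = [[0.5, 0.5], [0.0, 2.0]]
u_next = memory_hop(u, m_in, m_out)
```

Stacking several such hops (feeding `u_next` back in) is the "multiple computational steps per output symbol" that the abstract credits for the improved results.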
https://arxiv.org/abs/1503.08895
Spoken language translation (SLT) is becoming more important in the increasingly globalized world, from both a social and an economic point of view. It is one of the major challenges for automatic speech recognition (ASR) and machine translation (MT), driving intense research activities in these areas. While past research in SLT, due to technology limitations, dealt mostly with speech recorded under controlled conditions, today's major challenge is the translation of spoken language as it can be found in real life. Considered application scenarios range from portable translators for tourists, through lecture and presentation translation, to broadcast news and shows with live captioning. We would like to present PJIIT's experiences in SLT, gained from the Eu-Bridge 7th Framework project and the U-Star consortium activities for the Polish/English language pair. The presented research concentrates on ASR adaptation for Polish (state-of-the-art acoustic models: DBN-BLSTM training, Kaldi: LDA+MLLT+SAT+MMI), language modeling for ASR & MT (text normalization, RNN-based LMs, n-gram model domain interpolation) and statistical translation techniques (hierarchical models, factored translation models, automatic casing and punctuation, comparable and bilingual corpora preparation). While results for the well-defined domains (phrases for travelers, parliament speeches, medical documentation, movie subtitling) are very encouraging, less defined domains (presentations, lectures) still form a challenge. Our progress in the IWSLT TED task (MT only) will be presented, as well as current progress in Polish ASR.
https://arxiv.org/abs/1511.07788
We introduce the dense captioning task, which requires a computer vision system to both localize and describe salient regions in images in natural language. The dense captioning task generalizes object detection (when the descriptions consist of a single word) and image captioning (when the single predicted region covers the full image). To address the localization and description tasks jointly, we propose a Fully Convolutional Localization Network (FCLN) architecture that processes an image with a single, efficient forward pass, requires no external region proposals, and can be trained end-to-end with a single round of optimization. The architecture is composed of a Convolutional Network, a novel dense localization layer, and a Recurrent Neural Network language model that generates the label sequences. We evaluate our network on the Visual Genome dataset, which comprises 94,000 images and 4,100,000 region-grounded captions. We observe both speed and accuracy improvements over baselines based on current state-of-the-art approaches in both generation and retrieval settings.
https://arxiv.org/abs/1511.07571
This paper investigates the simultaneous wireless information and power transfer (SWIPT) in cooperative relay networks, where a relay harvests energy from the radio frequency (RF) signals transmitted by a source and then uses the harvested energy to assist the information transmission from the source to its destination. Both source and relay transmissions use rateless code, which allows the destination to employ either of two information receiving strategies, i.e., mutual information accumulation (IA) and energy accumulation (EA). The SWIPT-enabled relay employs three different SWIPT receiver architectures: the ideal receiver and two practical receivers (i.e., the power splitting (PS) and the time switching (TS) receivers). Accordingly, three relaying protocols, namely, the ideal protocol, the PS protocol and the TS protocol, are presented. In order to explore the system performance limits with these three protocols, optimization problems are formulated to maximize their achievable information rates. For the ideal protocol, explicit expressions of the optimal solutions are derived. For the PS protocol, a linear-search algorithm is designed to solve the non-convex problems. For the TS protocol, two solving methods are presented. Numerical experiments are carried out to validate our analysis and algorithms, which also show that, with the same SWIPT receiver, the IA-based system outperforms the EA-based system, while with the same information receiving strategy, the PS protocol outperforms the TS protocol. Moreover, compared with conventional non-SWIPT and non-rateless-coded systems, the proposed protocols exhibit considerable performance gains, especially in the relatively low signal-to-noise ratio (SNR) regime. In addition, the effects of the source-destination direct link and the relay position on system performance are also discussed, which provides insights on SWIPT-enabled relay systems.
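As a rough illustration of the PS receiver's trade-off (a textbook linear-energy-harvesting/Shannon-rate sketch, not the paper's rateless-coded system model): a fraction rho of the received power is harvested and the remaining 1-rho feeds the information decoder, so raising rho lowers the achievable rate.

```python
import math

def ps_receiver(p_rx, rho, eta=0.5, noise=1e-3):
    """Power-splitting SWIPT receiver: a fraction rho of the received power
    p_rx is harvested (linear EH model with efficiency eta), the rest is
    used for information decoding (Shannon rate over an AWGN channel).
    Returns (harvested_power, rate_in_bits_per_channel_use).
    All parameter values here are illustrative assumptions."""
    harvested = eta * rho * p_rx
    rate = math.log2(1.0 + (1.0 - rho) * p_rx / noise)
    return harvested, rate

# Sweep the split ratio: more harvesting means a lower decoding rate.
rates = [ps_receiver(p_rx=0.1, rho=r)[1] for r in (0.0, 0.5, 0.9)]
```

The optimization problems in the paper essentially pick rho (or, for TS, the time-switching fraction) to maximize the end-to-end achievable rate under this kind of trade-off.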
https://arxiv.org/abs/1511.07556
Question Answering (QA) is fundamental to natural language processing in that most NLP problems can be phrased as QA (Kumar et al., 2015). The weakly supervised memory network models proposed so far struggle at answering questions that involve relations among multiple entities (such as Facebook's bAbI qa5-three-arg-relations task in (Weston et al., 2015)). To address this problem of learning multi-argument, multi-hop semantic relations for the purpose of QA, we propose a method that combines the jointly learned long-term read-write memory and attentive inference components of end-to-end memory networks (MemN2N) (Sukhbaatar et al., 2015) with distributed sentence vector representations encoded by a Skip-Thought model (Kiros et al., 2015). This choice to append Skip-Thought Vectors to the existing MemN2N framework is motivated by the fact that Skip-Thought Vectors have been shown to accurately model multi-argument semantic relations (Kiros et al., 2015).
https://arxiv.org/abs/1511.06420
Applications such as web search and social networking have been moving from centralized to decentralized cloud architectures to improve their scalability. MapReduce, a programming framework for processing large amounts of data using thousands of machines in a single cloud, also needs to be scaled out to multiple clouds to adapt to this evolution. The challenge of building a multi-cloud distributed architecture is substantial. Notwithstanding, the need to deal with the new types of faults introduced by such a setting, such as the outage of a whole datacenter or an arbitrary fault caused by a malicious cloud insider, increases the endeavor considerably. In this paper we propose Medusa, a platform that allows MapReduce computations to scale out to multiple clouds and tolerate several types of faults. Our solution fulfills four objectives. First, it is transparent to the user, who writes her typical MapReduce application without modification. Second, it does not require any modification to the widely used Hadoop framework. Third, the proposed system goes well beyond the fault-tolerance offered by MapReduce to tolerate arbitrary faults, cloud outages, and even malicious faults caused by corrupt cloud insiders. Fourth, it achieves this increased level of fault tolerance at reasonable cost. We performed an extensive experimental evaluation in the ExoGENI testbed, demonstrating that our solution significantly reduces execution time when compared to traditional methods that achieve the same level of resilience.
https://arxiv.org/abs/1511.07185
Here we report on the effect of rare-earth Gd doping on the magnetic properties and magnetotransport of GaN two-dimensional electron gases (2DEGs). Samples are grown by plasma-assisted molecular beam epitaxy and consist of AlN/GaN heterostructures where Gd is delta-doped within a polarization-induced 2DEG. Ferromagnetism is observed in these Gd-doped 2DEGs, with a Curie temperature above room temperature and an anisotropic spontaneous magnetization preferring an out-of-plane (c-axis) orientation. At magnetic fields up to 50 kOe, the magnetization remains smaller for the in-plane configuration than for the out-of-plane one, which is indicative of exchange-coupled spins locked along the polar c-axis. The sample with the lowest Gd concentration (2.3 $\times$ $10^{14}$ cm$^{-2}$) exhibits a saturation magnetization of 41.1 $\mu_B/Gd^{3+}$ at 5 K, revealing that the Gd ion spins (7 ${\mu}_B$) alone do not account for the magnetization. Surprisingly, control samples grown without any Gd display inconsistent magnetic properties; in some control samples weak ferromagnetism is observed, and in others paramagnetism. The ferromagnetic 2DEGs do not exhibit the anomalous Hall effect; the Hall resistance varies non-linearly with the magnetic field, but does not track the magnetization, indicating a lack of coupling between the ferromagnetic phase and the conduction-band electrons within the 2DEG.
https://arxiv.org/abs/1502.03478
We consider the visual sentiment task of mapping an image to an adjective noun pair (ANP) such as “cute baby”. To capture the two-factor structure of our ANP semantics as well as to overcome annotation noise and ambiguity, we propose a novel factorized CNN model which learns separate representations for adjectives and nouns but optimizes the classification performance over their product. Our experiments on the publicly available SentiBank dataset show that our model significantly outperforms not only independent ANP classifiers on unseen ANPs and on retrieving images of novel ANPs, but also image captioning models which capture word semantics from co-occurrence of natural text; the latter turn out to be surprisingly poor at capturing the sentiment evoked by pure visual experience. That is, our factorized ANP CNN not only trains better from noisy labels and generalizes better to new images, but can also expand the ANP vocabulary on its own.
https://arxiv.org/abs/1511.06838
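The factorized scoring idea can be sketched in a few lines. This is a minimal illustration, not the paper's architecture: assume (hypothetically) an adjective head and a noun head that each emit logits; a score for every adjective-noun pair is then their sum, i.e. a product in probability space.

```python
import numpy as np

def anp_scores(adj_logits, noun_logits):
    """Combine per-adjective and per-noun logits into scores for every
    adjective-noun pair by summing them (a product of the two factors'
    probabilities, taken in log space)."""
    # adj_logits: (A,), noun_logits: (N,) -> pair scores: (A, N)
    return adj_logits[:, None] + noun_logits[None, :]

adj = np.array([2.0, 0.5])      # e.g. "cute", "scary" (illustrative)
noun = np.array([1.0, -1.0])    # e.g. "baby", "dog"  (illustrative)
pairs = anp_scores(adj, noun)
best = np.unravel_index(np.argmax(pairs), pairs.shape)
```

Because scores for unseen pairs are composed from the two learned factors, the pair vocabulary can grow without retraining a classifier per ANP.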
Over the last few years deep learning methods have emerged as one of the most prominent approaches for video analysis. However, so far their most successful applications have been in the area of video classification and detection, i.e., problems involving the prediction of a single class label or a handful of output variables per video. Furthermore, while deep networks are commonly recognized as the best models to use in these domains, there is a widespread perception that in order to yield successful results they often require time-consuming architecture search, manual tweaking of parameters and computationally intensive pre-processing or post-processing methods. In this paper we challenge these views by presenting a deep 3D convolutional architecture trained end to end to perform voxel-level prediction, i.e., to output a variable at every voxel of the video. Most importantly, we show that the same exact architecture can be used to achieve competitive results on three widely different voxel-prediction tasks: video semantic segmentation, optical flow estimation, and video coloring. The three networks learned on these problems are trained from raw video without any form of preprocessing and their outputs do not require post-processing to achieve outstanding performance. Thus, they offer an efficient alternative to traditional and much more computationally expensive methods in these video domains.
https://arxiv.org/abs/1511.06681
In the coming decades, the discovery of the first truly Earth-like exoplanets is anticipated. The characterisation of those planets will play a vital role in determining which are chosen as targets for the search for life beyond the Solar system. One of the many variables that will be considered in that characterisation and selection process is the nature of the potential climatic variability of the exoEarths in question. In our own Solar system, the Earth’s long-term climate is driven by several factors - including the modifying influence of life on our atmosphere, and the temporal evolution of Solar luminosity. The gravitational influence of the other planets in our Solar system adds an extra complication - driving the Milankovitch cycles that are thought to have caused the on-going series of glacial and interglacial periods that have dominated Earth’s climate for the past few million years. Here, we present the results of a large suite of dynamical simulations that investigate the influence of the giant planet Jupiter on the Earth’s Milankovitch cycles. If Jupiter were located on a different orbit, we find that the long-term variability of Earth’s orbit would be significantly different. Our results illustrate how small differences in the architecture of planetary systems can result in marked changes in the potential habitability of the planets therein, and are an important first step in developing a means to characterise the nature of climate variability on planets beyond our Solar system.
https://arxiv.org/abs/1511.06043
Visual Question Answering (VQA) has emerged as one of the most fascinating topics in computer vision recently. Many state-of-the-art methods naively feed holistic visual features together with language features into a Long Short-Term Memory (LSTM) module, neglecting the sophisticated interaction between them. This coarse modeling also blocks the possibilities of exploring finer-grained local features that contribute to the question answering dynamically over time. This paper addresses this fundamental problem by directly modeling the temporal dynamics between language and all possible local image patches. When traversing the question words sequentially, our end-to-end approach explicitly fuses the features associated with the words and the ones available at multiple local patches in an attention mechanism, and further combines the fused information to generate dynamic messages, which we call episodes. We then feed the episodes to a standard question answering module together with the contextual visual information and linguistic information. Motivated by recent practices in deep learning, we use auxiliary loss functions during training to improve the performance. Our experiments on two latest public datasets suggest that our method has a superior performance. Notably, on the DAQUAR dataset we advanced the state of the art by 6$\%$, and we also evaluated our approach on the most recent MSCOCO-VQA dataset.
https://arxiv.org/abs/1511.05676
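A minimal sketch of the word-to-patch attention fusion the abstract describes; all shapes, names, and the dot-product similarity here are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(word_vec, patch_feats):
    """Attention over local image patches for one question word: weight
    each patch by its similarity to the word embedding and return the
    fused (attended) visual vector."""
    sims = patch_feats @ word_vec       # (P,) similarity per patch
    weights = softmax(sims)             # attention distribution over patches
    return weights @ patch_feats        # (D,) fused visual feature

rng = np.random.default_rng(0)
patches = rng.normal(size=(9, 4))   # 9 local patches, 4-dim features
word = rng.normal(size=4)           # one question-word embedding
episode = attend(word, patches)
```

Running this per word and accumulating the fused vectors gives the "episode" messages that are then fed to the answering module.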
Modern applications and progress in deep learning research have created renewed interest in generative models of text and of images. However, even today it is unclear what objective functions one should use to train and evaluate these models. In this paper we present two contributions. Firstly, we present a critique of scheduled sampling, a state-of-the-art training method that contributed to the winning entry to the MSCOCO image captioning benchmark in 2015. Here we show that despite this impressive empirical performance, the objective function underlying scheduled sampling is improper and leads to an inconsistent learning algorithm. Secondly, we revisit the problems that scheduled sampling was meant to address, and present an alternative interpretation. We argue that maximum likelihood is an inappropriate training objective when the end-goal is to generate natural-looking samples. We go on to derive an ideal objective function to use in this situation instead. We introduce a generalisation of adversarial training, and show how such a method can interpolate between maximum likelihood training and our ideal training objective. To our knowledge this is the first theoretical analysis that explains why adversarial training tends to produce samples with higher perceived quality.
https://arxiv.org/abs/1511.05101
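For context, the scheduled sampling procedure being critiqued can be sketched as follows (a generic sketch of the training-time input mixing, with illustrative token lists):

```python
import random

def scheduled_sampling_inputs(gold_tokens, model_samples, p_truth):
    """At each decoding step, feed the ground-truth token with
    probability p_truth, otherwise the model's own previous sample;
    p_truth is annealed toward 0 over training in scheduled sampling."""
    inputs = []
    for gold, sampled in zip(gold_tokens, model_samples):
        inputs.append(gold if random.random() < p_truth else sampled)
    return inputs

gold = ["a", "cat", "sat"]
sampled = ["the", "dog", "ran"]
mixed = scheduled_sampling_inputs(gold, sampled, p_truth=0.75)
```

At `p_truth=1.0` this reduces to ordinary teacher forcing; at `p_truth=0.0` the decoder is conditioned entirely on its own samples.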
A high threshold voltage enhancement-mode GaN HEMT with p-type doped buffer is discussed and simulated. Analytical expressions are derived to explain the role of buffer capacitance in designing and enhancing threshold voltage. Simulations of the proposed device with p-type buffer show threshold voltages above 5 V, and a positive shift in threshold voltage as the oxide capacitance is reduced, thus enabling threshold voltage tunability over an unprecedented range for GaN-based HEMTs. The electric field profiles, breakdown performance, on-resistance and delay tradeoffs in the proposed pGaN back HEMT device are also discussed.
https://arxiv.org/abs/1511.04438
Understanding the luminescence of GaN doped with erbium (Er) requires a detailed knowledge of the interaction between the rare-earth dopant and the nitride host, including intrinsic defects and other impurities that may be present in the host material. We address this problem through a first-principles hybrid density functional study of the structure, energetics, and transition levels of the Er impurity and its complexes with N and Ga vacancies, substitutional C and O impurities, and H interstitials in wurtzite GaN. We find that, in the interior of the material, Er$_{\rm Ga}$ is the dominant Er$^{3+}$ center with a formation energy of 1.55 eV; Er$_{\rm Ga}$-$V_{\rm N}$ possesses a deep donor level at 0.61 eV which can assist in the transfer of energy to the 4$f$-electron core. Multiple optically active Er$^{3+}$ centers are possible in Er-doped GaN.
https://arxiv.org/abs/1509.03908
In this paper, we present a model which takes as input a corpus of images with relevant spoken captions and finds a correspondence between the two modalities. We employ a pair of convolutional neural networks to model visual objects and speech signals at the word level, and tie the networks together with an embedding and alignment model which learns a joint semantic space over both modalities. We evaluate our model using image search and annotation tasks on the Flickr8k dataset, which we augmented by collecting a corpus of 40,000 spoken captions using Amazon Mechanical Turk.
https://arxiv.org/abs/1511.03690
Recently, deep learning approaches, especially deep Convolutional Neural Networks (ConvNets), have achieved overwhelming accuracy and fast processing speed for image classification. Incorporating temporal structure with deep ConvNets for video representation becomes a fundamental problem for video content analysis. In this paper, we propose a new approach, namely the Hierarchical Recurrent Neural Encoder (HRNE), to exploit temporal information in videos. Compared to recent video representation inference approaches, this paper makes the following three contributions. First, our HRNE is able to efficiently exploit video temporal structure over a longer range by reducing the length of the input information flow and compositing multiple consecutive inputs at a higher level. Second, computation operations are significantly lessened while more non-linearity is attained. Third, HRNE is able to uncover temporal transitions between frame chunks with different granularities, i.e., it can model the temporal transitions between frames as well as the transitions between segments. We apply the new method to video captioning, where temporal information plays a crucial role. Experiments demonstrate that our method outperforms the state of the art on video captioning benchmarks. Notably, even using a single network with only the RGB stream as input, HRNE beats all the recent systems which combine multiple inputs, such as an RGB ConvNet plus a 3D ConvNet.
https://arxiv.org/abs/1511.03476
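The two-level hierarchy can be illustrated with a deliberately simplified sketch: mean pooling stands in for the paper's recurrent encoders, but the structure (frames grouped into chunks, chunks composed into a video vector) is the same.

```python
import numpy as np

def hierarchical_encode(frame_feats, chunk_len):
    """Two-level temporal encoding: pool consecutive frame features into
    chunk vectors, then pool the chunk vectors into one video vector.
    Mean pooling is a crude stand-in for HRNE's recurrent encoders."""
    n = len(frame_feats)
    chunks = [frame_feats[i:i + chunk_len].mean(axis=0)
              for i in range(0, n, chunk_len)]
    return np.stack(chunks).mean(axis=0)

frames = np.arange(12, dtype=float).reshape(6, 2)  # 6 frames, 2-dim features
video_vec = hierarchical_encode(frames, chunk_len=2)
```

The key property this preserves is that the top level sees a sequence shortened by a factor of `chunk_len`, which is how HRNE extends the temporal range it can model.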
We investigated the origin of the high reverse leakage current in light emitting diodes (LEDs) based on (In,Ga)N/GaN nanowire (NW) ensembles grown by molecular beam epitaxy on Si substrates. To this end, capacitance deep level transient spectroscopy (DLTS) and temperature-dependent current-voltage (I-V) measurements were performed on a fully processed NW-LED. The DLTS measurements reveal the presence of two distinct electron traps with high concentrations in the depletion region of the p-i-n junction. These band gap states are located at energies of $570\pm20$ and $840\pm30$ meV below the conduction band minimum. The physical origin of these deep level states is discussed. The temperature-dependent I-V characteristics, acquired between 83 and 403 K, show that different conduction mechanisms cause the observed leakage current. On the basis of all these results, we developed a quantitative physical model for charge transport in the reverse bias regime. By taking into account the mutual interaction of variable range hopping and electron emission from Coulombic trap states, with the latter being described by phonon-assisted tunnelling and the Poole-Frenkel effect, we can model the experimental I-V curves in the entire range of temperatures with a consistent set of parameters. Our model should be applicable to planar GaN-based LEDs as well. Furthermore, possible approaches to decrease the leakage current in NW-LEDs are proposed.
https://arxiv.org/abs/1511.04044
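For reference, the Poole-Frenkel contribution invoked above is conventionally written in its textbook form (quoted here as background; the paper's exact parametrization may differ):

```latex
J_{\rm PF} \propto E \, \exp\!\left[-\frac{q\left(\phi_B - \sqrt{qE/(\pi\varepsilon)}\right)}{k_B T}\right]
```

where $\phi_B$ is the trap barrier height, $E$ the electric field, and $\varepsilon$ the dynamic permittivity; the field-dependent barrier lowering under the square root is what distinguishes Poole-Frenkel emission from simple thermal emission.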
In this paper we propose the construction of linguistic descriptions of images. This is achieved through the extraction of scene description graphs (SDGs) from visual scenes using an automatically constructed knowledge base. SDGs are constructed using both vision and reasoning. Specifically, commonsense reasoning is applied on (a) detections obtained from existing perception methods on given images, (b) a “commonsense” knowledge base constructed using natural language processing of image annotations and (c) lexical ontological knowledge from resources such as WordNet. Amazon Mechanical Turk(AMT)-based evaluations on Flickr8k, Flickr30k and MS-COCO datasets show that in most cases, sentences auto-constructed from SDGs obtained by our method give a more relevant and thorough description of an image than a recent state-of-the-art image caption based approach. Our Image-Sentence Alignment Evaluation results are also comparable to that of the recent state-of-the art approaches.
在本文中,我们提出了图像的语言描述的建设。这是通过使用自动构建的知识库从视觉场景中提取场景描述图(SDG)来实现的。可持续发展目标是用视觉和推理来构建的。具体来说,常识推理应用于:(a)从给定图像的现有感知方法获得的检测;(b)使用图像注释的自然语言处理构建的“常识”知识库;以及(c)来自诸如WordNet的资源的词汇本体知识。以Flickr8k,Flickr30k和MS-COCO数据集为基础的基于亚马逊Mechanical Turk(AMT)的评估表明,在大多数情况下,由我们的方法获得的SDG自动构建的句子给出了比最近的状态更加相关和彻底的图像描述基于最先进的图像标题的方法。我们的图像句子对齐评估结果也与最近的最新技术方法相媲美。
https://arxiv.org/abs/1511.03292
A novel approach for the fusion of heterogeneous object detection methods is proposed. In order to effectively integrate the outputs of multiple detectors, the level of ambiguity in each individual detection score is estimated using the precision/recall relationship of the corresponding detector. The main contribution of the proposed work is a novel fusion method, called Dynamic Belief Fusion (DBF), which dynamically assigns probabilities to hypotheses (target, non-target, intermediate state (target or non-target)) based on confidence levels in the detection results conditioned on the prior performance of individual detectors. In DBF, a joint basic probability assignment, optimally fusing information from all detectors, is determined by the Dempster’s combination rule, and is easily reduced to a single fused detection score. Experiments on ARL and PASCAL VOC 07 datasets demonstrate that the detection accuracy of DBF is considerably greater than conventional fusion approaches as well as individual detectors used for the fusion.
https://arxiv.org/abs/1511.03183
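Dempster's combination rule, the fusion step named in the abstract, can be sketched for the two-detector case over the frame {target, non-target} with the ambiguous hypothesis Theta = {T, N}; the mass values below are illustrative, not from the paper.

```python
T, N = frozenset("T"), frozenset("N")
TH = T | N  # Theta: "target or non-target" (the ambiguous state)

def dempster_combine(m1, m2):
    """Dempster's rule over the frame {T, N}: multiply masses pairwise,
    pool products by set intersection, discard the conflict mass
    (empty intersections) and renormalise the rest."""
    combined = {T: 0.0, N: 0.0, TH: 0.0}
    conflict = 0.0
    for a, pa in m1.items():
        for b, pb in m2.items():
            inter = a & b
            if inter:
                combined[inter] += pa * pb
            else:
                conflict += pa * pb
    k = 1.0 - conflict
    return {h: v / k for h, v in combined.items()}

m1 = {T: 0.6, N: 0.1, TH: 0.3}   # detector 1's basic probability assignment
m2 = {T: 0.5, N: 0.2, TH: 0.3}   # detector 2's basic probability assignment
fused = dempster_combine(m1, m2)
```

When both detectors lean toward "target", the fused mass on T exceeds either individual mass, which is the reinforcement behaviour DBF exploits.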
In the case of salient subject recognition, computer algorithms have relied heavily on systematically scanning images from top-left to bottom-right and applying brute force when attempting to locate objects of interest. The process thus turns out to be quite time consuming. Here a novel approach and a simple solution to the above problem are discussed. In this paper, we implement an approach to object manipulation and detection through a segmentation map, which helps to desaturate or, in other words, wash out the background of the image. Performance is evaluated using the Jaccard index against the well-known ground-truth target box technique.
https://arxiv.org/abs/1511.02999
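The Jaccard index used for evaluation is simply intersection-over-union; for axis-aligned boxes it can be computed directly (a generic implementation, with illustrative boxes):

```python
def jaccard(box_a, box_b):
    """Jaccard index (intersection over union) of two axis-aligned
    boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))  # intersection width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))  # intersection height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

# two unit-overlap 2x2 boxes: intersection 1, union 7
overlap = jaccard((0, 0, 2, 2), (1, 1, 3, 3))  # 1/7
```

Identical boxes score 1.0 and disjoint boxes score 0.0, so the index directly measures agreement with the ground-truth target box.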
On a minute-to-minute basis people undergo numerous fluid interactions with objects that barely register on a conscious level. Recent neuroscientific research demonstrates that humans have a fixed size prior for salient objects. This suggests that a salient object in 3D undergoes a consistent transformation such that people’s visual system perceives it with an approximately fixed size. This finding indicates that there exists a consistent egocentric object prior that can be characterized by shape, size, depth, and location in the first person view. In this paper, we develop an EgoObject Representation, which encodes these characteristics by incorporating shape, location, size and depth features from an egocentric RGBD image. We empirically show that this representation can accurately characterize the egocentric object prior by testing it on an egocentric RGBD dataset for three tasks: the 3D saliency detection, future saliency prediction, and interaction classification. This representation is evaluated on our new Egocentric RGBD Saliency dataset that includes various activities such as cooking, dining, and shopping. By using our EgoObject representation, we outperform previously proposed models for saliency detection (relative 30% improvement for 3D saliency detection task) on our dataset. Additionally, we demonstrate that this representation allows us to predict future salient objects based on the gaze cue and classify people’s interactions with objects.
https://arxiv.org/abs/1511.02682
Convolutional networks trained on large supervised datasets produce visual features which form the basis for the state of the art in many computer-vision problems. Further improvements of these visual features will likely require even larger manually labeled data sets, which severely limits the pace at which progress can be made. In this paper, we explore the potential of leveraging massive, weakly-labeled image collections for learning good visual features. We train convolutional networks on a dataset of 100 million Flickr photos and captions, and show that these networks produce features that perform well in a range of vision problems. We also show that the networks appropriately capture word similarity, and learn correspondences between different languages.
https://arxiv.org/abs/1511.02251
In this paper we show an application of the Minimum Spanning Tree (MST) clustering method to the high-energy gamma-ray sky observed at energies higher than 10 GeV in 6.3 years by the Fermi-Large Area Telescope. We report the detection of 19 new high-energy gamma-ray clusters with good selection parameters whose centroid coordinates were found matching the positions of known BL Lac objects in the 5th Edition of the Roma-BZCAT catalogue. A brief summary of the properties of these sources is presented.
https://arxiv.org/abs/1505.02507
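MST clustering of this kind is equivalent to single-linkage clustering: build a minimum spanning forest but refuse edges longer than a cut length, then read off the connected components. A generic sketch (not the Fermi-LAT pipeline, and with toy 2-D points standing in for sky coordinates):

```python
import math

def mst_clusters(points, cut):
    """Single-linkage clustering via a minimum spanning tree: run
    Kruskal's algorithm but never add an edge longer than `cut`;
    the components of the resulting forest are the clusters."""
    n = len(points)
    parent = list(range(n))
    def find(i):                       # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n) for j in range(i + 1, n))
    for d, i, j in edges:
        if d <= cut:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
clusters = mst_clusters(pts, cut=2.0)
```

The cut length plays the role of the selection parameter that separates genuine photon clusters from the diffuse background.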
The present work investigates the structural, electronic and magnetic properties of wurtzite (0001) GaN nanowires (NWs) doped with Gd and point defects by employing the GGA+U approximation. We find that Ga vacancies (VGa) exhibit a lower formation energy compared to N vacancies (VN). Further stabilization of point defects occurs due to the presence of Gd, and ambient ferromagnetism (FM) can be stabilized in the NW by the additional positive charge induced by the VGa. Electronic structure analysis shows that VGa introduces additional levels in the band gap leading to ferromagnetic coupling due to the hybridization of the p states of the Ga and N atoms with the Gd d and f states. A ferromagnetic exchange coupling energy of 76.4 meV is obtained in the presence of the Gd-VGa complex, and hence the FM is largely determined by the cation vacancy-rare earth complex defects in GaN NWs. On the other hand, the VN, which introduces additional electron carriers, does not assist in increasing the ferromagnetic exchange energy.
https://arxiv.org/abs/1511.01991
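The defect formation energies compared above are conventionally computed in the first-principles supercell framework from the standard expression (quoted in its textbook form; the paper's exact convention may differ):

```latex
E^{f}[X^{q}] = E_{\rm tot}[X^{q}] - E_{\rm tot}[{\rm bulk}]
             - \sum_{i} n_{i}\,\mu_{i} + q\,(E_{F} + E_{\rm VBM})
```

where $n_i$ atoms of species $i$ are added ($n_i > 0$) or removed ($n_i < 0$) at chemical potential $\mu_i$, $q$ is the defect charge state, and the Fermi level $E_F$ is referenced to the valence-band maximum $E_{\rm VBM}$. The VGa vs. VN ordering stated in the abstract follows from comparing these energies under the same chemical-potential conditions.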
We utilize the recently demonstrated orders-of-magnitude enhancement of extremely nondegenerate two-photon absorption in direct-gap semiconductor photodiodes to perform scanned imaging of 3D structures using IR femtosecond illumination pulses (1.6 µm and 4.93 µm) gated on the GaN detector by sub-gap, femtosecond pulses. While transverse resolution is limited by the usual imaging criteria, the longitudinal or depth resolution can be less than a wavelength, dependent on the pulsewidths in this nonlinear interaction within the detector element. The imaging system can accommodate a wide range of wavelengths in the mid-IR and near-IR without the need to modify the detection and imaging systems.
https://arxiv.org/abs/1510.08967
Object detection is an important task in computer vision and learning systems. Multistage particle windows (MPW), proposed by Gualdi et al., is an algorithm for fast and accurate object detection. By sampling particle windows from a proposal distribution (PD), MPW avoids exhaustively scanning the image. Despite its success, it is unknown how to determine the number of stages and the number of particle windows in each stage. Moreover, it has to generate too many particle windows in the initialization step and it redraws unnecessarily many particle windows around object-like regions. In this paper, we attempt to solve the problems of MPW. An important fact we use is that there is a large probability for a randomly generated particle window not to contain the object, because the object is a sparse event relative to the huge number of candidate windows. Therefore, we design the proposal distribution so as to efficiently reject the huge number of non-object windows. Specifically, we propose the concepts of rejection, acceptance, and ambiguity windows and regions. This contrasts with MPW, which utilizes only one region of support. The PD of MPW is acceptance-oriented whereas the PD of our method (called iPW) is rejection-oriented. Experimental results on human and face detection demonstrate the efficiency and effectiveness of the iPW algorithm. The source code is publicly accessible.
https://arxiv.org/abs/1508.05581
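The three-way window labelling behind a rejection-oriented proposal distribution can be sketched as follows; the 1-D windows, the overlap score, and the thresholds are all illustrative assumptions, not iPW's actual scoring.

```python
def classify_window(window, score, t_rej, t_acc):
    """Three-way labelling: reject clear non-objects cheaply, accept
    clear objects, and keep the remainder as ambiguous windows to be
    refined in later stages (thresholds are illustrative)."""
    s = score(window)
    if s < t_rej:
        return "reject"
    if s > t_acc:
        return "accept"
    return "ambiguous"

# toy 1-D score: overlap fraction with a hypothetical object at [40, 60]
def overlap_score(w):
    lo, hi = w
    return max(0, min(hi, 60) - max(lo, 40)) / (hi - lo)

labels = [classify_window(w, overlap_score, t_rej=0.2, t_acc=0.8)
          for w in [(40, 60), (30, 50), (0, 20)]]
```

Because most randomly drawn windows fall in the "reject" regions, later sampling stages can concentrate their particle windows on the small accept/ambiguous remainder.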
Continuous-time random walks (CTRWs) on discrete state spaces, ranging from regular lattices to complex networks, are ubiquitous across physics, chemistry, and biology. Models with coarse-grained states, for example those employed in studies of molecular kinetics, and models with spatial disorder can give rise to memory and non-exponential distributions of waiting times and first-passage statistics. However, existing methods for analyzing CTRWs on complex energy landscapes do not address these effects. We therefore use statistical mechanics of the nonequilibrium path ensemble to characterize first-passage CTRWs on networks with arbitrary connectivity, energy landscape, and waiting time distributions. Our approach is valuable for calculating higher moments (beyond the mean) of path length, time, and action, as well as statistics of any conservative or non-conservative force along a path. For homogeneous networks we derive exact relations between length and time moments, quantifying the validity of approximating a continuous-time process with its discrete-time projection. For more general models we obtain recursion relations, reminiscent of transfer matrix and exact enumeration techniques, to efficiently calculate path statistics numerically. We have implemented our algorithm in PathMAN, a Python script that users can easily apply to their model of choice. We demonstrate the algorithm on a few representative examples which underscore the importance of non-exponential distributions, memory, and coarse-graining in CTRWs.
https://arxiv.org/abs/1508.01578
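For the first moment (the mean), the path-statistics recursion collapses to a linear solve: with mean waiting time $\tau_i$ at each transient node and jump matrix $Q$ restricted to transient states, the mean first-passage times satisfy $(I - Q)\,t = \tau$. A minimal sketch on a toy four-node chain (illustrative numbers, not from the paper):

```python
import numpy as np

# Nodes 0-1-2 are transient; node 3 is the absorbing target.
# Q[i, j] = probability of jumping from transient node i to transient node j.
Q = np.array([
    [0.0, 1.0, 0.0],   # node 0 must hop to node 1
    [0.5, 0.0, 0.5],   # node 1 hops to node 0 or node 2
    [0.0, 0.5, 0.0],   # node 2 hops to node 1, or to the target (absorbed)
])
tau = np.array([1.0, 1.0, 1.0])  # mean waiting time at each transient node

# Mean first-passage times to the target solve (I - Q) t = tau
t = np.linalg.solve(np.eye(3) - Q, tau)
```

Higher moments of path time and length require the fuller recursion relations described in the abstract (as implemented in PathMAN), but this mean-level solve already shows the structure.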
The exploration of planetary surfaces is predominantly unmanned, calling for a landing vehicle and an autonomous and/or teleoperated rover. Artificial intelligence and machine learning techniques can be leveraged for better mission planning. This paper describes the coordinated use of both global navigation and metaheuristic optimization algorithms to plan safe, efficient missions. The aim is to determine the least-cost combination of a safe landing zone (LZ) and global path plan, where avoiding terrain hazards for the lander and rover minimizes cost. Computer vision methods were used to identify surface craters, mounds, and rocks as obstacles. Multiple search methods were investigated for the rover global path plan. Several combinatorial optimization algorithms were implemented to select the shortest-distance path as the preferred mission plan. Simulations were run for a sample Google Lunar X Prize mission. The result of this study is an optimization scheme that plans paths with the A* search method and uses simulated annealing to select the ideal LZ-path-goal combination for the mission. Simulation results show the methods are effective in minimizing the risk of hazards and increasing efficiency. This paper is specific to a lunar mission, but the resulting architecture may be applied to a large variety of planetary missions and rovers.
https://arxiv.org/abs/1511.00195
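The A* search step can be sketched on a small hazard grid (a generic 4-connected A* with a Manhattan heuristic; the grid, costs, and hazard map are illustrative, not the paper's terrain model):

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected grid; grid[r][c] == 1 marks a hazard cell.
    Manhattan distance is the admissible heuristic; unit step cost."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    frontier = [(h(start), 0, start, [start])]  # (f, g, position, path)
    seen = set()
    while frontier:
        f, g, pos, path = heapq.heappop(frontier)
        if pos == goal:
            return path
        if pos in seen:
            continue
        seen.add(pos)
        r, c = pos
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and not grid[nr][nc] \
                    and (nr, nc) not in seen:
                heapq.heappush(frontier, (g + 1 + h((nr, nc)), g + 1,
                                          (nr, nc), path + [(nr, nc)]))
    return None  # goal unreachable

grid = [[0, 0, 0],
        [1, 1, 0],   # a wall of hazards forces a detour
        [0, 0, 0]]
path = astar(grid, (0, 0), (2, 0))
```

In the paper's scheme, such a planner is run per candidate LZ-goal pair, and simulated annealing then searches over those combinations for the least-cost mission.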
In this paper, we study the problem of answering visual analogy questions. These questions take the form of image A is to image B as image C is to what. Answering these questions entails discovering the mapping from image A to image B and then extending the mapping to image C and searching for the image D such that the relation from A to B holds for C to D. We pose this problem as learning an embedding that encourages pairs of analogous images with similar transformations to be close together using convolutional neural networks with a quadruple Siamese architecture. We introduce a dataset of visual analogy questions in natural images, and show first results of its kind on solving analogy questions on natural images.
https://arxiv.org/abs/1510.08973
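The analogy objective behind the quadruple Siamese setup can be written down directly: A:B :: C:D holds when the embedding difference B - A matches D - C. A minimal sketch with random vectors standing in for CNN embeddings:

```python
import numpy as np

def analogy_distance(fa, fb, fc, fd):
    """Distance between the two transformation vectors of a quadruple:
    small when the A-to-B transformation matches the C-to-D one, which
    is the quantity a quadruple Siamese network is trained to minimize
    for analogous image pairs."""
    return np.linalg.norm((fb - fa) - (fd - fc))

rng = np.random.default_rng(0)
fa, fb, fc = rng.normal(size=(3, 8))   # stand-ins for learned embeddings
d_good = analogy_distance(fa, fb, fc, fc + (fb - fa))  # exact analogy
d_bad = analogy_distance(fa, fb, fc, rng.normal(size=8))
```

Answering "A is to B as C is to what" then amounts to searching for the image D whose embedding minimizes this distance.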
The accurate absolute surface energies of (0001)/(000-1) surfaces of wurtzite structures are crucial in determining the thin film growth mode of important energy materials. However, the surface energies still remain to be solved due to the intrinsic difficulty of calculating the dangling bond energy of asymmetrically bonded surface atoms. In this study, we used a pseudo-hydrogen passivation method to estimate the dangling bond energy and calculate the polar surfaces of ZnO and GaN. The calculations were based on the pseudo chemical potentials obtained from a set of tetrahedral clusters or simple pseudo-molecules, using density functional theory approaches. The surface energies of (0001)/(000-1) surfaces of wurtzite ZnO and GaN we obtained showed relatively high self-consistency. A wedge structure calculation with a new bottom surface passivation scheme of group I and group VII elements was also proposed and performed to show the converged absolute surface energy of wurtzite ZnO polar surfaces, and the results were also compared with those of the above method. These calculations and comparisons may provide important insights into the crystal growth of the above materials, thereby leading to significant performance enhancements of semiconductor devices.
https://arxiv.org/abs/1510.08961
We introduce and show preliminary results of a fast randomized method that finds a set of K paths lying in distinct homotopy classes. We frame the path planning task as a graph search problem, where the navigation graph is based on a Voronoi diagram. The search is biased by a cost function derived from the social force model that is used to generate and select the paths. We compare our method to Yen’s algorithm, and empirically show that our approach is faster at finding a subset of homotopy classes. Furthermore, our approach computes a set of more diverse paths with respect to the baseline while incurring a negligible loss in path quality.
http://arxiv.org/abs/1510.08233