We propose an approach to learn spatio-temporal features in videos from intermediate visual representations we call “percepts” using Gated-Recurrent-Unit Recurrent Networks (GRUs). Our method relies on percepts that are extracted from all levels of a deep convolutional network trained on the large ImageNet dataset. While high-level percepts contain highly discriminative information, they tend to have a low spatial resolution. Low-level percepts, on the other hand, preserve a higher spatial resolution from which we can model finer motion patterns. Using low-level percepts, however, can lead to high-dimensional video representations. To mitigate this effect and control the number of model parameters, we introduce a variant of the GRU model that leverages convolution operations to enforce sparse connectivity of the model units and share parameters across the input spatial locations. We empirically validate our approach on both Human Action Recognition and Video Captioning tasks. In particular, we achieve results on par with the state of the art on the YouTube2Text dataset using a simpler text-decoder model and without extra 3D CNN features.
https://arxiv.org/abs/1511.06432
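To make the parameter-sharing argument above concrete, here is a toy sketch of a convolutional GRU cell. It is an illustration under simplifying assumptions, not the authors' implementation: it works on a 1-D, single-channel "percept" map, omits biases, and the class and function names (`ConvGRUCell1D`, `conv1d_same`) are hypothetical. The paper applies 2-D convolutions over multi-channel CNN feature maps. The point is that every gate uses small kernels in place of dense matrices, so each hidden unit sees only a local neighbourhood and the same weights are reused at every spatial position.

```python
import math
import random


def conv1d_same(signal, kernel):
    """1-D cross-correlation with zero padding; output length equals input length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(w * padded[i + j] for j, w in enumerate(kernel))
            for i in range(len(signal))]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class ConvGRUCell1D:
    """Hypothetical toy convolutional GRU cell (single channel, no biases)."""

    def __init__(self, kernel_size=3, seed=0):
        rng = random.Random(seed)

        def kernel():
            return [rng.uniform(-0.5, 0.5) for _ in range(kernel_size)]

        # W*: input-to-hidden kernels, U*: hidden-to-hidden kernels for the
        # update gate (z), reset gate (r) and candidate state (h).
        self.Wz, self.Uz = kernel(), kernel()
        self.Wr, self.Ur = kernel(), kernel()
        self.Wh, self.Uh = kernel(), kernel()

    def num_params(self):
        return sum(len(k) for k in (self.Wz, self.Uz, self.Wr, self.Ur, self.Wh, self.Uh))

    def step(self, x, h):
        """One time step: x is the input map at time t, h the previous hidden map."""
        z = [sigmoid(a + b) for a, b in zip(conv1d_same(x, self.Wz), conv1d_same(h, self.Uz))]
        r = [sigmoid(a + b) for a, b in zip(conv1d_same(x, self.Wr), conv1d_same(h, self.Ur))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        candidate = [math.tanh(a + b) for a, b in
                     zip(conv1d_same(x, self.Wh), conv1d_same(reset_h, self.Uh))]
        # New state: per-position interpolation between old state and candidate.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, candidate)]
```

With a kernel size of 3 the cell has 18 weights whatever the length of the input map, whereas a dense GRU over a 16-position single-channel map would already need 3 × (16 × 16 + 16 × 16) = 1536 weights (ignoring biases), and that count grows quadratically with the map size.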
Sequence to sequence learning has recently emerged as a new paradigm in supervised learning. To date, most of its applications have focused on a single task, and little work has explored this framework for multiple tasks. This paper examines three multi-task learning (MTL) settings for sequence to sequence models: (a) the one-to-many setting, where the encoder is shared between several tasks such as machine translation and syntactic parsing; (b) the many-to-one setting, useful when only the decoder can be shared, as in the case of translation and image caption generation; and (c) the many-to-many setting, where multiple encoders and decoders are shared, which is the case with unsupervised objectives and translation. Our results show that training on a small amount of parsing and image caption data can improve the translation quality between English and German by up to 1.5 BLEU points over strong single-task baselines on the WMT benchmarks. Furthermore, we establish a new state-of-the-art result in constituent parsing with 93.0 F1. Lastly, we reveal interesting properties of the two unsupervised learning objectives, autoencoder and skip-thought, in the MTL context: compared to skip-thought, the autoencoder helps less in terms of perplexity but more in terms of BLEU scores.
https://arxiv.org/abs/1511.06114
We report on exciton propagation in polar (Al,Ga)N/GaN quantum wells over several micrometers and up to room temperature. The key ingredient to achieve this result is the crystalline quality of GaN quantum wells (QWs) grown on a GaN template substrate. By comparing microphotoluminescence images of two identical QWs grown on sapphire and on GaN, we reveal the twofold role played by the GaN substrate in the transport of excitons. First, the lower threading dislocation densities in such structures yield higher exciton radiative efficiency, thus limiting nonradiative losses of propagating excitons. Second, the absence of a dielectric mismatch between the substrate and the epilayer strongly limits the photon guiding effect in the plane of the structure, making exciton transport easier to distinguish from photon propagation. Our results pave the way towards room-temperature gate-controlled exciton transport in wide-bandgap polar heterostructures.
https://arxiv.org/abs/1603.00191
Hypernymy, textual entailment, and image captioning can be seen as special cases of a single visual-semantic hierarchy over words, sentences, and images. In this paper we advocate for explicitly modeling the partial order structure of this hierarchy. Towards this goal, we introduce a general method for learning ordered representations, and show how it can be applied to a variety of tasks involving images and language. We show that the resulting representations improve performance over current approaches for hypernym prediction and image-caption retrieval.
https://arxiv.org/abs/1511.06361
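A minimal sketch of what "learning ordered representations" can look like, assuming the formulation we understand the paper to use: vectors live in the non-negative orthant, a more specific item x lies below a more general item y when every coordinate of x is at least the corresponding coordinate of y, and violations of that order are penalised by $E(x, y) = \lVert \max(0, y - x) \rVert^2$. The margin loss and the example vectors are hypothetical illustrations, not values from the paper.

```python
def order_violation(specific, general):
    """E(x, y) = ||max(0, y - x)||^2: zero exactly when x_i >= y_i for all i."""
    return sum(max(0.0, g - s) ** 2 for s, g in zip(specific, general))


def is_below(specific, general):
    """Reversed product order: larger coordinates mean a more specific concept."""
    return all(s >= g for s, g in zip(specific, general))


def margin_loss(positive, negative, margin=1.0):
    """Penalise order violations on a true (specific, general) pair and push a
    contrastive pair to violate the order by at least `margin` (hypothetical setup)."""
    return order_violation(*positive) + max(0.0, margin - order_violation(*negative))
```

Because the order is defined coordinate-wise, it is automatically transitive: with hypothetical embeddings `animal = [1.0, 0.5, 0.0]`, `dog = [2.0, 1.5, 0.3]` and `poodle = [3.0, 2.0, 1.0]`, `poodle` lies below `dog`, `dog` below `animal`, and therefore `poodle` below `animal`, while the reversed pair `(animal, dog)` incurs a penalty of 2.09.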
Motivated by the recent progress in generative models, we introduce a model that generates images from natural language descriptions. The proposed model iteratively draws patches on a canvas, while attending to the relevant words in the description. After training on Microsoft COCO, we compare our model with several baseline generative models on image generation and retrieval tasks. We demonstrate that our model produces higher quality samples than other approaches and generates images with novel scene compositions corresponding to previously unseen captions in the dataset.
https://arxiv.org/abs/1511.02793
In this paper, we investigate the local Gan-Gross-Prasad conjecture for some pairs of representations of $U(3)\times U(2)$ involving a non-generic representation. For a pair of generic $L$-parameters of $(U(n),U(n-1))$, it is known that there is a unique pair of representations in their associated Vogan $L$-packets which produces the unique Bessel model of these $L$-parameters. We show that this is not true for some pairs of $L$-parameters involving a non-generic one. On the other hand, we give the precise local theta correspondence for $(U(1),U(3))$, not at the level of $L$-parameters but at the level of individual representations, in the framework of the local Langlands correspondence for unitary groups. As an application of these results, we prove an analog of the Ichino-Ikeda local conjecture for some non-tempered cases.
https://arxiv.org/abs/1501.00885
We visually inspected the light curves of 7557 Kepler Objects of Interest (KOIs) to search for single transit events (STEs) possibly due to long-period giant planets. We identified 28 STEs in 24 KOIs, among which 14 events are newly reported in this paper. We estimate the radius and orbital period of the objects causing STEs by fitting the STE light curves simultaneously with the transits of the other planets in the system or with the prior information on the host star density. As a result, we found that STEs in seven of those systems are consistent with Neptune- to Jupiter-sized objects with orbital periods ranging from a few to $\sim 20\,\mathrm{yr}$. We also estimate that $\gtrsim 20\%$ of the compact multi-transiting systems host cool giant planets with periods $\gtrsim 3\,\mathrm{yr}$, based on their occurrence in the KOIs with multiple candidates and assuming small mutual inclinations between inner and outer planetary orbits.
https://arxiv.org/abs/1602.07848
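Why a host-star density prior constrains the period of a single transit can be seen from a back-of-the-envelope relation. This sketch is not the authors' light-curve fit; it assumes a circular orbit, a central transit (impact parameter zero) and a planet much smaller than the star, so that the duration is $T \approx (P/\pi)(R_\star/a)$ and Kepler's third law gives $(a/R_\star)^3 = G\rho_\star P^2/(3\pi)$. Eliminating $a/R_\star$ yields $P = \pi^2 G \rho_\star T^3 / 3$. The function name is hypothetical.

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2


def period_from_single_transit(duration_s, stellar_density_kg_m3):
    """Rough orbital period (s) implied by one transit of known duration.

    Assumes a circular orbit, a central transit and a small planet, so
    duration ~= (P / pi) * (R_star / a) and (a / R_star)^3 = G rho P^2 / (3 pi),
    which combine into P = pi^2 * G * rho * duration^3 / 3.
    """
    return math.pi ** 2 * G * stellar_density_kg_m3 * duration_s ** 3 / 3.0
```

For a Sun-like star (mean density about 1408 kg m⁻³) and a 13-hour transit, the function returns roughly one year, as expected for an Earth analog. Because $P \propto T^3$, a transit twice as long implies an eight times longer period; a non-central transit is shorter for a given period, so under these assumptions the formula tends to underestimate $P$ when the impact parameter is unknown.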
In this paper, we propose and investigate a novel memory architecture for neural networks called Hierarchical Attentive Memory (HAM). It is based on a binary tree with leaves corresponding to memory cells. This allows HAM to perform memory access in O(log n) complexity, which is a significant improvement over the standard attention mechanism that requires O(n) operations, where n is the size of the memory. We show that an LSTM network augmented with HAM can learn algorithms for problems like merging, sorting or binary searching from pure input-output examples. In particular, it learns to sort n numbers in time O(n log n) and generalizes well to input sequences much longer than the ones seen during the training. We also show that HAM can be trained to act like classic data structures: a stack, a FIFO queue and a priority queue.
https://arxiv.org/abs/1602.03218
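The O(log n) access pattern that HAM exploits can be illustrated with a hand-coded binary tree whose leaves are memory cells. In HAM the function that summarises two children (JOIN) and the decision of which child to descend into (SEARCH) are learned; in this hypothetical sketch they are replaced by `max` and "go towards the larger summary", which turns the tree into a max-heap-style priority structure. Names are illustrative only.

```python
class TreeMemory:
    """Toy binary-tree memory: leaves hold values, internal nodes hold summaries.

    JOIN is max() and SEARCH descends towards the child with the larger
    summary; HAM learns both instead of hard-coding them.
    """

    def __init__(self, values):
        n = 1
        while n < len(values):
            n *= 2
        self.size = n
        # Heap layout: node i has children 2i and 2i+1; leaves occupy [n, 2n).
        self.tree = [float("-inf")] * (2 * n)
        for i, v in enumerate(values):
            self.tree[n + i] = v
        for i in range(n - 1, 0, -1):
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])

    def search(self):
        """Walk from the root to a leaf; returns (leaf_index, steps_taken)."""
        node, steps = 1, 0
        while node < self.size:
            left, right = 2 * node, 2 * node + 1
            node = left if self.tree[left] >= self.tree[right] else right
            steps += 1
        return node - self.size, steps

    def write(self, index, value):
        """Update one leaf and refresh only the summaries on its path to the root."""
        node = self.size + index
        self.tree[node] = value
        node //= 2
        while node >= 1:
            self.tree[node] = max(self.tree[2 * node], self.tree[2 * node + 1])
            node //= 2
```

For five stored values the tree has eight leaves, and both `search` and `write` touch only log2(8) = 3 levels, instead of scanning every cell as a flat soft-attention read would.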
An important logistics application of robotics involves manipulators that pick and place objects stored on warehouse shelves. A critical aspect of this task is detecting the pose of a known object on the shelf using visual data. Solving this problem can be assisted by the use of an RGB-D sensor, which also provides depth information beyond visual data. Nevertheless, it remains a challenging problem since multiple issues need to be addressed, such as low illumination inside shelves, clutter, texture-less and reflective objects, as well as the limitations of depth sensors. This paper provides a new rich data set for advancing the state of the art in RGBD-based 3D object pose estimation, which is focused on the challenges that arise when solving warehouse pick-and-place tasks. The publicly available data set includes thousands of images and corresponding ground truth data for the objects used during the first Amazon Picking Challenge at different poses and clutter conditions. Each image is accompanied by ground truth information to assist in the evaluation of algorithms for object detection. To show the utility of the data set, a recent algorithm for RGBD-based pose estimation is evaluated in this paper. Based on the measured performance of the algorithm on the data set, various modifications and improvements are applied to increase the accuracy of detection. These steps can be easily applied to a variety of different methodologies for object pose detection and improve performance in the domain of warehouse pick-and-place.
https://arxiv.org/abs/1509.01277
We demonstrate the self-assembled growth of vertically aligned GaN nanowire ensembles on a flexible Ti foil by plasma-assisted molecular beam epitaxy. The analysis of single nanowires by transmission electron microscopy reveals that they are single crystalline. Low-temperature photoluminescence spectroscopy demonstrates that, in comparison to standard GaN nanowires grown on Si, the nanowires prepared on the Ti foil exhibit an equivalent crystalline perfection, a higher density of basal-plane stacking faults, but a reduced density of inversion domain boundaries. The room-temperature photoluminescence spectrum of the nanowire ensemble is not influenced or degraded by the bending of the substrate. The present results pave the way for the fabrication of flexible optoelectronic devices based on GaN nanowires on metal foils.
https://arxiv.org/abs/1602.06204
We study the nature of excitons bound to I$_1$ basal plane stacking faults in ensembles of ultrathin GaN nanowires by continuous-wave and time-resolved photoluminescence spectroscopy. These ultrathin nanowires, obtained by the thermal decomposition of spontaneously formed GaN nanowire ensembles, are tapered and have tip diameters down to 6 nm. With decreasing nanowire diameter, we observe a strong blue shift of the transition originating from the radiative decay of stacking fault-bound excitons. Moreover, the radiative lifetime of this transition in the ultrathin nanowires is independent of temperature up to 60 K and significantly longer than that of the corresponding transition in as-grown nanowires. These findings reveal a zero-dimensional character of the confined exciton state and thus demonstrate that I$_1$ stacking faults in ultrathin nanowires act as genuine quantum dots.
https://arxiv.org/abs/1601.01162
High-resolution coherent nonlinear optical spectroscopy of an ensemble of red-emitting InGaN quantum dots in GaN nanowires is reported. The data show a pronounced atom-like interaction between resonant laser fields and quantum dot excitons at low temperature, which is difficult to observe in the linear absorption spectrum due to inhomogeneous broadening from indium fluctuation effects. We find that the nonlinear signal persists strongly at room temperature. The robust atom-like room-temperature response indicates that this material could serve as the platform for proposed exciton-based applications without the need for cryogenics.
https://arxiv.org/abs/1509.03886
In this work, we study the challenging problem of identifying the irregular status of objects from images in an “open world” setting, that is, distinguishing the irregular status of an object category from its regular status as well as from objects of other categories in the absence of “irregular object” training data. To address this problem, we propose a novel approach that inspects the distribution of the detection scores at multiple image regions, using a detector trained on the “regular object” and “other objects” data. The key observation motivating our approach is that for “regular object” images as well as “other objects” images, the region-level scores follow their own essential patterns in terms of both the score values and the spatial distributions, while the detection scores obtained from an “irregular object” image tend to break these patterns. To model this distribution, we propose to use Gaussian Processes (GP) to construct two separate generative models for the case of the “regular object” and the “other objects”. More specifically, we design a new covariance function to simultaneously model the detection score at a single region and the score dependencies at multiple regions. We finally demonstrate the superior performance of our method on a large dataset newly proposed in this paper.
https://arxiv.org/abs/1602.04422
We report new Karl G. Jansky Very Large Array (JVLA) observations, at 0.5$''$ angular resolution, of linearly polarized continuum emission at 6.9 mm towards the Class 0 young stellar object (YSO) NGC1333 IRAS4A. This target source is a collapsing dense molecular core, which was resolved at short wavelengths to have an hourglass-shaped B-field configuration. We compare these 6.9 mm observations with previous Submillimeter Array (SMA) polarization observations at 0.88 mm, which have comparable angular resolution ($\sim$0.7$''$). We found that, at the same resolution, the observed polarization position angles at 6.9 mm deviate slightly from those observed at 0.88 mm. Due to the lower optical depth of the emission at 6.9 mm, and the potential effect of dust grain growth, the new JVLA observations are likely probing B-field alignments in regions interior to those sampled by the previous polarization observations at higher frequencies. Our understanding can be improved by more sensitive observations and by observations of more extended spatial scales.
https://arxiv.org/abs/1602.04077
We present a supervised neural network model for polyphonic piano music transcription. The architecture of the proposed model is analogous to speech recognition systems and comprises an acoustic model and a music language model. The acoustic model is a neural network used for estimating the probabilities of pitches in a frame of audio. The language model is a recurrent neural network that models the correlations between pitch combinations over time. The proposed model is general and can be used to transcribe polyphonic music without imposing any constraints on the polyphony. The acoustic and language model predictions are combined using a probabilistic graphical model. Inference over the output variables is performed using the beam search algorithm. We perform two sets of experiments. We investigate various neural network architectures for the acoustic models and also investigate the effect of combining acoustic and music language model predictions using the proposed architecture. We compare the performance of the neural network based acoustic models with two popular unsupervised acoustic models. Results show that convolutional neural network acoustic models yield the best performance across all evaluation metrics. We also observe improved performance with the application of the music language models. Finally, we present an efficient variant of beam search that improves performance and reduces run-times by an order of magnitude, making the model suitable for real-time applications.
https://arxiv.org/abs/1508.01774
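The combination of acoustic and language-model predictions through beam search can be sketched generically. This is a simplified stand-in, not the paper's model or its efficient beam-search variant: each hypothesis is scored by the sum of log acoustic probabilities and log language-model probabilities, and the language model here only conditions on the previous frame's label (a bigram), whereas the paper uses a recurrent network over the full history. The label names and probabilities in the usage below are hypothetical.

```python
import math


def beam_search(acoustic, lm, beam_width):
    """Beam search over frame-level label sequences.

    acoustic: list of dicts, one per frame, mapping a label (e.g. a set of
        active pitches) to its acoustic-model probability for that frame.
    lm: function (previous_label or None, label) -> language-model probability.
    Returns (log_score, label_sequence) of the best surviving hypothesis.
    """
    beam = [(0.0, [])]  # (log score, label sequence)
    for frame in acoustic:
        candidates = []
        for score, seq in beam:
            prev = seq[-1] if seq else None
            for label, p_ac in frame.items():
                p_lm = lm(prev, label)
                if p_ac > 0 and p_lm > 0:
                    candidates.append((score + math.log(p_ac) + math.log(p_lm), seq + [label]))
        # Keep only the `beam_width` highest-scoring hypotheses.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
    return beam[0] if beam else None
```

With two frames whose acoustic probabilities are `{"C": 0.6, "E": 0.4}` and `{"C": 0.45, "E": 0.55}`, a uniform language model picks `["C", "E"]`, while a "sticky" model that prefers repeating the previous label (0.8 versus 0.2) smooths the output to `["C", "C"]` with probability 0.108. This is the kind of temporal smoothing a music language model contributes.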
Recent studies have demonstrated the power of recurrent neural networks for machine translation, image captioning and speech recognition. For the task of capturing temporal structure in video, however, there still remain numerous open research questions. Current research suggests using a simple temporal feature pooling strategy to take into account the temporal aspect of video. We demonstrate that this method is not sufficient for gesture recognition, where temporal information is more discriminative compared to general video classification tasks. We explore deep architectures for gesture recognition in video and propose a new end-to-end trainable neural network architecture incorporating temporal convolutions and bidirectional recurrence. Our main contributions are twofold; first, we show that recurrence is crucial for this task; second, we show that adding temporal convolutions leads to significant improvements. We evaluate the different approaches on the Montalbano gesture recognition dataset, where we achieve state-of-the-art results.
https://arxiv.org/abs/1506.01911
We study the three-dimensional deformation field induced by an axial (In,Ga)N segment in a GaN nanowire. Using the finite element method within the framework of linear elasticity theory, we study the dependence of the strain field on the ratio of segment length and nanowire radius. Contrary to intuition, the out-of-plane-component of the elastic strain tensor is found to assume large negative values for a length-to-radius ratio close to one. We show that this unexpected effect is a direct consequence of the deformation of the nanowire at the free sidewalls and the associated large shear strain components. Simulated reciprocal space maps of a single (In,Ga)N/GaN nanowire demonstrate that nanofocus x-ray diffraction is a suitable technique to assess this peculiar strain state experimentally.
https://arxiv.org/abs/1602.03397
We report GaN n++/p++ interband tunnel junctions with repeatable negative differential resistance and low resistance. Reverse and forward tunneling current densities were observed to increase as Si and Mg doping concentrations were increased. Hysteresis-free, bidirectional negative differential resistance was observed at room temperature from these junctions at a forward voltage of ~1.6-2 V. Thermionic PN junctions with a tunnel contact to the p-layer exhibited a forward current density of 150 kA/cm$^2$ at 7.6 V, with a low series device resistance of $1\times10^{-5}\,\Omega\,\mathrm{cm}^2$.
https://arxiv.org/abs/1601.04353
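As a rough consistency check (our own arithmetic, reading the specific series resistance as $1\times10^{-5}\,\Omega\,\mathrm{cm}^2$), the voltage dropped across that resistance at the quoted current density is

$$
V_s = J\,R_s = \left(1.5\times10^{5}\,\mathrm{A\,cm^{-2}}\right)\times\left(1\times10^{-5}\,\Omega\,\mathrm{cm^{2}}\right) = 1.5\,\mathrm{V},
$$

so, under that reading, roughly 1.5 V of the 7.6 V forward bias would fall across the series resistance.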
We show that density-dependent velocity saturation in a GaN High Electron Mobility Transistor (HEMT) can be related to the stimulated emission of longitudinal optical (LO) phonons. As the drift velocity of electrons increases, the drift of the Fermi distribution in reciprocal space produces population inversion and gain for the LO phonons. Once this gain reaches a threshold value, the avalanche-like increase of LO emission causes a rapid loss of electron energy and momentum and leads to drift velocity saturation. Our simple model correctly predicts both the general trend of the saturation velocity decreasing with increasing electron density and the values of saturation velocity measured in our experiments.
https://arxiv.org/abs/1602.02417
We report single-photon emission from electrically driven site-controlled InGaN/GaN quantum dots, fabricated from a planar light-emitting diode structure containing a single InGaN quantum well using a top-down approach. The location, dimension, and height of each single-photon-emitting diode are controlled lithographically, providing great flexibility for chip-scale integration.
https://arxiv.org/abs/1602.02325
The Internet of Drones (IoD) is a layered network control architecture designed mainly for coordinating the access of unmanned aerial vehicles to controlled airspace, and for providing navigation services between locations referred to as nodes. The IoD provides generic services for various drone applications such as package delivery, traffic surveillance, search and rescue, and more. In this paper, we present a conceptual model of how such an architecture can be organized, and we specify the features that an IoD system based on our architecture should implement. To do so, we extract key concepts from three existing large-scale networks, namely the air traffic control network, the cellular network, and the Internet, and explore their connections to our novel architecture for drone traffic management.
https://arxiv.org/abs/1601.01289
Spatial awareness in mammals is based on an internalized representation of the environment, encoded by large networks of spiking neurons. While such representations can last for a long time, the underlying neuronal network is transient: neuronal cells die every day, synaptic connections appear and disappear, and the networks constantly change their architecture due to various forms of synaptic and structural plasticity. How can a network with a dynamic architecture encode a stable map of space? We address this question using a physiological model of a “flickering” neuronal network and demonstrate that it can maintain a robust topological representation of space.
https://arxiv.org/abs/1602.00681
In this work, we propose a quantum neural network named quantum perceptron over a field (QPF). Quantum computers are not yet a reality, and the models and algorithms proposed in this work cannot be simulated on actual (classical) computers. QPF is a direct generalization of a classical perceptron and solves some drawbacks found in previous models of quantum perceptrons. We also present a learning algorithm named Superposition based Architecture Learning algorithm (SAL) that optimizes the neural network weights and architectures. SAL searches for the best architecture in a finite set of neural network architectures in time linear in the number of patterns in the training set. SAL is the first learning algorithm to determine neural network architectures in polynomial time. This speedup is obtained by the use of quantum parallelism and a non-linear quantum operator.
https://arxiv.org/abs/1602.00709
Extracting moving objects from a video sequence and estimating the background of each individual image are fundamental issues in many practical applications such as visual surveillance, intelligent vehicle navigation, and traffic monitoring. Recently, some methods have been proposed to detect moving objects in a video via low-rank approximation and sparse outliers, where the background is modeled with the computed low-rank component of the video and the foreground objects are detected as the sparse outliers in the low-rank approximation. All of these existing methods work in a batch manner, preventing them from being applied to real-time and long-duration tasks. In this paper, we present an online sequential framework, namely contiguous outliers representation via online low-rank approximation (COROLA), to detect moving objects and learn the background model at the same time. We also show that our model can detect moving objects with a moving camera. Our experimental evaluation uses simulated data and real public datasets and demonstrates the superior performance of COROLA in terms of both accuracy and execution time.
https://arxiv.org/abs/1505.03566
Object detection is one of the most important challenges in computer vision. Object detectors are usually trained on bounding-boxes from still images. Recently, video has been used as an alternative source of data. Yet, for a given test domain (image or video), the performance of the detector depends on the domain it was trained on. In this paper, we examine the reasons behind this performance gap. We define and evaluate different domain shift factors: spatial location accuracy, appearance diversity, image quality and aspect distribution. We examine the impact of these factors by comparing performance before and after factoring them out. The results show that all four factors affect the performance of the detectors and their combined effect explains nearly the whole performance gap.
https://arxiv.org/abs/1501.01186
Many images of natural or man-made scenes contain Similar but Genuine Objects (SGO). This poses a challenge to existing Copy-Move Forgery Detection (CMFD) methods, which match key points or blocks based solely on pairwise similarity in the scene. To address this issue, we propose a novel CMFD method using Scaled Harris Feature Descriptors (SHFD) that performs consistently well on forged images with SGO. It involves the following main steps: (i) a pyramid scale space and orientation assignment are used to maintain scale and rotation invariance; (ii) combined features are applied for precise texture description; (iii) similar features of two points are matched, and RANSAC is used to remove false matches. The experimental results indicate that the proposed algorithm is effective in detecting SGO and copy-move forgery and compares favorably to existing methods. Our method exhibits high robustness even when an image is subjected to geometric transformations and post-processing.
https://arxiv.org/abs/1601.07262
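Step (iii) above, pruning false keypoint matches with RANSAC, can be sketched for the simplest copy-move case. This is an illustrative sketch, not the SHFD pipeline: it assumes the pasted region was only translated (real forgeries may also be rotated or scaled, which calls for an affine model), and the function name, tolerance and iteration count are hypothetical choices. Each iteration hypothesises the offset implied by one randomly drawn match and keeps the hypothesis with the most consistent matches.

```python
import random


def ransac_translation(matches, tol=1.5, iterations=200, seed=0):
    """Estimate a dominant translation between matched keypoints with RANSAC.

    matches: list of ((x1, y1), (x2, y2)) keypoint pairs.
    Returns (translation, inlier_matches). Translation-only is a simplifying
    assumption; copied regions may also be rotated or scaled.
    """
    rng = random.Random(seed)
    best_t, best_inliers = None, []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.choice(matches)  # minimal sample: one match
        t = (x2 - x1, y2 - y1)
        inliers = [m for m in matches
                   if abs(m[1][0] - m[0][0] - t[0]) <= tol
                   and abs(m[1][1] - m[0][1] - t[1]) <= tol]
        if len(inliers) > len(best_inliers):
            best_t, best_inliers = t, inliers
    if best_inliers:
        # Least-squares refit of a pure translation is the mean offset of the inliers.
        n = len(best_inliers)
        best_t = (sum(q[0] - p[0] for p, q in best_inliers) / n,
                  sum(q[1] - p[1] for p, q in best_inliers) / n)
    return best_t, best_inliers
```

Matches between a region and its pasted copy all share one offset and therefore support the same hypothesis, while accidental matches (for example between two similar but genuine objects at unrelated positions) each imply a different offset and end up as outliers.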
In this paper, we propose a deep part-based model (DeePM) for symbiotic object detection and semantic part localization. For this purpose, we annotate semantic parts for all 20 object categories on the PASCAL VOC 2012 dataset, which provides information on object pose, occlusion, viewpoint and functionality. DeePM is a latent graphical model based on the state-of-the-art R-CNN framework, which learns an explicit representation of the object-part configuration with flexible type sharing (e.g., a sideview horse head can be shared by a fully-visible sideview horse and a highly truncated sideview horse with head and neck only). For comparison, we also present an end-to-end Object-Part (OP) R-CNN which learns an implicit feature representation for jointly mapping an image ROI to the object and part bounding boxes. We evaluate the proposed methods for both the object and part detection performance on PASCAL VOC 2012, and show that DeePM consistently outperforms OP R-CNN in detecting objects and parts. In addition, it obtains superior performance to Fast and Faster R-CNNs in object detection.
https://arxiv.org/abs/1511.07131
Neural machine translation has shown very promising results lately. Most NMT models follow the encoder-decoder framework. To make encoder-decoder models more flexible, the attention mechanism was introduced to machine translation as well as to other tasks such as speech recognition and image captioning. We observe that the quality of translation by attention-based encoder-decoders can be significantly damaged when the alignment is incorrect. We attribute these problems to the lack of distortion and fertility models. Aiming to resolve these problems, we propose new variations of the attention-based encoder-decoder and compare them with other models on machine translation. Our proposed method achieved an improvement of 2 BLEU points over the original attention-based encoder-decoder.
https://arxiv.org/abs/1601.03317
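For readers unfamiliar with the alignment that attention computes, here is a minimal sketch of soft attention with a running coverage vector. It is a generic illustration, not the variants proposed in the paper: scores are plain dot products (the original attention-based encoder-decoder scores with a small feed-forward network), and the coverage vector, which accumulates how much attention each source position has received, is shown only as one common way to expose fertility-like information. All names are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (alignment weights, context vector) using dot-product scores."""
    scores = [sum(d * h for d, h in zip(decoder_state, hs)) for hs in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * hs[k] for w, hs in zip(weights, encoder_states)) for k in range(dim)]
    return weights, context


def decode_alignments(decoder_states, encoder_states):
    """Attention rows for each target step plus the accumulated coverage per source word."""
    coverage = [0.0] * len(encoder_states)
    rows = []
    for state in decoder_states:
        weights, _ = attend(state, encoder_states)
        coverage = [c + w for c, w in zip(coverage, weights)]
        rows.append(weights)
    return rows, coverage
```

Each attention row sums to one, so the coverage entries sum to the number of target steps; a source word whose coverage stays near zero was effectively never translated, and one with coverage well above one was attended repeatedly, which are the kinds of alignment errors the abstract attributes to missing distortion and fertility models.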
Object proposals for detecting moving or static video objects need to address issues such as speed, memory complexity and temporal consistency. We propose an efficient Video Object Proposal (VOP) generation method and show its efficacy in learning a better video object detector. A deep-learning based video object detector learned using the proposed VOP achieves state-of-the-art detection performance on the Youtube-Objects dataset. We further propose a clustering of VOPs which can efficiently be used for detecting objects in video in a streaming fashion. As opposed to applying per-frame convolutional neural network (CNN) based object detection, our proposed method called Objects in Video Enabler thRough LAbel Propagation (OVERLAP) needs to classify only a small fraction of all candidate proposals in every video frame through streaming clustering of object proposals and class-label propagation. Source code will be made available soon.
https://arxiv.org/abs/1601.05447
The optical emission of non-polar GaN/AlN quantum dots has been investigated. The presence of stacking faults inside these quantum dots is evidenced by the dependence of the photoluminescence on temperature and excitation power. A theoretical model for the electronic structure and optical properties of non-polar quantum dots, taking into account their realistic shapes, is presented which predicts a substantial reduction of the internal electric field but a persisting quantum confined Stark effect, comparable to that of polar GaN/AlN quantum dots. Modeling the effect of a 3-monolayer stacking fault inside the quantum dot, which acts as a zinc-blende inclusion in the wurtzite matrix, results in an additional 30% reduction of the internal electric field and gives a better account of the observed optical features.
https://arxiv.org/abs/1601.04942
Wireless Gigabit (WiGig) access points (APs) using the 60 GHz unlicensed frequency band are considered key enablers for future Gbps WLANs. Due to their short transmission range and high susceptibility to path blocking, multiple WiGig APs must be installed to fully cover a typical target environment. However, with autonomously operated WiGig APs using IEEE 802.11ad DCF, exhaustive-search analog beamforming and autonomous user association based on maximum received power prevent the establishment of optimal WiGig concurrent links that maximize the total system throughput in random access scenarios. In this paper, we formulate the problem of WiGig concurrent transmissions in random access scenarios as an optimization problem, and then propose a Wi-Fi/WiGig coordination architecture to solve it. The proposed coordinated Wi-Fi/WiGig WLAN is based on tight coordination between the 5 GHz (Wi-Fi) and the 60 GHz (WiGig) unlicensed frequency bands, whereby the wide-coverage Wi-Fi band controls the establishment of the WiGig concurrent links. Statistical learning using Wi-Fi fingerprinting is used to estimate the best candidate AP and its best beam identification (ID) for establishing a WiGig concurrent link without causing interference to the existing WiGig data links.
https://arxiv.org/abs/1601.04797
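The fingerprinting step can be pictured as a lookup from a Wi-Fi received-signal-strength vector to an (AP, beam ID) label. The abstract does not say which statistical learner is used, so this sketch assumes a plain k-nearest-neighbour vote in RSSI space; the training fingerprints, AP names and beam IDs are hypothetical, and the interference check against existing WiGig links is not modelled.

```python
import math
from collections import Counter


def knn_predict(fingerprints, query, k=3):
    """Predict an (AP, beam ID) label for a Wi-Fi RSSI fingerprint.

    fingerprints: list of (rssi_vector_dbm, label) training pairs.
    The label is the majority vote among the k training fingerprints
    closest to `query` in Euclidean distance.
    """
    ranked = sorted(fingerprints, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Usage with hypothetical RSSI readings (dBm) from three Wi-Fi APs:

```python
train = [
    ((-40, -70, -80), ("AP1", 2)),
    ((-42, -68, -79), ("AP1", 2)),
    ((-45, -72, -77), ("AP1", 3)),
    ((-75, -41, -70), ("AP2", 5)),
    ((-73, -44, -72), ("AP2", 5)),
    ((-80, -70, -43), ("AP3", 1)),
]
print(knn_predict(train, (-41, -69, -80)))  # ('AP1', 2)
```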
We introduce a new type of graphical model that we call a “memory factor network” (MFN). We show how to use MFNs to model the structure inherent in many types of data sets. We also introduce an associated message-passing style algorithm called “proactive message passing” (PMP) that performs inference on MFNs. PMP comes with convergence guarantees and is efficient in comparison to competing algorithms such as variants of belief propagation. We specialize MFNs and PMP to a number of distinct types of data (discrete, continuous, labelled) and inference problems (interpolation, hypothesis testing), provide examples, and discuss approaches for efficient implementation.
https://arxiv.org/abs/1601.04667
The STEREO experiment will search for a sterile neutrino by measuring the antineutrino energy spectrum as a function of the distance from the source, the ILL nuclear reactor. A dedicated electronic system, hosted in a single microTCA crate, was designed for this experiment. It continuously digitizes 68 photomultiplier signals at 250 MSPS, performs two-stage triggering with various selectable conditions, processes the data, and reads it out via UDP/IPBUS. Additionally, for detector performance monitoring, the electronics allow online calibration by driving LEDs synchronously with the data acquisition. This paper describes the electronics requirements, the architecture and the performance achieved.
https://arxiv.org/abs/1510.08238
Microstructure reconstruction and compression techniques are designed to find a microstructure with desired properties. While microstructure reconstruction searches for a microstructure with prescribed statistical properties, microstructure compression focuses on an efficient representation of material morphology for the purpose of multiscale modelling. Successful application of these techniques, however, requires a proper understanding of the underlying statistical descriptors that quantify material morphology. In this paper we focus on the lineal path function, designed to capture mainly short-range effects and phase connectedness, which the commonly used two-point probability function can hardly handle. The use of the lineal path function is, however, significantly limited by its huge computational requirements. To examine the properties of the lineal path function within the computationally demanding compression and reconstruction processes, we first accelerate the lineal path evaluation by porting part of its code to the graphics processing unit using the CUDA (Compute Unified Device Architecture) programming environment. This allows us to present a unique comparison of the entire lineal path function with the commonly used rough approximation based on Monte Carlo sampling and/or a sampling template. Moreover, the accelerated lineal path function is then compared with the two-point probability function within the compression and reconstruction of two-phase morphologies. Their significant features are thoroughly discussed and illustrated on a set of artificial periodic as well as real-world random microstructures.
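The difference between the two descriptors is easiest to see on a tiny binary image. The sketch below is a plain NumPy CPU version, not the paper's CUDA code; it assumes a periodic two-phase image, considers horizontal segments only (the full lineal path function covers all segment orientations), and counts a segment of length r as r + 1 consecutive pixels. The function names are hypothetical.

```python
import numpy as np

def lineal_path_x(img, max_r):
    """Lineal path L(r) along x for a periodic 2D binary image (phase = 1).

    L(r) is the fraction of start pixels from which a horizontal segment
    covering r + 1 consecutive pixels lies entirely in phase 1.
    """
    phase = img.astype(bool)
    inside = phase.copy()              # r = 0: only the start pixel
    values = [inside.mean()]
    for r in range(1, max_r + 1):
        # Extend every segment by one pixel; np.roll gives periodic wrap-around.
        inside &= np.roll(phase, -r, axis=1)
        values.append(inside.mean())
    return np.array(values)

def two_point_x(img, max_r):
    """Two-point probability S2(r) along x: only the two endpoints must be in phase 1."""
    phase = img.astype(bool)
    return np.array([(phase & np.roll(phase, -r, axis=1)).mean()
                     for r in range(max_r + 1)])

img = np.array([[1, 1, 0, 1, 1, 0]])
print(lineal_path_x(img, 2))  # values 4/6, 2/6, 0
print(two_point_x(img, 3))    # values 4/6, 2/6, 2/6, 4/6
```

At r = 2 the two-point function is still 1/3 because the two endpoints can sit in different phase-1 clusters, whereas the lineal path function drops to zero since no three-pixel segment stays inside phase 1. This is the connectedness information the two-point function misses.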
https://arxiv.org/abs/1601.04359
This paper develops a model that addresses sentence embedding, a hot topic in current natural language processing research, using recurrent neural networks with Long Short-Term Memory (LSTM) cells. Because it can capture long-term memory, the LSTM-RNN accumulates increasingly rich information as it goes through the sentence, and when it reaches the last word, the hidden layer of the network provides a semantic representation of the whole sentence. In this paper, the LSTM-RNN is trained in a weakly supervised manner on user click-through data logged by a commercial web search engine. Visualization and analysis are performed to understand how the embedding process works. The model is found to automatically attenuate unimportant words and detect the salient keywords in the sentence. Furthermore, these detected keywords are found to automatically activate different cells of the LSTM-RNN, where words belonging to a similar topic activate the same cell. As a semantic representation of the sentence, the embedding vector can be used in many different applications. These automatic keyword detection and topic allocation abilities enable the LSTM-RNN to perform document retrieval, a difficult language processing task, where the similarity between the query and documents can be measured by the distance between their corresponding sentence embedding vectors computed by the LSTM-RNN. On a web search task, the LSTM-RNN embedding is shown to significantly outperform several existing state-of-the-art methods. We emphasize that the proposed model generates sentence embedding vectors that are especially useful for web document retrieval tasks. A comparison with a well-known general sentence embedding method, the Paragraph Vector, is also performed; the results show that the proposed method significantly outperforms it on the web document retrieval task.
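Once queries and documents are embedded, retrieval reduces to ranking by vector similarity. The sketch below assumes cosine similarity as the measure and uses small hand-written vectors as stand-ins for LSTM-RNN sentence embeddings; `rank_documents` is a hypothetical helper, not part of the paper.

```python
import numpy as np

def rank_documents(query_vec, doc_vecs):
    """Rank documents by cosine similarity between their embeddings and the query's."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                      # cosine similarity per document
    return np.argsort(-scores), scores  # best match first

query = np.array([1.0, 0.0])
docs = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])
order, scores = rank_documents(query, docs)
print(order)  # [2 1 0]
```

Normalising both sides makes the score independent of vector length, so a document is not favoured merely because its embedding has a larger norm.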
https://arxiv.org/abs/1502.06922
This paper presents a novel ontology-driven software engineering approach for the development of industrial robotics control software. It introduces the ReApp architecture that synthesizes model-driven engineering with semantic technologies to facilitate the development and reuse of ROS-based components and applications. In ReApp, we show how different ontological classification systems for hardware, software, and capabilities help developers in discovering suitable software components for their tasks and in applying them correctly. The proposed model-driven tooling enables developers to work at higher abstraction levels and fosters automatic code generation. It is underpinned by ontologies to minimize discontinuities in the development workflow, with an integrated development environment presenting a seamless interface to the user. First results show the viability and synergy of the selected approach when searching for or developing software with reuse in mind.
https://arxiv.org/abs/1601.03998
Object detection systems based on deep convolutional neural networks (CNNs) have recently made groundbreaking advances on several object detection benchmarks. While the features learned by these high-capacity neural networks are discriminative for categorization, inaccurate localization is still a major source of detection error. Building on high-capacity CNN architectures, we address the localization problem by 1) using a search algorithm based on Bayesian optimization that sequentially proposes candidate regions for an object bounding box, and 2) training the CNN with a structured loss that explicitly penalizes localization inaccuracy. In experiments, we demonstrate that each of the proposed methods improves detection performance over the baseline method on the PASCAL VOC 2007 and 2012 datasets. Furthermore, the two methods are complementary and, when combined, significantly outperform the previous state of the art.
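Localization accuracy is usually measured by the intersection over union (IoU) between a predicted and a ground-truth box. The abstract does not give the exact form of the structured loss, so `localization_penalty` below (1 - IoU) is a hypothetical stand-in that only illustrates a loss that grows as the predicted box drifts from the target.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def localization_penalty(pred, gt):
    """Hypothetical localization term: 0 for a perfect box, 1 for no overlap."""
    return 1.0 - iou(pred, gt)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.142857
```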
https://arxiv.org/abs/1504.03293
Training artificial neural networks requires a tedious empirical evaluation to determine a suitable network architecture. To avoid this empirical process, several techniques have been proposed to automate architecture selection. In this paper, we propose a method to perform parameter and architecture selection for a quantum weightless neural network (qWNN). Architecture selection is performed through the learning procedure of a qWNN, using a learning algorithm based on the principle of quantum superposition and a non-linear quantum operator. The main advantage of the proposed method is that it performs a global, rather than local, search in the space of qWNN architectures and parameters.
https://arxiv.org/abs/1601.03277
Resolved debris disc images can exhibit a range of radial and azimuthal structures, including gaps and rings, which can result from planetary companions shaping the disc through their gravitational influence. Currently, no tools are available to determine the architecture of potential companions from disc observations. Recent work by Rodigas et al. (2014) shows how to estimate empirically the maximum mass and minimum semi-major axis of a hidden planet from the width of the disc in scattered light. In this work, we apply the predictions of Rodigas et al. to two debris discs, HD 202628 and HD 207129. We aim to test whether the predicted planetary orbits can explain the features of these debris discs, such as their eccentricity and sharp inner edge. We first run dynamical simulations using the planetary parameters predicted by Rodigas et al., and then numerically search for better parameters. Using a modified N-body code that includes radiation forces, we perform simulations over a broad range of planet parameters and compare synthetic images from our simulations with the observations. We find that the observational features of HD 202628 can be reproduced with a planet five times smaller than expected, located 30 AU beyond the predicted value, while the best match for HD 207129 is a planet located 5-10 AU beyond the predicted location with a smaller eccentricity. We conclude that the predictions of Rodigas et al. provide a good starting point but should be complemented by numerical simulations.
https://arxiv.org/abs/1601.02272
We present a method that learns to answer visual questions by selecting image regions relevant to the text-based query. Our method shows significant improvements in answering questions such as “what color,” where it is necessary to evaluate a specific location, and “what room,” where it selectively identifies informative image regions. Our model is tested on the VQA dataset, which, to our knowledge, is the largest human-annotated visual question answering dataset.
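Region selection of this kind is often realised as soft attention: each region is scored against the question, and the scores are normalised into weights. The abstract does not state the scoring function, so the dot-product score and softmax below are assumptions, and `attend` with its toy vectors is hypothetical.

```python
import numpy as np

def attend(region_feats, question_vec):
    """Soft attention over image regions, scored against a question embedding.

    region_feats: (R, D) array, one feature vector per candidate region.
    question_vec: (D,) array embedding the text query.
    Returns the (R,) attention weights and the attended (D,) feature.
    """
    scores = region_feats @ question_vec      # relevance of each region
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights, weights @ region_feats

regions = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
question = np.array([4.0, 0.0])
weights, attended = attend(regions, question)
print(weights.argmax())  # 0: the region aligned with the question dominates
```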
https://arxiv.org/abs/1511.07394
Because manual object detection is very inaccurate and time-consuming, several automatic object detection tools have been developed in recent years. At present, however, no generally applicable image analysis software provides an automatic, objective assessment of 3D foci. Complications arise because discrete foci may lie very close to, or even touch, other foci; moreover, they vary in size and signal-to-noise ratio, and they must be analyzed fully in 3D. We therefore introduce the 3D-OSCOS (3D-Object Segmentation and Colocalization Analysis based on Spatial statistics) algorithm, which is implemented as a user-friendly toolbox for interactive detection of 3D objects and visualization of labeled images.
https://arxiv.org/abs/1601.01216
Text recognition in natural scenes is a challenging problem because of the many factors affecting text appearance. In this paper, we present a method that directly transcribes scene text images to text without requiring sophisticated character segmentation. We leverage recent advances in deep neural networks to model the appearance of scene text images with temporal dynamics. Specifically, we integrate a convolutional neural network (CNN) and a recurrent neural network (RNN), motivated by the complementary modeling capabilities of the two models. The main contribution of this work is investigating how temporal memory helps with this specific problem in a segmentation-free fashion. By using long short-term memory (LSTM) blocks as hidden units, our model can retain long-term memory, whereas HMMs only maintain short-term state dependencies. We conduct experiments on the Street View House Numbers dataset, which contains highly variable number images. The results demonstrate the superiority of the proposed method over traditional HMM-based methods.
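Segmentation-free transcription is commonly implemented by predicting one label per image column (time step) and then collapsing that sequence. The abstract does not name its output layer, so the sketch below assumes a connectionist temporal classification (CTC) style blank symbol and shows best-path (greedy) decoding; `ctc_greedy_decode` and the label encoding (digits 0-9, blank = 10) are hypothetical.

```python
def ctc_greedy_decode(frame_labels, blank):
    """Collapse per-frame argmax labels into an output sequence (CTC best path).

    Consecutive repeats are merged first, then blanks are dropped, so no
    explicit character segmentation is required.
    """
    decoded = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded

# Frames for the house number "335", with blank encoded as 10.
print(ctc_greedy_decode([10, 3, 3, 10, 3, 5, 5, 10], blank=10))  # [3, 3, 5]
```

The blank between the two runs of 3s is what allows a repeated digit in the output; without it, consecutive identical frames would be merged into a single 3.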
https://arxiv.org/abs/1601.01100
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features; in the recently popular terminology of neural networks with ‘attention’ mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model, our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on the PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In the ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the first-place winning entries in several tracks. Code has been made publicly available.
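Predicting bounds "at each position" is done relative to a fixed set of reference boxes, called anchors in the Faster R-CNN paper. The sketch below generates such boxes for every feature-map cell; the stride of 16, the three scales, the three aspect ratios and the half-cell centre offset are illustrative defaults assumed here rather than values stated in this abstract, and `make_anchors` is a hypothetical helper.

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Anchor boxes (x1, y1, x2, y2) centred on every feature-map cell.

    Ordering: feature-map position first (row-major), then scale, then ratio.
    """
    # Width and height for each (scale, ratio) pair; ratio = height / width,
    # and the area is kept equal to scale ** 2.
    shapes = np.array([(s / np.sqrt(r), s * np.sqrt(r))
                       for s in scales for r in ratios])        # (A, 2)
    ys, xs = np.meshgrid(np.arange(feat_h), np.arange(feat_w), indexing="ij")
    # Image-space centre of each cell, assuming a fixed stride.
    centres = np.stack([(xs + 0.5) * stride, (ys + 0.5) * stride], axis=-1)
    centres = centres.reshape(-1, 1, 2)                          # (H*W, 1, 2)
    half = shapes[None, :, :] / 2                                # (1, A, 2)
    boxes = np.concatenate([centres - half, centres + half], axis=-1)
    return boxes.reshape(-1, 4)                                  # (H*W*A, 4)

print(make_anchors(2, 3).shape)  # (54, 4)
```

The RPN's regression outputs would then be offsets applied to these boxes, with one objectness score per box; that part is not shown.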
https://arxiv.org/abs/1506.01497
In this paper, we propose a novel neural network structure, namely feedforward sequential memory networks (FSMN), to model long-term dependency in time series without using recurrent feedback. The proposed FSMN is a standard fully-connected feedforward neural network equipped with learnable memory blocks in its hidden layers. The memory blocks use a tapped-delay line structure to encode long context information into a fixed-size representation as a short-term memory mechanism. We have evaluated the proposed FSMNs on several standard benchmark tasks, including speech recognition and language modelling. Experimental results show that FSMNs significantly outperform conventional recurrent neural networks (RNNs), including LSTMs, in modeling sequential signals like speech or language. Moreover, because of their inherently non-recurrent structure, FSMNs can be learned much more reliably and faster than RNNs or LSTMs.
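A tapped-delay line memory can be written as $m_t = \sum_{i=0}^{N} a_i \odot h_{t-i}$, a learned weighted sum of the current and past hidden activations. The sketch below computes that sum with NumPy; using per-dimension coefficients (rather than one scalar per tap) and looking only into the past are assumptions, and `fsmn_memory` is a hypothetical name.

```python
import numpy as np

def fsmn_memory(hidden, coeffs):
    """Tapped-delay-line memory block over a sequence of hidden activations.

    hidden: (T, D) hidden activations for T time steps.
    coeffs: (N + 1, D) learnable weights; row i scales the activation i steps back.
    Returns (T, D) memory outputs m_t = sum_i coeffs[i] * hidden[t - i],
    with activations before t = 0 treated as zero.
    """
    T = hidden.shape[0]
    memory = np.zeros_like(hidden, dtype=float)
    for i, a in enumerate(coeffs[:T]):       # taps beyond T contribute nothing
        memory[i:] += a * hidden[:T - i]     # shift by i steps, then weight
    return memory

h = np.arange(4.0).reshape(4, 1)             # one hidden unit over 4 steps
a = np.array([[1.0], [0.5]])                 # current step plus one tap back
print(fsmn_memory(h, a).ravel())             # values 0, 1, 2.5, 4
```

Because the memory is a fixed finite sum, it can be trained with ordinary backpropagation, with no unrolling through a recurrent connection.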
https://arxiv.org/abs/1512.08301
In this paper we describe a novel framework and algorithms for discovering image patch patterns from a large corpus of weakly supervised image-caption pairs generated from news events. Current pattern mining techniques attempt to find patterns that are representative and discriminative; we further stipulate that our discovered patterns must be recognizable by humans and preferably have meaningful names. We propose a new multimodal pattern mining approach that leverages the descriptive captions often accompanying news images to learn semantically meaningful image patch patterns. The multimodal patterns are then named using words mined from the image captions associated with each pattern. A novel evaluation framework demonstrates that our patterns are 26.2% more semantically meaningful than those discovered by the state-of-the-art vision-only pipeline, and that we can provide tags for the discovered image patches with 54.5% accuracy without direct supervision. Our methods also discover named patterns beyond those covered by existing image datasets such as ImageNet. To the best of our knowledge, this is the first algorithm developed to automatically mine image patch patterns with strong semantic meaning specific to high-level news events, and then to evaluate these patterns based on that criterion.
https://arxiv.org/abs/1601.00022
Gallium nitride nanowires were grown on c-plane, r-plane and m-plane sapphire substrates in a showerhead metalorganic chemical vapor deposition system using nickel catalyst with trimethylgallium and ammonia as precursors. We studied the influence of carrier gas, growth temperature, reactor pressure, reactant flow rates and substrate orientation in order to obtain thin nanowires. The nanowires grew along the <10-11> and <10-10> axes depending on the substrate orientation. These nanowires were further characterized using x-ray diffraction, electron microscopy, photoluminescence and Raman spectroscopy.
https://arxiv.org/abs/1509.01507
Because of the coarse granularity of data accesses and the heavy use of latches, indices in the B-tree family are not efficient for in-memory databases, especially on today's multi-core architectures. In this paper, we present PI, a Parallel in-memory skip list based Index that lends itself naturally to parallel and concurrent environments, particularly those with non-uniform memory access. In PI, incoming queries are collected and distributed disjointly among multiple threads for processing, which avoids the use of latches. For each query, PI traverses the index in a breadth-first search (BFS) manner to find the list node with the matching key, exploiting SIMD processing to speed up the search. To keep query processing latch-free, PI employs a lightweight communication protocol that lets threads redistribute the query workload among themselves so that each list node modified during query processing is accessed by exactly one thread. We conducted extensive experiments, and the results show that PI can be up to three times as fast as Masstree, a state-of-the-art B-tree based index.
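The latch-free invariant, that every list node touched by a batch is handled by exactly one thread, can be illustrated by routing queries on disjoint key ranges. This is a simplified assumption: PI redistributes work through its communication protocol, whereas the hypothetical `partition_queries` helper below splits a batch statically by fixed key boundaries.

```python
from bisect import bisect_right

def partition_queries(queries, boundaries):
    """Assign each (op, key) query to exactly one worker by key range.

    boundaries: sorted split keys; worker 0 owns keys below boundaries[0],
    worker w owns [boundaries[w-1], boundaries[w]), and the last worker owns
    keys from boundaries[-1] upward.  All queries on a key reach one worker,
    in their original arrival order.
    """
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for op, key in queries:
        buckets[bisect_right(boundaries, key)].append((op, key))
    return buckets

batch = [("insert", 5), ("search", 15), ("delete", 5), ("search", 25), ("insert", 10)]
for worker, bucket in enumerate(partition_queries(batch, [10, 20])):
    print(worker, bucket)
```

Since all operations on a key land in one bucket in arrival order, each worker can apply its bucket sequentially without latches; balancing the load by choosing the boundaries is left out of this sketch.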
https://arxiv.org/abs/1601.00159
This paper presents a framework designed for multi-object detection and adapted to product search on market shelves. The framework uses a single feedback loop and a pattern resizing mechanism to demonstrate the full effectiveness of state-of-the-art local features. A high detection rate with a low chance of false detection can be achieved using only one pattern per object and no manual parameter adjustment. The method incorporates well-known local features and a basic matching process to create a reliable voting space. Further steps comprise metric transformations, a graphical vote space representation, a two-phase vote aggregation process and a cascade of verifying filters.
https://arxiv.org/abs/1512.08648
This paper studies the bond valence method (BVM) and its application to the non-isovalent semiconductor alloy (GaN)$_{1-x}$(ZnO)$_{x}$. Particular attention is paid to the role of short-range order (SRO). A physical interpretation based on atomic orbital interaction is proposed and examined by density-functional theory (DFT) calculations. By combining BVM with Monte Carlo simulations and a DFT-based cluster expansion model, bond-length distributions and bond-angle variations are predicted. The correlation between bond valence and bond stiffness is also revealed. Finally, the concept of bond valence is extended to the modelling of an atomistic potential.
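In its standard form, the bond valence method assigns each bond a valence $s = \exp((R_0 - R)/b)$ and expects the sum over an atom's bonds to match its formal valence. The sketch below uses that textbook relation with the commonly quoted $b = 0.37$ Å; the $R_0$ value is a hypothetical placeholder rather than a fitted Ga-N or Zn-O parameter, and the abstract does not say which parameterization the paper uses.

```python
import math

B = 0.37  # commonly used universal softness parameter, in angstrom

def bond_valence(r, r0, b=B):
    """Valence carried by a bond of length r: s = exp((r0 - r) / b)."""
    return math.exp((r0 - r) / b)

def bond_valence_sum(bond_lengths, r0, b=B):
    """Bond valence sum around one atom; ideally close to its formal valence."""
    return sum(bond_valence(r, r0, b) for r in bond_lengths)

def ideal_bond_length(s, r0, b=B):
    """Invert s = exp((r0 - r) / b) to get the bond length carrying valence s."""
    return r0 - b * math.log(s)

R0 = 1.90  # hypothetical placeholder, not a fitted value for any real pair
# A trivalent cation with four equivalent bonds needs s = 3/4 per bond.
r = ideal_bond_length(0.75, R0)
print(round(bond_valence_sum([r] * 4, R0), 6))  # 3.0
```

Bonds whose ideal valence is below 1 come out longer than $R_0$, since $\ln s < 0$.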
https://arxiv.org/abs/1509.04678
Vehicles are becoming increasingly connected, which opens up a larger attack surface that affects not only the passengers inside vehicles but also the people around them. These vulnerabilities exist because modern systems are built on the old and comparatively insecure CAN bus framework, which lacks even basic authentication. Since a new protocol can only help future vehicles and not older ones, our approach treats the issue as a data analytics problem and uses machine learning techniques to secure cars. We develop a Hidden Markov Model to detect anomalous states from real data collected from vehicles. Using this model, we are able to detect anomalies and issue alerts while a vehicle is in operation. Our model could be integrated as a plug-and-play device in both new and old cars.
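The detection step can be sketched as likelihood scoring with the forward algorithm. The abstract does not describe the model's states or observations, so this sketch assumes the CAN data have already been quantized into discrete symbols, uses hand-picked toy parameters, and flags a window as anomalous when its average per-symbol log-likelihood falls below a threshold; `log_likelihood`, `is_anomalous` and the threshold value are hypothetical.

```python
import numpy as np

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs) under a discrete-emission HMM."""
    alpha = start * emit[:, obs[0]]
    log_p = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]  # predict, then weight by emission
        c = alpha.sum()                       # scaling factor P(o_t | o_1..t-1)
        log_p += np.log(c)
        alpha = alpha / c
    return log_p

def is_anomalous(obs, start, trans, emit, threshold):
    """Flag a window whose average per-symbol log-likelihood is below threshold."""
    return log_likelihood(obs, start, trans, emit) / len(obs) < threshold

# Toy model: two hidden states that tend to persist, each mostly emitting
# its own quantized CAN symbol (0 or 1).
start = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1], [0.1, 0.9]])
emit = np.array([[0.99, 0.01], [0.01, 0.99]])
print(is_anomalous([0, 0, 0, 0], start, trans, emit, -1.0))  # False
print(is_anomalous([0, 1, 0, 1], start, trans, emit, -1.0))  # True
```

Rescaling `alpha` at every step keeps the recursion from underflowing on long windows, while the logged scaling factors still add up to the exact log-likelihood.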
http://arxiv.org/abs/1512.08048