White light emitting diodes based on III-nitride InGaN/GaN quantum wells currently offer the highest overall efficiency for solid state lighting applications. Although current phosphor-converted white LEDs have high electricity-to-light conversion efficiencies, it has recently been pointed out that the full potential of solid state lighting could be exploited only by color-mixing approaches that do not employ phosphor-based wavelength conversion. Such an approach requires direct-emitting LEDs of different colors, in particular in the green/yellow range of the visible spectrum. This range, however, suffers from a systematic drop in efficiency, known as the “green gap”, whose physical origin has not yet been completely understood. In this work we show by atomistic simulations that a considerable part of the “green gap” in c-plane InGaN/GaN-based light emitting diodes may be attributed to a decrease in the radiative recombination coefficient with increasing indium content, due to random fluctuations of the indium concentration naturally present in any InGaN alloy.
https://arxiv.org/abs/1510.07831
Upcoming many-core processors are expected to employ a distributed memory architecture similar to currently available supercomputers, but parallel pattern mining algorithms amenable to such architectures have not been comprehensively studied. We present a novel closed pattern mining algorithm with a well-engineered communication protocol, and generalize it to find statistically significant patterns from personal genome data. To distribute communication evenly, it employs global load balancing with multiple stacks distributed on a set of cores organized as a hypercube with random edges. Our algorithm achieved up to 1175-fold speedup using 1200 cores for solving a problem with 11,914 items and 697 transactions, while the naive approach of separating the search space failed completely.
https://arxiv.org/abs/1510.07787
The FM-index is a celebrated compressed data structure for full-text pattern searching. After the first wave of interest in its theoretical developments, we can observe a surge of interest in practical FM-index variants in the last few years. These enhancements are often related to a bit-vector representation, augmented with an efficient rank-handling data structure. In this work, we propose a new, cache-friendly, implementation of the rank primitive and advocate for a very simple architecture of the FM-index, which trades compression ratio for speed. Experimental results show that our variants are 2–3 times faster than the fastest known ones, for the price of using typically 1.5–5 times more space.
https://arxiv.org/abs/1506.04896
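The rank primitive discussed above, rank(i) = number of set bits in B[0, i), is typically served by precomputed per-block counts plus a popcount over the remainder. The minimal Python sketch below shows only that textbook scheme; the paper's cache-friendly layout and its block parameters are not specified in the abstract and are not reproduced here.

# Textbook rank-over-bit-vector sketch: cumulative counts per block plus a scan
# of the remainder. The block size is an illustrative choice, not the paper's layout.
class RankBitVector:
    def __init__(self, bits, block=64):
        self.bits = bits                      # sequence of 0/1 values
        self.block = block
        self.counts = [0]                     # counts[k] = number of 1s in bits[:k*block]
        for k in range(0, len(bits), block):
            self.counts.append(self.counts[-1] + sum(bits[k:k + block]))

    def rank1(self, i):
        """Number of 1-bits in bits[0:i]."""
        k, r = divmod(i, self.block)
        return self.counts[k] + sum(self.bits[k * self.block:k * self.block + r])

bv = RankBitVector([1, 0, 1, 1, 0, 1, 0, 0] * 16)
assert bv.rank1(8) == 4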
One of the fundamental issues for security robots is detecting and tracking people in their surroundings. The main challenges of this task are real-time constraints, a changing background, varying illumination conditions, and the non-rigid shape of the person to be tracked. In this paper, we propose a solution for tracking with a pan-tilt camera and a passive infrared range (PIR) sensor that detects the moving object based on consecutive frame differences. The proposed method has excellent real-time performance because it requires only a little memory and computation. Experimental results show that this method can detect moving objects such as humans efficiently and accurately in non-stationary and complex indoor environments.
https://arxiv.org/abs/1510.07390
Convolutional networks (ConvNets) have become a popular approach to computer vision. It is important to accelerate ConvNet training, which is computationally costly. We propose a novel parallel algorithm based on decomposition into a set of tasks, most of which are convolutions or FFTs. Applying Brent’s theorem to the task dependency graph implies that linear speedup with the number of processors is attainable within the PRAM model of parallel computation, for wide network architectures. To attain such performance on real shared-memory machines, our algorithm computes convolutions converging on the same node of the network with temporal locality to reduce cache misses, and sums the convergent convolution outputs via an almost wait-free concurrent method to reduce time spent in critical sections. We implement the algorithm with a publicly available software package called ZNN. Benchmarking with multi-core CPUs shows that ZNN can attain speedup roughly equal to the number of physical cores. We also show that ZNN can attain over 90x speedup on a many-core CPU (Xeon Phi Knights Corner). These speedups are achieved for network architectures with widths that are in common use. The task parallelism of the ZNN algorithm is suited to CPUs, while the SIMD parallelism of previous algorithms is compatible with GPUs. Through examples, we show that ZNN can be either faster or slower than certain GPU implementations depending on specifics of the network architecture, kernel sizes, and density and size of the output patch. ZNN may be less costly to develop and maintain, due to the relative ease of general-purpose CPU programming.
https://arxiv.org/abs/1510.06706
A good measure of similarity between data points is crucial to many tasks in machine learning. Similarity and metric learning methods learn such measures automatically from data, but they do not scale well with respect to the dimensionality of the data. In this paper, we propose a method that can efficiently learn a similarity measure from high-dimensional sparse data. The core idea is to parameterize the similarity measure as a convex combination of rank-one matrices with specific sparsity structures. The parameters are then optimized with an approximate Frank-Wolfe procedure to maximally satisfy relative similarity constraints on the training data. Our algorithm greedily incorporates one pair of features at a time into the similarity measure, providing an efficient way to control the number of active features and thus reduce overfitting. It enjoys very appealing convergence guarantees, and its time and memory complexity depend on the sparsity of the data rather than the dimension of the feature space. Our experiments on real-world high-dimensional datasets demonstrate its potential for classification, dimensionality reduction and data exploration.
http://arxiv.org/abs/1411.2374
Bidirectional Long Short-Term Memory Recurrent Neural Networks (BLSTM-RNNs) have been shown to be very effective for tagging sequential data, e.g. speech utterances or handwritten documents, while word embeddings have been demonstrated to be a powerful representation for characterizing the statistical properties of natural language. In this study, we propose to use a BLSTM-RNN with word embeddings for the part-of-speech (POS) tagging task. When tested on the Penn Treebank WSJ test set, a state-of-the-art tagging accuracy of 97.40% is achieved. Without using morphological features, this approach also achieves performance comparable to the Stanford POS tagger.
https://arxiv.org/abs/1510.06168
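As a concrete picture of the architecture described above, the sketch below is a minimal bidirectional LSTM tagger over word embeddings written with PyTorch; the vocabulary size, tag set size and layer dimensions are placeholders, not the configuration used in the paper.

# Minimal BLSTM-over-word-embeddings tagger (hyperparameters are illustrative).
import torch
import torch.nn as nn

class BLSTMTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.blstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_tags)        # forward + backward states

    def forward(self, token_ids):                         # (batch, seq_len)
        h, _ = self.blstm(self.emb(token_ids))            # (batch, seq_len, 2*hidden)
        return self.out(h)                                # per-token tag scores

model = BLSTMTagger(vocab_size=10000, num_tags=45)
logits = model(torch.randint(0, 10000, (2, 7)))           # dummy batch of 2 sentences
loss = nn.functional.cross_entropy(logits.reshape(-1, 45), torch.randint(0, 45, (14,)))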
Real-world videos often have complex dynamics, and methods for generating open-domain video descriptions should be sensitive to temporal structure and allow both input (sequence of frames) and output (sequence of words) of variable length. To approach this problem, we propose a novel end-to-end sequence-to-sequence model to generate captions for videos. For this we exploit recurrent neural networks, specifically LSTMs, which have demonstrated state-of-the-art performance in image caption generation. Our LSTM model is trained on video-sentence pairs and learns to associate a sequence of video frames with a sequence of words in order to generate a description of the event in the video clip. Our model is naturally able to learn the temporal structure of the sequence of frames as well as the sequence model of the generated sentences, i.e. a language model. We evaluate several variants of our model that exploit different visual features on a standard set of YouTube videos and two movie description datasets (M-VAD and MPII-MD).
https://arxiv.org/abs/1505.00487
Buffer leakage is an important parasitic loss mechanism in AlGaN/GaN HEMTs, and hence various methods are employed to grow semi-insulating buffer layers. Quantification of the carrier concentration in such buffers using conventional capacitance-based profiling techniques is challenging due to their fully depleted nature even at zero bias. We provide a simple and effective model to extract carrier concentrations in fully depleted GaN films using capacitance-voltage (C-V) measurements. Extensive mercury-probe C-V profiling has been performed on GaN films of differing thicknesses and doping levels in order to validate this model. Carrier concentrations extracted from the conventional C-V technique on partially depleted films having the same doping concentration, and from Hall measurements, show excellent agreement with those predicted by the proposed model, thus establishing the utility of this technique. This model can be readily extended to estimate background carrier concentrations from the depletion-region capacitances of HEMT structures and fully depleted films of any class of semiconductor materials.
https://arxiv.org/abs/1506.07320
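For reference, the conventional C-V profiling relations that the abstract contrasts with (valid for partially depleted films under a contact of area A) are the textbook expressions below; the paper's model for the fully depleted case is not given in the abstract and is not reproduced here.

W = \frac{\varepsilon_s \varepsilon_0 A}{C}, \qquad
N(W) = \frac{2}{q\,\varepsilon_s \varepsilon_0 A^{2}} \left[\frac{\mathrm{d}(1/C^{2})}{\mathrm{d}V_r}\right]^{-1},

where C is the measured capacitance, W the depletion depth, \varepsilon_s the relative permittivity of the semiconductor, q the elementary charge, and V_r the reverse bias.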
Countless learning tasks require dealing with sequential data. Image captioning, speech synthesis, and music generation all require that a model produce outputs that are sequences. In other domains, such as time series prediction, video analysis, and musical information retrieval, a model must learn from inputs that are sequences. Interactive tasks, such as translating natural language, engaging in dialogue, and controlling a robot, often demand both capabilities. Recurrent neural networks (RNNs) are connectionist models that capture the dynamics of sequences via cycles in the network of nodes. Unlike standard feedforward neural networks, recurrent networks retain a state that can represent information from an arbitrarily long context window. Although recurrent neural networks have traditionally been difficult to train, and often contain millions of parameters, recent advances in network architectures, optimization techniques, and parallel computation have enabled successful large-scale learning with them. In recent years, systems based on long short-term memory (LSTM) and bidirectional (BRNN) architectures have demonstrated ground-breaking performance on tasks as varied as image captioning, language translation, and handwriting recognition. In this survey, we review and synthesize the research that over the past three decades first yielded and then made practical these powerful learning models. When appropriate, we reconcile conflicting notation and nomenclature. Our goal is to provide a self-contained explication of the state of the art together with a historical perspective and references to primary research.
https://arxiv.org/abs/1506.00019
The future wireless network, such as the Centralized Radio Access Network (C-RAN), will need to deliver data rates about 100 to 1000 times those of current 4G technology. For C-RAN based network architectures, there is a pressing need for a tremendous enhancement of the effective data rate of the Common Public Radio Interface (CPRI). Compression of CPRI data is one of the potential enhancements. In this paper, we introduce a vector quantization based compression algorithm for CPRI links, utilizing the Lloyd algorithm. Methods to vectorize the I/Q samples and enhanced initialization of the Lloyd algorithm for codebook training are investigated for improved performance. Multi-stage vector quantization and unequally protected multi-group quantization are considered to reduce codebook search complexity and codebook size. Simulation results show that our solution can achieve compression ratios of 4 for the uplink and 4.5 for the downlink, within 2% Error Vector Magnitude (EVM) distortion. Remarkably, the vector quantization codebook proves to be quite robust against data modulation mismatch, fading, signal-to-noise ratio (SNR) and Doppler spread.
https://arxiv.org/abs/1510.04940
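The codebook training mentioned above is the classical Lloyd (k-means) iteration applied to vectors of consecutive I/Q samples. The numpy sketch below shows only that baseline step; the vector length and codebook size are illustrative, and the paper's enhanced initialization, multi-stage quantization and unequal protection are not reproduced.

# Baseline Lloyd-algorithm codebook training for vector quantization of an I/Q
# stream (vector length and codebook size are illustrative, not the paper's).
import numpy as np

def train_codebook(samples, vec_len=4, codebook_size=256, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    vecs = samples[: len(samples) - len(samples) % vec_len].reshape(-1, vec_len)
    codebook = vecs[rng.choice(len(vecs), codebook_size, replace=False)]
    for _ in range(iters):
        # Nearest-codeword assignment under squared Euclidean distance.
        d = ((vecs[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        idx = d.argmin(axis=1)
        # Centroid update; an empty cell keeps its previous codeword.
        for k in range(codebook_size):
            members = vecs[idx == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    return codebook

iq = np.random.default_rng(1).normal(size=16384)   # stand-in for a CPRI I/Q stream
cb = train_codebook(iq)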
This paper assesses intersubband transitions in the 1 to 10 THz frequency range in nonpolar m-plane GaN/AlGaN multi-quantum wells deposited on free-standing semi-insulating GaN substrates. The quantum wells were designed to contain two confined electronic levels, decoupled from the neighboring wells. Structural analysis reveals flat and regular quantum wells in the two perpendicular in-plane directions, with high-resolution images showing inhomogeneities of the Al composition in the barriers along the growth axis. We do not observe extended structural defects introduced by the epitaxial process. Low-temperature intersubband absorption from 1.5 to 9 THz is demonstrated, covering part of the 7 to 10 THz band forbidden to GaAs-based technologies.
https://arxiv.org/abs/1506.00353
We report the growth of high-quality triangular GaN nanomesas, 30 nm thick, on the C-face of 4H-SiC using nano selective area growth (NSAG) with patterned epitaxial graphene grown on SiC as an embedded mask. NSAG alleviates the problems of defective crystals in the heteroepitaxial growth of nitrides, and the high-mobility graphene film can readily provide the back low-dissipative electrode in GaN-based optoelectronic devices. The process consists of first growing a 5-8-layer graphene film on the C-face of 4H-SiC by confinement-controlled sublimation of silicon carbide. The graphene film is then patterned, and arrays of 75-nanometer-wide openings are etched in the graphene, revealing the SiC substrate. 30-nanometer-thick GaN is subsequently grown by metalorganic vapor phase epitaxy. GaN nanomesas grow epitaxially with perfect selectivity on SiC, in openings patterned through graphene, with no nucleation on graphene. The up-or-down orientation of the mesas on SiC, their triangular faceting, and cross-sectional scanning transmission electron microscopy show that they are biphasic. The core is a zinc-blende monocrystal surrounded by single-crystal hexagonal wurtzite. The GaN crystalline nanomesas have no threading dislocations and do not show any V-pits. This NSAG process potentially leads to the integration of high-quality III-nitrides on the wafer-scalable epitaxial graphene / silicon carbide platform.
https://arxiv.org/abs/1510.04513
Two recent approaches have achieved state-of-the-art results in image captioning. The first uses a pipelined process where a set of candidate words is generated by a convolutional neural network (CNN) trained on images, and then a maximum entropy (ME) language model is used to arrange these words into a coherent sentence. The second uses the penultimate activation layer of the CNN as input to a recurrent neural network (RNN) that then generates the caption sequence. In this paper, we compare the merits of these different language modeling approaches for the first time by using the same state-of-the-art CNN as input. We examine issues in the different approaches, including linguistic irregularities, caption repetition, and data set overlap. By combining key aspects of the ME and RNN methods, we achieve a new record performance over previously published results on the benchmark COCO dataset. However, the gains we see in BLEU do not translate to human judgments.
https://arxiv.org/abs/1505.01809
Coastally associated rainfall is a common feature, especially in tropical and subtropical regions. However, it has been difficult to quantify the contribution of coastal rainfall features to the overall local rainfall. We develop a novel technique to objectively identify precipitation associated with land-sea interaction and apply it to satellite-based rainfall estimates. The Maritime Continent, the Bight of Panama, Madagascar and the Mediterranean are found to be regions where land-sea interactions play a crucial role in the formation of precipitation. In these regions $\approx$ 40% to 60% of the total rainfall can be related to coastline effects. Due to its importance for the climate system, the Maritime Continent is a particular region of interest, with high overall amounts of rainfall and large fractions resulting from land-sea interactions throughout the year. To demonstrate the utility of our identification method we investigate the influence of several modes of variability, such as the Madden-Julian Oscillation and the El Niño Southern Oscillation, on coastal rainfall behavior. The results suggest that during large-scale suppressed convective conditions, coastal effects tend to modulate the rainfall over the Maritime Continent, leading to enhanced rainfall over land regions compared to the surrounding oceans. We propose that this novel objective dataset of coastally influenced precipitation can be used in a variety of ways, such as to inform cumulus parametrization or as an additional tool for evaluating the simulation of coastal precipitation within weather and climate models.
https://arxiv.org/abs/1501.06265
A novel efficient method for extraction of object proposals is introduced. Its “objectness” function exploits deep spatial pyramid features, a novel fast-to-compute HoG-based edge statistic and the EdgeBoxes score. The efficiency is achieved by the use of spatial bins in a novel combination with sparsity-inducing group normalized SVM. State-of-the-art recall performance is achieved on Pascal VOC07, significantly outperforming methods with comparable speed. Interestingly, when only 100 proposals per image are considered the method attains 78% recall on VOC07. The method improves mAP of the RCNN state-of-the-art class-specific detector, increasing it by 10 points when only 50 proposals are used in each image. The system trained on twenty classes performs well on the two hundred class ILSVRC2013 set confirming generalization capability.
https://arxiv.org/abs/1504.07029
We propose a resonant electromagnetic detector to search for hidden-photon dark matter over an extensive range of masses. Hidden-photon dark matter can be described as a weakly coupled “hidden electric field,” oscillating at a frequency fixed by the mass, and able to penetrate any shielding. At low frequencies (compared to the inverse size of the shielding), we find that the observable effect of the hidden photon inside any shielding is a real, oscillating magnetic field. We outline experimental setups designed to search for hidden-photon dark matter, using a tunable, resonant LC circuit designed to couple to this magnetic field. Our “straw man” setups take into consideration resonator design, readout architecture and noise estimates. At high frequencies, there is an upper limit to the useful size of a single resonator set by $1/\nu$. However, many resonators may be multiplexed within a hidden-photon coherence length to increase the sensitivity in this regime. Hidden-photon dark matter has an enormous range of possible frequencies, but current experiments search only over a few narrow pieces of that range. We find the potential sensitivity of our proposal is many orders of magnitude beyond current limits over an extensive range of frequencies, from 100 Hz up to 700 GHz and potentially higher.
https://arxiv.org/abs/1411.7382
Traffic scene perception (TSP) aims to extract accurate on-road environment information in real time, which involves three phases: detection of objects of interest, recognition of detected objects, and tracking of objects in motion. Since recognition and tracking often rely on the results of detection, the ability to detect objects of interest effectively plays a crucial role in TSP. In this paper, we focus on three important classes of objects: traffic signs, cars, and cyclists. We propose to detect all three in a single learning-based detection framework. The proposed framework consists of a dense feature extractor and detectors for the three classes. Once the dense features have been extracted, they are shared by all detectors. The advantage of using one common framework is that detection is much faster, since all dense features need to be evaluated only once in the testing phase. In contrast, most previous works have designed specific detectors using different features for each of these objects. To enhance the robustness of the features to noise and image deformations, we introduce spatially pooled features as part of the aggregated channel features. To further improve generalization performance, we propose an object subcategorization method as a means of capturing intra-class variation. We experimentally demonstrate the effectiveness and efficiency of the proposed framework in three detection applications: traffic sign detection, car detection, and cyclist detection. The proposed framework achieves competitive performance with state-of-the-art approaches on several benchmark datasets.
https://arxiv.org/abs/1510.03125
Large scale object detection with thousands of classes introduces the problem of many contradicting false positive detections, which have to be suppressed. Class-independent non-maximum suppression has traditionally been used for this step, but it does not scale well as the number of classes grows. Traditional non-maximum suppression does not consider label- and instance-level relationships, nor does it allow exploitation of the spatial layout of detection proposals. We propose a new multi-class spatial semantic regularisation method based on affinity propagation clustering, which simultaneously optimises across all categories and all proposed locations in the image, to improve both the localisation and categorisation of selected detection proposals. Constraints are shared across the labels through the semantic WordNet hierarchy. Our approach proves to be especially useful in large scale settings with thousands of classes, where spatial and semantic interactions are very frequent and only weakly supervised detectors can be built due to a lack of bounding box annotations. Detection experiments are conducted on the ImageNet and COCO datasets, and in settings with thousands of detected categories. Our method provides a significant precision improvement by reducing false positives, while simultaneously improving the recall.
https://arxiv.org/abs/1510.02949
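As a rough illustration of the clustering step described above, the sketch below groups detection proposals with affinity propagation over a precomputed spatial (IoU) similarity matrix using scikit-learn. The paper's semantic WordNet constraints and joint category optimisation are omitted; this shows only the bare clustering-in-place-of-NMS idea.

# Clustering detection proposals with affinity propagation on a precomputed
# IoU similarity matrix (spatial term only; the paper's semantic constraints
# are not modeled here). Boxes are (x1, y1, x2, y2).
import numpy as np
from sklearn.cluster import AffinityPropagation

def iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

boxes = np.array([[10, 10, 50, 50], [12, 11, 52, 49], [11, 12, 48, 51],
                  [100, 100, 150, 160], [102, 98, 148, 158]], dtype=float)
S = np.array([[iou(a, b) for b in boxes] for a in boxes])      # similarity matrix
ap = AffinityPropagation(affinity="precomputed", random_state=0).fit(S)
print(ap.labels_)   # proposals sharing a label form one cluster (one kept detection)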
We introduce a new structure for memory neural networks, called the feedforward sequential memory network (FSMN), which can learn long-term dependencies without using recurrent feedback. The proposed FSMN is a standard feedforward neural network equipped with learnable sequential memory blocks in its hidden layers. In this work, we apply the FSMN to several language modeling (LM) tasks. Experimental results show that the memory blocks in the FSMN can learn effective representations of long histories. Experiments also show that FSMN-based language models can significantly outperform not only feedforward neural network (FNN) based LMs but also the popular recurrent neural network (RNN) LMs.
https://arxiv.org/abs/1510.02693
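The memory block described above can be read as a learnable tapped-delay (FIR-like) filter over past hidden activations. The numpy sketch below shows one such block; the memory order, the scalar-per-tap weighting and the concatenation with the current state are illustrative choices, since the abstract does not specify them.

# FSMN-style memory block sketch: augment each hidden state with a learned
# weighted sum of the previous N hidden states (a tapped-delay filter).
import numpy as np

def memory_block(H, a):
    """H: (T, d) hidden activations; a: (N+1,) tap weights. Returns (T, d)."""
    T, d = H.shape
    N = len(a) - 1
    H_pad = np.vstack([np.zeros((N, d)), H])            # zero history before t = 0
    out = np.zeros_like(H)
    for t in range(T):
        window = H_pad[t:t + N + 1]                     # states h_{t-N} ... h_t
        out[t] = a[::-1] @ window                       # sum_i a_i * h_{t-i}
    return out

H = np.random.default_rng(0).normal(size=(6, 4))        # toy hidden sequence
memory = memory_block(H, a=np.array([0.5, 0.3, 0.2]))   # N = 2 past steps
augmented = np.concatenate([H, memory], axis=1)         # fed to the next layer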
This work attempts to give new theoretical insights into the absence of intermediate stages in the evolution of language. In particular, we develop an automata networks approach to a crucial question: how can a population of language users reach agreement on a linguistic convention? To describe the appearance of sharp transitions in the self-organization of language, we adopt an extremely simple model of (working) memory. At each time step, language users simply lose part of their word memories. Computer simulations on low-dimensional lattices show sharp transitions at critical values that depend on the size of the individuals' vicinities.
https://arxiv.org/abs/1508.01580
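The abstract does not give the exact update rule, so the sketch below is only a generic naming-game-style simulation on a ring lattice in which agents lose part of their word memory with some probability at each step; the forgetting rule, lattice size and neighborhood radius are assumptions made for illustration.

# Illustrative naming-game simulation with memory loss on a ring lattice.
import random

def simulate(n_agents=200, radius=2, p_forget=0.1, steps=20000, seed=0):
    random.seed(seed)
    memories = [set() for _ in range(n_agents)]         # per-agent word memories
    next_word = 0
    offsets = [o for o in range(-radius, radius + 1) if o != 0]
    for _ in range(steps):
        s = random.randrange(n_agents)                  # speaker
        h = (s + random.choice(offsets)) % n_agents     # hearer within the vicinity
        if not memories[s]:
            memories[s].add(next_word)
            next_word += 1
        word = random.choice(tuple(memories[s]))
        if word in memories[h]:                         # success: both keep only that word
            memories[s], memories[h] = {word}, {word}
        else:
            memories[h].add(word)
        for m in (memories[s], memories[h]):            # memory loss each time step
            if m and random.random() < p_forget:
                m.discard(random.choice(tuple(m)))
    return len({w for m in memories for w in m})        # 1 means global consensus

print(simulate())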
Efficient generation of high-quality object proposals is an essential step in state-of-the-art object detection systems based on deep convolutional neural network (DCNN) features. Current object proposal algorithms are computationally inefficient in processing high-resolution images containing small objects, which makes them the bottleneck in object detection systems. In this paper we present effective methods to detect objects in high-resolution images. We combine two complementary strategies. The first approach is to predict bounding boxes based on adjacent visual features. The second approach uses high-level image features to guide a two-step search process that adaptively focuses on regions that are likely to contain small objects. We extract the features required for the two strategies by utilizing a pre-trained DCNN model known as AlexNet. We demonstrate the effectiveness of our algorithm by showing its performance on a high-resolution image subset of the SUN 2012 object detection dataset.
https://arxiv.org/abs/1510.01257
Large scale-free graphs are famously difficult to process efficiently: the skewed vertex degree distribution makes it difficult to obtain balanced partitioning. Our research instead aims to turn this into an advantage by partitioning the workload to match the strength of the individual computing elements in a hybrid, GPU-accelerated architecture. As a proof of concept we focus on the direction-optimized breadth-first search algorithm. We present the key graph partitioning, workload allocation, and communication strategies required for massive concurrency and good overall performance. We show that exploiting specialization enables gains as high as 2.4x in terms of time-to-solution and 2.0x in terms of energy efficiency by adding 2 GPUs to a 2-CPU-only baseline, for synthetic graphs with up to 16 billion undirected edges as well as for large real-world graphs. We also show that, for a capped energy envelope, it is more efficient to add a GPU than an additional CPU. Finally, our performance would place us at the top of today’s [Green]Graph500 challenges for Scale 29 graphs.
https://arxiv.org/abs/1503.04359
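Direction-optimized breadth-first search alternates between conventional top-down frontier expansion and bottom-up parent search depending on the frontier size. The single-threaded Python sketch below shows only that switching idea; the paper's hybrid CPU/GPU partitioning, workload allocation and communication strategies are not reproduced, and the switching threshold is an illustrative heuristic.

# Minimal direction-optimized BFS sketch: switch to bottom-up search when the
# frontier becomes large relative to the vertex count.
def bfs(adj, source, alpha=0.05):
    n = len(adj)
    depth = [-1] * n
    depth[source] = 0
    frontier, level = {source}, 0
    while frontier:
        nxt = set()
        if len(frontier) < alpha * n:                       # top-down step
            for u in frontier:
                for v in adj[u]:
                    if depth[v] == -1:
                        depth[v] = level + 1
                        nxt.add(v)
        else:                                               # bottom-up step
            for v in range(n):
                if depth[v] == -1 and any(depth[u] == level for u in adj[v]):
                    depth[v] = level + 1
                    nxt.add(v)
        frontier, level = nxt, level + 1
    return depth

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}          # small undirected graph
print(bfs(adj, 0))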
In this paper, we address the task of learning novel visual concepts, and their interactions with other concepts, from a few images with sentence descriptions. Using linguistic context and visual features, our method is able to efficiently hypothesize the semantic meaning of new words and add them to its word dictionary so that they can be used to describe images which contain these novel concepts. Our method has an image captioning module based on m-RNN with several improvements. In particular, we propose a transposed weight sharing scheme, which not only improves performance on image captioning, but also makes the model more suitable for the novel concept learning task. We propose methods to prevent overfitting the new concepts. In addition, three novel concept datasets are constructed for this new task. In the experiments, we show that our method effectively learns novel visual concepts from a few examples without disturbing the previously learned concepts. The project page is this http URL
https://arxiv.org/abs/1504.06692
In this paper, we present a Pt/Al multilayer stack-based ohmic contact metallization for AlGaN/GaN heterostructures. CTLM structures were fabricated to assess the electrical properties of the proposed metallization. The fabricated stack shows excellent stability after more than 100 hours of continuous aging at 600 °C in air. Measured I-V characteristics of the fabricated samples show excellent linearity after the aging. The Pt/Al-based metallization shows great potential for future device and sensor applications in extreme environment conditions.
https://arxiv.org/abs/1509.09178
Moving object detection is key to intelligent video analysis. On the one hand, what moves includes not only interesting objects but also noise and cluttered background. On the other hand, moving objects without rich texture are often missed. Many moving object detection algorithms therefore suffer from false alarms and missed detections. To reduce these, in this paper we propose to incorporate a saliency map into an incremental subspace analysis framework, where the saliency map ensures that the estimated background is less likely than the foreground (i.e., moving objects) to contain salient objects. The proposed objective function systematically takes into account the properties of sparsity, low rank, connectivity, and saliency. An alternating minimization algorithm is proposed to seek the optimal solution. Experimental results on the Perception Test Images Sequences demonstrate that the proposed method is effective in reducing false alarms and missed detections.
https://arxiv.org/abs/1509.09089
Most III-nitride semiconductors are grown on non-lattice-matched substrates like sapphire or silicon due to the extreme difficulty of obtaining a native GaN substrate. We show that several layered transition-metal dichalcogenides are closely lattice matched to GaN and report the growth of GaN on a range of such layered materials. We report detailed studies of the growth of GaN on mechanically exfoliated flakes of WS$_2$ and MoS$_2$ by metalorganic vapour phase epitaxy. Structural and optical characterization show that strain-free, single-crystal islands of GaN are obtained on the underlying chalcogenide flakes. We obtain strong near-band-edge emission from these layers, and analyse their temperature-dependent photoluminescence properties. We also report a proof-of-concept demonstration of large-area epitaxial growth of GaN on CVD MoS$_2$. Our results show that the transition-metal dichalcogenides can serve as novel near-lattice-matched substrates for nitride growth.
https://arxiv.org/abs/1509.08256
We present a novel detection method using a deep convolutional neural network (CNN), named AttentionNet. We cast an object detection problem as an iterative classification problem, which is the most suitable form for a CNN. AttentionNet provides quantized weak directions pointing toward a target object, and the ensemble of iterative predictions from AttentionNet converges to an accurate object bounding box. Since AttentionNet is a unified network for object detection, it detects objects without any separate models, from object proposal to post bounding-box regression. We evaluate AttentionNet on a human detection task and achieve state-of-the-art performance of 65% (AP) on PASCAL VOC 2007/2012 with only an 8-layer architecture.
https://arxiv.org/abs/1506.07704
In this paper, we evaluate convolutional neural network (CNN) features using the AlexNet architecture and the very deep convolutional network (VGGNet) architecture. To date, most CNN researchers have employed the last layers before the output, which were extracted from the fully connected feature layers. However, since the effectiveness of a feature representation is unlikely to be independent of the problem, this study evaluates additional convolutional layers adjacent to the fully connected layers, in addition to performing simple tuning for feature concatenation (e.g., layer 3 + layer 5 + layer 7) and transformation, using tools such as principal component analysis. In our experiments, we carried out detection and classification tasks using the Caltech 101 and Daimler Pedestrian Benchmark Datasets.
https://arxiv.org/abs/1509.07627
The influence of GaN nanowires on the optical and electrical properties of graphene deposited on them was studied using Raman spectroscopy and a microwave-induced electron transport method. It was found that interaction with the nanowires induces spectral changes as well as a large enhancement of the Raman scattering intensity. Surprisingly, the smallest enhancement (about 30-fold) was observed for the defect-induced D' process, and the highest intensity increase (over 50-fold) was found for the 2D transition. The observed energy shifts of the G and 2D bands allowed us to determine carrier concentration fluctuations induced by the GaN nanowires. Comparison of Raman scattering spatial intensity maps with scanning electron microscope images led to the conclusion that vertically aligned GaN nanowires induce a homogeneous strain, a substantial spatial modulation of the carrier concentration in graphene, and an unexpectedly homogeneous distribution of defects created by interaction with the nanowires. The analysis of the D and D' peak intensity ratio showed that interaction with the nanowires also changes the probability of scattering on different types of defects. The Raman studies were correlated with the weak localization effect measured using microwave-induced contactless electron transport. The temperature dependence of the weak localization signal showed electron-electron scattering to be the main decoherence mechanism, with additional, temperature-independent scattering reducing the coherence length. We attribute this to the interaction of electrons in graphene with charges present on the tops of the nanowires due to the spontaneous and piezoelectric polarization of GaN. Thus, the nanowires act as antennas and generate an enhanced near field, which can explain the observed large enhancement of the Raman scattering intensity.
https://arxiv.org/abs/1506.00217
We propose an object detection system that relies on a multi-region deep convolutional neural network (CNN) that also encodes semantic segmentation-aware features. The resulting CNN-based representation aims at capturing a diverse set of discriminative appearance factors and exhibits localization sensitivity that is essential for accurate object localization. We exploit the above properties of our recognition module by integrating it into an iterative localization mechanism that alternates between scoring a box proposal and refining its location with a deep CNN regression model. Thanks to the efficient use of our modules, we detect objects with very high localization accuracy. On the detection challenges of PASCAL VOC2007 and PASCAL VOC2012 we achieve mAP of 78.2% and 73.9% respectively, surpassing any other published work by a significant margin.
https://arxiv.org/abs/1505.01749
Recurrent Neural Networks can be trained to produce sequences of tokens given some input, as exemplified by recent results in machine translation and image captioning. The current approach to training them consists of maximizing the likelihood of each token in the sequence given the current (recurrent) state and the previous token. At inference, the unknown previous token is then replaced by a token generated by the model itself. This discrepancy between training and inference can yield errors that can accumulate quickly along the generated sequence. We propose a curriculum learning strategy to gently change the training process from a fully guided scheme using the true previous token, towards a less guided scheme which mostly uses the generated token instead. Experiments on several sequence prediction tasks show that this approach yields significant improvements. Moreover, it was used successfully in our winning entry to the MSCOCO image captioning challenge, 2015.
https://arxiv.org/abs/1506.03099
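The curriculum described above amounts to feeding the decoder its own previous prediction with a probability that grows over training. The sketch below shows that sampling choice with an inverse-sigmoid decay; the decay schedule and the dummy one-step decoder are illustrative stand-ins, not the paper's models.

# Scheduled-sampling sketch: at each decoding step during training, use the true
# previous token with probability eps (decaying over epochs), otherwise the
# model's own prediction. `dummy_step` is a stand-in for a real RNN decoder step.
import math
import random

def teacher_forcing_prob(epoch, k=10.0):
    return k / (k + math.exp(epoch / k))                 # inverse-sigmoid decay

def decode_for_training(step_fn, state, target_tokens, epoch):
    eps = teacher_forcing_prob(epoch)
    prev, outputs = target_tokens[0], []
    for t in range(1, len(target_tokens)):
        logits, state = step_fn(prev, state)             # one decoder step
        outputs.append(logits)
        predicted = max(range(len(logits)), key=logits.__getitem__)
        prev = target_tokens[t] if random.random() < eps else predicted
    return outputs

def dummy_step(token, state):                            # toy stand-in decoder
    return [0.1 * ((token + i + state) % 5) for i in range(5)], state + 1

outs = decode_for_training(dummy_step, state=0, target_tokens=[0, 3, 1, 4, 2], epoch=3)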
Policy learning for partially observed control tasks requires policies that can remember salient information from past observations. In this paper, we present a method for learning policies with internal memory for high-dimensional, continuous systems, such as robotic manipulators. Our approach consists of augmenting the state and action space of the system with continuous-valued memory states that the policy can read from and write to. Learning general-purpose policies with this type of memory representation directly is difficult, because the policy must automatically figure out the most salient information to memorize at each time step. We show that, by decomposing this policy search problem into a trajectory optimization phase and a supervised learning phase through a method called guided policy search, we can acquire policies with effective memorization and recall strategies. Intuitively, the trajectory optimization phase chooses the values of the memory states that will make it easier for the policy to produce the right action in future states, while the supervised learning phase encourages the policy to use memorization actions to produce those memory states. We evaluate our method on tasks involving continuous control in manipulation and navigation settings, and show that our method can learn complex policies that successfully complete a range of tasks that require memory.
https://arxiv.org/abs/1507.01273
Despite their success, convolutional neural networks are computationally expensive because they must examine all image locations. Stochastic attention-based models have been shown to improve computational efficiency at test time, but they remain difficult to train because of intractable posterior inference and high variance in the stochastic gradient estimates. Borrowing techniques from the literature on training deep generative models, we present the Wake-Sleep Recurrent Attention Model, a method for training stochastic attention networks which improves posterior inference and which reduces the variability in the stochastic gradients. We show that our method can greatly speed up the training time for stochastic attention networks in the domains of image classification and caption generation.
https://arxiv.org/abs/1509.06812
Most of the current boundary detection systems rely exclusively on low-level features, such as color and texture. However, perception studies suggest that humans employ object-level reasoning when judging if a particular pixel is a boundary. Inspired by this observation, in this work we show how to predict boundaries by exploiting object-level features from a pretrained object-classification network. Our method can be viewed as a “High-for-Low” approach where high-level object features inform the low-level boundary detection process. Our model achieves state-of-the-art performance on an established boundary detection benchmark and it is efficient to run. Additionally, we show that due to the semantic nature of our boundaries we can use them to aid a number of high-level vision tasks. We demonstrate that using our boundaries we improve the performance of state-of-the-art methods on the problems of semantic boundary labeling, semantic segmentation and object proposal generation. We can view this process as a “Low-for-High” scheme, where low-level boundaries aid high-level vision tasks. Thus, our contributions include a boundary detection system that is accurate, efficient, generalizes well to multiple datasets, and is also shown to improve existing state-of-the-art high-level vision methods on three distinct tasks.
https://arxiv.org/abs/1504.06201
An attentional mechanism has lately been used to improve neural machine translation (NMT) by selectively focusing on parts of the source sentence during translation. However, there has been little work exploring useful architectures for attention-based NMT. This paper examines two simple and effective classes of attentional mechanism: a global approach which always attends to all source words and a local one that only looks at a subset of source words at a time. We demonstrate the effectiveness of both approaches over the WMT translation tasks between English and German in both directions. With local attention, we achieve a significant gain of 5.0 BLEU points over non-attentional systems which already incorporate known techniques such as dropout. Our ensemble model using different attention architectures has established a new state-of-the-art result in the WMT’15 English to German translation task with 25.9 BLEU points, an improvement of 1.0 BLEU points over the existing best system backed by NMT and an n-gram reranker.
https://arxiv.org/abs/1508.04025
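A minimal numpy sketch of the global variant described above: score every source hidden state against the current target state, normalize with a softmax, and form a context vector. Dot-product scoring is shown; the paper also studies other scoring functions and a local, windowed variant, which are not reproduced here.

# Global (dot-product) attention over all source hidden states at one target step.
import numpy as np

def global_attention(source_states, target_state):
    """source_states: (S, d); target_state: (d,). Returns (context, weights)."""
    scores = source_states @ target_state                # (S,) alignment scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                             # softmax over source positions
    context = weights @ source_states                    # (d,) weighted source summary
    return context, weights

rng = np.random.default_rng(0)
H_src = rng.normal(size=(6, 8))                          # 6 source positions, d = 8
h_t = rng.normal(size=8)                                 # current target hidden state
context, attn = global_attention(H_src, h_t)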
Sponsored search is a multi-billion dollar industry and makes up a major source of revenue for search engines (SEs). Click-through-rate (CTR) estimation plays a crucial role in ad selection, and greatly affects SE revenue, advertiser traffic and user experience. We propose a novel architecture for solving the CTR prediction problem by combining artificial neural networks (ANNs) with decision trees. First we compare the ANN with other popular machine learning models used for this task. Then we go on to combine the ANN with MatrixNet (a proprietary implementation of boosted trees) and evaluate the performance of the system as a whole. The results show that our approach provides a significant improvement over existing models.
https://arxiv.org/abs/1412.6601
How well can a single fully convolutional neural network (FCN) perform on object detection? We introduce DenseBox, a unified end-to-end FCN framework that directly predicts bounding boxes and object class confidences through all locations and scales of an image. Our contribution is two-fold. First, we show that a single FCN, if designed and optimized carefully, can detect multiple different objects extremely accurately and efficiently. Second, we show that when incorporating landmark localization during multi-task learning, DenseBox further improves object detection accuracy. We present experimental results on public benchmark datasets, including MALF face detection and KITTI car detection, which indicate that DenseBox is the state-of-the-art system for detecting challenging objects such as faces and cars.
https://arxiv.org/abs/1509.04874
A model of an ant system in which ants are controlled by a spiking neural circuit and a second-order pheromone mechanism in a foraging task is presented. A neural circuit is trained for individual ants, and subsequently the ants are exposed to a virtual environment where a swarm of ants performs a resource foraging task. The model comprises an associative and unsupervised learning strategy for the neural circuit of the ant. The neural circuit adapts to the environment by means of classical conditioning. The initially unknown environment includes different types of stimuli representing food and obstacles which, when they come into direct contact with the ant, elicit a reflex response in the motor neural system of the ant: moving towards or away from the source of the stimulus. The ants are released on a landscape with multiple food sources, where one ant alone would have difficulty harvesting the landscape to maximum efficiency. The introduction of a double pheromone mechanism yields better results than traditional ant colony optimization strategies. Traditional ant systems include mainly a positive-reinforcement pheromone. This approach uses a second pheromone that acts as a marker for forbidden paths (negative feedback). This blockade is not permanent and is controlled by the evaporation rate of the pheromones. The combined action of both pheromones acts as a collective stigmergic memory of the swarm, which reduces the search space of the problem. This paper explores how the adaptation and learning abilities observed in biologically inspired cognitive architectures are synergistically enhanced by swarm optimization strategies. The model portrays two forms of artificially intelligent behaviour: at the individual level the spiking neural network is the main controller, and at the collective level the pheromone distribution is a map towards the solution that emerges from the colony.
https://arxiv.org/abs/1507.08467
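The second-order pheromone described above can be pictured as a second, repulsive field that marks forbidden paths and evaporates over time. The sketch below shows only such a two-field choice rule with illustrative parameters; the spiking neural controller and the foraging environment of the paper are not modeled.

# Two-pheromone choice rule sketch: an attractive field laid on rewarding cells
# and a repulsive field marking forbidden cells, both subject to evaporation.
import numpy as np

rng = np.random.default_rng(0)
n_cells = 10
attract = np.zeros(n_cells)              # positive-reinforcement pheromone
repel = np.zeros(n_cells)                # second pheromone marking forbidden paths

def choose_cell():
    desirability = (1.0 + attract) / (1.0 + repel)       # combined stigmergic signal
    p = desirability / desirability.sum()
    return rng.choice(n_cells, p=p)

for step in range(1000):
    c = choose_cell()
    if c == 7:                           # pretend cell 7 holds food
        attract[c] += 1.0
    elif c == 3:                         # pretend cell 3 is a dead end
        repel[c] += 1.0
    attract *= 0.99                      # evaporation keeps the blockade non-permanent
    repel *= 0.95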
Writing formal specifications for distributed systems is difficult. Even simple consistency requirements often turn out to be unrealizable because of the complicated information flow in the distributed system: not all information is available in every component, and information transmitted from other components may arrive with a delay or not at all, especially in the presence of faults. The problem of checking the distributed realizability of a temporal specification is, in general, undecidable. Semi-algorithms for synthesis, such as bounded synthesis, are only useful in the positive case, where they construct an implementation for a realizable specification, but not in the negative case: if the specification is unrealizable, the search for the implementation never terminates. In this paper, we introduce counterexamples to distributed realizability and present a method for the detection of such counterexamples for specifications given in linear-time temporal logic (LTL). A counterexample consists of a set of paths, each representing a different sequence of inputs from the environment, such that, no matter how the components are implemented, the specification is violated on at least one of these paths. We present a method for finding such counterexamples both for the classic distributed realizability problem and for the fault-tolerant realizability problem. Our method considers, incrementally, larger and larger sets of paths until a counterexample is found. For safety specifications in weakly ordered architectures we obtain a decision procedure, while counterexamples for full LTL and arbitrary architectures may consist of infinitely many paths. Experimental results, obtained with a QBF-based prototype implementation, show that our method finds simple errors very quickly, and even problems with high combinatorial complexity, like the Byzantine Generals’ Problem, are tractable.
https://arxiv.org/abs/1505.06862
In this work we focus on the problem of image caption generation. We propose an extension of the long short-term memory (LSTM) model, which we coin gLSTM for short. In particular, we add semantic information extracted from the image as extra input to each unit of the LSTM block, with the aim of guiding the model towards solutions that are more tightly coupled to the image content. Additionally, we explore different length normalization strategies for beam search in order to avoid favoring short sentences. On various benchmark datasets such as Flickr8K, Flickr30K and MS COCO, we obtain results that are on par with or even outperform the current state of the art.
https://arxiv.org/abs/1509.04942
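The length-normalization issue mentioned above arises because beam search compares summed log-probabilities, which inherently favors short hypotheses; a common remedy is to rank hypotheses by a length-normalized score. The sketch below shows only that scoring step with an illustrative normalization exponent; the gLSTM model itself and the specific strategies compared in the paper are not reproduced.

# Length-normalized hypothesis scoring for beam search.
import math

def normalized_score(log_probs, alpha=0.7):
    """log_probs: per-token log-probabilities of one hypothesis."""
    return sum(log_probs) / (len(log_probs) ** alpha)

short = [math.log(0.5)] * 3                    # 3-token hypothesis
long_ = [math.log(0.6)] * 8                    # 8-token hypothesis
print(sum(short), sum(long_))                  # raw sums favor the short hypothesis
print(normalized_score(short), normalized_score(long_))   # normalization lets the long one compete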
Near-infrared (NIR) spectra with an angular resolution of ~0.15 arcsec are used to examine the stellar content of the central regions of the nearby elliptical galaxy Maffei 1. The spectra were recorded at the Subaru Telescope, with wavefront distortions corrected by the RAVEN Multi-Object Adaptive Optics science demonstrator. The Ballik-Ramsay C_2 absorption bandhead near 1.76 microns is detected, and models in which 10 - 20% of the light near 1.8 microns originates from stars of spectral type C5 reproduce this feature. Archival NIR and mid-infrared images are also used to probe the structural and photometric properties of the galaxy. Comparisons with models suggest that an intermediate-age population dominates the spectral energy distribution between 1 and 5 microns near the galaxy center. This is consistent not only with the presence of C stars, but also with the large HBeta index that has been measured previously for Maffei 1. The J-K color is more or less constant within 15 arcsec of the galaxy center, suggesting that the brightest red stars are well mixed in this area.
https://arxiv.org/abs/1509.04338
We propose a Green Cloudlet Network (\emph{GCN}) architecture to provide seamless Mobile Cloud Computing (\emph{MCC}) services to User Equipments (\emph{UE}s) with low latency, in which each cloudlet is powered by both green and brown energy. Fully utilizing green energy can significantly reduce the operational cost of cloudlet providers. However, owing to the spatial dynamics of energy demand and green energy generation, the energy gap among different cloudlets in the network is unbalanced, i.e., some cloudlets’ energy demands can be fully met by green energy while others need to utilize on-grid energy (i.e., brown energy) to satisfy their demands. We propose a Green-energy awarE Avatar migRation (\emph{GEAR}) strategy to minimize the on-grid energy consumption in the GCN by redistributing the energy demands via Avatar migration among cloudlets according to the cloudlets’ green energy generation. Furthermore, GEAR ensures the Service Level Agreement (\emph{SLA}) in terms of the maximum Avatar propagation delay by avoiding hosting Avatars in remote cloudlets. We formulate the GEAR strategy as a mixed integer linear programming problem, which is NP-hard, and thus apply Branch and Bound search to find a sub-optimal solution. Simulation results demonstrate that GEAR can save on-grid energy consumption significantly as compared to the Follow me AvataR (\emph{FAR}) migration strategy, which aims to minimize the propagation delay between a UE and its Avatar.
https://arxiv.org/abs/1509.03603
We report the epitaxial growth and characterization of GaN thin films on sapphire (0001) substrates using a low-temperature GaN intermediate layer by the plasma-assisted molecular beam epitaxy (PA-MBE) technique. As-grown and annealed GaN thin films were studied by high-resolution X-ray diffraction (HRXRD), atomic force microscopy (AFM), Hall effect and photoluminescence (PL) measurements. A significant improvement in the quality of the GaN films after annealing at 725 °C is found in terms of electron mobility and the full width at half maximum (FWHM) of the ω scan around the (0002) XRD peak of the GaN films. The screw dislocation density obtained from the FWHM of the GaN (0002) ω scan and the etch pit density calculated from AFM images are 6.4 × 10^8 cm^-2 and 5.1 × 10^8 cm^-2, respectively. In PL measurements, the FWHM of the near-band-edge (NBE) peak of the GaN films is found to be 30 meV.
https://arxiv.org/abs/1509.00416
Recent advances in supervised salient object detection have resulted in significant performance on benchmark datasets. Training such models, however, requires expensive pixel-wise annotations of salient objects. Moreover, many existing salient object detection models assume that at least one salient object exists in the input image. Such an assumption often leads to less appealing saliency maps on background images, which contain no salient object at all. To avoid the requirement of expensive pixel-wise salient region annotations, in this paper we study weakly supervised learning approaches for salient object detection. Given a set of background images and salient object images, we propose a solution toward jointly addressing the salient object existence and detection tasks. We adopt the latent SVM framework and formulate the two problems together in a single integrated objective function: saliency labels of superpixels are modeled as hidden variables and involved in a classification term conditioned on the salient object existence variable, which in turn depends on both global image and regional saliency features and the saliency label assignment. Experimental results on benchmark datasets validate the effectiveness of our proposed approach.
https://arxiv.org/abs/1501.07492
A noise-based non-parametric technique for detecting nebulous objects, for example, irregular or clumpy galaxies, and their structure in noise is introduced. “Noise-based” and “non-parametric” imply that this technique imposes negligible constraints on the properties of the targets and that it employs no regression analysis or fittings. The sub-sky detection threshold is defined and initial detections are found, independently of the sky value. False detections are then estimated and removed using the ambient noise as a reference. This results in a purity level of 0.89 for the final detections as compared to 0.29 for SExtractor when a completeness of 1 is desired for a sample of extremely faint and diffuse mock galaxy profiles. The difference in the mean of the undetected pixels with the known background of mock images is decreased by 4.6 times depending on the diffuseness of the test profiles, quantifying the success in their detection. A non-parametric approach to defining substructure over a detected region is also introduced. NoiseChisel is our software implementation of this new technique. Contrary to the existing signal-based approach to detection, in its various implementations, signal-related parameters such as the image point spread function or known object shapes and models are irrelevant here. Such features make this technique very useful in astrophysical applications such as detection, photometry, or morphological analysis of nebulous objects buried in noise, for example, galaxies that do not generically have a known shape when imaged.
https://arxiv.org/abs/1505.01664
The evolution of surface morphology during the growth of N-polar (000-1) GaN under N-rich conditions is studied by kinetic Monte Carlo (kMC) simulations for two substrate miscuts, 2° and 4°. The results are compared with experimentally observed surface morphologies of (000-1) GaN layers grown by plasma-assisted molecular beam epitaxy. The proposed two-component kMC model of the GaN(000-1) surface, in which both types of atoms (nitrogen and gallium) attach to the surface and diffuse independently, explains that at relatively high step-flow rates (miscut angle < 2°) the low diffusion of gallium adatoms causes surface instabilities and leads to the experimentally observed roughening, while at low step-flow rates (miscut 4°) a smooth surface can be obtained. In the presence of almost immobile nitrogen atoms under N-rich conditions, growth proceeds by two-dimensional island nucleation and coalescence. Additionally, we show that a higher crystal miscut, a lower crystal growth rate or a higher temperature results in a similar smoothening of the surface. We show that the surface also smoothens for growth conditions with very high N excess. The presence of a large number of nitrogen atoms locally changes the mobility of gallium atoms, thus enabling easier coalescence of separated islands.
https://arxiv.org/abs/1509.01035
This paper proposes a new algorithm based on multi-scale stochastic local search with binary representation for training neural networks. In particular, we study the effects of neighborhood evaluation strategies, the effect of the number of bits per weight and that of the maximum weight range used for mapping binary strings to real values. Following this preliminary investigation, we propose a telescopic multi-scale version of local search where the number of bits is increased in an adaptive manner, leading to a faster search and to local minima of better quality. An analysis related to adapting the number of bits in a dynamic way is also presented. The control on the number of bits, which happens in a natural manner in the proposed method, is effective to increase the generalization performance. Benchmark tasks include a highly non-linear artificial problem, a control problem requiring either feed-forward or recurrent architectures for feedback control, and challenging real-world tasks in different application domains. The results demonstrate the effectiveness of the proposed method.
https://arxiv.org/abs/1509.00174
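The telescopic scheme described above can be pictured as bit-flip stochastic local search over binary-encoded weights, repeated with more bits per weight as the search is refined. The sketch below uses a toy quadratic loss in place of a network error and re-initializes at each scale, which is a simplification of the paper's adaptive bit-increase; the weight range, bit schedule and loss are illustrative.

# Multi-scale stochastic local search over binary-encoded weights (toy objective).
import random

def decode(bits, n_bits, w_range=2.0):
    """Map consecutive groups of n_bits to real weights in [-w_range, w_range]."""
    weights = []
    for i in range(0, len(bits), n_bits):
        value = int("".join(map(str, bits[i:i + n_bits])), 2)
        weights.append(-w_range + 2 * w_range * value / (2 ** n_bits - 1))
    return weights

def loss(weights):                               # toy stand-in for network error
    return sum((w - 0.37) ** 2 for w in weights)

random.seed(0)
n_weights = 4
for n_bits in (4, 8, 16):                        # telescopic increase of resolution
    bits = [random.randint(0, 1) for _ in range(n_weights * n_bits)]
    best = loss(decode(bits, n_bits))
    for _ in range(2000):                        # stochastic local search: single bit flips
        j = random.randrange(len(bits))
        bits[j] ^= 1
        cand = loss(decode(bits, n_bits))
        if cand <= best:
            best = cand
        else:
            bits[j] ^= 1                         # revert a worsening flip
    print(n_bits, "bits per weight -> loss", round(best, 6))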
Growth of GaN nanowires is carried out via a metal-initiated vapor-liquid-solid mechanism, with Au as the catalyst. In the chemical vapour deposition technique, GaN nanowires are usually grown at high temperatures in the range of 900-1100 °C because of the low vapor pressure of Ga below 900 °C. In the present study, we have grown GaN nanowires at a temperature as low as 700 °C. The role of indium in the reduction of the growth temperature is discussed within the ambit of Raoult's law. Indium is used to increase the vapor pressure of Ga sufficiently for it to evaporate even at low temperature, initiating the growth of GaN nanowires. In addition to studies of the structural and vibrational properties, the optical properties of the grown nanowires are also reported for detailed structural analysis.
https://arxiv.org/abs/1508.07808
The photoresponse of Au nanoparticle functionalized semiconducting GaN (Au-GaN) nanowires is reported for optical switching using 532 nm excitation. Wide band gap GaN nanowires are grown by a catalyst-assisted chemical vapour deposition technique and functionalized with Au via a chemical route. Au-GaN nanowires show a surface plasmon resonance (SPR) mode of the Au nanoclusters around 550 nm along with the characteristic band of GaN around 365 nm. Optical switching is observed for Au-GaN nanowires with a sub-band-gap excitation of 532 nm, suggesting a possible role of surface plasmon polariton assisted transport of electrons in the system. A role of band conduction is ruled out by the absence of optical switching under 325 nm excitation, which is higher in energy than the reported band gap of GaN of about 3.4 eV (365 nm) at room temperature. A finite interband contribution of Au plays an important role, along with the inter-particle separation. The switching device is also successfully tested for a single GaN nanowire functionalized with Au nanoclusters. A resistivity value of 0.05 Ohm-cm is measured for surface plasmon polariton assisted electrical transport of carriers in the single GaN nanowire.
https://arxiv.org/abs/1508.07801