Recurrent neural networks (RNNs), including long short-term memory (LSTM) RNNs, have produced state-of-the-art results on a variety of speech recognition tasks. However, these models are often too large for deployment on mobile devices with memory and latency constraints. In this work, we study mechanisms for learning compact RNNs and LSTMs via low-rank factorizations and parameter-sharing schemes. Our goal is to investigate redundancies in recurrent architectures where compression can be admitted without losing performance. A hybrid strategy of using structured matrices in the bottom layers and shared low-rank factors in the top layers is found to be particularly effective, reducing the parameters of a standard LSTM by 75% at a small cost of a 0.3% increase in WER on a 2,000-hour English Voice Search task.
https://arxiv.org/abs/1604.02594
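The parameter arithmetic behind this kind of compression is easy to check. Below is a minimal sketch (not the paper's implementation) of factoring a weight matrix into two low-rank factors via truncated SVD and computing the resulting savings; the 512/64 sizes are illustrative:

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Approximate weight matrix W (m x n) by A @ B with A (m x r), B (r x n)
    via truncated SVD -- the basic ingredient of low-rank RNN compression."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # m x r
    B = Vt[:rank, :]             # r x n
    return A, B

def param_savings(m, n, rank):
    """Fraction of parameters saved by storing A and B instead of W."""
    return 1.0 - rank * (m + n) / (m * n)

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
A, B = low_rank_factorize(W, rank=64)
print(A.shape, B.shape)                       # (512, 64) (64, 512)
print(round(param_savings(512, 512, 64), 2))  # 0.75
```

Note that a 512x512 matrix factored at rank 64 already lands in the ~75% reduction regime the abstract reports; whether accuracy survives depends on where in the stack the factorization is applied.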
In this paper, we show that ultrathin GaN membranes with a thickness of 15 nm and planar dimensions of 12x184 microns act as memristive devices. This behavior is due to the migration of negatively charged deep traps, which form in the volume of the membrane during the fabrication process, towards the unoccupied surface states of the suspended membranes. The time constant of the migration process is of the order of tens of seconds and varies with the current or voltage sweep.
https://arxiv.org/abs/1604.02586
We analyse electroencephalogram signals in the beta band of working memory representation, recorded from young healthy volunteers performing several different Visual Short-Term Memory (VSTM) tasks which have proven useful in the assessment of clinical and preclinical Alzheimer's disease. We compare network analysis using Maximum Spanning Trees (MSTs) with network analysis obtained using 20% and 25% connection thresholds on the VSTM data. MSTs are a promising method of network analysis, avoiding the more classical use of thresholds, which have so far been chosen arbitrarily. However, we find that the threshold analyses outperform MSTs for the detection of functional network differences; in particular, MSTs fail to find any significant differences. Further, the threshold analyses detect significant differences between shape and shape-colour binding tasks when these are tested on the left side of the display screen, but no such differences are detected when the tasks are tested on the right side. This provides evidence that contralateral activity is a significant factor in sensitivity for the detection of cognitive task differences.
https://arxiv.org/abs/1604.02404
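For readers unfamiliar with the two network-analysis options being compared, here is a toy sketch (assuming a small symmetric connectivity matrix; not the study's pipeline) of extracting a maximum spanning tree with Prim's algorithm versus keeping the strongest fraction of connections:

```python
import numpy as np

def max_spanning_tree(C):
    """Prim's algorithm for the maximum spanning tree of a symmetric
    connectivity matrix C; returns the n-1 retained edges."""
    n = C.shape[0]
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j not in in_tree and (best is None or C[i, j] > C[best[0], best[1]]):
                    best = (i, j)
        edges.append(best)
        in_tree.add(best[1])
    return edges

def threshold_edges(C, frac=0.20):
    """Keep the strongest `frac` of off-diagonal connections -- the
    20%/25% thresholding the abstract compares against."""
    iu = np.triu_indices_from(C, k=1)
    w = C[iu]
    k = max(1, int(round(frac * w.size)))
    cut = np.sort(w)[-k]
    return [(i, j) for i, j, v in zip(*iu, w) if v >= cut]
```

The MST always keeps exactly n-1 edges regardless of overall connectivity strength, which is what removes the arbitrary threshold choice but may also discard the differences the thresholded networks retain.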
The encoder-decoder framework for neural machine translation (NMT) has been shown effective in large-data scenarios, but is much less effective for low-resource languages. We present a transfer learning method that significantly improves BLEU scores across a range of low-resource languages. Our key idea is to first train a high-resource language pair (the parent model), then transfer some of the learned parameters to the low-resource pair (the child model) to initialize and constrain training. Using our transfer learning method, we improve baseline NMT models by an average of 5.6 BLEU on four low-resource language pairs. Ensembling and unknown-word replacement add another 2 BLEU, which brings NMT performance on low-resource machine translation close to a strong syntax-based machine translation (SBMT) system, exceeding its performance on one language pair. Additionally, using the transfer-learning model for re-scoring, we can improve the SBMT system by an average of 1.3 BLEU, improving the state of the art on low-resource machine translation.
https://arxiv.org/abs/1604.02201
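The parent-to-child initialization can be sketched in a few lines. This is an illustrative toy (the parameter names and freezing choices here are hypothetical, not the paper's actual layer layout): parameters shared between the models are copied from the trained parent, a chosen subset is frozen to constrain training, and child-only parameters start fresh.

```python
def transfer(parent_params, child_params, frozen):
    """Initialize a child model from a trained parent: copy every parent
    parameter the child also has, and mark frozen ones as non-trainable.
    Returns {name: (value, is_trainable)}."""
    out = {}
    for name, value in child_params.items():
        if name in parent_params:
            value = parent_params[name]   # initialize from the parent
        out[name] = (value, name not in frozen)
    return out

# Hypothetical parameter groups for illustration only.
parent = {"encoder": [0.1, 0.2], "decoder": [0.3], "target_embed": [0.9]}
child = {"encoder": [0.0, 0.0], "decoder": [0.0], "source_embed": [0.0]}
out = transfer(parent, child, frozen={"decoder", "target_embed"})
```

Here the child's source embeddings are trained from scratch (the source language differs), while the copied decoder stays fixed, constraining the low-resource fine-tuning.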
Motivated by the application of fact-level image understanding, we present an automatic method for collecting structured visual facts from images with captions. Example structured facts include attributed objects (e.g., <flower, red>), actions (e.g., <baby, smile>), interactions (e.g., <man, walking, dog>), and positional information (e.g., <vase, on, table>). The collected annotations are in the form of fact-image pairs (e.g., <man, walking, dog> and an image region containing this fact). Using a language-based approach, the proposed method is able to collect hundreds of thousands of visual fact annotations with an accuracy of 83% according to human judgment. Our method automatically collected more than 380,000 visual fact annotations and more than 110,000 unique visual facts from images with captions, and localized them in the images, in less than one day of processing time on standard CPU platforms.
https://arxiv.org/abs/1604.00466
We propose a novel object localization methodology with the purpose of boosting the localization accuracy of state-of-the-art object detection systems. Our model, given a search region, aims at returning the bounding box of an object of interest inside this region. To accomplish its goal, it relies on assigning conditional probabilities to each row and column of this region, where these probabilities provide useful information regarding the location of the boundaries of the object inside the search region and allow the accurate inference of the object bounding box under a simple probabilistic framework. For implementing our localization model, we make use of a convolutional neural network architecture that is properly adapted for this task, called LocNet. We show experimentally that LocNet achieves a very significant improvement on the mAP for high IoU thresholds on PASCAL VOC2007 test set and that it can be very easily coupled with recent state-of-the-art object detection systems, helping them to boost their performance. Finally, we demonstrate that our detection approach can achieve high detection accuracy even when it is given as input a set of sliding windows, thus proving that it is independent of box proposal methods.
https://arxiv.org/abs/1511.07763
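As a rough illustration of the row/column probability idea (a simplified "in-out" reading, not LocNet itself), the bounding box inside a search region can be inferred as the tightest span of rows and columns the model is confident lie inside the object:

```python
import numpy as np

def infer_box(p_row, p_col, thr=0.5):
    """Given per-row and per-column 'inside the object' probabilities for a
    search region, return the box (y0, y1, x0, x1) spanning all confident
    rows and columns. Assumes at least one row and column clear `thr`."""
    rows = np.where(p_row > thr)[0]
    cols = np.where(p_col > thr)[0]
    return int(rows[0]), int(rows[-1]), int(cols[0]), int(cols[-1])

p_row = np.array([0.1, 0.2, 0.9, 0.95, 0.8, 0.1])
p_col = np.array([0.05, 0.7, 0.9, 0.6, 0.2])
print(infer_box(p_row, p_col))  # (2, 4, 1, 3)
```

Reducing localization to two 1-D probability vectors is what makes the probabilistic inference over the box cheap compared with regressing four coordinates directly.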
We report the $\gamma$-ray detection of a young radio galaxy, PKS 1718$-$649, belonging to the class of Compact Symmetric Objects (CSOs), with the Large Area Telescope (LAT) on board the {\it Fermi} satellite. The third {\it Fermi} Gamma-ray LAT catalog (3FGL) includes an unassociated $\gamma$-ray source, 3FGL J1728.0$-$6446, located close to PKS 1718$-$649. Using the latest Pass 8 calibration, we confirm that the best fit $1 \sigma$ position of the $\gamma$-ray source is compatible with the radio location of PKS 1718$-$649. Cross-matching of the $\gamma$-ray source position with the positions of blazar sources from several catalogs yields negative results. Thus, we conclude that PKS 1718$-$649 is the most likely counterpart to the unassociated LAT source. We obtain a detection test statistic TS$\sim 36$ ($>$5$\sigma$) with a best fit photon spectral index $\Gamma=$2.9$\pm$0.3 and a 0.1-100 GeV photon flux density $F_{\rm 0.1-100GeV}=$(11.5$\pm$0.3)$\times{\rm 10^{-9}}$ ph cm$^{-2}$ s$^{-1}$. We argue that the linear size ($\sim$2 pc), the kinematic age ($\sim$100 years), and the source distance ($z=0.014$) make PKS 1718$-$649 an ideal candidate for $\gamma$-ray detection in the framework of the model proposing that the most compact and the youngest CSOs can efficiently produce GeV radiation via inverse-Compton scattering of the ambient photon fields by the radio lobe non-thermal electrons. Thus, our detection of the source in $\gamma$-rays establishes young radio galaxies as a distinct class of extragalactic high-energy emitters, and yields a unique insight on the physical conditions in compact radio lobes interacting with the interstellar medium of the host galaxy.
https://arxiv.org/abs/1604.01987
An in situ study of the epitaxial growth of SmN thin films on Ga-polar GaN (0001) templates by molecular beam epitaxy is reported. Using X-ray photoelectron spectroscopy, we found that Ga segregates at the surface during the first stages of growth. We showed that the problem of Ga surface segregation can be suppressed simply by growing a few monolayers of AlN before starting the SmN growth. This results in a significant improvement of the crystallinity of SmN thin films, as assessed by X-ray diffraction.
https://arxiv.org/abs/1604.01900
Visual Question Answering (VQA) problems are attracting increasing interest from multiple research disciplines. Solving VQA problems requires techniques from computer vision for understanding the visual contents of a presented image or video, as well as techniques from natural language processing for understanding the semantics of the question and generating the answers. Regarding visual content modeling, most existing VQA methods adopt the strategy of extracting global features from the image or video, which inevitably fails to capture fine-grained information such as the spatial configuration of multiple objects. Extracting features from auto-generated regions, as some region-based image recognition methods do, cannot essentially address this problem and may introduce overwhelming features irrelevant to the question. In this work, we propose a novel Focused Dynamic Attention (FDA) model to provide image content representations better aligned with the posed questions. Being aware of the key words in the question, FDA employs an off-the-shelf object detector to identify important regions and fuses the information from these regions and the global features via an LSTM unit. Such question-driven representations are then combined with the question representation and fed into a reasoning unit for generating the answers. Extensive evaluation on a large-scale benchmark dataset, VQA, clearly demonstrates the superior performance of FDA over well-established baselines.
https://arxiv.org/abs/1604.01485
We present an approach that exploits hierarchical Recurrent Neural Networks (RNNs) to tackle the video captioning problem, i.e., generating one or multiple sentences to describe a realistic video. Our hierarchical framework contains a sentence generator and a paragraph generator. The sentence generator produces one simple short sentence that describes a specific short video interval. It exploits both temporal- and spatial-attention mechanisms to selectively focus on visual elements during generation. The paragraph generator captures the inter-sentence dependency by taking as input the sentential embedding produced by the sentence generator, combining it with the paragraph history, and outputting the new initial state for the sentence generator. We evaluate our approach on two large-scale benchmark datasets: YouTubeClips and TACoS-MultiLevel. The experiments demonstrate that our approach significantly outperforms the current state-of-the-art methods with BLEU@4 scores 0.499 and 0.305 respectively.
https://arxiv.org/abs/1510.07712
Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have been successfully applied to a variety of sequence modeling tasks. In this paper we develop Tree Long Short-Term Memory (TreeLSTM), a neural network model based on LSTM, which is designed to predict a tree rather than a linear sequence. TreeLSTM defines the probability of a sentence by estimating the generation probability of its dependency tree. At each time step, a node is generated based on the representation of the generated sub-tree. We further enhance the modeling power of TreeLSTM by explicitly representing the correlations between left and right dependents. Application of our model to the MSR sentence completion challenge achieves results beyond the current state of the art. We also report results on dependency parsing reranking achieving competitive performance.
https://arxiv.org/abs/1511.00060
We propose a novel attention-based deep learning architecture for the visual question answering (VQA) task. Given an image and an image-related natural language question, VQA generates a natural language answer to the question. Generating correct answers requires the model's attention to focus on the regions corresponding to the question, because different questions inquire about the attributes of different image regions. We introduce an attention-based configurable convolutional neural network (ABC-CNN) to learn such question-guided attention. ABC-CNN determines an attention map for an image-question pair by convolving the image feature map with configurable convolutional kernels derived from the question's semantics. We evaluate the ABC-CNN architecture on three benchmark VQA datasets: Toronto COCO-QA, DAQUAR, and the VQA dataset. The ABC-CNN model achieves significant improvements over state-of-the-art methods on these datasets. The question-guided attention generated by ABC-CNN is also shown to reflect the regions that are highly relevant to the questions.
https://arxiv.org/abs/1511.05960
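The question-configured convolution can be sketched as follows. This is a toy single-channel version (the model uses multi-channel feature maps and kernels learned from the question embedding): correlate the feature map with a question-derived kernel, then softmax over locations to get an attention map.

```python
import numpy as np

def attention_map(feat, q_kernel):
    """Correlate an image feature map (H x W) with a kernel derived from
    the question, then softmax over valid locations. Positions where the
    kernel does not fit get zero attention."""
    kh, kw = q_kernel.shape
    H, W = feat.shape
    scores = np.full((H, W), -np.inf)
    for y in range(H - kh + 1):
        for x in range(W - kw + 1):
            scores[y, x] = np.sum(feat[y:y + kh, x:x + kw] * q_kernel)
    e = np.exp(scores - scores.max())
    return e / e.sum()
```

Because the kernel is a function of the question, the same image yields different attention maps for different questions, which is the point of the architecture.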
Almost all of the current top-performing object detection networks employ region proposals to guide the search for object instances. State-of-the-art region proposal methods usually need several thousand proposals to get high recall, thus hurting detection efficiency. Although the latest Region Proposal Network method achieves promising detection accuracy with several hundred proposals, it still struggles with small-size object detection and precise localization (e.g., large IoU thresholds), mainly due to the coarseness of its feature maps. In this paper, we present a deep hierarchical network, namely HyperNet, for handling region proposal generation and object detection jointly. Our HyperNet is primarily based on an elaborately designed Hyper Feature which aggregates hierarchical feature maps first and then compresses them into a uniform space. The Hyper Features well incorporate deep but highly semantic, intermediate but really complementary, and shallow but naturally high-resolution features of the image, thus enabling us to construct HyperNet by sharing them both in generating proposals and detecting objects via an end-to-end joint training strategy. For the deep VGG16 model, our method achieves leading recall and state-of-the-art object detection accuracy on PASCAL VOC 2007 and 2012 using only 100 proposals per image. It runs with a speed of 5 fps (including all steps) on a GPU, thus having the potential for real-time processing.
https://arxiv.org/abs/1604.00600
Recurrent neural network architectures combined with an attention mechanism, or neural attention models, have recently shown promising performance on tasks including speech recognition, image caption generation, visual question answering, and machine translation. In this paper, a neural attention model is applied to two sequence classification tasks: dialogue act detection and key term extraction. In these tasks, the model input is a sequence and the output is the label of that sequence. The major difficulty of sequence labeling is that a long input sequence can include many noisy or irrelevant parts. If the information in the whole sequence is treated equally, the noisy or irrelevant parts may degrade classification performance. The attention mechanism is helpful for sequence classification because it is capable of highlighting the important parts of the entire sequence for the classification task. The experimental results show that with the attention mechanism, discernible improvements were achieved on the sequence labeling tasks considered here. The roles of the attention mechanism in these tasks are further analyzed and visualized in this paper.
https://arxiv.org/abs/1604.00077
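A minimal sketch of such attention-weighted sequence classification (names and shapes are illustrative, not the paper's architecture): each timestep's hidden state is scored against a learned query, the softmaxed scores down-weight noisy timesteps, and the weighted summary is classified.

```python
import numpy as np

def attend_and_classify(h, v, w_out):
    """h: (T, D) hidden states, v: (D,) learned attention query,
    w_out: (D, C) output weights. Returns (attention weights, logits)."""
    scores = h @ v                      # (T,) relevance of each timestep
    a = np.exp(scores - scores.max())
    a /= a.sum()                        # softmax attention over the sequence
    context = a @ h                     # (D,) weighted summary; noise suppressed
    return a, context @ w_out           # logits for the sequence label

h = np.array([[1., 0.], [0., 1.], [10., 0.]])  # third timestep matters most
a, logits = attend_and_classify(h, np.array([1., 0.]), np.eye(2))
```

Inspecting `a` directly is what the paper's visualization of the attention's role amounts to: the weights show which parts of the input drove the label.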
We present an image caption system that addresses new challenges of automatically describing images in the wild. The challenges include high caption quality with respect to human judgments, out-of-domain data handling, and the low latency required in many applications. Built on top of a state-of-the-art framework, we developed a deep vision model that detects a broad range of visual concepts, an entity recognition model that identifies celebrities and landmarks, and a confidence model for the caption output. Experimental results show that our caption engine significantly outperforms previous state-of-the-art systems on both an in-domain dataset (i.e., MS COCO) and out-of-domain datasets.
https://arxiv.org/abs/1603.09016
Distributed Search Engine Architecture (DSEA) hosts numerous independent topic-specific search engines and selects a subset of the databases to search within the architecture. The objective of this approach is to reduce the amount of space needed to perform a search by querying only a subset of the total data available. In order to work with data across many databases, it is most efficient to identify a smaller subset of databases that are most likely to return the data of specific interest, which can then be examined in greater detail. The selection index has been most commonly used as a method for choosing the most applicable databases, as it captures broad information about each database and its indexed documents. Employing this type of database selection allows the researcher to find information more quickly, not only at lower cost but also with less potential for bias. This paper investigates the effectiveness of different databases selected within the framework and scope of the distributed search engine architecture. The purpose of the study is to improve the quality of distributed information retrieval.
https://arxiv.org/abs/1603.09434
Online social media have greatly affected the way in which we communicate with each other. However, little is known about what are the fundamental mechanisms driving dynamical information flow in online social systems. Here, we introduce a generative model for online sharing behavior that is analytically tractable and which can reproduce several characteristics of empirical micro-blogging data on hashtag usage, such as (time-dependent) heavy-tailed distributions of meme popularity. The presented framework constitutes a null model for social spreading phenomena which, in contrast to purely empirical studies or simulation-based models, clearly distinguishes the roles of two distinct factors affecting meme popularity: the memory time of users and the connectivity structure of the social network.
https://arxiv.org/abs/1501.05956
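A toy, seeded version of such a generative null model (the parameters and the broadcast rule here are illustrative, not the paper's exact dynamics) makes the two ingredients explicit: a finite user memory and a connectivity structure through which shared memes propagate.

```python
import random

def simulate_shares(n_users=50, steps=5000, mu=0.1, memory=10, seed=1):
    """At each step a random user either posts a brand-new meme (prob. mu,
    or when its memory is empty) or re-shares one drawn from its finite
    memory of recently seen memes. Returns meme -> share count."""
    random.seed(seed)
    mem = [[] for _ in range(n_users)]
    popularity = {}
    next_id = 0
    for _ in range(steps):
        u = random.randrange(n_users)
        if random.random() < mu or not mem[u]:
            meme = next_id
            next_id += 1
        else:
            meme = random.choice(mem[u])
        popularity[meme] = popularity.get(meme, 0) + 1
        # Broadcast to a few random followers' (bounded) memories.
        for v in random.sample(range(n_users), 5):
            mem[v] = (mem[v] + [meme])[-memory:]
    return popularity
```

Plotting the resulting share counts over many runs is the kind of exercise that exposes the heavy-tailed popularity distributions the abstract refers to; the memory length and the follower fan-out play the roles of the two factors the model distinguishes.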
The workflow of extracting features from images using convolutional neural networks (CNNs) and generating captions with recurrent neural networks (RNNs) has become a de facto standard for the image captioning task. However, since CNN features are originally designed for the classification task, they are mostly concerned with the main conspicuous element of the image and often fail to correctly convey information on local, secondary elements. We propose to incorporate coding with a vector of locally aggregated descriptors (VLAD) on a spatial pyramid for CNN features of sub-regions, in order to generate image representations that better reflect the local information of the images. Our results show that our compact VLAD coding can match CNN features with as little as 3% of their dimensionality and, when combined with a spatial pyramid, it results in image captions that more accurately take local elements into account.
https://arxiv.org/abs/1603.09046
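For reference, plain VLAD coding (without the spatial pyramid or the paper's compression step) can be sketched as: assign each local descriptor to its nearest codebook centroid, accumulate the residuals per centroid, and L2-normalize the flattened result.

```python
import numpy as np

def vlad(descriptors, centroids):
    """descriptors: (N, D) local features; centroids: (K, D) codebook.
    Returns the L2-normalized (K*D,) VLAD vector of residual sums."""
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(1)                 # nearest centroid per descriptor
    K, D = centroids.shape
    agg = np.zeros((K, D))
    for i, k in enumerate(assign):
        agg[k] += descriptors[i] - centroids[k]   # accumulate residuals
    v = agg.ravel()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```

The output dimensionality is K*D regardless of how many descriptors a region contains, which is what lets a small codebook stand in for a much larger CNN feature vector.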
Using \textit{multiple streams} can improve overall system performance by mitigating the data-transfer overhead on heterogeneous systems. Prior work has focused largely on GPUs, and little is known about the performance impact on the (Intel Xeon) Phi. In this work, we apply multiple streams to six real-world applications on the Phi. We then systematically evaluate the performance benefits of using multiple streams. The evaluation is performed at two levels: the microbenchmarking level and the real-world application level. Our experimental results at the microbenchmark level show that data transfers and kernel execution can be overlapped on the Phi, while data transfers in both directions are performed in a serial manner. At the real-world application level, we show that both overlappable and non-overlappable applications can benefit from using multiple streams (with a performance improvement of up to 24\%). We also quantify how task granularity and resource granularity impact the overall performance. Finally, we present a set of heuristics to reduce the search space when determining a proper task granularity and resource granularity. To conclude, our evaluation provides many insights for runtime and architecture designers using multiple streams on the Phi.
https://arxiv.org/abs/1603.08619
The present study elucidates the correlation between the structural and optical properties of GaN nanowires grown on Si(111) substrates by the plasma-assisted molecular beam epitaxy (PA-MBE) technique under various growth conditions. GaN NWs exhibiting different shapes, sizes, and distributions were grown at various substrate temperatures with the same Ga/N (III/V) ratio of 0.4. We observe that growth at a lower substrate temperature (~700 degC) results in 2-dimensional island-like structures with very little (~30%) circularity, while increasing the substrate temperature (>770 degC) leads to the growth of individual GaN NWs (>80% circularity) with excellent structural properties. Temperature-dependent photoluminescence measurements, together with an analysis of the Raman-active modes, provide legitimate evidence of a strong correlation between the structural and optical properties of GaN nanowires grown on Si substrates.
https://arxiv.org/abs/1603.08603
Clearly explaining a rationale for a classification decision to an end-user can be as important as the decision itself. Existing approaches for deep visual recognition are generally opaque and do not output any justification text; contemporary vision-language models can describe image content but fail to take into account class-discriminative image aspects which justify visual predictions. We propose a new model that focuses on the discriminating properties of the visible object, jointly predicts a class label, and explains why the predicted label is appropriate for the image. We propose a novel loss function based on sampling and reinforcement learning that learns to generate sentences that realize a global sentence property, such as class specificity. Our results on a fine-grained bird species classification dataset show that our model is able to generate explanations which are not only consistent with an image but also more discriminative than descriptions produced by existing captioning methods.
https://arxiv.org/abs/1603.08507
Despite the recent advances in automatically describing image contents, their applications have been mostly limited to image caption datasets containing natural images (e.g., Flickr 30k, MSCOCO). In this paper, we present a deep learning model to efficiently detect a disease from an image and annotate its contexts (e.g., location, severity and the affected organs). We employ a publicly available radiology dataset of chest x-rays and their reports, and use its image annotations to mine disease names to train convolutional neural networks (CNNs). In doing so, we adopt various regularization techniques to circumvent the large normal-vs-diseased case bias. Recurrent neural networks (RNNs) are then trained to describe the contexts of a detected disease, based on the deep CNN features. Moreover, we introduce a novel approach that uses the weights of the already trained CNN/RNN pair on the domain-specific image/text dataset to infer the joint image/text contexts for composite image labeling. Significantly improved image annotation results are demonstrated using the recurrent neural cascade model by taking the joint image/text contexts into account.
https://arxiv.org/abs/1603.08486
Popular Hough Transform-based object detection approaches usually construct an appearance codebook by clustering local image features. However, how to choose appropriate values for the parameters used in the clustering step remains an open problem. Moreover, some popular histogram features extracted from overlapping image blocks may cause a high degree of redundancy and multicollinearity. In this paper, we propose a novel Hough Transform-based object detection approach. First, to address the above issues, we exploit a Bridge Partial Least Squares (BPLS) technique to establish context-encoded Hough Regression Models (HRMs), which are linear regression models that cast probabilistic Hough votes to predict object locations. BPLS is an efficient variant of Partial Least Squares (PLS). PLS-based regression techniques (including BPLS) can reduce the redundancy and eliminate the multicollinearity of a feature set. The appropriate value of the only parameter used in PLS (i.e., the number of latent components) can be determined using a cross-validation procedure. Second, to efficiently handle object scale changes, we propose a novel multi-scale voting scheme. In this scheme, multiple Hough images corresponding to multiple object scales can be obtained simultaneously. Third, an object in a test image may correspond to multiple true and false positive hypotheses at different scales. Based on the proposed multi-scale voting scheme, a principled strategy is proposed to fuse hypotheses to reduce false positives by evaluating normalized pointwise mutual information between hypotheses. In the experiments, we also compare the proposed HRM approach with several of its variants to evaluate the influences of its components on its performance. Experimental results show that the proposed HRM approach has achieved desirable performances on popular benchmark datasets.
https://arxiv.org/abs/1603.08092
Modern deep neural network based object detection methods typically classify candidate proposals using their interior features. However, global and local surrounding contexts that are believed to be valuable for object detection are not fully exploited by existing methods yet. In this work, we take a step towards understanding what is a robust practice to extract and utilize contextual information to facilitate object detection in practice. Specifically, we consider the following two questions: “how to identify useful global contextual information for detecting a certain object?” and “how to exploit local context surrounding a proposal for better inferring its contents?”. We provide preliminary answers to these questions through developing a novel Attention to Context Convolution Neural Network (AC-CNN) based object detection model. AC-CNN effectively incorporates global and local contextual information into the region-based CNN (e.g. Fast RCNN) detection model and provides better object detection performance. It consists of one attention-based global contextualized (AGC) sub-network and one multi-scale local contextualized (MLC) sub-network. To capture global context, the AGC sub-network recurrently generates an attention map for an input image to highlight useful global contextual locations, through multiple stacked Long Short-Term Memory (LSTM) layers. For capturing surrounding local context, the MLC sub-network exploits both the inside and outside contextual information of each specific proposal at multiple scales. The global and local context are then fused together for making the final decision for detection. Extensive experiments on PASCAL VOC 2007 and VOC 2012 well demonstrate the superiority of the proposed AC-CNN over well-established baselines. In particular, AC-CNN outperforms the popular Fast-RCNN by 2.0% and 2.2% on VOC 2007 and VOC 2012 in terms of mAP, respectively.
https://arxiv.org/abs/1603.07415
Samtla (Search And Mining Tools with Linguistic Analysis) is a digital humanities system designed in collaboration with historians and linguists to assist them with their research work in quantifying the content of textual corpora through approximate phrase search and document comparison. The retrieval engine uses a character-based n-gram language model rather than the conventional word-based one so as to achieve great flexibility in language-agnostic query processing. The index is implemented as a space-optimised character-based suffix tree with an accompanying database of document content and metadata. A number of text mining tools are integrated into the system to allow researchers to discover textual patterns, perform comparative analysis, and find out what is currently popular in the research community. Herein we describe the system architecture, user interface, models and algorithms, and data storage of the Samtla system. We also present several case studies of its usage in practice, together with an evaluation of the system's ranking performance through crowdsourcing.
https://arxiv.org/abs/1603.07150
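A character-based n-gram model of the kind the retrieval engine builds on can be sketched with simple counts and add-one smoothing (the real system indexes these via a space-optimised suffix tree, which this sketch does not attempt):

```python
from collections import defaultdict

class CharNgramLM:
    """Character-level n-gram counts with add-one smoothing: a minimal,
    language-agnostic model, since it needs no tokenizer."""

    def __init__(self, n=3):
        self.n = n
        self.ngrams = defaultdict(int)    # full n-gram counts
        self.contexts = defaultdict(int)  # (n-1)-gram context counts
        self.vocab = set()                # characters seen as continuations

    def train(self, text):
        for i in range(len(text) - self.n + 1):
            gram = text[i:i + self.n]
            self.ngrams[gram] += 1
            self.contexts[gram[:-1]] += 1
            self.vocab.add(gram[-1])

    def prob(self, context, ch):
        """P(ch | context) with add-one smoothing over the seen vocabulary."""
        v = len(self.vocab)
        return (self.ngrams[context + ch] + 1) / (self.contexts[context] + v)
```

Because the unit is the character rather than the word, the same model scores queries in any script, which is what "language agnostic" amounts to here.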
Building upon recent Deep Neural Network architectures, current approaches lying in the intersection of computer vision and natural language processing have achieved unprecedented breakthroughs in tasks like automatic captioning or image retrieval. Most of these learning methods, though, rely on large training sets of images associated with human annotations that specifically describe the visual content. In this paper we propose to go a step further and explore the more complex cases where textual descriptions are loosely related to the images. We focus on the particular domain of News articles in which the textual content often expresses connotative and ambiguous relations that are only suggested but not directly inferred from images. We introduce new deep learning methods that address source detection, popularity prediction, article illustration and geolocation of articles. An adaptive CNN architecture is proposed, that shares most of the structure for all the tasks, and is suitable for multitask and transfer learning. Deep Canonical Correlation Analysis is deployed for article illustration, and a new loss function based on Great Circle Distance is proposed for geolocation. Furthermore, we present BreakingNews, a novel dataset with approximately 100K news articles including images, text and captions, and enriched with heterogeneous meta-data (such as GPS coordinates and popularity metrics). We show this dataset to be appropriate to explore all aforementioned problems, for which we provide a baseline performance using various Deep Learning architectures, and different representations of the textual and visual features. We report very promising results and bring to light several limitations of current state-of-the-art in this kind of domain, which we hope will help spur progress in the field.
https://arxiv.org/abs/1603.07141
Object detection in optical remote sensing images, being a fundamental but challenging problem in the field of aerial and satellite image analysis, plays an important role in a wide range of applications and has been receiving significant attention in recent years. While numerous methods exist, a deep review of the literature concerning generic object detection is still lacking. This paper aims to provide a review of the recent progress in this field. Different from several previously published surveys that focus on a specific object class such as buildings or roads, we concentrate on more generic object categories including, but not limited to, roads, buildings, trees, vehicles, ships, airports, and urban areas. Covering about 270 publications, we survey 1) template matching-based object detection methods, 2) knowledge-based object detection methods, 3) object-based image analysis (OBIA)-based object detection methods, 4) machine learning-based object detection methods, and 5) five publicly available datasets and three standard evaluation metrics. We also discuss the challenges of current studies and propose two promising research directions, namely deep learning-based feature representation and weakly supervised learning-based geospatial object detection. It is our hope that this survey will help researchers gain a better understanding of this research field.
https://arxiv.org/abs/1603.06201
This paper introduces an active object detection and localization framework that combines a robust untextured object detection and 3D pose estimation algorithm with a novel next-best-view selection strategy. We address the detection and localization problems by proposing an edge-based registration algorithm that refines the object position by minimizing a cost directly extracted from a 3D image tensor that encodes the minimum distance to an edge point in a joint direction/location space. We tackle the next-best-view problem by exploiting a sequential decision process that, at each step, selects the next camera position which maximizes the mutual information between the state and the next observations. We solve the intrinsic intractability of this solution by generating observations that represent scene realizations, i.e. combinations of object hypotheses provided by the object detector, while modeling the state by means of a set of constantly resampled particles. Experiments performed on different challenging, real-world datasets confirm the effectiveness of the proposed methods.
https://arxiv.org/abs/1603.07022
By incorporating a multilayer network and time-decaying memory into the original voter model, the coupled effects of spatial and temporal cumulation of peer pressure on consensus are investigated. Heterogeneity in peer pressure and the time-decaying mechanism are both found to be detrimental to consensus. The transition points, below which a consensus can always be reached and above which two opposed opinions are more likely to coexist, are identified. A mean-field analysis indicates that the phase transitions in the present model are governed by the cumulative influence of peer pressure and the updating threshold. A functional relation between the consensus threshold and the decaying rate of the influence of peer pressure is found. The time to reach a consensus is governed by the coupling of the memory length and the decaying rate; an intermediate decaying rate may lead to a much shorter consensus time.
https://arxiv.org/abs/1603.06650
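The model's key ingredient, cumulative time-decaying peer pressure driving opinion flips, can be sketched in a deliberately simplified form; the ring topology, the parameter names (`decay`, `threshold`), and the flip rule below are illustrative stand-ins for the paper's multilayer model, not its actual dynamics:

```python
import random

def step(opinions, memory, decay=0.5, threshold=0.5):
    """One sweep of a toy voter model with time-decaying memory: each node
    accumulates decayed peer pressure from its ring neighbours and flips
    its opinion once the cumulative pressure exceeds a threshold."""
    n = len(opinions)
    for i in range(n):
        neighbours = [opinions[(i - 1) % n], opinions[(i + 1) % n]]
        pressure = sum(1 for o in neighbours if o != opinions[i]) / len(neighbours)
        memory[i] = decay * memory[i] + pressure  # cumulative, decaying influence
        if memory[i] > threshold:
            opinions[i] = 1 - opinions[i]  # flip between the two opinions
            memory[i] = 0.0                # reset accumulated pressure
    return opinions, memory

random.seed(0)
opinions = [random.randint(0, 1) for _ in range(50)]
memory = [0.0] * 50
for _ in range(200):
    opinions, memory = step(opinions, memory)
```

Varying `decay` in such a simulation is the kind of experiment behind the paper's observation that an intermediate decaying rate can shorten the time to consensus.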
Many effective supervised discriminative dictionary learning methods have been developed in the literature. However, when training these algorithms, precise ground truth is required in the form of very accurate point-wise labels. Yet, in many applications, obtaining accurate labels is not always feasible. This is especially true in the case of buried object detection, in which the sizes of the objects are not consistent. In this paper, a new multiple instance dictionary learning algorithm for detecting buried objects using a handheld WEMI sensor is detailed. The new algorithm, Task Driven Extended Functions of Multiple Instances, can cope with data that lacks very precise point-wise labels and still learn a highly discriminative dictionary. Results are presented and discussed on measured WEMI data.
https://arxiv.org/abs/1603.06121
We address the problem of Visual Question Answering (VQA), which requires joint image and language understanding to answer a question about a given photograph. Recent approaches have applied deep image captioning methods based on convolutional-recurrent networks to this problem, but have failed to model spatial inference. To remedy this, we propose a model we call the Spatial Memory Network and apply it to the VQA task. Memory networks are recurrent neural networks with an explicit attention mechanism that selects certain parts of the information stored in memory. Our Spatial Memory Network stores neuron activations from different spatial regions of the image in its memory, and uses the question to choose relevant regions for computing the answer, a process which constitutes a single “hop” in the network. We propose a novel spatial attention architecture that aligns words with image patches in the first hop, and obtain improved results by adding a second attention hop which considers the whole question to choose visual evidence based on the results of the first hop. To better understand the inference process learned by the network, we design synthetic questions that specifically require spatial inference, and visualize the attention weights. We evaluate our model on two published visual question answering datasets, DAQUAR [1] and VQA [2], and obtain improved results compared to a strong deep baseline model (iBOWIMG) which concatenates image and question features to predict the answer [3].
https://arxiv.org/abs/1511.05234
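The core of a single memory "hop" — scoring spatial regions against the question and pooling the attended evidence — can be sketched as follows; the shapes, the dot-product scoring, and the function names are illustrative simplifications, not the paper's exact architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()

def attention_hop(regions, query):
    """One memory hop: score each spatial region against the question
    embedding, then return the attention-weighted evidence vector."""
    scores = regions @ query           # (R,) one relevance score per region
    weights = softmax(scores)          # attention distribution over regions
    return weights @ regions, weights  # (D,) pooled evidence, (R,) weights

rng = np.random.default_rng(0)
regions = rng.normal(size=(49, 8))  # e.g. a 7x7 grid of region features
query = rng.normal(size=8)          # question embedding
evidence, weights = attention_hop(regions, query)
```

A second hop, as in the paper, would re-score the regions using the pooled evidence from the first hop together with the whole-question representation.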
In this paper, we present a novel and efficient architecture for addressing computer vision problems that use 'Analysis by Synthesis'. Analysis by synthesis involves the minimization of a reconstruction error which is typically a non-convex function of the latent target variables. State-of-the-art methods adopt a hybrid scheme where discriminatively trained predictors like Random Forests or Convolutional Neural Networks are used to initialize local search algorithms. While these methods have been shown to produce promising results, they often get stuck in local optima. Our method goes beyond the conventional hybrid architecture not only by proposing multiple accurate initial solutions but also by defining a navigational structure over the solution space that can be used for extremely efficient gradient-free local search. We demonstrate the efficacy of our approach on the challenging problem of RGB Camera Relocalization. To make the RGB camera relocalization problem particularly challenging, we introduce a new dataset of 3D environments which are significantly larger than those found in other publicly available datasets. Our experiments reveal that the proposed method is able to achieve state-of-the-art camera relocalization results. We also demonstrate the generalizability of our approach on Hand Pose Estimation and Image Retrieval tasks.
https://arxiv.org/abs/1603.05772
This paper highlights distinctive features of the “SP theory of intelligence” and its apparent advantages compared with some AI-related alternatives. Distinctive features and advantages are: simplification and integration of observations and concepts; simplification and integration of structures and processes in computing systems; the theory is itself a theory of computing; it can be the basis for new architectures for computers; information compression via the matching and unification of patterns and, more specifically, via multiple alignment, is fundamental; transparency in the representation and processing of knowledge; the discovery of ‘natural’ structures via information compression (DONSVIC); interpretations of mathematics; interpretations in human perception and cognition; and realisation of abstract concepts in terms of neurons and their inter-connections (“SP-neural”). These features are discussed in relation to AI-related alternatives: minimum length encoding and related concepts; deep learning in neural networks; unified theories of cognition and related research; universal search; Bayesian networks and more; pattern recognition and vision; the analysis, production, and translation of natural language; unsupervised learning of natural language; exact and inexact forms of reasoning; representation and processing of diverse forms of knowledge; IBM’s Watson; software engineering; solving problems associated with big data; and the development of intelligence in autonomous robots. In conclusion, the SP system can provide a firm foundation for the long-term development of AI, with many potential benefits and applications. It may also deliver useful results on relatively short timescales. A high-parallel, open-source version of the SP machine, derived from the SP computer model, would be a means for researchers everywhere to explore what can be done with the system, and to create new versions of it.
https://arxiv.org/abs/1508.04087
We develop a deep learning algorithm for contour detection with a fully convolutional encoder-decoder network. Different from previous low-level edge detection, our algorithm focuses on detecting higher-level object contours. Our network is trained end-to-end on PASCAL VOC with refined ground truth from inaccurate polygon annotations, yielding much higher precision in object contour detection than previous methods. We find that the learned model generalizes well to unseen object classes from the same super-categories on MS COCO and can match state-of-the-art edge detection on BSDS500 with fine-tuning. By combining with the multiscale combinatorial grouping algorithm, our method can generate high-quality segmented object proposals, which significantly advance the state-of-the-art on PASCAL VOC (improving average recall from 0.62 to 0.67) with a relatively small amount of candidates ($\sim$1660 per image).
https://arxiv.org/abs/1603.04530
Sequential activation of neurons is a common feature of network activity during a variety of behaviors, including working memory and decision making. Previous network models for sequences and memory emphasized specialized architectures in which a principled mechanism is pre-wired into their connectivity. Here we demonstrate that, starting from random connectivity and modifying a small fraction of connections, a largely disordered recurrent network can produce sequences and implement working memory efficiently. We use this process, called Partial In-Network Training (PINning), to model and match cellular resolution imaging data from the posterior parietal cortex during a virtual memory-guided two-alternative forced-choice task. Analysis of the connectivity reveals that sequences propagate by the cooperation between recurrent synaptic interactions and external inputs, rather than through feedforward or asymmetric connections. Together our results suggest that neural sequences may emerge through learning from largely unstructured network architectures.
https://arxiv.org/abs/1603.04687
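The defining constraint of PINning — only a small fraction of an otherwise random recurrent weight matrix is plastic — can be sketched as a masked weight update; the learning rule itself is replaced here by a generic gradient stand-in, so this illustrates the sparsity constraint, not the paper's actual training procedure:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100
W = rng.normal(scale=1.0 / np.sqrt(N), size=(N, N))  # random recurrent weights

# PINning-style constraint: only a small fraction of connections is plastic.
frac = 0.1
mask = rng.random((N, N)) < frac

def masked_update(W, grad, lr=0.01):
    """Apply a learning step only to the plastic (masked) connections;
    the rest of the random connectivity stays frozen. `grad` stands in
    for whatever error signal the training rule provides."""
    return W - lr * (grad * mask)

grad = rng.normal(size=(N, N))
W_new = masked_update(W, grad)
changed = np.count_nonzero(W_new != W)  # only masked entries move
```

Keeping roughly 90% of the connectivity frozen and random is what makes the paper's finding notable: structured sequences emerge without a pre-wired feedforward architecture.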
In this paper, we propose a multi-object detection and tracking method using depth cameras. Depth maps are very noisy, which makes object detection in them difficult. We first propose a region-based method to suppress high-magnitude noise which cannot be filtered using spatial filters. Second, the proposed method detects Regions of Interest via temporal learning, which are then tracked using a weighted graph-based approach. We demonstrate the performance of the proposed method on standard depth camera datasets with and without object occlusions. Experimental results show that the proposed method is able to suppress high-magnitude noise in depth maps and detect/track the objects, with and without occlusion.
https://arxiv.org/abs/1603.03783
Similarity-preserving hashing is a commonly used method for nearest neighbour search in large-scale image retrieval. For image retrieval, deep-networks-based hashing methods are appealing since they can simultaneously learn effective image representations and compact hash codes. This paper focuses on deep-networks-based hashing for multi-label images, each of which may contain objects of multiple categories. In most existing hashing methods, each image is represented by one piece of hash code, which is referred to as semantic hashing. This setting may be suboptimal for multi-label image retrieval. To solve this problem, we propose a deep architecture that learns instance-aware image representations for multi-label image data, which are organized in multiple groups, with each group containing the features for one category. The instance-aware representations not only bring advantages to semantic hashing, but also can be used in category-aware hashing, in which an image is represented by multiple pieces of hash codes and each piece of code corresponds to a category. Extensive evaluations conducted on several benchmark datasets demonstrate that, for both semantic hashing and category-aware hashing, the proposed method shows substantial improvement over the state-of-the-art supervised and unsupervised hashing methods.
https://arxiv.org/abs/1603.03234
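The category-aware hashing idea — one piece of hash code per category group — can be sketched with random sign projections standing in for the learned deep network; the grouping, projection, and function names below are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def category_aware_codes(features, projections):
    """Toy sketch of category-aware hashing: features come in one group
    per category, and each group is hashed to its own code piece via the
    sign of a (here random, in the paper learned) projection."""
    return {cat: (feat @ projections[cat] > 0).astype(np.uint8)
            for cat, feat in features.items()}

rng = np.random.default_rng(0)
dims, bits = 32, 12
# One feature group per category present in the image.
features = {"dog": rng.normal(size=dims), "car": rng.normal(size=dims)}
projections = {cat: rng.normal(size=(dims, bits)) for cat in features}
codes = category_aware_codes(features, projections)
```

At retrieval time, matching per-category code pieces (rather than one whole-image code) is what lets a multi-label query find images sharing any one of its object categories.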
We focus on the task of amodal 3D object detection in RGB-D images, which aims to produce a 3D bounding box of an object in metric form at its full extent. We introduce Deep Sliding Shapes, a 3D ConvNet formulation that takes a 3D volumetric scene from an RGB-D image as input and outputs 3D object bounding boxes. In our approach, we propose the first 3D Region Proposal Network (RPN) to learn objectness from geometric shapes and the first joint Object Recognition Network (ORN) to extract geometric features in 3D and color features in 2D. In particular, we handle objects of various sizes by training an amodal RPN at two different scales and an ORN to regress 3D bounding boxes. Experiments show that our algorithm outperforms the state-of-the-art by 13.8 in mAP and is 200x faster than the original Sliding Shapes. All source code and pre-trained models will be available on GitHub.
https://arxiv.org/abs/1511.02300
We compare the temperature dependence of optical and electrical characteristics of commercially available GaN light-emitting diodes (LEDs) grown on silicon and sapphire substrates. Contrary to conventional expectations, LEDs grown on silicon substrates, commonly referred to as GaN-on-Si LEDs, show less efficiency droop at higher temperatures even with more threading dislocations. Analysis of the junction temperature reveals that GaN-on-Si LEDs have a cooler junction despite sharing identical epitaxial structures and packaging compared to LEDs grown on sapphire substrates. We also observe a decrease in ideality factor with increase in ambient temperature for GaN-on-Si LEDs, indicating an increase in ideal diode current with temperature. Analysis of the strain and temperature coefficient measurements suggests that there is an increase in hole transport efficiency within the active region for GaN-on-Si LEDs compared to the LEDs grown on sapphire, which accounts for the less temperature-dependent efficiency droop.
https://arxiv.org/abs/1603.02338
ZeroDB is an end-to-end encrypted database that enables clients to operate on (search, sort, query, and share) encrypted data without exposing encryption keys or cleartext data to the database server. The familiar client-server architecture is unchanged, but query logic and encryption keys are pushed client-side. Since the server has no insight into the nature of the data, the risk of data being exposed via a server-side data breach is eliminated. Even if the server is successfully infiltrated, adversaries would not have access to the cleartext data and cannot derive anything useful out of disk or RAM snapshots. ZeroDB provides end-to-end encryption while maintaining much of the functionality expected of a modern database, such as full-text search, sort, and range queries. Additionally, ZeroDB uses proxy re-encryption and/or delta key technology to enable secure, granular sharing of encrypted data without exposing keys to the server and without sharing the same encryption key between users of the database.
https://arxiv.org/abs/1602.07168
The threshold current density of narrow (1.5 μm) ridge-waveguide InGaN multi-quantum-well laser diodes, as well as the shape of their lateral far-field patterns, strongly depends on the etch depth of the ridge waveguide. Both effects can be attributed to strong index antiguiding. A value of the antiguiding factor R = 10 is experimentally determined near threshold by measurements of the current-dependent gain and refractive index spectra. The device performance is simulated by self-consistently solving the Schrödinger-Poisson equations together with the equations for charge transport and waveguiding. Assuming a carrier-induced index change which matches the experimentally determined antiguiding factor, both the measured high threshold current and the shape of the far-field pattern of lasers with shallow ridges can be reproduced theoretically.
https://arxiv.org/abs/1603.02528
Review fraud is a pervasive problem in online commerce, in which fraudulent sellers write or purchase fake reviews to manipulate perception of their products and services. Fake reviews are often detected based on several signs, including 1) they occur in short bursts of time; and 2) fraudulent user accounts have skewed rating distributions. However, these signs may not both be present in any given dataset. Hence, in this paper, we propose an approach for detecting fraudulent reviews which combines these two signals in a principled manner, allowing successful detection even when one of them is not present. To combine them, we formulate our Bayesian Inference for Rating Data (BIRD) model, a flexible Bayesian model of user rating behavior. Based on our model we formulate a likelihood-based suspiciousness metric, Normalized Expected Surprise Total (NEST). We propose a linear-time algorithm for performing Bayesian inference using our model and computing the metric. Experiments on real data show that BIRDNEST successfully spots review fraud in large, real-world graphs: the 50 most suspicious users of the Flipkart platform flagged by our algorithm were investigated, and all were identified as fraudulent by domain experts at Flipkart.
http://arxiv.org/abs/1511.06030
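The intuition behind a likelihood-based suspiciousness score can be shown with a much simpler stand-in for NEST: score each user by the average negative log-likelihood ("surprise") of their ratings under the global rating distribution. This toy metric and its names are illustrative assumptions, not the paper's BIRD model:

```python
import math

def surprise(user_ratings, global_probs):
    """Toy suspiciousness score: average negative log-likelihood of a
    user's ratings under the global rating distribution. Users whose
    ratings are improbable globally (e.g. all-ones bursts) score high."""
    return -sum(math.log(global_probs[r]) for r in user_ratings) / len(user_ratings)

# Hypothetical global distribution over 1-5 star ratings.
global_probs = {1: 0.05, 2: 0.05, 3: 0.10, 4: 0.30, 5: 0.50}
normal_user = [5, 4, 5, 4, 3]
burst_user = [1, 1, 1, 1, 1]  # skewed, all-ones rating pattern
assert surprise(burst_user, global_probs) > surprise(normal_user, global_probs)
```

The actual NEST metric additionally models temporal bursts and per-user behavior in a full Bayesian framework, which is what lets it stay robust when one signal is absent.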
We present TTC, an open-source parallel compiler for multidimensional tensor transpositions. In order to generate high-performance C++ code, TTC explores a number of optimizations, including software prefetching, blocking, loop-reordering, and explicit vectorization. To evaluate the performance of multidimensional transpositions across a range of possible use-cases, we also release a benchmark covering arbitrary transpositions of up to six dimensions. Performance results show that the routines generated by TTC achieve close to peak memory bandwidth on both the Intel Haswell and the AMD Steamroller architectures, and yield significant performance gains over modern compilers. By implementing a set of pruning heuristics, TTC allows users to limit the number of potential solutions; this option is especially useful when dealing with high-dimensional tensors, as the search space might become prohibitively large. Experiments indicate that when only 100 potential solutions are considered, the resulting performance is about 99% of that achieved with exhaustive search.
https://arxiv.org/abs/1603.02297
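Of the optimizations listed, blocking (loop tiling) is the easiest to show in isolation; TTC emits vectorized, prefetching C++, whereas the sketch below only illustrates the tiling idea for a 2D transpose in plain Python (the `block` parameter and layout are illustrative):

```python
def transpose_blocked(a, rows, cols, block=4):
    """Cache-blocked transpose of a flat row-major matrix: iterate over
    block x block tiles so that reads and writes stay within small,
    cache-friendly regions instead of striding across the whole array."""
    out = [0] * (rows * cols)
    for ii in range(0, rows, block):
        for jj in range(0, cols, block):
            for i in range(ii, min(ii + block, rows)):
                for j in range(jj, min(jj + block, cols)):
                    out[j * rows + i] = a[i * cols + j]
    return out

a = list(range(6 * 5))          # a 6x5 row-major matrix
t = transpose_blocked(a, 6, 5)  # its 5x6 transpose
```

In generated C++ the same tiling bounds the working set per tile, which is where the near-peak memory bandwidth reported in the paper comes from; loop reordering and explicit vectorization then operate within each tile.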
Salient object detection has recently witnessed substantial progress due to powerful features extracted using deep convolutional neural networks (CNNs). However, existing CNN-based methods operate at the patch level instead of the pixel level. Resulting saliency maps are typically blurry, especially near the boundary of salient objects. Furthermore, image patches are treated as independent samples even when they are overlapping, giving rise to significant redundancy in computation and storage. In this CVPR 2016 paper, we propose an end-to-end deep contrast network to overcome the aforementioned limitations. Our deep network consists of two complementary components, a pixel-level fully convolutional stream and a segment-wise spatial pooling stream. The first stream directly produces a saliency map with pixel-level accuracy from an input image. The second stream extracts segment-wise features very efficiently, and better models saliency discontinuities along object boundaries. Finally, a fully connected CRF model can be optionally incorporated to improve spatial coherence and contour localization in the fused result from these two streams. Experimental results demonstrate that our deep model significantly improves the state of the art.
https://arxiv.org/abs/1603.01976
Most tasks in natural language processing can be cast into question answering (QA) problems over language input. We introduce the dynamic memory network (DMN), a neural network architecture which processes input sequences and questions, forms episodic memories, and generates relevant answers. Questions trigger an iterative attention process which allows the model to condition its attention on the inputs and the result of previous iterations. These results are then reasoned over in a hierarchical recurrent sequence model to generate answers. The DMN can be trained end-to-end and obtains state-of-the-art results on several types of tasks and datasets: question answering (Facebook’s bAbI dataset), text classification for sentiment analysis (Stanford Sentiment Treebank) and sequence modeling for part-of-speech tagging (WSJ-PTB). The training for these different tasks relies exclusively on trained word vector representations and input-question-answer triplets.
https://arxiv.org/abs/1506.07285
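The iterative attention process can be sketched as repeated passes over encoded input facts, with each pass conditioned on the question and the memory from previous iterations; the weighted-sum update below replaces the DMN's gated recurrent machinery, so treat it as an illustrative simplification:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def episodic_memory(facts, question, hops=3):
    """Sketch of the DMN's episodic memory: each hop attends over the
    facts conditioned on the question AND the current memory, then folds
    the attended evidence back into the memory."""
    memory = question.copy()
    for _ in range(hops):
        scores = facts @ (question + memory)  # condition on question + prior result
        weights = softmax(scores)             # iterative attention
        memory = memory + weights @ facts     # episode update (GRU in the real DMN)
    return memory

rng = np.random.default_rng(0)
facts = rng.normal(size=(10, 16))  # encoded input sentences
question = rng.normal(size=16)     # encoded question
m = episodic_memory(facts, question)
```

Conditioning each hop on the previous memory is what allows multi-step (transitive) reasoning: a fact that is irrelevant on the first pass can become relevant once an intermediate fact has been attended to.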
The realization of a network of quantum registers is an outstanding challenge in quantum science and technology. We experimentally investigate a network node that consists of a single nitrogen-vacancy (NV) center electronic spin hyperfine-coupled to nearby nuclear spins. We demonstrate individual control and readout of five nuclear spin qubits within one node. We then characterize the storage of quantum superpositions in individual nuclear spins under repeated application of a probabilistic optical inter-node entangling protocol. We find that the storage fidelity is limited by dephasing during the electronic spin reset after failed attempts. By encoding quantum states into a decoherence-protected subspace of two nuclear spins we show that quantum coherence can be maintained for over 1000 repetitions of the remote entangling protocol. These results and insights pave the way towards remote entanglement purification and the realisation of a quantum repeater using NV center quantum network nodes.
https://arxiv.org/abs/1603.01602
Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering. One such architecture, the dynamic memory network (DMN), obtained high accuracy on a variety of language tasks. However, it was not shown whether the architecture achieves strong results for question answering when supporting facts are not marked during training, or whether it could be applied to other modalities such as images. Based on an analysis of the DMN, we propose several improvements to its memory and input modules. Together with these changes we introduce a novel input module for images in order to be able to answer visual questions. Our new DMN+ model improves the state of the art on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset without supporting fact supervision.
https://arxiv.org/abs/1603.01417
Epitaxy of semiconductors is a process of tremendous importance in applied science and the optoelectronics industry. Controlling defects introduced during epitaxial growth is a key point in manufacturing devices of high efficiency and durability. In this work, we demonstrate how useful hybrid reflections are in the study of epitaxial structures with anisotropic strain gradients due to patterned substrates. The high accuracy with which elastic and plastic relaxation can be detected and distinguished is one of the greatest advantages of measuring this type of reflection, as is the fact that it can be exploited in symmetrical reflection geometry on a commercial high-resolution diffractometer.
https://arxiv.org/abs/1603.00793
First-principles calculations are made for the primary pyroelectric coefficients of wurtzite GaN and ZnO. The pyroelectricity is attributed to the quasiharmonic thermal shifts of internal strains (internal displacements of cations and anions carrying their Born effective charges). The primary (zero-external-strain) pyroelectricity dominates at low temperatures, while the secondary pyroelectricity (the correction from external thermal strains) becomes comparable with the primary pyroelectricity at high temperatures. Contributions from the acoustic and the optical phonon modes to the primary pyroelectric coefficient are only moderately well described by the corresponding Debye function and Einstein function respectively.
https://arxiv.org/abs/1603.00657
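For reference, the Debye and Einstein functions to which the acoustic- and optical-mode contributions are respectively compared have the standard heat-capacity forms (here $x_D = \Theta_D/T$ with Debye temperature $\Theta_D$, and $x = \hbar\omega/k_B T$ for an optical mode of frequency $\omega$):

```latex
% Standard Einstein and Debye functions (heat-capacity forms):
E(x) = \frac{x^{2} e^{x}}{\left(e^{x}-1\right)^{2}}, \qquad
D(x_D) = \frac{3}{x_D^{3}} \int_{0}^{x_D} \frac{t^{4} e^{t}}{\left(e^{t}-1\right)^{2}}\, dt
```

The abstract's point is that the phonon-mode contributions to the primary pyroelectric coefficient follow these canonical temperature dependences only moderately well.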
Although recent advances in regional Convolutional Neural Networks (CNNs) enable them to outperform conventional techniques on standard object detection and classification tasks, their response time is still too slow for real-time performance. To address this issue, we propose a method for region proposal as an alternative to selective search, which is used in current state-of-the-art object detection algorithms. We evaluate our Keypoint Density-based Region Proposal (KDRP) approach and show that it speeds up detection and classification on fine-grained tasks by 100% versus the existing selective search region proposal technique without compromising classification accuracy. KDRP thus makes the application of CNNs to real-time detection and classification feasible.
https://arxiv.org/abs/1603.00502
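A hypothetical sketch of the keypoint-density idea: slide a window over the image, score it by how many detected keypoints it contains, and keep the densest windows as proposals. The window size, stride, scoring, and function names below are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def keypoint_density_proposals(keypoints, img_w, img_h, win=64, stride=32, top_k=5):
    """Score each sliding window by the number of keypoints inside it
    and return the top_k densest windows as (x, y, w, h) proposals."""
    proposals = []
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            inside = np.sum((keypoints[:, 0] >= x) & (keypoints[:, 0] < x + win) &
                            (keypoints[:, 1] >= y) & (keypoints[:, 1] < y + win))
            proposals.append((int(inside), (x, y, win, win)))
    proposals.sort(key=lambda p: p[0], reverse=True)
    return [box for _, box in proposals[:top_k]]

rng = np.random.default_rng(0)
pts = rng.integers(0, 256, size=(200, 2))  # stand-in keypoint (x, y) coordinates
boxes = keypoint_density_proposals(pts, 256, 256)
```

Because keypoint detection and window counting are cheap compared with selective search's segmentation hierarchy, a scheme of this shape can plausibly halve proposal time, which is the trade-off the abstract reports.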