Query Understanding is concerned with inferring the precise intent behind the query a user formulates, which is challenging because queries are often very short and ambiguous. The report discusses the various kinds of queries that can be posed to a search engine and illustrates the role of Query Understanding in returning relevant results. Driven by advances in techniques for deep understanding of queries as well as documents, search technology has gone through three major eras, and real-world examples are used to illustrate the role of Query Understanding in each of them. The Query Understanding module is responsible for correcting mistakes the user makes in the query, guiding the user toward formulating a query with precise intent, and precisely inferring the intent of the user's query. The report describes a complete architecture for handling these three tasks, and then discusses both basic and recent advanced techniques for each component, drawing on papers from reputed conferences and journals.
https://arxiv.org/abs/1505.05187
Owing to their large breakdown electric fields, power devices based on wide-bandgap semiconductors such as SiC, GaN, Ga2O3 and diamond are the focus for next-generation power switching applications. The unipolar trade-off between specific on-resistance and breakdown voltage is often employed to compare the performance limits of various materials. The GaN material system has a unique advantage: the prominent spontaneous and piezoelectric polarization effects in GaN, AlN, InN and AlxInyGaN alloys, together with the flexibility to insert appropriate heterojunctions, dramatically broaden the device design space.
https://arxiv.org/abs/1505.04651
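For reference, the one-dimensional unipolar limit that this trade-off is usually quoted from can be written as follows; the prefactor depends on the punch-through assumption, so this is the commonly cited textbook form rather than an expression taken from the paper:

$$ R_{\mathrm{on,sp}} = \frac{4\,BV^{2}}{\varepsilon_{s}\,\mu_{n}\,E_{c}^{3}} $$

where $BV$ is the breakdown voltage, $\varepsilon_{s}$ the semiconductor permittivity, $\mu_{n}$ the electron mobility, and $E_{c}$ the critical electric field; the cubic dependence on $E_{c}$ is why wide-bandgap materials shift the limit so strongly.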
We explore a variety of nearest neighbor baseline approaches for image captioning. These approaches find a set of nearest neighbor images in the training set from which a caption may be borrowed for the query image. We select a caption for the query image by finding the caption that best represents the “consensus” of the set of candidate captions gathered from the nearest neighbor images. When measured by automatic evaluation metrics on the MS COCO caption evaluation server, these approaches perform as well as many recent approaches that generate novel captions. However, human studies show that a method that generates novel captions is still preferred over the nearest neighbor approach.
https://arxiv.org/abs/1505.04467
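A minimal sketch of the nearest-neighbour "consensus" selection described above, assuming image features have already been extracted; the value of k and the word-overlap similarity are illustrative stand-ins for the paper's settings and n-gram metrics:

```python
# Hedged sketch of the nearest-neighbour consensus baseline; the Jaccard word-overlap
# similarity below is an illustrative stand-in for the n-gram metrics used in the paper.
import numpy as np

def consensus_caption(query_feat, train_feats, train_captions, k=60):
    """train_captions[i] is the list of reference captions for training image i."""
    # 1) nearest neighbours of the query in image-feature space
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    neighbours = np.argsort(dists)[:k]

    # 2) pool all candidate captions from those neighbour images
    candidates = [c for i in neighbours for c in train_captions[i]]

    # 3) crude sentence similarity: Jaccard overlap of word sets
    def sim(a, b):
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / max(1, len(wa | wb))

    # 4) consensus caption: highest mean similarity to the other candidates
    scores = [np.mean([sim(c, o) for o in candidates if o is not c]) for c in candidates]
    return candidates[int(np.argmax(scores))]
```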
Neuromorphic computing is a brainlike information processing paradigm that requires adaptive learning mechanisms. A spiking neuro-evolutionary system is used for this purpose; plastic resistive memories are implemented as synapses in spiking neural networks. The evolutionary design process exploits parameter self-adaptation and allows the topology and synaptic weights to be evolved for each network in an autonomous manner. Variable resistive memories are the focus of this research; each synapse has its own conductance profile which modifies the plastic behaviour of the device and may be altered during evolution. These variable resistive networks are evaluated on a noisy robotic dynamic-reward scenario against two static resistive memories and a system containing standard connections only. Results indicate that the extra behavioural degrees of freedom available to the networks incorporating variable resistive memories enable them to outperform the comparative synapse types.
https://arxiv.org/abs/1505.04357
In image classification, visual separability between different object categories is highly uneven, and some categories are more difficult to distinguish than others. Such difficult categories demand more dedicated classifiers. However, existing deep convolutional neural networks (CNN) are trained as flat N-way classifiers, and few efforts have been made to leverage the hierarchical structure of categories. In this paper, we introduce hierarchical deep CNNs (HD-CNNs) by embedding deep CNNs into a category hierarchy. An HD-CNN separates easy classes using a coarse category classifier while distinguishing difficult classes using fine category classifiers. During HD-CNN training, component-wise pretraining is followed by global finetuning with a multinomial logistic loss regularized by a coarse category consistency term. In addition, conditional executions of fine category classifiers and layer parameter compression make HD-CNNs scalable for large-scale visual recognition. We achieve state-of-the-art results on both CIFAR100 and large-scale ImageNet 1000-class benchmark datasets. In our experiments, we build up three different HD-CNNs and they lower the top-1 error of the standard CNNs by 2.65%, 3.1% and 1.1%, respectively.
http://arxiv.org/abs/1410.0736
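A rough sketch of the HD-CNN prediction rule described above: a coarse-category classifier weights the outputs of per-group fine classifiers. The callables, shapes and the coarse-to-fine class masks are assumptions for illustration, not the authors' implementation:

```python
# The coarse classifier weights the per-group fine classifiers (probabilistic averaging).
import numpy as np

def hdcnn_predict(x, coarse_clf, fine_clfs, fine_class_masks):
    """
    coarse_clf(x)       -> probabilities over K coarse categories, shape (K,)
    fine_clfs[k](x)     -> probabilities over all N fine classes, shape (N,)
    fine_class_masks[k] -> 0/1 mask of the fine classes handled by coarse group k
    """
    p_coarse = coarse_clf(x)
    p_fine = np.zeros(len(fine_class_masks[0]), dtype=float)
    for k, w in enumerate(p_coarse):
        pk = fine_clfs[k](x) * fine_class_masks[k]   # restrict to this group's classes
        pk = pk / max(pk.sum(), 1e-12)               # renormalise within the group
        p_fine += w * pk                             # weight by coarse probability
    return p_fine
```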
Long short-term memory (LSTM) based acoustic modeling methods have recently been shown to give state-of-the-art performance on some speech recognition tasks. To achieve a further performance improvement, this research investigates deep extensions of LSTM, given that deep hierarchical models have turned out to be more efficient than shallow ones. Motivated by previous research on constructing deep recurrent neural networks (RNNs), alternative deep LSTM architectures are proposed and empirically evaluated on a large vocabulary conversational telephone speech recognition task. The training process for LSTM networks on multi-GPU devices is also introduced and discussed. Experimental results demonstrate that the deep LSTM networks benefit from the depth and yield state-of-the-art performance on this task.
https://arxiv.org/abs/1410.4281
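For illustration only, a stacked ("deep") LSTM acoustic model of the general kind investigated above, written with PyTorch; the feature size, depth, hidden width and senone output layer are assumptions rather than the paper's configuration:

```python
import torch
import torch.nn as nn

class DeepLSTMAcousticModel(nn.Module):
    def __init__(self, n_feats=40, hidden=512, layers=3, n_states=4000):
        super().__init__()
        self.lstm = nn.LSTM(n_feats, hidden, num_layers=layers, batch_first=True)
        self.proj = nn.Linear(hidden, n_states)      # per-frame HMM-state posteriors

    def forward(self, feats):                        # feats: (batch, time, n_feats)
        h, _ = self.lstm(feats)
        return self.proj(h)                          # (batch, time, n_states) logits

# usage: logits = DeepLSTMAcousticModel()(torch.randn(8, 200, 40))
```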
LeoPARD supports the implementation of knowledge representation and reasoning tools for higher-order logic(s). It combines a sophisticated data structure layer (polymorphically typed {\lambda}-calculus with nameless spine notation, explicit substitutions, and perfect term sharing) with an ambitious multi-agent blackboard architecture (supporting prover parallelism at the term, clause, and search level). Further features of LeoPARD include a parser for all TPTP dialects, a command line interpreter, and generic means for the integration of external reasoners.
https://arxiv.org/abs/1505.01629
In this paper, we propose several novel deep learning methods for object saliency detection based on powerful convolutional neural networks. In our approach, we use a gradient descent method to iteratively modify an input image based on the pixel-wise gradients to reduce a cost function measuring the class-specific objectness of the image. The pixel-wise gradients can be efficiently computed using the back-propagation algorithm. The discrepancy between the modified image and the original one may be used as a saliency map for the image. Moreover, we have further proposed several new training methods to learn saliency-specific convolutional nets for object saliency detection, in order to leverage the available pixel-wise segmentation information. Our methods are extremely computationally efficient (processing 20-40 images per second on one GPU). In this work, we use the computed saliency maps for image segmentation. Experimental results on two benchmark tasks, namely Microsoft COCO and Pascal VOC 2012, have shown that our proposed methods can generate high-quality saliency maps, clearly outperforming many existing methods. In particular, our approaches excel in handling many difficult images, which contain complex backgrounds, highly variable salient objects, multiple objects, and/or very small salient objects.
https://arxiv.org/abs/1505.01173
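A hedged sketch of the gradient-based idea described above: repeatedly step the input image against the gradient of a class score and read the accumulated per-pixel change as a saliency map. The backbone, step size and iteration count are illustrative choices, not the authors':

```python
import torch
import torchvision.models as models

def objectness_saliency(image, class_idx, steps=30, lr=0.5):
    """image: (3, H, W) float tensor; returns an (H, W) saliency map."""
    model = models.resnet18().eval()                 # any CNN classifier would do here
    x = image.clone().requires_grad_(True)
    for _ in range(steps):
        score = model(x.unsqueeze(0))[0, class_idx]  # class-specific objectness
        score.backward()                             # pixel-wise gradients via backprop
        with torch.no_grad():
            x -= lr * x.grad                         # push the image away from the class
            x.grad.zero_()
    return (x.detach() - image).abs().sum(dim=0)     # per-pixel discrepancy as saliency
```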
Many events in the vertebrate immune system are influenced by some element of chance. The objective of the present work is to describe affinity maturation of B lymphocytes (in which random events are perhaps the most characteristic), and to study a possible network model of immune memory. In our model stochastic processes govern all events. A major novelty of this approach is that it permits studying random variations in the immune process. Four basic components are simulated in the model: non-immune self cells, nonself cells (pathogens), B lymphocytes, and bone marrow cells that produce naive B lymphocytes. A point in a generalized shape space plus the size of the corresponding population represents nonself and non-immune self cells. On the other hand, each individual B cell is represented by a disc that models its recognition region in the shape space. Infection is simulated by an “injection” of nonself cells into the system. Division of pathogens may instigate an attack of naive B cells, which in turn may induce clonal proliferation and hypermutation in the attacking B cells, and which eventually may slow down and stop the exponential growth of pathogens. Affinity maturation of newly produced B cells becomes expressed as a result of selection when the number of pathogens decreases. Under favorable conditions, the expanded primary B cell clones may stimulate the expansion of secondary B cell clones carrying complementary receptors to the stimulating B cells. Like in a hall of mirrors, the image of pathogens in the primary B cell clones then will be reflected in secondary B cell clones. This “ping-pong” game may survive for a long time even in the absence of the pathogen, creating a local network memory. This memory ensures that repeated infection by the same pathogen will be eliminated more efficiently.
https://arxiv.org/abs/1505.00660
This paper presents a novel multi-scale gradient based shape descriptor and a corner point based shape descriptor. The novel multi-scale gradient based descriptor is combined with generic Fourier descriptors to extract contour- and region-based shape information. A shape-information-based object class detection and classification technique using a random forest classifier has been optimized. The integrated descriptor proposed in this paper is robust to rotation, scale, translation, affine deformations, noisy contours and noisy shapes. The new corner point based interpolated shape descriptor has been exploited for fast object detection and classification with higher accuracy.
https://arxiv.org/abs/1505.00432
Solving the visual symbol grounding problem has long been a goal of artificial intelligence. The field appears to be advancing closer to this goal with recent breakthroughs in deep learning for natural language grounding in static images. In this paper, we propose to translate videos directly to sentences using a unified deep neural network with both convolutional and recurrent structure. Described video datasets are scarce, and most existing methods have been applied to toy domains with a small vocabulary of possible words. By transferring knowledge from 1.2M+ images with category labels and 100,000+ images with captions, our method is able to create sentence descriptions of open-domain videos with large vocabularies. We compare our approach with recent work using language generation metrics, subject, verb, and object prediction accuracy, and a human evaluation.
https://arxiv.org/abs/1412.4729
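A simplified sketch of the convolutional-plus-recurrent video-to-sentence idea described above, with per-frame CNN features mean-pooled into a single vector that initializes an LSTM word decoder; all dimensions and the pooling choice are assumptions for illustration:

```python
import torch
import torch.nn as nn

class VideoCaptioner(nn.Module):
    def __init__(self, feat_dim=4096, hidden=512, vocab=10000):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.init_h = nn.Linear(feat_dim, hidden)    # pooled video feature -> initial state
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, frame_feats, word_ids):        # frame_feats: (B, T, feat_dim)
        video = frame_feats.mean(dim=1)              # mean-pool over frames
        h0 = torch.tanh(self.init_h(video)).unsqueeze(0)
        c0 = torch.zeros_like(h0)
        emb = self.embed(word_ids)                   # (B, L) word ids -> (B, L, hidden)
        states, _ = self.lstm(emb, (h0, c0))
        return self.out(states)                      # per-step vocabulary logits
```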
To date, work on formalizing connectionist computation in a way that is at least Turing-complete has focused on recurrent architectures and developed equivalences to Turing machines or similar super-Turing models, which are of more theoretical than practical significance. We instead develop connectionist computation within the framework of information propagation networks extended with unbounded recursion, which is related to constraint logic programming and is more declarative than the semantics typically used in practical programming, but is still formally known to be Turing-complete. This approach yields contributions to the theory and practice of both connectionist computation and programming languages. Connectionist computations are carried out in a way that lets them communicate with, and be understood and interrogated directly in terms of the high-level semantics of a general-purpose programming language. Meanwhile, difficult (unbounded-dimension, NP-hard) search problems in programming that have previously been left to the programmer to solve in a heuristic, domain-specific way are solved uniformly a priori in a way that approximately achieves information-theoretic limits on performance.
https://arxiv.org/abs/1505.00002
This technical report provides extra details of the deep multimodal similarity model (DMSM) which was proposed in (Fang et al. 2015, arXiv:1411.4952). The model is trained via maximizing global semantic similarity between images and their captions in natural language using the public Microsoft COCO database, which consists of a large set of images and their corresponding captions. The learned representations attempt to capture the combination of various visual concepts and cues.
https://arxiv.org/abs/1504.03083
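A minimal sketch of the kind of training signal described above: images and captions are mapped into a shared space and the similarity of matching pairs is maximized against in-batch mismatches. The tower shapes and the softmax-over-negatives loss are assumptions standing in for the DMSM details in the report:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

img_tower = nn.Linear(4096, 256)     # stand-in for the image-side network
txt_tower = nn.Linear(300, 256)      # stand-in for the caption-side network

def dmsm_style_loss(img_feats, txt_feats, temperature=10.0):
    u = F.normalize(img_tower(img_feats), dim=1)
    v = F.normalize(txt_tower(txt_feats), dim=1)
    sims = temperature * u @ v.t()                  # scaled cosine similarities
    targets = torch.arange(len(u))                  # i-th caption belongs to i-th image
    return F.cross_entropy(sims, targets)           # maximise similarity of true pairs
```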
Building on the success of recent discriminative mid-level elements, we propose a surprisingly simple approach for object detection which performs comparably to the current state-of-the-art approaches on the PASCAL VOC comp-3 detection challenge (no external data). Through extensive experiments and ablation analysis, we show how our approach effectively improves upon HOG-based pipelines by adding an intermediate mid-level representation for the task of object detection. This representation is easily interpretable and allows us to visualize what our object detector “sees”. We also discuss the insights our approach shares with CNN-based methods, such as how sharing representations between categories helps.
https://arxiv.org/abs/1504.07284
For money-like informational commodities the notions of architectural adequacy and evolutionary adequacy are proposed as the first two stages of a moneyness maturity hierarchy. Then three classes of informational commodities are distinguished: exclusively informational commodities, strictly informational commodities, and ownable informational commodities. For each class money-like instances of that commodity class, as well as monies of that class may exist. With the help of these classifications and making use of previous assessments of Bitcoin, it is argued that at this stage Bitcoin is unlikely ever to evolve into a money. Assessing the evolutionary adequacy of Bitcoin is perceived in terms of a search through its design hull for superior design alternatives. An extensive comparison is made between the search for superior design alternatives to Bitcoin and the search for design alternatives to a specific and unconventional view on the definition of fractions.
https://arxiv.org/abs/1504.07184
We present ab initio calculations on the effect of in-plane equi-biaxial strain on the structural and electronic properties of a hypothetical graphene-like GaN monolayer (ML-GaN). ML-GaN was found to buckle for compressive strain in excess of 7.281%; the buckling parameter increased quadratically with compressive strain. The 2D bulk modulus of ML-GaN was found to be smaller than that of graphene and graphene-like ML-BN, which reflects the weaker bond in ML-GaN. More importantly, the band gap and effective masses of charge carriers in ML-GaN were found to be tunable by application of in-plane equi-biaxial strain. In particular, when a compressive biaxial strain of about 3% was reached, a transition from an indirect to a direct band gap occurred with a significant change in the value and nature of the effective masses of charge carriers; buckling and tensile strain reduced the band gap, which fell to 50% of its unstrained value at 6.36% tensile strain and to 0 eV at an extrapolated tensile strain of 12.72%, well within the predicted ultimate tensile strain limit of 16%. These predictions of strain-engineered electronic properties of the highly strain-sensitive ML-GaN may be exploited in future for potential applications in strain sensors and other nano-devices such as nano-electromechanical systems (NEMS).
https://arxiv.org/abs/1504.04672
In this paper we present a novel architecture for storing visual data. Effective storing, browsing and searching of collections of images is one of the most important challenges of computer science. The design of an architecture for storing such data requires a set of tools and frameworks such as SQL database management systems and service-oriented frameworks. The proposed solution is based on a multi-layer architecture, which allows any component to be replaced without recompiling the other components. The approach contains five components, i.e. Model, Base Engine, Concrete Engine, CBIR service and Presentation. They were based on two well-known design patterns: Dependency Injection and Inversion of Control. For experimental purposes we implemented the SURF local interest point detector as a feature extractor and $K$-means clustering as an indexer. The presented architecture is intended for content-based retrieval system simulation purposes as well as for real-world CBIR tasks.
https://arxiv.org/abs/1504.06867
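A hedged sketch of the extract-and-index step described above. The paper uses SURF with $K$-means; SURF lives in opencv-contrib and is patent-encumbered, so ORB is used here purely as a freely available stand-in descriptor, and the vocabulary size is arbitrary:

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def build_index(image_paths, n_clusters=256):
    orb = cv2.ORB_create()
    per_image, pooled = [], []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = orb.detectAndCompute(img, None)
        per_image.append(desc)
        if desc is not None:
            pooled.append(desc)
    vocab = KMeans(n_clusters=n_clusters).fit(np.vstack(pooled).astype(np.float32))

    # each image becomes a normalised bag-of-visual-words histogram
    index = []
    for desc in per_image:
        hist = np.zeros(n_clusters)
        if desc is not None:
            for word in vocab.predict(desc.astype(np.float32)):
                hist[word] += 1
        index.append(hist / max(hist.sum(), 1))
    return vocab, np.array(index)
```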
Intuitively, the appearance of true object boundaries varies from image to image. Hence the usual monolithic approach of training a single boundary predictor and applying it to all images regardless of their content is bound to be suboptimal. In this paper we therefore propose situational object boundary detection: We first define a variety of situations and train a specialized object boundary detector for each of them using [Dollar and Zitnick 2013]. Then given a test image, we classify it into these situations using its context, which we model by global image appearance. We apply the corresponding situational object boundary detectors, and fuse them based on the classification probabilities. In experiments on ImageNet, Microsoft COCO, and Pascal VOC 2012 segmentation we show that our situational object boundary detection gives significant improvements over a monolithic approach. Additionally, our method substantially outperforms [Hariharan et al. 2011] on semantic contour detection on their SBD dataset.
https://arxiv.org/abs/1504.06434
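A small sketch of the fusion step described above, treating the situation classifier and the per-situation boundary detectors as black-box callables; this only illustrates the probability-weighted combination, not the authors' implementation:

```python
import numpy as np

def situational_boundaries(image, situation_clf, detectors):
    """situation_clf(image) -> probs over situations; detectors[s](image) -> boundary map."""
    probs = situation_clf(image)
    fused = np.zeros_like(detectors[0](image), dtype=float)
    for s, p in enumerate(probs):
        fused += p * detectors[s](image)             # probability-weighted combination
    return fused
```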
The search for life via characterization of earth-like planets in the habitable zone is one of the key scientific objectives in Astronomy. We describe a new phase-occulting (PO) interferometric nulling coronagraphy (NC) approach. The PO-NC approach employs beamwalk and freeform optical surfaces internal to the interferometer cavity to introduce a radially dependent plate scale difference between each interferometer arm (optical path) that nulls the central star at high contrast while transmitting the off-axis field. The design is readily implemented on segmented-mirror telescope architectures, utilizing a single nulling interferometer to achieve high throughput, a small inner working angle (IWA), sixth-order or higher starlight suppression, and full off-axis discovery space, a combination of features that other coronagraph designs generally must trade. Unlike previous NC approaches, the PO-NC approach does not require pupil shearing; this increases throughput and renders it less sensitive to on-axis common-mode telescope errors, permitting relief of the observatory stability required to achieve contrast levels of $\leq10^{-10}$. Observatory operations are also simplified by removing the need for multiple telescope rolls and shears to construct a high contrast image. The design goals for a PO nuller are similar to other coronagraphs intended for direct detection of habitable zone (HZ) exoEarth signal: contrasts on the order of $10^{-10}$ at an IWA of $\leq3\lambda/D$ over $\geq10$% bandpass with a large ($>10$~m) segmented aperture space-telescope operating in visible and near infrared bands. This work presents an introduction to the PO nulling coronagraphy approach based on its Visible Nulling Coronagraph (VNC) heritage and relation to the radial shearing interferometer.
https://arxiv.org/abs/1504.05747
This paper assesses nonpolar m- and a-plane GaN/Al(Ga)N multi-quantum-wells grown on bulk GaN for intersubband optoelectronics in the short- and mid-wavelength infrared ranges. The characterization results are compared to those for reference samples grown on the polar c-plane, and are verified by self-consistent Schrödinger-Poisson calculations. The best results in terms of mosaicity, surface roughness, photoluminescence linewidth and intensity, as well as intersubband absorption are obtained from m-plane structures, which display room-temperature intersubband absorption in the range from 1.5 to 2.9 µm. Based on these results, a series of m-plane GaN/AlGaN multi-quantum-wells were designed to determine the accessible spectral range in the mid-infrared. These samples exhibit tunable room-temperature intersubband absorption from 4.0 to 5.8 µm, the long-wavelength limit being set by the absorption associated with the second order of the Reststrahlen band in the GaN substrates.
https://arxiv.org/abs/1504.04989
GaN/AlN nanowire heterostructures can display photoluminescence (PL) decay times on the order of microseconds that persist up to room temperature. Doping the GaN nanodisk insertions with Ge can reduce these PL decay times by two orders of magnitude. These phenomena are explained by the three-dimensional electric field distribution within the GaN nanodisks, which has an axial component in the range of a few MV/cm associated with the spontaneous and piezoelectric polarization, and a radial piezoelectric contribution associated with the shear components of the lattice strain. At low dopant concentrations, a large electron-hole separation in both the axial and radial directions is present. The relatively weak radial electric fields, which are about one order of magnitude smaller than the axial fields, are rapidly screened by doping. This bidirectional screening leads to a radial and axial centralization of the hole underneath the electron, and consequently, to large decreases in PL decay times, in addition to luminescence blue shifts.
https://arxiv.org/abs/1412.7720
The recurrent neural network is a powerful model that learns temporal patterns in sequential data. For a long time, it was believed that recurrent networks are difficult to train using simple optimizers, such as stochastic gradient descent, due to the so-called vanishing gradient problem. In this paper, we show that learning longer-term patterns in real data, such as in natural language, is perfectly possible using gradient descent. This is achieved by using a slight structural modification of the simple recurrent neural network architecture. We encourage some of the hidden units to change their state slowly by making part of the recurrent weight matrix close to identity, thus forming a kind of longer-term memory. We evaluate our model in language modeling experiments, where we obtain similar performance to the much more complex Long Short Term Memory (LSTM) networks (Hochreiter & Schmidhuber, 1997).
https://arxiv.org/abs/1412.7753
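A minimal numpy sketch of the structural modification described above: a "slow" context state whose recurrence is fixed close to a scaled identity feeds an ordinary fast hidden state. Sizes, the value of alpha and the initialization are illustrative assumptions:

```python
import numpy as np

class SlowFastRNNCell:
    def __init__(self, n_in, n_fast=128, n_slow=40, alpha=0.95, seed=0):
        rng = np.random.default_rng(seed)
        self.alpha = alpha                                # recurrence ~ alpha * identity
        self.B = rng.normal(0, 0.1, (n_slow, n_in))       # input -> slow (context) units
        self.A = rng.normal(0, 0.1, (n_fast, n_in))       # input -> fast units
        self.P = rng.normal(0, 0.1, (n_fast, n_slow))     # slow -> fast
        self.R = rng.normal(0, 0.1, (n_fast, n_fast))     # fast recurrence

    def step(self, x, s_prev, h_prev):
        s = (1 - self.alpha) * (self.B @ x) + self.alpha * s_prev   # slowly changing memory
        h = np.tanh(self.A @ x + self.P @ s + self.R @ h_prev)      # standard hidden state
        return s, h
```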
For some images, descriptions written by multiple people are consistent with each other. But for other images, descriptions across people vary considerably. In other words, some images are specific (they elicit consistent descriptions from different people) while other images are ambiguous. Applications involving images and text can benefit from an understanding of which images are specific and which ones are ambiguous. For instance, consider text-based image retrieval. If a query description is moderately similar to the caption (or reference description) of an ambiguous image, that query may be considered a decent match to the image. But if the image is very specific, a moderate similarity between the query and the reference description may not be sufficient to retrieve the image. In this paper, we introduce the notion of image specificity. We present two mechanisms to measure specificity given multiple descriptions of an image: an automated measure and a measure that relies on human judgement. We analyze image specificity with respect to image content and properties to better understand what makes an image specific. We then train models to automatically predict the specificity of an image from image features alone without requiring textual descriptions of the image. Finally, we show that modeling image specificity leads to improvements in a text-based image retrieval application.
https://arxiv.org/abs/1502.04569
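A small sketch of the automated flavour of the specificity measure described above: the average pairwise similarity among several human descriptions of the same image. TF-IDF cosine similarity is an illustrative choice of sentence similarity, not necessarily the measure used in the paper:

```python
from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def specificity(descriptions):
    vecs = TfidfVectorizer().fit_transform(descriptions)
    pairs = list(combinations(range(len(descriptions)), 2))
    sims = [cosine_similarity(vecs[i], vecs[j])[0, 0] for i, j in pairs]
    return sum(sims) / len(sims)   # high when different people describe the image alike
```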
This paper presents a novel approach for automatically generating image descriptions: visual detectors, language models, and multimodal similarity models learnt directly from a dataset of image captions. We use multiple instance learning to train visual detectors for words that commonly occur in captions, including many different parts of speech such as nouns, verbs, and adjectives. The word detector outputs serve as conditional inputs to a maximum-entropy language model. The language model learns from a set of over 400,000 image descriptions to capture the statistics of word usage. We capture global semantics by re-ranking caption candidates using sentence-level features and a deep multimodal similarity model. Our system is state-of-the-art on the official Microsoft COCO benchmark, producing a BLEU-4 score of 29.1%. When human judges compare the system captions to ones written by other people on our held-out test set, the system captions have equal or better quality 34% of the time.
https://arxiv.org/abs/1411.4952
Object class detectors typically apply a window classifier to all the windows in a large set, either in a sliding window manner or using object proposals. In this paper, we develop an active search strategy that sequentially chooses the next window to evaluate based on all the information gathered before. This results in a substantial reduction in the number of classifier evaluations and in a more elegant approach in general. Our search strategy is guided by two forces. First, we exploit context as the statistical relation between the appearance of a window and its location relative to the object, as observed in the training set. This enables jumping across distant regions in the image (e.g. observing a sky region suggests that cars might be far below) and is done efficiently in a Random Forest framework. Second, we exploit the score of the classifier to attract the search to promising areas surrounding a highly scored window, and to keep away from areas near low-scored ones. Our search strategy can be applied on top of any classifier as it treats it as a black-box. In experiments with R-CNN on the challenging SUN2012 dataset, our method matches the detection accuracy of evaluating all windows independently, while evaluating 9x fewer windows.
https://arxiv.org/abs/1412.3709
Similarity-preserving hashing is a widely-used method for nearest neighbour search in large-scale image retrieval tasks. For most existing hashing methods, an image is first encoded as a vector of hand-engineered visual features, followed by another separate projection or quantization step that generates binary codes. However, such visual feature vectors may not be optimally compatible with the coding process, thus producing sub-optimal hashing codes. In this paper, we propose a deep architecture for supervised hashing, in which images are mapped into binary codes via carefully designed deep neural networks. The pipeline of the proposed deep architecture consists of three building blocks: 1) a sub-network with a stack of convolution layers to produce the effective intermediate image features; 2) a divide-and-encode module to divide the intermediate image features into multiple branches, each encoded into one hash bit; and 3) a triplet ranking loss designed to characterize that one image is more similar to the second image than to the third one. Extensive evaluations on several benchmark image datasets show that the proposed simultaneous feature learning and hash coding pipeline brings substantial improvements over other state-of-the-art supervised or unsupervised hashing methods.
https://arxiv.org/abs/1504.03410
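A minimal PyTorch sketch of the triplet ranking loss described above, using the standard relaxation to real-valued codes during training and thresholding to bits afterwards; the margin and the squared-distance form are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def triplet_ranking_loss(code_q, code_pos, code_neg, margin=1.0):
    d_pos = (code_q - code_pos).pow(2).sum(dim=1)    # query vs similar image
    d_neg = (code_q - code_neg).pow(2).sum(dim=1)    # query vs dissimilar image
    return F.relu(d_pos - d_neg + margin).mean()     # hinge on the distance gap

def to_bits(codes):
    return (torch.sign(codes) + 1) / 2               # threshold real codes to {0, 1} bits
```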
We present observations of $^{13}$CO(1-0) in 17 Combined Array for Research in Millimeter Astronomy (CARMA) Atlas3D early-type galaxies (ETGs), obtained simultaneously with $^{12}$CO(1-0) observations. The $^{13}$CO in six ETGs is sufficiently bright to create images. In these 6 sources, we do not detect any significant radial gradient in the $^{13}$CO/$^{12}$CO ratio between the nucleus and the outlying molecular gas. Using the $^{12}$CO channel maps as 3D masks to stack the $^{13}$CO emission, we are able to detect 15/17 galaxies to $>3\sigma$ (and 12/17 to at least 5$\sigma$) significance in a spatially integrated manner. Overall, ETGs show a wide distribution of $^{13}$CO/$^{12}$CO ratios, but Virgo cluster and group galaxies preferentially show a $^{13}$CO/$^{12}$CO ratio about 2 times larger than field galaxies, although this could also be due to a mass dependence, or the CO spatial extent ($R_{\rm CO}/R_{\rm e}$). ETGs whose gas has a morphologically-settled appearance also show boosted $^{13}$CO/$^{12}$CO ratios. We hypothesize that this variation could be caused by (i) the extra enrichment of gas from molecular reprocessing occurring in low-mass stars (boosting the abundance of $^{13}$C to $^{12}$C in the absence of external gas accretion), (ii) much higher pressure being exerted on the midplane gas (by the intracluster medium) in the cluster environment than in isolated galaxies, or (iii) all but the densest molecular gas clumps being stripped as the galaxies fall into the cluster. Further observations of $^{13}$CO in dense environments, particularly of spirals, as well as studies of other isotopologues, should be able to distinguish between these hypotheses.
https://arxiv.org/abs/1504.02095
Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing. In this paper, we present a simple model that is able to generate descriptive sentences given a sample image. This model has a strong focus on the syntax of the descriptions. We train a purely bilinear model that learns a metric between an image representation (generated from a previously trained Convolutional Neural Network) and phrases that are used to describe them. The system is then able to infer phrases from a given image sample. Based on caption syntax statistics, we propose a simple language model that can produce relevant descriptions for a given test image using the phrases inferred. Our approach, which is considerably simpler than state-of-the-art models, achieves comparable results on the recently released Microsoft COCO dataset.
https://arxiv.org/abs/1412.8419
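A compact sketch of the bilinear metric described above: a single matrix scores how well a phrase embedding matches a CNN image representation, trained with a ranking objective. The dimensions and the margin loss are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BilinearPhraseScorer(nn.Module):
    def __init__(self, img_dim=4096, phrase_dim=300):
        super().__init__()
        self.W = nn.Parameter(0.01 * torch.randn(img_dim, phrase_dim))

    def score(self, img_feat, phrase_emb):           # higher = better image/phrase match
        return img_feat @ self.W @ phrase_emb

    def ranking_loss(self, img_feat, pos_phrase, neg_phrase, margin=0.1):
        return F.relu(margin - self.score(img_feat, pos_phrase)
                             + self.score(img_feat, neg_phrase))
```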
This article presents the main results of a pilot study of approaches to subject information search based on automated semantic processing of large volumes of scientific and technical data. The authors focus on the technology of building and qualifying search queries, with subsequent filtering and ranking of the search results. The software architecture, specific features of subject search, and the application of the research results are considered.
https://arxiv.org/abs/1504.02362
This paper explores the potential for using Brain Computer Interfaces (BCI) as a relevance feedback mechanism in content-based image retrieval. We investigate if it is possible to capture useful EEG signals to detect if relevant objects are present in a dataset of realistic and complex images. We perform several experiments using a rapid serial visual presentation (RSVP) of images at different rates (5Hz and 10Hz) on 8 users with different degrees of familiarization with BCI and the dataset. We then use the feedback from the BCI and mouse-based interfaces to retrieve localized objects in a subset of TRECVid images. We show that it is indeed possible to detect such objects in complex images and, also, that users with previous knowledge on the dataset or experience with the RSVP outperform others. When the users have limited time to annotate the images (100 seconds in our experiments) both interfaces are comparable in performance. Comparing our best users in a retrieval task, we found that EEG-based relevance feedback outperforms mouse-based feedback. The realistic and complex image dataset differentiates our work from previous studies on EEG for image retrieval.
https://arxiv.org/abs/1504.02356
Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing. In this paper, we present a simple model that is able to generate descriptive sentences given a sample image. This model has a strong focus on the syntax of the descriptions. We train a purely bilinear model that learns a metric between an image representation (generated from a previously trained Convolutional Neural Network) and phrases that are used to describe them. The system is then able to infer phrases from a given image sample. Based on caption syntax statistics, we propose a simple language model that can produce relevant descriptions for a given test image using the phrases inferred. Our approach, which is considerably simpler than state-of-the-art models, achieves comparable results in two popular datasets for the task: Flickr30k and the recently proposed Microsoft COCO.
https://arxiv.org/abs/1502.03671
As the number of scientific journals has multiplied, journal rankings have become increasingly important for scientific decisions. From submissions and subscriptions to grants and hirings, researchers, policy makers, and funding agencies make important decisions with influence from journal rankings such as the ISI journal impact factor. Typically, the rankings are derived from the citation network between a selection of journals and unavoidably depend on this selection. However, little is known about how robust rankings are to the selection of included journals. Here we compare the robustness of three journal rankings based on network flows induced on citation networks. They model pathways of researchers navigating scholarly literature, stepping between journals and remembering their previous steps to different degree: zero-step memory as impact factor, one-step memory as Eigenfactor, and two-step memory, corresponding to zero-, first-, and second-order Markov models of citation flow between journals. We conclude that higher-order Markov models perform better and are more robust to the selection of journals. Whereas our analysis indicates that higher-order models perform better, the performance gain for the second-order Markov model comes at the cost of requiring more citation data over a longer time period.
https://arxiv.org/abs/1405.7832
In this paper we describe the Microsoft COCO Caption dataset and evaluation server. When completed, the dataset will contain over one and a half million captions describing over 330,000 images. For the training and validation images, five independent human generated captions will be provided. To ensure consistency in evaluation of automatic caption generation algorithms, an evaluation server is used. The evaluation server receives candidate captions and scores them using several popular metrics, including BLEU, METEOR, ROUGE and CIDEr. Instructions for using the evaluation server are provided.
https://arxiv.org/abs/1504.00325
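For local scoring with the same metrics the server reports, the open-source COCO caption toolkit can be used along the following lines; the file names are placeholders and the package layout may differ between releases, so treat this as a hedged usage sketch:

```python
from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

coco = COCO("captions_val2014.json")            # reference (human) captions
coco_res = coco.loadRes("my_captions.json")     # candidate captions in COCO result format
evaluator = COCOEvalCap(coco, coco_res)
evaluator.evaluate()                            # BLEU, METEOR, ROUGE-L, CIDEr
for metric, score in evaluator.eval.items():
    print(metric, score)
```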
Searching through a large volume of data is critical for companies, scientists, and search-engine applications because of its time and memory complexity. This paper introduces a new technique for generating a FuzzyFind Dictionary for text mining. Words are mapped onto 23-bit vectors that reflect the presence or absence of particular letters of the English alphabet; longer representations can be obtained by using additional FuzzyFind Dictionaries. This representation preserves closeness of word distortions in terms of closeness of the created binary vectors, within a Hamming distance of 2. The paper discusses the Golay Coding Transformation Hash Table and how it can be applied to a FuzzyFind Dictionary as a new technology for searching through big data. Generating the dictionary takes linear time, accessing the data takes constant time, and updating with new data sets takes time linear in the number of new data points. The technique searches over English letters with 23 bits per segment, and it can also work with more than 23 bits by using additional segments as a reference table.
http://arxiv.org/abs/1503.06483
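An illustration only of the underlying idea: map each word to a letter-presence bit signature and retrieve dictionary entries within a small Hamming distance. The Golay-code hashing and the paper's exact 23-bit assignment are not reproduced here; a brute-force Hamming search over an assumed 23-letter subset stands in for them:

```python
LETTERS = "abcdefghijklmnopqrstuvw"              # assumed 23 tracked letters, one bit each

def signature(word):
    return sum(1 << i for i, ch in enumerate(LETTERS) if ch in word.lower())

def fuzzy_lookup(query, dictionary_sigs, max_hamming=2):
    q = signature(query)
    return [w for w, s in dictionary_sigs.items()
            if bin(q ^ s).count("1") <= max_hamming]

# usage: sigs = {w: signature(w) for w in word_list}; fuzzy_lookup("exmaple", sigs)
```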
The ADS platform is undergoing the biggest rewrite of its 20-year history. While several components have been added to its architecture over the past couple of years, this talk will concentrate on the underpinnings of ADS’s search layer and its API. To illustrate the design of the components in the new system, we will show how the new ADS user interface is built exclusively on top of the API using RESTful web services. Taking one step further, we will discuss how we plan to expose the treasure trove of information hosted by ADS (10 million records and fulltext for much of the Astronomy and Physics refereed literature) to partners interested in using this API. This will provide you (and your intelligent applications) with access to ADS’s underlying data to enable the extraction of new knowledge and the ingestion of these results back into the ADS. Using this framework, researchers could run controlled experiments with content extraction, machine learning, natural language processing, etc. In this talk, we will discuss what is already implemented, what will be available soon, and where we are going next.
https://arxiv.org/abs/1503.05881
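A hedged example of calling the ADS search API that the talk describes, following the publicly documented v1 endpoint; the query, returned fields and the API token are placeholders:

```python
import requests

TOKEN = "YOUR_ADS_API_TOKEN"
resp = requests.get(
    "https://api.adsabs.harvard.edu/v1/search/query",
    params={"q": 'title:"weak lensing" year:2014', "fl": "bibcode,title", "rows": 5},
    headers={"Authorization": f"Bearer {TOKEN}"},
)
for doc in resp.json()["response"]["docs"]:
    print(doc["bibcode"], doc.get("title"))
```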
Object class detection has been a synonym for 2D bounding box localization for the longest time, fueled by the success of powerful statistical learning techniques, combined with robust image representations. Only recently, there has been a growing interest in revisiting the promise of computer vision from the early days: to precisely delineate the contents of a visual scene, object by object, in 3D. In this paper, we draw from recent advances in object detection and 2D-3D object lifting in order to design an object class detector that is particularly tailored towards 3D object class detection. Our 3D object class detection method consists of several stages gradually enriching the object detection output with object viewpoint, keypoints and 3D shape estimates. Following careful design, in each stage it constantly improves the performance and achieves state-of-the-art performance in simultaneous 2D bounding box and viewpoint estimation on the challenging Pascal3D+ dataset.
https://arxiv.org/abs/1503.05038
We consider detecting objects in an image by iteratively selecting from a set of arbitrarily shaped candidate regions. Our generic approach, which we term visual chunking, reasons about the locations of multiple object instances in an image while expressively describing object boundaries. We design an optimization criterion for measuring the performance of a list of such detections as a natural extension to a common per-instance metric. We present an efficient algorithm with provable performance for building a high-quality list of detections from any candidate set of region-based proposals. We also develop a simple class-specific algorithm to generate a candidate region instance in near-linear time in the number of low-level superpixels that outperforms other region generating methods. In order to make predictions on novel images at testing time without access to ground truth, we develop learning approaches to emulate these algorithms’ behaviors. We demonstrate that our new approach outperforms sophisticated baselines on benchmark datasets.
https://arxiv.org/abs/1410.7376
The apsis toolkit presented in this paper provides a flexible framework for hyperparameter optimization and includes both random search and a Bayesian optimizer. It is implemented in Python and its architecture features adaptability to any desired machine learning code. It can easily be used with common Python ML frameworks such as scikit-learn. The toolkit is published under the MIT License, and other researchers are encouraged to check out the code, contribute, or raise suggestions. The code can be found at this http URL.
https://arxiv.org/abs/1503.02946
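The apsis API itself is not reproduced here; purely as a point of reference for the random-search side of what such a toolkit offers, the same idea expressed with plain scikit-learn looks like this:

```python
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
search = RandomizedSearchCV(
    SVC(),
    param_distributions={"C": loguniform(1e-2, 1e2), "gamma": loguniform(1e-4, 1e-1)},
    n_iter=20,   # number of random hyperparameter draws
    cv=3,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```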
The proliferation of massive datasets combined with the development of sophisticated analytical techniques have enabled a wide variety of novel applications such as improved product recommendations, automatic image tagging, and improved speech-driven interfaces. These and many other applications can be supported by Predictive Analytic Queries (PAQs). A major obstacle to supporting PAQs is the challenging and expensive process of identifying and training an appropriate predictive model. Recent efforts aiming to automate this process have focused on single node implementations and have assumed that model training itself is a black box, thus limiting the effectiveness of such approaches on large-scale problems. In this work, we build upon these recent efforts and propose an integrated PAQ planning architecture that combines advanced model search techniques, bandit resource allocation via runtime algorithm introspection, and physical optimization via batching. The result is TuPAQ, a component of the MLbase system, which solves the PAQ planning problem with comparable quality to exhaustive strategies but an order of magnitude more efficiently than the standard baseline approach, and can scale to models trained on terabytes of data across hundreds of machines.
https://arxiv.org/abs/1502.00068
We demonstrate a novel method for nanowire formation by natural selection during wet chemical etching in boiling phosphoric acid. Wire lateral dimensions below 10 nm and lengths of 700 nm or more form naturally during the wet etching. The dimension variation is controlled through the etching time, the underlying cause being the merging of nearby crystallographic hexagonal etch pits. The emission processes involving excitons are found to be efficient and lead to enhanced emission characteristics. The exciton binding energy is augmented by quantum confinement, which enforces greater overlap of the electron-hole wave function. The surviving nanowires are nearly defect-free, have large exciton binding energies of around 45 meV, and show a small temperature variation of the output electroluminescent light. We have observed superluminescent behaviour of the LEDs formed on these nanowires, with no observable efficiency roll-off up to current densities of 400 A/cm2. The present work thus provides an innovative and cost-effective route to device fabrication on the formed nanowires and demonstrates the immediate performance enhancement achievable.
https://arxiv.org/abs/1503.02279
Context. Astrometric monitoring of directly-imaged exoplanets allows the study of their orbital parameters and system architectures. Because most directly-imaged planets have long orbital periods (semi-major axes >20 AU), accurate astrometry is challenging when based on data acquired on timescales of a few years and usually with different instruments. The LMIRCam camera on the LBT is being used for the LEECH survey to search for and characterize young and adolescent exoplanets in L’ band, including their system architectures. Aims. We first aim to provide a good astrometric calibration of LMIRCam. Then, we derive new astrometry, test the predictions of the orbital model of 8:4:2:1 mean motion resonance proposed by Goździewski & Migaszewski, and perform new orbital fitting of the HR 8799 bcde planets. We also present deep limits on a putative fifth planet interior to the known planets. Methods. We use observations of HR 8799 and the Theta1 Ori C field obtained during the same run in October 2013. Results. We first characterize the distortion of LMIRCam. We determine a platescale and a true north orientation for the images of 10.707 +/- 0.012 mas/pix and -0.430 +/- 0.076 deg, respectively. The errors on the platescale and true north orientation translate into astrometric accuracies at a separation of 1 arcsec of 1.1 mas and 1.3 mas, respectively. The measurements for all planets are usually in agreement within 3 sigma with the ephemeris predicted by Goździewski & Migaszewski. The orbital fitting based on the new astrometric measurements favors an architecture for the planetary system based on 8:4:2:1 mean motion resonance. The detection limits allow us to exclude a fifth planet slightly brighter/more massive than HR 8799 b at the location of the 2:1 resonance with HR 8799 e (~9.5 AU) and about twice as bright as HR 8799 cde at the location of the 3:1 resonance with HR 8799 e (~7.5 AU).
https://arxiv.org/abs/1412.6989
Cortical networks are hypothesized to rely on transient network activity to support short-term memory (STM). In this paper we study the capacity of randomly connected recurrent linear networks for performing STM when the input signals are approximately sparse in some basis. We leverage results from compressed sensing to provide rigorous non-asymptotic recovery guarantees, quantifying the impact of the input sparsity level, the input sparsity basis, and the network characteristics on the system capacity. Our analysis demonstrates that network memory capacities can scale superlinearly with the number of nodes, and in some situations can achieve STM capacities that are much larger than the network size. We provide perfect recovery guarantees for finite sequences and recovery bounds for infinite sequences. The latter analysis predicts that network STM systems may have an optimal recovery length that balances errors due to omission and recall mistakes. Furthermore, we show that the conditions yielding optimal STM capacity can be embodied in several network topologies, including networks with sparse or dense connectivities.
https://arxiv.org/abs/1307.7970
We present Context Forest (ConF), a technique for predicting properties of the objects in an image based on its global appearance. Compared to standard nearest-neighbour techniques, ConF is more accurate, fast and memory efficient. We train ConF to predict which aspects of an object class are likely to appear in a given image (e.g. which viewpoint). This enables speeding up multi-component object detectors, by automatically selecting the most relevant components to run on that image. This is particularly useful for detectors trained from large datasets, which typically need many components to fully absorb the data and reach their peak performance. ConF provides a speed-up of 2x for the DPM detector [1] and of 10x for the EE-SVM detector [2]. To show ConF’s generality, we also train it to predict at which locations objects are likely to appear in an image. Incorporating this information in the detector score improves mAP performance by about 2% by removing false positive detections in unlikely locations.
https://arxiv.org/abs/1503.00787
Past research has challenged us with the task of showing relational patterns between text-based data and then clustering them for predictive analysis using the Golay Code technique. We focus on a novel approach to extract metaknowledge from multimedia datasets. Our collaboration has been an ongoing task of studying the relational patterns between data points based on metafeatures extracted from metaknowledge in multimedia datasets. The metafeatures selected are those that suit the mining technique we applied, the Golay Code algorithm. In this research paper we summarize findings on optimizing the metaknowledge representation for a 23-bit representation of structured and unstructured multimedia data in order to support clustering and predictive analysis.
http://arxiv.org/abs/1503.00245
The global influence of Big Data is not only growing but seemingly endless. The trend is leaning towards knowledge that is attained easily and quickly from massive pools of Big Data. Today we are living in the technological world that Dr. Usama Fayyad and his distinguished research fellows predicted nearly two decades ago in their introductory explanations of Knowledge Discovery in Databases (KDD). Indeed, they were precise in their outlook on Big Data analytics. In fact, the continued improvement in the interoperability of machine learning, statistics, and database building and querying has fused to create this increasingly popular science: Data Mining and Knowledge Discovery. The next generation of computational theories is geared towards helping to extract insightful knowledge from even larger volumes of data at higher rates of speed. As the trend increases in popularity, a highly adaptive solution for knowledge discovery will be necessary. In this research paper, we introduce the investigation and development of 23 bit-questions for a Metaknowledge template for Big Data processing and clustering purposes. This research aims to demonstrate the construction of this methodology and to show its validity and the benefits it brings to Knowledge Discovery from Big Data.
http://arxiv.org/abs/1503.00244
In this paper we study the application of convolutional neural networks for jointly detecting objects depicted in still images and estimating their 3D pose. We identify different feature representations of oriented objects, and energies that lead a network to learn these representations. The choice of the representation is crucial since the pose of an object has a natural, continuous structure while its category is a discrete variable. We evaluate the different approaches on the joint object detection and pose estimation task of the Pascal3D+ benchmark using Average Viewpoint Precision. We show that a classification approach on discretized viewpoints achieves state-of-the-art performance for joint object detection and pose estimation, and significantly outperforms existing baselines on this benchmark.
https://arxiv.org/abs/1412.7190
Using the high-resolution x-ray diffraction (XRD) analysis, scanning electron microscopy (SEM), and the temperature-dependent microwave resonator characterization, structural properties, phase assemblage and dielectric properties of La(Mg1/2Ti1/2)O3 (LMT) and Nd(Mg1/2Ti1/2)O3 (NMT) ceramics prepared via the mixed oxide route were investigated in this study. Single-phase ceramics were synthesized for both LMT and NMT at sintering temperatures from 1250 °C to 1675 °C. On the basis of the XRD analysis we have found that the LMT and NMT compounds have cubic and monoclinic crystal structures, respectively. We have also observed that the relative densities of LMT and NMT vary between about 93 and 99% of the theoretical density, depending on the sintering temperature. Finally, concerning the dielectric properties of the microwave resonators made of LMT and NMT compounds we have measured their temperature coefficient of the resonant frequency (τf) and the quality factor (Q). It is interesting to notice that τf in the case of the NMT compound (-16 ppm/K) is essentially smaller than in the case of the LMT compound (-72 ppm/K), therefore proving a better stability against temperature variations in the NMT based resonators. On the other hand, the Q values are very similar, being 34000 at the resonance frequency of 8.07 GHz and 38000 at the resonance frequency of 9.76 GHz in the LMT and NMT cases, respectively. Keywords: dielectric properties; microstructure-final: grain growth; single phase Ln(Mg0.5Ti0.5)O3 (Ln=Nd,La); Lanthanum magnesium titanium oxide.
https://arxiv.org/abs/1502.07547
Deep LSTM is an ideal candidate for text recognition. However, text recognition involves some initial image processing steps like segmentation of lines and words which can induce errors into the recognition system. Without segmentation, learning very long-range context is difficult and becomes computationally intractable. Therefore, alternative soft decisions are needed at the pre-processing level. This paper proposes a hybrid text recognizer using a deep recurrent neural network with multiple layers of abstraction and long-range context along with a language model to verify the performance of the deep neural network. In this paper we construct a multi-hypotheses tree architecture with candidate segments of line sequences from different segmentation algorithms at its different branches. The deep neural network is trained on perfectly segmented data and tests each of the candidate segments, generating Unicode sequences. In the verification step, these Unicode sequences are validated using a sub-string match with the language model, and best-first search is used to find the best possible combination of alternative hypotheses from the tree structure. Thus the verification framework using language models eliminates wrong segmentation outputs and filters recognition errors.
https://arxiv.org/abs/1502.07540
Starting with valise supermultiplets obtained from 0-branes plus field redefinitions, valise adinkra networks, and the “Garden Algebra,” we discuss an architecture for algorithms that (starting from on-shell theories and, through a well-defined computation procedure), search for off-shell completions. We show in one dimension how to directly attack the notorious “off-shell auxiliary field” problem of supersymmetry with algorithms in the adinkra network-world formulation.
https://arxiv.org/abs/1502.04164