A dual-channel AlN/GaN high electron mobility transistor (HEMT) architecture is demonstrated that leverages ultra-thin epitaxial layers to suppress surface-state-related gate lag. Two high-density two-dimensional electron gas (2DEG) channels are utilized in an AlN/GaN/AlN/GaN heterostructure, wherein the top 2DEG serves as a quasi-equipotential that screens potential fluctuations arising from surface- and interface-trapped charge, while the bottom channel serves as the transistor's modulated channel. Dual-channel AlN/GaN heterostructures were grown by molecular beam epitaxy on free-standing HVPE GaN substrates, on which HEMTs with 300 nm gate lengths were fabricated in recessed- and non-recessed-gate variants. The recessed-gate HEMT demonstrated a gate lag ratio (GLR) of 0.88 with no drain current collapse, while supporting small-signal metrics $f_t/f_{max}$ of 27/46 GHz. These results are contrasted with the non-recessed-gate dual-channel HEMT, which showed a GLR of 0.74 and 82 mA/mm current collapse with $f_t/f_{max}$ of 48/60 GHz.
https://arxiv.org/abs/1508.07794
Doping of III-nitride-based compound semiconductor nanowires with control over the dopant distribution at precise locations remains a challenging issue for nanowire optoelectronic devices. Knowledge of dopant incorporation and its pathways in nanowires for such devices is limited by the growth methods. We report, for the first time, direct evidence of the incorporation pathway of Mg dopants in p-type nonpolar GaN nanowires grown via the vapour-liquid-solid (VLS) method in a chemical vapour deposition technique. Mg incorporation is confirmed using X-ray photoelectron spectroscopy (XPS) and electron energy loss spectroscopy (EELS) measurements. Energy-filtered transmission electron microscopy (EFTEM) studies are used to find the Mg incorporation pathway in the GaN nanowire. Photoluminescence studies on Mg-doped GaN nanowires, along with electrical characterization of the heterojunction formed between the nanowires and n-Si, confirm the activation of Mg atoms as p-type dopants in nonpolar GaN nanowires.
https://arxiv.org/abs/1508.07785
Manipulating the surface architecture of semiconducting nanowires while controlling surface polarity is an important objective in commercializing nanowire-based electronic and optoelectronic devices. We report the growth of exceptionally high structural and optical quality nonpolar GaN nanowires with controlled and uniform surface morphology and size distribution, suitable for large-scale production. The role of O contamination (~1-10^5 ppm) in the surface architecture of these nanowires is investigated along with the possible mechanism involved. Nonpolar GaN nanowires grown in the O-rich condition show inhomogeneous surface morphologies and sizes (50-150 nm), whereas nanowires grown in the O-reduced condition have precise sizes of 40(5) nm and uniform surface morphology. Relative O contents are estimated using electron energy loss spectroscopy. Size-selective growth of uniform nanowires is also demonstrated in the O-reduced condition using different catalyst sizes. Photoluminescence studies, along with the observation of single-mode waveguide formation as far-field bright violet multiple emission spots, reveal the high optical quality of the nonpolar GaN nanowires grown in the O-reduced condition.
https://arxiv.org/abs/1508.07425
Recently, datasets that contain sentence descriptions of images have enabled models that can automatically generate image captions. However, collecting these datasets is still very expensive. Here, we present SentenceRacer, an online game that gathers and verifies descriptions of images at no cost. Similar to the game hangman, players compete to uncover words in a sentence that ultimately describes an image. SentenceRacer both generates sentences and verifies that they are accurate descriptions. We show that SentenceRacer generates annotations of higher quality than those generated on Amazon Mechanical Turk (AMT).
https://arxiv.org/abs/1508.07053
We report on the direct measurement of the two-dimensional sheet charge density dependence of electron transport in AlGaN/GaN high electron mobility transistors. Pulsed IV measurements established increasing electron velocities with decreasing sheet charge densities, resulting in a saturation velocity of 1.9 x 10^7 cm/s at a low sheet charge density of 7.8 x 10^11 cm^-2. A new optical phonon emission-based electron velocity model for GaN is also presented. It accommodates stimulated LO phonon emission, which clamps the electron velocity owing to the strong electron-phonon interaction and long LO phonon lifetime in GaN. A comparison with the measured density-dependent saturation velocity shows that the model captures the dependence rather well. Finally, the experimental result is applied in a TCAD-based device simulator to predict the DC and small-signal characteristics of a reported GaN HEMT. Good agreement between the simulated and reported experimental results validates the measurement presented in this report and establishes accurate modeling of GaN HEMTs.
https://arxiv.org/abs/1508.07050
(Abbreviated) Kepler planet candidates require both spectroscopic and imaging follow-up observations to rule out false positives and detect blended stars. […] In this paper, we examine a sample of 11 Kepler host stars with companions detected by two techniques – near-infrared adaptive optics and/or optical speckle interferometry imaging, and a new spectroscopic deblending method. We compare the companion Teff and flux ratios (F_B/F_A, where A is the primary and B is the companion) derived from each technique, and find no cases where both companion parameters agree within 1$\sigma$ errors. In 3/11 cases the companion Teff values agree within 1$\sigma$ errors, and in 2/11 cases the companion F_B/F_A values agree within 1$\sigma$ errors. Examining each Kepler system individually considering multiple avenues (isochrone mapping, contrast curves, probability of being bound), we suggest two cases for which the techniques most likely agree in their companion detections (detect the same companion star). Overall, our results support the advantage the spectroscopic deblending technique has for finding very close-in companions ($\theta \lesssim$0.02-0.05”) that are not easily detectable with imaging. However, we also specifically show how high-contrast AO and speckle imaging observations detect companions at larger separations ($\theta \geq$0.02-0.05”) that are missed by the spectroscopic technique, provide additional information for characterizing the companion and its potential contamination (e.g., PA, separation, $\Delta$m), and cover a wider range of primary star effective temperatures. The investigation presented here illustrates the utility of combining the two techniques to reveal higher-order multiples in known planet-hosting systems.
https://arxiv.org/abs/1508.06502
We investigate the influence of modified growth conditions during the spontaneous formation of GaN nanowires on Si(111) in plasma-assisted molecular beam epitaxy. We find that a two-step growth approach, where the substrate temperature is increased during the nucleation stage, is an efficient method to gain control over the area coverage, average diameter, and coalescence degree of GaN nanowire ensembles. Furthermore, we also demonstrate that the growth conditions employed during the incubation time that precedes nanowire nucleation do not influence the properties of the final nanowire ensemble. Therefore, when growing GaN nanowires at elevated temperatures or with low Ga/N ratios, the total growth time can be reduced significantly by using more favorable growth conditions for nanowire nucleation during the incubation time.
https://arxiv.org/abs/1508.06266
Crystal morphologies are important for the design and functionality of devices based on low-dimensional nanomaterials. The equilibrium crystal shape (ECS) is a key quantity in this context. It is determined by surface energies, which are hard to access experimentally but can generally be well predicted by first-principles methods. Unfortunately, this is not necessarily so for polar and semipolar surfaces of wurtzite crystals. By extending the concept of Wulff construction, we demonstrate that the ECSs can nevertheless be obtained for this class of materials. For the example of GaN, we identify different crystal shapes depending on the chemical potential, shedding light on experimentally observed GaN nanostructures.
https://arxiv.org/abs/1411.4839
During May 2013, a gamma-ray flare from the BL Lac object 1ES 1727+502 (z=0.055) has been detected with the VERITAS Cherenkov telescopes. This detection represents the first evidence of very-high-energy (E>100 GeV) variability from this blazar and has been achieved using a reduced-high-voltage configuration which allows observations under bright moonlight. The integral flux is about five times higher than the archival VHE flux measured by MAGIC. The detection triggered additional VERITAS observations during standard dark-time and multiwavelength observations from infrared to X-rays with the FLWO 48” telescope and the Swift satellite. The results from this campaign are presented and used to produce the first spectral energy distribution of this object during gamma-ray flaring activity. The spectral energy distribution is then fit with a standard synchrotron-self-Compton model, placing constraints on the properties of the emitting region in the blazar.
https://arxiv.org/abs/1508.05551
Polarization-induced Zener tunnel junctions are studied by inserting thin InGaN layers into nitrogen-polar GaN p-n junctions. The reverse-bias interband Zener tunneling current is found to be weakly temperature dependent, as opposed to the strongly temperature-dependent forward-bias current. This indicates tunneling as the primary reverse-bias current transport mechanism. The indium composition in the InGaN layer is systematically varied to demonstrate the increase in the interband tunneling current. Comparing the experimentally measured tunneling currents to a model helps identify the specific challenges in potentially taking such junctions towards nitride-based polarization-induced tunneling field-effect transistors.
https://arxiv.org/abs/1508.05536
An intelligent version of the sliding-puzzle game is developed using the new Go programming language, with a concurrent version of the A* informed search algorithm powering a solver-bot that runs in the background. The game runs in the system terminal. It was developed mainly for UNIX-like systems but, owing to the cross-platform compatibility of the language, works well on nearly all operating systems. The game uses the language's concurrency primitives to simplify most of the heavy parts of the game. A real-time notification-delivery architecture built on the language's native concurrency support performs similarly to the event-based, context-aware invocations seen on the web platform.
https://arxiv.org/abs/1503.08345
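The solver's core idea is standard A* with an admissible heuristic. The paper's implementation is in Go and concurrent; as a minimal illustration only, here is a sequential Python sketch for the n x n puzzle using the Manhattan-distance heuristic (function names are illustrative, not the paper's):

```python
import heapq

def manhattan(state, goal, n):
    """Sum of Manhattan distances of each tile from its goal slot."""
    dist = 0
    for tile in state:
        if tile == 0:  # the blank does not count
            continue
        i, j = divmod(state.index(tile), n)
        gi, gj = divmod(goal.index(tile), n)
        dist += abs(i - gi) + abs(j - gj)
    return dist

def solve(start, goal, n):
    """A* over n x n sliding-puzzle states (tuples, 0 = blank).
    Returns the list of states after each move, or None."""
    frontier = [(manhattan(start, goal, n), 0, start, [])]
    seen = {start}
    while frontier:
        _, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        blank = state.index(0)
        bi, bj = divmod(blank, n)
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = bi + di, bj + dj
            if 0 <= ni < n and 0 <= nj < n:
                nxt = list(state)
                nxt[blank], nxt[ni * n + nj] = nxt[ni * n + nj], nxt[blank]
                nxt = tuple(nxt)
                if nxt not in seen:
                    seen.add(nxt)
                    h = manhattan(nxt, goal, n)
                    heapq.heappush(frontier, (g + 1 + h, g + 1, nxt, path + [nxt]))
    return None
```

In the Go version described above, the expansion of frontier states is what gets distributed across goroutines.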
Understanding entailment and contradiction is fundamental to understanding natural language, and inference about entailment and contradiction is a valuable testing ground for the development of semantic representations. However, machine learning research in this area has been dramatically limited by the lack of large-scale resources. To address this, we introduce the Stanford Natural Language Inference corpus, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning. At 570K pairs, it is two orders of magnitude larger than all other resources of its type. This increase in scale allows lexicalized classifiers to outperform some sophisticated existing entailment models, and it allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.
https://arxiv.org/abs/1508.05326
Sequence-to-sequence translation methods based on generation with a side-conditioned language model have recently shown promising results in several tasks. In machine translation, models conditioned on source-side words have been used to produce target-language text, and in image captioning, models conditioned on images have been used to generate caption text. Past work with this approach has focused on large-vocabulary tasks, and measured quality in terms of BLEU. In this paper, we explore the applicability of such models to the qualitatively different grapheme-to-phoneme task. Here, the input- and output-side vocabularies are small, plain n-gram models do well, and credit is only given when the output is exactly correct. We find that the simple side-conditioned generation approach is able to rival the state-of-the-art, and we are able to significantly advance the state-of-the-art with bi-directional long short-term memory (LSTM) neural networks that use the same alignment information that is used in conventional approaches.
https://arxiv.org/abs/1506.00196
In this paper, we propose a novel deep neural network framework embedded with low-level features (LCNN) for salient object detection in complex images. We utilise the advantage of convolutional neural networks to automatically learn the high-level features that capture the structured information and semantic context in the image. In order to better adapt a CNN model to the saliency task, we redesign the network architecture based on the small-scale datasets. Several low-level features are extracted, which can effectively capture contrast and spatial information in the salient regions, and are incorporated to complement the learned high-level features at the output of the last fully connected layer. The concatenated feature vector is further fed into a hinge-loss SVM detector in a joint discriminative learning manner, and the final saliency score of each region within the bounding box is obtained by a linear combination of the detector's weights. Experiments on three challenging benchmarks (MSRA-5000, PASCAL-S, ECSSD) demonstrate that our algorithm is effective and superior to most low-level-oriented state-of-the-art methods in terms of P-R curves, F-measure and mean absolute error.
https://arxiv.org/abs/1508.03928
Relation classification is an important research arena in the field of natural language processing (NLP). In this paper, we present SDP-LSTM, a novel neural network to classify the relation between two entities in a sentence. Our neural architecture leverages the shortest dependency path (SDP) between the two entities; multichannel recurrent neural networks, with long short-term memory (LSTM) units, pick up heterogeneous information along the SDP. Our proposed model has several distinct features: (1) The shortest dependency paths retain the information most relevant to relation classification, while eliminating irrelevant words in the sentence. (2) The multichannel LSTM networks allow effective information integration from heterogeneous sources over the dependency paths. (3) A customized dropout strategy regularizes the neural network to alleviate overfitting. We test our model on the SemEval 2010 relation classification task, and achieve an $F_1$-score of 83.7\%, higher than competing methods in the literature.
https://arxiv.org/abs/1508.03720
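Feature (1) reduces to a shortest-path query on the dependency parse viewed as an undirected tree. A minimal stdlib sketch (assuming tokens are given with head indices, as most parsers provide; the function name is illustrative):

```python
from collections import deque

def shortest_dependency_path(heads, words, src, dst):
    """BFS over the undirected dependency tree.
    heads[i] is the parent index of token i; -1 marks the root.
    Returns the words along the path from src to dst."""
    adj = {i: set() for i in range(len(words))}
    for i, h in enumerate(heads):
        if h >= 0:
            adj[i].add(h)
            adj[h].add(i)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk back to src
                path.append(words[node])
                node = prev[node]
            return path[::-1]
        for nb in adj[node]:
            if nb not in prev:
                prev[nb] = node
                queue.append(nb)
    return None
```

For "A thief stole the painting", the path between "thief" and "painting" passes through "stole", which is exactly the trigger word the classifier should attend to.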
Robust Principal Component Analysis (RPCA) via rank minimization is a powerful tool for recovering the underlying low-rank structure of clean data corrupted with sparse noise/outliers. In many low-level vision problems, not only is it known that the underlying structure of clean data is low-rank, but the exact rank of the clean data is also known. Yet, when applying conventional rank minimization to those problems, the objective function is formulated in a way that does not fully utilize the a priori target rank information. This observation motivates us to investigate whether there is a better alternative solution when using rank minimization. In this paper, instead of minimizing the nuclear norm, we propose to minimize the partial sum of singular values, which implicitly encourages the target rank constraint. Our experimental analyses show that, when the number of samples is deficient, our approach leads to a higher success rate than conventional rank minimization, while the solutions obtained by the two approaches are almost identical when the number of samples is more than sufficient. We apply our approach to various low-level vision problems, e.g. high dynamic range imaging, motion edge detection, photometric stereo, image alignment and recovery, and show that our results outperform those obtained by the conventional nuclear norm rank minimization method.
http://arxiv.org/abs/1503.01444
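The key difference from the nuclear norm shows up in the proximal (shrinkage) step of the solver: instead of soft-thresholding all singular values, the partial sum only shrinks those beyond the known target rank. A sketch of that operator (the function name is illustrative; the full RPCA solver wraps this in an augmented-Lagrangian loop):

```python
import numpy as np

def prox_pssv(X, target_rank, tau):
    """Proximal step for the partial sum of singular values:
    the first target_rank singular values are kept intact, the
    rest are soft-thresholded by tau. (The nuclear-norm proximal
    step would shrink every singular value by tau.)"""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = s.copy()
    s_shrunk[target_rank:] = np.maximum(s[target_rank:] - tau, 0.0)
    return U @ np.diag(s_shrunk) @ Vt
```

Because the leading singular values are never penalized, a matrix that already has the target rank passes through unchanged, which is the sense in which the target rank constraint is "implicitly encouraged".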
We examine the possibility that recent promising results in automatic caption generation are due primarily to language models. By varying image representation quality produced by a convolutional neural network, we find that a state-of-the-art neural captioning algorithm is able to produce quality captions even when provided with surprisingly poor image representations. We replicate this result in a new, fine-grained, transfer learned captioning domain, consisting of 66K recipe image/title pairs. We also provide some experiments regarding the appropriateness of datasets for automatic captioning, and find that having multiple captions per image is beneficial, but not an absolute requirement.
https://arxiv.org/abs/1508.02091
We report direct evidence of two mechanisms responsible for the excitation of optically active Er3+ ions in GaN epilayers grown by metal-organic chemical vapor deposition. These mechanisms, resonant excitation via the higher-lying inner 4f-shell transitions and band-to-band excitation of the semiconductor host, lead to narrow emission lines from isolated and defect-related Er centers. However, these centers have different photoluminescence spectra, decay dynamics, and excitation cross sections. The isolated Er optical center, which can be excited by either mechanism, has the same decay dynamics under both, but possesses a much higher cross section under band-to-band excitation. In contrast, the defect-related Er center can only be excited through band-to-band excitation, but has the largest cross section. These results explain the difficulty in achieving gain in Er-doped GaN and indicate new approaches for the realization of optical amplification, and possibly lasing, at room temperature.
https://arxiv.org/abs/1507.05119
Millimeter-wave (mmw) frequency bands, especially the 60 GHz unlicensed band, are considered a promising solution for gigabit short-range wireless communication systems. IEEE standard 802.11ad, also known as WiGig, standardizes the usage of the 60 GHz unlicensed band for wireless local area networks (WLANs). By using such mmw WLANs, multi-Gbps rates can be achieved to support bandwidth-intensive multimedia applications. Exhaustive search along with beamforming (BF) is usually used to overcome the 60 GHz channel propagation loss and accomplish data transmissions in such mmw WLANs. Because of the short transmission range and high susceptibility to path blocking, multiple mmw access points (APs) are needed to fully cover a typical target environment for future high-capacity multi-Gbps WLANs. Therefore, coordination among mmw APs is highly needed to overcome packet collisions resulting from uncoordinated exhaustive-search BF and to increase the total capacity of mmw WLANs. In this paper, we first present the current status of mmw WLANs together with our developed WiGig AP prototype. Then, we highlight the great need for coordinated transmissions among mmw APs as a key enabler for future high-capacity mmw WLANs. Two different types of coordinated mmw WLAN architecture are introduced: a distributed-antenna architecture that realizes centralized coordination, and an autonomous coordination scheme assisted by legacy Wi-Fi signaling. Moreover, two heterogeneous network (HetNet) architectures are also introduced to efficiently extend the coordinated mmw WLANs for use in future 5th Generation (5G) cellular networks.
https://arxiv.org/abs/1507.04518
Bayesian optimization is an effective methodology for the global optimization of functions with expensive evaluations. It relies on querying a distribution over functions defined by a relatively cheap surrogate model. An accurate model for this distribution over functions is critical to the effectiveness of the approach, and is typically fit using Gaussian processes (GPs). However, since GPs scale cubically with the number of observations, it has been challenging to handle objectives whose optimization requires many evaluations, and as such, massively parallelizing the optimization. In this work, we explore the use of neural networks as an alternative to GPs to model distributions over functions. We show that performing adaptive basis function regression with a neural network as the parametric form performs competitively with state-of-the-art GP-based approaches, but scales linearly with the number of data rather than cubically. This allows us to achieve a previously intractable degree of parallelism, which we apply to large scale hyperparameter optimization, rapidly finding competitive models on benchmark object recognition tasks using convolutional networks, and image caption generation using neural language models.
https://arxiv.org/abs/1502.05700
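The adaptive basis function regression at the heart of the approach treats the network's last hidden layer as fixed basis functions and places a Bayesian linear model on the output weights, so the predictive distribution costs O(D^3) in the feature dimension D rather than the O(N^3) of a full GP in the number of observations N. A minimal sketch (hyperparameters alpha and beta, and the function names, are illustrative, not the paper's):

```python
import numpy as np

def bayes_linreg(Phi, y, alpha=1.0, beta=25.0):
    """Bayesian linear regression on basis functions Phi (N x D).
    alpha: prior precision on weights; beta: noise precision.
    Returns the Gaussian posterior (mean m, precision A) over weights."""
    D = Phi.shape[1]
    A = alpha * np.eye(D) + beta * Phi.T @ Phi   # posterior precision, D x D
    m = np.linalg.solve(A, beta * Phi.T @ y)     # posterior mean
    return m, A

def predict(phi_star, m, A, beta=25.0):
    """Predictive mean and variance at a new feature vector phi_star."""
    mean = phi_star @ m
    var = 1.0 / beta + phi_star @ np.linalg.solve(A, phi_star)
    return mean, var
```

Since only the D x D precision matrix grows with model size, adding observations is cheap, which is what makes the massive parallelism described above tractable.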
Users may strive to formulate an adequate textual query for their information need. Search engines assist the users by presenting query suggestions. To preserve the original search intent, suggestions should be context-aware and account for the previous queries issued by the user. Achieving context awareness is challenging due to data sparsity. We present a probabilistic suggestion model that is able to account for sequences of previous queries of arbitrary lengths. Our novel hierarchical recurrent encoder-decoder architecture allows the model to be sensitive to the order of queries in the context while avoiding data sparsity. Additionally, our model can suggest for rare, or long-tail, queries. The produced suggestions are synthetic and are sampled one word at a time, using computationally cheap decoding techniques. This is in contrast to current synthetic suggestion models relying upon machine learning pipelines and hand-engineered feature sets. Results show that it outperforms existing context-aware approaches in a next query prediction setting. In addition to query suggestion, our model is general enough to be used in a variety of other applications.
https://arxiv.org/abs/1507.02221
A discrete-time random process is described which can generate bursty sequences of events. A Bernoulli process, where the probability of an event occurring at time $t$ is given by a fixed probability $x$, is modified to include a memory effect where the event probability is increased proportionally to the number of events which occurred within a given amount of time preceding $t$. For small values of $x$ the inter-event time distribution follows a power-law with exponent $-2-x$. We consider a dynamic network where each node forms, and breaks connections according to this process. The value of $x$ for each node depends on the fitness distribution, $\rho(x)$, from which it is drawn; we find exact solutions for the expectation of the degree distribution for a variety of possible fitness distributions, and for both cases where the memory effect either is, or is not present. This work can potentially lead to methods to uncover hidden fitness distributions from fast changing, temporal network data such as online social communications and fMRI scans.
https://arxiv.org/abs/1501.05198
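The memory-modified Bernoulli process described above is straightforward to simulate. A sketch, assuming a finite memory window and a unit proportionality constant (both illustrative; the paper's exact parameterization may differ):

```python
import random

def simulate(x, memory_window, steps, boost=1.0, seed=0):
    """Bernoulli process with memory: the event probability at time t
    is x scaled up in proportion to the number of events within the
    preceding memory_window steps (capped at 1). Returns event times."""
    rng = random.Random(seed)
    events = []
    for t in range(steps):
        recent = sum(1 for s in events if t - s <= memory_window)
        p = min(1.0, x * (1.0 + boost * recent))
        if rng.random() < p:
            events.append(t)
    return events

def inter_event_times(events):
    """Gaps between consecutive events, whose distribution is bursty."""
    return [b - a for a, b in zip(events, events[1:])]
```

For small x, a histogram of `inter_event_times` exhibits the heavy tail that the analysis characterizes as a power law with exponent $-2-x$.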
The unprecedented range of second-generation gravitational-wave (GW) observatories calls for refining the predictions of potential sources and detection rates. The coalescence of double compact objects (DCOs)—i.e., neutron star-neutron star (NS-NS), black hole-neutron star (BH-NS), and black hole-black hole (BH-BH) binary systems—is the most promising source of GWs for these detectors. We compute detection rates of coalescing DCOs in second-generation GW detectors using the latest models for their cosmological evolution, and implementing inspiral-merger-ringdown (IMR) gravitational waveform models in our signal-to-noise ratio calculations. We find that: (1) the inclusion of the merger/ringdown portion of the signal does not significantly affect rates for NS-NS and BH-NS systems, but it boosts rates by a factor $\sim 1.5$ for BH-BH systems; (2) in almost all of our models BH-BH systems yield by far the largest rates, followed by NS-NS and BH-NS systems, respectively, and (3) a majority of the detectable BH-BH systems were formed in the early Universe in low-metallicity environments. We make predictions for the distributions of detected binaries and discuss what the first GW detections will teach us about the astrophysics underlying binary formation and evolution.
https://arxiv.org/abs/1405.7016
Whereas deep neural networks were first mostly used for classification tasks, they are rapidly expanding in the realm of structured output problems, where the observed target is composed of multiple random variables that have a rich joint distribution, given the input. We focus in this paper on the case where the input also has a rich structure and the input and output structures are somehow related. We describe systems that learn to attend to different places in the input, for each element of the output, for a variety of tasks: machine translation, image caption generation, video clip description and speech recognition. All these systems are based on a shared set of building blocks: gated recurrent neural networks and convolutional neural networks, along with trained attention mechanisms. We report on experimental results with these systems, showing impressively good performance and the advantage of the attention mechanism.
https://arxiv.org/abs/1507.01053
The New Yorker publishes a weekly captionless cartoon. More than 5,000 readers submit captions for it. The editors select three of them and ask the readers to pick the funniest one. We describe an experiment that compares a dozen automatic methods for selecting the funniest caption. We show that negative sentiment, human-centeredness, and lexical centrality most strongly match the funniest captions, followed by positive sentiment. These results are useful for understanding humor and also in the design of more engaging conversational agents in text and multimodal (vision+text) systems. As part of this work, a large set of cartoons and captions is being made available to the community.
https://arxiv.org/abs/1506.08126
Recurrent sequence generators conditioned on input data through an attention mechanism have recently shown very good performance on a range of tasks including machine translation, handwriting synthesis and image caption generation. We extend the attention mechanism with features needed for speech recognition. We show that while an adaptation of the model used for machine translation reaches a competitive 18.7% phoneme error rate (PER) on the TIMIT phoneme recognition task, it can only be applied to utterances which are roughly as long as the ones it was trained on. We offer a qualitative explanation of this failure and propose a novel and generic method of adding location-awareness to the attention mechanism to alleviate this issue. The new method yields a model that is robust to long inputs and achieves 18% PER on single utterances and 20% on 10-times longer (repeated) utterances. Finally, we propose a change to the attention mechanism that prevents it from concentrating too much on single frames, which further reduces PER to the 17.6% level.
https://arxiv.org/abs/1506.07503
Salient object detection has become an important task in many image processing applications. The existing approaches exploit background and contrast priors to attain state-of-the-art results. In this paper, instead of using background cues, we estimate the foreground regions in an image using objectness proposals and utilize them to obtain smooth and accurate saliency maps. We propose a novel saliency measure called `foreground connectivity’ which determines how tightly a pixel or a region is connected to the estimated foreground. We use the values assigned by this measure as foreground weights and integrate these in an optimization framework to obtain the final saliency maps. We extensively evaluate the proposed approach on two benchmark databases and demonstrate that the results obtained are better than those of the existing state-of-the-art approaches.
https://arxiv.org/abs/1506.07363
We report on the methods used in our recent DeepEnsembleCoco submission to the PASCAL VOC 2012 challenge, which achieves state-of-the-art performance on the object detection task. Our method is a variant of the R-CNN model proposed by Girshick et al. (CVPR 2014) with two key improvements to training and evaluation. First, our method constructs an ensemble of deep CNN models with different architectures that are complementary to each other. Second, we augment the PASCAL VOC training set with images from the Microsoft COCO dataset to significantly enlarge the amount of training data. Importantly, we select a subset of the Microsoft COCO images to be consistent with the PASCAL VOC task. Results on the PASCAL VOC evaluation server show that our proposed method outperforms all previous methods on the PASCAL VOC 2012 detection task at the time of submission.
https://arxiv.org/abs/1506.07224
Books are a rich source of both fine-grained information (what a character, an object or a scene looks like) and high-level semantics (what someone is thinking and feeling, and how these states evolve through a story). This paper aims to align books to their movie releases in order to provide rich descriptive explanations for visual content that go semantically far beyond the captions available in current datasets. To align movies and books we exploit a neural sentence embedding that is trained in an unsupervised way from a large corpus of books, as well as a video-text neural embedding for computing similarities between movie clips and sentences in the book. We propose a context-aware CNN to combine information from multiple sources. We demonstrate good quantitative performance for movie/book alignment and show several qualitative examples that showcase the diversity of tasks our model can be used for.
https://arxiv.org/abs/1506.06724
Recent progress on automatic generation of image captions has shown that it is possible to describe the most salient information conveyed by images with accurate and meaningful sentences. In this paper, we propose an image caption system that exploits the parallel structures between images and sentences. In our model, the process of generating the next word, given the previously generated ones, is aligned with the visual perception experience where the attention shifting among the visual regions imposes a thread of visual ordering. This alignment characterizes the flow of "abstract meaning", encoding what is semantically shared by both the visual scene and the text description. Our system also makes another novel modeling contribution by introducing scene-specific contexts that capture higher-level semantic information encoded in an image. The contexts adapt language models for word generation to specific scene types. We benchmark our system and contrast it with published results on several popular datasets. We show that using either region-based attention or scene-specific contexts improves systems without those components. Furthermore, combining these two modeling ingredients attains the state-of-the-art performance.
https://arxiv.org/abs/1506.06272
During moonlit nights, observations with ground-based Cherenkov telescopes at very high energies (VHE, $E>100$ GeV) are constrained since the photomultiplier tubes (PMTs) in the telescope camera are extremely sensitive to the background moonlight. Observations with the VERITAS telescopes in the standard configuration are performed only with a moon illumination less than 35$\%$ of full moon. Since 2012, the VERITAS collaboration has implemented a new observing mode under bright moonlight, by either reducing the voltage applied to the PMTs (reduced-high-voltage configuration, RHV), or by utilizing UV-transparent filters. While these operating modes result in lower sensitivity and increased energy thresholds, the extension of the available observing time is useful for monitoring variable sources such as blazars and sources requiring spectral measurements at the highest energies. In this paper we report the detection of $\gamma$-ray flaring activity from the BL Lac object 1ES 1727+502 during RHV observations. This detection represents the first evidence of VHE variability from this blazar. The integral flux is $(1.1\pm0.2)\times10^{-11}\mathrm{cm^{-2}s^{-1}}$ above 250 GeV, which is about five times higher than the low-flux state. The detection triggered additional VERITAS observations during standard dark-time. Multiwavelength observations with the FLWO 48" telescope, and the Swift and Fermi satellites are presented and used to produce the first spectral energy distribution (SED) of this object during $\gamma$-ray flaring activity. The SED is then fitted with a standard synchrotron-self-Compton model, placing constraints on the properties of the emitting region and of the acceleration mechanism at the origin of the relativistic particle population in the jet.
https://arxiv.org/abs/1506.06246
Wireless Gigabit (WiGig) access points (APs) using the 60 GHz unlicensed frequency band are considered key enablers for future Gbps wireless local area networks (WLANs). Exhaustive-search analog beamforming (BF) is mainly used with WiGig transmissions to overcome channel propagation loss and accomplish high-rate data transmissions. Due to its short transmission range and high susceptibility to path blocking, multiple WiGig APs should be installed to fully cover a typical target environment. Therefore, coordination among the installed APs is highly needed to enable WiGig concurrent transmissions while overcoming packet collisions and reducing interference, which greatly increases the total throughput of WiGig WLANs. In this paper, we propose a comprehensive architecture for coordinated WiGig WLANs. The proposed WiGig WLAN is based on tight coordination between the 5 GHz (WiFi) and the 60 GHz (WiGig) unlicensed frequency bands: the wide-coverage WiFi band carries the signaling required to organize WiGig concurrent data transmissions using control/user (C/U) plane splitting. To reduce interference to existing WiGig data links while performing BF, a novel location-based BF mechanism is also proposed based on WiFi fingerprinting. The proposed coordinated WiGig WLAN substantially outperforms the conventional uncoordinated one in terms of total throughput, average packet delay and packet dropping rate.
https://arxiv.org/abs/1506.05857
The computational complexity of kernel methods has often been a major barrier for applying them to large-scale learning problems. We argue that this barrier can be effectively overcome. In particular, we develop methods to scale up kernel models to successfully tackle large-scale learning problems that are so far only approachable by deep learning architectures. Based on the seminal work by Rahimi and Recht on approximating kernel functions with features derived from random projections, we advance the state-of-the-art by proposing methods that can efficiently train models with hundreds of millions of parameters, and learn optimal representations from multiple kernels. We conduct extensive empirical studies on problems from image recognition and automatic speech recognition, and show that the performance of our kernel models matches that of well-engineered deep neural nets (DNNs). To the best of our knowledge, this is the first time that a direct comparison between these two methods on large-scale problems is reported. Our kernel methods have several appealing properties: training with convex optimization, cost for training a single model comparable to DNNs, and significantly reduced total cost due to fewer hyperparameters to tune for model selection. Our contrastive study between these two very different but equally competitive models sheds light on fundamental questions such as how to learn good representations.
http://arxiv.org/abs/1411.4000
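The Rahimi-Recht construction the paper builds on can be sketched in a few lines: random Fourier features whose inner products approximate an RBF kernel, so a linear model on the features behaves like a kernel machine. The dimensions and bandwidth below are illustrative, not the paper's settings.

```python
import numpy as np

def rff_features(X, D=2000, gamma=0.5, seed=0):
    """Random Fourier features (Rahimi & Recht) approximating the RBF
    kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))  # spectral samples
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)                # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = np.random.default_rng(1).normal(size=(50, 5))
Z = rff_features(X)
# Inner products of the features approximate the exact kernel matrix.
K_exact = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))
K_approx = Z @ Z.T
print(np.max(np.abs(K_exact - K_approx)))  # small; shrinks as D grows
```

The paper's contribution starts from here: scaling D into the hundreds of millions of parameters and learning representations from multiple kernels.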
We propose a novel deep neural network architecture for semi-supervised semantic segmentation using heterogeneous annotations. Contrary to existing approaches posing semantic segmentation as a single task of region-based classification, our algorithm decouples classification and segmentation, and learns a separate network for each task. In this architecture, labels associated with an image are identified by the classification network, and binary segmentation is subsequently performed for each identified label by the segmentation network. The decoupled architecture enables us to learn the classification and segmentation networks separately based on training data with image-level and pixel-wise class labels, respectively. It also effectively reduces the search space for segmentation by exploiting class-specific activation maps obtained from bridging layers. Our algorithm shows outstanding performance compared to other semi-supervised approaches, even with far fewer training images with strong annotations, on the PASCAL VOC dataset.
https://arxiv.org/abs/1506.04924
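The decoupled pipeline can be sketched with stand-in models (both callables below are hypothetical placeholders, not the paper's trained networks): the classifier nominates image-level labels, and a binary segmenter runs once per nominated label.

```python
def decoupled_segmentation(image, classifier, segmenter, threshold=0.5):
    # Stage 1: the classification network identifies labels in the image.
    label_probs = classifier(image)            # dict: label -> probability
    # Stage 2: binary segmentation, run separately per identified label.
    return {label: segmenter(image, label)
            for label, p in label_probs.items() if p >= threshold}

# Toy stand-ins, just to show the control flow.
toy_classifier = lambda img: {"person": 0.9, "dog": 0.8, "sofa": 0.1}
toy_segmenter = lambda img, label: [[1 if px == label else 0 for px in row]
                                    for row in img]
masks = decoupled_segmentation([["person", "dog"], ["dog", "sky"]],
                               toy_classifier, toy_segmenter)
print(sorted(masks))  # ['dog', 'person']
```

The point of the decoupling is visible in the control flow: each stage can be trained on its own kind of supervision (image-level vs. pixel-wise).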
The FEAST eigensolver package is a free high-performance numerical library for solving Hermitian and non-Hermitian eigenvalue problems, and obtaining all the eigenvalues and (right/left) eigenvectors within a given search interval or arbitrary contour in the complex plane. Its originality lies in a new transformative numerical approach to traditional eigenvalue algorithm design - the FEAST algorithm. The FEAST eigensolver combines simplicity and efficiency, and it offers many important capabilities for achieving high performance, robustness, accuracy, and scalability on parallel architectures. FEAST is both a comprehensive library package and easy-to-use software. It includes flexible reverse communication interfaces and ready-to-use predefined interfaces for dense, banded and sparse systems. The current version v3.0 of the FEAST package can address both Hermitian and non-Hermitian eigenvalue problems (real symmetric, real non-symmetric, complex Hermitian, complex symmetric, or complex general systems) on both shared-memory and distributed-memory architectures (i.e., it contains both the FEAST-SMP and FEAST-MPI packages). This User's guide provides instructions for installation setup, a detailed description of the FEAST interfaces and a large number of examples.
https://arxiv.org/abs/1203.4031
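The algorithm behind the library (sketched here in dense NumPy, not the library's actual API, whose interfaces the guide documents) filters a random block through a quadrature approximation of the contour-integral spectral projector, then applies Rayleigh-Ritz on the filtered subspace. Contour, sizes and iteration counts below are illustrative.

```python
import numpy as np

def feast_sketch(A, center, radius, m0=4, n_quad=16, iters=3, seed=0):
    """Contour-integral eigensolver sketch in the spirit of FEAST, for a
    real symmetric A and a circular contour |z - center| = radius."""
    n = A.shape[0]
    Y = np.random.default_rng(seed).normal(size=(n, m0))
    theta = np.pi * (2 * np.arange(n_quad) + 1) / n_quad   # midpoint rule
    for _ in range(iters):
        Q = np.zeros((n, m0), dtype=complex)
        for t in theta:
            z = center + radius * np.exp(1j * t)
            # weight r*e^{it}/n_quad discretizes dz/(2*pi*i) on the circle
            Q += (radius * np.exp(1j * t) / n_quad) \
                 * np.linalg.solve(z * np.eye(n) - A, Y)
        Q, _ = np.linalg.qr(Q.real)          # real spectrum -> real projector
        w, V = np.linalg.eigh(Q.T @ A @ Q)   # Rayleigh-Ritz step
        Y = Q @ V
    inside = np.abs(w - center) < radius
    return w[inside], Y[:, inside]

A = np.diag([1.0, 2.0, 3.0, 10.0, 20.0])
w, X = feast_sketch(A, center=2.0, radius=2.5)
print(np.sort(w))  # eigenvalues found inside the contour
```

The production library adds the pieces this sketch omits: sparse/banded solvers for the shifted systems, reverse communication, and MPI-level parallelism over quadrature nodes and contours.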
This report presents our submission to the MS COCO Captioning Challenge 2015. The method uses Convolutional Neural Network activations as an embedding to find semantically similar images. From these images, the most typical caption is selected based on unigram frequencies. Although the method received low scores with automated evaluation metrics and in human assessed average correctness, it is competitive in the ratio of captions which pass the Turing test and which are assessed as better or equal to human captions.
https://arxiv.org/abs/1506.03995
In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. It directly models the probability distribution of generating a word given previous words and an image. Image captions are generated by sampling from this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on four benchmark datasets (IAPR TC-12, Flickr 8K, Flickr 30K and MS COCO). Our model outperforms the state-of-the-art methods. In addition, we apply the m-RNN model to retrieval tasks for retrieving images or sentences, achieving significant performance improvements over state-of-the-art methods that directly optimize the ranking objective function for retrieval. The project page of this work is: www.stat.ucla.edu/~junhua.mao/m-RNN.html .
https://arxiv.org/abs/1412.6632
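A single decoding step of the model's distinguishing component, the multimodal layer, can be sketched as follows; all dimensions and random weights are illustrative stand-ins, not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_w, d_r, d_img, d_m, vocab = 8, 16, 32, 24, 100   # illustrative sizes
V_w = rng.normal(scale=0.1, size=(d_m, d_w))    # projects word embedding
V_r = rng.normal(scale=0.1, size=(d_m, d_r))    # projects recurrent state
V_i = rng.normal(scale=0.1, size=(d_m, d_img))  # projects CNN image feature
U = rng.normal(scale=0.1, size=(vocab, d_m))    # multimodal layer -> vocabulary

def next_word_distribution(w_emb, r_state, img_feat):
    # The multimodal layer fuses the three information sources, then a
    # softmax over the vocabulary scores the next word.
    m = np.tanh(V_w @ w_emb + V_r @ r_state + V_i @ img_feat)
    logits = U @ m
    p = np.exp(logits - logits.max())
    return p / p.sum()

p = next_word_distribution(rng.normal(size=d_w), rng.normal(size=d_r),
                           rng.normal(size=d_img))
```

Captions are then generated by repeatedly sampling from this distribution and feeding the sampled word back in as the next input.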
The Hopfield recurrent neural network is a classical auto-associative model of memory, in which collections of symmetrically-coupled McCulloch-Pitts neurons interact to perform emergent computation. Although previous researchers have explored the potential of this network to solve combinatorial optimization problems and store memories as attractors of its deterministic dynamics, a basic open problem is to design a family of Hopfield networks with a number of noise-tolerant memories that grows exponentially with neural population size. Here, we discover such networks by minimizing probability flow, a recently proposed objective for estimating parameters in discrete maximum entropy models. By descending the gradient of the convex probability flow, our networks adapt synaptic weights to achieve robust exponential storage, even when presented with vanishingly small numbers of training patterns. In addition to providing a new set of error-correcting codes that achieve Shannon’s channel capacity bound, these networks also efficiently solve a variant of the hidden clique problem in computer science, opening new avenues for real-world applications of computational models originating from biology.
https://arxiv.org/abs/1411.4625
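For orientation, classical Hopfield storage and recall look as follows. Note this sketch uses the standard Hebbian outer-product rule, whereas the paper's contribution is precisely to learn the weights by minimizing probability flow instead, achieving far larger capacity.

```python
import numpy as np

def hebbian_weights(patterns):
    # Outer-product (Hebbian) storage for +/-1 patterns; the paper
    # replaces this rule with probability-flow minimization.
    P = np.asarray(patterns, dtype=float)
    W = (P.T @ P) / P.shape[1]
    np.fill_diagonal(W, 0.0)        # no self-coupling
    return W

def recall(W, x, steps=10):
    x = np.asarray(x, dtype=float)
    for _ in range(steps):
        x = np.where(W @ x >= 0.0, 1.0, -1.0)  # McCulloch-Pitts updates
    return x

rng = np.random.default_rng(0)
memories = rng.choice([-1.0, 1.0], size=(2, 64))
W = hebbian_weights(memories)
noisy = memories[0].copy()
noisy[:5] *= -1.0                   # corrupt 5 of 64 bits
restored = recall(W, noisy)         # dynamics fall back to the stored memory
```

Hebbian storage scales only linearly with population size; the exponential, noise-tolerant regime the paper reports requires the learned weights.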
Training large-scale question answering systems is complicated because training sources usually cover a small portion of the range of possible questions. This paper studies the impact of multitask and transfer learning for simple question answering, a setting for which the reasoning required to answer is quite easy, as long as one can retrieve the correct evidence given a question, which can be difficult in large-scale conditions. To this end, we introduce a new dataset of 100k questions that we use in conjunction with existing benchmarks. We conduct our study within the framework of Memory Networks (Weston et al., 2015) because this perspective allows us to eventually scale up to more complex reasoning, and show that Memory Networks can be successfully trained to achieve excellent performance.
https://arxiv.org/abs/1506.02075
Generating descriptions for videos has many applications, including assisting blind people and human-robot interaction. The recent advances in image captioning, as well as the release of large-scale movie description datasets such as MPII Movie Description, allow this task to be studied in more depth. Many of the proposed methods for image captioning rely on pre-trained object classifier CNNs and Long Short-Term Memory recurrent networks (LSTMs) for generating descriptions. While image description focuses on objects, we argue that it is important to distinguish verbs, objects, and places in the challenging setting of movie description. In this work we show how to learn robust visual classifiers from the weak annotations of the sentence descriptions. Based on these visual classifiers we learn how to generate a description using an LSTM. We explore different design choices to build and train the LSTM and achieve the best performance to date on the challenging MPII-MD dataset. We compare and analyze our approach and prior work along various dimensions to better understand the key challenges of the movie description task.
https://arxiv.org/abs/1506.01698
Heterogeneous computing, which combines devices with different architectures, is rising in popularity, and promises increased performance combined with reduced energy consumption. OpenCL has been proposed as a standard for programming such systems, and offers functional portability. It does, however, suffer from poor performance portability: code tuned for one device must be re-tuned to achieve good performance on another device. In this paper, we use machine learning-based auto-tuning to address this problem. Benchmarks are run on a random subset of the entire tuning parameter configuration space, and the results are used to build an artificial neural network based model. The model can then be used to find interesting parts of the parameter space for further search. We evaluate our method with different benchmarks, on several devices, including an Intel i7 3770 CPU, an Nvidia K40 GPU and an AMD Radeon HD 7970 GPU. Our model achieves a mean relative error as low as 6.1%, and is able to find configurations as little as 1.3% worse than the global minimum.
https://arxiv.org/abs/1506.00842
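The workflow can be sketched on a synthetic tuning space: benchmark a random subset of configurations, fit a surrogate model, and use it to pick promising configurations for further search. The paper trains an artificial neural network as the surrogate; a quadratic least-squares fit stands in here for brevity, and the runtime function is an invented stand-in for real benchmarks.

```python
import numpy as np

# Synthetic 2-parameter tuning space (think tile size x unroll factor)
# with a known runtime function standing in for real benchmark runs.
def runtime(p):
    x, y = p
    return (x - 3.0) ** 2 + 0.5 * (y - 7.0) ** 2 + 1.0

grid = np.array([(x, y) for x in range(16) for y in range(16)], dtype=float)
rng = np.random.default_rng(0)
sample = grid[rng.choice(len(grid), size=40, replace=False)]  # benchmarked subset
times = np.array([runtime(p) for p in sample])

def feats(P):  # quadratic feature map for the surrogate model
    x, y = P[:, 0], P[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * x, y * y, x * y])

coef, *_ = np.linalg.lstsq(feats(sample), times, rcond=None)
pred = feats(grid) @ coef            # predicted runtime over the full space
best = grid[np.argmin(pred)]         # promising configuration for further search
print(best)
```

The payoff is the same as in the paper: only 40 of 256 configurations were "benchmarked", yet the surrogate points the search directly at the fastest region.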
In this paper, we propose deformable deep convolutional neural networks for generic object detection. This new deep learning object detection framework has innovations in multiple aspects. In the proposed new deep architecture, a new deformation constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraint and penalty. A new pre-training strategy is proposed to learn feature representations more suitable for the object detection task and with good generalization capability. By changing the net structures and training strategies, and adding and removing some key components in the detection pipeline, a set of models with large diversity is obtained, which significantly improves the effectiveness of model averaging. The proposed approach improves the mean average precision obtained by RCNN (Girshick et al., 2014), which was the state-of-the-art, from 31% to 50.3% on the ILSVRC2014 detection test set. It also outperforms the winner of ILSVRC2014, GoogLeNet, by 6.1%. Detailed component-wise analysis is also provided through extensive experimental evaluation, providing a global view of the deep learning object detection pipeline.
https://arxiv.org/abs/1412.5661
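The core of a def-pooling step can be sketched for a single part-filter response map: take the maximum response over candidate placements, discounted by a quadratic deformation penalty around an anchor position. The penalty weights below are illustrative, not learned values.

```python
import numpy as np

def def_pool(response, anchor, wy=0.1, wx=0.1):
    """Deformation-constrained pooling for one part-filter response map:
    max over positions of (response - quadratic displacement penalty)."""
    H, W = response.shape
    ys, xs = np.mgrid[0:H, 0:W]
    penalty = wy * (ys - anchor[0]) ** 2 + wx * (xs - anchor[1]) ** 2
    scores = response - penalty
    pos = np.unravel_index(np.argmax(scores), scores.shape)
    return scores[pos], pos

resp = np.zeros((9, 9))
resp[4, 5] = 2.0          # strong response one cell right of the anchor
resp[0, 0] = 2.2          # slightly stronger, but far away -> heavily penalized
score, pos = def_pool(resp, anchor=(4, 4))   # picks the nearby peak
```

In the full architecture the penalty weights are learned per part, so the layer trades off response strength against geometric plausibility.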
Recurrent Neural Networks (RNNs) have become increasingly popular for the task of language understanding. In this task, a semantic tagger is deployed to associate a semantic label to each word in an input sequence. The success of RNN may be attributed to its ability to memorize long-term dependence that relates the current-time semantic label prediction to the observations many time instances away. However, the memory capacity of simple RNNs is limited because of the gradient vanishing and exploding problem. We propose to use an external memory to improve memorization capability of RNNs. We conducted experiments on the ATIS dataset, and observed that the proposed model was able to achieve the state-of-the-art results. We compare our proposed model with alternative models and report analysis results that may provide insights for future research.
https://arxiv.org/abs/1506.00195
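The key operation added to the RNN, a content-based read from external memory, reduces to softmax attention over memory slots; a minimal sketch:

```python
import numpy as np

def memory_read(M, q):
    """Content-based read: softmax attention of query q over the rows
    (slots) of memory M, returning the weighted-sum read vector."""
    scores = M @ q
    w = np.exp(scores - scores.max())   # numerically stable softmax
    w /= w.sum()
    return w @ M, w

# Toy memory with three orthogonal slots; the query matches slot 0.
M = np.eye(3)
read, w = memory_read(M, np.array([5.0, 0.0, 0.0]))
```

Because the read is differentiable in both the query and the memory contents, it trains end-to-end with the tagger, giving the RNN access to observations many time steps away without routing them through the hidden state.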
Neural Machine Translation (NMT) is a new approach to machine translation that has shown promising results that are comparable to traditional approaches. A significant weakness in conventional NMT systems is their inability to correctly translate very rare words: end-to-end NMTs tend to have relatively small vocabularies with a single unk symbol that represents every possible out-of-vocabulary (OOV) word. In this paper, we propose and implement an effective technique to address this problem. We train an NMT system on data that is augmented by the output of a word alignment algorithm, allowing the NMT system to emit, for each OOV word in the target sentence, the position of its corresponding word in the source sentence. This information is later utilized in a post-processing step that translates every OOV word using a dictionary. Our experiments on the WMT14 English to French translation task show that this method provides a substantial improvement of up to 2.8 BLEU points over an equivalent NMT system that does not use this technique. With 37.5 BLEU points, our NMT system is the first to surpass the best result achieved on a WMT14 contest task.
https://arxiv.org/abs/1410.8206
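The post-processing step can be sketched directly: each OOV token emitted by the NMT system carries the aligned source position, and is replaced by a dictionary translation of the corresponding source word, falling back to copying the source word itself. The `<unk:i>` annotation format and the tiny dictionary below are illustrative stand-ins for the alignment-derived output the paper describes.

```python
def replace_unks(src_tokens, tgt_tokens, dictionary):
    """Replace annotated <unk:i> tokens using the source sentence and a
    bilingual dictionary; copy the source word when no entry exists."""
    out = []
    for tok in tgt_tokens:
        if tok.startswith("<unk:"):
            pos = int(tok[5:-1])           # aligned source position
            src = src_tokens[pos]
            out.append(dictionary.get(src, src))
        else:
            out.append(tok)
    return out

src = ["The", "ecotax", "portico", "in", "Pont-de-Buis"]
tgt = ["Le", "portique", "<unk:1>", "à", "<unk:4>"]
d = {"ecotax": "écotaxe"}
print(replace_unks(src, tgt, d))
# ['Le', 'portique', 'écotaxe', 'à', 'Pont-de-Buis']
```

Note the two fallback modes: "ecotax" goes through the dictionary, while the name "Pont-de-Buis" is simply copied, which is usually the right behavior for proper nouns.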
Because of their superior ability to preserve sequence information over time, Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have obtained strong results on a variety of sequence modeling tasks. The only underlying LSTM structure that has been explored so far is a linear chain. However, natural language exhibits syntactic properties that would naturally combine words to phrases. We introduce the Tree-LSTM, a generalization of LSTMs to tree-structured network topologies. Tree-LSTMs outperform all existing systems and strong LSTM baselines on two tasks: predicting the semantic relatedness of two sentences (SemEval 2014, Task 1) and sentiment classification (Stanford Sentiment Treebank).
https://arxiv.org/abs/1503.00075
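The Child-Sum Tree-LSTM node update generalizes the chain LSTM by summing child hidden states and giving each child its own forget gate; a forward-pass sketch with random, untrained weights (sizes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ChildSumTreeLSTM:
    """Minimal Child-Sum Tree-LSTM cell (forward pass only)."""
    def __init__(self, d_in, d_hid, seed=0):
        rng = np.random.default_rng(seed)
        self.W = {g: rng.normal(scale=0.1, size=(d_hid, d_in)) for g in "ifou"}
        self.U = {g: rng.normal(scale=0.1, size=(d_hid, d_hid)) for g in "ifou"}
        self.b = {g: np.zeros(d_hid) for g in "ifou"}

    def node(self, x, children):
        # children: list of (h, c) pairs from the child nodes
        h_sum = sum((h for h, _ in children), np.zeros_like(self.b["i"]))
        i = sigmoid(self.W["i"] @ x + self.U["i"] @ h_sum + self.b["i"])
        o = sigmoid(self.W["o"] @ x + self.U["o"] @ h_sum + self.b["o"])
        u = np.tanh(self.W["u"] @ x + self.U["u"] @ h_sum + self.b["u"])
        c = i * u
        for h_k, c_k in children:   # one forget gate per child
            f_k = sigmoid(self.W["f"] @ x + self.U["f"] @ h_k + self.b["f"])
            c = c + f_k * c_k
        return o * np.tanh(c), c

d_in, d_hid = 4, 8
cell = ChildSumTreeLSTM(d_in, d_hid)
rng = np.random.default_rng(1)
leaf1 = cell.node(rng.normal(size=d_in), [])
leaf2 = cell.node(rng.normal(size=d_in), [])
h, c = cell.node(rng.normal(size=d_in), [leaf1, leaf2])  # parent combines children
```

With a chain (each node having one child) this reduces to the standard LSTM; on a parse tree it composes phrases bottom-up, which is what the sentence-level tasks exploit.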
In this paper, we propose using 'augmented hypotheses', which consider objectness, foreground and compactness, for salient object detection. Our algorithm consists of four basic steps. First, our method generates the objectness map via objectness hypotheses. Based on the objectness map, we estimate the foreground margin and compute the corresponding foreground map which prefers the foreground objects. From the objectness map and the foreground map, the compactness map is formed to favor the compact objects. We then derive a saliency measure that produces a pixel-accurate saliency map which uniformly covers the objects of interest and consistently separates fore- and background. We finally evaluate the proposed framework on two challenging datasets, MSRA-1000 and iCoSeg. Our extensive experimental results show that our method outperforms state-of-the-art approaches.
https://arxiv.org/abs/1505.07930
The paper describes clustering problems from the combinatorial viewpoint. A brief systemic survey is presented, including the following: (i) basic clustering problems (e.g., classification, clustering, sorting, clustering with an order over clusters), (ii) basic approaches to assessment of objects and object proximities (i.e., scales, comparison, aggregation issues), (iii) basic approaches to evaluation of local quality characteristics for clusters and total quality characteristics for clustering solutions, (iv) clustering as a multicriteria optimization problem, (v) a generalized modular clustering framework, (vi) basic clustering models/methods (e.g., hierarchical clustering, k-means clustering, minimum spanning tree based clustering, clustering as assignment, clique/quasi-clique detection based clustering, correlation clustering, network-communities based clustering). Special attention is given to the formulation of clustering as multicriteria optimization models. Combinatorial optimization models are used as auxiliary problems (e.g., assignment, partitioning, the knapsack problem, the multiple choice problem, the morphological clique problem, searching for consensus/median for structures). Numerical examples illustrate problem formulations, solving methods, and applications. The material can be used as follows: (a) a research survey, (b) a foundation for designing the structure/architecture of composite modular clustering software, (c) a bibliography reference collection, and (d) a tutorial.
https://arxiv.org/abs/1505.07872
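One of the surveyed methods, minimum-spanning-tree based clustering, fits in a short sketch: build the MST with Prim's algorithm, cut the k-1 longest edges, and read the clusters off the remaining connected components.

```python
import numpy as np

def mst_clusters(X, k):
    """MST clustering: Prim's algorithm, then cut the k-1 longest edges."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None], axis=-1)  # pairwise distances
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    dist, parent = D[0].copy(), np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):                  # grow the MST one node at a time
        cand = np.where(in_tree, np.inf, dist)
        j = int(np.argmin(cand))
        edges.append((int(parent[j]), j, float(cand[j])))
        in_tree[j] = True
        closer = D[j] < dist
        parent[closer] = j
        dist = np.where(closer, D[j], dist)
    edges.sort(key=lambda e: e[2])
    root = list(range(n))                   # union-find over the kept edges
    def find(a):
        while root[a] != a:
            root[a] = root[root[a]]
            a = root[a]
        return a
    for a, b, _ in edges[: n - k]:          # dropping the k-1 longest edges
        root[find(a)] = find(b)
    return np.array([find(i) for i in range(n)])

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = mst_clusters(X, k=2)   # the two well-separated blobs
```

Cutting long MST edges separates groups wherever the inter-group gap exceeds the intra-group spacing, which is the combinatorial reading of this method in the survey.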
In this paper, we propose a novel label propagation based method for saliency detection. A key observation is that saliency in an image can be estimated by propagating the labels extracted from the most certain background and object regions. For most natural images, some boundary superpixels serve as the background labels and the saliency of the other superpixels is determined by ranking their similarities to the boundary labels based on an inner propagation scheme. For images of complex scenes, we further deploy a 3-cue-center-biased objectness measure to pick out and propagate foreground labels. A co-transduction algorithm is devised to fuse both boundary and objectness labels based on an inter propagation scheme. A compactness criterion decides whether the incorporation of objectness labels is necessary, greatly enhancing computational efficiency. Results on five benchmark datasets with pixel-wise accurate annotations show that the proposed method achieves superior performance compared with the most recent state-of-the-art methods in terms of different evaluation metrics.
https://arxiv.org/abs/1505.07192
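The propagation scheme follows the familiar manifold-ranking recipe with the closed-form solution f = (I - αS)⁻¹y; a toy sketch on a 5-node chain graph, with the two boundary nodes carrying the background seeds:

```python
import numpy as np

def propagate(W, y, alpha=0.9):
    """Closed-form label propagation (manifold ranking): score all nodes
    by similarity to the seed labels y over graph affinities W."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))        # symmetric normalization
    return np.linalg.solve(np.eye(len(W)) - alpha * S, y)

# Chain graph 0-1-2-3-4; boundary nodes 0 and 4 carry background seeds.
W = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
f = propagate(W, np.array([1.0, 0.0, 0.0, 0.0, 1.0]))
# Node 2, farthest from both seeds, gets the weakest background score,
# i.e. the highest saliency after inversion.
```

In the paper the graph is built over superpixels with appearance-based affinities rather than this unit-weight chain, and a second (inter) propagation fuses the objectness labels.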
Virality of online content on social networking websites is an important but esoteric phenomenon often studied in fields like marketing, psychology and data mining. In this paper we study viral images from a computer vision perspective. We introduce three new image datasets from Reddit, and define a virality score using Reddit metadata. We train classifiers with state-of-the-art image features to predict virality of individual images, relative virality in pairs of images, and the dominant topic of a viral image. We also compare machine performance to human performance on these tasks. We find that computers perform poorly with low-level features, and that high-level information is critical for predicting virality. We encode semantic information through relative attributes. We identify the 5 key visual attributes that correlate with virality. We create an attribute-based characterization of images that can predict relative virality with 68.10% accuracy (SVM+Deep Relative Attributes) - better than humans at 60.12%. Finally, we study how human prediction of image virality varies with different 'contexts' in which the images are viewed, such as the influence of neighbouring images, images recently viewed, as well as the image title or caption. This work is a first step in understanding the complex but important phenomenon of image virality. Our datasets and annotations will be made publicly available.
https://arxiv.org/abs/1503.02318
From the numerous detected planets outside the Solar system, no terrestrial planet comparable to our Earth has been discovered so far. The search for an Exo-Earth is certainly a big challenge which may require the detection of planetary systems resembling our Solar system in order to find life like on Earth. However, even if we find Solar system analogues, it is not certain that a planet at Earth's position will have similar circumstances as those of Earth. Small changes in the architecture of the giant planets can lead to orbital perturbations which may change the conditions of habitability for a terrestrial planet in the habitable zone (HZ). We present a numerical investigation where we first study the motion of test-planets in a particular Jupiter-Saturn configuration for which we can expect strong gravitational perturbations on the motion at Earth's position according to a previous work. In this study, we show that these strong perturbations can be reduced significantly by the neighboring planets of Earth. In the second part of our study we investigate the motion of test-planets in inclined Jupiter-Saturn systems, where we analyze changes in the dynamical behavior of the inner planetary system. Moderate values of inclination seem to counteract the perturbations in the HZ, while high inclinations induce more chaos in this region. Finally, we carry out a stability study of the actual orbits of Venus, Earth and Mars moving in the inclined Jupiter-Saturn systems, for which we used the Solar system parameters. This study shows that the three terrestrial planets will only move in low-eccentric orbits if Saturn's inclination is ≤ 10°. Therefore, it seems that it is advantageous for the habitability of Earth when all planets move nearly in the same plane.
https://arxiv.org/abs/1505.07039