
Effectively Searching Maps in Web Documents

2009-01-26
Qingzhao Tan, Prasenjit Mitra, C. Lee Giles

Abstract

Maps are an important source of information in archaeology and other sciences. Users search for historical maps to determine the recorded political geography of regions in different eras, to find out exactly where archaeological artifacts were discovered, and so on. Currently, they have to use a generic search engine and add the term "map" to their other keywords. This crude method generates a significant number of false positives that the user must sift through to get the desired results. To reduce this manual effort, we propose an automatic map identification, indexing, and retrieval system that enables users to search and retrieve maps appearing in a large corpus of digital documents using simple keyword queries. We identify features that help distinguish maps from other figures in digital documents and show how a Support-Vector-Machine-based classifier can be used to identify maps. We propose map-level metadata (e.g., captions and references to the maps in the text) and document-level metadata (e.g., title, abstract, citations, and how recent the publication is), and show how both can be automatically extracted and indexed. Our novel ranking algorithm weights different metadata fields differently and also uses the document-level metadata to help rank retrieved maps. Empirical evaluations show which features should be selected and which metadata fields should be weighted more. We also demonstrate improved retrieval results in comparison to adaptations of existing methods for map retrieval. Our map search engine has been deployed in an online map-search system that is part of the Blind-Review digital library system.
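The idea of weighting metadata fields differently when ranking maps can be illustrated with a small sketch. This is not the authors' algorithm: the field names, the weight values, and the term-count scoring below are all assumptions chosen for illustration, whereas the paper determines its weights empirically.

```python
from collections import Counter

# Hypothetical field weights (assumed for illustration only).
# Map-level fields are weighted above document-level fields here.
FIELD_WEIGHTS = {
    "caption": 3.0,         # map-level: figure caption
    "reference_text": 2.0,  # map-level: sentences that refer to the map
    "doc_title": 1.5,       # document-level: title of the containing document
    "doc_abstract": 1.0,    # document-level: abstract of the containing document
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def field_score(query_terms, text):
    """Count how often the query terms occur in one metadata field."""
    counts = Counter(tokenize(text))
    return sum(counts[term] for term in query_terms)


def score_map(query, record, weights=FIELD_WEIGHTS):
    """Weighted sum of per-field term counts for one map record."""
    terms = tokenize(query)
    return sum(
        weight * field_score(terms, record.get(field, ""))
        for field, weight in weights.items()
    )


def rank_maps(query, records, weights=FIELD_WEIGHTS):
    """Return ids of matching maps, best score first (ties broken by id)."""
    scored = [(score_map(query, r, weights), r["id"]) for r in records]
    matching = [(s, i) for s, i in scored if s > 0]
    return [i for s, i in sorted(matching, key=lambda p: (-p[0], p[1]))]
```

With these assumed weights, a query term that matches a map's caption contributes more to its score than the same term matching only the abstract of the containing document, and maps with no matching field are dropped from the result list.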


URL

https://arxiv.org/abs/0901.3939

PDF

https://arxiv.org/pdf/0901.3939

