Geração automática de padrões de navegação para web sites de conteúdo dinâmico

Publisher

Universidade Federal do Amazonas

Abstract

A growing number of Web applications need to process collections of similar pages obtained from Web sites. These applications ultimately aim to exploit the valuable information implicitly available in these pages to perform tasks such as querying, searching, data extraction, and mining. For some of these applications, the criteria that determine whether a Web page belongs in a collection relate to features of the page's content. However, there are many other important applications in which the inherent structure of the pages, rather than their content, provides a better criterion for gathering them. Motivated by this problem, we propose a new approach for generating structure-driven crawlers that requires minimal effort from the user, since it requires only an example of the page to be crawled and an entry point to the Web site. Another important feature of our approach is that it can deal with Web sites in which the pages to be collected are dynamically generated through the filling of forms. Unlike existing methods in the literature, our approach does not require a sample database to help fill out forms, nor does it demand extensive interaction with users. In experiments, our approach achieved 100% precision in crawls performed over 17 real Web sites with static and dynamic content, and at least 95% recall on all 11 static Web sites.

Citation

VIDAL, Márcio Luiz Assis. Geração automática de padrões de navegação para web sites de conteúdo dinâmico. 2006. 61 f. Dissertação (Mestrado em Informática) - Universidade Federal do Amazonas, Manaus, 2006.
