Processamento eficiente de consultas em sistemas de busca

Carregando...
Imagem de Miniatura

Título da Revista

ISSN da Revista

Título de Volume

Editor

Universidade Federal do Amazonas

Resumo

Search systems have been one of the main forms of locating and retrieving information in digital environments in recent decades. They are present in a large number of applications, such as web search engines and e-commerce systems. Users of these systems more often than not have very specific information needs, only being satisfied with a few, highly relevant results. Due to this behavior, part of the recent research effort related to search systems aims to reduce computational costs to compute the top results of queries, which are the ones usually presented to most users. In this thesis, we study the problem of computing the top k results of a ranking in search engines. We present two novel document-at-a-time algorithms for fast computing of top-k query results in search systems, named as Block Max WAND with Candidate Selection and Preserving Top-K Results (BMW-CSP) and Waves. Both algorithms use multi-tier indexes for reducing the computational time required for processing queries. BMW-CSP is an extension of BMW-CS, a method previously proposed in the literature. Although very efficient, BMW-CS does not guarantee the preservation of the top-k results for a given query. Algorithms that do not preserve the top results may reduce the quality of ranking results in search systems. BMW-CSP extends BMW-CS to ensure that the top-k results will have their rankings preserved. In the experiments we performed for computing the top-10 results, the final average time required for processing queries with BMW-CSP was lesser than the ones required by the baselines adopted. As with BMWCS, the price paid by BMW-CSP, when compared to other document-at-a-time methods, is extra memory required to store partial scores of documents. Further studying the problem of query processing, we then proposed Waves. It performs successive tentative evaluations of results which we call waves. Each wave traverses the index, starting from a specific tier level i. Each wave i may insert only those documents that occur in that tier level into the answer. After processing a wave, the alv gorithm checks whether the answer achieved might be changed by successive waves or not. A new wave is started only if it has a chance of changing the top-k scores. We show through experiments that such lazy query processing strategy results in smaller query processing times when compared to previous approaches proposed in the literature. When compared to BMW-CSP, Waves presents the advantage of not requiring extra memory to store partial scores. We present experiments to compare the performance of Waves to BMW-CSP and to other state-of-the-art document-at-a-time query processing methods that preserve top-k results. These experiments indicate that the method can be an effective alternative algorithm for computing top-k results.

Descrição

Citação

DAOUD, Caio Moura. Processamento eficiente de consultas em sistemas de busca. 2016. 80 f. Tese (Doutorado em Informática) - Universidade Federal do Amazonas, Manaus, 2016.

Avaliação

Revisão

Suplementado Por

Referenciado Por

Licença Creative Commons

Exceto quando indicado de outra forma, a licença deste item é descrita como Acesso Aberto