Minería de términos frasales aplicada en tareas de recuperación de información (Phrasal term mining applied to information retrieval tasks)

Sánchez Vera, Zulema

Files

Primary: Dissertação_ZulemaSanchezVera_PPGI.pdf (1.11 MB)

333 Folha de Aprovação - Zulema Sanchez (Assinada).pdf (307.62 KB)

333 ATA de Defesa - Zulema Sanchez (Assinada).pdf (477.87 KB)

Carta de orientador.pdf (290.67 KB)

Date

2019-04-29

Authors

Sánchez Vera, Zulema

Publisher

Universidade Federal do Amazonas

Abstract

The constant growth of the web, the resulting increase in the number of digital documents available, and the increasingly frequent use of systems that deal with textual information have motivated sustained efforts to develop effective information-processing systems that perform tasks such as search, classification, and clustering over textual databases. Given the well-known influence of text representation on information retrieval results, this research investigates the impact of adding phrasal terms as units, chosen for their superior interpretability, with the aim of enriching the traditional bag-of-words (BoW) representation. The idea is that phrasal terms reduce the inherent noise and ambiguity of a text representation based only on individual words, resulting in higher-quality results. Phrasal terms were mined with AutoPhrase, a method that integrates segmentation and quality estimation to extract word sequences that constitute complete semantic units. It does not require human experts, is independent of language and domain, and incorporates syntactic information in the form of POS tags when available. For ad hoc search, the vector space model was used on the OHSUMED, Cystic Fibrosis, and Glasgow Herald 1995 data sets; the experiments show gains on the order of 34.97% in the MAP metric, indicating that adding semantic information in the form of phrasal terms to the queries favors the identification of relevant documents. For the classification and clustering tasks, performance in terms of precision was compared when the best phrasal terms, as ranked by the Chi2 and mutual information techniques, were added to extend the document representation based on individual words, using the 20 Newsgroups, DBpedia ontological classification, and AG's News corpus collections.
For this comparison, the Naive Bayes and support vector machine classifiers were used for classification and K-means for clustering. The results did not show significant improvements from incorporating the phrasal terms. The conclusion, in this case, is that the documents already contain enough information in the form of unigrams, which carry more weight than the phrasal terms, while the phrasal terms increase the dispersion of the data.
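
As a rough illustration of what enriching a BoW representation with phrasal terms can look like, the following minimal Python sketch keeps every unigram and adds each matched phrase as one extra unit. It is not the dissertation's implementation: the function name `augment_with_phrases`, the `max_len` parameter, and the example phrase list are hypothetical, and the phrases are hard-coded here rather than produced by AutoPhrase.

```python
from collections import Counter


def augment_with_phrases(tokens, phrases, max_len=4):
    """Return unigram counts plus counts of matched phrasal terms.

    `phrases` is assumed to be a list of already-mined phrasal terms
    (hypothetical input; in the abstract these come from AutoPhrase).
    """
    phrase_set = {tuple(p.lower().split()) for p in phrases}
    tokens = [t.lower() for t in tokens]
    units = list(tokens)  # keep every unigram, as in plain BoW
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so longer phrases win over their prefixes.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in phrase_set:
                units.append("_".join(candidate))  # phrase becomes one extra unit
                i += n - 1  # skip the rest of the matched phrase
                break
        i += 1
    return Counter(units)


if __name__ == "__main__":
    text = "cystic fibrosis affects lung function"
    bow = augment_with_phrases(text.split(), ["cystic fibrosis", "lung function"])
    print(bow)
```

The resulting counts can feed a vector space model or a classifier in place of plain unigram counts. Keeping the unigrams alongside the phrases mirrors the abstract's description of extending, rather than replacing, the word-based representation.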

Keywords

Information retrieval, Information retrieval systems, Key terms

Citation

SÁNCHEZ VERA, Zulema. Minería de términos frasales aplicada en tareas de recuperación de información. 2019. 57 f. Dissertação (Mestrado em Informática) - Universidade Federal do Amazonas, Manaus, 2019.

URI

https://tede.ufam.edu.br/handle/tede/7189

Collections

Master's in Informatics (Mestrado em Informática)

Creative Commons License

Except where otherwise noted, this item's license is described as Open Access
