Minería de términos frasales aplicada en tareas de recuperación de información
Publisher
Universidade Federal do Amazonas
Abstract
The rapid and sustained growth of the web, the resulting increase in the number of digital documents available, and the increasingly frequent use of systems that handle textual information have motivated continued efforts to develop effective information-processing systems that perform tasks such as search, classification, and clustering over textual databases. Given the well-known influence of text representation on information retrieval results, this research investigates the impact of adding phrasal terms as units, chosen for their superior interpretability, to enrich the traditional bag-of-words (BoW) representation. The idea is that phrasal terms reduce the noise and ambiguity inherent in a text representation based only on individual words, leading to higher-quality results.
Phrasal terms were mined with AutoPhrase, a method that integrates segmentation and quality estimation to extract word sequences that constitute complete semantic units. It does not require human experts, is independent of language and domain, and incorporates syntactic information in the form of POS tags when they are available. For ad hoc search, the vector space model was used on the data sets OHSUMED, Cystic Fibrosis, and Glasgow Herald 1995. The experiments show gains on the order of 34.97% in mean average precision (MAP), indicating that adding semantic information in the form of phrasal terms to the queries favors the identification of relevant documents.
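For readers unfamiliar with the MAP metric mentioned above, the following is a minimal sketch of its standard textbook definition. The function names and the toy document identifiers are hypothetical and do not come from the dissertation, which does not describe its evaluation code.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision for one query.

    Averages precision@k over the ranks k at which a relevant document
    appears, dividing by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of the per-query average precision over all queries in `runs`."""
    if not runs:
        return 0.0
    scores = [average_precision(r, qrels.get(q, set())) for q, r in runs.items()]
    return sum(scores) / len(scores)


# Toy example (hypothetical data): two queries with ranked result lists.
runs = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5"]}
qrels = {"q1": {"d1", "d3"}, "q2": {"d5"}}
map_score = mean_average_precision(runs, qrels)
```

A relative gain such as the one reported would then be computed as the difference between the MAP of the phrase-enriched run and the MAP of the baseline run, divided by the baseline MAP.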
In the classification and clustering tasks, the improvement in precision was compared when the best phrasal terms, as ranked by the chi-square (Chi2) and mutual information techniques, were added to extend the document representation based on individual words, using the collections 20 Newsgroups, DBpedia ontology classification, and AG's news corpus. For this comparison, Naive Bayes and support vector machine classifiers were used for classification, and K-means was used for clustering. The results did not show significant improvements from incorporating the phrasal terms. The conclusion, in this case, is that the documents already contain enough information in the form of unigrams, which carry more weight than the phrasal terms, while the phrasal terms increase the dispersion of the data.
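The idea of extending a unigram representation with selected phrasal terms can be sketched as follows. This is an illustrative sketch only: the helper names, the underscore-joined phrase tokens, and the toy data are assumptions, and the chi-square score shown is the textbook 2x2 contingency formulation, which may differ from the exact variant used in the dissertation.

```python
from typing import List, Sequence, Tuple


def add_phrase_tokens(
    tokens: Sequence[str], phrases: Sequence[Tuple[str, ...]]
) -> List[str]:
    """Return the original unigrams plus one extra token per phrase occurrence."""
    out = list(tokens)
    for phrase in phrases:
        n = len(phrase)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == tuple(phrase):
                out.append("_".join(phrase))
    return out


def chi2_score(
    docs: Sequence[Sequence[str]], labels: Sequence[str], term: str, cls: str
) -> float:
    """Chi-square statistic of the 2x2 table (term present/absent) x (cls/other)."""
    a = b = c = d = 0
    for toks, y in zip(docs, labels):
        present = term in toks
        if present and y == cls:
            a += 1  # term present, class cls
        elif present:
            b += 1  # term present, other class
        elif y == cls:
            c += 1  # term absent, class cls
        else:
            d += 1  # term absent, other class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


# Toy corpus (hypothetical): one phrase assumed to come from a phrase miner.
phrases = [("support", "vector", "machines")]
raw_docs = [
    ["support", "vector", "machines", "work"],
    ["support", "vector", "machines", "train"],
    ["football", "match", "work"],
    ["football", "season", "train"],
]
labels = ["ml", "ml", "sport", "sport"]
docs = [add_phrase_tokens(toks, phrases) for toks in raw_docs]
phrase_score = chi2_score(docs, labels, "support_vector_machines", "ml")
```

Phrases would be ranked by such a score and only the top-scoring ones added to the vocabulary, so that the feature space grows by a controlled amount.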
Citation
SÁNCHEZ VERA, Zulema. Minería de términos frasales aplicada en tareas de recuperación de información. 2019. 57 f. Dissertação (Mestrado em Informática) - Universidade Federal do Amazonas, Manaus, 2019.
Creative Commons License
Except where otherwise noted, this item's license is described as Open Access

