An efficient training strategy for Genetic Programming applied to record deduplication

Publisher

Universidade Federal do Amazonas

Abstract

The amount of information available through digital media has increased considerably in recent decades, which concerns managers of large data repositories. Dealing with this growth while protecting the data effectively is an even greater challenge. In many repositories, one of the main problems is the existence of replicated data, which can degrade data quality and the ability to provide services that meet customers' demands. However, removing replicated records requires considerable time and processing effort. Genetic Programming (GP) is one of the techniques that has been applied effectively to identifying replicated records. One of its main requirements is a set of examples (usually created manually) for its training step. Another major cost is processing time: during the training step, each record is compared with every other record in the data repository, so the time required to perform all these comparisons can be very high, even for small repositories. For these reasons, this dissertation proposes a novel approach based on a strategy that combines a clustering technique with a sliding window, aiming to minimize the number of comparisons required in the GP training step. Experiments using synthetic and real datasets show that it is possible to reduce the time cost of the GP training step by up to 70% without a significant reduction in the quality of the generated solutions.
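
As a rough illustration of why restricting comparisons helps, the sketch below counts candidate record pairs for an all-against-all comparison and for a sliding window applied inside clusters. It is not the dissertation's implementation: the clustering rule (first letter of the record), the alphabetical sort key, the window size, and the sample records are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from collections import defaultdict
from itertools import combinations


def all_pairs(records):
    """Baseline: pair every record with every other record."""
    return list(combinations(range(len(records)), 2))


def clustered_window_pairs(records, cluster_key, sort_key, window=3):
    """Candidate pairs from a sliding window run inside each cluster.

    cluster_key and sort_key are hypothetical stand-ins for whatever
    clustering technique and ordering the real method uses.
    """
    # Group record indices by their (assumed) cluster label.
    clusters = defaultdict(list)
    for idx, rec in enumerate(records):
        clusters[cluster_key(rec)].append(idx)

    pairs = set()
    for members in clusters.values():
        # Order the cluster so that similar records end up adjacent.
        members.sort(key=lambda i: sort_key(records[i]))
        for pos, i in enumerate(members):
            # Pair record i only with the next (window - 1) records.
            for j in members[pos + 1 : pos + window]:
                pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


# Toy data, invented for this example.
records = [
    "john smith", "jon smith", "john smyth", "jane doe",
    "jane do", "mary jones", "marie jones", "mark johnson",
]

baseline = all_pairs(records)
reduced = clustered_window_pairs(
    records,
    cluster_key=lambda r: r[0],  # toy clustering: first letter
    sort_key=lambda r: r,        # toy ordering: alphabetical
    window=3,
)
print(f"all pairs: {len(baseline)}, clustered window pairs: {len(reduced)}")
```

The candidate set is always a subset of the all-against-all pairs, and it grows roughly linearly with the number of records for a fixed window instead of quadratically. The trade-off is that true duplicates placed in different clusters, or too far apart in the sort order, are never compared.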

Citation

SILVA, Davi Guimarães da. Uma estratégia eficiente de treinamento para Programação Genética aplicada a deduplicação de registros. 2016. 80 f. Dissertação (Mestrado em Informática) - Universidade Federal do Amazonas, Manaus, 2016.

Creative Commons License

Except where otherwise noted, this item's license is described as Open Access