Uma estratégia eficiente de treinamento para Programação Genética aplicada a deduplicação de registros [An efficient training strategy for Genetic Programming applied to record deduplication]
Publisher
Universidade Federal do Amazonas
Abstract
The amount of information available through digital media has increased considerably
in recent decades, which is a concern for managers of large data repositories.
Handling this growth while protecting the data effectively is an even greater challenge.
In many repositories, one of the main problems is the existence of replicated data,
which can degrade data quality and the ability to provide services that meet the
demands of customers. However, removing replicated records is a task that
requires a great deal of time and processing effort.
Genetic Programming (GP) is one of the techniques that has been applied effectively
to the task of identifying replicated records. One of the main
requirements of this technique is the use of examples (usually created manually) in its
training step. Another major limitation of GP is its processing time:
during the training step, each record is compared with all other records in
the data repository. The time required to perform all these comparisons
can therefore be very high, even for small repositories.
For these reasons, this dissertation proposes a novel approach based on a strategy
that combines a clustering technique with a sliding window, aiming to minimize the
number of comparisons required in the GP training step. Experiments using synthetic
and real datasets show that it is possible to reduce the time cost of the GP training step
by up to 70%, without a significant reduction in the quality of the generated solutions.
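
The abstract does not describe how the clustering and the sliding window are configured, so the following is only a minimal sketch of the general idea, not the dissertation's method. It contrasts all-pairs comparison with candidate pairs restricted to a sliding window inside each cluster. The function names, the clustering key (first letter), the sort key, the window size, and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def all_pairs(records):
    """Baseline: every record is compared with every other record."""
    return list(combinations(range(len(records)), 2))


def clustered_window_pairs(records, cluster_key, sort_key, window=3):
    """Candidate pairs limited to a sliding window inside each cluster.

    cluster_key and sort_key are hypothetical helpers: the first groups
    records into clusters, the second orders records within a cluster.
    Each record is paired only with the next (window - 1) records.
    """
    clusters = defaultdict(list)
    for idx, rec in enumerate(records):
        clusters[cluster_key(rec)].append(idx)

    pairs = []
    for members in clusters.values():
        members.sort(key=lambda i: sort_key(records[i]))
        for pos, i in enumerate(members):
            for j in members[pos + 1 : pos + window]:
                pairs.append((i, j))
    return pairs


# Made-up sample records, for illustration only.
records = [
    "ana silva", "ana sliva", "anna silva",
    "bruno costa", "bruno kosta", "carla dias",
]

baseline = all_pairs(records)
reduced = clustered_window_pairs(
    records,
    cluster_key=lambda r: r[0],  # assumed: cluster by first letter
    sort_key=lambda r: r,        # assumed: sort alphabetically
    window=2,
)
print(len(baseline), len(reduced))
```

With n records the baseline produces n(n-1)/2 pairs, while the windowed variant grows roughly linearly in n for a fixed window size. The trade-off is that true duplicates falling in different clusters, or too far apart in the sort order, are never compared.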
Citation
SILVA, Davi Guimarães da. Uma estratégia eficiente de treinamento para Programação Genética aplicada a deduplicação de registros. 2016. 80 f. Dissertação (Mestrado em Informática) - Universidade Federal do Amazonas, Manaus, 2016.
Creative Commons License
Except where otherwise noted, this item's license is described as Open Access

