Selection of Relevant Attributes: Applying Techniques to the Database of the Virtual Herbarium of Flora and Fungi
Publisher
Universidade Federal do Amazonas
Abstract
Virtual herbaria aim to disseminate scientific information and to contribute to the conservation and sustainable use of Brazilian biological resources. The Virtual Herbarium of Flora and Fungi currently includes 120 Brazilian herbaria and 25 foreign herbaria. Together they provide more than 5.4 million records and more than one million images, along with several free-access tools. This opens space for the application of Machine Learning techniques, including classifiers. In the Machine Learning process, attribute selection is part of data pre-processing and can account for 80% of the data mining phase. It is therefore necessary to study the approaches used to select a subset of attributes that best generalizes the dataset from which the machine learning model is induced. The objective of this work is to apply attribute selection with the filter, wrapper, and embedded approaches to the database of the National Institute of Science and Technology (INCT) Virtual Herbarium of Flora and Fungi. This database contains 87,732 records and 51 features, covering 119 collections and sub-collections, 86,967 online records, 80,513 georeferenced records, and 12,073 distinct accepted species. The first phase of the machine learning process is pre-processing. This phase analyzes the database and yields a more general dataset that is ready for predictive classification models. After the most relevant subset of attributes has been selected, the Machine Learning algorithms are applied; in this research these were Decision Tree, Artificial Neural Network, and Logistic Regression. The models are evaluated through the confusion matrix, using accuracy, and through the area under the ROC curve. Among the models studied, Logistic Regression obtained the best performance, with a total accuracy of 77.25% using the filter approach and 76.25% using the wrapper approach.
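The filter approach and the evaluation measures named above can be illustrated with a small, dependency-free Python sketch. It is not the author's implementation, and it makes several assumptions:

- Information gain is used as the filter criterion, since the abstract does not say which criterion was used.
- Attributes are categorical and the class is binary.
- All function names are hypothetical.

A filter scores each attribute without training any classifier. A wrapper would instead score attribute subsets by training the classifier itself. An embedded approach would read the ranking off the fitted model.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in class entropy obtained by splitting on one categorical attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_attributes(rows, labels, k):
    """Filter approach (assumed criterion: information gain).

    Each attribute is scored independently of any model; the indices of
    the k highest-scoring attributes are returned (ties broken by index).
    """
    n_attrs = len(rows[0])
    scores = [(information_gain([r[j] for r in rows], labels), j) for j in range(n_attrs)]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


def confusion_matrix(y_true, y_pred):
    """Binary confusion matrix as (tp, fp, fn, tn), with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Total accuracy: correct predictions over all predictions."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def roc_auc(y_true, scores):
    """Area under the ROC curve.

    Computed as the probability that a random positive example receives a
    higher score than a random negative one (ties count as 1/2).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

The following toy data are invented for illustration and do not come from the herbarium database.

- With `rows = [("leaf", "AM"), ("leaf", "PA"), ("stem", "AM"), ("stem", "PA")]` and `labels = [1, 1, 0, 0]`, the first attribute separates the classes perfectly (gain 1.0) and the second carries no information (gain 0.0). So `rank_attributes(rows, labels, 1)` returns `[0]`.
- `accuracy([1, 0, 1, 0], [1, 0, 0, 0])` returns `0.75`.
- `roc_auc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.4])` returns `0.875`.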
Citation
SOUZA, Adriano Honorato de. Seleção de atributos relevantes: aplicando técnicas na base de dados do Herbário Virtual da Flora e dos Fungos. 2017. 81 leaves. Master's dissertation (Science and Technology for Amazonian Resources), Universidade Federal do Amazonas, Itacoatiara, 2017.
Creative Commons License
Except where otherwise noted, this item's license is described as Open Access

