Adversarial approaches for image explanation in deep neural networks

Abstract

Complex machine learning models have been increasingly adopted due to their wide range of successful applications. In many of these applications, limited knowledge of a complex model's inner workings makes it difficult to detect biases or injustices in its decision-making process. This can directly undermine confidence in the model, so that its results are deemed unreliable, which is risky when such models are responsible for critical or sensitive decisions. These problems motivate the development of methods that explain the reasons behind a model's decisions, either globally ("How does this model make its decisions?") or locally ("Why was this category assigned to this sample?"). In this work, we focus on the local explanation problem. Given the complexity of the decision boundary of complex models, e.g., deep neural networks, it has become common to adopt explanation approaches based on neighborhood analysis, which identify the distinctive features and sensitivity of a sample in comparison to a neighboring class. The idea behind these approaches is to find the closest separation surface that supports the samples we want to explain. Since many adversarial techniques explore this same region to fool machine learning models, it may be possible to use similar ideas to explain a sample's class transition in terms of feature importance. Thus, in this work, we propose an approach that combines adversarial model concepts with instance explanation requirements. We assume that an adversarial instance is a good starting point for estimating the minimum effort required for a class transition. We then propose ways to refine this initial description into one that better matches users' perception. By means of blind tests, users evaluated the explanations provided by our methods for justifying errors of a CNN used to classify images from the MNIST dataset. In 68 to 74% of the judgments, they considered the provided explanations significantly better than those produced by two baselines from the literature, one of which is also based on an adversarial strategy.
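To make the core idea concrete, the sketch below illustrates one generic way an adversarial instance can be turned into a local explanation. It is a minimal illustration, not the dissertation's exact procedure: it assumes a simple iterative gradient-sign (FGSM-style) attack to push a sample across the nearest decision boundary, and then reads the absolute perturbation as a per-pixel importance map. Here `model` stands for any differentiable PyTorch classifier, e.g., a CNN trained on MNIST; the function name and parameters are hypothetical.

    import torch
    import torch.nn.functional as F

    def adversarial_explanation(model, x, label, step=0.01, max_iters=200):
        # Iteratively nudge x toward the nearest class boundary using
        # FGSM-style gradient-sign steps (an assumption of this sketch)
        # until the predicted class changes.
        x_adv = x.clone().detach().requires_grad_(True)
        for _ in range(max_iters):
            logits = model(x_adv)
            if logits.argmax(dim=1).item() != label:
                break  # class transition reached
            loss = F.cross_entropy(logits, torch.tensor([label]))
            model.zero_grad()
            loss.backward()
            with torch.no_grad():
                x_adv += step * x_adv.grad.sign()  # ascend the loss
                x_adv.clamp_(0.0, 1.0)             # stay in valid pixel range
            x_adv.grad = None
        # Feature importance: the pixels that had to change the most
        # to achieve the (minimum-effort) class transition.
        importance = (x_adv.detach() - x).abs()
        return x_adv.detach(), importance

For an MNIST CNN, `x` would be a tensor of shape (1, 1, 28, 28) with pixel values in [0, 1]; visualizing `importance` as a heatmap over the input yields the kind of minimum-effort, class-transition explanation described in the abstract.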

Citation

JUNIOR, Antonio Jose Sobrinho. Abordagens adversariais para explicação de imagens em redes neurais profundas. 2021. 84 f. Dissertação (Mestrado em Informática) - Universidade Federal do Amazonas, Manaus, 2021.

Creative Commons License

Except where otherwise noted, this item's license is described as Acesso Aberto (Open Access).