Multimodal individual recognition (face and voice): a comparative analysis between a classical machine learning approach and a proposal using a deep neural network
Publisher
Universidade Federal do Amazonas
Abstract
Humans use bodily features such as the face, voice, and eyes, together with other contextual information, to recognize one another. Biometric recognition seeks to identify an individual from behavioral, physical, or psychological characteristics. This work presents a comparative analysis between a classical machine learning approach and a proposal based on a deep neural network for the task of recognizing individuals. Two biometric modalities were used: face and voice. The data were obtained from the MOBIO bimodal database (MCCOOL et al., 2012). Fifty individuals were used, 37 men and 13 women. The images were pre-processed by extracting the face, standardizing it to 64x80 pixels, and converting it to monochrome. An autoencoder was used to obtain a reduced representation of the face data. For voice, a voice activity detector was used to classify audio excerpts as containing voice or not. Mel-frequency cepstral coefficients (MFCCs) and their derivatives were extracted, for a total of 39 coefficients. Unimodal and multimodal biometric identification models were developed, totaling 6 architectures. The multimodal model based on classical machine learning techniques uses score-level fusion and Learning Vector Quantization (LVQ). The multimodal model based on deep learning techniques uses feature-level fusion and a Convolutional Neural Network (CNN). The proposed architectures were tested under different cluster configurations, numbers of audio frames, encoding-layer dimensions, numbers of MFCCs, regularization settings, and optimizers. The systems were evaluated using the area under the ROC curve (AUC-ROC), the True Acceptance Rate (TAR), the False Acceptance Rate (FAR), and the threshold at the best operating point. In addition, the training and testing times of the networks were measured. The multimodal proposal with LVQ obtained an AUC-ROC of 0.98, and the multimodal proposal with CNN reached an AUC-ROC of 0.99.
These results indicate that deep learning performs better and also trains more efficiently. The architectures proposed in this work can therefore serve as a good starting point for implementing a robust system for the automatic identification of individuals.
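The evaluation metrics named in the abstract (TAR, FAR, AUC-ROC, and a best operating point threshold) can be illustrated with a minimal sketch in plain Python. It is not the dissertation's code: the function names and the example scores are hypothetical, and the choice of the threshold that maximizes TAR minus FAR (Youden's J statistic) is an assumption, since the abstract does not say which criterion was used.

```python
from typing import List, Tuple


def tar_far(genuine: List[float], impostor: List[float],
            threshold: float) -> Tuple[float, float]:
    """Return (TAR, FAR) when every score >= threshold is accepted."""
    tar = sum(s >= threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return tar, far


def auc_roc(genuine: List[float], impostor: List[float]) -> float:
    """AUC-ROC as the probability that a genuine score outranks an
    impostor score, counting ties as half a win."""
    wins = 0.0
    for g in genuine:
        for i in impostor:
            if g > i:
                wins += 1.0
            elif g == i:
                wins += 0.5
    return wins / (len(genuine) * len(impostor))


def best_threshold(genuine: List[float], impostor: List[float]) -> float:
    """Pick the observed score that maximizes TAR - FAR.
    Assumption: Youden's J is used only as an illustrative criterion."""
    def youden_j(t: float) -> float:
        tar, far = tar_far(genuine, impostor, t)
        return tar - far

    candidates = sorted(set(genuine) | set(impostor))
    return max(candidates, key=youden_j)


# Hypothetical match scores (higher means "more likely the claimed person").
genuine_scores = [0.9, 0.8, 0.75, 0.72, 0.6]
impostor_scores = [0.7, 0.4, 0.3, 0.2]

auc = auc_roc(genuine_scores, impostor_scores)
threshold = best_threshold(genuine_scores, impostor_scores)
tar, far = tar_far(genuine_scores, impostor_scores, threshold)
```

The pairwise AUC computation is quadratic in the number of scores, which is fine for a small illustration; a rank-based implementation would be preferable for large score sets.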
Citation
NEGREIRO, João Victor Campos de. Reconhecimento de indivíduos multimodal (face e voz): análise comparativa entre uma abordagem de aprendizado de máquina clássica e uma proposta utilizando rede neural profunda. 2022. 103 f. Dissertação (Mestrado em Engenharia Elétrica) - Universidade Federal do Amazonas, Manaus (AM), 2022.
Creative Commons License
Except where otherwise noted, this item's license is described as Open Access

