Multimodal individual recognition (face and voice): a comparative analysis of a classical machine learning approach and a proposal using a deep neural network

Abstract

Humans use body features such as the face, voice and eyes, together with other contextual information, to recognize one another. Biometric recognition seeks to identify an individual from behavioral, physical or psychological characteristics. This work presents a comparative analysis of a classical machine learning approach and a proposal using a deep neural network for the task of recognizing individuals. Two biometric modalities were used: face and voice. The data were obtained from the MOBIO bimodal database (MCCOOL et al., 2012). Fifty individuals were used: 37 men and 13 women. The images were preprocessed by extracting the face, resizing it to 64x80 and converting it to monochrome. An autoencoder was used to obtain a reduced representation of the face data. For voice, a voice activity detector was used to classify audio excerpts as containing voice or not. Mel-frequency cepstral coefficients (MFCCs) and their derivative coefficients were extracted, for a total of 39 coefficients. Unimodal and multimodal biometric identification models were developed, totaling six architectures. The multimodal model based on classical machine learning uses score-level fusion and Learning Vector Quantization (LVQ). The multimodal model based on deep learning uses feature-level fusion and a Convolutional Neural Network (CNN). The proposed architectures were tested in different scenarios, varying the clusters, the number of audio frames, the encoding layer dimension, the number of MFCCs, the regularization and the optimizers. The systems were evaluated using the area under the ROC curve (AUC-ROC), the True Acceptance Rate (TAR), the False Acceptance Rate (FAR) and the threshold of the best operating point. In addition, the training and testing times of the networks were measured. The multimodal proposal with LVQ obtained an AUC-ROC of 0.98, and the multimodal proposal with CNN reached an AUC-ROC of 0.99. The results showed that deep learning produces better performance, in addition to more optimized training. Thus, the architectures proposed in this work can be a good starting point for implementing a robust system for the automatic identification of individuals.
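To make the evaluation terms in the abstract concrete, the following is a minimal Python sketch of score-level fusion followed by TAR, FAR and AUC-ROC computation. It is not the dissertation's implementation: the abstract does not state the fusion rule, so the min-max normalization and weighted sum used here are assumptions, and all function names and the `face_weight` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1] so both modalities share a common range."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Degenerate case: all scores are equal, so no ranking information.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(face: Sequence[float], voice: Sequence[float],
                face_weight: float = 0.5) -> List[float]:
    """Assumed fusion rule: weighted sum of normalized per-modality scores."""
    face_n = min_max_normalize(face)
    voice_n = min_max_normalize(voice)
    return [face_weight * f + (1.0 - face_weight) * v
            for f, v in zip(face_n, voice_n)]


def tar_far(scores: Sequence[float], labels: Sequence[int],
            threshold: float) -> Tuple[float, float]:
    """TAR: fraction of genuine trials accepted (label 1).
    FAR: fraction of impostor trials accepted (label 0)."""
    genuine = [s for s, y in zip(scores, labels) if y == 1]
    impostor = [s for s, y in zip(scores, labels) if y == 0]
    tar = sum(s >= threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return tar, far


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC-ROC as the probability that a genuine score outranks an
    impostor score, with ties counted as one half."""
    genuine = [s for s, y in zip(scores, labels) if y == 1]
    impostor = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if g > i else 0.5 if g == i else 0.0
               for g in genuine for i in impostor)
    return wins / (len(genuine) * len(impostor))
```

Sweeping `threshold` over the fused scores and recording each (FAR, TAR) pair traces the ROC curve; the best operating point mentioned in the abstract would then be chosen along that curve by whatever criterion the evaluation adopts.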

Citation

NEGREIRO, João Victor Campos de. Reconhecimento de indivíduos multimodal (face e voz): análise comparativa entre uma abordagem de aprendizado de máquina clássica e uma proposta utilizando rede neural profunda. 2022. 103 f. Dissertação (Mestrado em Engenharia Elétrica) - Universidade Federal do Amazonas, Manaus (AM), 2022.

Creative Commons License

Except where otherwise noted, this item's license is described as Open Access