Evaluation of the use of information theory quantifiers for identifying online pedophilia conversations

Publisher

Universidade Federal do Amazonas

Abstract

Instant-messaging social networks such as WhatsApp pose a real threat to children and teenagers, who can easily become targets of sexual predators and pedophiles. The automatic identification of pedophile chats is therefore a key tool for protecting young users of these networks. However, these networks have two sensitive particularities: (1) messages are often stored only locally; (2) mobile devices with limited processing power are the main interfaces. In this context, state-of-the-art methods are prohibitively expensive to run on mobile devices. At the same time, the peer-to-peer nature of communication in such networks makes it infeasible to process chats in the cloud without risking exposure of the victims. In this work, we present a new method, based on the Shannon entropy and the Jensen-Shannon divergence, for identifying pedophile chats; it achieves nearly 90% on the F1 and F0.5 measures and can be up to 72.8% faster than the state of the art.

The method extracts text features using these two information theory quantifiers, computed from individual word histograms that represent the conversations and from three mean histograms that represent the discourse pattern of each type of author present in the dataset: predator (pedophile), victim, and regular (neither victim nor predator). The first quantifier, the Shannon entropy, indicates how repetitive the subject matter of a conversation is. The second, the Jensen-Shannon divergence, measures how similar the discourse in a conversation is to the discourse pattern of each author type. The proposed method summarizes the conversations considered in the study as three entropy features and three divergence features, regardless of the number of conversations considered in the experiments. This compact feature vector allows a classifier to identify pedophile conversations with performance close to 90% on the F1 and F0.5 measures, while running up to 72.8% faster than the state of the art.
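The two quantifiers named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the helper names (`word_histogram`, `mean_histogram`, `conversation_features`), the whitespace tokenization, the fixed vocabulary, and the base-2 logarithm are all assumptions made for illustration. The abstract does not say how the three entropy features are obtained, so the sketch computes a single entropy per conversation plus one divergence per author-type mean histogram.

```python
import math
from collections import Counter


def word_histogram(text, vocabulary):
    """Relative frequency of each vocabulary word in `text` (hypothetical helper)."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocabulary)
    if total == 0:
        # No vocabulary word occurs: return an all-zero histogram.
        return [0.0] * len(vocabulary)
    return [counts[w] / total for w in vocabulary]


def mean_histogram(histograms):
    """Element-wise mean of several histograms of equal length."""
    n = len(histograms)
    return [sum(column) / n for column in zip(*histograms)]


def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability bins contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: H((p+q)/2) - (H(p) + H(q)) / 2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(m) - (shannon_entropy(p) + shannon_entropy(q)) / 2


def conversation_features(histogram, class_means):
    """Entropy of one conversation plus its divergence from each class mean."""
    return [shannon_entropy(histogram)] + [
        js_divergence(histogram, mean) for mean in class_means
    ]
```

With three class means (predator, victim, regular), `conversation_features` returns a fixed-length vector whatever the size of the corpus, which is the property the abstract highlights. A low entropy indicates a conversation concentrated on few words, and a low divergence from a class mean indicates discourse close to that author type.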

Citation

POSTAL, Juliana Gorayeb. Avaliação do uso de quantificadores de teoria da informação para identificação de conversas online de pedofilia. 2017. 66 f. Dissertação (Mestrado em Informática) - Universidade Federal do Amazonas, Manaus, 2017.

Creative Commons License

Except where otherwise noted, this item's license is described as Open Access