Evaluation of the use of information theory quantifiers for identifying online pedophile conversations
Publisher
Universidade Federal do Amazonas
Abstract
Instant messaging social networks, such as WhatsApp, represent a real threat to children
and teenagers, who can easily become targets of sexual predators and pedophiles.
Hence, the automatic identification of pedophile chats is a key tool for protecting the
young users of these networks. However, these networks have two sensitive particularities:
(1) messages are often stored only locally; (2) mobile devices with limited processing power
are the main interfaces. In this context, the state of the art is prohibitively expensive to run
on mobile devices. On the other hand, the peer-to-peer nature of communication on
such networks makes it infeasible to process the chats in the cloud without risking exposure
of the victims. In this work, we present a new method, based on the Shannon entropy and
the Jensen-Shannon divergence, to identify pedophile chats. It achieves nearly 90% in
F1 and F0.5 and can be up to 72.8% faster than the state of the art. The method
extracts text features using two information theory quantifiers,
computed from individual word histograms that represent the conversations and three mean
histograms that represent the discourse patterns of the types of authors present in the
dataset: predator (pedophile), victim, and regular (neither victim nor predator). The
first quantifier is the Shannon entropy, which indicates how repetitive the subject matter of a
conversation is; the second is the Jensen-Shannon divergence, which measures how similar
the speech in a conversation is to the discourse pattern of each author type.
The proposed method summarizes each conversation considered in the study
as three entropy features and three divergence features, regardless of
the number of conversations considered in the experiments. This compact feature vector
allows a classifier to identify pedophile conversations with a performance close
to 90% on the F1 and F0.5 measures, while running up to 72.8% faster than the
state of the art.
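
The two quantifiers named in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the whitespace tokenizer, the fixed vocabulary, and the toy inputs are all assumptions made for illustration. The abstract also does not say how the three entropy features are defined, so the sketch computes a single entropy value per conversation plus one Jensen-Shannon divergence per author type.

```python
import math
from collections import Counter

# Hypothetical helper names; none of them come from the dissertation.

def word_histogram(text, vocabulary):
    """Normalized word-frequency histogram of `text` over a fixed, ordered vocabulary."""
    counts = Counter(w for w in text.lower().split() if w in vocabulary)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[w] / total for w in vocabulary]

def mean_histogram(histograms):
    """Element-wise mean of several histograms (assumed form of a per-author-type pattern)."""
    n = len(histograms)
    return [sum(column) / n for column in zip(*histograms)]

def shannon_entropy(p):
    """Shannon entropy in bits: H(P) = -sum_i p_i * log2(p_i), skipping zero entries."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def jensen_shannon_divergence(p, q):
    """JSD(P, Q) = H((P + Q) / 2) - (H(P) + H(Q)) / 2, between 0 and 1 in bits."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(m) - (shannon_entropy(p) + shannon_entropy(q)) / 2

def conversation_features(text, vocabulary, class_means):
    """Entropy of the conversation plus its divergence from each author-type pattern."""
    h = word_histogram(text, vocabulary)
    divergences = [
        jensen_shannon_divergence(h, class_means[label])
        for label in ("predator", "victim", "regular")
    ]
    return [shannon_entropy(h)] + divergences

# Toy usage with placeholder tokens instead of real chat data.
vocabulary = ["a", "b", "c", "d"]
class_means = {
    "predator": mean_histogram([word_histogram("a a b", vocabulary),
                                word_histogram("a b b", vocabulary)]),
    "victim": mean_histogram([word_histogram("c c d", vocabulary)]),
    "regular": mean_histogram([word_histogram("a b c d", vocabulary)]),
}
features = conversation_features("a b b c", vocabulary, class_means)
```

The length of the feature vector depends only on the number of quantifiers and author types, not on how many conversations are processed, which is the compactness property the abstract emphasizes.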
Citation
POSTAL, Juliana Gorayeb. Avaliação do uso de quantificadores de teoria da informação para identificação de conversas online de pedofilia. 2017. 66 f. Dissertação (Mestrado em Informática) - Universidade Federal do Amazonas, Manaus, 2017.
Creative Commons License
Except where otherwise noted, this item's license is described as Open Access

