Deteção de Spam baseada na evolução das características com presença de Concept Drift
Carregando...
Arquivos
Data
Autores
Título da Revista
ISSN da Revista
Título de Volume
Editor
Universidade Federal do Amazonas
Resumo
Electronic messages (emails) are still considered the most significant tools in business
and personal applications due to their low cost and easy access. However, e-mails have
become a major problem owing to the high amount of junk mail, named spam, which fill
the e-mail boxes of users. Among the many problems caused by spam messages, we may
highlight the fact that it is currently the main vector for the spread of malicious activities
such as viruses, worms, trojans, phishing, botnets, among others. Such activities allow
the attacker to have illegal access to penetrating data, trade secrets or to invade the privacy of the sufferers to get some advantage. Several approaches have been proposed to prevent sending unsolicited e-mail messages, such as filters implemented in e-mail servers, spam message classification mechanisms for users to define when particular issue or author is a source of spread of spam and even filters implemented in network electronics. In general, e-mail filter approaches are based on analysis of message content to determine whether or not a message is spam. A major problem with this approach is spam detection in the presence of concept drift. The literature defines concept drift as changes occurring in the concept of data over time, as the change in the features that describe an attack or occurrence of new features. Numerous Intrusion Detection Systems (IDS) use machine learning techniques to monitor the classification error rate in order to detect change. However, when detection occurs, some damage has been caused to the system, a fact that requires updating the classification process and the system operator intervention. To overcome the problems mentioned above, this work proposes a new changing detection method, named Method oriented to the Analysis of the Development of Attacks Characteristics (MECA). The proposed method consists of three steps: 1) classification model training; 2) concept drift detection; and 3) transfer learning. The first step generates classification models as it is commonly conducted in machine learning. The second step introduces two new strategies to avoid concept drift: HFS (Historical-based Features Selection) that analyzes the evolution of the features based on over time historical; and SFS (Similarity-based Features Selection) that analyzes the evolution of the features from the level of similarity obtained between the features vectors of the source and target domains. Finally, the third step focuses on the following questions: what, how and when to transfer acquired knowledge. The answer to the first question is provided by the concept drift detection strategies that identify the new features and store them to be
transferred. To answer the second question, the feature representation transfer approach
is employed. Finally, the transfer of new knowledge is executed as soon as changes that
compromise the classification task performance are identified. The proposed method was developed and validated using two public databases, being one of the datasets built along this thesis. The results of the experiments shown that it is possible to infer a threshold to detect changes in order to ensure the classification model is updated through knowledge transfer. In addition, MECA architecture is able to perform the classification task, as well as the concept drift detection, as two parallel and independent tasks. Finally, MECA uses SVM machine learning algorithm (Support Vector Machines), which is less adherent to the training samples. The results obtained with MECA showed that it is possible to detect changes through feature evolution monitoring before a significant degradation in classification models is achieved.
Descrição
Citação
HENKE, Márcia. Deteção de Spam baseada na evolução das características com presença de Concept Drift. 2015. 135 f. Tese (Doutorado em Informática) - Universidade Federal do Amazonas, Manaus, 2015.
