My research areas are machine learning and multimedia forensics. Specifically, I work within the DejáVù Project on developing an automated method to determine whether a piece of social media data is related to a given forensic event.
Forensic analysts often use social media imagery and texts to understand important events. A primary challenge is the initial sifting of irrelevant posts. This work introduces an interactive process for training an event-centric, learning-based multimodal classification model that automates sanitization. We propose a method based on Bayesian Graph Neural Networks (BGNNs) and evaluate active learning and pseudo-labeling formulations to reduce the number of posts the analyst must manually annotate. Our results indicate that BGNNs are useful for sifting social media data in forensic investigations of events of interest, that the value of active learning and pseudo-labeling varies with the setting, and that incorporating unlabelled data from other events improves performance.
2022
WIFS
Few-shot Learning for Multi-modal Social Media Event Filtering
J. Nascimento, J. P. Cardenuto, J. Yang, and 1 more author
In 2022 IEEE International Workshop on Information Forensics and Security (WIFS), 2022
When a large-scale forensic event happens, visual content related to it is immediately shared on social networks. These data are potentially useful for a later forensic inspection, since they can depict different views of the event at different moments. However, an analysis of social media imagery related to an event can be hampered by the large number of irrelevant items retrieved by a collection procedure, such as memes and images from previous events. Manually sanitizing the dataset at hand is infeasible, since it might contain thousands of items. To tackle this problem, we study the use of machine learning techniques to speed up the procedure and reduce the human effort required. In detail, our work follows four paths. The first aims to provide a good representation for images, exploring different pre-trained convolutional neural networks and feature fusion; the second targets including humans in the loop of the machine learning pipeline, by employing instance selection and active learning techniques; the third aims to perform classification with few samples, using semi-supervised techniques that range from graph-based methods to graph neural networks; and the last aims to incorporate knowledge from previous events into the machine learning pipeline, using domain adaptation and available datasets from previous events. These four paths are promising and can improve the performance of methods for this task.
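As a rough illustration of the human-in-the-loop idea mentioned above, the sketch below ranks unlabeled items by predictive entropy, sends the most uncertain ones to the analyst, and pseudo-labels the ones the model is already confident about. It is a generic uncertainty-sampling example, not the method from the paper: the function names, the 0.95 threshold, and the toy probabilities are all assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def split_for_annotation(predictions, budget, pseudo_threshold=0.95):
    """Choose items for manual labeling and pseudo-label confident ones.

    predictions: dict mapping item id -> [p_irrelevant, p_relevant]
    budget: how many items the analyst can label in this round
    pseudo_threshold: hypothetical confidence cut-off for pseudo-labels
    Returns (to_annotate, pseudo_labels).
    """
    # Rank items from most to least uncertain.
    ranked = sorted(predictions, key=lambda k: entropy(predictions[k]), reverse=True)
    to_annotate = ranked[:budget]

    pseudo_labels = {}
    for item_id in ranked[budget:]:
        probs = predictions[item_id]
        best = max(range(len(probs)), key=probs.__getitem__)
        # Only trust the model's own label when it is very confident.
        if probs[best] >= pseudo_threshold:
            pseudo_labels[item_id] = best
    return to_annotate, pseudo_labels


# Toy model outputs (made up): class 0 = irrelevant, class 1 = relevant.
predictions = {
    "post_a": [0.02, 0.98],
    "post_b": [0.55, 0.45],
    "post_c": [0.97, 0.03],
    "post_d": [0.70, 0.30],
}
to_annotate, pseudo_labels = split_for_annotation(predictions, budget=1)
print(to_annotate)
print(pseudo_labels)
```

In this toy run the most ambiguous post goes to the analyst, the two confidently classified posts receive pseudo-labels, and the remaining post stays unlabeled for a later round.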