Forensic analysts often use social media imagery and text to understand important events. A primary challenge is the initial sifting out of irrelevant posts. This work introduces an interactive process for training an event-centric, learning-based multimodal classification model that automates sanitization. We propose a method based on Bayesian Graph Neural Networks (BGNNs) and evaluate active learning and pseudo-labeling formulations to reduce the number of posts the analyst must manually annotate. Our results indicate that BGNNs are useful for sifting social media data in forensic investigations of events of interest, that the value of active learning and pseudo-labeling varies with the setting, and that incorporating unlabeled data from other events improves performance.
AI Knows What You Did Last Summer: Applications in Digital Forensics
J. Yang, J. Nascimento, G. Bertocco, and 5 more authors
In Computer Vision: Challenges, Trends, and Opportunities, 2024
The past decade saw explosive advances in smartphones and cameras, making them both more powerful and cheaper. As a result, not only can we record high-quality footage of the world around us, but the increased accessibility of these devices makes it rare for a significant event to go unnoticed. Furthermore, once something is captured, it can be uploaded instantly to social media and accessed by millions of people. Even though this connectivity has multiple positive effects, social media has also raised new challenges for our society, such as the spread of fake news and misinformation. This scenario has created a range of new problems that Digital Forensics is concerned with, from analyzing major world events (such as terrorist attempts, wars, and natural disasters) to understanding how these are perceived and discussed online. Given their scale, such problems are often tackled with powerful artificial intelligence (AI) techniques developed over recent decades. This chapter presents modern applications of AI in the Digital Forensics field. Person re-identification (PReID), visual analytics, fact-checking, event filtering, and authorship attribution are just a few examples of forensic tasks that have benefited from recent advances in AI research in this new scenario. We discuss each task's associated problems, challenges, current methods, and future research directions, and hope to give the reader a general idea of the challenges of the field and its pressing demand for new tools and smart techniques to face the challenges of the 21st century.
2023
Real-world-events data sifting through ultra-small labeled datasets and graph fusion
D. Vega-Oliveros, J. Nascimento, B. Lavi, and 1 more author
The information on social media is vital, especially for events such as natural disasters or terrorist attacks, which can cause rapid growth of data sharing through social media networks. However, collecting and processing data about an event is a challenging task that requires a great deal of data cleaning and filtering to separate what is relevant to the event from what is not. The data sifting task aims to identify the content related to the event of interest. We propose a learning strategy to dynamically learn complementary contributions from different data-driven features through a semi-supervised graph-fusion technique. Our proposed method relies on a minimal set of labeled training samples (ultra-small data learning). Learning from a small labeled set is also of particular interest to forensic investigators and medical researchers, since it avoids massive data labeling and favors energy-efficient computing by reducing redundancy and repetition. We assess the effectiveness of the proposed semi-supervised method on five datasets from real-world events. Compared with prior art (both supervised and semi-supervised methods), experimental results show that the proposed method achieves the best classification results and the smallest computational footprint.
2022
WIFS
Few-shot Learning for Multi-modal Social Media Event Filtering
J. Nascimento, J. P. Cardenuto, J. Yang, and 1 more author
In 2022 IEEE International Workshop on Information Forensics and Security (WIFS), 2022
When a large-scale forensic event happens, visual content related to it is immediately shared on social networks. These data are potentially useful for later forensic inspection, given that they can depict different views of the event at different moments. However, an analysis of social media imagery related to an event can be hampered by the large number of irrelevant items retrieved by a collection procedure, such as memes and images from previous events. Manually sanitizing the dataset at hand is infeasible, since it might contain thousands of items. To tackle this problem, we study the use of machine learning techniques to speed up the procedure and reduce the required human effort. In detail, our work follows four paths. The first aims to provide a good representation for images, exploring different pre-trained convolutional neural networks and feature fusion; the second targets including humans in the loop of the machine learning pipeline, by employing instance selection and active learning techniques; the third aims to perform classification with few samples, using semi-supervised techniques that range from graph-based methods to graph neural networks; and the last aims to incorporate knowledge from previous events into the machine learning pipeline, using domain adaptation and available datasets from previous events. These four paths are promising and can improve the performance of methods for this task.
2021
A Inteligência Artificial e os desafios da Ciência Forense Digital no século XXI
R. Padilha, A. Theóphilo, F. A. Andaló, and 6 more authors
Digital Forensic Science emerged from the need to address forensic problems in the digital era. Its most recent challenge is related to the rise of social media, intensified by advances in Artificial Intelligence. The massive production of data on social media has made forensic analysis more complex, especially given the improvement of computational models capable of generating highly realistic artificial content. Hence, Artificial Intelligence techniques are needed to handle this immense volume of information. In this article, we present challenges and opportunities associated with applying these techniques, and provide examples of their use in real situations. We discuss the problems that arise in sensitive contexts and how the scientific community has approached these topics. Finally, we outline future research directions to be explored.
ICASSP
Semi-Supervised Feature Embedding for Data Sanitization in Real-World Events
B. Lavi, J. Nascimento, and A. Rocha
In 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021
With the rapid growth of data sharing through social media networks, determining relevant data items concerning a particular subject becomes paramount. We address the issue of establishing which images represent an event of interest through a semi-supervised learning technique. The method learns consistent and shared features related to an event (from a small set of examples) to propagate them to an unlabeled set. We investigate the behavior of five image feature representations considering low- and high-level features and their combinations. We evaluate the effectiveness of the feature embedding approach on five collected datasets from real-world events.
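The idea of propagating information from a small labeled set to an unlabeled set can be illustrated with a generic graph-based label propagation sketch. This is not the method of the paper above: the function name `propagate_labels`, the RBF affinity, the `sigma` and `iterations` parameters, and the toy 2-D feature vectors are all assumptions made for illustration only.

```python
import math


def propagate_labels(features, labels, sigma=1.0, iterations=50):
    """Generic label propagation over a dense RBF affinity graph (illustrative only).

    features: list of equal-length numeric vectors, one per item.
    labels:   list with 1 (relevant), 0 (irrelevant), or None (unlabeled).
    Returns a relevance score in [0, 1] for every item.
    """
    n = len(features)

    def affinity(a, b):
        # RBF (Gaussian) similarity between two feature vectors.
        d2 = sum((x - y) ** 2 for x, y in zip(a, b))
        return math.exp(-d2 / (2 * sigma ** 2))

    # Pairwise affinities; an item is not its own neighbor.
    weights = [
        [affinity(features[i], features[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]

    # Unlabeled items start at 0.5 (no preference either way).
    scores = [0.5 if y is None else float(y) for y in labels]

    for _ in range(iterations):
        updated = []
        for i in range(n):
            if labels[i] is not None:
                # Clamp labeled items to their known label.
                updated.append(float(labels[i]))
                continue
            total = sum(weights[i])
            if total == 0.0:
                # Isolated item: keep its current score.
                updated.append(scores[i])
                continue
            # Affinity-weighted average of the neighbors' current scores.
            updated.append(sum(w * s for w, s in zip(weights[i], scores)) / total)
        scores = updated
    return scores


if __name__ == "__main__":
    # Hypothetical toy data: two clusters, one labeled example in each.
    toy_features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    toy_labels = [1, None, None, 0, None, None]
    for score in propagate_labels(toy_features, toy_labels):
        print(round(score, 3))
```

In this sketch, unlabeled items end up with scores close to those of the labeled examples they resemble most, which is the general intuition behind semi-supervised propagation from a few examples.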