Misinformation Detection on COVID

One of my projects is automated misinformation detection. We have applied it to COVID-19 misinformation detection on Twitter, with applications deployed on EDNA. In addition, EDNA for misinformation detection is a popular project for students in the Enterprise Computing and Real-Time Systems courses at Georgia Tech.

Motivation

The findings in [^1] suggest a need for automated misinformation and disinformation detection. Misinformation spreads rapidly, and identifying it by hand at that pace requires significant investment in labor and analysis. Automated detection can improve response times to rapidly spreading misinformation by identifying potential new correlated keywords. It can also complement manual detection: integrating periodic manual detection with automated detection offers the best of both worlds, since automated detection can scale across different varieties of misinformation while manual detection can help identify new types.

Problem Statement

Such an automated misinformation filter requires dynamic trend models to identify misinformation keywords, as well as language models to detect fake news and textual markers that indicate misinformation and disinformation. Furthermore, these models need to be adaptive, since misinformation changes rapidly: disinformation agents discard old misinformation and replace it with new forms over time. Models therefore need to be continuously updated to handle concept drift in the streaming data. This in turn requires automatic data generation to create training and evaluation sets. An end-to-end system would integrate data generation, model training, deployment, and updates. We present such a system in EDNA-COVID, which is still a work in progress.

System Overview

Our end-to-end system for dynamic misinformation tracking works as follows:

The misinformation model uses a language model and a keyword detector to identify potential misinformation in the raw stream.
If misinformation is detected, we filter it out of the stream.
Detected misinformation from either component is used to create updated training data.
We then use this training data to periodically update the misinformation model.
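
The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the EDNA-COVID implementation: `filter_stream`, `is_misinformation`, `update_model`, and `update_every` are hypothetical names, and the "model" is any callable that returns True for a post it considers misinformation.

```python
from typing import Callable, Iterable, Iterator, List


def filter_stream(
    stream: Iterable[str],
    is_misinformation: Callable[[str], bool],
    update_model: Callable[[List[str]], None],
    update_every: int = 100,
) -> Iterator[str]:
    """Yield posts that pass the filter; hand flagged posts back for retraining.

    All names and the batching rule are assumptions made for this sketch.
    """
    flagged: List[str] = []  # detected misinformation, kept as new training data
    for post in stream:
        if is_misinformation(post):
            # Filter the post out of the stream and record it for training.
            flagged.append(post)
            if len(flagged) >= update_every:
                # Periodic update: pass the collected batch to the model.
                update_model(flagged)
                flagged = []  # start a fresh batch without mutating the old one
        else:
            yield post


# Toy usage: a substring check stands in for the real model,
# and the "update" simply stores each batch it receives.
batches: List[List[str]] = []
posts = [
    "vaccines are safe",
    "miracle cure found",
    "wash your hands",
    "miracle cure sold here",
]
clean = list(
    filter_stream(posts, lambda p: "miracle cure" in p, batches.append, update_every=2)
)
```

Here the update is triggered by the number of flagged posts; a time-based trigger would fit the same structure.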

The misinformation model is an ensemble of experts, composed of models to detect misinformation in a variety of ways:

A keyword detection model to extract misinformation-related keywords
A language model to perform sentence-level fake news detection

When any member of the ensemble of experts detects misinformation, the system filters it out of the stream. The detected misinformation is recorded as training data for the other members of the ensemble. Periodically, we trigger an update for the component models.
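
To make the cross-training idea concrete, here is a minimal sketch of such an ensemble. Everything in it is an assumption for illustration: the class names (`KeywordExpert`, `PhraseExpert`, `Ensemble`) are hypothetical, the phrase matcher is only a stand-in for a real language model, and the keyword update rule (adopt tokens that recur across new examples) is a toy heuristic rather than the trend model the system would use.

```python
from collections import Counter
from typing import Dict, List, Protocol, Set


class Expert(Protocol):
    def detect(self, text: str) -> bool: ...
    def update(self, examples: List[str]) -> None: ...


class KeywordExpert:
    """Flags text containing any known misinformation keyword (toy version)."""

    def __init__(self, keywords: Set[str], min_count: int = 2) -> None:
        self.keywords = set(keywords)
        self.min_count = min_count

    def detect(self, text: str) -> bool:
        return any(tok in self.keywords for tok in text.lower().split())

    def update(self, examples: List[str]) -> None:
        # Toy rule: adopt tokens that appear in at least min_count examples.
        # A real system would need stopword filtering and trend statistics.
        counts = Counter(tok for ex in examples for tok in set(ex.lower().split()))
        self.keywords |= {tok for tok, n in counts.items() if n >= self.min_count}


class PhraseExpert:
    """Stand-in for the language model: flags text containing a known phrase."""

    def __init__(self, phrases: List[str]) -> None:
        self.phrases = [p.lower() for p in phrases]

    def detect(self, text: str) -> bool:
        return any(p in text.lower() for p in self.phrases)

    def update(self, examples: List[str]) -> None:
        pass  # a real language model would be fine-tuned on the examples here


class Ensemble:
    """Flags text if any expert does; queues it as training data for the rest."""

    def __init__(self, experts: Dict[str, Expert]) -> None:
        self.experts = experts
        self.pending: Dict[str, List[str]] = {name: [] for name in experts}

    def is_misinformation(self, text: str) -> bool:
        flaggers = {name for name, e in self.experts.items() if e.detect(text)}
        if flaggers:
            # Experts that missed this text receive it as a new training example.
            for name in self.experts:
                if name not in flaggers:
                    self.pending[name].append(text)
        return bool(flaggers)

    def update(self) -> None:
        # Periodic update: each expert trains on what the others caught.
        for name, expert in self.experts.items():
            if self.pending[name]:
                expert.update(self.pending[name])
                self.pending[name] = []
```

In this sketch, text caught only by the phrase matcher is queued for the keyword expert, so after a call to `Ensemble.update()` the keyword expert can flag posts that reuse the same terms without the original phrase.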

Additional Component Details

WIP

[^1] Some Like it Hoax: Automated Fake News Detection in Social Networks. https://arxiv.org/abs/1704.07506