Abhijit Suprem

Hi! I am Abhijit Suprem.

I am a PhD student in the School of Computer Science at Georgia Tech. My advisor is Professor Calton Pu. I work on Time-Aware Machine Learning, an emerging area at the intersection of Machine Learning, Real-Time Systems, and Concept Drift Adaptation. My research adapts the traditional paradigm of fixed machine learning models to a real-time setting, with built-in monitoring of predictions, maintenance of classifiers, management of ensembles, and reproducibility of deployments.

Time-Awareness: Knowledge expires, and so do machine learning models. ML models are trained on static snapshots that contain both timeless knowledge that never changes (e.g., how many seasons are there?) and ephemeral knowledge that continuously changes (e.g., who is the President?). My work addresses this by injecting time-awareness into ML pipelines during training and deployment. Time-aware pipelines comprise time-aware datasets that recognize fact expiration, time-aware classifiers that measure divergence between prediction and training data, and time-aware model/data management that proactively adapts training knowledgebases and classifier ensembles as knowledge evolves. I have explored the effects of time-awareness in fake news and misinformation detection (IEEE CIC), vehicle re-id (IEEE CogMI), video analytics (VLDB), and social sensor event detection (ACM TOIT, IEEE CIC, ACM DEBS). I currently work on data-centric and classifier-centric metrics that improve time-awareness using representation learning.

Representation Learning: Time-aware ML pipelines can be made efficient by integrating them with representation learning, which learns low-dimensional semantic embeddings of high-dimensional data. By projecting text, image, video, and other multi-modal data onto a lower-dimensional manifold, we can learn important clustering characteristics that help us quickly identify divergences between training and prediction data. I have developed two classes of time-awareness metrics: DataFit (IEEE CIC) and ModelFit (CoRR). DataFit metrics measure how similar prediction data is to training data, and ModelFit metrics measure how well a classifier has learned the neighborhood of given prediction data.
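As a rough illustration only, and not the published DataFit definition, a DataFit-style check can be sketched as the average nearest-neighbor cosine similarity between prediction embeddings and training embeddings. The function name `data_fit_score` and the toy 2-D embeddings below are hypothetical; real embeddings would come from a representation-learning model.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def data_fit_score(train: list[Vector], pred: list[Vector]) -> float:
    """Hypothetical DataFit-style score (an assumption, not the paper's metric):
    for each prediction embedding, take its highest cosine similarity to any
    training embedding, then average. Scores near 1 mean prediction data sits
    close to the training data; lower scores suggest drift."""
    if not train or not pred:
        raise ValueError("need at least one training and one prediction embedding")
    best = [max(cosine(p, t) for t in train) for p in pred]
    return sum(best) / len(best)


# Toy example: an in-distribution batch should score higher than a drifted one.
train = [[1.0, 0.0], [0.9, 0.1]]
in_dist = [[0.95, 0.05]]
drifted = [[0.0, 1.0]]
print(data_fit_score(train, in_dist), data_fit_score(train, drifted))
```

Nearest-neighbor similarity is used here only because it is the simplest way to show "how close is new data to what the model saw"; a brute-force scan like this is quadratic, so a practical pipeline would likely use an approximate nearest-neighbor index instead.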

Interpretability: A key facet of time-awareness is interpretability. Most classifiers are black boxes, which raises significant ethical concerns in domains such as healthcare and criminal justice (Rudin). Inherently interpretable classifiers, designed from the ground up to explain their own predictions, are crucial for developing safe, ethical AI, in contrast to ad-hoc explainability methods that are themselves black boxes. Under our assumption that knowledge can expire, interpretable classifiers become even more integral to time-awareness: classifiers that provide their own explanations can also self-diagnose knowledge expiration through conflicting explanations (IEEE CogMI).

Reproducibility: Reproducibility (and the reproducibility crisis) is a connecting thread through each of these research areas; indeed, it runs through most of ML. Several approaches have emerged to grapple with the ML reproducibility crisis: reproducibility checklists, code sharing, ML provenance, and data provenance. I have made reproducibility a central tenet of my work. Over the past three years, I have developed EdnaML, a toolchain, framework, and API for reproducible ML pipelines. Most of our research publications build on classifiers and datasets developed with EdnaML.

Currently, I am working on three projects: (i) adaptive misinformation detection on Twitter, (ii) time-aware datasets, and (iii) EdnaML, a framework and declarative API for reproducible machine learning.

Projects

EdnaML (current)

Page PDF Code

CoRR, Pedagogy, GitHub

Covid Misinformation Detection (current)

Page PDF

SCF '22, IEEE CIC '22, IEEE CogMI '20

SAFR Re-ID (current)

Page PDF Code

CogMI '19, EDGE '20

LITMUS (current)

Page PDF

SERVICES '19, DEBS '19, IEEE CIC '19

ODIN

Page PDF Code

VLDB '20

EventMapper

Page PDF Code

DEBS '19, CIC '19, SERVICES '19

Publications

Continuously Reliable Detection of New-Normal Misinformation: Semantic Masking and Contrastive Smoothing in High-Density Latent Regions.
A. Suprem, C. Pu. CoRR 2023
PDF

EdnaML: A Declarative API and Framework for Reproducible Deep Learning.
A. Suprem, S. Vaidya, A. Venugopal, J. Ferreira, C. Pu. CoRR 2022
PDF Code

The New-Normal of Fake News: Enhanced and Sustainable Multi-Expert Detection through Timely Adaptation to Counter Knowledge Obsolescence.
A. Suprem, C. Pu. CoRR 2022
PDF

Time-Aware Datasets are Adaptive Knowledgebases for the New Normal.
C. Pu, A. Suprem, J. Ferreira. SCF Keynote 2022
PDF

Constructive Interpretability with CoLabel: Corroborative Integration, Complementary Features, and Collaborative Learning.
A. Suprem, S. Vaidya, P. Singh, S. Cherkadi, J. Ferreira, C. Pu. IEEE CogMI 2022
Proceedings PDF Code

Exploring Generalizability of Fine-Tuned Models for Fake News Detection.
A. Suprem, S. Vaidya, C. Pu. IEEE CIC 2022
Proceedings PDF Code

MiDAS: Multi-integrated Domain Adaptive Supervision for Fake News Detection.
A. Suprem, C. Pu. CoRR 2022
PDF Code

ATEAM: Knowledge Integration from Federated Datasets for Vehicle Feature Extraction using Annotation Team of Experts.
A. Suprem, P. Singh, S. Cherkadi, S. Vaidya, J. Ferreira, C. Pu. Under submission, ECCV 2022
PDF

PerfML: Smart Management of Complex Performance Data and Analytics.
J. Kimball, RA. Lima, A. Suprem, Q. Wang, Y. Kanemasu, C. Pu. IEEE CogMI 2021
Proceedings

Challenges and Opportunities in Rapid Epidemic Information Propagation with Live Knowledge Aggregation from Social Media.
C. Pu, A. Suprem, RA. Lima. IEEE CogMI 2020
PDF Code

EDNA-Covid: A Large-Scale Covid-19 Tweets Dataset Collected with the EDNA Streaming Toolkit.
A. Suprem, C. Pu. arXiv 2020
PDF Code

ODIN: Automated Drift Detection and Recovery in Video Analytics.
A. Suprem, J. Arulraj, JE. Ferreira, C. Pu. VLDB 2020
Proceedings PDF Code Video

Small, Accurate, and Fast Re-ID on the Edge: The SAFR Approach.
A. Suprem, JE. Ferreira, C. Pu. EDGE 2020
Proceedings PDF Code Video

Beyond Artificial Reality: Finding and Monitoring Live Events from Social Sensors.
C. Pu, A. Suprem, RA. Lima, A. Musaev, D. Wang, D. Irani, S. Webb, JE. Ferreira. ACM TOIT 20(1)
PDF

Robust, Extensible, and Fast: Teamed Classifiers for Vehicle Tracking in Multi-Camera Networks.
A. Suprem, RA. Lima, B. Padilha, JE. Ferreira, C. Pu. IEEE Cognitive Machine Intelligence 2019
Proceedings PDF Code

Event Detection in Noisy Streaming Data with Combination of Corroborative and Probabilistic Sources.
A. Suprem, C. Pu. IEEE Collaboration in Computing 2019
Proceedings PDF Code Best Paper Award

Concept Drift Detection and Adaptation with Weak Supervision on Streaming Unlabeled Data.
A. Suprem. arXiv preprint
PDF Code

Concept Drift Adaptive Physical Event Detection for Social Media Streams.
A. Suprem, A. Musaev, C. Pu. World Congress on Services 2019
Proceedings PDF Code Best Paper Award

ASSED: A framework for identifying physical events through adaptive social sensor data filtering.
A. Suprem, C. Pu. ACM DEBS 2019
Proceedings PDF Code

Approximate Query Matching for Graph-Based Holistic Image Retrieval.
A. Suprem, D.H. Chau, C. Pu. Big Data 2018
Proceedings Code

Orientation and displacement detection for smartphone device based IMUs.
A. Suprem, V. Deep, T. Elarabi. IEEE Access 2016
Proceedings PDF Code

Internships

IBM (Summer 2021): IBM Research Internship
IBM (Summer 2020): IBM Research Internship
University of São Paulo, Brazil (Summer 2019): DATA Group research internship. Advised by Dr. Joao Ferreira.
Georgia Tech (Summer 2018): CERCS Group research internship with Dr. Calton Pu.

Education

PhD at Georgia Tech (2017-)
Advisor: Calton Pu
Area: Computer Science
Minor: Public Policy

MS at Columbia University (2016-2017)
Area: Machine Learning

BS at California State University, Fresno (2011-2016)
Major: Electrical Engineering
Minors: Computer Engineering, Mathematics, Premed (MCAT: 35)
President's Scholar; Graduated with Honors

Awards

Student Travel Grant, IEEE CIC 2019
Best Paper Award, IEEE CIC 2019
Best Paper Award, Services 2019
Chair's Fellowship, Georgia Tech 2017
Doctoral Fellowship, UT San Antonio 2016 (declined)
Valero Doctoral Fellowship, Valero Energy 2016 (declined)
Doctoral Fellowship, UT Dallas 2016 (declined)
Calif. State Univ. (CSU) Research Award (First Place), CSU System 2016
Smittcamp Honors Scholarship (full-ride), Fresno State 2011-2015
Senior Design Research Grant, Fresno State 2015-2016
Lyles Center Innoventures Design Grant, Fresno State 2015-2016
Charles Buckley Engineering Scholarship, Fresno State 2011, 2012, 2013
Lyles Center Engineering Scholarship, Fresno State 2013, 2014
Best Student Paper Award, ICMLDA 2013
National Institutes of Health Fellowship, IEEE EMBC 2012

Service

Reviewer for CVPR '22, ACM Transactions on Internet Technology, ECCV '22, NeurIPS '22
Reviewer for Georgia Tech PURA (2020, 2021)
Graduate Student Mentor, Georgia Tech
Undergraduate Student Mentor, Georgia Tech

Advising

I have been fortunate enough to advise several brilliant students on their projects or research.

Student(s): Deepak Challiyil, Manvi Goel, Vedant Mahulkar, Samina Mulani, Gioia Rosier, Caden Farley, Harigovind Anil, Siyuan Chen, Sukhesh Nuthalapati, Brandon Colbert, Abrar Mohammed, Jaya Pandey, Navdha Agarwal, Jithin Sojan, Lawrence Bradley, Nagasayee Gopalakrishna, Sneh Shah
Project: Real-time systems / Embedded systems Structured Projects
Context: Course Projects, Georgia Tech, 2020-2023
Summary: Advise students on projects for my advisor's courses on Real-Time Systems and Embedded Systems; set project deadlines and writing guidelines, hold office hours, and manage teams.

Student(s): Sanjyot Vaidya
Project: Reproducible Machine Learning
Context: MS Thesis, Georgia Tech, 2021-2023
Summary: Extend EdnaML framework with declarative API and configurable ML pipeline management for reproducible machine learning. Apply EdnaML framework for fake news detection, vehicle classification, image classification, and re-id

Student(s): Suma Cherkadi, Purva Singh
Project: Data engineering for vehicle classification
Context: Course project, Georgia Tech, 2020-2021
Summary: Build an adaptive and fault-tolerant data crawler, cleaner, and weak labeler for a vehicle classifier pipeline. The crawler downloads images of vehicles from multiple angles and weakly labels missing annotations.

Student(s): Eric Gastineau
Project: Fixing GAN-based Data Augmentation
Context: MS Thesis, Georgia Tech, 2019-2020
Summary: GANs are useful for creating synthetic data, but a performance discrepancy remains between training object detectors/classifiers on real images and training them on synthetic images. This project investigates dynamic compression methods to reduce that discrepancy.

Student(s): Shreeshaa Kulkarni, Elsa Smaci, Kanksha Zaveri, Jayanta Bhowmick, Mitesh Kothari
Project: Automatic Model Generation for Vehicle Re-ID
Context: Course Project, Georgia Tech, Fall 2019
Summary: Build a system for automated, end-to-end pipeline generation for vehicle re-id. For each camera in a heterogeneous camera network, a detection model is generated from the camera's profile. Similarly, vehicles are tracked using compressed re-id models for real-time performance.

Student(s): Ka-Ho Chow, Anjian Peng, Xiao Feng
Project: LITMUS Wildfire
Context: Course Project, Georgia Tech, Fall 2019
Summary: Extend the LITMUS system to work with wildfires and floods

Student(s): Sravanthi Kumar
Project: LITMUS Automation
Context: Course Project, Georgia Tech, Fall 2019
Summary: Automate the ASSED framework and LITMUS system for faster deployment; identify and fix bottlenecks

Student(s): Jenita Jebasingh
Project: LITMUS refactor under ASSED
Context: MS Thesis, Georgia Tech, Spring 2019
Summary: Refactor the LITMUS system under the ASSED framework

Student(s): Jacqueline Elliott, Madeleine Brickell, and Kristen Goldie
Project: HealthFeed
Context: Course Project, Georgia Tech, Spring 2019
Summary: Dense, global, and real-time detection of viral events and health epidemics using social sensors and blogs