Analysis of Machine Learning Methods for Trash Detection

Jerome Newhouse, Ryan Roche, Isaac Berlin, Robert Wang

Introduction

This project investigates computer vision and machine learning methods for automated waste detection and segregation. Four models (GroundingDINO, DETR, YOLOv8, and ResNet) are trained and evaluated on the TACO dataset to identify the most effective approach for robust trash classification in real-world environments. The paper that accompanies this project is available here.

Dataset

The TACO dataset (Trash Annotations in Context) is a collection of images showing trash and recycling objects in real-world settings. We use it to train and evaluate all of our models. Each image contains between 0 and 40 objects spanning 19 classes, such as bottles, cans, and plastic bags, plus a "catch-all" class, labeled as unlabeled litter, for unidentified items. The dataset comprises 4,000 original images, which were augmented to bring the total to 6,000 images, all resized to 416x416 pixels. The annotations were created using Roboflow and are available here.
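Resizing every image to 416x416 means the bounding-box annotations have to be scaled by the same factors. The sketch below shows that arithmetic for a COCO-style `[x, y, width, height]` box. It assumes a plain stretch resize with no letterboxing or cropping, which may not match the export settings actually used; the function name `rescale_box` and the example image size are hypothetical.

```python
def rescale_box(box, src_size, dst_size=(416, 416)):
    """Scale a COCO-style [x, y, width, height] box to a resized image.

    src_size and dst_size are (width, height) tuples. Assumes a plain
    stretch resize (no letterboxing or cropping); this is an assumption,
    not a description of the project's actual preprocessing.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    sx = dst_w / src_w  # horizontal scale factor
    sy = dst_h / src_h  # vertical scale factor
    x, y, w, h = box
    return [x * sx, y * sy, w * sx, h * sy]


# Hypothetical example: a 200x100 box at (400, 300) in a 1600x1200 photo.
scaled = rescale_box([400, 300, 200, 100], src_size=(1600, 1200))
print(scaled)
```

Under these assumptions the horizontal scale factor is 0.26, so an object 40 pixels wide in the original photo would span only about 10 pixels after resizing. This is one way to see why small items lose detail at this resolution.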

Models Used

Key Findings

YOLOv8 achieved the highest precision (0.777) and recall (0.398) of the models evaluated and was by far the fastest at 200 FPS, making it the best fit for real-time applications. Small objects, such as bottle caps, remain challenging because of the limited image resolution in the dataset and the inherent difficulty of the task. The final results are shown in Table 1.


Evaluation Metric    YOLOv8    DETR     ResNet    GroundingDINO
Precision            0.777     0.612    0.503     N/A
Recall               0.398     0.285    0.379     N/A
mAP50                0.491     0.337    0.352     N/A
mAP50-95             0.403     0.260    0.297     N/A
FPS                  200       13       10        2
Table 1: Evaluation metric scores for each model.
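For readers unfamiliar with these metrics, the following is a minimal sketch of how precision and recall can be computed for a single class at an IoU threshold of 0.5, the same threshold that mAP50 refers to. It is an illustration only and not the evaluation code behind Table 1. The greedy matching strategy, the function names, and the example boxes are all assumptions. A full mAP50 computation additionally averages precision over recall levels and over classes.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, truths, iou_thresh=0.5):
    """Greedy single-class matching of predictions to ground truth.

    preds: list of (confidence, box); truths: list of boxes.
    Each ground-truth box can be matched at most once.
    """
    matched = set()
    tp = 0
    # Visit predictions from most to least confident.
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = None, iou_thresh
        for j, gt in enumerate(truths):
            if j in matched:
                continue
            score = iou(box, gt)
            if score >= best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp   # predictions with no matching ground truth
    fn = len(truths) - tp  # ground-truth objects that were missed
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    return precision, recall


# Hypothetical image: three annotated objects, two predictions.
truths = [[0, 0, 10, 10], [20, 20, 30, 30], [40, 40, 50, 50]]
preds = [(0.9, [0, 0, 10, 9]), (0.6, [50, 50, 60, 60])]
precision, recall = precision_recall(preds, truths)
print(precision, recall)
```

In this toy case one of the two predictions is correct and one of the three objects is found, so precision exceeds recall. The table shows the same pattern for every model with reported scores: most of what the detectors report is right, but a large share of objects is missed.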