StreetView-Waste

Abstract

Urban waste management remains a critical challenge for the development of smart cities. Despite the growing number of litter detection datasets, the problem of monitoring overflowing waste containers, particularly from images captured by garbage trucks, has received little attention. While existing datasets are valuable, they often lack annotations for specific container tracking or are captured in static, decontextualized environments, limiting their utility for real-world logistics. To address this gap, we present StreetView-Waste, a comprehensive dataset of urban scenes featuring litter and waste containers. The dataset supports three key evaluation tasks: (1) waste container detection, (2) waste container tracking, and (3) waste overflow segmentation. Alongside the dataset, we provide baselines for each task by benchmarking state-of-the-art models in object detection, tracking, and segmentation. Additionally, we enhance baseline performance by proposing two complementary strategies: a heuristic-based method for improved waste container tracking and a model-agnostic framework that leverages geometric priors to refine litter segmentation. Our experimental results show that while fine-tuned object detectors achieve reasonable performance in detecting waste containers, baseline tracking methods struggle to accurately estimate their number; however, our proposed heuristics reduce the mean absolute counting error by 79.6%. Similarly, while segmenting amorphous litter is challenging, our geometry-aware strategy improves segmentation mAP@0.5 by 27% on lightweight models, demonstrating the value of multimodal inputs for this task. Ultimately, StreetView-Waste provides a challenging benchmark to encourage research into real-world perception systems for urban waste management.

Dataset

Task 1: Waste Container Detection

71,170

Annotated Instances

36,478

Total Images

This task provides a large-scale benchmark for detecting seven classes of waste containers. The dataset is designed to test model robustness in diverse urban scenes captured from a moving vehicle.

Download Detection Data

Task 2: Multi-Object Tracking

376

Unique Container Tracks

This task focuses on the challenge of associating container detections across frames. It includes dense annotations with unique IDs, enabling the evaluation of tracking performance on real-world challenges like occlusion.

Download Tracking Data

Task 3: Overflow Segmentation

5,149

Instance Masks

7,230

Total Images

This task aims to produce pixel-accurate masks for unstructured waste spilling from containers. The dataset subset for this task is balanced to facilitate fair model training and evaluation.

Download Segmentation Data

Proposed Strategies

Heuristic-based Tracking Refinement

To improve tracking and counting, we introduce a set of heuristic rules applied as a post-processing step. These include a minimum track duration filter to remove noisy, short-lived tracks; a temporal merging rule to reconnect tracks fragmented by brief occlusions; and a spatial proximity constraint to ensure that merged tracks correspond to the same physical object.

Geometric-Aware Overflow Segmentation

To tackle the challenging task of segmenting amorphous overflowing waste, we propose a model-agnostic strategy that enriches the standard RGB input. We use a zero-shot model to infer depth and surface normal maps, which are then concatenated with the RGB image to form a 7-channel input tensor. This additional geometric information helps models better distinguish foreground waste from cluttered backgrounds.

Results

Detection Results

Class	DiffusionVID			YOLOv11
Class	AP@0.5	AP@[.5:.95]	AR@[.5:.95]	AP@0.5	AP@[.5:.95]	AR@[.5:.95]
Default Container	0.96±0.02	0.75±0.03	0.78±0.03	0.98±0.01	0.82±0.02	0.82±0.02
Green Container	0.92±0.03	0.76±0.03	0.82±0.03	0.97±0.02	0.82±0.03	0.84±0.02
Biodegradable Container	0.92±0.02	0.75±0.02	0.77±0.03	0.96±0.01	0.85±0.02	0.86±0.02
Blue Container	0.95±0.02	0.73±0.03	0.79±0.03	0.96±0.02	0.80±0.02	0.81±0.02
Yellow Container	0.90±0.03	0.73±0.03	0.80±0.03	0.96±0.01	0.82±0.02	0.84±0.02
Oil Container	0.92±0.02	0.62±0.04	0.69±0.04	0.93±0.01	0.70±0.03	0.70±0.03
Battery Container	0.82±0.05	0.57±0.05	0.61±0.05	0.80±0.06	0.61±0.05	0.55±0.05
All Classes	0.91±0.03	0.70±0.03	0.76±0.03	0.94±0.02	0.77±0.03	0.77±0.03

We report AP@0.5, AP@[0.5:0.95], and AR@[0.5:0.95] (mean ± std). YOLOv11 outperforms DiffusionVID overall, with DiffusionVID being stronger on Battery containers.

Tracking & Counting Results

Tracking Accuracy

Model	Experiment	MOTA↑	IDF1↑	HOTA↑	DetA↑	AssA↑
ByteTrack	Baseline	76.80%	81.40%	69.76%	69.96%	69.83%
	H₁ (Duration)	77.10%	82.20%	69.73%	69.30%	70.40%
	H₁+H₂ (Temporal)	77.00%	55.10%	50.98%	45.62%	57.03%
	H₁+H₂+H₃ (Spatial)	77.20%	66.80%	59.60%	56.35%	63.15%
BoT-SORT	Baseline	82.50%	79.60%	71.50%	75.27%	68.17%
	H₁ (Duration)	82.40%	82.10%	72.09%	73.80%	70.61%
	H₁+H₂ (Temporal)	82.50%	75.80%	67.34%	65.82%	69.06%
	H₁+H₂+H₃ (Spatial)	82.50%	79.80%	70.30%	70.69%	70.06%

Counting Accuracy

Model	Experiment	MAE↓	SAD↓	RMSE↓	MAPE↓
ByteTrack	Baseline	3.48	73	7.80	82.96%
	H₁ (Duration)	1.05	22	1.93	24.96%
	H₁+H₂ (Temporal)	0.76	16	1.75	17.77%
	H₁+H₂+H₃ (Spatial)	0.71	15	1.70	16.03%
BoT-SORT	Baseline	6.43	135	9.60	187.61%
	H₁ (Duration)	1.19	25	1.91	27.19%
	H₁+H₂ (Temporal)	0.81	17	1.36	16.03%
	H₁+H₂+H₃ (Spatial)	0.90	19	1.79	17.96%

We report tracking metrics (MOTA/IDF1/HOTA/DetA/AssA) and counting metrics (MAE/SAD/RMSE/MAPE) for ByteTrack and BoT-SORT with progressive heuristics.

Overflow Segmentation Results

Model	Experiment	mAP@0.5	mAP@[0.5:0.95]	B-IoU
YOLOv11	Baseline (RGB)	0.50±0.01	0.30±0.01	0.77±0.01
YOLOv11	Ours (Geometric Cues)	0.52±0.01	0.31±0.01	0.77±0.01
YOLACT	Baseline (RGB)	0.41±0.02	0.22±0.01	0.87±0.02
YOLACT	Ours (Geometric Cues)	0.52±0.02	0.31±0.02	0.90±0.01
Mask2Former	Baseline (RGB)	0.18±0.02	0.11±0.01	0.38±0.02
Mask2Former	Ours (Geometric Cues)	0.29±0.02	0.13±0.01	0.32±0.02
Mask R-CNN	Baseline (RGB)	0.41±0.02	0.26±0.01	0.54±0.02
Mask R-CNN	Ours (Geometric Cues)	0.12±0.01	0.06±0.01	0.38±0.01
SOLOv2	Baseline (RGB)	0.20±0.03	0.10±0.01	0.30±0.02
SOLOv2	Ours (Geometric Cues)	0.07±0.01	0.03±0.00	0.20±0.01

We report mAP@0.5, mAP@[0.5:0.95], and boundary IoU (B-IoU) for overflow segmentation (mean ± std).

Resources

Paper

Dataset Repository

Dataset

BibTeX

If you find this work useful for your research, please cite:

@inproceedings{paulo2025streetviewwastemultitaskdataseturban,
  title={{StreetView-Waste: A Multi-Task Dataset for Urban Waste Management}},
  author={Diogo J. Paulo, João Martins, Hugo Proença and João C. Neves},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year={2025}
}

StreetView-Waste: A Multi-Task Dataset for Urban Waste Management

Diogo J. Paulo^1,2, João Martins¹, Hugo Proença^1,2, João C. Neves^1,3

Paper GitHub Dataset

Abstract

Dataset

Task 1: Waste Container Detection

Task 2: Multi-Object Tracking

Task 3: Overflow Segmentation

Proposed Strategies

Heuristic-based Tracking Refinement

Geometric-Aware Overflow Segmentation

Results

Detection Results

Tracking & Counting Results

Overflow Segmentation Results

Resources

Paper

Dataset Repository

Dataset

BibTeX

StreetView-Waste: A Multi-Task Dataset for Urban Waste Management

Diogo J. Paulo1,2, João Martins1, Hugo Proença1,2, João C. Neves1,3

Paper GitHub Dataset

Abstract

Dataset

Task 1: Waste Container Detection

Task 2: Multi-Object Tracking

Task 3: Overflow Segmentation

Proposed Strategies

Heuristic-based Tracking Refinement

Geometric-Aware Overflow Segmentation

Results

Detection Results

Tracking & Counting Results

Overflow Segmentation Results

Resources

Paper

Dataset Repository

Dataset

BibTeX

Diogo J. Paulo^1,2, João Martins¹, Hugo Proença^1,2, João C. Neves^1,3