StreetView-Waste: A Multi-Task Dataset for Urban Waste Management

Diogo J. Paulo1,2,  João Martins1,  Hugo Proença1,2,  João C. Neves1,3

1University of Beira Interior, Portugal 2IT: Instituto de Telecomunicações 3NOVA LINCS
diogo.paulo@ubi.pt
Teaser image showing the three tasks

 Paper  GitHub  Dataset

Abstract


Urban waste management remains a critical challenge for the development of smart cities. Despite the growing number of litter detection datasets, the problem of monitoring overflowing waste containers, particularly from images captured by garbage trucks, has received little attention. While existing datasets are valuable, they often lack annotations for specific container tracking or are captured in static, decontextualized environments, limiting their utility for real-world logistics. To address this gap, we present StreetView-Waste, a comprehensive dataset of urban scenes featuring litter and waste containers. The dataset supports three key evaluation tasks: (1) waste container detection, (2) waste container tracking, and (3) waste overflow segmentation. Alongside the dataset, we provide baselines for each task by benchmarking state-of-the-art models in object detection, tracking, and segmentation. Additionally, we enhance baseline performance by proposing two complementary strategies: a heuristic-based method for improved waste container tracking and a model-agnostic framework that leverages geometric priors to refine litter segmentation. Our experimental results show that while fine-tuned object detectors achieve reasonable performance in detecting waste containers, baseline tracking methods struggle to accurately estimate their number; however, our proposed heuristics reduce the mean absolute counting error by 79.6%. Similarly, while segmenting amorphous litter is challenging, our geometry-aware strategy improves segmentation mAP@0.5 by 27% on lightweight models, demonstrating the value of multimodal inputs for this task. Ultimately, StreetView-Waste provides a challenging benchmark to encourage research into real-world perception systems for urban waste management.

Dataset


Task 1: Waste Container Detection

71,170
Annotated Instances
36,478
Total Images

This task provides a large-scale benchmark for detecting seven classes of waste containers. The dataset is designed to test model robustness in diverse urban scenes captured from a moving vehicle.

Download Detection Data

Task 2: Multi-Object Tracking

376
Unique Container Tracks

This task focuses on the challenge of associating container detections across frames. It includes dense annotations with unique IDs, enabling the evaluation of tracking performance on real-world challenges like occlusion.

Download Tracking Data

Task 3: Overflow Segmentation

5,149
Instance Masks
7,230
Total Images

This task aims to produce pixel-accurate masks for unstructured waste spilling from containers. The dataset subset for this task is balanced to facilitate fair model training and evaluation.

Download Segmentation Data

Proposed Strategies


Heuristic-based Tracking Refinement

To improve tracking and counting, we introduce a set of heuristic rules applied as a post-processing step. These include a minimum track duration filter to remove noisy, short-lived tracks; a temporal merging rule to reconnect tracks fragmented by brief occlusions; and a spatial proximity constraint to ensure that merged tracks correspond to the same physical object.

Geometric-Aware Overflow Segmentation

To tackle the challenging task of segmenting amorphous overflowing waste, we propose a model-agnostic strategy that enriches the standard RGB input. We use a zero-shot model to infer depth and surface normal maps, which are then concatenated with the RGB image to form a 7-channel input tensor. This additional geometric information helps models better distinguish foreground waste from cluttered backgrounds.

Results


Detection Results

Class DiffusionVID YOLOv11
AP@0.5 AP@[.5:.95] AR@[.5:.95] AP@0.5 AP@[.5:.95] AR@[.5:.95]
Default Container 0.96±0.02 0.75±0.03 0.78±0.03 0.98±0.01 0.82±0.02 0.82±0.02
Green Container 0.92±0.03 0.76±0.03 0.82±0.03 0.97±0.02 0.82±0.03 0.84±0.02
Biodegradable Container 0.92±0.02 0.75±0.02 0.77±0.03 0.96±0.01 0.85±0.02 0.86±0.02
Blue Container 0.95±0.02 0.73±0.03 0.79±0.03 0.96±0.02 0.80±0.02 0.81±0.02
Yellow Container 0.90±0.03 0.73±0.03 0.80±0.03 0.96±0.01 0.82±0.02 0.84±0.02
Oil Container 0.92±0.02 0.62±0.04 0.69±0.04 0.93±0.01 0.70±0.03 0.70±0.03
Battery Container 0.82±0.05 0.57±0.05 0.61±0.05 0.80±0.06 0.61±0.05 0.55±0.05
All Classes 0.91±0.03 0.70±0.03 0.76±0.03 0.94±0.02 0.77±0.03 0.77±0.03

We report AP@0.5, AP@[0.5:0.95], and AR@[0.5:0.95] (mean ± std). YOLOv11 outperforms DiffusionVID overall, with DiffusionVID being stronger on Battery containers.

Tracking & Counting Results

Tracking Accuracy

Model Experiment MOTA IDF1 HOTA DetA AssA
ByteTrack Baseline 76.80% 81.40% 69.76% 69.96% 69.83%
H₁ (Duration) 77.10% 82.20% 69.73% 69.30% 70.40%
H₁+H₂ (Temporal) 77.00% 55.10% 50.98% 45.62% 57.03%
H₁+H₂+H₃ (Spatial) 77.20% 66.80% 59.60% 56.35% 63.15%
BoT-SORT Baseline 82.50% 79.60% 71.50% 75.27% 68.17%
H₁ (Duration) 82.40% 82.10% 72.09% 73.80% 70.61%
H₁+H₂ (Temporal) 82.50% 75.80% 67.34% 65.82% 69.06%
H₁+H₂+H₃ (Spatial) 82.50% 79.80% 70.30% 70.69% 70.06%

Counting Accuracy

Model Experiment MAE SAD RMSE MAPE
ByteTrack Baseline 3.48 73 7.80 82.96%
H₁ (Duration) 1.05 22 1.93 24.96%
H₁+H₂ (Temporal) 0.76 16 1.75 17.77%
H₁+H₂+H₃ (Spatial) 0.71 15 1.70 16.03%
BoT-SORT Baseline 6.43 135 9.60 187.61%
H₁ (Duration) 1.19 25 1.91 27.19%
H₁+H₂ (Temporal) 0.81 17 1.36 16.03%
H₁+H₂+H₃ (Spatial) 0.90 19 1.79 17.96%

We report tracking metrics (MOTA/IDF1/HOTA/DetA/AssA) and counting metrics (MAE/SAD/RMSE/MAPE) for ByteTrack and BoT-SORT with progressive heuristics.

Overflow Segmentation Results

Model Experiment mAP@0.5 mAP@[0.5:0.95] B-IoU
YOLOv11 Baseline (RGB) 0.50±0.01 0.30±0.01 0.77±0.01
Ours (Geometric Cues) 0.52±0.01 0.31±0.01 0.77±0.01
YOLACT Baseline (RGB) 0.41±0.02 0.22±0.01 0.87±0.02
Ours (Geometric Cues) 0.52±0.02 0.31±0.02 0.90±0.01
Mask2Former Baseline (RGB) 0.18±0.02 0.11±0.01 0.38±0.02
Ours (Geometric Cues) 0.29±0.02 0.13±0.01 0.32±0.02
Mask R-CNN Baseline (RGB) 0.41±0.02 0.26±0.01 0.54±0.02
Ours (Geometric Cues) 0.12±0.01 0.06±0.01 0.38±0.01
SOLOv2 Baseline (RGB) 0.20±0.03 0.10±0.01 0.30±0.02
Ours (Geometric Cues) 0.07±0.01 0.03±0.00 0.20±0.01

We report mAP@0.5, mAP@[0.5:0.95], and boundary IoU (B-IoU) for overflow segmentation (mean ± std).

Resources


Paper

Research Paper WACV

Dataset Repository

Dataset Repository

Dataset

Dataset Download

BibTeX

If you find this work useful for your research, please cite:
@inproceedings{paulo2025streetviewwastemultitaskdataseturban,
  title={{StreetView-Waste: A Multi-Task Dataset for Urban Waste Management}},
  author={Diogo J. Paulo, João Martins, Hugo Proença and João C. Neves},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year={2025}
}

© This webpage was in part inspired from this template.