Abstract
Datasets that bridge computer vision and robotics by providing high-quality visual annotations in manipulation-relevant scenes remain scarce. This work introduces the Multi-Object Photoreal Simulation (MOPS) dataset, which provides comprehensive ground-truth annotations for photorealistic simulated environments. MOPS employs a zero-shot asset augmentation pipeline based on Large Language Models (LLMs) to automatically normalize 3D object scale and generate part-level affordances. The dataset features pixel-level segmentations for tasks central to robotic perception, including fine-grained part segmentation and affordance prediction (e.g., “graspable” or “pushable”). By combining detailed annotations with photorealistic simulation, MOPS provides a vast, diverse collection of scenes to accelerate progress in robot perception and manipulation. We validate MOPS through vision and robot learning benchmarks.
Annotation Modalities
Rich, multi-modal ground truth for every scene
Key Features
Photorealistic Simulation
High-quality rendering via ManiSkill3 & SAPIEN, built on a normalized asset pipeline with automatic part-level annotation across multiple 3D libraries.
LLM-Powered Annotation
Zero-shot asset augmentation using large language models for automatic part-level labeling, scale normalization, and semantic understanding.
Multi-Modal Ground Truth
RGB, depth, surface normals, part segmentation, affordance maps (graspable, pushable, …), and 6D pose — all pixel-aligned.
Diverse Environments
Kitchen environments, cluttered tabletops, and isolated object scenarios spanning 137 object categories and 56 affordance labels.
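Because all modalities above are pixel-aligned, downstream code can derive per-pixel affordance masks directly from the part-segmentation map and a part-to-affordance lookup. A minimal numpy sketch of this idea; the array shapes, part IDs, and the `PART_AFFORDANCES` table are illustrative, not the actual MOPS file format:

```python
import numpy as np

# Hypothetical part-ID -> affordance-label table (illustrative only;
# MOPS itself spans 56 affordance labels across 137 categories).
PART_AFFORDANCES = {
    1: {"graspable"},               # e.g. a mug handle
    2: {"graspable", "pushable"},   # e.g. a drawer front
    3: set(),                       # background / unlabeled
}

def affordance_mask(part_seg: np.ndarray, label: str) -> np.ndarray:
    """Binary mask of pixels whose part carries the given affordance label."""
    ids = [pid for pid, affs in PART_AFFORDANCES.items() if label in affs]
    return np.isin(part_seg, ids)

# Toy 2x3 part-segmentation map, pixel-aligned with RGB/depth/normals.
seg = np.array([[1, 2, 3],
                [3, 2, 1]])
graspable = affordance_mask(seg, "graspable")
pushable = affordance_mask(seg, "pushable")
```

The same lookup generalizes to any of the affordance labels, since each map is just a boolean array over the shared pixel grid.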
Results
Dataset Comparison
Taxonomic coverage vs. existing affordance datasets
| Dataset | Affordances | Categories | Objects |
|---|---|---|---|
| RGB-D Part | 7 | 17 | 105 |
| 3D-AffNet | 16 | 23 | 22,949 |
| MOPS (Total) | 56 | 137 | 3,353 |
MOPS leads in affordance-label and category breadth; 3D-AffNet contributes more raw object instances.
Robot Manipulation
Imitation learning on 24 RoboCasa tasks · 10 seeds each
| Policy Inputs | Success Rate | Gain (pp) |
|---|---|---|
| RGB only | 13.33% | — |
| + MOPS Affordances | 21.25% | +7.92 |
MOPS affordances provide a consistent boost across all 24 tasks.
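A straightforward way to feed such maps to a policy (and one plausible reading of “+ MOPS Affordances” above) is to stack binary affordance channels onto the RGB observation. A hedged sketch, assuming channel-last arrays; the function name and channel layout are assumptions, not the benchmark’s actual pipeline:

```python
import numpy as np

def stack_affordances(rgb: np.ndarray, affordance_maps: list) -> np.ndarray:
    """Concatenate K binary affordance channels onto an (H, W, 3) RGB image.

    Returns an (H, W, 3 + K) float32 array usable as a policy observation.
    """
    channels = [rgb.astype(np.float32) / 255.0]          # normalize RGB to [0, 1]
    channels += [m.astype(np.float32)[..., None] for m in affordance_maps]
    return np.concatenate(channels, axis=-1)

# Toy observation: 64x64 RGB plus "graspable" and "pushable" masks.
rgb = np.zeros((64, 64, 3), dtype=np.uint8)
graspable = np.zeros((64, 64), dtype=bool)
pushable = np.ones((64, 64), dtype=bool)
obs = stack_affordances(rgb, [graspable, pushable])
```

Keeping the affordance maps as extra channels preserves pixel alignment with RGB, so a standard convolutional policy backbone can consume them without architectural changes.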
Getting Started
mops-data — Image generation in ManiSkill3 (Available)
mops-il — Robot trajectories in RoboCasa v0.1 (Coming Soon)
Prerequisites: Python 3.10 · CUDA-compatible GPU · 16 GB+ RAM
```bash
conda create -n mops python=3.10
conda activate mops
pip install mani_skill
git clone https://github.com/LiXiling/mops-data
cd mops-data
pip install -e .
```
Citation
If you use MOPS in your research, please cite our work:
```bibtex
@article{li2026mops,
  title  = {Multi-Objective Photoreal Simulation (MOPS) Dataset
            for Computer Vision in Robot Manipulation},
  author = {Maximilian Xiling Li and Paul Mattes and
            Nils Blank and Rudolf Lioutikov},
  year   = {2026}
}
```