MOPS: Multi-Object Photoreal Simulation Dataset
for Computer Vision in Robot Manipulation

Maximilian X. Li, Paul Mattes, Nils Blank, Rudolf Lioutikov
Intuitive Robots Lab, Karlsruhe Institute of Technology, Germany

Abstract

Datasets bridging computer vision and robotics by providing high-quality visual annotations in manipulation-relevant scenes remain limited. This work introduces the Multi-Object Photoreal Simulation (MOPS) dataset, which provides comprehensive ground truth annotations for photorealistic simulated environments. MOPS employs a zero-shot asset augmentation pipeline based on Large Language Models (LLMs) to automatically normalize 3D object scale and generate part-level affordances. The dataset features pixel-level segmentations for tasks crucial to robotic perception, including fine-grained part segmentation and affordance prediction (e.g., “graspable” or “pushable”). By combining detailed annotations with photorealistic simulation, MOPS generates a vast, diverse collection of scenes to accelerate progress in robot perception and manipulation. We validate MOPS through vision and robot learning benchmarks.


Annotation Modalities

Rich, multi-modal ground truth for every scene


Key Features

🎨

Photorealistic Simulation

High-quality rendering via ManiSkill3 & SAPIEN, built on a normalized asset pipeline with automatic part-level annotation across multiple 3D libraries.

🤖

LLM-Powered Annotation

Zero-shot asset augmentation using large language models for automatic part-level labeling, scale normalization, and semantic understanding.
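The zero-shot annotation step can be pictured as querying an LLM once per asset for a real-world scale and per-part affordances. A minimal sketch of such a prompt builder follows; the template, field names, and JSON schema are illustrative assumptions, not the actual MOPS pipeline.

```python
# Illustrative prompt builder for LLM-based asset normalization.
# Template and output schema are hypothetical.

def build_normalization_prompt(asset_name: str, part_names: list[str]) -> str:
    """Ask an LLM for a real-world scale and per-part affordance labels."""
    parts = ", ".join(part_names)
    return (
        f"You are annotating a 3D asset named '{asset_name}' "
        f"with parts: {parts}.\n"
        "1. Estimate the object's typical real-world height in meters.\n"
        "2. For each part, list applicable affordances "
        "(e.g. graspable, pushable, openable).\n"
        "Answer as JSON with keys 'height_m' and 'part_affordances'."
    )

prompt = build_normalization_prompt("kitchen_cabinet", ["door", "handle", "shelf"])
```

The returned JSON would then be parsed to rescale the mesh and attach affordance tags to each part.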

🏷️

Multi-Modal Ground Truth

RGB, depth, surface normals, part segmentation, affordance maps (graspable, pushable, …), and 6D pose — all pixel-aligned.

🏠

Diverse Environments

Kitchen environments, cluttered tabletops, and isolated object scenarios spanning 137 object categories and 56 affordance labels.
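Because the affordance maps are pixel-aligned with the part segmentation, a per-affordance mask can be derived from the segmentation plus a part-to-affordance lookup. A minimal sketch, assuming hypothetical part IDs and a hypothetical lookup table (not the actual MOPS label set):

```python
import numpy as np

# Hypothetical part-ID -> affordance lookup for illustration only.
PART_AFFORDANCES = {
    1: {"graspable"},               # e.g. mug handle
    2: {"pushable"},                # e.g. drawer front
    3: {"graspable", "pushable"},
}

def affordance_mask(part_seg: np.ndarray, affordance: str) -> np.ndarray:
    """Return a boolean mask, pixel-aligned with the segmentation map."""
    ids = [pid for pid, affs in PART_AFFORDANCES.items() if affordance in affs]
    return np.isin(part_seg, ids)

seg = np.array([[0, 1], [2, 3]])
mask = affordance_mask(seg, "graspable")
# → [[False, True], [False, True]]
```

With 56 affordance labels, the same lookup yields one such binary mask per label, all sharing the segmentation's pixel grid.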


Results

Dataset Comparison

Taxonomic coverage vs. existing affordance datasets

Dataset        Affordances   Categories   Objects
RGB-D Part     7             17           105
3D-AffNet      16            23           22,949
MOPS (Total)   56            137          3,353

MOPS leads on affordance label and category breadth; 3D-AffNet has more raw instances.

Robot Manipulation

Imitation learning on 24 RoboCasa tasks · 10 seeds each

Policy Inputs             Success Rate   Gain
RGB only                  13.33%         —
RGB + MOPS Affordances    21.25%         +7.92 pp

MOPS affordances provide a consistent boost across all 24 tasks.


Getting Started

Alpha
Early release — API may change. Code is split across two repositories:

Prerequisites: Python 3.10  ·  CUDA-compatible GPU  ·  16 GB+ RAM

conda create -n mops python=3.10
conda activate mops

pip install mani_skill
git clone https://github.com/LiXiling/mops-data
cd mops-data
pip install -e .
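Once installed, a rendered sample bundles the pixel-aligned modalities together. A minimal sketch of consuming such a sample, assuming an `.npz`-style container with keys `rgb`, `depth`, and `part_seg` (these names and shapes are assumptions, not the documented MOPS format):

```python
import io
import numpy as np

# Fabricate an in-memory sample to illustrate the (assumed) layout.
buf = io.BytesIO()
np.savez(
    buf,
    rgb=np.zeros((128, 128, 3), dtype=np.uint8),
    depth=np.zeros((128, 128), dtype=np.float32),
    part_seg=np.zeros((128, 128), dtype=np.int32),
)
buf.seek(0)

sample = np.load(buf)
# All modalities share the same 128x128 pixel grid.
assert sample["rgb"].shape[:2] == sample["depth"].shape
```

Consult the repository's installation guide for the actual on-disk format and loading utilities.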

📖 Full Installation Guide →


Citation

If you use MOPS in your research, please cite our work:

@article{li2026mops,
  title   = {Multi-Object Photoreal Simulation (MOPS) Dataset
             for Computer Vision in Robot Manipulation},
  author  = {Maximilian Xiling Li and Paul Mattes and
             Nils Blank and Rudolf Lioutikov},
  year    = {2026}
}