This repository contains the data generation pipeline and evaluation scripts for ReMindView-Bench, the benchmark introduced in the paper *Reasoning Path and Latent State Analysis for Multi-view Visual Spatial Reasoning*.
- Paper: Reasoning Path and Latent State Analysis for Multi-view Visual Spatial Reasoning (arXiv:2512.02340).
- Dataset: ReMindView-Bench hosted on Hugging Face at Xue0823/ReMindView-Bench; scripts here reproduce the synthetic data.
- Figures in the repository mirror the organization of the generation pipeline (scene generation, rendering, QA construction).
- Blender: 3.6 LTS (bundled Python 3.10) or newer, with `bpy==3.6.0` (included in the infinigen environment).
- Python: 3.10 (matches Blender's bundled interpreter).
- GPU: NVIDIA GPU with at least 12 GB VRAM recommended for rendering.
- OS: Linux or macOS; commands below assume Bash.
- Python dependencies: see `requirements.txt` (mirrors `environment.yml`) with tested versions for reproducibility.
- Clone the repo and switch to the root directory.
- Point to Blender’s bundled Python and install dependencies:
  ```bash
  export BLENDER_ROOT=/path/to/blender-3.6
  export BLENDER_BIN="$BLENDER_ROOT/blender"
  export BLENDER_PYTHON="$BLENDER_ROOT/3.6/python/bin/python3.10"
  "$BLENDER_PYTHON" -m pip install --upgrade pip
  "$BLENDER_PYTHON" -m pip install -r requirements.txt
  ```
- Ensure imports resolve when running through Blender by adding the repo to `PYTHONPATH`:

  ```bash
  export PYTHONPATH="$PWD:$PYTHONPATH"
  ```
- Example: run a script with Blender's interpreter so `bpy` is available:

  ```bash
  "$BLENDER_BIN" --background --python view_centric_view_frame_generation.py -- \
    --config view_centric_view_frame_generation_config.json \
    --output_dir view_centric_view_frame_outputs/Bedroom/Bedroom_0 \
    --room_type bedroom \
    --scene_path outputs/indoors/Bedroom_0/scene.blend
  ```

  The paths above are the defaults produced by the generation script; command-line arguments stay exposed so you can override locations if you customize the pipeline.
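Before generating data, it can help to confirm that the interpreter and `bpy` resolve as expected. The snippet below is a minimal sketch, not a script shipped with the repo; save it as, say, a hypothetical `check_env.py` and run it with `"$BLENDER_PYTHON" check_env.py` or `"$BLENDER_BIN" --background --python check_env.py`.

```python
# check_env.py: hypothetical sanity check, not part of the repository.
import sys

print("Python:", sys.version.split()[0])  # expected 3.10.x to match Blender 3.6 LTS

try:
    import bpy  # available inside Blender, or via the bpy==3.6.0 wheel
    print("bpy:", bpy.app.version_string)  # expected 3.6.x
except ImportError:
    print("bpy not importable; run this through Blender or install the bpy wheel")
```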
- Generate scenes and renders (object-centric and view-centric) from the repo root:

  ```bash
  bash scene_generation.sh
  ```

  Scenes land in `outputs/indoors/<ROOM>_<SEED>`, object-centric frames in `object_centric_view_frame_outputs/<ROOM>/<ROOM>_<SEED>`, and view-centric frames in `view_centric_view_frame_outputs/<ROOM>/<ROOM>_<SEED>`.
- Clean empty or invalid views (already invoked inside `scene_generation.sh`; rerun if you tweak outputs):

  ```bash
  python clean_visual_data.py --dir_path object_centric_view_frame_outputs
  python clean_visual_data.py --dir_path view_centric_view_frame_outputs
  ```
- Build QA CSVs (choose one of `view_view`, `view_object`, `object_object`):

  ```bash
  python ground_truth_generation.py \
    --image_folder object_centric_view_frame_outputs \
    --qa_type object_object
  ```

  Output is stored next to the image folder (e.g., `object_centric_view_frame_outputs/object_object_qa.csv`).
- (Optional) Adjust outputs without editing scripts by passing flags (e.g., `--output_dir`, `--scene_path`). Defaults match the repo layout to minimize user error.
| Stage | Script(s) | Inputs | Outputs |
|---|---|---|---|
| Scene generation | `scene_generation.sh` (+ `infinigen_examples.generate_indoors`) | Seeds, room types, gin configs | Blender scenes under `outputs/indoors/<ROOM>_<SEED>` |
| View rendering | `object_centric_view_frame_generation.py`, `view_centric_view_frame_generation.py` | `scene.blend`, camera configs (`*_generation_config.json`) | Rendered frames and metadata under `object_centric_view_frame_outputs/` and `view_centric_view_frame_outputs/` |
| Visibility pruning | `clean_visual_data.py` | Rendered frame dirs | Pruned frame dirs without empty renders |
| Metadata extraction | `generate_object_list.py` | Rendered frame dirs | Object visibility summaries used for QA |
| VQA construction | `ground_truth_generation.py`, `ground_truth_generation.sh` | Visibility metadata, rename maps in `object_rename/`, `query_template.json` | Multiple-choice QA CSVs for view-view, view-object, and object-object tasks |
```
ReMindView-Bench/
├─ scene_generation.sh                               # End-to-end loop over seeds/room types
├─ object_centric_view_frame_generation.py
├─ view_centric_view_frame_generation.py
├─ object_centric_view_frame_generation_config.json
├─ view_centric_view_frame_generation_config.json
├─ clean_visual_data.py
├─ generate_object_list.py
├─ ground_truth_generation.py
├─ ground_truth_generation.sh
├─ object_rename/                                    # Rename/orientation metadata per room
├─ evaluation/                                       # Model evaluation entrypoints (see notes)
├─ figures/                                          # Paper figures used above
├─ environment.yml
└─ requirements.txt
```
The organization mirrors the paper sections (scene generation → rendering → QA). Files are listed once to avoid duplication between sections.
Evaluation entrypoints live in evaluation/ (eval_models.sh, eval_internvl35.py, eval_llava_onevision.py, eval_qwen_vl.py). Some evaluation code paths are intentionally left minimal; add checkpoints and dataset paths before use. Additional evaluation scripts will be added here as they are released.
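As a loose sketch of how predictions could be scored against the generated QA CSVs, the snippet below assumes a hypothetical `predictions.csv` aligned row-for-row with the QA file and columns named `answer` and `prediction`; neither file layout nor column names are guaranteed by the actual scripts, so adapt it to the formats the evaluation entrypoints emit.

```python
# Hypothetical scoring sketch: predictions.csv, row alignment, and the column
# names "answer" and "prediction" are assumptions, not the repository's format.
import pandas as pd

qa = pd.read_csv("object_centric_view_frame_outputs/object_object_qa.csv")
preds = pd.read_csv("predictions.csv")

correct = (
    qa["answer"].astype(str).str.strip()
    == preds["prediction"].astype(str).str.strip()
)
print(f"accuracy: {correct.mean():.3f} over {len(correct)} questions")
```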
Download the dataset from Hugging Face or produce it locally via the pipeline above. A simple example of loading the QA pairs for a VQA task:
```python
import pandas as pd

qa = pd.read_csv("object_centric_view_frame_outputs/object_object_qa.csv")
print(qa.head())
```

Each row includes the frame folder, query text, ground truth, and choices, so you can pair images from the same folder for VQA training or evaluation.
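As a rough illustration of that pairing, the sketch below lists the rendered views stored next to each QA row. The assumption that the frame folder sits in the first CSV column (and that frames are `.png`/`.jpg` files) is ours; check the header written by `ground_truth_generation.py`.

```python
import os
import pandas as pd

root = "object_centric_view_frame_outputs"
qa = pd.read_csv(os.path.join(root, "object_object_qa.csv"))

# Assumption: the first column holds the frame folder; verify against the CSV header.
frame_col = qa.columns[0]

for _, row in qa.head(3).iterrows():
    frame_dir = os.path.join(root, str(row[frame_col]))
    if not os.path.isdir(frame_dir):
        print(row[frame_col], "-> folder not found (column layout may differ)")
        continue
    views = sorted(
        f for f in os.listdir(frame_dir)
        if f.lower().endswith((".png", ".jpg"))
    )
    print(row[frame_col], "->", len(views), "rendered views")
```

If you prefer the released data over regenerating it locally, `huggingface_hub.snapshot_download` can fetch the dataset repository; note that the layout on the Hub may differ from the local folder structure described above.

```python
from huggingface_hub import snapshot_download

# Pulls the released dataset; inspect the downloaded layout before pointing scripts at it.
local_dir = snapshot_download(repo_id="Xue0823/ReMindView-Bench", repo_type="dataset")
print("dataset downloaded to:", local_dir)
```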
If you use ReMindView-Bench, please cite the accompanying paper:
```bibtex
@article{xue2025reasoning,
  title={Reasoning Path and Latent State Analysis for Multi-view Visual Spatial Reasoning: A Cognitive Science Perspective},
  author={Xue, Qiyao and Liu, Weichen and Wang, Shiqi and Wang, Haoming and Wu, Yuyang and Gao, Wei},
  journal={arXiv preprint arXiv:2512.02340},
  year={2025}
}
```

See LICENSE for usage terms. A permissive research-friendly license will be finalized; until then, reach out to the authors with questions about reuse.

