
# 3DGraphLLM

arXiv | Hugging Face

In this work, we propose 3DGraphLLM, a method for constructing a learnable representation of a 3D scene graph, which serves as input for LLMs to perform 3D vision-language tasks.

## News

- [2024.12] We release 3DGraphLLM pre-training on GT instance segmentation scene graphs.
- [2024.12] We release the code for the 3DGraphLLM paper.

🔥 Semantic relations boost LLM performance on 3D Referred Object Grounding and Dense Scene Captioning tasks

| Model | ScanRefer Acc@0.25 | ScanRefer Acc@0.5 | Multi3dRefer F1@0.25 | Multi3dRefer F1@0.5 | Scan2Cap CIDEr@0.5 | Scan2Cap B-4@0.5 | ScanQA CIDEr | ScanQA B-4 | SQA3D EM |
|---|---|---|---|---|---|---|---|---|---|
| Chat-Scene | 55.5 | 50.2 | 57.1 | 52.3 | 77.1 | 36.3 | 87.7 | 14.3 | 54.6 |
| 3DGraphLLM Vicuna-1.5 | 57.0 | 51.3 | 60.1 | 55.4 | 81.2 | 36.3 | 87.6 | 12.1 | 53.1 |
| 3DGraphLLM LLAMA3-8B | 60.2 | 54.6 | 63.0 | 58.2 | 82.9 | 37.8 | 83.1 | 12.5 | 55.2 |

## 🔨 Preparation

- Prepare the environment:

  ```bash
  conda create -n 3dgraphllm python=3.9.17
  conda activate 3dgraphllm
  conda install pytorch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 pytorch-cuda=11.8 -c pytorch -c nvidia
  pip install -r requirements.txt
  ```

- If you don't have root permissions to install Java (needed by the pycocoeval scripts for metrics such as BLEU and CIDEr), install it with conda:

  ```bash
  conda install -c conda-forge openjdk
  ```

- Download the LLM backbone:

  - We use LLAMA3-8B-Instruct in our experiments, which can be downloaded from Hugging Face (see the sanity-check sketch after this list).
  - Change `llama_model_path` in `config.py` to the path of LLAMA3-8B-Instruct.

- Annotations and extracted features: please follow the instructions in preprocess.
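A minimal sanity-check sketch for the steps above, assuming a CUDA GPU and the `huggingface_hub` CLI are available; `meta-llama/Meta-Llama-3-8B-Instruct` is the gated Hugging Face repository for the backbone, and `./llama3-8b-instruct` is an arbitrary example directory:

```bash
# Verify that PyTorch sees the GPU.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

# Verify that Java is available for the captioning metrics (BLEU, CIDEr).
java -version

# Download the LLM backbone (gated model: accept the license on Hugging Face
# and authenticate with `huggingface-cli login` first). The target directory
# below is an example; any path works.
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct \
  --local-dir ./llama3-8b-instruct

# Finally, set llama_model_path in config.py to ./llama3-8b-instruct.
```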

## 🤖 Training and Inference

- Pre-training on GT instance segmentation scene graphs:

  - Modify `run_gt_pretrain.sh`:

    ```bash
    train_tag="scanrefer#scan2cap#scanqa#sqa3d#multi3dref#nr3d_caption#obj_align"
    val_tag="scanrefer#scan2cap#scanqa#sqa3d#multi3dref"
    evaluate=False
    ```

    Explanation of `train_tag` and `val_tag`:

    - Use `#` to separate different datasets.
    - Datasets:
      - `scanrefer`: ScanRefer dataset
      - `scan2cap`: Scan2Cap dataset
      - `scanqa`: ScanQA dataset
      - `sqa3d`: SQA3D dataset
      - `multi3dref`: Multi3dRefer dataset
      - `nr3d_caption`: a captioning dataset derived from Nr3D
      - `obj_align`: a dataset derived from ScanRefer to align the object identifiers with object tokens

  - Run: `bash scripts/run_gt_pretrain.sh`

- Training:

  - Modify `run.sh`:

    ```bash
    train_tag="scanrefer#scan2cap#scanqa#sqa3d#multi3dref#nr3d_caption#obj_align"
    val_tag="scanrefer#scan2cap#scanqa#sqa3d#multi3dref"
    evaluate=False
    pretrained_path="outputs/llama3-8b-gt-pretrain-2/ckpt_00_28927.pth"
    ```

  - Run: `bash scripts/run.sh`

- Inference:

  - Modify `run.sh`:

    ```bash
    val_tag="scanrefer#scan2cap#scanqa#sqa3d#multi3dref"
    evaluate=True
    pretrained_path="/path/to/pretrained_model.pth"
    ```

  - Run: `bash scripts/run.sh`

The three stages, chained end to end, are sketched after this list.
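A minimal end-to-end sketch, assuming the script edits described above have already been made; the stage-1 checkpoint name is the example from this README, and actual filenames depend on your run. The tags can also be restricted to a subset of datasets, e.g. `train_tag="scanrefer"`.

```bash
#!/usr/bin/env bash
# Illustrative wrapper for the full 3DGraphLLM pipeline (not part of the repo).
set -e

# Stage 1: pre-train on GT instance segmentation scene graphs
# (run_gt_pretrain.sh: evaluate=False).
bash scripts/run_gt_pretrain.sh

# Stage 2: fine-tune, with pretrained_path in run.sh pointing at the stage-1
# checkpoint, e.g. outputs/llama3-8b-gt-pretrain-2/ckpt_00_28927.pth
# (run.sh: evaluate=False).
bash scripts/run.sh

# Stage 3: inference only
# (run.sh: evaluate=True, pretrained_path="/path/to/pretrained_model.pth").
bash scripts/run.sh
```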

## 🚀 Demo

- Run `bash demo/run_demo.sh`. You will be prompted to ask different queries about Scene 435 of ScanNet; a few illustrative queries are sketched below.
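The queries below are purely hypothetical examples of what one might type; the actual prompt format is whatever `demo/run_demo.sh` presents.

```bash
# Launch the interactive demo on ScanNet Scene 435.
bash demo/run_demo.sh

# Illustrative queries to try at the prompt (hypothetical examples):
#   Describe the object with id 5.
#   How many chairs are in the room?
#   Where is the sofa relative to the armchair?
```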

## 📪 Contact

If you have any questions about the project, please open an issue in this repository or send an email to Tatiana Zemskova.

## 📑 Citation

If you find this work helpful, please consider citing our work as:

```bibtex
@misc{zemskova20243dgraphllm,
      title={3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding},
      author={Tatiana Zemskova and Dmitry Yudin},
      year={2024},
      eprint={2412.18450},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.18450},
}
```

## 😊 Acknowledgement

Thanks to the following open-source projects:

- Chat-Scene
