Commit 7eb394a ("Initial commit", parent 75e1f87)

10 files changed: +1506 −2 lines changed

.gitignore (2 additions, 0 deletions)

```diff
@@ -160,3 +160,5 @@ cython_debug/
 # and can be added to the global gitignore or merged into this file. For a more nuclear
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/
+data
+results
```

README.md (166 additions, 2 deletions)

# EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents

<p align="center">
    <a href="https://embodiedeval.github.io" target="_blank">🌐 Project Page</a> | <a href="https://huggingface.co/datasets/EmbodiedEval/EmbodiedEval" target="_blank">🤗 Dataset</a> | <a href="" target="_blank">📃 Paper</a>
</p>

**EmbodiedEval** is a comprehensive and interactive benchmark designed to evaluate the capabilities of MLLMs in embodied tasks.
## Installation

### Setup Simulation Environment

EmbodiedEval includes a 3D simulator for real-time simulation. You have two options for running the simulator:

Option 1: Run the simulator on your personal computer with a display (Windows/macOS/Linux). No additional configuration is required. The subsequent installation and data download (approximately 20 GB of space) will take place on your computer.

Option 2: Run the simulator on a Linux server. This requires sudo access, up-to-date NVIDIA drivers, and running outside a Docker container. Additional configuration is required as follows:

<details>
<summary>Additional configurations</summary>
<br>

1. Install Xorg:

```
sudo apt install -y gcc make pkg-config xorg
```

2. Generate the .conf file:

```
sudo nvidia-xconfig --no-xinerama --probe-all-gpus --use-display-device=none
sudo cp /etc/X11/xorg.conf /etc/X11/xorg-0.conf
```

3. Edit /etc/X11/xorg-0.conf:

- Remove the "ServerLayout" and "Screen" sections.
- Set `BoardName` and `BusID` in the "Device" section to the corresponding `Name` and `PCI BusID` of a GPU reported by the `nvidia-xconfig --query-gpu-info` command. For example:

```
Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:164:0:0"
    BoardName      "NVIDIA GeForce RTX 3090"
EndSection
```

4. Run Xorg:

```
sudo nohup Xorg :0 -config /etc/X11/xorg-0.conf &
```

5. Set the display (remember to run this command in every new terminal session before running the evaluation code):

```
export DISPLAY=:0
```
</details>

### Install Dependencies

```bash
conda create -n embodiedeval python=3.10
conda activate embodiedeval
pip install -r requirements.txt
```

### Download Dataset

```bash
python download.py
```

## Evaluation

### Run Baselines

#### Random baseline

```bash
python run_eval.py --agent random
```

#### Human baseline

```bash
python run_eval.py --agent human
```

In the human baseline, you can manually interact with the environment.
<details>
<summary>How to play</summary>
<br>

- Press the number key corresponding to an option to choose it;
- Pressing W/A/D maps to the forward/turn left/turn right options in the menu;
- Pressing Enter opens or closes the chat window, where you can enter option numbers greater than 9;
- Pressing T hides/shows the options panel.
</details>

#### GPT-4o

Edit the `api_key` and `base_url` in `agent.py` and run:
```bash
python run_eval.py --agent gpt-4o
```
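
Conceptually, the `api_key` / `base_url` pair points the agent at an OpenAI-compatible chat endpoint. A stdlib-only sketch of how such a request is assembled (the actual structure of `agent.py`, and the helper names below, are assumptions for illustration):

```python
import json
import urllib.request

api_key = "sk-..."                      # placeholder: set your real key in agent.py
base_url = "https://api.openai.com/v1"  # or a compatible proxy endpoint

def build_chat_request(messages, model="gpt-4o"):
    """Construct the HTTP request for an OpenAI-compatible chat endpoint."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

def generate(messages, model="gpt-4o"):
    """Send the request and return the generated text."""
    with urllib.request.urlopen(build_chat_request(messages, model)) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Any endpoint that speaks this request shape (official API or a self-hosted proxy) can be dropped in by changing `base_url` alone.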

### Evaluate Your Own Model

To evaluate your own model, you need to override the `MyAgent` class in `agent.py`.
In the `__init__` method, load the model or initialize the API.
In the `generate` method, perform model inference or API calls and return the generated text. See the comments within the class for details.
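
A minimal sketch of what this could look like (the base class, constructor arguments, and exact `generate` signature are assumptions; follow the comments in the real `agent.py`):

```python
# Sketch of a MyAgent implementation for agent.py. The generate() signature
# and the absence of a base class are assumptions for illustration only.
class MyAgent:
    def __init__(self):
        # Load your model weights or initialize your API client here,
        # e.g. self.model = AutoModelForCausalLM.from_pretrained(...)
        self.model = None

    def generate(self, prompt, images=None):
        # Run model inference (or an API call) on the multimodal input
        # and return the generated text as a string.
        if self.model is None:
            return "1"  # placeholder: always answer with option 1
        raise NotImplementedError
```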

Run the following command to evaluate your model:
```bash
python run_eval.py --agent myagent
```

If your server cannot run the simulator (e.g., without sudo access) and your personal computer cannot run the model, you can run the simulator on your computer and the model on the server using the following steps:
<details>
<summary>Evaluation steps with a remote simulator</summary>
<br>

1. Perform the `Install Dependencies` and `Download Dataset` steps on both your local computer and the server.

2. On the server, run:
```
python run_eval.py --agent myagent --remote --scene_folder <The absolute path of the scene folder on your local computer>
```
This command will hang, waiting for the simulator to connect.

3. On your computer, set up an SSH tunnel between your computer and the server:
```
ssh -N -L 50051:localhost:50051 <username>@<host> [-p <ssh_port>]
```

4. On your computer, launch the simulator:
```
python launch.py
```

Once the simulator starts, the evaluation process on the server will begin.

</details>

### Compute Metrics

Run `metrics.py` with the result folder as a parameter to compute performance. `total_metrics.json` (overall performance) and `type_metrics.json` (performance per task type) will be saved in the result folder.

```
python metrics.py --result_folder results/xxx-xxx-xxx
```
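
Both outputs are plain JSON, so they are easy to inspect programmatically. A small sketch (assuming the files map metric names to numeric values; the actual schema may differ):

```python
import json
import tempfile
from pathlib import Path

def load_metrics(result_folder):
    """Read the two metric files that metrics.py writes into a result folder."""
    folder = Path(result_folder)
    total = json.loads((folder / "total_metrics.json").read_text())
    per_type = json.loads((folder / "type_metrics.json").read_text())
    return total, per_type

# Demo on a synthetic result folder with made-up values:
demo = tempfile.mkdtemp()
(Path(demo) / "total_metrics.json").write_text(json.dumps({"success_rate": 0.5}))
(Path(demo) / "type_metrics.json").write_text(
    json.dumps({"navigation": {"success_rate": 0.4}}))
total, per_type = load_metrics(demo)
print(total, per_type)
```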

### Citation

```
```
