# SAM 3D Objects - USD Asset Generation Pipeline

Convert 2D images into 3D USD assets that can be used directly in Isaac Sim. This pipeline uses LangSplat for image segmentation and SAM 3D Objects for 3D reconstruction.

<img src="doc/photo.png" height="280" alt="Input Image"> <img src="doc/usd_in_isaac.png" height="280" alt="USD in Isaac Sim">

Based on [facebookresearch/sam-3d-objects/pull/38](https://github.com/facebookresearch/sam-3d-objects/pull/38) and [segment-anything-langsplat](https://github.com/minghanqin/segment-anything-langsplat). Tested on Ubuntu 24.04 + RTX 5090.

## Overview

Two-step process:

1. **Stage 1**: Segment objects in images using LangSplat (creates masks)
2. **Stage 2**: Convert masks to 3D USD files with textures and physics

## Stage 1: Image Preprocessing

LangSplat automatically finds objects in images and creates masks. It generates masks in three modes:

<p align="center"><img src="doc/intro.png"/></p>
- **s** (small): Detailed masks
- **m** (medium): Balanced masks
- **l** (large): Coarse masks (recommended for 3D)

**LangSplat segmentation (l mode) vs default segmentation:**

<p align="center"><img src="doc/arch.png"/></p>
<img src="doc/segments_l.png" width="350" alt="LangSplat Segmentation"> | <img src="doc/segments_default.png" width="350" alt="Default Segmentation">
:---: | :---:
LangSplat (l mode) | Default

**Input**:
```
input_dir/
    image.png
```

**Output**:
```
output_dir/
    input_dir_name/
        s/
            image.png, 0.png, 1.png, ...
        m/
            image.png, 0.png, 1.png, ...
        l/
            image.png, 0.png, 1.png, ...
        segments_*.png   # Visualizations
```
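
To consume these masks downstream, a small loader is enough. The following is a minimal sketch assuming the directory layout above; `load_masks` and its arguments are illustrative names, and each numbered PNG is assumed to be a single-object binary mask:

```python
from pathlib import Path

import numpy as np
from PIL import Image

def load_masks(output_dir: str, scene: str, mode: str = "l"):
    """Load the source image and the numbered object masks for one mode."""
    mode_dir = Path(output_dir) / scene / mode
    image = np.array(Image.open(mode_dir / "image.png"))
    masks = []
    # 0.png, 1.png, ... are the per-object masks; image.png is the source frame
    for path in sorted(mode_dir.glob("[0-9]*.png"), key=lambda p: int(p.stem)):
        masks.append(np.array(Image.open(path).convert("L")) > 0)
    return image, masks

image, masks = load_masks("output_dir", "input_dir_name", mode="l")
print(f"loaded {len(masks)} object masks, image shape {image.shape}")
```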

## Stage 2: 3D Reconstruction

Converts the masks into 3D USD files. Each USD file contains (see the inspection sketch after this list):

<p align="center">
<img src="notebook/images/shutterstock_stylish_kidsroom_1640806567/image.png" width="55%"/>
<img src="doc/kidsroom_transparent.gif" width="40%"/>
</p>
- **XForm wrapper**: All objects wrapped in XForm primitives
- **Textured mesh**: 3D model with textures
- **Scene scaling**: Properly scaled for your scene
- **Physics**: Rigid body, collision shapes, and mass
- **Default prim**: Ready for USD scene composition
- **Coordinate conversion**: Automatic Y-up to Z-up transform
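
To verify these properties on a generated file, the `pxr` Python API (bundled with Isaac Sim, or installable via `usd-core`) can open the asset directly. A minimal inspection sketch; the file path is illustrative:

```python
from pxr import Usd, UsdGeom, UsdPhysics

stage = Usd.Stage.Open("result/usds/input_dir_name/0.usd")

# Default prim: the XForm wrapper around the textured mesh
root = stage.GetDefaultPrim()
print("default prim:", root.GetPath(), root.GetTypeName())

# Coordinate convention: should report Z-up after the automatic conversion
print("up axis:", UsdGeom.GetStageUpAxis(stage))

# Physics: rigid body, collider, and mass applied somewhere in the hierarchy
for prim in Usd.PrimRange(root):
    if prim.HasAPI(UsdPhysics.RigidBodyAPI):
        print("rigid body:", prim.GetPath())
    if prim.HasAPI(UsdPhysics.CollisionAPI):
        print("collider:", prim.GetPath())
    if prim.HasAPI(UsdPhysics.MassAPI):
        print("mass:", UsdPhysics.MassAPI(prim).GetMassAttr().Get())
```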

## Installation
Follow the [setup](doc/setup.md) steps before running the following.

## Usage

### Full Pipeline

Run both stages together:

```bash
python demo.py --image_dir=notebook/images/isaac_lift_ball/ --output_dir=result
```

**Options**:
- `--image_dir`: Folder with `image.png`
- `--output_dir`: Where to save results
- `--segment_mode`: Use `'l'`, `'m'`, or `'s'` (default: `'l'`)
- `--sam_ckpt_path`: Path to SAM checkpoint (default: `checkpoints/samv1/sam_vit_h_4b8939.pth`)

**Output**:
```
result/
    images/
        input_dir_name/
            s/, m/, l/            # Masks for each mode
    usds/
        input_dir_name/
            0.usd, 1.usd, ...     # USD files for each object
```
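
Because every object file sets a default prim, the USDs compose cleanly into a single scene via references, which is the usual way to bring them into Isaac Sim. A minimal composition sketch; paths, prim names, and the object count are illustrative:

```python
from pxr import Usd, UsdGeom

scene = Usd.Stage.CreateNew("scene.usda")
UsdGeom.SetStageUpAxis(scene, UsdGeom.Tokens.z)  # match Isaac Sim's Z-up convention

for i in range(3):  # however many objects Stage 2 produced
    prim = scene.DefinePrim(f"/World/object_{i}", "Xform")
    # Each file sets a default prim, so no target prim path is needed
    prim.GetReferences().AddReference(f"result/usds/input_dir_name/{i}.usd")

scene.SetDefaultPrim(scene.GetPrimAtPath("/World"))
scene.GetRootLayer().Save()
```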

### Preprocessing Only

Just create masks without 3D reconstruction:

```bash
python preprocess.py --input_dir=notebook/images/isaac_lift_ball/ --output_dir=result --sam_ckpt_path=checkpoints/samv1/sam_vit_h_4b8939.pth
```

**Options**:
- `--input_dir`: Folder with `image.png`
- `--output_dir`: Where to save masks
- `--sam_ckpt_path`: Path to SAM checkpoint

This creates masks in all three modes (s, m, and l) without producing any USD files.
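
Before committing to the much slower 3D stage, it can be worth eyeballing the masks. The sketch below tints each mask onto the source image; it reuses the illustrative `load_masks` helper from the Stage 1 section, and the `result`/`input_dir_name` paths assume the preprocessing output layout above:

```python
import numpy as np
from PIL import Image

image, masks = load_masks("result", "input_dir_name", mode="l")

# Blend a random color over each object mask
overlay = image[..., :3].astype(np.float32)
rng = np.random.default_rng(0)
for mask in masks:
    overlay[mask] = 0.5 * overlay[mask] + 0.5 * rng.uniform(64, 255, size=3)

Image.fromarray(overlay.astype(np.uint8)).save("mask_overlay.png")
```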

## License

The SAM 3D Objects model checkpoints and code are licensed under [SAM License](./LICENSE).

Built on:
- [SAM 3D Objects](https://github.com/facebookresearch/sam-3d-objects) - SAM License, [PR #38](https://github.com/facebookresearch/sam-3d-objects/pull/38)
- [segment-anything-langsplat](https://github.com/minghanqin/segment-anything-langsplat) - For preprocessing

## References

- [SAM 3D Objects](https://github.com/facebookresearch/sam-3d-objects)
- [USD Export PR #38](https://github.com/facebookresearch/sam-3d-objects/pull/38)
- [segment-anything-langsplat](https://github.com/minghanqin/segment-anything-langsplat)