Recent visuo-tactile grasping research has increasingly emphasized the value of combining tactile sensing with vision to enhance in-contact manipulation. However, most existing approaches:
- rely on 2D feature-space fusion, lacking explicit spatial alignment,
- rely on datasets with limited annotation density, and
- do not provide ground-truth visuo-tactile correspondence in 3D.
To overcome these limitations, we propose a complete, explicitly aligned 3D visuo-tactile learning setup consisting of:
- a large-scale multimodal grasp dataset,
- a unified 3D sensory alignment & reconstruction pipeline,
- a shape completion module to recover occluded geometry,
- a geometry-aware stability prediction network (SGA-GSN).
| Component | Contents |
|---|---|
| 📁 3DA-VTG Dataset | sensory/pose/stability data collected in simulation |
| 🧰 Dataset APIs | aligned data loading, tactile depth tools, 3D reconstruction scripts |
| 🧠 Full Framework Code | unified 3D pipeline + shape completion integration |
| 🌀 SGA-GSN Network | training, testing, inference, weights |
🔔 All components will be released progressively following paper acceptance. ⭐ Please Star & Watch the repository for updates.
The 3DA-VTG dataset is constructed using a simplified robot handover scenario in simulation to provide dense, explicitly aligned visuo-tactile data.
- 440K grasp trials, ~5K per object
- 88 objects from GraspNet-1Billion (YCB, DexNet 2.0, and self-collected objects)
- Each grasp sample includes:
  - RGB-D visual observation
  - Dual GelSight tactile RGB-D images
  - 6-DoF extrinsic parameters (camera, GelSight sensors, object)
  - Unified visuo-tactile 3D point cloud (assembly sketched below)
  - Stable / unstable grasp outcome
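As an illustration of how these per-sample fields combine, the following is a minimal sketch that back-projects the visual and tactile depth maps through their 6-DoF extrinsics into one world-frame cloud. The field names, the intrinsics layout, and the assumption that each extrinsic maps its sensor frame to the world frame are illustrative; the released dataset API may differ.

```python
# Hedged sketch: assembling the unified visuo-tactile point cloud of one grasp
# sample from its stored fields. Field names and conventions are assumptions,
# not the released dataset API.
import numpy as np

def depth_to_points(depth, K, T_world_sensor):
    """Back-project a depth map (meters) into a world-frame point cloud.

    depth:          (H, W) float array, 0 where invalid.
    K:              (3, 3) pinhole intrinsic matrix.
    T_world_sensor: (4, 4) extrinsic transform (sensor frame -> world frame).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts = np.stack([x, y, z], axis=-1)                                 # (N, 3)
    pts_h = np.concatenate([pts, np.ones((pts.shape[0], 1))], axis=1)  # (N, 4)
    return (T_world_sensor @ pts_h.T).T[:, :3]

def unified_cloud(sample):
    """Merge the camera view and both GelSight views into one world-frame cloud.

    Treating each GelSight height map as a small depth camera is a simplification.
    """
    clouds = [depth_to_points(sample["rgbd_depth"], sample["K_cam"],
                              sample["T_world_cam"])]
    for side in ("left", "right"):
        clouds.append(depth_to_points(sample[f"gel_depth_{side}"],
                                      sample[f"K_gel_{side}"],
                                      sample[f"T_world_gel_{side}"]))
    return np.concatenate(clouds, axis=0)
```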
The proposed framework performs explicit spatial fusion, rather than feature-space concatenation, through three stages:
- SAM-prompted visual segmentation (see the sketch after this list)
- Transformer-based tactile depth estimation
- Reconstruction of all modalities into a common world frame
- Output: aligned sensory 3D point clouds
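As a concrete illustration of the first stage, below is a minimal sketch of SAM-prompted object segmentation using the public `segment-anything` package. The checkpoint path, the single point prompt, and the way the mask is applied to the depth map are illustrative assumptions, not the released pipeline.

```python
# Minimal sketch of stage 1 (SAM-prompted visual segmentation), assuming the
# public segment-anything API; prompt strategy and paths are illustrative.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

def segment_object_depth(rgb, depth, prompt_uv, checkpoint="sam_vit_h.pth"):
    """Return the depth map restricted to the SAM-segmented object.

    rgb:       (H, W, 3) uint8 RGB image.
    depth:     (H, W) float depth map in meters.
    prompt_uv: (u, v) pixel coordinate on the target object.
    """
    sam = sam_model_registry["vit_h"](checkpoint=checkpoint)
    predictor = SamPredictor(sam)
    predictor.set_image(rgb)
    masks, scores, _ = predictor.predict(
        point_coords=np.array([prompt_uv], dtype=np.float32),
        point_labels=np.array([1]),        # 1 = foreground prompt
        multimask_output=False,
    )
    object_mask = masks[0]                 # (H, W) boolean mask
    # The masked depth is then back-projected and merged with the estimated
    # tactile depth maps (stages 2-3) into the common world frame.
    return np.where(object_mask, depth, 0.0)
```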
The shape completion module:
- Based on AdaPoinTr
- Balanced sampling between visual & tactile inputs (sampling sketched after this list)
- Completes occluded geometry that is invisible to the RGB camera
- Output: a full object point cloud
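One possible reading of the balanced-sampling step is sketched below: draw equal point budgets from the visual and tactile clouds so the small contact patches are not drowned out by the much larger visual cloud. The 1:1 ratio, the point budget, and the function name are illustrative assumptions, not the released implementation.

```python
# Hedged sketch of balanced sampling between visual and tactile point clouds
# before shape completion; ratio and budget are illustrative assumptions.
import numpy as np

def balanced_sample(visual_pts, tactile_pts, n_points=2048, rng=None):
    """Draw an equal number of points from each modality.

    visual_pts, tactile_pts: (N, 3) arrays in the common world frame.
    Samples with replacement when a modality has fewer points than its share.
    """
    if rng is None:
        rng = np.random.default_rng()

    def draw(pts, k):
        idx = rng.choice(len(pts), size=k, replace=len(pts) < k)
        return pts[idx]

    half = n_points // 2
    # Tactile patches are tiny but encode the contact geometry; giving them the
    # same budget as the visual cloud keeps the completion input balanced.
    return np.concatenate([draw(visual_pts, half),
                           draw(tactile_pts, n_points - half)], axis=0)
```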
The SGA-GSN stability prediction network:
- Dual-branch feature extraction (contact vs. shape)
- Geometry-aware cross-attention fusion
- Multi-resolution spatial reasoning
- Output: binary stability label
📌 SGA-GSN serves as the stability prediction module of the 3DA-VTG framework; a minimal architecture sketch follows.
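For intuition, here is a hedged PyTorch sketch of the dual-branch, cross-attention design described above. The PointNet-style per-point encoders, the layer sizes, and the use of `nn.MultiheadAttention` are illustrative assumptions, not the released SGA-GSN architecture.

```python
# Hedged sketch of a dual-branch, cross-attention stability predictor in the
# spirit of SGA-GSN; all layer choices are illustrative assumptions.
import torch
import torch.nn as nn

class PointBranch(nn.Module):
    """Per-point MLP encoder (PointNet-style) producing token features."""
    def __init__(self, dim=128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, dim), nn.ReLU())

    def forward(self, pts):            # pts: (B, N, 3)
        return self.mlp(pts)           # (B, N, dim)

class SGAGSNSketch(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.contact_branch = PointBranch(dim)   # tactile / contact points
        self.shape_branch = PointBranch(dim)     # completed object shape
        # Geometry-aware fusion: contact tokens attend to shape tokens.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(),
                                  nn.Linear(64, 1))  # stability logit

    def forward(self, contact_pts, shape_pts):
        q = self.contact_branch(contact_pts)          # (B, Nc, dim)
        kv = self.shape_branch(shape_pts)             # (B, Ns, dim)
        fused, _ = self.cross_attn(q, kv, kv)         # (B, Nc, dim)
        pooled = fused.max(dim=1).values              # global max pooling
        return self.head(pooled).squeeze(-1)          # (B,) logits

# Usage: logits = SGAGSNSketch()(contact_pts, shape_pts); stable = logits > 0
```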
Grasp stability prediction results:

| Split | Accuracy | F1 Score |
|---|---|---|
| Seen Objects | 81.9% | 79.3% |
| Unseen Objects | 80.0% | 80.7% |
If you find this framework or dataset useful, please cite:

