3DA-VTG: Explicitly-Aligned Visuo-Tactile Grasp Framework


📝 1. Paper Overview

Recent visuo-tactile grasping research has increasingly emphasized the value of combining tactile sensing with vision to enhance in-contact manipulation. However, most existing approaches:

  • rely on 2D feature-space fusion, lacking explicit spatial alignment,
  • offer only limited annotation density, and
  • do not provide ground-truth visuo-tactile correspondence in 3D.

To overcome these limitations, our work proposes:

🌌 3DA-VTG Framework

A complete, explicitly-aligned 3D visuo-tactile learning setup consisting of:

  • a large-scale multimodal grasp dataset,
  • a unified 3D sensory alignment & reconstruction pipeline,
  • a shape completion module to recover occluded geometry,
  • a geometry-aware stability prediction network (SGA-GSN).

🔓 2. Open-Source Plan

| Component | Status |
| --- | --- |
| 📁 3DA-VTG Dataset | sensory/pose/stability data collected in simulation |
| 🧰 Dataset APIs | aligned data loading, tactile depth tools, 3D reconstruction scripts |
| 🧠 Full Framework Code | unified 3D pipeline + shape completion integration |
| 🌀 SGA-GSN Network | training, testing, inference, weights |

🔔 All components will be released progressively following paper acceptance. Please Star & Watch the repository for updates.


📦 3. Dataset: 3DA-VTG (Aligned Visuo-Tactile Grasp Dataset)

📍 Dataset Pipeline


The 3DA-VTG dataset is constructed using a simplified robot handover scenario in simulation to provide dense, explicitly aligned visuo-tactile data.

📌 Key Properties

  • 440K grasp trials, ~5K per object
  • 88 objects (YCB, DexNet 2.0, self-collected objects from GraspNet-1Billion)
  • Each grasp sample includes (a loading sketch follows this list):
    • RGB-D visual observation
    • Dual GelSight tactile RGB-D images
    • 6-DoF extrinsic parameters (camera, gels, object)
    • Unified visuo-tactile 3D point cloud
    • Stable / unstable grasp outcome
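
For illustration, the sketch below shows how one such sample could be assembled in Python. The file names and directory layout are placeholder assumptions, not the official dataset API, which has not yet been released.

```python
# Minimal sketch of assembling one 3DA-VTG grasp sample in Python.
# File names and directory layout are illustrative assumptions, not the released API.
from pathlib import Path

import numpy as np


def load_grasp_sample(sample_dir: str) -> dict:
    """Gather the modalities listed above for a single grasp trial."""
    root = Path(sample_dir)
    return {
        # RGB-D visual observation from the external camera
        "rgb": np.load(root / "camera_rgb.npy"),
        "depth": np.load(root / "camera_depth.npy"),
        # Dual GelSight tactile RGB-D images (left / right fingertip)
        "tactile_rgb": [np.load(root / f"gel_{s}_rgb.npy") for s in ("left", "right")],
        "tactile_depth": [np.load(root / f"gel_{s}_depth.npy") for s in ("left", "right")],
        # 6-DoF extrinsics (4x4 homogeneous transforms) for camera, gels, and object
        "extrinsics": dict(np.load(root / "extrinsics.npz")),
        # Unified visuo-tactile 3D point cloud in the world frame
        "points": np.load(root / "fused_points.npy"),
        # Binary grasp outcome: 1 = stable, 0 = unstable
        "stable": int(np.load(root / "label.npy").item()),
    }
```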

🧠 4. The 3DA-VTG Framework

📍 Framework Overview


The proposed framework performs explicit spatial fusion, rather than feature-space concatenation, through three stages:

🏗️ Stage ① Unified 3D Representation

  • SAM-prompted visual segmentation
  • Tactile depth estimation using a transformer
  • Reconstruction into a common world frame
  • Produces aligned sensory 3D point clouds (a back-projection sketch follows this list)
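
The explicit alignment step amounts to back-projecting each depth map with its intrinsics and mapping the points into a shared world frame using the recorded 6-DoF extrinsics. Below is a minimal NumPy sketch of that idea; variable names and the pinhole model are assumptions, not the released pipeline code.

```python
# Sketch of explicit alignment: back-project each depth map with its intrinsics,
# then express the points in one world frame via the 6-DoF extrinsics.
import numpy as np


def depth_to_world_points(depth: np.ndarray, K: np.ndarray,
                          T_world_sensor: np.ndarray) -> np.ndarray:
    """Back-project an (H, W) depth map and return (N, 3) points in the world frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    valid = z > 0
    u, v, z = u.reshape(-1)[valid], v.reshape(-1)[valid], z[valid]
    # Pinhole back-projection into the sensor frame
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    pts_sensor = np.stack([x, y, z, np.ones_like(z)], axis=1)  # homogeneous (N, 4)
    # Rigid transform into the shared world frame
    return (T_world_sensor @ pts_sensor.T).T[:, :3]


# Fused sensory cloud = camera points + both GelSight point patches, all in one frame:
# fused = np.concatenate([depth_to_world_points(d, K_i, T_i) for d, K_i, T_i in sensors])
```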

🔧 Stage ② Multimodal Shape Completion

  • Based on AdaPoinTr
  • Balanced sampling between visual & tactile inputs (sketched after this list)
  • Completes occluded geometry invisible to the RGB camera
  • Output: a full object point cloud
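
The balanced-sampling idea can be sketched as drawing an equal share of points from the visual and tactile partial clouds before they are passed to the completion network. The 50/50 ratio and point budget below are assumptions for illustration, not the paper's exact settings.

```python
# Sketch of balanced sampling between the visual and tactile partial point clouds.
import numpy as np


def balanced_sample(visual_pts: np.ndarray, tactile_pts: np.ndarray,
                    n_points: int = 2048,
                    rng: np.random.Generator | None = None) -> np.ndarray:
    """Return n_points drawn half from each modality (with replacement if sparse)."""
    rng = rng or np.random.default_rng()
    half = n_points // 2
    vis_idx = rng.choice(len(visual_pts), half, replace=len(visual_pts) < half)
    tac_idx = rng.choice(len(tactile_pts), n_points - half,
                         replace=len(tactile_pts) < n_points - half)
    return np.concatenate([visual_pts[vis_idx], tactile_pts[tac_idx]], axis=0)
```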

🧩 Stage ③ Geometry-Aware Stability Prediction (SGA-GSN)

  • Dual-branch feature extraction (contact vs. shape)
  • Geometry-aware cross-attention fusion (sketched after this list)
  • Multi-resolution spatial reasoning
  • Output: binary stability label
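
As a rough illustration of the fusion step, the PyTorch sketch below lets contact-branch tokens attend to shape-branch tokens through a single cross-attention layer and pools the result into a binary stability prediction. The layer sizes and single attention layer are assumptions; SGA-GSN's actual architecture adds geometry-aware biases and multi-resolution reasoning.

```python
# Minimal PyTorch sketch of dual-branch cross-attention fusion for stability prediction.
import torch
import torch.nn as nn


class CrossAttentionStabilityHead(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, contact_feat: torch.Tensor, shape_feat: torch.Tensor) -> torch.Tensor:
        # contact_feat: (B, Nc, dim) tokens from the contact (tactile) branch
        # shape_feat:   (B, Ns, dim) tokens from the shape (completed object) branch
        fused, _ = self.attn(query=contact_feat, key=shape_feat, value=shape_feat)
        pooled = fused.mean(dim=1)                     # global pooling over tokens
        return torch.sigmoid(self.classifier(pooled))  # stability probability in [0, 1]


# Usage example with random features:
# prob = CrossAttentionStabilityHead()(torch.randn(2, 64, 256), torch.randn(2, 128, 256))
```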

📊 5. Performance of the Framework (via SGA-GSN)

📌 SGA-GSN serves as the stability prediction module of the 3DA-VTG framework.

💻 Results on 3DA-VTG Dataset

| Split | Accuracy | F1 Score |
| --- | --- | --- |
| Seen Objects | 81.9% | 79.3% |
| Unseen Objects | 80.0% | 80.7% |

⭐ Citation

If you find this framework or dataset useful, please cite:

