Skip to content

Pico-Developer/SpatialMP4

Repository files navigation

SpatialMP4

A C++/Python toolkit for processing SpatialMP4 format, supporting reading and processing spatial video files containing RGB, depth, pose, and audio data.

πŸš€ Features

  • Multi-modal Data Support: Simultaneously process RGB images, depth maps, pose data, and audio
  • Stereo Vision: Support for left and right eye RGB image data
  • High Performance: Efficient video decoding based on FFmpeg
  • Flexible Reading Modes: Support for RGB-only, Depth-only, and Depth-first reading modes
  • Random Access: Support for random access to video frames and keyframe indexing
  • 3D Reconstruction: Built-in point cloud generation and RGBD data processing
  • Camera Calibration: Support for reading and applying intrinsic and extrinsic parameters
  • Visualization Tools: Rich data visualization and debugging capabilities
  • Cross-Platform: Full support for Linux and macOS

πŸ“‹ System Requirements

  • Operating System:
    • Linux (Ubuntu 18.04+ recommended)
    • macOS (10.15+ Catalina, Xcode required)
  • Compiler:
    • GCC 7.0+ or Clang 6.0+ (C++17 support required)
    • Apple Clang from Xcode 11.0+ on macOS (how-to-install)
  • CMake: 3.24.1+

πŸ”§ Dependencies

The project depends on the following third-party libraries:

  • FFmpeg: Video encoding/decoding (libavformat, libavcodec, libswscale)
  • OpenCV: Image processing and computer vision
  • Eigen3: Linear algebra operations
  • Sophus: Lie group operations for SE(3) group
  • spdlog: High-performance logging library
  • fmt: Modern C++ formatting library
  • Google Test: Unit testing framework (optional)

πŸ› οΈ Build and Installation (cpp)

1. Clone Repository

git clone https://github.com/Pico-Developer/SpatialMP4
cd SpatialMP4

2. Build FFmpeg

Build ffmpeg first:

bash scripts/build_ffmpeg.sh

3. Install Dependencies

bash scripts/install_deps.sh

4. Configure and Build

mkdir build && cd build

# Configure project
cmake .. -DCMAKE_BUILD_TYPE=Release -DBUILD_PYTHON=OFF

# Build
make -j$(nproc)  # On Linux
make -j$(sysctl -n hw.ncpu)  # On macOS

Sometime it can be difficult to build a c++ source project. FAQ for installation may help you. If still not working, welcome submit a issue.

5. Run Tests (Optional)

# Build with tests
cmake .. -DBUILD_TESTING=ON
make -j$(nproc)  # On Linux
make -j$(sysctl -n hw.ncpu)  # On macOS

# Run tests
cd ..
./build/test_reader

πŸ› οΈ Build and Installation (python)

1. Clone Repository

git clone https://github.com/Pico-Developer/SpatialMP4
cd SpatialMP4

2. Build FFmpeg

Build ffmpeg first:

bash scripts/build_ffmpeg.sh

3. Install Dependencies

bash scripts/install_deps.sh

4. Build and Install

pip3 install .

πŸ“– Usage Guide (cpp)

Basic Usage

#include "spatialmp4/reader.h"
#include "spatialmp4/data_types.h"

// Create reader
SpatialML::Reader reader("path/to/your/spatial.mp4");

// Check data types
if (reader.HasRGB()) {
    std::cout << "Contains RGB data" << std::endl;
}
if (reader.HasDepth()) {
    std::cout << "Contains depth data" << std::endl;
}
if (reader.HasPose()) {
    std::cout << "Contains pose data" << std::endl;
}

// Get camera parameters
auto rgb_intrinsics = reader.GetRgbIntrinsicsLeft();
auto depth_intrinsics = reader.GetDepthIntrinsics();

Reading RGB and Depth Data

// Set reading mode
reader.SetReadMode(SpatialML::Reader::ReadMode::DEPTH_FIRST);
reader.Reset();

while (reader.HasNext()) {
    SpatialML::rgb_frame rgb_frame;
    SpatialML::depth_frame depth_frame;
    
    // Read RGB and depth frames simultaneously
    reader.Load(rgb_frame, depth_frame);
    
    // Process data
    cv::Mat left_rgb = rgb_frame.left_rgb;
    cv::Mat right_rgb = rgb_frame.right_rgb;
    cv::Mat depth = depth_frame.depth;
    
    std::cout << "RGB timestamp: " << rgb_frame.timestamp << std::endl;
    std::cout << "Depth timestamp: " << depth_frame.timestamp << std::endl;
}

RGB-Only Reading

reader.SetReadMode(SpatialML::Reader::ReadMode::RGB_ONLY);
reader.Reset();

while (reader.HasNext()) {
    SpatialML::rgb_frame rgb_frame;
    reader.Load(rgb_frame);
    
    // Process RGB data
    cv::imshow("Left RGB", rgb_frame.left_rgb);
    cv::imshow("Right RGB", rgb_frame.right_rgb);
    cv::waitKey(1);
}

Pose Data Processing

// Get all pose data
auto pose_frames = reader.GetPoseFrames();

for (const auto& pose : pose_frames) {
    // Convert to SE(3) representation
    Sophus::SE3d se3_pose = pose.as_se3();
    
    // Get rotation and translation
    Eigen::Matrix3d rotation = se3_pose.rotationMatrix();
    Eigen::Vector3d translation = se3_pose.translation();
    
    std::cout << "Pose timestamp: " << pose.timestamp << std::endl;
    std::cout << "Position: " << translation.transpose() << std::endl;
}

Depth Projection to RGB

#include "utilities/RgbdUtils.h"

// Get camera parameters
auto rgb_intrinsics = reader.GetRgbIntrinsicsLeft().as_cvmat();
auto depth_intrinsics = reader.GetDepthIntrinsics().as_cvmat();

// Calculate transformation matrix
auto T_I_Srgb = reader.GetRgbExtrinsicsLeft().as_se3();
auto T_I_Stof = reader.GetDepthExtrinsics().as_se3();
auto T_Srgb_Stof = T_I_Srgb.inverse() * T_I_Stof;

// Project depth to RGB
cv::Mat projected_depth;
Utilities::ProjectDepthToRgb(depth_frame.depth, rgb_frame.left_rgb, 
                           rgb_intrinsics, depth_intrinsics, 
                           T_Srgb_Stof, projected_depth);

Point Cloud Generation

#include "utilities/PointcloudUtils.h"

// Generate point cloud from RGBD data
Utilities::Pointcloud pcd;
Utilities::RgbdToPointcloud(rgb_frame.left_rgb, projected_depth, 
                          rgb_intrinsics, pcd, 10.0f);

// Save point cloud
Utilities::SavePointcloudToFile("output.obj", pcd);

πŸ“š API Reference (cpp)

SpatialML::Reader

Main SpatialMP4 file reader class.

Constructor

Reader(const std::string& filename)

Data Check Methods

bool HasRGB() const;        // Whether contains RGB data
bool HasDepth() const;      // Whether contains depth data  
bool HasPose() const;       // Whether contains pose data
bool HasAudio() const;      // Whether contains audio data
bool HasDisparity() const;  // Whether contains disparity data

Parameter Getter Methods

camera_intrinsics GetRgbIntrinsicsLeft() const;   // Left RGB camera intrinsics
camera_intrinsics GetRgbIntrinsicsRight() const;  // Right RGB camera intrinsics
camera_extrinsics GetRgbExtrinsicsLeft() const;   // Left RGB camera extrinsics
camera_extrinsics GetRgbExtrinsicsRight() const;  // Right RGB camera extrinsics
camera_intrinsics GetDepthIntrinsics() const;     // Depth camera intrinsics
camera_extrinsics GetDepthExtrinsics() const;     // Depth camera extrinsics

Reading Control Methods

void SetReadMode(ReadMode mode);  // Set reading mode
bool HasNext() const;             // Whether has next frame
void Reset();                     // Reset to beginning
int GetIndex() const;             // Get current index

Data Loading Methods

void Load(rgb_frame& rgb_frame);                              // Load RGB frame
void Load(depth_frame& depth_frame);                          // Load depth frame  
void Load(rgb_frame& rgb_frame, depth_frame& depth_frame);    // Load RGB and depth frames simultaneously

Data Structures

rgb_frame

struct rgb_frame {
    double timestamp;     // Timestamp
    cv::Mat left_rgb;     // Left eye RGB image
    cv::Mat right_rgb;    // Right eye RGB image
    pose_frame pose;      // Corresponding pose data
};

depth_frame

struct depth_frame {
    double timestamp;     // Timestamp
    cv::Mat depth;        // Depth image
    pose_frame pose;      // Corresponding pose data
};

pose_frame

struct pose_frame {
    double timestamp;     // Timestamp
    double x, y, z;       // Position
    double qw, qx, qy, qz; // Quaternion rotation
    
    Sophus::SE3d as_se3() const;  // Convert to SE(3) representation
};

πŸ“– Usage Guide (python)

Basic Usage Example

import spatialmp4

# Create a reader
reader = spatialmp4.Reader("your_video.mp4")

# Check available streams
print("Has RGB:", reader.has_rgb())
print("Has Depth:", reader.has_depth())
print("Has Pose:", reader.has_pose())

# Set reading mode
reader.set_read_mode(spatialmp4.ReadMode.DEPTH_FIRST)

# Read frames
while reader.has_next():
    rgb_frame, depth_frame = reader.load_both()
    left_rgb = rgb_frame.left_rgb  # numpy array (H, W, 3)
    depth = depth_frame.depth      # numpy array (H, W)
    pose = rgb_frame.pose
    print("RGB timestamp:", rgb_frame.timestamp, "Pose:", pose.x, pose.y, pose.z)

πŸ“š API Reference (python)

Main Classes and Methods

spatialmp4.Reader

Main class for reading SpatialMP4 files.

  • Reader(filename: str) β€” Create a new reader for the given file.
  • has_rgb() -> bool β€” Whether the file contains RGB data.
  • has_depth() -> bool β€” Whether the file contains depth data.
  • has_pose() -> bool β€” Whether the file contains pose data.
  • has_audio() -> bool β€” Whether the file contains audio data.
  • has_disparity() -> bool β€” Whether the file contains disparity data.
  • get_duration() -> float β€” Get video duration in seconds.
  • get_rgb_fps() -> float β€” Get RGB stream FPS.
  • get_depth_fps() -> float β€” Get depth stream FPS.
  • get_rgb_width() -> int β€” Get RGB frame width.
  • get_rgb_height() -> int β€” Get RGB frame height.
  • get_depth_width() -> int β€” Get depth frame width.
  • get_depth_height() -> int β€” Get depth frame height.
  • get_rgb_intrinsics_left() -> CameraIntrinsics β€” Get left RGB camera intrinsics.
  • get_rgb_intrinsics_right() -> CameraIntrinsics β€” Get right RGB camera intrinsics.
  • get_rgb_extrinsics_left() -> CameraExtrinsics β€” Get left RGB camera extrinsics.
  • get_rgb_extrinsics_right() -> CameraExtrinsics β€” Get right RGB camera extrinsics.
  • get_depth_intrinsics() -> CameraIntrinsics β€” Get depth camera intrinsics.
  • get_depth_extrinsics() -> CameraExtrinsics β€” Get depth camera extrinsics.
  • get_pose_frames() -> List[PoseFrame] β€” Get all pose frames.
  • set_read_mode(mode: ReadMode) β€” Set reading mode (see enums below).
  • has_next() -> bool β€” Whether there is a next frame.
  • reset() β€” Reset to the beginning of the file.
  • get_index() -> int β€” Get current frame index.
  • get_frame_count() -> int β€” Get total number of frames.
  • load_rgb() -> RGBFrame β€” Load the next RGB frame.
  • load_depth(raw_head_pose: bool = False) -> DepthFrame β€” Load the next depth frame.
  • load_both() -> (RGBFrame, DepthFrame) β€” Load the next RGB and depth frames simultaneously.
  • load_rgbd(densify: bool = False) -> Rgbd β€” Load RGBD data (for advanced use).

spatialmp4.RGBFrame

  • timestamp: float β€” Frame timestamp.
  • left_rgb: np.ndarray β€” Left RGB image (H, W, 3, uint8).
  • right_rgb: np.ndarray β€” Right RGB image (H, W, 3, uint8).
  • pose: PoseFrame β€” Associated pose data.

spatialmp4.DepthFrame

  • timestamp: float β€” Frame timestamp.
  • depth: np.ndarray β€” Depth image (H, W, float32, meters).
  • pose: PoseFrame β€” Associated pose data.

spatialmp4.PoseFrame

  • timestamp: float β€” Pose timestamp.
  • x, y, z: float β€” Position.
  • qw, qx, qy, qz: float β€” Quaternion orientation.
  • as_se3() β€” Convert to SE(3) representation (requires Sophus/Eigen, advanced use).

spatialmp4.CameraIntrinsics

  • fx, fy, cx, cy: float β€” Camera intrinsic parameters.
  • as_cvmat() β€” Return as OpenCV matrix.

spatialmp4.CameraExtrinsics

  • extrinsics: np.ndarray β€” 4x4 extrinsic matrix.
  • as_cvmat() β€” Return as OpenCV matrix.
  • as_se3() β€” Return as SE(3) (advanced use).

Enums

spatialmp4.ReadMode

  • RGB_ONLY β€” Only read RGB frames.
  • DEPTH_ONLY β€” Only read depth frames.
  • DEPTH_FIRST β€” Read both RGB and depth frames, depth as reference.

spatialmp4.StreamType

  • UNKNOWN β€” Unknown stream type
  • AUDIO β€” Audio stream
  • AUDIO_2 β€” Secondary audio stream
  • RGB β€” RGB video stream
  • DISPARITY β€” Disparity stream
  • POSE β€” Pose data stream
  • DEPTH β€” Depth stream

Advanced Usage

  • See examples/python/visualize_rerun.py and examples/python/generate_pcd.py for advanced usage, including point cloud generation and visualization with Open3D or Rerun.
  • All image and depth data are returned as NumPy arrays for easy integration with OpenCV, Open3D, PyTorch, etc.
  • Camera parameters and pose data can be used for 3D reconstruction and SLAM applications.

πŸ› Debugging and Logging

The project uses spdlog for logging:

#include <spdlog/spdlog.h>

// Set log level
spdlog::set_level(spdlog::level::debug);

// Debug information will be automatically output in the code

πŸ“„ License

This project is licensed under the MIT. See the LICENSE file for details.

🀝 Contributing

Issues and Pull Requests are welcome to improve this project!

πŸ“ž Contact

For questions or suggestions, please contact us through GitHub Issues.

About

SpatialMP4 format cpp/python toolkit.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published