A C++/Python toolkit for processing SpatialMP4 format, supporting reading and processing spatial video files containing RGB, depth, pose, and audio data.
- Multi-modal Data Support: Simultaneously process RGB images, depth maps, pose data, and audio
- Stereo Vision: Support for left and right eye RGB image data
- High Performance: Efficient video decoding based on FFmpeg
- Flexible Reading Modes: Support for RGB-only, Depth-only, and Depth-first reading modes
- Random Access: Support for random access to video frames and keyframe indexing
- 3D Reconstruction: Built-in point cloud generation and RGBD data processing
- Camera Calibration: Support for reading and applying intrinsic and extrinsic parameters
- Visualization Tools: Rich data visualization and debugging capabilities
- Cross-Platform: Full support for Linux and macOS
- Operating System:
- Linux (Ubuntu 18.04+ recommended)
- macOS (10.15+ Catalina, Xcode required)
- Compiler:
- GCC 7.0+ or Clang 6.0+ (C++17 support required)
- Apple Clang from Xcode 11.0+ on macOS (how-to-install)
- CMake: 3.24.1+
The project depends on the following third-party libraries:
- FFmpeg: Video encoding/decoding (libavformat, libavcodec, libswscale)
- OpenCV: Image processing and computer vision
- Eigen3: Linear algebra operations
- Sophus: Lie group operations for SE(3) group
- spdlog: High-performance logging library
- fmt: Modern C++ formatting library
- Google Test: Unit testing framework (optional)
git clone https://github.com/Pico-Developer/SpatialMP4
cd SpatialMP4
Build ffmpeg
first:
bash scripts/build_ffmpeg.sh
bash scripts/install_deps.sh
mkdir build && cd build
# Configure project
cmake .. -DCMAKE_BUILD_TYPE=Release -DBUILD_PYTHON=OFF
# Build
make -j$(nproc) # On Linux
make -j$(sysctl -n hw.ncpu) # On macOS
Sometime it can be difficult to build a c++ source project. FAQ for installation may help you. If still not working, welcome submit a issue.
# Build with tests
cmake .. -DBUILD_TESTING=ON
make -j$(nproc) # On Linux
make -j$(sysctl -n hw.ncpu) # On macOS
# Run tests
cd ..
./build/test_reader
git clone https://github.com/Pico-Developer/SpatialMP4
cd SpatialMP4
Build ffmpeg
first:
bash scripts/build_ffmpeg.sh
bash scripts/install_deps.sh
pip3 install .
#include "spatialmp4/reader.h"
#include "spatialmp4/data_types.h"
// Create reader
SpatialML::Reader reader("path/to/your/spatial.mp4");
// Check data types
if (reader.HasRGB()) {
std::cout << "Contains RGB data" << std::endl;
}
if (reader.HasDepth()) {
std::cout << "Contains depth data" << std::endl;
}
if (reader.HasPose()) {
std::cout << "Contains pose data" << std::endl;
}
// Get camera parameters
auto rgb_intrinsics = reader.GetRgbIntrinsicsLeft();
auto depth_intrinsics = reader.GetDepthIntrinsics();
// Set reading mode
reader.SetReadMode(SpatialML::Reader::ReadMode::DEPTH_FIRST);
reader.Reset();
while (reader.HasNext()) {
SpatialML::rgb_frame rgb_frame;
SpatialML::depth_frame depth_frame;
// Read RGB and depth frames simultaneously
reader.Load(rgb_frame, depth_frame);
// Process data
cv::Mat left_rgb = rgb_frame.left_rgb;
cv::Mat right_rgb = rgb_frame.right_rgb;
cv::Mat depth = depth_frame.depth;
std::cout << "RGB timestamp: " << rgb_frame.timestamp << std::endl;
std::cout << "Depth timestamp: " << depth_frame.timestamp << std::endl;
}
reader.SetReadMode(SpatialML::Reader::ReadMode::RGB_ONLY);
reader.Reset();
while (reader.HasNext()) {
SpatialML::rgb_frame rgb_frame;
reader.Load(rgb_frame);
// Process RGB data
cv::imshow("Left RGB", rgb_frame.left_rgb);
cv::imshow("Right RGB", rgb_frame.right_rgb);
cv::waitKey(1);
}
// Get all pose data
auto pose_frames = reader.GetPoseFrames();
for (const auto& pose : pose_frames) {
// Convert to SE(3) representation
Sophus::SE3d se3_pose = pose.as_se3();
// Get rotation and translation
Eigen::Matrix3d rotation = se3_pose.rotationMatrix();
Eigen::Vector3d translation = se3_pose.translation();
std::cout << "Pose timestamp: " << pose.timestamp << std::endl;
std::cout << "Position: " << translation.transpose() << std::endl;
}
#include "utilities/RgbdUtils.h"
// Get camera parameters
auto rgb_intrinsics = reader.GetRgbIntrinsicsLeft().as_cvmat();
auto depth_intrinsics = reader.GetDepthIntrinsics().as_cvmat();
// Calculate transformation matrix
auto T_I_Srgb = reader.GetRgbExtrinsicsLeft().as_se3();
auto T_I_Stof = reader.GetDepthExtrinsics().as_se3();
auto T_Srgb_Stof = T_I_Srgb.inverse() * T_I_Stof;
// Project depth to RGB
cv::Mat projected_depth;
Utilities::ProjectDepthToRgb(depth_frame.depth, rgb_frame.left_rgb,
rgb_intrinsics, depth_intrinsics,
T_Srgb_Stof, projected_depth);
#include "utilities/PointcloudUtils.h"
// Generate point cloud from RGBD data
Utilities::Pointcloud pcd;
Utilities::RgbdToPointcloud(rgb_frame.left_rgb, projected_depth,
rgb_intrinsics, pcd, 10.0f);
// Save point cloud
Utilities::SavePointcloudToFile("output.obj", pcd);
Main SpatialMP4 file reader class.
Reader(const std::string& filename)
bool HasRGB() const; // Whether contains RGB data
bool HasDepth() const; // Whether contains depth data
bool HasPose() const; // Whether contains pose data
bool HasAudio() const; // Whether contains audio data
bool HasDisparity() const; // Whether contains disparity data
camera_intrinsics GetRgbIntrinsicsLeft() const; // Left RGB camera intrinsics
camera_intrinsics GetRgbIntrinsicsRight() const; // Right RGB camera intrinsics
camera_extrinsics GetRgbExtrinsicsLeft() const; // Left RGB camera extrinsics
camera_extrinsics GetRgbExtrinsicsRight() const; // Right RGB camera extrinsics
camera_intrinsics GetDepthIntrinsics() const; // Depth camera intrinsics
camera_extrinsics GetDepthExtrinsics() const; // Depth camera extrinsics
void SetReadMode(ReadMode mode); // Set reading mode
bool HasNext() const; // Whether has next frame
void Reset(); // Reset to beginning
int GetIndex() const; // Get current index
void Load(rgb_frame& rgb_frame); // Load RGB frame
void Load(depth_frame& depth_frame); // Load depth frame
void Load(rgb_frame& rgb_frame, depth_frame& depth_frame); // Load RGB and depth frames simultaneously
struct rgb_frame {
double timestamp; // Timestamp
cv::Mat left_rgb; // Left eye RGB image
cv::Mat right_rgb; // Right eye RGB image
pose_frame pose; // Corresponding pose data
};
struct depth_frame {
double timestamp; // Timestamp
cv::Mat depth; // Depth image
pose_frame pose; // Corresponding pose data
};
struct pose_frame {
double timestamp; // Timestamp
double x, y, z; // Position
double qw, qx, qy, qz; // Quaternion rotation
Sophus::SE3d as_se3() const; // Convert to SE(3) representation
};
import spatialmp4
# Create a reader
reader = spatialmp4.Reader("your_video.mp4")
# Check available streams
print("Has RGB:", reader.has_rgb())
print("Has Depth:", reader.has_depth())
print("Has Pose:", reader.has_pose())
# Set reading mode
reader.set_read_mode(spatialmp4.ReadMode.DEPTH_FIRST)
# Read frames
while reader.has_next():
rgb_frame, depth_frame = reader.load_both()
left_rgb = rgb_frame.left_rgb # numpy array (H, W, 3)
depth = depth_frame.depth # numpy array (H, W)
pose = rgb_frame.pose
print("RGB timestamp:", rgb_frame.timestamp, "Pose:", pose.x, pose.y, pose.z)
Main class for reading SpatialMP4 files.
Reader(filename: str)
β Create a new reader for the given file.has_rgb() -> bool
β Whether the file contains RGB data.has_depth() -> bool
β Whether the file contains depth data.has_pose() -> bool
β Whether the file contains pose data.has_audio() -> bool
β Whether the file contains audio data.has_disparity() -> bool
β Whether the file contains disparity data.get_duration() -> float
β Get video duration in seconds.get_rgb_fps() -> float
β Get RGB stream FPS.get_depth_fps() -> float
β Get depth stream FPS.get_rgb_width() -> int
β Get RGB frame width.get_rgb_height() -> int
β Get RGB frame height.get_depth_width() -> int
β Get depth frame width.get_depth_height() -> int
β Get depth frame height.get_rgb_intrinsics_left() -> CameraIntrinsics
β Get left RGB camera intrinsics.get_rgb_intrinsics_right() -> CameraIntrinsics
β Get right RGB camera intrinsics.get_rgb_extrinsics_left() -> CameraExtrinsics
β Get left RGB camera extrinsics.get_rgb_extrinsics_right() -> CameraExtrinsics
β Get right RGB camera extrinsics.get_depth_intrinsics() -> CameraIntrinsics
β Get depth camera intrinsics.get_depth_extrinsics() -> CameraExtrinsics
β Get depth camera extrinsics.get_pose_frames() -> List[PoseFrame]
β Get all pose frames.set_read_mode(mode: ReadMode)
β Set reading mode (see enums below).has_next() -> bool
β Whether there is a next frame.reset()
β Reset to the beginning of the file.get_index() -> int
β Get current frame index.get_frame_count() -> int
β Get total number of frames.load_rgb() -> RGBFrame
β Load the next RGB frame.load_depth(raw_head_pose: bool = False) -> DepthFrame
β Load the next depth frame.load_both() -> (RGBFrame, DepthFrame)
β Load the next RGB and depth frames simultaneously.load_rgbd(densify: bool = False) -> Rgbd
β Load RGBD data (for advanced use).
timestamp: float
β Frame timestamp.left_rgb: np.ndarray
β Left RGB image (H, W, 3, uint8).right_rgb: np.ndarray
β Right RGB image (H, W, 3, uint8).pose: PoseFrame
β Associated pose data.
timestamp: float
β Frame timestamp.depth: np.ndarray
β Depth image (H, W, float32, meters).pose: PoseFrame
β Associated pose data.
timestamp: float
β Pose timestamp.x, y, z: float
β Position.qw, qx, qy, qz: float
β Quaternion orientation.as_se3()
β Convert to SE(3) representation (requires Sophus/Eigen, advanced use).
fx, fy, cx, cy: float
β Camera intrinsic parameters.as_cvmat()
β Return as OpenCV matrix.
extrinsics: np.ndarray
β 4x4 extrinsic matrix.as_cvmat()
β Return as OpenCV matrix.as_se3()
β Return as SE(3) (advanced use).
RGB_ONLY
β Only read RGB frames.DEPTH_ONLY
β Only read depth frames.DEPTH_FIRST
β Read both RGB and depth frames, depth as reference.
UNKNOWN
β Unknown stream typeAUDIO
β Audio streamAUDIO_2
β Secondary audio streamRGB
β RGB video streamDISPARITY
β Disparity streamPOSE
β Pose data streamDEPTH
β Depth stream
- See examples/python/visualize_rerun.py and examples/python/generate_pcd.py for advanced usage, including point cloud generation and visualization with Open3D or Rerun.
- All image and depth data are returned as NumPy arrays for easy integration with OpenCV, Open3D, PyTorch, etc.
- Camera parameters and pose data can be used for 3D reconstruction and SLAM applications.
The project uses spdlog for logging:
#include <spdlog/spdlog.h>
// Set log level
spdlog::set_level(spdlog::level::debug);
// Debug information will be automatically output in the code
This project is licensed under the MIT. See the LICENSE file for details.
Issues and Pull Requests are welcome to improve this project!
For questions or suggestions, please contact us through GitHub Issues.