Simple and unified interface to zero-shot computer vision models curated for robotics use cases.
Check out our Hugging Face Space for an online demo or try pollen-vision in a Colab notebook!
Perform zero-shot object detection and segmentation on a live video stream from your webcam with the following code:
```python
import cv2

from pollen_vision.vision_models.object_detection import OwlVitWrapper
from pollen_vision.vision_models.object_segmentation import MobileSamWrapper
from pollen_vision.utils import Annotator, get_bboxes

owl = OwlVitWrapper()
sam = MobileSamWrapper()
annotator = Annotator()

cap = cv2.VideoCapture(0)  # 0 = default webcam

while True:
    ret, frame = cap.read()
    if not ret:  # no frame available (camera disconnected or busy)
        break
    predictions = owl.infer(
        frame, ["paper cups"]
    )  # zero-shot object detection | put your classes here
    bboxes = get_bboxes(predictions)
    masks = sam.infer(frame, bboxes=bboxes)  # zero-shot object segmentation
    annotated_frame = annotator.annotate(frame, predictions, masks=masks)
    cv2.imshow("frame", annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press "q" to quit
        break

cap.release()  # free the webcam
cv2.destroyAllWindows()
```
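When you query several classes, weak detections can be worth dropping before the boxes reach Mobile-SAM, since each box is used as a segmentation prompt. The sketch below assumes, without having checked it against pollen-vision, that `owl.infer` returns a list of dicts shaped like the Hugging Face `zero-shot-object-detection` pipeline output (`score`, `label`, and a `box` with `xmin`/`ymin`/`xmax`/`ymax`). `filter_predictions` and `to_xyxy` are hypothetical helpers, not part of the library, and the predictions are fake data.

```python
from typing import Any, Dict, List

# ASSUMPTION: one prediction = {"score": float, "label": str,
#   "box": {"xmin": int, "ymin": int, "xmax": int, "ymax": int}}
Prediction = Dict[str, Any]


def filter_predictions(predictions: List[Prediction], min_score: float) -> List[Prediction]:
    """Hypothetical helper: keep only detections whose score is at least min_score."""
    return [p for p in predictions if p["score"] >= min_score]


def to_xyxy(predictions: List[Prediction]) -> List[List[int]]:
    """Hypothetical helper: convert each box dict to [xmin, ymin, xmax, ymax]."""
    return [
        [p["box"]["xmin"], p["box"]["ymin"], p["box"]["xmax"], p["box"]["ymax"]]
        for p in predictions
    ]


# Fake predictions for illustration only.
fake_predictions = [
    {"score": 0.91, "label": "paper cups", "box": {"xmin": 10, "ymin": 20, "xmax": 80, "ymax": 140}},
    {"score": 0.12, "label": "paper cups", "box": {"xmin": 300, "ymin": 50, "xmax": 330, "ymax": 90}},
]
kept = filter_predictions(fake_predictions, min_score=0.3)
print(to_xyxy(kept))  # [[10, 20, 80, 140]]
```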
Supported models
We are continuing to add new models that could be useful for robotics perception applications.
We chose to focus on zero-shot models to make the library easier to use and deploy. Zero-shot models can recognize or segment objects from text queries, without needing to be fine-tuned on annotated datasets.
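Conceptually, open-vocabulary detectors such as Owl-Vit embed image regions and text queries in a shared space and score each region against each query by similarity, which is why a new class only needs a new text query. The toy sketch below illustrates that idea with made-up 3-D vectors and a plain cosine-similarity threshold; it is a simplification, not how Owl-Vit or pollen-vision compute their scores.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_labels(
    region_embeddings: List[List[float]],
    text_embeddings: Dict[str, List[float]],
    threshold: float,
) -> List[str]:
    """For each region, return the best-matching text query, or "" if none scores above threshold."""
    labels = []
    for region in region_embeddings:
        best_label, best_score = "", threshold
        for label, text in text_embeddings.items():
            score = cosine(region, text)
            if score > best_score:
                best_label, best_score = label, score
        labels.append(best_label)
    return labels


# Made-up embeddings; real models use hundreds of dimensions.
queries = {"paper cups": [1.0, 0.0, 0.0], "screwdriver": [0.0, 1.0, 0.0]}
regions = [[0.9, 0.1, 0.0], [0.1, 0.0, 1.0]]
print(zero_shot_labels(regions, queries, threshold=0.5))  # ['paper cups', '']
```

Adding a class here means adding one entry to `queries`; nothing is retrained.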
Right now, we support:
- Yolo-World for zero-shot object detection and localization
- Owl-Vit for zero-shot object detection and localization
- Recognize-Anything for zero-shot object detection (without localization)
- Mobile-SAM for (fast) zero-shot object segmentation
- Depth Anything for (non-metric) monocular depth estimation
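Non-metric depth preserves the ordering of depths but not their absolute scale. A common generic way to recover approximate metric values, assuming you have a few pixels with known depth (e.g. from a stereo camera), is to fit a scale `s` and shift `t` by least squares. The sketch below assumes the relative output behaves like affine-invariant inverse depth, as described for Depth Anything's relative models, so the fit is done against 1/Z; check what the pollen-vision wrapper actually returns before relying on this. The data is hypothetical and `fit_scale_shift` is not a pollen-vision function.

```python
from typing import List, Tuple


def fit_scale_shift(pred: List[float], target: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for s, t minimizing sum((s * pred_i + t - target_i) ** 2)."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("need at least two paired samples")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (q - mean_t) for p, q in zip(pred, target))
    var = sum((p - mean_p) ** 2 for p in pred)
    if var == 0:
        raise ValueError("predictions must not all be equal")
    s = cov / var
    t = mean_t - s * mean_p
    return s, t


# Hypothetical data: relative model outputs at three pixels, and metric
# depths in metres for the same pixels from another sensor.
relative = [0.25, 0.5, 1.0]
metric_depth = [2.0, 1.0, 0.5]

# ASSUMPTION: relative output ~ inverse depth, so fit against 1/Z, then invert.
s, t = fit_scale_shift(relative, [1.0 / z for z in metric_depth])
aligned = [1.0 / (s * r + t) for r in relative]
print([round(z, 3) for z in aligned])  # [2.0, 1.0, 0.5]
```

With real, noisy references the fit will not be exact, and more reference points spread across the depth range give a more stable estimate.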
Below is an example of combining Owl-Vit and Mobile-SAM to detect and segment objects in a point cloud, all live.
(Note: this example uses no temporal or spatial filtering of any kind; we display the raw outputs of the models, computed independently on each frame.)
[Video: live detection and segmentation of objects in a point cloud with Owl-Vit and Mobile-SAM]
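Segmenting objects in a point cloud amounts to keeping only the depth pixels that fall inside a mask and back-projecting them to 3D. The sketch below assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy` and a metric depth map aligned with the RGB image; `masked_point_cloud` and its list-of-lists layout are illustrative, not pollen-vision's API.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def masked_point_cloud(
    depth: List[List[float]],
    mask: List[List[bool]],
    fx: float, fy: float, cx: float, cy: float,
) -> List[Point3D]:
    """Back-project masked pixels (u, v) with depth z using the pinhole model:
    x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    Pixels with non-positive depth are treated as invalid and skipped."""
    points = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, inside) in enumerate(zip(depth_row, mask_row)):
            if inside and z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# Toy 2x3 depth map in metres; the mask covers three pixels, one without valid depth.
depth = [[1.0, 1.0, 2.0],
         [1.0, 0.0, 2.0]]
mask = [[False, True, True],
        [False, True, False]]
print(masked_point_cloud(depth, mask, fx=1.0, fy=1.0, cx=1.0, cy=0.0))
# [(0.0, 0.0, 1.0), (2.0, 0.0, 2.0)]
```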
We also provide wrappers for the Luxonis cameras, which we use internally. They give easy access to the features most relevant to our robotics applications: RGB-D, onboard H.264 encoding and onboard stereo rectification.
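For a stereo camera, the depth channel of an RGB-D stream comes from the disparity between the rectified left and right images: with focal length f (in pixels) and baseline B, depth is Z = f·B / d. Rectification is what makes this simple relation hold, since it aligns matching pixels on the same image row. The sketch below is the textbook formula, not the depthai or pollen-vision implementation, and the focal length and baseline are hypothetical values, not the specifications of a particular Luxonis camera.

```python
def disparity_to_depth(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth in metres for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # Zero disparity means no match (or a point at infinity).
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Hypothetical camera: 800 px focal length, 7.5 cm baseline.
print(round(disparity_to_depth(40.0, focal_px=800.0, baseline_m=0.075), 3))  # 1.5
```

Because Z is inversely proportional to d, a one-pixel disparity error costs much more depth accuracy far from the camera than close to it.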
Installation
Note: This package has been tested on Ubuntu 22.04 and macOS (M1 Pro) with Python 3.10.
This repository uses Git LFS to store large files, so you need to install it before cloning the repository:
```bash
# Ubuntu
sudo apt-get install git-lfs
# macOS (Homebrew)
brew install git-lfs
```
You can install the package directly from the repository without having to clone it first with:
```bash
pip install "pollen-vision[vision] @ git+https://github.com/pollen-robotics/pollen-vision.git@main"
```
Note: here we install the package with the `vision` extra, which includes the vision models. You can also install the `depthai_wrapper` extra to use the Luxonis depthai wrappers.
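To confirm that the package landed in the environment you are actually using (easy to get wrong when juggling virtual environments), you can query the installed distribution's metadata with the standard library. This sketch only checks that `pollen-vision` is installed, not which extras' dependencies are present; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pollen-vision")
print(version if version else "pollen-vision is not installed in this environment")
```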
Alternatively, clone this repository and install the package in either "production" or "dev" mode.
👉 We recommend using a virtual environment to avoid conflicts with other packages.
After cloning the repository, you can either install everything with:
```bash
pip install ".[all]"
```
or install only the modules you want:
```bash
pip install ".[depthai_wrapper]"
pip install ".[vision]"
```
To add the "dev" mode dependencies (CI/CD, testing, etc.):
```bash
pip install -e ".[dev]"
```
If this is the first time you are using Luxonis cameras on this computer, you need to set up the udev rules:
```bash
echo 'SUBSYSTEM=="usb", ATTRS{idVendor}=="03e7", MODE="0666"' | sudo tee /etc/udev/rules.d/80-movidius.rules
sudo udevadm control --reload-rules && sudo udevadm trigger
```
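The rule grants access to USB devices with vendor ID `03e7`, the Intel Movidius vendor ID that the rule file is named after. As a quick check that the camera is enumerated at all, the Linux-only sketch below scans sysfs for that vendor ID. It does not test permissions and does not use the depthai API; `luxonis_device_connected` is a hypothetical helper, and on systems without `/sys` it simply returns `False`.

```python
import glob
import os

VENDOR_ID = "03e7"  # the idVendor matched by the udev rule above


def luxonis_device_connected(sysfs_root: str = "/sys/bus/usb/devices") -> bool:
    """Return True if any USB device under sysfs reports VENDOR_ID (Linux only)."""
    for path in glob.glob(os.path.join(sysfs_root, "*", "idVendor")):
        try:
            with open(path) as f:
                if f.read().strip().lower() == VENDOR_ID:
                    return True
        except OSError:
            continue  # the device may disappear between glob() and open()
    return False


print("Camera found" if luxonis_device_connected() else "No device with vendor ID 03e7 found")
```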
Gradio demo
A Gradio demo is available on Pollen Robotics' Hugging Face Space. It lets you test the models on your own images without installing anything.
If you want to run the demo locally, you can install the dependencies with the following command:
```bash
pip install "pollen_vision[gradio]"
```
You can then run the demo locally on your machine with:
```bash
python pollen-vision/gradio/app.py
```