Project ideas for 2026
The ideas below are from 2025 and haven't been verified yet. Please wait for the official idea list for 2026.
Short description: A neural language model can run locally without an internet connection. You will write your own cross-platform desktop chatbot application using OpenVINO and Electron (or an analog). The chatbot may be general-purpose or tailored to your needs (subject to the NLP model).
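To make the local-inference piece concrete, here is a minimal Python sketch of offline text generation with OpenVINO GenAI (the model directory is a placeholder; in the actual application an equivalent JavaScript binding or a local service would be called from Electron):

```python
# Minimal sketch: local text generation with OpenVINO GenAI, no network access
# needed at inference time. The model directory is hypothetical and would hold
# an LLM already converted to OpenVINO IR (e.g. with optimum-cli).
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("./models/chat-llm-int4", "CPU")  # or "GPU"/"NPU"

config = ov_genai.GenerationConfig()
config.max_new_tokens = 128

print(pipe.generate("Hello! What can you do offline?", config))
```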
Expected outcomes:
- Desktop chatbot application that works without an internet connection
- Project uses an NLP model
- Project uses OpenVINO in an Electron environment
- Blog posts on Medium or the OpenVINO blog
Skills required/preferred: JavaScript, Electron, OpenVINO GenAI, Natural Language Processing
Mentors: Alicja Miłoszewska, Kirill Suvorov
Size of project: 175 hours
Difficulty: Medium
Short description: AI PCs incorporate multiple devices/inference engines for different machine-learning applications. Based on performance, latency, or power-consumption requirements, an application may choose the NPU, GPU, or CPU for inference tasks. Usually, an application uses a single engine/device for the entire lifetime of the process/inference, and the machine-learning model is compiled only for one device. However, it is important for the application to be able to switch between inference devices at runtime based on user preference, application behavior, and the load/stress on the device currently in use. Through this project, we want to build a face-detection application that runs continuously on the AI PC while switching between inference devices at runtime, based either on user recommendations or on an evaluation of the stress on the current engine. Inference should not stop or pause while switching devices, and switching should not lead to BSODs, system hangs, or device crashes that cause other applications to fail.
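As a minimal illustration of the device-selection side (a sketch only; the model path is a placeholder), OpenVINO's AUTO device can be given a priority list of devices and a latency hint:

```python
# Sketch: compile a face-detection model for OpenVINO AUTO so the runtime can
# choose among the listed devices. The IR file name is a placeholder.
import openvino as ov
import openvino.properties.hint as hints

core = ov.Core()
print("Available devices:", core.available_devices)  # e.g. ['CPU', 'GPU', 'NPU']

model = core.read_model("face-detection.xml")  # placeholder IR
compiled = core.compile_model(
    model,
    "AUTO:NPU,GPU,CPU",  # candidate devices in priority order
    {hints.performance_mode: hints.PerformanceMode.LATENCY},
)
request = compiled.create_infer_request()
# request.infer({0: frame})  # frame: a preprocessed camera image
```

Switching devices at runtime based on user input or measured load (without pausing inference) goes beyond this snippet, for example by compiling for the new device in the background and handing over requests once it is ready.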
Expected outcomes:
- Implement a low-latency face-detection application that runs on multiple devices/engines within AI PCs
- Utilize the OpenVINO AUTO feature to demonstrate runtime switching between devices
- Create a GUI that prompts the user to change the device at runtime based on user preference
- Analyze the device load and recommend that the user switch to the most appropriate device to continue inference
Skills required/preferred: Python or C++, Basic ML knowledge
Mentors: Shivam Basia, Aishwarye Omer
Size of project: 175 hours
Difficulty: Easy
Short description: Visual Prompting is an advanced computer vision technique that enables object identification in images using reference examples, eliminating the need for labeled training data (zero-shot learning). This approach leverages powerful foundational models such as DINOv2 for feature extraction and Segment Anything Model (SAM) for precise object segmentation. The process involves matching object features across images to locate instances of the target object in unseen data. SAM is then used to generate segmentation masks. However, these masks often contain false positives or incomplete segmentations. To improve accuracy, filtering and merging techniques must be applied. Existing solutions are often dataset-specific, limiting their generalizability. The goal of this project is to develop a more effective and generalizable approach for refining segmentation masks across diverse datasets. The student will experiment with existing methods, evaluate different refinement strategies, and explore novel techniques to improve segmentation robustness.
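As a possible starting point for the refinement step (a sketch only; the thresholds and mask format are assumptions), one common heuristic is to drop small or low-confidence SAM masks and merge heavily overlapping ones:

```python
# Sketch: naive filtering + merging of candidate segmentation masks.
# Masks are boolean HxW numpy arrays with an associated confidence score.
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def refine_masks(masks, scores, min_area=200, min_score=0.5, merge_iou=0.7):
    # 1) filter out tiny or low-confidence masks (likely false positives)
    kept = [(m, s) for m, s in zip(masks, scores)
            if m.sum() >= min_area and s >= min_score]
    # 2) greedily merge masks that overlap strongly (likely the same object)
    merged = []
    for mask, score in sorted(kept, key=lambda ms: -ms[1]):
        for i, (other, other_score) in enumerate(merged):
            if mask_iou(mask, other) >= merge_iou:
                merged[i] = (np.logical_or(mask, other), max(score, other_score))
                break
        else:
            merged.append((mask, score))
    return merged
```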
Expected outcomes:
- A robust pipeline for refining object masks that generalizes across datasets.
- A benchmarking framework to evaluate different segmentation refinement strategies.
- Potential integration with existing open-source vision repositories.
Skills required/preferred:
- Proficiency in Python and experience with deep learning frameworks like PyTorch.
- Strong understanding of deep learning, computer vision, and self-supervised learning.
- Familiarity with foundational vision models like DINOv2, SAM, and CLIP (optional but beneficial).
Mentors: Daan Krol, Klaas Dijkstra, Samet Akcay
Size of project: 350 hours (175 hours possible)
Difficulty: Medium
9. Interactive Multimodal Data Explorer: Leveraging Foundation Models for Dataset Exploration and Cleaning with Datumaro
Short description: Large language-vision models and other multimodal models (CLIP, LLaVa, PaLM, GPT-4V) generate embeddings for their included modalities which are aligned during training. These embeddings have proven valuable for downstream tasks like detection and classification. This project aims to enhance Datumaro - an efficient dataset management library - by building interactive visualization tools for exploring these joint embeddings. Users will be able to navigate (pan, zoom, filter) the embedding space to gain insights into dataset structure and relationships between modalities. The project will leverage either modern web frameworks (React/Vue.js) or ML-specific frameworks (Streamlit/Gradio) to create an intuitive interface for data exploration, with additional functionality for basic annotation operations to identify and tag noisy or corrupt data. The Datumaro toolkit will handle core dataset operations while OTX will be used for feature computation.
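A small sketch of the visualization end (the embeddings below are random stand-ins; in the project they would come from OTX/foundation-model features over a Datumaro dataset):

```python
# Sketch: interactive 3D view of (placeholder) multimodal embeddings.
# In the real project the embeddings would be produced by OTX / a foundation
# model over a Datumaro dataset; here they are random stand-ins.
import numpy as np
import plotly.express as px
from sklearn.decomposition import PCA

embeddings = np.random.randn(500, 512)                    # stand-in for CLIP-like features
labels = np.random.choice(["cat", "dog", "noise"], 500)   # stand-in for annotations

coords = PCA(n_components=3).fit_transform(embeddings)
fig = px.scatter_3d(x=coords[:, 0], y=coords[:, 1], z=coords[:, 2],
                    color=labels, hover_name=labels, opacity=0.7)
fig.show()  # pan/zoom/filter in the browser
```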
Expected outcomes:
- Interactive web application integrated with Datumaro for dataset visualization and management
- Real-time exploration of joint embedding spaces through 3D visualization
- Integration with foundation models for embedding generation
- Basic annotation interface for data cleaning and tagging
- Documentation and example workflows within the Datumaro ecosystem
Skills required/preferred:
- Python programming with experience in ML/DL concepts
- Web development skills (React/Vue.js OR Streamlit/Gradio OR HTML/JavaScript/D3)
- Experience with data visualization libraries (D3.js, Plotly, etc.)
- Understanding of embedding spaces and dimensional reduction
- Familiarity with dataset management concepts
- Interest in learning Datumaro's architecture and capabilities
Mentors: Laurens Hogeweg, Samet Akcay
Size of project: 350 hours
Difficulty: Medium
11. Leveraging Large Foundation Models (LFMs) for Automated Annotation and Edge-Deployable Model Training with a Human in the Loop
Short description: The annotation process for large-scale datasets, particularly for tasks such as classification, object detection, and segmentation, is time-consuming and labor-intensive. Large Foundation Models (LFMs) offer powerful capabilities in generating annotations automatically, significantly reducing human effort. However, these models are often too large and computationally expensive for real-time edge deployment. This project aims to develop a framework that leverages LFMs for annotation while progressively distilling their knowledge into smaller, edge-deployable models. The system will also incorporate uncertainty estimation and active learning to ensure high-quality labels with minimal human intervention.
Example flow: The first step involves leveraging LFMs to generate initial annotations for large datasets for a specific task (classification, object detection, or segmentation). To enhance reliability, predictions from LFMs can be combined with additional weak supervision sources or a human in the loop. Active learning techniques will be incorporated to prioritize uncertain or highly informative samples for human verification, ensuring efficient use of annotation effort. The system will track human corrections and use them to improve future annotation reliability by iteratively refining the uncertainty estimation models. Additionally, as the annotation loop progresses, the smaller model will be fine-tuned on corrected annotations, gradually improving its accuracy with minimal human intervention. Over time, the smaller model can take over the annotation process, reducing dependency on expensive LFMs while maintaining high-quality labels.
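For the active-learning step, a minimal sketch of entropy-based uncertainty sampling (the predictions below are synthetic placeholders) shows how uncertain samples could be flagged for human verification:

```python
# Sketch: pick the most uncertain samples (highest predictive entropy) from a
# pool of LFM/student-model predictions and send them for human review.
import numpy as np

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """probs: (N, C) class probabilities per sample."""
    eps = 1e-12
    return -np.sum(probs * np.log(probs + eps), axis=1)

def select_for_review(probs: np.ndarray, budget: int = 32) -> np.ndarray:
    """Return indices of the `budget` samples a human should verify first."""
    scores = predictive_entropy(probs)
    return np.argsort(-scores)[:budget]

# Example: synthetic predictions for 1000 unlabeled images over 5 classes
probs = np.random.dirichlet(np.ones(5), size=1000)
to_review = select_for_review(probs, budget=16)
```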
Expected outcomes:
- A scalable framework that automates annotation using LFMs while minimizing human effort.
- A human-in-the-loop (HITL) framework where uncertain annotations are flagged for human verification. A simple UI can be useful for this; however, for experimentation, annotations can also be corrected using ground truth (no UI required).
- A robust uncertainty estimation and active learning pipeline to improve annotation quality.
- Training pipelines for edge-deployable models in an active learning environment.
- A benchmarking study comparing manual annotation, LFM-only annotation, and the proposed AL-HITL approach in terms of accuracy, annotation speed, and human effort reduction.
- Open-source repo, including example scripts (and possibly datasets) annotated using the system.
Skills required/preferred:
- Experience with deep learning frameworks such as PyTorch and Transformers.
- Familiarity with Large Foundation Models (LFMs) and Active Learning techniques.
- Understanding of uncertainty estimation in AI models.
- (Optional) Experience in developing simple web-based UI applications for interactive dataset annotation (e.g., using Gradio or Streamlit), leveraging existing work where applicable.
Mentors: Rajesh Gangireddy, Samet Akcay
Size of project: 350 hours
Difficulty: Medium
13. Implement an XLA plugin to run OpenVINO applications on any XLA-supported device (NVIDIA GPUs, FPGAs, TPUs)
Short description: A new OpenVINO plugin needs to be implemented for which openvino.compile will convert OpenVINO IR into an XLA representation (using the HLO or MLIR dialect), and an inference request will run the compiled blob on any XLA backend (GPUs, FPGAs, TPUs). This feature will allow models to be inferred on CUDA GPUs, TPUs, and FPGA devices through the OpenVINO API.
Expected outcomes: A new OpenVINO plugin named "XLA" that is able to infer basic CNN and transformer models on NVIDIA GPUs, Google TPUs, etc.
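From a user's point of view, the intended flow once such a plugin exists might look like the sketch below; note that the "XLA" device name does not exist in OpenVINO today and is exactly what this project would add:

```python
# Sketch of the intended user-facing flow once an "XLA" plugin exists.
# The "XLA" device name is hypothetical today; it is the project goal.
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")         # any OpenVINO IR (placeholder path)
compiled = core.compile_model(model, "XLA")  # lower IR to HLO/MLIR and compile via XLA
# result = compiled([input_tensor])          # run on a CUDA GPU / TPU / FPGA through XLA
```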
Skills required/preferred: Good familiarity with AI frameworks and tensor operations; understanding of and experience with the HLO/MLIR dialects and the XLA C++ API.
Mentors: Roman Kazantsev, Maxim Vafin, Anastasia Popova, Andrei Kochin
Size of project: 350 hours
Difficulty: Hard
Short description: Large Language Models (LLMs) require significant computational resources for efficient inference: memory, compute, and power. The Neural Network Compression Framework (NNCF) enables model optimization via quantization and weight-compression techniques, reducing memory and compute requirements. However, running inference with these compressed models efficiently across CPU, GPU, and other hardware demands optimized execution kernels. Triton can solve this because it allows writing a kernel once and achieving portable, efficient execution on multiple hardware platforms. This project aims to accelerate the inference performance of NNCF-compressed LLMs by leveraging "low-bit matmul" Triton kernels and providing the capability to customize kernels for researching new compression algorithms. In this project, available open-source implementations of "low-bit matmul" Triton kernels, such as GemLite, Tinygemm, Marlin, and Triton AutoGPTQ, will be considered. Taking these open-source implementations into account, "low-bit matmul" Triton kernels will be implemented to support the NNCF weight compression types: INT8, INT4, NF4, FP4, and dynamic INT8 group quantization. It will also be necessary to implement torch.compile compatibility; one possible solution is to use custom_op for calling the kernels.
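For context on the weight formats the kernels must serve, here is a sketch of applying NNCF weight compression (the model name and parameters are placeholders); the resulting low-bit linear layers are what the Triton "low-bit matmul" kernels would then execute:

```python
# Sketch: compress LLM weights with NNCF; the resulting low-bit weights are
# what dedicated Triton matmul kernels would consume at inference time.
# Model identifier and compression parameters are placeholders.
import nncf
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("some-small-llm")  # placeholder id

compressed = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,  # other modes include INT8 and NF4
    group_size=64,                           # group-wise quantization granularity
    ratio=0.8,                               # fraction of layers compressed to 4-bit
)
```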
Expected outcomes:
- Efficient inference of NNCF-compressed LLMs using Triton.
- Performance benchmarks demonstrating acceleration improvements.
- Pull request with the implementation of "low-bit matmul" Triton kernels in NNCF.
Skills required/preferred:
- Python
- Understanding LLMs
- Understanding model optimization techniques (NNCF)
- Experience in writing Triton or CUDA (Optional) kernels
- Experience with PyTorch, torch.compile
- Performance Profiling
Mentors: Alexander Suslov, Alexander Dokuchaev
Size of project: 350 hours
Difficulty: Medium to hard
Short description: Tracking objects in a video stream is an important use case. It combines an object-detection model with a tracking algorithm that analyzes the whole sequence of images. The current state-of-the-art algorithm is ByteTrack.
The goal of the project is to implement the ByteTrack algorithm as a MediaPipe graph that can delegate inference execution to the OpenVINO inference calculator. This graph could be deployed in the OpenVINO Model Server for serving. A sample application adopting the KServe API would send a stream of images and receive information about the tracked objects in the stream.
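For reference, a simplified Python sketch of ByteTrack's core two-stage association (the real algorithm also uses a Kalman filter for motion prediction, and the calculator itself would be written in C++):

```python
# Simplified sketch of ByteTrack's core idea: associate high-confidence
# detections to tracks first, then try low-confidence ones against the
# remaining tracks. Real ByteTrack also uses a Kalman filter for motion.
def iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def associate(tracks, detections, high_thr=0.6, iou_thr=0.3):
    """tracks: list of boxes; detections: list of (box, score) pairs."""
    high = [d for d in detections if d[1] >= high_thr]
    low = [d for d in detections if d[1] < high_thr]
    matches, unmatched_tracks = [], list(range(len(tracks)))
    for dets in (high, low):                     # two association rounds
        for box, score in dets:
            best, best_iou = None, iou_thr
            for ti in unmatched_tracks:
                overlap = iou(tracks[ti], box)
                if overlap > best_iou:
                    best, best_iou = ti, overlap
            if best is not None:
                matches.append((best, box))
                unmatched_tracks.remove(best)
    return matches, unmatched_tracks
```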
Expected outcomes: MediaPipe graphs with a calculator implementation for the ByteTrack algorithm using YOLO models.
Skills required/preferred: C++ (for writing the calculator), Python (for writing the client), MediaPipe
Mentors: Dariusz Trawinski, Damian Kalinowski
Size of project: 175 hours
Difficulty: Medium
Short description: OpenVINO Test Drive allows users to quickly and easily test a variety of GenAI models for tasks like image generation, speech transcription with Whisper, and text generation. Although text-generation models do a good job of answering questions, they often lack crucial domain-specific knowledge. RAG (retrieval-augmented generation) aims to solve this issue by allowing the user to supply this knowledge to the model. OpenVINO Test Drive has a basic RAG implementation; however, the quality of the output can be greatly improved by preprocessing documents. This project will explore and develop a more sophisticated RAG pipeline to be integrated into OpenVINO Test Drive.
Expected outcomes: Implementation of a more sophisticated RAG pipeline in OpenVINO Test Drive.
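As an illustration of the preprocessing/retrieval idea (a language-agnostic sketch in Python; the real implementation would live in the Dart/Flutter application, and the embedding function here is a placeholder):

```python
# Sketch: chunk documents with overlap, embed the chunks, and retrieve the
# most relevant ones for a user question. `embed()` is a placeholder for any
# local embedding model (e.g. one served through OpenVINO).
import numpy as np

def chunk(text: str, size: int = 500, overlap: int = 100):
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(texts):                        # placeholder embedding function
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 384))

def retrieve(question: str, chunks, chunk_vecs, k: int = 3):
    q = embed([question])[0]
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    return [chunks[i] for i in np.argsort(-sims)[:k]]

docs = ["...domain-specific manual text..."]   # placeholder document
chunks = [c for d in docs for c in chunk(d)]
vectors = embed(chunks)
context = retrieve("How do I configure the device?", chunks, vectors)
```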
Skills required/preferred: Dart, Flutter, Langchain, OpenVINO
Mentors: Ronald Hecker, Arend Jan Kramer
Size of project: 175 hours
Difficulty: Medium