
Organizations

@Synopsis

Showing results

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 19,081 1,374 Updated Mar 3, 2025

On-device Speech Recognition for Apple Silicon

Swift 4,434 375 Updated Feb 22, 2025

TheBoringNotch: Not so boring notch That Rocks 🎸🎶

Swift 2,872 165 Updated Mar 26, 2025

The smartest way to learn touch typing and improve your typing speed.

TypeScript 2,628 247 Updated Mar 26, 2025

Simple image captioning model

Jupyter Notebook 1,349 223 Updated Jun 9, 2024

CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)

Python 192 21 Updated Jan 28, 2024

Schedule-Free Optimization in PyTorch

Python 2,123 72 Updated Mar 24, 2025

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Python 2,795 170 Updated Jan 22, 2025

PyTorch implementation of the InfoNCE loss for self-supervised learning.

Python 529 42 Updated Nov 17, 2023

[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"

Python 773 40 Updated Aug 13, 2024

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Python 927 126 Updated Apr 12, 2024

[ICLR'23] AIM: Adapting Image Models for Efficient Video Action Recognition

Python 289 23 Updated Sep 17, 2023

Django Channels based WebSocket GraphQL server with Graphene-like subscriptions

Python 281 87 Updated Jul 19, 2024

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Python 795 54 Updated Mar 25, 2024

Framework for benchmarking fully-managed vector databases

Python 1 2 Updated Feb 2, 2024

Perceptual video quality assessment based on multi-method fusion.

Python 4,843 773 Updated Mar 13, 2025

Highly commented implementations of Transformers in PyTorch

Python 132 8 Updated Aug 2, 2023

Applying the latest advancements in AI and machine learning to solve complex business problems.

Python 77 31 Updated Mar 13, 2024

a state-of-the-art-level open visual language model | multimodal pretrained model

Python 6,442 427 Updated May 29, 2024

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Python 1,757 105 Updated Feb 27, 2025

Incredibly descriptive audiovisual summaries for videos

Python 40 2 Updated Aug 2, 2024

tiny vision language model

Python 7,696 593 Updated Mar 27, 2025

Automatically optimize SQL queries in Graphene-Django schemas.

Python 18 8 Updated Mar 24, 2025

A family of lightweight multimodal models.

Python 1,006 75 Updated Nov 18, 2024

A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

Python 1,022 70 Updated Oct 6, 2024

Learning audio concepts from natural language supervision

Python 540 41 Updated Sep 18, 2024

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Python 3,805 316 Updated Jan 8, 2025

Jupyter Notebook 8,243 589 Updated Jun 16, 2024

VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automatic pipeline starting from the Conceptual Captions Image-Cap…

78 3 Updated Dec 5, 2022

A language for constraint-guided and efficient LLM programming.

Python 3,872 206 Updated Jun 3, 2024