Rahul Somani rsomani95

61 followers · 34 following

Stars

OpenBMB / MiniCPM-o

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 19,081 1,374 Updated Mar 3, 2025

argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon

Swift 4,434 375 Updated Feb 22, 2025

TheBoredTeam / boring.notch

TheBoringNotch: Not so boring notch That Rocks 🎸🎶

Swift 2,872 165 Updated Mar 26, 2025

aradzie / keybr.com

The smartest way to learn touch typing and improve your typing speed.

TypeScript 2,628 247 Updated Mar 26, 2025

rmokady / CLIP_prefix_caption

Simple image captioning model

Jupyter Notebook 1,349 223 Updated Jun 9, 2024

DavidHuji / CapDec

CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)

Python 192 21 Updated Jan 28, 2024

facebookresearch / schedule_free

Schedule-Free Optimization in PyTorch

Python 2,123 72 Updated Mar 24, 2025

InternLM / InternLM-XComposer

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Python 2,795 170 Updated Jan 22, 2025

RElbers / info-nce-pytorch

PyTorch implementation of the InfoNCE loss for self-supervised learning.

Python 529 42 Updated Nov 17, 2023

beichenzbc / Long-CLIP

[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"

Python 773 40 Updated Aug 13, 2024

ArrowLuo / CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Python 927 126 Updated Apr 12, 2024

taoyang1122 / adapt-image-models

Forked from amazon-science/adapt-image-models

[ICLR'23] AIM: Adapting Image Models for Efficient Video Action Recognition

Python 289 23 Updated Sep 17, 2023

datadvance / DjangoChannelsGraphqlWs

Django Channels based WebSocket GraphQL server with Graphene-like subscriptions

Python 281 87 Updated Jul 19, 2024

PKU-YuanGroup / LanguageBind

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Python 795 54 Updated Mar 25, 2024

silver-ymz / vector-db-benchmark

Forked from myscale/vector-db-benchmark

Framework for benchmarking fully-managed vector databases

Python 1 2 Updated Feb 2, 2024

Netflix / vmaf

Perceptual video quality assessment based on multi-method fusion.

Python 4,843 773 Updated Mar 13, 2025

warner-benjamin / commented-transformers

Highly commented implementations of Transformers in PyTorch

Python 132 8 Updated Aug 2, 2023

prolego-team / neo-sophia

Applying the latest advancements in AI and machine learning to solve complex business problems.

Python 77 31 Updated Mar 13, 2024

THUDM / CogVLM

a state-of-the-art-level open visual language model | multimodal pre-trained model

Python 6,442 427 Updated May 29, 2024

OpenGVLab / InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Python 1,757 105 Updated Feb 27, 2025

sieve-community / describe

Incredibly descriptive audiovisual summaries for videos

Python 40 2 Updated Aug 2, 2024

vikhyat / moondream

tiny vision language model

Python 7,696 593 Updated Mar 27, 2025

MrThearMan / graphene-django-query-optimizer

Automatically optimize SQL queries in Graphene-Django schemas.

Python 18 8 Updated Mar 24, 2025

BAAI-DCAI / Bunny

A family of lightweight multimodal models.

Python 1,006 75 Updated Nov 18, 2024

OFA-Sys / ONE-PEACE

A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

Python 1,022 70 Updated Oct 6, 2024

microsoft / CLAP

Learning audio concepts from natural language supervision

Python 540 41 Updated Sep 18, 2024

huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Python 3,805 316 Updated Jan 8, 2025

Vaibhavs10 / insanely-fast-whisper

Jupyter Notebook 8,243 589 Updated Jun 16, 2024

google-research-datasets / videoCC-data

VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automatic pipeline starting from the Conceptual Captions Image-Cap…

78 3 Updated Dec 5, 2022

eth-sri / lmql

A language for constraint-guided and efficient LLM programming.

Python 3,872 206 Updated Jun 3, 2024