multi-modal

Here are 318 public repositories matching this topic...

OpenBMB / MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Updated Oct 22, 2024
Python

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

Updated Nov 7, 2024
Python

modelscope / modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

python nlp science machine-learning deep-learning cv speech multi-modal

Updated Nov 8, 2024
Python

THUDM / CogVLM

A state-of-the-art open visual language model | Multimodal pretrained model

pretrained-models language-model multi-modal cross-modality visual-language-models

Updated May 29, 2024
Python

OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

image-classification gpt multi-modal semantic-segmentation video-classification image-text-retrieval llm vision-language-model gpt-4v vit-6b vit-22b gpt-4o

Updated Oct 29, 2024
Python

lucidrains / DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch

deep-learning transformers artificial-intelligence multi-modal attention-mechanism text-to-image

Updated Feb 17, 2024
Python

modelscope / agentscope

Start building LLM-empowered multi-agent applications in an easier way.

agent drag-and-drop chatbot multi-agent multi-modal distributed-agents gpt-4 large-language-models llm llm-agent llama3 gpt-4o

Updated Nov 8, 2024
Python

marqo-ai / marqo

Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai

Updated Nov 9, 2024
Python

valhalla / valhalla

Open Source Routing Engine for OpenStreetMap

directions openstreetmap routing astar traveling-salesman dijkstra routing-engine isochrones multi-modal tiled

Updated Nov 8, 2024
C++

OFA-Sys / Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

nlp computer-vision deep-learning transformers pytorch chinese pretrained-models multi-modal clip coreml-models contrastive-loss vision-language multi-modal-learning image-text-retrieval vision-and-language-pre-training

Updated Aug 6, 2024
Python

THUDM / VisualGLM-6B

Chinese and English multimodal conversational language model

gpt multi-modal chatglm-6b

Updated Aug 23, 2024
Python

zjunlp / DeepKE

[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction

Updated Nov 3, 2024
Python

docarray / docarray

Represent, send, store and search multimodal data

elasticsearch machine-learning deep-learning protobuf pytorch data-structures nearest-neighbor-search cross-modal multi-modal semantic-search multimodal nested-data weaviate dataclass pydantic fastapi neural-search qdrant docarray

Updated Oct 1, 2024
Python

PKU-YuanGroup / Video-LLaVA

[EMNLP 2024🔥] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

multi-modal instruction-tuning large-vision-language-model

Updated Sep 25, 2024
Python

modelscope / data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷