Commit 15db2f9

tei added
1 parent 5d6f8f2 commit 15db2f9

File tree

5 files changed: +69 −0 lines changed
Lines changed: 63 additions & 0 deletions

@@ -0,0 +1,63 @@
---
draft: false
title: Tei fully managed open source service | OctaByte.io
meta:
  cover: /images/applications/ai-gpu/tei/screenshot-1.png
  description: TEI is a high-performance inference engine for text embeddings, delivering ultra-fast results with dynamic batching, optimized transformers, and production-ready observability.
  keywords: text embeddings inference, TEI, fast NLP inference, Flash Attention, Candle, cuBLASLt, dynamic batching, Safetensors, OpenTelemetry, Prometheus, production-ready inference, open-source embeddings engine, BAAI bge-base-en-v1.5, NLP optimization, AI infrastructure
breadcrumb:
  - name: Home
    url: /
  - name: Software Catalog
    url: /fully-managed-open-source-services
  - name: Applications
    url: /fully-managed-open-source-services/applications
  - name: AI/GPU
    url: /fully-managed-open-source-services/applications/ai-gpu
  - name: Tei
    url: /fully-managed-open-source-services/applications/ai-gpu/tei
content:
  id: tei
  name: Tei
  title: Lightning-Fast Text Embeddings Inference for Production-Ready AI Applications
  logo: /images/applications/ai-gpu/tei/logo.png
  website: https://huggingface.co/docs/text-embeddings-inference/quick_tour
  iframe_website: https://huggingface.co/docs/text-embeddings-inference/quick_tour
  screenshots:
    - /images/applications/ai-gpu/tei/screenshot-1.png
    - /images/applications/ai-gpu/tei/screenshot-2.png
---

## Overview

TEI (Text Embeddings Inference) is an open-source, blazing-fast inference solution designed specifically for generating text embeddings with unmatched performance and efficiency. Built for real-time and production environments, TEI benchmarks impressively on models like BAAI/bge-base-en-v1.5, achieving exceptional speeds on GPUs such as the Nvidia A10 with sequences up to 512 tokens.

Under the hood, TEI employs advanced technologies like Flash Attention, Candle, and cuBLASLt to power its inference engine. It dynamically adapts to workloads through token-based batching, reducing latency and maximizing GPU utilization. With support for Safetensors weight loading, TEI significantly improves initialization times, ensuring rapid deployment.

TEI is also engineered for scalable, observable, and production-grade usage. It includes built-in support for distributed tracing via OpenTelemetry and exposes Prometheus metrics for effortless monitoring. Whether you're building AI-powered applications, search engines, or NLP pipelines, TEI offers the speed, efficiency, and reliability needed to run large-scale inference workloads.
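Once deployed, TEI serves embeddings over a plain HTTP API. The sketch below, using only Python's standard library, shows one way to query the `/embed` route; the local address, port, and response shape are assumptions based on a default local deployment as in the quick tour, not a prescribed client.

```python
import json
import urllib.request

# Assumed address of a locally running TEI server (adjust for your deployment).
TEI_URL = "http://127.0.0.1:8080/embed"


def build_embed_request(texts, url=TEI_URL):
    """Build the JSON POST request for TEI's /embed route."""
    payload = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, url=TEI_URL):
    """Send the request; TEI returns one embedding vector per input text."""
    with urllib.request.urlopen(build_embed_request(texts, url)) as resp:
        return json.loads(resp.read())


# Example (requires a running server):
# vectors = embed(["What is Deep Learning?"])
# print(len(vectors[0]))  # embedding dimension of the served model
```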
## Features

- ### Dynamic Batching

  TEI utilizes token-based dynamic batching to intelligently manage GPU resources and minimize idle time, leading to improved inference throughput.

- ### Optimized Inference Engine

  Built using cutting-edge components like Flash Attention, Candle, and cuBLASLt, TEI ensures highly optimized transformer model execution with minimal latency.

- ### Fast Model Loading with Safetensors

  TEI supports Safetensors weight loading for dramatically faster startup times, enabling rapid scaling and reloading of models in production.

- ### Production-Grade Observability

  Integrated with OpenTelemetry for distributed tracing and Prometheus for metrics export, TEI is fully equipped for monitoring and diagnostics in real-world deployments.

- ### High Performance on Modern GPUs

  TEI is benchmarked on advanced GPUs like the Nvidia A10, delivering low-latency inference even with long sequences of up to 512 tokens.

- ### Open Source and Extensible

  As an open-source solution, TEI is customizable and extendable to fit specific NLP workflows, making it an ideal choice for developers and ML engineers.
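The token-based dynamic batching described above can be pictured as a greedy packing loop: requests accumulate until the next one would exceed a per-batch token budget (mirroring a `--max-batch-tokens`-style limit), at which point the batch is flushed to the GPU. This is a minimal illustrative sketch of the idea, not TEI's actual scheduler:

```python
def batch_by_tokens(requests, max_batch_tokens=16384):
    """Greedily pack (request_id, token_count) pairs into batches whose
    total token count stays within a per-forward-pass budget.

    An oversized single request still gets its own batch rather than
    being dropped. Budget value is illustrative only.
    """
    batches, current, used = [], [], 0
    for req_id, n_tokens in requests:
        if current and used + n_tokens > max_batch_tokens:
            batches.append(current)  # flush: next request would overflow
            current, used = [], 0
        current.append(req_id)
        used += n_tokens
    if current:
        batches.append(current)
    return batches


# Three requests against a 512-token budget: the first two fit together,
# the third starts a new batch.
print(batch_by_tokens([("a", 300), ("b", 200), ("c", 400)], max_batch_tokens=512))
```

Packing by tokens rather than by request count is what lets short and long sequences share a GPU efficiently: many short requests ride in one pass while a single long one is not starved.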

data/services.yaml

Lines changed: 6 additions & 0 deletions

@@ -1252,6 +1252,12 @@ catalog_list:
     url: /applications/automation/docassemble
     category: automation
     description: Docassemble is a free, open-source system that automates guided interviews and document assembly, providing customized workflows and generating documents in multiple formats.
+  - id: tei
+    title: Tei
+    image: /images/applications/ai-gpu/tei/logo.png
+    url: /applications/ai-gpu/tei
+    category: ai-gpu
+    description: TEI is a high-performance inference engine for text embeddings, delivering ultra-fast results with dynamic batching, optimized transformers, and production-ready observability.
   - id: ragflow
     title: Ragflow
     image: /images/applications/ai-gpu/ragflow/logo.png