---
draft: false
title: TEI fully managed open source service | OctaByte.io
meta:
  cover: /images/applications/ai-gpu/tei/screenshot-1.png
  description: TEI is a high-performance inference engine for text embeddings, delivering ultra-fast results with dynamic batching, optimized transformers, and production-ready observability.
  keywords: text embeddings inference, TEI, fast NLP inference, Flash Attention, Candle, cuBLASLt, dynamic batching, Safetensors, OpenTelemetry, Prometheus, production-ready inference, open-source embeddings engine, BAAI bge-base-en-v1.5, NLP optimization, AI infrastructure
  breadcrumb:
    - name: Home
      url: /
    - name: Software Catalog
      url: /fully-managed-open-source-services
    - name: Applications
      url: /fully-managed-open-source-services/applications
    - name: AI/GPU
      url: /fully-managed-open-source-services/applications/ai-gpu
    - name: TEI
      url: /fully-managed-open-source-services/applications/ai-gpu/tei
content:
  id: tei
  name: TEI
  title: Lightning-Fast Text Embeddings Inference for Production-Ready AI Applications
  logo: /images/applications/ai-gpu/tei/logo.png
  website: https://huggingface.co/docs/text-embeddings-inference/quick_tour
  iframe_website: https://huggingface.co/docs/text-embeddings-inference/quick_tour
  screenshots:
    - /images/applications/ai-gpu/tei/screenshot-1.png
    - /images/applications/ai-gpu/tei/screenshot-2.png
---

## Overview

TEI (Text Embeddings Inference) is an open-source, high-performance inference solution built specifically for serving text embedding models. Designed for real-time and production environments, TEI benchmarks impressively on models such as BAAI/bge-base-en-v1.5, sustaining high throughput on GPUs like the Nvidia A10 with sequences of up to 512 tokens.

Under the hood, TEI combines Flash Attention, the Candle ML framework, and cuBLASLt to power its inference engine. It adapts dynamically to incoming workloads through token-based batching, reducing latency and maximizing GPU utilization. Support for Safetensors weight loading significantly shortens model initialization, enabling rapid deployment and scaling.

TEI is also engineered for scalable, observable, production-grade operation: it ships with distributed tracing via OpenTelemetry and exposes Prometheus metrics for monitoring. Whether you are building AI-powered applications, semantic search, or NLP pipelines, TEI delivers the speed, efficiency, and reliability needed to run large-scale inference workloads.
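A minimal sketch of calling a running TEI server from Python, assuming a deployment reachable at `localhost:8080` (the host, port, and sample texts are illustrative assumptions, not fixed by TEI itself). TEI's `/embed` route accepts a JSON body with an `inputs` field and returns one embedding vector per input; a common next step, compared here, is ranking texts by cosine similarity:

```python
# Sketch: query a TEI server's /embed route, then compare results by
# cosine similarity. The base URL below is an assumption about your
# deployment; adjust it to wherever your TEI instance is listening.
import json
import math
import urllib.request


def embed(texts, base_url="http://localhost:8080"):
    """POST a batch of texts to TEI's /embed route; returns one vector per text."""
    req = urllib.request.Request(
        f"{base_url}/embed",
        data=json.dumps({"inputs": texts}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# With a server running locally, something like:
#   vecs = embed(["What is deep learning?", "Deep learning is a branch of ML."])
#   score = cosine_similarity(vecs[0], vecs[1])
```

Because TEI batches requests dynamically on the server side, sending several texts in one `inputs` list is generally more efficient than issuing one request per text.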

## Features

- ### Dynamic Batching

  TEI uses token-based dynamic batching to manage GPU resources intelligently and minimize idle time, improving inference throughput.

- ### Optimized Inference Engine

  Built on components such as Flash Attention, Candle, and cuBLASLt, TEI executes transformer models with minimal latency.

- ### Fast Model Loading with Safetensors

  TEI supports Safetensors weight loading for dramatically faster startup, enabling rapid scaling and reloading of models in production.

- ### Production-Grade Observability

  Integrated with OpenTelemetry for distributed tracing and Prometheus for metrics export, TEI is fully equipped for monitoring and diagnostics in real-world deployments.

- ### High Performance on Modern GPUs

  TEI is benchmarked on GPUs such as the Nvidia A10, delivering low-latency inference even with sequences of up to 512 tokens.

- ### Open Source and Extensible

  As an open-source solution, TEI can be customized and extended to fit specific NLP workflows, making it a strong choice for developers and ML engineers.
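The token-based batching idea above can be illustrated with a toy scheduler: pack pending requests greedily into batches whose combined token count stays under a budget, so the GPU processes full batches instead of idling between single requests. The greedy policy, the 512-token budget, and the `batch_by_tokens` helper are all illustrative assumptions for this sketch, not TEI's actual scheduler:

```python
# Illustrative sketch of token-based dynamic batching: greedily pack
# pending requests into batches whose total token count fits a budget.
# The policy and numbers here are assumptions for illustration only;
# TEI's real scheduler is more sophisticated.
def batch_by_tokens(requests, max_batch_tokens):
    """requests: list of (request_id, token_count); returns list of batches."""
    batches, current, current_tokens = [], [], 0
    for req_id, tokens in requests:
        # Flush the current batch when adding this request would exceed the budget.
        if current and current_tokens + tokens > max_batch_tokens:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(req_id)
        current_tokens += tokens
    if current:
        batches.append(current)
    return batches


# Example: with a 512-token budget, five requests pack into two batches.
print(batch_by_tokens([("a", 200), ("b", 200), ("c", 200), ("d", 100), ("e", 50)], 512))
# → [['a', 'b'], ['c', 'd', 'e']]
```

Budgeting by tokens rather than by request count is what lets a scheduler like this mix many short texts or a few long ones into each batch while keeping GPU memory use predictable.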