From ff33f8adc4ec6c7d0871f6005ab8af71342f95ec Mon Sep 17 00:00:00 2001 From: Mo King Date: Mon, 14 Apr 2025 15:45:19 -0400 Subject: [PATCH 01/11] Initial commit --- docs/tetra/overview.md | 204 +++++++++++++++++++++++++ docs/tetra/quickstart.md | 314 +++++++++++++++++++++++++++++++++++++++ sidebars.js | 11 ++ 3 files changed, 529 insertions(+) create mode 100644 docs/tetra/overview.md create mode 100644 docs/tetra/quickstart.md diff --git a/docs/tetra/overview.md b/docs/tetra/overview.md new file mode 100644 index 00000000..0f9cc174 --- /dev/null +++ b/docs/tetra/overview.md @@ -0,0 +1,204 @@ +--- +title: Overview +description: "" +sidebar_position: 1 +--- + +# Tetra overview + +Tetra is a Python SDK that streamlines the development and deployment of multi-model AI workflows on RunPod's [Serverless](/serverless/overview) infrastructure. It provides an abstraction layer that enables you to define, execute, and monitor sophisticated AI pipelines through a declarative interface, eliminating infrastructure overhead. + +## Why use Tetra? + +* **Simplified workflow development**: Define AI pipelines in pure Python with minimal configuration, focusing on your logic rather than infrastructure details. +* **Optimized resource utilization**: Specify hardware requirements at the function level for precise control over GPU and CPU allocation. +* **Seamless deployment**: Automatically handle the RunPod Serverless infrastructure setup, worker communication, and data transfer. +* **Reduced development overhead**: Skip the tedious process of writing application code, building Docker containers, and managing endpoints for each worker. +* **Intuitive programming model**: Use familiar Python decorators to mark functions for remote execution. + +## Key concepts + +### Resource configurations + +Tetra allows explicit specification of hardware requirements at the function level through the `ServerlessResource` object. This provides granular control over: + +* GPU/CPU allocation +* Worker scaling limits +* Template selection +* Memory requirements + +CPU example: + +```python +from tetra import ServerlessResource + +# Configure a GPU endpoint +gpu_config = ServerlessResource( + templateId="abc123", # GPU template ID + gpuIds="any", + workersMax=5, # Scale up to 5 workers + name="parallel-processor" # Name of the endpoint that will be created or used +) + +cpu_resource = ServerlessResource( + templateId="def456", # CPU template ID + workersMax=1 + name="data-processor", # Name of the endpoint that will be created or used +) +``` + +### Remote functions + +Remote functions are the building blocks of Tetra workflows. Simply mark any Python function with the `@remote` decorator to designate it for execution on RunPod's infrastructure: + +```python +from tetra import remote + +@remote( + resource_config=gpu_config, # Uses the GPU config defined in the previous section +) +def process_image(image_data): + + # Code you add here will be run remotely using RunPod infrastructure + + return results +``` + +### Passing data between RunPod and your local machine + +Tetra makes it easy to pass data between your local environment and RunPod's infrastructure. The remote function can accept any serializable Python objects as input and return them as output: + +```python +async def main(): + # Code you add here will be run locally, allowing you to pass data between RunPod and your local machine. 
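+    # For example, load an input image from your local machine before handing it
+    # to the remote function ("photo.jpg" is a hypothetical file; any serializable
+    # Python object can be passed):
+    with open("photo.jpg", "rb") as f:
+        image = f.read()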
+ + print("Processing image...") + result = await process_image(image) # This function will run remotely, using an image passed in from your local machine + +if __name__ == "__main__": + asyncio.run(main()) +``` + +### Dependencies + +Specify required Python libraries directly in the `@remote` decorator, and Tetra ensures they're available in your execution environment: + +```python +@remote( + resource_config=gpu_resource, + dependencies=["torch", "transformers", "pillow"] +) +def model_inference(data): + # Libraries are automatically installed + from transformers import AutoModel + import torch + from PIL import Image + # ... +``` + +### Asynchronous execution + +Tetra workflows run asynchronously, making it easy to manage complex pipelines: + +```python +@remote(...) +def preprocess_data(raw_data): + ... + +@remote(...) +def model_inference(preprocessed): + ... + +@remote(...) +def process_chunk(data): + ... + +async def main(): + # Run remote functions in sequence + preprocessed = await preprocess_data(raw_data) + result = await model_inference(preprocessed) + + # Or run them in parallel + results = await asyncio.gather( + process_chunk(data1), + process_chunk(data2), + process_chunk(data3) + ) +``` + +## How Tetra works + +When you execute a Tetra workflow: + +1. The `@remote` decorator identifies functions designated for remote execution. +2. Tetra analyzes the dependencies between functions to determine execution order. +3. For each remote function: + - Tetra provisions the appropriate resources on RunPod. + - Input data is serialized and transferred to the remote worker. + - The function executes on the remote infrastructure. + - Results are returned to your local environment. +4. Data flows between functions according to your workflow definition. + +A diagram showing the Tetra workflow execution process + +## Common use cases + +* **Multi-modal AI pipelines**: Combine text, image, and audio models in unified workflows. +* **Distributed model training**: Scale model training across multiple GPU workers. +* **AI research experimentation**: Quickly prototype and test complex model combinations. +* **Production inference systems**: Deploy sophisticated, multi-stage inference pipelines. +* **Data processing workflows**: Process large datasets using distributed resources. + +## Tetra vs. traditional RunPod Serverless + +| Aspect | Traditional Serverless | Tetra | +|--------|------------------------|-------| +| Development Process | Write handler, build Docker, create endpoint | Write Python functions with decorators | +| Infrastructure Management | Manual endpoint configuration | Automatic resource provisioning | +| Worker Communication | Manage manually | Automatic data transfer between functions | +| Development Overhead | High (Docker, endpoints per worker) | Low (pure Python development) | +| Deployment Speed | Multiple manual steps | Automatic deployment from code | +| Resource Control | Endpoint-level | Function-level granularity | + +## Get started with Tetra + +Getting started with Tetra is straightforward: + +1. Install Tetra and set up your environment: + ```bash + pip install tetra + ``` + +2. Create your first remote function: + ```python + from tetra import remote, ServerlessResource + + resource = ServerlessResource( + templateId="your_template_id", + name="my-first-worker" + ) + + @remote(resource_config=resource) + def hello_tetra(name): + return f"Hello, {name} from Tetra!" 
+ + async def main(): + result = await hello_tetra("World") + print(result) + + if __name__ == "__main__": + import asyncio + asyncio.run(main()) + ``` + +3. Run your code and watch Tetra handle the infrastructure automatically. + +## Next steps + +Ready to streamline your AI workflow development with Tetra? + +- [Follow the quickstart guide to deploy your first workflow.](/tetra/quickstart) +- [Learn about advanced Tetra configurations and resource optimization.](/tetra/advanced-config) +- [Explore example workflows for common AI use cases.](/tetra/examples) +- [Read the complete API reference.](/docs/tetra/api-reference) +- [Join the Tetra community and get support.](/docs/tetra/community) diff --git a/docs/tetra/quickstart.md b/docs/tetra/quickstart.md new file mode 100644 index 00000000..b0d21994 --- /dev/null +++ b/docs/tetra/quickstart.md @@ -0,0 +1,314 @@ +--- +title: Quickstart +description: "" +sidebar_position: 2 +--- + +# Tetra quickstart + +Learn how to set up your Tetra development environment to seamlessly run AI workloads using [RunPod Serverless](/serverless/overview) resources. + +Tetra is a Python SDK that simplifies the deployment AI workflows on RunPod by automating infrastructure management and worker communication. It lets you run code using RunPod compute resources without needing to open the RunPod web—just run your code locally and Tetra takes care of the rest. + +## What you'll learn + +In this tutorial you'll learn how to: + +- Set up your development environment for Tetra. +- Create and define remote functions with the `@remote` decorator. +- Deploy a GPU-based Tetra workload using RunPod resources. +- Pass data between your local environment and remote workers. +- Understand how Tetra manages remote execution. + +## Requirements + +- You've [created a RunPod account](/get-started/manage-accounts). +- You've created a [RunPod API key](/get-started/api-keys). +- You've installed [Python 3.9 - 3.12](https://www.python.org/downloads/) and [Poetry](https://python-poetry.org/) (for dependency management). + +:::note + +If you have a later version of Python installed, you can use [penv](https://github.com/pyenv/pyenv) to switch to an earlier one. + +::: + +## Step 1: Install Tetra + +First, let's install Tetra and set up your virtual environment: + +1. Run this command to clone the Tetra repository: + ```bash + git clone tetra-rp && cd tetra-rp + ``` + +2. Install dependencies with poetry: + ```bash + poetry install + ``` + +3. Activate the virtual environment: + ```bash + $(poetry env activate) + ``` + +## Step 2: Add your API key to the environment + +You'll need to add your [RunPod API key](/get-started/api-keys) to your development environment before you can use Tetra to run your workloads. + +Run this command to create a `.env` file in your project root, replacing [YOUR_API_KEY] with your API key: + +```bash +touch .env && echo "RUNPOD_API_KEY=[YOUR_API_KEY]" > .env +``` + +## Step 3: Create your project file + +Now you can start building your Tetra project. Create a new file called `matrix_gpu_example.py` in the same folder as the `.env` file you just created, and open it in your code editor. You'll build this file step-by-step. 
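+
+For example, you can create the empty file from your terminal (any editor works just as well):
+
+```bash
+touch matrix_gpu_example.py
+```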
+ +## Step 4: Add imports and load .env file + +First, add the necessary import statements: + +```python +import asyncio +from dotenv import load_dotenv +from tetra import remote, ServerlessResource + +# Load environment variables from .env file +load_dotenv() +``` + +This imports: + +- `asyncio`: Python's asynchronous programming library, which Tetra uses for non-blocking execution. +- `dotenv`: Helps load environment variables from your `.env` file, including your RunPod API key. +- `remote` and `ServerlessResource`: The core Tetra components you'll use to define remote functions and their resource requirements. + +The `load_dotenv()` call reads your API key from the `.env` file and makes it available to Tetra. + +## Step 5: Add Serverless endpoint configuration + +Next, let's define the Serverless endpoint configuration for our Tetra workload: + +```python +# Configuration for a Serverless endpoint using GPU workers +gpu_config = ServerlessResource( + templateId="[YOUR_TEMPLATE_ID]", # Replace with your template ID + gpuIds="any", # Use any available GPU + workersMax=1, + name="tetra_gpu", +) +``` + +This `ServerlessResource` object defines: + +- `templateId`: The RunPod template ID to use (you'll replace this with your actual template ID). +- `gpuIds="any"`: The GPU IDs that can be used by workers on this endpoint. This configuration allows the endpoint to use any GPUs that are available. You can also replace `any` with a comma-separated list of [GPU IDs](/references/gpu-types). +- `workersMax=1`: Sets the maximum number of worker instances to 1. +- `name="tetra_gpu"`: The name of the endpoint that will be created and used on the RunPod web interface. If an endpoint of this name already exists, Tetra will reuse it instead of creating a new one. + +## Step 6: Define your remote function + +Now, let's define the function that will run on the GPU worker: + +```python +@remote( + resource_config=gpu_config, + dependencies=["numpy", "torch"] +) +def tetra_matrix_operations(size=1000): + """Perform large matrix operations using NumPy and check GPU availability.""" + import numpy as np + import torch + + # Check if GPU is available + gpu_available = torch.cuda.is_available() + device_count = torch.cuda.device_count() if gpu_available else 0 + device_name = torch.cuda.get_device_name(0) if gpu_available else "N/A" + + # Create large random matrices + A = np.random.rand(size, size) + B = np.random.rand(size, size) + + # Perform matrix multiplication + C = np.dot(A, B) + + return { + "matrix_size": size, + "result_shape": C.shape, + "result_mean": float(np.mean(C)), + "result_std": float(np.std(C)), + "gpu_available": gpu_available, + "device_count": device_count, + "device_name": device_name + } +``` + +Let's break down this function: + +- `@remote`: This is the "remote decorator" that marks the function to run on RunPod's infrastructure instead of locally. + - `resource_config=gpu_config`: The function will run using the GPU configuration we defined earlier. + - `dependencies=["numpy", "torch"]`: Lists the Python packages that must be installed on the remote worker. + +- The `tetra_matrix_operations` function itself: + - Checks if a GPU is available using PyTorch's CUDA utilities. + - Creates two large random matrices using NumPy. + - Performs matrix multiplication. + - Returns statistics about the result and information about the GPU. + +Notice that we import `numpy` and `torch` inside the function, not at the top of the file. 
This is because these imports need to happen on the remote worker, not in your local environment. + +## Step 7: Add the main function + +Finally, add the main function to execute your GPU workload: + +```python +async def main(): + # Run the GPU matrix operations + print("Starting large matrix operations on GPU...") + result = await tetra_matrix_operations(1000) + + # Print the results + print("\nMatrix operations results:") + print(f"Matrix size: {result['matrix_size']}x{result['matrix_size']}") + print(f"Result shape: {result['result_shape']}") + print(f"Result mean: {result['result_mean']:.4f}") + print(f"Result standard deviation: {result['result_std']:.4f}") + + # Print GPU information + print("\nGPU Information:") + if result['gpu_available']: + print(f"GPU device count: {result['device_count']}") + print(f"GPU device name: {result['device_name']}") + +if __name__ == "__main__": + asyncio.run(main()) +``` + +This main function: +- Calls the remote function with `await`, which runs it asynchronously. +- Prints the results of the matrix operations. +- Displays information about the GPU that was used. + +The `asyncio.run(main())` line runs the asynchronous main function. + +All code outside of the `@remote` decorated function runs on your local machine. This main function serves as the interface between your local environment and RunPod's infrastructure, allowing you to send input data to remote functions and process their returned results. The `await` keyword handles the asynchronous communication, making the remote execution feel seamless. + +## Step 8: Run your GPU example + +Now you're ready to run the example: + +```bash +python matrix_gpu_example.py +``` + +You should see output similar to this: + +``` +Starting large matrix operations on GPU... +Resource ServerlessResource_33e1fa59c64b611c66c5a778e120c522 already exists, reusing. +Registering RunPod endpoint: server_ServerlessResource_33e1fa59c64b611c66c5a778e120c522 at https://api.runpod.ai/xvf32dan8rcilp +Initialized RunPod stub for endpoint: https://api.runpod.ai/xvf32dan8rcilp (ID: xvf32dan8rcilp) +Executing function on RunPod endpoint ID: xvf32dan8rcilp +Initial job status: IN_QUEUE +Job completed, output received + +Matrix operations results: +Matrix size: 1000x1000 +Result shape: (1000, 1000) +Result mean: 249.8286 +Result standard deviation: 6.8704 + +GPU Information: +GPU device count: 1 +GPU device name: NVIDIA GeForce RTX 4090 +``` + +## 9. Understand what's happening + +When you run this script: + +1. Tetra reads your GPU resource configuration and provisions a worker on RunPod. +2. It installs the required dependencies (NumPy and PyTorch) on the worker. +3. Your `tetra_matrix_operations` function runs on the remote worker. +4. The function creates and multiplies large matrices, then calculates statistics. +5. It also checks for GPU availability using PyTorch. +6. The results are returned to your local environment. +7. Your main function displays those results. + +## Step 10: Run multiple operations in parallel + +Now let's see how easy it is to run multiple remote operations in paralell. 
First, replace the `main` function with this code: + +```python +async def main(): + # Run multiple matrix operations in parallel + print("Starting large matrix operations on GPU...") + # Define different matrix sizes to test + sizes = [500, 1000, 2000] + + # Run all matrix operations in parallel + results = await asyncio.gather(*[ + tetra_matrix_operations(size) for size in sizes + ]) + + # Print the results for each matrix size + for size, result in zip(sizes, results): + print(f"\nMatrix size: {size}x{size}") + print(f"Result shape: {result['result_shape']}") + print(f"Result mean: {result['result_mean']:.4f}") + print(f"Result standard deviation: {result['result_std']:.4f}") + + # Print GPU information (using the first result since it's the same for all) + print("\nGPU Information:") + if results[0]['gpu_available']: + print(f"GPU device count: {results[0]['device_count']}") + print(f"GPU device name: {results[0]['device_name']}") + +if __name__ == "__main__": + asyncio.run(main()) +``` + +Now you're ready to run the example again: + +```bash +python matrix_gpu_example.py +``` + +You should now see results for all three matrix sizes after the operations have completed: + +```bash +Initial job status: IN_QUEUE +Initial job status: IN_QUEUE +Initial job status: IN_QUEUE +Job completed, output received +Job completed, output received +Job completed, output received + +Matrix size: 500x500 +Result shape: (500, 500) +Result mean: 125.3097 +Result standard deviation: 5.0425 + +Matrix size: 1000x1000 +Result shape: (1000, 1000) +Result mean: 249.9442 +Result standard deviation: 7.1072 + +Matrix size: 2000x2000 +Result shape: (2000, 2000) +Result mean: 500.1321 +Result standard deviation: 9.8879 +``` + +This demonstrates how Tetra can efficiently handle multiple GPU operations simultaneously. + +## Next steps + +Nicely done, you've successfuly used Tetra to seamlessly run GPU workloads using RunPod resources! + +Now that you've learned the basics of Tetra, you can: + +- Create a workflow that chains functions together, passing data between them. +- Explore more advanced PyTorch operations on the GPU. +- Try different resource configurations to optimize performance. diff --git a/sidebars.js b/sidebars.js index 902ffc6b..6b460370 100644 --- a/sidebars.js +++ b/sidebars.js @@ -39,6 +39,16 @@ module.exports = { }, ], }, + { + type: "category", + label: "Tetra", + items: [ + { + type: "autogenerated", + dirName: "tetra", + }, + ], + }, { type: "category", label: "runpodctl", @@ -49,6 +59,7 @@ module.exports = { }, ], }, + { type: "doc", id: "fine-tune/index", From 1715e8a70acee031aa410b4bd06f8105ecf09096 Mon Sep 17 00:00:00 2001 From: Mo King Date: Mon, 14 Apr 2025 15:50:10 -0400 Subject: [PATCH 02/11] How tetra works --- docs/tetra/overview.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/docs/tetra/overview.md b/docs/tetra/overview.md index 0f9cc174..6f574b8d 100644 --- a/docs/tetra/overview.md +++ b/docs/tetra/overview.md @@ -139,8 +139,6 @@ When you execute a Tetra workflow: - Results are returned to your local environment. 4. Data flows between functions according to your workflow definition. -A diagram showing the Tetra workflow execution process - ## Common use cases * **Multi-modal AI pipelines**: Combine text, image, and audio models in unified workflows. 
From 2aecd8a9510378f702ec0cf9178902d90ff4c50a Mon Sep 17 00:00:00 2001 From: Mo King Date: Mon, 14 Apr 2025 18:55:11 -0400 Subject: [PATCH 03/11] Simplify parallel processing example --- docs/tetra/overview.md | 85 ++++++++++++---------------------------- docs/tetra/quickstart.md | 25 ++++++------ 2 files changed, 38 insertions(+), 72 deletions(-) diff --git a/docs/tetra/overview.md b/docs/tetra/overview.md index 6f574b8d..797176fd 100644 --- a/docs/tetra/overview.md +++ b/docs/tetra/overview.md @@ -6,19 +6,31 @@ sidebar_position: 1 # Tetra overview -Tetra is a Python SDK that streamlines the development and deployment of multi-model AI workflows on RunPod's [Serverless](/serverless/overview) infrastructure. It provides an abstraction layer that enables you to define, execute, and monitor sophisticated AI pipelines through a declarative interface, eliminating infrastructure overhead. +Tetra is a Python SDK that streamlines the development and deployment of AI workflows on RunPod's [Serverless](/serverless/overview) infrastructure. It provides an abstraction layer that enables you to define, execute, and monitor sophisticated AI pipelines through a declarative interface, eliminating infrastructure overhead. ## Why use Tetra? -* **Simplified workflow development**: Define AI pipelines in pure Python with minimal configuration, focusing on your logic rather than infrastructure details. -* **Optimized resource utilization**: Specify hardware requirements at the function level for precise control over GPU and CPU allocation. -* **Seamless deployment**: Automatically handle the RunPod Serverless infrastructure setup, worker communication, and data transfer. -* **Reduced development overhead**: Skip the tedious process of writing application code, building Docker containers, and managing endpoints for each worker. -* **Intuitive programming model**: Use familiar Python decorators to mark functions for remote execution. +Tetra provides several advantages over vanilla Serverless: + +- **Simplified workflow development**: Define AI pipelines in pure Python with minimal configuration, focusing on your logic rather than infrastructure details. +- **Optimized resource utilization**: Specify hardware requirements at the function level for precise control over GPU and CPU allocation. +- **Seamless deployment**: Tetra automatically handles RunPod Serverless infrastructure setup, worker communication, and data transfer. +- **Reduced development overhead**: Skip the tedious process of writing application code, building Docker containers, and managing endpoints for each worker. +- **Intuitive programming model**: Use familiar Python decorators to mark functions for remote execution. + +## Get started with Tetra + +You can get started with Tetra in minutes by following this [step-by-step tutorial](/tetra/quickstart). + +You can also start by cloning the tetra-rp repository and running the examples inside: + +``` +git clone https://github.com/runpod/tetra-rp.git +``` ## Key concepts -### Resource configurations +### Resource configuration Tetra allows explicit specification of hardware requirements at the function level through the `ServerlessResource` object. This provides granular control over: @@ -141,62 +153,15 @@ When you execute a Tetra workflow: ## Common use cases -* **Multi-modal AI pipelines**: Combine text, image, and audio models in unified workflows. -* **Distributed model training**: Scale model training across multiple GPU workers. 
-* **AI research experimentation**: Quickly prototype and test complex model combinations. -* **Production inference systems**: Deploy sophisticated, multi-stage inference pipelines. -* **Data processing workflows**: Process large datasets using distributed resources. - -## Tetra vs. traditional RunPod Serverless - -| Aspect | Traditional Serverless | Tetra | -|--------|------------------------|-------| -| Development Process | Write handler, build Docker, create endpoint | Write Python functions with decorators | -| Infrastructure Management | Manual endpoint configuration | Automatic resource provisioning | -| Worker Communication | Manage manually | Automatic data transfer between functions | -| Development Overhead | High (Docker, endpoints per worker) | Low (pure Python development) | -| Deployment Speed | Multiple manual steps | Automatic deployment from code | -| Resource Control | Endpoint-level | Function-level granularity | - -## Get started with Tetra - -Getting started with Tetra is straightforward: - -1. Install Tetra and set up your environment: - ```bash - pip install tetra - ``` - -2. Create your first remote function: - ```python - from tetra import remote, ServerlessResource - - resource = ServerlessResource( - templateId="your_template_id", - name="my-first-worker" - ) - - @remote(resource_config=resource) - def hello_tetra(name): - return f"Hello, {name} from Tetra!" - - async def main(): - result = await hello_tetra("World") - print(result) - - if __name__ == "__main__": - import asyncio - asyncio.run(main()) - ``` - -3. Run your code and watch Tetra handle the infrastructure automatically. +- **Multi-modal AI pipelines**: Combine text, image, and audio models in unified workflows. +- **Distributed model training**: Scale model training across multiple GPU workers. +- **AI research experimentation**: Quickly prototype and test complex model combinations. +- **Production inference systems**: Deploy sophisticated, multi-stage inference pipelines. +- **Data processing workflows**: Process large datasets using distributed resources. ## Next steps Ready to streamline your AI workflow development with Tetra? - [Follow the quickstart guide to deploy your first workflow.](/tetra/quickstart) -- [Learn about advanced Tetra configurations and resource optimization.](/tetra/advanced-config) -- [Explore example workflows for common AI use cases.](/tetra/examples) -- [Read the complete API reference.](/docs/tetra/api-reference) -- [Join the Tetra community and get support.](/docs/tetra/community) +- [Clone the tetra-rp repository and test the workloads in the examples folder.](https://github.com/runpod/tetra-rp) \ No newline at end of file diff --git a/docs/tetra/quickstart.md b/docs/tetra/quickstart.md index b0d21994..1d2775df 100644 --- a/docs/tetra/quickstart.md +++ b/docs/tetra/quickstart.md @@ -38,7 +38,7 @@ First, let's install Tetra and set up your virtual environment: 1. Run this command to clone the Tetra repository: ```bash - git clone tetra-rp && cd tetra-rp + git clone git clone https://github.com/runpod/tetra-rp.git && cd tetra-rp ``` 2. Install dependencies with poetry: @@ -224,7 +224,7 @@ GPU device count: 1 GPU device name: NVIDIA GeForce RTX 4090 ``` -## 9. Understand what's happening +## Step 9: Understand what's happening When you run this script: @@ -238,23 +238,24 @@ When you run this script: ## Step 10: Run multiple operations in parallel -Now let's see how easy it is to run multiple remote operations in paralell. 
First, replace the `main` function with this code: +Now you'll see how easy it is to run multiple remote operations in paralell. First, replace your `main` function with this code: ```python async def main(): # Run multiple matrix operations in parallel print("Starting large matrix operations on GPU...") - # Define different matrix sizes to test - sizes = [500, 1000, 2000] # Run all matrix operations in parallel - results = await asyncio.gather(*[ - tetra_matrix_operations(size) for size in sizes - ]) + results = await asyncio.gather( + tetra_matrix_operations(500), + tetra_matrix_operations(1000), + tetra_matrix_operations(2000) + ) + print("\nMatrix operations results:") # Print the results for each matrix size - for size, result in zip(sizes, results): - print(f"\nMatrix size: {size}x{size}") + for r in results: + print(f"\nMatrix size: {result['matrix_size']}x{result['matrix_size']}") print(f"Result shape: {result['result_shape']}") print(f"Result mean: {result['result_mean']:.4f}") print(f"Result standard deviation: {result['result_std']:.4f}") @@ -301,11 +302,11 @@ Result mean: 500.1321 Result standard deviation: 9.8879 ``` -This demonstrates how Tetra can efficiently handle multiple GPU operations simultaneously. +That's all it takes to run multiple operations in parallel! ## Next steps -Nicely done, you've successfuly used Tetra to seamlessly run GPU workloads using RunPod resources! +Nicely done, you've successfuly used Tetra to seamlessly run a GPU workload using RunPod resources! Now that you've learned the basics of Tetra, you can: From c2b827d41b23973294d37e9db81fe1004e5add35 Mon Sep 17 00:00:00 2001 From: Mo King Date: Tue, 15 Apr 2025 10:38:07 -0400 Subject: [PATCH 04/11] Edit quickstart --- docs/tetra/overview.md | 4 +- docs/tetra/quickstart.md | 108 +++++++++++++++++++++++++-------------- 2 files changed, 70 insertions(+), 42 deletions(-) diff --git a/docs/tetra/overview.md b/docs/tetra/overview.md index 797176fd..26c16a46 100644 --- a/docs/tetra/overview.md +++ b/docs/tetra/overview.md @@ -1,6 +1,6 @@ --- title: Overview -description: "" +description: "Tetra is a Python SDK that streamlines the development and deployment of AI workflows on RunPod's Serverless infrastructure." sidebar_position: 1 --- @@ -39,8 +39,6 @@ Tetra allows explicit specification of hardware requirements at the function lev * Template selection * Memory requirements -CPU example: - ```python from tetra import ServerlessResource diff --git a/docs/tetra/quickstart.md b/docs/tetra/quickstart.md index 1d2775df..7227fae8 100644 --- a/docs/tetra/quickstart.md +++ b/docs/tetra/quickstart.md @@ -1,6 +1,6 @@ --- title: Quickstart -description: "" +description: "Learn how to set up your Tetra development environment to seamlessly run AI workloads using RunPod Serverless resources." sidebar_position: 2 --- @@ -15,6 +15,7 @@ Tetra is a Python SDK that simplifies the deployment AI workflows on RunPod by a In this tutorial you'll learn how to: - Set up your development environment for Tetra. +- Configure a [Serverless endpoint](/serverless/endpoints/overview) using a `ServerlessResource` object. - Create and define remote functions with the `@remote` decorator. - Deploy a GPU-based Tetra workload using RunPod resources. - Pass data between your local environment and remote workers. @@ -28,20 +29,20 @@ In this tutorial you'll learn how to: :::note -If you have a later version of Python installed, you can use [penv](https://github.com/pyenv/pyenv) to switch to an earlier one. 
+If you have a later version of Python installed (> 3.12), you can use [pyenv](https://github.com/pyenv/pyenv) to switch to an earlier one. ::: ## Step 1: Install Tetra -First, let's install Tetra and set up your virtual environment: +First, let's clone the Tetra repo and set up your virtual environment: 1. Run this command to clone the Tetra repository: ```bash - git clone git clone https://github.com/runpod/tetra-rp.git && cd tetra-rp + git clone https://github.com/runpod/tetra-rp.git && cd tetra-rp ``` -2. Install dependencies with poetry: +2. Install dependencies with Poetry: ```bash poetry install ``` @@ -55,15 +56,27 @@ First, let's install Tetra and set up your virtual environment: You'll need to add your [RunPod API key](/get-started/api-keys) to your development environment before you can use Tetra to run your workloads. -Run this command to create a `.env` file in your project root, replacing [YOUR_API_KEY] with your API key: +Run this command to create a `.env` file, replacing [YOUR_API_KEY] with your RunPod API key: ```bash touch .env && echo "RUNPOD_API_KEY=[YOUR_API_KEY]" > .env ``` +:::note + +You can create this in your project's root directory or in the `/examples` folder. Just make sure your `.env` file is in the same folder as the Python file you create in the next step. + +::: + ## Step 3: Create your project file -Now you can start building your Tetra project. Create a new file called `matrix_gpu_example.py` in the same folder as the `.env` file you just created, and open it in your code editor. You'll build this file step-by-step. +Now you're ready to start building your Tetra project. Create a new file called `matrix_operations.py` in the same directory as your `.env` file: + +```bash +touch matrix_operations.py +``` + +Open this file in your preferred code editor. We'll walk through building it out step-by-step, implementing a simple matrix multiplication example that demonstrates Tetra's remote execution and parallel processing capabilities. ## Step 4: Add imports and load .env file @@ -84,7 +97,7 @@ This imports: - `dotenv`: Helps load environment variables from your `.env` file, including your RunPod API key. - `remote` and `ServerlessResource`: The core Tetra components you'll use to define remote functions and their resource requirements. -The `load_dotenv()` call reads your API key from the `.env` file and makes it available to Tetra. +`load_dotenv()` reads your API key from the `.env` file and makes it available to Tetra. ## Step 5: Add Serverless endpoint configuration @@ -93,8 +106,8 @@ Next, let's define the Serverless endpoint configuration for our Tetra workload: ```python # Configuration for a Serverless endpoint using GPU workers gpu_config = ServerlessResource( - templateId="[YOUR_TEMPLATE_ID]", # Replace with your template ID - gpuIds="any", # Use any available GPU + templateId="[YOUR_TEMPLATE_ID]", # Replace with your template ID + gpuIds="any", # Use any available GPU workersMax=1, name="tetra_gpu", ) @@ -105,7 +118,9 @@ This `ServerlessResource` object defines: - `templateId`: The RunPod template ID to use (you'll replace this with your actual template ID). - `gpuIds="any"`: The GPU IDs that can be used by workers on this endpoint. This configuration allows the endpoint to use any GPUs that are available. You can also replace `any` with a comma-separated list of [GPU IDs](/references/gpu-types). - `workersMax=1`: Sets the maximum number of worker instances to 1. 
-- `name="tetra_gpu"`: The name of the endpoint that will be created and used on the RunPod web interface. If an endpoint of this name already exists, Tetra will reuse it instead of creating a new one. +- `name="tetra_gpu"`: The name of the endpoint that will be created/used on the RunPod web interface. + +If you run a Tetra function that uses an identical `ServerlessResource` configuration to a prior run, RunPod will reuse your existing endpoint rather than creating a new one. However, if any configuration values have changed (not just the `name` parameter), a new endpoint will be created to match your updated requirements. ## Step 6: Define your remote function @@ -116,15 +131,14 @@ Now, let's define the function that will run on the GPU worker: resource_config=gpu_config, dependencies=["numpy", "torch"] ) -def tetra_matrix_operations(size=1000): +def tetra_matrix_operations(size): """Perform large matrix operations using NumPy and check GPU availability.""" import numpy as np import torch - # Check if GPU is available - gpu_available = torch.cuda.is_available() - device_count = torch.cuda.device_count() if gpu_available else 0 - device_name = torch.cuda.get_device_name(0) if gpu_available else "N/A" + # Get GPU count and name + device_count = torch.cuda.device_count() + device_name = torch.cuda.get_device_name(0) # Create large random matrices A = np.random.rand(size, size) @@ -138,7 +152,6 @@ def tetra_matrix_operations(size=1000): "result_shape": C.shape, "result_mean": float(np.mean(C)), "result_std": float(np.std(C)), - "gpu_available": gpu_available, "device_count": device_count, "device_name": device_name } @@ -151,7 +164,7 @@ Let's break down this function: - `dependencies=["numpy", "torch"]`: Lists the Python packages that must be installed on the remote worker. - The `tetra_matrix_operations` function itself: - - Checks if a GPU is available using PyTorch's CUDA utilities. + - Gets GPU details using PyTorch's CUDA utilities. - Creates two large random matrices using NumPy. - Performs matrix multiplication. - Returns statistics about the result and information about the GPU. @@ -160,7 +173,7 @@ Notice that we import `numpy` and `torch` inside the function, not at the top of ## Step 7: Add the main function -Finally, add the main function to execute your GPU workload: +Finally, add this `main` function to execute your GPU workload: ```python async def main(): @@ -185,21 +198,28 @@ if __name__ == "__main__": asyncio.run(main()) ``` -This main function: -- Calls the remote function with `await`, which runs it asynchronously. +This `main` function: + +- Calls the remote function with `await`, which runs it asynchronously on RunPod's infrastructure. - Prints the results of the matrix operations. - Displays information about the GPU that was used. -The `asyncio.run(main())` line runs the asynchronous main function. +The `asyncio.run(main())` line is Python's standard way to execute an asynchronous `main` function from synchronous code. It creates an event loop, runs the `main function until completion, and then closes the loop. -All code outside of the `@remote` decorated function runs on your local machine. This main function serves as the interface between your local environment and RunPod's infrastructure, allowing you to send input data to remote functions and process their returned results. The `await` keyword handles the asynchronous communication, making the remote execution feel seamless. +All code outside of the `@remote` decorated function runs on your local machine. 
The `main` function acts as a bridge between your local environment and RunPod's cloud infrastructure, allowing you to: + +- Send input data to remote functions (in this case, the matrix size parameter) +- Wait for remote execution to complete without blocking your local process +- Process the returned results locally once they're available + +The `await` keyword is crucial here—it pauses execution of the `main` function until the remote operation completes, but doesn't block the entire Python process. This asynchronous pattern enables efficient resource utilization while maintaining a simple, sequential coding style. ## Step 8: Run your GPU example Now you're ready to run the example: ```bash -python matrix_gpu_example.py +python matrix_operations.py ``` You should see output similar to this: @@ -224,6 +244,22 @@ GPU device count: 1 GPU device name: NVIDIA GeForce RTX 4090 ``` +:::tip + +If you're having trouble running your code due to authentication issues: +1. Verify your `.env` file is in the same directory as your `matrix_operations.py` file. +2. Check that the API key in your `.env` file is correct and properly formatted. +3. Alternatively, you can set the API key directly in your terminal with: + ```bash + export RUNPOD_API_KEY=[YOUR_API_KEY] + ``` +4. For Windows users: + ```cmd + set RUNPOD_API_KEY=[YOUR_API_KEY] + ``` + +::: + ## Step 9: Understand what's happening When you run this script: @@ -232,13 +268,13 @@ When you run this script: 2. It installs the required dependencies (NumPy and PyTorch) on the worker. 3. Your `tetra_matrix_operations` function runs on the remote worker. 4. The function creates and multiplies large matrices, then calculates statistics. -5. It also checks for GPU availability using PyTorch. -6. The results are returned to your local environment. -7. Your main function displays those results. +5. Your local `main` function receives these results and displays them in your terminal. ## Step 10: Run multiple operations in parallel -Now you'll see how easy it is to run multiple remote operations in paralell. First, replace your `main` function with this code: +Now let's see how easy it is to run multiple remote operations in paralell using Tetra. + +First, replace your `main` function with this code: ```python async def main(): @@ -254,26 +290,22 @@ async def main(): print("\nMatrix operations results:") # Print the results for each matrix size - for r in results: + for result in results: print(f"\nMatrix size: {result['matrix_size']}x{result['matrix_size']}") print(f"Result shape: {result['result_shape']}") print(f"Result mean: {result['result_mean']:.4f}") print(f"Result standard deviation: {result['result_std']:.4f}") - - # Print GPU information (using the first result since it's the same for all) - print("\nGPU Information:") - if results[0]['gpu_available']: - print(f"GPU device count: {results[0]['device_count']}") - print(f"GPU device name: {results[0]['device_name']}") if __name__ == "__main__": asyncio.run(main()) ``` -Now you're ready to run the example again: +This new `main` function demonstrates Tetra's ability to run multiple operations in parallel using `asyncio.gather()`. Instead of running one matrix operation at a time, we're now launching three operations with different matrix sizes (500, 1000, and 2000) simultaneously. This parallel execution significantly improves efficiency when you have multiple independent tasks that can run concurrently, making better use of available GPU resources. 
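+
+If you want to see the benefit for yourself, you can time the parallel run. The sketch below is optional and reuses the `tetra_matrix_operations` function and the `asyncio` import from the earlier steps; the `timed_run` helper is just an illustrative name:
+
+```python
+import time
+
+async def timed_run():
+    start = time.perf_counter()
+    # Launch all three matrix operations at once and wait for them to finish
+    results = await asyncio.gather(
+        tetra_matrix_operations(500),
+        tetra_matrix_operations(1000),
+        tetra_matrix_operations(2000)
+    )
+    elapsed = time.perf_counter() - start
+    print(f"Completed {len(results)} remote operations in {elapsed:.1f} seconds")
+```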
+ +Try running the example again: ```bash -python matrix_gpu_example.py +python matrix_operations.py ``` You should now see results for all three matrix sizes after the operations have completed: @@ -302,8 +334,6 @@ Result mean: 500.1321 Result standard deviation: 9.8879 ``` -That's all it takes to run multiple operations in parallel! - ## Next steps Nicely done, you've successfuly used Tetra to seamlessly run a GPU workload using RunPod resources! From 7655ad868bd09b44beef02e4b105b48fafa8e6b1 Mon Sep 17 00:00:00 2001 From: Mo King Date: Tue, 15 Apr 2025 10:57:48 -0400 Subject: [PATCH 05/11] Update get started name; edit overview --- docs/tetra/{quickstart.md => get-started.md} | 4 +- docs/tetra/overview.md | 41 +++++++++++--------- 2 files changed, 24 insertions(+), 21 deletions(-) rename docs/tetra/{quickstart.md => get-started.md} (99%) diff --git a/docs/tetra/quickstart.md b/docs/tetra/get-started.md similarity index 99% rename from docs/tetra/quickstart.md rename to docs/tetra/get-started.md index 7227fae8..7a7dbbfe 100644 --- a/docs/tetra/quickstart.md +++ b/docs/tetra/get-started.md @@ -1,10 +1,10 @@ --- -title: Quickstart +title: Get started description: "Learn how to set up your Tetra development environment to seamlessly run AI workloads using RunPod Serverless resources." sidebar_position: 2 --- -# Tetra quickstart +# Get started with Tetra Learn how to set up your Tetra development environment to seamlessly run AI workloads using [RunPod Serverless](/serverless/overview) resources. diff --git a/docs/tetra/overview.md b/docs/tetra/overview.md index 26c16a46..8c911523 100644 --- a/docs/tetra/overview.md +++ b/docs/tetra/overview.md @@ -6,7 +6,7 @@ sidebar_position: 1 # Tetra overview -Tetra is a Python SDK that streamlines the development and deployment of AI workflows on RunPod's [Serverless](/serverless/overview) infrastructure. It provides an abstraction layer that enables you to define, execute, and monitor sophisticated AI pipelines through a declarative interface, eliminating infrastructure overhead. +Tetra is a Python SDK that streamlines the development and deployment of AI workflows on RunPod's [Serverless](/serverless/overview) infrastructure. It provides an abstraction layer that lets you define, execute, and monitor sophisticated AI pipelines through a declarative interface, eliminating infrastructure overhead. ## Why use Tetra? @@ -16,11 +16,11 @@ Tetra provides several advantages over vanilla Serverless: - **Optimized resource utilization**: Specify hardware requirements at the function level for precise control over GPU and CPU allocation. - **Seamless deployment**: Tetra automatically handles RunPod Serverless infrastructure setup, worker communication, and data transfer. - **Reduced development overhead**: Skip the tedious process of writing application code, building Docker containers, and managing endpoints for each worker. -- **Intuitive programming model**: Use familiar Python decorators to mark functions for remote execution. +- **Intuitive programming model**: Use Python decorators to mark functions for remote execution. ## Get started with Tetra -You can get started with Tetra in minutes by following this [step-by-step tutorial](/tetra/quickstart). +You can get started with Tetra in minutes by following this [step-by-step tutorial](/tetra/get-started). 
You can also start by cloning the tetra-rp repository and running the examples inside: @@ -32,12 +32,11 @@ git clone https://github.com/runpod/tetra-rp.git ### Resource configuration -Tetra allows explicit specification of hardware requirements at the function level through the `ServerlessResource` object. This provides granular control over: +Tetra lets you specificy hardware requirements at the function level through the `ServerlessResource` object. This provides granular control over: -* GPU/CPU allocation -* Worker scaling limits -* Template selection -* Memory requirements +- GPU/CPU allocation. +- Worker scaling limits. +- Template selection. ```python from tetra import ServerlessResource @@ -65,25 +64,27 @@ Remote functions are the building blocks of Tetra workflows. Simply mark any Pyt from tetra import remote @remote( - resource_config=gpu_config, # Uses the GPU config defined in the previous section + resource_config=gpu_config, # Uses a ServerlessResource object to set up an endpoint ) def process_image(image_data): - # Code you add here will be run remotely using RunPod infrastructure + # Code you add here will be run remotely using RunPod Serverless return results ``` -### Passing data between RunPod and your local machine +### Transfer data between RunPod and your local machine Tetra makes it easy to pass data between your local environment and RunPod's infrastructure. The remote function can accept any serializable Python objects as input and return them as output: ```python async def main(): - # Code you add here will be run locally, allowing you to pass data between RunPod and your local machine. + # Code you add here will be run locally + + image = ... # Upload an image from your local machine print("Processing image...") - result = await process_image(image) # This function will run remotely, using an image passed in from your local machine + result = await process_image(image) # Process image remotely if __name__ == "__main__": asyncio.run(main()) @@ -91,7 +92,7 @@ if __name__ == "__main__": ### Dependencies -Specify required Python libraries directly in the `@remote` decorator, and Tetra ensures they're available in your execution environment: +You can specify required Python dependencies for remote workers at the function level from within the `@remote` decorator, and Tetra ensures they will be installed in your execution environment: ```python @remote( @@ -106,9 +107,11 @@ def model_inference(data): # ... ``` +Make sure to include `import` statements *inside* any remote functions that require them. + ### Asynchronous execution -Tetra workflows run asynchronously, making it easy to manage complex pipelines: +Tetra workflows run asynchronously, making it easy to manage complex pipelines and run parallel processes: ```python @remote(...) @@ -143,11 +146,11 @@ When you execute a Tetra workflow: 1. The `@remote` decorator identifies functions designated for remote execution. 2. Tetra analyzes the dependencies between functions to determine execution order. 3. For each remote function: - - Tetra provisions the appropriate resources on RunPod. + - Tetra provisions the appropriate endpoint and worker resources on RunPod. - Input data is serialized and transferred to the remote worker. - The function executes on the remote infrastructure. - Results are returned to your local environment. -4. Data flows between functions according to your workflow definition. +4. Data flows between functions as defined by your local code. 
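+
+Put together, that flow can be sketched in just a few lines. The snippet below is illustrative only: the endpoint name and the toy `summarize` function are invented for this example, and the `ServerlessResource` fields mirror the configuration shown earlier:
+
+```python
+import asyncio
+from tetra import remote, ServerlessResource
+
+config = ServerlessResource(
+    templateId="[YOUR_TEMPLATE_ID]",  # Replace with your template ID
+    gpuIds="any",
+    workersMax=1,
+    name="overview-example",
+)
+
+@remote(resource_config=config, dependencies=["numpy"])
+def summarize(numbers):
+    # Runs on a RunPod worker; the dependency is imported inside the function
+    import numpy as np
+    return {"mean": float(np.mean(numbers)), "max": float(np.max(numbers))}
+
+async def main():
+    stats = await summarize([1, 2, 3, 4])  # input is serialized and sent to the worker
+    print(stats)  # the result comes back as a plain Python dict
+
+if __name__ == "__main__":
+    asyncio.run(main())
+```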
## Common use cases @@ -161,5 +164,5 @@ When you execute a Tetra workflow: Ready to streamline your AI workflow development with Tetra? -- [Follow the quickstart guide to deploy your first workflow.](/tetra/quickstart) -- [Clone the tetra-rp repository and test the workloads in the examples folder.](https://github.com/runpod/tetra-rp) \ No newline at end of file +- [Build your first Tetra workflow using this step-by-step tutorial.](/tetra/get-started) +- [Clone the tetra-rp repository and test the files in the `/examples` folder.](https://github.com/runpod/tetra-rp) \ No newline at end of file From 00dfcb934b907ad64c2f1604b73c502332f9b6d1 Mon Sep 17 00:00:00 2001 From: Mo King Date: Tue, 15 Apr 2025 11:11:36 -0400 Subject: [PATCH 06/11] Move getting started info in overview --- docs/tetra/overview.md | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/docs/tetra/overview.md b/docs/tetra/overview.md index 8c911523..ab077f27 100644 --- a/docs/tetra/overview.md +++ b/docs/tetra/overview.md @@ -8,6 +8,14 @@ sidebar_position: 1 Tetra is a Python SDK that streamlines the development and deployment of AI workflows on RunPod's [Serverless](/serverless/overview) infrastructure. It provides an abstraction layer that lets you define, execute, and monitor sophisticated AI pipelines through a declarative interface, eliminating infrastructure overhead. +You can get started with Tetra in minutes by following this [step-by-step tutorial](/tetra/get-started). + +You can also start by cloning the Tetra repository and running the examples inside: + +``` +git clone https://github.com/runpod/tetra-rp.git +``` + ## Why use Tetra? Tetra provides several advantages over vanilla Serverless: @@ -18,16 +26,6 @@ Tetra provides several advantages over vanilla Serverless: - **Reduced development overhead**: Skip the tedious process of writing application code, building Docker containers, and managing endpoints for each worker. - **Intuitive programming model**: Use Python decorators to mark functions for remote execution. -## Get started with Tetra - -You can get started with Tetra in minutes by following this [step-by-step tutorial](/tetra/get-started). - -You can also start by cloning the tetra-rp repository and running the examples inside: - -``` -git clone https://github.com/runpod/tetra-rp.git -``` - ## Key concepts ### Resource configuration From daec8d40e467c0e892e0104a9d6b78aee5b53d3f Mon Sep 17 00:00:00 2001 From: Mo King Date: Tue, 15 Apr 2025 11:16:00 -0400 Subject: [PATCH 07/11] Update config sample in overview --- docs/tetra/overview.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/docs/tetra/overview.md b/docs/tetra/overview.md index ab077f27..7f9b9b38 100644 --- a/docs/tetra/overview.md +++ b/docs/tetra/overview.md @@ -36,17 +36,20 @@ Tetra lets you specificy hardware requirements at the function level through the - Worker scaling limits. - Template selection. 
+For example: + ```python from tetra import ServerlessResource # Configure a GPU endpoint gpu_config = ServerlessResource( templateId="abc123", # GPU template ID - gpuIds="any", - workersMax=5, # Scale up to 5 workers + gpuIds="any", # Use any available GPU + workersMax=5, # Scales up to 5 workers name="parallel-processor" # Name of the endpoint that will be created or used ) +# Configure a CPU endpoint cpu_resource = ServerlessResource( templateId="def456", # CPU template ID workersMax=1 From c2273e0cd099642902ce512c390a09fe3a1bd889 Mon Sep 17 00:00:00 2001 From: Mo King Date: Tue, 15 Apr 2025 12:58:17 -0400 Subject: [PATCH 08/11] Fix minor issues --- docs/tetra/get-started.md | 7 +++---- docs/tetra/overview.md | 8 ++++---- 2 files changed, 7 insertions(+), 8 deletions(-) diff --git a/docs/tetra/get-started.md b/docs/tetra/get-started.md index 7a7dbbfe..bcbf69ed 100644 --- a/docs/tetra/get-started.md +++ b/docs/tetra/get-started.md @@ -190,9 +190,8 @@ async def main(): # Print GPU information print("\nGPU Information:") - if result['gpu_available']: - print(f"GPU device count: {result['device_count']}") - print(f"GPU device name: {result['device_name']}") + print(f"GPU device count: {result['device_count']}") + print(f"GPU device name: {result['device_name']}") if __name__ == "__main__": asyncio.run(main()) @@ -272,7 +271,7 @@ When you run this script: ## Step 10: Run multiple operations in parallel -Now let's see how easy it is to run multiple remote operations in paralell using Tetra. +Now let's see how easy it is to run multiple remote operations in parallel using Tetra. First, replace your `main` function with this code: diff --git a/docs/tetra/overview.md b/docs/tetra/overview.md index 7f9b9b38..4389776c 100644 --- a/docs/tetra/overview.md +++ b/docs/tetra/overview.md @@ -30,7 +30,7 @@ Tetra provides several advantages over vanilla Serverless: ### Resource configuration -Tetra lets you specificy hardware requirements at the function level through the `ServerlessResource` object. This provides granular control over: +Tetra lets you specify hardware requirements at the function level through the `ServerlessResource` object. This provides granular control over: - GPU/CPU allocation. - Worker scaling limits. @@ -46,13 +46,13 @@ gpu_config = ServerlessResource( templateId="abc123", # GPU template ID gpuIds="any", # Use any available GPU workersMax=5, # Scales up to 5 workers - name="parallel-processor" # Name of the endpoint that will be created or used + name="parallel-processor", # Name of the endpoint that will be created or used ) # Configure a CPU endpoint cpu_resource = ServerlessResource( templateId="def456", # CPU template ID - workersMax=1 + workersMax=1 , name="data-processor", # Name of the endpoint that will be created or used ) ``` @@ -74,7 +74,7 @@ def process_image(image_data): return results ``` -### Transfer data between RunPod and your local machine +### Remote/local data transfer Tetra makes it easy to pass data between your local environment and RunPod's infrastructure. 
The remote function can accept any serializable Python objects as input and return them as output:
 
 ```python
 async def main():
From 43622d2a51024dd287072c5523bc481644006dd8 Mon Sep 17 00:00:00 2001
From: Mo King
Date: Wed, 16 Apr 2025 09:00:32 -0400
Subject: [PATCH 09/11] Add get-started section

---
 docs/tetra/overview.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/tetra/overview.md b/docs/tetra/overview.md
index 4389776c..b60092c9 100644
--- a/docs/tetra/overview.md
+++ b/docs/tetra/overview.md
@@ -8,7 +8,9 @@ sidebar_position: 1
 
 Tetra is a Python SDK that streamlines the development and deployment of AI workflows on RunPod's [Serverless](/serverless/overview) infrastructure. It provides an abstraction layer that lets you define, execute, and monitor sophisticated AI pipelines through a declarative interface, eliminating infrastructure overhead.
 
-You can get started with Tetra in minutes by following this [step-by-step tutorial](/tetra/get-started).
+## Get started
+
+Learn how to code Tetra workflows in serial and parallel by following this [step-by-step tutorial](/tetra/get-started).
 
 You can also start by cloning the Tetra repository and running the examples inside:
 
From c7a83216c58bab91a00cfde402b0208ea8b049b8 Mon Sep 17 00:00:00 2001
From: Mo King
Date: Wed, 23 Apr 2025 10:11:55 -0400
Subject: [PATCH 10/11] Remove references to templates

---
 docs/tetra/get-started.md | 2 --
 docs/tetra/overview.md    | 8 +-------
 2 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/docs/tetra/get-started.md b/docs/tetra/get-started.md
index bcbf69ed..e7398a05 100644
--- a/docs/tetra/get-started.md
+++ b/docs/tetra/get-started.md
@@ -106,7 +106,6 @@ Next, let's define the Serverless endpoint configuration for our Tetra workload:
 ```python
 # Configuration for a Serverless endpoint using GPU workers
 gpu_config = ServerlessResource(
-    templateId="[YOUR_TEMPLATE_ID]", # Replace with your template ID
     gpuIds="any", # Use any available GPU
     workersMax=1,
     name="tetra_gpu",
 )
@@ -115,7 +114,6 @@ gpu_config = ServerlessResource(
 
 This `ServerlessResource` object defines:
 
-- `templateId`: The RunPod template ID to use (you'll replace this with your actual template ID).
 - `gpuIds="any"`: The GPU IDs that can be used by workers on this endpoint. This configuration allows the endpoint to use any GPUs that are available. You can also replace `any` with a comma-separated list of [GPU IDs](/references/gpu-types).
 - `workersMax=1`: Sets the maximum number of worker instances to 1.
 - `name="tetra_gpu"`: The name of the endpoint that will be created/used on the RunPod web interface.
diff --git a/docs/tetra/overview.md b/docs/tetra/overview.md
index b60092c9..1aadefd5 100644
--- a/docs/tetra/overview.md
+++ b/docs/tetra/overview.md
@@ -32,11 +32,7 @@ Tetra provides several advantages over vanilla Serverless:
 
 ### Resource configuration
 
-Tetra lets you specify hardware requirements at the function level through the `ServerlessResource` object. This provides granular control over:
-
-- GPU/CPU allocation.
-- Worker scaling limits.
-- Template selection.
+Tetra lets you specify hardware requirements at the function level through the `ServerlessResource` object. This provides granular control over GPU/CPU allocation and worker scaling limits.
For example: @@ -45,7 +41,6 @@ from tetra import ServerlessResource # Configure a GPU endpoint gpu_config = ServerlessResource( - templateId="abc123", # GPU template ID gpuIds="any", # Use any available GPU workersMax=5, # Scales up to 5 workers name="parallel-processor", # Name of the endpoint that will be created or used @@ -53,7 +48,6 @@ gpu_config = ServerlessResource( # Configure a CPU endpoint cpu_resource = ServerlessResource( - templateId="def456", # CPU template ID workersMax=1 , name="data-processor", # Name of the endpoint that will be created or used ) From e0fcac0aa34c783b64c760392959e0de8b657257 Mon Sep 17 00:00:00 2001 From: Mo King Date: Mon, 28 Apr 2025 11:08:56 -0400 Subject: [PATCH 11/11] Add configuration parameters --- docs/tetra/get-started.md | 9 ++------- docs/tetra/overview.md | 23 +++++++++++++++++++++-- 2 files changed, 23 insertions(+), 9 deletions(-) diff --git a/docs/tetra/get-started.md b/docs/tetra/get-started.md index e7398a05..970398ca 100644 --- a/docs/tetra/get-started.md +++ b/docs/tetra/get-started.md @@ -42,14 +42,9 @@ First, let's clone the Tetra repo and set up your virtual environment: git clone https://github.com/runpod/tetra-rp.git && cd tetra-rp ``` -2. Install dependencies with Poetry: +2. Install dependencies with `pip`: ```bash - poetry install - ``` - -3. Activate the virtual environment: - ```bash - $(poetry env activate) + pip install -r requirements.txt ``` ## Step 2: Add your API key to the environment diff --git a/docs/tetra/overview.md b/docs/tetra/overview.md index 1aadefd5..149ac03e 100644 --- a/docs/tetra/overview.md +++ b/docs/tetra/overview.md @@ -53,6 +53,8 @@ cpu_resource = ServerlessResource( ) ``` +See [Configuration parameters](#configuration-parameters) for a complete list of available settings. + ### Remote functions Remote functions are the building blocks of Tetra workflows. Simply mark any Python function with the `@remote` decorator to designate it for execution on RunPod's infrastructure: @@ -87,14 +89,14 @@ if __name__ == "__main__": asyncio.run(main()) ``` -### Dependencies +### Dependency management You can specify required Python dependencies for remote workers at the function level from within the `@remote` decorator, and Tetra ensures they will be installed in your execution environment: ```python @remote( resource_config=gpu_resource, - dependencies=["torch", "transformers", "pillow"] + dependencies=["torch==2.0.1", "transformers", "pillow"] ) def model_inference(data): # Libraries are automatically installed @@ -157,6 +159,23 @@ When you execute a Tetra workflow: - **Production inference systems**: Deploy sophisticated, multi-stage inference pipelines. - **Data processing workflows**: Process large datasets using distributed resources. 
+## Configuration parameters + +| Parameter | Description | Default | Example values | +|-----------|-------------|---------|---------------| +| `name` | (Required) Name for your endpoint | "" | `"stable-diffusion-server"` | +| `gpuIds` | Type of GPU to request | `"any"` | `"any"` or a list of comma-separated [GPU IDs](https://docs.runpod.io/references/gpu-types) | +| `gpuCount` | Number of GPUs per worker | 1 | 1, 2, 4 | +| `workersMin` | Minimum number of workers | 0 | Set to 1 for persistence | +| `workersMax` | Maximum number of workers | 3 | Higher for more concurrency | +| `idleTimeout` | Minutes before scaling down | 5 | 10, 30, 60 | +| `env` | Environment variables | None | `{"HF_TOKEN": "xyz"}` | +| `networkVolumeId` | Persistent storage ID | None | `"vol_abc123"` | +| `executionTimeoutMs` | Max execution time (milliseconds) | 0 (no limit) | 600000 (10 min) | +| `scalerType` | Scaling strategy | `QUEUE_DELAY` | `NONE`, `QUEUE_SIZE` | +| `scalerValue` | Scaling parameter value | 4 | 1-10 range typical | +| `locations` | Preferred data center locations | None | `"us-east,eu-central"` | + ## Next steps Ready to streamline your AI workflow development with Tetra?