diff --git a/docs/tetra/get-started.md b/docs/tetra/get-started.md
new file mode 100644
index 0000000..970398c
--- /dev/null
+++ b/docs/tetra/get-started.md
@@ -0,0 +1,337 @@
+---
+title: Get started
+description: "Learn how to set up your Tetra development environment to seamlessly run AI workloads using RunPod Serverless resources."
+sidebar_position: 2
+---
+
+# Get started with Tetra
+
+Learn how to set up your Tetra development environment to seamlessly run AI workloads using [RunPod Serverless](/serverless/overview) resources.
+
+Tetra is a Python SDK that simplifies the deployment of AI workflows on RunPod by automating infrastructure management and worker communication. It lets you run code on RunPod compute resources without opening the RunPod web interface—just run your code locally and Tetra takes care of the rest.
+
+## What you'll learn
+
+In this tutorial you'll learn how to:
+
+- Set up your development environment for Tetra.
+- Configure a [Serverless endpoint](/serverless/endpoints/overview) using a `ServerlessResource` object.
+- Create and define remote functions with the `@remote` decorator.
+- Deploy a GPU-based Tetra workload using RunPod resources.
+- Pass data between your local environment and remote workers.
+- Understand how Tetra manages remote execution.
+
+## Requirements
+
+- You've [created a RunPod account](/get-started/manage-accounts).
+- You've created a [RunPod API key](/get-started/api-keys).
+- You've installed [Python 3.9 - 3.12](https://www.python.org/downloads/) and [Poetry](https://python-poetry.org/) (for dependency management).
+
+:::note
+
+If you have a later version of Python installed (> 3.12), you can use [pyenv](https://github.com/pyenv/pyenv) to switch to an earlier one.
+
+:::
+
+## Step 1: Install Tetra
+
+First, clone the Tetra repository and install its dependencies:
+
+1. Run this command to clone the Tetra repository:
+
+   ```bash
+   git clone https://github.com/runpod/tetra-rp.git && cd tetra-rp
+   ```
+
+2. Install dependencies with `pip`:
+
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+## Step 2: Add your API key to the environment
+
+You'll need to add your [RunPod API key](/get-started/api-keys) to your development environment before you can use Tetra to run your workloads.
+
+Run this command to create a `.env` file, replacing `[YOUR_API_KEY]` with your RunPod API key:
+
+```bash
+touch .env && echo "RUNPOD_API_KEY=[YOUR_API_KEY]" > .env
+```
+
+:::note
+
+You can create this file in your project's root directory or in the `/examples` folder. Just make sure your `.env` file is in the same folder as the Python file you create in the next step.
+
+:::
+
+## Step 3: Create your project file
+
+Now you're ready to start building your Tetra project. Create a new file called `matrix_operations.py` in the same directory as your `.env` file:
+
+```bash
+touch matrix_operations.py
+```
+
+Open this file in your preferred code editor. We'll walk through building it out step-by-step, implementing a simple matrix multiplication example that demonstrates Tetra's remote execution and parallel processing capabilities.
+
+## Step 4: Add imports and load the .env file
+
+First, add the necessary import statements:
+
+```python
+import asyncio
+from dotenv import load_dotenv
+from tetra import remote, ServerlessResource
+
+# Load environment variables from .env file
+load_dotenv()
+```
+
+This imports:
+
+- `asyncio`: Python's asynchronous programming library, which Tetra uses for non-blocking execution.
+- `dotenv`: Helps load environment variables from your `.env` file, including your RunPod API key.
+- `remote` and `ServerlessResource`: The core Tetra components you'll use to define remote functions and their resource requirements.
+
+`load_dotenv()` reads your API key from the `.env` file and makes it available to Tetra.
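+If you want to confirm that your key loads correctly before going any further, you can run a quick check. This is an optional sketch that uses only `python-dotenv` and the standard library, and assumes the `RUNPOD_API_KEY` variable you defined in step 2:
+
+```python
+import os
+from dotenv import load_dotenv
+
+# Read variables from the .env file in the current directory
+load_dotenv()
+
+# Fail fast if the key is missing or still set to the placeholder
+key = os.environ.get("RUNPOD_API_KEY")
+if not key or key == "[YOUR_API_KEY]":
+    raise SystemExit("RUNPOD_API_KEY is not set. Check your .env file.")
+print("RunPod API key loaded.")
+```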
+## Step 5: Add Serverless endpoint configuration
+
+Next, let's define the Serverless endpoint configuration for our Tetra workload:
+
+```python
+# Configuration for a Serverless endpoint using GPU workers
+gpu_config = ServerlessResource(
+    gpuIds="any",  # Use any available GPU
+    workersMax=1,
+    name="tetra_gpu",
+)
+```
+
+This `ServerlessResource` object defines:
+
+- `gpuIds="any"`: The GPU IDs that workers on this endpoint can use. This configuration lets the endpoint use any available GPU. You can also replace `any` with a comma-separated list of [GPU IDs](/references/gpu-types).
+- `workersMax=1`: Sets the maximum number of worker instances to 1.
+- `name="tetra_gpu"`: The name of the endpoint that will be created or reused in the RunPod web interface.
+
+If you run a Tetra function with a `ServerlessResource` configuration identical to a prior run, RunPod reuses your existing endpoint rather than creating a new one. However, if any configuration value has changed (not only the `name` parameter), a new endpoint is created to match your updated requirements.
+
+## Step 6: Define your remote function
+
+Now, let's define the function that will run on the GPU worker:
+
+```python
+@remote(
+    resource_config=gpu_config,
+    dependencies=["numpy", "torch"]
+)
+def tetra_matrix_operations(size):
+    """Perform large matrix operations using NumPy and check GPU availability."""
+    import numpy as np
+    import torch
+
+    # Get GPU count and name
+    device_count = torch.cuda.device_count()
+    device_name = torch.cuda.get_device_name(0)
+
+    # Create large random matrices
+    A = np.random.rand(size, size)
+    B = np.random.rand(size, size)
+
+    # Perform matrix multiplication
+    C = np.dot(A, B)
+
+    return {
+        "matrix_size": size,
+        "result_shape": C.shape,
+        "result_mean": float(np.mean(C)),
+        "result_std": float(np.std(C)),
+        "device_count": device_count,
+        "device_name": device_name
+    }
+```
+
+Let's break down this function:
+
+- `@remote`: This decorator marks the function to run on RunPod's infrastructure instead of locally.
+  - `resource_config=gpu_config`: The function will run using the GPU configuration we defined earlier.
+  - `dependencies=["numpy", "torch"]`: Lists the Python packages that must be installed on the remote worker.
+
+- The `tetra_matrix_operations` function itself:
+  - Gets GPU details using PyTorch's CUDA utilities.
+  - Creates two large random matrices using NumPy.
+  - Performs matrix multiplication.
+  - Returns statistics about the result and information about the GPU.
+
+Notice that we import `numpy` and `torch` inside the function, not at the top of the file. This is because these imports need to happen on the remote worker, not in your local environment.
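+One caveat: `torch.cuda.get_device_name(0)` raises an error when no GPU is visible. That can't happen with the GPU endpoint above, but if you later reuse this function with a CPU-only configuration, you can guard the CUDA calls. Here's a sketch of that variant (the name `tetra_matrix_operations_safe` is just for illustration):
+
+```python
+@remote(
+    resource_config=gpu_config,
+    dependencies=["numpy", "torch"]
+)
+def tetra_matrix_operations_safe(size):
+    """Like tetra_matrix_operations, but also works on CPU-only workers."""
+    import numpy as np
+    import torch
+
+    # Only query CUDA details if a GPU is actually available
+    if torch.cuda.is_available():
+        device_count = torch.cuda.device_count()
+        device_name = torch.cuda.get_device_name(0)
+    else:
+        device_count = 0
+        device_name = "CPU only"
+
+    A = np.random.rand(size, size)
+    B = np.random.rand(size, size)
+    C = np.dot(A, B)
+
+    return {
+        "matrix_size": size,
+        "result_shape": C.shape,
+        "result_mean": float(np.mean(C)),
+        "result_std": float(np.std(C)),
+        "device_count": device_count,
+        "device_name": device_name
+    }
+```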
+## Step 7: Add the main function
+
+Finally, add this `main` function to execute your GPU workload:
+
+```python
+async def main():
+    # Run the GPU matrix operations
+    print("Starting large matrix operations on GPU...")
+    result = await tetra_matrix_operations(1000)
+
+    # Print the results
+    print("\nMatrix operations results:")
+    print(f"Matrix size: {result['matrix_size']}x{result['matrix_size']}")
+    print(f"Result shape: {result['result_shape']}")
+    print(f"Result mean: {result['result_mean']:.4f}")
+    print(f"Result standard deviation: {result['result_std']:.4f}")
+
+    # Print GPU information
+    print("\nGPU Information:")
+    print(f"GPU device count: {result['device_count']}")
+    print(f"GPU device name: {result['device_name']}")
+
+if __name__ == "__main__":
+    asyncio.run(main())
+```
+
+This `main` function:
+
+- Calls the remote function with `await`, which runs it asynchronously on RunPod's infrastructure.
+- Prints the results of the matrix operations.
+- Displays information about the GPU that was used.
+
+The `asyncio.run(main())` line is Python's standard way to execute an asynchronous `main` function from synchronous code. It creates an event loop, runs the `main` function until completion, and then closes the loop.
+
+All code outside of the `@remote`-decorated function runs on your local machine. The `main` function acts as a bridge between your local environment and RunPod's cloud infrastructure, allowing you to:
+
+- Send input data to remote functions (in this case, the matrix size parameter).
+- Wait for remote execution to complete without blocking your local process.
+- Process the returned results locally once they're available.
+
+The `await` keyword is crucial here—it pauses execution of the `main` function until the remote operation completes, but doesn't block the entire Python process. This asynchronous pattern enables efficient resource utilization while maintaining a simple, sequential coding style.
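+Remote calls can fail for ordinary reasons: a network hiccup, a worker running out of memory, or a bad dependency list. If you want your script to handle this gracefully, you can wrap the call in a `try`/`except` block. This is an optional sketch of a variant `main` function, and it assumes that a failed remote job surfaces locally as a normal Python exception:
+
+```python
+async def main():
+    print("Starting large matrix operations on GPU...")
+    try:
+        # Assumption: a failed remote job raises a Python exception here
+        result = await tetra_matrix_operations(1000)
+    except Exception as err:
+        print(f"Remote execution failed: {err}")
+        return
+
+    print(f"Result mean: {result['result_mean']:.4f}")
+```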
+## Step 8: Run your GPU example
+
+Now you're ready to run the example:
+
+```bash
+python matrix_operations.py
+```
+
+You should see output similar to this:
+
+```
+Starting large matrix operations on GPU...
+Resource ServerlessResource_33e1fa59c64b611c66c5a778e120c522 already exists, reusing.
+Registering RunPod endpoint: server_ServerlessResource_33e1fa59c64b611c66c5a778e120c522 at https://api.runpod.ai/xvf32dan8rcilp
+Initialized RunPod stub for endpoint: https://api.runpod.ai/xvf32dan8rcilp (ID: xvf32dan8rcilp)
+Executing function on RunPod endpoint ID: xvf32dan8rcilp
+Initial job status: IN_QUEUE
+Job completed, output received
+
+Matrix operations results:
+Matrix size: 1000x1000
+Result shape: (1000, 1000)
+Result mean: 249.8286
+Result standard deviation: 6.8704
+
+GPU Information:
+GPU device count: 1
+GPU device name: NVIDIA GeForce RTX 4090
+```
+
+:::tip
+
+If you're having trouble running your code due to authentication issues:
+
+1. Verify your `.env` file is in the same directory as your `matrix_operations.py` file.
+2. Check that the API key in your `.env` file is correct and properly formatted.
+3. Alternatively, you can set the API key directly in your terminal with:
+
+   ```bash
+   export RUNPOD_API_KEY=[YOUR_API_KEY]
+   ```
+
+4. For Windows users:
+
+   ```cmd
+   set RUNPOD_API_KEY=[YOUR_API_KEY]
+   ```
+
+:::
+
+## Step 9: Understand what's happening
+
+When you run this script:
+
+1. Tetra reads your GPU resource configuration and provisions a worker on RunPod.
+2. It installs the required dependencies (NumPy and PyTorch) on the worker.
+3. Your `tetra_matrix_operations` function runs on the remote worker.
+4. The function creates and multiplies large matrices, then calculates statistics.
+5. Your local `main` function receives these results and displays them in your terminal.
+
+## Step 10: Run multiple operations in parallel
+
+Now let's see how easy it is to run multiple remote operations in parallel using Tetra.
+
+First, replace your `main` function with this code:
+
+```python
+async def main():
+    # Run multiple matrix operations in parallel
+    print("Starting large matrix operations on GPU...")
+
+    # Run all matrix operations in parallel
+    results = await asyncio.gather(
+        tetra_matrix_operations(500),
+        tetra_matrix_operations(1000),
+        tetra_matrix_operations(2000)
+    )
+
+    print("\nMatrix operations results:")
+    # Print the results for each matrix size
+    for result in results:
+        print(f"\nMatrix size: {result['matrix_size']}x{result['matrix_size']}")
+        print(f"Result shape: {result['result_shape']}")
+        print(f"Result mean: {result['result_mean']:.4f}")
+        print(f"Result standard deviation: {result['result_std']:.4f}")
+
+if __name__ == "__main__":
+    asyncio.run(main())
+```
+
+This new `main` function demonstrates Tetra's ability to run multiple operations in parallel using `asyncio.gather()`. Instead of running one matrix operation at a time, we're now launching three operations with different matrix sizes (500, 1000, and 2000) simultaneously. This parallel execution significantly improves efficiency when you have multiple independent tasks that can run concurrently, making better use of available GPU resources.
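+By default, `asyncio.gather()` raises the first exception it encounters, even if the other jobs succeed. If you'd rather collect whatever finishes, you can pass `return_exceptions=True`. This is an optional sketch that assumes failed Tetra calls raise exceptions, as in the earlier error-handling example; for the rest of this step, keep the plain `asyncio.gather()` call shown above:
+
+```python
+# Inside main(): failed jobs come back as exception objects instead of raising
+results = await asyncio.gather(
+    tetra_matrix_operations(500),
+    tetra_matrix_operations(1000),
+    tetra_matrix_operations(2000),
+    return_exceptions=True,
+)
+
+for result in results:
+    # Skip any job that failed instead of crashing the whole run
+    if isinstance(result, Exception):
+        print(f"A job failed: {result}")
+        continue
+    print(f"Matrix size: {result['matrix_size']}")
+```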
+Try running the example again:
+
+```bash
+python matrix_operations.py
+```
+
+You should now see results for all three matrix sizes after the operations have completed:
+
+```
+Initial job status: IN_QUEUE
+Initial job status: IN_QUEUE
+Initial job status: IN_QUEUE
+Job completed, output received
+Job completed, output received
+Job completed, output received
+
+Matrix size: 500x500
+Result shape: (500, 500)
+Result mean: 125.3097
+Result standard deviation: 5.0425
+
+Matrix size: 1000x1000
+Result shape: (1000, 1000)
+Result mean: 249.9442
+Result standard deviation: 7.1072
+
+Matrix size: 2000x2000
+Result shape: (2000, 2000)
+Result mean: 500.1321
+Result standard deviation: 9.8879
+```
+
+## Next steps
+
+Nicely done, you've successfully used Tetra to seamlessly run a GPU workload using RunPod resources!
+
+Now that you've learned the basics of Tetra, you can:
+
+- Create a workflow that chains functions together, passing data between them.
+- Explore more advanced PyTorch operations on the GPU.
+- Try different resource configurations to optimize performance.
diff --git a/docs/tetra/overview.md b/docs/tetra/overview.md
new file mode 100644
index 0000000..149ac03
--- /dev/null
+++ b/docs/tetra/overview.md
@@ -0,0 +1,184 @@
+---
+title: Overview
+description: "Tetra is a Python SDK that streamlines the development and deployment of AI workflows on RunPod's Serverless infrastructure."
+sidebar_position: 1
+---
+
+# Tetra overview
+
+Tetra is a Python SDK that streamlines the development and deployment of AI workflows on RunPod's [Serverless](/serverless/overview) infrastructure. It provides an abstraction layer that lets you define, execute, and monitor sophisticated AI pipelines through a declarative interface, eliminating infrastructure overhead.
+
+## Get started
+
+Learn how to build Tetra workflows that run in serial and parallel by following this [step-by-step tutorial](/tetra/get-started).
+
+You can also start by cloning the Tetra repository and running the examples inside:
+
+```
+git clone https://github.com/runpod/tetra-rp.git
+```
+
+## Why use Tetra?
+
+Tetra provides several advantages over using Serverless directly:
+
+- **Simplified workflow development**: Define AI pipelines in pure Python with minimal configuration, focusing on your logic rather than infrastructure details.
+- **Optimized resource utilization**: Specify hardware requirements at the function level for precise control over GPU and CPU allocation.
+- **Seamless deployment**: Tetra automatically handles RunPod Serverless infrastructure setup, worker communication, and data transfer.
+- **Reduced development overhead**: Skip the tedious process of writing handler code, building Docker containers, and managing endpoints for each worker.
+- **Intuitive programming model**: Use Python decorators to mark functions for remote execution.
+
+## Key concepts
+
+### Resource configuration
+
+Tetra lets you specify hardware requirements at the function level through the `ServerlessResource` object. This provides granular control over GPU/CPU allocation and worker scaling limits.
+
+For example:
+
+```python
+from tetra import ServerlessResource
+
+# Configure a GPU endpoint
+gpu_config = ServerlessResource(
+    gpuIds="any",  # Use any available GPU
+    workersMax=5,  # Scale up to 5 workers
+    name="parallel-processor",  # Name of the endpoint that will be created or reused
+)
+
+# Configure a CPU endpoint
+cpu_resource = ServerlessResource(
+    workersMax=1,
+    name="data-processor",  # Name of the endpoint that will be created or reused
+)
+```
+
+See [Configuration parameters](#configuration-parameters) for a complete list of available settings.
+
+### Remote functions
+
+Remote functions are the building blocks of Tetra workflows. Simply mark any Python function with the `@remote` decorator to designate it for execution on RunPod's infrastructure:
+
+```python
+from tetra import remote
+
+@remote(
+    resource_config=gpu_config,  # Uses a ServerlessResource object to set up an endpoint
+)
+def process_image(image_data):
+
+    # Code you add here will run remotely using RunPod Serverless
+
+    return results
+```
+
+### Remote/local data transfer
+
+Tetra makes it easy to pass data between your local environment and RunPod's infrastructure. A remote function can accept any serializable Python objects as input and return them as output:
+
+```python
+import asyncio
+
+async def main():
+    # Code you add here will run locally
+
+    image = ...  # Load an image from your local machine
+
+    print("Processing image...")
+    result = await process_image(image)  # Process the image remotely
+
+if __name__ == "__main__":
+    asyncio.run(main())
+```
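+For example, you could read a local image as raw bytes and hand those to the remote function. This sketch fills in the `image = ...` line above; the filename is just a placeholder, and bytes are used here because they serialize cleanly:
+
+```python
+import asyncio
+
+async def main():
+    # Read a local file as bytes (a simple, serializable format)
+    with open("input.jpg", "rb") as f:
+        image_bytes = f.read()
+
+    print("Processing image...")
+    result = await process_image(image_bytes)  # Runs remotely
+    print(result)
+
+if __name__ == "__main__":
+    asyncio.run(main())
+```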
+### Dependency management
+
+You can specify the Python dependencies that remote workers require at the function level, from within the `@remote` decorator, and Tetra ensures they're installed in the remote execution environment:
+
+```python
+@remote(
+    resource_config=gpu_resource,
+    dependencies=["torch==2.0.1", "transformers", "pillow"]
+)
+def model_inference(data):
+    # Libraries are automatically installed
+    from transformers import AutoModel
+    import torch
+    from PIL import Image
+    # ...
+```
+
+Make sure to include `import` statements *inside* any remote functions that require them.
+
+### Asynchronous execution
+
+Tetra workflows run asynchronously, making it easy to manage complex pipelines and run parallel processes:
+
+```python
+@remote(...)
+def preprocess_data(raw_data):
+    ...
+
+@remote(...)
+def model_inference(preprocessed):
+    ...
+
+@remote(...)
+def process_chunk(data):
+    ...
+
+async def main():
+    # Run remote functions in sequence
+    preprocessed = await preprocess_data(raw_data)
+    result = await model_inference(preprocessed)
+
+    # Or run them in parallel
+    results = await asyncio.gather(
+        process_chunk(data1),
+        process_chunk(data2),
+        process_chunk(data3)
+    )
+```
+
+## How Tetra works
+
+When you execute a Tetra workflow:
+
+1. The `@remote` decorator identifies functions designated for remote execution.
+2. Tetra analyzes the dependencies between functions to determine execution order.
+3. For each remote function:
+   - Tetra provisions the appropriate endpoint and worker resources on RunPod.
+   - Input data is serialized and transferred to the remote worker.
+   - The function executes on the remote infrastructure.
+   - Results are returned to your local environment.
+4. Data flows between functions as defined by your local code.
+
+## Common use cases
+
+- **Multi-modal AI pipelines**: Combine text, image, and audio models in unified workflows.
+- **Distributed model training**: Scale model training across multiple GPU workers.
+- **AI research experimentation**: Quickly prototype and test complex model combinations.
+- **Production inference systems**: Deploy sophisticated, multi-stage inference pipelines.
+- **Data processing workflows**: Process large datasets using distributed resources.
+
+## Configuration parameters
+
+| Parameter | Description | Default | Example values |
+|-----------|-------------|---------|----------------|
+| `name` | (Required) Name for your endpoint | `""` | `"stable-diffusion-server"` |
+| `gpuIds` | Type of GPU to request | `"any"` | `"any"` or a comma-separated list of [GPU IDs](https://docs.runpod.io/references/gpu-types) |
+| `gpuCount` | Number of GPUs per worker | 1 | 1, 2, 4 |
+| `workersMin` | Minimum number of workers | 0 | Set to 1 for persistence |
+| `workersMax` | Maximum number of workers | 3 | Higher for more concurrency |
+| `idleTimeout` | Minutes before scaling down | 5 | 10, 30, 60 |
+| `env` | Environment variables | None | `{"HF_TOKEN": "xyz"}` |
+| `networkVolumeId` | Persistent storage ID | None | `"vol_abc123"` |
+| `executionTimeoutMs` | Max execution time (milliseconds) | 0 (no limit) | 600000 (10 min) |
+| `scalerType` | Scaling strategy | `QUEUE_DELAY` | `NONE`, `QUEUE_SIZE` |
+| `scalerValue` | Scaling parameter value | 4 | Typically 1-10 |
+| `locations` | Preferred data center locations | None | `"us-east,eu-central"` |
+
+## Next steps
+
+Ready to streamline your AI workflow development with Tetra?
+
+- [Build your first Tetra workflow using this step-by-step tutorial.](/tetra/get-started)
+- [Clone the tetra-rp repository and test the files in the `/examples` folder.](https://github.com/runpod/tetra-rp)
\ No newline at end of file
diff --git a/sidebars.js b/sidebars.js
index 9e3e5fc..54408d8 100644
--- a/sidebars.js
+++ b/sidebars.js
@@ -43,6 +43,17 @@ module.exports = {
      },
    ],
  },
+  {
+    type: "category",
+    label: "Tetra",
+    collapsible: false,
+    items: [
+      {
+        type: "autogenerated",
+        dirName: "tetra",
+      },
+    ],
+  },
  {
    type: "category",
    label: "RunPod CLI",
@@ -54,6 +65,7 @@
      },
    ],
  },
+
  {
    type: "category",
    label: "Instant Clusters",