198 changes: 198 additions & 0 deletions site/content/ai-suite/graphrag/private-llm-triton-tutorial.md
@@ -0,0 +1,198 @@
---
title: How to use GraphRAG with a Private LLM
menuTitle: Private LLM Tutorial
weight: 30
description: >-
  Learn how to create, configure, and run a full GraphRAG workflow
  using a private Large Language Model (LLM) and Triton Inference Server
---
{{< tip >}}
The Arango AI Data Platform is available as a pre-release. To get
exclusive early access, [get in touch](https://arango.ai/ai-preview/) with
the Arango team.
{{< /tip >}}

## Prerequisite: Get an LLM to host

If you already have an LLM, you can skip this step. If you are new to LLMs
(Large Language Models), this section explains how to get and prepare an
open-source LLM.

This tutorial downloads an open-source model from Hugging Face, but you can
use any other model provider.

### Install the Hugging Face CLI

Follow the official [Hugging Face guide](https://huggingface.co/docs/huggingface_hub/en/guides/cli)
to install the CLI.
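
For example, a common installation method is via `pip` (a sketch; pick whichever
method from the guide fits your environment):

```sh
pip install -U "huggingface_hub[cli]"
```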

You should now be able to run the `hf --help` command.

### Download a model

Pick the model you want to use. For demonstration purposes, this tutorial uses
a [Nemotron model](https://huggingface.co/nvidia/OpenReasoning-Nemotron-7B).

You can download it with the following command:

```sh
hf download nvidia/OpenReasoning-Nemotron-7B
```
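
By default, the files are stored in the Hugging Face cache. If you prefer a
specific local directory, the `hf download` command accepts a `--local-dir`
option, for example:

```sh
hf download nvidia/OpenReasoning-Nemotron-7B --local-dir ./OpenReasoning-Nemotron-7B
```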

Refer to the Hugging Face documentation for more details.

{{< info >}}
ArangoDB provides no further guarantees or guidance regarding the chosen LLM.
ArangoDB's goal is to work with any LLM available on the market.
{{< /info >}}

### Export model as ONNX

ONNX is an open standard that defines a common set of operators and a file format
to represent deep learning models across different frameworks. The Optimum library
exports models to ONNX using configuration objects, which are available for many
architectures and can easily be extended.

Follow the [Hugging Face guide](https://huggingface.co/docs/transformers/serialization)
to export the model to the ONNX format via Optimum.

After installing Optimum, run the following command:

```sh
optimum-cli export onnx --model nvidia/OpenReasoning-Nemotron-7B MyModel
```

{{< tip >}}
Replace `MyModel` with a name of your choice.
{{< /tip >}}

This exports the model to the ONNX format, which is currently required.

## Prepare the necessary files

You need two files for the model to work:
- Triton configuration file: `config.pbtxt`
- Python backend file: `model.py`

{{< info >}}
Currently, only the Python backend of Triton is supported with the rest of the
GenAI services. Other operating modes will be added in future versions.
{{< /info >}}

### Triton configuration file

To ensure compatibility with the Triton service, you need the following configuration
file, `config.pbtxt`, which must be placed next to your model folder:

```
name: "MyModel" # Set the name to the one you chose previously
backend: "python"
input [
  {
    name: "INPUT0"
    data_type: TYPE_STRING
    dims: [-1]
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_STRING
    dims: [-1]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]
```

This configuration defines the display name of the model, specifies the use of
the Python backend, and sets the input and output to string tokens for text
generation. It also configures the model to use one GPU on the Triton server.

### Triton Python backend

Next, you need to implement the Python backend code that handles text
tokenization and generation within the Triton server.

To do so, place a file named `model.py` in your model folder with the following content:

```py
import ast
import json

import numpy as np
import triton_python_backend_utils as pb_utils
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

class TritonPythonModel:
    def initialize(self, args):
        # Load the tokenizer and model from the versioned model directory
        model_path = args['model_repository'] + "/" + args['model_version'] + "/"
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.model = AutoModelForCausalLM.from_pretrained(model_path)
        self.pipe = pipeline("text-generation", model=self.model, tokenizer=self.tokenizer, batch_size=1)

    def execute(self, requests):
        responses = []
        for request in requests:
            # Read the raw request payload as a UTF-8 string
            in_0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            input_text = in_0.as_numpy()[0].decode('utf-8')

            # Parse the chat messages; fall back to a Python literal
            # (e.g. single-quoted dicts) if the input is not valid JSON
            try:
                input_data = json.loads(input_text)
            except json.JSONDecodeError:
                input_data = ast.literal_eval(input_text)

            # Render the chat template into a prompt and generate a completion
            prompt = self.tokenizer.apply_chat_template(input_data, tokenize=False, add_generation_prompt=True)
            output = self.pipe(prompt, max_new_tokens=1024, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
            generated_text = output[0]['generated_text'][len(prompt):].strip()

            # Return only the newly generated text, without the prompt
            out_tensor = pb_utils.Tensor("OUTPUT0", np.array([generated_text], dtype=object))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses
```

The above code is generic and should work for most causal language models. Check
the Hugging Face Transformers documentation to see whether your model supports
`AutoModelForCausalLM`. If not, you need to adjust this file. You may also need
to adjust it if you want to fine-tune the configuration of your model. This
tutorial prioritizes a plug-and-play workflow that works in most common scenarios
over fine-tuning for maximum performance.

### Model directory structure

After preparing these files, your directory structure should look similar to this:

```
.
├── config.pbtxt
└── MyModel
├── added_tokens.json
├── chat_template.jinja
├── config.json
├── generation_config.json
├── merges.txt
├── model.onnx
├── model.onnx_data
├── model.py
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
└── vocab.json
```

Now you are ready to upload the model.

### Upload the model to MLflow

First, you need to install the MLflow CLI:

```sh
pip install mlflow==2.22.1
```

{{< warning >}}
MLflow version 3 introduces a breaking change that affects this workflow, so it is
important to use MLflow version 2.
{{< /warning >}}
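
The exact upload procedure depends on your ArangoDB MLflow service; see the
MLflow reference documentation for configuring the tracking URI and bearer
token. As a minimal sketch, assuming the prepared files live in the current
directory and using placeholder values for the tracking URI and token, logging
them as run artifacts can look like this:

```py
import os

import mlflow

# Placeholder values; use your actual MLflow service URL and token
mlflow.set_tracking_uri("https://your-deployment/mlflow")
os.environ["MLFLOW_TRACKING_TOKEN"] = "your-bearer-token-here"

# Log config.pbtxt and the MyModel folder as artifacts of a new run
with mlflow.start_run(run_name="MyModel-upload"):
    mlflow.log_artifact("config.pbtxt")
    mlflow.log_artifacts("MyModel", artifact_path="MyModel")
```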
4 changes: 2 additions & 2 deletions site/content/ai-suite/reference/mlflow.md
@@ -112,7 +112,7 @@ There are two approaches for programmatic access to your ArangoDB MLflow service

### Configuration in Python

-```python
+```py
import mlflow
import os

@@ -136,7 +136,7 @@ export MLFLOW_TRACKING_TOKEN="your-bearer-token-here"

Then use MLflow normally in your Python code:

-```python
+```py
import mlflow

# MLflow automatically uses the environment variables
2 changes: 1 addition & 1 deletion site/content/ai-suite/reference/triton-inference-server.md
@@ -95,7 +95,7 @@ Triton service. Each model requires the following two files:
1. **`model.py`**
Implements the Python backend model. Triton uses this file to load and
execute your model for inference.
-```python
+```py
class TritonPythonModel:
def initialize(self, args):
# Load your model here
30 changes: 15 additions & 15 deletions site/content/arangodb/3.11/develop/drivers/python.md
@@ -27,7 +27,7 @@ pip install python-arango --upgrade

You can then import the library in your project as follows:

-```python
+```py
from arango import ArangoClient
```

@@ -37,7 +37,7 @@ The following example shows how to use the driver, from connecting to ArangoDB,
through creating databases, collections, indexes, and documents, to retrieving
data using queries:

-```python
+```py
from arango import ArangoClient

# Initialize the client for ArangoDB.
@@ -71,7 +71,7 @@ student_names = [document["name"] for document in cursor]
The following example shows how to create a [named graph](../../graphs/_index.md),
populate it with vertices and edges, and query it with a graph traversal:

-```python
+```py
from arango import ArangoClient

# Initialize the client for ArangoDB.
@@ -134,7 +134,7 @@ To connect to a database, create an instance of `ArangoClient` which provides a
connection to the database server. Then call its `db` method and pass the
database name, user name, and password as parameters.

-```python
+```py
from arango import ArangoClient

# Initialize a client
@@ -149,7 +149,7 @@ sys_db = client.db("_system", username="root", password="qwerty")
To retrieve a list of all databases on an ArangoDB server, connect to the
`_system` database and call the `databases()` method.

-```python
+```py
# Retrieve the names of all databases on the server as list of strings
db_list = sys_db.databases()
```
@@ -159,7 +159,7 @@ db_list = sys_db.databases()
To create a new database, connect to the `_system` database and call
`create_database()`.

-```python
+```py
# Create a new database named "test".
sys_db.create_database("test")

@@ -174,7 +174,7 @@ To delete an existing database, connect to the `_system` database and call
parameter. The `_system` database cannot be deleted. Make sure to specify
the correct database name when you are deleting databases.

-```python
+```py
# Delete the 'test' database
sys_db.delete_database("test")
```
@@ -186,7 +186,7 @@ sys_db.delete_database("test")
To retrieve a list of collections in a database, connect to the database and
call `collections()`.

-```python
+```py
# Connect to the database
db = client.db(db_name, username=user_name, password=pass_word)

@@ -198,7 +198,7 @@ collection_list = db.collections()

To create a new collection, connect to the database and call `create_collection()`.

-```python
+```py
# Create a new collection for doctors
doctors_col = db.create_collection(name="doctors")

@@ -212,7 +212,7 @@ To delete a collection, connect to the database and call `delete_collection()`,
passing the name of the collection to be deleted as a parameter. Make sure to
specify the correct collection name when you delete collections.

-```python
+```py
# Delete the 'doctors' collection
db.delete_collection(name="doctors")
```
@@ -225,7 +225,7 @@ To create a new document, get a reference to the collection and call its
`insert()` method, passing the object/document to be created in ArangoDB as
a parameter.

-```python
+```py
# Get a reference to the 'patients' collection
patients_col = db.collection(name="patients")

@@ -252,7 +252,7 @@ To patch or partially update a document, call the `update()` method of the
collection and pass the object/document as a parameter. The document must have
a property named `_key` holding the unique key assigned to the document.

-```python
+```py
# Patch John's patient record by adding a city property to the document
patients_col.update({ "_key": "741603", "city": "Cleveland" })
```
@@ -280,7 +280,7 @@ collection and pass the object/document that fully replaces the existing
document as a parameter. The document must have a property named `_key` holding
the unique key assigned to the document.

-```python
+```py
# Replace John's document
patients_col.replace({ "_key": "741603", "fullname": "John Doe", "age": 18, "city": "Cleveland" })
```
@@ -306,7 +306,7 @@ not specified in the request when the document was fully replaced.
To delete a document, call the `delete()` method of the collection and pass a
document containing at least the `_key` attribute as a parameter.

-```python
+```py
# Delete John's document
patients_col.delete({ "_key": "741603" })
```
@@ -319,7 +319,7 @@ To run a query, connect to the desired database and call `aql.execute()`.
This returns a cursor, which lets you fetch the results in batches. You can
iterate over the cursor to automatically fetch the data.

-```python
+```py
# Run a query
cursor = db.aql.execute('FOR i IN 1..@value RETURN i', bind_vars={'value': 3})
