198 changes: 198 additions & 0 deletions site/content/ai-suite/graphrag/private-llm-triton-tutorial.md
@@ -0,0 +1,198 @@
---
title: How to use GraphRAG with a Private LLM
menuTitle: Private LLM Tutorial
weight: 30
description: >-
  Learn how to create, configure, and run a full GraphRAG workflow
  using a private Large Language Model (LLM) and Triton Inference Server
---
{{< tip >}}
The Arango AI Data Platform is available as a pre-release. To get
exclusive early access, [get in touch](https://arango.ai/ai-preview/) with
the Arango team.
{{< /tip >}}

## Prerequisite: Get an LLM to host

If you already have an LLM, you can skip this step. If you are new to LLMs
(Large Language Models), this section explains how to get and prepare an
open-source LLM.

This tutorial downloads an open-source model from Hugging Face, but you can
use any other model provider.

### Install the Hugging Face CLI

Follow the official [Hugging Face guide](https://huggingface.co/docs/huggingface_hub/en/guides/cli)
to install the CLI.
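
For example, a common installation method is via `pip` (a sketch; pick whichever
method from the guide fits your environment):

```sh
pip install -U "huggingface_hub[cli]"
```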

You should now be able to run the `hf --help` command.

### Download a model

Pick the model you want to use. For demonstration purposes, this tutorial uses
a [Nemotron model](https://huggingface.co/nvidia/OpenReasoning-Nemotron-7B).

You can download it with the following command:

```sh
hf download nvidia/OpenReasoning-Nemotron-7B
```
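
By default, the files are stored in the Hugging Face cache. If you prefer a
specific local directory, the `hf download` command accepts a `--local-dir`
option, for example:

```sh
hf download nvidia/OpenReasoning-Nemotron-7B --local-dir ./OpenReasoning-Nemotron-7B
```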

Refer to the Hugging Face documentation for more details.

{{< info >}}
ArangoDB provides no further guarantees or guidance regarding the chosen LLM.
ArangoDB's goal is to work with any LLM available on the market.
{{< /info >}}

### Export model as ONNX

ONNX is an open standard that defines a common set of operators and a file format
to represent deep learning models across different frameworks. The Optimum library
exports models to ONNX using configuration objects, which are available for many
architectures and can easily be extended.

Follow the [Hugging Face guide](https://huggingface.co/docs/transformers/serialization)
to export the model to the ONNX format via Optimum.

After installing Optimum, run the following command:

```sh
optimum-cli export onnx --model nvidia/OpenReasoning-Nemotron-7B MyModel
```

{{< tip >}}
Replace `MyModel` with a name of your choice.
{{< /tip >}}

This exports the model to the ONNX format, which is currently required.

## Prepare the necessary files

You need two files for the model to work:
- Triton configuration file: `config.pbtxt`
- Python backend file: `model.py`

{{< info >}}
Currently, only the Python backend of Triton is supported with the rest of the
GenAI services. Other operating modes will be added in future versions.
{{< /info >}}

### Triton configuration file

To ensure compatibility with the Triton service, you need the following configuration
file, `config.pbtxt`, which must be placed next to your model folder:

```
name: "MyModel" # Set the name to the one you chose previously
backend: "python"
input [
  {
    name: "INPUT0"
    data_type: TYPE_STRING
    dims: [-1]
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_STRING
    dims: [-1]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]
```

This configuration defines the display name of the model, specifies the use of
the Python backend, and sets the input and output to string tokens for text
generation. It also configures the model to use one GPU on the Triton server.

### Triton Python backend

Next, you need to implement the Python backend code that handles text
tokenization and generation within the Triton server.

To do so, place a file named `model.py` in your model folder with the following content:

```py
import ast
import json

import numpy as np
import triton_python_backend_utils as pb_utils
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

class TritonPythonModel:
    def initialize(self, args):
        # Load the tokenizer and model from the versioned model directory
        model_path = args['model_repository'] + "/" + args['model_version'] + "/"
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.model = AutoModelForCausalLM.from_pretrained(model_path)
        self.pipe = pipeline("text-generation", model=self.model, tokenizer=self.tokenizer, batch_size=1)

    def execute(self, requests):
        responses = []
        for request in requests:
            # Read the raw request payload as a UTF-8 string
            in_0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            input_text = in_0.as_numpy()[0].decode('utf-8')

            # Parse the chat messages; fall back to a Python literal
            # (e.g. single-quoted dicts) if the input is not valid JSON
            try:
                input_data = json.loads(input_text)
            except json.JSONDecodeError:
                input_data = ast.literal_eval(input_text)

            # Render the chat template into a prompt and generate a completion
            prompt = self.tokenizer.apply_chat_template(input_data, tokenize=False, add_generation_prompt=True)
            output = self.pipe(prompt, max_new_tokens=1024, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
            generated_text = output[0]['generated_text'][len(prompt):].strip()

            # Return only the newly generated text, without the prompt
            out_tensor = pb_utils.Tensor("OUTPUT0", np.array([generated_text], dtype=object))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses
```

The above code is generic and should work for most causal language models. Check
the Hugging Face Transformers documentation to see whether your model supports
`AutoModelForCausalLM`. If not, you need to adjust this file. You may also need
to adjust it if you want to fine-tune the configuration of your model. This
tutorial prioritizes a plug-and-play workflow that works in most common scenarios
over fine-tuning for maximum performance.

### Model directory structure

After preparing these files, your directory structure should look similar to this:

```
.
├── config.pbtxt
└── MyModel
├── added_tokens.json
├── chat_template.jinja
├── config.json
├── generation_config.json
├── merges.txt
├── model.onnx
├── model.onnx_data
├── model.py
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
└── vocab.json
```

Now you are ready to upload the model.

### Upload the model to MLflow

First, you need to install the MLflow CLI:

```sh
pip install mlflow==2.22.1
```

{{< warning >}}
MLflow version 3 introduces a breaking change that affects this workflow, so it is
important to use MLflow version 2.
{{< /warning >}}
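
The exact upload procedure depends on your ArangoDB MLflow service; see the
MLflow reference documentation for configuring the tracking URI and bearer
token. As a minimal sketch, assuming the prepared files live in the current
directory and using placeholder values for the tracking URI and token, logging
them as run artifacts can look like this:

```py
import os

import mlflow

# Placeholder values; use your actual MLflow service URL and token
mlflow.set_tracking_uri("https://your-deployment/mlflow")
os.environ["MLFLOW_TRACKING_TOKEN"] = "your-bearer-token-here"

# Log config.pbtxt and the MyModel folder as artifacts of a new run
with mlflow.start_run(run_name="MyModel-upload"):
    mlflow.log_artifact("config.pbtxt")
    mlflow.log_artifacts("MyModel", artifact_path="MyModel")
```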
4 changes: 2 additions & 2 deletions site/content/ai-suite/reference/mlflow.md
@@ -112,7 +112,7 @@ There are two approaches for programmatic access to your ArangoDB MLflow service

### Configuration in Python

-```python
+```py
import mlflow
import os

@@ -136,7 +136,7 @@ export MLFLOW_TRACKING_TOKEN="your-bearer-token-here"

Then use MLflow normally in your Python code:

-```python
+```py
import mlflow

# MLflow automatically uses the environment variables
2 changes: 1 addition & 1 deletion site/content/ai-suite/reference/triton-inference-server.md
@@ -95,7 +95,7 @@ Triton service. Each model requires the following two files:
1. **`model.py`**
Implements the Python backend model. Triton uses this file to load and
execute your model for inference.
-```python
+```py
class TritonPythonModel:
def initialize(self, args):
# Load your model here
30 changes: 15 additions & 15 deletions site/content/arangodb/3.11/develop/drivers/python.md
@@ -27,7 +27,7 @@ pip install python-arango --upgrade

You can then import the library in your project as follows:

-```python
+```py
from arango import ArangoClient
```

@@ -37,7 +37,7 @@ The following example shows how to use the driver, from connecting to ArangoDB,
through creating databases, collections, indexes, and documents, to retrieving
data using queries:

-```python
+```py
from arango import ArangoClient

# Initialize the client for ArangoDB.
@@ -71,7 +71,7 @@ student_names = [document["name"] for document in cursor]
The following example shows how to create a [named graph](../../graphs/_index.md),
populate it with vertices and edges, and query it with a graph traversal:

-```python
+```py
from arango import ArangoClient

# Initialize the client for ArangoDB.
@@ -134,7 +134,7 @@ To connect to a database, create an instance of `ArangoClient` which provides a
connection to the database server. Then call its `db` method and pass the
database name, user name, and password as parameters.

-```python
+```py
from arango import ArangoClient

# Initialize a client
@@ -149,7 +149,7 @@ sys_db = client.db("_system", username="root", password="qwerty")
To retrieve a list of all databases on an ArangoDB server, connect to the
`_system` database and call the `databases()` method.

-```python
+```py
# Retrieve the names of all databases on the server as list of strings
db_list = sys_db.databases()
```
@@ -159,7 +159,7 @@ db_list = sys_db.databases()
To create a new database, connect to the `_system` database and call
`create_database()`.

-```python
+```py
# Create a new database named "test".
sys_db.create_database("test")

@@ -174,7 +174,7 @@ To delete an existing database, connect to the `_system` database and call
parameter. The `_system` database cannot be deleted. Make sure to specify
the correct database name when you are deleting databases.

-```python
+```py
# Delete the 'test' database
sys_db.delete_database("test")
```
@@ -186,7 +186,7 @@ sys_db.delete_database("test")
To retrieve a list of collections in a database, connect to the database and
call `collections()`.

-```python
+```py
# Connect to the database
db = client.db(db_name, username=user_name, password=pass_word)

@@ -198,7 +198,7 @@ collection_list = db.collections()

To create a new collection, connect to the database and call `create_collection()`.

-```python
+```py
# Create a new collection for doctors
doctors_col = db.create_collection(name="doctors")

@@ -212,7 +212,7 @@ To delete a collection, connect to the database and call `delete_collection()`,
passing the name of the collection to be deleted as a parameter. Make sure to
specify the correct collection name when you delete collections.

-```python
+```py
# Delete the 'doctors' collection
db.delete_collection(name="doctors")
```
@@ -225,7 +225,7 @@ To create a new document, get a reference to the collection and call its
`insert()` method, passing the object/document to be created in ArangoDB as
a parameter.

-```python
+```py
# Get a reference to the 'patients' collection
patients_col = db.collection(name="patients")

@@ -252,7 +252,7 @@ To patch or partially update a document, call the `update()` method of the
collection and pass the object/document as a parameter. The document must have
a property named `_key` holding the unique key assigned to the document.

-```python
+```py
# Patch John's patient record by adding a city property to the document
patients_col.update({ "_key": "741603", "city": "Cleveland" })
```
@@ -280,7 +280,7 @@ collection and pass the object/document that fully replaces the existing
document as a parameter. The document must have a property named `_key` holding
the unique key assigned to the document.

-```python
+```py
# Replace John's document
patients_col.replace({ "_key": "741603", "fullname": "John Doe", "age": 18, "city": "Cleveland" })
```
@@ -306,7 +306,7 @@ not specified in the request when the document was fully replaced.
To delete a document, call the `delete()` method of the collection and pass a
document containing at least the `_key` attribute as a parameter.

-```python
+```py
# Delete John's document
patients_col.delete({ "_key": "741603" })
```
@@ -319,7 +319,7 @@ To run a query, connect to the desired database and call `aql.execute()`.
This returns a cursor, which lets you fetch the results in batches. You can
iterate over the cursor to automatically fetch the data.

-```python
+```py
# Run a query
cursor = db.aql.execute('FOR i IN 1..@value RETURN i', bind_vars={'value': 3})
