
Add Neuron backend #3018

Closed
wants to merge 4 commits into from
Conversation

Collaborator

@dacorvo dacorvo commented Feb 12, 2025

What does this PR do?

This adds the neuron backend that was previously maintained in the optimum-neuron repository.

This backend is built on top of the AWS Neuron SDK, and comprises:

  • the legacy v2 TGI launcher and router,
  • a Neuron-specific inference server for text-generation.

Documentation

A dedicated documentation page has been added in the backends subsection.

Tests

The backend comes with some dedicated tests:

  • neuron server tests (using only the server python package),
  • integration tests (using docker images).

I did not add continuous integration yet: I am waiting for advice on how best to do it.

Next steps

  • use python 3.11 and align on the main Dockerfile,
  • add a custom launcher that only exposes the relevant parameters and sets default values,
  • add a new router for servers that have a static batch size.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines +32 to +38
test_server: install_server
	python -m pip install -r ${mkfile_dir}/tests/requirements.txt
	python -m pytest -sv ${mkfile_dir}/tests/server

test_integration: image
	python -m pip install -r ${mkfile_dir}/tests/requirements.txt
	python -m pytest -sv ${mkfile_dir}/tests/integration
Collaborator

we may want to use uv in the future to align with the cuda backend.

this can probably be addressed later (just adding a note for reference)
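For reference, a hedged sketch of what a uv-based equivalent of the recipes above might run (this assumes `uv` is installed in the environment; `${mkfile_dir}` stands in for the Makefile variable, and nothing here is a committed design):

```shell
# Hypothetical uv-based equivalents of the pip/pytest recipes quoted above.
# Requires uv to be installed; ${mkfile_dir} mirrors the Makefile variable.
uv pip install -r ${mkfile_dir}/tests/requirements.txt
uv run pytest -sv ${mkfile_dir}/tests/server
```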

@drbh
Collaborator

drbh commented Feb 17, 2025

once we have the runner ready, I think we'll need this to build the Dockerfile_neuron file in CI via a change similar to:

diff --git a/.github/workflows/build.yaml b/.github/workflows/build.yaml
index 720a13cb..ae68c4c2 100644
--- a/.github/workflows/build.yaml
+++ b/.github/workflows/build.yaml
@@ -114,6 +114,16 @@ jobs:
                 export extra_pytest="-k test_flash_gemma_simple"
                 export target=""
                 ;;
+            neuron)
+                export dockerfile="Dockerfile_neuron"
+                export label_extension="-neuron"
+                export docker_devices="none"
+                export docker_volume="/mnt/cache"
+                # export runs_on="aws-neuron-priv"
+                export platform="cpu"
+                export extra_pytest="-k test_flash_llama_simple"
+                export target=""
+                ;;
           esac
           echo $dockerfile
           echo "Dockerfile=${dockerfile}"

@drbh drbh mentioned this pull request Feb 17, 2025
The base image used to compile the rust components seems to have a low
ulimit for open files, which leads to errors during compilation.
Collaborator

@Narsil Narsil left a comment

I added you as writer to the repo so you can now push on the main repo and have a working CI.

&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean

RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- --default-toolchain 1.80.1 --profile minimal -y
Collaborator

@Narsil Narsil Feb 18, 2025

NIT: Maybe we shouldn't download a random script and pipe it into a shell (I know it's done elsewhere, so this isn't really blocking; I'm just thinking about security and reproducibility here).
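One hedged alternative (a sketch, not a vetted recipe): download the installer to a file, verify a pinned checksum, and only then execute it. The digest below is a placeholder, not a real value.

```shell
# Sketch: fetch rustup-init explicitly instead of piping curl into sh.
# The sha256 is a placeholder; it must be replaced with the published
# digest for the pinned rustup-init release before use.
curl --proto '=https' --tlsv1.2 -sSf -o rustup-init \
    https://static.rust-lang.org/rustup/dist/x86_64-unknown-linux-gnu/rustup-init
echo "<expected-sha256>  rustup-init" | sha256sum -c -
chmod +x rustup-init
./rustup-init --default-toolchain 1.80.1 --profile minimal -y
```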

Collaborator Author

I will remove this in an upcoming pull-request to use python 3.11 and align the Dockerfiles.


WORKDIR /usr/src

ARG CARGO_REGISTRIES_CRATES_IO_PROTOCOL=sparse
Collaborator

That shouldn't be needed anymore.

Collaborator Author

I kept it because it is still in the main Dockerfile. Will remove it in a subsequent pull-request that aligns Dockerfiles.

COPY backends backends
COPY launcher launcher
# Remove this line once TGI has fixed the conflict
RUN cargo update ureq --precise 2.9.7
Collaborator

Is there a PR for the fix?

Collaborator Author

This is obsolete: I removed it.

Collaborator

@Narsil Narsil left a comment

Lots of nits.

If the docker build works, maybe we should add it to the CI, no? (So we have released versions and so on.)

-v $(pwd)/data:/data \
--privileged \
-e HF_TOKEN=${HF_TOKEN} \
ghcr.io/huggingface/text-generation-inference:latest-neuron \
Collaborator

We never show latest. Latest causes way too much confusion.

-v $(pwd)/data:/data \
--device=/dev/neuron0 \
-e HF_TOKEN=${HF_TOKEN} \
ghcr.io/huggingface/text-generation-inference:latest-neuron \
Collaborator

Same here.

docker run -p 8080:80 \
-v $(pwd)/data:/data \
--privileged \
Collaborator

Is --privileged really required? Looks scary.

Collaborator Author

Updated the documentation
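For reference, a hedged sketch of a less-privileged invocation, reusing the `--device` flag already shown in this PR's docs (the image tag and model id are placeholders, in line with the "no latest" comments; device names depend on the instance type):

```shell
# Sketch: expose a single Neuron device instead of running --privileged.
# <version> and <model-id> are placeholders, not real values.
docker run -p 8080:80 \
    -v $(pwd)/data:/data \
    --device=/dev/neuron0 \
    -e HF_TOKEN=${HF_TOKEN} \
    ghcr.io/huggingface/text-generation-inference:<version>-neuron \
    --model-id <model-id>
```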

-e HF_TOKEN=${HF_TOKEN} \
-e HF_AUTO_CAST_TYPE="fp16" \
-e HF_NUM_CORES=2 \
ghcr.io/huggingface/text-generation-inference:latest-neuron:latest \
Collaborator

No latest.

"backends/grpc-metadata",
"launcher",
"router"
]
Collaborator

Should probably be cleaned up at some point, this is now part of the root workspace, not a workspace itself.

Collaborator Author

I would need to take a deeper look in an upcoming pull-request.

@dacorvo dacorvo mentioned this pull request Feb 18, 2025
@dacorvo
Collaborator Author

dacorvo commented Feb 18, 2025

Closing this as I addressed the review comments in #3033

@dacorvo dacorvo closed this Feb 18, 2025