Add Neuron backend #3018
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
test_server: install_server
	python -m pip install -r ${mkfile_dir}/tests/requirements.txt
	python -m pytest -sv ${mkfile_dir}/tests/server

test_integration: image
	python -m pip install -r ${mkfile_dir}/tests/requirements.txt
	python -m pytest -sv ${mkfile_dir}/tests/integration
We may want to use `uv` in the future to align with the CUDA backend. This can probably be addressed later (just adding a note for reference).
Once we have the runner ready, I think we'll need this to build the image:
diff --git a/.github/workflows/build.yaml b/.github/workflows/build.yaml
index 720a13cb..ae68c4c2 100644
--- a/.github/workflows/build.yaml
+++ b/.github/workflows/build.yaml
@@ -114,6 +114,16 @@ jobs:
export extra_pytest="-k test_flash_gemma_simple"
export target=""
;;
+ neuron)
+ export dockerfile="Dockerfile_neuron"
+ export label_extension="-neuron"
+ export docker_devices="none"
+ export docker_volume="/mnt/cache"
+ # export runs_on="aws-neuron-priv"
+ export platform="cpu"
+ export extra_pytest="-k test_flash_llama_simple"
+ export target=""
+ ;;
esac
echo $dockerfile
echo "Dockerfile=${dockerfile}"
The base image used to compile the Rust components seems to have a low ulimit for open files, which leads to errors during compilation.
I added you as a writer to the repo, so you can now push to the main repo and have a working CI.
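If the open-file limit remains a problem on other build hosts, one common workaround is to raise the soft limit (up to the hard limit) in the same shell that runs the compilation; in a Dockerfile that means the same `RUN` instruction as the `cargo` build. A minimal sketch, assuming bash; the helper name `raise_nofile_limit` and the 65536 default are illustrative choices, not something this PR uses:

```shell
#!/usr/bin/env bash
# Hypothetical helper: raise the soft open-file limit (RLIMIT_NOFILE) of the
# current shell; processes started afterwards (cargo, rustc, ...) inherit it.
raise_nofile_limit() {
  local wanted="${1:-65536}"   # illustrative default, not taken from the PR
  local soft hard
  soft="$(ulimit -Sn)"
  hard="$(ulimit -Hn)"
  # Never lower a soft limit that is already sufficient (or unlimited).
  if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$wanted" ]; then
    return 0
  fi
  # Without extra privileges, the soft limit can only go up to the hard limit.
  if [ "$hard" != "unlimited" ] && [ "$wanted" -gt "$hard" ]; then
    wanted="$hard"
  fi
  ulimit -Sn "$wanted"
}

echo "soft limit before: $(ulimit -Sn)"
raise_nofile_limit || echo "could not raise the open-file limit" >&2
echo "soft limit after:  $(ulimit -Sn)"
```

Raising only the soft limit keeps the change unprivileged; going beyond the hard limit would need changes on the host or container runtime side.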
    && rm -rf /var/lib/apt/lists/* \
    && apt-get clean

RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- --default-toolchain 1.80.1 --profile minimal -y
Nit: maybe we shouldn't download a random script and pipe it into a shell (I know it's done elsewhere, so this isn't really blocking; I'm just thinking about security and reproducibility here).
I will remove this in an upcoming pull request that switches to Python 3.11 and aligns the Dockerfiles.
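One way to address the reproducibility concern is to download the installer to a file, check it against a pinned SHA-256 digest, and only then execute it. A minimal sketch, assuming GNU coreutils' `sha256sum` is available in the image; the helper name `verify_sha256` is hypothetical, and the PR does not pin any digest, so the value in the usage comment is a placeholder:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if FILE's SHA-256 digest equals EXPECTED.
verify_sha256() {
  local file="$1" expected="$2"
  # `sha256sum --check` reads "<digest>  <path>" lines; --status suppresses output
  # so only the exit code reports whether the digest matched.
  printf '%s  %s\n' "$expected" "$file" | sha256sum --check --status -
}

# Intended use (the digest is a placeholder, not a real rustup-init hash):
#   curl --proto '=https' --tlsv1.2 -sSf -o rustup-init \
#     https://static.rust-lang.org/rustup/dist/x86_64-unknown-linux-gnu/rustup-init
#   verify_sha256 rustup-init "<pinned sha256>" || exit 1
#   chmod +x rustup-init && ./rustup-init -y --profile minimal --default-toolchain 1.80.1
```

The digest only adds security if it comes from a trusted source and is committed alongside the Dockerfile; fetching it from the same server at build time would mostly catch corrupted downloads.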
WORKDIR /usr/src

ARG CARGO_REGISTRIES_CRATES_IO_PROTOCOL=sparse
That shouldn't be needed anymore.
I kept it because it is still in the main Dockerfile. I will remove it in a subsequent pull request that aligns the Dockerfiles.
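For context on why the setting is likely redundant: Cargo made `sparse` the default crates.io protocol in Rust 1.70, and this Dockerfile installs 1.80.1. A small sketch of that version check, assuming GNU `sort -V`; the helper name `version_ge` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if dotted version $1 >= dotted version $2.
version_ge() {
  # sort -V orders version strings; if $2 sorts first (or both are equal),
  # then $1 is greater than or equal to $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

toolchain="1.80.1"             # version passed to rustup in this Dockerfile
sparse_default_since="1.70.0"  # first Rust release with sparse as the default
if version_ge "$toolchain" "$sparse_default_since"; then
  echo "sparse is already the default; the ARG is likely redundant"
fi
```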
COPY backends backends
COPY launcher launcher
# Remove this line once TGI has fixed the conflict
RUN cargo update ureq --precise 2.9.7
Is there a PR for the fix?
This is obsolete: I removed it.
Lots of nits.
If the Docker build works, maybe we should add it to the CI? (That way we get released versions and so on.)
    -v $(pwd)/data:/data \
    --privileged \
    -e HF_TOKEN=${HF_TOKEN} \
    ghcr.io/huggingface/text-generation-inference:latest-neuron \
We never show `latest`: it causes way too much confusion.
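A simple guard against documenting or deploying a floating tag is to reject image references that carry no tag or a `latest`-style tag (including suffixed variants such as `latest-neuron`). A minimal sketch; the helper name `require_pinned_tag` and the example tags in the usage note are hypothetical, not actual TGI release tags:

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail if an image reference has no explicit tag or uses a
# floating "latest"-style tag (e.g. latest, latest-neuron).
require_pinned_tag() {
  local ref="$1"
  # A digest reference (name@sha256:...) is fully pinned.
  case "$ref" in *@sha256:*) return 0 ;; esac
  # Drop registry and namespace first: a registry host may contain a port.
  local last="${ref##*/}"
  if [[ "$last" != *:* ]]; then
    echo "no tag in '$ref' (Docker defaults to latest)" >&2
    return 1
  fi
  local tag="${last##*:}"
  case "$tag" in
    latest|latest-*) echo "floating tag '$tag' in '$ref'" >&2; return 1 ;;
  esac
  return 0
}
```

For example, `require_pinned_tag ghcr.io/huggingface/text-generation-inference:latest-neuron` fails, while a made-up pinned tag such as `...:1.2.3-neuron` passes.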
    -v $(pwd)/data:/data \
    --device=/dev/neuron0 \
    -e HF_TOKEN=${HF_TOKEN} \
    ghcr.io/huggingface/text-generation-inference:latest-neuron \
Same here.
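As an alternative to `--privileged`, the snippet with `--device=/dev/neuron0` passes the Neuron device explicitly. On instances exposing several devices, the flags could be generated from whatever `/dev/neuron*` nodes exist. A minimal sketch; the helper name `neuron_device_flags` is hypothetical, and the image in the usage comment is left as a placeholder:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one --device flag per Neuron device node, so that
# `docker run` gets explicit devices instead of --privileged.
neuron_device_flags() {
  local dev
  local devices=("$@")   # device paths may be passed explicitly
  if [ "${#devices[@]}" -eq 0 ]; then
    # Otherwise use the /dev/neuron* nodes on this host (none -> no output).
    shopt -s nullglob
    devices=(/dev/neuron*)
    shopt -u nullglob
  fi
  for dev in "${devices[@]}"; do
    printf -- '--device=%s\n' "$dev"
  done
}

# Usage sketch:
#   mapfile -t flags < <(neuron_device_flags)
#   docker run -p 8080:80 -v "$(pwd)/data:/data" "${flags[@]}" \
#     -e HF_TOKEN="${HF_TOKEN}" <pinned-neuron-image>
```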
```
docker run -p 8080:80 \
    -v $(pwd)/data:/data \
    --privileged \
Is `--privileged` really required? It looks scary.
Updated the documentation.
    -e HF_TOKEN=${HF_TOKEN} \
    -e HF_AUTO_CAST_TYPE="fp16" \
    -e HF_NUM_CORES=2 \
    ghcr.io/huggingface/text-generation-inference:latest-neuron:latest \
No latest.
    "backends/grpc-metadata",
    "launcher",
    "router"
]
This should probably be cleaned up at some point: it is now part of the root workspace, not a workspace itself.
I would need to take a deeper look at this in an upcoming pull request.
Closing this, as I addressed the review comments in #3033.
What does this PR do?
This adds the neuron backend that was previously maintained in the optimum-neuron repository.
This backend is built on top of the AWS Neuron SDK and comprises:
Documentation
A dedicated documentation page has been added in the backends subsection.
Tests
The backend comes with some dedicated tests:
I have not added continuous integration yet: I am waiting for advice on how best to do it.
Next steps