[bug]: segmentation fault on AMD GPU #2894
Comments
I had the same issue with Nvidia, actually. Segmentation errors left and right. Manjaro Arch Linux, installing using the auto installer. Found out I had to delete the whole installation and reinstall using Python. Sucks, but it actually worked. Not sure how or why. Working now, but now I have a different issue with blank black images.
@Lolagatorade I believe I got to the bottom of the black images problem earlier today and have posted a fix which will appear in 2.3.2 (coming soon). @src-r-r I feel your pain. ROCm support is very spotty and I've had numerous difficulties with AMD GPUs. Generally the problem is with the
There has been no activity in this issue for 14 days. If this issue is still being experienced, please reply with an updated confirmation that the issue is still being experienced with the latest release.
I have the same issue. I also noticed this error message in
I managed to make it work by making sure I install the nightly (ROCm) build of PyTorch:

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.4.2

Then I install invokeai from the repo with
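For anyone following this, a quick post-install sanity check can confirm that the ROCm nightly wheel is actually the one being picked up (a sketch; the exact version strings will differ on your system):

```bash
# Verify the ROCm build of PyTorch is installed and can see the GPU.
python3 -c "import torch; print(torch.__version__)"          # expect a +rocmX.Y suffix
python3 -c "import torch; print(torch.version.hip)"          # HIP version the wheel targets (None on CUDA builds)
python3 -c "import torch; print(torch.cuda.is_available())"  # True if the ROCm backend detects the GPU
```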
It is also segfaulting for me on Arch Linux using an AMD Radeon 5500XT. I tried using the auto installer as well as installing it manually with Python (both with Python 3.9.57 and 3.10.12), but no difference. Unfortunately @muhamadazmy's suggestion above also did not work for me. Each try, this was the log output from the time starting
What is the reason this issue has been closed, @hipsterusername?
We’ve released the 3.0 alpha and it’s a general reset on any issues experienced with the app. If you experience the same segfaulting, I’d advise creating a new issue.
Does anyone have updates on this? I still have the problem with 3.1.1.
@TheKarls As far as I know this is still somewhat unsolved. Personally, I prevented InvokeAI from segfaulting by setting an environment variable. Also, what hardware configuration and OS are you using? That will be helpful for the InvokeAI team to work on this.
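For reference, a rough set of commands to collect that kind of information (a sketch; it assumes rocminfo from the ROCm packages is installed):

```bash
# Gather the basics that help when debugging ROCm segfaults.
uname -srm                                   # kernel version and architecture
rocminfo | grep -E "Marketing|gfx"           # GPU product name and gfx target (e.g. gfx1031 for a 6700 XT)
python3 -c "import torch; print(torch.__version__, torch.version.hip)"
echo "HSA_OVERRIDE_GFX_VERSION=${HSA_OVERRIDE_GFX_VERSION:-<unset>}"   # any override currently in effect
```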
I'm on Arch Linux, with a 6700XT GPU.
After some puzzling I got it to work on my 7900XTX on Arch with kernel 6.5.6 and 7837746 (latest main when I checked out, though tag 3.2.0 will probably work too), using a heavily modified Docker setup:
# syntax=docker/dockerfile:1.4
# Build the Web UI
FROM node:18 AS web-builder
WORKDIR /build
COPY invokeai/frontend/web/ ./
RUN --mount=type=cache,target=/usr/lib/node_modules \
npm install --include dev
RUN --mount=type=cache,target=/usr/lib/node_modules \
yarn vite build
# InvokeAI runtime for AMD cards
FROM rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1 AS runtime
ARG DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
RUN apt update && apt install -y --no-install-recommends \
        git \
        curl \
        vim \
        tmux \
        ncdu \
        iotop \
        bzip2 \
        gosu \
        libglib2.0-0 \
        libgl1-mesa-glx \
        python3-pip \
        build-essential \
        libopencv-dev \
        libstdc++-10-dev && \
    apt-get clean && apt-get autoclean && \
    pip install --upgrade pip
ENV INVOKEAI_SRC=/opt/invokeai
ENV INVOKEAI_ROOT=/invokeai
ENV PATH="$INVOKEAI_SRC:$PATH"
WORKDIR ${INVOKEAI_SRC}
COPY invokeai ./invokeai
COPY pyproject.toml ./
RUN --mount=type=cache,target=/root/.cache/pip pip install .[onnx-cuda]
COPY --link --from=web-builder /build/dist ${INVOKEAI_SRC}/invokeai/frontend/web/dist
# build patchmatch
RUN cd /usr/lib/$(uname -p)-linux-gnu/pkgconfig/ && ln -sf opencv4.pc opencv.pc
RUN python3 -c "from patchmatch import patch_match"
# Create unprivileged user and make the local dir
RUN userdel $(getent passwd 1000 | cut -d: -f1) && useradd --create-home --shell /bin/bash -u 1000 -G video --comment "container local user" invoke
RUN mkdir -p ${INVOKEAI_ROOT} && chown -R invoke:invoke ${INVOKEAI_ROOT}
COPY docker/docker-entrypoint.sh ./
ENTRYPOINT ["/opt/invokeai/docker-entrypoint.sh"]
CMD ["invokeai-web", "--host", "0.0.0.0"]
version: '3.8'
services:
  invokeai:
    build:
      context: ..
      dockerfile: docker/Dockerfile
    environment:
      HSA_OVERRIDE_GFX_VERSION: 11.0.0
    devices:
      - /dev/dri:/dev/dri
      - /dev/kfd:/dev/kfd
    ports:
      - 9090:9090/tcp
    volumes:
      - ./data:/invokeai
    command: ["invokeai-web", "--host", "0.0.0.0"]

Biggest difference is installing in the global env (not a venv) in the AMD ROCm Docker image. I suppose those have some extra customizations that fixed most of my issues. Anyway, sharing here in the hope it'll prove useful for someone else.
It works! Thank you so much! I really can't thank you enough.
./invoke.sh: line 37: 26519 Segmentation fault (core dumped) invokeai-web $PARAMS
Possibly it is ROCm + an unsupported GPU. I also had segfaults on a couple of ROCm 5.6 + gfx803 cards (RX 570). People also say such cards give no real advantage over CPU generation anyway. For some cards there are nevertheless custom builds users can try.
I figured out there is no ROCm support for the 5700xt, unfortunately.
Is there an existing issue for this?
OS
Linux
GPU
amd
VRAM
8GB
What happened?
Program installs and starts just fine, but when I hit the "Invoke" button, I immediately get a segmentation fault.
I'm using v2.3.1-post-2
I've tried other models and get the same result. I do not have this issue with invokeai v1.3
I've been playing around with other AI libraries lately and been encountering segfaults, too. Still haven't figured out why in most cases.
Here's the output from rocminfo:
I also have HSA_OVERRIDE_GFX_VERSION=10.0.3 set. If I unset HSA_OVERRIDE_GFX_VERSION and start the invokeai web UI, I get the error "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
How would I determine where this segfault is coming from?
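For reference, two general ways to get more detail on a segfault inside a Python process, neither specific to InvokeAI:

```bash
# 1) Ask Python to dump its own traceback when the process receives SIGSEGV.
PYTHONFAULTHANDLER=1 invokeai-web

# 2) Run the same entry point under gdb to capture the native (C/HIP) backtrace.
#    The console script is a plain Python file, so hand it to the interpreter explicitly.
gdb -ex run --args python3 "$(which invokeai-web)"
# When it crashes, type "bt" at the (gdb) prompt to see which library faulted.
```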
Screenshots
No response
Additional context
No response
Contact Details
No response