[bug]: Segmentation fault on image generation start (AMD) #3967

Open
redhelling21 opened this issue Jul 24, 2023 · 25 comments
Labels
bug Something isn't working

Comments

@redhelling21

Is there an existing issue for this?

  • I have searched the existing issues

OS

Linux

GPU

AMD

VRAM

8GB

What version did you experience this issue on?

3.0.0

What happened?

I tried installing via the automated installer and via manual installation. No matter what I try, when I click the "Invoke" button in the web GUI, I get a segmentation fault:

$ invokeai --web
[2023-07-24 23:32:06,280]::[InvokeAI]::INFO --> Patchmatch initialized
/home/hellong/.venv/lib/python3.10/site-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be removed in 0.17. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
warnings.warn(
INFO: Started server process [18287]
INFO: Waiting for application startup.
[2023-07-24 23:32:06,661]::[InvokeAI]::INFO --> InvokeAI version 3.0.0
[2023-07-24 23:32:06,661]::[InvokeAI]::INFO --> Root directory = /home/hellong/invokeai
[2023-07-24 23:32:06,662]::[InvokeAI]::INFO --> GPU device = cuda AMD Radeon RX 6700 XT
[2023-07-24 23:32:06,664]::[InvokeAI]::INFO --> Scanning /home/hellong/invokeai/models for new models
[2023-07-24 23:32:06,857]::[InvokeAI]::INFO --> Scanned 5 files and directories, imported 0 models
[2023-07-24 23:32:06,859]::[InvokeAI]::INFO --> Model manager service initialized
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:9090 (Press CTRL+C to quit)
INFO: 127.0.0.1:35052 - "GET /socket.io/?EIO=4&transport=polling&t=Oc9qHwH HTTP/1.1" 200 OK
INFO: 127.0.0.1:35052 - "POST /socket.io/?EIO=4&transport=polling&t=Oc9qHwJ&sid=ZXwRuIab-6GgOo1cAAAA HTTP/1.1" 200 OK
INFO: 127.0.0.1:35052 - "GET /socket.io/?EIO=4&transport=polling&t=Oc9qHwK&sid=ZXwRuIab-6GgOo1cAAAA HTTP/1.1" 200 OK
INFO: ('127.0.0.1', 35066) - "WebSocket /socket.io/?EIO=4&transport=websocket&sid=ZXwRuIab-6GgOo1cAAAA" [accepted]
INFO: connection open
INFO: 127.0.0.1:35052 - "GET /socket.io/?EIO=4&transport=polling&t=Oc9qHwM&sid=ZXwRuIab-6GgOo1cAAAA HTTP/1.1" 200 OK
INFO: 127.0.0.1:35052 - "POST /socket.io/?EIO=4&transport=polling&t=Oc9qHwW&sid=ZXwRuIab-6GgOo1cAAAA HTTP/1.1" 200 OK
INFO: 127.0.0.1:35052 - "GET /socket.io/?EIO=4&transport=polling&t=Oc9qHx3&sid=ZXwRuIab-6GgOo1cAAAA HTTP/1.1" 200 OK
INFO: 127.0.0.1:35052 - "GET /socket.io/?EIO=4&transport=polling&t=Oc9qHx5&sid=ZXwRuIab-6GgOo1cAAAA HTTP/1.1" 200 OK
INFO: 127.0.0.1:35052 - "POST /api/v1/sessions/ HTTP/1.1" 200 OK
INFO: 127.0.0.1:35052 - "PUT /api/v1/sessions/50d99cec-2fc6-4e59-9219-f7e9d0dbf159/invoke?all=true HTTP/1.1" 202 Accepted
[2023-07-24 23:32:13,517]::[InvokeAI]::INFO --> Loading model /home/hellong/invokeai/models/sd-1/main/stable-diffusion-v1-5, type sd-1:main:tokenizer
[2023-07-24 23:32:13,747]::[InvokeAI]::INFO --> Loading model /home/hellong/invokeai/models/sd-1/main/stable-diffusion-v1-5, type sd-1:main:text_encoder
Segmentation fault (core dumped)

Screenshots

No response

Additional context

Using ROCm 5.4.2, as recommended by the official PyTorch website.
GPU : AMD Radeon 6700 XT

Contact Details

No response

@redhelling21 redhelling21 added the bug Something isn't working label Jul 24, 2023
@redhelling21 redhelling21 changed the title [bug]: Segmentation fault on starting image generation [bug]: Segmentation fault on image generation start (AMD) Jul 24, 2023
@puresick

puresick commented Jul 26, 2023

Same happening to me with an AMD Radeon 5500 XT with 8GB of VRAM.

Something similar also happened to me pre-3.0, but that issue was closed when the open issues were reset for the 3.0 release: #2894 (comment)

@tokenwizard

I'm also having this issue. About 5-10 seconds after clicking the Invoke button, the console shows the segfault.

Freshly installed using the install script on Linux and using the Analog-Diffusion model.

System Specs are below.

Here is potentially relevant dmesg output:

[Wed Jul 26 08:23:30 2023] invokeai-web[1009479]: segfault at 20 ip 00007f4e27ab40a7 sp 00007f4b5dff9290 error 4 in libamdhip64.so[7f4e27a00000+3f3000] likely on CPU 12 (core 4, socket 0)
[Wed Jul 26 08:23:30 2023] Code: 8d 15 5d 6d 25 00 48 8d 3d f6 6c 25 00 be 32 00 00 00 e8 dc ed 1f 00 e8 c7 ed 1f 00 48 8b 45 b8 48 8b 50 28 4c 8b 24 da 31 c0 <41> 80 7c 24 20 00 74 11 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d
[Wed Jul 26 08:59:06 2023] invokeai-web[1012831]: segfault at 20 ip 00007fbf1fcb40a7 sp 00007fbc55ff9290 error 4 in libamdhip64.so[7fbf1fc00000+3f3000] likely on CPU 9 (core 1, socket 0)
[Wed Jul 26 08:59:06 2023] Code: 8d 15 5d 6d 25 00 48 8d 3d f6 6c 25 00 be 32 00 00 00 e8 dc ed 1f 00 e8 c7 ed 1f 00 48 8b 45 b8 48 8b 50 28 4c 8b 24 da 31 c0 <41> 80 7c 24 20 00 74 11 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d

(screenshot of system specs)

@arvenig

arvenig commented Jul 26, 2023

I appear to have been experiencing this issue too: Linux, Radeon 6900 XT. A hopefully relevant detail: I was able to work around it by using torch 1.13.1+rocm5.2 and the corresponding torchvision 0.14.1+rocm5.2 that I still had from my working Invoke 2.3.5 install. After replacing torch 2.0 and torchvision with those older versions, Invoke 3.0 now seems to work as expected for me.
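For reference, a downgrade along the lines described above can be installed from PyTorch's ROCm 5.2 wheel index. This is a sketch based only on the versions named in the comment; adjust for your own environment and run it inside the InvokeAI virtualenv:

```shell
# Replace the torch 2.0 packages with the older ROCm 5.2 builds mentioned above.
# The index URL is PyTorch's official wheel repository for ROCm 5.2 builds.
pip install --force-reinstall \
    torch==1.13.1+rocm5.2 torchvision==0.14.1+rocm5.2 \
    --extra-index-url https://download.pytorch.org/whl/rocm5.2
```

Note that downgrading torch under a newer InvokeAI release may hit version pins in its own requirements, so this is a workaround rather than a supported configuration.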

@Alex9001

I have the same problem.

OS: Artix Linux x86_64
GPU: AMD ATI Radeon RX 6600/6600 XT/6600M
CPU: AMD Ryzen 7 5800H

@arvenig

arvenig commented Aug 2, 2023

Was experiencing this issue on my Ryzen 7950X / Radeon 6900XT desktop system running Arch Linux. I seem to have worked around it by disabling the 7950X's iGPU in the BIOS. The GPU device reported by invokeai-web at startup, both with and without the iGPU enabled, is 'cuda AMD Radeon RX 6900 XT', but for whatever reason having the iGPU enabled seems to have been causing an issue. This issue has been present for me in all versions of Invoke since the update to torch 2.0. Tested on a fresh InvokeAI 3.0.1post3 install.

@Godd67

Godd67 commented Aug 5, 2023

Yep, same issue for me: the 2.3 version worked perfectly, 3.0.1post3 (fresh install) fails with a segfault.
RX 6600, Ubuntu 22.04, ROCm 5.6
[2023-08-05 19:00:28,561]::[uvicorn.access]::INFO --> 127.0.0.1:35828 - "PUT /api/v1/sessions/f3756076-a290-4c92-af83-28ccd8e881d4/invoke?all=true HTTP/1.1" 202
[2023-08-05 19:00:28,575]::[InvokeAI]::INFO --> Loading model /media/olegus/Extra/InvokeAi/models/sd-1/main/stable-diffusion-v1-5, type sd-1:main:tokenizer
[2023-08-05 19:00:28,873]::[InvokeAI]::INFO --> Loading model /media/olegus/Extra/InvokeAi/models/sd-1/main/stable-diffusion-v1-5, type sd-1:main:text_encoder
./invoke.sh: line 51: 99206 Segmentation fault (core dumped) invokeai-web $PARAMS

@Millu
Contributor

Millu commented Aug 7, 2023

Hey! Another person had similar issues with torch, and a fix seems to be rebuilding the environment with a lower torch version (similar to what @arvenig said!):

#4041 (comment)

@Godd67

Godd67 commented Aug 10, 2023

> Hey! Another person had similar issues with torch and a fix seems to be building a version of python with a different lower torch version (similar to what @arvenig said!):
>
> #4041 (comment)

Can someone explain in simple words how to achieve this? BTW, I use Python 3.10, as it was suggested for the previous InvokeAI version.

@YabbaYabbaYabba

I have the same issue: invoke.sh: line 51: 8792 Segmentation fault (core dumped) invokeai-web $PARAMS

@Jeremi360

Jeremi360 commented Aug 21, 2023

I have the same issue: ./invoke.sh: line 51: 4167 Segmentation fault (core dumped) invokeai-web $PARAMS

@Godd67

Godd67 commented Aug 25, 2023

Made it work with ROCm 5.4.2, an RX 6600, and kernel 5.19.
Followed this guide, starting from the Other Requirements section: https://phazertech.com/tutorials/rocm.html. I already had ROCm installed, so I can't comment on that part.
It seems the only difference from my previous attempts was this:
sudo apt install nvidia-cuda-toolkit

@YabbaYabbaYabba

Thank you!

@archer31

archer31 commented Oct 13, 2023

Unfortunately none of the posted solutions work to resolve the segfault. What I have tried:

  • Downgrading torch and torchvision
    • This just results in the GPU not being detected anymore
  • Upgrading torch and torchvision
    • Same as above
  • Applying HSA_OVERRIDE_GFX_VERSION=10.3.0 to my profile
    • No appreciable changes

ROCM version 5.4.3
GPU: Radeon RX 7900 XTX
InvokeAI version: 3.2.0 (same also happens in 3.3.0RC1)

Edit: This appears to be an issue with ROCm support for the 7000 series of AMD GPUs. Not sure why these are still unsupported 9 months after they came out. Guess I'll just return this card and get an NVIDIA GPU :(.

@adeliktas

adeliktas commented Oct 15, 2023

I just installed InvokeAI 3.3.0 with ROCm in a Python 3.11 venv for an AMD 6600 XT and encountered the same issue when pressing the "Invoke" button on the web UI.

segfault at 20 ip 00007fd2142b40a7 sp 00007fcecfe91470 error 4 in libamdhip64.so[7fd214200000+3f3000]

pytorch-triton-rocm 2.0.2
torch 2.0.1+rocm5.4.2
torchvision 0.15.2+rocm5.4.2

.../InvokeAI/.venv/lib/python3.11/site-packages/triton/third_party/rocm/lib/libamdhip64.so
.../InvokeAI/.venv/lib/python3.11/site-packages/torch/lib/libamdhip64.so

Last frames from gdb:


[#6] 0x7fffad3c93e4 → hipLaunchKernel()
[#7] 0x7fffaf7b3a3b → at::native::index_select_out_cuda(at::Tensor const&, long, at::Tensor const&, at::Tensor&)::{lambda()#2}::operator()() const()
[#8] 0x7fffaf791d5a → at::native::index_select_out_cuda(at::Tensor const&, long, at::Tensor const&, at::Tensor&)()
[#9] 0x7fffaf7c947b → at::native::index_select_cuda(at::Tensor const&, long, at::Tensor const&)()

@takov751

> Unfortunately none of the posted solutions work to resolve the segfault. What I have tried:
>
> • Downgrading torch and torchvision
>   • This just results in the gpu not being detected anymore
> • Upgrading torch and torchvision
>   • Same as above
> • Applying HSA_OVERRIDE_GFX_VERSION=10.3.0 to my profile
>   • No appreciable changes
>
> ROCM version 5.4.3
> GPU: Radeon RX 7900 XTX
> InvokeAI version: 3.2.0 (same also happens in 3.3.0RC1)
>
> Edit: This appears to be an issue with ROCM support for the 7000 series of AMD GPUs. not sure why these are still unsupported 9 months after they came out. guess ill just return this card and get an nvidia gpu :(.

In your case it should be HSA_OVERRIDE_GFX_VERSION=11.0.0
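To summarize the override values reported in this thread, here is an illustrative sketch. The mapping and the `gfx_override` helper are hypothetical, based only on what commenters here say works, not on an official ROCm support table:

```python
# Illustrative mapping from Radeon RX series to the HSA_OVERRIDE_GFX_VERSION
# value reported to work in this thread (hypothetical helper, not part of ROCm).
OVERRIDES = {
    "6000": "10.3.0",  # RDNA2, e.g. RX 6600 / 6700 XT / 6900 XT (gfx103x)
    "7000": "11.0.0",  # RDNA3, e.g. RX 7800 XT / 7900 XTX (gfx110x)
}

def gfx_override(series: str) -> str:
    """Return the override value this thread suggests for a given RX series."""
    return OVERRIDES[series]

print(gfx_override("7000"))  # → 11.0.0
```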

@adeliktas

adeliktas commented Oct 23, 2023

> Unfortunately none of the posted solutions work to resolve the segfault. What I have tried:
>
> • Downgrading torch and torchvision
>   • This just results in the gpu not being detected anymore
> • Upgrading torch and torchvision
>   • Same as above
> • Applying HSA_OVERRIDE_GFX_VERSION=10.3.0 to my profile
>   • No appreciable changes
>
> ROCM version 5.4.3
> GPU: Radeon RX 7900 XTX
> InvokeAI version: 3.2.0 (same also happens in 3.3.0RC1)
> Edit: This appears to be an issue with ROCM support for the 7000 series of AMD GPUs. not sure why these are still unsupported 9 months after they came out. guess ill just return this card and get an nvidia gpu :(.
>
> In your case it should be HSA_OVERRIDE_GFX_VERSION=11.0.0

Setting the gfx override made InvokeAI run on my 6600 XT, but image generation glitches and returns an invalid image.
#4278
#4211
CUDA_VERSION=gfx1030 HSA_OVERRIDE_GFX_VERSION=10.3.0 invokeai-web

https://gist.github.com/adeliktas/669812e64fd356afc4648ba847c61133
torch version = 2.0.1+rocm5.4.2
cuda available = True
cuda version = None
device count = 1
cudart = <module 'torch._C._cudart'>
device = 0
capability = (10, 3)
name = AMD Radeon RX 6600 XT

@hchasens

hchasens commented Mar 7, 2024

I'm seeing this with my 7900 XTX.

@hchasens

hchasens commented Mar 7, 2024

So I figured it out. When using ROCm it tries to select your first GPU which is your integrated graphics. There's not enough VRAM so you get a segmentation fault. There's an environment variable you can use to disable the visibility of the iGPU.

export HIP_VISIBLE_DEVICES="0"

I found the best place to put it is in invokeai.sh right after the start of the venv.

. .venv/bin/activate

export INVOKEAI_ROOT="$scriptdir"
PARAMS=$@

export HIP_VISIBLE_DEVICES="0"

# Check to see if dialog is installed (it seems to be fairly standard, but good to check regardless) and if the user has passed the --no-tui argument to disable the dialog TUI

This fixed my issue. I've found other programs that have the same issue: Autogen and text-generation-webui both have the same problem and solution.

Hope this has helped! It's a lot easier than phazertech's guide imo.
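The key detail behind this fix is ordering: ROCm reads HIP_VISIBLE_DEVICES when it initializes, so the variable has to be in the process environment before torch is imported. A minimal sketch (the torch import is only indicated in a comment, since the point here is the ordering, not the import itself):

```python
import os

# Hide every HIP device except device 0 (the discrete GPU) *before* the
# ROCm runtime initializes; setting this after `import torch` is too late,
# because the runtime enumerates devices at initialization time.
os.environ["HIP_VISIBLE_DEVICES"] = "0"

# import torch  # torch.cuda.device_count() would now report only device 0

print(os.environ["HIP_VISIBLE_DEVICES"])
```

Putting the `export` in the launch script, as described above, guarantees this ordering for every run.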

@Alex9001

Alex9001 commented Mar 7, 2024

> So I figured it out. When using ROCm it tries to select your first GPU which is your integrated graphics. There's not enough VRAM so you get a segmentation fault. There's an environment variable you can use to disable the visibility of the iGPU.
>
> export HIP_VISIBLE_DEVICES="0"
>
> I found the best place to put it is in invokeai.sh right after the start of the venv.
>
> . .venv/bin/activate
>
> export INVOKEAI_ROOT="$scriptdir"
> PARAMS=$@
>
> export HIP_VISIBLE_DEVICES="0"
>
> # Check to see if dialog is installed (it seems to be fairly standard, but good to check regardless) and if the user has passed the --no-tui argument to disable the dialog TUI
>
> This fixed my issue. I've found a programs that have the same issue. Autogen and Text-gen-webui both have the same problem and solution.
>
> Hope this has helped! It's a lot easier than phazertech's guide imo.

Very based.

@adeliktas

adeliktas commented Mar 17, 2024

After almost half a year, I decided to give it another try and was able to find my issue while writing this up.
I tried working with different env vars like HIP_VISIBLE_DEVICES="0" and ran two test scripts:

https://gist.github.com/adeliktas/669812e64fd356afc4648ba847c61133
https://gist.github.com/damico/484f7b0a148a0c5f707054cf9c0a0533

torch version = 2.2.1+rocm5.7
cuda available = True
cuda version = None
device count = 1
cudart = <module 'torch._C._cudart'>
device = 0
capability = (10, 3)
name = AMD Radeon RX 6600 XT
...
Everything fine! You can run PyTorch code inside of: 
--->  AMD Ryzen 9 3950X 16-Core Processor  
--->  gfx1032

I printed all env vars with the env command and surprisingly found that HSA_OVERRIDE_GFX_VERSION wasn't listed, even though echo $HSA_OVERRIDE_GFX_VERSION prints 10.3.0. I had set it universally with set -U HSA_OVERRIDE_GFX_VERSION 10.3.0 in fish, which doesn't export it to child processes; it is only shared between fish sessions.
A simple export HSA_OVERRIDE_GFX_VERSION=10.3.0 solved that.

PWD=/home/adeliktas/ai/invokeai_projects/InvokeAI
HSA_OVERRIDE_GFX_VERSION=10.3.0
INVOKEAI_ROOT=/home/adeliktas/ai/invokeai_projects/InvokeAI
HIP_VISIBLE_DEVICES=0
VIRTUAL_ENV_PROMPT=(InvokeAI)
_OLD_FISH_PROMPT_OVERRIDE=/home/adeliktas/ai/invokeai_projects/InvokeAI/.venv
VIRTUAL_ENV=/home/adeliktas/ai/invokeai_projects/InvokeAI/.venv
upstream InvokeAI version 4.0.0rc2 faa1ffb06fd4974c43be14a2119a1aab12b63038
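The distinction above can be checked directly: only exported variables show up in a child process's environment, which is what invokeai-web actually sees. A small sketch (the variable name is from this thread; the child process here just stands in for the real launcher):

```python
import os
import subprocess
import sys

# Simulate `export HSA_OVERRIDE_GFX_VERSION=10.3.0` in the parent shell.
# A fish universal variable (`set -U`) would NOT appear in the child below,
# because it is shared between fish sessions without being exported.
os.environ["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"

# Spawn a child process (standing in for invokeai-web) and read the variable
# from its environment.
child_sees = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ.get('HSA_OVERRIDE_GFX_VERSION'))"],
    capture_output=True, text=True,
).stdout.strip()

print(child_sees)  # the exported value is inherited by the child
```

In fish, `set -Ux HSA_OVERRIDE_GFX_VERSION 10.3.0` (note the `x` flag) would both persist the variable and export it.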

@Developer-42

> So I figured it out. When using ROCm it tries to select your first GPU which is your integrated graphics. There's not enough VRAM so you get a segmentation fault. There's an environment variable you can use to disable the visibility of the iGPU.
>
> export HIP_VISIBLE_DEVICES="0"
>
> I found the best place to put it is in invokeai.sh right after the start of the venv.
>
> . .venv/bin/activate
>
> export INVOKEAI_ROOT="$scriptdir"
> PARAMS=$@
>
> export HIP_VISIBLE_DEVICES="0"
>
> # Check to see if dialog is installed (it seems to be fairly standard, but good to check regardless) and if the user has passed the --no-tui argument to disable the dialog TUI
>
> This fixed my issue. I've found a programs that have the same issue. Autogen and Text-gen-webui both have the same problem and solution.
>
> Hope this has helped! It's a lot easier than phazertech's guide imo.

Sadly, this doesn't work for me with my AMD Radeon RX 7800 XT. Also, the file name is invoke.sh, not invokeai.sh.

@takov751

takov751 commented Mar 28, 2024

> So I figured it out. When using ROCm it tries to select your first GPU which is your integrated graphics. There's not enough VRAM so you get a segmentation fault. There's an environment variable you can use to disable the visibility of the iGPU.
>
> export HIP_VISIBLE_DEVICES="0"
>
> I found the best place to put it is in invokeai.sh right after the start of the venv.
>
> . .venv/bin/activate
>
> export INVOKEAI_ROOT="$scriptdir"
> PARAMS=$@
>
> export HIP_VISIBLE_DEVICES="0"
>
> # Check to see if dialog is installed (it seems to be fairly standard, but good to check regardless) and if the user has passed the --no-tui argument to disable the dialog TUI
>
> This fixed my issue. I've found a programs that have the same issue. Autogen and Text-gen-webui both have the same problem and solution.
> Hope this has helped! It's a lot easier than phazertech's guide imo.
>
> Sadly, this doesn't work for me with my AMD Radeon RX 7800 XT. Also, the file name is invoke.sh not invokeai.sh

Have you specified HSA_OVERRIDE_GFX_VERSION=11.0.0, since your GPU is a 7XXX series?

@Alex9001

Alex9001 commented Apr 3, 2024

I finally got around to trying export HIP_VISIBLE_DEVICES="0" ... and nothing happened. Just as before,

::[uvicorn.access]::INFO --> 127.0.0.1:38998 - "GET /api/v1/queue/default/list HTTP/1.1" 200
./invoke.sh: line 56: 9079 Segmentation fault invokeai-web $PARAMS

@hchasens

hchasens commented Apr 3, 2024

@Alex9001 This error message makes me think it might not be a ROCm issue. Nevertheless, it might be worth double-checking that your ROCm HIP runtime is up to date. I'm assuming the ROCm runtime is in your /opt/rocm/ folder? It might be worth checking that, along with your package manager, to see if there are any updates. Use some of the tools AMD ships with the runtime to make sure it's communicating with your hardware properly (maybe using rocminfo or the like). If your GPU is supported, you should see it listed.

@Serpentian

Placing export HSA_OVERRIDE_GFX_VERSION=11.0.0 right after venv activation in invoke.sh fixed the issue with AMD Radeon RX 7800 XT. Here's the source: #4211 (comment)
