ggerganov
Member

ref #16372 (comment)

The mask that we use in the FA tests is always filled with non-INF values, while in practice it often contains -INF blocks due to causal masking or due to padding. Extend the tests to add -INF values so we can exercise any logic that relies on detecting such blocks in the mask.
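
To illustrate what such a mask looks like, here is a minimal host-side sketch in plain Python. It is not the test code from this PR; `build_mask` and its parameters are hypothetical names chosen for illustration, and the mask is assumed to be additive (0.0 = attend, -INF = masked out).

```python
import math

NEG_INF = -math.inf

def build_mask(n_q, n_kv, n_valid_kv, causal=True):
    """Return an n_q x n_kv additive mask (hypothetical helper, not a ggml API).

    0.0 means the key is visible, -inf means it is masked out.
    Keys at index >= n_valid_kv are treated as padding.
    """
    mask = [[0.0] * n_kv for _ in range(n_q)]
    for i in range(n_q):
        for j in range(n_kv):
            if j >= n_valid_kv:
                mask[i][j] = NEG_INF   # padding: contiguous -INF block at the end
            elif causal and j > i:
                mask[i][j] = NEG_INF   # causal: future keys are hidden
    return mask

# Small example: 4 queries, 8 key slots, of which only 6 hold real keys.
mask = build_mask(n_q=4, n_kv=8, n_valid_kv=6)
```

Both causal masking and padding produce contiguous runs of -INF, which is the pattern a mask filled only with finite values never exercises.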

ggerganov requested a review from slaren as a code owner October 2, 2025 05:44
github-actions bot added the "testing" (Everything test related) label Oct 2, 2025
@Green-Sky
Collaborator

I forgot to mention this, but it is possible that CUDA fails this test.

@ggerganov
Member Author

@Green-Sky Do you mean you tried it locally and it failed?

@Green-Sky
Collaborator

I just tried it locally and it works. I will have to go and look at the sd.cpp+chroma+fa bug again, so I actually know what fails there.

@Green-Sky
Collaborator

Green-Sky commented Oct 2, 2025

Ok, did some more debugging. I am not sure if the issue really is with ggml, but flash attention just seems to die when we use the mask in the DiT for chroma.

The mask is:

  • 0 in [0;19] (in my example)
  • -inf in [20;511]
  • and then padded with 0 again for [512;2815]

attention_ext L_q:2816 L_k:2816 n_head:24 C:3072 d_head:128 N:1
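
For reference, the mask layout described above can be reproduced on the host and run through a plain reference softmax. This is only a sketch under stated assumptions: the scores are set to a uniform 0.0 (hypothetical; real scores come from Q·K), the function names are illustrative, and nothing here calls into ggml or sd.cpp.

```python
import math

L_K = 2816  # key length taken from the attention_ext log line above

def chroma_like_mask_row(n_kv=L_K, first_valid_end=20, masked_end=512):
    """One mask row: 0 in [0;19], -inf in [20;511], 0 in [512;n_kv-1]."""
    return [0.0 if (j < first_valid_end or j >= masked_end) else -math.inf
            for j in range(n_kv)]

def masked_softmax(scores, mask_row):
    """Reference softmax over scores + additive mask (max-subtracted)."""
    logits = [s + m for s, m in zip(scores, mask_row)]
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

row = chroma_like_mask_row()
weights = masked_softmax([0.0] * L_K, row)  # uniform scores are an assumption
```

With this layout, 2324 of the 2816 keys stay visible. The reference weights are all finite, sum to 1, and are exactly 0 on the masked range [20;511], which is the behavior a flash attention result would be expected to match.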

@ggerganov
Member Author

Does it show any error? What do you mean by "die"?

@Green-Sky
Collaborator

> Does it show any error? What do you mean by "die"?

Ah, sorry for that.
By "die", I mean silently fail. Pulling data from a backend is kind of a pain, but the resulting image is always black, so I assume it is pulled to (+/-) INF somewhere, or maybe to zero.
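
One way to turn "the image is black" into something concrete is to summarize the values once they are on the host. The sketch below assumes the tensor has already been copied out of the backend into a flat sequence of Python floats (that copy step is not shown), and `summarize_values` is a hypothetical helper, not part of ggml or sd.cpp.

```python
import math

def summarize_values(values):
    """Count NaN, +INF, -INF and exact-zero entries in a flat float sequence."""
    stats = {"total": 0, "nan": 0, "pos_inf": 0, "neg_inf": 0, "zero": 0}
    for v in values:
        stats["total"] += 1
        if math.isnan(v):
            stats["nan"] += 1
        elif v == math.inf:
            stats["pos_inf"] += 1
        elif v == -math.inf:
            stats["neg_inf"] += 1
        elif v == 0.0:          # also matches -0.0
            stats["zero"] += 1
    return stats

# Example on a small made-up buffer.
stats = summarize_values([0.0, 1.5, math.nan, math.inf, -math.inf, -0.0])
```

Running this on the attention output with and without flash attention would show whether the values end up as NaN, as (+/-) INF, or as zero.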
