Made Vocabulary's properties be initialized only ONCE on creation#1110
Lyrcaxis wants to merge 5 commits into SciSharp:master from
Conversation
@dpmm99 does this fix (or at least improve) the issue you were talking about in Discord?
Force-pushed from c982047 to 8b2b7cc
Negative.

Since this didn't fix the bug, shall we close this PR?
I think it's still a decent improvement, and potentially a huge one performance-wise depending on the use case (e.g. batching). I can see how this could be viewed as memory duplication, but it's quite minimal (~7 MB, assuming ~128k tokens mapping to strings of ~5 chars on average). Ultimately it's your call. Let me know.
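For reference, here is a back-of-envelope check of that ~7 MB figure. This is a sketch, not measured data: it assumes a 64-bit CLR, roughly 26 bytes of per-string overhead rounded up to 8-byte alignment, and roughly 24 bytes per `Dictionary<int, string>` entry; none of these constants come from the PR itself.

```csharp
using System;

// Estimate the managed memory cost of caching one string per vocab token.
const int tokenCount = 128_000; // ~128k tokens, as mentioned above
const int avgChars = 5;         // average token string length

// 64-bit .NET string: ~26 bytes of header/length overhead + 2 bytes/char,
// with the object size rounded up to an 8-byte boundary (assumption).
long perString = RoundUp(26 + 2L * avgChars, 8);

// Rough per-entry overhead of a Dictionary<int, string> (key, hash code,
// next-index, value reference) -- also an assumption.
const long perEntry = 24;

long totalBytes = tokenCount * (perString + perEntry);
Console.WriteLine($"~{totalBytes / (1024 * 1024)} MB");

static long RoundUp(long n, long align) => (n + align - 1) / align * align;
```

With these assumptions it lands at about 7 MB, which matches the ballpark quoted in the comment above.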
Oh, if it's a large performance improvement as well, let's merge it! I thought it was just a potential bugfix. Do you have any benchmarks for how much of a gain it is?
These are some quick benchmarks from a release build (Ctrl+F5). The tests were done with this code:

```csharp
// Placed in the Vocabulary class. Requires System.Diagnostics and System.Text.
public void RunTest()
{
    const int totalIters = 1_000_000;
    Console.WriteLine($"Starting test with {totalIters} iterations.");

    Console.WriteLine("\nTesting EOS access");
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < totalIters; i++) { var x = EOS; }
    sw.Stop();
    Console.WriteLine($"EOS access - PR's way: {sw.ElapsedMilliseconds}");

    unsafe
    {
        var _vocabNative = llama_model_get_vocab(_model);
        sw.Restart();
        for (int i = 0; i < totalIters; i++) { var x = Normalize(LLamaVocabNative.llama_vocab_eos(_vocabNative)); }
        sw.Stop();
        Console.WriteLine($"EOS access - Current way: {sw.ElapsedMilliseconds}");
    }

    Console.WriteLine("\nTesting Vocab access");
    var decoder = new StreamingTokenDecoder(Encoding.UTF8, _model);
    var llamaToken = (LLamaToken) 42;
    sw.Restart();
    for (int i = 0; i < totalIters; i++) { var x = this.TokenToString[llamaToken]; }
    sw.Stop();
    Console.WriteLine($"Single token to string - PR's way: {sw.ElapsedMilliseconds}");

    sw.Restart();
    for (int i = 0; i < totalIters; i++) { decoder.Add(llamaToken); var x = decoder.Read(); }
    sw.Stop();
    Console.WriteLine($"Single token to string - Current way: {sw.ElapsedMilliseconds}");

    Console.WriteLine("\nTesting IsEOG");
    sw.Restart();
    for (int i = 0; i < totalIters; i++) { var x = EOGTokens.Contains((int) llamaToken); }
    sw.Stop();
    Console.WriteLine($"EOG Test - PR's way: {sw.ElapsedMilliseconds}");

    unsafe
    {
        sw.Restart();
        for (int i = 0; i < totalIters; i++) { var x = LLamaVocabNative.llama_vocab_is_eog(VocabNative, llamaToken); }
        sw.Stop();
    }
    Console.WriteLine($"EOG Test - Current way: {sw.ElapsedMilliseconds}");

    // Test accuracy across the whole vocabulary, not just a single token
    unsafe
    {
        for (int i = 0; i < Count; i++)
        {
            var token = (LLamaToken) i;
            decoder.Add(token);
            Debug.Assert(this.TokenToString[token] == decoder.Read());
            Debug.Assert(LLamaVocabNative.llama_vocab_is_eog(VocabNative, token) == EOGTokens.Contains(i));
            Debug.Assert(LLamaVocabNative.llama_vocab_is_control(VocabNative, token) == ControlTokens.Contains(i));
        }
    }

    Console.WriteLine("\nThe test has ended");
}
```
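For readers skimming the thread, the caching pattern the PR applies can be sketched like this. The names (`CachedVocabulary`, `decodeToken`, `isEog`) are illustrative, not the actual LLamaSharp API: the idea is simply that every token is decoded exactly once at construction time, and later lookups hit a managed dictionary instead of crossing into native code.

```csharp
using System;
using System.Collections.Generic;

// Sketch of "initialize once on creation": pay the native-decode cost
// a single time per token in the constructor, then serve all later
// queries from plain managed collections.
class CachedVocabulary
{
    public IReadOnlyDictionary<int, string> TokenToString { get; }
    public IReadOnlySet<int> EOGTokens { get; }

    public CachedVocabulary(Func<int, string> decodeToken, Func<int, bool> isEog, int count)
    {
        var map = new Dictionary<int, string>(count);
        var eog = new HashSet<int>();
        for (int i = 0; i < count; i++)
        {
            map[i] = decodeToken(i);  // native call happens exactly once per token
            if (isEog(i)) eog.Add(i);
        }
        TokenToString = map;
        EOGTokens = eog;
    }
}
```

After construction, `TokenToString[id]` and `EOGTokens.Contains(id)` are O(1) managed lookups; the native vocab functions are never called again, which is also why this could sidestep the llama.cpp thread-safety concerns discussed above.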
This pull request has been automatically marked as stale due to inactivity. If no further activity occurs, it will be closed in 7 days.

Minor QOL improvement on the Vocabulary class with a slight performance increase. The reason was that I'm worried about llama.cpp internal deadlocks / memory corruption, based on an issue reported by @dpmm99.
I'm not 100% sure this will solve the issue, but it's one less thing to worry about. Maybe we can narrow it down little by little.