Update LLamaEmbedder, Examples packages, and KernelMemory examples #1170
base: master
Conversation
- Embedding generation: extended with batch processing + normalization (important to have this built in for KernelMemory).
- Examples referenced the wrong NuGet packages; updated to the correct ones.
- Updated the KernelMemory examples.
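The batched embedding with normalization described above can be sketched roughly as follows. This is a minimal Python illustration, not the actual LLamaSharp C# implementation; `embed_fn`, `normalize`, and `embed_batch` are hypothetical stand-ins for the real API:

```python
import math

def normalize(vec):
    """L2-normalize an embedding vector so its length is 1.0 (what
    KernelMemory expects for cosine-similarity search)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]

def embed_batch(texts, embed_fn, batch_size=8):
    """Embed texts in fixed-size batches and normalize each result.

    `embed_fn` stands in for the model's embedding call; batching
    bounds how much work is submitted to the context at once.
    """
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        results.extend(normalize(embed_fn(t)) for t in batch)
    return results
```

Building normalization in means downstream consumers such as KernelMemory never have to guess whether vectors are unit-length.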
@@ -290,6 +290,14 @@ public static void llama_log_set(NativeLogConfig.LLamaLogCallback logCallback)
    [DllImport(libraryName, CallingConvention = CallingConvention.Cdecl)]
    internal static extern void llama_kv_self_clear(SafeLLamaContextHandle ctx);

    [Obsolete("Use `llama_kv_self_clear` instead")]
This is exposed as KvCacheClear on SafeLLamaContextHandle now; it shouldn't be re-introduced here.
Oh it's also an obsolete function in llama.cpp anyway!
@@ -809,7 +809,8 @@ public int KvCacheCountTokens()
    /// </summary>
    public void KvCacheClear()
    {
        NativeApi.llama_kv_self_clear(this);
        //NativeApi.llama_kv_self_clear(this);
Not sure why this change was made? llama_kv_cache_clear was renamed llama_kv_self_clear in llama.cpp recently, so that should always be used instead now.
Yes, for some reason there were problems with the llama.cpp submodule. I had a slightly older version (from my initial download), and that version of llama.cpp does not have llama_kv_self_clear yet. That is why I reintroduced the old clear. I think it would be better to keep it in the code for this reason (it is marked as obsolete), but it is up to you to decide.
P.S.: both functions do the same thing :)
That was probably me forgetting to update the submodule again on the last update. If you keep an eye on the update PRs when they're available, feel free to give me a poke if you spot that in the future.
For llama_kv_cache_clear vs llama_kv_self_clear, please put it back to the newer version. If it was a public API it'd make sense to keep it around for a while marked as Obsolete to give people a chance to migrate, but this is an internal function.
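The migration pattern described here (keep a deprecated alias on a public API for a while, but drop it immediately for internal ones) can be sketched generically. This Python sketch uses DeprecationWarning to mimic C#'s Obsolete attribute; it is not LLamaSharp code, and all names here are illustrative:

```python
import warnings

def kv_self_clear(ctx):
    """New name: clears the KV cache for the given context.
    (`ctx` is a plain dict here, standing in for a real context handle.)"""
    ctx["kv"] = []

def kv_cache_clear(ctx):
    """Old public name, kept temporarily so callers have time to migrate."""
    warnings.warn(
        "kv_cache_clear is deprecated; use kv_self_clear instead",
        DeprecationWarning,
        stacklevel=2,
    )
    kv_self_clear(ctx)
```

For an internal function, there are no external callers to migrate, so the alias adds maintenance cost with no benefit and the old name can simply be deleted.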
Overall looks pretty good, just a couple of minor changes needed with llama_kv_self_clear.
I haven't tested the embedder changes yet; they look reasonable, and I'll try to make some time this weekend to test them out with long embeddings.
- Embedding generation: extended with batch processing + normalization (important to have this built in for KernelMemory).
- Examples referenced the wrong NuGet packages; updated to the correct ones.
- Updated the KernelMemory examples.
- Added missing model parameters.
Test failed: it runs on my computer and I do not see any problem (nothing important has changed compared to before, when all tests were OK). Please try restarting the test. Thank you.
- Embedding generation: extended with batch processing + normalization (important to have this built in for KernelMemory).
- Examples referenced the wrong NuGet packages; updated to the correct ones.
- Updated the KernelMemory examples.
- Added missing model parameters.
- Added a config-is-null check.
I've restarted the tests, but I'm not expecting them to pass right now. We seem to have an issue with one specific test at the moment that consistently fails in CI, but nobody can reproduce a failure locally :( I'm probably just going to suppress that test if I get some time to work on it this weekend.
OK! Take your time, Martin, no problem. Thank you!
- Embedding generation: extended with batch processing + normalization (important to have this built in for KernelMemory).
- Examples referenced the wrong NuGet packages; updated to the correct ones.
- Updated the KernelMemory examples.
- Added missing model parameters.
- Added a config-is-null check.
- Updated the unit tests project to prevent the constant download of many GBs.
- ** For some reason Embeddings must be set to false in the KernelMemory text embedding generator => we need to follow up and check this later, because it should normally be 'true'! **
- Embedding generation: extended with batch processing + normalization (important to have this built in for KernelMemory).
- Examples referenced the wrong NuGet packages; updated to the correct ones.
- Updated the KernelMemory examples.
- Added missing model parameters.
- Added a config-is-null check.
- Updated the unit tests project to prevent the constant download of many GBs.
- ** For some reason Embeddings must be set to false in the KernelMemory text embedding generator => we need to follow up and check this later, because it should normally be 'true'! **
- Skipped one test on macOS (all other tests are OK).
- Embedding generation: extended with batch processing + normalization (important to have this built in for KernelMemory).
- Examples referenced the wrong NuGet packages; updated to the correct ones.
- Updated the KernelMemory examples.
- Added missing model parameters.
- Added a config-is-null check.
- Updated the unit tests project to prevent the constant download of many GBs.
- ** For some reason Embeddings must be set to false in the KernelMemory text embedding generator => we need to follow up and check this later, because it should normally be 'true'! **
- Skipped one test on macOS (all other tests are OK).
- Set GpuLayerCount to 0 as an experiment.
- Embedding generation: extended with batch processing + normalization (important to have this built in for KernelMemory).
- Examples referenced the wrong NuGet packages; updated to the correct ones.
- Updated the KernelMemory examples.
- Added missing model parameters.
- Added a config-is-null check.
- Updated the unit tests project to prevent the constant download of many GBs.
- ** For some reason Embeddings must be set to false in the KernelMemory text embedding generator => we need to follow up and check this later, because it should normally be 'true'! **
- Skipped one test on macOS (all other tests are OK).
- Set GpuLayerCount to 0 in Release in CI.
- GpuLayerCount also set for Windows.
I have corrected several problems in the code, and all tests now pass (on macOS with only one test skipped). One of the problems I corrected: I completely rewrote the model download code (in the project file), because it was downloading the models every time the code was compiled. I think GitHub did not like this... and neither did I. One bug in llama.cpp prevented the KernelMemory tests from passing: in the following code, if we do not pass a split mode different from None, llama.cpp crashes at this point when there is no GPU in the system. I suspect that the macOS runners on GitHub do have a GPU available, and that is why the macOS tests were passing before.
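The download-only-if-missing behaviour described above can be sketched as follows. This is a hedged Python illustration of the idea, not the actual csproj logic (which lives in MSBuild targets); the URL and paths are hypothetical:

```python
import os
import urllib.request

def ensure_model(url, dest_path):
    """Download a model file only when it is not already on disk,
    so repeated builds do not re-fetch gigabytes of data.

    Returns True if a download happened, False if the cached file was used.
    """
    if os.path.exists(dest_path):
        return False  # already cached, nothing downloaded
    os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
    urllib.request.urlretrieve(url, dest_path)
    return True
```

The key design point is that the existence check happens before any network activity, so an incremental build with the models already present costs nothing.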
- Embedding generation: extended with batch processing + normalization (important to have this built in for KernelMemory).
- Examples referenced the wrong NuGet packages; updated to the correct ones.
- Updated the KernelMemory examples.
- Added missing model parameters.
- Added a config-is-null check.
- Updated the unit tests project to prevent the constant download of many GBs.
- ** For some reason Embeddings must be set to false in the KernelMemory text embedding generator => we need to follow up and check this later, because it should normally be 'true'! **
- Skipped one test on macOS (all other tests are OK).
- Set GpuLayerCount to 0 in Release in CI.
- GpuLayerCount also set for Windows.
- Possible BUG in llama.cpp in 'if (params.split_mode == LLAMA_SPLIT_MODE_NONE)'... trying to set another split mode (even if there is no GPU)!
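The workaround in the last item above (requesting a split mode other than None even on a machine with no GPU) can be sketched as parameter-selection logic. The constants mirror llama.cpp's llama_split_mode enum names, but the numeric values and the `model_params` helper are illustrative assumptions, not real API:

```python
# Constants mirroring llama.cpp's llama_split_mode enum
# (values assumed here for illustration only).
LLAMA_SPLIT_MODE_NONE = 0
LLAMA_SPLIT_MODE_LAYER = 1

def model_params(gpu_layer_count=0):
    """Build model parameters that sidestep the suspected crash.

    Intuitively NONE looks right for a CPU-only run, but the PR reports
    that llama.cpp's LLAMA_SPLIT_MODE_NONE code path crashes when no GPU
    is present, so LAYER is requested unconditionally as a workaround.
    """
    return {
        "n_gpu_layers": gpu_layer_count,
        "split_mode": LLAMA_SPLIT_MODE_LAYER,
    }
```

With zero GPU layers the split mode should be irrelevant in practice, which is why always passing LAYER is a low-risk workaround until the upstream behaviour is clarified.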
Could you split the csproj file download changes into a separate PR?
That would be a lot of work, Martin, and I have already invested a lot of time in this (finding the problems and correcting them). Splitting the commits could also cause problems, and then I would have even more work. Please try to check it all together, if possible. You can easily test the download solution on your computer, and the GitHub tests were successful on all platforms (the new code found the downloaded models). I have tested the download on my computer both with no models and with existing models, and it worked fine for me. Thank you. We will need to do some cleanup later (some commented-out code), because some things are not very clear yet (for example, why we no longer need Embeddings=true, etc.), but I would prefer to integrate this PR first without further changes, if possible. I have left these in to make sure we do not forget to do some more research. Thank you!
That's no problem, if it's not just a matter of cherry picking some commits to a new branch don't worry about it 👍
I have tested the code and it works fine for me. Please test it at least once more. Thank you.