Update LLamaEmbedder, Examples packages, and KernelMemory examples #1170


Open · wants to merge 12 commits into master

Conversation

zsogitbe
Contributor

  • Embedding generation: extended with batch processing and normalization (important to have this built in for KernelMemory).
  • The examples referenced the wrong NuGet packages; updated to the correct ones.
  • Updated the KernelMemory examples.

I have tested the code and it works for me. Please test it at least once more. Thank you.

- Embedding generation: Extension with Batch processing + Normalization (important to have this built-in for KernelMemory).
- Examples had wrong nuget packages, updated to correct ones.
- Updated KernelMemory examples.
@@ -290,6 +290,14 @@ public static void llama_log_set(NativeLogConfig.LLamaLogCallback logCallback)
[DllImport(libraryName, CallingConvention = CallingConvention.Cdecl)]
internal static extern void llama_kv_self_clear(SafeLLamaContextHandle ctx);

[Obsolete("Use `llama_kv_self_clear` instead")]
Member

This is exposed as KvCacheClear on SafeLLamaContextHandle now, it shouldn't be re-introduced here.

Member

Oh it's also an obsolete function in llama.cpp anyway!

@@ -809,7 +809,8 @@ public int KvCacheCountTokens()
/// </summary>
public void KvCacheClear()
{
NativeApi.llama_kv_self_clear(this);
//NativeApi.llama_kv_self_clear(this);
Member

Not sure why this change was made? llama_kv_cache_clear was renamed llama_kv_self_clear in llama.cpp recently, so that should always be used instead now.

Contributor Author
@zsogitbe Apr 23, 2025

Yes, for some reason there are (or were) problems with the llama.cpp submodule. I had a slightly older version (after downloading a version initially), and that version of llama.cpp does not have llama_kv_self_clear yet. That is why I reintroduced the old clear. I think it would be better to keep it in the code for this reason (it is marked as obsolete), but it is up to you to decide.
p.s.: both functions do the same thing :)

Member

That was probably me forgetting to update the submodule again on the last update. If you keep an eye on the update PRs when they're available, feel free to give me a poke if you spot that in the future.

For llama_kv_cache_clear vs llama_kv_self_clear, please put it back to the newer version. If it was a public API it'd make sense to keep it around for a while marked as Obsolete to give people a chance to migrate, but this is an internal function.

Member
@martindevans left a comment

Overall looks pretty good, just a couple of minor changes needed with llama_kv_self_clear.

I haven't tested the embedder changes yet, they look reasonable and I'll try to make some time this weekend to test it out with long embeddings.

- Embedding generation: Extension with Batch processing + Normalization (important to have this built-in for KernelMemory).
- Examples had wrong nuget packages, updated to correct ones.
- Updated KernelMemory examples.
- added missing model parameters
@zsogitbe
Contributor Author

Test failed: it runs on my computer and I do not see any problem (nothing important has changed compared to before, when all tests were OK). Please try restarting the test. Thank you.

- adding config is null check
@martindevans
Member

I've restarted the tests, but I'm not expecting them to pass right now. We seem to have an issue with one specific test at the moment that consistently fails in CI, but nobody can reproduce a failure locally :(

I'm probably just going to suppress that test if I get some time to work on it this weekend.

@zsogitbe
Contributor Author

> I've restarted the tests, but I'm not expecting them to pass right now. We seem to have an issue with one specific test at the moment that consistently fails in CI, but nobody can reproduce a failure locally :(
>
> I'm probably just going to suppress that test if I get some time to work on it this weekend.

OK! Take your time Martin, no problem. Thank you!

- unit tests project update to prevent the constant download of many GBs
- ** for some reason Embeddings must be set to false in the kernel memory text embedding generator => we need to follow this and check it later because this should normally be 'true' ! **
- skipping one test for macOS (all other tests are OK)
- setting GpuLayerCount to 0 as an experiment
- setting GpuLayerCount to 0 in Release in CIGpuLayerCount also for Windows
@zsogitbe
Contributor Author

zsogitbe commented Apr 25, 2025

I have corrected several problems in the code and have made all tests pass (on macOS by skipping only one test).

One of the problems I corrected: I completely rewrote the model download code (in the project file), because it was downloading the models every time the code was compiled. I think GitHub did not like this... and neither did I.

One BUG in llama.cpp prevented the KernelMemory tests from passing. In the following code, if we do not pass a split mode different from None, llama.cpp will crash at this point if there is no GPU in the system. I suspect that the GitHub macOS runners have a GPU available, which is why the macOS tests were passing before.

    if (params.split_mode == LLAMA_SPLIT_MODE_NONE) {
        if (params.main_gpu < 0 || params.main_gpu >= (int)model->devices.size()) {
            LLAMA_LOG_ERROR("%s: invalid value for main_gpu: %d (available devices: %d)\n", __func__, params.main_gpu, (int)model->devices.size());
            llama_model_free(model);
            return nullptr;
        }
        ggml_backend_dev_t main_gpu = model->devices[params.main_gpu];
        model->devices.clear();
        model->devices.push_back(main_gpu);
    }

- possible BUG in llama.cpp in 'if (params.split_mode == LLAMA_SPLIT_MODE_NONE)'... trying to set other split mode (even if there is no GPU)!
@martindevans
Member

Could you split the csproj file downloading changes to a separate PR? The SkipUnchangedFiles="true" attribute should prevent duplicate downloads, so we'll want to look into that. Having it all in one PR will just slow down merging the embedder changes.
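For context, the SkipUnchangedFiles attribute belongs to MSBuild's built-in DownloadFile task, which skips the transfer when the destination file already exists unchanged. A minimal target of the kind under discussion (illustrative URL and paths, not the actual project file) might look like:

```xml
<Target Name="DownloadTestModels" BeforeTargets="Build">
  <!-- SkipUnchangedFiles="true" should avoid re-downloading a file that
       is already present and unchanged at the destination. -->
  <DownloadFile SourceUrl="https://example.com/model.gguf"
                DestinationFolder="$(MSBuildProjectDirectory)/Models"
                SkipUnchangedFiles="true" />
</Target>
```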

@zsogitbe
Contributor Author

> Could you split the csproj file downloading changes to a separate PR? The SkipUnchangedFiles="true" attribute should prevent duplicate downloads, so we'll want to look into that. Having it all in one PR will just slow down merging the embedder changes.

That would be a lot of work, Martin, and I have already invested a lot of time in this (finding the problems and correcting them). Splitting the commits could also cause problems, and then I would have even more work. Please try to review it all together, if possible. You can easily test the download solution on your computer, and the GitHub tests were successful on all platforms (the new code found the downloaded models). I have tested the download on my computer both with no models and with existing models, and it worked fine for me. Thank you.

We will need to do some cleanup later (some commented-out code) because some things are not very clear yet (for example, why we no longer need Embeddings=true), but I would prefer to integrate this PR first without further changes, if possible. I have left these in to make sure we do not forget to do some more research.

Thank you!

@martindevans
Copy link
Member

That's no problem, if it's not just a matter of cherry picking some commits to a new branch don't worry about it 👍
