Update LLamaEmbedder, Examples packages, and KernelMemory examples #1170


Open · wants to merge 12 commits into master

Conversation

zsogitbe
Contributor

  • Embedding generation: extended with batch processing and normalization (important to have this built in for KernelMemory).
  • The examples referenced the wrong NuGet packages; updated to the correct ones.
  • Updated the KernelMemory examples.

I have tested the code and it works for me. Please test it at least once more. Thank you.

- Embedding generation: Extension with Batch processing + Normalization (important to have this built-in for KernelMemory).
- Examples had wrong nuget packages, updated to correct ones.
- Updated KernelMemory examples.
@@ -290,6 +290,14 @@ public static void llama_log_set(NativeLogConfig.LLamaLogCallback logCallback)
[DllImport(libraryName, CallingConvention = CallingConvention.Cdecl)]
internal static extern void llama_kv_self_clear(SafeLLamaContextHandle ctx);

[Obsolete("Use `llama_kv_self_clear` instead")]
Member

This is exposed as KvCacheClear on SafeLLamaContextHandle now, it shouldn't be re-introduced here.

Member

Oh it's also an obsolete function in llama.cpp anyway!

@@ -809,7 +809,8 @@ public int KvCacheCountTokens()
/// </summary>
public void KvCacheClear()
{
NativeApi.llama_kv_self_clear(this);
//NativeApi.llama_kv_self_clear(this);
Member

Not sure why this change was made? llama_kv_cache_clear was renamed llama_kv_self_clear in llama.cpp recently, so that should always be used instead now.

Contributor Author
@zsogitbe Apr 23, 2025

Yes, for some reason there are (or were) problems with the llama.cpp submodule. I had a slightly older version (after downloading a version initially), and that version of llama.cpp does not have llama_kv_self_clear yet. That is why I reintroduced the old clear. I think it would be better to keep it in the code for this reason (it is marked as obsolete), but it is up to you to decide.
p.s.: both functions do the same thing :)

Member

That was probably me forgetting to update the submodule again on the last update. If you keep an eye on the update PRs when they're available, feel free to give me a poke if you spot that in the future.

For llama_kv_cache_clear vs llama_kv_self_clear, please put it back to the newer version. If it was a public API it'd make sense to keep it around for a while marked as Obsolete to give people a chance to migrate, but this is an internal function.

Member
@martindevans left a comment

Overall looks pretty good, just a couple of minor changes needed with llama_kv_self_clear.

I haven't tested the embedder changes yet, they look reasonable and I'll try to make some time this weekend to test it out with long embeddings.

- Embedding generation: Extension with Batch processing + Normalization (important to have this built-in for KernelMemory).
- Examples had wrong nuget packages, updated to correct ones.
- Updated KernelMemory examples.
- added missing model parameters
@zsogitbe
Contributor Author

Test failed: it runs on my computer and I do not see any problem (nothing important has changed compared to before, when all tests were OK). Please try restarting the test. Thank you.

- adding config is null check
@martindevans
Member

I've restarted the tests, but I'm not expecting them to pass right now. We seem to have an issue with one specific test at the moment that consistently fails in CI, but nobody can reproduce a failure locally :(

I'm probably just going to suppress that test if I get some time to work on it this weekend.

@zsogitbe
Contributor Author

> I've restarted the tests, but I'm not expecting them to pass right now. We seem to have an issue with one specific test at the moment that consistently fails in CI, but nobody can reproduce a failure locally :(
>
> I'm probably just going to suppress that test if I get some time to work on it this weekend.

OK! Take your time Martin, no problem. Thank you!

- unit tests project update to prevent the constant download of many GBs
- ** for some reason Embeddings must be set to false in the kernel memory text embedding generator => we need to follow this and check it later because this should normally be 'true' ! **
- skipping one test for macOS (all other tests are OK)
- setting GpuLayerCount to 0 as an experiment
- setting GpuLayerCount to 0 in Release in CIGpuLayerCount also for Windows
@zsogitbe
Contributor Author

zsogitbe commented Apr 25, 2025

I have corrected several problems in the code and have made all tests pass (on macOS by skipping only one test).

One of the problems I corrected: I completely rewrote the model download code (in the project file), because it was downloading the models every time the code was compiled. I think GitHub did not like this... and neither did I.

One BUG in llama.cpp prevented the KernelMemory tests from passing. In the following code, if we do not pass a split mode different from None, llama.cpp will crash at this point if there is no GPU in the system. I suspect that the GitHub macOS runners have a GPU available, which is why the macOS tests were passing before.

    if (params.split_mode == LLAMA_SPLIT_MODE_NONE) {
        if (params.main_gpu < 0 || params.main_gpu >= (int)model->devices.size()) {
            LLAMA_LOG_ERROR("%s: invalid value for main_gpu: %d (available devices: %d)\n", __func__, params.main_gpu, (int)model->devices.size());
            llama_model_free(model);
            return nullptr;
        }
        ggml_backend_dev_t main_gpu = model->devices[params.main_gpu];
        model->devices.clear();
        model->devices.push_back(main_gpu);
    }

- possible BUG in llama.cpp in 'if (params.split_mode == LLAMA_SPLIT_MODE_NONE)'... trying to set other split mode (even if there is no GPU)!
@martindevans
Member

Could you split the csproj file downloading changes to a separate PR? The SkipUnchangedFiles="true" attribute should prevent duplicate downloads, so we'll want to look into that. Having it all in one PR will just slow down merging the embedder changes.
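For context, the SkipUnchangedFiles attribute belongs to MSBuild's built-in DownloadFile task, which skips the transfer when the destination file already exists unchanged. A minimal target of the kind under discussion (illustrative URL and paths, not the actual project file) might look like:

```xml
<Target Name="DownloadTestModels" BeforeTargets="Build">
  <!-- SkipUnchangedFiles="true" should avoid re-downloading a file that
       is already present and unchanged at the destination. -->
  <DownloadFile SourceUrl="https://example.com/model.gguf"
                DestinationFolder="$(MSBuildProjectDirectory)/Models"
                SkipUnchangedFiles="true" />
</Target>
```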

@zsogitbe
Contributor Author

> Could you split the csproj file downloading changes to a separate PR? The SkipUnchangedFiles="true" attribute should prevent duplicate downloads, so we'll want to look into that. Having it all in one PR will just slow down merging the embedder changes.

That would be a lot of work, Martin, and I have already invested a lot of time in this (finding the problems and correcting them). Splitting the commits could also cause problems, and then I would have even more work. Please try to review it all together, if possible. You can easily test the download solution on your computer, and the GitHub tests were successful on all platforms (the new code found the downloaded models). I have tested the download on my computer both with no models and with existing models, and it worked fine for me. Thank you.

We will need to do some cleanup later (some commented-out code) because some things are not very clear yet (for example, why we no longer need Embeddings=true), but I would prefer to integrate this PR first without further changes, if possible. I have left these in to make sure we do not forget to do some more research.

Thank you!

@martindevans
Copy link
Member

That's no problem, if it's not just a matter of cherry picking some commits to a new branch don't worry about it 👍
