[sync #10544] llama/ggml: add LLM training support #13105


Draft: wants to merge 4 commits into master

Conversation

@ggerganov (Member) commented Apr 25, 2025

original #10544

This is a rebase of the #10544 PR by @JohannesGaessler on top of the upcoming #12799 (edit: now merged into master). The purpose is only to highlight the changes that need to be applied to #10544.

Testing with:

make -j && ./bin/llama-finetune --file ./wikitext-2-raw/wiki.test.raw --model ../models/llama-3.2-3b/ggml-model-f32.gguf -c 512 -b 512 -ub 512

TODOs:

  • Currently test-backend-ops fails because ggml_set_param asserts tensor->op == GGML_OP_NONE, without taking into account that the tensor could be a view.


@github-actions github-actions bot added labels: testing (Everything test related), examples, ggml (changes relating to the ggml tensor library for machine learning) Apr 25, 2025
@ggerganov (Member, Author):

@JohannesGaessler This is a tentative sync - still need to wait for #12799 to get merged. The optimization code in libllama is well implemented and IMO it's OK to merge it as proposed. The optimization context could maybe be separated from the llama_context to improve the design, but it's something that can be done separately.

In #12799, the batch management is delegated to the KV cache object, so I've updated llama_context::opt_epoch_iter to use that.

@ggerganov ggerganov force-pushed the gg/llama-kv-cache-v6 branch 5 times, most recently from 780d6fb to 58115a2 Compare May 2, 2025 10:28
@zhouwg (Contributor) commented May 2, 2025

This new feature is very helpful for AI beginners (such as me) to understand more of the details of hard-core AI tech. Thanks so much!

@ggerganov ggerganov force-pushed the gg/llama-kv-cache-v6 branch from 58115a2 to 7e79a42 Compare May 2, 2025 13:02
Base automatically changed from gg/llama-kv-cache-v6 to master May 2, 2025 14:48
JohannesGaessler and others added 4 commits May 2, 2025 21:23 (all signed with the committer's verified signature, ggerganov Georgi Gerganov):

  • more compact progress bar
  • refactor: llama_prepare_sbatch/ubatch
  • llama_save_model_to_file
  • gqa_mode arg for repeat_back
  • llama_opt_param_filter
  • ggml_graph_dup force_grads
  • refactor ggml_opt, fix test-opt
  • ggml-ci
@ggerganov (Member, Author):

@JohannesGaessler I've rebased this, so it should be good to update #10544 accordingly and merge. Let me know if something does not work as expected.

@JohannesGaessler (Collaborator):

Thank you. I'll take a look when I get a chance.
