
Implementation of generate and integration of lm_eval (evaluation harness) #222


Draft
wants to merge 28 commits into base: main

Changes from all commits (28 commits)
b8f6b62  changes for debugging (Apr 3, 2025)
4022249  temporary hack to save logits (Apr 8, 2025)
505e658  simple generate function and less hacky way to save logits (Apr 8, 2025)
a53855b  moved mkdir (Apr 8, 2025)
c14ca4d  fast llm classes (bigximik, Apr 8, 2025)
6a72203  refactored logits saving, test and added hidden_test return from the … (Apr 11, 2025)
e713aa2  added notebook to check logits and hidden states diffs (Apr 11, 2025)
04e914c  Merge branch 'denis/generate' of github.com:ServiceNow/Fast-LLM into … (Apr 11, 2025)
51f59f8  fix to a document mask for attention mask (Apr 14, 2025)
9c01471  fix for an absent attention_mask (Apr 15, 2025)
7f1ca8a  updated classes and functions naming, removed temporary param from init (Apr 17, 2025)
0488fdb  updated manual test (Apr 17, 2025)
c65d9ba  evaluation abstraction implementation (Apr 18, 2025)
1543a56  fixes for evaluation only in trainer (Apr 18, 2025)
dc2b5e0  added evaluate command (Apr 18, 2025)
0cb3ad7  lm_eval integration, one gpu (Apr 20, 2025)
c86ae20  fixing typos (Apr 21, 2025)
145ee50  fixes to make lm_eval reporting work with wrapper object instead o… (Apr 21, 2025)
85b19d8  comments and some code formatting (Apr 22, 2025)
b5603ed  merge from main (Apr 29, 2025)
66a45ca  steps towards distributed inference (bigximik, Apr 29, 2025)
938a273  more manual tests (bigximik, Apr 29, 2025)
eb734d9  partial implementation of data parallel lm_eval integration (bigximik, May 2, 2025)
4e2175a  more communication primitives added (bigximik, May 5, 2025)
d1addda  temporarily create hf model wrapper in training the same as standalone (bigximik, May 5, 2025)
a880cd3  finished batch data parallel support for lm_eval integration (bigximik, May 5, 2025)
4b148e0  cleaned up lm_eval arg parser, partially wrapper and renamed wrapper … (bigximik, May 6, 2025)
047852e  removed HF hub params handling and tokenizer parallelism setting (bigximik, May 6, 2025)
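
The commits above plumb a Hugging-Face-compatible wrapper around the Fast-LLM model into lm_eval, first on a single GPU and later with data-parallel batching. As a rough, hypothetical illustration of that flow (not the code from this PR), the Python snippet below hands an HF-compatible causal LM to lm_eval's public API; the plain transformers model, checkpoint name and batch size are placeholders standing in for the Fast-LLM wrapper.

import lm_eval
from lm_eval.models.huggingface import HFLM
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in for the HF-compatible Fast-LLM model wrapper this PR introduces.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-1.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct")

# Wrap the model so lm_eval can send it loglikelihood / generate_until requests.
lm = HFLM(pretrained=model, tokenizer=tokenizer, batch_size=16)

# The same task the example configs request via `--tasks gsm8k`.
results = lm_eval.simple_evaluate(model=lm, tasks=["gsm8k"])
print(results["results"]["gsm8k"])

In-training evaluation would swap the transformers model for the wrapper around the live Fast-LLM weights.
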
400 changes: 400 additions & 0 deletions check_logits_hidden_layers.ipynb

Large diffs are not rendered by default.
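
Since the notebook diff is not rendered here, the snippet below is a minimal, hypothetical sketch of the kind of check such a notebook performs: run the same batch through a reference transformers model and a second implementation, then compare logits and per-layer hidden states. The checkpoint name and prompt are placeholders, and the reference outputs are reused on both sides only to keep the sketch self-contained.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "HuggingFaceTB/SmolLM2-135M-Instruct"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
reference = AutoModelForCausalLM.from_pretrained(name)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    ref_out = reference(**inputs, output_hidden_states=True)

# In the real comparison these tensors would come from the Fast-LLM side.
other_logits, other_hidden = ref_out.logits, ref_out.hidden_states

print("max |logits diff|:", (ref_out.logits - other_logits).abs().max().item())
for layer, (a, b) in enumerate(zip(ref_out.hidden_states, other_hidden)):
    print(f"layer {layer}: max |hidden diff| = {(a - b).abs().max().item():.3e}")
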

Binary file added classes_fast_llm.jpg
77 changes: 77 additions & 0 deletions examples/qwen_evaluate.yaml
@@ -0,0 +1,77 @@
training:
  train_iters: 100_000
  logs:
    interval: 10
  evaluations:
    gsm8k:
      type: lm_eval
      cli_args:
        - --tasks
        - gsm8k
        - --output_path
        - /mnt/checkpoints/test/denis/qwen_eval_experiment/lm_eval
    # stack_3b:
    #   iterations: 10
    #   interval: 10
    # fineweb:
    #   iterations: 10
    #   interval: 10
  checkpoint:
    interval: 1000
    keep: 5
  test_iters: 0
  export: # (1)!
    format: llama
    interval: 20_000
batch:
  micro_batch_size: 16
  sequence_length: 4096
  batch_size: 32
data:
  tokenizer:
    path: /mnt/checkpoints/pretrained_models/Qwen2-1.5B-Instruct
    bos_token: "<|endoftext|>"
  datasets:
    # Bad datasets: they are tokenized with a different tokenizer than llama
    training:
      type: file
      path: /mnt/datasets/test/denis/fineweb_the_stack_3b.yaml
    stack_3b:
      type: memmap
      path: /mnt/datasets/data_collections/the_stack_3b/tokens/stack_3b/default/train/99
    fineweb:
      type: memmap
      path: /mnt/datasets/data_collections/standalone_datasets/tokens/HuggingFaceFW/fineweb/default/train/9_1000
optimizer:
  weight_decay: 0.1
  beta_1: 0.9
  beta_2: 0.95
  learning_rate:
    base: 1.0e-04 # (3)!
    minimum: 1.0e-05
    decay_style: cosine
    decay_iterations: 100_000
    warmup_iterations: 2000
pretrained: # (4)!
  format: qwen2
  path: /mnt/checkpoints/pretrained_models/Qwen2-1.5B-Instruct
  model_weights: yes # (5)!
model:
  base_model:
    transformer:
      use_flash_attention: yes
    cross_entropy_impl: fused
  multi_stage:
    zero_stage: 2
  distributed:
    training_dtype: bf16

run:
  experiment_dir: "/mnt/checkpoints/test/denis/qwen_eval_experiment"

# training:
#   logs:
#     interval: 10
#   wandb:
#     project_name: ${job.project_name}
#     group_name: ${job.project_version}
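
The cli_args list above is forwarded to lm_eval's argument handling (see the commit "cleaned up lm_eval arg parser"). The sketch below is only a hedged approximation of that step, not the parser from this PR: the flag names simply mirror lm_eval's own CLI, and parse_lm_eval_cli_args is a hypothetical helper.

import argparse

def parse_lm_eval_cli_args(cli_args: list[str]) -> argparse.Namespace:
    # Only the flags used in the example configs, plus one common extra.
    parser = argparse.ArgumentParser()
    parser.add_argument("--tasks", type=str, required=True)       # comma-separated task names
    parser.add_argument("--output_path", type=str, default=None)  # where result files are written
    parser.add_argument("--num_fewshot", type=int, default=None)
    return parser.parse_args(cli_args)

args = parse_lm_eval_cli_args(
    ["--tasks", "gsm8k",
     "--output_path", "/mnt/checkpoints/test/denis/qwen_eval_experiment/lm_eval"]
)
# `lm` would be the wrapped in-training model, as in the earlier sketch:
# results = lm_eval.simple_evaluate(
#     model=lm, tasks=args.tasks.split(","), num_fewshot=args.num_fewshot
# )
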
77 changes: 77 additions & 0 deletions examples/smol_evaluate.yaml
@@ -0,0 +1,77 @@
training:
  train_iters: 100_000
  logs:
    interval: 10
  evaluations:
    gsm8k:
      type: lm_eval
      cli_args:
        - --tasks
        - gsm8k
        - --output_path
        - /mnt/checkpoints/test/denis/smol_eval_experiment/lm_eval
    # stack_3b:
    #   type: loss
    #   iterations: 10
    #   interval: 10
    # fineweb:
    #   iterations: 10
    #   interval: 10
  checkpoint:
    interval: 1000
    keep: 5
  test_iters: 0
  export: # (1)!
    format: llama
    interval: 20_000
batch:
  micro_batch_size: 16
  sequence_length: 4096
  batch_size: 32
data:
  tokenizer:
    path: /mnt/checkpoints/pretrained_models/SmolLM2-135M-Instruct
  datasets:
    # Bad datasets: they are tokenized with a different tokenizer than llama
    training:
      type: file
      path: /mnt/datasets/test/denis/fineweb_the_stack_3b.yaml
    stack_3b:
      type: memmap
      path: /mnt/datasets/data_collections/the_stack_3b/tokens/stack_3b/default/train/99
    fineweb:
      type: memmap
      path: /mnt/datasets/data_collections/standalone_datasets/tokens/HuggingFaceFW/fineweb/default/train/9_1000
optimizer:
  weight_decay: 0.1
  beta_1: 0.9
  beta_2: 0.95
  learning_rate:
    base: 1.0e-04 # (3)!
    minimum: 1.0e-05
    decay_style: cosine
    decay_iterations: 100_000
    warmup_iterations: 2000
pretrained: # (4)!
  format: llama
  path: /mnt/checkpoints/pretrained_models/SmolLM2-135M-Instruct/
  model_weights: yes # (5)!
model:
  base_model:
    transformer:
      use_flash_attention: yes
    cross_entropy_impl: fused
  multi_stage:
    zero_stage: 2
  distributed:
    training_dtype: bf16

run:
  experiment_dir: "/mnt/checkpoints/test/denis/smol_eval_experiment"

# training:
#   logs:
#     interval: 10
#   wandb:
#     project_name: ${job.project_name}
#     group_name: ${job.project_version}