The model is a quantized version of **Meta-Llama-3.1-70B-Instruct** (base model: `meta-llama/Llama-3.1-70B-Instruct`), fine-tuned with SFT LoRA and quantized for efficient deployment. Key features (see the configuration sketch after the list):
- **Rank 256** LoRA on linear layers, with α = 2 × rank
- **16384 context length** (sequence multipacking, batch size 32)
- **Liger fused cross-entropy**
- **1e-4 learning rate** (50 warmup steps, 3 epochs)
- Quantized for deployment (e.g., GGUF Q4_K_S, Q8_0)
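These hyperparameters map roughly onto a standard PEFT + TRL fine-tuning setup. The sketch below is illustrative only, assuming that stack: the `output_dir` is hypothetical, and the actual configuration (including the multipacking implementation) lives in the linked repository.

```python
from peft import LoraConfig
from trl import SFTConfig

# Hedged reconstruction of the hyperparameters listed above; not the
# repository's actual config.
lora_config = LoraConfig(
    r=256,                        # LoRA rank 256
    lora_alpha=512,               # alpha = 2 x rank
    target_modules="all-linear",  # adapt all linear layers
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="llama-3.1-70b-malaysian-sft",  # hypothetical output path
    learning_rate=1e-4,
    warmup_steps=50,
    num_train_epochs=3,
    max_seq_length=16384,   # training context length
    packing=True,           # sequence multipacking
    use_liger_kernel=True,  # Liger kernels, incl. fused cross-entropy
)
```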
This version is derived from SFT LoRA training on the `Scicom-intl/Malaysian-Instructions` dataset, with source code available at [this link](https://github.com/Scicom-AI-Enterprise-Organization/small-ablation).
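For deployment, the GGUF quants load in any llama.cpp-compatible runtime. A minimal sketch using `llama-cpp-python`, assuming a locally downloaded Q4_K_S file (the filename is hypothetical):

```python
from llama_cpp import Llama

# Hypothetical local filename; substitute whichever quant you downloaded.
llm = Llama(
    model_path="Meta-Llama-3.1-70B-Instruct-Q4_K_S.gguf",
    n_ctx=16384,  # matches the fine-tuning context length
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Terangkan apa itu LoRA."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```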