
Commit 00269af

refactor(README): add more info about flash-attn installation
1 parent a48ce9a commit 00269af

File tree: 1 file changed (+49, -7 lines)


README.md

Lines changed: 49 additions & 7 deletions
@@ -37,12 +37,15 @@ pip install -r requirements.txt
 
 ## 🏁 Search Needle Function (SNF)
 
-Search Needle Function is the first RepoQA task which aims to practice LLMs' ability of **long-context code understanding and retrieval**.
-Its corresponding real-life application is to perform precise code search from user intent rather than simple keyword match.
+Search Needle Function is the first and foundational RepoQA task, which exercises LLMs' ability in **long-context code understanding and retrieval**.
+Its corresponding real-life scenario is to perform precise code search from a function description.
 
-> [!Important]
+<details><summary>🔎 More dataset details <i>:: click to expand ::</i></summary>
+<div>
+
+> [!Note]
 >
-> SNF includes 500 tests (5 programming languages x 10 repositories x 10 needle functions) where an LLM is given:
+> SNF includes 500 tests (5 programming languages x 10 repos x 10 needle functions) where an LLM is given:
 >
 > 1. A large code context sorted in file dependency
 > 2. A NL description of the needle function without revealing keywords like function names
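To make the test setup above concrete, here is a hypothetical sketch of how such a prompt could be assembled from the two inputs listed in the hunk (repository files ordered by dependency plus the natural-language needle description). This is only an illustration; it is not RepoQA's actual prompt format, and `build_snf_prompt` is a made-up helper name.

```python
# Hypothetical prompt assembly for an SNF-style test (NOT RepoQA's real format).
# `ordered_files` maps file paths to contents, already sorted by dependency.

def build_snf_prompt(ordered_files: dict[str, str], needle_description: str) -> str:
    context = "\n\n".join(
        f"# File: {path}\n{content}" for path, content in ordered_files.items()
    )
    instruction = (
        "Given the repository above, find and output the one function that "
        f"matches this description (do not guess by name): {needle_description}"
    )
    return f"{context}\n\n{instruction}"
```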
@@ -51,6 +54,9 @@ Its corresponding real-life application is to perform precise code search from u
 > The evaluator passes a test if the searched function is syntactically closest to the ground-truth compared against
 > other functions (systematically parsed by `treesitter`) and the similarity is greater than a user defined threshold (by default 0.8).
 
+</div>
+</details>
+
 You can run the SNF evaluation using various backends:
 
 ### OpenAI Compatible Servers
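A side note on the pass criterion quoted in the hunk above: the logic can be sketched as below. This is only a rough illustration, not RepoQA's actual evaluator; it assumes the candidate function bodies have already been extracted (RepoQA parses them with `treesitter`), and a plain `difflib` string ratio stands in for the syntactic similarity measure.

```python
# Rough sketch of the SNF pass criterion (NOT RepoQA's evaluator code).
# `retrieved` is the function the model points to, `ground_truth` is the needle,
# and `other_functions` are the remaining functions parsed from the repository.
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in for RepoQA's syntactic similarity (here: a plain string ratio)."""
    return SequenceMatcher(None, a, b).ratio()


def snf_pass(retrieved: str, ground_truth: str, other_functions: list[str],
             threshold: float = 0.8) -> bool:
    score = similarity(retrieved, ground_truth)
    # Pass iff the retrieved function is the closest match to the ground truth
    # among all parsed functions AND the similarity clears the threshold.
    is_closest = all(score >= similarity(other, ground_truth) for other in other_functions)
    return is_closest and score >= threshold
```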
@@ -74,17 +80,24 @@ repoqa.search_needle_function --model "claude-3-haiku-20240307" --backend anthro
 repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend vllm
 ```
 
+<details><summary>🔎 Context extension for small-ctx models <i>:: click to expand ::</i></summary>
+<div>
+
 > [!Tip]
 >
-> You can unlock the model's context using [dynamic RoPE scaling](https://blog.eleuther.ai/yarn/#dynamic-scaling).
-> For example, `Meta-Llama-3-8B-Instruct` has 8k context but running the default 16k test needs more (approx. 20k).
+> There are two ways to unlock a model's context at inference time:
 >
-> To extend the context to 32k, in its config file (`hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/[hash]/config.json`) set:
+> 1. **Direct Extension**: Edit `max_position_embeddings` in the model's `config.json` (e.g., `hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/[hash]/config.json`) to something like `22528`.
+> 2. **[Dynamic RoPE Scaling](https://blog.eleuther.ai/yarn/#dynamic-scaling)**:
+>    To extend `Meta-Llama-3-8B-Instruct` from 8k to 32k (4x), edit the `config.json`:
 >
 > `"rope_scaling": {"type": "dynamic", "factor": 4.0}`
 >
 > Note: This works for vLLM `<0.4.3` and HuggingFace transformers. RepoQA will automatically configure dynamic RoPE for vLLM `>= 0.4.3`
 
+</div>
+</details>
+
 > [!Note]
 >
 > Reference evaluation time:
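Regarding the two context-extension options added in the hunk above: both boil down to editing the model's `config.json`. Below is a minimal sketch of scripting the dynamic RoPE variant; it assumes the default HuggingFace cache layout, the `<snapshot-hash>` segment is a placeholder, and the helper itself is hypothetical rather than part of RepoQA.

```python
# Hypothetical helper that applies the rope_scaling edit described above.
# Assumes the default HF hub cache layout; replace <snapshot-hash> with the
# snapshot directory that actually exists on your machine.
import json
from pathlib import Path

config_path = (
    Path.home()
    / ".cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B-Instruct"
    / "snapshots/<snapshot-hash>/config.json"
)

config = json.loads(config_path.read_text())
# 8k native context x factor 4.0 -> roughly 32k usable context.
config["rope_scaling"] = {"type": "dynamic", "factor": 4.0}
config_path.write_text(json.dumps(config, indent=2))
```

As the hunk notes, this manual edit matters for HuggingFace transformers and vLLM `<0.4.3`; for vLLM `>= 0.4.3`, RepoQA configures dynamic RoPE automatically.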
@@ -98,6 +111,35 @@ repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend vllm
 repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend hf --trust-remote-code
 ```
 
+> [!Tip]
+>
+> Installing [flash-attn](https://github.com/Dao-AILab/flash-attention) and
+> additionally setting `--attn-implementation "flash_attention_2"` can greatly
+> lower the memory requirement.
+
+<details><summary>🔨 Having trouble installing `flash-attn`? <i>:: click to expand ::</i></summary>
+<div>
+
+> [!Tip]
+>
+> If you have trouble with `pip install flash-attn --no-build-isolation`,
+> you can try the [pre-built wheels](https://github.com/Dao-AILab/flash-attention/releases) directly:
+>
+> ```bash
+> export FLASH_ATTN_VER=2.5.8   # check the latest version at https://github.com/Dao-AILab/flash-attention/releases
+> export CUDA_VER="cu122"       # check available builds at https://github.com/Dao-AILab/flash-attention/releases
+> export TORCH_VER=$(python -c "import torch; print('.'.join(torch.__version__.split('.')[:2]))")
+> export PY_VER=$(python -c "import platform; print(''.join(platform.python_version().split('.')[:2]))")
+> export OS_ARCH=$(python -c "import platform; print(f'{platform.system().lower()}_{platform.machine()}')")
+>
+> export WHEEL=flash_attn-${FLASH_ATTN_VER}+${CUDA_VER}torch${TORCH_VER}cxx11abiFALSE-cp${PY_VER}-cp${PY_VER}-${OS_ARCH}.whl
+> wget https://github.com/Dao-AILab/flash-attention/releases/download/v${FLASH_ATTN_VER}/${WHEEL}
+> pip install ${WHEEL}
+> ```
+
+</div>
+</details>
+
 ### Google Generative AI API (Gemini)
 
 ```bash

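Finally, on the `--attn-implementation "flash_attention_2"` tip added in the last hunk: with HuggingFace transformers this corresponds to the `attn_implementation` argument of `from_pretrained`. Below is a minimal sketch of loading a model that way; it only illustrates what the flag selects (it is not RepoQA's own loading code), uses the model id from the examples above, and requires `flash-attn` plus a CUDA GPU.

```python
# Illustration of loading a model with FlashAttention-2 kernels in HF transformers.
# Roughly what selecting flash_attention_2 means for a transformers model load.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/CodeQwen1.5-7B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # flash-attn requires fp16/bf16
    attn_implementation="flash_attention_2",  # needs flash-attn installed
    device_map="auto",
    trust_remote_code=True,
)
```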