@@ -37,12 +37,15 @@ pip install -r requirements.txt
 
 ## 🏁 Search Needle Function (SNF)
 
-Search Needle Function is the first RepoQA task which aims to practice LLMs' ability of **long-context code understanding and retrieval**.
-Its corresponding real-life application is to perform precise code search from user intent rather than simple keyword match.
+Search Needle Function is the first and foundational RepoQA task, which aims to exercise LLMs' ability in **long-context code understanding and retrieval**.
+Its corresponding real-life scenario is performing precise code search from a function description.
 
-> [!Important]
+<details><summary>🔎 More dataset details <i>:: click to expand ::</i></summary>
+<div>
+
+> [!Note]
 >
-> SNF includes 500 tests (5 programming languages x 10 repositories x 10 needle functions) where an LLM is given:
+> SNF includes 500 tests (5 programming languages x 10 repos x 10 needle functions) where an LLM is given:
 >
 > 1. A large code context sorted by file dependency
 > 2. An NL description of the needle function without revealing keywords such as function names
@@ -51,6 +54,9 @@ Its corresponding real-life application is to perform precise code search from u
 > The evaluator passes a test if the retrieved function is syntactically closer to the ground truth than to any
 > other function (systematically parsed by `treesitter`) and the similarity is greater than a user-defined threshold (0.8 by default), as sketched below.
 
+</div>
+</details>
+
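For intuition, the pass/fail rule above can be sketched as follows. This is a minimal illustration, not RepoQA's actual evaluator; `similarity` here is a crude `difflib` stand-in for the syntactic similarity that is computed over `treesitter`-parsed functions:

```python
# Rough sketch of the SNF pass/fail rule -- NOT RepoQA's actual evaluator.
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Crude stand-in for the syntactic similarity between two functions.
    return SequenceMatcher(None, a, b).ratio()


def snf_pass(retrieved: str, ground_truth: str, all_functions: list[str],
             threshold: float = 0.8) -> bool:
    # Pass iff the ground truth is the retrieved function's closest match
    # among all parsed functions AND the similarity clears the threshold.
    best_match = max(all_functions, key=lambda fn: similarity(retrieved, fn))
    return best_match == ground_truth and similarity(retrieved, ground_truth) > threshold
```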
 You can run the SNF evaluation using various backends:
 
 ### OpenAI Compatible Servers
@@ -74,17 +80,24 @@ repoqa.search_needle_function --model "claude-3-haiku-20240307" --backend anthro
 repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend vllm
 ```
 
+<details><summary>🔎 Context extension for small-ctx models <i>:: click to expand ::</i></summary>
+<div>
+
 > [!Tip]
 >
-> You can unlock the model's context using [dynamic RoPE scaling](https://blog.eleuther.ai/yarn/#dynamic-scaling).
-> For example, `Meta-Llama-3-8B-Instruct` has 8k context but running the default 16k test needs more (approx. 20k).
+> There are two ways to unlock a model's context at inference time:
 >
-> To extend the context to 32k, in its config file (`hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/[hash]/config.json`) set:
+> 1. **Direct Extension**: Edit `max_position_embeddings` in the model's `config.json` (e.g., `hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/[hash]/config.json`) to something like `22528`.
+> 2. **[Dynamic RoPE Scaling](https://blog.eleuther.ai/yarn/#dynamic-scaling)**:
+>    To extend `Meta-Llama-3-8B-Instruct` from 8k to 32k (4x), edit its `config.json` as sketched below:
 >
 > `"rope_scaling": {"type": "dynamic", "factor": 4.0}`
 >
 > Note: This works for vLLM `<0.4.3` and HuggingFace transformers. RepoQA will automatically configure dynamic RoPE for vLLM `>=0.4.3`.
 
+</div>
+</details>
+
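For option 2 above, the `config.json` edit can be scripted roughly as follows. This is a minimal sketch, not part of RepoQA; the HuggingFace cache path and model name are assumptions, so adjust them for your setup:

```python
# Minimal sketch: add dynamic RoPE scaling to a locally cached model's
# config.json. The cache path and model name below are assumptions.
import json
from pathlib import Path

cache = Path.home() / ".cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B-Instruct"
for config_path in cache.glob("snapshots/*/config.json"):
    config = json.loads(config_path.read_text())
    config["rope_scaling"] = {"type": "dynamic", "factor": 4.0}  # 8k -> 32k (4x)
    config_path.write_text(json.dumps(config, indent=2))
```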
 > [!Note]
 >
 > Reference evaluation time:
@@ -98,6 +111,35 @@ repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend vllm
 repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend hf --trust-remote-code
 ```
 
+> [!Tip]
+>
+> Installing [flash-attn](https://github.com/Dao-AILab/flash-attention) and
+> additionally setting `--attn-implementation "flash_attention_2"` can significantly
+> lower the memory requirement.
+
+<details><summary>🔨 Having trouble installing `flash-attn`? <i>:: click to expand ::</i></summary>
+<div>
+
+> [!Tip]
+>
+> If you have trouble with `pip install flash-attn --no-build-isolation`,
+> you can try the [pre-built wheels](https://github.com/Dao-AILab/flash-attention/releases) directly:
+>
+> ```bash
+> export FLASH_ATTN_VER=2.5.8  # check the latest version at https://github.com/Dao-AILab/flash-attention/releases
+> export CUDA_VER="cu122"      # check available builds at https://github.com/Dao-AILab/flash-attention/releases
+> export TORCH_VER=$(python -c "import torch; print('.'.join(torch.__version__.split('.')[:2]))")
+> export PY_VER=$(python -c "import platform; print(''.join(platform.python_version().split('.')[:2]))")
+> export OS_ARCH=$(python -c "import platform; print(f'{platform.system().lower()}_{platform.machine()}')")
+>
+> export WHEEL=flash_attn-${FLASH_ATTN_VER}+${CUDA_VER}torch${TORCH_VER}cxx11abiFALSE-cp${PY_VER}-cp${PY_VER}-${OS_ARCH}.whl
+> wget https://github.com/Dao-AILab/flash-attention/releases/download/v${FLASH_ATTN_VER}/${WHEEL}
+> pip install ${WHEEL}
+> ```
+
+</div>
+</details>
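After installing the wheel above, a quick import check confirms the build is usable (this only assumes that `flash-attn` exposes `__version__`, which current releases do):

```python
# Sanity check: confirm flash-attn imports and report its version.
import flash_attn

print(flash_attn.__version__)
```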
+
 ### Google Generative AI API (Gemini)
 
 ```bash