ExtAgents is a framework for scaling external knowledge input beyond the context length of LLMs via multi-agent collaboration.
```bash
conda create -n extagents python=3.10 -y
conda activate extagents
pip install -r requirements.txt
```

You can download the data with the script:

```bash
bash scripts/download_data.sh
```

Or you can download the data manually from one of the following links:
The data should be organized as follows:

```
./
└── data/
    ├── sampled_hotpot_questions.json
    ├── rag_1000k.jsonl
    ├── longbook_qa_eng.jsonl
    └── longbook_qa_chn.jsonl
```

Update: We have uploaded the long-context Q&A datasets, both before and after enhancement, to Hugging Face. `original` refers to the original dataset, `full-enhanced` indicates the fully enhanced dataset, and `partial-enhanced` signifies the partially enhanced dataset, where only samples with a length not exceeding 128k tokens have been augmented. Welcome to download and use!
We currently support three tasks: RAG, En.QA, and Zh.QA.
The RAG task is a question answering task, where the input is a question and a context. The question and answer are sampled from HotpotQA. The context is a long text formed by concatenating documents retrieved from Wikipedia with BM25. We use the KILT knowledge source, which is based on the 2019/08/01 Wikipedia dump. We have provided the context in the `data/rag_1000k.jsonl` file.
The En.QA and Zh.QA tasks are question answering tasks, where the input is a question and a context. The question, answer, and context are taken from InfiniteBench.
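All three tasks read their inputs from JSON Lines files (one JSON object per line). As a quick sanity check after downloading, you can stream a file such as `data/rag_1000k.jsonl` with a few lines of Python. Note that the field names inside each record are not documented here, so inspect the keys of the first record rather than assuming specific names:

```python
import json

def iter_jsonl(path):
    """Yield one record per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Peek at the first record to see which fields the file actually provides.
# (Field names such as "question" or "context" are not guaranteed.)
# first = next(iter_jsonl("data/rag_1000k.jsonl"))
# print(sorted(first.keys()))
```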
Here is an example command to generate predictions for the RAG task:

```bash
python main.py \
    --task rag \
    --output_dir results_rag \
    --chunk_length 8000 \
    --input_length 128000 \
    --api_url "YOUR_API_URL" \
    --api_key "YOUR_API_KEY" \
    --model "gpt-4o-mini-2024-07-18" \
    --num_workers 8 \
    > rag.log
```

The generated predictions will be saved in the `results_rag` directory.
- `--task`: Task, can be `rag`, `en`, or `zh`.
- `--output_dir`: Directory to save the generated predictions.
- `--chunk_length`: Chunk length.
- `--input_length`: Input length.
- `--model`: Model to use; default is `gpt-4o-mini-2024-07-18`.
- `--api_url`: Your API URL; default is `os.getenv("OPENAI_BASE_URL")`.
- `--api_key`: Your API key; default is `os.getenv("OPENAI_API_KEY")`.
- `--num_workers`: Number of workers; each worker processes one example.
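The `--num_workers` flag parallelizes over examples, with each worker handling one example at a time. A minimal sketch of that worker-per-example pattern, not the repo's actual code (the function name and record fields below are illustrative, and the API call is stubbed out):

```python
from concurrent.futures import ThreadPoolExecutor

def predict_one(example):
    """One worker's job: produce a prediction for a single example.
    A real implementation would call the LLM API here; this is a stub."""
    return {"id": example["id"], "prediction": "stub answer for " + example["question"]}

examples = [{"id": i, "question": f"q{i}"} for i in range(4)]

# Run up to 8 examples concurrently, mirroring --num_workers 8.
# pool.map preserves input order in its results.
with ThreadPoolExecutor(max_workers=8) as pool:
    predictions = list(pool.map(predict_one, examples))
```

Threads are a reasonable fit here because each worker spends most of its time waiting on network I/O to the API rather than computing.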
You can also set the environment variables `OPENAI_BASE_URL` and `OPENAI_API_KEY` to avoid typing them on the command line.

```bash
export OPENAI_BASE_URL="YOUR_API_URL"
export OPENAI_API_KEY="YOUR_API_KEY"
```

We provide a script to evaluate the generated predictions. For the RAG task, the evaluation is based on HotpotQA. For the En.QA and Zh.QA tasks, the evaluation is based on InfiniteBench.
For the RAG task:

```bash
bash scripts/eval_rag.sh /path/to/your/output_dir
```

For the En.QA task:

```bash
bash scripts/eval_en.sh /path/to/your/output_dir
```

For the Zh.QA task:

```bash
bash scripts/eval_zh.sh /path/to/your/output_dir
```

If you find this project helpful, please cite it as follows:
```bibtex
@article{liu2025extagents,
  title={Scaling External Knowledge Input Beyond The Context Length of LLMs via Multi-Agent Collaboration},
  author={Zijun Liu and Zhennan Wan and Peng Li and Ming Yan and Ji Zhang and Fei Huang and Yang Liu},
  year={2025}
}
```