[ English | 日本語 ]
This repository primarily provides fast batch inference implementations for llm-jp-eval using the following libraries:
- vLLM
- TensorRT-LLM
- Hugging Face Transformers (baseline)

For installation and inference instructions, please refer to the README.md within each module.
In addition, a tool for run management with Weights & Biases is provided in wandb_run_management.
For how to run inference and evaluation, please refer to the Inference Execution Method and Evaluation Method sections in llm-jp-eval.