Details that do not depend upon whether you are running on CPU for Linux, Windows, or macOS.
Default llama.cpp model is LLaMa2 GPTQ model from TheBloke:
-
Run LLaMa.cpp LLaMa2 model:
With documents in
user_path
folder, run:# if don't have wget, download to repo folder using below link wget https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q6_K.gguf python generate.py --base_model='llama' --prompt_type=llama2 --score_model=None --langchain_mode='UserData' --user_path=user_path
For another llama.cpp model:
- Choose from TheBloke, then with documents in
user_path
folder, run:Forpython generate.py --base_model=llama --model_path_llama=https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q6_K.gguf --score_model=None --langchain_mode='UserData' --user_path=user_path
llama.cpp
based models on CPU, for computers with low system RAM or slow CPUs, we recommend running:python generate.py --base_model=llama --model_path_llama=https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q6_K.gguf --llamacpp_dict="{'use_mlock':False,'n_batch':256}" --max_seq_len=512 --score_model=None --langchain_mode='UserData' --user_path=user_path
-
Choose Model from GPT4All Model explorer GPT4All-J compatible model. One does not need to download manually, the GPT4ALL package will download at runtime and put it into
.cache
like Hugging Face would. -
With documents in
user_path
folder, run:python generate.py --base_model=gptj --model_path_gptj=ggml-gpt4all-j-v1.3-groovy.bin --score_model=None --langchain_mode='UserData' --user_path=user_path
or
python generate.py --base_model=gpt4all_llama --model_name_gpt4all_llama=ggml-wizardLM-7B.q4_2.bin --score_model=None --langchain_mode='UserData' --user_path=user_path
However, gpjt
model often gives no output, even outside h2oGPT. See GPT4All for details on installation instructions if you encounter any issues.
For more information about low-memory recommendations, see Low Memory.