Docker LLaMA2 Chat / 羊驼二代

English | Chinese documentation

Play! Together! ONLY 3 STEPS!

Get started quickly with the 7B or 13B models, running locally in Docker.

Meta Llama2: tested on an RTX 4090, uses 8~14GB of vRAM.
Chinese Llama2 (quantized): tested on an RTX 4090, uses 5GB of vRAM.
GGML (llama.cpp): runs on the CPU alone, no GPU required.

Preview

Blogs

Use Docker to quickly get started with the official version of the Llama2 open-source large model
Use Docker to quickly get started with the Chinese version of the Llama2 open-source large model
Quantizing the Chinese version of the MetaAI Llama2 large model using Transformers
Build a Chinese Llama2 large model that can run on a CPU

Usage

Build the LLaMA2 Docker image for the official 7B or 13B model, or for the Chinese 7B or 7B INT4 model:

# 7B
bash scripts/make-7b.sh

# OR 13B
bash scripts/make-13b.sh

# OR 7B Chinese
bash scripts/make-7b-cn.sh

# OR 7B Chinese 4bit
bash scripts/make-7b-cn-4bit.sh

Download the LLaMA2 models from HuggingFace, either the official Meta models or the Chinese ones:

# MetaAI LLaMA2 Models (10~14GB vRAM)
git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
git clone https://huggingface.co/meta-llama/Llama-2-13b-chat-hf

mkdir meta-llama
mv Llama-2-7b-chat-hf meta-llama/
mv Llama-2-13b-chat-hf meta-llama/

# OR Chinese LLaMA2 (10~14GB vRAM)
git clone https://huggingface.co/LinkSoul/Chinese-Llama-2-7b

mkdir LinkSoul
mv Chinese-Llama-2-7b LinkSoul/

# OR Chinese LLaMA2 4BIT (5GB vRAM)
git clone https://huggingface.co/soulteary/Chinese-Llama-2-7b-4bit

mkdir soulteary
mv Chinese-Llama-2-7b-4bit soulteary/
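
The clone commands above fetch multi-gigabyte weight files, which Hugging Face stores with Git LFS, so a clone without `git-lfs` installed leaves small pointer files instead of the real weights. A minimal pre-flight sketch follows; `require_cmd` is a hypothetical helper name and is not part of this repository.

```shell
# Sketch: fail early if a required tool is missing.
# require_cmd is a hypothetical helper, not something this repository ships.
require_cmd() {
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      return 1
    fi
  done
}
```

One way to use it is `require_cmd git git-lfs && git lfs install` before running any of the `git clone` lines.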

Keep the directory structure as shown below:

tree -L 2 soulteary LinkSoul meta-llama
soulteary
└── ...
LinkSoul
└── ...
meta-llama
├── Llama-2-13b-chat-hf
│   ├── added_tokens.json
│   ├── config.json
│   ├── generation_config.json
│   ├── LICENSE.txt
│   ├── model-00001-of-00003.safetensors
│   ├── model-00002-of-00003.safetensors
│   ├── model-00003-of-00003.safetensors
│   ├── model.safetensors.index.json
│   ├── pytorch_model-00001-of-00003.bin
│   ├── pytorch_model-00002-of-00003.bin
│   ├── pytorch_model-00003-of-00003.bin
│   ├── pytorch_model.bin.index.json
│   ├── README.md
│   ├── Responsible-Use-Guide.pdf
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   ├── tokenizer.model
│   └── USE_POLICY.md
└── Llama-2-7b-chat-hf
    ├── added_tokens.json
    ├── config.json
    ├── generation_config.json
    ├── LICENSE.txt
    ├── model-00001-of-00002.safetensors
    ├── model-00002-of-00002.safetensors
    ├── model.safetensors.index.json
    ├── models--meta-llama--Llama-2-7b-chat-hf
    ├── pytorch_model-00001-of-00003.bin
    ├── pytorch_model-00002-of-00003.bin
    ├── pytorch_model-00003-of-00003.bin
    ├── pytorch_model.bin.index.json
    ├── README.md
    ├── special_tokens_map.json
    ├── tokenizer_config.json
    ├── tokenizer.json
    ├── tokenizer.model
    └── USE_POLICY.md
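
The layout above can be checked before starting a container. The sketch below tests that each model directory exists and contains a `config.json`. `check_model_dirs` is a hypothetical helper name. Using `config.json` as the marker file is an assumption based on the two meta-llama listings above; the Chinese model directories are elided in the tree, so they may need a different marker.

```shell
# Sketch: verify that each model directory under a root contains config.json.
# check_model_dirs is a hypothetical helper; config.json as the marker file
# is an assumption taken from the meta-llama listings in this README.
check_model_dirs() {
  local root missing dir
  root="$1"
  shift
  missing=0
  for dir in "$@"; do
    if [ -f "$root/$dir/config.json" ]; then
      echo "ok: $dir"
    else
      echo "missing: $dir" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_model_dirs . meta-llama/Llama-2-7b-chat-hf meta-llama/Llama-2-13b-chat-hf` returns a non-zero status if either directory is absent or incomplete.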

Run the Llama2 model in Docker:

# 7B
bash scripts/run-7b.sh
# OR 13B
bash scripts/run-13b.sh
# OR Chinese 7B
bash scripts/run-7b-cn.sh
# OR Chinese 7B 4BIT
bash scripts/run-7b-cn-4bit.sh
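
The build and run scripts share the same variant suffixes (`7b`, `13b`, `7b-cn`, `7b-cn-4bit`), so one small wrapper can select either script. This is only a sketch: `script_for` is a hypothetical helper name, and the list of variants is copied from the script file names in this README.

```shell
# Sketch: print the script path for an action ("make" or "run") and a variant.
# script_for is a hypothetical helper; the variants mirror the file names
# scripts/make-*.sh and scripts/run-*.sh listed in this README.
script_for() {
  case "$1" in
    make|run) ;;
    *) echo "unknown action: $1" >&2; return 1 ;;
  esac
  case "$2" in
    7b|13b|7b-cn|7b-cn-4bit) echo "scripts/$1-$2.sh" ;;
    *) echo "unknown variant: $2" >&2; return 1 ;;
  esac
}
```

With it, building and then running the 7B model could look like `bash "$(script_for make 7b)" && bash "$(script_for run 7b)"`.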

Enjoy! Open http://localhost:7860 or http://ip:7860 (replace ip with the host's address) and play with LLaMA2!
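
The page may not answer right away, since the container has to load the model first. A minimal sketch for polling the address until it responds is shown below. `wait_for_ui` is a hypothetical helper name, `curl` is assumed to be installed, and the default attempt count and delay are arbitrary choices rather than values from this project.

```shell
# Sketch: poll a URL until it answers or the attempts run out.
# wait_for_ui is a hypothetical helper; the defaults (30 attempts, 2 s apart)
# are arbitrary and curl is assumed to be available.
wait_for_ui() {
  local url attempts delay i
  url="$1"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl --silent --fail --output /dev/null --max-time 5 "$url"; then
      echo "UI is up at $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "gave up waiting for $url" >&2
  return 1
}
```

Calling `wait_for_ui http://localhost:7860` after one of the run scripts has started blocks until the web UI responds or the attempts are used up.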

Credit

MetaAI LLaMA2: https://ai.meta.com/llama/ ❤️
Meta LLaMA2 7B Chat: https://huggingface.co/meta-llama/Llama-2-7b-chat
Meta LLaMA2 13B Chat: https://huggingface.co/meta-llama/Llama-2-13b-chat
Chinese LLaMA2 7B: https://huggingface.co/LinkSoul/Chinese-Llama-2-7b ❤️
Chinese LLaMA2 7B GGML q4: https://huggingface.co/soulteary/Chinese-Llama-2-7b-ggml-q4
LLaMA2 GGML Converter: https://hub.docker.com/r/soulteary/llama2
