Running Intel-HabanaLabs MLPerf™ Llama-70B LoRA Benchmark

This directory provides instructions to reproduce Intel-HabanaLabs's results for the MLPerf Training v4.0 Llama-70B LoRA benchmark on a single server with 8 Gaudi2 cards.

For more information on training deep learning models using Gaudi, refer to developer.habana.ai

MLPerf™ is a trademark and service mark of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited.

Setup

Make sure you have requested permission to download the Llama2 weights on the Hugging Face Hub: https://huggingface.co/meta-llama/Llama-2-7b-hf

Prepare MLPerf Directory

On each compute node, perform the following:

  1. Follow the instructions provided in the Gaudi Installation Guide to set up the environment including the $PYTHON environment variable. The guide will walk you through the process of setting up your system to run the benchmarks on Gaudi.

  2. Create directories for the MLPerf root, the datasets, and the model:

    export MLPERF_DIR=/path/to/mlperf/root
    export DATASETS_DIR=/path/to/datasets
    export MODEL_DIR=/path/to/model
    mkdir -p $MLPERF_DIR/Intel-HabanaLabs $MODEL_DIR $DATASETS_DIR
    
  3. This README is located in the benchmarks/llm_finetune directory corresponding to Intel-HabanaLabs's Llama-70B LoRA submission. Download the whole benchmarks folder, including all subfolders, and copy it under $MLPERF_DIR/Intel-HabanaLabs.
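
A quick sanity check at this point can catch an unset variable or a mistyped path before the container is built. The following is a minimal sketch and not part of the submission: `check_mlperf_dirs` is a hypothetical helper name, and it assumes only the three variables exported in step 2.

```shell
# Hypothetical helper, not part of the submission: confirm that the three
# variables exported in step 2 are set and name writable directories.
check_mlperf_dirs() {
    local name dir status
    status=0
    for name in MLPERF_DIR DATASETS_DIR MODEL_DIR; do
        eval "dir=\${$name:-}"   # read the variable whose name is held in $name
        if [ -z "$dir" ]; then
            echo "ERROR: $name is not set" >&2
            status=1
        elif [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
            echo "ERROR: $name=$dir is not a writable directory" >&2
            status=1
        else
            echo "OK: $name=$dir"
        fi
    done
    return "$status"
}
```

Calling `check_mlperf_dirs` after the `mkdir -p` command returns a non-zero status if any of the three paths is missing, so it can gate the remaining steps in a script.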

Build and Deploy Intel-HabanaLabs MLPerf Training 4.0 Container

  1. Create the mlperf4.0 container by running the following command:
  • TODO: update DOCKER_IMAGE once it is known and published.

    export CONTAINER_NAME=mlperf4.0
    export DOCKER_IMAGE=vault.habana.ai/gaudi-docker-mlperf/ver4.0/pytorch-installer-2.2.0:1.16.98-46
    docker run --privileged --security-opt seccomp=unconfined \
      --name $CONTAINER_NAME -td                              \
      -v /dev:/dev                                            \
      --device=/dev:/dev                                      \
      -e LOG_LEVEL_ALL=6                                      \
      -v /sys/kernel/debug:/sys/kernel/debug                  \
      -v /tmp:/tmp                                            \
      -v $MLPERF_DIR:/root/MLPERF                             \
      -v $DATASETS_DIR:/root/datasets                         \
      -v $MODEL_DIR:/root/model                               \
      --cap-add=sys_nice --cap-add=SYS_PTRACE                 \
      --user root --workdir=/root --net=host                  \
      --ulimit memlock=-1:-1 ${DOCKER_IMAGE}

  2. Start the SSH service inside the container and open an interactive shell:

    docker exec $CONTAINER_NAME bash -c "service ssh start"
    docker exec -it $CONTAINER_NAME bash
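
If `docker exec` fails, the container may have exited or never started. The following is a small sketch for checking its state, assuming the `CONTAINER_NAME` variable from step 1; `container_is_running` is a hypothetical helper name, not something shipped with the benchmark.

```shell
# Hypothetical helper: succeed only if the named container exists and its
# state reports Running=true. Any docker error is treated as "not running".
container_is_running() {
    [ "$(docker inspect --format '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}
```

For example, `container_is_running "$CONTAINER_NAME" || docker start "$CONTAINER_NAME"` restarts a stopped container before the `docker exec` commands are retried.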

Download Data and Model

MLCommons hosts the model for download exclusively by MLCommons Members. You must first agree to the confidentiality notice, then follow the link (https://drive.google.com/drive/folders/11tBZvvrh0FCm3XuR5E849K42TqftYdUF) to a directory containing Rclone download instructions. Follow steps 1-3 there to install and activate Rclone. Finally, download the model to the desired download directory (default ./models).

Log into the mlperf4.0 container and run:

rclone copy mlc-llama2:Llama2-70b-fused-qkv-mlperf /root/model/Llama2-70b-fused-qkv-mlperf -P

Similarly, download the data to the desired download directory (default ./dataset):

rclone copy mlc-llama2:training/scrolls_gov_report_8k /root/datasets/scrolls_gov_report_8k -P
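
Both transfers are large, so it can be worth confirming that the target directories were actually populated before continuing. The following is a minimal sketch; `dir_has_files` is a hypothetical helper name, and it checks only that at least one regular file exists, not that the download is complete.

```shell
# Hypothetical helper: succeed if the path is a directory that contains at
# least one regular file anywhere beneath it. This does not verify checksums.
dir_has_files() {
    [ -d "$1" ] && [ -n "$(find "$1" -type f | head -n 1)" ]
}
```

Running `dir_has_files /root/model/Llama2-70b-fused-qkv-mlperf` and `dir_has_files /root/datasets/scrolls_gov_report_8k` inside the container returns a non-zero status for a directory that is missing or empty.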

Finetuning Llama2 70B with LoRA

  1. Inside the mlperf4.0 container, install requirements:

    pip install git+https://github.com/HabanaAI/[email protected]
    pip install git+https://github.com/HabanaAI/optimum-habana-fork.git@cef6209
    pip install -r /root/MLPERF/Intel-HabanaLabs/benchmarks/llm_finetune/requirements.txt
    huggingface-cli login

  2. Create device warmup data:

    cd /root/datasets/scrolls_gov_report_8k
    python /root/MLPERF/Intel-HabanaLabs/benchmarks/llm_finetune/scripts/create_warmup_data.py

  3. Run the training:

    cd /root/MLPERF/Intel-HabanaLabs/benchmarks/llm_finetune/
    cp /root/MLPERF/Intel-HabanaLabs/benchmarks/llm_finetune/config.json /root/model/Llama2-70b-fused-qkv-mlperf/
    ./run_llama_70B_fp8_submission.sh
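
The training step depends on the benchmark's config.json having replaced the one in the model directory. The following is a small sketch for confirming that the copy took effect; `config_in_sync` is a hypothetical helper name and is not part of the submission scripts.

```shell
# Hypothetical helper: succeed only if both files exist and are byte-identical.
config_in_sync() {
    [ -f "$1" ] && [ -f "$2" ] && cmp -s "$1" "$2"
}
```

For example, `config_in_sync /root/MLPERF/Intel-HabanaLabs/benchmarks/llm_finetune/config.json /root/model/Llama2-70b-fused-qkv-mlperf/config.json` returns a non-zero status if the `cp` command was skipped or the file was later overwritten.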

Supported Configurations

| Validated on | Intel Gaudi Software Version | Framework Version(s) | Mode |
|---|---|---|---|
| Gaudi 2 | 1.18.0 | PyTorch 2.4.0 | Training |