
LLM4HWDesign Starting Toolkit

Introduction

This repository provides a starting toolkit for participants of the LLM4HWDesign Contest at ICCAD 2024. The toolkit includes scripts and utilities for deduplicating training data, fine-tuning models, and evaluating model performance. Participants can use this toolkit to kickstart their work and streamline their development process.

Base Dataset

The base dataset used in the contest is the MG-Verilog dataset. Your submitted data should follow the same format as the MG-Verilog dataset. Note that you may provide either multiple levels or a single level of description for each code sequence; during evaluation, all description levels will be concatenated into one string per code sequence using the script below.

# Example: two description levels for one code sequence
instructions_dict = {
    "summary": "xxx",
    "detailed explanation": "yyy",
}

# All description levels are joined into a single instruction string
result = ";\n".join(f"{key}: {value}" for key, value in instructions_dict.items()) + "\n"

# result is now:
# summary: xxx;
# detailed explanation: yyy

Toolkit Release Progress

  • Deduplication: Scripts to identify and remove duplicate samples from the dataset.
  • Fine-tuning: Scripts to fine-tune a pretrained language model on the MG-Verilog dataset.
  • Evaluation: Tools to evaluate the performance of the fine-tuned model using standard metrics.

Setup Environment

The environment assumes CUDA 12.1. (Setting it up is only needed if you want to run fine-tuning and evaluation yourself.)

conda env create -f environment.yml

Deduplication

The toolkit includes a deduplication script, which will be used during the Phase I evaluation to deduplicate each participant's data against the base dataset. To run the deduplication script:

python minhash.py
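
For reference, the sketch below illustrates the general idea behind MinHash-based near-duplicate detection, assuming the datasketch library (pip install datasketch); the actual minhash.py may use different tokenization, parameters, and thresholds.

# Minimal sketch of MinHash/LSH near-duplicate detection.
# The whitespace tokenization, num_perm=128, and 0.8 threshold are
# illustrative choices, not the contest's official settings.
from datasketch import MinHash, MinHashLSH

def minhash_of(code, num_perm=128):
    m = MinHash(num_perm=num_perm)
    for token in code.split():  # naive whitespace tokenization
        m.update(token.encode("utf8"))
    return m

# Index the base dataset, then query each submitted sample against it
lsh = MinHashLSH(threshold=0.8, num_perm=128)
base_code = "module adder(input a, b, output c); assign c = a ^ b; endmodule"
lsh.insert("base_sample_0", minhash_of(base_code))

candidate = "module adder(input a, b, output c); assign c = a ^ b; endmodule"
matches = lsh.query(minhash_of(candidate))
if matches:
    print("near-duplicate of:", matches)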

Evaluation

The following shows an example of how to evaluate your fine-tuned model.

Prerequisites:

export HF_TOKEN=your_huggingface_token

Prepare your fine-tuned model and tokenizer in HuggingFace format.
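
If your training pipeline does not already export HuggingFace-format checkpoints, a minimal sketch looks like the following; the output directory finetuned_model/ is a hypothetical path, and the base model ID is the public CodeLlama-7B-Instruct checkpoint on HuggingFace.

# Minimal sketch: export a fine-tuned model and tokenizer in HuggingFace format.
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "codellama/CodeLlama-7b-Instruct-hf"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# ... your fine-tuning happens here ...

# safe_serialization=False keeps the legacy pytorch_model.bin format
# expected by the evaluation script below
model.save_pretrained("finetuned_model/", safe_serialization=False)
tokenizer.save_pretrained("finetuned_model/")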

Install Icarus Verilog (iverilog)
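
For example, on Debian/Ubuntu (other platforms can build Icarus Verilog from source):

sudo apt-get install iverilog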

Install VerilogEval as the benchmark:

Please read the WARNINGS in the VerilogEval repository before proceeding.

Only the 1.0 release is supported: https://github.com/NVlabs/verilog-eval/tree/release/1.0.0

Pay attention to the "verilog-eval" vs. "verilog_eval" naming, which differs in MG-Verilog's own modified VerilogEval.

git clone -b release/1.0.0 https://github.com/NVlabs/verilog-eval.git
pip install -e verilog-eval

Evaluation Scripts:

cd model_eval
./gen.sh <path_to_folder_with_your_model_and_config> <your_huggingface_token>
#example: ./gen.sh "finetuned_model/" "hf-xxxxxxxxxx"

NOTE: The folder with your model and config should include two files: (1) the generated pytorch_model.bin and (2) the model config of CodeLlama-7B-Instruct from HuggingFace.

The results will be printed and logged to ./model_eval/data/gen.jsonl
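
If you want to inspect the log programmatically, gen.jsonl is standard JSON Lines; a minimal sketch follows (the exact keys in each record depend on the evaluation script and are not documented here).

# Minimal sketch: iterate over the JSON Lines evaluation log
import json

with open("model_eval/data/gen.jsonl") as f:
    for line in f:
        record = json.loads(line)
        print(record)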
