Implementation of **Decom-Renorm-Merge (DRM)** from our paper *Decom-Renorm-Merge: Model Merging on the Right Space Improves Multitasking*.
DRM is a model merging technique that combines the capabilities of multiple fine-tuned models into a single multitasking model through a shared representation space.
- Implementation of both DRM-V (vertical) and DRM-H (horizontal).
- Support for a wide range of Hugging Face `transformers` models, including decoder-only, encoder-only, and encoder-decoder architectures.
- Support for merging both fully fine-tuned models and PEFT models (e.g., LoRA).
- Highly configurable merging process via a single YAML file.
- PyTorch-based, designed for clarity and extensibility.
- Trained models are released to facilitate further research.
First, clone the repository:
```bash
git clone https://github.com/yophis/decom-renorm-merge.git
cd decom-renorm-merge
```

Next, create a virtual environment (Python 3.10 is recommended):

```bash
python -m venv drm
source drm/bin/activate
```

Finally, install the required dependencies:

```bash
pip install -r requirements.txt
```

The merging process is controlled by a single command that points to a configuration file.
1. **Prepare a configuration file.** Create a `.yaml` file (e.g., `config.yaml`) detailing the models to merge, the base model, and the DRM hyperparameters. See the section below for a detailed explanation.
2. **Run the merging script.** Execute the main script from the root directory of the project:

   ```bash
   python -m drm.merge_models --config-path /path/to/your/config.yaml
   ```
After the script finishes, the merged model will be saved to the directory specified by `save_path` in your configuration file.
To encourage further research, we are releasing all the fine-tuned checkpoints used in our paper on the Hugging Face Hub.
These are the individual models, each fine-tuned on a specific task, that serve as the input for the merging process.
**Llama-3.1-8B**

| Fine-Tuning Dataset | HuggingFace Hub ID |
|---|---|
| QNLI | yophis/DRM-Llama-3.1-8B-qnli |
| SST2 | yophis/DRM-Llama-3.1-8B-sst2 |
| RTE | yophis/DRM-Llama-3.1-8B-rte |
| MNLI | yophis/DRM-Llama-3.1-8B-mnli |
| CoLA | yophis/DRM-Llama-3.1-8B-cola |
**DeBERTa-v3-Base**

| Fine-Tuning Dataset | HuggingFace Hub ID |
|---|---|
| WinoGrande | yophis/DRM-DeBERTa-v3-Base-winogrande |
| StoryCloze | yophis/DRM-DeBERTa-v3-Base-storycloze |
| QASC | yophis/DRM-DeBERTa-v3-Base-qasc |
| WikiQA | yophis/DRM-DeBERTa-v3-Base-wikiqa |
| QuaRTz | yophis/DRM-DeBERTa-v3-Base-quartz |
| PAWS | yophis/DRM-DeBERTa-v3-Base-paws |
**T5-Base**

| Fine-Tuning Dataset | HuggingFace Hub ID |
|---|---|
| WinoGrande | yophis/DRM-T5-Base-winogrande |
| StoryCloze | yophis/DRM-T5-Base-storycloze |
| QASC | yophis/DRM-T5-Base-qasc |
| WikiQA | yophis/DRM-T5-Base-wikiqa |
| QuaRTz | yophis/DRM-T5-Base-quartz |
| PAWS | yophis/DRM-T5-Base-paws |
**T5-Large**

| Fine-Tuning Dataset | HuggingFace Hub ID |
|---|---|
| WinoGrande | yophis/DRM-T5-Large-winogrande |
| StoryCloze | yophis/DRM-T5-Large-storycloze |
| QASC | yophis/DRM-T5-Large-qasc |
| WikiQA | yophis/DRM-T5-Large-wikiqa |
| QuaRTz | yophis/DRM-T5-Large-quartz |
| PAWS | yophis/DRM-T5-Large-paws |
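All released checkpoints follow a uniform naming scheme, so the Hub ID for any backbone/task pair can be built programmatically. The helper below is our own illustration (not part of the repo), and the commented-out loading calls assume `transformers` is installed:

```python
# Hub IDs follow the pattern yophis/DRM-<backbone>-<task>, per the tables above.
# This helper is illustrative; it is not shipped with the repository.
def drm_checkpoint_id(backbone: str, task: str) -> str:
    return f"yophis/DRM-{backbone}-{task}"

model_id = drm_checkpoint_id("T5-Base", "qasc")  # "yophis/DRM-T5-Base-qasc"

# Loading the checkpoint (downloads weights; requires `transformers`):
# from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# tokenizer = AutoTokenizer.from_pretrained(model_id)
# model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
```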
The entire merging process is controlled by a YAML config file. Below is an example and a detailed breakdown of all parameters.
```yaml
# A list of models to be merged.
models:
  - model: allenai/Llama-3.1-Tulu-3-8B  # HuggingFace Hub ID or local path
    parameters:
      coefficient: 1.0
  - model: allenai/Llama-3.1-Tulu-3.1-8B
    parameters:
      coefficient: 1.0

# The base model used to compute weight deltas.
base_model: allenai/Llama-3.1-Tulu-3-8B-DPO

# DRM-specific hyperparameters and settings.
merging_config:
  # The core pruning ratio for the decomposed singular matrices (U or Vh).
  # This is the main hyperparameter for DRM.
  singular_matrices_drop_rate: 0.8

  # Direction of the joint decomposition: "vertical" for DRM-V, "horizontal" for DRM-H.
  direction: vertical

  # Regex patterns to identify linear parameter weights (e.g. FFN layers).
  # DRM is primarily applied to these layers.
  linear_parameter_regex_pattern:
    - ".*weight.*"

  # Regex patterns to exclude certain linear parameters (e.g. embeddings).
  linear_parameter_ignore_regex_pattern:
    - ".*embed_tokens.*"
    - ".*lm_head.*"

  # Regex patterns of modules to ignore entirely during merging (e.g. a classification head module).
  ignore_module_regex_pattern: []

  # Enable disjoint averaging.
  enable_disjoint_mean: true

  # Enable sign resolution.
  enable_sign_resolution: true

  # Pruning/trim rate for non-linear modules (e.g., biases, layer norms).
  non_linear_module_entries_drop_rate: 0.0

  # Computation dtype for SVD. Use float32 for stability.
  dtype: float32

# Path to save the final merged model.
save_path: ./tulu-drm-v-
```
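The include/exclude regex patterns above interact: a parameter is treated as a linear DRM target only if it matches an include pattern and no exclude pattern. The behavior can be previewed with Python's `re` module — a minimal sketch assuming full-string matching (the repo's exact matching semantics may differ):

```python
import re

# Patterns copied from the example config above.
include = [".*weight.*"]
exclude = [".*embed_tokens.*", ".*lm_head.*"]

def is_linear_target(name: str) -> bool:
    """True if `name` matches an include pattern and no exclude pattern."""
    if not any(re.fullmatch(p, name) for p in include):
        return False
    return not any(re.fullmatch(p, name) for p in exclude)

print(is_linear_target("model.layers.0.mlp.up_proj.weight"))  # True: FFN weight
print(is_linear_target("model.embed_tokens.weight"))          # False: excluded
print(is_linear_target("model.layers.0.mlp.up_proj.bias"))    # False: not a weight
```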
- `models`: A list of models to be merged. Each entry contains:
  - `model`: The Hugging Face Hub repository ID or a local path to the fine-tuned model.
  - `parameters.coefficient`: (Optional, default: `1.0`) The weighting factor for this model during the final averaging step.
- `base_model`: The path or Hub ID of the base model that was used for fine-tuning. This is crucial for calculating the weight deltas (`fine_tuned_model - base_model`).
- `merging_config`: A dictionary of parameters that control the DRM algorithm.
  - `singular_matrices_drop_rate`: The fraction of entries to prune (zero out) in the decomposed and renormalized singular matrices (`U` or `V`). This is the primary hyperparameter for controlling sparsity and performance in DRM. A value of `0.8` means 80% of the entries will be pruned, keeping the top 20%.
  - `direction`: Determines the SVD strategy.
    - `"vertical"`: Concatenates weight deltas row-wise (DRM-V). Aligns models into a shared row space.
    - `"horizontal"`: Concatenates weight deltas column-wise (DRM-H). Aligns models into a shared column space.
  - `linear_parameter_regex_pattern`: A list of regex patterns used to identify the 2D weight matrices (e.g., `mlp.fc1.weight`) to which DRM will be applied.
  - `linear_parameter_ignore_regex_pattern`: A list of regex patterns to exclude from the DRM process, even if they match the pattern above. This is useful for avoiding modifications to embedding or language-model head layers.
  - `ignore_module_regex_pattern`: A list of regex patterns to completely exclude certain modules from the entire merging process.
  - `enable_disjoint_mean`: If `true`, uses disjoint averaging (zeros are ignored) from TIES-Merging for the final merge.
  - `enable_sign_resolution`: If `true`, resolves sign conflicts across models using the method from TIES-Merging.
  - `non_linear_module_entries_drop_rate`: The pruning ratio for all other parameters not matched as linear weights (e.g., biases, LayerNorm weights).
  - `dtype`: The data type (`float32`, `bfloat16`, `float16`) to use for the SVD computation. `float32` is recommended for numerical stability.
- `save_path`: The local directory where the final merged model will be saved in Hugging Face format.
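The overall DRM-V flow these parameters control — stack weight deltas row-wise, take a joint SVD, prune the per-model factors, and average — can be sketched for a single weight matrix in NumPy. This is an illustrative toy, not the repository's implementation: the paper's renormalization step and the TIES-style sign resolution/disjoint averaging are simplified or omitted here, and a single magnitude-based pruning stands in for `singular_matrices_drop_rate`:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
base = rng.normal(size=(d, d))
# Two toy "fine-tuned" models: base weights plus task-specific deltas.
models = [base + 0.1 * rng.normal(size=(d, d)) for _ in range(2)]

# 1) Decompose: stack weight deltas row-wise (DRM-V) and take one joint SVD,
#    so all models share the row space spanned by Vh.
deltas = [m - base for m in models]
stacked = np.concatenate(deltas, axis=0)                 # shape (2d, d)
U, S, Vh = np.linalg.svd(stacked, full_matrices=False)
U_blocks = np.split(U, len(models), axis=0)              # one (d, d) block per model

# 2) Prune: zero out the smallest-magnitude entries of each per-model factor
#    (drop_rate plays the role of singular_matrices_drop_rate; the paper's
#    renormalization is not reproduced here).
drop_rate = 0.8
def prune(mat, drop_rate):
    k = int(mat.size * drop_rate)
    thresh = np.sort(np.abs(mat), axis=None)[k]
    return np.where(np.abs(mat) >= thresh, mat, 0.0)

pruned = [prune(u, drop_rate) for u in U_blocks]

# 3) Merge: average the pruned factors, project back to weight space,
#    and add the delta to the base (coefficient = 1.0, as in the config).
merged_U = np.mean(pruned, axis=0)
merged_delta = merged_U @ np.diag(S) @ Vh
merged_weights = base + 1.0 * merged_delta
```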
If you find DRM useful, please consider citing our paper:
```bibtex
@article{chaichana2025decom,
  title={Decom-Renorm-Merge: Model Merging on the Right Space Improves Multitasking},
  author={Chaichana, Yuatyong and Trachu, Thanapat and Limkonchotiwat, Peerat and Preechakul, Konpat and Khandhawit, Tirasan and Chuangsuwanich, Ekapol},
  journal={arXiv preprint arXiv:2505.23117},
  year={2025}
}
```