Commit a56b79b

update README

1 parent 8ed0ece commit a56b79b

File tree

6 files changed: +122 −11 lines changed

README.md (+115 −4)

@@ -1,7 +1,118 @@
-# Routing_Anything
+<h1 align="center"> MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts </h1>
 
-## Model Structure
+<p align="center">
+  <a href="https://openreview.net/forum?id=lsQnneYa8p"><img src="https://img.shields.io/static/v1?label=OpenReview&message=Forum&color=green&style=flat-square" alt="Paper"></a>&nbsp;&nbsp;&nbsp;&nbsp;<a href=""><img alt="ICML'24" src="https://img.shields.io/static/v1?label=ICML'24&message=Vienna&color=9cf&style=flat-square"></a>&nbsp;&nbsp;&nbsp;&nbsp;<a href="https://github.com/RoyalSkye/Routing-MVMoE/blob/main/LICENSE"><img src="https://img.shields.io/static/v1?label=License&message=MIT&color=orange&style=flat-square" alt="License"></a>
+</p>
+The PyTorch implementation of the ICML 2024 poster *[MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts]()*. MVMoE is a unified neural solver that copes with 16 VRP variants simultaneously, even in a zero-shot manner. Concretely, the training tasks include `CVRP`, `OVRP`, `VRPB`, `VRPL`, `VRPTW`, and `OVRPTW`. The test tasks include `OVRPB`, `OVRPL`, `VRPBL`, `VRPBTW`, `VRPLTW`, `OVRPBL`, `OVRPBTW`, `OVRPLTW`, `VRPBLTW`, and `OVRPBLTW`.
 
-## Multiple VRPs
+* ☺️ *We will attend ICML 2024 in person. Welcome to stop by our poster for a discussion.*
 
-## Few-shot (e.g, Prompt-tuning)
+<p align="center"><img src="./assets/mvmoe.png" width=98%></p>
+
+## Dependencies
+
+* Python >= 3.8
+* PyTorch >= 1.12
+
+## How to Run
+
+<details>
+<summary><strong>Train</strong></summary>
+
+```shell
+# Default: --problem_size=100 --pomo_size=100 --gpu_id=0
+# 0. POMO
+python train.py --problem={PROBLEM} --model_type=SINGLE
+
+# 1. POMO-MTL
+python train.py --problem=Train_ALL --model_type=MTL
+
+# 2. MVMoE/4E
+python train.py --problem=Train_ALL --model_type=MOE --num_experts=4 --routing_level=node --routing_method=input_choice
+
+# 3. MVMoE/4E-L
+python train.py --problem=Train_ALL --model_type=MOE_LIGHT --num_experts=4 --routing_level=node --routing_method=input_choice
+```
+
+</details>
+
+<details>
+<summary><strong>Evaluation</strong></summary>
+
+```shell
+# 0. POMO
+python test.py --problem={PROBLEM} --model_type=SINGLE --checkpoint={MODEL_PATH}
+
+# 1. POMO-MTL
+python test.py --problem=ALL --model_type=MTL --checkpoint={MODEL_PATH}
+
+# 2. MVMoE/4E
+python test.py --problem=ALL --model_type=MOE --num_experts=4 --routing_level=node --routing_method=input_choice --checkpoint={MODEL_PATH}
+
+# 3. MVMoE/4E-L
+python test.py --problem=ALL --model_type=MOE_LIGHT --num_experts=4 --routing_level=node --routing_method=input_choice --checkpoint={MODEL_PATH}
+
+# 4. Evaluation on CVRPLIB
+python test.py --problem=CVRP --model_type={MODEL_TYPE} --checkpoint={MODEL_PATH} --test_set_path=../data/CVRP-LIB
+```
+
+</details>
+
+<details>
+<summary><strong>Baseline</strong></summary>
+
+```shell
+# 0. LKH3 - supports ["CVRP", "OVRP", "VRPL", "VRPTW"]
+python LKH_baseline.py --problem={PROBLEM} --datasets={DATASET_PATH} -n=1000 --cpus=32 -runs=1 -max_trials=10000
+
+# 1. HGS - supports ["CVRP", "VRPTW"]
+python HGS_baseline.py --problem={PROBLEM} --datasets={DATASET_PATH} -n=1000 --cpus=32 -max_iteration=20000
+
+# 2. OR-Tools - supports all 16 VRP variants
+python OR-Tools_baseline.py --problem={PROBLEM} --datasets={DATASET_PATH} -n=1000 --cpus=32 -timelimit=20
+```
+
+</details>
+
+
+## How to Customize MoE
+
+MoEs can easily be used in Transformer-based models by replacing a Linear/MLP layer with an MoE layer, keeping the input and output dimensions the same as those of the original layer. Below, we provide two examples of how to customize MoEs.
+
+```python
+# 0. Our implementation based on https://github.com/davidmrau/mixture-of-experts
+# Supported routing levels: node/instance/problem
+# Supported routing methods: input_choice/expert_choice/soft_moe/random (only for node/instance routing levels)
+from MOELayer import MoE
+moe_layer = MoE(input_size={INPUT_DIM}, output_size={OUTPUT_DIM}, hidden_size={HIDDEN_DIM},
+                num_experts={NUM_EXPERTS}, k=2, T=1.0, noisy_gating=True,
+                routing_level="node", routing_method="input_choice", moe_model="MLP")
+
+# 1. tutel - https://github.com/microsoft/tutel
+from tutel import moe as tutel_moe
+moe_layer = tutel_moe.moe_layer(
+    gate_type={'type': 'top', 'k': 2},
+    model_dim={INPUT_DIM},
+    experts={'type': 'ffn', 'count_per_node': {NUM_EXPERTS},
+             'hidden_size_per_expert': {HIDDEN_DIM},
+             'activation_fn': lambda x: torch.nn.functional.relu(x)},
+)
+```
+
+
+## Citation
+
+```tex
+@inproceedings{zhou2024mvmoe,
+    title={MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts},
+    author={Jianan Zhou and Zhiguang Cao and Yaoxin Wu and Wen Song and Yining Ma and Jie Zhang and Chi Xu},
+    booktitle={International Conference on Machine Learning},
+    year={2024}
+}
+```
+
+## Acknowledgments
+
+* [ICML 2024 Review](https://github.com/RoyalSkye/Routing-MVMoE/blob/main/assets/Reviews_ICML24.md)
+* https://github.com/yd-kwon/POMO
+* https://github.com/davidmrau/mixture-of-experts
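To make the README's customization section concrete, here is a minimal NumPy sketch of what a top-k `input_choice` MoE layer computes over node embeddings: a learned gate scores experts per node, each node keeps its top-k experts, and their MLP outputs are mixed by the normalized gate weights. This is an illustration only, not the repository's `MOELayer`; the dimensions, gate, and expert weights are made up.

```python
import numpy as np

# Illustrative top-k "input_choice" MoE over node embeddings.
# Not the repository's MOELayer; all weights are random stand-ins.
rng = np.random.default_rng(0)
d_in, d_hidden, d_out, n_experts, k = 8, 16, 8, 4, 2

W_gate = rng.normal(size=(d_in, n_experts))
experts = [(rng.normal(size=(d_in, d_hidden)), rng.normal(size=(d_hidden, d_out)))
           for _ in range(n_experts)]

def moe_forward(x):                       # x: (n_nodes, d_in)
    logits = x @ W_gate                   # gate scores, (n_nodes, n_experts)
    top = np.argsort(logits, axis=1)[:, -k:]           # top-k experts per node
    sel = np.take_along_axis(logits, top, axis=1)
    w = np.exp(sel - sel.max(axis=1, keepdims=True))   # softmax over selected
    w /= w.sum(axis=1, keepdims=True)
    out = np.zeros((x.shape[0], d_out))
    for j in range(k):                    # mix the chosen experts' MLP outputs
        for i, e in enumerate(top[:, j]):
            W1, W2 = experts[e]
            out[i] += w[i, j] * (np.maximum(x[i] @ W1, 0) @ W2)
    return out

y = moe_forward(rng.normal(size=(5, d_in)))
print(y.shape)  # (5, 8) - input/output dims match the layer being replaced
```

As the README notes, the key property is that input and output dimensions match the Linear/MLP being replaced, so the layer is a drop-in substitute inside a Transformer block.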

assets/Reviews_ICML24.md (+1 −1)

@@ -6,7 +6,7 @@ We would like to thank the anonymous reviewers and (S)ACs of ICML 2024 for their
 
 ### Meta Review by Area Chair
 
-XXX
+This work represents a step towards cross-problem generalization in neural combinatorial optimization and thus the AC recommends acceptance. This is the first work to apply mixture-of-experts models to vehicle routing problems. The reviewers agree that the paper is well-written and the experiments are extensive. A weakness of this work is that mixture-of-experts yields marginal improvement upon a multi-task learning baseline (gap improvement of 0.1-0.5%). Furthermore, the proposed hierarchical gating mechanism does not appear to be advantageous beyond the default node-level input-choice gating. There is also concern about the limited scalability of the methods and the lack of discussion about other types of generalization in VRP, which was partially addressed by the authors' rebuttal.
 
 ----
 

assets/mvmoe.png (149 KB)

baselines/OR-Tools_baseline.py (+2 −2)

@@ -338,8 +338,8 @@ def solve_or_tools_log(directory, name, depot, loc, demand, capacity, route_limi
     assignment = routing.SolveWithParameters(search_parameters)
     duration = time.time() - start
     if routing.status() not in [1, 2]:
-        print(">> OR-Tools failed to solve instance - Solver status: {}".format(routing.status()))
-        exit(0)
+        print(">> OR-Tools failed to solve instance {} - Solver status: {}".format(name, routing.status()))
+        return None, None, duration
     cost, route = print_solution(data, manager, routing, assignment, problem=problem, log_file=open(log_filename, 'w'))  # route does not include the first and last node (i.e., the depot)
     print("\n".join(["{}".format(r) for r in ([data['depot']] + route + [data['depot']])]), file=open(tour_filename, 'w'))
     save_dataset((route, duration), output_filename, disable_print=True)
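The change above swaps a hard `exit(0)` for `return None, None, duration`, so one unsolvable instance no longer aborts a whole benchmark run. A sketch of how a caller might consume that contract; `solve_instance` and `solve_batch` are hypothetical stand-ins, not code from this repository:

```python
# Hypothetical driver handling the (cost, route, duration) failure contract:
# a None cost marks a failed instance, which is recorded and skipped.
def solve_instance(name):
    # Stand-in for the real solver call; pretend "bad" is unsolvable.
    if name == "bad":
        return None, None, 0.5        # cost, route, duration
    return 42.0, [1, 2, 3], 0.5

def solve_batch(names):
    results, failed = [], []
    for name in names:
        cost, route, duration = solve_instance(name)
        if cost is None:              # failure: log it and keep going
            failed.append(name)
            continue
        results.append((name, cost, route, duration))
    return results, failed

results, failed = solve_batch(["a", "bad", "b"])
print(failed)  # ['bad'] - the run continues past the failed instance
```

Returning a sentinel instead of calling `exit(0)` also lets the caller decide whether a failure is fatal, which matters when thousands of instances are solved in one process.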

test.py (+2 −2)

@@ -32,8 +32,8 @@ def args2dict(args):
     parser.add_argument('--problem', type=str, default="ALL", choices=["ALL", "CVRP", "OVRP", "VRPB", "VRPL", "VRPTW", "OVRPTW",
                                                                        "OVRPB", "OVRPL", "VRPBL", "VRPBTW", "VRPLTW",
                                                                        "OVRPBL", "OVRPBTW", "OVRPLTW", "VRPBLTW", "OVRPBLTW"])
-    parser.add_argument('--problem_size', type=int, default=50)
-    parser.add_argument('--pomo_size', type=int, default=50, help="the number of start node, should <= problem size")
+    parser.add_argument('--problem_size', type=int, default=100)
+    parser.add_argument('--pomo_size', type=int, default=100, help="the number of start node, should <= problem size")
 
     # model_params
     parser.add_argument('--model_type', type=str, default="MOE_LIGHT", choices=["SINGLE", "MTL", "MOE", "MOE_LIGHT"])
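The `--pomo_size` help text says it should not exceed `--problem_size`, a constraint argparse cannot express declaratively. A minimal sketch of enforcing it after parsing; this guard is illustrative and not part of the diff:

```python
import argparse

# Illustrative cross-argument check: argparse has no built-in way to say
# "pomo_size <= problem_size", so validate after parse_args.
parser = argparse.ArgumentParser()
parser.add_argument('--problem_size', type=int, default=100)
parser.add_argument('--pomo_size', type=int, default=100)
args = parser.parse_args([])  # use defaults for the demo

if args.pomo_size > args.problem_size:
    # parser.error prints a usage message and exits with status 2
    parser.error("--pomo_size must be <= --problem_size")

print(args.problem_size, args.pomo_size)  # prints "100 100"
```

With the new defaults of 100/100 the guard passes; a user passing, say, `--pomo_size=200` with the default problem size would get a clear usage error instead of a silent misconfiguration.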

train.py (+2 −2)

@@ -31,8 +31,8 @@ def args2dict(args):
     parser.add_argument('--problem', type=str, default="Train_ALL", choices=["Train_ALL", "CVRP", "OVRP", "VRPB", "VRPL", "VRPTW", "OVRPTW",
                                                                              "OVRPB", "OVRPL", "VRPBL", "VRPBTW", "VRPLTW",
                                                                              "OVRPBL", "OVRPBTW", "OVRPLTW", "VRPBLTW", "OVRPBLTW"])
-    parser.add_argument('--problem_size', type=int, default=50)
-    parser.add_argument('--pomo_size', type=int, default=50, help="the number of start node, should <= problem size")
+    parser.add_argument('--problem_size', type=int, default=100)
+    parser.add_argument('--pomo_size', type=int, default=100, help="the number of start node, should <= problem size")
 
     # model_params
     parser.add_argument('--model_type', type=str, default="MOE_LIGHT", choices=["SINGLE", "MTL", "MOE", "MOE_LIGHT"])

0 commit comments