Merged
16 changes: 16 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,11 @@ harness/mnist/__pycache__/
# Virtual environments
virtualenv

bmenv


# Datasets, queries and results since we generate them on the fly
datasets/*
# But keep the folder alive with this one file, since some submissions might have fixed datasets
Expand All @@ -59,6 +64,17 @@ datasets/*
harness/mnist/data/*
harness/mnist/mnist_ffnn_model.pth

**/build/

# Temporary files for loading CIFAR-10 data and model training
harness/cifar10/data/*
harness/cifar10/cifar10_resnet20.pth
harness/cifar10/cifar10_resnet20_model.pth


# Remote-submission artifacts
ek.lpk
submission_remote/__pycache__/


27 changes: 15 additions & 12 deletions README.md
@@ -1,12 +1,12 @@
# FHE Benchmarking Suite - ML Inference
This repository contains the harness for the ML-inference workload of the FHE benchmarking suite of [HomomorphicEncryption.org](https://www.HomomorphicEncryption.org).
The harness currently supports mnist model benchmarking as specified in `harness/mnist` directory.
The `main` branch contains a reference implementation of this workload, under the `submission` subdirectory.
The `main` branch contains a reference implementation of this workload, under the `submissions` subdirectory.
The harness also supports an optional *remote backend execution mode* under the `submission_remote` subdirectory, where the homomorphic evaluation is executed on a remote backend.

Submitters need to clone this repository, replace the content of the `submission` or `submission_remote` subdirectory by their own implementation.
Submitters should clone this repository and add their content as a subdirectory within the `submissions` directory, or replace the content of the `submission_remote` subdirectory with their own implementation.
They may also need to change or replace the script `scripts/build_task.sh` to account for the dependencies and build environment of their submission.
Submitters are expected to document any changes made to the model architecture `harness/mnist/mnist.py` in the `submission/README.md` file. Submitters have the option to generate an `io/server_reported_steps.json` file, which contains fine grained metrics reported by the server in addition to the metrics reported by the harness.
Submitters are expected to document any changes made to the model architecture `harness/mnist/mnist.py` in the `submissions/[--model]/README.md` file. Submitters have the option to generate an `io/server_reported_steps.json` file, which contains fine grained metrics reported by the server in addition to the metrics reported by the harness.

## Execution Modes

Expand All @@ -21,7 +21,7 @@ All steps are executed on a single machine:
- Homomorphic inference
- Decryption and postprocessing

This corresponds to the reference submission in `submission/`.
This corresponds to every reference submission in `submissions/`.

### Remote Backend Execution (Optional)

Expand Down Expand Up @@ -64,7 +64,7 @@ The harness script `harness/run_submission.py` will attempt to build the submiss

```console
$ python3 harness/run_submission.py -h
usage: run_submission.py [-h] [--num_runs NUM_RUNS] [--seed SEED] [--clrtxt CLRTXT] [--remote] {0,1,2,3}
usage: run_submission.py [-h] [--num_runs NUM_RUNS] [--seed SEED] [--clrtxt CLRTXT] [--remote] [--dataset {mnist}] [--model {mlp}] {0,1,2,3}

Run ML Inference FHE benchmark.

Expand All @@ -77,12 +77,14 @@ options:
--seed SEED Random seed for dataset and query generation
--clrtxt CLRTXT Specify with 1 if to rerun the cleartext computation
--remote Specify if to run in remote-backend mode
--dataset DATASET Specify the dataset to be used (default: mnist)
--model MODEL Specify the model to be used (default: mlp)
```
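The extended usage line can be reproduced with a small `argparse` sketch. This is illustrative only; the real parsing lives in the harness's `parse_submission_arguments` helper, whose exact defaults and return shape are assumptions here:

```python
import argparse

def parse_submission_arguments(description):
    """Illustrative parser matching the usage string above (not the harness's actual helper)."""
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument("size", type=int, choices=[0, 1, 2, 3],
                        help="Instance size (0 is the single-inference instance)")
    parser.add_argument("--num_runs", type=int, default=1,
                        help="Number of times to run the per-query steps")
    parser.add_argument("--seed", type=int, default=None,
                        help="Random seed for dataset and query generation")
    parser.add_argument("--clrtxt", type=int, default=0,
                        help="Specify with 1 to rerun the cleartext computation")
    parser.add_argument("--remote", action="store_true",
                        help="Run in remote-backend mode")
    parser.add_argument("--dataset", default="mnist", choices=["mnist"],
                        help="Dataset to be used (default: mnist)")
    parser.add_argument("--model", default="mlp", choices=["mlp"],
                        help="Model to be used (default: mlp)")
    return parser.parse_args()
```

Keeping `mnist` and `mlp` as both the defaults and the only allowed choices means existing invocations keep working unchanged while new datasets/models can be enabled by extending `choices`.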

The single instance runs the inference for a single input and verifies the correctness of the obtained label compared to the ground-truth label.

```console
$ python3 ./harness/run_submission.py 0 --seed 3 --num_runs 2
$ python3 ./harness/run_submission.py 0 --seed 3 --num_runs 2 --dataset mnist --model mlp


[harness] Running submission for single inference
Expand Down Expand Up @@ -282,19 +284,20 @@ The directory structure of this repository is as follows:
├─ io/ # This directory is used for client<->server communication
├─ measurements/ # Holds logs with performance numbers
├─ scripts/ # Helper scripts for dependencies and build system
└─ submission/ # This is where the workload implementation lives
├─ README.md # Submission documentation (mandatory)
├─ LICENSE.md # Optional software license (if different from Apache v2)
└─ [...]
├─ submissions/ # This is where the workload implementation lives
└─ [--model]/
├─ README.md # Submission documentation (mandatory)
├─ LICENSE.md # Optional software license (if different from Apache v2)
└─ [...]
└─ submission_remote/ # This is where the remote-backend workload implementation lives
└─ [...]
```
Submitters must overwrite the contents of the `scripts` and `submissions`
Submitters must overwrite the contents of the `scripts` subdirectory and add a
subdirectory to the `submissions` directory.

## Description of stages

A submitter can edit any of the `client_*` / `server_*` sources in `/submission`.
A submitter can copy and edit any of the `client_*` / `server_*` sources in `/submissions/mlp`.
Moreover, for the particular parameters related to a workload, the submitter can modify the params files.
If the current descriptions of the files are inaccurate, the stage names in `run_submission` can also be
modified.
Expand Down
Binary file not shown.
Binary file not shown.
Binary file added harness/cifar10/__pycache__/test.cpython-312.pyc
Binary file not shown.
Binary file added harness/cifar10/__pycache__/train.cpython-312.pyc
Binary file not shown.
10 changes: 7 additions & 3 deletions harness/generate_dataset.py
Expand Up @@ -25,13 +25,17 @@ def main():
Usage: python3 generate_dataset.py <output_file> <dataset_name>
"""

if len(sys.argv) != 2:
sys.exit("Usage: generate_dataset.py <output_file>")
if len(sys.argv) != 3:
sys.exit("Usage: generate_dataset.py <output_file> <dataset_name>")

DATASET_PATH = Path(sys.argv[1])
DATASET_NAME = sys.argv[2]
DATASET_PATH.parent.mkdir(parents=True, exist_ok=True)

mnist.export_test_data(output_file=DATASET_PATH, num_samples=10000, seed=None)
if DATASET_NAME == "mnist":
mnist.export_test_data(output_file=DATASET_PATH, num_samples=10000, seed=None)
else:
raise ValueError(f"Unsupported dataset name: {DATASET_NAME}")


if __name__ == "__main__":
Expand Down
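The `if/else` dispatch added above grows with every dataset; a registry dict is a common alternative. This is a sketch of the pattern only — the stub exporter stands in for `mnist.export_test_data`, and a `cifar10` entry would be hypothetical until its exporter lands:

```python
def export_mnist_stub(output_file, num_samples, seed):
    # Stand-in for mnist.export_test_data(output_file=..., num_samples=..., seed=...).
    return f"mnist -> {output_file} ({num_samples} samples)"

# One entry per supported dataset; new datasets register here instead of
# extending an if/else chain.
EXPORTERS = {
    "mnist": export_mnist_stub,
    # "cifar10": export_cifar10_stub,  # hypothetical future entry
}

def generate_dataset(dataset_name, output_file, num_samples=10000, seed=None):
    try:
        exporter = EXPORTERS[dataset_name]
    except KeyError:
        raise ValueError(f"Unsupported dataset name: {dataset_name}")
    return exporter(output_file=output_file, num_samples=num_samples, seed=seed)
```

The error path stays identical to the PR's (`ValueError` on an unknown name); only the lookup mechanism changes.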
19 changes: 12 additions & 7 deletions harness/generate_input.py
Expand Up @@ -25,17 +25,22 @@ def main():
"""
Generate random value representing the query in the workload.
"""
__, params, seed, __, __, __ = parse_submission_arguments('Generate input for FHE benchmark.')
__, params, seed, __, __, __, __, dataset_name = parse_submission_arguments('Generate input for FHE benchmark.')
PIXELS_PATH = params.get_test_input_file()
LABELS_PATH = params.get_ground_truth_labels_file()

PIXELS_PATH.parent.mkdir(parents=True, exist_ok=True)
num_samples = params.get_batch_size()
mnist.export_test_pixels_labels(
data_dir = params.datadir(),
pixels_file=PIXELS_PATH,
labels_file=LABELS_PATH,
num_samples=num_samples,
seed=seed)
match dataset_name:
case "mnist":
return mnist.export_test_pixels_labels(
data_dir=params.datadir(),
pixels_file=PIXELS_PATH,
labels_file=LABELS_PATH,
num_samples=num_samples,
seed=seed)
case _:
raise ValueError(f"Unsupported dataset name: {dataset_name}")

if __name__ == "__main__":
main()
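Note that the `match`/`case` statement introduced here requires Python 3.10 or later. On older interpreters the same dispatch is a plain conditional; a behavior-equivalent sketch, where the stub return value stands in for `mnist.export_test_pixels_labels`:

```python
def export_for_dataset(dataset_name, **kwargs):
    # Same dispatch as the match/case above, written for Python < 3.10.
    if dataset_name == "mnist":
        # Stand-in for: return mnist.export_test_pixels_labels(**kwargs)
        return ("mnist", sorted(kwargs))
    raise ValueError(f"Unsupported dataset name: {dataset_name}")
```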
4 changes: 3 additions & 1 deletion harness/params.py
Expand Up @@ -54,8 +54,10 @@ def subdir(self):
"""Return the submission directory of this repository."""
return self.rootdir

def datadir(self):
def datadir(self, dataset=None):
"""Return the dataset directory path."""
# if dataset:
# return self.rootdir / "datasets" / dataset / instance_name(self.size)
return self.rootdir / "datasets" / instance_name(self.size)

def dataset_intermediate_dir(self):
Expand Down
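The commented-out lines in `datadir` suggest a per-dataset layout is planned. Uncommented, the method might look like the sketch below; the `Params` shell and the `instance_name` mapping are simplified assumptions, not the harness's actual classes:

```python
from pathlib import Path

def instance_name(size):
    # Simplified stand-in; the harness's real size-to-name mapping may differ.
    return {0: "single", 1: "small", 2: "medium", 3: "large"}[size]

class Params:
    def __init__(self, rootdir, size):
        self.rootdir = Path(rootdir)
        self.size = size

    def datadir(self, dataset=None):
        """Return the dataset directory path, namespaced per dataset when one is given."""
        if dataset:
            return self.rootdir / "datasets" / dataset / instance_name(self.size)
        return self.rootdir / "datasets" / instance_name(self.size)
```

Callers that pass no dataset keep today's layout, so existing paths stay valid while dataset-aware callers opt in.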
77 changes: 46 additions & 31 deletions harness/run_submission.py
Expand Up @@ -28,20 +28,32 @@ def main():

# 0. Prepare running
# Get the arguments
size, params, seed, num_runs, clrtxt, remote_be = utils.parse_submission_arguments('Run ML Inference FHE benchmark.')
size, params, seed, num_runs, clrtxt, remote_be, model_name, dataset_name = utils.parse_submission_arguments('Run ML Inference FHE benchmark.')
test = instance_name(size)
print(f"\n[harness] Running submission for {test} inference")

# Ensure the required directories exist
utils.ensure_directories(params.rootdir)

# Build the submission if not built already
utils.build_submission(params.rootdir/"scripts", remote_be)

# The harness scripts are in the 'harness' directory,
# the submission code is either in submission or submission_remote
# the submission code is either in submissions or submission_remote
harness_dir = params.rootdir/"harness"
exec_dir = params.rootdir/ ("submission_remote" if remote_be else "submission")
exec_dir = params.rootdir/ ("submission_remote" if remote_be else "submissions")

# Check whether exec_dir contains a subdirectory named after the model.
model_exec_dir = exec_dir / model_name
if not model_exec_dir.is_dir():
print(f"[harness]: Model directory {model_exec_dir} not found.")
sys.exit(1)

# Check whether the dataset directory exists
dataset_exec_dir = harness_dir/dataset_name
if not dataset_exec_dir.is_dir():
print(f"[harness]: Dataset directory {dataset_exec_dir} not found.")
sys.exit(1)

# Build the submission if not built already
utils.build_submission(params.rootdir/"scripts", model_name, remote_be)

# Remove and re-create IO directory
io_dir = params.iodir()
Expand All @@ -52,12 +64,16 @@ def main():

# 1. Client-side: Generate the test datasets
dataset_path = params.datadir() / f"dataset.txt"
utils.run_exe_or_python(harness_dir, "generate_dataset", str(dataset_path))
utils.log_step(1, "Test dataset generation")
dataset_args = (
str(dataset_path),
str(dataset_name),
)
utils.run_exe_or_python(harness_dir, "generate_dataset", *dataset_args)
utils.log_step(1, f"Harness: {dataset_name.upper()} Test dataset generation")

# 2.1 Communication: Get cryptographic context
if remote_be:
utils.run_exe_or_python(exec_dir, "server_get_params", str(size))
utils.run_exe_or_python(model_exec_dir, "server_get_params", str(size))
utils.log_step(2.1 , "Communication: Get cryptographic context")
# Report size of context
utils.log_size(io_dir / "client_data", "Cryptographic Context")
Expand All @@ -66,22 +82,23 @@ def main():
# Note: this does not use the rng seed above, it lets the implementation
# handle its own prg needs. It means that even if called with the same
# seed multiple times, the keys and ciphertexts will still be different.
utils.run_exe_or_python(exec_dir, "client_key_generation", str(size))
utils.log_step(2.2 , "Key Generation")
utils.run_exe_or_python(model_exec_dir, "client_key_generation", str(size))
utils.log_step(2.2 , "Client: Key Generation")
# Report size of keys and encrypted data
utils.log_size(io_dir / "public_keys", "Public and evaluation keys")
utils.log_size(io_dir / "public_keys", "Client: Public and evaluation keys")

# 2.3 Communication: Upload evaluation key
if remote_be:
utils.run_exe_or_python(exec_dir, "server_upload_ek", str(size))
utils.run_exe_or_python(model_exec_dir, "server_upload_ek", str(size))
utils.log_step(2.3 , "Communication: Upload evaluation key")

# 3. Server-side: Preprocess the (encrypted) dataset using exec_dir/server_preprocess_model
utils.run_exe_or_python(exec_dir, "server_preprocess_model")
utils.log_step(3, "Encrypted model preprocessing")
utils.run_exe_or_python(model_exec_dir, "server_preprocess_model")
utils.log_step(3, "Server: (Encrypted) model preprocessing")

# Run steps 4-10 multiple times if requested
for run in range(num_runs):
run_path = params.measuredir() / f"results-{run+1}.json"
if num_runs > 1:
print(f"\n [harness] Run {run+1} of {num_runs}")

Expand All @@ -93,30 +110,30 @@ def main():
genqry_seed = rng.integers(0,0x7fffffff)
cmd_args.extend(["--seed", str(genqry_seed)])
utils.run_exe_or_python(harness_dir, "generate_input", *cmd_args)
utils.log_step(4, "Input generation")
utils.log_step(4, f"Harness: Input generation for {dataset_name.upper()}")

# 5. Client-side: Preprocess input using exec_dir/client_preprocess_input
utils.run_exe_or_python(exec_dir, "client_preprocess_input", str(size))
utils.log_step(5, "Input preprocessing")
utils.run_exe_or_python(model_exec_dir, "client_preprocess_input", str(size))
utils.log_step(5, "Client: Input preprocessing")

# 6. Client-side: Encrypt the input
utils.run_exe_or_python(exec_dir, "client_encode_encrypt_input", str(size))
utils.log_step(6, "Input encryption")
utils.log_size(io_dir / "ciphertexts_upload", "Encrypted input")
utils.run_exe_or_python(model_exec_dir, "client_encode_encrypt_input", str(size))
utils.log_step(6, "Client: Input encryption")
utils.log_size(io_dir / "ciphertexts_upload", "Client: Encrypted input")

# 7. Server side: Run the encrypted processing run exec_dir/server_encrypted_compute
utils.run_exe_or_python(exec_dir, "server_encrypted_compute", str(size))
utils.log_step(7, "Encrypted computation")
utils.run_exe_or_python(model_exec_dir, "server_encrypted_compute", str(size))
utils.log_step(7, "Server: Encrypted ML Inference computation")
# Report size of encrypted results
utils.log_size(io_dir / "ciphertexts_download", "Encrypted results")
utils.log_size(io_dir / "ciphertexts_download", "Client: Encrypted results")

# 8. Client-side: decrypt
utils.run_exe_or_python(exec_dir, "client_decrypt_decode", str(size))
utils.log_step(8, "Result decryption")
utils.run_exe_or_python(model_exec_dir, "client_decrypt_decode", str(size))
utils.log_step(8, "Client: Result decryption")

# 9. Client-side: post-process
utils.run_exe_or_python(exec_dir, "client_postprocess", str(size))
utils.log_step(9, "Result postprocessing")
utils.run_exe_or_python(model_exec_dir, "client_postprocess", str(size))
utils.log_step(9, "Client: Result postprocessing")

# 10 Verify the result for single inference or calculate quality for batch inference.
encrypted_model_preds = params.get_encrypted_model_predictions_file()
Expand All @@ -141,10 +158,8 @@ def main():
utils.log_step(10.2, "Harness: Run quality check")

# 11. Store measurements
run_path = params.measuredir() / f"results-{run+1}.json"
run_path.parent.mkdir(parents=True, exist_ok=True)
submission_report_path = io_dir / "server_reported_steps.json"
utils.save_run(run_path, submission_report_path, size)
utils.save_run(run_path, size)

print(f"\nAll steps completed for the {instance_name(size)} inference!")

Expand Down
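The two directory checks added to `run_submission.py` can be factored into one helper. A sketch under the assumptions that local submissions live in `submissions/<model>` and dataset harness code in `harness/<dataset>`:

```python
import sys
from pathlib import Path

def resolve_dirs(rootdir, model_name, dataset_name, remote=False):
    """Return (model_exec_dir, dataset_dir), exiting like the harness if either is missing."""
    rootdir = Path(rootdir)
    exec_dir = rootdir / ("submission_remote" if remote else "submissions")
    model_exec_dir = exec_dir / model_name
    dataset_dir = rootdir / "harness" / dataset_name
    for d in (model_exec_dir, dataset_dir):
        if not d.is_dir():
            print(f"[harness]: directory {d} not found.")
            sys.exit(1)
    return model_exec_dir, dataset_dir
```

Failing fast before `build_submission` runs, as the PR does, avoids a partial build against a model or dataset directory that was never checked out.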