NetPress is a dynamic benchmark generation framework for evaluating LLM agents on real-world network applications. It integrates with network emulators to provide realistic environment feedback, supporting comprehensive evaluation across three performance metrics: correctness, safety, and latency.
The research behind NetPress is detailed in our paper:
Zhou, Y., Ruan, J., Wang, E. S., Fouladi, S., Yan, F. Y., Hsieh, K., & Liu, Z. (2025).
NetPress: Dynamically Generated LLM Benchmarks for Network Applications. arXiv preprint arXiv:2506.03231. [paper]
```bibtex
@article{zhou2025netpress,
title={NetPress: Dynamically Generated LLM Benchmarks for Network Applications},
author={Zhou, Yajie and Ruan, Jiajun and Wang, Eric S and Fouladi, Sadjad and Yan, Francis Y and Hsieh, Kevin and Liu, Zaoxing},
journal={arXiv preprint arXiv:2506.03231},
year={2025}
}
```

To run NetPress locally, you will need:

- Conda package manager
- Python environment

Set up the required Conda environments:

```bash
# Create Mininet environment (for Route and K8s applications)
conda env create -f environment_mininet.yml
# Create AI Gym environment (for MALT application)
conda env create -f environment_ai_gym.yml
```
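
As a quick, optional sanity check, you can confirm that both environments were created; the names `mininet` and `ai_gym_env` match the activation commands used below.

```bash
# Optional sanity check: both environments should appear in the list.
conda env list | grep -E "mininet|ai_gym_env"
```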

Activate the appropriate environment for the application you want to run:

```bash
# MALT
conda activate ai_gym_env
# Routing or K8s
conda activate mininet
```

Some local models (e.g., Qwen) have optional dependencies, such as Flash Attention, that can be installed to improve inference speed:

```bash
conda activate ai_gym_env
pip install flash-attn==2.7.4.post1
```
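
If you install Flash Attention, a quick import check (with `ai_gym_env` still active) confirms that the package built correctly against your local CUDA and PyTorch setup:

```bash
# Optional: confirm flash-attn is importable and print its version.
python -c "import flash_attn; print(flash_attn.__version__)"
```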

A Dockerfile is provided with all dependencies installed and configured. Note that to use GPUs for local models, you will need to install the NVIDIA Container Toolkit on the host.

```bash
# Build image.
cd /path/to/NetPress
docker build -t netpress:latest .
# Run. Optional --gpus flag to expose NVIDIA GPUs within container.
docker run -itd --name netpress_test --gpus all netpress:latest
# Enter container.
docker exec -it netpress_test /bin/bash
```
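
If you started the container with `--gpus all`, you can optionally confirm that the GPUs are visible inside it before running local models; this assumes the NVIDIA Container Toolkit is installed on the host.

```bash
# Optional: list the GPUs exposed to the running container.
docker exec netpress_test nvidia-smi
```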

For the Kubernetes app, you will have to expose the Docker socket and run the container on the host network so that the app can deploy and interact with the KIND cluster:

```bash
# Expose the Docker socket and run on the host network.
docker run -itd --name netpress_test --network host --gpus all \
    -v /var/run/docker.sock:/var/run/docker.sock netpress:latest
```
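
Before launching the K8s experiments, you can optionally check from inside the container that the host Docker daemon is reachable through the mounted socket. This sketch assumes the image ships a `docker` CLI, which the K8s app would need in order to manage the KIND cluster.

```bash
# Optional: verify the host Docker daemon is reachable via the mounted socket.
# (Assumes the docker CLI is available inside the image.)
docker exec netpress_test docker ps
```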

Execute the following commands to run the benchmark for each application:

```bash
cd experiments
./run_app_malt.sh
./run_app_route.sh
./run_app_k8s.sh
```

For comprehensive testing instructions, please refer to the following guides:
- Capacity Planning (CP) Application Guide
- Routing Application Guide
- Kubernetes (K8s) Application Guide
Our evaluation framework measures three key dimensions:
- Correctness: Evaluates whether the LLM agent produces an accurate solution for each network query.
- Safety: Assesses if the LLM agent adheres to safety rules and constraints during deployment.
- Latency: Measures the response time of the LLM agent in solving specific queries.
A guide for adding new network applications is coming soon.
For questions or support, please:
- Open an issue on GitHub
- Contact us directly at [email protected]

