
Froot-NetSys/NetPress


NetPress: Dynamically Generated LLM Benchmarks for Network Applications

Overview

NetPress is a dynamic benchmark generation framework for evaluating LLM agents in real-world network applications. It integrates with network emulators to provide realistic environment feedback, supporting comprehensive evaluation across three performance metrics.

Paper

The research behind NetPress is detailed in our paper:
Zhou, Y., Ruan, J., Wang, E. S., Fouladi, S., Yan, F. Y., Hsieh, K., & Liu, Z. (2025). NetPress: Dynamically Generated LLM Benchmarks for Network Applications. arXiv preprint arXiv:2506.03231. [paper]

@article{zhou2025netpress,
  title={NetPress: Dynamically Generated LLM Benchmarks for Network Applications},
  author={Zhou, Yajie and Ruan, Jiajun and Wang, Eric S and Fouladi, Sadjad and Yan, Francis Y and Hsieh, Kevin and Liu, Zaoxing},
  journal={arXiv preprint arXiv:2506.03231},
  year={2025}
}

Prerequisites

  • Conda package manager
  • Python environment

Installation

  1. Set up the required Conda environments:
# Create Mininet environment (for Route and K8s applications)
conda env create -f environment_mininet.yml

# Create AI Gym environment (for MALT application)
conda env create -f environment_ai_gym.yml
  2. Activate the appropriate environment:
# MALT
conda activate ai_gym_env

# Routing or K8s
conda activate mininet
  3. Some local models (e.g., Qwen) have optional dependencies, such as Flash Attention, that can be installed to improve inference speed.
conda activate ai_gym_env
pip install flash-attn==2.7.4.post1
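
Flash Attention is optional; models typically fall back to standard attention when it is absent. A quick way to confirm whether the package is importable from the currently active environment (this snippet is not part of the repo's scripts):

```shell
# Check whether the optional flash-attn package is importable from the
# currently active environment (import errors are suppressed).
python -c "import flash_attn" 2>/dev/null \
  && echo "flash-attn available" \
  || echo "flash-attn not installed (optional)"
```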

Running From Docker

A Dockerfile is provided with all dependencies installed and configured. Note that to use GPUs for local models, you will need to install the NVIDIA Container Toolkit.

# Build image.
cd /path/to/NetPress
docker build -t netpress:latest .

# Run. Optional --gpus flag to expose NVIDIA GPUs within container.
docker run -itd --name netpress_test --gpus all netpress:latest

# Enter container.
docker exec -it netpress_test /bin/bash

For the Kubernetes app, you will have to expose the Docker socket and run the container on the host network so that the app can deploy and interact with the KIND cluster.

# Expose docker socket and run on localhost.
docker run -itd --name netpress_test --network host --gpus all \
    -v /var/run/docker.sock:/var/run/docker.sock netpress:latest
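
Before mounting the socket, it can help to confirm that the Docker daemon's socket actually exists on the host. A small sanity check using the default Linux socket path (not part of the repo's scripts):

```shell
# Verify the Docker daemon's Unix socket exists before bind-mounting it.
SOCK=/var/run/docker.sock
if [ -S "$SOCK" ]; then
  echo "docker socket found at $SOCK"
else
  echo "docker socket missing: is the Docker daemon running?"
fi
```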

Quick Start

Execute the following commands to run the benchmark for each application:

cd experiments
./run_app_malt.sh
./run_app_route.sh
./run_app_k8s.sh

Detailed Application Guides

For comprehensive testing instructions, refer to the per-application guides in the repository.

Results Analysis

Performance Metrics

Our evaluation framework measures three key dimensions:

  • Correctness: Evaluates whether the LLM agent produces an accurate solution for each network query.
  • Safety: Assesses whether the LLM agent adheres to safety rules and constraints during deployment.
  • Latency: Measures the LLM agent's response time in solving specific queries.
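
As an illustration of how per-query results roll up into these three numbers, the sketch below aggregates a hypothetical CSV; the column layout is assumed for illustration and is not NetPress's actual output format:

```shell
# Aggregate per-query results into correctness rate, safety rate, and
# mean latency. The results.csv layout here is a hypothetical example.
cat > results.csv <<'EOF'
correct,safe,latency_s
1,1,2.4
0,1,3.1
1,0,1.8
EOF
awk -F, 'NR > 1 { c += $1; s += $2; l += $3; n++ }
         END { printf "correctness=%.2f safety=%.2f avg_latency=%.2fs\n", c/n, s/n, l/n }' results.csv
# → correctness=0.67 safety=0.67 avg_latency=2.43s
```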

Statistical Analysis

  • Confidence interval comparisons between different agents

  • Comprehensive breakdown analysis of performance metrics

Contributing

A guide for adding new network applications is coming soon.

Contact

For questions or support, please open an issue on the repository.
