41 changes: 35 additions & 6 deletions README.md
@@ -5,17 +5,46 @@ This is the GitHub repo corresponding to our [NAACL '24 Industry Track Paper](ht

**Paper**: https://arxiv.org/abs/2406.06435

## Setup
This repository is based on the ALIGN system [codebase](https://github.com/ITM-Kitware/align-system). Setup instructions can be found there (install using either `pip` or `poetry`). A virtual Python environment is recommended for managing dependencies.

### Things to Note:
(1) This code requires Python version >=3.10 (virtual/conda env recommended). \
(2) This repo was tested on a version of the ALIGN system corresponding to this [commit-id](https://github.com/ITM-Kitware/align-system/commit/7b67c76bf11313e31af43af53588fe70803943e7). To use this version, please run the following before running the code:
## Setup (updated 06/30/2024 to evaluate on CodeAct agent)
Set up the conda environment (Python 3.10) first:
```bash
conda create -n align_system python=3.10
conda activate align_system
```

After setting up the conda env, install `align-system` from the forked repo as follows (the fork contains the `CodeActAgent` class; it will be merged into the official `align-system` repo later):
```bash
git clone https://github.com/wjdghks950/align-system.git
cd align-system
pip install -e .
```
`pip install -e git+https://github.com/ITM-Kitware/align-system.git@7b67c76bf11313e31af43af53588fe70803943e7#egg=align_system`

Other dependencies:
```bash
pip install vllm
```

Start CodeAct:

```bash
# start model serving at port 8080
export CUDA_VISIBLE_DEVICES=0,1
./scripts/start_vllm.sh /shared/nas2/shared/llms/CodeActAgent-Mistral-7b-v0.1/

# start the code execution server at port 8081
./scripts/code_execution/start_jupyter_server.sh 8081

# then try the interactive demo to make sure everything works
./scripts/run_codeact_demo.sh
```
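
Once both servers are up, you can sanity-check the execution server directly. The `/execute` route and its JSON fields (`convid` and `code` in the request; `result` and `new_kernel_created` in the response) come from `scripts/code_execution/api.py` later in this PR; the port is assumed to match the 8081 passed to `start_jupyter_server.sh` above. A minimal sketch:

```python
# Minimal sanity check against the code execution server started above.
# Route and JSON fields follow scripts/code_execution/api.py in this PR;
# the port assumes the 8081 passed to start_jupyter_server.sh.
import requests

resp = requests.post(
    "http://localhost:8081/execute",
    json={
        "convid": "smoke-test",   # kernels are keyed by conversation id
        "code": "print(1 + 1)",   # code to run inside the Jupyter kernel
    },
)
payload = resp.json()
print(payload["result"])              # execution output from the kernel
print(payload["new_kernel_created"])  # True on the first call for this convid
```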

## TODO List
- [ ] Implement the `__call__` method for [CodeActAgentADM](https://github.com/wjdghks950/align-system/blob/3446b221867c4e35e349dac8e03e2640b5ad1245/align_system/algorithms/codeact_agent_adm.py#L127) (a speculative sketch of its expected shape follows this list).
- [ ] `CodeActAgentADM` is called by the [generate_outputs(...)](https://github.com/wjdghks950/align-system/blob/3446b221867c4e35e349dac8e03e2640b5ad1245/align_system/evaluation/adm_evaluator.py#L3) method in [run_evaluator.py](https://github.com/wjdghks950/llm-alignable-dm/blob/996e9be8c45b58305b4b0f187c7306dde0b667da/scripts/run_evaluator.py#L247).
- [ ] For the `CodeActAgentADM` configs, refer to [configs/codeact-agent](https://github.com/wjdghks950/llm-alignable-dm/tree/main/configs/codeact-agent).
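
Since the first TODO item is still open, the following is a purely speculative sketch of the shape `CodeActAgentADM.__call__` might take, not the actual implementation; every helper name in it (`build_prompt`, `contains_code`, `extract_code`, `execute_code`, `parse_choice`) is a hypothetical placeholder rather than part of the `align-system` API.

```python
# Speculative sketch only: __call__ is unimplemented (see the TODO above).
# All helper and attribute names below are hypothetical placeholders.
class CodeActAgentADM:
    def __call__(self, scenario, probe):
        # Build a prompt from the triage scenario and probe.
        prompt = self.build_prompt(scenario, probe)
        response = self.model.generate(prompt)
        # CodeAct loop: while the model emits code, run it on the
        # execution server (the /execute endpoint shown earlier) and
        # feed the observation back to the model.
        while self.contains_code(response):
            observation = self.execute_code(self.extract_code(response))
            response = self.model.generate(prompt, observation=observation)
        # Map the final response onto one of the probe's answer choices.
        return self.parse_choice(response, probe)
```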


## Overview
To run a particular LLM-based decision-maker, use the `run_evaluator.py` file in the `scripts/` directory. This script takes as input a particular config file (found in the `configs/` directory) and a GPU ID:
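
The concrete command is collapsed in this diff. The configs themselves are plain YAML, as the `configs/codeact-agent` files added below show; a minimal sketch of inspecting one, assuming only `pyyaml` (already listed in `requirements-codeact.txt`):

```python
# Sketch: inspect one of the codeact-agent configs added in this PR.
# Uses pyyaml, which is already listed in requirements-codeact.txt.
import yaml

with open("configs/codeact-agent/align/high.yml") as f:
    cfg = yaml.safe_load(f)

print(cfg["name"])  # -> paper-dataset-1-12/codeact-agent/align/high
# KDMA targets: the align/high config sets all six values to 10,
# while align/low sets them to 0.
print(cfg["llama_2_single_kdma_adm"]["target_kdma_values"])
```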

19 changes: 19 additions & 0 deletions configs/codeact-agent/align/high.yml
@@ -0,0 +1,19 @@
name: paper-dataset-1-12/codeact-agent/align/high
dataset: data/paper-dataset-1-12.json
llama_2_single_kdma_adm:
  baseline: false
  chat_template: falcon-instruct.jinja
  device: cuda:0
  hf_model: tiiuae/codeact-agent
  n_negative_samples: 0
  n_positive_samples: 1
  precision: half
  shuffle: false
  temperature: 0.7
  target_kdma_values:
    continuation_of_care: 10
    fairness: 10
    moral_deservingness: 10
    protocol_focus: 10
    risk_aversion: 10
    utilitarianism: 10
19 changes: 19 additions & 0 deletions configs/codeact-agent/align/low.yml
@@ -0,0 +1,19 @@
name: paper-dataset-1-12/codeact-agent/align/low
dataset: data/paper-dataset-1-12.json
llama_2_single_kdma_adm:
  baseline: false
  chat_template: falcon-instruct.jinja
  device: cuda:0
  hf_model: tiiuae/codeact-agent
  n_negative_samples: 0
  n_positive_samples: 1
  precision: half
  shuffle: false
  temperature: 0.7
  target_kdma_values:
    continuation_of_care: 0
    fairness: 0
    moral_deservingness: 0
    protocol_focus: 0
    risk_aversion: 0
    utilitarianism: 0
19 changes: 19 additions & 0 deletions configs/codeact-agent/align_self-consistency/high.yml
@@ -0,0 +1,19 @@
name: paper-dataset-1-12/codeact-agent/align_self-consistency/high
dataset: data/paper-dataset-1-12.json
llama_2_single_kdma_adm:
  baseline: false
  chat_template: falcon-instruct.jinja
  device: cuda:0
  hf_model: tiiuae/codeact-agent
  n_negative_samples: 5
  n_positive_samples: 5
  precision: half
  shuffle: true
  temperature: 0.7
  target_kdma_values:
    continuation_of_care: 10
    fairness: 10
    moral_deservingness: 10
    protocol_focus: 10
    risk_aversion: 10
    utilitarianism: 10
19 changes: 19 additions & 0 deletions configs/codeact-agent/align_self-consistency/low.yml
@@ -0,0 +1,19 @@
name: paper-dataset-1-12/codeact-agent/align_self-consistency/low
dataset: data/paper-dataset-1-12.json
llama_2_single_kdma_adm:
  baseline: false
  chat_template: falcon-instruct.jinja
  device: cuda:0
  hf_model: tiiuae/codeact-agent
  n_negative_samples: 5
  n_positive_samples: 5
  precision: half
  shuffle: true
  temperature: 0.7
  target_kdma_values:
    continuation_of_care: 0
    fairness: 0
    moral_deservingness: 0
    protocol_focus: 0
    risk_aversion: 0
    utilitarianism: 0
13 changes: 13 additions & 0 deletions configs/codeact-agent/baseline/baseline.yml
@@ -0,0 +1,13 @@
name: paper-dataset-1-12/codeact-agent/baseline
dataset: data/paper-dataset-1-12.json
llama_2_single_kdma_adm:
  baseline: true
  chat_template: falcon-instruct.jinja
  device: cuda:0
  hf_model: tiiuae/codeact-agent
  n_negative_samples: 0
  n_positive_samples: 1
  precision: half
  shuffle: false
  temperature: 0.7
  target_kdma_values: null
62 changes: 62 additions & 0 deletions requirements-codeact.txt
@@ -0,0 +1,62 @@
torch==2.0.1 -i https://download.pytorch.org/whl/cu118
torchvision==0.15.2 -i https://download.pytorch.org/whl/cu118
torchaudio==2.0.2 -i https://download.pytorch.org/whl/cu118
pre-commit
openai==0.28
datasets
wikipedia
langchain
streamlit
backoff
charset-normalizer==3.1.0
numpy
pandas
pylatexenc
google-api-python-client
arxiv
# Alfworld
opencv-python
networkx
h5py
tqdm
vocab
revtok
Click
transformers
tokenizers
scipy==1.10.1
ipython
matplotlib
cython
nltk
pipreqs
pyyaml
pytz
visdom
sympy
pycocotools
# We use MINT's docker for ALFWorld, so no need to install these
# gym==0.15.4
# ai2thor==2.1.0
# fast-downward @ https://github.com/MarcCote/downward/archive/faster_replan.zip
# textworld @ https://github.com/MarcCote/TextWorld/archive/handcoded_expert_integration.zip
# alfworld @ git+https://github.com/xingyaoww/alfworld.git
seaborn
google-generativeai
python-dateutil
statsmodels
# APPs evaluation
pyext
grpcio
vllm
accelerate
jsonlines
gym==0.26.2
pandarallel
thefuzz
flask
gunicorn
apscheduler
docker
tornado
termcolor
130 changes: 130 additions & 0 deletions scripts/code_execution/api.py
@@ -0,0 +1,130 @@
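"""Tornado server that brokers code execution for CodeAct conversations.

Exposes POST /execute, lazily creating one Jupyter kernel per conversation id
(behind a Docker or Kubernetes Jupyter gateway) and periodically cleaning up
kernels that have been idle for more than ten minutes.
"""
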
import os
import time
import json
import signal
import logging
import argparse
import tornado.ioloop
import tornado.web
import tornado.httpserver
from collections import namedtuple
from jupyter import JupyterKernel, JupyterGatewayDocker, JupyterGatewayKubernetes

logging.basicConfig(level=logging.INFO)

if os.environ.get("USE_KUBERNETES", "0").lower() == "1":
    JupyterKernelWrapper = JupyterGatewayKubernetes
    logging.info("Using Kubernetes as the backend for JupyterGateway")
else:
    JupyterKernelWrapper = JupyterGatewayDocker
    logging.info("Using Docker as the backend for JupyterGateway")

# Global data structure to map convid to (JupyterKernelWrapper, JupyterKernel)
JupyterKernelType = namedtuple("JupyterKernelType", [
    "kernel_wrapper",
    "kernel",
    "last_access_time"
])


def cleanup_kernels(app, force=False):
    """Clean up kernels and gateway containers that have timed out."""
    KERNEL_TIMEOUT = 10 * 60  # 10 minutes
    current_time = time.time()
    to_delete = []
    conv_id_to_kernel = app.conv_id_to_kernel
    # Find all kernels that have timed out
    for convid in conv_id_to_kernel.keys():
        last_access = conv_id_to_kernel[convid].last_access_time
        if current_time - last_access > KERNEL_TIMEOUT:
            to_delete.append(convid)

    if force:
        to_delete = list(conv_id_to_kernel.keys())
        logging.info(f"Force cleanup all {len(to_delete)} kernels")

    for convid in to_delete:
        # Close the kernel
        # kernel: JupyterKernel = conv_id_to_kernel[convid].kernel
        # kernel.shutdown()  # Close the JupyterKernel
        # Close the JupyterKernelWrapper by closing its context manager
        kernel_wrapper = conv_id_to_kernel[convid].kernel_wrapper
        kernel_wrapper.__exit__(None, None, None)  # Close the JupyterKernelWrapper
        # Delete the entry from the global data structure
        del conv_id_to_kernel[convid]
        logging.info(f"Kernel closed for conversation {convid}")


class ExecuteHandler(tornado.web.RequestHandler):
    async def post(self):
        data = json.loads(self.request.body)
        convid = data.get("convid")
        code = data.get("code")

        # Create a new kernel if one does not exist for this conversation
        new_kernel = False

        conv_id_to_kernel = self.application.conv_id_to_kernel
        if convid not in conv_id_to_kernel:
            kernel_wrapper = JupyterKernelWrapper(
                name=f"conv-{convid}",
            )
            url_suffix = kernel_wrapper.__enter__()
            if os.environ.get("DEBUG", False):
                logging.info(f"Kernel URL: {url_suffix}")
            kernel = JupyterKernel(url_suffix, convid)
            await kernel.initialize()
            conv_id_to_kernel[convid] = JupyterKernelType(
                kernel_wrapper,
                kernel,
                None
            )
            new_kernel = True
            logging.info(f"Kernel created for conversation {convid}")

        # Update last access time
        kernel_access_time = time.time()
        conv_id_to_kernel[convid] = conv_id_to_kernel[convid]._replace(
            last_access_time=kernel_access_time
        )

        # Execute the code
        kernel: JupyterKernel = conv_id_to_kernel[convid].kernel
        result = await kernel.execute(code)

        self.write(json.dumps({
            "result": result,
            "new_kernel_created": new_kernel
        }))


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--port", type=int, default=8000)
    args = parser.parse_args()

    app = tornado.web.Application([
        (r"/execute", ExecuteHandler),
        # Add other routes here
    ])
    app.conv_id_to_kernel = {}

    # Wrap cleanup_kernels to pass the app object
    periodic_cleanup = tornado.ioloop.PeriodicCallback(
        lambda: cleanup_kernels(app),
        int(os.environ.get("CLEANUP_TIMEOUT_MS", 60000))
    )
    periodic_cleanup.start()

    # Set up signal handler
    def signal_handler(signum, frame, app):
        logging.info("Received SIGINT, cleaning up...")
        cleanup_kernels(app, force=True)
        tornado.ioloop.IOLoop.current().stop()
        logging.info("Cleanup complete, shutting down.")

    signal.signal(
        signal.SIGINT,
        lambda signum, frame: signal_handler(signum, frame, app)
    )
    server = tornado.httpserver.HTTPServer(app)
    server.listen(args.port)
    tornado.ioloop.IOLoop.current().start()