41 changes: 35 additions & 6 deletions README.md
@@ -5,17 +5,46 @@ This is the GitHub repo corresponding to our [NAACL '24 Industry Track Paper](ht

**Paper**: https://arxiv.org/abs/2406.06435

## Setup
This repository is based on the ALIGN system [codebase](https://github.com/ITM-Kitware/align-system). Setup instructions can be found there (install using either `pip` or `poetry`). A virtual Python environment is recommended for managing dependencies.

### Things to Note:
(1) This code requires Python version >=3.10 (virtual/conda env recommended). \
(2) This repo was tested on a version of the ALIGN system corresponding to this [commit-id](https://github.com/ITM-Kitware/align-system/commit/7b67c76bf11313e31af43af53588fe70803943e7). To use this version, please run the following before running the code:
## Setup (updated 06/30/2024 to evaluate on CodeAct agent)
Set up the conda environment (Python 3.10) first:
```bash
conda create -n align_system python=3.10
conda activate align_system
```

After setting up the conda env, install `align-system` from the forked repo as follows (the fork contains the `CodeActAgent` class; it will be merged into the official `align-system` repo later):
```bash
git clone https://github.com/wjdghks950/align-system.git
cd align-system
pip install -e .
```
`pip install -e git+https://github.com/ITM-Kitware/align-system.git@7b67c76bf11313e31af43af53588fe70803943e7#egg=align_system`

Other dependencies:
```bash
pip install vllm
```

Start CodeAct:

```bash
# start model serving at port 8080
export CUDA_VISIBLE_DEVICES=0,1
./scripts/start_vllm.sh /shared/nas2/shared/llms/CodeActAgent-Mistral-7b-v0.1/

# start the code execution server at port 8081
./scripts/code_execution/start_jupyter_server.sh 8081

# then try the interactive demo to make sure everything works
./scripts/run_codeact_demo.sh
```
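
Once both servers are up, you can sanity-check the execution server directly. The `/execute` route and its JSON fields (`convid` and `code` in the request; `result` and `new_kernel_created` in the response) come from `scripts/code_execution/api.py` later in this PR; the port is assumed to match the 8081 passed to `start_jupyter_server.sh` above. A minimal sketch:

```python
# Minimal sanity check against the code execution server started above.
# Route and JSON fields follow scripts/code_execution/api.py in this PR;
# the port assumes the 8081 passed to start_jupyter_server.sh.
import requests

resp = requests.post(
    "http://localhost:8081/execute",
    json={
        "convid": "smoke-test",   # kernels are keyed by conversation id
        "code": "print(1 + 1)",   # code to run inside the Jupyter kernel
    },
)
payload = resp.json()
print(payload["result"])              # execution output from the kernel
print(payload["new_kernel_created"])  # True on the first call for this convid
```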

## TODO List
- [ ] Implement the `__call__` method for [CodeActAgentADM](https://github.com/wjdghks950/align-system/blob/3446b221867c4e35e349dac8e03e2640b5ad1245/align_system/algorithms/codeact_agent_adm.py#L127) (a speculative sketch of its expected shape follows this list).
- [ ] `CodeActAgentADM` is called by the [generate_outputs(...)](https://github.com/wjdghks950/align-system/blob/3446b221867c4e35e349dac8e03e2640b5ad1245/align_system/evaluation/adm_evaluator.py#L3) method in [run_evaluator.py](https://github.com/wjdghks950/llm-alignable-dm/blob/996e9be8c45b58305b4b0f187c7306dde0b667da/scripts/run_evaluator.py#L247).
- [ ] For the `CodeActAgentADM` configs, refer to [configs/codeact-agent](https://github.com/wjdghks950/llm-alignable-dm/tree/main/configs/codeact-agent).
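
Since the first TODO item is still open, the following is a purely speculative sketch of the shape `CodeActAgentADM.__call__` might take, not the actual implementation; every helper name in it (`build_prompt`, `contains_code`, `extract_code`, `execute_code`, `parse_choice`) is a hypothetical placeholder rather than part of the `align-system` API.

```python
# Speculative sketch only: __call__ is unimplemented (see the TODO above).
# All helper and attribute names below are hypothetical placeholders.
class CodeActAgentADM:
    def __call__(self, scenario, probe):
        # Build a prompt from the triage scenario and probe.
        prompt = self.build_prompt(scenario, probe)
        response = self.model.generate(prompt)
        # CodeAct loop: while the model emits code, run it on the
        # execution server (the /execute endpoint shown earlier) and
        # feed the observation back to the model.
        while self.contains_code(response):
            observation = self.execute_code(self.extract_code(response))
            response = self.model.generate(prompt, observation=observation)
        # Map the final response onto one of the probe's answer choices.
        return self.parse_choice(response, probe)
```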


## Overview
To run a particular LLM-based decision-maker, use the `run_evaluator.py` file in the `scripts/` directory. This script takes as input a particular config file (found in the `configs/` directory) and a GPU ID:
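
The concrete command is collapsed in this diff. The configs themselves are plain YAML, as the `configs/codeact-agent` files added below show; a minimal sketch of inspecting one, assuming only `pyyaml` (already listed in `requirements-codeact.txt`):

```python
# Sketch: inspect one of the codeact-agent configs added in this PR.
# Uses pyyaml, which is already listed in requirements-codeact.txt.
import yaml

with open("configs/codeact-agent/align/high.yml") as f:
    cfg = yaml.safe_load(f)

print(cfg["name"])  # -> paper-dataset-1-12/codeact-agent/align/high
# KDMA targets: the align/high config sets all six values to 10,
# while align/low sets them to 0.
print(cfg["llama_2_single_kdma_adm"]["target_kdma_values"])
```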

19 changes: 19 additions & 0 deletions configs/codeact-agent/align/high.yml
@@ -0,0 +1,19 @@
name: paper-dataset-1-12/codeact-agent/align/high
dataset: data/paper-dataset-1-12.json
llama_2_single_kdma_adm:
  baseline: false
  chat_template: falcon-instruct.jinja
  device: cuda:0
  hf_model: tiiuae/codeact-agent
  n_negative_samples: 0
  n_positive_samples: 1
  precision: half
  shuffle: false
  temperature: 0.7
  target_kdma_values:
    continuation_of_care: 10
    fairness: 10
    moral_deservingness: 10
    protocol_focus: 10
    risk_aversion: 10
    utilitarianism: 10
19 changes: 19 additions & 0 deletions configs/codeact-agent/align/low.yml
@@ -0,0 +1,19 @@
name: paper-dataset-1-12/codeact-agent/align/low
dataset: data/paper-dataset-1-12.json
llama_2_single_kdma_adm:
  baseline: false
  chat_template: falcon-instruct.jinja
  device: cuda:0
  hf_model: tiiuae/codeact-agent
  n_negative_samples: 0
  n_positive_samples: 1
  precision: half
  shuffle: false
  temperature: 0.7
  target_kdma_values:
    continuation_of_care: 0
    fairness: 0
    moral_deservingness: 0
    protocol_focus: 0
    risk_aversion: 0
    utilitarianism: 0
19 changes: 19 additions & 0 deletions configs/codeact-agent/align_self-consistency/high.yml
@@ -0,0 +1,19 @@
name: paper-dataset-1-12/codeact-agent/align_self-consistency/high
dataset: data/paper-dataset-1-12.json
llama_2_single_kdma_adm:
  baseline: false
  chat_template: falcon-instruct.jinja
  device: cuda:0
  hf_model: tiiuae/codeact-agent
  n_negative_samples: 5
  n_positive_samples: 5
  precision: half
  shuffle: true
  temperature: 0.7
  target_kdma_values:
    continuation_of_care: 10
    fairness: 10
    moral_deservingness: 10
    protocol_focus: 10
    risk_aversion: 10
    utilitarianism: 10
19 changes: 19 additions & 0 deletions configs/codeact-agent/align_self-consistency/low.yml
@@ -0,0 +1,19 @@
name: paper-dataset-1-12/codeact-agent/align_self-consistency/low
dataset: data/paper-dataset-1-12.json
llama_2_single_kdma_adm:
  baseline: false
  chat_template: falcon-instruct.jinja
  device: cuda:0
  hf_model: tiiuae/codeact-agent
  n_negative_samples: 5
  n_positive_samples: 5
  precision: half
  shuffle: true
  temperature: 0.7
  target_kdma_values:
    continuation_of_care: 0
    fairness: 0
    moral_deservingness: 0
    protocol_focus: 0
    risk_aversion: 0
    utilitarianism: 0
13 changes: 13 additions & 0 deletions configs/codeact-agent/baseline/baseline.yml
@@ -0,0 +1,13 @@
name: paper-dataset-1-12/codeact-agent/baseline
dataset: data/paper-dataset-1-12.json
llama_2_single_kdma_adm:
  baseline: true
  chat_template: falcon-instruct.jinja
  device: cuda:0
  hf_model: tiiuae/codeact-agent
  n_negative_samples: 0
  n_positive_samples: 1
  precision: half
  shuffle: false
  temperature: 0.7
  target_kdma_values: null
62 changes: 62 additions & 0 deletions requirements-codeact.txt
@@ -0,0 +1,62 @@
torch==2.0.1 -i https://download.pytorch.org/whl/cu118
torchvision==0.15.2 -i https://download.pytorch.org/whl/cu118
torchaudio==2.0.2 -i https://download.pytorch.org/whl/cu118
pre-commit
openai==0.28
datasets
wikipedia
langchain
streamlit
backoff
charset-normalizer==3.1.0
numpy
pandas
pylatexenc
google-api-python-client
arxiv
# Alfworld
opencv-python
networkx
h5py
tqdm
vocab
revtok
Click
transformers
tokenizers
scipy==1.10.1
ipython
matplotlib
cython
nltk
pipreqs
pyyaml
pytz
visdom
sympy
pycocotools
# We use MINT's docker for ALFWorld, so no need to install these
# gym==0.15.4
# ai2thor==2.1.0
# fast-downward @ https://github.com/MarcCote/downward/archive/faster_replan.zip
# textworld @ https://github.com/MarcCote/TextWorld/archive/handcoded_expert_integration.zip
# alfworld @ git+https://github.com/xingyaoww/alfworld.git
seaborn
google-generativeai
python-dateutil
statsmodels
# APPs evaluation
pyext
grpcio
vllm
accelerate
jsonlines
gym==0.26.2
pandarallel
thefuzz
flask
gunicorn
apscheduler
docker
tornado
termcolor
130 changes: 130 additions & 0 deletions scripts/code_execution/api.py
@@ -0,0 +1,130 @@
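"""Tornado server that brokers code execution for CodeAct conversations.

Exposes POST /execute, lazily creating one Jupyter kernel per conversation id
(behind a Docker or Kubernetes Jupyter gateway) and periodically cleaning up
kernels that have been idle for more than ten minutes.
"""
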
import os
import time
import json
import signal
import logging
import argparse
import tornado.ioloop
import tornado.web
import tornado.httpserver
from collections import namedtuple
from jupyter import JupyterKernel, JupyterGatewayDocker, JupyterGatewayKubernetes

logging.basicConfig(level=logging.INFO)

if os.environ.get("USE_KUBERNETES", "0").lower() == "1":
    JupyterKernelWrapper = JupyterGatewayKubernetes
    logging.info("Using Kubernetes as the backend for JupyterGateway")
else:
    JupyterKernelWrapper = JupyterGatewayDocker
    logging.info("Using Docker as the backend for JupyterGateway")

# Global data structure to map convid to (JupyterKernelWrapper, JupyterKernel)
JupyterKernelType = namedtuple("JupyterKernelType", [
    "kernel_wrapper",
    "kernel",
    "last_access_time"
])


def cleanup_kernels(app, force=False):
    """Clean up kernels and gateway containers that have timed out."""
    KERNEL_TIMEOUT = 10 * 60  # 10 minutes
    current_time = time.time()
    to_delete = []
    conv_id_to_kernel = app.conv_id_to_kernel
    # Find all kernels that have timed out
    for convid in conv_id_to_kernel.keys():
        last_access = conv_id_to_kernel[convid].last_access_time
        if current_time - last_access > KERNEL_TIMEOUT:
            to_delete.append(convid)

    if force:
        to_delete = list(conv_id_to_kernel.keys())
        logging.info(f"Force cleanup all {len(to_delete)} kernels")

    for convid in to_delete:
        # Close the kernel
        # kernel: JupyterKernel = conv_id_to_kernel[convid].kernel
        # kernel.shutdown()  # Close the JupyterKernel
        # Close the JupyterKernelWrapper by closing its context manager
        kernel_wrapper = conv_id_to_kernel[convid].kernel_wrapper
        kernel_wrapper.__exit__(None, None, None)  # Close the JupyterKernelWrapper
        # Delete the entry from the global data structure
        del conv_id_to_kernel[convid]
        logging.info(f"Kernel closed for conversation {convid}")


class ExecuteHandler(tornado.web.RequestHandler):
    async def post(self):
        data = json.loads(self.request.body)
        convid = data.get("convid")
        code = data.get("code")

        # Create a new kernel if one does not exist for this conversation
        new_kernel = False

        conv_id_to_kernel = self.application.conv_id_to_kernel
        if convid not in conv_id_to_kernel:
            kernel_wrapper = JupyterKernelWrapper(
                name=f"conv-{convid}",
            )
            url_suffix = kernel_wrapper.__enter__()
            if os.environ.get("DEBUG", False):
                logging.info(f"Kernel URL: {url_suffix}")
            kernel = JupyterKernel(url_suffix, convid)
            await kernel.initialize()
            conv_id_to_kernel[convid] = JupyterKernelType(
                kernel_wrapper,
                kernel,
                None
            )
            new_kernel = True
            logging.info(f"Kernel created for conversation {convid}")

        # Update last access time
        kernel_access_time = time.time()
        conv_id_to_kernel[convid] = conv_id_to_kernel[convid]._replace(
            last_access_time=kernel_access_time
        )

        # Execute the code
        kernel: JupyterKernel = conv_id_to_kernel[convid].kernel
        result = await kernel.execute(code)

        self.write(json.dumps({
            "result": result,
            "new_kernel_created": new_kernel
        }))


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--port", type=int, default=8000)
    args = parser.parse_args()

    app = tornado.web.Application([
        (r"/execute", ExecuteHandler),
        # Add other routes here
    ])
    app.conv_id_to_kernel = {}

    # Wrap cleanup_kernels to pass the app object
    periodic_cleanup = tornado.ioloop.PeriodicCallback(
        lambda: cleanup_kernels(app),
        int(os.environ.get("CLEANUP_TIMEOUT_MS", 60000))
    )
    periodic_cleanup.start()

    # Set up signal handler
    def signal_handler(signum, frame, app):
        logging.info("Received SIGINT, cleaning up...")
        cleanup_kernels(app, force=True)
        tornado.ioloop.IOLoop.current().stop()
        logging.info("Cleanup complete, shutting down.")

    signal.signal(
        signal.SIGINT,
        lambda signum, frame: signal_handler(signum, frame, app)
    )
    server = tornado.httpserver.HTTPServer(app)
    server.listen(args.port)
    tornado.ioloop.IOLoop.current().start()