feat: Add RemyxCodeExecutor for research paper execution #2141

salma-remyx · 2025-10-11T07:14:22Z

Why are these changes needed?

This PR adds RemyxCodeExecutor, enabling AG2 agents to execute code from research papers in pre-configured Docker environments. This addresses a critical pain point in AI research: reproducing experimental results from papers typically requires hours of environment setup, dependency resolution, and CUDA version matching.

Problem:

Hours spent debugging environment configurations before running a single experiment
Conflicting dependencies, missing system libraries, and undocumented assumptions block reproducibility
No standardized way to discover and execute research papers programmatically with AI agents

RemyxCodeExecutor provides:

Instant execution: Run code from 1000+ ArXiv papers in ~30 seconds with minimal setup
Pre-built environments: Docker images with dependencies, CUDA libraries, and code pre-installed
AI-guided exploration: Interactive explore() method where agents explain and experiment with paper code
Paper discovery: Search API integration to find executable research papers
Three execution modes: Quick start, interactive learning, and batch automation

This follows the pattern established by YepCodeCodeExecutor (#1982), extending AG2's code execution capabilities to research-specific workflows.

Checks

I've included doc changes for https://docs.ag2.ai/:
- Added comprehensive section in code-execution.mdx with usage examples
- Added installation instructions in Optional-Dependencies.mdx
- Created complete tutorial notebook agentchat_remyx_executor.ipynb with real examples
I've added tests corresponding to the changes:
- Unit tests in test_remyx_executor.py
- Integration tests in test_remyx_executor_integration.py (real API and Docker execution)
I've made sure all auto checks have passed

CLAassistant · 2025-10-22T07:10:34Z

All committers have signed the CLA.

github-actions · 2025-10-30T18:54:52Z

🤖 Code Review - PR #2141: RemyxCodeExecutor

Summary

This PR adds RemyxCodeExecutor to enable AG2 agents to execute research paper code in pre-configured Docker environments. The implementation follows the established pattern of YepCodeCodeExecutor and properly extends DockerCommandLineCodeExecutor. Overall, this is a well-structured contribution with good documentation and test coverage.

✅ Strengths

Excellent Documentation
- Comprehensive docstrings with clear examples
- Tutorial notebook with real-world usage patterns
- Well-documented integration with Remyx API
Good Architecture
- Clean inheritance from DockerCommandLineCodeExecutor
- Proper separation of concerns
- Good use of optional dependencies pattern
Solid Test Coverage
- Unit tests with mocking (336 lines)
- Integration tests with real API/Docker (285 lines)
- Good coverage of error cases

🔍 Issues & Recommendations

High Priority

1. API Key Inconsistency (autogen/coding/remyx_code_executor.py:127)

self.api_key = api_key or os.getenv("REMYX_API_KEY")

Issue: The code checks for REMYX_API_KEY but the integration tests and docs reference REMYXAI_API_KEY.

Impact: Users may be confused about which environment variable to use, leading to authentication failures.

Recommendation:

self.api_key = api_key or os.getenv("REMYX_API_KEY") or os.getenv("REMYXAI_API_KEY")

2. API Key Not Used (autogen/coding/remyx_code_executor.py:119-134)

Issue: The api_key is stored but never passed to remyxai_get_asset(). The remyxai SDK likely reads it from environment variables, but this makes the constructor parameter misleading.

Recommendation: Either:

Pass the API key to the remyxai SDK if it supports it
Document that the API key parameter is for future use
Remove the api_key parameter if it's not needed

3. Missing Cleanup in `explore()` and `create_agents()` Methods (autogen/coding/remyx_code_executor.py:279-434)

Issue: The Docker container is never explicitly cleaned up after exploration sessions. This could lead to resource leaks in automated scenarios.

Recommendation: Add cleanup logic or document that users should manually call cleanup:

# In explore() method, add:
try:
    result = executor_agent.initiate_chat(...)
    return result
finally:
    if self.auto_remove:
        self.stop()  # Clean up container

4. Potential AttributeError in `format_chat_result()` (autogen/coding/remyx_code_executor.py:542-544)

total_cost = result.cost.get("usage_including_cached_inference", {}).get("total_cost", 0)

Issue: If result.cost is None, this will raise AttributeError.

Fix:

total_cost = (result.cost or {}).get("usage_including_cached_inference", {}).get("total_cost", 0)

Medium Priority

5. Unused Import (autogen/coding/remyx_code_executor.py:319, 460)

from autogen import ConversableAgent, LLMConfig

Issue: LLMConfig is imported but never used in explore() and create_agents() methods.

Fix: Remove unused import.

6. Hardcoded Path Assumptions (autogen/coding/remyx_code_executor.py:346-362)

Issue: The system message assumes the repository is always at /app. While this may be standard for Remyx images, it reduces flexibility.

Recommendation: Use self._asset_metadata.get("working_directory", "/app") consistently.

7. Missing Type Hints (autogen/coding/remyx_code_executor.py:287, 436)

Issue: Return type is Any for explore() and create_agents() methods.

Recommendation: Use proper type hints:

from autogen import ChatResult
def explore(...) -> ChatResult:

8. Container Environment Merge Logic (autogen/coding/remyx_code_executor.py:165-170)

Issue: The merge order is backwards - user-provided environment should take precedence over paper requirements, not the other way around.

Current (incorrect):

container_env.update(existing_env)  # Paper vars overwrite user vars

Should be:

existing_env.update(container_env)  # User vars take precedence
container_kwargs["environment"] = existing_env

Low Priority

9. RemyxCodeResult Not Used (autogen/coding/remyx_code_executor.py:41-46)

Issue: RemyxCodeResult is defined with arxiv_id and paper_title fields, but the executor returns CommandLineCodeResult which doesn't include this metadata.

Recommendation: Either:

Use RemyxCodeResult in execute_code_blocks() to include paper metadata
Remove RemyxCodeResult if it's not needed

10. Test Cleanup Repetition (test/coding/test_remyx_executor_integration.py)

Issue: Container cleanup code is duplicated in every test method.

Recommendation: Use a pytest fixture with yield for automatic cleanup:

@pytest.fixture
def executor(self, arxiv_id):
    executor = RemyxCodeExecutor(arxiv_id=arxiv_id, ...)
    yield executor
    # Cleanup
    if hasattr(executor, "_container") and executor._container:
        try:
            executor._container.stop()
            executor._container.remove()
        except Exception:
            pass

11. Inconsistent Emoji Usage

Issue: The codebase uses emojis in print statements (explore method) which may not render properly in all terminals.

Recommendation: Make emoji usage consistent with the rest of the codebase or make it optional.

🔒 Security Considerations

✅ Good Security Practices:

Docker container isolation for code execution
Configurable timeouts to prevent runaway processes
Auto-remove option to clean up containers
Environment variable handling for secrets

⚠️ Recommendations:

API Key Exposure Risk
- The API key is stored in plaintext in self.api_key
- Recommendation: Document that API keys should be kept secure and not logged
Docker Image Trust
- Code pulls and executes arbitrary Docker images from Remyx
- Recommendation: Add documentation about:
  - Image verification/signing
  - Trust model for Remyx images
  - What security guarantees Remyx provides
Environment Variable Passthrough
- The code automatically passes environment variables to containers (line 159-163)
- Recommendation: Add option to whitelist/blacklist environment variables for security

🚀 Performance Considerations

Docker Image Pull
- First execution will be slow due to image pulling
- Recommendation: Document expected first-run latency (~30 seconds as mentioned in PR description)
Container Lifecycle
- The executor doesn't reuse containers across multiple code blocks
- Recommendation: This is fine for the use case, but document that each executor creates one container
Memory and Resource Limits
- No Docker resource limits configured by default
- Recommendation: Consider documenting how users can set resource limits via container_create_kwargs

📊 Test Coverage Assessment

Current Coverage: 21.13% (per Codecov)

Coverage Analysis:

Well Tested:

Initialization logic (with/without arxiv_id, with/without image)
API key handling
Environment variable setup
Error cases (paper not found, no Docker image)
Basic properties (paper_info, code_extractor)

Needs More Coverage:

execute_code_blocks() method (only parent class tested)
explore() method (complex agent orchestration)
create_agents() method
format_chat_result() static method
Error handling in async execution scenarios

Recommendations:

Add integration tests for explore() method (marked as @pytest.mark.slow)
Add unit tests for execute_code_blocks() file extension fix
Test edge cases in format_chat_result() (None values, empty chat history)

📝 Documentation Quality

Excellent:

✅ Docstrings are comprehensive
✅ Tutorial notebook with multiple examples
✅ Installation instructions added
✅ Usage examples in docs

Minor Issues:

Line 319: Import of unused LLMConfig
Could benefit from a troubleshooting section in docs
Missing information about Docker resource requirements

🎯 Summary & Recommendations

Must Fix Before Merge:

✅ Fix API key environment variable inconsistency (REMYX_API_KEY vs REMYXAI_API_KEY)
✅ Fix environment variable merge order (user should override paper defaults)
✅ Fix potential AttributeError in format_chat_result()
⚠️ Add container cleanup to explore() method or document lifecycle management

Should Fix:

Remove unused LLMConfig import
Use or remove RemyxCodeResult class
Improve test cleanup with fixtures
Add proper type hints for return values

Nice to Have:

Document Docker security model
Add resource limit configuration examples
Increase test coverage for interactive methods
Add troubleshooting documentation

Overall Assessment

Score: 8/10 - This is a high-quality contribution with good architecture, documentation, and testing. The issues identified are mostly minor and easy to fix. The main concerns are around API key handling consistency and resource cleanup in long-running exploration sessions.

Recommendation: Approve with minor changes ✅

The PR demonstrates good understanding of the codebase patterns and follows the established conventions. Once the high-priority issues are addressed, this will be a valuable addition to AG2.

Generated by Claude Code Review

priyansh4320

hi @salma-remyx can you please fix the pre-commit check

salma-remyx · 2025-11-05T18:57:46Z

hi @salma-remyx can you please fix the pre-commit check

Got it! Let me know if there's anything else to add/adjust

priyansh4320 · 2025-11-09T06:06:19Z

autogen/coding/remyx_code_executor.py

+        if verbose:
+            print("=" * 80)
+            print("🔬 Interactive Research Exploration Session")
+            print("=" * 80)
+            print(f"📄 Paper: {self.arxiv_id or 'Custom image'}")
+
+            if interactive:
+                print("\n💬 INTERACTIVE MODE")
+                print("   - Press ENTER to continue")
+                print("   - Type guidance/questions")
+                print("   - Type 'exit' to end")
+            else:
+                print("\n🤖 AUTOMATED MODE")
+
+            print("=" * 80)
+            print()
+


let's add a logger.log instead of prints, can you please add this change

priyansh4320 · 2025-11-09T06:06:37Z

autogen/coding/__init__.py

+
+    __all__.extend(["RemyxCodeExecutor", "RemyxCodeResult"])
+except ImportError:
+    pass


Let's log an error message before passing

priyansh4320 · 2025-11-09T06:10:58Z

autogen/coding/remyx_code_executor.py

+        # Default exploration goal
+        default_goal = """Perform an interactive exploration of this research paper:
+**Phase 1: Understanding** (2-3 turns)
+1. Examine the directory structure
+2. Read README and identify key files
+3. Understand the paper's implementation
+**Phase 2: Experimentation** (3-5 turns)
+4. Run a minimal working example
+5. Experiment with different parameters
+6. Generate visualizations if applicable
+**Phase 3: Analysis** (2-3 turns)
+7. Explain key implementation details
+8. Answer any questions about the code
+9. Suggest possible modifications/experiments
+Work step-by-step. Wait for human guidance between phases.
+Type TERMINATE when exploration is complete."""
+


let's keep this as default for goal variable instead of creating a new var and adding up to memory.

priyansh4320 · 2025-11-09T06:19:04Z

autogen/coding/remyx_code_executor.py

+        # Create system message for writer agent
+        system_message = f"""{paper_context}
+**Your Mission:**
+{goal or default_goal}
+**Important Guidelines:**
+- Repository is at /app with all dependencies installed
+- Execute ONE command at a time - don't rush
+- Use absolute paths starting with /app
+- Be conversational and explain your actions
+- If you encounter errors, debug step-by-step
+- Wait for human feedback before major actions (if interactive mode)
+- Focus on lightweight demos unless instructed otherwise
+- You can install additional packages if needed
+**What You Can Do:**
+✓ Read and analyze code
+✓ Execute Python/bash commands
+✓ Modify code for experiments
+✓ Generate plots and visualizations
+✓ Install additional dependencies
+✓ Answer questions about implementation
+✓ Suggest improvements or experiments
+Begin by exploring the repository structure to understand what's available."""


let's give a user the power to make edits in system messages.
this will help in dealing with clients such as Ollama where people use small size models.
this will also improve UX with remyx for Dev where system message will can cator to a specific domain.

add a new param: system_message:
which will add up to system_message = f"""{paper_context}
Your Mission:
{goal or default_goal}. {system_message}"""

priyansh4320 · 2025-11-09T06:19:52Z

autogen/coding/remyx_code_executor.py

+        if verbose:
+            print("\n" + "=" * 80)
+            print("✅ Exploration Complete!")
+            print("=" * 80)
+            print("\n📊 Session Summary:")
+            print(f"   • Total messages: {len(result.chat_history)}")
+            print(f"   • Cost: ${result.cost['usage_including_cached_inference']['total_cost']:.4f}")
+
+            if result.summary:
+                print("\n💬 Final Status:")
+                # Print first 200 chars of summary
+                summary_preview = result.summary[:200] + "..." if len(result.summary) > 200 else result.summary
+                print(f"   {summary_preview}")
+
+            print("\n💾 Full chat history available in returned object")
+            print("   Access with: result.chat_history")
+            print("=" * 80)


same here replace print with logging

priyansh4320 · 2025-11-09T06:23:58Z

autogen/coding/remyx_code_executor.py

+        # Create system message
+        system_message = f"""{paper_context}
+**Your Mission:**
+{goal or default_goal}
+**Guidelines:**
+- Repository is at /app with all dependencies installed
+- Execute ONE command at a time
+- Use absolute paths starting with /app
+- Be conversational and explain your actions
+- Debug step-by-step if errors occur
+Begin by exploring the repository structure."""
+


another Idea I have about the UX, let's keep agent system_messages editable by devs.
where user will can have more control over LLM and can do prompt grounding/ or expected output examples to the agent/ or some custom guidelines. let's have defaults with editables system_messages.

priyansh4320 · 2025-11-09T06:25:49Z

autogen/coding/remyx_code_executor.py

+    @staticmethod
+    def format_chat_result(result: Any) -> str:
+        """
+        Format a ChatResult object into a readable summary.
+
+        Useful for displaying exploration results later.
+
+        Args:
+            result: The ChatResult object from explore() or initiate_chat()
+
+        Returns:


if this is a utility method let's move it to a utils file

priyansh4320 · 2025-11-09T06:29:48Z

notebook/agentchat_remyx_executor.ipynb

the content looks awesome, let's edit this according to reviewed changes

priyansh4320 · 2025-11-09T06:42:34Z

test/coding/test_remyx_executor_integration.py

add a groupchat test with RemyxCodeExecutor for this integration tests. assert content of the result with re.search("content",generated_content) i.e. regular expressions

priyansh4320 · 2025-11-09T07:05:46Z

Hi @salma-remyx , Can you please address the reviewed changes.

github-actions · 2025-11-26T20:13:46Z

🔄 Re-Review - PR #2141: RemyxCodeExecutor

Review Status

This is a re-review based on the previous Claude Code review from October 30, 2025. I've analyzed the current state of the code to determine which issues have been addressed and which remain outstanding.

✅ What Has Been Addressed

1. API Key Environment Variable Consistency ✅ FIXED

Previous Issue: Line 127 used REMYXAI_API_KEY but integration tests referenced inconsistent variable names.

Current Status: ✅ RESOLVED - The code now consistently uses REMYXAI_API_KEY throughout:

Line 127: self.api_key = api_key or os.getenv("REMYXAI_API_KEY")
Integration tests (line 48): Check for both REMYX_API_KEY and REMYXAI_API_KEY
Docstring (line 69): Documents REMYXAI_API_KEY

This is now consistent and well-documented.

⚠️ Outstanding Critical Issues

1. API Key Not Actually Used 🔴 HIGH PRIORITY

Location: autogen/coding/remyx_code_executor.py:127-134

Issue: The api_key parameter is stored in self.api_key but is never passed to the remyxai_get_asset() function at line 134.

Impact:

The constructor parameter is misleading to users
API authentication likely relies entirely on environment variables
If the remyxai SDK does not read from env vars, authentication will fail silently

Recommendation: Check if the remyxai_get_asset() function accepts an API key parameter. If not, either document that the API key parameter is for future use, or remove the parameter and rely solely on environment variables.

2. Potential AttributeError 🔴 HIGH PRIORITY

Location: autogen/coding/remyx_code_executor.py:426

Issue: At line 426, the code directly accesses result.cost without checking if it exists, but at line 547, the same access is properly protected.

Impact: If result.cost is None, line 426 will raise TypeError.

Fix: Use safe dictionary access at line 426 similar to line 547.

3. Environment Variable Merge Order 🟡 MEDIUM PRIORITY

Location: autogen/coding/remyx_code_executor.py:165-170

Current Behavior: User-provided environment variables take precedence over paper requirements.

Recommendation: Document this behavior clearly in the docstring. Users should understand what happens when they provide environment variables that conflict with paper requirements.

📋 Summary & Recommendations

🔴 Must Fix Before Merge:

Fix AttributeError risk at line 426 (unsafe result.cost access)
Clarify API key usage - Either use it or remove the parameter
Document environment variable merge order - Make the precedence behavior explicit

🟡 Should Fix:

Clarify container cleanup strategy in explore()
Decide on RemyxCodeResult usage (use it or remove it)

🎯 Overall Assessment

Score: 7.5/10 - This is a solid contribution with good architecture and documentation. The previous review main issue (API key consistency) has been addressed. However, there are still 2-3 critical issues that could cause runtime errors.

Recommendation: Request Changes ⚠️

The PR is close to ready, but the AttributeError risk and API key usage need to be addressed before merging. These are straightforward fixes that should take less than 30 minutes.

📝 Action Items for @salma-remyx

Critical: Fix line 426 to safely access result.cost
Critical: Clarify whether the api_key parameter is actually used by the remyxai SDK, or remove it
Important: Document the environment variable precedence behavior in the docstring
Optional: Decide on RemyxCodeResult usage

Once these are addressed, this will be ready to merge! Great work on addressing the API key consistency issue from the previous review.

Generated by Claude Code Review - Re-Review on 2025-11-26

codecov · 2025-11-26T20:17:59Z

Codecov Report

❌ Patch coverage is 21.12676% with 168 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
autogen/coding/remyx_code_executor.py	20.79%	160 Missing ⚠️
autogen/coding/factory.py	0.00%	5 Missing and 1 partial ⚠️
autogen/coding/__init__.py	60.00%	2 Missing ⚠️

Files with missing lines	Coverage Δ
autogen/coding/base.py	`100.00% <ø> (ø)`
autogen/coding/__init__.py	`75.00% <60.00%> (-6.82%)`	⬇️
autogen/coding/factory.py	`48.27% <0.00%> (-12.60%)`	⬇️
autogen/coding/remyx_code_executor.py	`20.79% <20.79%> (ø)`

... and 19 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

salma-remyx added 8 commits October 10, 2025 20:05

adding RemyxCodeExecutor

3857795

adding explore() method

fd92d88

add tutorial

3dc96ec

add docs

152cbe1

add tests

dad7696

feat: Add RemyxCodeExecutor for research paper execution

9149703

update executor test

263f2a0

fix: Apply ruff formatting to test file

4c6954e

randombet requested review from marklysze and randombet October 12, 2025 06:24

update docs + llm config

683287c

salma-remyx force-pushed the feat/remyx-code-executor branch from 23f15a8 to 683287c Compare October 22, 2025 07:13

Merge branch 'main' into feat/remyx-code-executor

bfa3c35

priyansh4320 reviewed Oct 30, 2025

View reviewed changes

salma-remyx added 2 commits November 3, 2025 09:21

Merge branch 'main' into feat/remyx-code-executor

659f0ff

fix formatting

36b25dc

priyansh4320 requested changes Nov 9, 2025

View reviewed changes

Merge branch 'main' into feat/remyx-code-executor

990be95

feat: Add RemyxCodeExecutor for research paper execution #2141

Are you sure you want to change the base?

feat: Add RemyxCodeExecutor for research paper execution #2141

Uh oh!

Conversation

salma-remyx commented Oct 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why are these changes needed?

Checks

Uh oh!

CLAassistant commented Oct 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Oct 30, 2025

🤖 Code Review - PR #2141: RemyxCodeExecutor

Summary

✅ Strengths

🔍 Issues & Recommendations

High Priority

1. API Key Inconsistency (autogen/coding/remyx_code_executor.py:127)

2. API Key Not Used (autogen/coding/remyx_code_executor.py:119-134)

3. Missing Cleanup in explore() and create_agents() Methods (autogen/coding/remyx_code_executor.py:279-434)

4. Potential AttributeError in format_chat_result() (autogen/coding/remyx_code_executor.py:542-544)

Medium Priority

5. Unused Import (autogen/coding/remyx_code_executor.py:319, 460)

6. Hardcoded Path Assumptions (autogen/coding/remyx_code_executor.py:346-362)

7. Missing Type Hints (autogen/coding/remyx_code_executor.py:287, 436)

8. Container Environment Merge Logic (autogen/coding/remyx_code_executor.py:165-170)

Low Priority

9. RemyxCodeResult Not Used (autogen/coding/remyx_code_executor.py:41-46)

10. Test Cleanup Repetition (test/coding/test_remyx_executor_integration.py)

11. Inconsistent Emoji Usage

🔒 Security Considerations

✅ Good Security Practices:

⚠️ Recommendations:

🚀 Performance Considerations

📊 Test Coverage Assessment

Coverage Analysis:

📝 Documentation Quality

🎯 Summary & Recommendations

Must Fix Before Merge:

Should Fix:

Nice to Have:

Overall Assessment

Uh oh!

priyansh4320 left a comment

Choose a reason for hiding this comment

Uh oh!

salma-remyx commented Nov 5, 2025

Uh oh!

priyansh4320 Nov 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

priyansh4320 commented Nov 9, 2025

Uh oh!

github-actions bot commented Nov 26, 2025

🔄 Re-Review - PR #2141: RemyxCodeExecutor

Review Status

✅ What Has Been Addressed

1. API Key Environment Variable Consistency ✅ FIXED

⚠️ Outstanding Critical Issues

1. API Key Not Actually Used 🔴 HIGH PRIORITY

2. Potential AttributeError 🔴 HIGH PRIORITY

salma-remyx commented Oct 11, 2025 •

edited

Loading

CLAassistant commented Oct 22, 2025 •

edited

Loading

3. Missing Cleanup in `explore()` and `create_agents()` Methods (autogen/coding/remyx_code_executor.py:279-434)

4. Potential AttributeError in `format_chat_result()` (autogen/coding/remyx_code_executor.py:542-544)

priyansh4320 Nov 9, 2025 •

edited

Loading