Skip to content

Conversation

@salma-remyx
Copy link
Contributor

@salma-remyx salma-remyx commented Oct 11, 2025

Why are these changes needed?

This PR adds RemyxCodeExecutor, enabling AG2 agents to execute code from research papers in pre-configured Docker environments. This addresses a critical pain point in AI research: reproducing experimental results from papers typically requires hours of environment setup, dependency resolution, and CUDA version matching.

Problem:

  • Hours spent debugging environment configurations before running a single experiment
  • Conflicting dependencies, missing system libraries, and undocumented assumptions block reproducibility
  • No standardized way to discover and execute research papers programmatically with AI agents

RemyxCodeExecutor provides:

  • Instant execution: Run code from 1000+ ArXiv papers in ~30 seconds with minimal setup
  • Pre-built environments: Docker images with dependencies, CUDA libraries, and code pre-installed
  • AI-guided exploration: Interactive explore() method where agents explain and experiment with paper code
  • Paper discovery: Search API integration to find executable research papers
  • Three execution modes: Quick start, interactive learning, and batch automation

This follows the pattern established by YepCodeCodeExecutor (#1982), extending AG2's code execution capabilities to research-specific workflows.

Checks

  • I've included doc changes for https://docs.ag2.ai/:
    • Added comprehensive section in code-execution.mdx with usage examples
    • Added installation instructions in Optional-Dependencies.mdx
    • Created complete tutorial notebook agentchat_remyx_executor.ipynb with real examples
  • I've added tests corresponding to the changes:
    • Unit tests in test_remyx_executor.py
    • Integration tests in test_remyx_executor_integration.py (real API and Docker execution)
  • I've made sure all auto checks have passed

@CLAassistant
Copy link

CLAassistant commented Oct 22, 2025

CLA assistant check
All committers have signed the CLA.

@salma-remyx salma-remyx force-pushed the feat/remyx-code-executor branch from 23f15a8 to 683287c Compare October 22, 2025 07:13
@github-actions
Copy link
Contributor

🤖 Code Review - PR #2141: RemyxCodeExecutor

Summary

This PR adds RemyxCodeExecutor to enable AG2 agents to execute research paper code in pre-configured Docker environments. The implementation follows the established pattern of YepCodeCodeExecutor and properly extends DockerCommandLineCodeExecutor. Overall, this is a well-structured contribution with good documentation and test coverage.


✅ Strengths

  1. Excellent Documentation

    • Comprehensive docstrings with clear examples
    • Tutorial notebook with real-world usage patterns
    • Well-documented integration with Remyx API
  2. Good Architecture

    • Clean inheritance from DockerCommandLineCodeExecutor
    • Proper separation of concerns
    • Good use of optional dependencies pattern
  3. Solid Test Coverage

    • Unit tests with mocking (336 lines)
    • Integration tests with real API/Docker (285 lines)
    • Good coverage of error cases

🔍 Issues & Recommendations

High Priority

1. API Key Inconsistency (autogen/coding/remyx_code_executor.py:127)

self.api_key = api_key or os.getenv("REMYX_API_KEY")

Issue: The code checks for REMYX_API_KEY but the integration tests and docs reference REMYXAI_API_KEY.

Impact: Users may be confused about which environment variable to use, leading to authentication failures.

Recommendation:

self.api_key = api_key or os.getenv("REMYX_API_KEY") or os.getenv("REMYXAI_API_KEY")

2. API Key Not Used (autogen/coding/remyx_code_executor.py:119-134)

Issue: The api_key is stored but never passed to remyxai_get_asset(). The remyxai SDK likely reads it from environment variables, but this makes the constructor parameter misleading.

Recommendation: Either:

  • Pass the API key to the remyxai SDK if it supports it
  • Document that the API key parameter is for future use
  • Remove the api_key parameter if it's not needed

3. Missing Cleanup in explore() and create_agents() Methods (autogen/coding/remyx_code_executor.py:279-434)

Issue: The Docker container is never explicitly cleaned up after exploration sessions. This could lead to resource leaks in automated scenarios.

Recommendation: Add cleanup logic or document that users should manually call cleanup:

# In explore() method, add:
try:
    result = executor_agent.initiate_chat(...)
    return result
finally:
    if self.auto_remove:
        self.stop()  # Clean up container

4. Potential AttributeError in format_chat_result() (autogen/coding/remyx_code_executor.py:542-544)

total_cost = result.cost.get("usage_including_cached_inference", {}).get("total_cost", 0)

Issue: If result.cost is None, this will raise AttributeError.

Fix:

total_cost = (result.cost or {}).get("usage_including_cached_inference", {}).get("total_cost", 0)

Medium Priority

5. Unused Import (autogen/coding/remyx_code_executor.py:319, 460)

from autogen import ConversableAgent, LLMConfig

Issue: LLMConfig is imported but never used in explore() and create_agents() methods.

Fix: Remove unused import.

6. Hardcoded Path Assumptions (autogen/coding/remyx_code_executor.py:346-362)

Issue: The system message assumes the repository is always at /app. While this may be standard for Remyx images, it reduces flexibility.

Recommendation: Use self._asset_metadata.get("working_directory", "/app") consistently.

7. Missing Type Hints (autogen/coding/remyx_code_executor.py:287, 436)

Issue: Return type is Any for explore() and create_agents() methods.

Recommendation: Use proper type hints:

from autogen import ChatResult
def explore(...) -> ChatResult:

8. Container Environment Merge Logic (autogen/coding/remyx_code_executor.py:165-170)

Issue: The merge order is backwards - user-provided environment should take precedence over paper requirements, not the other way around.

Current (incorrect):

container_env.update(existing_env)  # Paper vars overwrite user vars

Should be:

existing_env.update(container_env)  # User vars take precedence
container_kwargs["environment"] = existing_env

Low Priority

9. RemyxCodeResult Not Used (autogen/coding/remyx_code_executor.py:41-46)

Issue: RemyxCodeResult is defined with arxiv_id and paper_title fields, but the executor returns CommandLineCodeResult which doesn't include this metadata.

Recommendation: Either:

  • Use RemyxCodeResult in execute_code_blocks() to include paper metadata
  • Remove RemyxCodeResult if it's not needed

10. Test Cleanup Repetition (test/coding/test_remyx_executor_integration.py)

Issue: Container cleanup code is duplicated in every test method.

Recommendation: Use a pytest fixture with yield for automatic cleanup:

@pytest.fixture
def executor(self, arxiv_id):
    executor = RemyxCodeExecutor(arxiv_id=arxiv_id, ...)
    yield executor
    # Cleanup
    if hasattr(executor, "_container") and executor._container:
        try:
            executor._container.stop()
            executor._container.remove()
        except Exception:
            pass

11. Inconsistent Emoji Usage

Issue: The codebase uses emojis in print statements (explore method) which may not render properly in all terminals.

Recommendation: Make emoji usage consistent with the rest of the codebase or make it optional.


🔒 Security Considerations

✅ Good Security Practices:

  1. Docker container isolation for code execution
  2. Configurable timeouts to prevent runaway processes
  3. Auto-remove option to clean up containers
  4. Environment variable handling for secrets

⚠️ Recommendations:

  1. API Key Exposure Risk

    • The API key is stored in plaintext in self.api_key
    • Recommendation: Document that API keys should be kept secure and not logged
  2. Docker Image Trust

    • Code pulls and executes arbitrary Docker images from Remyx
    • Recommendation: Add documentation about:
      • Image verification/signing
      • Trust model for Remyx images
      • What security guarantees Remyx provides
  3. Environment Variable Passthrough

    • The code automatically passes environment variables to containers (line 159-163)
    • Recommendation: Add option to whitelist/blacklist environment variables for security

🚀 Performance Considerations

  1. Docker Image Pull

    • First execution will be slow due to image pulling
    • Recommendation: Document expected first-run latency (~30 seconds as mentioned in PR description)
  2. Container Lifecycle

    • The executor doesn't reuse containers across multiple code blocks
    • Recommendation: This is fine for the use case, but document that each executor creates one container
  3. Memory and Resource Limits

    • No Docker resource limits configured by default
    • Recommendation: Consider documenting how users can set resource limits via container_create_kwargs

📊 Test Coverage Assessment

Current Coverage: 21.13% (per Codecov)

Coverage Analysis:

Well Tested:

  • Initialization logic (with/without arxiv_id, with/without image)
  • API key handling
  • Environment variable setup
  • Error cases (paper not found, no Docker image)
  • Basic properties (paper_info, code_extractor)

Needs More Coverage:

  • execute_code_blocks() method (only parent class tested)
  • explore() method (complex agent orchestration)
  • create_agents() method
  • format_chat_result() static method
  • Error handling in async execution scenarios

Recommendations:

  1. Add integration tests for explore() method (marked as @pytest.mark.slow)
  2. Add unit tests for execute_code_blocks() file extension fix
  3. Test edge cases in format_chat_result() (None values, empty chat history)

📝 Documentation Quality

Excellent:

  • ✅ Docstrings are comprehensive
  • ✅ Tutorial notebook with multiple examples
  • ✅ Installation instructions added
  • ✅ Usage examples in docs

Minor Issues:

  • Line 319: Import of unused LLMConfig
  • Could benefit from a troubleshooting section in docs
  • Missing information about Docker resource requirements

🎯 Summary & Recommendations

Must Fix Before Merge:

  1. ✅ Fix API key environment variable inconsistency (REMYX_API_KEY vs REMYXAI_API_KEY)
  2. ✅ Fix environment variable merge order (user should override paper defaults)
  3. ✅ Fix potential AttributeError in format_chat_result()
  4. ⚠️ Add container cleanup to explore() method or document lifecycle management

Should Fix:

  1. Remove unused LLMConfig import
  2. Use or remove RemyxCodeResult class
  3. Improve test cleanup with fixtures
  4. Add proper type hints for return values

Nice to Have:

  1. Document Docker security model
  2. Add resource limit configuration examples
  3. Increase test coverage for interactive methods
  4. Add troubleshooting documentation

Overall Assessment

Score: 8/10 - This is a high-quality contribution with good architecture, documentation, and testing. The issues identified are mostly minor and easy to fix. The main concerns are around API key handling consistency and resource cleanup in long-running exploration sessions.

Recommendation: Approve with minor changes

The PR demonstrates good understanding of the codebase patterns and follows the established conventions. Once the high-priority issues are addressed, this will be a valuable addition to AG2.


Generated by Claude Code Review

Copy link
Collaborator

@priyansh4320 priyansh4320 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @salma-remyx can you please fix the pre-commit check

@salma-remyx
Copy link
Contributor Author

hi @salma-remyx can you please fix the pre-commit check

Got it! Let me know if there's anything else to add/adjust

Comment on lines +395 to +411
if verbose:
print("=" * 80)
print("🔬 Interactive Research Exploration Session")
print("=" * 80)
print(f"📄 Paper: {self.arxiv_id or 'Custom image'}")

if interactive:
print("\n💬 INTERACTIVE MODE")
print(" - Press ENTER to continue")
print(" - Type guidance/questions")
print(" - Type 'exit' to end")
else:
print("\n🤖 AUTOMATED MODE")

print("=" * 80)
print()

Copy link
Collaborator

@priyansh4320 priyansh4320 Nov 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add a logger.log instead of prints, can you please add this change

Comment on lines +35 to +38

__all__.extend(["RemyxCodeExecutor", "RemyxCodeResult"])
except ImportError:
pass
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's log an error message before passing

Comment on lines +325 to +341
# Default exploration goal
default_goal = """Perform an interactive exploration of this research paper:
**Phase 1: Understanding** (2-3 turns)
1. Examine the directory structure
2. Read README and identify key files
3. Understand the paper's implementation
**Phase 2: Experimentation** (3-5 turns)
4. Run a minimal working example
5. Experiment with different parameters
6. Generate visualizations if applicable
**Phase 3: Analysis** (2-3 turns)
7. Explain key implementation details
8. Answer any questions about the code
9. Suggest possible modifications/experiments
Work step-by-step. Wait for human guidance between phases.
Type TERMINATE when exploration is complete."""

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's keep this as default for goal variable instead of creating a new var and adding up to memory.

Comment on lines +345 to +366
# Create system message for writer agent
system_message = f"""{paper_context}
**Your Mission:**
{goal or default_goal}
**Important Guidelines:**
- Repository is at /app with all dependencies installed
- Execute ONE command at a time - don't rush
- Use absolute paths starting with /app
- Be conversational and explain your actions
- If you encounter errors, debug step-by-step
- Wait for human feedback before major actions (if interactive mode)
- Focus on lightweight demos unless instructed otherwise
- You can install additional packages if needed
**What You Can Do:**
✓ Read and analyze code
✓ Execute Python/bash commands
✓ Modify code for experiments
✓ Generate plots and visualizations
✓ Install additional dependencies
✓ Answer questions about implementation
✓ Suggest improvements or experiments
Begin by exploring the repository structure to understand what's available."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's give a user the power to make edits in system messages.
this will help in dealing with clients such as Ollama where people use small size models.
this will also improve UX with remyx for Dev where system message will can cator to a specific domain.

add a new param: system_message:
which will add up to system_message = f"""{paper_context}
Your Mission:
{goal or default_goal}. {system_message}"""

Comment on lines +420 to +436
if verbose:
print("\n" + "=" * 80)
print("✅ Exploration Complete!")
print("=" * 80)
print("\n📊 Session Summary:")
print(f" • Total messages: {len(result.chat_history)}")
print(f" • Cost: ${result.cost['usage_including_cached_inference']['total_cost']:.4f}")

if result.summary:
print("\n💬 Final Status:")
# Print first 200 chars of summary
summary_preview = result.summary[:200] + "..." if len(result.summary) > 200 else result.summary
print(f" {summary_preview}")

print("\n💾 Full chat history available in returned object")
print(" Access with: result.chat_history")
print("=" * 80)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here replace print with logging

Comment on lines +473 to +484
# Create system message
system_message = f"""{paper_context}
**Your Mission:**
{goal or default_goal}
**Guidelines:**
- Repository is at /app with all dependencies installed
- Execute ONE command at a time
- Use absolute paths starting with /app
- Be conversational and explain your actions
- Debug step-by-step if errors occur
Begin by exploring the repository structure."""

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another Idea I have about the UX, let's keep agent system_messages editable by devs.
where user will can have more control over LLM and can do prompt grounding/ or expected output examples to the agent/ or some custom guidelines. let's have defaults with editables system_messages.

Comment on lines +519 to +529
@staticmethod
def format_chat_result(result: Any) -> str:
"""
Format a ChatResult object into a readable summary.
Useful for displaying exploration results later.
Args:
result: The ChatResult object from explore() or initiate_chat()
Returns:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is a utility method let's move it to a utils file

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the content looks awesome, let's edit this according to reviewed changes

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a groupchat test with RemyxCodeExecutor for this integration tests. assert content of the result with re.search("content",generated_content) i.e. regular expressions

@priyansh4320
Copy link
Collaborator

Hi @salma-remyx , Can you please address the reviewed changes.

@github-actions
Copy link
Contributor

🔄 Re-Review - PR #2141: RemyxCodeExecutor

Review Status

This is a re-review based on the previous Claude Code review from October 30, 2025. I've analyzed the current state of the code to determine which issues have been addressed and which remain outstanding.


✅ What Has Been Addressed

1. API Key Environment Variable Consistency ✅ FIXED

Previous Issue: Line 127 used REMYXAI_API_KEY but integration tests referenced inconsistent variable names.

Current Status:RESOLVED - The code now consistently uses REMYXAI_API_KEY throughout:

  • Line 127: self.api_key = api_key or os.getenv("REMYXAI_API_KEY")
  • Integration tests (line 48): Check for both REMYX_API_KEY and REMYXAI_API_KEY
  • Docstring (line 69): Documents REMYXAI_API_KEY

This is now consistent and well-documented.


⚠️ Outstanding Critical Issues

1. API Key Not Actually Used 🔴 HIGH PRIORITY

Location: autogen/coding/remyx_code_executor.py:127-134

Issue: The api_key parameter is stored in self.api_key but is never passed to the remyxai_get_asset() function at line 134.

Impact:

  • The constructor parameter is misleading to users
  • API authentication likely relies entirely on environment variables
  • If the remyxai SDK does not read from env vars, authentication will fail silently

Recommendation: Check if the remyxai_get_asset() function accepts an API key parameter. If not, either document that the API key parameter is for future use, or remove the parameter and rely solely on environment variables.


2. Potential AttributeError 🔴 HIGH PRIORITY

Location: autogen/coding/remyx_code_executor.py:426

Issue: At line 426, the code directly accesses result.cost without checking if it exists, but at line 547, the same access is properly protected.

Impact: If result.cost is None, line 426 will raise TypeError.

Fix: Use safe dictionary access at line 426 similar to line 547.


3. Environment Variable Merge Order 🟡 MEDIUM PRIORITY

Location: autogen/coding/remyx_code_executor.py:165-170

Current Behavior: User-provided environment variables take precedence over paper requirements.

Recommendation: Document this behavior clearly in the docstring. Users should understand what happens when they provide environment variables that conflict with paper requirements.


📋 Summary & Recommendations

🔴 Must Fix Before Merge:

  1. Fix AttributeError risk at line 426 (unsafe result.cost access)
  2. Clarify API key usage - Either use it or remove the parameter
  3. Document environment variable merge order - Make the precedence behavior explicit

🟡 Should Fix:

  1. Clarify container cleanup strategy in explore()
  2. Decide on RemyxCodeResult usage (use it or remove it)

🎯 Overall Assessment

Score: 7.5/10 - This is a solid contribution with good architecture and documentation. The previous review main issue (API key consistency) has been addressed. However, there are still 2-3 critical issues that could cause runtime errors.

Recommendation: Request Changes ⚠️

The PR is close to ready, but the AttributeError risk and API key usage need to be addressed before merging. These are straightforward fixes that should take less than 30 minutes.


📝 Action Items for @salma-remyx

  1. Critical: Fix line 426 to safely access result.cost
  2. Critical: Clarify whether the api_key parameter is actually used by the remyxai SDK, or remove it
  3. Important: Document the environment variable precedence behavior in the docstring
  4. Optional: Decide on RemyxCodeResult usage

Once these are addressed, this will be ready to merge! Great work on addressing the API key consistency issue from the previous review.


Generated by Claude Code Review - Re-Review on 2025-11-26

@codecov
Copy link

codecov bot commented Nov 26, 2025

Codecov Report

❌ Patch coverage is 21.12676% with 168 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
autogen/coding/remyx_code_executor.py 20.79% 160 Missing ⚠️
autogen/coding/factory.py 0.00% 5 Missing and 1 partial ⚠️
autogen/coding/__init__.py 60.00% 2 Missing ⚠️
Files with missing lines Coverage Δ
autogen/coding/base.py 100.00% <ø> (ø)
autogen/coding/__init__.py 75.00% <60.00%> (-6.82%) ⬇️
autogen/coding/factory.py 48.27% <0.00%> (-12.60%) ⬇️
autogen/coding/remyx_code_executor.py 20.79% <20.79%> (ø)

... and 19 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants