Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: update firecrawl node #163

Merged
merged 2 commits into from
Feb 25, 2025
Merged

feat: update firecrawl node #163

merged 2 commits into from
Feb 25, 2025

Conversation

olbychos
Copy link
Contributor

@olbychos olbychos commented Feb 25, 2025

Important

Update Firecrawl node to API v1, refactor Firecrawl tool for enhanced configuration and response handling, and add integration tests.

  • Firecrawl Node Update:
    • Update url in Firecrawl class in connections.py to use API version v1.
  • Firecrawl Tool Refactor:
    • Replace PageOptions and ExtractorOptions with new configuration options in firecrawl.py.
    • Add _build_scrape_data() and _format_agent_response() methods in FirecrawlTool to build request payload and format responses.
    • Update execute() method to use new methods for request and response handling.
  • Testing:
    • Add test_firecrawl.py for integration tests covering basic scrape, agent-optimized output, and custom options.

This description was created by Ellipsis for d4ac7e5. It will automatically update as commits are pushed.

@olbychos olbychos self-assigned this Feb 25, 2025
@olbychos olbychos requested a review from a team as a code owner February 25, 2025 19:18
Copy link

github-actions bot commented Feb 25, 2025

Coverage

Coverage Report •
FileStmtsMissCoverMissing
dynamiq/connections
   connections.py4225487%14–18, 58, 66, 77, 100, 464–465, 467–468, 472–473, 475–476, 489–490, 492, 518, 520, 532, 534–536, 610–611, 652, 752, 805, 850–851, 862, 956–957, 990–991, 1027–1028, 1077–1078, 1080, 1083–1084, 1089, 1094, 1101–1102, 1105–1109
dynamiq/nodes/tools
   firecrawl.py87989%91, 94, 96, 98, 127–128, 142–143, 146
TOTAL11912363869% 

Tests Skipped Failures Errors Time
416 0 💤 0 ❌ 0 🔥 45.765s ⏱️

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to df1cc32 in 4 minutes and 6 seconds

More details
  • Looked at 355 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 6 drafted comments based on config settings.
1. dynamiq/connections/connections.py:636
  • Draft comment:
    Updated Firecrawl API URL to v1. Ensure the trailing slash is consistently handled when concatenating endpoints.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    The first part of the comment is just stating what changed, which isn't useful. The trailing slash concern seems speculative - there's no evidence of any actual issue. The URL already has a trailing slash defined, and we don't see any endpoint concatenation code that would be problematic. Without seeing actual evidence of incorrect slash handling, this seems like an unnecessary concern.
    Maybe there are endpoint concatenation issues elsewhere in the codebase that make this a valid concern? The comment could be pointing out a real architectural issue.
    Even if there are issues elsewhere, we're supposed to ignore cross-file issues and only focus on what's visible in this diff. Based on just this code, there's no evidence of any slash handling problems.
    The comment should be deleted as it's partly just restating the obvious change and partly raising a speculative concern without evidence of an actual issue.
2. dynamiq/nodes/tools/firecrawl.py:87
  • Draft comment:
    The conditional logic for 'blockAds' is unusual: when block_ads is true, it sets blockAds to None. Confirm if this behavior is intentional.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    The comment asks for confirmation of intended behavior, which violates our rule about not asking authors to confirm their intention. Additionally, looking at line 98-99, we see that None values are filtered out, suggesting this is likely intentional behavior to handle the API's expectations differently for this particular field. The code appears to be deliberately written this way.
    The unusual pattern could indicate a bug in the code that should be fixed. The inconsistency with other boolean fields is notable.
    While the pattern is different, the code is consistent in its structure and the filtering of None values suggests this is an intentional API design choice. The comment violates our rule about asking for confirmation.
    Delete the comment. It violates our rule about asking for confirmation of intent, and the code appears to be deliberately structured this way to work with the API's expectations.
3. tests/integration/nodes/tools/test_firecrawl.py:62
  • Draft comment:
    Ensure that concatenating the connection URL with 'scrape' yields the expected endpoint without introducing extra or missing slashes.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    The comment is asking the author to ensure that the concatenation of the URL is correct, which is a form of asking the author to double-check their work. This violates the rule against asking the author to ensure behavior is intended.
4. dynamiq/connections/connections.py:636
  • Draft comment:
    Updated Firecrawl URL from v0 to v1. Ensure that the backend supports the new version and that this change is backward-compatible.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    The comment is asking the PR author to ensure compatibility and backend support, which violates the rule against asking for confirmation or testing. It doesn't provide a specific code suggestion or point out a specific issue.
5. dynamiq/nodes/tools/firecrawl.py:129
  • Draft comment:
    Endpoint URL concatenation using string addition may lead to errors if the base URL format changes. Consider using urljoin or a similar utility for robustness.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    While using urljoin is generally a good practice for URL handling, there's no strong evidence here that the current approach is problematic. We don't know the format of self.connection.url - it could be properly formatted already. The change is minor and speculative - it might help in some edge cases but isn't fixing a clear bug.
    The comment points out a real best practice in URL handling. URL concatenation can indeed cause issues if the base URL doesn't end with a slash or contains query parameters.
    However, this seems more like a minor optimization than a critical issue. The code has likely been working fine, and we don't have evidence that the URL format is causing problems.
    Delete the comment. While the suggestion follows best practices, it's a minor optimization without clear evidence of actual problems, making it more of a "nice to have" than a necessary change.
6. dynamiq/nodes/tools/firecrawl.py:68
  • Draft comment:
    Removal of manual camelCase conversion in favor of Pydantic aliasing (populate_by_name) is a good step. Confirm that all API field names are correctly mapped.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    The comment is asking the author to confirm that the API field names are correctly mapped after a change in the code. This falls under asking the author to confirm their intention or to double-check things, which is against the rules.

Workflow ID: wflow_obONKl4Z320Uf4t2


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on d4ac7e5 in 1 minute and 48 seconds

More details
  • Looked at 18 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 drafted comments based on config settings.
1. dynamiq/nodes/tools/firecrawl.py:90
  • Draft comment:
    Handling of 'blockAds' is inconsistent with similar boolean fields. Only setting it when false might be intended, but if the API requires an explicit true when enabled, this could lead to issues.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    The inconsistency is real, but is it a problem? The default value of block_ads is True (line 65), and the code only sets it to False when needed. This suggests it's an API where blockAds=true is the default behavior, and we only need to specify it when we want to disable it. The other boolean fields follow a different pattern but may have different API requirements.
    I don't have visibility into the Firecrawl API documentation or requirements. The different handling could be intentional based on API needs.
    Without clear evidence that this inconsistency causes actual problems or violates API requirements, we should assume the author implemented it this way for a reason.
    Delete the comment. While there is an inconsistency, we don't have strong evidence that it's incorrect or problematic.
2. dynamiq/nodes/tools/firecrawl.py:90
  • Draft comment:
    Refactored 'blockAds' handling: now the key is only added when self.block_ads is false (setting it to False), omitting it (thus letting the API default to true) when self.block_ads is true. Please add a brief comment to explain this behavior for clarity.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    While the comment accurately describes the change, it's primarily requesting documentation for something that follows a clear pattern in the code. The behavior is consistent with how other boolean flags are handled in the conditional_fields dictionary. The code is self-documenting and follows the established pattern of the codebase.
    The comment might have merit since this is an API contract and documenting default behaviors can be important. The change does represent a subtle behavioral difference in how the API field is handled.
    However, the code follows a clear pattern visible throughout the file where fields are only included when they differ from defaults. The behavior is consistent and self-documenting.
    The comment should be deleted as it requests documentation for behavior that is already clear from the code's context and patterns.
3. dynamiq/nodes/tools/firecrawl.py:106
  • Draft comment:
    Typographical error: The closing tag in the first section uses a backslash ('<\Source URL>') instead of a forward slash ('</Source URL>'). Consider changing it for consistency with common tag syntax.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. dynamiq/nodes/tools/firecrawl.py:112
  • Draft comment:
    Typographical error: The closing tag generated in the loop uses a backslash (e.g., '<\Markdown>') instead of a forward slash (e.g., ''). Consider updating it for proper tag syntax.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_nCKag4o04AaCVasU


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@olbychos olbychos merged commit a4c90e6 into main Feb 25, 2025
8 checks passed
@olbychos olbychos deleted the feat/update_frecrawl branch February 25, 2025 20:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants