feat: Add fine-grained failure recovery. #149
base: main
Conversation
Walkthrough

This change introduces a job recovery mechanism to the scheduler, enabling detection and recovery of failed jobs and tasks by analyzing task graphs and data persistence. It adds new interfaces and implementations for marking data as persisted, resetting tasks, and retrieving failed jobs. Comprehensive tests and build configuration updates are included.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Scheduler
    participant StorageFactory
    participant MetadataStorage
    participant DataStorage
    participant JobRecovery

    loop Every 1000 seconds
        Scheduler->>StorageFactory: get_connection()
        Scheduler->>MetadataStorage: get_failed_jobs(conn)
        MetadataStorage-->>Scheduler: [failed_job_ids]
        loop for each failed_job_id
            Scheduler->>JobRecovery: construct(job_id, conn, data_store, metadata_store)
            Scheduler->>JobRecovery: compute_graph()
            JobRecovery->>MetadataStorage: load_task_graph(job_id)
            JobRecovery->>DataStorage: get_data(data_id)
            JobRecovery-->>Scheduler: ready_tasks, pending_tasks
            Scheduler->>MetadataStorage: reset_tasks(conn, ready_tasks, pending_tasks)
        end
    end
```
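For orientation, the storage-interface additions named in the walkthrough and diagram reduce to roughly the following shape. This is a hedged sketch: the stand-in types and exact signatures are assumptions inferred from the diagram, not the PR's actual declarations.

```cpp
#include <vector>

#include <boost/uuid/uuid.hpp>

// Stand-ins for the real spider types (assumed shapes).
class StorageConnection {};
struct StorageErr {
    bool success = true;
};

class MetadataStorageRecoveryExt {
public:
    virtual ~MetadataStorageRecoveryExt() = default;

    // Fetch the ids of jobs that currently contain failed tasks.
    virtual auto get_failed_jobs(
            StorageConnection& conn,
            std::vector<boost::uuids::uuid>& job_ids
    ) -> StorageErr = 0;

    // Reset the given tasks to the ready and pending states, respectively.
    virtual auto reset_tasks(
            StorageConnection& conn,
            std::vector<boost::uuids::uuid> const& ready_tasks,
            std::vector<boost::uuids::uuid> const& pending_tasks
    ) -> StorageErr = 0;
};

class DataStorageRecoveryExt {
public:
    virtual ~DataStorageRecoveryExt() = default;

    // Mark a data object as persisted (checkpointed) in storage.
    virtual auto set_data_persisted(
            StorageConnection& conn,
            boost::uuids::uuid const& data_id
    ) -> StorageErr = 0;
};
```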
Actionable comments posted: 2
🧹 Nitpick comments (6)
src/spider/core/JobRecovery.cpp (1)
1-1: Use angle brackets for header inclusion.

For consistency with other includes in this file, use angle brackets instead of quotes.

```diff
-#include "JobRecovery.hpp"
+#include <spider/core/JobRecovery.hpp>
```

src/spider/core/JobRecovery.hpp (3)
41-43: Consider returning const references for better performance.

The getter methods currently return vectors by value, which involves copying. Since these are likely read-only operations, returning const references would be more efficient.

```diff
-    auto get_ready_tasks() -> std::vector<boost::uuids::uuid>;
+    auto get_ready_tasks() const -> std::vector<boost::uuids::uuid> const&;

-    auto get_pending_tasks() -> std::vector<boost::uuids::uuid>;
+    auto get_pending_tasks() const -> std::vector<boost::uuids::uuid> const&;
```

Note: This would require converting the internal absl::flat_hash_set to std::vector during compute_graph() instead of in the getters.
52-53: Make the output parameter explicit.

The not_persisted parameter is an output parameter, but the method signature doesn't clearly indicate this. Consider using a more explicit approach.

```diff
-    auto check_task_input(Task const& task, absl::flat_hash_set<boost::uuids::uuid>& not_persisted)
+    auto check_task_input(Task const& task, absl::flat_hash_set<boost::uuids::uuid>* not_persisted)
             -> StorageErr;
```

Using a pointer makes it clearer at the call site that this is an output parameter.
23-28: Constructor parameters could benefit from a named-parameter idiom.

With four constructor parameters, three of them shared_ptrs of different types, there's a risk of passing arguments in the wrong order. Consider using a builder pattern or configuration struct.

Consider creating a configuration struct:

```cpp
struct JobRecoveryConfig {
    boost::uuids::uuid job_id;
    std::shared_ptr<StorageConnection> storage_connection;
    std::shared_ptr<DataStorage> data_store;
    std::shared_ptr<MetadataStorage> metadata_store;
};

JobRecovery(JobRecoveryConfig config);
```

tests/scheduler/test-JobRecovery.cpp (1)
114-115: Avoid shadowing member variables.

The local variables ready_tasks and pending_tasks could be confused with potential member variables. Consider using different names for clarity.

```diff
-    auto ready_tasks = recovery.get_ready_tasks();
-    auto pending_tasks = recovery.get_pending_tasks();
+    auto const recovered_ready_tasks = recovery.get_ready_tasks();
+    auto const recovered_pending_tasks = recovery.get_pending_tasks();
```

src/spider/storage/mysql/MySqlStorage.cpp (1)
1178-1181: Consider indexing for performance.

The SQL query filters on state and compares retry with max_retry. Ensure these columns are properly indexed for efficient query execution.

Consider adding a composite index on (state, retry, max_retry), or at least on state if not already present:

```sql
CREATE INDEX idx_tasks_state_retry ON tasks(state, retry, max_retry);
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- src/spider/CMakeLists.txt (2 hunks)
- src/spider/client/Data.hpp (4 hunks)
- src/spider/core/Data.hpp (1 hunks)
- src/spider/core/JobRecovery.cpp (1 hunks)
- src/spider/core/JobRecovery.hpp (1 hunks)
- src/spider/scheduler/scheduler.cpp (5 hunks)
- src/spider/storage/DataStorage.hpp (1 hunks)
- src/spider/storage/MetadataStorage.hpp (1 hunks)
- src/spider/storage/mysql/MySqlStorage.cpp (8 hunks)
- src/spider/storage/mysql/MySqlStorage.hpp (2 hunks)
- tests/CMakeLists.txt (1 hunks)
- tests/scheduler/test-JobRecovery.cpp (1 hunks)
- tests/storage/test-DataStorage.cpp (2 hunks)
- tests/storage/test-MetadataStorage.cpp (1 hunks)
- tests/utils/CoreDataUtils.hpp (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
src/spider/client/Data.hpp (3)
- src/spider/client/TaskContext.hpp (1)
  - m_data_store (263-263)
- src/spider/client/Exception.hpp (2)
  - ConnectionException (14-15)
  - ConnectionException (14-14)
- src/spider/storage/DataStorage.hpp (16)
  - conn (23-23), conn (26-27), conn (30-31), conn (33-33), conn (44-49), conn (60-65), conn (67-67), conn (68-68), conn (69-69), conn (71-72), conn (74-78), conn (80-84), conn (86-90), conn (92-92), conn (94-94), conn (96-96)

tests/scheduler/test-JobRecovery.cpp (2)
- src/spider/core/Data.hpp (1)
  - gen (45-48)
- src/spider/core/JobRecovery.hpp (1)
  - task (52-53)

src/spider/core/JobRecovery.cpp (2)
- src/spider/core/JobRecovery.hpp (4)
  - JobRecovery (23-28), task_id (76-76), task (52-53), data_id (63-63)
- src/spider/client/TaskContext.hpp (2)
  - m_metadata_store (265-265), m_data_store (263-263)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: non-storage-unit-tests (ubuntu-22.04)
- GitHub Check: non-storage-unit-tests (ubuntu-24.04)
- GitHub Check: lint
🔇 Additional comments (30)
tests/CMakeLists.txt (1)
13-13: LGTM! Test integration follows established patterns.

The addition of the JobRecovery test file to the build system is correctly placed and follows the existing alphabetical ordering convention.
src/spider/storage/DataStorage.hpp (1)
68-68: Well-designed interface addition for persistence tracking.

The new set_data_persisted method follows the established interface patterns with appropriate parameter types and return value. This is a breaking change for existing implementations, but that's expected when adding new functionality.

src/spider/core/Data.hpp (2)

34-36: Clean implementation of persistence state management.

The set_persisted and is_persisted methods follow established naming conventions and const-correctness patterns. The [[nodiscard]] attribute on the getter is consistent with other query methods in the class.

43-43: Appropriate default initialization for persistence flag.

The m_persisted member is correctly initialized to false, which is the expected default state for newly created data objects that haven't been checkpointed yet.

src/spider/CMakeLists.txt (2)

7-7: Correct integration of JobRecovery source file.

The addition follows the established pattern for core components and maintains alphabetical ordering within the sources list.

31-31: Proper header file integration.

The JobRecovery header is correctly added to the headers list with appropriate placement maintaining alphabetical order.

tests/storage/test-DataStorage.cpp (2)

43-43: LGTM! Necessary change to enable mutation.

Removing the const qualifier is required to allow calling set_persisted(true) on the data object in the extended test.

59-64: LGTM! Comprehensive persistence state testing.

The test properly verifies the complete persistence workflow:

- Sets the persistence flag on the data object
- Calls the storage interface to persist the state
- Retrieves the data again to verify persistence is maintained
- Uses the updated data_equal utility that includes persistence comparison

This provides good coverage for the new persistence functionality.

tests/utils/CoreDataUtils.hpp (1)

23-25: LGTM! Essential addition for comprehensive data comparison.

The persistence state check follows the same pattern as other property comparisons and is necessary for accurate test assertions now that Data objects have persistence state.

src/spider/scheduler/scheduler.cpp (3)

11-11: LGTM! Necessary includes for recovery functionality.

The new includes support the recovery feature:

- vector for job ID storage
- boost/uuid/uuid_io.hpp for UUID string conversion in logging
- JobRecovery.hpp for the recovery logic implementation

Also applies to: 21-21, 27-27

48-48: LGTM! Consistent timing configuration.

The recovery interval constant follows the same pattern as the cleanup interval, maintaining consistency in the codebase.

330-335: LGTM! Consistent thread management.

The recovery thread follows the established pattern:

- Created with the same parameter passing style as other threads
- Properly joined before server shutdown
- Maintains consistency with existing thread lifecycle management

Also applies to: 339-339

src/spider/storage/MetadataStorage.hpp (2)

81-89: LGTM! Well-documented interface addition.

The get_failed_jobs method is properly documented and follows established interface patterns:

- Clear documentation explaining the method's purpose
- Consistent parameter style with storage connection and output parameter
- Appropriate return type for error handling

90-102: LGTM! Comprehensive task reset interface.

The reset_tasks method provides flexible task state management:

- Detailed documentation explaining the dual-state reset capability
- Separate parameters for ready and pending tasks allow fine-grained control
- Consistent with other interface methods in style and error handling

tests/storage/test-MetadataStorage.cpp (1)

549-633: Well-structured test for partial reset functionality.

The test case properly validates the new reset_tasks method by:

- Setting up a complex task graph with dependencies
- Executing only parent_1 before the reset
- Verifying that parent_1 maintains its Succeed state and outputs after partial reset
- Confirming that parent_2 and child_task are appropriately reset to Ready and Pending states respectively

The test complements the existing full job reset test and provides good coverage for the partial recovery feature.

src/spider/client/Data.hpp (2)

81-99: Consistent implementation of checkpointing functionality.

The set_checkpointed() method follows the established pattern from set_locality(), properly handling connection reuse and error cases. The integration with the storage layer via set_data_persisted() is appropriate.

129-137: Clean extension of the builder pattern.

The builder properly supports the checkpointing feature with:

- A fluent set_checkpointed() method
- Appropriate default value (false) for backward compatibility
- Correct propagation of the persisted flag to the constructed Data object
Also applies to: 152-152, 200-200
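Taken together, the persistence-flag surface discussed above is small. A minimal standalone sketch of how the flag plausibly flows from builder to core object; the classes here are hypothetical stand-ins, not the real spider types:

```cpp
#include <cassert>

// Hypothetical stand-in for spider::core::Data with the new persistence flag.
class Data {
public:
    void set_persisted(bool const persisted) { m_persisted = persisted; }

    [[nodiscard]] auto is_persisted() const -> bool { return m_persisted; }

private:
    bool m_persisted = false;  // New objects start out not checkpointed.
};

// Hypothetical stand-in for the client-side builder extension.
class DataBuilder {
public:
    auto set_checkpointed(bool const checkpointed) -> DataBuilder& {
        m_checkpointed = checkpointed;  // Fluent setter, defaults to false.
        return *this;
    }

    [[nodiscard]] auto build() const -> Data {
        Data data;
        data.set_persisted(m_checkpointed);  // Propagate the flag on build.
        return data;
    }

private:
    bool m_checkpointed = false;
};

int main() {
    Data const data = DataBuilder{}.set_checkpointed(true).build();
    assert(data.is_persisted());
    return 0;
}
```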
src/spider/storage/mysql/MySqlStorage.hpp (1)
78-84: Appropriate interface extensions for job recovery.

The new methods follow established patterns in the codebase:

- get_failed_jobs() uses output parameters consistent with other getter methods
- reset_tasks() clearly separates ready and pending task lists for precise state control
- set_data_persisted() mirrors the existing set_data_locality() interface

These additions provide the necessary storage layer support for the fine-grained failure recovery feature.
Also applies to: 173-173
src/spider/core/JobRecovery.cpp (4)
35-58: Well-implemented graph traversal for recovery computation.

The breadth-first approach correctly identifies all tasks that need to be recovered by:
- Starting with failed tasks
- Processing their descendants and ancestors based on data persistence
- Propagating errors appropriately
The use of deque for BFS is appropriate for maintaining traversal order.
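The traversal shape described here is a plain BFS over the task graph. A self-contained sketch under stated assumptions: integer ids stand in for UUIDs and a toy node type replaces the real TaskGraph and absl containers.

```cpp
#include <deque>
#include <map>
#include <set>
#include <vector>

// Toy task node; integer ids stand in for UUIDs.
struct TaskNode {
    std::vector<int> parents;
    std::vector<int> children;
    bool output_persisted = false;
};

// Collect every task that must be re-run, starting from the failed ones:
// descendants are always enqueued (they may have consumed lost outputs),
// while ancestors are enqueued only when the data they produced was never
// persisted and so must be regenerated.
auto collect_recovery_set(
        std::map<int, TaskNode> const& graph,
        std::vector<int> const& failed_tasks
) -> std::set<int> {
    std::deque<int> queue(failed_tasks.begin(), failed_tasks.end());
    std::set<int> visited;
    while (!queue.empty()) {
        int const task_id = queue.front();
        queue.pop_front();
        if (!visited.insert(task_id).second) {
            continue;  // Already in the recovery set.
        }
        TaskNode const& node = graph.at(task_id);
        for (int const child : node.children) {
            queue.push_back(child);
        }
        for (int const parent : node.parents) {
            if (!graph.at(parent).output_persisted) {
                queue.push_back(parent);
            }
        }
    }
    return visited;
}
```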
60-71: Efficient caching implementation.

The data caching logic appropriately:
- Checks cache before making storage calls
- Only caches successfully retrieved data
- Returns the same error if retrieval fails
This optimization is important given the potentially large number of data objects in a task graph.
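The pattern being praised is a get-or-fetch cache. A minimal sketch, assuming stub types in place of the real spider::core::Data and storage connection:

```cpp
#include <map>
#include <string>

// Stub types; the real code caches data objects keyed by UUID.
struct Data {
    std::string value;
};
enum class StorageErr : char { Success, NotFound };

class DataCache {
public:
    // Consult the local cache first and only hit storage on a miss, caching
    // the result on success so repeated lookups during graph computation
    // stay cheap. Failures are returned as-is and never cached.
    auto get_data(int const data_id, Data& out) -> StorageErr {
        if (auto const it = m_cache.find(data_id); it != m_cache.end()) {
            out = it->second;
            return StorageErr::Success;
        }
        Data fetched;
        StorageErr const err = fetch_from_storage(data_id, fetched);
        if (err != StorageErr::Success) {
            return err;
        }
        m_cache.emplace(data_id, fetched);
        out = fetched;
        return StorageErr::Success;
    }

private:
    auto fetch_from_storage(int /*data_id*/, Data& /*out*/) -> StorageErr {
        return StorageErr::NotFound;  // Placeholder for the real storage call.
    }

    std::map<int, Data> m_cache;
};
```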
73-99: Correct logic for identifying non-persisted dependencies.

The method properly:
- Handles optional data IDs and parent task references
- Collects parent tasks whose outputs haven't been persisted
- Propagates storage errors appropriately
This ensures that all tasks producing non-persisted data consumed by the current task are included in the recovery subgraph.
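The input check described above reduces to a short loop. A sketch under stated assumptions: the input descriptor is illustrative, and integer ids stand in for UUIDs and task references.

```cpp
#include <optional>
#include <set>
#include <vector>

// Illustrative input descriptor; the real Task exposes optional data ids and
// the id of the task that produced each input.
struct TaskInput {
    std::optional<int> data_id;      // Absent for literal-value inputs.
    std::optional<int> parent_task;  // Producer of this input, if any.
};

// For each input backed by a data object that is not persisted, record the
// parent task that produced it, so it can be pulled into the recovery set.
void collect_not_persisted_parents(
        std::vector<TaskInput> const& inputs,
        std::set<int> const& persisted_data,
        std::set<int>& not_persisted_parents
) {
    for (TaskInput const& input : inputs) {
        if (!input.data_id.has_value() || !input.parent_task.has_value()) {
            continue;  // Literals and external data need no recovery.
        }
        if (persisted_data.count(*input.data_id) == 0) {
            not_persisted_parents.insert(*input.parent_task);
        }
    }
}
```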
101-148: Sound implementation of the core recovery algorithm.

The method correctly:
- Includes non-pending child tasks in the recovery subgraph (as they may have consumed failed task outputs)
- Determines task readiness based on input data persistence
- Propagates recovery to parent tasks when their outputs aren't persisted
The distinction between ready and pending tasks based on data persistence aligns with the PR's objective of fine-grained recovery.
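That ready/pending distinction can be stated as a one-function predicate. A sketch with an assumed input-state type, not the PR's actual representation:

```cpp
#include <vector>

// Illustrative view of one task input after a failure: an input is still
// satisfiable if it is a literal value or its backing data is persisted.
struct InputState {
    bool is_literal = false;
    bool data_persisted = false;
};

// A recovered task is "ready" only when every input survives the failure;
// otherwise it stays "pending" until its re-run parents regenerate the data.
auto is_ready_after_recovery(std::vector<InputState> const& inputs) -> bool {
    for (InputState const& input : inputs) {
        if (!input.is_literal && !input.data_persisted) {
            return false;  // Must wait on a parent to produce this input again.
        }
    }
    return true;
}
```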
tests/scheduler/test-JobRecovery.cpp (3)
26-70: LGTM! Well-structured test for basic recovery scenario.

The test properly sets up a single failed task and verifies that recovery correctly identifies it as ready. Good cleanup at the end.
502-505: Good use of STL algorithms for verification.

Using std::ranges::find to check task presence in the pending list is clean and idiomatic. The assertions properly verify all expected tasks are in the pending state.
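The idiom in isolation, wrapped in a hypothetical helper (not a function from the test itself):

```cpp
#include <algorithm>
#include <vector>

#include <boost/uuid/uuid.hpp>

// Returns true if task_id appears in the recovered pending list.
auto contains_task(
        std::vector<boost::uuids::uuid> const& pending_tasks,
        boost::uuids::uuid const& task_id
) -> bool {
    return std::ranges::find(pending_tasks, task_id) != pending_tasks.end();
}
```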
179-377: Excellent documentation with visual diagrams.

The Graphviz dot notation in comments clearly illustrates the task dependency graphs being tested. This makes the test scenarios much easier to understand.
src/spider/storage/mysql/MySqlStorage.cpp (5)
1258-1283: Complex logic for data persistence management needs careful review.

The logic to determine which data should be marked as not persisted is complex and could have edge cases. The current implementation:
- Gets all persisted data referenced by tasks
- Filters out data referenced by tasks not in the ready/pending lists
- Updates remaining data to not persisted
Please verify that this logic correctly handles all scenarios, especially:
- Data referenced by multiple tasks (some being reset, some not)
- Data that should remain persisted due to external references
Consider adding detailed comments explaining the logic.
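One way to make the intended filtering explicit is a set difference. A sketch of the logic described above with integer ids standing in for UUIDs; this is an in-memory restatement, not the actual SQL implementation:

```cpp
#include <set>

// Start from all persisted data referenced by the job's tasks, keep only the
// entries not referenced by any task that survives the reset, and mark that
// remainder as not persisted.
auto data_to_unpersist(
        std::set<int> const& persisted_data,
        std::set<int> const& referenced_by_surviving_tasks
) -> std::set<int> {
    std::set<int> result;
    for (int const data_id : persisted_data) {
        if (referenced_by_surviving_tasks.count(data_id) == 0) {
            result.insert(data_id);  // Only tasks being reset reference it.
        }
    }
    return result;
}
```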
1851-1875: Good addition of retry logic before marking job as failed.

The modification properly checks the retry count against max retries before marking the job as failed. This aligns well with the retry semantics needed for job recovery.
1775-1781: SQL query change from NOT EXISTS to NOT IN.

The change from NOT EXISTS to NOT IN for updating tasks to the ready state could have different performance characteristics. NOT IN can also be problematic with NULL values: if the subquery returns any NULL, the NOT IN predicate evaluates to NULL rather than true, so no rows match.

Verify that:

- The task_id column cannot contain NULL values
- Performance is acceptable with the NOT IN approach
Consider using NOT EXISTS for potentially better performance:
```sql
UPDATE `tasks`
SET `state` = 'ready'
WHERE `id` IN (SELECT `task_id` FROM `task_inputs` WHERE `output_task_id` = ?)
  AND `state` = 'pending'
  AND NOT EXISTS (
    SELECT 1 FROM `task_inputs`
    WHERE `task_id` = `tasks`.`id`
      AND `output_task_id` = ?
      AND `value` IS NULL
      AND `data_id` IS NULL
  )
```
2167-2169: Consistent handling of persisted flag in data operations.

Good addition of the persisted column in INSERT statements for both driver and task data. The implementation is consistent across both methods.

Also applies to: 2217-2219
2411-2428: Well-implemented set_data_persisted method.

The new method properly updates the persisted flag with appropriate error handling and transaction management.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/spider/scheduler/scheduler.cpp (1)
48-48: Consider documenting the recovery interval choice.

The 1000-second (16.7 minutes) recovery interval is quite long. While this may be appropriate for production workloads to avoid excessive overhead, consider adding a comment explaining the rationale for this interval.

```diff
+// Recovery check interval in seconds (16.7 minutes to balance responsiveness with system load)
 constexpr int cRecoveryInterval = 1000;
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/spider/core/JobRecovery.cpp (1 hunks)
- src/spider/core/JobRecovery.hpp (1 hunks)
- src/spider/scheduler/scheduler.cpp (5 hunks)
- tests/scheduler/test-JobRecovery.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/scheduler/test-JobRecovery.cpp
- src/spider/core/JobRecovery.hpp
- src/spider/core/JobRecovery.cpp
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/spider/scheduler/scheduler.cpp (1)
src/spider/storage/MetadataStorage.hpp (16)
- conn (27-27), conn (29-29), conn (30-30), conn (32-33), conn (35-36), conn (40-43), conn (45-51), conn (53-54), conn (56-57), conn (59-60), conn (62-66), conn (69-70), conn (72-76), conn (78-78), conn (80-80), conn (87-88)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: lint
- GitHub Check: non-storage-unit-tests (ubuntu-24.04)
- GitHub Check: non-storage-unit-tests (ubuntu-22.04)
🔇 Additional comments (3)
src/spider/scheduler/scheduler.cpp (3)
11-11: LGTM: Include statements are necessary for the new functionality.

The added includes are all required for the recovery feature:

- <vector> for job_ids storage
- <boost/uuid/uuid_io.hpp> for UUID string conversion in logging
- <spider/core/JobRecovery.hpp> for the JobRecovery class

Also applies to: 21-21, 27-27
153-212: Excellent implementation of the recovery loop with comprehensive error handling.

The recovery loop follows established patterns and includes proper error handling at each stage. The previous critical issue with the reset_tasks parameters has been correctly resolved: the function now properly passes recovery.get_ready_tasks() and recovery.get_pending_tasks() as separate parameters.

Key strengths (see the sketch after this list):
- Consistent pattern with other loop functions
- Comprehensive error handling with appropriate logging
- Proper resource management with storage connections
- Graceful degradation on failures (continues with next job)
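As promised above, a sketch of the loop's control flow. The stub types and helper names here are assumptions; the real implementation lives in scheduler.cpp and uses the actual storage interfaces and logging.

```cpp
#include <chrono>
#include <thread>
#include <vector>

// Stub types standing in for the real Spider interfaces.
struct StorageErr {
    bool ok = true;
};
struct Connection {};
using JobId = int;

auto get_failed_jobs(Connection& /*conn*/, std::vector<JobId>& /*job_ids*/) -> StorageErr {
    return {};  // Placeholder for the metadata-store call.
}

auto recover_job(Connection& /*conn*/, JobId /*job_id*/) -> StorageErr {
    // Placeholder for: construct JobRecovery, compute_graph(), then
    // reset_tasks(conn, ready_tasks, pending_tasks).
    return {};
}

void recovery_loop(Connection& conn, bool const& keep_running) {
    while (keep_running) {
        std::vector<JobId> job_ids;
        if (get_failed_jobs(conn, job_ids).ok) {
            for (JobId const job_id : job_ids) {
                if (!recover_job(conn, job_id).ok) {
                    continue;  // Log and move on to the next failed job.
                }
            }
        }
        // Interval taken from the PR's cRecoveryInterval constant (1000 s).
        std::this_thread::sleep_for(std::chrono::seconds{1000});
    }
}
```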
330-335: Proper thread management for the recovery functionality.

The recovery thread is correctly created and joined, following the same pattern as existing heartbeat and cleanup threads. The thread lifecycle management ensures clean shutdown.
Also applies to: 339-339
Description
Adds fine-grained failure recovery by rolling back the minimum number of tasks.
When tasks fail, Spider finds the minimum subgraph of the task graph such that all Data passed into the subgraph is persisted. Spider then rolls back every task in that subgraph to the ready or pending state and executes them again. For example, in a chain A -> B -> C where B fails and A's output is persisted, only B and C are rolled back and re-run, while A's persisted result is reused.

Checklist
breaking change.
Validation performed