-
Notifications
You must be signed in to change notification settings - Fork 9
feat: Add data persistence. #142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughA persisted state flag was introduced for data objects, enabling them to be marked as persisted both during construction via the builder and after creation via the data instance. This required updates to the core data class, client-side builder, storage interface, MySQL storage implementation, and related tests to support storing, updating, and verifying the persisted flag. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant DataBuilder
participant Data
participant DataStore
participant StorageFactory
participant StorageConnection
Client->>DataBuilder: set_persisted()
DataBuilder->>DataBuilder: m_persisted = true
Client->>DataBuilder: build()
DataBuilder->>Data: set_persisted(m_persisted)
DataBuilder->>DataStore: add_data(Data)
DataStore->>StorageFactory: get_connection()
StorageFactory-->>DataStore: StorageConnection
DataStore->>StorageConnection: store Data (with persisted flag)
sequenceDiagram
participant Client
participant Data
participant DataStore
participant StorageFactory
participant StorageConnection
Client->>Data: set_persisted()
Data->>Data: set_persisted(true)
Data->>DataStore: update_data_persisted(Data)
DataStore->>StorageFactory: get_connection()
StorageFactory-->>DataStore: StorageConnection
DataStore->>StorageConnection: update persisted flag in storage
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (5)
⏰ Context from checks skipped due to timeout of 90000ms (3)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/spider/storage/DataStorage.hpp (1)
68-68: Add documentation for the new method.The new method lacks the documentation comment block that other methods in this interface have.
+ /** + * Update the persisted state of a data object in storage. + * @param conn + * @param data The data object with the desired persisted state + * @return StorageErr::Success if the update succeeds. Error types otherwise. + */ virtual auto set_data_persisted(StorageConnection& conn, Data const& data) -> StorageErr = 0;tests/storage/test-DataStorage.cpp (1)
59-65: Consider testing both true and false states.While the test confirms the ability to set and retrieve the persisted state, it only tests setting it to
false. For more thorough testing, consider also verifying that setting it totrueworks correctly.// Set data persisted should succeed data.set_persisted(false); REQUIRE(data_storage->set_data_persisted(*conn, data).success()); // Get data should match REQUIRE(data_storage->get_data(*conn, data.get_id(), &result).success()); REQUIRE(spider::test::data_equal(data, result)); + +// Set data persisted to true should also succeed +data.set_persisted(true); +REQUIRE(data_storage->set_data_persisted(*conn, data).success()); +// Get data should match with persisted=true +REQUIRE(data_storage->get_data(*conn, data.get_id(), &result).success()); +REQUIRE(spider::test::data_equal(data, result));
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
src/spider/client/Data.hpp(4 hunks)src/spider/core/Data.hpp(1 hunks)src/spider/storage/DataStorage.hpp(1 hunks)src/spider/storage/mysql/MySqlStorage.cpp(5 hunks)src/spider/storage/mysql/MySqlStorage.hpp(1 hunks)tests/storage/test-DataStorage.cpp(2 hunks)tests/utils/CoreDataUtils.hpp(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/spider/core/Data.hpp (1)
src/spider/client/Data.hpp (1)
m_persisted(135-138)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: lint
- GitHub Check: non-storage-unit-tests (ubuntu-24.04)
- GitHub Check: non-storage-unit-tests (ubuntu-22.04)
🔇 Additional comments (13)
src/spider/core/Data.hpp (2)
34-37: Clean implementation of persisted state getters and setters.The implementation of persisted state handling follows the class's existing patterns with proper use of the
[[nodiscard]]attribute for the getter.
43-43: Appropriate default initialization.The
m_persistedflag is correctly initialized tofalseby default, ensuring data isn't considered persisted unless explicitly marked as such.src/spider/storage/mysql/MySqlStorage.hpp (1)
164-164: Method declaration correctly overrides the interface.The declaration properly overrides the pure virtual method from the parent interface, maintaining consistency with other storage methods.
tests/storage/test-DataStorage.cpp (1)
43-43: Dropping const qualifier to enable mutation.The
constqualifier has been removed from thedataobject to enable mutation for testing the new persisted flag functionality.tests/utils/CoreDataUtils.hpp (1)
23-25: Good addition to handle the new persisted state in equality checks.The comparison of
is_persisted()is properly added to thedata_equalfunction, ensuring that twoDataobjects are only considered equal if their persisted states match. This change is consistent with the existing pattern of comparing all relevant properties.src/spider/storage/mysql/MySqlStorage.cpp (4)
1990-1991: SQL statement properly updated to store persisted state.The addition of the
persistedcolumn to the SQL insert statement and corresponding parameter binding ensures that the persistence state is correctly stored in the database when creating new Data objects.Also applies to: 1998-1998
2040-2041: Good consistency in parallel implementation.The same changes for the persisted state have been correctly applied to the
add_task_datamethod, maintaining consistency withadd_driver_data.Also applies to: 2048-2048
2088-2088: Data retrieval correctly includes persisted state.The SQL select statement and subsequent object initialization properly retrieve and set the persisted state when loading Data objects from the database.
Also applies to: 2104-2104
2234-2251: Well-implemented method for updating persisted state.The new
set_data_persistedmethod follows the established pattern for database operations in this class, with proper:
- SQL statement preparation
- Parameter binding
- Exception handling
- Transaction management (commit/rollback)
This implementation enables updating the persisted state of existing Data objects in the database.
src/spider/client/Data.hpp (4)
82-100: Good implementation of post-creation checkpointing.The
set_checkpointed()method provides a clear API for marking data as persisted after creation. The implementation:
- Sets the persisted flag on the core data object
- Properly manages database connections (reusing existing or creating new)
- Follows the established error handling pattern
- Has clear documentation with appropriate exception specification
This method enables clients to checkpoint existing data objects, preventing them from being cleaned up.
130-138: Builder API properly extended for checkpointing during construction.The
set_checkpointed()method in the Builder class follows the fluent interface pattern used throughout the class. It has good documentation and properly maintains the internal persistence state.
210-210: Member variable correctly initialized.The
m_persistedmember variable is properly initialized tofalseby default, ensuring that data is not persisted unless explicitly requested.
153-153: Build method correctly applies persisted state.The addition to the
build()method ensures that the persisted state from the Builder is correctly transferred to the core::Data object during construction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/spider/storage/mysql/MySqlStorage.cpp (1)
2234-2251: Consider adding validation for robustness.The implementation correctly follows the established pattern for database updates with proper transaction management. However, consider these improvements:
- Validate that the data record exists before attempting the update
- Check the update count to ensure the operation affected a record
- Consider optimizing by checking if the persisted state is actually changing
auto MySqlDataStorage::set_data_persisted(StorageConnection& conn, Data const& data) -> StorageErr { try { sql::bytes id_bytes = uuid_get_bytes(data.get_id()); std::unique_ptr<sql::PreparedStatement> statement( static_cast<MySqlConnection&>(conn)->prepareStatement( "UPDATE `data` SET `persisted` = ? WHERE `id` = ?" ) ); statement->setBoolean(1, data.is_persisted()); statement->setBytes(2, &id_bytes); - statement->executeUpdate(); + int32_t const update_count = statement->executeUpdate(); + if (update_count == 0) { + static_cast<MySqlConnection&>(conn)->rollback(); + return StorageErr{StorageErrType::KeyNotFoundErr, "Data record not found"}; + } } catch (sql::SQLException& e) { static_cast<MySqlConnection&>(conn)->rollback(); return StorageErr{StorageErrType::OtherErr, e.what()}; } static_cast<MySqlConnection&>(conn)->commit(); return StorageErr{}; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
src/spider/client/Data.hpp(4 hunks)src/spider/storage/DataStorage.hpp(1 hunks)src/spider/storage/mysql/MySqlStorage.cpp(5 hunks)src/spider/storage/mysql/MySqlStorage.hpp(1 hunks)tests/storage/test-DataStorage.cpp(2 hunks)tests/utils/CoreDataUtils.hpp(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/utils/CoreDataUtils.hpp
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: lint
- GitHub Check: non-storage-unit-tests (ubuntu-24.04)
- GitHub Check: non-storage-unit-tests (ubuntu-22.04)
🔇 Additional comments (9)
src/spider/storage/mysql/MySqlStorage.hpp (1)
164-164: LGTM! Method declaration follows established patterns.The new
set_data_persistedmethod declaration correctly overrides the pure virtual method from theDataStorageinterface. The method signature is consistent with other storage methods in the class, and the placement maintains logical grouping with other data manipulation methods.tests/storage/test-DataStorage.cpp (2)
43-43: Correct removal of const qualifier to enable mutation.Removing the
constqualifier from thedatavariable is necessary to allow callingdata.set_persisted(true)in the subsequent test code. This change enables testing the persistence functionality while maintaining the existing test logic.
59-65: Comprehensive test coverage for data persistence functionality.The new test sequence thoroughly validates the persistence feature:
- Sets the data as persisted using
data.set_persisted(true)- Calls the storage method to persist the flag
- Retrieves and verifies the data matches the updated state
This follows the established testing patterns and ensures the new persistence functionality works correctly with the storage layer.
src/spider/storage/DataStorage.hpp (1)
68-68: Well-designed interface extension for persistence functionality.The new
set_data_persistedpure virtual method appropriately extends theDataStorageinterface. The method signature is consistent with existing interface methods, and the placement maintains logical grouping with other data manipulation operations. This provides a clean abstraction for setting data persistence across different storage implementations.src/spider/storage/mysql/MySqlStorage.cpp (2)
1990-1991: LGTM: SQL statements properly updated for persisted column support.The INSERT statements correctly include the new
persistedcolumn and use secure parameter binding. The implementation is consistent with existing patterns in the codebase.Also applies to: 2040-2041, 1998-1998, 2048-2048
2088-2088: LGTM: SELECT statements correctly updated to retrieve persisted state.The queries properly include the
persistedcolumn and the boolean value is correctly retrieved and set on the data object.Also applies to: 2104-2104
src/spider/client/Data.hpp (3)
82-100: LGTM: Well-implemented checkpointing functionality.The
set_checkpointed()method correctly follows the established pattern fromset_locality(), with proper error handling and support for both existing and new database connections. The documentation is clear and the exception handling is appropriate.
130-138: LGTM: Clean builder pattern implementation.The builder's
set_checkpointed()method is well-implemented, following the established builder pattern and providing clear documentation.
153-153: LGTM: Proper integration of persisted state in builder.The addition of the
m_persistedmember variable with appropriate default value and its usage in thebuild()method correctly integrates the checkpointing functionality into the builder pattern.Also applies to: 210-210
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Add a real PR description.
- Is there a reason we use both "checkpointed" and "persisted" as terms for this? Are there plans for different types of "persistent" data? It would be less complex to stick to a single term.
For theoretical part, especially for the paper, all the task inputs/outputs that are plain value are considered persisted in the storage. However, in practice, these two terms are used interchangeably. |
I see. As discussed offline let's only use one for the code. |
Description
Add support for client to set a
Dataas persisted when building the data or throughData::set_persistedat runtime. TheDatamarked persisted will not be garbage collected.Checklist
breaking change.
Validation performed
Summary by CodeRabbit
Summary by CodeRabbit