Add API for sweep line over MPAs #42
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -936,6 +936,72 @@ int lotman_get_context_int(const char *key, int *output, char **err_msg); | |||||
| A reference to a char array that can store any error messages. | ||||||
| */ | ||||||
|
|
||||||
| int lotman_get_max_mpas_for_period(int64_t start_ms, int64_t end_ms, bool include_deletion, char **output, | ||||||
| char **err_msg); | ||||||
| /** | ||||||
| DESCRIPTION: A function for determining the maximum summed Management Policy Attributes (MPAs) | ||||||
| across all overlapping lots during a specified time period. This function uses a sweep line | ||||||
| algorithm to efficiently calculate the peak resource allocation at any point during the period. | ||||||
| This is useful for capacity planning and scheduling systems that need to determine available | ||||||
| space for new lot allocations, e.g. "Can I create a lot with 50GB dedicated storage from time | ||||||
| A to time B without overcommitting resources?" | ||||||
|
|
||||||
| RETURNS: Returns 0 on success. Any other values indicate an error. | ||||||
|
|
||||||
| INPUTS: | ||||||
| start_ms: | ||||||
| A Unix timestamp in milliseconds indicating the start of the query period (inclusive). | ||||||
|
|
||||||
| end_ms: | ||||||
| A Unix timestamp in milliseconds indicating the end of the query period (inclusive). | ||||||
| Must be greater than start_ms or the function will return an error. | ||||||
|
|
||||||
| include_deletion: | ||||||
| A boolean indicating which lot endpoint to consider: | ||||||
| - When false: lots are considered active until their expiration_time | ||||||
| - When true: lots are considered active until their deletion_time | ||||||
| For most capacity planning scenarios, false is recommended since expired lots may still | ||||||
| consume resources even if they become opportunistic. | ||||||
|
|
||||||
| output: | ||||||
| A reference to a char * that will be allocated and populated with a JSON string containing | ||||||
| the results. The caller is responsible for freeing this memory. | ||||||
|
|
||||||
| err_msg: | ||||||
| A reference to a char array that can store any error messages. | ||||||
|
|
||||||
| Output JSON Specification: | ||||||
| The output JSON contains both the query parameters (for logging/debugging) and the results: | ||||||
| { | ||||||
| "start_ms": <input start_ms value, in unix milliseconds>, | ||||||
| "end_ms": <input end_ms value, in unix milliseconds>, | ||||||
| "include_deletion": <input include_deletion value>, | ||||||
| "max_dedicated_GB": <maximum sum of dedicated_GB at any point during the period>, | ||||||
| "max_opportunistic_GB": <maximum sum of opportunistic_GB at any point during the period>, | ||||||
| "max_combined_GB": <maximum sum of (dedicated_GB + opportunistic_GB) at any point during the period>, | ||||||
| "max_num_objects": <maximum sum of max_num_objects at any point during the period> | ||||||
| } | ||||||
|
|
||||||
| Notes: | ||||||
| - max_dedicated_GB represents the maximum cumulative storage Lotman has dedicated to lots during the | ||||||
| specified period | ||||||
| - max_combined_GB sums over both opportunistic and dedicated storage, representing the total maximum storage | ||||||
| Lotman has allocated to lots during the specified period | ||||||
| - max_opportunistic_GB and max_combined_GB may be produced by different sets of overlapping lots | ||||||
|
||||||
| - max_opportunistic_GB and max_combined_GB may be produced by different sets of overlapping lots | |
| - max_dedicated_GB, max_opportunistic_GB, and max_combined_GB may each be produced by different sets of overlapping lots at different points in time |
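The capacity-planning question quoted in the description ("Can I create a lot with 50GB dedicated storage from time A to time B without overcommitting resources?") comes down to comparing the reported peak against a capacity limit. Below is a minimal sketch, assuming the output JSON has already been parsed into a plain struct. `MaxMpas`, `fits_dedicated`, and `total_capacity_GB` are hypothetical names that do not appear in the PR, and the total capacity is assumed to be known to the caller.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the four result fields of the output JSON.
struct MaxMpas {
    double max_dedicated_GB;
    double max_opportunistic_GB;
    double max_combined_GB;
    int64_t max_num_objects;
};

// Returns true if a new lot requesting `requested_GB` of dedicated storage
// stays under `total_capacity_GB` (an assumed, caller-supplied limit) even at
// the busiest moment of the queried period.
bool fits_dedicated(const MaxMpas &peak, double total_capacity_GB, double requested_GB) {
    assert(requested_GB >= 0.0); // a negative request makes no sense here
    return peak.max_dedicated_GB + requested_GB <= total_capacity_GB;
}
```

Because the peak is taken over the whole period, a single comparison is enough; there is no need to check individual timestamps.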
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2182,6 +2182,153 @@ bool lotman::Checks::will_be_orphaned(const std::string &LTBR, const std::string | |
| return false; | ||
| } | ||
|
|
||
| /** | ||
| * Implementation of sweep line algorithm for finding maximum MPAs during a time period. | ||
| * | ||
| * This implements the classic sweep line algorithm for interval scheduling problems. | ||
| * See: https://www.geeksforgeeks.org/maximum-number-of-overlapping-intervals/ | ||
| * | ||
| * The algorithm works by: | ||
| * 1. Creating "events" for each lot's start (creation) and end (expiration/deletion) | ||
| * 2. Sorting all events by time | ||
| * 3. Sweeping through events chronologically, tracking current resource usage with deltas | ||
| * that correspond to each event's attributes | ||
| * 4. Recording the maximum usage observed at any point | ||
| * | ||
| * Key semantic: Lot lifetimes are INCLUSIVE intervals [creation_time, end_time]. | ||
| * A lot is active at both its start and end timestamps. Therefore, we schedule | ||
| * removal events at end_time + 1 (the first moment the lot is no longer active). | ||
| */ | ||
|
|
||
| std::pair<lotman::MaxMPAResult, std::string> lotman::get_max_mpas_for_period_internal(int64_t start_ms, int64_t end_ms, | ||
| bool include_deletion) { | ||
| // Validate input | ||
| if (start_ms >= end_ms) { | ||
| return {{0.0, 0.0, 0.0, 0}, "Error: start_ms must be less than end_ms"}; | ||
| } | ||
|
|
||
| auto &storage = lotman::db::StorageManager::get_storage(); | ||
|
|
||
| // Convenience aliases for the storage types and the sqlite_orm namespace | ||
| using MPA = lotman::db::ManagementPolicyAttributes; | ||
| using Parent = lotman::db::Parent; | ||
| using namespace sqlite_orm; | ||
|
|
||
| // Query lots that overlap with the period, filtering to only ROOT lots. | ||
| // | ||
| // IMPORTANT: We only count root lots (self-parent lots) to avoid double-counting in hierarchies. | ||
| // A root lot is one where the lot has only itself as a parent in the parents table. | ||
| // Child lots consume quota from their parents, so counting both would be incorrect. | ||
| // | ||
| // For example, if parent_lot has 5GB and child_lot (child of parent_lot) has 3GB, | ||
| // the maximum capacity usage should be 5GB (from the parent), not 8GB (parent + child). | ||
| // | ||
| // Overlap condition for inclusive intervals: creation_time <= end_ms AND end_time >= start_ms | ||
| // This correctly handles all overlap cases including point-in-time overlaps at boundaries. | ||
| // | ||
| // Root lot condition: EXISTS exactly one parent record WHERE parent = lot_name | ||
| // We use a SQL subquery to identify root lots directly in the database for optimal performance. | ||
| std::string time_field = include_deletion ? "deletion_time" : "expiration_time"; | ||
| std::string query = "SELECT mpa.lot_name, mpa.dedicated_GB, mpa.opportunistic_GB, mpa.max_num_objects, " | ||
| " mpa.creation_time, mpa." + | ||
| time_field + | ||
| " " | ||
| "FROM management_policy_attributes mpa " | ||
| "WHERE mpa.creation_time <= ? AND mpa." + | ||
| time_field + | ||
| " >= ? " | ||
| " AND mpa.lot_name IN ( " | ||
| " SELECT p.lot_name " | ||
| " FROM parents p " | ||
| " WHERE p.lot_name = p.parent " | ||
| " GROUP BY p.lot_name " | ||
| " HAVING COUNT(*) = 1 " | ||
| " )"; | ||
|
|
||
| std::map<int64_t, std::vector<int>> query_int_map{{end_ms, {1}}, {start_ms, {2}}}; | ||
| auto rp = lotman::db::SQL_get_matches_multi_col(query, 6, std::map<std::string, std::vector<int>>(), query_int_map); | ||
|
|
||
| if (!rp.second.empty()) { | ||
| return {{0.0, 0.0, 0.0, 0}, "Database query failed: " + rp.second}; | ||
| } | ||
|
|
||
| auto &lots = rp.first; | ||
|
|
||
| // If no root lots overlap, return zeros with no error | ||
| if (lots.empty()) { | ||
| return {{0.0, 0.0, 0.0, 0}, ""}; | ||
| } | ||
|
|
||
| // Event structure for sweep line algorithm | ||
| struct Event { | ||
| int64_t time; | ||
| double ded_delta; // Change in dedicated storage | ||
| double opp_delta; // Change in opportunistic storage | ||
| int64_t obj_delta; // Change in object count | ||
| bool is_start; // true for creation event, false for expiration/deletion event | ||
| }; | ||
|
|
||
| std::vector<Event> events; | ||
| events.reserve(lots.size() * 2); // Each lot creates at most 2 events | ||
|
|
||
| // Build event list from query results | ||
| // Each row contains: [lot_name, dedicated_GB, opportunistic_GB, max_num_objects, creation_time, end_time] | ||
| for (const auto &lot_row : lots) { | ||
| // Parse query results from string vector (columns 0-5) | ||
| // lot_row[0] = lot_name (string, not used in sweep line) | ||
| double dedicated = std::stod(lot_row[1]); // dedicated_GB | ||
| double opportunistic = std::stod(lot_row[2]); // opportunistic_GB | ||
| int64_t objects = std::stoll(lot_row[3]); // max_num_objects | ||
| int64_t creation = std::stoll(lot_row[4]); // creation_time | ||
| int64_t end_time = std::stoll(lot_row[5]); // expiration_time or deletion_time | ||
|
|
||
| // Clamp lot start to query range (if lot starts before start_ms, treat as starting at start_ms) | ||
| int64_t effective_start = std::max(start_ms, creation); | ||
|
|
||
| // Add creation/start event at the lot's effective start time | ||
| events.push_back({effective_start, dedicated, opportunistic, objects, true}); | ||
|
|
||
| // Add expiration/deletion event AFTER the lot ends (since lot is active through end_time inclusive) | ||
| // Only add if the lot ends before the query period ends | ||
| if (end_time < end_ms) { | ||
| // Schedule removal at end_time + 1 (first moment lot is no longer active) | ||
| events.push_back({end_time + 1, -dedicated, -opportunistic, -objects, false}); | ||
|
||
| } | ||
| // If end_time >= end_ms, the lot extends beyond our query range, so no removal event needed | ||
| } | ||
|
|
||
| // Sort events chronologically, with start events before end events at the same timestamp | ||
| std::sort(events.begin(), events.end(), [](const Event &a, const Event &b) { | ||
| if (a.time != b.time) { | ||
| return a.time < b.time; | ||
| } | ||
| // At same time, process starts before ends (true sorts before false) | ||
| // This ensures we correctly handle simultaneous creation/expiration events | ||
| return a.is_start > b.is_start; | ||
| }); | ||
|
|
||
| // Sweep through events chronologically, tracking current and maximum resource usage | ||
| double current_ded = 0.0, current_opp = 0.0, current_combined = 0.0; | ||
| double max_ded = 0.0, max_opp = 0.0, max_combined = 0.0; | ||
| int64_t current_obj = 0, max_obj = 0; | ||
|
|
||
| for (const auto &event : events) { | ||
| // Update current resource usage based on event deltas | ||
| current_ded += event.ded_delta; | ||
| current_opp += event.opp_delta; | ||
| current_obj += event.obj_delta; | ||
| current_combined = current_ded + current_opp; | ||
|
|
||
| // Track the maximum values observed at any point | ||
| max_ded = std::max(max_ded, current_ded); | ||
| max_opp = std::max(max_opp, current_opp); | ||
| max_combined = std::max(max_combined, current_combined); | ||
| max_obj = std::max(max_obj, current_obj); | ||
| } | ||
|
|
||
| return {{max_ded, max_opp, max_combined, max_obj}, ""}; | ||
| } | ||
|
|
||
| void lotman::Context::set_caller(const std::string caller) { | ||
| m_caller = std::make_shared<std::string>(caller); | ||
| } | ||
|
|
||
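The sweep described in the comment block of the diff above can be sketched independently of the database layer. This is a simplified illustration, not the PR's implementation: `Interval` and `peak_usage` are hypothetical names, only one resource value is tracked, and the lots are passed in directly instead of being queried.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a lot: inclusive lifetime [start, end], one resource value.
struct Interval {
    int64_t start;
    int64_t end; // inclusive
    double gb;
};

// Peak summed `gb` over the inclusive query window [q_start, q_end].
double peak_usage(const std::vector<Interval> &lots, int64_t q_start, int64_t q_end) {
    assert(q_start < q_end);

    struct Event {
        int64_t time;
        double delta;
    };
    std::vector<Event> events;
    events.reserve(lots.size() * 2);

    for (const auto &lot : lots) {
        if (lot.start > q_end || lot.end < q_start) {
            continue; // no overlap with the window
        }
        // Clamp the start to the window; add the lot's usage there.
        events.push_back({std::max(q_start, lot.start), lot.gb});
        // The lot is active through `end`, so its usage is removed at end + 1.
        if (lot.end < q_end) {
            events.push_back({lot.end + 1, -lot.gb});
        }
    }

    // Sort by time; at equal times apply removals (negative deltas) first.
    std::sort(events.begin(), events.end(), [](const Event &a, const Event &b) {
        if (a.time != b.time) {
            return a.time < b.time;
        }
        return a.delta < b.delta;
    });

    double current = 0.0, peak = 0.0;
    for (const auto &e : events) {
        current += e.delta;
        peak = std::max(peak, current);
    }
    return peak;
}
```

For lots [0, 10] at 5 GB, [11, 20] at 3 GB, and [5, 15] at 2 GB, queried over [0, 20], this returns 7 (the first and third lots overlapping). One design choice is worth flagging: since removals are already shifted to end + 1, this sketch applies them before additions at the same timestamp, so a lot ending at t and another starting at t + 1 are never counted together. The diff above sorts starts first, so its result for such back-to-back lots may differ from this sketch's.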
The documentation is internally inconsistent regarding the include_deletion parameter. Lines 963-964 state that "false is recommended since expired lots may still consume resources even if they become opportunistic", but if expired lots still consume resources, then true (using deletion_time) would be more accurate for capacity planning, not false. This contradicts the PR description, which states that expiration_time is "recommended for capacity planning". Please clarify the intended semantics: do lots stop consuming dedicated resources at expiration_time but continue consuming opportunistic resources until deletion_time, or do they continue consuming all resources until deletion_time?
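To make the difference concrete, the sketch below sums dedicated storage at a single instant under each setting of include_deletion, following the endpoint rule stated in the doc comment. `Lot` and `dedicated_at` are hypothetical; only the field names are taken from the PR. The sketch deliberately does not model storage becoming opportunistic after expiration, since that behavior is exactly what is in question here.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical in-memory view of a lot, using the PR's field names.
struct Lot {
    int64_t creation_time;
    int64_t expiration_time;
    int64_t deletion_time;
    double dedicated_GB;
};

// Sum of dedicated_GB over lots considered active at time `t`.
// include_deletion == false: active through expiration_time (inclusive).
// include_deletion == true:  active through deletion_time (inclusive).
double dedicated_at(const std::vector<Lot> &lots, int64_t t, bool include_deletion) {
    double sum = 0.0;
    for (const auto &lot : lots) {
        const int64_t end = include_deletion ? lot.deletion_time : lot.expiration_time;
        if (lot.creation_time <= t && t <= end) {
            sum += lot.dedicated_GB;
        }
    }
    return sum;
}
```

With one 5 GB lot that expires at 10 and is deleted at 20, and a 3 GB lot created at 15, the usage at t = 15 is 3 GB with include_deletion set to false and 8 GB with it set to true. Which of the two figures reflects real consumption depends on the semantics requested above.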