Skip to content

Conversation

@GuoTeng-ECNU
Copy link

No description provided.

@joechenrh joechenrh requested a review from Copilot May 23, 2025 06:32
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a full adaptive filter-cache subsystem, including sampling heat via HeatBuckets, allocating filter units with a greedy algorithm and two-heap adjustment, and integrating with a LightGBM-based model and YCSB workloads.

  • Add HeatBuckets and SamplesPool to track and update key-range hotness
  • Implement GreedyAlgo, FilterCacheHeap/FilterCacheHeapManager, and FilterCacheManager for dynamic unit allocation
  • Wire up client APIs (FilterCacheClient) and model I/O (ClfModel), plus minor YCSB DB changes

Reviewed Changes

Copilot reviewed 80 out of 86 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
db/art/heat_buckets.h/.cc Sampling and hotness‐tracking buckets
db/art/greedy_algo.h/.cc Greedy OPT unit‐allocation algorithm
db/art/filter_cache_item.h/.cc Skeleton for per‐segment filter units
db/art/filter_cache_heap.h Heap structures for cost/benefit adjustment
db/art/filter_cache_client.h/.cc Async client integration with thread pool
db/art/filter_cache.h Core FilterCache and FilterCacheManager logic
db/art/clf_model.h/.cc CSV I/O & LightGBM server interface
YCSB/rocksdb/rocksdb_db.cc Log workload & switch update to insert
YCSB/leveldb/leveldb_db.cc Remove unused SerializeRow
Files not reviewed (6)
  • CMakeLists.txt: Language not supported
  • TARGETS: Language not supported
  • YCSB/Makefile: Language not supported
  • YCSB/rocksdb/rocksdb.properties: Language not supported
  • YCSB/workloads/workloadt: Language not supported
  • build_tools/build_detect_platform: Language not supported
Comments suppressed due to low confidence (1)

db/art/clf_model.h:45

  • LessorComparor is misspelled; consider LessOrComparator or simply LessComparator.
inline bool RangeRatePairLessorComparor(const RangeRatePair& pair_1, const RangeRatePair& pair_2);

class HeatBuckets {
private:
// TODO: mutex can be optimized
static std::vector<std::string> seperators_;
Copy link

Copilot AI May 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable name seperators_ is misspelled; it should be separators_.

Suggested change
static std::vector<std::string> seperators_;
static std::vector<std::string> separators_;

Copilot uses AI. Check for mistakes.
Comment on lines +41 to +50
double enable_benifit;
SegmentAlgoHelper(const uint32_t& id, const uint32_t& cnt, const uint32_t& size, const uint16_t& units) {
segment_id = id; visit_cnt = cnt; size_per_unit = size; units_num = units;
enable_benifit = StandardBenefit(visit_cnt, units_num);
// assert(units_num <= MAX_UNITS_NUM);
}
SegmentAlgoHelper(const uint32_t& id, SegmentAlgoInfo& segment_algo_info) {
segment_id = id; visit_cnt = segment_algo_info.visit_cnt;
size_per_unit = segment_algo_info.size_per_unit; units_num = 0;
enable_benifit = StandardBenefit(visit_cnt, units_num);
Copy link

Copilot AI May 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field enable_benifit is misspelled; it should be enable_benefit.

Suggested change
double enable_benifit;
SegmentAlgoHelper(const uint32_t& id, const uint32_t& cnt, const uint32_t& size, const uint16_t& units) {
segment_id = id; visit_cnt = cnt; size_per_unit = size; units_num = units;
enable_benifit = StandardBenefit(visit_cnt, units_num);
// assert(units_num <= MAX_UNITS_NUM);
}
SegmentAlgoHelper(const uint32_t& id, SegmentAlgoInfo& segment_algo_info) {
segment_id = id; visit_cnt = segment_algo_info.visit_cnt;
size_per_unit = segment_algo_info.size_per_unit; units_num = 0;
enable_benifit = StandardBenefit(visit_cnt, units_num);
double enable_benefit;
SegmentAlgoHelper(const uint32_t& id, const uint32_t& cnt, const uint32_t& size, const uint16_t& units) {
segment_id = id; visit_cnt = cnt; size_per_unit = size; units_num = units;
enable_benefit = StandardBenefit(visit_cnt, units_num);
// assert(units_num <= MAX_UNITS_NUM);
}
SegmentAlgoHelper(const uint32_t& id, SegmentAlgoInfo& segment_algo_info) {
segment_id = id; visit_cnt = segment_algo_info.visit_cnt;
size_per_unit = segment_algo_info.size_per_unit; units_num = 0;
enable_benefit = StandardBenefit(visit_cnt, units_num);

Copilot uses AI. Check for mistakes.
Comment on lines +25 to +26
inline bool FilterCacheHeapNodeLessComparor(const FilterCacheHeapNode& node_1, const FilterCacheHeapNode& node_2);
inline bool FilterCacheHeapNodeGreaterComparor(const FilterCacheHeapNode& node_1, const FilterCacheHeapNode& node_2);
Copy link

Copilot AI May 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The word Comparor is misspelled; it should be Comparator.

Suggested change
inline bool FilterCacheHeapNodeLessComparor(const FilterCacheHeapNode& node_1, const FilterCacheHeapNode& node_2);
inline bool FilterCacheHeapNodeGreaterComparor(const FilterCacheHeapNode& node_1, const FilterCacheHeapNode& node_2);
inline bool FilterCacheHeapNodeLessComparator(const FilterCacheHeapNode& node_1, const FilterCacheHeapNode& node_2);
inline bool FilterCacheHeapNodeGreaterComparator(const FilterCacheHeapNode& node_1, const FilterCacheHeapNode& node_2);

Copilot uses AI. Check for mistakes.
// need to generate random integer in [0, old samples_cnt_] (equal to [0, old samples_cnt_ + 1))
// new samples_cnt_ = old samples_cnt_ + 1
// if you want random integer in [a, b], use (rand() % (b-a+1))+a;
srand((unsigned)time(NULL));
Copy link

Copilot AI May 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeding with srand on each call can lead to predictable random sequences; move srand to a one‐time initialization outside this function.

Suggested change
srand((unsigned)time(NULL));

Copilot uses AI. Check for mistakes.
Comment on lines +62 to +63
std::vector<std::string>& range_seperators() {
return filter_cache_manager_.range_seperators();
Copy link

Copilot AI May 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Method name range_seperators is misspelled; it should be range_separators.

Suggested change
std::vector<std::string>& range_seperators() {
return filter_cache_manager_.range_seperators();
std::vector<std::string>& range_separators() {
return filter_cache_manager_.range_separators();

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants