Background data movement #20

Merged
merged 15 commits on Oct 21, 2022
90 changes: 90 additions & 0 deletions MultiTierDataMovement.md
@@ -0,0 +1,90 @@
# Background Data Movement

In order to reduce the number of online evictions and to support asynchronous
promotion, we have added two periodic workers to handle eviction and promotion.

The diagram below shows a simplified view of how the background evictor
thread (green) is integrated into the CacheLib architecture.

<p align="center">
<img width="640" height="360" alt="BackgroundEvictor" src="cachelib-background-evictor.png">
</p>

## Background Evictors

The background evictors scan each class to see if there are objects to move to the next (lower)
tier using a given strategy. Below we document the general parameters and the parameters for the
individual strategies; a configuration sketch follows the parameter list.

- `backgroundEvictorIntervalMilSec`: The interval at which this thread runs. By default
the background evictor threads wake up every 10 ms to scan the AllocationClasses. In addition,
a background evictor thread is woken up every time an allocation fails (from
a request handling thread) and the current percentage of free memory for the
AllocationClass is lower than `lowEvictionAcWatermark`. This can make the interval parameter
less important when many allocations occur from request handling threads.

- `evictorThreads`: The number of background evictor threads to run. Each thread is assigned
a set of AllocationClasses to scan and evict objects from. Currently, each thread gets
an equal number of classes to scan, but since the object size distribution may be unequal, future
versions will attempt to balance the classes among threads. The range is 1 to the number of AllocationClasses.
The default is 1.

- `maxEvictionBatch`: The maximum number of objects to remove in a given eviction call. The
default is 40; the lower bound is 10 and the upper bound is 1000. Too low and we might not
remove objects at a reasonable rate; too high and it may increase contention with user threads.

- `minEvictionBatch`: The minimum number of items to evict at any time (if there are any
candidates).

- `maxEvictionPromotionHotness`: The maximum number of candidates to consider for eviction. This is similar to `maxEvictionBatch`,
but it specifies how many candidates are taken into consideration, not the actual number of items to evict.
This option can be used to configure the duration of the critical section on the LRU lock.
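
To make the general parameters above concrete, here is a minimal C++ sketch of a configuration holder with the documented defaults, together with the wake-up condition described for `backgroundEvictorIntervalMilSec`. The struct and function names are illustrative only and are not part of the CacheLib API; the `minEvictionBatch` value shown is an assumption (its default is not documented here), and the watermark defaults are taken from the FreeThresholdStrategy section below.

```cpp
#include <chrono>
#include <cstddef>

// Illustrative only: not CacheLib's configuration API.
struct BackgroundEvictorConfig {
  std::chrono::milliseconds interval{10}; // backgroundEvictorIntervalMilSec
  unsigned int evictorThreads{1};         // 1 .. number of AllocationClasses
  std::size_t maxEvictionBatch{40};       // valid range: 10 .. 1000
  std::size_t minEvictionBatch{1};        // assumed value; default not documented here
  double lowEvictionAcWatermark{2.0};     // % free below which eviction is triggered
  double highEvictionAcWatermark{5.0};    // % free at which eviction stops
};

// A request handling thread wakes the evictor early when an allocation fails
// and the class has dropped below the low watermark; otherwise the evictor
// simply runs on its periodic interval.
inline bool shouldWakeEvictor(bool allocationFailed,
                              double freePercent,
                              const BackgroundEvictorConfig& cfg) {
  return allocationFailed && freePercent < cfg.lowEvictionAcWatermark;
}
```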


### FreeThresholdStrategy (default)

- `lowEvictionAcWatermark`: Triggers the background eviction thread to run
when less than this percentage of the AllocationClass is free.
The default is `2.0`; to avoid wasting capacity we don't set this above `10.0`.

- `highEvictionAcWatermark`: Stops evictions from an AllocationClass once this
percentage of the AllocationClass is free. The default is `5.0`; to avoid wasting capacity we
don't set this above `10.0`. A sketch of how the two watermarks combine follows.
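
As a rough illustration, assuming the watermarks behave as described above (this is a sketch, not the shipped `FreeThresholdStrategy`): once the free percentage of a class falls below `lowEvictionAcWatermark`, the evictor would try to free enough items to climb back to `highEvictionAcWatermark`, capped at `maxEvictionBatch`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Sketch only: estimate how many items to evict from one AllocationClass so
// that its free percentage rises from freePercent up to highWatermark.
std::size_t estimateEvictionBatch(double freePercent,
                                  double lowWatermark,  // e.g. 2.0
                                  double highWatermark, // e.g. 5.0
                                  std::uint64_t classSizeBytes,
                                  std::uint32_t allocSize,
                                  std::size_t maxEvictionBatch) {
  if (freePercent >= lowWatermark) {
    return 0; // above the low watermark: nothing to do this round
  }
  const double deficitBytes =
      (highWatermark - freePercent) / 100.0 * static_cast<double>(classSizeBytes);
  const auto items = static_cast<std::size_t>(deficitBytes / allocSize) + 1;
  return std::min(items, maxEvictionBatch);
}
```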


## Background Promoters

The background promoters scan each class to see if there are objects to move to the next (upper)
tier using a given strategy. Below we document the general parameters and the parameters for the
different strategies; a sketch of the parameter defaults follows this list.

- `backgroundPromoterIntervalMilSec`: The interval at which this thread runs. By default
the background promoter threads wake up every 10 ms to scan the AllocationClasses for
objects to promote.

- `promoterThreads`: The number of background promoter threads to run. Each thread is assigned
a set of AllocationClasses to scan and promote objects from. Currently, each thread gets
an equal number of classes to scan, but since the object size distribution may be unequal, future
versions will attempt to balance the classes among threads. The range is `1` to the number of AllocationClasses. The default is `1`.

- `maxPromotionBatch`: The maximum number of objects to promote in a given promotion call. The
default is 40; the lower bound is 10 and the upper bound is 1000. Too low and we might not
promote objects at a reasonable rate; too high and it may increase contention with user threads.

- `minPromotionBatch`: The minimum number of items to promote at any time (if there are any
candidates).

- `numDuplicateElements`: This allows us to promote items that have existing (read-only) handles, since
the data does not need to be modified once the user is done with it. As a result, for a short time
the data may reside in both tiers until it is evicted from its current tier. The default is to
not allow this (`0`). Setting the value to `100` enables duplicate elements across tiers.
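
Parallel to the evictor sketch above, a minimal holder for the promoter parameters with the defaults documented here; again the names are illustrative and not the CacheLib configuration API, and the `minPromotionBatch` value is an assumption.

```cpp
#include <chrono>
#include <cstddef>

// Illustrative only: not CacheLib's configuration API.
struct BackgroundPromoterConfig {
  std::chrono::milliseconds interval{10}; // backgroundPromoterIntervalMilSec
  unsigned int promoterThreads{1};        // 1 .. number of AllocationClasses
  std::size_t maxPromotionBatch{40};      // valid range: 10 .. 1000
  std::size_t minPromotionBatch{1};       // assumed value; default not documented here
  unsigned int numDuplicateElements{0};   // 0 = duplicates disabled, 100 = allow duplicates across tiers
};
```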

### Background Promotion Strategy (only one currently)

- `promotionAcWatermark`: Promote items only if at least this
percentage of the target AllocationClass is free. The promotion thread will attempt to move up to `maxPromotionBatch` objects
to that tier. The objects are chosen from the head of the LRU. The default is `4.0`.
This value should correlate with `lowEvictionAcWatermark`, `highEvictionAcWatermark`, `minAcAllocationWatermark`, and `maxAcAllocationWatermark`.
- `maxPromotionBatch`: The number of objects to promote per batch during background promotion. Analogous to
`maxEvictionBatch`. Its value should be lower to decrease contention on hot items. A sketch of the promotion decision follows.
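
Under the same caveat that this is an illustration rather than the shipped implementation, the promotion decision described above can be sketched as: promote from the head of the LRU only while the target tier's AllocationClass has at least `promotionAcWatermark` percent free, moving at most `maxPromotionBatch` items per call.

```cpp
#include <algorithm>
#include <cstddef>

// Sketch only: how many items to promote from the LRU head this round.
std::size_t promotionBatchSize(double targetTierFreePercent,
                               double promotionAcWatermark, // e.g. 4.0
                               std::size_t candidatesAtLruHead,
                               std::size_t maxPromotionBatch) {
  if (targetTierFreePercent < promotionAcWatermark) {
    return 0; // the upper tier is too full; skip promotion this round
  }
  return std::min(candidatesAtLruHead, maxPromotionBatch);
}
```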

112 changes: 112 additions & 0 deletions cachelib/allocator/BackgroundMover-inl.h
@@ -0,0 +1,112 @@
/*
* Copyright (c) Intel and its affiliates.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

namespace facebook {
namespace cachelib {

template <typename CacheT>
BackgroundMover<CacheT>::BackgroundMover(
Cache& cache,
std::shared_ptr<BackgroundMoverStrategy> strategy,
MoverDir direction)
: cache_(cache), strategy_(strategy), direction_(direction) {
if (direction_ == MoverDir::Evict) {
moverFunc = BackgroundMoverAPIWrapper<CacheT>::traverseAndEvictItems;

} else if (direction_ == MoverDir::Promote) {
moverFunc = BackgroundMoverAPIWrapper<CacheT>::traverseAndPromoteItems;
}
}

template <typename CacheT>
BackgroundMover<CacheT>::~BackgroundMover() {
stop(std::chrono::seconds(0));
}

template <typename CacheT>
void BackgroundMover<CacheT>::work() {
try {
checkAndRun();
} catch (const std::exception& ex) {
XLOGF(ERR, "BackgroundMover interrupted due to exception: {}", ex.what());
}
}

template <typename CacheT>
void BackgroundMover<CacheT>::setAssignedMemory(
std::vector<MemoryDescriptorType>&& assignedMemory) {
XLOG(INFO, "Classes assigned to background worker:");
for (auto [tid, pid, cid] : assignedMemory) {
XLOGF(INFO, "Tid: {}, Pid: {}, Cid: {}", tid, pid, cid);
}

mutex.lock_combine([this, &assignedMemory] {
this->assignedMemory_ = std::move(assignedMemory);
});
}

// Look for classes that exceed the target memory capacity
// and return those for eviction
template <typename CacheT>
void BackgroundMover<CacheT>::checkAndRun() {
auto assignedMemory = mutex.lock_combine([this] { return assignedMemory_; });

unsigned int moves = 0;
std::set<ClassId> classes{};
auto batches = strategy_->calculateBatchSizes(cache_, assignedMemory);

for (size_t i = 0; i < batches.size(); i++) {
const auto [tid, pid, cid] = assignedMemory[i];
const auto batch = batches[i];

classes.insert(cid);
const auto& mpStats = cache_.getPoolByTid(pid, tid).getStats();

if (!batch) {
continue;
}

// try moving BATCH items from the class in order to reach free target
auto moved = moverFunc(cache_, tid, pid, cid, batch);
moves += moved;
moves_per_class_[tid][pid][cid] += moved;
totalBytesMoved.add(moved * mpStats.acStats.at(cid).allocSize);
}

numTraversals.inc();
numMovedItems.add(moves);
totalClasses.add(classes.size());
}

template <typename CacheT>
BackgroundMoverStats BackgroundMover<CacheT>::getStats() const noexcept {
BackgroundMoverStats stats;
stats.numMovedItems = numMovedItems.get();
stats.runCount = numTraversals.get();
stats.totalBytesMoved = totalBytesMoved.get();
stats.totalClasses = totalClasses.get();

return stats;
}

template <typename CacheT>
std::map<TierId, std::map<PoolId, std::map<ClassId, uint64_t>>>
BackgroundMover<CacheT>::getClassStats() const noexcept {
return moves_per_class_;
}

} // namespace cachelib
} // namespace facebook
103 changes: 103 additions & 0 deletions cachelib/allocator/BackgroundMover.h
@@ -0,0 +1,103 @@
/*
* Copyright (c) Intel and its affiliates.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#pragma once

#include "cachelib/allocator/BackgroundMoverStrategy.h"
#include "cachelib/allocator/CacheStats.h"
#include "cachelib/common/AtomicCounter.h"
#include "cachelib/common/PeriodicWorker.h"

namespace facebook {
namespace cachelib {

// Wrapper that exposes the private APIs of CacheType that are specifically
// needed by the background movers.
template <typename C>
struct BackgroundMoverAPIWrapper {
static size_t traverseAndEvictItems(C& cache,
unsigned int tid,
unsigned int pid,
unsigned int cid,
size_t batch) {
return cache.traverseAndEvictItems(tid, pid, cid, batch);
}

static size_t traverseAndPromoteItems(C& cache,
unsigned int tid,
unsigned int pid,
unsigned int cid,
size_t batch) {
return cache.traverseAndPromoteItems(tid, pid, cid, batch);
}
};

enum class MoverDir { Evict = 0, Promote };

// Periodic worker that evicts items from tiers in batches
// The primary aim is to reduce insertion times for new items in the
// cache
template <typename CacheT>
class BackgroundMover : public PeriodicWorker {
public:
using Cache = CacheT;
// @param cache the cache interface
// @param strategy the strategy class that defines how objects are
//                 moved (promoted vs. evicted and how much)
BackgroundMover(Cache& cache,
std::shared_ptr<BackgroundMoverStrategy> strategy,
MoverDir direction);

~BackgroundMover() override;

BackgroundMoverStats getStats() const noexcept;
std::map<TierId, std::map<PoolId, std::map<ClassId, uint64_t>>>
getClassStats() const noexcept;

void setAssignedMemory(
std::vector<MemoryDescriptorType>&& assignedMemory);

private:
std::map<TierId, std::map<PoolId, std::map<ClassId, uint64_t>>>
moves_per_class_;
// cache allocator's interface for evicting
using Item = typename Cache::Item;

Cache& cache_;
std::shared_ptr<BackgroundMoverStrategy> strategy_;
MoverDir direction_;

std::function<size_t(
Cache&, unsigned int, unsigned int, unsigned int, size_t)>
moverFunc;

// implements the actual logic of running the background mover
void work() override final;
void checkAndRun();

AtomicCounter numMovedItems{0};
AtomicCounter numTraversals{0};
AtomicCounter totalClasses{0};
AtomicCounter totalBytesMoved{0};

std::vector<MemoryDescriptorType> assignedMemory_;
folly::DistributedMutex mutex;
};
} // namespace cachelib
} // namespace facebook

#include "cachelib/allocator/BackgroundMover-inl.h"
42 changes: 42 additions & 0 deletions cachelib/allocator/BackgroundMoverStrategy.h
@@ -0,0 +1,42 @@
/*
* Copyright (c) Facebook, Inc. and its affiliates.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#pragma once

#include "cachelib/allocator/Cache.h"


namespace facebook {
namespace cachelib {

struct MemoryDescriptorType {
MemoryDescriptorType(TierId tid, PoolId pid, ClassId cid) :
tid_(tid), pid_(pid), cid_(cid) {}
TierId tid_;
PoolId pid_;
ClassId cid_;
};

// Base class for background mover (eviction and promotion) strategies.
class BackgroundMoverStrategy {
public:
virtual std::vector<size_t> calculateBatchSizes(
const CacheBase& cache,
std::vector<MemoryDescriptorType> acVec) = 0;
};

} // namespace cachelib
} // namespace facebook
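
To show how the `BackgroundMoverStrategy` interface above is meant to be consumed, here is a hedged sketch of a custom strategy. Everything in it is illustrative rather than the shipped `FreeThresholdStrategy`; in particular, the free-percentage lookup is stubbed out because the exact stats accessors are not part of this diff.

```cpp
#include <cstddef>
#include <vector>

#include "cachelib/allocator/BackgroundMoverStrategy.h"

namespace facebook {
namespace cachelib {

// Illustrative strategy: request a fixed batch for every class whose free
// percentage has fallen below a single watermark, and 0 otherwise. A zero
// entry makes BackgroundMover skip that class for the current traversal.
class FixedBatchStrategy : public BackgroundMoverStrategy {
 public:
  FixedBatchStrategy(double lowWatermark, size_t batch)
      : lowWatermark_(lowWatermark), batch_(batch) {}

  std::vector<size_t> calculateBatchSizes(
      const CacheBase& cache,
      std::vector<MemoryDescriptorType> acVec) override {
    std::vector<size_t> batches;
    batches.reserve(acVec.size());
    for (const auto& md : acVec) {
      // Placeholder: a real strategy would derive the free percentage from
      // pool/class stats, e.g. starting from cache.getPoolByTid(md.pid_, md.tid_).
      const double freePercent = getFreePercent(cache, md);
      batches.push_back(freePercent < lowWatermark_ ? batch_ : 0);
    }
    return batches;
  }

 private:
  // Stubbed out for the sketch; not a CacheLib API.
  static double getFreePercent(const CacheBase& /*cache*/,
                               const MemoryDescriptorType& /*md*/) {
    return 100.0;
  }

  double lowWatermark_;
  size_t batch_;
};

} // namespace cachelib
} // namespace facebook
```
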
1 change: 1 addition & 0 deletions cachelib/allocator/CMakeLists.txt
@@ -35,6 +35,7 @@ add_library (cachelib_allocator
CCacheManager.cpp
ContainerTypes.cpp
FreeMemStrategy.cpp
FreeThresholdStrategy.cpp
HitsPerSlabStrategy.cpp
LruTailAgeStrategy.cpp
MarginalHitsOptimizeStrategy.cpp
6 changes: 6 additions & 0 deletions cachelib/allocator/Cache.h
@@ -96,6 +96,12 @@ class CacheBase {
//
// @param poolId The pool id to query
virtual const MemoryPool& getPool(PoolId poolId) const = 0;

// Get the reference to a memory pool using a tier id, for stats purposes
//
// @param poolId The pool id to query
// @param tid    The tier of the pool id
virtual const MemoryPool& getPoolByTid(PoolId poolId, TierId tid) const = 0;

// Get Pool specific stats (regular pools). This includes stats from the
// Memory Pool and also the cache.