Background data movement #20

Merged
merged 15 commits on Oct 21, 2022
90 changes: 90 additions & 0 deletions MultiTierDataMovement.md
@@ -0,0 +1,90 @@
# Background Data Movement

In order to reduce the number of online evictions and to support asynchronous
promotion, we have added two periodic workers to handle eviction and promotion.

The diagram below shows a simplified view of how the background evictor
thread (green) is integrated into the CacheLib architecture.

<p align="center">
<img width="640" height="360" alt="BackgroundEvictor" src="cachelib-background-evictor.png">
</p>

## Background Evictors

The background evictors scan each class to see if there are objects to move to the next (lower)
tier using a given strategy. Below we document the general parameters and the parameters for the
individual strategies; a configuration sketch follows the parameter list.

- `backgroundEvictorIntervalMilSec`: The interval at which this thread runs. By default
the background evictor threads wake up every 10 ms to scan the AllocationClasses. In addition,
a background evictor thread is woken up every time an allocation fails (from
a request handling thread) and the current percentage of free memory for the
AllocationClass is lower than `lowEvictionAcWatermark`. This can make the interval parameter
less important when many allocations occur from request handling threads.

- `evictorThreads`: The number of background evictor threads to run. Each thread is assigned
a set of AllocationClasses to scan and evict objects from. Currently, each thread gets
an equal number of classes to scan, but since the object size distribution may be unequal, future
versions will attempt to balance the classes among threads. The range is 1 to the number of AllocationClasses.
The default is 1.

- `maxEvictionBatch`: The maximum number of objects to remove in a given eviction call. The
default is 40; the lower bound is 10 and the upper bound is 1000. Too low and we might not
remove objects at a reasonable rate; too high and it may increase contention with user threads.

- `minEvictionBatch`: The minimum number of items to evict at any time (if there are any
candidates).

- `maxEvictionPromotionHotness`: The maximum number of candidates to consider for eviction. This is similar to `maxEvictionBatch`,
but it specifies how many candidates are taken into consideration, not the actual number of items to evict.
This option can be used to configure the duration of the critical section on the LRU lock.
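
To make the general parameters above concrete, here is a minimal C++ sketch of a configuration holder with the documented defaults, together with the wake-up condition described for `backgroundEvictorIntervalMilSec`. The struct and function names are illustrative only and are not part of the CacheLib API; the `minEvictionBatch` value shown is an assumption (its default is not documented here), and the watermark defaults are taken from the FreeThresholdStrategy section below.

```cpp
#include <chrono>
#include <cstddef>

// Illustrative only: not CacheLib's configuration API.
struct BackgroundEvictorConfig {
  std::chrono::milliseconds interval{10}; // backgroundEvictorIntervalMilSec
  unsigned int evictorThreads{1};         // 1 .. number of AllocationClasses
  std::size_t maxEvictionBatch{40};       // valid range: 10 .. 1000
  std::size_t minEvictionBatch{1};        // assumed value; default not documented here
  double lowEvictionAcWatermark{2.0};     // % free below which eviction is triggered
  double highEvictionAcWatermark{5.0};    // % free at which eviction stops
};

// A request handling thread wakes the evictor early when an allocation fails
// and the class has dropped below the low watermark; otherwise the evictor
// simply runs on its periodic interval.
inline bool shouldWakeEvictor(bool allocationFailed,
                              double freePercent,
                              const BackgroundEvictorConfig& cfg) {
  return allocationFailed && freePercent < cfg.lowEvictionAcWatermark;
}
```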


### FreeThresholdStrategy (default)

- `lowEvictionAcWatermark`: Triggers the background eviction thread to run
when less than this percentage of the AllocationClass is free.
The default is `2.0`; to avoid wasting capacity we don't set this above `10.0`.

- `highEvictionAcWatermark`: Stops evictions from an AllocationClass once this
percentage of the AllocationClass is free. The default is `5.0`; to avoid wasting capacity we
don't set this above `10.0`. A sketch of how the two watermarks combine follows.
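
As a rough illustration, assuming the watermarks behave as described above (this is a sketch, not the shipped `FreeThresholdStrategy`): once the free percentage of a class falls below `lowEvictionAcWatermark`, the evictor would try to free enough items to climb back to `highEvictionAcWatermark`, capped at `maxEvictionBatch`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Sketch only: estimate how many items to evict from one AllocationClass so
// that its free percentage rises from freePercent up to highWatermark.
std::size_t estimateEvictionBatch(double freePercent,
                                  double lowWatermark,  // e.g. 2.0
                                  double highWatermark, // e.g. 5.0
                                  std::uint64_t classSizeBytes,
                                  std::uint32_t allocSize,
                                  std::size_t maxEvictionBatch) {
  if (freePercent >= lowWatermark) {
    return 0; // above the low watermark: nothing to do this round
  }
  const double deficitBytes =
      (highWatermark - freePercent) / 100.0 * static_cast<double>(classSizeBytes);
  const auto items = static_cast<std::size_t>(deficitBytes / allocSize) + 1;
  return std::min(items, maxEvictionBatch);
}
```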


## Background Promoters

The background promoters scan each class to see if there are objects to move to the next (upper)
tier using a given strategy. Below we document the general parameters and the parameters for the
different strategies; a sketch of the parameter defaults follows this list.

- `backgroundPromoterIntervalMilSec`: The interval at which this thread runs. By default
the background promoter threads wake up every 10 ms to scan the AllocationClasses for
objects to promote.

- `promoterThreads`: The number of background promoter threads to run. Each thread is assigned
a set of AllocationClasses to scan and promote objects from. Currently, each thread gets
an equal number of classes to scan, but since the object size distribution may be unequal, future
versions will attempt to balance the classes among threads. The range is `1` to the number of AllocationClasses. The default is `1`.

- `maxPromotionBatch`: The maximum number of objects to promote in a given promotion call. The
default is 40; the lower bound is 10 and the upper bound is 1000. Too low and we might not
promote objects at a reasonable rate; too high and it may increase contention with user threads.

- `minPromotionBatch`: The minimum number of items to promote at any time (if there are any
candidates).

- `numDuplicateElements`: This allows us to promote items that have existing (read-only) handles, since
the data does not need to be modified once the user is done with it. As a result, for a short time
the data may reside in both tiers until it is evicted from its current tier. The default is to
not allow this (`0`). Setting the value to `100` enables duplicate elements across tiers.
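
Parallel to the evictor sketch above, a minimal holder for the promoter parameters with the defaults documented here; again the names are illustrative and not the CacheLib configuration API, and the `minPromotionBatch` value is an assumption.

```cpp
#include <chrono>
#include <cstddef>

// Illustrative only: not CacheLib's configuration API.
struct BackgroundPromoterConfig {
  std::chrono::milliseconds interval{10}; // backgroundPromoterIntervalMilSec
  unsigned int promoterThreads{1};        // 1 .. number of AllocationClasses
  std::size_t maxPromotionBatch{40};      // valid range: 10 .. 1000
  std::size_t minPromotionBatch{1};       // assumed value; default not documented here
  unsigned int numDuplicateElements{0};   // 0 = duplicates disabled, 100 = allow duplicates across tiers
};
```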

### Background Promotion Strategy (only one currently)

- `promotionAcWatermark`: Promote items only if at least this
percentage of the target AllocationClass is free. The promotion thread will attempt to move up to `maxPromotionBatch` objects
to that tier. The objects are chosen from the head of the LRU. The default is `4.0`.
This value should correlate with `lowEvictionAcWatermark`, `highEvictionAcWatermark`, `minAcAllocationWatermark`, and `maxAcAllocationWatermark`.
- `maxPromotionBatch`: The number of objects to promote per batch during background promotion. Analogous to
`maxEvictionBatch`. Its value should be lower to decrease contention on hot items. A sketch of the promotion decision follows.
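
Under the same caveat that this is an illustration rather than the shipped implementation, the promotion decision described above can be sketched as: promote from the head of the LRU only while the target tier's AllocationClass has at least `promotionAcWatermark` percent free, moving at most `maxPromotionBatch` items per call.

```cpp
#include <algorithm>
#include <cstddef>

// Sketch only: how many items to promote from the LRU head this round.
std::size_t promotionBatchSize(double targetTierFreePercent,
                               double promotionAcWatermark, // e.g. 4.0
                               std::size_t candidatesAtLruHead,
                               std::size_t maxPromotionBatch) {
  if (targetTierFreePercent < promotionAcWatermark) {
    return 0; // the upper tier is too full; skip promotion this round
  }
  return std::min(candidatesAtLruHead, maxPromotionBatch);
}
```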

112 changes: 112 additions & 0 deletions cachelib/allocator/BackgroundMover-inl.h
@@ -0,0 +1,112 @@
/*
* Copyright (c) Intel and its affiliates.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

namespace facebook {
namespace cachelib {

template <typename CacheT>
BackgroundMover<CacheT>::BackgroundMover(
Cache& cache,
std::shared_ptr<BackgroundMoverStrategy> strategy,
MoverDir direction)
: cache_(cache), strategy_(strategy), direction_(direction) {
if (direction_ == MoverDir::Evict) {
moverFunc = BackgroundMoverAPIWrapper<CacheT>::traverseAndEvictItems;

} else if (direction_ == MoverDir::Promote) {
moverFunc = BackgroundMoverAPIWrapper<CacheT>::traverseAndPromoteItems;
}
}

template <typename CacheT>
BackgroundMover<CacheT>::~BackgroundMover() {
stop(std::chrono::seconds(0));
}

template <typename CacheT>
void BackgroundMover<CacheT>::work() {
try {
checkAndRun();
} catch (const std::exception& ex) {
XLOGF(ERR, "BackgroundMover interrupted due to exception: {}", ex.what());
}
}

template <typename CacheT>
void BackgroundMover<CacheT>::setAssignedMemory(
std::vector<MemoryDescriptorType>&& assignedMemory) {
XLOG(INFO, "Classes assigned to background worker:");
for (auto [tid, pid, cid] : assignedMemory) {
XLOGF(INFO, "Tid: {}, Pid: {}, Cid: {}", tid, pid, cid);
}

mutex.lock_combine([this, &assignedMemory] {
this->assignedMemory_ = std::move(assignedMemory);
});
}

// Look for classes that exceed the target memory capacity
// and return those for eviction
template <typename CacheT>
void BackgroundMover<CacheT>::checkAndRun() {
auto assignedMemory = mutex.lock_combine([this] { return assignedMemory_; });

unsigned int moves = 0;
std::set<ClassId> classes{};
auto batches = strategy_->calculateBatchSizes(cache_, assignedMemory);

for (size_t i = 0; i < batches.size(); i++) {
const auto [tid, pid, cid] = assignedMemory[i];
const auto batch = batches[i];

classes.insert(cid);
const auto& mpStats = cache_.getPoolByTid(pid, tid).getStats();

if (!batch) {
continue;
}

// try moving BATCH items from the class in order to reach free target
auto moved = moverFunc(cache_, tid, pid, cid, batch);
moves += moved;
moves_per_class_[tid][pid][cid] += moved;
totalBytesMoved.add(moved * mpStats.acStats.at(cid).allocSize);
}

numTraversals.inc();
numMovedItems.add(moves);
totalClasses.add(classes.size());
}

template <typename CacheT>
BackgroundMoverStats BackgroundMover<CacheT>::getStats() const noexcept {
BackgroundMoverStats stats;
stats.numMovedItems = numMovedItems.get();
stats.runCount = numTraversals.get();
stats.totalBytesMoved = totalBytesMoved.get();
stats.totalClasses = totalClasses.get();

return stats;
}

template <typename CacheT>
std::map<TierId, std::map<PoolId, std::map<ClassId, uint64_t>>>
BackgroundMover<CacheT>::getClassStats() const noexcept {
return moves_per_class_;
}

} // namespace cachelib
} // namespace facebook
103 changes: 103 additions & 0 deletions cachelib/allocator/BackgroundMover.h
@@ -0,0 +1,103 @@
/*
* Copyright (c) Intel and its affiliates.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#pragma once

#include "cachelib/allocator/BackgroundMoverStrategy.h"
#include "cachelib/allocator/CacheStats.h"
#include "cachelib/common/AtomicCounter.h"
#include "cachelib/common/PeriodicWorker.h"

namespace facebook {
namespace cachelib {

// Wrapper that exposes the private APIs of CacheType that are specifically
// needed by the background movers.
template <typename C>
struct BackgroundMoverAPIWrapper {
static size_t traverseAndEvictItems(C& cache,
unsigned int tid,
unsigned int pid,
unsigned int cid,
size_t batch) {
return cache.traverseAndEvictItems(tid, pid, cid, batch);
}

static size_t traverseAndPromoteItems(C& cache,
unsigned int tid,
unsigned int pid,
unsigned int cid,
size_t batch) {
return cache.traverseAndPromoteItems(tid, pid, cid, batch);
}
};

enum class MoverDir { Evict = 0, Promote };

// Periodic worker that evicts items from tiers in batches
// The primary aim is to reduce insertion times for new items in the
// cache
template <typename CacheT>
class BackgroundMover : public PeriodicWorker {
public:
using Cache = CacheT;
// @param cache the cache interface
// @param strategy the strategy class that defines how objects are
//                 moved (promoted vs. evicted and how much)
BackgroundMover(Cache& cache,
std::shared_ptr<BackgroundMoverStrategy> strategy,
MoverDir direction);

~BackgroundMover() override;

BackgroundMoverStats getStats() const noexcept;
std::map<TierId, std::map<PoolId, std::map<ClassId, uint64_t>>>
getClassStats() const noexcept;

void setAssignedMemory(
std::vector<MemoryDescriptorType>&& assignedMemory);

private:
std::map<TierId, std::map<PoolId, std::map<ClassId, uint64_t>>>
moves_per_class_;
// cache allocator's interface for evicting
using Item = typename Cache::Item;

Cache& cache_;
std::shared_ptr<BackgroundMoverStrategy> strategy_;
MoverDir direction_;

std::function<size_t(
Cache&, unsigned int, unsigned int, unsigned int, size_t)>
moverFunc;

// implements the actual logic of running the background mover
void work() override final;
void checkAndRun();

AtomicCounter numMovedItems{0};
AtomicCounter numTraversals{0};
AtomicCounter totalClasses{0};
AtomicCounter totalBytesMoved{0};

std::vector<MemoryDescriptorType> assignedMemory_;
folly::DistributedMutex mutex;
};
} // namespace cachelib
} // namespace facebook

#include "cachelib/allocator/BackgroundMover-inl.h"
42 changes: 42 additions & 0 deletions cachelib/allocator/BackgroundMoverStrategy.h
@@ -0,0 +1,42 @@
/*
* Copyright (c) Facebook, Inc. and its affiliates.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#pragma once

#include "cachelib/allocator/Cache.h"


namespace facebook {
namespace cachelib {

struct MemoryDescriptorType {
MemoryDescriptorType(TierId tid, PoolId pid, ClassId cid) :
tid_(tid), pid_(pid), cid_(cid) {}
TierId tid_;
PoolId pid_;
ClassId cid_;
};

// Base class for background mover (eviction and promotion) strategies.
class BackgroundMoverStrategy {
public:
virtual std::vector<size_t> calculateBatchSizes(
const CacheBase& cache,
std::vector<MemoryDescriptorType> acVec) = 0;
};

} // namespace cachelib
} // namespace facebook
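
To show how the `BackgroundMoverStrategy` interface above is meant to be consumed, here is a hedged sketch of a custom strategy. Everything in it is illustrative rather than the shipped `FreeThresholdStrategy`; in particular, the free-percentage lookup is stubbed out because the exact stats accessors are not part of this diff.

```cpp
#include <cstddef>
#include <vector>

#include "cachelib/allocator/BackgroundMoverStrategy.h"

namespace facebook {
namespace cachelib {

// Illustrative strategy: request a fixed batch for every class whose free
// percentage has fallen below a single watermark, and 0 otherwise. A zero
// entry makes BackgroundMover skip that class for the current traversal.
class FixedBatchStrategy : public BackgroundMoverStrategy {
 public:
  FixedBatchStrategy(double lowWatermark, size_t batch)
      : lowWatermark_(lowWatermark), batch_(batch) {}

  std::vector<size_t> calculateBatchSizes(
      const CacheBase& cache,
      std::vector<MemoryDescriptorType> acVec) override {
    std::vector<size_t> batches;
    batches.reserve(acVec.size());
    for (const auto& md : acVec) {
      // Placeholder: a real strategy would derive the free percentage from
      // pool/class stats, e.g. starting from cache.getPoolByTid(md.pid_, md.tid_).
      const double freePercent = getFreePercent(cache, md);
      batches.push_back(freePercent < lowWatermark_ ? batch_ : 0);
    }
    return batches;
  }

 private:
  // Stubbed out for the sketch; not a CacheLib API.
  static double getFreePercent(const CacheBase& /*cache*/,
                               const MemoryDescriptorType& /*md*/) {
    return 100.0;
  }

  double lowWatermark_;
  size_t batch_;
};

} // namespace cachelib
} // namespace facebook
```
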
1 change: 1 addition & 0 deletions cachelib/allocator/CMakeLists.txt
@@ -35,6 +35,7 @@ add_library (cachelib_allocator
CCacheManager.cpp
ContainerTypes.cpp
FreeMemStrategy.cpp
FreeThresholdStrategy.cpp
HitsPerSlabStrategy.cpp
LruTailAgeStrategy.cpp
MarginalHitsOptimizeStrategy.cpp
6 changes: 6 additions & 0 deletions cachelib/allocator/Cache.h
@@ -96,6 +96,12 @@ class CacheBase {
//
// @param poolId The pool id to query
virtual const MemoryPool& getPool(PoolId poolId) const = 0;

// Get the reference to a memory pool using a tier id, for stats purposes
//
// @param poolId The pool id to query
// @param tid    The tier of the pool id
virtual const MemoryPool& getPoolByTid(PoolId poolId, TierId tid) const = 0;

// Get Pool specific stats (regular pools). This includes stats from the
// Memory Pool and also the cache.