
Commit e7b3709

byrnedj authored and vinser52 committed
Background data movement (#20)
Background data movement using periodic workers. The workers attempt to evict or promote items according to the given thresholds for each class. This reduces p99 latency, since there is a higher chance that an allocation slot is free in the tier we are allocating in. Also fixes a race in promotion where releaseBackToAllocator was being called before wakeUpWaiters, and reinserts the item into the MM container on failed promotion.
1 parent 5a7db9d commit e7b3709

18 files changed, +646 -74 lines changed

MultiTierDataMovement.md

+90
@@ -0,0 +1,90 @@
+# Background Data Movement
+
+To reduce the number of online evictions and to support asynchronous
+promotion, we have added two periodic workers that handle eviction and promotion.
+
+The diagram below shows a simplified version of how the background evictor
+thread (green) is integrated into the CacheLib architecture.
+
+<p align="center">
+<img width="640" height="360" alt="BackgroundEvictor" src="cachelib-background-evictor.png">
+</p>
+
+## Background Evictors
+
+The background evictors scan each class to see if there are objects to move to the next (lower)
+tier using a given strategy. Here we document the general parameters and the parameters for the
+different strategies.
+
+- `backgroundEvictorIntervalMilSec`: The interval at which this thread runs. By default
+the background evictor threads wake up every 10 ms to scan the AllocationClasses. The
+background evictor thread is also woken up every time there is a failed allocation (from
+a request handling thread) and the current percentage of free memory for the
+AllocationClass is lower than `lowEvictionAcWatermark`. This may make the interval parameter
+less important when many allocations are occurring from request handling threads.
+
+- `evictorThreads`: The number of background evictors to run. Each thread is assigned
+a set of AllocationClasses to scan and evict objects from. Currently, each thread gets
+an equal number of classes to scan, but because the object size distribution may be unequal, future
+versions will attempt to balance the classes among threads. The range is 1 to the number of AllocationClasses.
+The default is 1.
+
+- `maxEvictionBatch`: The number of objects to remove in a given eviction call. The
+default is 40. The lower bound is 10 and the upper bound is 1000. If the value is too low, we might not
+remove objects at a reasonable rate; if it is too high, it might increase contention with user threads.
+
+- `minEvictionBatch`: The minimum number of items to evict at any time (if there are any
+candidates).
+
+- `maxEvictionPromotionHotness`: The maximum number of candidates to consider for eviction. This is similar to `maxEvictionBatch`,
+but it specifies how many candidates are taken into consideration, not the actual number of items to evict.
+This option can be used to bound the duration of the critical section on the LRU lock.
+
+
+### FreeThresholdStrategy (default)
+
+- `lowEvictionAcWatermark`: Triggers the background eviction thread to run
+when the percentage of free memory in the AllocationClass falls below this value.
+The default is `2.0`. To avoid wasting capacity, we don't set this above `10.0`.
+
+- `highEvictionAcWatermark`: Stops evictions from an AllocationClass when this
+percentage of the AllocationClass is free. The default is `5.0`. To avoid wasting capacity, we
+don't set this above `10.0`.
+
+
+## Background Promoters
+
+The background promoters scan each class to see if there are objects to move to an upper
+tier using a given strategy. Here we document the general parameters and the parameters for the
+different strategies.
+
+- `backgroundPromoterIntervalMilSec`: The interval at which this thread runs. By default
+the background promoter threads wake up every 10 ms to scan the AllocationClasses for
+objects to promote.
+
+- `promoterThreads`: The number of background promoters to run. Each thread is assigned
+a set of AllocationClasses to scan and promote objects from. Currently, each thread gets
+an equal number of classes to scan, but because the object size distribution may be unequal, future
+versions will attempt to balance the classes among threads. The range is `1` to the number of AllocationClasses. The default is `1`.
+
+- `maxPromotionBatch`: The number of objects to promote in a given promotion call. The
+default is 40. The lower bound is 10 and the upper bound is 1000. If the value is too low, we might not
+promote objects at a reasonable rate; if it is too high, it might increase contention with user threads.
+
+- `minPromotionBatch`: The minimum number of items to promote at any time (if there are any
+candidates).
+
+- `numDuplicateElements`: This allows us to promote items that have existing handles (read-only), since
+we won't need to modify the data when a user is done with it. Therefore, for a short time
+the data could reside in both tiers until it is evicted from its current tier. The default is to
+not allow this (0). Setting the value to 100 enables duplicate elements in tiers.
+
+### Background Promotion Strategy (only one currently)
+
+- `promotionAcWatermark`: Promote items if at least this
+percentage of the AllocationClass is free. The promotion thread will attempt to move `maxPromotionBatch` objects
+to that tier. The objects are chosen from the head of the LRU. The default is `4.0`.
+This value should correlate with `lowEvictionAcWatermark`, `highEvictionAcWatermark`, `minAcAllocationWatermark`, and `maxAcAllocationWatermark`.
+- `maxPromotionBatch`: The number of objects to promote in a batch during background promotion. Analogous to
+`maxEvictionBatch`. Its value should be lower to decrease contention on hot items.
+
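To illustrate how the two eviction watermarks and the batch limits documented in MultiTierDataMovement.md could interact, here is a minimal C++ sketch. It is not the `FreeThresholdStrategy` code from this commit. The `EvictionConfig` struct, the `evictionBatchSize` function, the slot-based accounting, and the `minEvictionBatch` value of 10 are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical config that mirrors the documented parameter names.
struct EvictionConfig {
  double lowEvictionAcWatermark = 2.0;   // documented default (percent free)
  double highEvictionAcWatermark = 5.0;  // documented default (percent free)
  std::size_t minEvictionBatch = 10;     // assumed value; no default is documented
  std::size_t maxEvictionBatch = 40;     // documented default
};

// Returns how many items to evict from one allocation class in a single pass.
// Eviction starts only when the free percentage is below the low watermark and
// aims for the high watermark, bounded by the min and max batch sizes.
std::size_t evictionBatchSize(const EvictionConfig& cfg,
                              std::size_t totalSlots,
                              std::size_t freeSlots) {
  assert(freeSlots <= totalSlots);
  if (totalSlots == 0) {
    return 0;
  }
  const double freePct = 100.0 * static_cast<double>(freeSlots) /
                         static_cast<double>(totalSlots);
  if (freePct >= cfg.lowEvictionAcWatermark) {
    return 0;  // enough free space, nothing to evict
  }
  // Number of free slots that corresponds to the high watermark.
  const auto targetFree = static_cast<std::size_t>(std::ceil(
      cfg.highEvictionAcWatermark * static_cast<double>(totalSlots) / 100.0));
  const std::size_t needed = targetFree > freeSlots ? targetFree - freeSlots : 0;
  // Clamp the request to [minEvictionBatch, maxEvictionBatch].
  return std::min(std::max(needed, cfg.minEvictionBatch), cfg.maxEvictionBatch);
}
```

With the defaults above, a class of 1000 slots with 15 free slots (1.5% free) would request a batch of 35, while the same class with 30 free slots (3% free) would request nothing. The gap between the two watermarks gives the evictor some hysteresis, so it does not wake up for every single allocation.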

cachelib/allocator/BackgroundMover-inl.h

+4
@@ -65,6 +65,7 @@ void BackgroundMover<CacheT>::checkAndRun() {
   auto assignedMemory = mutex_.lock_combine([this] { return assignedMemory_; });
 
   unsigned int moves = 0;
+  std::set<ClassId> classes{};
   auto batches = strategy_->calculateBatchSizes(cache_, assignedMemory);
 
   for (size_t i = 0; i < batches.size(); i++) {
@@ -74,6 +75,7 @@ void BackgroundMover<CacheT>::checkAndRun() {
     if (batch == 0) {
       continue;
     }
+    classes.insert(cid);
     const auto& mpStats = cache_.getPoolByTid(pid, tid).getStats();
     // try moving BATCH items from the class in order to reach free target
     auto moved = moverFunc(cache_, tid, pid, cid, batch);
@@ -84,6 +86,7 @@ void BackgroundMover<CacheT>::checkAndRun() {
 
   numTraversals_.inc();
   numMovedItems_.add(moves);
+  totalClasses_.add(classes.size());
 }
 
 template <typename CacheT>
@@ -92,6 +95,7 @@ BackgroundMoverStats BackgroundMover<CacheT>::getStats() const noexcept {
   stats.numMovedItems = numMovedItems_.get();
   stats.runCount = numTraversals_.get();
   stats.totalBytesMoved = totalBytesMoved_.get();
+  stats.totalClasses = totalClasses_.get();
 
   return stats;
 }
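The hunks above add a `totalClasses` counter that accumulates, per run, the number of distinct classes with a non-zero batch. One way such a counter might be consumed is to derive per-run averages from a stats snapshot. The sketch below is an assumption about usage, not code from this commit: `MoverStatsSnapshot`, `avgClassesPerRun`, and `avgItemsPerRun` are hypothetical names, and the field types are guessed. Only the field names `numMovedItems`, `runCount`, and `totalClasses` come from the diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot type; field names follow the diff, types are assumed.
struct MoverStatsSnapshot {
  std::uint64_t numMovedItems{0};
  std::uint64_t runCount{0};
  std::uint64_t totalClasses{0};
};

// Average number of distinct classes touched per background run.
double avgClassesPerRun(const MoverStatsSnapshot& s) {
  if (s.runCount == 0) {
    return 0.0;  // avoid dividing by zero before the first run
  }
  return static_cast<double>(s.totalClasses) / static_cast<double>(s.runCount);
}

// Average number of items moved per background run.
double avgItemsPerRun(const MoverStatsSnapshot& s) {
  if (s.runCount == 0) {
    return 0.0;
  }
  return static_cast<double>(s.numMovedItems) / static_cast<double>(s.runCount);
}
```

A low classes-per-run average combined with a high run count would suggest that most wake-ups find little work, which could inform how the interval parameters are tuned.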

cachelib/allocator/BackgroundMover.h

+1
@@ -93,6 +93,7 @@ class BackgroundMover : public PeriodicWorker {
 
   AtomicCounter numMovedItems_{0};
   AtomicCounter numTraversals_{0};
+  AtomicCounter totalClasses_{0};
   AtomicCounter totalBytesMoved_{0};
 
   std::vector<MemoryDescriptorType> assignedMemory_;

cachelib/allocator/CacheAllocator-inl.h

+44-2
@@ -381,7 +381,8 @@ CacheAllocator<CacheTrait>::allocate(PoolId poolId,
 }
 
 template <typename CacheTrait>
-bool CacheAllocator<CacheTrait>::shouldWakeupBgEvictor(PoolId /* pid */,
+bool CacheAllocator<CacheTrait>::shouldWakeupBgEvictor(TierId tid,
+                                                       PoolId /* pid */,
                                                        ClassId /* cid */) {
   return false;
 }
@@ -413,7 +414,7 @@ CacheAllocator<CacheTrait>::allocateInternalTier(TierId tid,
   void* memory = allocator_[tid]->allocate(pid, requiredSize);
 
   if (backgroundEvictor_.size() && !fromBgThread &&
-      (memory == nullptr || shouldWakeupBgEvictor(pid, cid))) {
+      (memory == nullptr || shouldWakeupBgEvictor(tid, pid, cid))) {
     backgroundEvictor_[BackgroundMover<CacheT>::workerId(
         tid, pid, cid, backgroundEvictor_.size())]
         ->wakeUp();
@@ -1651,6 +1652,47 @@ CacheAllocator<CacheTrait>::tryEvictToNextMemoryTier(Item& item, bool fromBgThre
   return tryEvictToNextMemoryTier(tid, pid, item, fromBgThread);
 }
 
+template <typename CacheTrait>
+typename CacheAllocator<CacheTrait>::WriteHandle
+CacheAllocator<CacheTrait>::tryPromoteToNextMemoryTier(
+    TierId tid, PoolId pid, Item& item, bool fromBgThread) {
+  if(item.isExpired()) { return {}; }
+  TierId nextTier = tid;
+  while (nextTier > 0) { // try to promote up to the previous memory tier
+    auto toPromoteTier = nextTier - 1;
+    --nextTier;
+
+    // allocateInternal might trigger another eviction
+    auto newItemHdl = allocateInternalTier(toPromoteTier, pid,
+                                           item.getKey(),
+                                           item.getSize(),
+                                           item.getCreationTime(),
+                                           item.getExpiryTime(),
+                                           fromBgThread);
+
+    if (newItemHdl) {
+      XDCHECK_EQ(newItemHdl->getSize(), item.getSize());
+      if (!moveRegularItem(item, newItemHdl)) {
+        return WriteHandle{};
+      }
+      item.unmarkMoving();
+      return newItemHdl;
+    } else {
+      return WriteHandle{};
+    }
+  }
+
+  return {};
+}
+
+template <typename CacheTrait>
+typename CacheAllocator<CacheTrait>::WriteHandle
+CacheAllocator<CacheTrait>::tryPromoteToNextMemoryTier(Item& item, bool fromBgThread) {
+  auto tid = getTierId(item);
+  auto pid = allocator_[tid]->getAllocInfo(item.getMemory()).poolId;
+  return tryPromoteToNextMemoryTier(tid, pid, item, fromBgThread);
+}
+
 template <typename CacheTrait>
 typename CacheAllocator<CacheTrait>::RemoveRes
 CacheAllocator<CacheTrait>::remove(typename Item::Key key) {
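As far as can be read from the hunk above, `tryPromoteToNextMemoryTier` returns from both branches inside its loop, so each call appears to attempt at most one promotion into the adjacent tier with the next-smaller tier id. The toy model below sketches that control flow under heavy simplification. `ToyTier` and `tryPromoteOneTier` are hypothetical names, tiers are reduced to a free-slot count, and item moving, handles, and expiry checks are all omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a memory tier: only tracks free slots.
struct ToyTier {
  std::size_t freeSlots;
};

// Tries to promote one item from tier `tid` to tier `tid - 1`.
// Returns true and updates the slot counts on success. Returns false when
// the item is already in the top tier, the tier id is out of range, or the
// upper tier has no free slot (the analogue of a failed allocation).
bool tryPromoteOneTier(std::vector<ToyTier>& tiers, std::size_t tid) {
  if (tid == 0 || tid >= tiers.size()) {
    return false;
  }
  ToyTier& upper = tiers[tid - 1];
  if (upper.freeSlots == 0) {
    return false;  // no room in the upper tier, the item stays where it is
  }
  --upper.freeSlots;       // the item now occupies a slot in the upper tier
  ++tiers[tid].freeSlots;  // and frees its slot in the current tier
  return true;
}
```

In this model a failed promotion leaves both tiers unchanged, which matches the intent stated in the commit message that an item is reinserted into its container when promotion fails.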
