Commit a2721d1

igchor and byrnedj committed
Implement background promotion and eviction
and add additional parameters to control allocation and eviction of items. Co-authored-by: Daniel Byrne <[email protected]>
1 parent acdfa0b commit a2721d1

35 files changed

+1837
-58
lines changed

Diff for: MultiTierDataMovement.md

+117
@@ -0,0 +1,117 @@
1+
# Background Data Movement
2+
3+
In order to reduce the number of online evictions and support asynchronous
4+
promotion, we have added two periodic workers to handle eviction and promotion.
5+
6+
The diagram below shows a simplified version of how the background evictor
7+
thread (green) is integrated into the CacheLib architecture.
8+
9+
<p align="center">
10+
<img width="640" height="360" alt="BackgroundEvictor" src="cachelib-background-evictor.png">
11+
</p>
12+
13+
## Synchronous Eviction and Promotion
14+
15+
- `disableEvictionToMemory`: Disables eviction to memory (item is always evicted to NVMe or removed
16+
on eviction)
17+
18+
## Background Evictors
19+
20+
The background evictors scan each class to see if there are objects to move to the next (lower)
21+
tier using a given strategy. Here we document the parameters for the different
22+
strategies and general parameters.
23+
24+
- `backgroundEvictorIntervalMilSec`: The interval at which this thread runs. By default,
25+
the background evictor threads will wake up every 10 ms to scan the AllocationClasses. Also,
26+
the background evictor thread will be woken up every time there is a failed allocation (from
27+
a request handling thread) and the current percentage of free memory for the
28+
AllocationClass is lower than `lowEvictionAcWatermark`. This may render the interval parameter
29+
not as important when there are many allocations occurring from request handling threads.
30+
31+
- `evictorThreads`: The number of background evictors to run - each thread is assigned
32+
a set of AllocationClasses to scan and evict objects from. Currently, each thread gets
33+
an equal number of classes to scan - but as object size distribution may be unequal - future
34+
versions will attempt to balance the classes among threads. The range is 1 to the number of AllocationClasses.
35+
The default is 1.
36+
37+
- `maxEvictionBatch`: The number of objects to remove in a given eviction call. The
38+
default is 40; the valid range is 10 to 1000. Too low and we might not
39+
remove objects at a reasonable rate, too high and it might increase contention with user threads.
40+
41+
- `minEvictionBatch`: Minimum number of items to evict at any time (if there are any
42+
candidates)
43+
44+
- `maxEvictionPromotionHotness`: Maximum candidates to consider for eviction. This is similar to `maxEvictionBatch`
45+
but it specifies how many candidates will be taken into consideration, not the actual number of items to evict.
46+
This option can be used to bound the duration of the critical section on the LRU lock.
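The equal split of AllocationClasses among evictor threads described above can be sketched as a round-robin partition. This is only an illustration: `MemoryKey` and `assignRoundRobin` are hypothetical names that do not appear in the commit, and the real assignment code may order or group the classes differently.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical stand-in for CacheLib's (TierId, PoolId, ClassId) triple.
using MemoryKey = std::tuple<int, int, int>;

// Hand out classes round-robin so that every worker receives an (almost)
// equal number of them, regardless of how object sizes are distributed.
std::vector<std::vector<MemoryKey>> assignRoundRobin(
    const std::vector<MemoryKey>& classes, std::size_t numThreads) {
  assert(numThreads > 0);
  std::vector<std::vector<MemoryKey>> perThread(numThreads);
  for (std::size_t i = 0; i < classes.size(); ++i) {
    perThread[i % numThreads].push_back(classes[i]);
  }
  return perThread;
}
```

Each inner vector could then be handed to one worker, for example through `setAssignedMemory`. A size-aware balancer, as hinted at for future versions, would replace the modulo step.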
47+
48+
49+
### FreeThresholdStrategy (default)
50+
51+
- `lowEvictionAcWatermark`: Triggers background eviction thread to run
52+
when this percentage of the AllocationClass is free.
53+
The default is `2.0`. To avoid wasting capacity, we don't set this above `10.0`.
54+
55+
- `highEvictionAcWatermark`: Stop the evictions from an AllocationClass when this
56+
percentage of the AllocationClass is free. The default is `5.0`. To avoid wasting capacity, we
57+
don't set this above `10.0`.
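To make the interplay of the two watermarks concrete, here is a minimal sketch of how a batch size could be derived from them. It is an assumption about the general idea, not the code of `FreeThresholdStrategy`: the struct, the function name `evictionBatchSize`, the capacity-in-items argument and the default of `1` for `minEvictionBatch` are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical parameter bundle; the field names follow the options above.
struct FreeThresholdParams {
  double lowEvictionAcWatermark{2.0};
  double highEvictionAcWatermark{5.0};
  std::size_t minEvictionBatch{1};  // assumed value, no default is documented
  std::size_t maxEvictionBatch{40};
};

// Returns how many items to evict from one AllocationClass.
// freePercent: share of the class that is currently free (0-100).
// capacityItems: how many items the class can hold in total (assumed input).
std::size_t evictionBatchSize(const FreeThresholdParams& p,
                              double freePercent,
                              std::size_t capacityItems) {
  assert(p.lowEvictionAcWatermark <= p.highEvictionAcWatermark);
  assert(p.minEvictionBatch <= p.maxEvictionBatch);
  if (freePercent >= p.lowEvictionAcWatermark) {
    return 0;  // enough free space, nothing to do
  }
  // Items needed to lift the free share up to the high watermark.
  const double missingPercent = p.highEvictionAcWatermark - freePercent;
  const auto wanted = static_cast<std::size_t>(
      missingPercent * static_cast<double>(capacityItems) / 100.0);
  // Keep the batch within [minEvictionBatch, maxEvictionBatch].
  return std::min(std::max(wanted, p.minEvictionBatch), p.maxEvictionBatch);
}
```

With the defaults, a class that is 1% free and holds 500 items would need 20 evictions to reach 5% free, while a much larger class would be capped at `maxEvictionBatch` per call and catch up over several runs.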
58+
59+
60+
## Background Promoters
61+
62+
The background promoters scan each class to see if there are objects to move to a higher
63+
tier using a given strategy. Here we document the parameters for the different
64+
strategies and general parameters.
65+
66+
- `backgroundPromoterIntervalMilSec`: The interval at which this thread runs. By default,
67+
the background promoter threads will wake up every 10 ms to scan the AllocationClasses for
68+
objects to promote.
69+
70+
- `promoterThreads`: The number of background promoters to run - each thread is assigned
71+
a set of AllocationClasses to scan and promote objects from. Currently, each thread gets
72+
an equal number of classes to scan - but as object size distribution may be unequal - future
73+
versions will attempt to balance the classes among threads. The range is `1` to the number of AllocationClasses. The default is `1`.
74+
75+
- `maxPromotionBatch`: The number of objects to promote in a given promotion call. The
76+
default is 40; the valid range is 10 to 1000. Too low and we might not
77+
promote objects at a reasonable rate, too high and it might increase contention with user threads.
78+
79+
- `minPromotionBatch`: Minimum number of items to promote at any time (if there are any
80+
candidates)
81+
82+
- `numDuplicateElements`: This allows us to promote items that have existing handles (read-only) since
83+
we won't need to modify the data when a user is done with the data. Therefore, for a short time
84+
the data could reside in both tiers until it is evicted from its current tier. The default is to
85+
not allow this (0). Setting the value to 100 will enable duplicate elements in tiers.
86+
87+
### Background Promotion Strategy (only one currently)
88+
89+
- `promotionAcWatermark`: Promote items if there is at least this
90+
percentage of the AllocationClass free. The promotion thread will attempt to move `maxPromotionBatch` objects
91+
to that tier. The objects are chosen from the head of the LRU. The default is `4.0`.
92+
This value should correlate with `lowEvictionAcWatermark`, `highEvictionAcWatermark`, `minAcAllocationWatermark`, `maxAcAllocationWatermark`.
93+
- `maxPromotionBatch`: The number of objects to promote in batch during BG promotion. Analogous to
94+
`maxEvictionBatch`. Its value should be lower to decrease contention on hot items.
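The promotion rule above can be sketched in the same spirit. The function below is a hypothetical illustration, not the promotion strategy shipped in this commit. It assumes the free percentage is measured on the AllocationClass of the tier that receives the promoted items, and it leaves out `minPromotionBatch` and `numDuplicateElements`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Returns how many items to promote in one call.
// freePercentInTargetTier: free share (0-100) of the AllocationClass in the
// tier receiving the items (assumption: this is what the watermark tests).
// candidates: items available at the head of the LRU in the source tier.
std::size_t promotionBatchSize(double freePercentInTargetTier,
                               double promotionAcWatermark,
                               std::size_t candidates,
                               std::size_t maxPromotionBatch) {
  assert(promotionAcWatermark >= 0.0 && promotionAcWatermark <= 100.0);
  if (freePercentInTargetTier < promotionAcWatermark) {
    return 0;  // not enough room in the target tier, skip this round
  }
  // Promote at most maxPromotionBatch items, and never more than exist.
  return std::min(candidates, maxPromotionBatch);
}
```

Keeping `promotionAcWatermark` between the eviction watermarks is what prevents the promoter from refilling space that the evictor has just freed.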
95+
96+
## Allocation policies
97+
98+
- `maxAcAllocationWatermark`: Item is always allocated in topmost tier if at least this
99+
percentage of the AllocationClass is free.
100+
- `minAcAllocationWatermark`: Item is always allocated in bottom tier if only this percent
101+
of the AllocationClass is free. If the free percentage of the AllocationClass is between `maxAcAllocationWatermark`
102+
and `minAcAllocationWatermark`, then extra checks (described below) are performed to decide where to put the element.
103+
104+
By default, allocation will always be performed from the upper tier.
105+
106+
- `acTopTierEvictionWatermark`: If there is less than this percent of free memory in the topmost tier, CacheLib will attempt to evict from the top tier. This option takes precedence over the allocation watermarks.
107+
108+
### Extra policies (used only when percentage of free AllocationClasses is between `maxAcAllocationWatermark`
109+
and `minAcAllocationWatermark`)
110+
- `sizeThresholdPolicy`: If item is smaller than this value, always allocate it in upper tier.
111+
- `defaultTierChancePercentage`: Chance (0-100%) of allocating the item in the top tier
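The allocation watermarks and the extra policies combine into a small decision procedure. The sketch below assumes a two-tier setup and inclusive boundaries. `AllocationPolicy`, `Tier` and `chooseTier` are hypothetical names, `roll` stands for a random number in `[0, 100)` supplied by the caller, and `acTopTierEvictionWatermark` is left out.

```cpp
#include <cassert>
#include <cstddef>

enum class Tier { Top, Bottom };

// Hypothetical bundle of the options documented above.
struct AllocationPolicy {
  double maxAcAllocationWatermark;     // percent free, upper bound
  double minAcAllocationWatermark;     // percent free, lower bound
  std::size_t sizeThresholdPolicy;     // bytes
  double defaultTierChancePercentage;  // 0-100
};

// Picks the tier for a new item.
// freePercent: free share (0-100) of the item's AllocationClass in the top tier.
// roll: caller-supplied random value in [0, 100), passed in to keep this testable.
Tier chooseTier(const AllocationPolicy& p, double freePercent,
                std::size_t itemSize, double roll) {
  assert(p.minAcAllocationWatermark <= p.maxAcAllocationWatermark);
  if (freePercent >= p.maxAcAllocationWatermark) {
    return Tier::Top;  // plenty of room, always use the topmost tier
  }
  if (freePercent <= p.minAcAllocationWatermark) {
    return Tier::Bottom;  // top tier nearly full, always use the bottom tier
  }
  // Between the watermarks: apply the extra policies.
  if (itemSize < p.sizeThresholdPolicy) {
    return Tier::Top;  // small items stay in the upper tier
  }
  return roll < p.defaultTierChancePercentage ? Tier::Top : Tier::Bottom;
}
```

Taking `roll` as an argument keeps the function deterministic. A caller would typically draw it from a uniform distribution once per allocation.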
112+
113+
## MMContainer options
114+
115+
- `lruInsertionPointSpec`: Can be set per tier when LRU2Q is used. Determines where new items are
116+
inserted. 0 = insert to hot queue, 1 = insert to warm queue, 2 = insert to cold queue
117+
- `markUsefulChance`: Per-tier, determines chance of moving item to the head of LRU on access

Diff for: cachelib-background-evictor.png

54.9 KB

Diff for: cachelib/allocator/BackgroundEvictor-inl.h

+110
@@ -0,0 +1,110 @@
1+
/*
2+
* Copyright (c) Intel and its affiliates.
3+
*
4+
* Licensed under the Apache License, Version 2.0 (the "License");
5+
* you may not use this file except in compliance with the License.
6+
* You may obtain a copy of the License at
7+
*
8+
* http://www.apache.org/licenses/LICENSE-2.0
9+
*
10+
* Unless required by applicable law or agreed to in writing, software
11+
* distributed under the License is distributed on an "AS IS" BASIS,
12+
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
* See the License for the specific language governing permissions and
14+
* limitations under the License.
15+
*/
16+
17+
namespace facebook {
18+
namespace cachelib {
19+
20+
21+
template <typename CacheT>
22+
BackgroundEvictor<CacheT>::BackgroundEvictor(Cache& cache,
23+
std::shared_ptr<BackgroundEvictorStrategy> strategy)
24+
: cache_(cache),
25+
strategy_(strategy)
26+
{
27+
}
28+
29+
template <typename CacheT>
30+
BackgroundEvictor<CacheT>::~BackgroundEvictor() { stop(std::chrono::seconds(0)); }
31+
32+
template <typename CacheT>
33+
void BackgroundEvictor<CacheT>::work() {
34+
try {
35+
checkAndRun();
36+
} catch (const std::exception& ex) {
37+
XLOGF(ERR, "BackgroundEvictor interrupted due to exception: {}", ex.what());
38+
}
39+
}
40+
41+
template <typename CacheT>
42+
void BackgroundEvictor<CacheT>::setAssignedMemory(std::vector<std::tuple<TierId, PoolId, ClassId>> &&assignedMemory)
43+
{
44+
XLOG(INFO, "Class assigned to background worker:");
45+
for (auto [tid, pid, cid] : assignedMemory) {
46+
XLOGF(INFO, "Tid: {}, Pid: {}, Cid: {}", tid, pid, cid);
47+
}
48+
49+
mutex.lock_combine([this, &assignedMemory]{
50+
this->assignedMemory_ = std::move(assignedMemory);
51+
});
52+
}
53+
54+
// Look for classes that exceed the target memory capacity
55+
// and evict a batch of items from each of them
56+
template <typename CacheT>
57+
void BackgroundEvictor<CacheT>::checkAndRun() {
58+
auto assignedMemory = mutex.lock_combine([this]{
59+
return assignedMemory_;
60+
});
61+
62+
unsigned int evictions = 0;
63+
std::set<ClassId> classes{};
64+
auto batches = strategy_->calculateBatchSizes(cache_,assignedMemory);
65+
66+
for (size_t i = 0; i < batches.size(); i++) {
67+
const auto [tid, pid, cid] = assignedMemory[i];
68+
const auto batch = batches[i];
69+
70+
classes.insert(cid);
71+
const auto& mpStats = cache_.getPoolByTid(pid,tid).getStats();
72+
73+
if (!batch) {
74+
continue;
75+
}
76+
77+
stats.evictionSize.add(batch * mpStats.acStats.at(cid).allocSize);
78+
79+
// try evicting `batch` items from the class in order to reach the free target
80+
auto evicted =
81+
BackgroundEvictorAPIWrapper<CacheT>::traverseAndEvictItems(cache_,
82+
tid,pid,cid,batch);
83+
evictions += evicted;
84+
evictions_per_class_[tid][pid][cid] += evicted;
85+
}
86+
87+
stats.numTraversals.inc();
88+
stats.numEvictedItems.add(evictions);
89+
stats.totalClasses.add(classes.size());
90+
}
91+
92+
template <typename CacheT>
93+
BackgroundEvictionStats BackgroundEvictor<CacheT>::getStats() const noexcept {
94+
BackgroundEvictionStats evicStats;
95+
evicStats.numEvictedItems = stats.numEvictedItems.get();
96+
evicStats.runCount = stats.numTraversals.get();
97+
evicStats.evictionSize = stats.evictionSize.get();
98+
evicStats.totalClasses = stats.totalClasses.get();
99+
100+
return evicStats;
101+
}
102+
103+
template <typename CacheT>
104+
std::map<TierId, std::map<PoolId, std::map<ClassId, uint64_t>>>
105+
BackgroundEvictor<CacheT>::getClassStats() const noexcept {
106+
return evictions_per_class_;
107+
}
108+
109+
} // namespace cachelib
110+
} // namespace facebook

Diff for: cachelib/allocator/BackgroundEvictor.h

+99
@@ -0,0 +1,99 @@
1+
/*
2+
* Copyright (c) Intel and its affiliates.
3+
*
4+
* Licensed under the Apache License, Version 2.0 (the "License");
5+
* you may not use this file except in compliance with the License.
6+
* You may obtain a copy of the License at
7+
*
8+
* http://www.apache.org/licenses/LICENSE-2.0
9+
*
10+
* Unless required by applicable law or agreed to in writing, software
11+
* distributed under the License is distributed on an "AS IS" BASIS,
12+
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
* See the License for the specific language governing permissions and
14+
* limitations under the License.
15+
*/
16+
17+
#pragma once
18+
19+
#include <gtest/gtest_prod.h>
20+
#include <folly/concurrency/UnboundedQueue.h>
21+
22+
#include "cachelib/allocator/CacheStats.h"
23+
#include "cachelib/common/PeriodicWorker.h"
24+
#include "cachelib/allocator/BackgroundEvictorStrategy.h"
25+
#include "cachelib/common/AtomicCounter.h"
26+
27+
28+
namespace facebook {
29+
namespace cachelib {
30+
31+
// wrapper that exposes the private APIs of CacheType that are specifically
32+
// needed for the eviction.
33+
template <typename C>
34+
struct BackgroundEvictorAPIWrapper {
35+
36+
static size_t traverseAndEvictItems(C& cache,
37+
unsigned int tid, unsigned int pid, unsigned int cid, size_t batch) {
38+
return cache.traverseAndEvictItems(tid,pid,cid,batch);
39+
}
40+
};
41+
42+
struct BackgroundEvictorStats {
43+
// items evicted
44+
AtomicCounter numEvictedItems{0};
45+
46+
// traversals
47+
AtomicCounter numTraversals{0};
48+
49+
// total class size
50+
AtomicCounter totalClasses{0};
51+
52+
// item eviction size
53+
AtomicCounter evictionSize{0};
54+
};
55+
56+
// Periodic worker that evicts items from tiers in batches
57+
// The primary aim is to reduce insertion times for new items in the
58+
// cache
59+
template <typename CacheT>
60+
class BackgroundEvictor : public PeriodicWorker {
61+
public:
62+
using Cache = CacheT;
63+
// @param cache the cache interface
64+
// @param strategy the strategy that computes, on every run,
65+
// how many items to evict from each assigned
66+
// (tier, pool, class)
67+
BackgroundEvictor(Cache& cache,
68+
std::shared_ptr<BackgroundEvictorStrategy> strategy);
69+
70+
~BackgroundEvictor() override;
71+
72+
BackgroundEvictionStats getStats() const noexcept;
73+
std::map<TierId, std::map<PoolId, std::map<ClassId, uint64_t>>> getClassStats() const noexcept;
74+
75+
void setAssignedMemory(std::vector<std::tuple<TierId, PoolId, ClassId>> &&assignedMemory);
76+
77+
private:
78+
std::map<TierId, std::map<PoolId, std::map<ClassId, uint64_t>>> evictions_per_class_;
79+
80+
// cache allocator's interface for evicting
81+
82+
using Item = typename Cache::Item;
83+
84+
Cache& cache_;
85+
std::shared_ptr<BackgroundEvictorStrategy> strategy_;
86+
87+
// implements the actual logic of running the background evictor
88+
void work() override final;
89+
void checkAndRun();
90+
91+
BackgroundEvictorStats stats;
92+
93+
std::vector<std::tuple<TierId, PoolId, ClassId>> assignedMemory_;
94+
folly::DistributedMutex mutex;
95+
};
96+
} // namespace cachelib
97+
} // namespace facebook
98+
99+
#include "cachelib/allocator/BackgroundEvictor-inl.h"

Diff for: cachelib/allocator/BackgroundEvictorStrategy.h

+33
@@ -0,0 +1,33 @@
1+
/*
2+
* Copyright (c) Facebook, Inc. and its affiliates.
3+
*
4+
* Licensed under the Apache License, Version 2.0 (the "License");
5+
* you may not use this file except in compliance with the License.
6+
* You may obtain a copy of the License at
7+
*
8+
* http://www.apache.org/licenses/LICENSE-2.0
9+
*
10+
* Unless required by applicable law or agreed to in writing, software
11+
* distributed under the License is distributed on an "AS IS" BASIS,
12+
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
* See the License for the specific language governing permissions and
14+
* limitations under the License.
15+
*/
16+
17+
#pragma once
18+
19+
#include "cachelib/allocator/Cache.h"
20+
21+
namespace facebook {
22+
namespace cachelib {
23+
24+
// Base class for background eviction strategy.
25+
class BackgroundEvictorStrategy {
26+
27+
public:
28+
virtual std::vector<size_t> calculateBatchSizes(const CacheBase& cache,
29+
std::vector<std::tuple<TierId, PoolId, ClassId>> acVec) = 0;
30+
};
31+
32+
} // namespace cachelib
33+
} // namespace facebook
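As an illustration of how the strategy interface above is meant to be extended, here is a self-contained sketch of a trivial strategy. `FixedBatchStrategy` is hypothetical, and `CacheBase`, `TierId`, `PoolId` and `ClassId` are replaced by local stand-ins so that the snippet compiles without CacheLib. The virtual destructor is also an addition of this sketch and is not part of the committed header.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Local stand-ins for the CacheLib types used by the real interface.
using TierId = int;
using PoolId = int;
using ClassId = int;
struct CacheBase {};

// Mirrors the interface from BackgroundEvictorStrategy.h.
class BackgroundEvictorStrategy {
 public:
  virtual ~BackgroundEvictorStrategy() = default;  // added in this sketch
  virtual std::vector<std::size_t> calculateBatchSizes(
      const CacheBase& cache,
      std::vector<std::tuple<TierId, PoolId, ClassId>> acVec) = 0;
};

// Hypothetical strategy: request the same batch size for every class.
class FixedBatchStrategy final : public BackgroundEvictorStrategy {
 public:
  explicit FixedBatchStrategy(std::size_t batch) : batch_(batch) {}

  std::vector<std::size_t> calculateBatchSizes(
      const CacheBase& /*cache*/,
      std::vector<std::tuple<TierId, PoolId, ClassId>> acVec) override {
    // One entry per element of acVec, in the same order.
    return std::vector<std::size_t>(acVec.size(), batch_);
  }

 private:
  std::size_t batch_;
};
```

The one-entry-per-class contract matters because `checkAndRun()` pairs `batches[i]` with `assignedMemory[i]` by index. A batch of `0` makes the evictor skip that class.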
