
Admission policies + some extra stats for cachebench #10


Open · wants to merge 62 commits into base: develop

Conversation

byrnedj

@byrnedj byrnedj commented Aug 25, 2022

This is the first part of what was formerly the background data movement + tier admission PR


This change is Reviewable


@vinser52 vinser52 left a comment


Reviewed 1 of 24 files at r1.
Reviewable status: 1 of 24 files reviewed, 13 unresolved discussions (waiting on @byrnedj and @vinser52)


cachelib/allocator/Cache.h line 100 at r1 (raw file):

  //
  // @param poolId    The pool id to query
  // @param tierId    The tier of the pool id

Should we extend the comment for the method above (the one that takes only poolId) to say that it returns a reference to the top-most tier memory pool?


cachelib/allocator/CacheAllocator.h line 39 at r1 (raw file):

#include <folly/Range.h>
#pragma GCC diagnostic pop

I would keep the empty line to avoid merge conflicts in the future.


cachelib/allocator/CacheAllocator-inl.h line 376 at r1 (raw file):

                                             uint32_t creationTime,
                                             uint32_t expiryTime,
                                             bool fromEvictorThread) {

Where is the fromEvictorThread variable used?


cachelib/allocator/CacheAllocator-inl.h line 393 at r1 (raw file):

  void *memory = nullptr;

  if (tid == 0 && config_.acTopTierEvictionWatermark > 0.0

Since we are using some heuristics here, I think a comment is needed that describes the logic.

Why do we try findEviction first? Even if we have less free space than the threshold, I think it is better to try to allocate in this tier rather than incur an eviction on the hot path. Or maybe I do not understand the logic, so a comment is really desirable.


cachelib/allocator/CacheAllocator-inl.h line 476 at r1 (raw file):

  if (freePercentage <= config_.minAcAllocationWatermark)
    return defaultTargetTier + 1;

But it might have even less free space... I thought freePercentage is a parameter that the BG evictor monitors to decide whether it should evict something from this tier to reach the target freePercentage, isn't it?


cachelib/allocator/CacheAllocator-inl.h line 478 at r1 (raw file):

    return defaultTargetTier + 1;

  // TODO: we can even think about creating different allocation classes for PMEM

We should avoid PMEM term.


cachelib/allocator/CacheAllocator-inl.h line 500 at r1 (raw file):

  // TODO: only works for 2 tiers
  return (folly::Random::rand32() % 100) < config_.defaultTierChancePercentage ? defaultTargetTier : defaultTargetTier + 1;

Not sure about the folly implementation, but rand functions frequently have an internal lock, and calling one on a hot path might break concurrency.
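For illustration, a lock-free alternative could keep the generator thread-local so nothing is shared across threads (a hedged sketch; folly may already do this internally, and cheapRand100/pickTier are hypothetical names, not existing CacheLib API):

```cpp
#include <cstdint>
#include <random>

// Each thread keeps its own engine, so no lock is taken on the hot path.
inline uint32_t cheapRand100() {
  thread_local std::minstd_rand engine{std::random_device{}()};
  return static_cast<uint32_t>(engine() % 100);
}

// Tier choice as in the excerpt above: stay in the default tier with the
// configured probability, otherwise fall through to the next tier (two
// tiers assumed, as in the current code).
inline int pickTier(int defaultTargetTier, double defaultTierChancePercentage) {
  return cheapRand100() < defaultTierChancePercentage ? defaultTargetTier
                                                      : defaultTargetTier + 1;
}
```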


cachelib/allocator/CacheAllocator-inl.h line 1683 at r1 (raw file):

    TierId sourceTierId, TierId targetTierId, PoolId pid, Item& item)
{
  if (config_.disableEvictionToMemory)

It could be return !(config_.disableEvictionToMemory) to avoid a branch.
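The suggested rewrite, as a minimal free-function sketch (the real code is a member function reading config_; this just illustrates the branchless form):

```cpp
// The boolean expression is the return value itself, so no explicit
// if/else branch is written.
inline bool shouldEvictToNextMemoryTier(bool disableEvictionToMemory) {
  return !disableEvictionToMemory;
}
```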


cachelib/allocator/CacheAllocatorConfig.h line 594 at r1 (raw file):

  double acTopTierEvictionWatermark{0.0}; // TODO: make it per TIER?
  uint64_t sizeThresholdPolicy{0};   
  double defaultTierChancePercentage{50.0};

I think the new config options should be described in comments.


cachelib/allocator/MemoryTierCacheConfig.h line 72 at r1 (raw file):

  }

    // TODO: move it to MMContainer config

Looks like we need to address this TODO comment. Not sure why it should be per tier.


cachelib/allocator/memory/MemoryAllocator.h line 652 at r1 (raw file):

  }

 // TODO:

what is it?


cachelib/allocator/memory/MemoryAllocatorStats.h line 58 at r1 (raw file):

  }

  constexpr size_t getTotalMemory() const noexcept {

should it be renamed to getTotalAllocMemory?


cachelib/allocator/memory/MemoryPool.h line 311 at r1 (raw file):

  void setNumSlabsAdvised(uint64_t value) { curSlabsAdvised_ = value; }

 // TODO:

Again, we have broken encapsulation. I'm not sure we should merge such hacks into the develop branch.


@vinser52 vinser52 left a comment


Reviewable status: 1 of 24 files reviewed, 15 unresolved discussions (waiting on @byrnedj and @vinser52)


cachelib/allocator/tests/AllocatorMemoryTiersTest.cpp.orig line 1 at r1 (raw file):

/*

why are these files part of PR?


cachelib/allocator/tests/AllocatorMemoryTiersTest.h.orig line 1 at r1 (raw file):

/*

why are these files part of PR?

igchor and others added 28 commits September 7, 2022 07:58
Its implementation is mostly based on PosixShmSegment.

Also, extend ShmManager and ShmSegmentOpts to support this new
segment type.
After introducing file segment type, nameToKey_ does not provide
enough information to recover/remove segments on restart.

This commit fixes that by replacing nameToKey_ with nameToOpts_.

Previously, the Key from nameToKey_ map was only used in a single
DCHECK().
* New class MemoryTierCacheConfig allows configuring a memory tier.
  Setting the tier size and the location of a file for file-backed memory is
  supported in this initial implementation;
* New member, vector of memory tiers, is added to class CacheAllocatorConfig.
* New test suite, chelib/allocator/tests/MemoryTiersTest.cpp,
  demonstrates the usage of and tests extended config API.
to allow using new configureMemoryTiers() API with legacy behavior.

Move validation code for memory tiers to the validate() method and convert
ratios to sizes lazily (on get).
It wrongly assumed that the only possible segment type is the
PosixSysV segment.
Now its size is 8 bytes instead of 4.

The original CompressedPtr stored only an offset within a MemoryAllocator.
For the multi-tier implementation, this is not enough. We must also store
the tierId and, when uncompressing, select the proper allocator.

An alternative could be to just resign from CompressedPtr but they
are leveraged to allow the cache to be mapped to different addresses on shared memory.

Changing CompressedPtr impacted CacheItem size - it increased from 32 to 44 bytes.
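Purely for illustration, packing a tier id next to the offset could look like the sketch below (the field split, shift value, and names are assumptions, not the actual CacheLib encoding):

```cpp
#include <cstdint>

// Illustrative 8-byte compressed pointer: a small tier id packed into the
// top bits, allocator offset in the rest.
struct CompressedPtrSketch {
  uint64_t bits;
  static constexpr uint64_t kTierShift = 62;  // hypothetical split: 2 tier bits

  static CompressedPtrSketch compress(uint64_t offset, uint64_t tierId) {
    return {offset | (tierId << kTierShift)};
  }
  uint64_t tierId() const { return bits >> kTierShift; }
  uint64_t offset() const { return bits & ((uint64_t{1} << kTierShift) - 1); }
};
```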
Without this fix, removeCb was called even when an Item was moved between
tiers.
It fails because CentOS is EOL. We might want to consider
using CentOS Streams but for now, just remove it.

Right now, we rely on build-cachelib-centos workflow anyway.
Disabled test suite allocator-test-AllocatorTypeTest to skip sporadically failing tests.
…m#43)

Compensation results in ratios being different than originally specified.
Return the sum of the sizes of all tiers instead of just the 1st tier's size.
vinser52 and others added 8 commits September 12, 2022 13:16
Initial support of NUMA bindings
The issue was caused by incorrect behaviour of the
CacheAllocator<CacheTrait>::tryEvictToNextMemoryTier method in case the
evicted item is expired. We cannot simply return a handle to it, but we need
to remove it from the access container and MM container.
Fix ReaperSkippingSlabTraversalWhileSlabReleasing test
item movement test

updated class in test

use transparent sync for item movement

remove extra whitespace

updates per comments

updates per comments (still outstanding on the todos)
@byrnedj
Author

byrnedj commented Sep 20, 2022

I rebased the PR, the outstanding issues from the review to address are:

cachelib/allocator/MemoryTierCacheConfig.h line 72 at r1 (raw file):

}

// TODO: move it to MMContainer config

Looks like we need to address this TODO comment. Not sure why it should be per tier.

cachelib/allocator/memory/MemoryAllocator.h line 652 at r1 (raw file):

}

// TODO:
what is it?

cachelib/allocator/memory/MemoryAllocatorStats.h line 58 at r1 (raw file):

}

constexpr size_t getTotalMemory() const noexcept {
should it be renamed to getTotalAllocMemory?

cachelib/allocator/memory/MemoryPool.h line 311 at r1 (raw file):

void setNumSlabsAdvised(uint64_t value) { curSlabsAdvised_ = value; }

// TODO:
Again, we broken incapsulation. Not sure that we have to merge some hacks to develop branch.

Member

@igchor igchor left a comment


Reviewable status: 0 of 24 files reviewed, 16 unresolved discussions (waiting on @byrnedj, @igchor, and @vinser52)


cachelib/allocator/CacheAllocator-inl.h line 393 at r1 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Since we are using some heuristics here, I think a comment is needed that describes the logic.

Why do we try findEviction first? Even if we have less free space than the threshold, I think it is better to try to allocate in this tier rather than incur an eviction on the hot path. Or maybe I do not understand the logic, so a comment is really desirable.

Yes, by default we should allocate in this tier (hence acTopTierEvictionWatermark is set to 0 in config). However, this condition allows you to change this behavior if you want (which can allow us to test performance of that).


cachelib/allocator/CacheAllocator-inl.h line 476 at r1 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

But it might have even less free space... I thought freePercentage is a parameter that the BG evictor monitors to decide whether it should evict something from this tier to reach the target freePercentage, isn't it?

Yes, BG monitors it, but we can still use it for comparison here. What do you mean by "it might have even less free space"?


cachelib/allocator/CacheAllocator-inl.h line 2508 at r2 (raw file):

            : 0);
    for (TierId tid = 0; tid < getNumTiers(); tid++) {
      if constexpr (std::is_same_v<MMConfig, MMLru::Config> || std::is_same_v<MMConfig, MM2Q::Config>) {

This should probably be fixed (this if was just a temp workaround).
The best way would probably be to move those 2 params to a specific MMContainer config. Do we have support for setting MMContainer configs from JSON currently? And will it work for multi tier?


cachelib/allocator/memory/MemoryAllocator.h line 652 at r1 (raw file):

Previously, byrnedj (Daniel Byrne) wrote…

}

// TODO:
what is it?

private should be uncommented - if anything breaks, you need to add some extra function to the MemoryAllocator


cachelib/allocator/memory/MemoryPool.h line 311 at r1 (raw file):

Previously, byrnedj (Daniel Byrne) wrote…

void setNumSlabsAdvised(uint64_t value) { curSlabsAdvised_ = value; }

// TODO:
Again, we have broken encapsulation. I'm not sure we should merge such hacks into the develop branch.

Agree, private should be uncommented - if anything breaks, you need to add some extra function to the MemoryPool

@byrnedj
Author

byrnedj commented Sep 21, 2022

cachelib/allocator/CacheAllocator-inl.h line 2508 at r2 (raw file):

            : 0);
    for (TierId tid = 0; tid < getNumTiers(); tid++) {
      if constexpr (std::is_same_v<MMConfig, MMLru::Config> || std::is_same_v<MMConfig, MM2Q::Config>) {

This should probably be fixed (this if was just a temp workaround). The best way would probably be to move those 2 params to specific MMContainer config. Do we have support for setting MMCOntainer configs from JSON currently? And will it work for multi tier?

For this it is not trivial to add per tier mmContainer configs. Partly because in cachebench/Cache-inl.h, we have the following initialization:

    const size_t numBytes = cache_->getCacheMemoryStats().cacheSize;
    for (uint64_t i = 0; i < config_.numPools; ++i) {
      const double& ratio = config_.poolSizes[i];
      const size_t poolSize = static_cast<size_t>(numBytes * ratio);
      typename Allocator::MMConfig mmConfig =
          makeMMConfig<typename Allocator::MMConfig>(config_);
      const PoolId pid = cache_->addPool(
          folly::sformat("pool_{}", i), poolSize, {} /* allocSizes */, mmConfig,
          nullptr /* rebalanceStrategy */, nullptr /* resizeStrategy */,
          true /* ensureSufficientMem */);
      pools_.push_back(pid);
    }

The call to addPool initializes a pool that is split among tiers, so mmConfig is done per pool rather than per tier. For now I would suggest that we either:

  1. Accept the current implementation or
  2. Drop it from this PR and create a new PR that fully supports per-tier mmContainer creation (allowing us to have different allocators per tier too).

I have addressed the rest of the comments in the latest commit.


@vinser52 vinser52 left a comment


Reviewed 2 of 23 files at r2.
Reviewable status: 2 of 24 files reviewed, 16 unresolved discussions (waiting on @byrnedj, @igchor, and @vinser52)


cachelib/allocator/CacheAllocator.h line 1291 at r3 (raw file):

  CacheAllocator(InitMemType types, Config config);
  
  TierId getTargetTierForItem(PoolId pid, typename Item::Key key,

I think we need a comment, as for other methods in this class, to describe the logic of this method.


cachelib/allocator/CacheAllocator.h line 1686 at r3 (raw file):

  WriteHandle tryEvictToNextMemoryTier(Item& item);

  bool shouldEvictToNextMemoryTier(TierId sourceTierId,

The same as previous, the comment is needed.


cachelib/allocator/CacheAllocator-inl.h line 393 at r1 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

Yes, by default we should allocate in this tier (hence acTopTierEvictionWatermark is set to 0 in config). However, this condition allows you to change this behavior if you want (which can allow us to test performance of that).

Ah, got it. Then a comment would be nice to have to describe this logic.

BTW, do we have cases when this option improves performance somehow?


cachelib/allocator/CacheAllocator-inl.h line 476 at r1 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

Yes, BG monitors it, but we can still use it for comparison here. What do you mean by "it might have even less free space"?

As I understand it, if the default tier does not have enough space then we return the next tier. But what if the next tier has even less free space?
Another question: will such logic work for cases with more than two tiers? I understand that today we limit the number of tiers to 2. But what if the constant is changed?


cachelib/allocator/CacheAllocator-inl.h line 528 at r3 (raw file):

                                             uint32_t expiryTime) {
  auto tid = getTargetTierForItem(pid, key, size, creationTime, expiryTime);
  return allocateInternalTier(tid, pid, key, size, creationTime, expiryTime);

Is it safe to remove the loop? I can imagine a situation when getTargetTierForItem returns tier 0. After that, concurrent threads fill tier 0 and make it full. Then our thread will be unable to allocate in tier 0 and will return an empty handle. I understand it is a crazy situation, but what if?
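A hedged sketch of the fallback loop being discussed (allocateWithFallback and the callback are hypothetical stand-ins, not the actual CacheAllocator code): if the chosen tier fails because a concurrent thread filled it between tier selection and allocation, fall through to the next tier before returning empty.

```cpp
#include <cstddef>
#include <functional>

// Retry in each lower tier if the target tier allocation fails; return
// nullptr only when every tier from targetTier onward is full.
inline void* allocateWithFallback(
    const std::function<void*(int, size_t)>& allocInTier,
    int targetTier, int numTiers, size_t size) {
  for (int tid = targetTier; tid < numTiers; ++tid) {
    if (void* mem = allocInTier(tid, size)) {
      return mem;
    }
  }
  return nullptr;
}
```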


cachelib/allocator/CacheAllocator-inl.h line 2566 at r3 (raw file):

  // TODO - get rid of the duplication - right now, each tier
  // holds pool objects with mostly the same info
  return filterCompactCachePools(allocator_[currentTier()]->getPoolIds());

@igchor should it be really currentTier() instead of always 0? As I remember you added currentTier() to be used for interaction with BG workers. In this case set of pools is the same for each tier (we have duplication) and we can always work with tier 0, isn’t it?


cachelib/allocator/MemoryTierCacheConfig.h line 72 at r1 (raw file):

Previously, byrnedj (Daniel Byrne) wrote…

}

// TODO: move it to MMContainer config

Looks like we need to address this TODO comment. Not sure why it should be per tier.

Not sure that I understand, but we have MMContainers per tier, while the config for MMContainers can be the same for each tier. Is my assumption correct?

@byrnedj byrnedj mentioned this pull request Sep 21, 2022
Member

@igchor igchor left a comment


Reviewable status: 2 of 25 files reviewed, 16 unresolved discussions (waiting on @byrnedj, @igchor, and @vinser52)


cachelib/allocator/CacheAllocator-inl.h line 476 at r1 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

As I understand it, if the default tier does not have enough space then we return the next tier. But what if the next tier has even less free space?
Another question: will such logic work for cases with more than two tiers? I understand that today we limit the number of tiers to 2. But what if the constant is changed?

  1. Yes, it might happen that the next tier has even less space, but at least this option gives you a choice - right now we always allocate in the topmost tier (and with this patch the default setting is the same - i.e. maxAcAllocationWatermark is 0). In general, we can try to apply some more dynamic threshold, use a hill-climbing approach, etc.
  2. Right, currently this assumes there are only 2 tiers (some other config options do that as well). I think that if we would like to make this generic then we would have to introduce such allocationWatermarks per tier (or per each boundary between tiers).
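The per-tier watermark idea in point 2 could be sketched as follows (pickTargetTier and both vectors are hypothetical names, not existing CacheLib API; this only illustrates generalizing beyond two tiers):

```cpp
#include <vector>

// One allocation watermark per tier: walk down from the default tier while
// the current tier's free space is at or below its watermark; the last tier
// is always accepted.
inline int pickTargetTier(const std::vector<double>& freePercentage,
                          const std::vector<double>& minAllocWatermark,
                          int defaultTier) {
  int tid = defaultTier;
  const int lastTier = static_cast<int>(freePercentage.size()) - 1;
  while (tid < lastTier && freePercentage[tid] <= minAllocWatermark[tid]) {
    ++tid;
  }
  return tid;
}
```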

cachelib/allocator/CacheAllocator-inl.h line 2508 at r2 (raw file):

Previously, byrnedj (Daniel Byrne) wrote…

For this it is not trivial to add per tier mmContainer configs. Partly because in cachebench/Cache-inl.h, we have the following initialization:

    const size_t numBytes = cache_->getCacheMemoryStats().cacheSize;
    for (uint64_t i = 0; i < config_.numPools; ++i) {
      const double& ratio = config_.poolSizes[i];
      const size_t poolSize = static_cast<size_t>(numBytes * ratio);
      typename Allocator::MMConfig mmConfig =
          makeMMConfig<typename Allocator::MMConfig>(config_);
      const PoolId pid = cache_->addPool(
          folly::sformat("pool_{}", i), poolSize, {} /* allocSizes */, mmConfig,
          nullptr /* rebalanceStrategy */, nullptr /* resizeStrategy */,
          true /* ensureSufficientMem */);
      pools_.push_back(pid);
    }

The call to addPool initializes a pool that is split among tiers, so mmConfig is done per pool rather than per tier. For now I would suggest that we either:

  1. Accept the current implementation or
  2. Drop it from this PR and create a new PR that fully supports per-tier mmContainer creation (allowing us to have different allocators per tier too).

I have addressed the rest of the comments in the latest commit.

I think we could drop the change; it could be worked on separately, I believe.


cachelib/allocator/CacheAllocator-inl.h line 2566 at r3 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

@igchor should it be really currentTier() instead of always 0? As I remember you added currentTier() to be used for interaction with BG workers. In this case set of pools is the same for each tier (we have duplication) and we can always work with tier 0, isn’t it?

Right, in this particular place, 0 is OK.


cachelib/allocator/MemoryTierCacheConfig.h line 72 at r1 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Not sure that I understand, but we have MMContainers per tier, while the config for MMContainers can be the same for each tier. Is my assumption correct?

I believe we would prefer to have a config for each MMContainer (for each tier separately). This is because we only evict from DRAM but we evict AND promote from the second tier (which means we might want to customize the insertion point for the second tier, but not necessarily for DRAM).
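A minimal sketch of what per-tier MM configs could look like (all names here are hypothetical, not the actual MMLru/MM2Q config API): the second tier gets its own insertion point and promotion flag while DRAM keeps the defaults.

```cpp
#include <vector>

// Hypothetical per-tier MM config: each tier carries its own insertion
// point, so lower tiers can be tuned independently of DRAM.
struct MMConfigSketch {
  unsigned insertionPoint{0};  // e.g. 0 = head of the LRU list
  bool promoteOnAccess{false};
};

inline std::vector<MMConfigSketch> makePerTierConfigs(int numTiers) {
  std::vector<MMConfigSketch> cfgs(numTiers);
  for (int tid = 1; tid < numTiers; ++tid) {
    // lower tiers both evict and promote, so tune them separately
    cfgs[tid].promoteOnAccess = true;
    cfgs[tid].insertionPoint = 1;  // hypothetical mid-list insertion
  }
  return cfgs;
}
```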

Member

@igchor igchor left a comment


Reviewable status: 2 of 25 files reviewed, 16 unresolved discussions (waiting on @byrnedj, @igchor, and @vinser52)


cachelib/allocator/CacheAllocator-inl.h line 393 at r1 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Ah, got it. Then a comment would be nice to have to describe this logic.

BTW, do we have cases when this option improves performance somehow?

That's why I added this - to be able to benchmark it :)

byrnedj pushed a commit that referenced this pull request Jul 23, 2023
Run long tests (navy/bench) every day on CI
@byrnedj byrnedj force-pushed the develop branch 2 times, most recently from 09d7bab to ebfca17 Compare May 21, 2024 13:24
@byrnedj byrnedj force-pushed the develop branch 4 times, most recently from 9ba4e79 to c1ff397 Compare June 26, 2024 18:09