Skip to content

Background data movement #20

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Oct 21, 2022
Merged

Conversation

byrnedj
Copy link

@byrnedj byrnedj commented Sep 19, 2022

This is the second part of: #2


This change is Reviewable

@byrnedj byrnedj changed the base branch from main to develop September 19, 2022 21:31
@byrnedj byrnedj requested review from igchor and vinser52 September 19, 2022 21:32
Copy link
Member

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 24 files at r1.
Reviewable status: 5 of 24 files reviewed, 3 unresolved discussions (waiting on @byrnedj and @vinser52)


cachelib/allocator/BackgroundMover.h line 68 at r1 (raw file):

  ~BackgroundMover() override;
  

you could try to run clang-format on those new files - it should get rid of some of those unnecessary white chars


cachelib/allocator/BackgroundMover-inl.h line 87 at r1 (raw file):

    }

    totalBytesMoved.add(batch * mpStats.acStats.at(cid).allocSize);

shouldn;t we use moved instead of batch for this calculation?


cachelib/allocator/CacheAllocatorConfig.h line 629 at r1 (raw file):

  
  
  double minAcAllocationWatermark{0.0};

Rest of the options (sizeThresholdPolicy, etc.) will be in a separate PR, ritght? I think you might want to remove mention of sizeThresholdPolicy from the .md file for now.

Copy link
Member

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: perhaps we could just extend the test a little bit

Reviewable status: 5 of 24 files reviewed, 3 unresolved discussions (waiting on @byrnedj and @vinser52)


cachelib/allocator/tests/AllocatorMemoryTiersTest.h line 112 at r1 (raw file):

    auto perclassPstats = allocator->getBackgroundMoverClassStats(MoverDir::Promote);

    EXPECT_GT(1, stats.evictionStats.numMovedItems);

Can we add a more restrictive test? Since we know the values of watermarks we could test if specified percent of memory is free in the first tier after BG workers do their work.

Copy link

@vinser52 vinser52 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: 5 of 24 files reviewed, 7 unresolved discussions (waiting on @byrnedj, @igchor, and @vinser52)


MultiTierDataMovement.md line 15 at r1 (raw file):

## Synchronous Eviction and Promotion

- `disableEvictionToMemory`: Disables eviction to memory (item is always evicted to NVMe or removed

Do I understand correctly that tiering would not work in that case? Or we are going to use first-free policy for allocations?


MultiTierDataMovement.md line 26 at r1 (raw file):

- `backgroundEvictorIntervalMilSec`: The interval that this thread runs for - by default
the background evictor threads will wake up every 10 ms to scan the AllocationClasses. Also,
the background evictor thead will be woken up everytime there is a failed allocation (from

misprint: thead - > thread


MultiTierDataMovement.md line 33 at r1 (raw file):

- `evictorThreads`: The number of background evictors to run - each thread is a assigned
a set of AllocationClasses to scan and evict objects from. Currently, each thread gets
an equal number of classes to scan - but as object size distribution may be unequal - future

I am not sure that we should write about future. We can change this description when balancing is implemented.


MultiTierDataMovement.md line 62 at r1 (raw file):

## Background Promoters

The background promotes scan each class to see if there are objects to move to a lower

Misprint? promotes -> promoters

@byrnedj
Copy link
Author

byrnedj commented Sep 21, 2022

## Synchronous Eviction and Promotion

- `disableEvictionToMemory`: Disables eviction to memory (item is always evicted to NVMe or removed

Do I understand correctly that tiering would not work in that case? Or we are going to use first-free policy for allocations?

Yes. Tiering does not work if this option is on - we could implement first free or some other allocation policy if this is disabled - see #10. I removed documentation for admission policies from this version and will add it to #10.

MultiTierDataMovement.md line 33 at r1 (raw file):

- `evictorThreads`: The number of background evictors to run - each thread is a assigned
a set of AllocationClasses to scan and evict objects from. Currently, each thread gets
an equal number of classes to scan - but as object size distribution may be unequal - future

I am not sure that we should write about future. We can change this description when balancing is implemented.

The implementation is complete - I can add it to this PR or wait until this one has been merged to develop.

Can we add a more restrictive test? Since we know the values of watermarks we could test if specified percent of memory is free in the first tier after BG workers do their work.

Done. And reformatted code.

Copy link
Member

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 24 files at r1, 1 of 10 files at r2, 1 of 3 files at r3.
Reviewable status: 4 of 24 files reviewed, 5 unresolved discussions (waiting on @byrnedj, @igchor, and @vinser52)


cachelib/allocator/tests/AllocatorMemoryTiersTest.h line 100 at r3 (raw file):

    //wait for bg movers
    std::this_thread::sleep_for(std::chrono::seconds(1));

Is there a way to deterministically wait for BG threads completion? Sleeps like this tend to be hard to debug and cause random failures.

Maybe we could just spin until numMovedItems (or freePercent) is what we expect it to be, and then do all the checks?


cachelib/cachebench/util/CacheConfig.cpp line 100 at r3 (raw file):

  if (configJson.count("memoryTiers")) {
    for (auto& it : configJson["memoryTiers"]) {
      memoryTierConfigs.push_back(MemoryTierConfig(it).getMemoryTierCacheConfig());

where is this code now?

@byrnedj
Copy link
Author

byrnedj commented Oct 5, 2022

Reviewed 1 of 24 files at r1, 1 of 10 files at r2, 1 of 3 files at r3.
Reviewable status: 4 of 24 files reviewed, 5 unresolved discussions (waiting on @byrnedj, @igchor, and @vinser52)

cachelib/allocator/tests/AllocatorMemoryTiersTest.h line 100 at r3 (raw file):

    //wait for bg movers
    std::this_thread::sleep_for(std::chrono::seconds(1));

Is there a way to deterministically wait for BG threads completion? Sleeps like this tend to be hard to debug and cause random failures.

Maybe we could just spin until numMovedItems (or freePercent) is what we expect it to be, and then do all the checks?

Done.

cachelib/cachebench/util/CacheConfig.cpp line 100 at r3 (raw file):

  if (configJson.count("memoryTiers")) {
    for (auto& it : configJson["memoryTiers"]) {
      memoryTierConfigs.push_back(MemoryTierConfig(it).getMemoryTierCacheConfig());

where is this code now?

Fixed.

Copy link
Member

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 11 of 24 files at r1, 5 of 10 files at r2, 1 of 3 files at r3, 5 of 8 files at r4, all commit messages.
Reviewable status: 24 of 28 files reviewed, 6 unresolved discussions (waiting on @byrnedj, @igchor, and @vinser52)


cachelib/allocator/BackgroundMover.h line 20 at r4 (raw file):

#include <folly/concurrency/UnboundedQueue.h>
#include <gtest/gtest_prod.h>

why is this needed?


cachelib/allocator/MMTinyLFU-inl.h line 235 at r4 (raw file):

void
MMTinyLFU::Container<T, HookPtr>::withPromotionIterator(F&& fun) {
  throw std::runtime_error("Not supported");

Can you maybe open some issue to implement this in future?


cachelib/external/fbthrift line 1 at r4 (raw file):

Subproject commit 74f3a8fb00e3963d1e6f2361fce722688d500417

Can you revert updating the dependencies? It will cause conflicts later

@byrnedj byrnedj force-pushed the background-data-movement branch from 5ec282b to dc7f5de Compare October 6, 2022 12:03
@byrnedj
Copy link
Author

byrnedj commented Oct 6, 2022

Reviewed 11 of 24 files at r1, 5 of 10 files at r2, 1 of 3 files at r3, 5 of 8 files at r4, all commit messages.
Reviewable status: 24 of 28 files reviewed, 6 unresolved discussions (waiting on @byrnedj, @igchor, and @vinser52)

cachelib/allocator/BackgroundMover.h line 20 at r4 (raw file):

#include <folly/concurrency/UnboundedQueue.h>
#include <gtest/gtest_prod.h>

why is this needed?

It's not - it was leftover from development. I have removed it.

cachelib/allocator/MMTinyLFU-inl.h line 235 at r4 (raw file):

void
MMTinyLFU::Container<T, HookPtr>::withPromotionIterator(F&& fun) {
  throw std::runtime_error("Not supported");

Can you maybe open some issue to implement this in future?

Sure, done.

cachelib/external/fbthrift line 1 at r4 (raw file):

Subproject commit 74f3a8fb00e3963d1e6f2361fce722688d500417

Can you revert updating the dependencies? It will cause conflicts later

Yes, sorry about that. I have reverted them.

Copy link
Member

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: there is just one test failing on the CI (not sure if it's random or not)

Reviewable status: 22 of 28 files reviewed, 4 unresolved discussions (waiting on @byrnedj, @igchor, and @vinser52)

@byrnedj
Copy link
Author

byrnedj commented Oct 17, 2022

When I run the tests locally I do not see any failures.
test_results.txt

Copy link

@vinser52 vinser52 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 11 of 24 files at r1, 4 of 10 files at r2, 1 of 3 files at r3, 2 of 8 files at r4, 1 of 2 files at r5, 3 of 3 files at r6.
Reviewable status: 27 of 28 files reviewed, 2 unresolved discussions (waiting on @byrnedj and @igchor)


MultiTierDataMovement.md line 15 at r1 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Do I understand correctly that tiering would not work in that case? Or we are going to use first-free policy for allocations?

Should it be renamed to disableMovement. According to the current description, it looks like we are prohibiting item movement between tiers. As I remember in the legacy implementation this option does not allow throwing away any item implicitly by eviction from the cache. Now according to the comment, the item can be removed by eviction.


cachelib/allocator/BackgroundMover.h line 97 at r6 (raw file):

  AtomicCounter totalBytesMoved{0};

  std::vector<std::tuple<TierId, PoolId, ClassId>> assignedMemory_;

Since we are using std::tuple<TierId, PoolId, ClassId> definition in multiple places, should it be some alias for that, e.g. using memory_descriptor_type = std::tuple<TierId, PoolId, ClassId>;. In case we want to change this type in the future it is only a single place where it is defined.
Probably, it is even better to have a separate class/struct instead of tuple, but it is up to you.

Copy link
Author

@byrnedj byrnedj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 20 of 28 files reviewed, 2 unresolved discussions (waiting on @igchor and @vinser52)


MultiTierDataMovement.md line 15 at r1 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Should it be renamed to disableMovement. According to the current description, it looks like we are prohibiting item movement between tiers. As I remember in the legacy implementation this option does not allow throwing away any item implicitly by eviction from the cache. Now according to the comment, the item can be removed by eviction.

It appears that disableEviction will be deprecated. I have removed this from the documentation.


cachelib/allocator/BackgroundMover.h line 97 at r6 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Since we are using std::tuple<TierId, PoolId, ClassId> definition in multiple places, should it be some alias for that, e.g. using memory_descriptor_type = std::tuple<TierId, PoolId, ClassId>;. In case we want to change this type in the future it is only a single place where it is defined.
Probably, it is even better to have a separate class/struct instead of tuple, but it is up to you.

Done.

Copy link
Member

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 8 of 8 files at r7.
Reviewable status: all files reviewed (commit messages unreviewed), 2 unresolved discussions (waiting on @vinser52)

@byrnedj byrnedj merged commit 4f8a592 into intel:develop Oct 21, 2022
byrnedj added a commit that referenced this pull request Nov 16, 2022
Background data movement using periodic workers. Attempts to evict/promote items per given thresholds for each class. These reduce p99 latency since there is a higher chance that an allocation slot is free in the tier we are allocating in.
byrnedj added a commit that referenced this pull request Mar 3, 2023
Background data movement using periodic workers. Attempts to evict/promote items per given thresholds for each class. These reduce p99 latency since there is a higher chance that an allocation slot is free in the tier we are allocating in.
vinser52 pushed a commit that referenced this pull request Jul 17, 2023
Background data movement using periodic workers. Attempts to evict/promote items per given thresholds for each class. These reduce p99 latency since there is a higher chance that an allocation slot is free in the tier we are allocating in.
byrnedj pushed a commit that referenced this pull request Jul 23, 2023
Use docker with prebuild dependencies
byrnedj added a commit that referenced this pull request Jul 23, 2023
Background data movement using periodic workers. Attempts to evict/promote items per given thresholds for each class. These reduce p99 latency since there is a higher chance that an allocation slot is free in the tier we are allocating in.

fix race in promotion where releaseBackToAllocator
was being called before wakeUpWaiters.

reinsert to mm container on failed promotion
vinser52 pushed a commit that referenced this pull request Feb 29, 2024
Background data movement using periodic workers. Attempts to evict/promote items per given thresholds for each class. These reduce p99 latency since there is a higher chance that an allocation slot is free in the tier we are allocating in.

fix race in promotion where releaseBackToAllocator
was being called before wakeUpWaiters.

reinsert to mm container on failed promotion
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants