Binary KVReplayGenerator #101

Open
wants to merge 1 commit into
base: main
@byrnedj byrnedj commented Feb 6, 2024

  • binary request generation and replay
  • fast forwarding of a trace
  • preloading requests into memory

: ReplayGeneratorBase(config), binaryStream_(config) {
for (uint32_t i = 0; i < numShards_; ++i) {
stressorCtxs_.emplace_back(std::make_unique<StressorCtx>(i));
std::string_view s{"abc"};

Member:

what's this for?

Author:

removed

outputStreamKeys_ += nreqs * binReqSize;
folly::setThreadName("cb_binary_gen");
traceStream_.fastForwardTrace(fastForwardCount_);
genRequests(latch);

Member:

why do you need to pass the latch here?

Author:

In generating the binary file I don't need the latch, but in the other path (reading the CSV file normally) we use the latch to wait for preloading some requests. So it's not used in binary trace generation but I still pass it because it's a parameter to genRequests.

Is there a better way to handle passing the latch in this case? genRequests does require it (it counts down once we have hit a certain number of requests).
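
One possible answer to the question above, as a minimal sketch rather than the PR's actual code: make the latch an optional argument so the binary-generation path can omit it. `CountdownLatch`, the `preloadCount` parameter, and this simplified `genRequests` signature are all hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the latch used while preloading requests.
struct CountdownLatch {
  int remaining;
  explicit CountdownLatch(int n) : remaining(n) {}
  void countDown() {
    if (remaining > 0) {
      --remaining;
    }
  }
  bool done() const { return remaining == 0; }
};

// Hypothetical, simplified genRequests. The latch is optional: the replay
// path passes one and is released after preloadCount requests, while the
// binary-generation path passes nothing.
std::size_t genRequests(std::size_t total, std::size_t preloadCount,
                        CountdownLatch* latch = nullptr) {
  std::size_t produced = 0;
  for (std::size_t i = 0; i < total; ++i) {
    ++produced;  // stands in for parsing and emitting one request
    if (latch != nullptr && produced == preloadCount) {
      latch->countDown();
    }
  }
  return produced;
}
```

With this shape, the binary-generation caller would simply call `genRequests(total, preloadCount)` and never construct a latch.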

req->req_.ttlSecs, keyOffset);
std::memcpy(outputStreamReqs_ + nreqs, &binReq, sizeof(binReq));
std::memcpy(outputStreamKeys_ + keyOffset, req->key_.c_str(), keySize);
if ((nreqs % BIN_REQ_INT) == 0 && nreqs > 0) {

Member:

can you add a comment for this part of the code?

Author:

done

XDCHECK_LT(req->op_, 12);
auto key = req->getKey(binaryStream_.getKeyOffset());
OpType op;
switch (req->op_) {

Member:

You can just cast req->op_ (a uint8_t) to OpType here and avoid the switch.

Author:

done
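
A minimal sketch of the cast suggested above. The enumerators and their order here are hypothetical. The cast is only valid if the values stored in the binary trace match the enum's underlying values, so the range check stays in place.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum; the real OpType has its own enumerators and ordering.
enum class OpType : std::uint8_t { kGet = 0, kSet = 1, kDel = 2, kCount = 3 };

// Converts a raw op code read from a binary record into OpType.
// The assert plays the role of the range check that precedes the switch.
inline OpType toOpType(std::uint8_t raw) {
  assert(raw < static_cast<std::uint8_t>(OpType::kCount));
  return static_cast<OpType>(raw);
}
```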

setEOF();
}

const Request& BinaryKVReplayGenerator::getReq(uint8_t,

Member:

This function (and the entire file) looks pretty similar to KVReplayGenerator.h. Can't we unify them? Perhaps make this (and other) functions templates (where either BinaryRequest* or std::unique_ptr is the arg)?

Author:

It's actually pretty hard because we don't have the same underlying request queue logic in the binary trace version.

ttl_(ttl),
keyOffset_(keyOffset) {}

std::string_view getKey(char* baseAddr) const {

If you store the offset relative to the this pointer, then you do not need to pass baseAddr as a parameter. The method would look something like this:

std::string_view getKey() const {
    return std::string_view(
        reinterpret_cast<const char*>(this) + keyOffset_, keySize_);
}

I just figured out that you store keys in a separate file, correct? If so, my idea is not feasible. It is only possible if BinaryRequest objects and their corresponding keys are stored in the same file.

Author:

done - thanks for the note!
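
To illustrate the self-relative offset idea from this thread, here is a minimal sketch that assumes each record and its key live in the same buffer. The two-field `BinaryRequest` and the `packOne` helper are hypothetical and much simpler than the struct in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>
#include <vector>

// Hypothetical record; field names follow the snippet in the comment above.
struct BinaryRequest {
  std::uint32_t keySize_;
  std::uint32_t keyOffset_;  // bytes from the start of this record to its key

  std::string_view getKey() const {
    // Cast first, then add: the offset is counted in bytes, not in records.
    return std::string_view(
        reinterpret_cast<const char*>(this) + keyOffset_, keySize_);
  }
};

// Lays out one record immediately followed by its key in a single buffer.
std::vector<char> packOne(std::string_view key) {
  std::vector<char> buf(sizeof(BinaryRequest) + key.size());
  BinaryRequest req{static_cast<std::uint32_t>(key.size()),
                    static_cast<std::uint32_t>(sizeof(BinaryRequest))};
  std::memcpy(buf.data(), &req, sizeof(req));
  std::memcpy(buf.data() + sizeof(req), key.data(), key.size());
  return buf;
}
```

Because the offset is relative to the record itself, the buffer can be mapped at any address without fixing up pointers. The scheme breaks down as soon as keys are stored in a separate file or mapping.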

@byrnedj byrnedj changed the base branch from develop to main April 22, 2024 16:00
------------------------
This offers much lower trace-replay overhead. It
assumes the kvcache trace format and kvcache behavior.
This patch supports the following:
- binary request generation and replay
- fast forwarding of a trace
- preloading requests into memory
- object size amplification
- queue-free design for even lower request overhead
- can parse many more requests per second than cachelib
  can process, so we can get 100% CPU usage

The limitations are:
- no trace amplification (however you can amplify the
  original .csv trace and save it in binary format)
- ~4GB overhead per 100 million requests
- you need some disk space to store large traces
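
As a rough sketch of two points in this commit message: the per-request cost implied by the stated overhead, and why fast-forwarding is cheap once a trace is stored as fixed-size records. The `Record` layout is hypothetical, and reading "~4GB" as 4 x 10^9 bytes is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size record; the real BinaryRequest layout may differ.
struct Record {
  std::uint64_t id;
  std::uint32_t keySize;
  std::uint32_t valueSize;
};

// Average bytes per request implied by a total memory footprint.
constexpr std::uint64_t bytesPerRequest(std::uint64_t totalBytes,
                                        std::uint64_t numRequests) {
  return totalBytes / numRequests;
}

// With fixed-size records, fast-forwarding is index arithmetic: skip `count`
// records without parsing them. Returns nullptr when skipping past the end.
inline const Record* fastForward(const Record* base, std::size_t numRecords,
                                 std::size_t count) {
  return count < numRecords ? base + count : nullptr;
}
```

Under that reading, `bytesPerRequest(4000000000ULL, 100000000ULL)` works out to 40 bytes per request, covering the record and its key.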
@byrnedj byrnedj force-pushed the binary_request branch 2 times, most recently from cf0a7c1 to 9d49e46 Compare April 22, 2024 20:27
3 participants