feat: refactor ceramic simulations and new benchmark for V' features #149

dav1do · 2024-02-14T04:57:10Z

Linear ws1-1438

Summary

The main goal was to add a new simulation to benchmark Model Instance Document creation for the E2E ceramic V' (recon) system. I ended up making a bunch of changes along the way and can potentially split this into multiple PRs if it's too hard to review.

I created a new scenario called ceramic-new-streams-benchmark that creates a model, all nodes index the model, and then 1 node creates new MIDs. We investigate how fast these sync to the other nodes in order to determine if we succeed. As there are no metrics for this available afaik, and the workers don't currently have access to prometheus, I ended up using redis to store before/after data on the workers so the manager can interrogate the results at the end. It respects the throttleRequests and successRequestTarget values.

Details

I had to expose the ceramic node admin DID's private key to the simulation in order to interact with the endpoints (i.e. index models). I added a did_from_pk function for this (similar to composedb), but it should probably be moved to rust-ceramic.
I corrected a handful of bugs: indexing was not succeeding, and some of the simulations did not seem to do exactly what they were designed to (e.g. running something per node instead of per simulation). I adjusted the global/node/user goose leader functions to support the various control flows, and corrected the requests to log their statuses to see a better summary of the simulation and request breakdown. This does change how the output looks, for example the ceramic simple scenario now shows more request info, even though the behavior should be unchanged.
There is now a simulation configuration object that allows setting up the model instance specific details. This includes things like how signing should be done (user specific DID key, DID per node), whether models should be shared across nodes or not. The setup makes sure this happens appropriately, and stores the goose user data/config so that the simulations can proceed without really caring what they're using (they just use the ceramic http client and the models/MIDs in the goose data struct).
- I refactored the Ceramic simulations to use this config. They are almost all the same with a slight variation and I wanted to take this further, but already got too into the weeds. We could almost have one file to define all the operations (get/update/etc) and parameterized config to allow changing the behavior, which would allow for more interesting simulations without requiring code changes. This would entail exposing some of the config options to the simulation in order to get passed down. Anyway, for now, having a bunch of simulation names is fine.
- The reason for supporting user DIDs is that the js-ceramic worker threads changes parallelize on the DID, so we wanted a more realistic scenario where not every user in the system was the same.
I noticed that the local-scripts didn't work right and fixed them (they relied on local config I had).
The new postgres database ignored the devMode flag when setting its resources, now it doesn't.

Results

Results from a 1 minute run on my MBP just to demonstrate that it works:

Worker 1 did not meet the target request rate: 228.16666666666666 < 300 (total requests: 13690 over 60)
Worker 2 did not meet the target request rate: 160.91666666666666 < 300 (total requests: 9655 over 60)

Follow up

When generating random data for models, I ran into an error like the following that crashed js-ceramic. I'm not yet sure what/where the bug is (that is, it might be postgres config/not setting the encoding to utf8), it might be js-ceramic, or something else. I switched to strings as a result but need to follow up.

[2024-02-18T17:27:14.774Z] IMPORTANT: Ceramic API running on 0.0.0.0:7007'
/js-ceramic/node_modules/rxjs/dist/cjs/internal/util/reportUnhandledError.js:13
            throw err;
            ^

error: insert into "kjzl6hvfrbw6ca1eadhvu4l06ssws62ks7gmgzc8285p2xmqaaiw0l57paapfnp" ("controller_did", "created_at", "first_anchored_at", "last_anchored_at", "stream_content", "stream_id", "tip", "updated_at") values ($1, $2, $3, $4, $5, $6, $7, $8) on conflict ("stream_id") do update set "stream_id" = $9,"controller_did" = $10,"stream_content" = $11,"tip" = $12,"last_anchored_at" = $13,"first_anchored_at" = $14,"updated_at" = $15 - unsupported Unicode escape sequence
    at Parser.parseErrorMessage (/js-ceramic/node_modules/pg-protocol/dist/parser.js:287:98)
    at Parser.handlePacket (/js-ceramic/node_modules/pg-protocol/dist/parser.js:126:29)
    at Parser.parse (/js-ceramic/node_modules/pg-protocol/dist/parser.js:39:38)
    at Socket.<anonymous> (/js-ceramic/node_modules/pg-protocol/dist/index.js:11:42)
    at Socket.emit (node:events:518:28)
    at Socket.emit (node:domain:488:12)
    at addChunk (node:internal/streams/readable:559:12)
    at readableAddChunkPushByteMode (node:internal/streams/readable:510:3)
    at Readable.push (node:internal/streams/readable:390:5)
    at TCP.onStreamRead (node:internal/stream_base_commons:190:23) {
  length: 250,
  severity: 'ERROR',
  code: '22P05',
  detail: '\\u0000 cannot be converted to text.',
  hint: undefined,
  position: undefined,
  internalPosition: undefined,
  internalQuery: undefined,
  where: 'JSON data, line 1: ...葎󲼯􊋁򃻐򖕟𭅨򩗒𺽣񎾏򻔻񟢽\\u0000...\n' +
    "unnamed portal parameter $5 = '...'",
  schema: undefined,
  table: undefined,
  column: undefined,
  dataType: undefined,
  constraint: undefined,
  file: 'jsonfuncs.c',
  line: '621',
  routine: 'json_ereport_error'
}

linear · 2024-02-14T04:57:13Z

WS1-1438 Create performance tests for model instance documents

Most of the scenarios are very simlilar. Add a more config based approach, allowing changing whether models and keys are shared or unique. Added naming of requests to get more granular goose metrics.

Optionally share model instance docs across simulation

…n API

Includes some refactoring (sorry) and collecting metrics for success. This scenario doesn't have a prom metric as easily accessible afaik, so I used redis to have the worker capture the values before/after.

runner/src/scenario/ceramic/model_instance.rs

runner/src/scenario/ceramic/mod.rs

nathanielc

Looks great, thanks for taking on this refactor along with the end goal.

As there are no metrics for this available afaik, and the workers don't currently have access to prometheus, I ended up using redis to store before/after data on the workers so the manager can interrogate the results at the end.

This seems fine, using redis makes it quick and easy. My only question is should we invest the time now to add the metrics to the js-ceramic process and expose it to the goose workers? Is that the best long term strategy? Not that we will block this PR on that work but rather should we capture this need as issue and start on it right away?

runner/src/scenario/ceramic/mod.rs

runner/src/simulate.rs

dav1do · 2024-02-20T21:23:22Z

This seems fine, using redis makes it quick and easy. My only question is should we invest the time now to add the metrics to the js-ceramic process and expose it to the goose workers? Is that the best long term strategy? Not that we will block this PR on that work but rather should we capture this need as issue and start on it right away?

I think we should definitely have some metrics in js-ceramic that make it easy for someone to tell what's going on with their node but I'm not sure what we currently have. I'm not sure how urgent/important exposing them to the simulations is but it seems like it could be quite useful, especially as we use keramik and the simulations to identify and investigate issues, and add more correctness validation to them.

Right now, correctness checks require doing scenario specific things, but with the shared setup, we could also have some "shared correctness" checks at the end if metrics are exposed. While some validations are going to be scenario specific, having some sanity validations and/or re-publishing specific metrics to the goose output, could simplify review and understanding things. I think having a dashboard of "key metrics" for simulations would be incredibly helpful toward a holistic understanding of the network health and the simulation status/success (especially as we try to include or exclude CAS and other services in the benchmarks). Some of this might be possible already and just require providing an integrated dashboard/steps/etc in order to make it happen.

sometimes hit errors like `failed to get model count: TransactionError: request canceled because throttled load test ended (sending on a closed channel)` so we use our own reqwest client instead of the goose supplied one on the user. this only happened if the `throttleRequests` value is set

dav1do force-pushed the feat/ws1-1438-mid-simulation branch from 2345de7 to 61544b6 Compare February 16, 2024 19:53

dav1do and others added 7 commits February 16, 2024 15:00

feat: initial DID user key signing prep

fbe4417

feat: substantial refactor scenario creation

078aabe

Most of the scenarios are very simlilar. Add a more config based approach, allowing changing whether models and keys are shared or unique. Added naming of requests to get more granular goose metrics.

fix: move recon_sync scenario out of ceramic mod

af329cc

feat: replace model reuse scenario with config

7ab5f74

Optionally share model instance docs across simulation

fix: adjust local-scripts to work on more than 1 pc

6db7d16

feat: support devMode with composedb postgres resources

b95799c

feat: add ceramic admin private key to sim worker/manager to use admi…

abe2deb

…n API

dav1do force-pushed the feat/ws1-1438-mid-simulation branch 2 times, most recently from 3d4c112 to 4c065a3 Compare February 19, 2024 23:41

feat: add working MID benchmark scenario

a6976f6

Includes some refactoring (sorry) and collecting metrics for success. This scenario doesn't have a prom metric as easily accessible afaik, so I used redis to have the worker capture the values before/after.

dav1do force-pushed the feat/ws1-1438-mid-simulation branch from 4c065a3 to a6976f6 Compare February 20, 2024 16:53

dav1do marked this pull request as ready for review February 20, 2024 17:29

dav1do requested review from dbcfd, AaronGoldman, nathanielc, 3benbox and smrz2001 February 20, 2024 17:29

dav1do commented Feb 20, 2024

View reviewed changes

runner/src/scenario/ceramic/model_instance.rs Show resolved Hide resolved

runner/src/scenario/ceramic/mod.rs Show resolved Hide resolved

nathanielc approved these changes Feb 20, 2024

View reviewed changes

runner/src/scenario/ceramic/mod.rs Show resolved Hide resolved

runner/src/simulate.rs Outdated Show resolved Hide resolved

chore: clean up comments

9452188

dav1do added this pull request to the merge queue Feb 21, 2024

Merged via the queue into main with commit ff53def Feb 21, 2024
5 checks passed

dav1do deleted the feat/ws1-1438-mid-simulation branch February 21, 2024 19:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: refactor ceramic simulations and new benchmark for V' features #149

feat: refactor ceramic simulations and new benchmark for V' features #149

dav1do commented Feb 14, 2024 •

edited

Loading

linear bot commented Feb 14, 2024

nathanielc left a comment

dav1do commented Feb 20, 2024

feat: refactor ceramic simulations and new benchmark for V' features #149

feat: refactor ceramic simulations and new benchmark for V' features #149

Conversation

dav1do commented Feb 14, 2024 • edited Loading

Summary

Details

Results

Follow up

linear bot commented Feb 14, 2024

nathanielc left a comment

Choose a reason for hiding this comment

dav1do commented Feb 20, 2024

dav1do commented Feb 14, 2024 •

edited

Loading