feat: cas scaling #1189
base: develop
Conversation
Can we split
I didn't think we'd want to apply the load of either publishing or storing commits to IPFS. I feel like that's just asking for (now gigantic) failed batches due to IPFS issues. Do we really care to spend time figuring out how IPFS does under either type of load? IMO, cutting IPFS out of the picture will be essential for scaling CAS.
I definitely agree that the goal is to stop publishing to IPFS entirely. I'm just saying let's wind down our use of IPFS in two phases. First, let's stop publishing notifications of the anchor commits to pubsub, while continuing to create them in IPFS so that they are available to bitswap. Then, if we see no problems with that, we can stop creating the anchor commits against the CAS IPFS entirely.
I'll add the second flag, though I'm quite certain we'll be turning it off right about the time we start benchmarking CAS with 1M-sized batches 😉 I'd also posit a different way of looking at this. Our decision to keep IPFS a part of the CAS architecture shouldn't be based on whether or not IPFS has issues during testing, but on whether it's more or less likely to have issues with larger batches. I'm certain that it is much more likely to run into issues with larger batches that we'll then have to mitigate (e.g. writing intermediate Merkle-tree nodes might be too much too fast and need additional pacing, etc.). In other words, even if it sails through all of our benchmarking, I would still recommend turning IPFS off completely for production. Even if it works now under high load (I'm skeptical), that doesn't mean it will remain stable in production under the same type of load.
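For illustration, pacing the writes of intermediate Merkle-tree nodes could look roughly like the sketch below: write the nodes in bounded chunks with a short pause between chunks. The function name, chunk size, and delay are all hypothetical and not part of this PR.

```typescript
// Hypothetical sketch of pacing IPFS writes for intermediate Merkle-tree nodes.
async function writeNodesPaced(
  nodes: Uint8Array[],
  putNode: (node: Uint8Array) => Promise<void>,
  chunkSize = 50,
  delayMs = 100
): Promise<void> {
  for (let i = 0; i < nodes.length; i += chunkSize) {
    const chunk = nodes.slice(i, i + chunkSize)
    // Bounded parallelism: at most `chunkSize` writes in flight at once.
    await Promise.all(chunk.map(putNode))
    if (i + chunkSize < nodes.length) {
      // Short pause between chunks so a large batch doesn't hammer IPFS all at once.
      await new Promise((resolve) => setTimeout(resolve, delayMs))
    }
  }
}
```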
Sorry, when I talked about making decisions based on whether or not there are issues, I wasn't saying that we shouldn't remove IPFS unless it presents performance issues. I want to remove IPFS ASAP. I was saying that I'm concerned that once we remove it, we'll see issues in the field where users aren't getting anchor commits. I'm concerned about any lingering bugs in the anchor polling system that delivers anchor commits via CAR files, because up until now we've always had pubsub and bitswap as a backup, so if that isn't totally working we might never have noticed. I'm also worried about users in the wild running old versions of js-ceramic that don't even have the new polling and CAR file features, and who might not upgrade or do anything until their writes suddenly start failing with CACAO errors due to missing anchor commits once we turn off publishing of anchors to IPFS. That is the reason I want to do this in two phases: so that we have time for users in the wild to report issues to us if they stop getting anchor commits, and for us to get a clearer picture of where things are breaking down if that happens.
Yeah, that makes sense, thanks for clarifying. If we have to choose, we can also always set up separate (dedicated) CAS clusters for V' partners that don't use IPFS at all, and leave the current CAS with IPFS.
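To make the two-phase plan concrete, the wind-down could be gated by two independent flags, roughly as sketched below. CAS_USE_IPFS_STORAGE appears in this PR's description; the pubsub flag name, the defaults, and the surrounding function are assumptions for illustration only.

```typescript
// Sketch of the two-phase wind-down, assuming env-var flags.
interface IpfsLike {
  dag: { put(node: unknown): Promise<unknown> }
  pubsub: { publish(topic: string, data: Uint8Array): Promise<void> }
}

// Flag from this PR; the default shown here is illustrative.
const USE_IPFS_STORAGE = process.env.CAS_USE_IPFS_STORAGE !== 'false'
// Hypothetical second flag for phase 1 (stop announcing anchors on pubsub).
const PUBLISH_TO_PUBSUB = process.env.CAS_PUBSUB_PUBLISH !== 'false'

async function storeAnchorCommit(
  ipfs: IpfsLike,
  topic: string,
  commit: unknown,
  bytes: Uint8Array
): Promise<void> {
  if (!USE_IPFS_STORAGE) return // phase 2: IPFS fully out of the anchoring path
  await ipfs.dag.put(commit) // phase 1: commit still retrievable over bitswap
  if (PUBLISH_TO_PUBSUB) {
    await ipfs.pubsub.publish(topic, bytes) // this is what phase 1 turns off
  }
  // CAR-file delivery via the anchor polling API is unaffected by either flag.
}
```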
…ng-or-another-approach' into feature/ws1-1562-prototype-for-cas-scaling
@@ -55,6 +55,7 @@ export class AnchorRepository implements IAnchorRepository {
  async findByRequestId(requestId: string): Promise<StoredAnchor | null> {
    const row = await this.table.where({ requestId: requestId }).first()
    if (!row) return null
    console.log("IM HERE with ", row, requestId)
Should we take this log out, @JulissaDantes? Might flood the logs at high throughput.
You are completely right. Thank you for pointing this out!
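If some visibility at this spot is still useful, a level-gated debug log would avoid flooding production output at high throughput. A minimal sketch, assuming a LOG_LEVEL environment variable rather than the service's actual logger:

```typescript
// Minimal sketch of a level-gated debug log; LOG_LEVEL is an assumed env var.
const DEBUG_ENABLED = process.env.LOG_LEVEL === 'debug'

function debugLog(message: string): void {
  // No-op unless debug logging is explicitly enabled, so hot paths like
  // findByRequestId stay quiet under high request throughput.
  if (DEBUG_ENABLED) console.debug(message)
}

debugLog('no stored anchor found for request <requestId>') // example call
```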
This PR makes two main changes (one of them gated behind a new CAS_USE_IPFS_STORAGE flag). These changes are backwards compatible. Once ready, they should be merged before the changes from this PR for updating the Scheduler to generate/store larger batches.