Conversation

@alex-zywicki

This PR contains:

  • A test to replicate #7444
  • A fix for #7444 (TODO)

Describe the problem you have without this PR

Replicating a deleted doc from CouchDB with schema validation fails when your schema has required fields.
This happens because CouchDB does not send the full document over the changes feed on delete; it sends only the _id, _rev, and _deleted fields.
#7444
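
For illustration, a delete arriving over the changes feed carries only a tombstone, roughly shaped like this (values are made up):

// Rough shape of a deleted document on the CouchDB changes feed
const tombstone = {
    _id: 'some-doc-id',
    _rev: '2-7051cbe5c8faecd085a3fa619e6e6337',
    _deleted: true
    // none of the schema's required fields are present,
    // so required-field validation fails
};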

Todos

  • Tests

@alex-zywicki
Author

alex-zywicki commented Oct 9, 2025

I have done some experimentation with potential fixes:

// src/plugins/replication-couchdb/couchdb-helper.ts
export async function couchDBDocToRxDocData<RxDocType>(
    collection: RxCollection<RxDocType, unknown, unknown, unknown>,
    couchDocData: any
): Promise<WithDeleted<RxDocType>> {
    const primaryKey = collection.schema.primaryPath;
    let doc = couchSwapIdToPrimary(primaryKey as any, couchDocData);
    // ensure deleted flag is set.
    doc._deleted = !!doc._deleted;

    delete doc._rev;
    if (doc._deleted !== true) {
        return doc;
    }

    // look up the locally stored copy so the tombstone can be
    // filled back up to a schema-valid shape
    const found = await collection.findOne(doc[primaryKey]).exec();
    if (found === null) {
        return doc;
    }
    // no need to worry about a deep merge since doc only contains _id, _rev, _deleted in this case
    return Object.assign(doc, found.toMutableJSON());
}

I tried this, but it looks like there is an async issue: the collection appears to be empty at the point where I observe the deletion (found comes back null).

I don't know RxDB's internals well enough to say why that is, but my guess is that in the test case I am developing against, the document create and delete happen in such quick succession that the write has not fully reached storage before the delete comes in.
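
Roughly, this is the sequence I mean (a sketch against the plain CouchDB HTTP API; the URL and doc id are placeholders):

// create a doc on the server and delete it right away
const couchUrl = 'http://localhost:5984/mydb'; // placeholder
const putRes = await fetch(`${couchUrl}/mydoc`, {
    method: 'PUT',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ name: 'a' })
});
const { rev } = await putRes.json();
await fetch(`${couchUrl}/mydoc?rev=${rev}`, { method: 'DELETE' });
// the changes feed now emits the insert and the tombstone back to back;
// by the time the pull handler processes the tombstone, the insert may
// not yet be persisted locally, so collection.findOne() comes back null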

Assuming that is the case, I'm not sure this approach can work without either going back to couch for the previous revision (slow and unreliable) or keeping some sort of cache in the replication plugin itself (wasted space, duplicated data).

The only other approach I can think of would be to tweak the validation plugins themselves to consider deleted documents valid, or to parameterize that as an option. I tried this and it does work, but I suspect doing it by default would have other repercussions on the system.
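
The shape of that idea, as a sketch (not the actual plugin code; validate stands in for whichever validator is wired up):

// sketch: treat deleted tombstones as always valid
function validateAllowingTombstones<T>(
    validate: (doc: T) => boolean,
    doc: T & { _deleted?: boolean }
): boolean {
    // tombstones only carry _id/_rev/_deleted, so required-field
    // checks would always fail on them; short-circuit instead
    if (doc._deleted === true) {
        return true;
    }
    return validate(doc);
}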

@pubkey what are your thoughts?

@pubkey
Owner

pubkey commented Oct 20, 2025

Hi @alex-zywicki
Sorry for the late reply.

Tweaking the validation is not an option for me. We should "fill up" the deleted document data so that it matches the normal schema. This fill-up-data can be retrieved either from the local store or by making an additional couchdb-request.
Both of these options are ok for me.

@alex-zywicki
Author

> Hi @alex-zywicki Sorry for the late reply.
>
> Tweaking the validation is not an option for me. We should "fill up" the deleted document data so that it matches the normal schema. This fill-up-data can be retrieved either from the local store or by making an additional couchdb-request. Both of these options are ok for me.

What would the "local store" be in this case? I assume you are referring to the underlying storage?

Going back to couch for the data is not a truly viable option, as couch is not required to keep previous revisions of documents at all times; things like compaction can wipe out the revision you need.

I tried going to the collection to get the data, but I hit a case where, if you do a create followed by a delete in quick succession, the initial create has not reached storage before the delete is processed, resulting in a failure to find the doc in the collection. The only way I can think of to handle that would be for the couch replication to hold its own cache in memory.

That in-memory cache could be a simple JS Map, or it could even take the form of a "shadow db" using the in-memory storage option. The primary issue is that both of these options would functionally double the required storage.

I suppose it would be possible to write a cache that evicts items once it knows they have already made it to storage, but I'm not sure what the correct strategy for maintaining that over time would be.
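
Something along these lines, as a rough sketch (all names hypothetical):

// remember the last full body seen per primary key so a later tombstone
// can be filled back up to a schema-valid shape
class LastSeenCache<RxDocType> {
    private readonly docs = new Map<string, RxDocType>();

    remember(id: string, doc: RxDocType): void {
        this.docs.set(id, doc);
    }

    fillUpTombstone(
        id: string,
        tombstone: Partial<RxDocType>
    ): RxDocType | undefined {
        const previous = this.docs.get(id);
        return previous
            ? ({ ...previous, ...tombstone } as RxDocType)
            : undefined;
    }

    // call once the corresponding write is known to have reached storage
    evict(id: string): void {
        this.docs.delete(id);
    }
}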

@pubkey
Owner

pubkey commented Oct 23, 2025

The metadata of a replication already contains the assumed-server-state. Maybe we can use that one? Otherwise I see no other way than fetching it from the couchdb server while assuming it will always be there at that point in time.
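
For reference, the server-fetch fallback could look roughly like this (a sketch over the plain CouchDB HTTP API; as noted above, the old revision is not guaranteed to survive compaction):

// hedged sketch: recover the last live body of a deleted doc from the
// server; returns null if the old revision is already gone
async function fetchLastLiveRevision(
    couchUrl: string,
    id: string,
    deletedRev: string
): Promise<Record<string, unknown> | null> {
    // fetch the tombstone together with its revision history
    const histRes = await fetch(
        `${couchUrl}/${encodeURIComponent(id)}?rev=${deletedRev}&revs=true`
    );
    if (!histRes.ok) {
        return null;
    }
    const tombstone = await histRes.json();
    const { start, ids } = tombstone._revisions;
    if (ids.length < 2) {
        return null; // the doc was created already-deleted
    }
    // the revision directly before the deletion
    const prevRev = `${start - 1}-${ids[1]}`;
    const docRes = await fetch(
        `${couchUrl}/${encodeURIComponent(id)}?rev=${prevRev}`
    );
    return docRes.ok ? docRes.json() : null;
}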
