Use rdf-connect/ldes-client #45

Open
wants to merge 18 commits into
base: master

Conversation

sergiofenoll
Collaborator

@sergiofenoll sergiofenoll commented Nov 8, 2024

The aim of this PR is to replace the usage of https://github.com/TREEcg/event-stream-client/tree/main/packages/actor-init-ldes-client with https://github.com/rdf-connect/ldes-client.

The PR is currently in Draft mode because not all features are fully supported. In fact, the version of ldes-client that will be installed is my fork, which contains some small but necessary changes so that some faulty LDES feeds are ingested properly.

I will add remarks/notes about the changes in comments on this PR. If any of the remarks should move somewhere else (e.g. to the README or to comments in the code) or should be worked out further, let me know.

app.ts (resolved)
app.ts Outdated
Comment on lines 42 to 51
lastVersionOnly: REPLACE_VERSIONS, // Won't emit members if they're known to be older than what is already in the state file
loose: true, // Make this configurable? IPDC needs this to be true
fetch: enhanced_fetch({
/* In comment are the default values, perhaps we want to make these configurable
concurrent: 10, // Amount of concurrent requests to a single domain
retry: {
codes: [408, 425, 429, 500, 502, 503, 504], // Which faulty HTTP status codes will trigger retry
base: 500, // Seems to be unused in the client code
maxRetries: 5,
}*/
Collaborator Author

The comments already mention it, but some of these options could be made configurable via environment variables.
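
As a rough illustration of the suggestion above, the defaults from the commented-out block could be read from the environment with fallbacks. This is only a sketch: the variable names `LDES_FETCH_CONCURRENCY`, `LDES_RETRY_CODES` and `LDES_MAX_RETRIES` are hypothetical and not defined anywhere in this PR, and the returned object merely mirrors the shape of the commented-out defaults.

```typescript
// Hypothetical environment variable names; this PR does not define them.
interface FetchRetryOptions {
  concurrent: number;
  retry: { codes: number[]; maxRetries: number };
}

type Env = Record<string, string | undefined>;

// Parse a positive integer, falling back to the default when unset or invalid.
function intFromEnv(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// Parse a comma-separated list of HTTP status codes, e.g. "429,503".
function codesFromEnv(env: Env, key: string, fallback: number[]): number[] {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const codes = raw.split(",").map((part) => Number(part.trim()));
  const valid = codes.every((c) => Number.isInteger(c) && c >= 100 && c <= 599);
  return valid ? codes : fallback;
}

// Build the options object, using the defaults quoted in the diff above.
function fetchOptionsFromEnv(env: Env): FetchRetryOptions {
  return {
    concurrent: intFromEnv(env, "LDES_FETCH_CONCURRENCY", 10),
    retry: {
      codes: codesFromEnv(env, "LDES_RETRY_CODES", [408, 425, 429, 500, 502, 503, 504]),
      maxRetries: intFromEnv(env, "LDES_MAX_RETRIES", 5),
    },
  };
}
```

Calling `fetchOptionsFromEnv(process.env)` would then yield the defaults unless a variable is set to a valid value; invalid values silently fall back here, which may or may not be the behaviour we want.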

lib/logger.ts Outdated (resolved)
export function memberProcessor(): WritableStream<Member> {
const logger = getLoggerFor("member-processor");

const processMember = async (member: Member) => {
Collaborator Author

This now implements a Web Streams `WritableStream` (as opposed to a Node.js Streams `Writable`).
The new client uses the Web Streams API, and it made more sense to follow it than to use the (rather confusing) Node.js Streams API, although that would still work.
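
For readers less familiar with the Web Streams API, a minimal sketch of a sink built this way is shown below. The `Member` interface is a simplified stand-in for the library's member type, and `collectingProcessor` is a hypothetical helper, not the `memberProcessor` from this PR.

```typescript
// Simplified stand-in for the library's member type (assumption).
interface Member {
  id: string;
}

// Hypothetical sink that only records member ids; the real processor
// in this PR writes to the triplestore instead.
function collectingProcessor(seen: string[]): WritableStream<Member> {
  return new WritableStream<Member>({
    // Called once per member. The stream waits for each call to finish
    // before delivering the next chunk.
    write(member) {
      seen.push(member.id);
    },
  });
}
```

Assuming the members are available as a `ReadableStream<Member>`, such a sink can be attached with `readable.pipeTo(collectingProcessor(seen))`.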

lib/member-processor.ts Outdated (resolved)
@@ -122,85 +118,10 @@ export async function executeDeleteQuery (quads: RDF.Quad[]) {
}
}

export async function getLatestTimestamp (baseResource: RDF.NamedNode, treeProperties: TreeProperties) {
Collaborator Author

Versioning (and only emitting newer versions) is now handled by the underlying library, so we no longer need to fetch a member's latest timestamp.

Contributor

I think this would still allow some performance gains (fewer inserts) in case you don't have a state file but do already have data (for example, when upgrading the consumer), but I am fine with keeping things simple for now.
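
To make the trade-off concrete, the check that a stored timestamp would enable can be sketched as follows. `shouldInsert` is a hypothetical helper, and it assumes both timestamps have already been parsed into `Date` objects; it is not the removed `getLatestTimestamp` logic.

```typescript
// Returns true when the incoming member version should be written to the store.
// `latestStored` is undefined when the store holds no version of the resource yet.
function shouldInsert(incoming: Date, latestStored: Date | undefined): boolean {
  if (latestStored === undefined) return true; // nothing stored yet
  return incoming.getTime() > latestStored.getTime(); // skip equal or older versions
}
```

Without a state file, every member is emitted again by the client, so a check like this would skip the inserts for versions that are already present.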

README.md (resolved)
package.json Outdated (resolved)
README.md Outdated (resolved)
| `LDES_POLLING_INTERVAL` | `60000` | Number of milliseconds before refetching uncacheable fragments |
| `LDES_REQUESTS_PER_MINUTE` | `0` (unlimited) | How many requests per minute may be sent to the same host. This is optional, but any value passed in must be a positive number. |
| `LDES_ENDPOINT_HEADERS` | `{}` (no headers will be added) | Extra headers that will be added to the requests sent to the LDES endpoint. Recommended syntax:<pre>environment:<br> LDES_ENDPOINT_HEADERS: ><br> { "HEADER-NAME": "header-value" } # The leading whitespace is important!</pre> |
| `SPARQL_ENDPOINT_HEADER_<key>` | N/A | A header key-value combination which should be sent as part of the headers to the SPARQL endpoint. |
Contributor

Guessing this will also need to move to a JSON structure, similar to `LDES_ENDPOINT_HEADERS`.
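
A JSON-valued headers variable of this kind could be parsed roughly as below. This is a sketch under assumptions: `parseHeaders` is a hypothetical helper, and whether invalid input should throw or be ignored is a design decision this PR does not settle.

```typescript
// Parse a JSON object of header names to string values.
// An unset or blank variable yields no extra headers.
function parseHeaders(raw: string | undefined): Record<string, string> {
  if (raw === undefined || raw.trim() === "") return {};
  const parsed: unknown = JSON.parse(raw); // throws SyntaxError on malformed JSON
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Headers must be a JSON object");
  }
  const source = parsed as Record<string, unknown>;
  const headers: Record<string, string> = {};
  for (const name of Object.keys(source)) {
    const value = source[name];
    if (typeof value !== "string") {
      throw new Error(`Header "${name}" must have a string value`);
    }
    headers[name] = value;
  }
  return headers;
}
```

The same function could serve both the LDES and the SPARQL endpoint headers, for example `parseHeaders(process.env.LDES_ENDPOINT_HEADERS)`.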

Now that the client exposes the LDES feed's metadata, we will use it
when processing the members (in particular, we will use the versionOf
predicate).
If the feed being consumed doesn't provide the metadata, users of the service
can still define the expected paths and this service will use them.

Note that `REPLACE_VERSIONS` will not work when providing these,
because the ldes-client library needs the feed to provide the paths itself;
we can't tell it what paths to look for. An alternative would be to
re-implement versioning in this service and fall back to it if the feed
doesn't provide the paths.
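
The precedence described in this commit message can be sketched as a small function. It is only an illustration: `resolveVersionOfPath` is a hypothetical name, and paths are modelled as plain strings here, whereas the real code presumably works with RDF terms.

```typescript
// Pick the versionOf path: feed metadata first, then user configuration.
// Replacing versions relies on the library's own versioning, which needs
// the path to come from the feed itself.
function resolveVersionOfPath(
  fromFeed: string | undefined,
  fromConfig: string | undefined,
  replaceVersions: boolean,
): string | undefined {
  if (fromFeed !== undefined) return fromFeed;
  if (replaceVersions) {
    throw new Error("REPLACE_VERSIONS requires the feed to provide ldes:versionOfPath");
  }
  return fromConfig; // may still be undefined if nothing was configured
}
```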

It gets REALLY chatty since every request logs this and
nothing changes.

New version should fix some memory issues in the library.

"safe" mode just means that any time fetch throws an exception, the
request is retried in a loop until no exception is thrown. This isn't great
and we probably want to just handle retries ourselves instead.
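
One way of handling retries ourselves is a bounded loop with a growing delay instead of retrying forever. The sketch below is not what the library or this PR does: `withRetries` and `backoffDelays` are hypothetical helpers, and exponential backoff is simply one common choice.

```typescript
// Delays for successive retries: base, 2*base, 4*base, ...
function backoffDelays(baseMs: number, maxRetries: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(baseMs * Math.pow(2, attempt));
  }
  return delays;
}

// Run `operation`, retrying up to `maxRetries` times when it throws.
// Unlike an unbounded loop, the last error is rethrown once retries run out.
async function withRetries<T>(
  operation: () => Promise<T>,
  baseMs = 500,
  maxRetries = 5,
): Promise<T> {
  const delays = backoffDelays(baseMs, maxRetries);
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxRetries) {
        // Wait before the next attempt.
        await new Promise<void>((resolve) => setTimeout(resolve, delays[attempt]));
      }
    }
  }
  throw lastError;
}
```

A request could then be wrapped as `withRetries(() => fetch(url))`, which gives up after the configured number of attempts rather than looping until the exception disappears.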
@nvdk nvdk marked this pull request as ready for review January 27, 2025 11:50
const quadsToRemove: Quad[] = [];
if (REPLACE_VERSIONS) {
if (versionOfPath === undefined) {
throw new Error(`Consumer is configured to replacace versions, but LDES feed did not contain versioning metadata (ldes:versionOfPath).`);
Collaborator

typo here :-)
