Use rdf-connect/ldes-client
#45
base: master
Logs are in line with what the ldes-client uses, which makes for a bit neater output.
app.ts
Outdated
lastVersionOnly: REPLACE_VERSIONS, // Won't emit members if they're known to be older than what is already in the state file
loose: true, // Make this configurable? IPDC needs this to be true
fetch: enhanced_fetch({
  /* In comment are the default values, perhaps we want to make these configurable
  concurrent: 10, // Amount of concurrent requests to a single domain
  retry: {
    codes: [408, 425, 429, 500, 502, 503, 504], // Which faulty HTTP status codes will trigger retry
    base: 500, // Seems to be unused in the client code
    maxRetries: 5,
  }*/
The comments already mention it, but some of these settings could be made configurable via environment variables.
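A minimal sketch of what that could look like. The variable names `LDES_CONCURRENT_REQUESTS`, `LDES_RETRY_CODES` and `LDES_MAX_RETRIES` are hypothetical (they are not defined anywhere in this PR), and the defaults are the ones quoted in the commented-out block above. The environment is passed in as a plain object so the function is easy to test.

```typescript
// Hypothetical helper: none of these variable names exist in the service yet.
type Env = Record<string, string | undefined>;

interface FetchSettings {
  concurrent: number;
  retry: { codes: number[]; maxRetries: number };
}

// Reads a non-negative integer from the environment, or returns the fallback.
function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

function fetchSettingsFromEnv(env: Env): FetchSettings {
  const rawCodes = env["LDES_RETRY_CODES"]; // e.g. "500, 503"
  return {
    concurrent: intFromEnv(env, "LDES_CONCURRENT_REQUESTS", 10),
    retry: {
      codes: rawCodes
        ? rawCodes.split(",").map((code) => Number(code.trim()))
        : [408, 425, 429, 500, 502, 503, 504],
      maxRetries: intFromEnv(env, "LDES_MAX_RETRIES", 5),
    },
  };
}
```

In the service this would be called as `fetchSettingsFromEnv(process.env)` and the result passed on to the fetch configuration.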
export function memberProcessor(): WritableStream<Member> {
  const logger = getLoggerFor("member-processor");

  const processMember = async (member: Member) => {
This now implements a Web Streams `WritableStream` (as opposed to a Node.js Streams `Writable`).
The new client uses the Web Streams API, and it made more sense to follow it than to use the (rather confusing) Node.js Streams API, although that would still work.
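For reference, a minimal sketch of the Web Streams shape, assuming a runtime where `WritableStream` is available as a global (recent Node.js versions and browsers). `collectingSink` and `writeAll` are hypothetical names used only for illustration; they are not part of this service or of ldes-client.

```typescript
// A sink that collects every chunk written to it into the given array.
function collectingSink<T>(collected: T[]): WritableStream<T> {
  return new WritableStream<T>({
    // Called once per chunk; the stream waits for the returned promise
    // before handing over the next chunk.
    async write(chunk) {
      collected.push(chunk);
    },
  });
}

// Writes all items to the sink through a writer, then closes the stream.
async function writeAll<T>(sink: WritableStream<T>, items: T[]): Promise<void> {
  const writer = sink.getWriter();
  for (const item of items) {
    await writer.write(item);
  }
  await writer.close();
}
```

A `ReadableStream` of members can also be connected to such a sink with `pipeTo`.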
@@ -122,85 +118,10 @@ export async function executeDeleteQuery (quads: RDF.Quad[]) {
  }
}

export async function getLatestTimestamp (baseResource: RDF.NamedNode, treeProperties: TreeProperties) {
Versioning (and only emitting newer versions) is now handled by the underlying library, so we don't need to fetch a member's latest timestamp anymore.
I think this still allows some performance gains (fewer inserts) when you don't have a state file but do already have data (for example when upgrading the consumer), but I am fine with keeping things simple for now.
| `LDES_POLLING_INTERVAL` | `60000` | Number of milliseconds before refetching uncacheable fragments |
| `LDES_REQUESTS_PER_MINUTE` | `0` (unlimited) | How many requests per minute may be sent to the same host. This is optional, but any passed in value must be a positive number. |
| `LDES_ENDPOINT_HEADERS` | `{}` (no headers will be added) | Extra headers that will be added to the requests sent to the LDES endpoint. Recommended syntax:<pre>environment:<br> LDES_ENDPOINT_HEADERS: ><br> { "HEADER-NAME": "header-value" } # The leading whitespace is important!</pre> |
| `SPARQL_ENDPOINT_HEADER_<key>` | N/A | A header key-value combination which should be sent as part of the headers to the SPARQL endpoint. |
Guessing this will also need to move to a JSON structure, similar to `LDES_ENDPOINT_HEADERS`.
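A minimal sketch of parsing such a JSON header variable, assuming the value is a flat JSON object with string values as in the `LDES_ENDPOINT_HEADERS` example. `parseHeaders` is a hypothetical helper name, and a combined variable such as `SPARQL_ENDPOINT_HEADERS` does not exist yet.

```typescript
// Hypothetical helper: turns the raw environment value into a header map.
function parseHeaders(raw: string | undefined): Record<string, string> {
  // An unset or empty variable means "no extra headers".
  if (raw === undefined || raw.trim() === "") return {};

  const parsed: unknown = JSON.parse(raw); // throws a SyntaxError on invalid JSON
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Headers must be given as a JSON object");
  }

  const headers: Record<string, string> = {};
  for (const key of Object.keys(parsed)) {
    const value = (parsed as Record<string, unknown>)[key];
    if (typeof value !== "string") {
      throw new Error(`Header "${key}" must have a string value`);
    }
    headers[key] = value;
  }
  return headers;
}
```

Leading and trailing whitespace in the raw value is harmless, because `JSON.parse` ignores whitespace around the document.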
Now that the client exposes the LDES feed's metadata, we will use it when processing the members (in particular, we will use the versionOf predicate).
If the consumed feed doesn't provide the metadata, users of the service can still define the expected paths and this service will use them. Note that `REPLACE_VERSIONS` will not work when providing these, because the ldes-client library needs the feed to provide the paths itself; we can't tell it what paths to look for. An alternative would be to re-implement versioning in this service and fall back to it if the feed doesn't provide the paths.
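A minimal sketch of that fallback logic, under the assumption that both the feed metadata and the user configuration can be reduced to a pair of optional paths. The `VersioningPaths` type and the `resolvePaths` function are hypothetical and only illustrate the rule described above; they are not the code in this PR.

```typescript
// Hypothetical shape: paths from the feed metadata or from user configuration.
interface VersioningPaths {
  versionOfPath?: string;
  timestampPath?: string;
}

function resolvePaths(
  feed: VersioningPaths,
  configured: VersioningPaths,
  replaceVersions: boolean,
): VersioningPaths {
  // Replacing versions relies on the library, which only knows the paths
  // that the feed itself advertises; configured paths cannot help here.
  if (replaceVersions && feed.versionOfPath === undefined) {
    throw new Error("Cannot replace versions: the feed has no ldes:versionOfPath");
  }
  // Otherwise prefer the feed's metadata and fall back to the configuration.
  return {
    versionOfPath: feed.versionOfPath ?? configured.versionOfPath,
    timestampPath: feed.timestampPath ?? configured.timestampPath,
  };
}
```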
It gets REALLY chatty since every request will be logging this and nothing changes.
New version should fix some memory issues in the library
"safe" mode just means that any time fetch throws an exception, the request is retried in a loop until no exception is thrown. This isn't great and we probably want to just handle retries ourselves instead.
const quadsToRemove: Quad[] = [];
if (REPLACE_VERSIONS) {
  if (versionOfPath === undefined) {
    throw new Error(`Consumer is configured to replacace versions, but LDES feed did not contain versioning metadata (ldes:versionOfPath).`);
Typo here: "replacace" should be "replace" :-)
The aim of this PR is to replace the usage of https://github.com/TREEcg/event-stream-client/tree/main/packages/actor-init-ldes-client with https://github.com/rdf-connect/ldes-client.
The PR is currently in Draft mode because not all features are fully supported. In fact, the version of ldes-client that will be installed is my fork, which contains some small but necessary changes so that some faulty LDES feeds are ingested properly.
I will add remarks/notes about the changes in comments on this PR. If any of the remarks should move somewhere else (e.g. to the README or to comments in code) or should be worked out further, let me know.