It's now possible for a Source to be deleted in the web-search directory.
There are on the order of 50K unique sources with feeds in the rss-fetcher db.
If the rss-fetcher checked one source per second, a full pass would take nearly 14 hours. At one query per minute it would take about 34 days. Going through everything every two days would need about 1,042 queries per hour, or roughly 17 queries per minute. I can (offhand) think of four ways to make this less painful, listed in order of implementation pain on the rss-fetcher side; all of them may require work in the mcweb API/db:
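A quick sketch of the arithmetic above, assuming exactly 50,000 sources (the issue only says "on the order of 50K"):

```python
# Back-of-the-envelope check of the polling rates quoted above.
# NUM_SOURCES is an assumption: the issue says "on the order of 50K".
NUM_SOURCES = 50_000


def sweep_hours(queries_per_second: float) -> float:
    """Hours needed to check every source once at the given query rate."""
    return NUM_SOURCES / queries_per_second / 3600


def required_rate_per_hour(sweep_days: float) -> float:
    """Queries per hour needed to check every source once every `sweep_days` days."""
    return NUM_SOURCES / (sweep_days * 24)


print(f"1 query/second: {sweep_hours(1):.1f} hours")         # 13.9 hours
print(f"1 query/minute: {sweep_hours(1 / 60) / 24:.1f} days")  # 34.7 days
per_hour = required_rate_per_hour(2)
print(f"every 2 days: {per_hour:.0f}/hour, {per_hour / 60:.1f}/minute")  # 1042/hour, 17.4/minute
```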
Implement an endpoint where the rss-fetcher can present a list of source ids and get back the ones that are still valid. This should be doable with a single query to the mcweb-db, along the lines of `SELECT id FROM <sources table> WHERE id IN (list_of_ids_to_validate)`.
Keep track of deleted sources in the mcweb-db (a separate table of ids would be fine) that the rss-fetcher can fetch; that list should never be terribly large. Keeping the sources in place and marking them deleted (a "soft delete") would also have worked here, but would require all normal queries to filter out deleted entries. In my (limited) past experience, that approach is pretty normal.
Add a web-search API endpoint to download a CSV of all sources.
Have the rss-fetcher page through the sources. NOTE: the current source_id space runs from 1 through 1.9 million (1,900 pages of 1,000), though the rss-fetcher can optimize this by only fetching pages that start with a previously unchecked id (i.e., keeping a cursor of the last id checked).
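For the first option (an endpoint that validates a list of source ids), here is a minimal server-side sketch of the query. It assumes a table named `sources` with an integer `id` column and uses an in-memory SQLite database as a stand-in for the mcweb-db; both are hypothetical, since the issue does not describe the real schema or database. Ids are sent in chunks to stay under SQLite's limit on bound parameters per statement.

```python
import sqlite3
from typing import Iterable

# Hypothetical: a "sources" table with an integer "id" column stands in for
# the real mcweb-db schema, which this issue does not describe.
CHUNK = 500  # stay well under SQLite's per-statement bound-parameter limit


def valid_source_ids(conn: sqlite3.Connection, ids: Iterable[int]) -> set[int]:
    """Return the subset of `ids` that still exist in the sources table."""
    ids = list(ids)
    found: set[int] = set()
    for start in range(0, len(ids), CHUNK):
        chunk = ids[start:start + CHUNK]
        placeholders = ",".join("?" * len(chunk))  # e.g. "?,?,?"
        rows = conn.execute(
            f"SELECT id FROM sources WHERE id IN ({placeholders})", chunk
        )
        found.update(row[0] for row in rows)
    return found


# Usage: ids the fetcher knows about, minus the valid ones, are the deletions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sources (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO sources (id) VALUES (?)", [(1,), (2,), (5,)])
to_check = [1, 2, 3, 5]
deleted = set(to_check) - valid_source_ids(conn, to_check)
print(sorted(deleted))  # [3]
```

With ~50K ids and chunks of 500, a full check is about 100 queries, rather than one request per source.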
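For the second option (a separate table of deleted source ids), a minimal sketch of both sides. The table names `deleted_sources` and `feeds` are hypothetical, and reading the mcweb table directly stands in for whatever API endpoint would actually expose the list.

```python
import sqlite3

# Hypothetical schemas: "deleted_sources" would live in the mcweb-db and
# "feeds" in the rss-fetcher db. Neither name comes from the real code.
mcweb = sqlite3.connect(":memory:")
mcweb.execute("CREATE TABLE deleted_sources (source_id INTEGER PRIMARY KEY)")

fetcher = sqlite3.connect(":memory:")
fetcher.execute("CREATE TABLE feeds (id INTEGER PRIMARY KEY, source_id INTEGER)")


def record_deletion(db: sqlite3.Connection, source_id: int) -> None:
    """mcweb side: remember a deleted source's id when the source row is removed."""
    db.execute(
        "INSERT OR IGNORE INTO deleted_sources (source_id) VALUES (?)", (source_id,)
    )
    db.commit()


def sync_deletions(db: sqlite3.Connection, local: sqlite3.Connection) -> int:
    """rss-fetcher side: drop feeds whose source id is in the deleted list.

    Reading the table directly stands in for calling a hypothetical endpoint.
    Returns the number of feed rows removed.
    """
    deleted = [(row[0],) for row in db.execute("SELECT source_id FROM deleted_sources")]
    before = local.total_changes
    local.executemany("DELETE FROM feeds WHERE source_id = ?", deleted)
    local.commit()
    return local.total_changes - before


# Usage: two feeds belong to source 2, which gets deleted in mcweb.
fetcher.executemany("INSERT INTO feeds (source_id) VALUES (?)", [(1,), (2,), (2,), (3,)])
fetcher.commit()
record_deletion(mcweb, 2)
print(sync_deletions(mcweb, fetcher))  # 2
```

Because the deleted list stays small, the fetcher can simply re-read the whole thing on each sync; the sync is idempotent, so repeating it is harmless.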
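For the third option (a CSV download of all sources), the fetcher side reduces to a set difference. This sketch assumes, hypothetically, that the CSV has a header row with an `id` column; the real export format is not specified in the issue.

```python
import csv
import io


def deleted_from_csv(csv_text: str, local_ids: set[int]) -> set[int]:
    """Return local source ids that no longer appear in a full-directory CSV dump.

    Assumes (hypothetically) the CSV has a header row with an "id" column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    live = {int(row["id"]) for row in reader}
    return local_ids - live


dump = "id,name\n1,example.com\n2,example.org\n5,example.net\n"
print(sorted(deleted_from_csv(dump, {1, 2, 3, 5})))  # [3]
```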
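For the fourth option, one way to read the cursor optimization is keyset pagination: each request asks for existing ids starting at the fetcher's next unchecked id, so pages containing none of the fetcher's ids are never requested. This is a sketch under that interpretation; `fetch_page(start_id, limit)` is a hypothetical stand-in for a paged web-search API call that returns existing source ids `>= start_id` in ascending order.

```python
import bisect
from typing import Callable, List

PAGE_SIZE = 1000

# Hypothetical page fetcher: returns up to `limit` existing source ids
# >= start_id, in ascending order (stands in for a paged API call).
FetchPage = Callable[[int, int], List[int]]


def find_deleted(local_ids: List[int], fetch_page: FetchPage,
                 page_size: int = PAGE_SIZE) -> List[int]:
    """Return local source ids missing from the directory, using a cursor."""
    local = sorted(set(local_ids))
    deleted: List[int] = []
    i = 0  # cursor: index of the first local id not yet checked
    while i < len(local):
        start = local[i]
        page = fetch_page(start, page_size)
        # A short page means the end of the id space has been reached.
        end = page[-1] if len(page) == page_size else float("inf")
        live = set(page)
        # Every local id in [start, end] is covered by this page.
        j = bisect.bisect_right(local, end)
        deleted.extend(x for x in local[i:j] if x not in live)
        i = j
    return deleted


# Usage with a fake directory in which only odd ids 1..4999 still exist.
existing = list(range(1, 5001, 2))


def fake_fetch(start_id: int, limit: int) -> List[int]:
    idx = bisect.bisect_left(existing, start_id)
    return existing[idx:idx + limit]


print(find_deleted([3, 4, 2001, 4999, 7000], fake_fetch, page_size=1000))  # [4, 7000]
```

Here five local ids are checked with two page requests; with ~50K ids spread over a 1.9 million id space, the fetcher requests at most the pages that actually contain its ids, not all 1,900.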