KeyError when using follow_urls = True on run() #369
Comments
Thanks for your report. This is a limitation with JSON support because you cannot just append to the JSON file without breaking the format (and re-reading, appending and saving JSON is expensive as it grows). What you want here is JSON Lines (https://jsonlines.org/examples/). You can use the example in https://github.com/roniemartinez/dude/blob/master/examples/save_per_page.py as a reference for JSONL custom storage. Don't forget to add
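To illustrate the point about appending, here is a minimal sketch of JSON Lines storage using only the Python standard library. It is not dude's storage API and not the code from the linked `save_per_page.py` example; the names `append_jsonl`, `read_jsonl`, `demo` and the sample records are hypothetical. Each call opens the file in append mode and writes one JSON document per line, so nothing already on disk has to be re-read or rewritten.

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterable, List


def append_jsonl(path: str, records: Iterable[Dict[str, Any]]) -> None:
    """Append each record to `path` as one JSON document per line."""
    # Mode "a" creates the file if needed and never touches existing lines.
    with open(path, "a", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: str) -> List[Dict[str, Any]]:
    """Read the file back, parsing every non-empty line independently."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def demo() -> List[Dict[str, Any]]:
    """Simulate two scraped pages being saved one after the other."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "output.jsonl")
        # Hypothetical per-page results; a real scraper would supply these.
        append_jsonl(path, [{"url": "https://example.com/1", "title": "first"}])
        append_jsonl(path, [{"url": "https://example.com/2", "title": "second"}])
        return read_jsonl(path)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Because every line is a complete JSON document, the cost of saving a page stays proportional to that page's data rather than to the size of the whole output file.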
Thanks for the fast reply, Ronie. I agree with the JSON limitation. The issue here, as far as I understand it, is that |
Yes, I put that in as a safeguard since, by default (and also as a limitation), all data are saved to memory unless
I am open to new PRs for this (and since I am not using |
How to reproduce:
Error location:
https://github.com/roniemartinez/dude/blob/53d53c2bd840ea52fc341089313f122735dd6ab4/dude/base.py#LL65C13-L65C40
Error origin:
https://github.com/roniemartinez/dude/blob/53d53c2bd840ea52fc341089313f122735dd6ab4/dude/scraper.py#LL96C44-L96C55