perf: Asynchronously dispatch requests in groups #10
base: main
Conversation
break

await asyncio.sleep(random.uniform(1, 3))
logger.info(f"Processed URL: {url}.")
Add counter
If you can, contribute that change to the PR. Thank you!
I don't know how to do it. Maybe a task_done callback function could do it. Would it also count the errors?
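A done-callback can keep such a counter without touching the task bodies. A minimal sketch, where `process` is a hypothetical stand-in for the real per-URL work (not the PR's actual coroutine): since `task.exception()` is non-None for a failed task, the same callback can count errors as well as successes.

```python
import asyncio

async def process(url: str) -> None:
    # Hypothetical stand-in for the real per-URL work.
    if "bad" in url:
        raise ValueError(f"failed on {url}")

async def main() -> tuple[int, int]:
    done = 0
    errors = 0

    def on_done(task: asyncio.Task) -> None:
        # Runs once per task when it finishes, success or failure.
        nonlocal done, errors
        if task.exception() is not None:
            errors += 1
        else:
            done += 1

    tasks = [asyncio.create_task(process(u)) for u in ("a", "bad-b", "c")]
    for t in tasks:
        t.add_done_callback(on_done)
    # return_exceptions=True keeps the failures from propagating here.
    await asyncio.gather(*tasks, return_exceptions=True)
    await asyncio.sleep(0)  # yield once so any pending done-callbacks run
    return done, errors

done, errors = asyncio.run(main())
print(done, errors)  # 2 successes, 1 error
```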
Fixed the issue; that is what I get for writing code without testing. Anyway, against my better judgment, I have tested the script using my own account. The new execution logic should also pull new videos to process as soon as more space is available in the semaphore instead of waiting for the whole batch to finish. Every video will still have a random delay. I've also added a check to not process the same video multiple times by checking the video URL against a log file. Would you be kind enough to test it again?
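The dispatch logic described above can be sketched with an `asyncio.Semaphore`: a finished task frees its slot immediately, so the next URL starts without waiting for the whole group. The names here (`mark_watched`, `PROCESSED`, the group size of five) are assumptions drawn from this thread, the in-memory set stands in for the on-disk log file, and the jitter is scaled down so the sketch runs instantly; the PR itself sleeps `random.uniform(1, 3)` seconds.

```python
import asyncio
import random

CONCURRENCY = 5  # assumed group size from the PR description
PROCESSED: set[str] = set()  # stand-in for the on-disk log of handled URLs

async def mark_watched(url: str) -> None:
    # Hypothetical stand-in for the real request coroutine.
    await asyncio.sleep(0)

async def process(url: str, sem: asyncio.Semaphore) -> None:
    if url in PROCESSED:
        return  # skip videos already recorded in the log
    # At most CONCURRENCY requests in flight; a slot frees as soon as
    # one task finishes, not when the whole batch does.
    async with sem:
        # Jitter before each dispatch (the PR uses 1-3 seconds).
        await asyncio.sleep(random.uniform(0.001, 0.003))
        await mark_watched(url)
        PROCESSED.add(url)

async def main(urls: list[str]) -> None:
    sem = asyncio.Semaphore(CONCURRENCY)
    await asyncio.gather(*(process(u, sem) for u in urls))

# Pretend one video was already logged on a previous run.
PROCESSED.add("https://youtu.be/0")
urls = [f"https://youtu.be/{i}" for i in range(12)]
asyncio.run(main(urls))
print(len(PROCESSED))  # 12: eleven new videos plus the pre-logged one
```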
It's working fine; thanks to your async contribution it's super fast now.
@oSumAtrIX Can you PR a fix to the readme that includes these changes? I will be a bit busy today, so I'm not sure I'll be able to write it.
@alexandreteles What changes to the readme are necessary?
Some of the command line arguments are gone and we have a new one called
    "time": time,
} if header != "YouTube" or time < RESUME_TIMESTAMP:
    return False
case {"details": [{"name": "From Google Ads"}]}:
Apparently this is necessary according to #13 (comment)
I can't currently verify that. Would you be able to do so?
Which is necessary according to my issue entry?
Just remember that the example I included there in #13 is from "My Activity.JSON" with only "Ads" selected as the export.
Those entries do not show up in watched_history.JSON at all, so referencing #13 should be redundant.
I don't know if other entries in watched_history.JSON need those checks, though.
But either way, the comment you referenced says that the entries with "From Google Ads" are actually LEGITIMATE watch history that was scrubbed due to the changes, not ones to be omitted.
Tqdm needs to be installed for the script to run. Should it be in the requirements file?
That's in a different PR 😅
This small rewrite uses async to dispatch requests in groups of five, with a small delay of `sleep: float = random.uniform(1, 3)` on each dispatch. This should result in faster execution than dispatching requests synchronously, while introducing some entropy so as not to scare YouTube too much. I cannot test it myself, so I would be glad if you could check it out @oSumAtrIX.
Thank you!
EDIT: it also introduces a retry option that tries to execute the `mark_watched` operation three times before giving up on that specific video. I did not introduce a global failure count, but adding one should be trivial if the current code works.
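A retry of that shape can be sketched as below. The flaky `mark_watched` here is a hypothetical stand-in (it fails twice, then succeeds) so the sketch is self-contained; the real coroutine and its signature may differ.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

calls = 0

async def mark_watched(url: str) -> None:
    # Hypothetical flaky request: fails twice, then succeeds.
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("transient failure")

async def mark_watched_with_retry(url: str, attempts: int = 3) -> bool:
    """Try mark_watched up to `attempts` times before giving up on the video."""
    for attempt in range(1, attempts + 1):
        try:
            await mark_watched(url)
            return True
        except Exception as exc:
            logger.warning("attempt %d/%d failed for %s: %s",
                           attempt, attempts, url, exc)
    return False  # a global failure counter could be bumped here

ok = asyncio.run(mark_watched_with_retry("https://youtu.be/x"))
print(ok)  # True: third attempt succeeded
```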