perf: Asynchronously dispatch requests in groups #10

alexandreteles · 2024-03-08T02:00:35Z

This small rewrite uses async to dispatch requests in groups of five, with a small random delay (sleep: float = random.uniform(1, 3)) on each dispatch. This should run faster than dispatching requests synchronously, while introducing some entropy so as not to scare YouTube too much.
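
A minimal sketch of the idea described above, not the code from this PR: URLs are split into groups of five, each group is awaited concurrently, and a random pause follows every group. The names `chunked`, `dispatch_in_groups`, and `handler` are hypothetical and only serve the illustration.

```python
import asyncio
import random
from collections.abc import Awaitable, Callable, Iterator


def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def dispatch_in_groups(
    urls: list[str],
    handler: Callable[[str], Awaitable[None]],  # hypothetical per-URL coroutine
    group_size: int = 5,
    delay_range: tuple[float, float] = (1.0, 3.0),
) -> None:
    for group in chunked(urls, group_size):
        # All requests of one group run concurrently.
        await asyncio.gather(*(handler(url) for url in group))
        # Random pause after each group to add some entropy.
        sleep: float = random.uniform(*delay_range)
        await asyncio.sleep(sleep)
```

With this shape, the next group only starts once the slowest request of the current group has finished.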

I cannot test it myself, so I would be glad if you could check it out @oSumAtrIX.

Thank you!

EDIT: It also introduces a retry option that attempts the mark_watched operation up to three times before giving up on that specific video. I did not introduce a global failure count, but adding one should be trivial if the current code works.
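
The retry behaviour could look roughly like the sketch below. It assumes the operation is an argument-free coroutine function that raises on failure; `with_retries`, `operation`, and `pause` are hypothetical names, not part of this PR.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def with_retries(
    operation: Callable[[], Awaitable[None]],  # hypothetical, e.g. a wrapped mark_watched call
    attempts: int = 3,
    pause: float = 1.0,
) -> bool:
    """Return True on success, False once every attempt has failed."""
    for attempt in range(1, attempts + 1):
        try:
            await operation()
            return True
        except Exception:
            # Wait before the next attempt, but not after the last one.
            if attempt < attempts:
                await asyncio.sleep(pause)
    return False
```

Returning a boolean instead of raising keeps one failing video from aborting the rest of the run.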

This reverts commit 32e0dd6.

indrastorms · 2024-03-08T05:14:59Z

File "/data/data/com.termux/files/home/restore-missing-youtube-watch-history/main.py", line 106, in main
    kept: list[dict[str, Any]] = await filter_video_events(data, RESUME_TIMESTAMP)
                                 ^^^^^^^^^^^^^^^^^^^^^^^
TypeError: object async_generator can't be used in 'await' expression
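
For context: this TypeError is raised when an async generator (an `async def` function that contains `yield`) is awaited directly. Such a generator has to be consumed with `async for` or an async comprehension. The sketch below uses a hypothetical stand-in, `filter_events`, and an assumed `time` key, since the real `filter_video_events` is not shown in this thread.

```python
import asyncio
from collections.abc import AsyncIterator
from typing import Any


async def filter_events(
    events: list[dict[str, Any]], resume: str
) -> AsyncIterator[dict[str, Any]]:
    """Hypothetical stand-in: an async generator, because it uses `yield`."""
    for event in events:
        if event["time"] >= resume:
            yield event


async def collect(events: list[dict[str, Any]], resume: str) -> list[dict[str, Any]]:
    # `await filter_events(...)` would raise the TypeError from the traceback;
    # an async comprehension consumes the generator instead.
    return [event async for event in filter_events(events, resume)]
```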

indrastorms · 2024-03-08T16:11:21Z

main.py

+                        break
+
+            await asyncio.sleep(random.uniform(1, 3))
+            logger.info(f"Processed URL: {url}.")


Add counter

If you can, contribute that change to the PR. Thank you!

I don't know how to do it. Maybe a task_done callback function would do. Would it also count the errors?
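
One possible way to build such a counter, sketched under the assumption that every video is processed in its own asyncio task: a done callback registered with `Task.add_done_callback` sees successes as well as failures. `run_with_counter` and the counter keys are hypothetical names.

```python
import asyncio
from collections import Counter
from collections.abc import Coroutine, Iterable
from typing import Any


async def run_with_counter(coros: Iterable[Coroutine[Any, Any, Any]]) -> Counter:
    counts: Counter = Counter()

    def on_done(task: asyncio.Task) -> None:
        # Called once per finished task, whether it succeeded or not.
        if task.cancelled():
            counts["cancelled"] += 1
        elif task.exception() is not None:
            counts["failed"] += 1
        else:
            counts["succeeded"] += 1

    tasks = [asyncio.create_task(coro) for coro in coros]
    for task in tasks:
        task.add_done_callback(on_done)
    # return_exceptions=True keeps one failure from interrupting the others.
    await asyncio.gather(*tasks, return_exceptions=True)
    return counts
```

So yes, in this sketch errors are counted too: a task that raised lands under "failed".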

alexandreteles · 2024-03-08T16:13:39Z

> File "/data/data/com.termux/files/home/restore-missing-youtube-watch-history/main.py", line 106, in main
>     kept: list[dict[str, Any]] = await filter_video_events(data, RESUME_TIMESTAMP)
>                                  ^^^^^^^^^^^^^^^^^^^^^^^
> TypeError: object async_generator can't be used in 'await' expression

Fixed the issue; that is what I get for writing code without testing. Against my better judgment, I have now tested the script using my own account. The new execution logic should also pull in a new video to process as soon as a slot frees up in the semaphore, instead of waiting for the whole batch to finish. Every video still gets a random asyncio.sleep() to introduce some entropy. The default concurrency is still five simultaneous requests, but that can be controlled with --concurrency.
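
A rough sketch of the semaphore-based scheduling described here, assuming one coroutine per URL. `run_bounded` and `handler` are hypothetical names and the body is an illustration, not the code in this PR.

```python
import asyncio
import random
from collections.abc import Awaitable, Callable


async def run_bounded(
    urls: list[str],
    handler: Callable[[str], Awaitable[None]],  # hypothetical per-URL coroutine
    concurrency: int = 5,
    delay_range: tuple[float, float] = (1.0, 3.0),
) -> None:
    semaphore = asyncio.Semaphore(concurrency)

    async def worker(url: str) -> None:
        # At most `concurrency` workers are inside this block at once;
        # a waiting worker starts as soon as any slot is released.
        async with semaphore:
            await handler(url)
            await asyncio.sleep(random.uniform(*delay_range))

    await asyncio.gather(*(worker(url) for url in urls))
```

Unlike fixed groups, a single slow request only occupies one slot here, so the other slots keep making progress.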

I've also added a check to not process the same video multiple times by checking the video URL against a log file.
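
The log-file check could be sketched as follows, assuming a plain-text log with one processed URL per line. The file layout and the helper names (`load_processed`, `record_processed`, `pending`) are assumptions made for illustration.

```python
from pathlib import Path


def load_processed(log_path: Path) -> set[str]:
    """Read the URLs that were already handled; an absent log means none."""
    if not log_path.exists():
        return set()
    lines = log_path.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def record_processed(log_path: Path, url: str) -> None:
    """Append one URL to the log after it has been processed."""
    with log_path.open("a", encoding="utf-8") as log:
        log.write(url + "\n")


def pending(urls: list[str], processed: set[str]) -> list[str]:
    """Keep only the URLs that are not in the log yet, preserving order."""
    return [url for url in urls if url not in processed]
```

Loading the log into a set once keeps each lookup cheap, even for a long watch history.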

Would you be kind enough to test it again?

indrastorms · 2024-03-08T16:18:01Z

It's working fine. Thanks to your async contribution, it's super fast now.

alexandreteles · 2024-03-08T16:25:52Z

> It's working fine. Thanks to your async contribution, it's super fast now.

@oSumAtrIX Could you open a PR that updates the readme to cover these changes? I will be a bit busy today, so I'm not sure I'll be able to write it.

oSumAtrIX · 2024-03-09T03:33:03Z

@alexandreteles What changes to the readme are necessary?

alexandreteles · 2024-03-09T03:35:30Z

Some of the command-line arguments are gone, and there is a new one, --concurrency, that sets how many connections the app makes at the same time. That's about it.
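
For the readme, the new option could be illustrated with a sketch like this one. Only the flag name and the default of five come from this thread; the parser description, the help text, and `build_parser` itself are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only --concurrency and its default are from the thread.
    parser = argparse.ArgumentParser(description="Restore missing watch history")
    parser.add_argument(
        "--concurrency",
        type=int,
        default=5,
        help="number of requests to run at the same time (default: 5)",
    )
    return parser
```

A call such as `build_parser().parse_args(["--concurrency", "10"])` would then yield a namespace whose `concurrency` attribute is 10.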

oSumAtrIX · 2024-03-10T13:51:23Z

main.py

+                "time": time,
+            } if header != "YouTube" or time < RESUME_TIMESTAMP:
+                return False
+            case {"details": [{"name": "From Google Ads"}]}:


Apparently this is necessary according to #13 (comment)

I can't currently verify that. Would you be able to do so?

What exactly is necessary according to my issue entry?
Keep in mind that the example I included in #13 is from "My Activity.JSON", exported with only "Ads" selected.
Those entries do not show up in watched_history.JSON at all, so referencing #13 should be redundant.

I don't know whether other entries in watched_history.JSON need those checks, though.
Either way, the comment you referenced says that the entries with "From Google Ads" are actually LEGITIMATE watch history that was scrubbed due to the changes, not entries to be omitted.

Mr-HaleYa · 2024-03-11T10:10:55Z

Tqdm needs to be installed to run. Should this be in the requirements file?

alexandreteles · 2024-03-12T18:00:18Z

> Tqdm needs to be installed to run. Should this be in the requirements file?

That's in a different PR 😅

alexandreteles added 7 commits March 7, 2024 22:14

feat: async runner

a8ef265

feat: add URL processing tracking

5fb6531

chore: update deps

187ea7a

feat: remove shorts from processing

8096636

fix: fix file operations

32e0dd6

Revert "fix: fix file operations"

b14b81f

This reverts commit 32e0dd6.

fix: fix file operations

11cb1a9

alexandreteles requested a review from oSumAtrIX March 8, 2024 02:00

alexandreteles self-assigned this Mar 8, 2024

fix: fix async generators and improve execution logic

c6d31ca

indrastorms reviewed Mar 8, 2024

alexandreteles requested a review from indrastorms March 8, 2024 16:14

oSumAtrIX changed the title ~~feat: async runner~~ perf: Use async to dispatch requests in groups Mar 9, 2024

oSumAtrIX changed the title ~~perf: Use async to dispatch requests in groups~~ perf: Asynchronously dispatch requests in groups Mar 9, 2024

oSumAtrIX reviewed Mar 10, 2024
