0.6.1: Change `create_list_for()` return, add features & improvements
- **BREAKING CHANGE**
  - BEFORE:
    - `create_list_for()` returned a `str` containing the name of the file the program wrote to
  - NOW:
    - `create_list_for()` returns a `tuple` containing
      - a `list` of `list`s containing the video information found by the program for the current run
        - by default, returns dummy video data to avoid cluttering the output
        - to return the actual video data, set the `video_data_returned` ListCreator attribute to `True`
        - dummy data: `[[0, '', '', '']]`
      - a `tuple` containing a `str` with the name of the channel (taken from the channel's heading) and a `str` with the name of the file written to
        - `('The Channel Name', 'the_name_of_the_file')`
        - `('The Channel Name', '')` if the ListCreator attributes are `txt=False`, `csv=False`, `md=False`, AND `video_data_returned=True`
    - see the **NEW FEATURES** section below for more details about `video_data_returned`
    - access the full documentation for the updated `create_list_for` method with `help(ListCreator.create_list_for)` in the Python interpreter
- **BUGFIX**
  - fixes `cookie_consent` blocking logic for new HTML in GDPR regions
    - YouTube updated the HTML formatting for blocking cookie consent, and the previous cookie consent blocking logic broke
    - this release fixes the blocking logic to work with the new HTML formatting
- **NEW FEATURES**
  - overview for the new ListCreator attributes given here, but run `help(ListCreator)` in the Python interpreter or read the "More API information" section in the python README to see the full documentation:
    - `file_suffix` allows more control over the file naming (`True` by default)
    - `all_video_data_in_memory` scrapes the ENTIRE YouTube channel's videos page, EVEN if files exist for the channel already (`False` by default)
      - must also set the `video_data_returned` attribute to `True` to actually get this information
    - `video_data_returned` returns the video data for all videos the program scraped (`False` by default)
      - data returned depends on a number of factors, see full documentation for more details
    - `video_id_only` saves only the video ID instead of the entire URL (`False` by default)
      - example: saves 'abcdefghijk' instead of 'https://www.youtube.com/watch?v=abcdefghijk'
  - overview for the updated `file_name` argument options in the `create_list_for` method given here, but run `help(ListCreator.create_list_for)` in the Python interpreter to see the full documentation:
    - `file_name='auto'` names the output file(s) using the name that shows up under the banner when you navigate to the channel's homepage (with spaces removed)
    - `file_name='id'` names the output file(s) using the identifier from the URL provided to the `url` argument
      - run `help(ListCreator.create_list_for)` for a comprehensive list of examples
      - using `file_name='id'` is very useful when multiple channels have the SAME channel name
- **PERFORMANCE IMPROVEMENTS**
  - BEFORE:
    - the program pulled the video data from the selenium instance and wrote to the file(s) directly
  - NOW:
    - the program loads the video data from the selenium instance into memory, THEN writes the saved video data from memory to the file(s)
  - the performance improvement is more noticeable when writing more information
    - for example:
      - writing information for 200 videos to just a csv file: negligible performance difference between writing to csv file directly and loading to memory & THEN writing to csv file
      - writing information for 200 videos to csv, txt, md files: slight performance difference between writing to files directly and loading to memory & THEN writing to files, but still not much of a performance difference
      - writing information for 20000 videos to just a csv file: noticeable performance difference between writing to csv file directly and loading to memory & THEN writing to csv file
      - writing information for 20000 videos to csv, txt, md files: significant performance difference between writing to files directly and loading to memory & THEN writing to files
    - summary:
      - the performance difference between writing to ONE file directly and loading to memory & THEN writing to ONE file is barely noticeable for small jobs and more noticeable for larger jobs
      - the performance difference between writing to MULTIPLE files directly and loading to memory & THEN writing to MULTIPLE files is more noticeable for small jobs (compared to writing to only ONE file) and SIGNIFICANT for larger jobs
  - logs from tests used to benchmark performance included below:
    - > for https://www.youtube.com/user/schafer5 (small channel, 230 videos)
      - writing to 1 file directly with csv=True, txt=False, md=False
        - to create the file:
          ```
          It took 9.240757292005583 seconds to find 230 videos from https://www.youtube.com/user/schafer5/videos
          It took 4.265756259999762 seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.csv
          This program took 19.537945401003526 seconds to complete.
          ```
        - to update the file:
          ```
          It took 0.8453300589972059 seconds to find 60 videos from https://www.youtube.com/user/schafer5/videos
          It took 0.6392399440010195 seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.csv
          This program took 7.754261410002073 seconds to complete.
          ```
      - writing to 1 file by loading video information into memory THEN writing to files with csv=True, txt=False, md=False
        - to create the file:
          ```
          It took 9.163404727999989 seconds to find 230 videos from https://www.youtube.com/user/schafer5/videos
          It took 4.260267737000007 seconds to load information for 230 videos into memory
          It took 0.002389371999996115 seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.csv
          This program took 19.483281371000004 seconds to complete.
          ```
        - to update the file:
          ```
          It took 0.8521808300000089 seconds to find 60 videos from https://www.youtube.com/user/schafer5/videos
          It took 1.0964175420000117 seconds to load information for 60 videos into memory
          It took 0.0015745449999826633 seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.csv
          This program took 7.985743492000012 seconds to complete.
          ```
      - writing to 3 files directly with csv=True, txt=True, md=True
        - to create the files:
          ```
          It took 9.166668037003546 seconds to find 230 videos from https://www.youtube.com/user/schafer5/videos
          It took 10.160974278995127 seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.txt
          It took 10.164936708999448 seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.csv
          It took 10.168633003995637 seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.md
          This program took 25.594990328005224 seconds to complete.
          ```
        - to update the files:
          ```
          It took 0.8503098270011833 seconds to find 60 videos from https://www.youtube.com/user/schafer5/videos
          It took 1.5225159670007997 seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.csv
          It took 1.5322243859991431 seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.txt
          It took 1.5359413480036892 seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.md
          This program took 8.472728426997492 seconds to complete.
          ```
      - writing to 3 files by loading video information into memory THEN writing to files with csv=True, txt=True, md=True
        - to create the files:
          ```
          It took 9.367390958000005 seconds to find 230 videos from https://www.youtube.com/user/schafer5/videos
          It took 4.218187391999997 seconds to load information for 230 videos into memory
          It took 0.003894963000000473 seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.md
          It took 0.005060710999998719 seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.csv
          It took 0.006283445999997639 seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.txt
          This program took 18.754924324 seconds to complete.
          ```
        - to update the files:
          ```
          It took 0.8672965029999986 seconds to find 60 videos from https://www.youtube.com/user/schafer5/videos
          It took 1.0901944209999996 seconds to load information for 60 videos into memory
          It took 0.005667658999996661 seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.csv
          It took 0.008393589000000645 seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.txt
          It took 0.008197031000001687 seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.md
          This program took 8.090583961999997 seconds to complete.
          ```
    - > for https://www.youtube.com/c/KhanAcademy (medium channel, 8095 videos)
      - writing to 1 file directly with csv=True, txt=False, md=False
        - to create the file:
          ```
          It took 322.72226654399856 seconds to find 8095 videos from https://www.youtube.com/c/KhanAcademy/videos
          It took 256.63442500399833 seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.csv
          This program took 585.4076739919983 seconds to complete.
          ```
        - to update the file:
          ```
          It took 0.8482559289986966 seconds to find 60 videos from https://www.youtube.com/c/KhanAcademy/videos
          It took 0.5600300389996846 seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.csv
          This program took 7.653723870003887 seconds to complete.
          ```
      - writing to 1 file by loading video information into memory THEN writing to files with csv=True, txt=False, md=False
        - to create the file:
          ```
          It took 316.9717323640002 seconds to find 8095 videos from https://www.youtube.com/c/KhanAcademy/videos
          It took 248.92245618300012 seconds to load information for 8095 videos into memory
          It took 0.07691853599999376 seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.csv
          This program took 572.114162118 seconds to complete.
          ```
        - to update the file:
          ```
          It took 0.8459371520000332 seconds to find 60 videos from https://www.youtube.com/c/KhanAcademy/videos
          It took 0.9670944140000302 seconds to load information for 60 videos into memory
          It took 0.02941359300007207 seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.csv
          This program took 8.209143252000104 seconds to complete.
          ```
      - writing to 3 files directly with csv=True, txt=True, md=True
        - to create the files:
          ```
          It took 314.01985485899786 seconds to find 8095 videos from https://www.youtube.com/c/KhanAcademy/videos
          It took 519.1903085960002 seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.txt
          It took 519.1941804189992 seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.csv
          It took 519.197644068001 seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.md
          This program took 839.4073893879977 seconds to complete.
          ```
        - to update the files:
          ```
          It took 0.8488957250010571 seconds to find 60 videos from https://www.youtube.com/c/KhanAcademy/videos
          It took 1.580211615000735 seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.csv
          It took 1.681963879003888 seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.txt
          It took 1.6842712280049454 seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.md
          This program took 8.823843261001457 seconds to complete.
          ```
      - writing to 3 files by loading video information into memory THEN writing to files with csv=True, txt=True, md=True
        - to create the files:
          ```
          It took 316.342601403 seconds to find 8095 videos from https://www.youtube.com/c/KhanAcademy/videos
          It took 261.87072707100003 seconds to load information for 8095 videos into memory
          It took 0.1363127509999913 seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.csv
          It took 0.1775351439999895 seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.md
          It took 0.18588107000005039 seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.txt
          This program took 584.703847726 seconds to complete.
          ```
        - to update the files:
          ```
          It took 0.8483775499998956 seconds to find 60 videos from https://www.youtube.com/c/KhanAcademy/videos
          It took 1.0671216570001434 seconds to load information for 60 videos into memory
          It took 0.17331316700006028 seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.csv
          It took 0.22995445900005507 seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.txt
          It took 0.23345572800008085 seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.md
          This program took 8.503321469999833 seconds to complete.
          ```
    - > for https://www.youtube.com/user/NBCNews/videos (large channel, ~32550 videos)
      - writing to 1 file directly with csv=True, txt=False, md=False
        - to create the file:
          ```
          It took 3420.0639533489993 seconds to find 32347 videos from https://www.youtube.com/user/NBCNews/videos
          It took 4988.648231769999 seconds to write all 32347 videos to NBCNews_reverse_chronological_videos_list.csv
          This program took 8414.909623333002 seconds to complete.
          ```
        - to update the file:
          ```
          ```
      - writing to 1 file by loading video information into memory THEN writing to files with csv=True, txt=False, md=False
        - to create the file:
          ```
          It took 3367.386001154002 seconds to find 32357 videos from https://www.youtube.com/user/NBCNews/videos
          It took 4880.191474030002 seconds to load information for 32357 videos into memory
          It took 0.24478799300050014 seconds to write all 32357 videos to NBCNews_reverse_chronological_videos_list.csv
          This program took 8253.73690525 seconds to complete.
          ```
        - to update the file:
          ```
          It took 0.8474488579995523 seconds to find 60 videos from https://www.youtube.com/user/NBCNews/videos
          It took 1.1012943870009622 seconds to load information for 60 videos into memory
          It took 0.11654774600174278 seconds to write the 5 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.csv
          This program took 8.668505469999218 seconds to complete.
          ```
      - writing to 3 files directly with csv=True, txt=True, md=True
        - to create the files:
          ```
          It took 3396.025502143 seconds to find 32347 videos from https://www.youtube.com/user/NBCNews/videos
          It took 7683.585577874001 seconds to write all 32347 videos to NBCNews_reverse_chronological_videos_list.txt
          It took 7683.592947972 seconds to write all 32347 videos to NBCNews_reverse_chronological_videos_list.md
          It took 7684.030176524999 seconds to write all 32347 videos to NBCNews_reverse_chronological_videos_list.csv
          This program took 11086.336240618999 seconds to complete.
          ```
        - to update the files:
          ```
          It took 0.8738655359993572 seconds to find 60 videos from https://www.youtube.com/user/NBCNews/videos
          It took 1.8775347520004289 seconds to write the 0 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.csv
          It took 2.120259861001614 seconds to write the 0 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.txt
          It took 2.132926509999379 seconds to write the 0 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.md
          This program took 9.435579917999348 seconds to complete.
          ```
      - writing to 3 files by loading video information into memory THEN writing to files with csv=True, txt=True, md=True
        - to create the files:
          ```
          It took 3478.1540728540003 seconds to find 32353 videos from https://www.youtube.com/user/NBCNews/videos
          It took 5022.493407319 seconds to load information for 32353 videos into memory
          It took 0.5065521739998076 seconds to write the 6 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.csv
          It took 0.587243801997829 seconds to write all 32353 videos to NBCNews_reverse_chronological_videos_list.txt
          It took 0.6058889249979984 seconds to write all 32353 videos to NBCNews_reverse_chronological_videos_list.md
          This program took 8507.703900004002 seconds to complete.
          ```
        - to update the files:
          ```
          It took 0.8569685050024418 seconds to find 60 videos from https://www.youtube.com/user/NBCNews/videos
          It took 1.1060196290018212 seconds to load information for 60 videos into memory
          It took 0.5880495099991094 seconds to write the 4 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.csv
          It took 0.8386826800015115 seconds to write the 4 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.txt
          It took 0.8496009250011411 seconds to write the 4 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.md
          This program took 9.45503293100046 seconds to complete.
          ```
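
A minimal sketch of how calling code might adapt to the new return shape described under **BREAKING CHANGE**. The helper `summarize_result` and the sample values are hypothetical and only illustrate unpacking; the tuple layout and the dummy row `[[0, '', '', '']]` come from the notes in this message, and no actual scrape is performed here.

```python
from typing import List, Tuple

# Dummy payload documented in this release (returned when video_data_returned is False).
DUMMY_VIDEO_DATA = [[0, '', '', '']]


def summarize_result(result: Tuple[List[list], Tuple[str, str]]) -> dict:
    """Unpack a value shaped like the 0.6.1 create_list_for() return (hypothetical helper)."""
    video_data, (channel_name, file_name) = result
    is_dummy = video_data == DUMMY_VIDEO_DATA
    return {
        'channel_name': channel_name,
        # An empty string means no file was written (txt, csv, and md all disabled).
        'file_name': file_name or None,
        # The documented dummy row means actual video data was not requested.
        'has_video_data': not is_dummy,
        'video_count': 0 if is_dummy else len(video_data),
    }


# Stand-in for what create_list_for() is described as returning with default settings.
sample = ([[0, '', '', '']], ('The Channel Name', 'the_name_of_the_file'))
summary = summarize_result(sample)
print(summary)
```

Code written against the previous release, which treated the return value as the file name, would instead read the file name from the second element of the inner `tuple`.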