0.6.1: Change create_list_for() return, add features & improvements
- **BREAKING CHANGE**
  - BEFORE:
    - `create_list_for()` returned a `str` containing the name of the file the program wrote to
  - NOW:
    - `create_list_for()` returns a `tuple` containing
      - a `list` of `list`s containing the video information found by the program for the current run
        - by default, returns dummy video data to avoid cluttering the output
        - to return the actual video data, set the `video_data_returned` ListCreator attribute to `True`
          - dummy data: `[[0, '', '', '']]`
      - a `tuple` containing a `str` with the name of the channel (taken from the channel's heading) and a `str` with the name of the file written to
        - `('The Channel Name', 'the_name_of_the_file')`
        - `('The Channel Name', '')` if the ListCreator attributes are `txt=False`, `csv=False`, `md=False`, AND `video_data_returned=True`
    - see the **NEW FEATURES** section below for more details about `video_data_returned`
  - access the full documentation for the updated `create_list_for` method with `help(ListCreator.create_list_for)` in the Python interpreter
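
A minimal sketch of how calling code might adapt to the new return shape. It does not call the library: `result` is a stand-in built from the dummy data and example names quoted above, and `has_real_video_data` is a hypothetical helper, not part of the package.

```python
# Stand-in for the value create_list_for() returns in 0.6.1, built from the
# dummy video data and the example names quoted in the notes above.
# With the package installed, the value would come from something like
#   result = ListCreator(video_data_returned=True).create_list_for(url='...')
# (passing the attribute as a constructor keyword is an assumption here).
result = ([[0, '', '', '']], ('The Channel Name', 'the_name_of_the_file'))

# 0.6.0 callers received a plain str (the file name).
# 0.6.1 callers unpack a tuple whose second item is itself a tuple.
video_data, (channel_name, file_name) = result


def has_real_video_data(video_data):
    """Return False when only the documented dummy row came back."""
    return video_data != [[0, '', '', '']]


print(channel_name, file_name, has_real_video_data(video_data))
```

Code written against 0.6.0 that used the return value directly as a file name would need to read `file_name` from the nested tuple instead.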

- **BUGFIX**
  - fixes `cookie_consent` blocking logic for new HTML in GDPR regions
    - YouTube updated the HTML formatting for blocking cookie consent, and the previous cookie consent blocking logic broke
    - this release fixes the blocking logic to work with the new HTML formatting

- **NEW FEATURES**
  - overview of the new ListCreator attributes; for the full documentation, run `help(ListCreator)` in the Python interpreter or read the "More API information" section of the Python README:
    - `file_suffix` allows more control over the file naming (`True` by default)
    - `all_video_data_in_memory` scrapes the ENTIRE YouTube channel's videos page, EVEN if files exist for the channel already (`False` by default)
      - must also set the `video_data_returned` attribute to `True` to actually get this information
    - `video_data_returned` returns the video data for all videos the program scraped (`False` by default)
      - data returned depends on a number of factors, see full documentation for more details
    - `video_id_only` saves only the video ID instead of the entire URL (`False` by default)
      - example: saves 'abcdefghijk' instead of 'https://www.youtube.com/watch?v=abcdefghijk'
  - overview of the updated `file_name` argument options for the `create_list_for` method; for the full documentation, run `help(ListCreator.create_list_for)` in the Python interpreter:
    - `file_name='auto'` names the output file(s) using the name that shows up under the banner when you navigate to the channel's homepage (with spaces removed)
    - `file_name='id'` names the output file(s) using the identifier from the URL provided to the `url` argument
      - run `help(ListCreator.create_list_for)` for a comprehensive list of examples
      - using `file_name='id'` is very useful when multiple channels have the SAME channel name
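
The effect of `video_id_only` and `file_name='auto'` can be illustrated with a short sketch. Both functions below are hypothetical helpers written for illustration with the standard library; they are not the package's implementation, and the channel name `'Corey Schafer'` is an assumed example.

```python
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url):
    """Return the value of the `v` query parameter of a watch URL."""
    query = parse_qs(urlparse(url).query)
    return query['v'][0]


def auto_file_name(channel_name):
    """Remove spaces from a channel name, as described for file_name='auto'."""
    return channel_name.replace(' ', '')


print(video_id_from_url('https://www.youtube.com/watch?v=abcdefghijk'))
print(auto_file_name('Corey Schafer'))
```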

- **PERFORMANCE IMPROVEMENTS**
  - BEFORE:
    - the program pulled the video data from the selenium instance and wrote to the file(s) directly
  - NOW:
    - the program loads the video data from the selenium instance into memory, THEN writes the saved video data from memory to the file(s)
      - the performance improvement is more noticeable when writing more information
        - for example:
          - writing information for 200 videos to just a csv file: negligible performance difference between writing to csv file directly and loading to memory & THEN writing to csv file
          - writing information for 200 videos to csv, txt, md files: slight performance difference between writing to files directly and loading to memory & THEN writing to files, but still not much of a performance difference
          - writing information for 20000 videos to just a csv file: noticeable performance difference between writing to csv file directly and loading to memory & THEN writing to csv file
          - writing information for 20000 videos to csv, txt, md files: significant performance difference between writing to files directly and loading to memory & THEN writing to files
        - summary:
          - the performance difference between writing to ONE file directly and loading to memory & THEN writing to ONE file is barely noticeable for small jobs and more noticeable for larger jobs
          - the performance difference between writing to MULTIPLE files directly and loading to memory & THEN writing to MULTIPLE files is more noticeable for small jobs (compared to writing to only ONE file) and SIGNIFICANT for larger jobs
  - logs from tests used to benchmark performance included below:
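
The two strategies described above can be sketched as follows. This assumes the expensive step is reading each video's information from the browser; `read_video` is a hypothetical stand-in for that lookup, and all output files use CSV formatting to keep the sketch short.

```python
import csv
import os
import tempfile


def read_video(index):
    """Hypothetical stand-in for an expensive per-video lookup in the browser."""
    read_video.calls += 1
    return [index, f'title {index}', f'https://www.youtube.com/watch?v=id{index:08d}']


read_video.calls = 0


def write_directly(paths, count):
    """Read every video again for each output file."""
    for path in paths:
        with open(path, 'w', newline='', encoding='utf-8') as file:
            writer = csv.writer(file)
            for index in range(count):
                writer.writerow(read_video(index))


def load_then_write(paths, count):
    """Read every video once into memory, then write the rows to each file."""
    rows = [read_video(index) for index in range(count)]
    for path in paths:
        with open(path, 'w', newline='', encoding='utf-8') as file:
            csv.writer(file).writerows(rows)


with tempfile.TemporaryDirectory() as folder:
    paths = [os.path.join(folder, f'videos_{n}.csv') for n in range(3)]

    read_video.calls = 0
    write_directly(paths, 230)
    direct_calls = read_video.calls

    read_video.calls = 0
    load_then_write(paths, 230)
    memory_calls = read_video.calls

print(direct_calls, memory_calls)
```

In this sketch the number of expensive reads grows with the number of output files in the direct approach but stays fixed in the load-then-write approach, which is consistent with the gap widening when more files are written.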

- > for https://www.youtube.com/user/schafer5 (small channel, 230 videos)

- writing to 1 file directly with csv=True, txt=False, md=False
- to create the file:
```
It took 9.240757292005583            seconds to find 230 videos from https://www.youtube.com/user/schafer5/videos
It took 4.265756259999762            seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.csv
This program took 19.537945401003526 seconds to complete.
```
- to update the file:
```
It took 0.8453300589972059          seconds to find 60 videos from https://www.youtube.com/user/schafer5/videos
It took 0.6392399440010195          seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.csv
This program took 7.754261410002073 seconds to complete.
```

- writing to 1 file by loading video information into memory THEN writing to file with csv=True, txt=False, md=False
- to create the file:
```
It took 9.163404727999989            seconds to find 230 videos from https://www.youtube.com/user/schafer5/videos
It took 4.260267737000007            seconds to load information for 230 videos into memory
It took 0.002389371999996115         seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.csv
This program took 19.483281371000004 seconds to complete.
```
- to update the file:
```
It took 0.8521808300000089          seconds to find 60 videos from https://www.youtube.com/user/schafer5/videos
It took 1.0964175420000117          seconds to load information for 60 videos into memory
It took 0.0015745449999826633       seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.csv
This program took 7.985743492000012 seconds to complete.
```

- writing to 3 files directly with csv=True, txt=True, md=True
- to create the files:
```
It took 9.166668037003546            seconds to find 230 videos from https://www.youtube.com/user/schafer5/videos
It took 10.160974278995127           seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.txt
It took 10.164936708999448           seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.csv
It took 10.168633003995637           seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.md
This program took 25.594990328005224 seconds to complete.
```
- to update the files:
```
It took 0.8503098270011833          seconds to find 60 videos from https://www.youtube.com/user/schafer5/videos
It took 1.5225159670007997          seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.csv
It took 1.5322243859991431          seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.txt
It took 1.5359413480036892          seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.md
This program took 8.472728426997492 seconds to complete.
```

- writing to 3 files by loading video information into memory THEN writing to files with csv=True, txt=True, md=True
- to create the files:
```
It took 9.367390958000005      seconds to find 230 videos from https://www.youtube.com/user/schafer5/videos
It took 4.218187391999997      seconds to load information for 230 videos into memory
It took 0.003894963000000473   seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.md
It took 0.005060710999998719   seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.csv
It took 0.006283445999997639   seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.txt
This program took 18.754924324 seconds to complete.
```
- to update the files:
```
It took 0.8672965029999986          seconds to find 60 videos from https://www.youtube.com/user/schafer5/videos
It took 1.0901944209999996          seconds to load information for 60 videos into memory
It took 0.005667658999996661        seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.csv
It took 0.008393589000000645        seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.txt
It took 0.008197031000001687        seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.md
This program took 8.090583961999997 seconds to complete.
```

- > for https://www.youtube.com/c/KhanAcademy (medium channel, 8095 videos)

- writing to 1 file directly with csv=True, txt=False, md=False
- to create the file:
```
It took 322.72226654399856          seconds to find 8095 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 256.63442500399833          seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.csv
This program took 585.4076739919983 seconds to complete.
```
- to update the file:
```
It took 0.8482559289986966          seconds to find 60 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 0.5600300389996846          seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.csv
This program took 7.653723870003887 seconds to complete.
```

- writing to 1 file by loading video information into memory THEN writing to file with csv=True, txt=False, md=False
- to create the file:
```
It took 316.9717323640002       seconds to find 8095 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 248.92245618300012      seconds to load information for 8095 videos into memory
It took 0.07691853599999376     seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.csv
This program took 572.114162118 seconds to complete.
```
- to update the file:
```
It took 0.8459371520000332          seconds to find 60 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 0.9670944140000302          seconds to load information for 60 videos into memory
It took 0.02941359300007207         seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.csv
This program took 8.209143252000104 seconds to complete.
```

- writing to 3 files directly with csv=True, txt=True, md=True
- to create the files:
```
It took 314.01985485899786          seconds to find 8095 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 519.1903085960002           seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.txt
It took 519.1941804189992           seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.csv
It took 519.197644068001            seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.md
This program took 839.4073893879977 seconds to complete.
```
- to update the files:
```
It took 0.8488957250010571          seconds to find 60 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 1.580211615000735           seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.csv
It took 1.681963879003888           seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.txt
It took 1.6842712280049454          seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.md
This program took 8.823843261001457 seconds to complete.
```

- writing to 3 files by loading video information into memory THEN writing to files with csv=True, txt=True, md=True
- to create the files:
```
It took 316.342601403           seconds to find 8095 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 261.87072707100003      seconds to load information for 8095 videos into memory
It took 0.1363127509999913      seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.csv
It took 0.1775351439999895      seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.md
It took 0.18588107000005039     seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.txt
This program took 584.703847726 seconds to complete.
```
- to update the files:
```
It took 0.8483775499998956          seconds to find 60 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 1.0671216570001434          seconds to load information for 60 videos into memory
It took 0.17331316700006028         seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.csv
It took 0.22995445900005507         seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.txt
It took 0.23345572800008085         seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.md
This program took 8.503321469999833 seconds to complete.
```

- > for https://www.youtube.com/user/NBCNews/videos (large channel, ~32350 videos)

- writing to 1 file directly with csv=True, txt=False, md=False
- to create the file:
```
It took 3420.0639533489993          seconds to find 32347 videos from https://www.youtube.com/user/NBCNews/videos
It took 4988.648231769999           seconds to write all 32347 videos to NBCNews_reverse_chronological_videos_list.csv
This program took 8414.909623333002 seconds to complete.
```
- to update the file:
```
```

- writing to 1 file by loading video information into memory THEN writing to file with csv=True, txt=False, md=False
- to create the file:
```
It took 3367.386001154002       seconds to find 32357 videos from https://www.youtube.com/user/NBCNews/videos
It took 4880.191474030002       seconds to load information for 32357 videos into memory
It took 0.24478799300050014     seconds to write all 32357 videos to NBCNews_reverse_chronological_videos_list.csv
This program took 8253.73690525 seconds to complete.
```
- to update the file:
```
It took 0.8474488579995523          seconds to find 60 videos from https://www.youtube.com/user/NBCNews/videos
It took 1.1012943870009622          seconds to load information for 60 videos into memory
It took 0.11654774600174278         seconds to write the 5 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.csv
This program took 8.668505469999218 seconds to complete.
```

- writing to 3 files directly with csv=True, txt=True, md=True
- to create the files:
```
It took 3396.025502143               seconds to find 32347 videos from https://www.youtube.com/user/NBCNews/videos
It took 7683.585577874001            seconds to write all 32347 videos to NBCNews_reverse_chronological_videos_list.txt
It took 7683.592947972               seconds to write all 32347 videos to NBCNews_reverse_chronological_videos_list.md
It took 7684.030176524999            seconds to write all 32347 videos to NBCNews_reverse_chronological_videos_list.csv
This program took 11086.336240618999 seconds to complete.
```
- to update the files:
```
It took 0.8738655359993572          seconds to find 60 videos from https://www.youtube.com/user/NBCNews/videos
It took 1.8775347520004289          seconds to write the 0 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.csv
It took 2.120259861001614           seconds to write the 0 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.txt
It took 2.132926509999379           seconds to write the 0 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.md
This program took 9.435579917999348 seconds to complete.
```

- writing to 3 files by loading video information into memory THEN writing to files with csv=True, txt=True, md=True
- to create the files:
```
It took 3478.1540728540003          seconds to find 32353 videos from https://www.youtube.com/user/NBCNews/videos
It took 5022.493407319              seconds to load information for 32353 videos into memory
It took 0.5065521739998076          seconds to write the 6 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.csv
It took 0.587243801997829           seconds to write all 32353 videos to NBCNews_reverse_chronological_videos_list.txt
It took 0.6058889249979984          seconds to write all 32353 videos to NBCNews_reverse_chronological_videos_list.md
This program took 8507.703900004002 seconds to complete.
```
- to update the files:
```
It took 0.8569685050024418         seconds to find 60 videos from https://www.youtube.com/user/NBCNews/videos
It took 1.1060196290018212         seconds to load information for 60 videos into memory
It took 0.5880495099991094         seconds to write the 4 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.csv
It took 0.8386826800015115         seconds to write the 4 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.txt
It took 0.8496009250011411         seconds to write the 4 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.md
This program took 9.45503293100046 seconds to complete.
```
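
As a rough reading of the logs above, the sketch below computes the relative reduction in total run time when creating all three files. The totals are rounded from the "This program took ..." lines; each configuration appears to have been run once, so the percentages are indicative only.

```python
# (direct, load-into-memory-then-write) total seconds for creating csv, txt,
# and md files, rounded from the "This program took ..." lines in the logs.
totals = {
    'schafer5': (25.59, 18.75),
    'KhanAcademy': (839.41, 584.70),
    'NBCNews': (11086.34, 8507.70),
}


def relative_reduction(direct, in_memory):
    """Fraction of the direct-write run time saved by loading into memory first."""
    return (direct - in_memory) / direct


for channel, (direct, in_memory) in totals.items():
    print(f'{channel}: {relative_reduction(direct, in_memory):.1%} less total time')
```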
shailshouryya committed Sep 7, 2021
1 parent 65d925b commit f8ca4a6
Showing 5 changed files with 5 additions and 5 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -1,7 +1,7 @@
# General Overview

#### See the [releases](https://github.com/slow-but-steady/yt-videos-list/releases) page to see new additions/modifications for each release!
-#### See this [comparison](https://github.com/slow-but-steady/yt-videos-list/compare/v0.6.0...main) page to see new additions/modifications that will be available in the NEXT release!
+#### See this [comparison](https://github.com/slow-but-steady/yt-videos-list/compare/v0.6.1...main) page to see new additions/modifications that will be available in the NEXT release!

<details>
<summary><b>See sister <a href="https://github.com/slow-but-steady/YouTube-Channels">YouTube-Channels</a> repository for a list of interesting channels!</b></summary></h3>
2 changes: 1 addition & 1 deletion python/README.md
@@ -1,7 +1,7 @@
# Python Quick Start

#### See the [releases](https://github.com/slow-but-steady/yt-videos-list/releases) page to see new additions/modifications for each release!
-#### See this [comparison](https://github.com/slow-but-steady/yt-videos-list/compare/v0.6.0...main) page to see new additions/modifications that will be available in the NEXT release!
+#### See this [comparison](https://github.com/slow-but-steady/yt-videos-list/compare/v0.6.1...main) page to see new additions/modifications that will be available in the NEXT release!

<details>
<summary><b>See sister <a href="https://github.com/slow-but-steady/YouTube-Channels">YouTube-Channels</a> repository for a list of interesting channels!</b></summary></h3>
2 changes: 1 addition & 1 deletion python/dev/__init__.py
@@ -15,7 +15,7 @@
from .custom_logger import log


-__version__ = '0.6.0'
+__version__ = '0.6.1'
__author__ = 'slow-but-steady'
__email__ = '[email protected]'
__development_status__ = '4 - Beta'
2 changes: 1 addition & 1 deletion python/setup.py
@@ -11,7 +11,7 @@

setup(
name = 'yt_videos_list',
-version = '0.6.0',
+version = '0.6.1',
description = 'YouTube bot to make a YouTube videos list (including all video titles and URLs uploaded by a channel) with end-to-end web scraping - no API tokens required. 🌟 Star this repo if you found it useful! 🌟',
long_description = long_description,
long_description_content_type = 'text/markdown',
2 changes: 1 addition & 1 deletion python/yt_videos_list/__init__.py
@@ -15,7 +15,7 @@
from .custom_logger import log


-__version__ = '0.6.0'
+__version__ = '0.6.1'
__author__ = 'slow-but-steady'
__email__ = '[email protected]'
__development_status__ = '4 - Beta'
