SeaSalt Downloader is a modular downloader for any site you can write a module for.
- Supports custom websites
- Supports custom filters
- Supports custom saving methods
- Synchronous and asynchronous downloads
- Scrapers
  - Danbooru (`danbooru`)
  - Safebooru (`safebooru`)
- Filters
  - No Filter (`no_filter`)
  - Tag filter (`tag_filter`)
- Preprocessors
  - Crop to aspect ratio (`crop_aspect`)
  - Crop to aspect ratio and resize (`crop_aspect_resize`)
  - Resize image by stretching (`resize`)
- Savers
  - Save to folder (`folder`)
Basic usage (the per-module args are optional):

```
python main.py -u <url> \
    --scraper <scraper name> [scraper args] \
    --filter <filter name> [filter args] \
    --saver <saver name> [saver args]
```
For example, to scrape all images of Shirakami Fubuki from Safebooru, filtering out any posts that have the `1girl` tag, and saving to a folder called `Fubuki with friends` while discarding metadata:
```
python main.py -u "https://safebooru.org/index.php?page=post&s=list&tags=shirakami_fubuki+" \
    --scraper safebooru \
    --filter tag_filter 1girl \
    --saver folder "Fubuki with friends"
```
SeaSalt also supports a powerful preprocessing feature. For example, you can crop all of your images to a square with

```
--preproc crop_aspect 1 1
```

The first number is the aspect ratio width, and the second the height. For example, you could crop to 16:9 with

```
--preproc crop_aspect 16 9
```

There is also an option to resize images after cropping. The following crops and resizes to a 512x512 square:

```
--preproc crop_aspect_resize 1 1 512
```

Here 512 is the target width; the height is calculated from the width and the aspect ratio.
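For instance, assuming `--preproc` can simply be appended to a normal invocation, the Safebooru command above could crop and resize everything to 512x512 before saving:

```
python main.py -u "https://safebooru.org/index.php?page=post&s=list&tags=shirakami_fubuki+" \
    --scraper safebooru \
    --filter tag_filter 1girl \
    --saver folder "Fubuki with friends" \
    --preproc crop_aspect_resize 1 1 512
```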
SeaSalt supports parallel downloading. To use it, pass the `--parallel` argument.
By default, the batch size (the number of pages to download before distributing the tasks among the workers) is 10, and the number of threads is equal to the number of CPU cores.
You can change these with the `--batch_size` and `--threads` arguments.
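For illustration, a parallel run with an assumed batch size of 20 and 8 worker threads might look like this (the URL placeholder and the folder name are examples only):

```
python main.py -u <url> \
    --scraper safebooru \
    --filter no_filter \
    --saver folder "downloads" \
    --parallel --batch_size 20 --threads 8
```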
Each scraper must be a Python file that contains a class Scraper.
The Scraper class must implement the following methods:
```python
class Scraper:
    def get_posts(self, url, parallel, args):
        # url is a page containing posts
        # parallel is whether the user is downloading in parallel
        # args are the optional arguments provided by the user
        # Must return a list of URLs to posts on the page
        ...

    def get_post(self, url, parallel, args):
        # url is a URL to a post
        # parallel is whether the user is downloading in parallel
        # args are the optional arguments provided by the user
        # Return a tuple in which t[0] is a stream to the image
        # and t[1] is the image metadata.
        #
        # The metadata must be a dictionary containing the following items:
        #   'image_name', the file name (no extension)
        #   'ext', the file extension
        #   'tags', the tags for the image
        # Other metadata can be added to the dictionary, but it is not
        # guaranteed to be supported by existing modules
        ...

    def next_page(self, url, parallel, args):
        # url is a URL to the current page
        # parallel is whether the user is downloading in parallel
        # args are the optional arguments provided by the user
        # Return a URL to the next page
        ...
```

You are free to implement variables, imports, and other functions in your Scraper class as well.
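As an illustration, here is a minimal sketch of what a Scraper might look like. The `example.com` endpoints, the JSON field names, and the `page=` query parameter are assumptions made purely for this example and do not correspond to a real site or to the built-in scrapers.

```python
# Hypothetical scraper sketch. The example.com API, its JSON fields, and the
# page= query parameter are illustrative assumptions, not a real site.
import io
import requests


class Scraper:
    def get_posts(self, url, parallel, args):
        # url is a listing page; return the URLs of the posts it contains
        data = requests.get(url).json()
        return [post["post_url"] for post in data["posts"]]

    def get_post(self, url, parallel, args):
        # Fetch one post, download its image, and build the metadata dict
        post = requests.get(url).json()
        image = io.BytesIO(requests.get(post["file_url"]).content)
        meta = {
            "image_name": str(post["id"]),   # file name without extension
            "ext": post["file_ext"],         # e.g. "png"
            "tags": post["tags"].split(),    # tags for the image
        }
        return image, meta

    def next_page(self, url, parallel, args):
        # Naive pagination: bump an assumed trailing page= query parameter
        base, sep, page = url.rpartition("page=")
        return f"{base}{sep}{int(page) + 1}" if sep else url
```

A real scraper would add error handling and match the target site's actual API and pagination scheme.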
Each filter must be a Python file that contains a class Filt.
The Filt class must implement a method named filt:
```python
class Filt:
    def filt(self, image, meta, args):
        # image is an image stream
        # meta is the image metadata (as defined in the scraper section)
        # args are the optional arguments provided by the user
        # Return a tuple containing (image, meta),
        # or return (None, None) to filter out the image
        ...
```

You are free to implement variables, imports, and other functions in your Filt class as well.
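For example, a minimal Filt in the spirit of the built-in tag filter might look like this (the exact behaviour of `tag_filter` is not documented here, so this is only a sketch):

```python
# Minimal filter sketch: drop any image that has one of the tags passed
# as optional arguments (similar in spirit to the built-in tag_filter).
class Filt:
    def filt(self, image, meta, args):
        blocked = set(args)                 # e.g. ["1girl"]
        if blocked & set(meta["tags"]):     # any blocked tag present?
            return None, None               # filter the image out
        return image, meta                  # keep the image unchanged
```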
Each saver must be a Python file that contains a class Saver.
Your Saver class must implement a method named save:
```python
class Saver:
    def save(self, image, meta, args):
        # image is a stream to the image
        # meta is the image metadata (as defined in the scraper section)
        # args are the optional arguments provided by the user
        ...
```

You are free to implement variables, imports, and other functions in your Saver class as well.
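As a sketch, a Saver that writes each image into a folder named by the first optional argument could look like the following. The `"downloads"` fallback folder name is an assumption for this example, and this is not necessarily how the built-in `folder` saver is implemented.

```python
import os


# Minimal saver sketch: write each image into a folder given as the first
# optional argument; the "downloads" fallback is an assumption for this example.
class Saver:
    def save(self, image, meta, args):
        folder = args[0] if args else "downloads"
        os.makedirs(folder, exist_ok=True)
        path = os.path.join(folder, f"{meta['image_name']}.{meta['ext']}")
        with open(path, "wb") as f:
            f.write(image.read())  # image is a stream, so read its bytes out
```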
Each preprocessor must be a Python file that contains a class Processor.
Your Processor class must implement a method named process:
```python
class Processor:
    def process(self, image, meta, args):
        # image is a stream to the image
        # meta is the image metadata (as defined in the scraper section)
        # args are the optional arguments provided by the user
        # Must return a tuple containing a stream to the processed image
        # and the meta: (stream, meta)
        ...
```

You are free to implement variables, imports, and other functions in your Processor class as well.
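As an illustration, a Processor that stretches every image to a width and height taken from the optional arguments might look like this. Pillow is an assumed dependency for this sketch only; the built-in preprocessors may be implemented differently.

```python
import io

from PIL import Image


# Minimal preprocessor sketch: stretch the image to the width and height given
# as optional arguments, e.g. --preproc <name> 512 512. Pillow is an assumed
# dependency for this example.
class Processor:
    def process(self, image, meta, args):
        width, height = int(args[0]), int(args[1])
        img = Image.open(image).resize((width, height))
        out = io.BytesIO()
        # Pillow calls the JPEG format "JPEG", not "JPG"
        fmt = "JPEG" if meta["ext"].lower() in ("jpg", "jpeg") else meta["ext"].upper()
        img.save(out, format=fmt)
        out.seek(0)
        return out, meta
```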
Modules
- Zerochan scraper
- Pixiv scraper
- Sankaku scraper