Need technical guidance on customizing scrapers #6520
Unanswered · Sichongzou asked this question in Q&A
Replies: 2 comments 1 reply
This is a screenshot of the log.

I moved from Jellyfin to Stash and went through every scraper Stash supports, but none of them satisfied me, so I plan to write my own: a Python aggregating scraper that fetches data from multiple sites at once and merges it into a single result.

Because there is no Chinese documentation, I could only work from a translation. I skimmed the examples in the community scrapers repository and, following the translated docs, put together a rough version.
This is my main .yml file:
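The file itself is not shown above; for reference, a script-based `sceneByFragment` scraper definition typically has roughly this shape (the scraper name and arguments here are guesses based on the description, not the actual file):

```yaml
name: Sichongzou Aggregator
sceneByFragment:
  action: script
  script:
    - python
    - sichongzou_Metatube.py
    # extra arguments are passed through to the script, so it can tell
    # which operation Stash is invoking
    - sceneByFragment
```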
Then I created sichongzou_Metatube.py in the same directory and imported py_common from the community repository. My approach is simple: my local directories are already well organised, so I take the local file's name directly as the lookup key to fetch data, similar to:
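The original snippet is elided above; a minimal sketch of what such an entry point could look like, assuming Stash passes the known scene fields as JSON on stdin (the filename pattern and the `code_from_title` helper are illustrative assumptions, not the actual script):

```python
import json
import re
import sys

def code_from_title(title: str) -> str:
    """Pull a site code like 'ABC-123' out of an organised filename.
    (Hypothetical pattern -- adapt to your own naming scheme.)"""
    m = re.search(r"[A-Za-z]+-\d+", title)
    return m.group(0).upper() if m else title

def scene_by_fragment(fragment: dict) -> dict:
    # For local files the 'title' field normally carries the file name.
    code = code_from_title(fragment.get("title", ""))
    # ...query each site with `code` here and merge the results...
    return {"title": code}

if __name__ == "__main__":
    # Stash writes the scene fragment to stdin and reads the scraped
    # scene back from stdout, both as JSON.
    fragment = json.loads(sys.stdin.read())
    print(json.dumps(scene_by_fragment(fragment)))
```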


The sceneByFragment method returns a ScrapedScene, i.e.:
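The structure itself is elided above; for reference, the scene printed on stdout is a plain JSON object whose keys mirror ScrapedScene (all values below are placeholders). One thing worth noting for the error described next: the `image` field must be something Stash can resolve, i.e. a fetchable http(s) URL or a well-formed `data:` URI, not a bare Base64 string. A sketch, with a hypothetical helper for building the data URI:

```python
import base64
import json

def to_data_uri(raw: bytes, mime: str = "image/jpeg") -> str:
    # Wrap raw image bytes as a data URI; a bare Base64 string is not a
    # URL, which is one plausible source of an
    # 'unsupported protocol scheme' error.
    return f"data:{mime};base64," + base64.b64encode(raw).decode()

scene = {
    "title": "ABC-123",
    "details": "Description merged from several sites",
    "date": "2024-01-01",
    "studio": {"name": "Example Studio"},
    "performers": [{"name": "Performer A"}],
    "tags": [{"name": "Tag A"}],
    "image": to_data_uri(b"\xff\xd8"),  # cover image as a data URI
}

print(json.dumps(scene))
```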
Then I put both files into Stash's scraper directory and ran them, and it works perfectly, except that my log keeps reporting "Could not set image using URL" and unsupported protocol scheme "data".
Also, each scrape prints the Base64 I pass in the image field twice, at warning level. The Base64 is so large that the log page freezes every time I open it. Scraping itself works fine, though; I don't know where the problem is.
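On the log flooding: Stash reads scraper log messages from the script's stderr, one line per message, with the level encoded in a `\x01` prefix (this is how py_common's log module emits them, as far as I can tell). If the whole returned scene, Base64 image included, is ever written there at warning level, the log page has to render the entire payload. A sketch of logging the scene with the image redacted first (the helper names are my own, not part of py_common):

```python
import sys

def log_warning(msg: str) -> None:
    # Stash scraper log protocol as used by py_common's log module:
    # each stderr line starts with \x01 + a level character + \x01
    # ('w' = warning).
    print(f"\x01w\x01{msg}", file=sys.stderr)

def redact_image(scene: dict) -> dict:
    """Copy of the scene that is safe to log: the huge Base64 image
    payload is replaced by a short placeholder."""
    out = dict(scene)
    if "image" in out:
        out["image"] = f"<image, {len(scene['image'])} chars omitted>"
    return out
```

It may also be worth checking whether anything in the script (or a debug hook) prints the full scene to stderr; redacting or redirecting that output should unfreeze the log page.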