This project looks good, but somehow I can't get it to work at all:
markdown-crawler --debug -b test "https://www.hooplaimpro.com/improv-encyclopedia.html"
DEBUG:markdown_crawler:Debugging enabled
INFO:markdown_crawler:Crawling https://www.hooplaimpro.com/improv-encyclopedia.html at depth 3 with 5 threads
DEBUG:markdown_crawler:Crawling: https://www.hooplaimpro.com/improv-encyclopedia.html
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.hooplaimpro.com:443
DEBUG:markdown_crawler:Started thread 1 of 5
DEBUG:markdown_crawler:Started thread 2 of 5
DEBUG:markdown_crawler:Started thread 3 of 5
DEBUG:markdown_crawler:Started thread 4 of 5
DEBUG:markdown_crawler:Started thread 5 of 5
DEBUG:urllib3.connectionpool:https://www.hooplaimpro.com:443 "GET /improv-encyclopedia.html HTTP/1.1" 200 None
INFO:markdown_crawler:Created improv-encyclopedia-html.md
/home/urko/.virtualenvs/markdown-crawler/lib/python3.12/site-packages/markdown_crawler/__init__.py:201: UserWarning: Ignoring nested list ['body'] to avoid the possibility of infinite recursion.
for target in soup.find_all(target_links):
DEBUG:markdown_crawler:Found 0 child URLs
INFO:markdown_crawler:All threads have finished
A different crawler was able to follow all the links on that page with no problem.
What am I missing?
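For what it's worth, the nested-list warning in the log looks like standard BeautifulSoup behavior rather than anything specific to this page: when find_all() is given a filter list that itself contains a list, it skips that entry, which leaves nothing to match and could explain the "Found 0 child URLs" line. A minimal sketch of that behavior (plain bs4, not markdown-crawler's actual code path):

```python
from bs4 import BeautifulSoup

html = '<html><body><a href="/a.html">a</a><a href="/b.html">b</a></body></html>'
soup = BeautifulSoup(html, 'html.parser')

# A flat list of tag names works as expected.
print(len(soup.find_all(['body'])))    # 1

# A list nested inside the filter list triggers
# "UserWarning: Ignoring nested list ['body'] to avoid the possibility of
# infinite recursion." and matches nothing.
print(len(soup.find_all([['body']])))  # 0
```

If that is what is happening here, the target_links value may be getting wrapped in an extra list somewhere between the CLI arguments and the soup.find_all(target_links) call shown in the warning.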