`🐍Website_login_mode`

"Did you know all your doors were locked?" - Riddick (The Chronicles of Riddick)

Created by CriseLYJ


I have collected the login methods of a number of major websites, along with some crawlers for them. Some log in through Selenium, some simulate the login directly by replaying captured packets, and some use Scrapy. I hope this helps beginners. The project is for researching and sharing the simulated login flows of large websites, plus the related crawlers, and I will keep updating it.

Simulated login to some common websites, plus crawlers that collect the corresponding information.

About

Logins are done either directly (by replaying the HTTP requests) or with Selenium + WebDriver. Some websites, such as QQ Zone and Bilibili, are very difficult to log in to directly but relatively easy with Selenium.
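For the direct approach, the usual workflow is to capture the login request in the browser's developer tools and replay it from code. Below is a minimal standard-library sketch, assuming a hypothetical endpoint `https://example.com/login` and hypothetical form fields `username` and `password`. Real sites usually add CSRF tokens, client-side password encryption, or CAPTCHAs on top of this, and that is why Selenium is the easier route for sites like QQ Zone.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint: in practice, copy the URL and form fields from the
# login request captured in the browser's Network panel.
LOGIN_URL = "https://example.com/login"


def build_login_request(username: str, password: str) -> urllib.request.Request:
    """Rebuild a captured login POST so it can be replayed without a browser."""
    form = urllib.parse.urlencode({"username": username, "password": password})
    headers = {
        # Many sites reject requests that do not look like they come from a browser.
        "User-Agent": "Mozilla/5.0",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return urllib.request.Request(
        LOGIN_URL, data=form.encode("utf-8"), headers=headers, method="POST"
    )


if __name__ == "__main__":
    req = build_login_request("alice", "secret")
    print(req.get_method(), req.full_url)
    print(req.data)
    # To actually send it: urllib.request.urlopen(req)
```

The request is only built here, not sent, so the sketch can be inspected safely before pointing it at a real login URL.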

Even when Selenium is used for the login step, we can keep the cookies obtained after logging in and then use requests or Scrapy for the actual data collection. This keeps crawling fast.
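A minimal sketch of keeping the post-login cookies, assuming they come from Selenium's `driver.get_cookies()`, which returns a list of dicts with `name` and `value` keys. The helpers `save_cookies`, `load_cookies`, and `cookie_header` are hypothetical names, not part of this project.

```python
import json
import tempfile
from pathlib import Path


def save_cookies(cookies: list[dict], path: str) -> None:
    """Persist cookies (e.g. from Selenium's driver.get_cookies()) as JSON."""
    Path(path).write_text(json.dumps(cookies), encoding="utf-8")


def load_cookies(path: str) -> list[dict]:
    """Load saved cookies so later runs can skip the browser login."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def cookie_header(cookies: list[dict]) -> str:
    """Build a Cookie header value such as "a=1; b=2" for requests or urllib."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


if __name__ == "__main__":
    # Hypothetical cookies standing in for driver.get_cookies() output.
    sample = [{"name": "session_id", "value": "abc"}, {"name": "uid", "value": "42"}]
    path = Path(tempfile.mkdtemp()) / "cookies.json"
    save_cookies(sample, str(path))
    print(cookie_header(load_cookies(str(path))))  # session_id=abc; uid=42
```

With requests, the header can be sent as `requests.get(url, headers={"Cookie": cookie_header(cookies)})`, or each cookie can be set on a `requests.Session` with `session.cookies.set(name, value)`. Saved cookies eventually expire, at which point the Selenium login has to be run again.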

Completed

Facebook
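Scrape Twitter's front-end API without authentication
Weibo (web version)
Zhihu
QQZone
CSDN
Taobao
Baidu
Guokr
JingDong: simulated login and automatic application for JD trials
163mail
Lagou
Bilibili
Douban
Baidu2
Liepin
WeChat web: log in and fetch the friend list
Github
Crawl the corresponding images from Tuchong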

Show

Bilibili automatic login works in testing, with a success rate of 98%.

WeChat web

Tuchong spider

Taobao web

taobao.py: simulated login
The remaining files are the crawler

Github

1. Crawl Taobao's sub-categories, rank the product information by sales, and save it to MongoDB by category.
2. Analyze the data with pandas.
3. Use matplotlib to show the distribution of products across provinces, the sales ranking, a map of the distribution, etc.
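The ranking and per-province aggregation in these steps can be sketched in plain Python. The project itself stores data in MongoDB and analyzes it with pandas; this sketch uses only the standard library, and the records and field names (`category`, `sales`, `province`) are made-up assumptions, not the project's real schema.

```python
from collections import Counter, defaultdict

# Made-up sample records; field names are assumptions, not the real schema.
PRODUCTS = [
    {"title": "item-1", "category": "phone", "sales": 120, "province": "Zhejiang"},
    {"title": "item-2", "category": "phone", "sales": 300, "province": "Guangdong"},
    {"title": "item-3", "category": "snack", "sales": 50, "province": "Zhejiang"},
]


def rank_by_sales(items: list[dict]) -> dict[str, list[dict]]:
    """Group items by category, each group sorted by sales (highest first)."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for item in items:
        groups[item["category"]].append(item)
    return {
        category: sorted(group, key=lambda x: x["sales"], reverse=True)
        for category, group in groups.items()
    }


def count_by_province(items: list[dict]) -> Counter:
    """Count how many products come from each province."""
    return Counter(item["province"] for item in items)


if __name__ == "__main__":
    for category, ranked in rank_by_sales(PRODUCTS).items():
        print(category, [p["title"] for p in ranked])
    print(count_by_province(PRODUCTS).most_common())
```

In pandas the same steps are roughly `df.sort_values("sales", ascending=False).groupby("category")` and `df["province"].value_counts()`. The per-province counts are the kind of data a matplotlib bar chart or map would then plot.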

Use Guoke.spider with caution: it downloads very fast, a whole batch in 10 seconds! I won't show screenshots because I have already deleted the files; there were too many 😝
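Because the Guokr spider is so fast, one way to be gentler on the site (and less likely to trigger CAPTCHAs or bans) is to enforce a minimum delay between requests. `Throttle` below is a hypothetical helper, not part of the project. Its clock and sleep functions are injectable so it can be tested without real waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between consecutive requests."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the previous request, None before the first

    def wait(self) -> None:
        """Block until at least min_interval has passed since the last call."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


if __name__ == "__main__":
    throttle = Throttle(min_interval=0.5)
    for url in ["https://example.com/a", "https://example.com/b"]:
        throttle.wait()
        print("would fetch", url)  # the real download call would go here
```

A fixed minimum interval is the simplest policy. A random jitter or a per-domain limit are common refinements.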

Sina

sina.py: simulated login
spider: the folder containing the crawler

1. Enter the ID of the blogger to crawl and request the corresponding Ajax endpoint
2. Parse the JSON data, crawl all of the blogger's posts, and save them to MySQL
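The two steps above can be sketched as follows. This rests on several assumptions. The endpoint `https://example.com/api/container` and its `uid`/`page` parameters are hypothetical, so copy the real request from the browser's Network panel. The response shape `{"data": {"cards": [{"mblog": {...}}]}}` is also assumed. Finally, `sqlite3` stands in for MySQL so the example is self-contained; a MySQL driver uses `%s` placeholders and `INSERT IGNORE` instead.

```python
import json
import sqlite3
import urllib.parse

# Hypothetical Ajax endpoint; replace with the one seen in the browser.
AJAX_URL = "https://example.com/api/container"


def build_ajax_url(blogger_id: str, page: int) -> str:
    """Step 1: build the Ajax URL for one page of a blogger's posts."""
    query = urllib.parse.urlencode({"uid": blogger_id, "page": page})
    return f"{AJAX_URL}?{query}"


def parse_posts(raw_json: str) -> list[tuple[str, str]]:
    """Step 2: extract (post_id, text) pairs from the assumed JSON shape."""
    payload = json.loads(raw_json)
    posts = []
    for card in payload.get("data", {}).get("cards", []):
        blog = card.get("mblog")
        if blog:  # skip cards that are not posts
            posts.append((str(blog["id"]), blog.get("text", "")))
    return posts


def save_posts(conn: sqlite3.Connection, blogger_id: str, posts) -> None:
    """Store posts, ignoring ones already saved (same post id)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts (id TEXT PRIMARY KEY, blogger TEXT, text TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO posts VALUES (?, ?, ?)",
        [(pid, blogger_id, text) for pid, text in posts],
    )
    conn.commit()


if __name__ == "__main__":
    sample = '{"data": {"cards": [{"mblog": {"id": 1, "text": "hello"}}, {"card_type": 11}]}}'
    conn = sqlite3.connect(":memory:")
    save_posts(conn, "12345", parse_posts(sample))
    print(build_ajax_url("12345", 1))
    print(conn.execute("SELECT * FROM posts").fetchall())
```

Using the post ID as the primary key makes re-crawling the same pages safe, because duplicates are skipped rather than inserted twice.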

Tips for pull requests

Pull requests from everyone are welcome 💗

Problems

CAPTCHAs: the methods used in this project do not handle CAPTCHAs. Recognizing complex CAPTCHAs is still quite difficult at present, so in my opinion the best approach for a crawler is to avoid triggering them in the first place.
Broken code: if code stops working because a website changes its policy or page layout, please open an issue. If you have already fixed it, feel free to submit a PR. Thank you!

Other

If there is a website that is difficult to log in to, for example one where even Selenium + WebDriver fails, feel free to open an issue.
If this repo is helpful to you, please give it a star as encouragement.

Something to add

After working on the project for a while, I found problems with the code style, ease of use, scalability, and readability, so the most important next step is to refactor the code so that everyone can more easily add small features of their own.
If you think a website's login is particularly representative, feel free to suggest it in an issue.
If the site's login is interesting, I will add it in a later update.
A website's login mechanism may change frequently, so when a current simulated login stops working, please report it in an issue.

If the project gets a lot of attention, I will keep maintaining this repository, adding more content and refactoring the code.

Acknowledgments

Thanks, everyone!

Final words

I need your support.
So please consider giving me a 🌟 star!
