README.md (+20 -14)
@@ -16,7 +16,7 @@ As written in [Linkedin User Agreement](https://www.linkedin.com/legal/user-agre
# LinkedIn Web Scraper
-
Python Web Scraper for LinkedIn companies. The script fully simulate an human activity in order to get data from LinkedIn web pages. The purpose is store data from companies of a certain zone, such as:
+
This is a Python web scraper for LinkedIn companies. The script fully simulates human activity (using the [Selenium](https://selenium-python.readthedocs.io) library) in order to get data from LinkedIn web pages. Its purpose is to store data about companies in a certain area, such as:
- Name
- Overview
@@ -25,34 +25,40 @@ Python Web Scraper for LinkedIn companies. The script fully simulate an human ac
- Industry
- etc.
-
After collected the above information, these will be stored into an .xls file.
+
After the above information has been collected, it will be stored in an `.xls` file.
### Demo
[Demo video](https://youtu.be/TKkJEo-4NTg)
-
First of all, donwload the web driver you prefer (Firefox or Chrome) and put it inside the folder. Then put you credential inside the **config.ini** file and specify the web driver you donwloaded. Also others kind of parameters can be setted.
+
First of all, download the web driver you prefer (either [Firefox](https://github.com/mozilla/geckodriver/releases) or [Chrome](https://chromedriver.chromium.org/downloads)) and put it inside the project folder. After that, put your credentials in the `config.ini` file and specify the `webdriver` you have downloaded. Other parameters can also be set.
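
As an illustration of the configuration step, here is a minimal sketch of how credentials and the chosen web driver could be read with Python's standard `configparser`. The section names (`credentials`, `driver`), the key names, and the `load_settings` helper are assumptions made for this example; the project's actual `config.ini` layout may differ.

```python
import configparser

# Hypothetical config.ini content: section and key names are assumptions,
# not taken from the project's real configuration file.
SAMPLE_CONFIG = """
[credentials]
username = user@example.com
password = change-me

[driver]
webdriver = firefox
"""

# The README mentions Firefox and Chrome as the two driver choices.
SUPPORTED_DRIVERS = {"firefox", "chrome"}


def load_settings(text):
    """Parse config text and return (username, password, webdriver)."""
    # Disable interpolation so a '%' inside a password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)

    webdriver_name = parser["driver"]["webdriver"].strip().lower()
    if webdriver_name not in SUPPORTED_DRIVERS:
        raise ValueError(f"unsupported webdriver: {webdriver_name!r}")

    creds = parser["credentials"]
    return creds["username"], creds["password"], webdriver_name


username, password, webdriver_name = load_settings(SAMPLE_CONFIG)
```

To read a real file instead of the sample string, `parser.read("config.ini")` can replace `parser.read_string(text)`.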
-
The method *get_companies_name(...)* requires a link (in this case a link of a company) and will return an array of links in which each link is the page of the company.
+
The `get_companies_name(...)` method requires a link (in this case, the link of a company) and returns an array of links, each of which points to a LinkedIn company web page.
-
After that, you can run *retrive_data(...)* that requires the array with the links and the name of the .xls file in which you want to store information that will be collected from each link for each company.
+
After that, you can run `retrieve_data(...)`, which requires the array of links and the name of the `.xls` file in which you want to store all the information collected from each company's link.
-
Class *ManageExcelFile* will handle the I/O operation for the .xls file.
+
The `ManageExcelFile` class handles the I/O operations on the `.xls` file.
-
# Issues
+
# Troubleshooting
-
It could happen that, after the loggin phase, LinkedIn could ask you to perform some operations instead of rediricet you to the feed (https://www.linkedin.com/feed/) page.
+
After the login phase, LinkedIn may ask you to perform some additional actions (e.g. an "I'm not a robot" check) instead of redirecting you to the feed page (https://www.linkedin.com/feed/).
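
One way to detect this situation is to check, after the login form has been submitted, whether the browser actually landed on the feed page. The sketch below is only an illustration: `landed_on_feed` is a hypothetical helper, not part of the project, and the non-feed URL in the example is made up.

```python
from urllib.parse import urlparse

FEED_PATH = "/feed"
LINKEDIN_HOSTS = ("www.linkedin.com", "linkedin.com")


def landed_on_feed(current_url):
    """Return True if the URL points at the LinkedIn feed page."""
    parsed = urlparse(current_url)
    host = parsed.netloc.lower()
    # Ignore a trailing slash and any query string when comparing paths.
    path = parsed.path.rstrip("/")
    return host in LINKEDIN_HOSTS and path == FEED_PATH


# The second URL is a made-up example of an intermediate verification page.
for url in ("https://www.linkedin.com/feed/",
            "https://www.linkedin.com/some-verification-step"):
    if landed_on_feed(url):
        print(url, "-> feed reached, scraping can start")
    else:
        print(url, "-> extra step required, complete it in the browser")
```

With Selenium, the value to pass would be `driver.current_url`, read once the login has been submitted and the page has finished loading.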