If you're looking for a job, say a data science role, you're probably using LinkedIn Jobs. But with hundreds of jobs posted every day, it can be hard to find the ones that best match your skills.
The main purpose of this project is to find the best-matching jobs for you automatically.
In this project we:
- Built a LinkedIn job scraper using `Selenium`, `Requests` and `BeautifulSoup`.
- Built a text analysis of your resume and LinkedIn jobs using `spaCy`.
- Developed a `Flask` app to display data visualisations, including a word cloud of in-demand skills. The app highlights job-specific skills and keywords, compares them to your skills and generates a list of the most relevant job matches.
This project requires Python 3 and the following Python libraries installed:

- Web scraping libraries: `Selenium`, `Requests`, `BeautifulSoup`
- NLP libraries: `spaCy`, `NLTK`
- Web app and visualization: `Flask`, `Plotly`, `Matplotlib`, `Wordcloud`
- Other libraries: `pandas`, `numpy`, `json`

Install the trained English pipeline from spacy.io as follows:

`python -m spacy download en_core_web_lg`

The full list of requirements can be found in the `requirements.txt` file.
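To confirm the pipeline installed correctly, a quick check like the following works (a sketch; `spacy.load` raises an `OSError` with install instructions when the package is missing):

```python
import spacy

# Verify that the pipeline downloaded above is actually installed.
if spacy.util.is_package("en_core_web_lg"):
    nlp = spacy.load("en_core_web_lg")
    print("Loaded pipeline with components:", nlp.pipe_names)
else:
    print("en_core_web_lg is not installed - run the download command above")
```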
- FLASK_app folder: contains our responsive Flask web application.
  - `run.py`: main file to run the web application.
  - `scraping_linkedin.py`: code for scraping LinkedIn jobs with `Selenium` and `Requests`, with `BeautifulSoup` for parsing HTML content.
  - `Spacy_text_analayzer.py`: code to analyse text with `spaCy`, search for keywords and skills, compare them with your own and return the most relevant job matches.
  - `plotly_figures.py`: returns the configuration (data and layout) of `Plotly` figures.
  - `templates` folder: contains 9 HTML pages.
  - `static` folder: contains our customized CSS file and Bootstrap (compiled and minified CSS bundles and JS plugins).
- chromedriver folder: contains the chromedriver executable used by `Selenium` to control Chrome.
- data folder: contains the following files:
  - `user_credentials.txt`: contains your LinkedIn credentials (email address and password).
  - `Skills_in_Demand.txt`: list of skills in demand (you can update this list).
  - `Skill_patterns.jsonl`: contains the skill patterns in JSON format, used to create an entity ruler in the `spaCy` model.
  - `Job_Ids.csv` and `linkedin_jobs_scraped.json`: scraped LinkedIn job IDs and job details (description, seniority level, number of candidates, etc.).
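As an illustration of how `Skill_patterns.jsonl` can drive skill detection, here is a minimal spaCy entity-ruler sketch. The inline patterns below are hypothetical stand-ins for the file's contents; the project would instead load the whole file, e.g. with `ruler.from_disk("data/Skill_patterns.jsonl")`:

```python
import spacy

# Build a blank English pipeline with an EntityRuler that tags skills.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Illustrative patterns; the real ones live in Skill_patterns.jsonl.
patterns = [
    {"label": "SKILL", "pattern": [{"LOWER": "python"}]},
    {"label": "SKILL", "pattern": [{"LOWER": "machine"}, {"LOWER": "learning"}]},
]
ruler.add_patterns(patterns)

doc = nlp("We need Python and machine learning experience.")
print([(ent.text, ent.label_) for ent in doc.ents])
# → [('Python', 'SKILL'), ('machine learning', 'SKILL')]
```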
- notebooks folder: contains the project notebooks.
- resume folder: place your resume (PDF format) here to analyse it with `spaCy` and get a list of your skills.
- Save your LinkedIn credentials (email address and password) in `user_credentials.txt`.
- Run the following command in the FLASK_app directory to scrape LinkedIn jobs:

  `python scraping_linkedin.py "data scientist" "Montreal, Quebec, Canada" 120`

  You can replace "data scientist" and "Montreal, Quebec, Canada" with the job title and location of your choice. The final argument (120) is a timer in seconds that allows supplementary loading time for the webpage; adjust it depending on your Internet speed.
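The three positional arguments could be read inside `scraping_linkedin.py` roughly like this (a hypothetical sketch of the argument handling, not the script's actual code):

```python
import sys

def parse_args(argv):
    """Return (job_title, location, timeout_seconds) from the command line."""
    if len(argv) != 4:
        raise SystemExit(
            'Usage: python scraping_linkedin.py "<job title>" "<location>" <timeout>'
        )
    return argv[1], argv[2], int(argv[3])

if __name__ == "__main__":
    job_title, location, timeout = parse_args(sys.argv)
    print(f"Scraping '{job_title}' jobs in {location} (extra load time: {timeout}s)")
```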
- Run the following command in the FLASK_app directory to start the web application:

  `python run.py`

- Go to http://127.0.0.1:3001/
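The shape of `run.py` is roughly the following (a minimal sketch assuming a standard Flask entry point; the real file registers the routes for the pages described below and serves on port 3001):

```python
from flask import Flask

# Minimal sketch of a Flask entry point like run.py (illustrative only).
app = Flask(__name__)

@app.route("/")
def index():
    return "LinkedIn job matcher is running"

# Exercise the route without starting a server; run.py itself would call
# app.run(host="127.0.0.1", port=3001) to serve the pages at the URL above.
with app.test_client() as client:
    print(client.get("/").get_data(as_text=True))
```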
- The `Dashboard` page displays the distribution of seniority levels and the number of days since each job was posted, along with a word cloud of in-demand skills. This helps you see which skills are worth acquiring to broaden your profile.
- The `Resume_Analyzer` page uploads your resume (PDF format), extracts your skills and assesses them against the most in-demand skills. The best-matching jobs are showcased in a carousel that emphasises the matching scores.
- The `display_Job` page presents each LinkedIn job posting with an emphasis on the match score, highlighting essential skills required for the position that are missing from your resume.
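The match score shown on these pages can be thought of as the overlap between the job's required skills and the skills extracted from your resume. The app's exact formula isn't documented here; the sketch below shows one plausible overlap metric (all function names are hypothetical):

```python
def match_score(resume_skills, job_skills):
    """Percentage of the job's required skills found in the resume."""
    resume = {s.lower() for s in resume_skills}
    job = {s.lower() for s in job_skills}
    if not job:
        return 0.0
    return 100 * len(resume & job) / len(job)

def missing_skills(resume_skills, job_skills):
    """Job skills not present in the resume, for highlighting."""
    resume = {s.lower() for s in resume_skills}
    return sorted(s for s in job_skills if s.lower() not in resume)

print(match_score(["Python", "SQL"], ["python", "sql", "Spark", "Docker"]))    # → 50.0
print(missing_skills(["Python", "SQL"], ["python", "sql", "Spark", "Docker"]))  # → ['Docker', 'Spark']
```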