This project predicts the number of hits on a web page. The accompanying IPython Notebook demonstrates:
- Understanding Data
  - Exploratory Data Analysis
  - Visualization
  - Manipulation
  - Feature Engineering
- Data Preparation
- Model Selection
- Validation
 
The goal of this repository is to demonstrate an insightful understanding of data using visualizations and feature engineering for the prediction model.
Quick Start: View a static version of the notebook in the comfort of your own browser.

Dependencies:
- Python 2.7
 - Pandas
 - Sklearn
 - NumPy
 - Seaborn
 - Matplotlib
 - XGBoost
 
To run this notebook interactively:
- Clone this repo

  ```
  $ git clone https://github.com/techedlaksh/website-hits-prediction
  $ cd website-hits-prediction
  ```

- Create a new virtual environment and install the requirements

  ```
  $ sudo pip install virtualenv
  $ virtualenv venv
  $ source venv/bin/activate
  $ pip install -r requirements.txt
  ```

- Run the notebook

  ```
  $ jupyter notebook
  ```

- Click on `final-notebook.ipynb` in the browser and enjoy!

- When you're done with the notebook, shut down Jupyter from the terminal and deactivate the virtual environment with `deactivate`.
- row_num: a number uniquely identifying each row.
 - locale: the platform of the session.
 - day_of_week: Mon-Fri, the day of the week of the session.
 - hour_of_day: 00-23, the hour of the day of the session.
 - agent_id: the device used for the session.
 - entry_page: describes the landing page of the session.
 - path_id_set: shows all the locations that were visited during the session.
 - traffic_type: indicates the channel the user came through.
 - session_duration: the duration in seconds of the session.
 - hits: the number of interactions with the trivago page during the session.
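As a quick sanity check, the data can be loaded with pandas and the documented fields verified. A minimal sketch, assuming the data ships as a CSV file named `train.csv` (the actual file name and delimiter may differ):

```python
import pandas as pd

# Load the session data (file name is an assumption).
df = pd.read_csv("train.csv")

expected = ["row_num", "locale", "day_of_week", "hour_of_day", "agent_id",
            "entry_page", "path_id_set", "traffic_type", "session_duration", "hits"]

# Confirm that every documented field is present before going further.
print("missing columns:", [c for c in expected if c not in df.columns])
print(df.head())
```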
 
Use the data provided to build a model that predicts the number of hits per session based on the given parameters.
Predictions will be evaluated by the root mean square error.
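A minimal sketch of the evaluation metric, built on scikit-learn's `mean_squared_error` (the sample values are made up):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def rmse(y_true, y_pred):
    # Root mean square error between actual and predicted hit counts.
    return np.sqrt(mean_squared_error(y_true, y_pred))

print(rmse([3, 5, 2], [2.5, 5.0, 4.0]))  # illustrative numbers only
```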
- Importing data with pandas
 - Understanding data using statistics with pandas
 - Exploring Data through Visualizations with Matplotlib
 - Feature Engineering
 - Data Preparation for the model
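A hedged sketch of how these steps might look, continuing with the `df` loaded above (the `path_id_set` separator and the choice of categorical columns are assumptions, not taken from the notebook):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Understanding the data through summary statistics.
print(df.describe(include="all"))

# Visualization: average hits per hour of day.
df.groupby("hour_of_day")["hits"].mean().plot(kind="bar")
plt.xlabel("hour_of_day")
plt.ylabel("mean hits")
plt.show()

# Feature engineering example: number of locations visited in the session,
# assuming path_id_set is a ";"-separated string of location ids.
df["n_paths"] = (
    df["path_id_set"].fillna("").astype(str).str.split(";")
    .apply(lambda parts: len([p for p in parts if p]))
)

# Data preparation: one-hot encode categorical columns for the models below.
X = pd.get_dummies(
    df.drop(columns=["row_num", "path_id_set", "hits"]),
    columns=["locale", "day_of_week", "agent_id", "traffic_type"],
)
y = df["hits"]
```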
 
- Logistic Regression
 - Random Forest
 - XGBoost
 - LightGBM
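One of these models can be fitted roughly as follows (XGBoost is shown; `X`, `y` and the `rmse` helper come from the sketches above, and the hyperparameters are placeholders):

```python
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Hold out 20% of the sessions for validation.
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=42)

model = XGBRegressor(n_estimators=200, max_depth=6, learning_rate=0.1)
model.fit(X_train, y_train)

print(rmse(y_valid, model.predict(X_valid)))
```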
 
- K-fold cross validation to evaluate results locally
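A minimal sketch of local K-fold evaluation with scikit-learn (`model`, `X` and `y` are assumed from the sketches above; scikit-learn reports negative MSE for this scoring, so negate and take the square root to recover RMSE):

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score

cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=cv)

print("RMSE per fold:", np.sqrt(-scores))
print("mean RMSE:", np.sqrt(-scores).mean())
```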
 
Note: Trained models are exported and can be re-used by importing them into your own script and predicting on your data with the saved model.
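One way this can look with joblib (the file name is an assumption; `model` and `X` come from the sketches above):

```python
import joblib

# Persist the fitted model and reload it later in another script.
joblib.dump(model, "website_hits_model.pkl")

loaded = joblib.load("website_hits_model.pkl")
predictions = loaded.predict(X)  # features must be prepared exactly as during training
```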