This project predicts the number of hits on a web page. The accompanying IPython Notebook demonstrates:
- Understanding Data
  - Exploratory Data Analysis
  - Visualization
  - Manipulation
  - Feature Engineering
- Data Preparation
- Model Selection
- Validation
 
The goal of this repository is to demonstrate an insightful understanding of data using visualizations and feature engineering for the prediction model.
Quick Start: View a static version of the notebook in the comfort of your own browser.

Dependencies:
- Python 2.7
 - Pandas
 - Sklearn
 - NumPy
 - Seaborn
 - Matplotlib
 - XGBoost
 
To run this notebook interactively:
- Clone this repo

  ```
  $ git clone https://github.com/techedlaksh/website-hits-prediction
  $ cd website-hits-prediction
  ```

- Create a new virtual environment and install the requirements

  ```
  $ sudo pip install virtualenv
  $ virtualenv venv
  $ source venv/bin/activate
  $ pip install -r requirements.txt
  ```

- Run the notebook

  ```
  $ jupyter notebook
  ```

- Click on `final-notebook.ipynb` in the browser and enjoy!

- When you're done with the notebook, shut down Jupyter from the terminal and deactivate the virtual environment with `deactivate`.
- row_num: a number uniquely identifying each row.
 - locale: the platform of the session.
 - day_of_week: Mon-Fri, the day of the week of the session.
 - hour_of_day: 00-23, the hour of the day of the session.
 - agent_id: the device used for the session.
 - entry_page: describes the landing page of the session.
 - path_id_set: shows all the locations that were visited during the session.
 - traffic_type: indicates the channel the user came through.
 - session_duration: the duration in seconds of the session.
 - hits: the number of interactions with the trivago page during the session.
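As a quick sanity check, the data can be loaded with pandas and the documented fields verified. A minimal sketch, assuming the data ships as a CSV file named `train.csv` (the actual file name and delimiter may differ):

```python
import pandas as pd

# Load the session data (file name is an assumption).
df = pd.read_csv("train.csv")

expected = ["row_num", "locale", "day_of_week", "hour_of_day", "agent_id",
            "entry_page", "path_id_set", "traffic_type", "session_duration", "hits"]

# Confirm that every documented field is present before going further.
print("missing columns:", [c for c in expected if c not in df.columns])
print(df.head())
```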
 
Use the data provided to build a model that predicts the number of hits per session based on the given parameters.
Predictions will be evaluated by the root mean square error.
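A minimal sketch of the evaluation metric, built on scikit-learn's `mean_squared_error` (the sample values are made up):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def rmse(y_true, y_pred):
    # Root mean square error between actual and predicted hit counts.
    return np.sqrt(mean_squared_error(y_true, y_pred))

print(rmse([3, 5, 2], [2.5, 5.0, 4.0]))  # illustrative numbers only
```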
- Importing data with pandas
 - Understanding data using statistics with pandas
 - Exploring Data through Visualizations with Matplotlib
 - Feature Engineering
 - Data Preparation for the model
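A hedged sketch of how these steps might look, continuing with the `df` loaded above (the `path_id_set` separator and the choice of categorical columns are assumptions, not taken from the notebook):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Understanding the data through summary statistics.
print(df.describe(include="all"))

# Visualization: average hits per hour of day.
df.groupby("hour_of_day")["hits"].mean().plot(kind="bar")
plt.xlabel("hour_of_day")
plt.ylabel("mean hits")
plt.show()

# Feature engineering example: number of locations visited in the session,
# assuming path_id_set is a ";"-separated string of location ids.
df["n_paths"] = (
    df["path_id_set"].fillna("").astype(str).str.split(";")
    .apply(lambda parts: len([p for p in parts if p]))
)

# Data preparation: one-hot encode categorical columns for the models below.
X = pd.get_dummies(
    df.drop(columns=["row_num", "path_id_set", "hits"]),
    columns=["locale", "day_of_week", "agent_id", "traffic_type"],
)
y = df["hits"]
```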
 
- Logistic Regression
 - Random Forest
 - XGBoost
 - LightGBM
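One of these models can be fitted roughly as follows (XGBoost is shown; `X`, `y` and the `rmse` helper come from the sketches above, and the hyperparameters are placeholders):

```python
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Hold out 20% of the sessions for validation.
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=42)

model = XGBRegressor(n_estimators=200, max_depth=6, learning_rate=0.1)
model.fit(X_train, y_train)

print(rmse(y_valid, model.predict(X_valid)))
```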
 
- K-fold cross validation to evaluate results locally
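A minimal sketch of local K-fold evaluation with scikit-learn (`model`, `X` and `y` are assumed from the sketches above; scikit-learn reports negative MSE for this scoring, so negate and take the square root to recover RMSE):

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score

cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=cv)

print("RMSE per fold:", np.sqrt(-scores))
print("mean RMSE:", np.sqrt(-scores).mean())
```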
 
Note: Trained models are exported and can be re-used by importing them into your own script and predicting on your data with the saved model.
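One way this can look with joblib (the file name is an assumption; `model` and `X` come from the sketches above):

```python
import joblib

# Persist the fitted model and reload it later in another script.
joblib.dump(model, "website_hits_model.pkl")

loaded = joblib.load("website_hits_model.pkl")
predictions = loaded.predict(X)  # features must be prepared exactly as during training
```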