- Build a machine learning model to predict whether an ad will be clicked. For simplicity, we will not cover the cascade of classifiers commonly used in adtech.
- ML model with good performance
- The system can scale to a large number of users while maintaining low latency.
- Imbalanced data: you can assume the Click-Through Rate (CTR) is very small in practice (1%-2%).
- Serving: from the Real-Time Bidding (RTB) workflow diagram, it's important to have low latency (150 ms) for ad prediction.
- Assumptions: 4K ad requests per second, which is roughly 10 billion ad requests per month (see the check below).
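A quick back-of-the-envelope check of that figure (a minimal sketch, assuming a 30-day month):

```python
requests_per_second = 4_000
seconds_per_month = 60 * 60 * 24 * 30              # 2,592,000 seconds
monthly_requests = requests_per_second * seconds_per_month
print(f"{monthly_requests:,}")                     # 10,368,000,000 ~= 10 billion
```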
- Data: historical ad click data includes [user, ads, click_or_not]. With an estimated 1% CTR, that is 100 million clicked ads. We can start with 1 month of data for training and validation.
- Train/validation data split: we split train/validation in a way that simulates the actual online system, for example by splitting on time (see the sketch after this list).
- Features: naturally, the model needs enough capacity to learn patterns from big training data. In practice, it's common to have hundreds or even thousands of features.
- Training: the ability to retrain many times within one day, so the model can keep improving in an online manner.
- Serving: latency within 150 ms per request, at 4K requests per second.
- Number of predictions: on the order of a million per second, since each ad request typically involves scoring many candidate ads.
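A minimal sketch of the time-based split mentioned above, assuming a pandas DataFrame `df` of historical impressions with hypothetical columns `[user_id, ad_id, clicked, timestamp]`:

```python
import pandas as pd

def time_split(df: pd.DataFrame, cutoff: str):
    """Train on impressions before `cutoff`, validate on the rest,
    so validation mimics predicting future (unseen) traffic."""
    # Assumes `timestamp` is a datetime64 column.
    train = df[df["timestamp"] < pd.Timestamp(cutoff)]
    valid = df[df["timestamp"] >= pd.Timestamp(cutoff)]
    return train, valid

# Example: train on the first three weeks of the month, validate on the last.
# train, valid = time_split(df, cutoff="2024-01-22")
```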
- During the training phase, we can focus on machine learning metrics rather than revenue or CTR metrics; revenue-related metrics are usually monitored during deployment. We therefore distinguish offline metrics (training) from online metrics (deployment).
- Normalized Cross-Entropy (NCE): the predictive log loss divided by the cross-entropy of the background CTR, which makes NCE insensitive to the background CTR.
- Calibration: measured by comparing the expected clicks (according to the model's predictions) with the actually observed clicks (see the sketches below).
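Minimal sketches of both offline metrics, assuming 0/1 labels `y` and predicted click probabilities `p` as NumPy arrays (these are illustrative functions, not a library API):

```python
import numpy as np

def normalized_cross_entropy(y: np.ndarray, p: np.ndarray) -> float:
    """Log loss normalized by the entropy of the background CTR.
    Values below 1.0 mean the model beats always predicting the base rate."""
    eps = 1e-15
    p = np.clip(p, eps, 1 - eps)
    log_loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    ctr = y.mean()  # background CTR; assumes 0 < ctr < 1
    background_entropy = -(ctr * np.log(ctr) + (1 - ctr) * np.log(1 - ctr))
    return log_loss / background_entropy

def calibration_ratio(y: np.ndarray, p: np.ndarray) -> float:
    """Expected clicks (sum of predicted probabilities) over observed clicks.
    A well-calibrated model yields a ratio close to 1.0."""
    return p.sum() / y.sum()
```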
- Model: we can use a probabilistic sparse linear classifier (logistic regression). It's popular because of its computational efficiency and its ability to handle sparse features (see the logistic regression sketch after this list).
- Feature engineering: AdvertiserID: there can easily be millions of advertisers. One common approach is to use an embedding as a distributed representation for AdvertiserID (see the embedding sketch below).
- Data processing: one approach is subsampling the majority (negative) class at different subsampling ratios. The key is to ensure that the validation dataset has the same distribution as the test dataset (see the downsampling sketch below).
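A minimal sketch of such a sparse logistic classifier, assuming scikit-learn with hashed categorical features (feature names are hypothetical):

```python
from sklearn.feature_extraction import FeatureHasher
from sklearn.linear_model import SGDClassifier

# Hash high-cardinality categoricals into a fixed-size sparse vector.
hasher = FeatureHasher(n_features=2**20, input_type="string")
# SGD with log loss is an online-trainable logistic regression.
model = SGDClassifier(loss="log_loss", alpha=1e-6)

def featurize(rows):
    # Each row is a dict like {"user_id": "u1", "ad_id": "a9", "advertiser_id": "adv3"}.
    return hasher.transform([f"{k}={v}" for k, v in row.items()] for row in rows)

# Streaming updates make frequent (intra-day) retraining feasible:
# model.partial_fit(featurize(batch_rows), batch_labels, classes=[0, 1])
# p_click = model.predict_proba(featurize(new_rows))[:, 1]
```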
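A minimal sketch of the embedding idea for AdvertiserID, assuming PyTorch (vocabulary size and embedding width are illustrative):

```python
import torch
import torch.nn as nn

num_advertisers = 5_000_000  # hypothetical vocabulary size
embedding_dim = 32           # dense representation width

# Each AdvertiserID maps to a learned 32-dimensional vector.
advertiser_embedding = nn.Embedding(num_advertisers, embedding_dim)

ids = torch.tensor([42, 1_337, 4_999_999])  # a batch of advertiser IDs
dense = advertiser_embedding(ids)           # shape: (3, 32)
```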
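A minimal sketch of negative downsampling, assuming pandas (column names are hypothetical). One known caveat: training on downsampled negatives inflates the predicted CTR, so predictions are typically re-calibrated back to the true scale:

```python
import pandas as pd

def downsample_negatives(df: pd.DataFrame, w: float, seed: int = 42) -> pd.DataFrame:
    """Keep all positives; keep each negative with probability `w`."""
    pos = df[df["clicked"] == 1]
    neg = df[df["clicked"] == 0].sample(frac=w, random_state=seed)
    return pd.concat([pos, neg]).sample(frac=1.0, random_state=seed)  # shuffle

def recalibrate(p: float, w: float) -> float:
    """Map a probability learned on downsampled data back to the true scale."""
    return p / (p + (1.0 - p) / w)
```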
- During the deployment phase, it's crucial to monitor the actual CTR and other revenue-related metrics.
- Related to this topic, read more about A/B testing and multi-armed bandits:
  - A/B testing: compares the performance of two versions of content to see which one appeals more to visitors/viewers.
  - Multi-armed bandits: dynamically allocate more traffic to variations that are performing well and less traffic to underperforming variations (a minimal sketch follows).
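A minimal epsilon-greedy sketch of the bandit idea (illustrative only, not a production policy): mostly serve the best-performing variation, but keep exploring the others:

```python
import random

class EpsilonGreedy:
    def __init__(self, n_variations: int, epsilon: float = 0.1):
        self.epsilon = epsilon
        self.clicks = [0] * n_variations
        self.impressions = [0] * n_variations

    def choose(self) -> int:
        if random.random() < self.epsilon:  # explore a random variation
            return random.randrange(len(self.clicks))
        # Exploit: pick the variation with the best observed CTR so far.
        ctrs = [c / i if i else 0.0 for c, i in zip(self.clicks, self.impressions)]
        return max(range(len(ctrs)), key=ctrs.__getitem__)

    def update(self, arm: int, clicked: bool) -> None:
        self.impressions[arm] += 1
        self.clicks[arm] += int(clicked)
```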
It's challenging to train models every few hours so that production always uses up-to-date data. Furthermore, those models need to be easy to improve through feature selection and hyperparameter tuning, which requires the ability to run both offline and online tests.