A simple MATLAB Implementation of CGP-RANK algorithm [1]. CGP-RANK ranks an ordered list of recommendations, and it is a well suited algorithm for web-scale tasks, where the number of items available for recommendation is too large, and the context space too complex, compared to the relatively sparse click feedback.
CGP-RANK is an upper-confidence style algorithm, which exploits prior information specified in terms of a Gaussian process kernel function, which allows to share feedback in three ways: between positions in a list, between items, and between contexts [2]. Choice of kernels expresses our prior assumption of how smoothly the rewards change over the item-context space, and allows us to predict the performance of yet unexplored lists using Gaussian Process models.
You need to have the following libraries.
GPML Matlab Code version 4.2
You can download it here! Don't forget to add the library with all its subfolders to your project path.
You will need to initialize a couple things, let's walk through them one by one!
We will pass this function's handle to CGP-RANK, reward function should take as arguments a list of items and contexts, they can be multidimensional, as long as your kernels support it.
Let's first test with a very simple reward function, reward = sin(item) + cos(context) + noise.
function reward = feedback_function(items, contexts)
X = items;
Y = contexts;
reward = sin(X) + cos(Y);
noise = randn(size(reward))*0.01;
reward = (reward + noise)';
end
You must also create a function that can tell which items are available in any given round t, if all items are always available you can simply return the whole list of items at every round. For this test, we choose our item domain from -3 to 3 with 0.05 intervals, and at every round t only a randomly selected 1/4 of all items will be available for recommendation.
function item_list = available_item_list_creator(t)
items = -3:0.05:3;
item_list = datasample(items, int8(length(items)/4), 'Replace', false);
end
Similarly a random context will be selected at each round.
function context = context_creator(t)
contexts = -3:0.05:3;
context = datasample(contexts, 1);
end
Define a kernel using GPML Kernels, there is probably a kernel for your needs. If not, you can always create a custom one using covDiscrete, it is entirely possible to create diffusion kernels from graphs as used in the CGP-RANK paper by using GPML kernel functions, [check the documentation here!] (http://www.gaussianprocess.org/gpml/code/matlab/doc/manual.pdf)
For this simple reward function, we will use isotropic squared exponential kernels without scaling.
item_kernel = {@covSEisoU}; item_hyp = 0;
mask = [1,0]; % binary mask so that this kernel applies only to items
item_kernel_masked = {@covMask,{mask,item_kernel}};
context_kernel = {@covSEisoU}; context_hyp = 0;
mask = [0,1]; % binary mask so that this kernel applies only to contexts
context_kernel_masked = {@covMask,{mask,context_kernel}};
% product of two seperate kernels to create one composite kernel
kernel ={@covProd,{item_kernel_masked, context_kernel_masked}};
GP_hyperparameters.cov = [item_hyp, context_hyp];
% Total number of rounds
T = 40;
% number of items to be recommended in each round
batch_size = 5;
% f handles
context_receiver_handle = @context_creator;
available_item_list_receiver_handle = @available_item_list_creator;
feedback_handle = @feedback_function;
% calculate the optimal recommendation list for regret plots, this is an optional argument
calculate_optimal_rewards = true;
% Run!
[item_choices, observations, oracle] = CGPRank(kernel, GP_hyperparameters,...
batch_size, T, context_receiver_handle, available_item_list_receiver_handle, ...
feedback_handle, calculate_optimal_rewards);
The sublinear regret shows that the CGP-RANK converges to the optimal recommendation list across context space.
Note that as the algorithm aims to maximize rewards over time, it only needs to accurately map the highly rewarding parts of the environment, it does this by a balanced trade-off between exploration and exploitation using an upper-confidence style scheme, namely choosing item-context pairs that maximizes its UCB selection rule (predictive mean + (a time-varying factor)* predictive variance).
Continuously performing updates on GP Inference can be prohibitively expensive for only very little immediate return, especially if there is already a good number of feedback acquired. CGPRANK can be accelerated by reusing the same recommendation multiple times, accumulating feedback and performing delayed updates.
Two optional arguments, "t_feedback_delay" and "feedback_delay_growth_ratio" control the feedback delaying. After t > t_feedback_delay, feedback delaying becomes active and the updates are performed everytime the dataset expands by an amount of feedback_delay_growth_ratio since the last update, i.e. for t_feedback_delay = 100 and feedback_delay_growth_ratio = 0.1, updates after t > 100 happen at 110, 121, 133, 146, 160 ...
With the above values of feedback delay, elapsed time is:
There are a couple ways to do this that GPML Matlab supports, however only a starter code is added for Sparse Covariance Approximations that uses inducing inputs to reduce the GP Inference's complexity to O(m^2*n) where m is the number of inducing inputs, instead of the regular O(n^3) time complexity. The last argument of CGPRANK is the optional argument t_approximation, if this is set, after t > t_approximation, GPSparseCovRegression is used instead of GPInference.
To test speed up, inducing inputs are chosen randomly from previously chosen points and their number is increased very slowly over time after t_approximation. This incurred extra regret but accelerated the algorithm as seen below.
We can now test on a harder function, namely on Schaffer Function N. 2. Matern Kernels are used for this test.
Reward Function Predictive Mean of Reward Function at T = 200 Regret Plot
[1,2] Vanchinathan, H. P., Nikolic, I., Bona, F. D., & Krause, A. (2014). Explore-exploit in top-N recommender systems via Gaussian processes. Proceedings of the 8th ACM Conference on Recommender Systems - RecSys 14. doi: 10.1145/2645710.2645733