-
Notifications
You must be signed in to change notification settings - Fork 119
[HIVEMALL-220] Implement Cofactor #167
base: master
Are you sure you want to change the base?
Conversation
// storing trainable latent factors and weights | ||
private Map<String, RealVector> theta; | ||
private Map<String, RealVector> beta; | ||
private Map<String, Double> betaBias; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use Object2DoubleMap<String> betaBias
instead to reduce memory consumption.
protected static RealMatrix calculateWTWSubset(List<Feature> subset, Map<String, RealVector> weights, int numFactors, float constant) { | ||
// equivalent to `B_u.T.dot((c1 - c0) * B_u)` in cofacto.py | ||
RealMatrix delta = new Array2DRowRealMatrix(numFactors, numFactors); | ||
int i = 0, j = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int i = 0, j = 0;
never used.
This could be found in IDE warning. Please resolve all IDE warnings.
if (weights.containsKey(key)) { | ||
return; | ||
} | ||
RealVector v = new ArrayRealVector(factor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final double[] v = new double[factor];
...
weight.put(key, new ArrayRealVector(v));
is better.
|
||
private static double dotFactorsAlongDims(Map<String, RealVector> weights, int dim1, int dim2) { | ||
double result = 0.d; | ||
for (String key : weights.keySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not simply
for(RealVector vec : weights.values()) {
...
}
What's expected behavior when vec.getEntry(dim1/2)
returns null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RealVector
implementation has now changed to double[]
, so by default it should be 0
. In any case, all entries in weights
have been initialized to uniform random or gaussian so the array entries should be valid.
*/ | ||
private static void addInPlace(RealVector u, RealVector v, double scalar) { | ||
assert u.getDimension() == v.getDimension(); | ||
for (int i = 0; i < u.getDimension(); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer avoiding method calls by
for (int i = 0, size=u.getDimension(); i < size; i++) {
|
||
protected void add(TrainingSample sample) { | ||
if (size() == this.maxSize) { | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sample
is thrown away?
You can return boolean value to check MiniBatch is full or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think an explicit mini batch size is not required anymore. The upper limit is the size of the inputBuf
. Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right. let me see..
9864926
to
08bc526
Compare
…axony/incubator-hivemall into feature/cofactor-feature-array
Hi @myui, I'd like to continue working on cofactor and hopefully get it merged in. Any recommendations on how I can test implementation in distributed environment? |
Hi @jaxony welcome back! For pseudo evaluation, docker is easy to evaluate. This docker-compose spin-up multiple hadoop servers using @Lewuathe 's Then, you can just login to master node and setup Hive (and hivemall) on the master node. Note Hive command is just a standalone application submitting jobs to Hadoop. You can modify Hive setup process on master node following Also, I'll consider to provide you EMR testbed. |
@jaxony For security reasons, we would like to add your IP address to whitelist for accessing our EMR environment. Do you have specific IP range (e.g., university's one)? |
@myui I think the University of Melbourne's IP range is 128.250.0.0 to 128.250.0.255 |
What changes were proposed in this pull request?
Implemented new matrix factorization algorithm for the recommendation problem.
What type of PR is it?
Feature
What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-220
How was this patch tested?
Unit tests and manual testing on ML20M in a Hive dev environment
How to use this feature?
TODO
Checklist
(Please remove this section if not needed; check
x
for YES, blank for NO)./bin/format_code.sh
, for your commit?