Enable models to emit lambda functions. #5026

geektoni · 2020-05-12T09:42:09Z

We might want to emit some information from a given model which may be costly to compute and it could slow down the training ( see #4655 for an example of such scenario ). This is important because those kinds of information would be computed even if the user wants to observe different parameters.

The solution here would be to emit lambdas instead, which can then be executed to obtain the real value. I think there are two possible ways to proceed:

The object emits a lambda, which is then executed by the ParameterObserver (if the user wants to observe that particular parameter). The result of the lambda is then stored and shown to the user. In such a case, the end-user will not manage directly the emitted lambdas and it will see only the final result. We might experience anyway reduced training speed in this scenario since the observer will need to execute the lambda anyway;
The object emits a lambda, which is stored as a normal observation by the ParameterObserver. The end-user will need to execute this lambda in order to retrieve its value. In such a case, the end-user will be aware of the existence of these lambdas. Basically, besides emitting scalars/vectors/matrices, we introduce also the possibility of emitting functions.

@karlnapf @gf712 what do you think is the best way to implement this?

The text was updated successfully, but these errors were encountered:

gf712 · 2020-05-12T11:35:47Z

You mean this would be like a promise that can the be computed asynchronously? Could you give an example of a lambda you would want to pass please? From the example you give would it be something like:

observe(i, "oob_error", "Out-of-bag Error", [=](){return get_oob_error()};

And then you tell the observer whether to compute it?
The problem if you store all these lambdas is that you are storing a copy of the object each time. Also if you mean passing a lambda as an end user if you are using a target interface you won't have access to this feature (I doubt SWIG can handle this, but I think something like binder can)

geektoni · 2020-05-12T14:54:51Z

@gf712 yes, the example you wrote is exactly what I had in mind :) Then if the user wants to observe the parameter oob_error, once the observer captures the lambda, the observer will execute it and it will show to the final user just the result (not the lambda itself). All of this I think will be done asynchronously thanks to RxCpp.

I guess the second solution would be harder to implement because of the interfaces (SWIG support for lambdas seems quite limited at the moment http://www.swig.org/Doc4.0/SWIGDocumentation.html#CPlusPlus11_lambda_functions_and_expressions)

karlnapf · 2020-05-12T15:10:51Z

We wouldnt want to store the lambda. The class just offers the lambda to the observer, which can decide to call it (and store the information), or not. It shouldnt be valid after the call is over

karlnapf · 2020-05-12T15:11:30Z

The use case is that a user has applied a filter to the observer to choose what to store....in that case we dont always want to compute the oob error here

gf712 · 2020-05-12T15:16:37Z

We wouldnt want to store the lambda. The class just offers the lambda to the observer, which can decide to call it (and store the information), or not. It shouldnt be valid after the call is over

so the observe function becomes a blocking call right? Because now there is a synchronisation issue, the main routine either copies the data and lets the observer do what it wants with it (for example call it in a seperate thread), or awaits for the observer to do something (and in this case training is stopped until it is computed)

The use case is that a user has applied a filter to the observer to choose what to store....in that case we dont always want to compute the oob error here

So the user passes a string? Or is this with the AnyParameter flags?

karlnapf · 2020-05-12T15:23:44Z

basically we want this semantic (inside the RF)

if (observer->do_you_want_to_record("oob_error"))
    observer->observe(this->compute_oob_error())

as opposed to not having the if statement (i.e. always computing the oob error even if the user doesnt care about it.
Using lambdas was just one idea, another could be to literally ask the observer whether it cares about the value

gf712 · 2020-05-12T15:23:44Z

Something that would be more OOP, but quite a shift in design, is that we have Any as a swig type, pass that to an Observer, and automagically it registers that the memory inside Any should be emited. But I guess this doesn't help in this case because oob error is not a class parameter.
But we could have predefined Observer classes like ComputeOutOfBagErrorObserver, that can be passed to a bagging machine object and that forces oob to be emitted. I guess this all depends on what your view of the final API is..
And yes, the user would provide a string with the parameter name she wants to observe when constructing the observer

geektoni · 2020-05-12T15:24:10Z

The use case is that a user has applied a filter to the observer to choose what to store....in that case we dont always want to compute the oob error here

So the user passes a string? Or is this with the AnyParameter flags?

The user can pass a list of strings (e.g., oob_error) or some AnyParameterProperties (we might want to observe member variables and filter them by type, like HYPER).

gf712 · 2020-05-12T15:25:40Z

basically we want this semantic (inside the RF)
if (observer wants to record the oob error)
    observer->observe(this->compute_oob_error())
as opposed to not having the if statement (i.e. always computing the oob error even if the user doesnt care about it.
Using lambdas was just one idea, another could be to literally ask the observer whether it cares about the value

I would say just pass the function pointer, like watch_method. The lambda here might not bring much benefit, but you can try. Lambdas can be negative cost :)

gf712 · 2020-05-12T16:45:37Z

btw, I was just thinking about this and would this condition be declared once? And then when the algorithm adds a new oob index set it computes the oob error?

karlnapf · 2020-05-12T17:02:55Z

not sure what you mean

gf712 · 2020-05-12T17:17:39Z

train()
  if (observer wants to record the oob error)
      observer->observe(&compute_oob_error, m_bags, m_oob_indices);
  ...
  for i in range n_bags:
    ...
    m_bags.push_back(m);
    m_oob_indices.push_back(indices);
    // this has now triggered the observer, because both vectors have changed

in this case it makes sense to store a function pointer, rather than passing the oob error itself (a scalar)

karlnapf · 2020-05-13T09:19:35Z

I see. That seems like more work to me :)
I think being explicit and just passing the function pointer to the observer every time the error would be computed suffices. It also is more in-line with what we do with SGVector etc, where we pass a reference, and the observe might or might not clone that memory block

gf712 · 2020-05-13T09:24:05Z

I am just not sure I understand why you need to pass the function pointer in that case?
If you have

  if (observer wants to record the oob error)
      observer->observe(get_oob_error());

you will achieve the same no? Unless what you want is to compute oob error in a separate thread? And this compute_oob_error lambda captures the machine and oob indices?

karlnapf · 2020-05-13T09:37:18Z

No need if this is done. But this is just the semantics. In practice, we would like to not have to do this if statement in the train code. Rather, just the observe call. And then the observer decides. Sorry should have been more clear

gf712 · 2020-05-13T09:40:21Z

OK, I think in that case we could look into the thing I posted above? It is more effort, but it would be neat that a combination of events triggers the observer. If this is not possible with Rxcpp you could always extend the functionality using conditional variables.

Reread your comment and I think I finally get it. I think that would work fine. You would have to pass the function pointer at the start. Store that (inside it has the reference to the vectors with machines and indices) and then a second call to the observer inside the loop triggers the function if one has been registered.

gf712 · 2020-05-13T09:47:55Z

train()
  if (observer wants to record the oob error)
      observer->add_callback("oob_error", "Out-of-bag Error", [&](){
         return get_oob_error();
      });
  ...
  // observer checks if there is a callback function
  // index could be optional, as the observer could auto increment
  // string is the name of the callback/value we are observing
  // however this would be a synchronous call (probably), which goes against the observer design pattern (I think)
  observer->observe(i, "oob_error");

geektoni · 2020-05-13T10:09:39Z

train()
  if (observer wants to record the oob error)
      observer->add_callback("oob_error", "Out-of-bag Error", [&](){
         return get_oob_error();
      });
  ...
  // observer checks if there is a callback function
  // index could be optional, as the observer could auto increment
  // string is the name of the callback/value we are observing
  // however this would be a synchronous call (probably), which goes against the observer design pattern (I think)
  observer->observe(i, "oob_error");

I am sorry if I go back to the original idea, but why do we have to store twice the information for calling the get_oob_error() method? If we store the callback, and then we call it later on, wouldn't be the same as sending directly the lambda through observe() in the first place (see below)? There shouldn't be any significant memory/synchronization differences between the two approaches, right?

// Once the observer receives this observation, it checks if he wants to observe
// "oob_error" objects and then, since it is a lambda, it executes it and stores the result. 
train()
  this->observe("oob_error", "Out-of-bag Error", [&](){
      return get_oob_error();
  });

From the observer point of view we will have something like:

on_next(observed_value)
  if (observed_value can be executed) {
    
    // here we execute the lambda and we pass the result
    // to the real implementation of the observer (e.g. ParameterObserverCV, 
    // ParameterObserverScalar, etc.)
    result = observed_value()
    on_next_impl(result)
  }

I apologize if this is a silly question :)

Anyway, if I look at the if statement, I would say it is not possible to know if the observer wants to record the oob error directly (as you mentioned Gil, it kind of goes against the observer pattern).

gf712 · 2020-05-13T10:35:46Z

I am sorry if I go back to the original idea, but why do we have to store twice the information for calling the get_oob_error() method? If we store the callback, and then we call it later on, wouldn't be the same as sending directly the lambda through observe() in the first place (see below)? There shouldn't be any significant memory/synchronization differences between the two approaches, right?
// Once the observer receives this observation, it checks if he wants to observe
// "oob_error" objects and then, since it is a lambda, it executes it and stores the result. 
train()
  this->observe("oob_error", "Out-of-bag Error", [&](){
      return get_oob_error();
  });

From what I understand this observe call would have to be wrapped by a condition inside a loop right? This is the moment you are telling the observer to emit something right? Or is emitting the value by this observer triggered by something else?

From the observer point of view we will have something like:
on_next(observed_value)
  if (observed_value can be executed) {
    
    // here we execute the lambda and we pass the result
    // to the real implementation of the observer (e.g. ParameterObserverCV, 
    // ParameterObserverScalar, etc.)
    result = observed_value()
    on_next_impl(result)
  }
I apologize if this is a silly question :)

Anyway, if I look at the if statement, I would say it is not possible to know if the observer wants to record the oob error directly (as you mentioned Gil, it kind of goes against the observer pattern).

That is why I was thinking that rather than having an if there is some conditional variable logic going on that waits for the observed values, in this case m_bags and m_oob_indices, to change.

std::mutex kObserverMutex;
std::condition_variable cv;
worker() {
  std::unique_lock<std::mutex> lk(kObserverMutex);
  // the Observer thread is now locked until the two class members change
  cv.wait(lk, []{return m_bags_changed() && m_oob_indices_changed();});
  // both parameters changed -> execute callback
  execute_callback();
}

This could even lock any other call to the callback to avoid some undefined behaviour. Granted this is quite complex logic, but IMO this would be really neat, and I think this falls within the observer pattern?

karlnapf · 2020-05-13T10:55:11Z

Gil, I agree this is neat, but what about a simpler thing where we could just do (not wrapped in if)
observe("oob", &compute_oob())
and then inside the observer we could do

void observe(name, f) {
  if (user_has_requested_to_observe(name))
     observe(name, *f()) // e.g. f returns SGVector
}

karlnapf · 2020-05-13T10:57:30Z

However, if the observer has no way of filtering/selecting values/functions that it receives, all hope is lost anyways, because we either need to always compute the oob, or always store the whole cloned objects ...

geektoni · 2020-05-13T12:19:52Z

However, if the observer has no way of filtering/selecting values/functions that it receives, all hope is lost anyways, because we either need to always compute the oob, or always store the whole cloned objects ...

So, the observer (e.g. ParameterObserverLogger) can filter observation based on user's preferences. The model (e.g., BaggingMachine) which emits the observation cannot (that's because of the observer pattern). In our case, the BaggingMachine has to emit every time the oob error and then the observer can decide to use it or not.

karlnapf · 2020-05-13T12:21:15Z

So then what I wrote above (passing the lambda/pointer) should work right?

geektoni · 2020-05-13T12:24:10Z

So then what I wrote above (passing the lambda/pointer) should work right?

Yes, it should work perfectly fine (aside from possible issues about copying memory around and synchronization of the various threads). By passing the lambda, we can defer the execution of expensive functions and demand it to the observer (rather than the model) and only if the user has decided to do so.

It is the most viable option imho

gf712 · 2020-05-13T12:41:35Z

I guess in terms of sync issues we should just have a helper function for oob error that uses arguments to compute the error rather than using the state of the object (so we store the parameters in the lambda which are just memory management objects (SGVector and std::shared_ptr), so quite cheap to store/copy).

geektoni added Tag: Development Task Tag: Discussion labels May 12, 2020

geektoni self-assigned this May 12, 2020

geektoni mentioned this issue May 14, 2020

Extend observe() to emit std::function objects #5033

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable models to emit lambda functions. #5026

Enable models to emit lambda functions. #5026

geektoni commented May 12, 2020

gf712 commented May 12, 2020

geektoni commented May 12, 2020

karlnapf commented May 12, 2020

karlnapf commented May 12, 2020

gf712 commented May 12, 2020 •

edited

Loading

karlnapf commented May 12, 2020 •

edited

Loading

gf712 commented May 12, 2020 •

edited by karlnapf

Loading

geektoni commented May 12, 2020

gf712 commented May 12, 2020

gf712 commented May 12, 2020

karlnapf commented May 12, 2020

gf712 commented May 12, 2020 •

edited

Loading

karlnapf commented May 13, 2020

gf712 commented May 13, 2020 •

edited

Loading

karlnapf commented May 13, 2020

gf712 commented May 13, 2020 •

edited

Loading

gf712 commented May 13, 2020 •

edited

Loading

geektoni commented May 13, 2020

gf712 commented May 13, 2020

karlnapf commented May 13, 2020

karlnapf commented May 13, 2020

geektoni commented May 13, 2020

karlnapf commented May 13, 2020

geektoni commented May 13, 2020

gf712 commented May 13, 2020 •

edited

Loading

Enable models to emit lambda functions. #5026

Enable models to emit lambda functions. #5026

Comments

geektoni commented May 12, 2020

gf712 commented May 12, 2020

geektoni commented May 12, 2020

karlnapf commented May 12, 2020

karlnapf commented May 12, 2020

gf712 commented May 12, 2020 • edited Loading

karlnapf commented May 12, 2020 • edited Loading

gf712 commented May 12, 2020 • edited by karlnapf Loading

geektoni commented May 12, 2020

gf712 commented May 12, 2020

gf712 commented May 12, 2020

karlnapf commented May 12, 2020

gf712 commented May 12, 2020 • edited Loading

karlnapf commented May 13, 2020

gf712 commented May 13, 2020 • edited Loading

karlnapf commented May 13, 2020

gf712 commented May 13, 2020 • edited Loading

gf712 commented May 13, 2020 • edited Loading

geektoni commented May 13, 2020

gf712 commented May 13, 2020

karlnapf commented May 13, 2020

karlnapf commented May 13, 2020

geektoni commented May 13, 2020

karlnapf commented May 13, 2020

geektoni commented May 13, 2020

gf712 commented May 13, 2020 • edited Loading

gf712 commented May 12, 2020 •

edited

Loading

karlnapf commented May 12, 2020 •

edited

Loading

gf712 commented May 12, 2020 •

edited by karlnapf

Loading

gf712 commented May 12, 2020 •

edited

Loading

gf712 commented May 13, 2020 •

edited

Loading

gf712 commented May 13, 2020 •

edited

Loading

gf712 commented May 13, 2020 •

edited

Loading

gf712 commented May 13, 2020 •

edited

Loading