Currently, some classification algorithms check whether the input Labels are valid, e.g. that the class labels are contiguous ([0, 1, ..., n_classes-1]), which leads to a lot of duplicate code.
These checks should instead be done by the Machine base class when training is performed. The Machine then stores the mapping from any Label input to an internal encoding. For example, a binary classification task would map {10, 20} -> {-1, +1} using a BinaryLabelEncoder class, and a MulticlassLabelsEncoder class would do the same for multiclass tasks. The properly encoded Labels are then dispatched to the train_machine method. When apply is called, the returned Labels are mapped back to the user's input Label space using the LabelEncoder.
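A minimal Python sketch of the binary case might look like the following. It assumes a fit/transform/inverse_transform interface (the scikit-learn LabelEncoder convention, not Shogun's actual API); the class name comes from the issue, while the method names and the "smaller label -> -1" rule are assumptions made for illustration.

```python
class BinaryLabelEncoder:
    """Hypothetical sketch: maps two distinct user labels to {-1, +1} and back."""

    def fit(self, labels):
        classes = sorted(set(labels))
        # Validity check: a binary task needs exactly two distinct labels,
        # so e.g. {-1, 0, 1} is rejected here.
        if len(classes) != 2:
            raise ValueError(
                f"binary task needs exactly 2 distinct labels, got {classes}")
        self.classes_ = classes
        # Assumed convention: smaller user label -> -1, larger -> +1.
        self._to_internal = {classes[0]: -1, classes[1]: +1}
        self._to_user = {-1: classes[0], +1: classes[1]}
        return self

    def transform(self, labels):
        """User labels -> internal {-1, +1} encoding (used before train_machine)."""
        unknown = set(labels) - set(self.classes_)
        if unknown:
            raise ValueError(f"labels not seen during fit: {sorted(unknown)}")
        return [self._to_internal[y] for y in labels]

    def inverse_transform(self, encoded):
        """Internal {-1, +1} predictions -> user labels (used after apply)."""
        return [self._to_user[y] for y in encoded]
```

Fitting on [10, 20, 20, 10] gives transform([10, 20, 20]) == [-1, 1, 1] and inverse_transform([1, -1]) == [20, 10], while fitting on [-1, 0, 1] raises ValueError because it contains three classes.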
The tasks (in order):
1. Write a LabelEncoder base class and the derived BinaryLabelEncoder and MulticlassLabelsEncoder classes. These should also check that the Labels are valid, e.g. {-1, 0, 1} cannot be transformed to BinaryLabels. Add label encoder #5067
2. Add a LabelEncoder as a Machine class member.
3. Fit the LabelEncoder and transform the input in train, then perform the inverse operation in apply.
4. Remove the label checks from Machine subclasses, since the algorithms are then guaranteed to receive a valid Label representation.
5. xvalidation would use its own mapping, which it passes on to each fold's Machine in order to keep the same mapping across folds.
Most of this code already exists, but it is spread around the code base.
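The task list can be sketched end to end in Python. Everything below is hypothetical: the Machine base class, its label_encoder attribute, the train/apply versus train_machine/apply_machine split (modelled on the method names in the issue), and the toy MajorityClass subclass illustrate the proposed design and are not Shogun code. The encoder maps the sorted distinct labels to 0..n_classes-1. An encoder passed to the constructor is treated as fixed, which is one way xvalidation could hand the same mapping to every fold.

```python
class MulticlassLabelsEncoder:
    """Hypothetical sketch: maps arbitrary class labels to 0..n_classes-1."""

    def fit(self, labels):
        self.classes_ = sorted(set(labels))
        if len(self.classes_) < 2:
            raise ValueError("multiclass task needs at least 2 distinct labels")
        self._to_internal = {c: i for i, c in enumerate(self.classes_)}
        return self

    def transform(self, labels):
        return [self._to_internal[y] for y in labels]

    def inverse_transform(self, encoded):
        return [self.classes_[i] for i in encoded]


class Machine:
    """Hypothetical base class: owns the encoder; subclasses see encoded labels only."""

    def __init__(self, encoder=None):
        # A pre-fitted encoder (e.g. supplied by xvalidation) is never refitted.
        self.label_encoder = encoder
        self._encoder_fixed = encoder is not None

    def train(self, features, labels):
        if not self._encoder_fixed:
            self.label_encoder = MulticlassLabelsEncoder().fit(labels)
        encoded = self.label_encoder.transform(labels)
        self.train_machine(features, encoded)

    def apply(self, features):
        # Map internal predictions back to the user's label space.
        encoded = self.apply_machine(features)
        return self.label_encoder.inverse_transform(encoded)

    def train_machine(self, features, labels):
        raise NotImplementedError

    def apply_machine(self, features):
        raise NotImplementedError


class MajorityClass(Machine):
    """Toy subclass: no label checks needed, labels are guaranteed to be 0..n-1."""

    def train_machine(self, features, labels):
        counts = [0] * len(self.label_encoder.classes_)
        for y in labels:
            counts[y] += 1  # direct indexing is safe because labels are contiguous
        self.majority_ = counts.index(max(counts))

    def apply_machine(self, features):
        return [self.majority_] * len(features)
```

Training MajorityClass on labels ["cat", "dog", "dog", "bird"] encodes them as bird -> 0, cat -> 1, dog -> 2; train_machine only ever sees those integers, and apply returns "dog" in the user's original label space.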
A lot of the conversion code is already inside the labels classes, so it can be reused.
E.g. here and here
Also note that some of this code is already used in the old approach, where the algorithm classes convert the labels to the appropriate form (rather than the base class doing it as outlined above). See e.g. here. With the approach described above, this code would simply be removed, since the algorithms are guaranteed to receive appropriately encoded labels.
Finally, the old approach currently in use might cause bugs or wrong results inside xvalidation, because the mapping may change from fold to fold.
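To illustrate the cross-validation concern, here is a small Python sketch that compares fitting a mapping on each fold's training subset with fitting one mapping up front and sharing it. The helper names are hypothetical, and the folds are assumed to be given as sets of held-out indices.

```python
def fit_mapping(labels):
    """Hypothetical helper: sorted distinct class labels -> indices 0..n-1."""
    return {c: i for i, c in enumerate(sorted(set(labels)))}


def per_fold_mappings(labels, folds):
    """Old approach: each fold's training subset builds its own mapping."""
    return [fit_mapping([y for i, y in enumerate(labels) if i not in fold])
            for fold in folds]


def shared_mapping(labels, folds):
    """Proposed approach: fit once on all labels and reuse it for every fold."""
    mapping = fit_mapping(labels)
    return [mapping for _ in folds]
```

With labels [10, 10, 20, 20, 30, 30] and held-out folds {0, 1}, {2, 3}, {4, 5}, the per-fold mappings are {20: 0, 30: 1}, {10: 0, 30: 1} and {10: 0, 20: 1}. Internal index 1 therefore means 30 in two folds and 20 in the third. The shared mapping is {10: 0, 20: 1, 30: 2} in every fold.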