Custom DataGrids
The default DenseInstances type may not meet your application's needs. Fortunately, the DenseInstances type implements a FixedDataGrid, meaning it's easy to adapt for your own needs.
Here's the DataGrid interface in GoLearn 0.1:
type DataGrid interface {
	// Retrieves a given Attribute's specification
	GetAttribute(Attribute) (AttributeSpec, error)
	// Retrieves details of every Attribute
	AllAttributes() []Attribute
	// Marks an Attribute as a class Attribute
	AddClassAttribute(Attribute) error
	// Unmarks an Attribute as a class Attribute
	RemoveClassAttribute(Attribute) error
	// Returns details of all class Attributes
	AllClassAttributes() []Attribute
	// Gets the bytes at a given position or nil
	Get(AttributeSpec, int) []byte
	// Convenience function for iteration.
	MapOverRows([]AttributeSpec, func([][]byte, int) (bool, error)) error
}
FixedDataGrid adds a few extra methods.
type FixedDataGrid interface {
	DataGrid
	// Returns a string representation of a given row
	RowString(int) string
	// Returns the number of Attributes and rows currently allocated
	Size() (int, int)
}
Refer to the automatically updated documentation for more recent versions of GoLearn.
Attribute implementations in GoLearn describe features of the machine learning problem. As of GoLearn 0.1, the implementations that exist as part of base are CategoricalAttribute and FloatAttribute (both 64-bit), as well as BinaryAttribute. AttributeSpec structures link an Attribute to an implementation-specific idea of where the data underlying a given Attribute is located in memory. For example, DenseInstances uses them to store the column offset. DataGrid implementations outside of base won't be able to add additional fields to an AttributeSpec, but they can:
- Maintain local map[AttributeSpec]int structures to offer fast resolution.
- Extend AttributeSpec to add additional fields (untested).
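The first option might look roughly like the sketch below. So that it compiles without GoLearn, it uses a stand-in attributeSpec struct and a hypothetical columnGrid type; neither is part of base, and the real AttributeSpec has different fields.

```go
package main

import "fmt"

// attributeSpec is a stand-in for base.AttributeSpec (an assumption made
// so the sketch compiles on its own; the real struct differs).
type attributeSpec struct {
	name string
}

// columnGrid is a hypothetical custom grid that stores one byte slice per
// Attribute and keeps a local map from spec to column index.
type columnGrid struct {
	columns [][]byte
	lookup  map[attributeSpec]int
}

// addColumn registers a spec and remembers where its data lives.
func (g *columnGrid) addColumn(s attributeSpec, data []byte) {
	if g.lookup == nil {
		g.lookup = make(map[attributeSpec]int)
	}
	g.lookup[s] = len(g.columns)
	g.columns = append(g.columns, data)
}

// columnFor resolves a spec to its column index with a single map lookup.
func (g *columnGrid) columnFor(s attributeSpec) (int, bool) {
	idx, ok := g.lookup[s]
	return idx, ok
}

func main() {
	g := &columnGrid{}
	g.addColumn(attributeSpec{name: "height"}, []byte{1, 2})
	g.addColumn(attributeSpec{name: "weight"}, []byte{3, 4})
	idx, ok := g.columnFor(attributeSpec{name: "weight"})
	fmt.Println(idx, ok) // prints "1 true"
}
```

The map key must be comparable, which is why the stand-in struct contains only a string.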
When deciding which AttributeSpec to return, GetAttribute implementations should use strict equality (using Attribute.Equals); otherwise subtle problems, such as CategoricalAttributes with corrupted orderings, can cause odd behaviour.
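A minimal sketch of such a lookup follows. The attribute interface, the categorical type, the spec struct and the grid type are all simplified stand-ins defined here for illustration; they are not GoLearn's definitions, and only the idea of comparing through Equals is taken from the text.

```go
package main

import (
	"errors"
	"fmt"
)

// attribute is a cut-down stand-in for GoLearn's Attribute interface.
type attribute interface {
	Equals(other attribute) bool
}

// categorical is a hypothetical Attribute whose value ordering matters.
type categorical struct {
	name   string
	values []string
}

// Equals is strict: the name and the ordered list of values must both match.
func (c *categorical) Equals(other attribute) bool {
	o, ok := other.(*categorical)
	if !ok || o.name != c.name || len(o.values) != len(c.values) {
		return false
	}
	for i := range c.values {
		if c.values[i] != o.values[i] {
			return false
		}
	}
	return true
}

// spec stands in for AttributeSpec and records a column offset.
type spec struct {
	column int
	attr   attribute
}

// grid is a hypothetical DataGrid holding its Attributes in column order.
type grid struct {
	attrs []attribute
}

// getAttribute returns the spec of the first stored Attribute that is
// strictly equal to the requested one, or an error if none matches.
func (g *grid) getAttribute(a attribute) (spec, error) {
	for i, candidate := range g.attrs {
		if candidate.Equals(a) {
			return spec{column: i, attr: candidate}, nil
		}
	}
	return spec{}, errors.New("no matching Attribute")
}

func main() {
	g := &grid{attrs: []attribute{
		&categorical{name: "colour", values: []string{"red", "green"}},
	}}
	// Same name, different ordering: strict equality rejects it.
	_, err := g.getAttribute(&categorical{name: "colour", values: []string{"green", "red"}})
	fmt.Println(err != nil) // prints "true"
}
```

Matching on the name alone would have accepted the second Attribute and silently swapped the meaning of its category values.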
AllAttributes simply returns a copy of all of the available Attributes. This is used for determining compatibility with other DataGrid implementations, and is usually a precursor to GetAttribute calls. The Attributes should be returned in a fixed order.
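One way to satisfy both requirements (a copy, in a fixed order) is to keep the Attributes in a slice and copy it on every call. The attribute interface, named type and grid type below are hypothetical stand-ins rather than GoLearn types.

```go
package main

import "fmt"

// attribute is a cut-down stand-in for GoLearn's Attribute interface.
type attribute interface {
	GetName() string
}

// named is a hypothetical Attribute that only carries a name.
type named struct{ name string }

func (n named) GetName() string { return n.name }

// grid is a hypothetical DataGrid that stores Attributes in column order.
type grid struct {
	attrs []attribute
}

// allAttributes returns a fresh slice in storage order, so callers always
// see the same ordering and cannot modify the grid's own slice.
func (g *grid) allAttributes() []attribute {
	out := make([]attribute, len(g.attrs))
	copy(out, g.attrs)
	return out
}

func main() {
	g := &grid{attrs: []attribute{named{"a"}, named{"b"}}}
	got := g.allAttributes()
	got[0] = named{"changed"} // only the copy is modified
	fmt.Println(g.allAttributes()[0].GetName()) // prints "a"
}
```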
AddClassAttribute marks its argument as a class Attribute. Each DataGrid implementation keeps track of which Attributes are designated class variables, normally using a map[Attribute]bool structure.
A call to RemoveClassAttribute means that the argument should no longer appear in calls to AllClassAttributes.
AllClassAttributes returns every Attribute designated as a class Attribute via previous calls to AddClassAttribute.
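The three class-Attribute methods can be sketched together around a single map. The types below are stand-ins, and the sketch compares Attributes by identity for brevity; a real implementation would more likely compare with Attribute.Equals.

```go
package main

import (
	"errors"
	"fmt"
)

// attribute is a cut-down stand-in for GoLearn's Attribute interface.
type attribute interface {
	GetName() string
}

// named is a hypothetical Attribute; pointers to it serve as map keys.
type named struct{ name string }

func (n *named) GetName() string { return n.name }

// grid is a hypothetical DataGrid tracking class Attributes in a map.
type grid struct {
	attrs   []attribute
	classes map[attribute]bool
}

// contains reports whether the grid knows about a (identity comparison,
// a simplification made for this sketch).
func (g *grid) contains(a attribute) bool {
	for _, known := range g.attrs {
		if known == a {
			return true
		}
	}
	return false
}

func (g *grid) addClassAttribute(a attribute) error {
	if !g.contains(a) {
		return errors.New("unknown Attribute")
	}
	if g.classes == nil {
		g.classes = make(map[attribute]bool)
	}
	g.classes[a] = true
	return nil
}

func (g *grid) removeClassAttribute(a attribute) error {
	if !g.contains(a) {
		return errors.New("unknown Attribute")
	}
	delete(g.classes, a)
	return nil
}

// allClassAttributes walks attrs rather than the map so that the result
// comes back in a stable order.
func (g *grid) allClassAttributes() []attribute {
	var out []attribute
	for _, a := range g.attrs {
		if g.classes[a] {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	x, y := &named{"x"}, &named{"y"}
	g := &grid{attrs: []attribute{x, y}}
	_ = g.addClassAttribute(y)
	fmt.Println(len(g.allClassAttributes())) // prints "1"
	_ = g.removeClassAttribute(y)
	fmt.Println(len(g.allClassAttributes())) // prints "0"
}
```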
Get takes an AttributeSpec and a row number and returns a slice of bytes, which can be converted to another value using Attribute-specific methods. At least one byte should be returned.
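To illustrate the byte-slice contract, the sketch below stores one float column as raw bytes and decodes a row on request. The floatColumn type, its fixed 8-byte width and the little-endian encoding are this sketch's own assumptions; GoLearn's Attributes supply their own conversion methods, which may use a different layout.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

const width = 8 // bytes per float64 value

// floatColumn is a hypothetical store holding one float column as bytes.
type floatColumn struct {
	data []byte
}

// push encodes v as little-endian IEEE 754 bits and appends it.
func (c *floatColumn) push(v float64) {
	buf := make([]byte, width)
	binary.LittleEndian.PutUint64(buf, math.Float64bits(v))
	c.data = append(c.data, buf...)
}

// get returns the bytes for a row, or nil if the row is out of range.
func (c *floatColumn) get(row int) []byte {
	start := row * width
	if row < 0 || start+width > len(c.data) {
		return nil
	}
	return c.data[start : start+width]
}

// decode converts the bytes returned by get back into a float64.
func decode(b []byte) float64 {
	return math.Float64frombits(binary.LittleEndian.Uint64(b))
}

func main() {
	c := &floatColumn{}
	c.push(1.5)
	c.push(-2.25)
	fmt.Println(decode(c.get(1))) // prints "-2.25"
}
```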
MapOverRows allows algorithms to iterate over all the rows in the DataGrid in whichever order is convenient for the underlying implementation. The first argument is a slice of AttributeSpec structures describing which fields are needed. The second argument is a callback function which takes two arguments: a slice of byte slices containing the binary data for a given row, and the row number. The callback returns a boolean saying whether the inner algorithm has terminated, and an optional error if the inner algorithm terminated with an error.
FixedDataGrid adds a RowString method for easier inspection. The argument is the row number to be printed.
Size returns the current dimensions of the FixedDataGrid. The first value returned is the number of Attributes, the second value is the number of rows.
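Both FixedDataGrid methods are small. The sketch below shows one possible shape using a hypothetical grid of float values; the name=value output format is invented for this example and is not the format GoLearn prints.

```go
package main

import (
	"fmt"
	"strings"
)

// grid is a hypothetical fixed-size store of float values.
type grid struct {
	names []string    // one name per Attribute
	rows  [][]float64 // one slice per row
}

// size returns the number of Attributes first, then the number of rows.
func (g *grid) size() (int, int) {
	return len(g.names), len(g.rows)
}

// rowString renders one row for inspection (format is this sketch's own).
func (g *grid) rowString(row int) string {
	parts := make([]string, len(g.rows[row]))
	for i, v := range g.rows[row] {
		parts[i] = fmt.Sprintf("%s=%g", g.names[i], v)
	}
	return strings.Join(parts, " ")
}

func main() {
	g := &grid{
		names: []string{"x", "y"},
		rows:  [][]float64{{1, 2.5}},
	}
	cols, rows := g.size()
	fmt.Println(cols, rows)     // prints "2 1"
	fmt.Println(g.rowString(0)) // prints "x=1 y=2.5"
}
```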