Preprocessing in FALCON
The function FalconPreProcess.m allows the measurement data to be manipulated in a number of ways. It needs to be run after the optimization problem has been created, but before the optimization is performed.
[estim2]=FalconPreProcess(estim, 'normalize', [a b]) will min-max normalize the values in the dataset between a and b, with a < b (typically 0-1). Normalization is performed linearly, independently for each analyte (each column in the dataset).
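For reference, the min-max transformation applied to each column is equivalent to the following sketch (illustrative only, not FALCON's internal code; the data and variable names are hypothetical):

```matlab
% Hypothetical measurements for one analyte (one column of the dataset)
x = [0.2; 1.5; 3.0; 0.7];
a = 0; b = 1;                                            % target range, a < b
xNorm = a + (x - min(x)) .* (b - a) ./ (max(x) - min(x)); % linear rescaling into [a, b]
```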
[estim2]=FalconPreProcess(estim, 'bootstrap', S) where S is a string with possible values 'rows', 'columns', or 'both' will perform bootstrap sampling (resampling with replacement to the original dataset size). This is a simple way to assess the confidence in parameter value estimates given the amount of signal in the data, by looking at the distributions of parameter values over many bootstraps.
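As an illustration, bootstrapping along the rows amounts to the following (a minimal sketch with a hypothetical data matrix, not the toolbox code; the 'columns' and 'both' options resample along the other dimensions analogously):

```matlab
D = rand(10, 4);                          % hypothetical dataset: 10 conditions x 4 analytes
idx = randi(size(D, 1), size(D, 1), 1);   % draw row indices with replacement
Dboot = D(idx, :);                        % bootstrapped dataset, same size as D
```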
[estim2]=FalconPreProcess(estim, 'randomize', p) where p is a scalar with 0 < p < 1 will perform random permutations on a fraction p of the datapoints. This is an easy way to assess the influence of the data on the model optimization: reaching the same conclusions with the original and the randomized dataset is a sign that they are not well supported by the data.
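Conceptually, permuting a fraction p of the datapoints can be sketched as follows (illustrative only, with hypothetical data):

```matlab
D = rand(10, 4);                  % hypothetical dataset
p = 0.3;                          % fraction of datapoints to permute
k = round(p * numel(D));          % number of values to shuffle
pos = randperm(numel(D), k);      % linear indices of the values to permute
D(pos) = D(pos(randperm(k)));     % shuffle the selected values among themselves
```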
[estim2]=FalconPreProcess(estim, 'subsample', p) where p is a positive scalar will subsample the original dataset to a fraction p of its original size. If p < 1 the resulting dataset is smaller than the original one (undersampling, useful to speed up computations on large datasets). If p > 1 the result is oversampling, which is a less aggressive way to perturb the dataset than bootstrapping, as all original datapoints are still included, and is therefore better suited to very small datasets.
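The under- and oversampling cases along the rows can be illustrated as follows (a sketch with hypothetical data, not the toolbox implementation):

```matlab
D = rand(10, 4);                             % hypothetical dataset
p = 0.5;                                     % p < 1: undersampling; p > 1: oversampling
nNew = round(p * size(D, 1));
if p <= 1
    idx = randperm(size(D, 1), nNew);        % keep a random subset of rows
else
    extra = randi(size(D, 1), 1, nNew - size(D, 1));
    idx = [1:size(D, 1), extra];             % keep all rows and add resampled extras
end
Dsub = D(idx, :);
```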
[estim2]=FalconPreProcess(estim, 'noise', p) where p is a positive scalar will add a fraction p of the error on the measurements to the measurement values, de facto generating one of the possible instances of the measurement values given their distribution. This is yet another way to perturb the data, in a way that preserves the error matrix. This is useful when some measurements have a much higher standard error than others.
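One plausible interpretation of this perturbation is sketched below (assuming normally distributed errors scaled by the reported standard errors; this is an illustration with hypothetical data, not FALCON's exact noise model):

```matlab
D  = rand(10, 4);                        % hypothetical measurement values
SD = 0.1 * ones(size(D));                % hypothetical standard errors for each measurement
p  = 0.5;                                % fraction of the error to add
Dnoisy = D + p .* SD .* randn(size(D));  % one plausible realization of the measurements
```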
It is possible to combine different layers of preprocessing in one function call, for example: [estim2]=FalconPreProcess(estim, 'normalize', [-0.2 0.9], 'bootstrap', 'both', 'randomize', 0.5, 'subsample', 0.5)