For each of the following muon subsystems, create JSON files `AutoDQM_ML/metadata/histogram_lists/<subsystem>.json` and `AutoDQM_ML/metadata/datasets/<subsystem>.json`, where any bad runs are indicated for that subsystem.
An example of the histogram list would be the `dt.json` file from @chosila, and an example of the datasets JSON would be the existing `bad_dt.json` from Si (slightly modified to reflect recent updates to the `DataFetcher`).
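As a purely hypothetical illustration of how bad runs might be flagged in a datasets JSON (the field names and run numbers below are placeholders, not the actual `bad_dt.json` schema):

```json
{
  "DT": {
    "good_runs": [111111, 111112],
    "bad_runs": {
      "111113": "example: noisy chamber, affects occupancy histograms"
    }
  }
}
```

The real schema should be taken from `bad_dt.json` and the current `DataFetcher` code.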
Once the histogram and dataset lists are in place, we should proceed with training PCAs and AutoEncoders for each of the histograms. To start, the default PCA/AutoEncoder options are probably fine. Later on, we can come back and try optimizing hyperparameters.
For PCAs, this should simply be a matter of running `scripts/train.py` with all of the relevant histograms (1 PCA per histogram). For AutoEncoders, we have the possibility of training a single AutoEncoder on multiple histograms simultaneously. I'd suggest we forgo this subtlety for now and just follow the PCA style of 1 AutoEncoder per histogram. Once the PCAs/AutoEncoders are trained, the saved models in JSON/HDF5 files should be placed in folders on GitHub, maybe `AutoDQM_ML/data/models/<subsystem>/`.
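A minimal sketch of the 1-PCA-per-histogram idea using scikit-learn (the array shapes and component count here are illustrative assumptions, not `AutoDQM_ML`'s actual interface):

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy stand-in for one histogram across many good runs:
# 50 runs x 100 bins, each row normalized to unit area.
rng = np.random.default_rng(0)
hists = rng.random((50, 100))
hists /= hists.sum(axis=1, keepdims=True)

# One PCA per histogram, fit on good runs only.
pca = PCA(n_components=2)
pca.fit(hists)

# Score each run by the sum of squared errors (SSE) between the
# original histogram and its PCA reconstruction.
reco = pca.inverse_transform(pca.transform(hists))
sse = ((hists - reco) ** 2).sum(axis=1)
```

In practice `scripts/train.py` handles this loop over histograms; the point is just that each histogram gets its own model and an SSE score per run.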
❓ Maybe it makes more sense to place these directly in the AutoDQM repo and/or /eos?
Finally, we should perform a validation of both the PCAs and AutoEncoders and summarize these in a set of slides for each subsystem. At minimum, we would want the following:
- any relevant details on the histogram list
- any relevant details on the "bad runs" (what histograms are affected, what was the issue, etc.)
- plots of original and reconstructed histograms (with both PCA and AutoEncoder) for both good and bad runs
- SSE summary plot for each histogram (with both PCA and AutoEncoder) split by train/test sets and good/bad runs
- ROC curve and TPR vs. FPR table for both PCA and AutoEncoder
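The ROC curve and TPR/FPR table can come straight from the per-run SSE scores. A hedged sketch with scikit-learn, using toy labels and scores (the assumption is simply that higher SSE means more anomalous):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Toy example: 0 = good run, 1 = bad run, with one SSE score per run.
labels = np.array([0] * 8 + [1] * 4)
sse = np.array([0.10, 0.20, 0.15, 0.12, 0.18, 0.11, 0.22, 0.13,
                0.90, 1.10, 0.80, 1.30])

# ROC curve: scan the SSE threshold and record (FPR, TPR) at each step.
fpr, tpr, thresholds = roc_curve(labels, sse)
auc = roc_auc_score(labels, sse)

# TPR vs. FPR table, one row per threshold.
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"thr={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")
```

The same scores split by train/test and good/bad give the SSE summary plot requested above.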
Through this process, we should stay in contact with the relevant DPG experts to make sure they agree with the physics side of things for each subsystem. The last step of validation would be running the studies by a DPG expert and asking for their feedback.
Relevant resources (for posterity, please add any additional links you find that may be useful!):
Checklist:
- [ ] dt.json