Histogram and run lists for Muon subsystems #17

Open
1 of 25 tasks
sam-may opened this issue Nov 22, 2021 · 0 comments
sam-may commented Nov 22, 2021

For each of the following muon subsystems, create JSON files AutoDQM_ML/metadata/histogram_lists/<subsystem>.json (the list of histograms to monitor) and AutoDQM_ML/metadata/datasets/<subsystem>.json (the datasets to run over, indicating any bad runs for that subsystem).

An example of the histogram list is the dt.json file from @chosila, and an example of the datasets JSON would be the following (slightly modifying the existing bad_dt.json from Si to reflect recent updates to the DataFetcher):

{
    "primary_datasets" : ["SingleMuon"],
    "years" : {
        "2016" : {
            "productions" : ["PromptReco"],
            "bad_runs" : ["281680", "281674", "281663", "273294"]
        },
        "2015" : {
            "productions" : ["PromptReco"],
            "bad_runs" : ["259464", "258335", "258320", "258313", "258312", "256445"]
        }
    }
}
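
For illustration, here is a minimal sketch of how a datasets file in the format above could be consumed to drop bad runs from a list of candidate runs. The file path, function name, and run numbers are hypothetical; the actual logic lives in the DataFetcher.

import json

def get_bad_runs(datasets_json, year):
    # Return the set of bad runs listed for a given year in a
    # <subsystem>.json file with the format shown above.
    with open(datasets_json) as f:
        cfg = json.load(f)
    return set(cfg["years"].get(year, {}).get("bad_runs", []))

# Example usage (hypothetical file path and run numbers)
bad_runs = get_bad_runs("AutoDQM_ML/metadata/datasets/csc.json", "2016")
candidate_runs = ["273294", "281000", "281663"]
good_runs = [run for run in candidate_runs if run not in bad_runs]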

Once histogram and dataset lists are in place, we should proceed with training PCAs and AutoEncoders for each of the histograms. To start, the default PCA/AutoEncoder options are probably fine; later on, we can come back and try optimizing hyperparameters.

For PCAs, this should simply be a matter of running scripts/train.py with all of the relevant histograms (1 PCA per histogram); a conceptual sketch is included below. For AutoEncoders, we have the option of training a single AutoEncoder on multiple histograms simultaneously; I'd suggest we forgo this subtlety for now and just follow the PCA approach of 1 AutoEncoder per histogram. Once PCAs/AutoEncoders are trained, the saved models in json/hdf5 files should be placed in folders on GitHub, maybe AutoDQM_ML/data/models/<subsystem>/.
❓ Maybe it makes more sense to place these directly in the AutoDQM repo and/or /eos?
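
To make the "1 PCA per histogram" idea concrete, here is a rough conceptual sketch using scikit-learn, assuming each histogram has already been flattened into one row of bin contents per run. This is not the scripts/train.py interface (all names and shapes below are placeholders); it only illustrates per-histogram training and the kind of lightweight summary one might persist.

import json
import numpy as np
from sklearn.decomposition import PCA

# Placeholder input: for each histogram name, an array of shape
# (n_runs, n_bins) holding the (normalized) bin contents per run.
histograms = {
    "CSC/occupancy_example": np.random.rand(200, 100),
}

models = {}
for name, X in histograms.items():
    pca = PCA(n_components=2)  # default-ish choice; hyperparameters can be tuned later
    pca.fit(X)
    models[name] = pca

# Persist the pieces needed to reconstruct histograms (mean + components);
# the real tooling writes its own json/hdf5 model files.
summary = {
    name: {
        "mean": model.mean_.tolist(),
        "components": model.components_.tolist(),
    }
    for name, model in models.items()
}
with open("pca_models_example.json", "w") as f:
    json.dump(summary, f)

An AutoEncoder would be handled the same way for now: one model trained per histogram, with its weights saved alongside the PCAs.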

Finally, we should perform a validation of both the PCAs and AutoEncoders and summarize it in a set of slides for each subsystem. At a minimum, we would want the following:

  1. any relevant details on the histogram list
  2. any relevant details on the "bad runs" (what histograms are affected, what was the issue, etc.)
  3. plots of original and reconstructed histograms (with both PCA and AutoEncoder) for both good and bad runs
  4. SSE summary plot for each histogram (with both PCA and AutoEncoder) split by train/test sets and good/bad runs
  5. ROC curve and TPR vs. FPR table for both PCA and AutoEncoder (see the sketch below for how items 4 and 5 could be computed)
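
As a rough sketch of how items 4 and 5 could be computed, assuming we already have the original and reconstructed bin contents per run plus a good/bad label for each run (all inputs below are random placeholders):

import numpy as np
from sklearn.metrics import roc_curve, auc

# Placeholder inputs for one histogram: original and reconstructed bin
# contents with shape (n_runs, n_bins), and a per-run label
# (1 = bad run, 0 = good run).
original = np.random.rand(50, 100)
reconstructed = original + 0.01 * np.random.randn(50, 100)
is_bad = np.random.randint(0, 2, size=50)

# SSE between original and reconstruction, one value per run;
# this is the per-run anomaly score summarized in the SSE plots.
sse = np.sum((original - reconstructed) ** 2, axis=1)

# ROC curve: how well the SSE score separates bad runs from good runs.
fpr, tpr, thresholds = roc_curve(is_bad, sse)
print("AUC:", auc(fpr, tpr))

# TPR vs. FPR table at the thresholds scanned by roc_curve.
for f, t, thr in zip(fpr, tpr, thresholds):
    print(f"threshold={thr:.3g}  FPR={f:.2f}  TPR={t:.2f}")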

Through this process, we should stay in contact with the relevant DPG experts to make sure they agree with the physics side of things for each subsystem. The last step of validation would be running the studies by a DPG expert and asking for their feedback.

Relevant resources (for posterity, please add any additional links you find that may be useful!):

Checklist:

  • DT
    • Histograms
    • Bad runs
    • PCAs
    • AutoEncoders
    • Validation
  • RPC
    • Histograms
    • Bad runs
    • PCAs
    • AutoEncoders
    • Validation
  • CSC
    • Histograms
    • Bad runs
    • PCAs
    • AutoEncoders
    • Validation
  • EMTF
    • Histograms
    • Bad runs
    • PCAs
    • AutoEncoders
    • Validation
  • GEM
    • Histograms
    • Bad runs
    • PCAs
    • AutoEncoders
    • Validation