Histogram and run lists for Muon subsystems #17

Open
1 of 25 tasks
sam-may opened this issue Nov 22, 2021 · 0 comments
sam-may commented Nov 22, 2021

For each of the following muon subsystems, create JSON files AutoDQM_ML/metadata/histogram_lists/<subsystem>.json (the list of histograms to monitor) and AutoDQM_ML/metadata/datasets/<subsystem>.json (the datasets to run over, indicating any bad runs for that subsystem).

An example of the histogram list is the dt.json file from @chosila, and an example of the datasets JSON would be the following (slightly modifying the existing bad_dt.json from Si to reflect recent updates to the DataFetcher):

{
    "primary_datasets" : ["SingleMuon"],
    "years" : {
        "2016" : {
            "productions" : ["PromptReco"],
            "bad_runs" : ["281680", "281674", "281663", "273294"]
        },
        "2015" : {
            "productions" : ["PromptReco"],
            "bad_runs" : ["259464", "258335", "258320", "258313", "258312", "256445"]
        }
    }
}
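
For illustration, here is a minimal sketch of how a datasets file in the format above could be consumed to drop bad runs from a list of candidate runs. The file path, function name, and run numbers are hypothetical; the actual logic lives in the DataFetcher.

import json

def get_bad_runs(datasets_json, year):
    # Return the set of bad runs listed for a given year in a
    # <subsystem>.json file with the format shown above.
    with open(datasets_json) as f:
        cfg = json.load(f)
    return set(cfg["years"].get(year, {}).get("bad_runs", []))

# Example usage (hypothetical file path and run numbers)
bad_runs = get_bad_runs("AutoDQM_ML/metadata/datasets/csc.json", "2016")
candidate_runs = ["273294", "281000", "281663"]
good_runs = [run for run in candidate_runs if run not in bad_runs]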

Once histogram and dataset lists are in place, we should proceed with training PCAs and AutoEncoders for each of the histograms. To start, the default PCA/AutoEncoder options are probably fine; later on, we can come back and try optimizing hyperparameters.

For PCAs, this should simply be a matter of running scripts/train.py with all of the relevant histograms (1 PCA per histogram); a conceptual sketch is included below. For AutoEncoders, we have the option of training a single AutoEncoder on multiple histograms simultaneously; I'd suggest we forgo this subtlety for now and just follow the PCA approach of 1 AutoEncoder per histogram. Once PCAs/AutoEncoders are trained, the saved models in json/hdf5 files should be placed in folders on GitHub, maybe AutoDQM_ML/data/models/<subsystem>/.
❓ Maybe it makes more sense to place these directly in the AutoDQM repo and/or /eos?
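
To make the "1 PCA per histogram" idea concrete, here is a rough conceptual sketch using scikit-learn, assuming each histogram has already been flattened into one row of bin contents per run. This is not the scripts/train.py interface (all names and shapes below are placeholders); it only illustrates per-histogram training and the kind of lightweight summary one might persist.

import json
import numpy as np
from sklearn.decomposition import PCA

# Placeholder input: for each histogram name, an array of shape
# (n_runs, n_bins) holding the (normalized) bin contents per run.
histograms = {
    "CSC/occupancy_example": np.random.rand(200, 100),
}

models = {}
for name, X in histograms.items():
    pca = PCA(n_components=2)  # default-ish choice; hyperparameters can be tuned later
    pca.fit(X)
    models[name] = pca

# Persist the pieces needed to reconstruct histograms (mean + components);
# the real tooling writes its own json/hdf5 model files.
summary = {
    name: {
        "mean": model.mean_.tolist(),
        "components": model.components_.tolist(),
    }
    for name, model in models.items()
}
with open("pca_models_example.json", "w") as f:
    json.dump(summary, f)

An AutoEncoder would be handled the same way for now: one model trained per histogram, with its weights saved alongside the PCAs.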

Finally, we should perform a validation of both the PCAs and AutoEncoders and summarize it in a set of slides for each subsystem. At a minimum, we would want the following:

  1. any relevant details on the histogram list
  2. any relevant details on the "bad runs" (what histograms are affected, what was the issue, etc.)
  3. plots of original and reconstructed histograms (with both PCA and AutoEncoder) for both good and bad runs
  4. SSE summary plot for each histogram (with both PCA and AutoEncoder) split by train/test sets and good/bad runs
  5. ROC curve and TPR vs. FPR table for both PCA and AutoEncoder (see the sketch below for how items 4 and 5 could be computed)
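
As a rough sketch of how items 4 and 5 could be computed, assuming we already have the original and reconstructed bin contents per run plus a good/bad label for each run (all inputs below are random placeholders):

import numpy as np
from sklearn.metrics import roc_curve, auc

# Placeholder inputs for one histogram: original and reconstructed bin
# contents with shape (n_runs, n_bins), and a per-run label
# (1 = bad run, 0 = good run).
original = np.random.rand(50, 100)
reconstructed = original + 0.01 * np.random.randn(50, 100)
is_bad = np.random.randint(0, 2, size=50)

# SSE between original and reconstruction, one value per run;
# this is the per-run anomaly score summarized in the SSE plots.
sse = np.sum((original - reconstructed) ** 2, axis=1)

# ROC curve: how well the SSE score separates bad runs from good runs.
fpr, tpr, thresholds = roc_curve(is_bad, sse)
print("AUC:", auc(fpr, tpr))

# TPR vs. FPR table at the thresholds scanned by roc_curve.
for f, t, thr in zip(fpr, tpr, thresholds):
    print(f"threshold={thr:.3g}  FPR={f:.2f}  TPR={t:.2f}")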

Through this process, we should stay in contact with the relevant DPG experts to make sure they agree with the physics side of things for each subsystem. The last step of validation would be running the studies by a DPG expert and asking for their feedback.

Relevant resources (for posterity, please add any additional links you find that may be useful!):

Checklist:

  • DT
    • Histograms
    • Bad runs
    • PCAs
    • AutoEncoders
    • Validation
  • RPC
    • Histograms
    • Bad runs
    • PCAs
    • AutoEncoders
    • Validation
  • CSC
    • Histograms
    • Bad runs
    • PCAs
    • AutoEncoders
    • Validation
  • EMTF
    • Histograms
    • Bad runs
    • PCAs
    • AutoEncoders
    • Validation
  • GEM
    • Histograms
    • Bad runs
    • PCAs
    • AutoEncoders
    • Validation