
Commit f91dede

Original version by Artak Mkrtchyan

1 parent e1d9345 · commit f91dede

16 files changed: +5529 −2 lines

01_Event_Selection.ipynb        +1,057
02_Network.ipynb                +386
03_Training.ipynb               +2,548
04_Training_in_Keras.ipynb      +404
05_Event_Classification.ipynb   +848

(Large diffs are not rendered by default.)

README.md

+35-2
@@ -1,2 +1,35 @@

Removed:

-# hh-analysis
-Example HH analysis

Added:
# Machine Learning Tutorial for Physics at the LHC

## Hello hello!

This is a series of five Jupyter notebooks introducing beginner concepts of event selection and machine learning and their application in Higgs physics research, using the [Keras](https://keras.io/) package for Python, tailored specifically to the applications of the CMS group at the Institute for Experimental Physics, UHH. The beginning of this tutorial is very much based on the "Online Course: Machine Learning for Physicists 2021", a thorough lecture series by Florian Marquardt. You can find the full recorded lectures, tutorials and slides [here](https://pad.gwdg.de/s/Machine_Learning_For_Physicists_2021#).
Keras is an open-source machine learning framework built on top of TensorFlow 2. It is incredibly versatile and is chosen by many scientific organizations such as CERN and NASA for its ease of use and broad applicability.
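To give a flavour of the API before notebook 04, a minimal Keras binary classifier might look like the sketch below; the input dimension and layer widths are placeholders, not the settings used in the tutorial.

```python
from tensorflow import keras

# Minimal sketch of a binary classifier (input dimension and layer sizes
# are placeholders, not the choices made in the tutorial notebooks).
model = keras.Sequential([
    keras.layers.Input(shape=(10,)),              # e.g. 10 observables per event
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # signal-vs-background score
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```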
The series covers everything from event selection in CMS and choosing suitable observables for training, to what a neural network is and how to train one. The Jupyter notebook environment is very convenient for quick scripting and trying out ideas, since code runs cell by cell, so you are encouraged to play around with the given code and edit it to your wants and needs!

Notebooks 1 - 3 will not dive into Keras just yet, but introduce the fundamental concepts of cut-based event selection and the structure and nature of neural networks using only native Python and NumPy objects. This allows us to develop an understanding of what is going on behind the scenes. Later on, many of the finicky details of neural networks will be managed by Keras, so that we can focus on the physics problem at hand and turn our full attention to the research to be conducted!
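As a preview of the NumPy-only approach in notebooks 1 - 3, a single forward pass through one hidden layer can be sketched like this (the sizes and the activation function are illustrative, not the ones chosen in the notebooks):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes only: 4 input observables, 8 hidden neurons, 1 output.
rng = np.random.default_rng(0)
x = rng.normal(size=4)              # one "event" with 4 observables
W1 = rng.normal(size=(8, 4))        # hidden-layer weights
b1 = np.zeros(8)                    # hidden-layer biases
W2 = rng.normal(size=(1, 8))        # output-layer weights
b2 = np.zeros(1)

hidden = sigmoid(W1 @ x + b1)       # hidden activations
output = sigmoid(W2 @ hidden + b2)  # network prediction in (0, 1)
print(output)
```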
## Dependencies (Python 3.9)

- numpy
- tensorflow
- keras
- matplotlib
- awkward
- uproot
- hist
- mplhep
- vector
- scikit-learn (imported as `sklearn`)
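A quick, optional way to verify that an environment provides all of these is a small import check like the following (the package list simply mirrors the one above):

```python
# Sanity check that the required packages are importable, printing their versions.
import importlib

packages = ["numpy", "tensorflow", "keras", "matplotlib",
            "awkward", "uproot", "hist", "mplhep", "vector", "sklearn"]

for name in packages:
    module = importlib.import_module(name)
    print(f"{name}: {getattr(module, '__version__', 'version unknown')}")
```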
## Additional material

The folder "images" contains all the images used in the notebooks. The images are taken from the internet, and the source links are given at the bottom of the notebooks.
The folder "misc" contains miscellaneous material related to the notebooks, such as output images or animations.

## License

tbd

images/auc.png                11.8 KB
images/dampedharmosc.JPG      21.9 KB
images/fully_connected.jpg    141 KB
images/ggFdiH.jpg             61.9 KB
images/learning_rate.png      337 KB
images/overfitting.png        87.4 KB
images/roc.png                88.7 KB

misc/nn_full.mp4              6.24 MB (binary file not shown)
misc/nn_full_speed.mp4        2.43 MB (binary file not shown)

misc/uproot_awkward_intro.ipynb

+251
New notebook file (kernel "myenv", Python 3.7.13), rendered cell by cell:

# Importing and file opening

```python
import pyarrow.parquet as pq
import numpy as np
import awkward as ak
import uproot
# import particle
import vector
vector.register_awkward()

fname = "/nfs/dust/cms/user/matsch/for_Artak/0520A050-AF68-EF43-AA5B-5AA77C74ED73.root"

fname_HH = "/nfs/dust/cms/user/frahmmat/tutorial_files/HHtobbVV_4B83EAAD-1DEF-1641-BF6E-E9D72832F33A.root"
fname_tt = "/nfs/dust/cms/user/frahmmat/tutorial_files/tt_sl_0520A050-AF68-EF43-AA5B-5AA77C74ED73.root"

file = uproot.open(fname)
tree = file["Events"]
print(tree)
# print(tree.keys()) # 1628 keys
```

Output:

```
<TTree 'Events' (1628 branches) at 0x2b030b5961d0>
```

# directly accessing fields via filter

```python
tree = file["Events"]

fields = ["pt", "eta", "phi", "mass", "charge", "btagDeepB"]

# read only the branches we need, for the first 100k events
electrons = tree.arrays(filter_name=["Electron_" + f for f in fields], entry_stop=100_000)
muons = tree.arrays(filter_name=["Muon_" + f for f in fields], entry_stop=100_000)
jets = tree.arrays(filter_name=["Jet_" + f for f in fields], entry_stop=100_000)
```

```python
# switch naming of fields from "{Object}_{field}" to "{field}"
electrons = ak.zip({key.replace("Electron_", ""): electrons[key] for key in electrons.fields}, with_name="Momentum4D")
muons = ak.zip({key.replace("Muon_", ""): muons[key] for key in muons.fields}, with_name="Momentum4D")
jets = ak.zip({key.replace("Jet_", ""): jets[key] for key in jets.fields}, with_name="Momentum4D")

# combine fields into a single awkward array via ak.zip
events = ak.zip({"Electron": electrons, "Muon": muons, "Jet": jets}, depth_limit=1)

# save output as parquet file
f_out_name = "tt_sl_test.parquet"
ak.to_parquet(events, f_out_name)
```

Output (recorded from an earlier run of this cell; the source above no longer contains these prints):

```
Muon fields: ['eta', 'mass', 'phi', 'pt', 'charge']
Jet fields: ['btagDeepB', 'eta', 'mass', 'phi', 'pt']
<class 'vector._backends.awkward_.MomentumArray4D'>
None
<class 'vector._backends.awkward_.MomentumArray4D'>
event fields: ['Electron', 'Muon', 'Jet']
None
```

# opening parquet with awkward

```python
events_new = ak.from_parquet("tt_sl_test.parquet")

print(type(events_new.Electron))
# enable 4-vector behavior (pt, eta, phi, mass)
behaviors = {
    "Jet": "Momentum4D",
    "Electron": "Momentum4D",
    "Muon": "Momentum4D",
}
for f in events_new.fields:
    if f in behaviors:
        events_new[f] = ak.with_name(events_new[f], behaviors[f])

print(events_new.fields)
print(type(events_new.Electron))
```

Output:

```
<class 'awkward.highlevel.Array'>
['Electron', 'Muon', 'Jet']
<class 'vector._backends.awkward_.MomentumArray4D'>
```

```python
# playing around with fields, testing vector behaviour
jets = events_new.Jet

print(ak.num(jets))

print(jets.mass)
jets = jets[ak.num(jets) > 1]
print((jets[:, 0] + jets[:, 1]).mass)
print(jets.pt)
print((jets[:, 0] + jets[:, 1]).pt)
print(jets.E)
print(np.sqrt(jets.pt**2 + jets.pz**2 + jets.mass**2))
```

Output:

```
[6, 7, 7, 9, 7, 8, 6, 5, 6, 8, 6, 10, 9, ... 6, 11, 7, 7, 8, 11, 8, 5, 9, 16, 11, 8]
[[11.3, 9.84, 7.24, 4.14, 4.92, 3.79], ... 8.74, 5.56, 5.16, 6.71, 4.8, 6.57, 4.31]]
[86.9, 303, 203, 542, 108, 246, 105, 309, ... 271, 149, 184, 412, 59.7, 327, 147]
[[101, 62.8, 47.9, 22.3, 19.2, 15.5], ... 42.2, 40.2, 35.8, 33.5, 27, 21.2, 15.1]]
[157, 56.7, 146, 84.8, 108, 83.2, 35.4, ... 78.4, 76, 37.1, 67.9, 194, 23.3, 38.2]
[[118, 66.1, 49.1, 274, 49.1, 30.9], ... 43.2, 79.3, 48.1, 35.8, 182, 156, 105]]
[[118, 66.1, 49.1, 274, 49.1, 30.9], ... 43.2, 79.3, 48.1, 35.8, 182, 156, 105]]
```

# alternatively: accessing fields via coffea

```python
from coffea.nanoevents import NanoEventsFactory, BaseSchema
from coffea.nanoevents.methods import candidate

events = NanoEventsFactory.from_root(
    file,
    entry_stop=100000,
    metadata={"dataset": "tt_sl"},
    schemaclass=BaseSchema,
).events()

muons = ak.zip(
    {
        "pt": events.Muon_pt,
        "eta": events.Muon_eta,
        "phi": events.Muon_phi,
        "mass": events.Muon_mass,
        "charge": events.Muon_charge,
    },
    with_name="PtEtaPhiMCandidate",
    behavior=candidate.behavior,
)
# The committed cell reused the Muon_* branches here; presumably the Jet_* branches were intended.
jets = ak.zip(
    {
        "pt": events.Jet_pt,
        "eta": events.Jet_eta,
        "phi": events.Jet_phi,
        "mass": events.Jet_mass,
        "btagDeepB": events.Jet_btagDeepB,
    },
    with_name="PtEtaPhiMCandidate",
    behavior=candidate.behavior,
)
```
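The coffea cell stops after zipping the candidates. As a possible follow-up (not part of the committed notebook, and reusing the `muons` array from the cell above), the candidate behavior could be exercised by forming dimuon pairs:

```python
# Hypothetical follow-up, not in the committed notebook: use the candidate
# behavior to build dimuon pairs and look at their invariant mass.
import awkward as ak

dimu = muons[ak.num(muons) >= 2]        # events with at least two muons
pair = dimu[:, 0] + dimu[:, 1]          # candidate addition sums four-vectors and charges
opposite_sign = pair[pair.charge == 0]  # keep opposite-charge pairs
print(opposite_sign.mass)
```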

0 commit comments
