- Learned the theory behind neural networks
- Learned how to implement a neural network model using TensorFlow in Python
- Learned how to perform linear regression using scikit-learn
- Learned several ways of normalizing data, such as StandardScaler() from sklearn.preprocessing
- Performed some variants of polynomial regression and tuned the polynomial degree
- Learned how to build a simple image classification model on an image dataset downloaded from sklearn (Olivetti faces)
- Managed to implement my own categorization error function, and used it for manual tuning of the number of units, layers, learning rate, epochs, batch size and regularization parameter
- Right now, the iterative loop of ML development requires a more intuitive method, and needs to be automated
- More deeply understand the intuition behind the number of layers as well as the number of neurons per layer
- Important to distinguish between RGB and grayscale images
- Understand the different layer types. Important: Convolutional neural networks, LEARN THIS! As of right now, you've only used the Dense layer type.
- Understand the loss SparseCategoricalCrossentropy
- Understand how the algorithm is performed online. Right now you only know the linear algebra behind the establishment of the neural network. What happens in an epoch? What quantity of the neural network is trained through each epoch?
- Learn what the Adam optimization does to the weights in one iteration to move them toward the optimal values.
- Error analysis: Currently, the (earlier mentioned) categorization function seems to be a good manual implementation. However, you need to understand the built-in metrics provided by TensorFlow!! The accuracy reported by tf.keras.Model.evaluate, or stored in the history returned by tf.keras.Model.fit, is currently a bit unclear. You need to read the documentation regarding these metrics in order to better evaluate your method. The hardest part of your project is to establish a framework for recognizing when things go wrong with respect to the data, and for improving the model based on these metrics.
- MLOps: The goal is to have a complete library that can solve the task of multiclass classification with n (= 5 initially) classes representing PMI in days. The core of the software shall be simplicity, and if manageable in time, there shall be a simple GUI where a user can clearly see the input data and PMI days. What is MLOps? How is it used and what process does it automate?
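The questions above about what happens in an epoch and what Adam does to the weights in one iteration can be sketched in plain Python. This is a minimal illustration, not TensorFlow's implementation: it uses the textbook Adam update on a single scalar weight, a toy model `y_hat = w * x`, and made-up data. The function names `adam_step` and `train` and the hyperparameter values are assumptions chosen for the example.

```python
import math


def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One textbook Adam update for a single scalar weight (illustrative only)."""
    m = beta1 * m + (1 - beta1) * grad        # running average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running average of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


def train(xs, ys, epochs, batch_size, lr=0.05):
    """Fit y_hat = w * x with mini-batches; returns (w, number of weight updates)."""
    w, m, v, t = 0.0, 0.0, 0.0, 0
    for _ in range(epochs):
        # One epoch = one full pass over the data, split into mini-batches.
        for start in range(0, len(xs), batch_size):
            xb = xs[start:start + batch_size]
            yb = ys[start:start + batch_size]
            # Gradient of the mean squared error with respect to w on this batch.
            grad = sum(2 * (w * x - y) * x for x, y in zip(xb, yb)) / len(xb)
            t += 1                            # one iteration = one weight update
            w, m, v = adam_step(w, grad, m, v, t, lr=lr)
    return w, t


# Toy data (hypothetical): y = 2 * x, so the weight should approach 2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * x for x in xs]
w_fit, n_updates = train(xs, ys, epochs=200, batch_size=2)
print(w_fit, n_updates)
```

With 6 samples and a batch size of 2, each epoch performs 3 weight updates. The weights (here just `w`) are the quantity being trained, while the optimizer state `m` and `v` is carried along between iterations.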
- Data approach - microscopy and MicrobeJ: The project will aim to build a classification model using a neural network. The question is: what features shall be used as input data?
- Approach 1 - Use the images as input data
- If you manage to take good pictures in the microscope, this is good. We can choose a fixed image size, preliminarily $512 \times 512$ or $1024 \times 1024$ pixels (not sure yet, but something like that). Each picture is labeled with a category (PMI day) and flattened to a row vector, becoming one sample in an input matrix.
- Advantage: Since there are so many established frameworks for image classification tasks, we can perform transfer learning from other similar image classification models, e.g. from the TensorFlow library, to train our model.
- Problem: The concentration of each sample is currently unknown. So for PMI day $i$, $i \in \{1, \dots, n\}$, the samples from different swabbings might look different depending on many circumstances, such as where the swabbing was made (nose, ear, mouth, eyes) and who swabbed the cadaver. Hence, it might be important that the samples are homogeneous if doing pure image classification, but we are not sure yet.
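The flattening step in Approach 1 can be sketched as follows. This is a minimal illustration in plain Python with images represented as nested lists. The helper names `flatten_image`, `build_dataset` and `n_features`, and the tiny example images, are assumptions made for the sketch. In practice an array library would likely be used instead.

```python
def flatten_image(image):
    """Flatten a nested-list image into one flat row of pixel values.

    Handles grayscale (H x W) and RGB (H x W x 3) images.
    """
    flat = []
    for row in image:
        for pixel in row:
            if isinstance(pixel, (list, tuple)):
                flat.extend(pixel)   # RGB: one value per color channel
            else:
                flat.append(pixel)   # grayscale: a single intensity
    return flat


def build_dataset(images, pmi_days):
    """Stack flattened images into an input matrix X with labels y."""
    if len(images) != len(pmi_days):
        raise ValueError("each image needs exactly one PMI-day label")
    X = [flatten_image(img) for img in images]
    if len({len(row) for row in X}) > 1:
        raise ValueError("all images must have the same size and channel count")
    return X, list(pmi_days)


def n_features(height, width, channels=1):
    """Length of the row vector produced by flattening one image."""
    return height * width * channels


# Two tiny hypothetical 2 x 2 grayscale images labeled with PMI days 1 and 3.
images = [[[0, 10], [20, 30]], [[5, 15], [25, 35]]]
X, y = build_dataset(images, pmi_days=[1, 3])
print(X, y)
print(n_features(512, 512), n_features(512, 512, channels=3))
```

A $512 \times 512$ grayscale image gives 262144 input features per sample, and the same image in RGB gives three times as many. This is one reason the RGB versus grayscale distinction matters for the size of the input layer.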
- Approach 2 - Use the relative abundance of each bacterial type as input data
- MicrobeJ is already able to make a few categorizations based on the metrics of each bacterium in the sample. Hence, these could be used as input features instead. Ideally, we would see a clustering pattern w.r.t. each day when measuring the relative abundance of bacterial communities in the sample.
- Advantage: Using the relative abundance of each bacterial type could remove the need for the samples to be homogeneous. It is likely that the relative population of e.g. filamentous bacteria changes over time, regardless of the bacterial density of the sample, which would make this a more useful approach to feature selection.
- Problem: If the change in concentration of any type of bacteria over time is not that drastic, and if the relative abundance of bacteria is very different for each cadaver, this could be a problem.
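The idea behind Approach 2 can be sketched as below: per-type counts are converted to fractions, so the feature vector does not depend on how dense the sample is. This is a minimal illustration. The category names other than filamentous, the counts, and the helper names `relative_abundance` and `to_feature_row` are hypothetical and do not reflect MicrobeJ's actual output format.

```python
def relative_abundance(counts):
    """Convert per-type counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no counted bacteria")
    return {name: n / total for name, n in counts.items()}


def to_feature_row(counts, categories):
    """Order the fractions by a fixed category list to form one input row."""
    fractions = relative_abundance(counts)
    return [fractions.get(name, 0.0) for name in categories]


# Hypothetical category names; only "filamentous" is taken from the notes.
categories = ["rod", "coccus", "filamentous"]

# Two made-up samples with the same composition but a 10x difference in density.
dilute = {"rod": 30, "coccus": 50, "filamentous": 20}
dense = {"rod": 300, "coccus": 500, "filamentous": 200}

row_dilute = to_feature_row(dilute, categories)
row_dense = to_feature_row(dense, categories)
print(row_dilute, row_dense)
```

Both samples map to the same feature row even though their total counts differ by a factor of ten, which is the property that makes this approach less sensitive to unknown sample concentration.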
- Knowledge about the bacteria
- The project will to a large extent describe the methods used and the properties of the samples. This requires much knowledge about the types of bacteria involved, and relevant information about these with regard to the project.
- Gather information by reading articles, and also conduct a literature study on this, i.e. include relevant parts as sources in the paper.